LPA3D: 3D Room-Level Scene Generation from In-the-Wild Images

Ming-Jia Yang¹   Yu-Xiao Guo²   Yang Liu²   Bin Zhou¹   Xin Tong²
¹Beihang University   ²Microsoft Research Asia
Computational Visual Media Journal 2025

Abstract

Generating realistic, room-level indoor scenes with semantically plausible and detailed appearance from in-the-wild images is important for various applications in VR, AR, and robotics. The success of NeRF-based generative methods indicates a promising direction for addressing this challenge. However, unlike their object-level counterparts, existing scene-level generative methods require additional information, such as multiple views, depth images, or semantic guidance, rather than relying solely on RGB images. This is because NeRF-based methods require prior knowledge of camera poses, which is hard to obtain for indoor scenes: there is no obvious way to define a canonical alignment across scenes, and a global pose cannot be reliably estimated from a single image since much of the room lies unseen behind the camera. To address this challenge, we redefine global poses within the framework of local-pose-alignment (LPA), an anchor-based multi-local-coordinate system that uses a selected number of anchors as the roots of these coordinates. Building on this foundation, we introduce LPA-GAN, a novel NeRF-based generative approach that incorporates specific modifications to estimate the priors of camera poses under LPA and co-optimizes the pose predictor with the scene generation process. Our ablation study and comparisons with straightforward extensions of NeRF-based object-generation methods demonstrate the effectiveness of our approach. Furthermore, visual comparisons with other techniques show that our method achieves superior inter-view consistency and semantic plausibility.
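To make the anchor-relative idea behind LPA concrete, the Python sketch below expresses a global camera-to-world pose in the frame of its nearest anchor. This is only an illustrative sketch of the general concept, not the paper's implementation: the function name, the 4x4-matrix representation, and the nearest-anchor selection rule are all assumptions made for this example.

# Hypothetical illustration of local-pose-alignment (LPA): rather than one
# global coordinate frame, a camera pose is re-expressed relative to the
# nearest of a small set of anchor frames. Names are illustrative only.
import numpy as np

def to_local_pose(camera_to_world: np.ndarray,
                  anchors: list[np.ndarray]) -> tuple[int, np.ndarray]:
    """Express a global camera-to-world pose in the frame of the nearest
    anchor; both poses are 4x4 homogeneous matrices."""
    cam_pos = camera_to_world[:3, 3]
    # Pick the anchor whose origin is closest to the camera position
    # (an assumed selection rule for this sketch).
    dists = [np.linalg.norm(cam_pos - a[:3, 3]) for a in anchors]
    k = int(np.argmin(dists))
    # Local pose = (anchor world-to-local transform) @ (camera-to-world).
    local_pose = np.linalg.inv(anchors[k]) @ camera_to_world
    return k, local_pose

# Toy usage: two anchors in a room, one camera near the second anchor.
anchor_a = np.eye(4)
anchor_b = np.eye(4); anchor_b[:3, 3] = [4.0, 0.0, 0.0]
camera = np.eye(4); camera[:3, 3] = [3.5, 0.0, 1.0]
idx, local = to_local_pose(camera, [anchor_a, anchor_b])
print(idx, local[:3, 3])  # -> 1, camera position relative to anchor_b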

Results

Comparison

Links

Paper [arXiv]