3D indoor scene generation is challenging because high-quality, diverse 3D datasets are scarce. Recent generative approaches that learn from in-the-wild image collections offer a promising alternative, but problems with scene quality, diversity, and multi-view consistency persist. In this paper, we introduce WildRoomGen, an efficient image-conditioned 3D room generation framework designed to overcome these limitations. WildRoomGen comprises two key components. (1) RoomGen is a GAN-based, single-view-conditioned 3D room generator that learns from large-scale single-view room images to generate diverse NeRF-based 3D rooms. It significantly improves generation quality and diversity through enhanced camera estimation, perspective projection-based image feature embedding, and the use of pretrained image feature and pseudo-depth priors. (2) RoomRecon is a feedforward NeRF reconstruction network that addresses the 3D inconsistency that arises in RoomGen and prior methods from their use of image super-resolution for image enhancement. It is trained solely on RoomGen's generated results and requires no 3D room data. We extensively evaluate the quality and diversity of the 3D rooms generated by WildRoomGen, highlighting its effectiveness and efficiency. Furthermore, we demonstrate the generality of our approach and its scalability with respect to dataset size.