BRDFusion: Physics Meets Generation for Urban Scene Inverse Rendering

Abstract

Inverse rendering of urban scenes from captured videos enables numerous applications, including content creation and autonomous driving simulation. Physically based rendering methods model lighting physics explicitly and allow it to be controlled, but they suffer from reconstruction and rendering artifacts. Generative models produce realistic videos, but offer limited consistency and controllability. We present BRDFusion, a unified framework that combines these two complementary models for both inverse and forward rendering. During inverse rendering, BRDFusion recovers explicit, consistent scene properties with physical modeling and alleviates optimization ambiguity with generative priors. During forward rendering, the physical model provides controllable rendering from the scene configuration, and the generative model denoises the result and fixes artifacts. As a result, our method produces high-quality videos while allowing precise control, outperforming baselines on real and synthetic scenes. Moreover, BRDFusion supports novel-view relighting, night simulation, and dynamic object insertion and editing.

Method

[Figure: BRDFusion rendering pipeline diagram]

**Rendering.** We represent the scene with a 3D Gaussian Splatting (3DGS) scene graph and HDR lighting. We first volume-render the G-buffer and then shade it with physically based rendering (PBR). Finally, a generative rendering pass reduces artifacts in the PBR output and improves its visual quality.
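The first two steps of this pipeline can be illustrated for a single camera ray. The following is a minimal sketch, not the authors' implementation: it assumes front-to-back alpha compositing for the volume-rendering step and swaps in a simple Lambertian BRDF under one directional light in place of the full PBR model and HDR lighting. The names `Sample`, `composite_gbuffer`, and `shade_lambert` are hypothetical, and the generative pass is omitted.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    """One primitive intersected by a ray (hypothetical structure)."""
    alpha: float    # opacity contribution in [0, 1]
    albedo: tuple   # (r, g, b) base color
    normal: tuple   # unit surface normal


def composite_gbuffer(samples):
    """Front-to-back alpha compositing of per-sample attributes along one ray.

    `samples` must be sorted from nearest to farthest.
    Returns the composited (albedo, normal, opacity) G-buffer entry.
    """
    transmittance = 1.0
    albedo = [0.0, 0.0, 0.0]
    normal = [0.0, 0.0, 0.0]
    for s in samples:
        weight = transmittance * s.alpha
        for i in range(3):
            albedo[i] += weight * s.albedo[i]
            normal[i] += weight * s.normal[i]
        transmittance *= 1.0 - s.alpha
    return albedo, normal, 1.0 - transmittance


def shade_lambert(albedo, normal, light_dir, light_rgb):
    """Shade one G-buffer entry with a Lambertian BRDF (albedo / pi).

    Assumption: a single directional light stands in for the HDR lighting.
    `light_dir` is a unit vector pointing from the surface toward the light.
    """
    length = math.sqrt(sum(n * n for n in normal)) or 1.0
    unit_normal = [n / length for n in normal]
    cos_theta = max(0.0, sum(n * l for n, l in zip(unit_normal, light_dir)))
    return [a / math.pi * e * cos_theta for a, e in zip(albedo, light_rgb)]


# A half-transparent red sample in front of an opaque blue one.
ray = [
    Sample(alpha=0.5, albedo=(1.0, 0.0, 0.0), normal=(0.0, 0.0, 1.0)),
    Sample(alpha=1.0, albedo=(0.0, 0.0, 1.0), normal=(0.0, 0.0, 1.0)),
]
g_albedo, g_normal, g_opacity = composite_gbuffer(ray)
pixel = shade_lambert(g_albedo, g_normal, (0.0, 0.0, 1.0), (math.pi,) * 3)
```

Shading from a composited G-buffer, rather than compositing pre-shaded colors, is what keeps the lighting editable after reconstruction: only `shade_lambert` needs to be re-run when the light changes.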

[Figure: BRDFusion optimization pipeline diagram]

**Optimization.** We optimize the scene in stages. First, we optimize the scene using the volume-rendered color loss and regularize the rendered G-buffer with the DiffusionRenderer prior. Second, we refine the DiffusionRenderer prior to enhance its temporal consistency and repeat the volume-rendering optimization to reconstruct a better scene. Third, we fix the geometry and material and optimize the lighting using the PBR-rendered color loss and the DiffusionLight regularization loss. Finally, we jointly refine the geometry, material, and lighting to improve overall visual fidelity.
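The staged schedule can be written down as data, which makes explicit which parameter groups are trainable and which are frozen at each step. This is a hedged sketch only: the group names, loss labels, and the `Stage` and `frozen_groups` helpers are hypothetical. It also assumes that "the scene" in the first two stages means geometry and material, that the repeated volume-rendering pass reuses the first stage's losses, and that the joint stage reuses the PBR color loss; the text does not state the objective used to refine the prior, so that entry is left empty.

```python
from dataclasses import dataclass

# Hypothetical parameter groups; the text names geometry, material, lighting,
# and a refinable G-buffer prior.
ALL_GROUPS = frozenset({"geometry", "material", "lighting", "gbuffer_prior"})


@dataclass(frozen=True)
class Stage:
    name: str
    trainable: frozenset  # parameter groups updated in this stage
    losses: tuple         # loss labels active in this stage


SCENE_LOSSES = ("volume_rendered_color", "gbuffer_prior_regularization")

SCHEDULE = (
    # Stage 1: fit the scene against the volume-rendered color.
    Stage("scene_fit", frozenset({"geometry", "material"}), SCENE_LOSSES),
    # Stage 2a: refine the prior; the objective is not given in the text.
    Stage("prior_refinement", frozenset({"gbuffer_prior"}), ()),
    # Stage 2b: repeat the scene fit (assumed to reuse stage-1 losses).
    Stage("scene_refit", frozenset({"geometry", "material"}), SCENE_LOSSES),
    # Stage 3: geometry and material fixed, lighting optimized.
    Stage("lighting", frozenset({"lighting"}),
          ("pbr_rendered_color", "diffusionlight_regularization")),
    # Stage 4: joint refinement (loss choice here is an assumption).
    Stage("joint_refinement",
          frozenset({"geometry", "material", "lighting"}),
          ("pbr_rendered_color",)),
)


def frozen_groups(stage):
    """Return the parameter groups held fixed during `stage`."""
    return ALL_GROUPS - stage.trainable


for stage in SCHEDULE:
    print(stage.name, sorted(stage.trainable), sorted(frozen_groups(stage)))
```

In a training loop, each stage would enable gradients only for its `trainable` groups, so that, for example, the lighting stage cannot compensate for lighting errors by altering geometry or material.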

Results

Application

Acknowledgements

This research was funded by the National Science and Technology Council, Taiwan, under Grants NSTC 112-2222-E-A49-004-MY2 and 113-2628-E-A49-023-. The authors are grateful to Google, NVIDIA, and MediaTek Inc. for their generous donations. Yu-Lun Liu acknowledges the Yushan Young Fellow Program by the MOE in Taiwan.

BibTeX