3D Gaussian Splatting (3DGS) techniques have recently enabled high-quality 3D scene reconstruction and real-time novel view synthesis. These approaches, however, are built on the pinhole camera model and cannot effectively model defocus effects. Departing from this, we introduce DOF-GS, a new 3DGS-based framework with a finite-aperture camera model and explicit, differentiable defocus rendering, enabling it to serve as a post-capture control tool. Trained on multi-view images with moderate defocus blur, DOF-GS learns the inherent camera characteristics and recovers sharp details of the underlying scene; in particular, it renders varying DOF effects through on-demand aperture and focal distance control, post-capture and post-optimization. Additionally, our framework extracts circle-of-confusion cues during optimization to identify in-focus regions in the input views, enhancing the detail of the reconstructed 3D scene. Experimental results demonstrate that DOF-GS supports post-capture refocusing, adjustable defocus, and high-quality all-in-focus rendering from multi-view images with uncalibrated defocus blur.
Pipeline of the proposed DOF-GS. We start with camera poses and an initial sparse point cloud estimated from the defocused images. For each view, we introduce and initialize two learnable parameters: a focal distance and an aperture parameter. During optimization, for each sampled view \( m \), we render a defocused image using the DOF rendering scheme with camera parameters \( \{f_m, Q_m\} \) to fit the target view. Meanwhile, we render an All-in-Focus (AiF) image with the fixed aperture parameter \( Q^* \). To enhance scene details through appropriate AiF supervision, we introduce an In-Focus Localization Network that uses the rendered CoC map and other cues to localize the in-focus regions of the target view. The underlying 3D scene, camera parameters, and network parameters are jointly updated via backpropagation.
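A minimal sketch of this optimization loop is given below. It assumes a hypothetical helper render_dof(gaussians, cam, focal_dist, aperture) that returns a rendered image and its CoC map, plus an infocus_net module; these names and the exact loss form are our illustrative assumptions, not the released implementation.

```python
import torch

def train_step(gaussians, cams, targets, infocus_net, Q_star, optimizer):
    """One optimization step of the sketched DOF-GS pipeline."""
    m = torch.randint(len(cams), (1,)).item()      # sample a view m
    cam, target = cams[m], targets[m]

    # Defocused render with the per-view learnable focal distance f_m
    # and aperture parameter Q_m, fit against the blurry input view.
    defocused, coc_map = render_dof(gaussians, cam, cam.f_m, cam.Q_m)

    # All-in-focus (AiF) render with the fixed aperture parameter Q*.
    aif, _ = render_dof(gaussians, cam, cam.f_m, Q_star)

    # In-Focus Localization Network: predict where the target view is
    # sharp from the rendered CoC map and other cues, so AiF supervision
    # is applied only on in-focus regions.
    mask = infocus_net(coc_map, defocused, target)

    loss = (defocused - target).abs().mean() \
         + (mask * (aif - target)).abs().mean()

    optimizer.zero_grad()
    loss.backward()   # gradients reach Gaussians, {f_m, Q_m}, and the network
    optimizer.step()
    return loss.item()
```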
Illustration of the depth-of-field rendering and camera model. During DOF rendering (left), 3D Gaussians \( \mathcal{G}_k \) are projected to 2D screen space. Each projected 2D Gaussian \( \mathcal{G}'_k \) is then convolved with a blur kernel, and the final color is composited from the convolved Gaussians \( \mathcal{G}''_k \). White dashed lines highlight the convolution effects. The blur kernel for each 2D Gaussian is determined by the radius of its Circle of Confusion (CoC), which arises from the finite-aperture camera model (right). Unlike a pinhole, the adopted camera model with aperture parameter \( Q \) images object points that deviate from the focal distance \( f \) as a region, known as the CoC, rather than as a point.
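To make the CoC relation concrete, below is a small runnable sketch. It assumes standard thin-lens behavior with the lens constants folded into the aperture parameter \( Q \), and an assumed mapping from CoC radius to blur-kernel standard deviation; convolving two Gaussians simply adds their covariances, which is how the per-Gaussian blur can be applied in closed form.

```python
import numpy as np

def coc_radius(depth, focal_dist, Q):
    """CoC radius for a point at `depth`, given the focal distance and
    aperture parameter Q (thin-lens constants folded into Q). A point
    exactly at the focal distance has zero CoC."""
    return Q * np.abs(depth - focal_dist) / depth

def defocus_covariance(cov2d, r):
    """Convolve a projected 2D Gaussian with an isotropic Gaussian blur
    kernel tied to the CoC radius r: Gaussian * Gaussian adds covariances.
    The r/2 kernel scale is one plausible choice, not the paper's."""
    sigma_b = r / 2.0
    return cov2d + (sigma_b ** 2) * np.eye(2)

# Example: a Gaussian 1 m behind a 2 m focal plane.
cov = np.array([[1.5, 0.3], [0.3, 0.8]])   # screen-space covariance
r = coc_radius(depth=3.0, focal_dist=2.0, Q=6.0)
print(r)                                    # 2.0 -> noticeably defocused
print(defocus_covariance(cov, r))           # dilated covariance
```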
DOF rendering results with post-capture aperture and focal distance control. Adjusting the focal distance mainly shifts which regions are in focus and out of focus (highlighted in pink), while increasing the aperture parameter makes out-of-focus regions increasingly blurry.
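Post-capture control then amounts to re-rendering the optimized scene with user-chosen values in place of the learned ones; a sketch reusing the hypothetical render_dof helper from the pipeline sketch above, with illustrative values:

```python
# Refocus: sweep the focal plane at a fixed aperture.
for f in (1.0, 2.0, 4.0):
    image, _ = render_dof(gaussians, cam, f, 5.0)

# Adjustable defocus: widen the aperture at a fixed focal plane.
# Under the CoC relation above, Q = 0 degenerates to a pinhole (AiF).
for Q in (0.0, 3.0, 6.0):
    image, _ = render_dof(gaussians, cam, 2.0, Q)
```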
@misc{wang2024dofgs,
  title={DOF-GS: Adjustable Depth-of-Field 3D Gaussian Splatting for Post-Capture Refocusing, Defocus Rendering and Blur Removal},
  author={Yujie Wang and Praneeth Chakravarthula and Baoquan Chen},
  year={2024},
  eprint={2405.17351},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2405.17351}
}