Researchers at the Max Planck Institute for Informatics and the University of Hong Kong have developed StyleNeRF, a 3D-aware generative model trained on unstructured 2D images that synthesizes high-resolution images with a high degree of multi-view consistency.
Existing approaches either struggle to synthesize high-resolution images with fine detail or produce 3D-inconsistent artifacts. In contrast, StyleNeRF integrates a neural radiance field (NeRF) into a style-based generator. This approach gives StyleNeRF improved rendering efficiency and better 3D consistency.
A comparison between StyleNeRF (column five) and four competing generative models: HoloGAN, GRAF, pi-GAN and GIRAFFE. Each sample is shown from four different viewpoints. As you can see, StyleNeRF performs exceptionally well here compared to the alternatives.
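To give a rough feel for what "integrating a NeRF into a style-based generator" can mean, the sketch below conditions a tiny coordinate MLP on per-layer style vectors. It assumes StyleGAN2-style weight modulation and demodulation; the function names, layer sizes and the softplus density head are illustrative assumptions, not StyleNeRF's exact architecture.

```python
import numpy as np


def modulated_linear(x, weight, style, eps=1e-8):
    """Style-modulated fully connected layer (StyleGAN2-style modulation, assumed).

    x:      (n_points, in_dim) per-point input features
    weight: (out_dim, in_dim) layer weights shared across samples
    style:  (in_dim,) style vector for one generated image
    """
    # Modulate: scale every input channel of the weights by the style.
    w = weight * style[None, :]
    # Demodulate: rescale each output row to unit L2 norm.
    w = w / np.sqrt(np.sum(w ** 2, axis=1, keepdims=True) + eps)
    return x @ w.T


def style_field(points, weights, styles):
    """Toy style-conditioned radiance field: 3D points -> (density, features)."""
    h = points
    for i, (w, s) in enumerate(zip(weights, styles)):
        h = modulated_linear(h, w, s)
        if i < len(weights) - 1:
            h = np.where(h > 0, h, 0.2 * h)  # leaky ReLU between layers
    density = np.logaddexp(0.0, h[:, 0])  # softplus keeps density >= 0
    features = h[:, 1:]                   # remaining channels form the feature vector
    return density, features


rng = np.random.default_rng(0)
dims = [3, 64, 64, 1 + 32]  # xyz input -> hidden -> 1 density + 32 feature channels
weights = [rng.standard_normal((o, i)) for i, o in zip(dims[:-1], dims[1:])]
styles = [rng.standard_normal(i) for i in dims[:-1]]  # one style vector per layer
points = rng.uniform(-1.0, 1.0, size=(1024, 3))
density, features = style_field(points, weights, styles)
print(density.shape, features.shape)  # (1024,) (1024, 32)
```

Because the same shared weights are re-scaled by each image's styles, changing the style vectors changes the radiance field without retraining, which is the property a style-based generator relies on.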
StyleNeRF uses volume rendering to produce a low-resolution feature map, then progressively applies 2D upsampling to produce high-resolution images with fine detail. In the full paper, the team outlines a better upsampler (sections 3.2 and 3.3) and a new regularization loss (section 3.3).
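The two-stage pipeline just described can be sketched in NumPy. This is a minimal illustration under simplifying assumptions: features are alpha-composited along each ray with the standard NeRF quadrature, and nearest-neighbour 2× upsampling stands in for StyleNeRF's learned upsampler and style-modulated convolutions, which the sketch does not reproduce. The 32 × 32 starting resolution, sample counts and helper names are hypothetical choices for the example.

```python
import numpy as np


def composite_features(densities, features, deltas):
    """Alpha-composite per-sample features along each ray.

    densities: (H, W, S) non-negative densities at S samples per ray
    features:  (H, W, S, C) feature vectors at those samples
    deltas:    (S,) distances between consecutive samples
    returns:   (H, W, C) low-resolution feature map
    """
    alpha = 1.0 - np.exp(-densities * deltas)  # per-sample opacity
    # Transmittance: fraction of the ray that reaches sample i unoccluded.
    trans = np.cumprod(1.0 - alpha + 1e-10, axis=-1)
    trans = np.concatenate([np.ones_like(trans[..., :1]), trans[..., :-1]], axis=-1)
    weights = alpha * trans  # (H, W, S)
    return np.sum(weights[..., None] * features, axis=-2)


def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling (stand-in for a learned upsampler)."""
    return fmap.repeat(2, axis=0).repeat(2, axis=1)


def progressive_upsample(fmap, n_stages):
    """Double the spatial resolution n_stages times."""
    for _ in range(n_stages):
        fmap = upsample2x(fmap)
        # A real generator would apply style-modulated 2D convolutions here.
    return fmap


rng = np.random.default_rng(0)
H = W = 32    # low-resolution feature map (hypothetical size)
S, C = 16, 8  # samples per ray, feature channels
densities = rng.uniform(0.0, 2.0, size=(H, W, S))
features = rng.standard_normal((H, W, S, C))
deltas = np.full(S, 1.0 / S)
low_res = composite_features(densities, features, deltas)
high_res = progressive_upsample(low_res, n_stages=3)
print(low_res.shape, high_res.shape)  # (32, 32, 8) (256, 256, 8)
```

The expensive per-ray rendering happens only at the low resolution; every later stage is a cheap 2D operation, which is where the efficiency gain of this design comes from.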
In the real-time demo video below, you can see that StyleNeRF works very quickly and offers an array of impressive tools. For example, you can adjust the mixing ratio between a pair of images to generate a new mix, and adjust the generated image's pitch, yaw and field of view.
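Controls such as pitch, yaw and field of view ultimately define the virtual camera whose rays are rendered. The sketch below shows one common way to turn those three numbers into a camera-to-world matrix and a pinhole focal length. The orbit-around-the-origin setup, y-up / OpenGL-style axis convention, radius and image size are assumptions for illustration, not details taken from StyleNeRF.

```python
import numpy as np


def camera_from_angles(yaw, pitch, fov_deg, radius=1.0, image_size=256):
    """Build a look-at camera orbiting the origin (assumed convention).

    yaw, pitch: radians; pitch must stay away from +/-90 degrees, where the
                look-at frame degenerates.
    fov_deg:    field of view in degrees.
    Returns (4x4 camera-to-world matrix, focal length in pixels).
    """
    # Camera position on a sphere, y axis pointing up.
    eye = radius * np.array([
        np.cos(pitch) * np.sin(yaw),
        np.sin(pitch),
        np.cos(pitch) * np.cos(yaw),
    ])
    forward = -eye / np.linalg.norm(eye)  # look at the origin
    right = np.cross(forward, np.array([0.0, 1.0, 0.0]))
    right /= np.linalg.norm(right)
    up = np.cross(right, forward)
    c2w = np.eye(4)
    # OpenGL-style: camera x = right, y = up, and it looks down its -z axis.
    c2w[:3, 0] = right
    c2w[:3, 1] = up
    c2w[:3, 2] = -forward
    c2w[:3, 3] = eye
    # Pinhole model: half the image width spans half the field of view.
    focal = 0.5 * image_size / np.tan(0.5 * np.radians(fov_deg))
    return c2w, focal


c2w, focal = camera_from_angles(np.radians(30.0), np.radians(10.0), fov_deg=12.0, radius=2.7)
```

Changing yaw or pitch moves the camera around the object while the scene stays fixed, and widening the field of view shortens the focal length, which is why a 3D-consistent generator can expose these as independent sliders.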
Compared to alternative 3D generative models, StyleNeRF's team believes its model works best when generating images under direct camera control. While GIRAFFE synthesizes images with better quality, it also produces 3D-inconsistent artifacts, a problem that StyleNeRF promises to overcome. The researchers state, 'Compared to the baselines, StyleNeRF achieves the best visual quality with high 3D consistency across views.'
When visual quality is measured with the Fréchet Inception Distance (FID) and Kernel Inception Distance (KID), StyleNeRF performs well across three datasets.
Table 1 – Quantitative comparisons at 256 × 256 resolution. The team calculated FID and KID (× 10³) and reported the average rendering time for a single batch. The 2D GAN (StyleGAN2) numbers are for reference. Lower FID and KID numbers are better.
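For readers unfamiliar with the metrics in Table 1: FID fits a Gaussian to the feature vectors of real and generated images and measures the Fréchet distance between them, ‖μ_r − μ_g‖² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^½), while KID is an unbiased squared maximum mean discrepancy with the cubic kernel k(x, y) = (x·y/d + 1)³. The sketch below computes both from precomputed feature arrays. It assumes the features already exist (published numbers use Inception-v3 activations, which are not computed here; random vectors stand in), and it evaluates KID on a single subset rather than averaging over random subsets as standard implementations do.

```python
import numpy as np


def _sqrtm_psd(mat):
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    mat = 0.5 * (mat + mat.T)         # enforce exact symmetry
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)   # drop tiny negative round-off
    return (vecs * np.sqrt(vals)) @ vecs.T


def fid(feats_real, feats_gen):
    """Frechet distance between Gaussians fitted to two feature sets (rows = images)."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_g = np.cov(feats_gen, rowvar=False)
    # Tr((S_r S_g)^1/2) equals Tr((S_r^1/2 S_g S_r^1/2)^1/2), whose argument is symmetric PSD.
    root_r = _sqrtm_psd(cov_r)
    cross = _sqrtm_psd(root_r @ cov_g @ root_r)
    return float(np.sum((mu_r - mu_g) ** 2) + np.trace(cov_r + cov_g - 2.0 * cross))


def _poly_kernel(x, y):
    """Cubic polynomial kernel k(x, y) = (x.y / d + 1)^3 used by KID."""
    return (x @ y.T / x.shape[1] + 1.0) ** 3


def kid(feats_real, feats_gen):
    """Unbiased squared MMD estimate with the cubic kernel (single subset)."""
    m, n = len(feats_real), len(feats_gen)
    k_rr = _poly_kernel(feats_real, feats_real)
    k_gg = _poly_kernel(feats_gen, feats_gen)
    k_rg = _poly_kernel(feats_real, feats_gen)
    # Drop the diagonal (self-similarity) terms from the within-set averages.
    term_rr = (k_rr.sum() - np.trace(k_rr)) / (m * (m - 1))
    term_gg = (k_gg.sum() - np.trace(k_gg)) / (n * (n - 1))
    return float(term_rr + term_gg - 2.0 * k_rg.mean())


rng = np.random.default_rng(0)
real = rng.standard_normal((2000, 4))
same = rng.standard_normal((2000, 4))  # same distribution, new samples
shifted = real + 2.0                   # mean shifted by 2 in every dimension
print(fid(real, shifted))              # ||shift||^2 = 16, up to round-off
print(kid(real, same) * 1e3)           # Table 1 reports KID scaled by 10^3
```

Both metrics shrink toward zero as the generated feature distribution approaches the real one, which is why lower values in Table 1 are better.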
Figure 7 from the research paper shows the results of style mixing and interpolation. The paper states, ‘As shown in the style mixing experiments, copying styles before 2D aggregation affects geometry aspects (shape of noses, glasses, etc.), while copying those after 2D aggregation brings changes in appearance (colors of skins, eyes, hairs, etc.), which indicates clear disentangled styles of geometry and appearance. In the style interpolation results, the smooth interpolation between two different styles without visual artifacts further demonstrates that the style space is semantically learned.’
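The two operations in Figure 7 can be illustrated on per-layer style codes. The sketch below assumes the generator consumes one style vector per layer and that the first NUM_GEOMETRY_LAYERS of them feed the part of the network before 2D aggregation; that constant, the layer count and the style dimension are hypothetical values for the example, not figures from the paper. Style mixing copies a block of layers from one code into another, and interpolation blends two codes linearly.

```python
import numpy as np

NUM_LAYERS = 10          # hypothetical number of per-layer styles
NUM_GEOMETRY_LAYERS = 4  # hypothetical: styles consumed before 2D aggregation
STYLE_DIM = 512          # hypothetical style vector size


def mix_styles(base, donor, layers):
    """Return a copy of `base` whose styles at `layers` are copied from `donor`."""
    mixed = base.copy()
    mixed[layers] = donor[layers]
    return mixed


def interpolate_styles(a, b, t):
    """Linear interpolation between two style codes; t=0 gives a, t=1 gives b."""
    return (1.0 - t) * a + t * b


rng = np.random.default_rng(0)
styles_a = rng.standard_normal((NUM_LAYERS, STYLE_DIM))
styles_b = rng.standard_normal((NUM_LAYERS, STYLE_DIM))

geometry = slice(0, NUM_GEOMETRY_LAYERS)
appearance = slice(NUM_GEOMETRY_LAYERS, NUM_LAYERS)
# The paper reports that copying pre-aggregation styles changes geometry ...
new_shape = mix_styles(styles_a, styles_b, geometry)
# ... while copying post-aggregation styles changes appearance (colors).
new_look = mix_styles(styles_a, styles_b, appearance)
# Eight evenly spaced codes from A to B, as in an interpolation strip.
path = [interpolate_styles(styles_a, styles_b, t) for t in np.linspace(0.0, 1.0, 8)]
```

Feeding each code in `path` to the generator would produce an interpolation strip; the paper's observation is that such strips change smoothly, without visual artifacts.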
If you'd like to learn more about how StyleNeRF works and dig into the algorithms underpinning its impressive performance, be sure to check out the research paper. StyleNeRF was developed by Jiatao Gu, Lingjie Liu, Peng Wang and Christian Theobalt of the Max Planck Institute for Informatics and the University of Hong Kong.
All figures and tables credit: Jiatao Gu, Lingjie Liu, Peng Wang and Christian Theobalt / Max Planck Institute for Informatics and the University of Hong Kong