The Ultimate Guide to 3D Model and Scene Generation Papers (Feb 2023)

This article was written in collaboration with Matt White, the cherished generative A.I. expert, researcher, and educator at Berkeley Synthetic and UC Berkeley.

Feb 6, 2023

Following the interest in my January article, The Ultimate Glossary of 3D Asset and Scene Generation Models (Jan. 2023), Matt and I decided to publish this article as a follow-up. Here you’ll find the usual suspects like GET3D and Point-E, along with new additions like Adan, Atlas, ECON, and Score Jacobian Chaining that were not in the earlier list.

Generative A.I. has advanced significantly in recent months, making its applications highly practical and generating a lot of media buzz. In generative text, we have seen the power of ChatGPT (a fine-tuned model built on GPT-3.5), and in generative images, platforms and models like DALL-E 2, Midjourney, and Stable Diffusion. 3D asset and scene generation has been slower to develop because 3D presents significant challenges of its own: training data is scarce, and reconstruction is computationally expensive.

As Matt explained to me, only a few publicly available 3D shape datasets can be used without special licensing or the risk of copyright infringement. The available samples also need to be more diverse and cover many more styles, comparable to the variety of 2D images used to train generative image models.

3D reconstruction from 2D images is computationally expensive. NeRF has become the de facto algorithm for representing volume and material properties (the volume is the 3D shape itself, and the texture is applied to the 3D model to give it its appearance). Improvements to NeRF, like NVIDIA’s Instant-NGP, have delivered substantial speedups, but reconstructing complex 3D objects and scenes in high resolution can still take hours, if not days.
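To make the volumetric idea concrete, here is a minimal sketch of how a NeRF-style renderer composites density and color samples along a single camera ray into one pixel. It follows the discrete volume rendering rule described in the NeRF paper. The function name `composite_ray` and the sample values are illustrative assumptions; a real system would query a trained network for the densities and colors rather than hard-coding them.

```python
import math

def composite_ray(densities, colors, deltas):
    """Composite samples along one ray, front to back.

    densities: volume density at each sample (>= 0)
    colors:    (r, g, b) color at each sample
    deltas:    distance covered by each sample segment
    Returns the pixel color and the accumulated opacity.
    """
    pixel = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light not yet absorbed
    for sigma, rgb, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha          # contribution to the pixel
        for k in range(3):
            pixel[k] += weight * rgb[k]
        transmittance *= 1.0 - alpha
    return tuple(pixel), 1.0 - transmittance

# Hypothetical samples: empty space, a dense red surface, then blue behind it.
densities = [0.0, 50.0, 50.0]
colors = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
deltas = [0.1, 0.1, 0.1]
pixel, opacity = composite_ray(densities, colors, deltas)
```

In this toy ray the red surface is nearly opaque, so the blue sample behind it contributes almost nothing to the pixel. Rendering a full image repeats this for every pixel with many samples per ray, which is one reason high-resolution reconstruction is so costly.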

Given the rapid pace of innovation in generative 3D, we expect to see breakthroughs and new commercially viable platforms reach the market in 2023.

Papers with Code

GET3D by NVIDIA: A Generative Model of High Quality 3D Textured Shapes Learned from Images

GitHub - nv-tlabs/GET3D

GET3D: A Generative Model of High Quality 3D Textured Shapes Learned from Images. Jun Gao, Tianchang Shen, Zian Wang…

github.com

Point-E by OpenAI

GitHub - openai/point-e: Point cloud diffusion for 3D model synthesis

This is the official code and model release for Point-E: A System for Generating 3D Point Clouds from Complex Prompts…

github.com

CLIP-Mesh

Text to Mesh

We present a technique for zero-shot generation of a 3D model using only a target text prompt. Without any 3D…

www.nasir.lol

Score Jacobian Chaining

A diffusion model learns to predict a vector field of gradients. We propose to apply chain rule on the learned…

pals.ttic.edu

NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis

NeRF: Neural Radiance Fields

We present a method that achieves state-of-the-art results for synthesizing novel views of complex scenes by optimizing…

www.matthewtancik.com

NVIDIA: Instant Neural Graphics Primitives with a Multiresolution Hash Encoding

Instant Neural Graphics Primitives with a Multiresolution Hash Encoding

[Jan 19th 2022] Paper released on arXiv. [Jan 14th 2022] Code released on GitHub…

nvlabs.github.io

PIFuHD: Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization

PIFuHD

PIFuHD: Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization. University of Southern…

shunsukesaito.github.io

Diffusion Probabilistic Models for 3D Point Cloud Generation

GitHub - luost26/diffusion-point-cloud: Diffusion Probabilistic Models for 3D Point Cloud…

The official code repository for our CVPR 2021 paper "Diffusion Probabilistic Models for 3D Point Cloud Generation"…

github.com

InfiniteNature-Zero

InfiniteNatureZero

We present a method for learning to generate unbounded flythrough videos of natural scenes starting from a single view…

infinite-nature-zero.github.io

Atlas: End-to-End 3D Scene Reconstruction from Posed Images

Atlas: End-to-End 3D Scene Reconstruction from Posed Images

End-to-End 3D Scene Reconstruction from Posed Images

zak.murez.com

Stable-Dreamfusion

GitHub - ashawkey/stable-dreamfusion: A pytorch implementation of text-to-3D dreamfusion, powered…

A pytorch implementation of the text-to-3D model Dreamfusion, powered by the Stable Diffusion text-to-2D model. The…

github.com

Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models

GitHub - sail-sg/Adan: Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models

This is an official PyTorch implementation of Adan. See the paper here. If you find our adan helpful or heuristic to…

github.com

Online Real-Time Volumetric NeRF+SLAM

GitHub - ToniRV/NeRF-SLAM: NeRF-SLAM: Real-Time Dense Monocular SLAM with Neural Radiance Fields…

NeRF-SLAM: Real-Time Dense Monocular SLAM with Neural Radiance Fields. https://arxiv.org/abs/2210.13641 + Sigma-Fusion…

github.com

GeoCode: Interpretable Shape Programs

GitHub - threedle/GeoCode: GeoCode maps 3D shapes to a human-interpretable parameter space…

GeoCode maps 3D shapes to a human-interpretable parameter space, allowing to intuitively edit the recovered 3D shapes…

github.com

ECON: Explicit Clothed humans Obtained from Normals

GitHub - YuliangXiu/ECON: ECON: Explicit Clothed humans Obtained from Normals (arXiv 2022)

ECON is designed for "Human digitization from a color image", which combines the best properties of implicit and…

github.com

NeROIC: Neural Object Capture and Rendering from Online Image Collections

GitHub - snap-research/NeROIC

This repository is the official implementation of the NeROIC model from NeROIC: Neural Object Capture and Rendering…

github.com

Papers only

Dream Fields

Zero-Shot Text-Guided Object Generation with Dream Fields

CVPR 2022 and AI4CC 2022 (Best Poster). UC Berkeley, Google Research. We combine neural rendering with multi-modal image…

ajayj.com

DreamFusion

DreamFusion: Text-to-3D using 2D Diffusion

Recent breakthroughs in text-to-image synthesis have been driven by diffusion models trained on billions of image-text…

dreamfusion3d.github.io

Novel View Synthesis with Diffusion Models

3D generation from a single image. We present 3DiM, a diffusion model for 3D novel view synthesis, which is able to…

3d-diffusion.github.io

Portrait Neural Radiance Fields from a Single Image

We present a method for estimating Neural Radiance Fields (NeRF) from a single headshot portrait. While NeRF has…

portrait-nerf.github.io

InstantAvatar: Avatars in 60 Seconds

InstantAvatar: Learning Avatars from Monocular Video in 60 Seconds

By Tianjian Jiang, Xu Chen, Jie Song, and Otmar Hilliges (2022).

tijiang13.github.io

Rodin: 3D Avatars Using Diffusion

3D Avatar Diffusion

Rodin: A generative model for sculpting 3d digital avatars using diffusion

3d-avatar-diffusion.microsoft.com

Human Motion Diffusion

MDM: Human Motion Diffusion Model

Natural and expressive human motion generation is the holy grail of computer animation. It is a challenging task, due…

guytevet.github.io

3D HumanGAN

PV3D: A 3D Generative Model for Portrait Video Generation

Show Lab, National University of Singapore; ByteDance. Recent advances in generative adversarial networks (GANs) have…

showlab.github.io

ClimateNeRF

ClimateNeRF

Physical simulations produce excellent predictions of weather effects. Neural radiance fields produce SOTA scene…

climatenerf.github.io

Generating Holistic 3D Human Motion from Speech

TALKSHOW

Acknowledgement. We thank Wojciech Zielonka, Justus Thies for helping us incorporate MICA into our reconstruction…

talkshow.is.tue.mpg.de

RANA: Neural Avatar by NVIDIA

RANA

We propose RANA, a relightable and articulated neural avatar for the photorealistic synthesis of humans under arbitrary…

nvlabs.github.io

One-shot Implicit Animatable Avatars with Model-based Priors

ELICIT

One-shot Implicit Animatable Avatars with Model-based Priors, by Yangyi Huang…

elicit3d.github.io

3DDesigner: Towards Photorealistic 3D Object Generation and Editing with Text-guided Diffusion Models

3DDesigner

Towards Photorealistic 3D Object Generation and Editing with Text-guided Diffusion Models. Gang Li, Heliang Zheng…

3ddesigner-diffusion.github.io

SPIn-NeRF: Multiview Segmentation and Perceptual Inpainting with Neural Radiance Fields

SPIn-NeRF

spinnerf3d.github.io

Magic3D: High-Resolution Text-to-3D Content Creation

https://deepimagination.cc/Magic3D
(This link is formatted differently from the others because Medium would not render a preview card for it.)

GAUDI

GitHub - apple/ml-gaudi

Samples from GAUDI (Allow a couple minutes of loading time for videos.) Miguel Angel Bautista, Pengsheng Guo, Samira…

github.com

Platforms and Tools

Kaedim

Kaedim | 3D models in minutes

Kaedim is your own personal 3D artist. Use any basic image to create a textured, production-ready 3D model.

www.kaedim3d.com

3DFY.ai

3dfy.ai

3DFY.ai uses artificial intelligence to create high-quality 3D models with just a few existing images. Now anyone can…

3dfy.ai

Sloyd

Sloyd - Generate 3D. Fast.

3D modeling has never been easier. Select a generator, tweak it, done. Generate with realtime in the Sloyd Webapp or…

www.sloyd.ai

Kaolin (NVIDIA)

Kaolin Suite of Tools

Accelerate 3D deep learning research for neural fields, rendering, and more, with Kaolin Library, Kaolin Wisp, and…

developer.nvidia.com

Alpha AR

From a 2D image to a 3D model

Creating realistic 3D content for augmented reality (AR) projects is known to be a costly and time-consuming process…

alphaar.io

Artlabs

artlabs studio Waitlist

The future of 3D content generation is here with artlabs studio. Are you ready for it?

n6t1xfdu9gc.typeform.com

Imagine by Luma

Luma AI - Imagine 3D v1.2 (alpha)

Text to 3D with Luma AI

captures.lumalabs.ai

Other papers

Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer

GitHub - isl-org/MiDaS: Code for robust monocular depth estimation described in "Ranftl et. al…

Code for robust monocular depth estimation described in "Ranftl et. al., Towards Robust Monocular Depth Estimation…

github.com

REALY: Rethinking the Evaluation of 3D Face Reconstruction

REALY

realy3dface.com

Reconstruction of a 3D Model from Single 2D Image by GAN

In this paper, we propose a method for reconstructing the 3D model from a single 2D image. The current cutting-edge…

link.springer.com

AgileAvatar

AgileAvatar Figure 1: (a) Given a front-facing user image as input, (b) our method progressively bridges the domain gap…

ssangx.github.io

High-Res Facial Appearance Capture from Polarized Smartphone Images

By Dejan Azinović, Olivier Maury, Christophe Hery…

dazinovic.github.io

EDGE: Editable Dance Generation from Music

EDGE: Editable Dance Generation from Music

EDGE supports arbitrary spatial and temporal constraints. This can be used to support many end-user applications…

edge-dance.github.io

MineDojo

GitHub - MineDojo/MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge

MineDojo is a new AI research framework for building open-ended, generally capable embodied agents. MineDojo features a massive…

github.com