The Ultimate Guide to 3D Model and Scene Generation Papers (Feb 2023)

This article is written in collaboration with Matt White, the cherished generative A.I. expert, researcher, and educator of Berkeley Synthetic and UC Berkeley.

Pinar Seyhan Demirdag
Feb 6, 2023

Following the interest in my January article, The Ultimate Glossary of 3D Asset and Scene Generation Models (Jan. 2023), Matt and I decided to publish this article as a follow-up. Here you'll find the usual suspects like GET3D and Point-E, along with new additions such as Adan, Atlas, ECON, and Score Jacobian Chaining that were not in the former list.

Significant advancements have recently been realized in generative A.I., making its applications highly practical and producing a lot of media buzz. In generative text, we have seen the power of ChatGPT (fine-tuned from a model in the GPT-3.5 series), and in generative images, platforms and models like DALL-E 2, Midjourney, and Stable Diffusion. The 3D asset and scene generation area has been slower to develop because 3D generation presents significant challenges due to its multidimensional nature.

As Matt explained to me earlier, only a few publicly available 3D shape datasets can be used without special licensing or the risk of copyright infringement. The available samples also need to be more diverse and cover many different styles, comparable to the variety of 2D images used to train generative image models.

3D reconstruction from 2D images is computationally expensive. NeRF has become the de facto algorithm for representing volume and texture (the volume is the 3D shape itself, while the texture applied to the 3D model provides its aesthetic properties). Improvements to NeRF, like NVIDIA's Instant-NGP, have shown substantial performance gains, but reconstructing complex 3D objects and scenes in high resolution can still take hours, if not days.
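To make the volumetric idea concrete, here is a minimal sketch in Python of how a NeRF-style renderer turns density and color samples along one camera ray into a pixel color, following the quadrature rule described in the NeRF paper. The function name `composite_ray` and the sample values are illustrative assumptions, not code from any of the repositories listed below. A real system evaluates a neural network at every sample and repeats this for every pixel, which is part of why reconstruction is expensive.

```python
import math


def composite_ray(densities, colors, deltas):
    """Blend samples along one ray, front to back.

    densities: volume density (sigma) at each sample
    colors:    (r, g, b) color at each sample
    deltas:    distance between consecutive samples
    Returns the pixel color and the leftover transmittance.
    """
    transmittance = 1.0  # fraction of light that has not been blocked yet
    pixel = [0.0, 0.0, 0.0]
    for sigma, rgb, delta in zip(densities, colors, deltas):
        # Opacity contributed by this segment of the ray.
        alpha = 1.0 - math.exp(-sigma * delta)
        # A sample only counts as much as the light that still reaches it.
        weight = transmittance * alpha
        for k in range(3):
            pixel[k] += weight * rgb[k]
        transmittance *= 1.0 - alpha
    return pixel, transmittance


# Hypothetical samples: empty space first, then a dense red surface.
pixel, leftover = composite_ray(
    densities=[0.0, 50.0],
    colors=[(0.0, 1.0, 0.0), (1.0, 0.0, 0.0)],
    deltas=[0.5, 0.5],
)
```

In this toy ray, the empty first sample contributes nothing and the dense second sample dominates, so the pixel comes out almost entirely red.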

With the rapid pace of innovation in the space of generative 3D, we expect to see breakthroughs and the emergence of new commercially viable platforms hitting the market in 2023.

Papers with Code

GET3D by NVIDIA: A Generative Model of High Quality 3D Textured Shapes Learned from Images

Point-E by OpenAI

CLIP-Mesh

Score Jacobian Chaining

NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis

NVIDIA: Instant Neural Graphics Primitives with a Multiresolution Hash Encoding

PIFuHD: Multi-Level Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization

Diffusion Probabilistic Models for 3D Point Cloud Generation

InfiniteNature-Zero

Atlas: End-to-End 3D Scene Reconstruction from Posed Images

Stable-Dreamfusion

Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models

Online Real-Time Volumetric NeRF+SLAM

GeoCode: Interpretable Shape Programs

ECON: Explicit Clothed humans Obtained from Normals

NeROIC: Neural Object Capture and Rendering from Online Image Collections

Papers only

Dream Fields

DreamFusion

Novel View Synthesis with Diffusion Models

Portrait Neural Radiance Fields from a Single Image

InstantAvatar: Avatars in 60 Seconds

Rodin: 3D Avatars Using Diffusion

Human Motion Diffusion Model

3D HumanGAN

ClimateNeRF

Generating Holistic 3D Human Motion from Speech

RANA: Neural Avatar by NVIDIA

One-shot Implicit Animatable Avatars with Model-based Priors

3D Designer: Towards Photorealistic 3D Object Generation and Editing with Text-guided Diffusion Models

SPIn-NeRF: Multiview Segmentation and Perceptual Inpainting with Neural Radiance Fields

Magic3D: High-Resolution Text-to-3D Content Creation

https://deepimagination.cc/Magic3D
This link appears in a different format from the other papers because Medium would not render it the same way.

3D generation from Single Image

GAUDI

Platforms and Tools

Kaedim

3DFY.ai

Sloyd

Kaolin (NVIDIA)

Alpha AR

Artlabs

Imagine by Luma

Other papers

Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer

REALY: Rethinking the Evaluation of 3D Face Reconstruction

Reconstruction of a 3D Model from Single 2D Image by GAN

AgileAvatar

High-Res Facial Appearance Capture from Polarized Smartphone Images

EDGE: Editable Dance Generation from Music

MineDojo


Pinar Seyhan Demirdag

AI director, Co-Founder of Cuebric. I write about provocative innovative intelligence and the confluence of science and spirit.