The Ultimate Glossary of 3D Asset and Scene Generation Models (Jan. 2023)

6 min read, Jan 11, 2023

Thanks to CLIP and text-to-image models like DALL-E and Stable Diffusion, as well as unicorns like Midjourney, we saw an explosion in human creativity in 2022. But 2022 was only the trailer for what is to come in 2023. If you are a researcher, entrepreneur, or futurist, you want to keep a close eye on the upcoming 3-D models. How games, VFX, film production, and the metaverse are made is about to change significantly. The next unicorn lies in one or several of the models in this article. The main challenge with developing 3-D A.I. models is that there isn’t enough data. Existing 3-D datasets are much smaller than 2-D datasets and have less variation.

I thank Scy from our team at Seyhan Lee for helping me create The Ultimate Glossary of Text-to-3D Generative A.I. Models (Jan. 2023).

I made sure to include examples with pictures whenever I could — so that your one-scroll-down experience would be more enjoyable. Save this article, mark it as a favorite, or put it under your pillow so it can always be within reach.

Here goes nothing…
God bless,
Pinar

Open-source

Text2Mesh

Just like Clip Matrix, Text2Mesh starts from a base mesh, then reshapes and textures it according to a text prompt.

Text2Mesh Text-Driven Neural Stylization for Meshes

Text2Mesh produces color and geometric details over a variety of source meshes driven by a target text prompt. Our…

threedle.github.io

Clip Matrix

Home

This is a collection of 500 unique animated 3D creatures, created by AI-Human collaboration. Each of them is shown…

clipmatrix.wordpress.com

https://arxiv.org/pdf/2109.12922.pdf
This paper by Nikolay Jetchev has a very special place in my heart. Not only is he an extraordinary human being, but we have also been looking into ways to advance and commercialize Clip Matrix since the early days of his experiments in 2021, before he released the paper in September 2021. I can attest that Nikolay was experimenting with text-to-3D creations before 90% of the models in this article were published.

“The Bear Warrior, in Helmet and Armor” by Nikolay Jetchev, made with Clip Matrix

CLIP-Mesh

Text to Mesh

We present a technique for zero-shot generation of a 3D model using only a target text prompt. Without any 3D…

www.nasir.lol

CLIP Forge (Autodesk)

GitHub - AutodeskAILab/Clip-Forge

Generating shapes using natural language can enable new ways of imagining and creating the things around us. While…

github.com

Stable-dreamfusion (based on Google’s DreamFusion)

GitHub - ashawkey/stable-dreamfusion: A pytorch implementation of text-to-3D dreamfusion, powered…

A pytorch implementation of the text-to-3D model Dreamfusion, powered by the Stable Diffusion text-to-2D model. The…

github.com

Point-E (OpenAI)

This model turns a caption into a point cloud, which is a step along the way to making high-quality 3-D meshes. The point cloud still needs to be converted into a mesh before most 3-D programs can work with it, but Point-E produces its output much faster than most text-to-3-D models.
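To make the idea of a point cloud concrete, here is a minimal sketch in plain Python. It does not call Point-E; the function names and the sphere-shaped sample data are made up for illustration. It only shows what such an output boils down to (a list of colored XYZ points) and how it can be serialized as an ASCII PLY file, a format many 3-D programs can import before meshing.

```python
import random


def make_toy_point_cloud(n, seed=0):
    """Return n (x, y, z, r, g, b) tuples on a unit sphere (stand-in data, not Point-E output)."""
    rng = random.Random(seed)
    points = []
    while len(points) < n:
        x, y, z = (rng.uniform(-1, 1) for _ in range(3))
        norm = (x * x + y * y + z * z) ** 0.5
        if norm < 1e-6 or norm > 1:
            continue  # reject samples outside the unit ball (or too close to the origin)
        x, y, z = x / norm, y / norm, z / norm  # project onto the sphere surface
        # Color each point by its position, mapped from [-1, 1] to [0, 255].
        r, g, b = (int(255 * (c + 1) / 2) for c in (x, y, z))
        points.append((x, y, z, r, g, b))
    return points


def to_ascii_ply(points):
    """Serialize colored points as an ASCII PLY string."""
    header = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(points)}",
        "property float x",
        "property float y",
        "property float z",
        "property uchar red",
        "property uchar green",
        "property uchar blue",
        "end_header",
    ]
    body = [f"{x:.6f} {y:.6f} {z:.6f} {r} {g} {b}" for x, y, z, r, g, b in points]
    return "\n".join(header + body) + "\n"


cloud = make_toy_point_cloud(1024)
ply_text = to_ascii_ply(cloud)
```

Writing `ply_text` to a file such as `cloud.ply` gives something you can open in a 3-D tool. There are no faces in it, only vertices, which is why a separate meshing step is still needed.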

GitHub - openai/point-e: Point cloud diffusion for 3D model synthesis

This is the official code and model release for Point-E: A System for Generating 3D Point Clouds from Complex Prompts…

github.com

GET3D (Nvidia)

GET3D: A Generative Model of High Quality 3D Textured Shapes Learned from Images

As several industries are moving towards modeling massive 3D virtual worlds, the need for content creation tools that…

nv-tlabs.github.io

Dreamfields (Google)

Zero-Shot Text-Guided Object Generation with Dream Fields

CVPR 2022 and AI4CC 2022 (Best Poster) UC Berkeley, Google Research We combine neural rendering with multi-modal image…

ajayj.com

Google Colaboratory

colab.research.google.com

Gaudi (Apple)

GitHub - apple/ml-gaudi

Samples from GAUDI (Allow a couple minutes of loading time for videos.) Miguel Angel Bautista, Pengsheng Guo, Samira…

github.com

MDM: Human Motion Diffusion Model

Natural and expressive human motion generation is the holy grail of computer animation. It is a challenging task, due…

guytevet.github.io

Closed tools

Imagine 3D (Luma AI)

Luma AI - Imagine 3D (alpha)

Text to 3D with Luma AI

captures.lumalabs.ai

Luma is currently offering early access to their models. So far, based on the results they showcase on their website, the fidelity is on the high side compared to other models.

Magic3D (Nvidia)

https://deepimagination.cc/Magic3D/
This model is one of the more advanced ones on the list due to its additional functionalities. You can edit the output with text and use an image input to guide the output.

Magic3D: High-Resolution Text-to-3D Content Creation

DreamFusion (Google)

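DreamFusion builds on Imagen, Google’s private text-to-image diffusion model, and on Neural Radiance Fields (NeRF). Score distillation sampling (SDS) uses the diffusion model as a critic: it scores noised renderings of the NeRF, and that signal is used to optimize the NeRF until its renderings match the text prompt.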
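For the curious, the SDS update can be sketched in a few lines of Python. This is a toy under heavy assumptions, not DreamFusion's code: the "renderer" returns a single number instead of an image, and the "diffusion model" is a hand-written stand-in whose training data all sits at one value, `TARGET`. Every name here is hypothetical. The update rule has the same shape as SDS: add noise to the rendering, ask the model which noise it sees, and move the 3-D parameters along `w(t) * (predicted_noise - true_noise)`.

```python
import math
import random

TARGET = 2.0  # hypothetical stand-in for "what the prompt describes"


def render(theta):
    """Stand-in for a differentiable renderer: the 'image' is just theta."""
    return theta


def predict_noise(x_t, alpha, sigma):
    """Stand-in for a frozen diffusion model whose data all sits at TARGET.

    For such a point-mass data distribution, the ideal noise prediction
    is (x_t - alpha * TARGET) / sigma.
    """
    return (x_t - alpha * TARGET) / sigma


def sds_step(theta, rng, lr=0.5):
    """One score-distillation update on the toy parameter theta."""
    t = rng.uniform(0.02, 0.98)           # random noise level
    alpha = math.cos(0.5 * math.pi * t)   # signal scale
    sigma = math.sin(0.5 * math.pi * t)   # noise scale (alpha^2 + sigma^2 = 1)
    eps = rng.gauss(0.0, 1.0)             # the noise we actually add
    x_t = alpha * render(theta) + sigma * eps
    eps_hat = predict_noise(x_t, alpha, sigma)
    weight = sigma ** 2                   # w(t): one simple choice, an assumption
    grad = weight * (eps_hat - eps) * 1.0  # times d(render)/d(theta), which is 1 here
    return theta - lr * grad


def optimise(steps=200, seed=0):
    rng = random.Random(seed)
    theta = 0.0
    for _ in range(steps):
        theta = sds_step(theta, rng)
    return theta


final_theta = optimise()
```

In this toy, `final_theta` ends up very close to `TARGET`. In the real system the parameter is a whole NeRF and the critic is Imagen, so each step is far more expensive, but the loop has the same structure.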

DreamFusion: Text-to-3D using 2D Diffusion

Recent breakthroughs in text-to-image synthesis have been driven by diffusion models trained on billions of image-text…

dreamfusion3d.github.io

LION: Latent Point Diffusion Models for 3D Shape Generation (Nvidia)

LION: Latent Point Diffusion Models for 3D Shape Generation

Denoising diffusion models (DDMs) have shown promising results in 3D point cloud synthesis. To advance 3D DDMs and make…

nv-tlabs.github.io

Other papers (More abstract and theoretical)

CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields

3DDesigner: Towards Photorealistic 3D Object Generation and Editing with Text-guided Diffusion Models

Learning Spatial Knowledge for Text to 3D Scene Generation

Text to 3D Scene Generation with Rich Lexical Grounding

Text-Driven 3D Photo-Realistic Talking Head

3DALL-E: Integrating Text-to-Image AI in 3D Design Workflows

Text-driven 3D Avatar Animation with Emotional and Expressive Behaviors

Text-driven 3D Avatar Animation with Emotional and Expressive Behaviors | Proceedings of the 29th…

Text-driven 3D avatar animation has been an essential part of virtual human techniques, which has a wide range of…

dl.acm.org

Y2Seq2Seq: Cross-Modal Representation Learning for 3D Shape and Text by Joint Reconstruction and Prediction of View and Word Sequences

Y2Seq2Seq: Cross-Modal Representation Learning for 3D Shape and Text by Joint Reconstruction and…

Han, Z., Shang, M., Wang, X., Liu, Y.-S., & Zwicker, M. (2019). Y2Seq2Seq: Cross-Modal Representation Learning for 3D…

ojs.aaai.org

Written by Pinar Seyhan Demirdag
