How ChatGPT and GPT-4 Can Be Used for 3D Content Generation

Developing custom AI tools for 3D workflows is easy in NVIDIA Omniverse

NVIDIA Omniverse
12 min read · Mar 30, 2023

By: Mario Viviani, Manager, Developer Relations, NVIDIA Omniverse

Demand for 3D worlds and virtual environments is growing exponentially across the world’s industries. 3D workflows are core to industrial digitalization, developing real-time simulations to test and validate autonomous vehicles and robots, operating digital twins to optimize industrial manufacturing, and paving new paths for scientific discovery.

Today, 3D design and world building is still highly manual. While 2D artists and designers have been graced with assistant tools, 3D workflows remain filled with repetitive, tedious tasks.

Creating or finding objects for a scene is a time-intensive process that requires specialized 3D skills honed over time like modeling and texturing. Placing objects correctly and art directing a 3D environment to perfection requires hours of fine tuning.

To reduce manual, repetitive tasks and help creators and designers focus on the creative, enjoyable aspects of their work, NVIDIA has launched numerous AI projects like generative AI tools for virtual worlds.

The iPhone moment of AI

With ChatGPT, we are now experiencing the iPhone moment of AI, where individuals of all technical levels can interact with an advanced computing platform using everyday language. Large language models (LLMs) had been growing increasingly sophisticated, and when a user-friendly interface like ChatGPT made them accessible to everyone, it became the fastest-growing consumer application in history, surpassing 100 million users just two months after launching. Now, every industry is planning to harness the power of AI for a wide range of applications like drug discovery, autonomous machines, and avatar virtual assistants.

Recently, we experimented with OpenAI’s viral ChatGPT and new GPT-4 large multimodal model to show how easy it is to develop custom tools that can rapidly generate 3D objects for virtual worlds in NVIDIA Omniverse. Compared to ChatGPT, GPT-4 marks a “pretty substantial improvement across many dimensions,” said OpenAI co-founder Ilya Sutskever in a fireside chat with NVIDIA founder and CEO Jensen Huang at GTC 2023.

By combining GPT-4 with Omniverse DeepSearch, a smart AI librarian that’s able to search through massive databases of untagged 3D assets, we were able to quickly develop a custom extension that retrieves 3D objects with simple, text-based prompts and automatically adds them to a 3D scene.

AI Room Generator Extension

This fun experiment in NVIDIA Omniverse, a development platform for 3D applications, shows developers and technical artists how easy it is to quickly develop custom tools that leverage generative AI to populate realistic environments. End users can simply enter text-based prompts to automatically generate and place high-fidelity objects, saving hours of time that would typically be required to create a complex scene.

Objects generated from the extension are based on Universal Scene Description (USD) SimReady assets. SimReady assets are physically accurate 3D objects that can be used in any simulation and behave as they would in the real world.

Getting information about the 3D Scene

Everything starts with the USD scene in Omniverse. Users can easily circle an area using the Pencil tool in Omniverse, type in the kind of room/environment they want to generate — for example, a warehouse, or a reception room — and with one click that area is created.

Creating the Prompt for ChatGPT

The ChatGPT prompt is composed of four pieces: system input, user input example, assistant output example, and user prompt.

Let’s start with the aspects of the prompt that are tailored to the user’s scenario. This includes the text that the user inputs plus data from the scene.

For example, if the user wants to create a reception room, they specify something like “This is the room where we meet our customers. Make sure there is a set of comfortable armchairs, a sofa and a coffee table.” Or, if they want a certain number of items, they could add “make sure to include a minimum of 10 items.”

This text is combined with scene information like the size and name of the area where we will place items as the User Prompt.

“Reception room, 7x10m, origin at (0.0,0.0,0.0). This is the room where we meet 
our customers. Make sure there is a set of comfortable armchairs, a sofa and a
coffee table”

This notion of combining the user’s text with details from the scene is powerful. It’s much simpler to select an object in the scene and programmatically access its details than to require the user to write a prompt describing all those details. I suspect we’ll see a lot of Omniverse extensions that make use of this Text + Scene to Scene pattern.
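As a rough illustration of that pattern, here is a minimal sketch of how scene details could be gathered programmatically and combined with the user’s text. The prim path handling, unit conversion, and prompt wording are assumptions for illustration, not the extension’s actual code.

import omni.usd
from pxr import Usd, UsdGeom


def build_user_prompt(area_prim_path: str, user_text: str) -> str:
    """Combine scene data (area name, size, origin) with the user's free-form text."""
    stage = omni.usd.get_context().get_stage()
    prim = stage.GetPrimAtPath(area_prim_path)

    # Compute the world-space bounding box of the selected area
    bbox_cache = UsdGeom.BBoxCache(Usd.TimeCode.Default(), [UsdGeom.Tokens.default_])
    aligned_range = bbox_cache.ComputeWorldBound(prim).ComputeAlignedRange()
    size = aligned_range.GetSize()        # width (X), height (Y), depth (Z) in stage units
    center = aligned_range.GetMidpoint()  # the origin of the area, at its center

    # Assuming the stage units are centimeters, report the footprint in meters
    return (
        f"{prim.GetName()}, {size[0] / 100:.0f}x{size[2] / 100:.0f}m, "
        f"origin at ({center[0]:.1f},{center[1]:.1f},{center[2]:.1f}). {user_text}"
    )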

Beyond the user prompt, we also need to prime ChatGPT with a system prompt and a shot or two for training.

In order to create predictable, deterministic results, the AI is instructed by the system prompt and examples to specifically return a JSON with all the information formatted in a well-defined way, so it can then be used in Omniverse.

Here are the four pieces of the prompt that we will send.

System Prompt

This sets the constraints and instructions for the AI.

You are an area generator expert. Given an area of a certain size, you can generate a list of items that are appropriate to that area, in the right place.


You operate in a 3D Space. You work in a X,Y,Z coordinate system. X denotes width, Y denotes height, Z denotes depth. 0.0,0.0,0.0 is the default space origin.


You receive from the user the name of the area, the size of the area on X and Z axis in centimeters, the origin point of the area (which is at the center of the area).


You answer by only generating JSON files that contain the following information:


- area_name: name of the area
- X: coordinate of the area on X axis
- Y: coordinate of the area on Y axis
- Z: coordinate of the area on Z axis
- area_size_X: dimension in cm of the area on X axis
- area_size_Z: dimension in cm of the area on Z axis
- area_objects_list: list of all the objects in the area

For each object you need to store:
- object_name: name of the object
- X: coordinate of the object on X axis
- Y: coordinate of the object on Y axis
- Z: coordinate of the object on Z axis


Each object name should include an appropriate adjective.


Keep in mind, objects should be placed in the area to create the most meaningful layout possible, and they shouldn't overlap.
All objects must be within the bounds of the area size; Never place objects further than 1/2 the length or 1/2 the depth of the area from the origin.
Also keep in mind that the objects should be disposed all over the area in respect to the origin point of the area, and you can use negative values as well to display items correctly, since the origin of the area is always at the center of the area.


Remember, you only generate JSON code, nothing else. It's very important.

User Input Example

This is an example of what a user might submit. Notice that it’s a combination of data from the scene and text prompt.

"Reception room, 7x10m, origin at (0.0,0.0,0.0). This is the room where we meet 
our customers. Make sure there is a set of comfortable armchairs, a sofa and a
coffee table"

Assistant Output Example

This provides a template that the AI must use. Notice how we’re describing the exact JSON we expect.

{
    "area_name": "Reception",
    "X": 0.0,
    "Y": 0.0,
    "Z": 0.0,
    "area_size_X": 700,
    "area_size_Z": 1000,
    "area_objects_list": [
        {
            "object_name": "White_Round_Coffee_Table",
            "X": -120,
            "Y": 0.0,
            "Z": 130
        },
        {
            "object_name": "Leather_Sofa",
            "X": 250,
            "Y": 0.0,
            "Z": -90
        },
        {
            "object_name": "Comfortable_Armchair_1",
            "X": -150,
            "Y": 0.0,
            "Z": 50
        },
        {
            "object_name": "Comfortable_Armchair_2",
            "X": -150,
            "Y": 0.0,
            "Z": -50
        }
    ]
}

Connecting to OpenAI

This prompt is sent to the AI from the extension via Python code. This is quite easy in Omniverse Kit and can be done with just a couple of calls using the latest OpenAI Python library. Notice that we are passing to the OpenAI API the system input, the sample user input, and the sample expected assistant output we have just outlined. The variable “response” will contain the expected response from ChatGPT.
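For reference, the four prompt pieces can simply be held in Python strings before making the call. The snippet below is illustrative (the prompt text is truncated), and the API key handling is an assumption; in practice the key might come from the extension’s settings or an environment variable.

import openai

openai.api_key = "<YOUR_OPENAI_API_KEY>"  # assumption: set directly; could also be read from settings or an env variable

system_input = "You are an area generator expert. Given an area of a certain size, ..."  # the System Prompt above
user_input = "Reception room, 7x10m, origin at (0.0,0.0,0.0). This is the room where we meet ..."  # the User Input Example
assistant_input = '{"area_name": "Reception", "X": 0.0, ...}'  # the Assistant Output Example
my_prompt = "Warehouse, 10x10m, origin at (0.0,0.0,0.0). Make sure to include a minimum of 10 items."  # built from the user's area and text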

# Create a completion using the ChatGPT model
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    # if you have access, you can swap to model="gpt-4"
    messages=[
        {"role": "system", "content": system_input},
        {"role": "user", "content": user_input},
        {"role": "assistant", "content": assistant_input},
        {"role": "user", "content": my_prompt},
    ],
)

# Parse the response and extract the generated text
text = response["choices"][0]["message"]["content"]

Passing the result from ChatGPT to Omniverse DeepSearch API and generating the scene

The items from the ChatGPT JSON response are then parsed by the extension and passed to the Omniverse DeepSearch API. DeepSearch allows users to search 3D models stored within an Omniverse Nucleus server using natural language queries.

This means that even if we don’t know the exact file name of a model of a sofa, for example, we can retrieve it just by searching for “Comfortable Sofa” which is exactly what we got from ChatGPT.

DeepSearch understands natural language, and by asking it for a “Comfortable Sofa” we get a list of items that our helpful AI librarian has decided are best suited from the selection of assets in our current asset library. It is surprisingly good at this, so we can often use the first item it returns, but of course we build in choice in case the user wants to select something else from the list.
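A minimal sketch of that hand-off might look like the following. The JSON parsing is standard, but query_deepsearch() is a hypothetical placeholder for the actual DeepSearch call, and the shape of the result is assumed.

import json

# Parse the JSON text returned by ChatGPT
area = json.loads(text)
gpt_results = area["area_objects_list"]

# Turn each generated object name into a natural-language DeepSearch query,
# e.g. "Comfortable_Armchair_1" -> "Comfortable Armchair 1"
query_result = []
for obj in gpt_results:
    query = obj["object_name"].replace("_", " ")
    asset_path = query_deepsearch(query)  # hypothetical wrapper around the DeepSearch API
    query_result.append({"object_name": obj["object_name"], "asset_path": asset_path})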

From there, we simply add the object to the stage.

Adding items from DeepSearch into Omniverse stage

Now that DeepSearch has returned results, we just need to place the objects in Omniverse. In our extension, we created a function called place_deepsearch_results() that processes all the items and places them in the scene.

import omni.usd
from pxr import Usd, UsdGeom, Sdf, Gf


def place_deepsearch_results(gpt_results, query_result, root_prim_path):
    stage = omni.usd.get_context().get_stage()

    for index, item in enumerate(query_result):
        # Define a parent Xform prim for each retrieved asset
        prim_parent_path = root_prim_path + item['object_name'].replace(" ", "_")
        parent_xform = UsdGeom.Xform.Define(stage, prim_parent_path)

        prim_path = prim_parent_path + "/" + item['object_name'].replace(" ", "_")
        next_prim = stage.DefinePrim(prim_path, 'Xform')

        # Add a reference to the USD asset returned by DeepSearch
        references: Usd.References = next_prim.GetReferences()
        references.AddReference(
            assetPath="your_server://your_asset_folder" + item['asset_path'])

        # Store the DeepSearch query on the prim for future search refinement
        config = next_prim.CreateAttribute("DeepSearch:Query", Sdf.ValueTypeNames.String)
        config.Set(item['object_name'])

        # Translate the prim to the position ChatGPT generated for this object
        # (one possible way to set the transform; the original extension may differ)
        next_object = gpt_results[index]
        x = next_object['X']
        y = next_object['Y']
        z = next_object['Z']
        UsdGeom.XformCommonAPI(parent_xform.GetPrim()).SetTranslate(Gf.Vec3d(x, y, z))

This method iterates over the query_result items returned by DeepSearch, creating and defining new prims with the USD API and setting their transforms based on the positions in gpt_results. We also save the DeepSearch query as an attribute on the USD prim, so it can be reused later in case we want to run DeepSearch again. Note that the assetPath “your_server://your_asset_folder” is a placeholder and should be substituted with the real path of the folder where DeepSearch is performed.
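Putting it together, a hypothetical call site might look like this; the root prim path is illustrative.

# gpt_results comes from the parsed ChatGPT JSON, query_result from DeepSearch
place_deepsearch_results(gpt_results, query_result, "/World/Reception/")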

And voila! We have our AI-generated scene in Omniverse!

Swapping items using DeepSearch

However, we might not like all the items that are retrieved the first time. So, we built a small companion extension to allow users to browse for similar objects and swap them in with just a click. With Omniverse, it is very easy to build in a modular way so you can easily extend your workflows with additional extensions.

This companion extension is quite simple. It takes as argument an object generated via DeepSearch, and offers two buttons to get the next or previous object from the related DeepSearch query. For example, if the USD file contained the Attribute “DeepSearch:Query = Modern Sofa”, it would run this search again via DeepSearch and get the next best result. You could of course extend this to be a visual UI with pictures of all the search results similar to the window we use for general DeepSearch queries. To keep this example simple, we just opted for two simple buttons.

The code below shows the function that increments the index, and the function replace_reference(self) that actually performs the swap of the object based on that index.

def increment_prim_index(self):
    if self._query_results is None:
        return

    self._index = self._index + 1

    # Wrap around to the first result after reaching the last one
    if self._index >= len(self._query_results.paths):
        self._index = 0

    self.replace_reference()


def replace_reference(self):
    # Swap the referenced USD asset for the result at the current index
    references: Usd.References = self._selected_prim.GetReferences()
    references.ClearReferences()
    references.AddReference(
        assetPath="your_server://your_asset_folder" + self._query_results.paths[self._index].uri)

Note that, as above, the path “your_server://your_asset_folder” is just a placeholder, and you should replace it with the Nucleus folder where your DeepSearch query gets performed.

A gray couch swapped in for the brown couch using DeepSearch

This shows how, by combining the power of LLMs and Omniverse APIs, it is possible to create tools that power creativity and speed up processes.

From ChatGPT to GPT-4

One of the main advancements OpenAI’s new GPT-4 brings to large language models is increased spatial awareness.

We initially used the ChatGPT API, which is based on GPT-3.5-turbo. While it offered good spatial awareness, GPT-4 offers much better results. The version you see in the video above is using GPT-4.

GPT-4 is vastly improved over GPT-3.5 at solving complex tasks and comprehending complex instructions. Therefore, we could be much more descriptive and use natural language when engineering the text prompt to “instruct the AI.”

We could give the AI very explicit instructions like:

  • “Each object name should include an appropriate adjective.”
  • “Keep in mind, objects should be placed in the area to create the most meaningful layout possible, and they shouldn’t overlap.”
  • “All objects must be within the bounds of the area size; Never place objects further than 1/2 the length or 1/2 the depth of the area from the origin.”
  • “Also keep in mind that the objects should be placed all over the area in respect to the origin point of the area, and you can use negative values as well to display items correctly, since the origin of the area is always at the center of the area.”

The fact that the AI follows these system prompts appropriately when generating the response is particularly impressive, as it demonstrates a good understanding of spatial relationships and how to place items properly. One of the challenges of using GPT-3.5 for this task was that objects were sometimes spawned outside the room, or at odd placements.

GPT-4 not only places objects within the right boundaries of the room, but also places objects logically: a bedside table will actually show up on the side of a bed, a coffee table will be placed in between two sofas, and so on.

With this, we’re likely just scratching the surface of what LLMs can do in 3D spaces!

Building your own ChatGPT-powered extension

While this is just a small demo of what AI can do once it’s connected to a 3D space, we believe it will open doors to a wide range of tools beyond scene building. Developers can build AI-powered extensions in Omniverse for lighting, cameras, animations, character dialog, and other elements that optimize creator workflows. They can even develop tools to attach physics to scenes and run entire simulations.

You can download and experiment with the AI Room Generator Extension Sample on GitHub. We encourage other developers to try building on the extension or creating their own generative AI extensions for Omniverse.

Using Omniverse Kit, you can start integrating AI into your extensions today. Download Omniverse to get started.

Get started with NVIDIA Omniverse by downloading the standard license free, or learn how Omniverse Enterprise can connect your team. If you are a developer, get started with Omniverse resources. Stay up to date on the platform by subscribing to the newsletter, and following NVIDIA Omniverse on Instagram, Medium, and Twitter. For resources, check out our forums, Discord server, Twitch, and YouTube channels.

About the Author

Mario Viviani is Manager of Developer Relations for Omniverse at NVIDIA, based in London, UK. His team focuses on helping developers and partners get familiar and onboarded with NVIDIA Omniverse. A passionate technologist and hands-on developer, he previously worked at Amazon, where he led the global Apps and Games Tech Evangelism team; before that, he co-founded startups and ran his own consulting company in mobile app development. He is always looking ahead to the next “big thing”!

