7 Jul 2025

Applying Stable Diffusion to Viewer scenes through Comfy UI: Desktop


Introduction

Rendering is a required and sometimes painful part of the design process. A quick search for the words AI and render turns up a set of solutions that already address this task with the help of AI models. In this blog post, we'll share one option to generate a photorealistic scene, using an image captured from a scene rendered with the APS Viewer as input. We will combine Stable Diffusion models with Viewer scenes using Comfy UI on our desktop.

In a second blog post, we'll apply the same concepts in the cloud.

The work shared in this blog post has been presented at these past events:

  1. DevCon 2024 (Munich and Boston)
  2. Autodesk University 2024: Introducing AI Rendering: From Okay to Wow! | Autodesk University

What we'll be using

Before we dive into the method, let's introduce some concepts we'll be referring to throughout this blog post:

  • Comfy UI is a tool that lets you design and execute advanced Stable Diffusion pipelines using a graph/nodes/flowchart-based interface.

  • Viewer SDK is Autodesk's powerful JavaScript library for creating applications to view and interact with 2D and 3D design models directly on websites.
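
If you're new to the Viewer, the minimal initialization sketch below gives an idea of the scene we'll be taking screenshots from later. The `viewerDiv` element, the token callback and the URN are placeholders for your own APS setup, not values from our sample.

```javascript
// A minimal sketch of initializing the Viewer and loading a translated model.
// The token callback and the URN below are placeholders; replace them with
// your own APS credentials and model.
const options = {
    env: 'AutodeskProduction2',
    api: 'streamingV2',
    getAccessToken: (onTokenReady) => {
        // Replace with a call to your own token endpoint
        onTokenReady('<ACCESS_TOKEN>', 3600);
    }
};

let viewer;

Autodesk.Viewing.Initializer(options, () => {
    viewer = new Autodesk.Viewing.GuiViewer3D(document.getElementById('viewerDiv'));
    viewer.start();

    // Load the default geometry of a translated model by its URN
    Autodesk.Viewing.Document.load('urn:<MODEL_URN>', (doc) => {
        viewer.loadDocumentNode(doc, doc.getRoot().getDefaultGeometry());
    });
});
```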

Combining everything

The first thing you'll need to do is install Comfy UI.

Once you have it installed, you can start adding the required models.

Feel free to explore the options available through Hugging Face.

In our sample, we used the epicrealism_naturalSinRC1VAE.safetensors model to generate the image from our prompt.

With Comfy UI, we can also use an input image to influence the output, and in this sample that input image is the Viewer scene.
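
To give you an idea of what that graph looks like, here's a hedged sketch that drives Comfy UI through its local HTTP API (http://127.0.0.1:8188 by default) instead of wiring the nodes visually. The node IDs, prompts, sampler settings and the viewer-screenshot.png filename are placeholders, not values from our sample.

```javascript
// A sketch of an image-to-image graph: load the checkpoint, encode the
// prompts, encode the Viewer screenshot into latent space, sample, decode
// and save. References like ["1", 0] point to (node id, output index).
const workflow = {
    "1": { class_type: "CheckpointLoaderSimple",
           inputs: { ckpt_name: "epicrealism_naturalSinRC1VAE.safetensors" } },
    "2": { class_type: "CLIPTextEncode",   // positive prompt (placeholder text)
           inputs: { clip: ["1", 1], text: "photorealistic living room, natural light" } },
    "3": { class_type: "CLIPTextEncode",   // negative prompt (placeholder text)
           inputs: { clip: ["1", 1], text: "blurry, low quality" } },
    "4": { class_type: "LoadImage",        // the Viewer screenshot in Comfy UI's input folder
           inputs: { image: "viewer-screenshot.png" } },
    "5": { class_type: "VAEEncode",
           inputs: { pixels: ["4", 0], vae: ["1", 2] } },
    "6": { class_type: "KSampler",
           inputs: { model: ["1", 0], positive: ["2", 0], negative: ["3", 0],
                     latent_image: ["5", 0], seed: 42, steps: 25, cfg: 7,
                     sampler_name: "euler", scheduler: "normal", denoise: 0.6 } },
    "7": { class_type: "VAEDecode", inputs: { samples: ["6", 0], vae: ["1", 2] } },
    "8": { class_type: "SaveImage", inputs: { images: ["7", 0], filename_prefix: "aps-render" } }
};

// Queue the prompt on the local Comfy UI instance
fetch("http://127.0.0.1:8188/prompt", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt: workflow })
});
```

The denoise value controls how closely the output follows the input image: lower values stay closer to the Viewer screenshot, higher values give the model more freedom.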

To capture the Viewer scene as an image, you can simply use the Viewer's getScreenShot method.
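
For example, the snippet below captures the current scene and saves it locally; the 1024×768 resolution and the download step are just for illustration.

```javascript
// getScreenShot renders the current scene and hands the callback a blob URL,
// which we convert to a Blob and download as a PNG.
viewer.getScreenShot(1024, 768, async (blobURL) => {
    const blob = await (await fetch(blobURL)).blob();
    const link = document.createElement('a');
    link.href = URL.createObjectURL(blob);
    link.download = 'viewer-screenshot.png';
    link.click();
});
```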

Or just hold on and use our sample shared at the end of this blog post ;)

And with that we have an output like the one below:

simple output

Improving the result

As you can see, there's still a lot of hallucination if you look through the windows of the scene or pay attention to details such as the stairs on the right side of our scene (there are no stairs in the output).

To improve our workflow, we can apply a ControlNet.

You can think of a ControlNet as a translation assistant: it converts our reference image into instructions the AI model can understand, so the generated images meet our requirements.

In this sample, we use ControlNets based on Depth and Normal maps from the input image.

You can find the ControlNets used at lllyasviel/ControlNet-v1-1 on Hugging Face.
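
As a sketch of what the extra nodes look like, the snippet below extends the graph from the earlier sketch with the two ControlNets. The file names match the ControlNet-v1-1 checkpoints, while the node IDs, strengths and control image names are placeholders.

```javascript
// Extra nodes for the two ControlNets: each ControlNetApply node takes a
// conditioning, a ControlNet model and a control image, and they can be
// chained. Node "2" is the positive prompt from the earlier sketch.
const controlNetNodes = {
    "10": { class_type: "ControlNetLoader",
            inputs: { control_net_name: "control_v11f1p_sd15_depth.pth" } },
    "11": { class_type: "ControlNetLoader",
            inputs: { control_net_name: "control_v11p_sd15_normalbae.pth" } },
    "12": { class_type: "LoadImage", inputs: { image: "viewer-depth.png" } },
    "13": { class_type: "LoadImage", inputs: { image: "viewer-normal.png" } },
    // Apply the Depth ControlNet to the positive conditioning...
    "14": { class_type: "ControlNetApply",
            inputs: { conditioning: ["2", 0], control_net: ["10", 0],
                      image: ["12", 0], strength: 0.8 } },
    // ...then chain the Normal ControlNet; node "15" replaces node "2" as the
    // positive conditioning fed into the KSampler.
    "15": { class_type: "ControlNetApply",
            inputs: { conditioning: ["14", 0], control_net: ["11", 0],
                      image: ["13", 0], strength: 0.6 } }
};
```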

The Depth and Normal maps are acquired from the Viewer scene using the snippet available in our sample (it all started with this request).
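
To give you an idea of the approach, here's a rough sketch (not the exact code from our sample) of one way to render a Normal map: override every fragment's material with a THREE.MeshNormalMaterial from the THREE namespace bundled with the Viewer, then take a screenshot.

```javascript
// Rough sketch: register a normal material, assign it to every fragment,
// force a re-render and grab a screenshot. Restoring the original materials
// afterwards is omitted for brevity; the sample handles that part.
function captureNormalMap(viewer, width, height) {
    return new Promise((resolve) => {
        const model = viewer.model;
        const normalMat = new THREE.MeshNormalMaterial();
        viewer.impl.matman().addMaterial('normal-map-material', normalMat, true);

        // Walk the instance tree and swap the material on every fragment
        const frags = model.getFragmentList();
        const tree = model.getInstanceTree();
        tree.enumNodeFragments(tree.getRootId(), (fragId) => {
            frags.setMaterial(fragId, normalMat);
        }, true);

        viewer.impl.invalidate(true, true, true); // force a full re-render

        viewer.getScreenShot(width, height, (blobURL) => resolve(blobURL));
    });
}

// A Depth map can be captured the same way with THREE.MeshDepthMaterial,
// adjusting the camera's near/far planes so the depth range stays visible.
```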

You can retrieve all of that using our sample.

The workflow to apply these ControlNets is described in the complete demo video ;)

And that is a wrap!

You can find the complete demo video and source code below:

DEMO

SOURCE

Stay tuned for the second part of this workflow ;)
