
How to Install Stable Diffusion: ComfyUI on a GPU Server

This tutorial walks you through setting up Stable Diffusion with ComfyUI on a dedicated GPU server, from the VRAM check through models and LoRAs to locked-down remote access.


ComfyUI is the most flexible front end for Stable Diffusion: a node-based editor that covers everything from simple text-to-image workflows to complex pipelines with ControlNet, upscaling and video. On a dedicated RTX 4090 it renders briskly, with no shared GPU, no queue and no preemption.

Check GPU and VRAM

How much VRAM you need depends on the model. SDXL runs comfortably from around 12 GB and can run below that with optimizations. Flux and demanding pipelines that keep several models in memory at once benefit noticeably from 24 GB (RTX 4090) or 32 GB (RTX 5090). Our guide on sizing GPU and VRAM correctly covers the fundamentals.

VRAM guidelines for image generation
Model / workflow	Recommended VRAM	Suitable card
SDXL, simple text-to-image	~12 GB+	RTX 4090
SDXL with ControlNet / upscaling	~16–24 GB	RTX 4090
Flux and large pipelines	24–32 GB	RTX 4090 / 5090
Video and multi-model workflows	32 GB	RTX 5090
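
As a rough illustration, the guidelines in the table can be turned into a small preflight check. The Python sketch below assumes that nvidia-smi is on the PATH. The function names and the choice to use the lower bound of each table row as a threshold are illustrative assumptions, not part of ComfyUI.

```python
import subprocess

# Lower VRAM bound in GB for each row of the table above, largest first.
# Assumption: for ranges such as 16-24 GB, the lower end is the threshold.
TIERS = [
    (32, "Video and multi-model workflows"),
    (24, "Flux and large pipelines"),
    (16, "SDXL with ControlNet / upscaling"),
    (12, "SDXL, simple text-to-image"),
]


def parse_total_mib(csv_output: str) -> list[int]:
    """Parse one integer (MiB) per GPU from nvidia-smi's CSV output."""
    return [int(line.strip()) for line in csv_output.splitlines() if line.strip()]


def largest_workflow(total_mib: int) -> str:
    """Return the most demanding table row that the given VRAM covers."""
    # nvidia-smi reports MiB and usually slightly less than the nominal
    # size, so round to the nearest GB before comparing.
    total_gb = round(total_mib / 1024)
    for min_gb, workflow in TIERS:
        if total_gb >= min_gb:
            return workflow
    return "below the table: SDXL only with optimizations"


def query_total_mib() -> list[int]:
    """Ask nvidia-smi for the total memory of every visible GPU."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_total_mib(result.stdout)
```

On the server, calling largest_workflow() for each value returned by query_total_mib() gives a quick first impression of which row of the table the card falls into. It is no substitute for watching real usage during a workflow.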

How to set up ComfyUI — step by step

1. Provision the GPU server: pick a dedicated RTX 4090 (24 GB) or RTX 5090 (32 GB) with a recent Linux distribution and root access.
2. Install driver and CUDA: install the NVIDIA driver, confirm it with the nvidia-smi command, and install the CUDA version that matches your PyTorch build.
3. Install ComfyUI: clone the repository, create an isolated Python environment (venv or conda), and install the dependencies including a GPU-enabled PyTorch.
4. Load models: place checkpoints such as SDXL or Flux into the models/checkpoints folder; copy the VAE and other components to their designated paths.
5. Add LoRAs and extensions: drop LoRA files into models/loras and install additional custom nodes via the ComfyUI Manager as needed.
6. Secure remote access: bind ComfyUI to localhost and expose it only through a reverse proxy with TLS and authentication. Never put it on the network unprotected.
7. Test the first workflow: load a default text-to-image graph, set a prompt, start generation, and watch VRAM usage with nvidia-smi.
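
The "Secure remote access" step depends on ComfyUI listening only on the loopback interface. The following Python sketch shows one way to sanity-check a bind address before using it in a launch script. The helper name is hypothetical, and how the address is actually passed to ComfyUI depends on your launch setup.

```python
import ipaddress


def is_loopback_bind(host: str) -> bool:
    """Return True if the address only accepts connections from this machine."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (for example a hostname): treat it as unsafe
        # rather than guess how it resolves.
        return False


def check_bind(host: str) -> str:
    """Produce a short verdict for a launch script or a log line."""
    if is_loopback_bind(host):
        return f"{host}: ok, reachable only locally (put the reverse proxy in front)"
    return f"{host}: refusing, this would expose ComfyUI beyond localhost"
```

An address such as 0.0.0.0 fails this check because it binds to all interfaces, which is exactly the unprotected exposure the step warns against.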

Placing models, VAEs and LoRAs correctly

Stable Diffusion thrives on its model ecosystem: a base checkpoint sets the style, a matching VAE handles color rendering, and LoRAs add concepts or characters. What matters is that each file sits in the right folder. Otherwise ComfyUI simply will not find it in the loader node.

Checkpoints (SDXL, Flux) belong in models/checkpoints and then appear directly in the loader node.
A separate VAE can improve colors and contrast; place it in models/vae and select it in the graph.
Put LoRAs into models/loras and apply them with sensible weights — values that are too high visibly over-bake the image.
Upscaler models and ControlNet files have their own folders; the ComfyUI Manager helps install missing nodes.
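
To see at a glance whether files ended up where the loader nodes look for them, a short inventory script can help. The Python sketch below uses only the folder names mentioned above. The function name and the list of file extensions are assumptions, so adjust them to the files you actually use.

```python
from pathlib import Path

# Folder names as given in the text above.
MODEL_DIRS = ("checkpoints", "vae", "loras")
# Assumption: typical model file extensions; extend the set as needed.
MODEL_SUFFIXES = {".safetensors", ".ckpt", ".pt"}


def inventory(comfy_root: Path) -> dict[str, list[str]]:
    """List model files per folder; a missing folder maps to an empty list."""
    found: dict[str, list[str]] = {}
    for name in MODEL_DIRS:
        folder = comfy_root / "models" / name
        if folder.is_dir():
            found[name] = sorted(
                p.name for p in folder.iterdir()
                if p.is_file() and p.suffix.lower() in MODEL_SUFFIXES
            )
        else:
            found[name] = []
    return found
```

Calling inventory() with the path of your ComfyUI checkout returns one list per folder. An empty list for a folder you expected to be populated is a hint that a file was copied to the wrong place.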

Speed up rendering

Speed comes down to three factors: model size, resolution and sampler steps. On a dedicated GPU with no shared resources, performance stays consistent, with no slowdown caused by neighbours on the same card.

Resolution in stages: generate at a base resolution first, then upscale deliberately, rather than rendering everything in one expensive pass.
Tune sampler and steps: beyond a point, more steps add barely visible gains while still costing time.
Keep an eye on VRAM: when it runs tight, a more memory-efficient attention setting helps, or move to the RTX 5090 with 32 GB.
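
How resolution and steps interact can be made concrete with a deliberately crude cost model. The Python sketch below assumes that render time scales with pixel count times sampler steps. That is a simplification: attention layers tend to grow faster than linearly with resolution, and model size is left out entirely. The function name and the 1024×1024, 20-step baseline are illustrative.

```python
def relative_cost(width: int, height: int, steps: int,
                  base: tuple[int, int, int] = (1024, 1024, 20)) -> float:
    """Estimate cost relative to a baseline, assuming cost ~ pixels * steps."""
    base_width, base_height, base_steps = base
    return (width * height * steps) / (base_width * base_height * base_steps)


# Doubling both edges quadruples the pixel count and, in this model, the cost.
direct_high_res = relative_cost(2048, 2048, 20)

# Doubling the steps at the base resolution only doubles the cost.
more_steps = relative_cost(1024, 1024, 40)
```

Under these assumptions, a direct 2048×2048 pass costs about four times the baseline, which is why generating at a base resolution and upscaling afterwards is usually the cheaper route.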

Frequently asked questions

How much VRAM does SDXL or Flux need?

Why ComfyUI instead of AUTOMATIC1111?

Can I use ComfyUI securely from remote?

Is the RTX 4090 enough for image generation?