Glossary

Inference vs. Training

The two phases of an AI model — with very different resource needs.

Training and inference place very different demands on hardware. Training is where the model learns from data: it is compute- and memory-heavy because gradients and optimizer states must live in VRAM alongside the weights, and together they take roughly two to four times the raw model size. Inference is the finished model in use: it only needs the weights plus the context, so it requires far less memory and compute. That is why a model that runs inference comfortably on an RTX 4090 can still hit a VRAM wall during training. A GPU server for AI can handle both phases, provided the card is sized for the more memory-hungry one.
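
The gap between the two phases can be made concrete with a back-of-the-envelope calculation. The sketch below is only a rough estimate under stated assumptions: the function name `estimate_vram_gb` and its parameters are hypothetical, the 7-billion-parameter model and 16-bit weights are illustrative choices, and the overhead factor of 3 is picked from the "two to four times" range above. It ignores context memory, activations, and framework overhead, so real usage will be higher.

```python
RTX_4090_VRAM_GB = 24  # VRAM of an RTX 4090


def estimate_vram_gb(num_params, bytes_per_param=2, training=False,
                     overhead_factor=3.0):
    """Rough VRAM estimate in GB (hypothetical helper, not a library API).

    Inference: weights only.
    Training: weights plus gradients and optimizer states, modeled as
    overhead_factor times the weight size (assumed value, see text).
    Context, activations, and framework overhead are NOT included.
    """
    weights_bytes = num_params * bytes_per_param
    if training:
        total_bytes = weights_bytes * (1 + overhead_factor)
    else:
        total_bytes = weights_bytes
    return total_bytes / 1e9


# Illustrative model: 7 billion parameters stored in 16 bits (2 bytes) each.
params = 7e9
inference_gb = estimate_vram_gb(params)
training_gb = estimate_vram_gb(params, training=True)

print(f"Inference: ~{inference_gb:.0f} GB, fits on one card: "
      f"{inference_gb <= RTX_4090_VRAM_GB}")
print(f"Training:  ~{training_gb:.0f} GB, fits on one card: "
      f"{training_gb <= RTX_4090_VRAM_GB}")
```

Under these assumptions the weights alone come to about 14 GB, which fits in 24 GB of VRAM, while the training estimate lands well above it. Changing `overhead_factor` to 2 or 4 shows how sensitive the training figure is to the optimizer and precision in use.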
