GPU Setup

ForgeAI uses GPU acceleration for inference (Test module), fine-tuning (Training module), and model conversion (Convert module).

Auto-Detection

Go to Settings (07) to see your detected hardware:
Field  | Description
NVIDIA | GPU name, VRAM, CUDA version
VULKAN | Cross-platform GPU API support
METAL  | Apple Silicon support (macOS)
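
If detection looks wrong, you can cross-check the NVIDIA values outside ForgeAI. A minimal sketch using the pynvml bindings (from the nvidia-ml-py package, which is not bundled with ForgeAI):

```python
# Cross-check GPU name, VRAM, and driver version via NVML.
# Assumes `pip install nvidia-ml-py`; not part of ForgeAI itself.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)          # first GPU
print("GPU:   ", pynvml.nvmlDeviceGetName(handle))
mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
print(f"VRAM:   {mem.total / 1024**3:.1f} GB")
print("Driver:", pynvml.nvmlSystemGetDriverVersion())
pynvml.nvmlShutdown()
```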

llama.cpp Variants

For GGUF inference and quantization, install the appropriate llama.cpp variant:
CUDA
Fastest option for NVIDIA GPUs.
Requirements:
  • NVIDIA GPU (GTX 1060+ / RTX series)
  • NVIDIA drivers 515+
Install: Settings → llama.cpp Tools → select CUDA → DOWNLOAD & INSTALL
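
Before installing the CUDA variant, you can confirm the driver requirement is met. A quick sketch that queries nvidia-smi (which ships with the NVIDIA driver):

```python
# Check that the installed NVIDIA driver meets the 515+ requirement.
import subprocess

out = subprocess.run(
    ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
    capture_output=True, text=True, check=True,
)
version = out.stdout.strip().splitlines()[0]   # e.g. "535.154.05"
major = int(version.split(".")[0])
print("Driver OK for CUDA variant" if major >= 515 else "Update NVIDIA drivers first")
```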

Python Environments (Training & Convert)

ForgeAI manages two separate Python virtual environments, each with GPU-aware PyTorch:

Training Environment

Used by the Training module for LoRA, QLoRA, SFT, DPO, and full fine-tuning:
  • NVIDIA GPU detected: PyTorch is installed with CUDA support automatically during setup
  • No GPU: CPU-only PyTorch (training will be slow)
  • Includes: transformers, peft, trl, bitsandbytes, datasets
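
As an illustration of what this environment supports, here is a minimal QLoRA-style setup using transformers, peft, and bitsandbytes. The model name is a placeholder, and ForgeAI's Training module drives these libraries through its own UI; this only sketches the underlying mechanics:

```python
# Minimal QLoRA setup: 4-bit base weights plus trainable LoRA adapters.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                       # QLoRA: quantize base weights to 4-bit
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",              # placeholder base model
    quantization_config=bnb_config,
    device_map="auto",                       # place layers on GPU if available
)

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],     # common targets for Llama-style models
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()           # only the small LoRA adapters train
```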

Convert Environment

Used by the Convert module for SafeTensors-to-GGUF conversion and by Test for SafeTensors inference:
  • NVIDIA GPU detected: PyTorch is installed with CUDA support automatically during setup
  • No GPU: CPU-only PyTorch is installed
  • OOM fallback: If the model doesn’t fit in GPU VRAM, ForgeAI automatically falls back to CPU inference
Both environments can be managed (viewed, cleaned, deleted) in Settings.
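
ForgeAI's actual fallback logic is internal, but the pattern behind the Convert environment's OOM handling can be sketched like this (assuming a SafeTensors model loadable with transformers and PyTorch 1.13+ for torch.cuda.OutOfMemoryError):

```python
# Try-GPU-then-CPU loading; illustrates the fallback pattern only.
import torch
from transformers import AutoModelForCausalLM

def load_with_fallback(model_path: str):
    """Try to load a SafeTensors model on the GPU; fall back to CPU on OOM."""
    if torch.cuda.is_available():
        try:
            return AutoModelForCausalLM.from_pretrained(
                model_path, torch_dtype=torch.float16, device_map="cuda"
            )
        except torch.cuda.OutOfMemoryError:
            torch.cuda.empty_cache()  # release any partial allocation
    # CPU path: float32 keeps the widest operator support
    return AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float32)
```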

VRAM Requirements

Inference

Approximate VRAM needed to load models on GPU:
Model Size | Q4_K_M  | Q8_0    | F16
7B         | ~4.5 GB | ~7.5 GB | ~14 GB
13B        | ~8 GB   | ~14 GB  | ~26 GB
70B        | ~40 GB  | ~70 GB  | ~140 GB
If your model exceeds available VRAM, GGUF inference via llama.cpp can offload some layers to CPU RAM; SafeTensors inference automatically falls back to full CPU mode.
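
The table values follow from weight size alone: bytes ≈ parameters × bits-per-weight / 8, plus working overhead for the KV cache and buffers. A rough estimator (the bits-per-weight averages are approximate, and real overhead grows with context length):

```python
def est_vram_gb(params_billions: float, bits_per_weight: float,
                overhead_gb: float = 0.5) -> float:
    """Weights-only VRAM estimate plus a flat overhead for KV cache and buffers."""
    return params_billions * bits_per_weight / 8 + overhead_gb

# Approximate average bits per weight for each format.
for fmt, bpw in [("Q4_K_M", 4.85), ("Q8_0", 8.5), ("F16", 16.0)]:
    print(fmt, f"~{est_vram_gb(7.0, bpw):.1f} GB for a 7B model")
# -> Q4_K_M ~4.7 GB, Q8_0 ~7.9 GB, F16 ~14.5 GB
```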

Training

Approximate VRAM needed for fine-tuning a 7B model:
Method         | Minimum VRAM | Recommended
QLoRA (4-bit)  | 4 GB         | 8 GB
LoRA           | 6 GB         | 12 GB
SFT            | 8 GB         | 16 GB
DPO            | 8 GB         | 16 GB
Full Fine-Tune | 16 GB        | 24+ GB
Layer surgery (remove/duplicate layers) is pure Rust and requires no GPU or Python — it works on any system.

Troubleshooting

Issue                         | Solution
GPU not detected              | Update NVIDIA/Vulkan drivers
CUDA variant fails to install | Ensure NVIDIA drivers are 515+
Slow inference despite GPU    | Check Settings to confirm the CUDA/Vulkan variant is installed, not CPU
Out of memory (inference)     | Use a smaller quantization (Q4_K_M instead of Q8_0) or switch to CPU
Out of memory (training)      | Switch to QLoRA, or reduce batch size or max_seq_length
SafeTensors shows CPU device  | Re-run Convert module setup to reinstall PyTorch with CUDA
Training not using GPU        | Verify CUDA is available in Settings → Training Environment
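
For the last two rows, a direct check inside the relevant environment's Python confirms whether the installed PyTorch build can see the GPU:

```python
# Run inside the Training (or Convert) environment's Python to confirm
# the installed PyTorch build can see the GPU.
import torch

print(torch.__version__)                  # CUDA wheels typically carry a +cuXXX suffix
print(torch.cuda.is_available())          # False -> CPU-only wheel or a driver problem
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # should match the GPU shown in Settings
```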