GPU Setup
ForgeAI uses GPU acceleration for inference (Test module), fine-tuning (Training module), and model conversion (Convert module).
Auto-Detection
Go to Settings (07) to see your detected hardware:

| Field | Description |
|---|---|
| NVIDIA | GPU name, VRAM, CUDA version |
| VULKAN | Cross-platform GPU API support |
| METAL | Apple Silicon support (macOS) |
llama.cpp Variants
For GGUF inference and quantization, install the appropriate llama.cpp variant:
- CUDA (NVIDIA)
- Vulkan (Cross-platform)
- CPU

The CUDA variant is the fastest option for NVIDIA GPUs.
Requirements:
- NVIDIA GPU (GTX 1060+ / RTX series)
- NVIDIA drivers 515+
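
The driver requirement above can be checked from a terminal before installing the CUDA variant. The following is a minimal sketch, not part of ForgeAI: it assumes `nvidia-smi` is on the PATH (it ships with the NVIDIA driver), and the helper names `driver_major` and `installed_driver_version` are hypothetical.

```python
import shutil
import subprocess
from typing import Optional

MIN_DRIVER_MAJOR = 515  # minimum driver version stated in the requirements above


def driver_major(version: str) -> int:
    """Return the major component of a driver string such as '535.104.05'."""
    return int(version.strip().split(".")[0])


def installed_driver_version() -> Optional[str]:
    """Ask nvidia-smi for the driver version; None if it cannot be determined."""
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True, text=True, check=False,
    )
    lines = result.stdout.strip().splitlines()
    if result.returncode != 0 or not lines:
        return None
    return lines[0].strip()  # one line per GPU; they all share the same driver


if __name__ == "__main__":
    version = installed_driver_version()
    if version is None:
        print("nvidia-smi not found or no NVIDIA GPU reported")
    elif driver_major(version) >= MIN_DRIVER_MAJOR:
        print(f"Driver {version} meets the {MIN_DRIVER_MAJOR}+ requirement")
    else:
        print(f"Driver {version} is older than {MIN_DRIVER_MAJOR}; update it")
```

If the script reports that `nvidia-smi` is missing, the Vulkan or CPU variant is likely the better choice on that machine.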
Python Environments (Training & Convert)
ForgeAI manages two separate Python virtual environments, each with GPU-aware PyTorch.
Training Environment
Used by the Training module for LoRA, QLoRA, SFT, DPO, and full fine-tuning:
- NVIDIA GPU detected: PyTorch is installed with CUDA support automatically during setup
- No GPU: CPU-only PyTorch (training will be slow)
- Includes: transformers, peft, trl, bitsandbytes, datasets
Convert Environment
Used by the Convert module for SafeTensors-to-GGUF conversion and by Test for SafeTensors inference:
- NVIDIA GPU detected: PyTorch is installed with CUDA support automatically during setup
- No GPU: CPU-only PyTorch is installed
- OOM fallback: If the model doesn’t fit in GPU VRAM, ForgeAI automatically falls back to CPU inference
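
The OOM fallback described above follows a common pattern: try the GPU first, and retry on the CPU if the load runs out of memory. The sketch below illustrates that pattern only; it is not ForgeAI's implementation. The function `load_with_fallback` and the `fake_load` loader are hypothetical, and the out-of-memory condition is simulated with Python's built-in `MemoryError`.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def load_with_fallback(
    load: Callable[[str], T],
    devices: Sequence[str] = ("cuda", "cpu"),
    oom_errors: Tuple[type, ...] = (MemoryError,),
) -> Tuple[T, str]:
    """Try each device in order; return (model, device) for the first that fits."""
    last_error = None
    for device in devices:
        try:
            return load(device), device
        except oom_errors as err:
            last_error = err  # remember why this device failed, then try the next
    raise RuntimeError(f"model did not fit on any of {list(devices)}") from last_error


def fake_load(device: str) -> str:
    """Hypothetical loader that simulates a model too large for GPU VRAM."""
    if device == "cuda":
        raise MemoryError("simulated: model larger than available VRAM")
    return f"model on {device}"


model, device = load_with_fallback(fake_load)
print(device)  # cpu
```

With PyTorch, the same helper could be called with `oom_errors=(torch.cuda.OutOfMemoryError,)`, which recent PyTorch releases raise when a CUDA allocation fails. Only out-of-memory errors trigger the fallback here, so unrelated failures still surface immediately.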
Both environments can be managed (viewed, cleaned, deleted) in Settings.
VRAM Requirements
Inference
Approximate VRAM needed to load models on GPU:

| Model Size | Q4_K_M | Q8_0 | F16 |
|---|---|---|---|
| 7B | ~4.5 GB | ~7.5 GB | ~14 GB |
| 13B | ~8 GB | ~14 GB | ~26 GB |
| 70B | ~40 GB | ~70 GB | ~140 GB |
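
The figures in the table above track a simple rule of thumb: weight memory is roughly parameter count times bits per weight, divided by 8. The sketch below applies that rule. The bits-per-weight values are assumptions rather than ForgeAI constants: Q8_0 stores 32 int8 weights plus one fp16 scale per block (8.5 bits per weight), while 4.85 for Q4_K_M is only a rough average because that format mixes block types.

```python
# Approximate bits stored per weight (assumed values, see the note above).
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q8_0": 8.5, "F16": 16.0}


def weight_gb(params_billions: float, quant: str) -> float:
    """Estimate weight memory in GB (10^9 bytes) for a model of the given size."""
    return params_billions * BITS_PER_WEIGHT[quant] / 8


for size in (7, 13, 70):
    row = ", ".join(f"{q}: {weight_gb(size, q):.1f} GB" for q in BITS_PER_WEIGHT)
    print(f"{size}B -> {row}")
```

The estimates land within roughly 10% of the table. They cover the weights only; the KV cache and runtime buffers add to this and grow with context length, so leave some headroom.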
Training
Approximate VRAM needed for fine-tuning a 7B model:

| Method | Minimum VRAM | Recommended |
|---|---|---|
| QLoRA (4-bit) | 4 GB | 8 GB |
| LoRA | 6 GB | 12 GB |
| SFT | 8 GB | 16 GB |
| DPO | 8 GB | 16 GB |
| Full Fine-Tune | 16 GB | 24+ GB |
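
To see which methods fit a given card, the table above can be turned into a small lookup. This is an illustrative sketch: the numbers are copied from the table and apply to a 7B model only, "24+ GB" is treated as 24, and the function name `feasible_methods` is hypothetical.

```python
# (method, minimum GB, recommended GB) for a 7B model, copied from the table above.
TRAINING_VRAM_7B = [
    ("QLoRA (4-bit)", 4, 8),
    ("LoRA", 6, 12),
    ("SFT", 8, 16),
    ("DPO", 8, 16),
    ("Full Fine-Tune", 16, 24),
]


def feasible_methods(vram_gb: float) -> dict:
    """Map each method that meets its minimum to 'recommended' or 'minimum'."""
    result = {}
    for method, minimum, recommended in TRAINING_VRAM_7B:
        if vram_gb >= recommended:
            result[method] = "recommended"
        elif vram_gb >= minimum:
            result[method] = "minimum"
    return result


print(feasible_methods(8))
# {'QLoRA (4-bit)': 'recommended', 'LoRA': 'minimum', 'SFT': 'minimum', 'DPO': 'minimum'}
```

Methods marked "minimum" will usually need a small batch size and a short max_seq_length to avoid running out of memory.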
Troubleshooting
| Issue | Solution |
|---|---|
| GPU not detected | Update NVIDIA/Vulkan drivers |
| CUDA variant fails to install | Ensure NVIDIA drivers are 515+ |
| Slow inference despite GPU | Check Settings to confirm CUDA/Vulkan variant is installed, not CPU |
| Out of memory (inference) | Use a smaller quantization (Q4_K_M instead of Q8_0) or switch to CPU |
| Out of memory (training) | Switch to QLoRA, or reduce batch size or max_seq_length |
| SafeTensors shows CPU device | Re-run Convert module setup to reinstall PyTorch with CUDA |
| Training not using GPU | Verify CUDA is available in Settings → Training Environment |
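
For the last two rows, it can help to ask PyTorch directly which device it sees. The following sketch uses only standard PyTorch calls, with the hypothetical helper name `torch_device_report`. It has to be run with the Python interpreter of the environment being checked; where ForgeAI stores those environments is not covered here.

```python
def torch_device_report() -> str:
    """Describe which device PyTorch would use in the current environment."""
    try:
        import torch  # imported lazily so the script still runs without PyTorch
    except ImportError:
        return "PyTorch is not installed in this environment"
    if torch.cuda.is_available():
        name = torch.cuda.get_device_name(0)
        return (
            f"CUDA available: {name} "
            f"(PyTorch {torch.__version__}, CUDA {torch.version.cuda})"
        )
    return f"CPU only (PyTorch {torch.__version__}); a CUDA build is needed for GPU use"


print(torch_device_report())
```

A "CPU only" result on a machine with a working NVIDIA driver usually means a CPU-only PyTorch build was installed, which matches the reinstall advice in the table.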