Training & Fine-Tuning Guide

This guide covers everything you need to know about training models in ForgeAI — from basic LoRA fine-tuning to advanced capability-targeted training and layer surgery.

Prerequisites

Hardware Requirements

| Method | Minimum VRAM | Recommended |
|---|---|---|
| QLoRA (4-bit) | 4 GB | 8 GB |
| LoRA | 6 GB | 12 GB |
| SFT | 8 GB | 16 GB |
| DPO | 8 GB | 16 GB |
| Full Fine-Tune | 16 GB | 24+ GB |

Software Requirements

  • Python 3.10+ (installed on your system)
  • ForgeAI handles the rest: creates a virtual environment and installs PyTorch, Transformers, PEFT, TRL, and BitsAndBytes automatically.

Layer surgery requires no Python or GPU — it’s pure Rust.

Choosing a Training Method

LoRA (Recommended Default)

Low-Rank Adaptation (LoRA) trains small adapter matrices alongside frozen base weights, giving the best balance of quality and efficiency. When to use: General fine-tuning, instruction tuning, task adaptation.
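
For reference, here is a minimal sketch of what this setup looks like with the PEFT library that ForgeAI installs for you. The model ID is a placeholder, and the target module names are typical of Llama-style architectures rather than ForgeAI's exact internals:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("your-base-model")  # hypothetical placeholder ID

config = LoraConfig(
    r=16,                                 # adapter rank: higher = more capacity, more VRAM
    lora_alpha=32,                        # scaling factor, commonly 2x the rank
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt (Llama-style names)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the small adapter matrices are trainable
```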

QLoRA (Best for Low VRAM)

Same as LoRA but quantizes the base model to 4-bit, dramatically reducing VRAM usage with minimal quality impact. When to use: Limited GPU memory (4–8 GB), large models.
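
The adapter setup is the same as LoRA; only the base-model loading changes. A minimal sketch using the BitsAndBytes 4-bit path in Transformers, with a placeholder model ID and common (not ForgeAI-specific) settings:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization keeps the frozen base small; the LoRA adapters
# attached on top still train in higher precision.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,  # also quantize the quantization constants
)
model = AutoModelForCausalLM.from_pretrained(
    "your-base-model",  # hypothetical placeholder ID
    quantization_config=bnb_config,
)
```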

SFT (Standard Training)

Supervised Fine-Tuning on instruction/completion datasets. All parameters are updated. When to use: When you have a large, high-quality dataset and sufficient VRAM.
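
A minimal sketch with TRL's SFTTrainer, which ForgeAI installs; the model ID and dataset path are placeholders, and exact argument names can vary across TRL versions:

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("json", data_files="train.jsonl", split="train")  # placeholder path

trainer = SFTTrainer(
    model="your-base-model",  # hypothetical placeholder ID; a loaded model object also works
    train_dataset=dataset,
    args=SFTConfig(output_dir="sft-output"),
)
trainer.train()
```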

DPO (Preference Learning)

Direct Preference Optimization learns from chosen/rejected response pairs — no reward model needed. When to use: Alignment, preference tuning, RLHF-style training.
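
A minimal sketch with TRL's DPOTrainer, again with placeholder IDs and paths; note that some argument names (e.g. processing_class vs. tokenizer) differ between TRL versions:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Each row pairs a prompt with a preferred and a dispreferred response:
# {"prompt": "...", "chosen": "...", "rejected": "..."}
dataset = load_dataset("json", data_files="prefs.jsonl", split="train")  # placeholder path

model = AutoModelForCausalLM.from_pretrained("your-base-model")  # hypothetical placeholder ID
tokenizer = AutoTokenizer.from_pretrained("your-base-model")

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="dpo-output", beta=0.1),  # beta scales the implicit KL penalty
    train_dataset=dataset,
    processing_class=tokenizer,  # named "tokenizer" in older TRL versions
)
trainer.train()
```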

Full Fine-Tune

Updates every parameter in the model. Maximum quality but maximum VRAM. When to use: When LoRA quality isn’t sufficient and you have abundant GPU memory.

Capability-Targeted Training

Instead of fine-tuning every layer, ForgeAI can target layers responsible for specific capabilities:

How It Works

  1. ForgeAI analyzes the model architecture and maps layers to capabilities
  2. You select which capabilities to train (e.g., “Code Generation” + “Reasoning”)
  3. Only layers associated with those capabilities are included in training
  4. Other layers remain frozen, preserving existing knowledge
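
ForgeAI's capability-to-layer map is internal, but the mechanism underneath is ordinary parameter freezing. A hypothetical sketch, assuming a Llama-style module layout and an invented layer range for the Code capability:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("your-base-model")  # hypothetical placeholder ID

# Illustrative only: pretend blocks 20-27 ("upper-mid") carry the Code capability.
TARGET_LAYERS = range(20, 28)

for param in model.parameters():
    param.requires_grad = False           # freeze everything...
for idx in TARGET_LAYERS:
    for param in model.model.layers[idx].parameters():
        param.requires_grad = True        # ...then unfreeze only the targeted blocks
```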

Available Capabilities

| Capability | Layer Position | Example Use Case |
|---|---|---|
| Tool Calling | Upper-mid | Teach function calling |
| Reasoning | Mid-upper | Improve logic/CoT |
| Code | Upper-mid | Better code output |
| Math | Mid | Mathematical ability |
| Multilingual | Early-mid | Add languages |
| Instruction | Mid | Follow instructions |
| Safety | Final | Alignment tuning |

Capability targeting reduces training time and preserves the model’s existing knowledge in untargeted areas.

Preparing Datasets

Supported Templates

ForgeAI auto-detects your dataset format:

| Template | Required Columns | Training Methods |
|---|---|---|
| Alpaca | instruction, input, output | SFT, LoRA, QLoRA, Full |
| ShareGPT | conversations | SFT, LoRA, QLoRA, Full |
| ChatML | messages | SFT, LoRA, QLoRA, Full |
| DPO | prompt, chosen, rejected | DPO |
| Text | text | SFT, LoRA, QLoRA, Full |
| Prompt/Completion | prompt, completion | SFT, LoRA, QLoRA, Full |
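
For illustration, here are made-up rows in two of these templates; save rows like these as JSONL or Parquet and ForgeAI detects the template from the column names:

```python
# One Alpaca-template row: instruction/input/output columns (values invented).
alpaca_row = {
    "instruction": "Summarize the text.",
    "input": "ForgeAI fine-tunes models locally with LoRA adapters.",
    "output": "ForgeAI supports local LoRA fine-tuning.",
}

# One DPO-template row: prompt/chosen/rejected columns (values invented).
dpo_row = {
    "prompt": "Explain recursion in one sentence.",
    "chosen": "Recursion is when a function solves a problem by calling itself on smaller inputs.",
    "rejected": "Recursion is a kind of loop.",
}
```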

Dataset Tips

  • Use DataStudio to explore and validate your dataset before training
  • For DPO, ensure each row has both chosen and rejected responses
  • Longer sequences require more VRAM — reduce max_seq_length if running out of memory
  • Parquet is the most efficient format for large datasets

Training Presets

| Preset | VRAM | Method | Rank | Seq Len | Best For |
|---|---|---|---|---|---|
| LOW VRAM | ~4 GB | QLoRA | 8 | 256 | Tight GPU budget |
| BALANCED | ~6 GB | QLoRA | 16 | 512 | General purpose |
| QUALITY | ~12 GB | LoRA | 32 | 1024 | High-quality output |
| MAX QUALITY | ~24 GB | LoRA | 64 | 2048 | Maximum quality |

Layer Surgery

Layer surgery is a separate mode that operates directly on model tensors — no training, no GPU, no Python.

Remove Layers

Remove layers to create smaller, faster models. Useful for:
  • Creating smaller test models
  • Removing redundant layers
  • Reducing inference latency
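
ForgeAI performs this in Rust with no Python required, but the transformation is easy to picture. A conceptual sketch of dropping one block from a SafeTensors checkpoint, assuming Llama-style tensor names (model.layers.N.*) and placeholder paths; num_hidden_layers in config.json must be decremented to match, which ForgeAI handles automatically:

```python
from safetensors.torch import load_file, save_file

REMOVE = 30  # illustrative: index of the transformer block to drop

tensors = load_file("model.safetensors")  # placeholder path
pruned = {}
for name, tensor in tensors.items():
    parts = name.split(".")
    if parts[:2] == ["model", "layers"]:
        idx = int(parts[2])
        if idx == REMOVE:
            continue                 # drop every tensor in the selected block
        if idx > REMOVE:
            parts[2] = str(idx - 1)  # renumber later blocks down by one
    pruned[".".join(parts)] = tensor

save_file(pruned, "model-pruned.safetensors")  # the original file is untouched
```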

Duplicate Layers

Duplicate layers to increase model depth. Useful for:
  • Expanding model capacity
  • Experimental architecture modifications

Surgery Process

  1. Select a model (GGUF or SafeTensors)
  2. Load layer details to see full tensor breakdown
  3. Select layers to remove or positions to duplicate
  4. Review the preview (final layer count, estimated size)
  5. Run surgery — a new file is created; the original is never modified
  6. ForgeAI automatically updates config.json / GGUF metadata

Removing too many layers will significantly degrade model quality. Start by removing 1–2 layers and test output quality.

Troubleshooting

| Issue | Solution |
|---|---|
| "Python not found" | Install Python 3.10+ and ensure it’s in PATH |
| CUDA out of memory | Switch to QLoRA, reduce batch size, or reduce max_seq_length |
| Slow training on GPU | Verify CUDA is available in Settings → Training Environment |
| Training loss not decreasing | Try a lower learning rate, more epochs, or a different preset |
| "No target modules found" | Model architecture may not support LoRA; try Full Fine-Tune |
| Surgery output has errors | Avoid removing embedding/output layers (first/last) |