Training & Fine-Tuning Guide
This guide covers everything you need to know about training models in ForgeAI — from basic LoRA fine-tuning to advanced capability-targeted training and layer surgery.
Prerequisites
Hardware Requirements
| Method | Minimum VRAM | Recommended VRAM |
|---|---|---|
| QLoRA (4-bit) | 4 GB | 8 GB |
| LoRA | 6 GB | 12 GB |
| SFT | 8 GB | 16 GB |
| DPO | 8 GB | 16 GB |
| Full Fine-Tune | 16 GB | 24+ GB |
Software Requirements
- Python 3.10+ (installed on your system)
- ForgeAI handles the rest: creates a virtual environment and installs PyTorch, Transformers, PEFT, TRL, and BitsAndBytes automatically.
Layer surgery requires no Python or GPU — it’s pure Rust.
Choosing a Training Method
LoRA (Recommended for Most Users)
Low-Rank Adaptation trains small adapter matrices alongside frozen base weights. Best balance of quality and efficiency.
When to use: General fine-tuning, instruction tuning, task adaptation.
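Under the hood this corresponds to a standard PEFT setup. The sketch below is illustrative only, not ForgeAI's exact configuration; the model name and hyperparameters are placeholder assumptions.

```python
# A minimal LoRA setup with PEFT; ForgeAI configures this for you, so treat the
# model name and hyperparameters here as illustrative assumptions.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")  # any causal LM

config = LoraConfig(
    r=16,                                  # adapter rank (see the presets table below)
    lora_alpha=32,                         # scaling factor, commonly 2x the rank
    target_modules=["q_proj", "v_proj"],   # attention projections are typical targets
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, config)
model.print_trainable_parameters()         # only the adapters train; base weights stay frozen
```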
QLoRA (Best for Low VRAM)
Same as LoRA but quantizes the base model to 4-bit, dramatically reducing VRAM usage with minimal quality impact.
When to use: Limited GPU memory (4–8 GB), large models.
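The 4-bit load itself looks roughly like this with Transformers and BitsAndBytes; the values shown are common QLoRA defaults, not necessarily ForgeAI's exact settings.

```python
# A minimal 4-bit QLoRA load with Transformers + BitsAndBytes; values mirror
# common QLoRA defaults and are assumptions, not ForgeAI's exact settings.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 from the QLoRA paper
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16, store weights in 4-bit
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-0.5B",                    # placeholder base model
    quantization_config=bnb_config,
)
# From here, attach a LoRA adapter exactly as in the LoRA sketch above.
```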
SFT (Standard Training)
Supervised Fine-Tuning on instruction/completion datasets. All parameters are updated.
When to use: When you have a large, high-quality dataset and sufficient VRAM.
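A minimal SFT run with TRL looks roughly like the sketch below. ForgeAI drives this for you; the model name, file path, and output directory are placeholders, and exact argument names vary across TRL versions.

```python
# A minimal SFT run with TRL's SFTTrainer; names and paths are placeholders.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# e.g. the Text template below: JSONL rows with a "text" column
dataset = load_dataset("json", data_files="train.jsonl", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",              # SFTTrainer can load a model from its name
    args=SFTConfig(output_dir="sft-out"),
    train_dataset=dataset,
)
trainer.train()
```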
DPO (Preference Learning)
Direct Preference Optimization learns from chosen/rejected response pairs — no reward model needed.
When to use: Alignment, preference tuning, RLHF-style training.
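With TRL this corresponds roughly to the sketch below; the dataset must carry prompt/chosen/rejected columns (see the DPO template below). Model name and paths are placeholders, and argument names vary across TRL versions.

```python
# A minimal DPO run with TRL; names and paths are placeholder assumptions.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")
dataset = load_dataset("json", data_files="prefs.jsonl", split="train")

trainer = DPOTrainer(
    model=model,                  # TRL builds the frozen reference model internally
    args=DPOConfig(output_dir="dpo-out", beta=0.1),  # beta limits drift from the reference
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```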
Full Fine-Tune
Updates every parameter in the model. Maximum quality but maximum VRAM.
When to use: When LoRA quality isn’t sufficient and you have abundant GPU memory.
Capability-Targeted Training
Instead of fine-tuning every layer, ForgeAI can target layers responsible for specific capabilities:
How It Works
- ForgeAI analyzes the model architecture and maps layers to capabilities
- You select which capabilities to train (e.g., “Code Generation” + “Reasoning”)
- Only layers associated with those capabilities are included in training
- Other layers remain frozen, preserving existing knowledge (see the sketch after this list)
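Conceptually, this amounts to freezing everything and then unfreezing a band of transformer blocks. The sketch below is illustrative only; ForgeAI's actual layer-to-capability map is internal, and the model name and indices shown are made up.

```python
# A conceptual sketch of capability targeting for a Llama-style model: freeze
# everything, then unfreeze a band of decoder blocks. The indices are hypothetical.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")

for param in model.parameters():
    param.requires_grad = False                # freeze the whole model

# Hypothetical "Reasoning" band: mid-upper blocks of the decoder stack
for block in model.model.layers[14:20]:
    for param in block.parameters():
        param.requires_grad = True             # only these blocks receive gradients

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable:,}")
```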
Available Capabilities
| Capability | Layer Position | Example Use Case |
|---|---|---|
| Tool Calling | Upper-mid | Teach function calling |
| Reasoning | Mid-upper | Improve logic/CoT |
| Code | Upper-mid | Better code output |
| Math | Mid | Mathematical ability |
| Multilingual | Early-mid | Add languages |
| Instruction | Mid | Follow instructions |
| Safety | Final | Alignment tuning |
Capability targeting reduces training time and preserves the model’s existing knowledge in untargeted areas.
Preparing Datasets
Supported Templates
ForgeAI auto-detects your dataset format:
| Template | Required Columns | Training Methods |
|---|---|---|
| Alpaca | instruction, input, output | SFT, LoRA, QLoRA, Full |
| ShareGPT | conversations | SFT, LoRA, QLoRA, Full |
| ChatML | messages | SFT, LoRA, QLoRA, Full |
| DPO | prompt, chosen, rejected | DPO |
| Text | text | SFT, LoRA, QLoRA, Full |
| Prompt/Completion | prompt, completion | SFT, LoRA, QLoRA, Full |
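As a concrete illustration, here is what a single row might look like for two of the templates above. These are hypothetical rows, shown as Python dicts; on disk they would be JSONL or Parquet rows with the same keys.

```python
# Hypothetical example rows for the Alpaca and DPO templates.
alpaca_row = {
    "instruction": "Summarize the text.",
    "input": "ForgeAI trains and edits models locally.",
    "output": "ForgeAI is a local tool for training and editing models.",
}

dpo_row = {
    "prompt": "Explain LoRA in one sentence.",
    "chosen": "LoRA trains small adapter matrices while the base weights stay frozen.",
    "rejected": "LoRA is a type of radio antenna.",
}
```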
Dataset Tips
- Use DataStudio (10) to explore and validate your dataset before training (a quick programmatic check is sketched after this list)
- For DPO, ensure each row has both chosen and rejected responses
- Longer sequences require more VRAM; reduce max_seq_length if you run out of memory
- Parquet is the most efficient format for large datasets
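A quick pre-flight check in the spirit of these tips, using the datasets library; the file name is a placeholder.

```python
# Load the file and verify the columns a DPO run needs before training.
from datasets import load_dataset

ds = load_dataset("parquet", data_files="prefs.parquet", split="train")

missing = {"prompt", "chosen", "rejected"} - set(ds.column_names)
assert not missing, f"DPO dataset is missing columns: {missing}"
print(ds[0])  # eyeball one row before committing GPU hours
```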
Training Presets
| Preset | VRAM | Method | LoRA Rank | Max Seq Len | Best For |
|---|---|---|---|---|---|
| LOW VRAM | ~4 GB | QLoRA | 8 | 256 | Tight GPU budget |
| BALANCED | ~6 GB | QLoRA | 16 | 512 | General purpose |
| QUALITY | ~12 GB | LoRA | 32 | 1024 | High-quality output |
| MAX QUALITY | ~24 GB | LoRA | 64 | 2048 | Maximum quality |
Layer Surgery
Layer surgery is a separate mode that operates directly on model tensors — no training, no GPU, no Python.
Remove Layers
Remove layers to create smaller, faster models. Useful for:
- Creating smaller test models
- Removing redundant layers
- Reducing inference latency
Duplicate Layers
Duplicate layers to increase model depth. Useful for:
- Expanding model capacity
- Experimental architecture modifications
Surgery Process
- Select a model (GGUF or SafeTensors)
- Load layer details to see full tensor breakdown
- Select layers to remove or positions to duplicate
- Review the preview (final layer count, estimated size)
- Run surgery — a new file is created, the original is never modified
- ForgeAI automatically updates config.json / GGUF metadata (a conceptual sketch of this step follows below)
Removing too many layers will significantly degrade model quality. Start by removing 1–2 layers and test output quality.
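For the curious, the sketch below illustrates conceptually what removing one decoder block from a SafeTensors model entails: drop the block's tensors, renumber the later blocks, and fix config.json. It is plain Python using the safetensors library, not ForgeAI's Rust implementation, and the file names and layer index are hypothetical.

```python
# Conceptual sketch only; ForgeAI's actual surgery is pure Rust.
import json
import re
from safetensors.torch import load_file, save_file

REMOVE = 5  # hypothetical: drop block 5

tensors = load_file("model.safetensors")
out = {}
for name, tensor in tensors.items():
    match = re.match(r"model\.layers\.(\d+)\.(.+)", name)
    if match is None:
        out[name] = tensor                        # embeddings, norms, lm_head pass through
        continue
    idx = int(match.group(1))
    if idx == REMOVE:
        continue                                  # skip the removed block's tensors
    new_idx = idx - 1 if idx > REMOVE else idx    # renumber blocks above the cut
    out[f"model.layers.{new_idx}.{match.group(2)}"] = tensor

save_file(out, "model.pruned.safetensors")        # the original file is untouched

with open("config.json") as f:
    config = json.load(f)
config["num_hidden_layers"] -= 1                  # keep the metadata consistent
with open("config.pruned.json", "w") as f:
    json.dump(config, f, indent=2)
```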
Troubleshooting
| Issue | Solution |
|---|---|
| “Python not found” | Install Python 3.10+ and ensure it’s in PATH |
| CUDA out of memory | Switch to QLoRA, reduce the batch size, or reduce max_seq_length |
| Slow training on GPU | Verify CUDA is available in Settings → Training Environment |
| Training loss not decreasing | Try a lower learning rate, more epochs, or a different preset |
| “No target modules found” | Model architecture may not support LoRA; try Full Fine-Tune |
| Surgery output has errors | Avoid removing embedding/output layers (first/last) |