Supported Formats
ForgeAI works with two primary model formats (GGUF and SafeTensors) and four dataset formats (JSON, JSONL, CSV, Parquet).
Model Formats
GGUF
GGUF (GPT-Generated Unified Format) is the standard format for llama.cpp and its ecosystem.
Characteristics:
- Single file containing weights, metadata, and tokenizer
- Supports quantized dtypes (Q2_K through Q8_0, plus F16/F32)
- Used by llama.cpp, Ollama, LM Studio, KoboldCpp
| Module | Support |
|---|---|
| Load | Single file |
| Inspect | Full analysis with capabilities |
| Compress | Quantize to any level |
| Hub | Download from HuggingFace |
| Convert | Output format |
| Training | Fine-tune and surgery |
| M-DNA | Parent and output |
| Test | Via llama.cpp |
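Because a GGUF model is a single self-describing file, its header can be checked without loading any weights. The following is a minimal sketch, not ForgeAI's implementation: `read_gguf_header` is a hypothetical helper, and it assumes the version 2 or later GGUF header layout (4-byte magic, little-endian `uint32` version, then two `uint64` counts).

```python
import struct

GGUF_MAGIC = b"GGUF"  # first four bytes of every GGUF file


def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) from a GGUF file.

    Assumes the v2+ layout, where both counts are 64-bit integers.
    """
    with open(path, "rb") as f:
        header = f.read(24)  # 4-byte magic + uint32 + 2 x uint64
    if len(header) < 24 or header[:4] != GGUF_MAGIC:
        raise ValueError(f"{path} is not a GGUF file")
    version, tensor_count, kv_count = struct.unpack("<IQQ", header[4:])
    return version, tensor_count, kv_count
```

The metadata key-value pairs (architecture, tokenizer, quantization details) follow immediately after these 24 bytes, which is what makes a full inspection possible from the one file.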
SafeTensors
SafeTensors is HuggingFace’s format for storing model weights safely and efficiently.
Characteristics:
- Typically paired with `config.json` and tokenizer files in a directory
- Large models are sharded across multiple `.safetensors` files
- Supports F16, BF16, F32 dtypes
- Used by HuggingFace Transformers, vLLM, ExLlamaV2, MLX
| Module | Support |
|---|---|
| Load | Single file or sharded folder |
| Inspect | Full analysis with capabilities |
| Compress | Not supported (convert to GGUF first) |
| Hub | Download from HuggingFace |
| Convert | Input format (→ GGUF) |
| Training | Fine-tune and surgery |
| M-DNA | Parent and output |
| Test | Via HuggingFace Transformers |
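A `.safetensors` file begins with an 8-byte little-endian length followed by a JSON header that lists each tensor's dtype, shape, and byte offsets. The sketch below reads that header with the standard library only; the function names are hypothetical and this is not how ForgeAI itself loads the format.

```python
import json
import struct


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file as a dict."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned 64-bit little-endian header length.
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len))


def tensor_dtypes(header):
    """Collect the dtypes in use, skipping the optional __metadata__ entry."""
    return {
        info["dtype"]
        for name, info in header.items()
        if name != "__metadata__"
    }
```

For a sharded model, the same call can be repeated on each `.safetensors` file in the folder and the results merged.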
Folder Structure
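SafeTensors models from HuggingFace are typically distributed as a directory that holds `config.json`, the tokenizer files, and one or more `.safetensors` weight files.
Model Format Comparison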
| Feature | GGUF | SafeTensors |
|---|---|---|
| File count | Single file | Multiple files |
| Metadata | Embedded in file | Separate JSON files |
| Tokenizer | Embedded | Separate files |
| Quantization | Native (Q2–Q8) | F16/BF16/F32 only |
| Sharding | No | Yes |
| Ecosystem | llama.cpp | HuggingFace |
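The single-file versus multi-file distinction in the table suggests a simple way to tell the two formats apart from a path. This is a sketch under assumptions: `detect_model_format` is a hypothetical helper, and the rules it applies (GGUF magic bytes for files, presence of `.safetensors` files for folders) are illustrative rather than ForgeAI's actual detection logic.

```python
from pathlib import Path


def detect_model_format(path):
    """Guess 'gguf' or 'safetensors' for a model path, or None if unknown."""
    p = Path(path)
    if p.is_file():
        # GGUF files start with the ASCII magic "GGUF".
        with open(p, "rb") as f:
            if f.read(4) == b"GGUF":
                return "gguf"
        return "safetensors" if p.suffix == ".safetensors" else None
    # A sharded SafeTensors model is a folder of .safetensors files.
    if p.is_dir() and any(p.glob("*.safetensors")):
        return "safetensors"
    return None
```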
Dataset Formats
JSON
Array of objects in a single file.
JSONL
One JSON object per line (JSON Lines format).
CSV
Comma-separated values with a header row.
Parquet
Apache Parquet columnar binary format. Most HuggingFace datasets use this format. ForgeAI reads Parquet natively in Rust using Apache Arrow, so no Python is required.
Dataset Format Support
| Module | JSON | JSONL | CSV | Parquet |
|---|---|---|---|---|
| DataStudio | ✓ | ✓ | ✓ | ✓ |
| Training | ✓ | ✓ | ✓ | ✓ |
| HuggingFace | ✓ | ✓ | ✓ | ✓ |
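To make the difference between the three text formats concrete, the sketch below loads each one into the same list-of-dicts shape using only the Python standard library. It is an illustration for working with such files outside ForgeAI; `load_rows` is a hypothetical name, and Parquet is left out because reading it requires a third-party library.

```python
import csv
import json
from pathlib import Path


def load_rows(path):
    """Load a JSON, JSONL, or CSV dataset as a list of dicts."""
    p = Path(path)
    suffix = p.suffix.lower()
    if suffix == ".json":
        # JSON: one top-level array of objects.
        with open(p, encoding="utf-8") as f:
            rows = json.load(f)
        if not isinstance(rows, list):
            raise ValueError("expected a top-level JSON array")
        return rows
    if suffix == ".jsonl":
        # JSONL: one object per non-empty line.
        with open(p, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]
    if suffix == ".csv":
        # CSV: the header row supplies the keys; all values are strings.
        with open(p, newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))
    raise ValueError(f"unsupported dataset format: {suffix}")
```

Note that CSV carries no type information, so numeric columns come back as strings and need converting by the caller.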
Dataset Template Detection
ForgeAI auto-detects these common dataset templates:
| Template | Key Columns |
|---|---|
| Alpaca | instruction, input, output |
| ShareGPT | conversations |
| ChatML | messages |
| DPO | prompt, chosen, rejected |
| Text | text |
| Prompt/Completion | prompt, completion |
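Template detection of this kind can be expressed as a lookup from required column names to a template label. The sketch below is one possible approach, not ForgeAI's actual algorithm: `detect_template` is a hypothetical function, and both the subset-matching rule and the priority order (templates with more key columns are checked first) are assumptions.

```python
# Ordered so that templates with more key columns are tried first,
# e.g. DPO (prompt, chosen, rejected) before Prompt/Completion.
TEMPLATES = [
    ("Alpaca", {"instruction", "input", "output"}),
    ("DPO", {"prompt", "chosen", "rejected"}),
    ("Prompt/Completion", {"prompt", "completion"}),
    ("ShareGPT", {"conversations"}),
    ("ChatML", {"messages"}),
    ("Text", {"text"}),
]


def detect_template(columns):
    """Return the first template whose key columns are all present, else None."""
    cols = set(columns)
    for name, required in TEMPLATES:
        if required <= cols:
            return name
    return None
```

Extra columns such as an `id` field do not prevent a match, since only the key columns from the table are required.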