DataStudio

DataStudio is ForgeAI’s dataset explorer. Load datasets from local files or HuggingFace, analyze column structure, detect templates, and preview data. All parsing is done natively in Rust, including Parquet via Apache Arrow.

Source Modes

DataStudio has two source modes, LOCAL and HUGGINGFACE, toggled via the source bar. In HUGGINGFACE mode, you fetch and download a dataset from HuggingFace. In LOCAL mode, you browse and load dataset files from your local disk:
  1. Click BROWSE FILE
  2. Select a JSON, JSONL, CSV, or Parquet file
  3. The dataset loads automatically with metadata, column analysis, and data preview

Supported Formats

| Format | Parser | Notes |
| --- | --- | --- |
| JSON | Rust serde_json | Array of objects |
| JSONL | Rust serde_json | One JSON object per line |
| CSV | Rust CSV reader | Comma-separated with headers |
| Parquet | Apache Arrow + Parquet | Columnar binary format (most HF datasets) |
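
To make the expected file shapes concrete, here is a minimal Python sketch of per-format loading using only the standard library. It is not ForgeAI's Rust implementation: the function name `load_rows` and the choice to dispatch on file extension are assumptions for illustration, and Parquet is left out because it needs a third-party reader.

```python
import csv
import json
from pathlib import Path


def load_rows(path):
    """Load a dataset file into a list of dict rows, picking a parser by extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".json":
        # JSON: the whole file is a single array of objects.
        with open(path, encoding="utf-8") as f:
            rows = json.load(f)
        if not isinstance(rows, list):
            raise ValueError("expected a JSON array of objects")
        return rows
    if suffix == ".jsonl":
        # JSONL: one JSON object per line; blank lines are skipped.
        with open(path, encoding="utf-8") as f:
            return [json.loads(line) for line in f if line.strip()]
    if suffix == ".csv":
        # CSV: the first line is the header row; all values come back as strings.
        with open(path, encoding="utf-8", newline="") as f:
            return list(csv.DictReader(f))
    raise ValueError(f"unsupported format: {suffix}")
```

A file that fails one of these shape checks (for example, a `.json` file whose top level is an object rather than an array) is a likely reason for a dataset not loading as expected.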

Dataset Metadata

After loading, a metadata panel shows:
| Field | Description |
| --- | --- |
| PATH | Full file path |
| FORMAT | Detected format (JSON/JSONL/CSV/PARQUET) |
| ROWS | Total row count |
| SIZE | File size |
| COLUMNS | Number of columns |
| TEMPLATE | Auto-detected template (if applicable) |
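
As a rough illustration of how these fields relate to a loaded file, the sketch below computes most of them in Python. The function name `dataset_metadata` is hypothetical, rows are assumed to be a list of dicts, and inferring the format from the file extension is an assumption, since the detection method is not documented here. TEMPLATE is omitted.

```python
import os


def dataset_metadata(path, rows):
    """Summarize a loaded dataset; `rows` is a list of dicts."""
    # Assumption: the format is inferred from the file extension.
    fmt = os.path.splitext(path)[1].lstrip(".").upper()
    # Collect column names in first-seen order, since rows may differ in keys.
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    return {
        "PATH": os.path.abspath(path),
        "FORMAT": fmt,
        "ROWS": len(rows),
        "SIZE": os.path.getsize(path),  # size on disk, in bytes
        "COLUMNS": len(columns),
    }
```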

Template Detection

ForgeAI auto-detects common dataset templates:
| Template | Description | Key Columns |
| --- | --- | --- |
| Alpaca | Stanford Alpaca format | instruction, input, output |
| ShareGPT | Multi-turn conversations | conversations |
| ChatML | Chat markup language | messages |
| DPO | Direct Preference Optimization | prompt, chosen, rejected |
| Text | Plain text | text |
| Prompt/Completion | OpenAI format | prompt, completion |
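
One plausible way to match on key columns is sketched below in Python. The key columns come from the table above, but the matching rule (all key columns present) and the precedence order are assumptions. ForgeAI's actual detection logic is not described here and may differ.

```python
# Assumed precedence: templates with more key columns are checked first,
# so a DPO dataset is not mistaken for a template with fewer requirements.
TEMPLATES = [
    ("DPO", {"prompt", "chosen", "rejected"}),
    ("Alpaca", {"instruction", "input", "output"}),
    ("Prompt/Completion", {"prompt", "completion"}),
    ("ShareGPT", {"conversations"}),
    ("ChatML", {"messages"}),
    ("Text", {"text"}),
]


def detect_template(columns):
    """Return the first template whose key columns are all present, else None."""
    present = set(columns)
    for name, required in TEMPLATES:
        if required <= present:  # subset test: every key column exists
            return name
    return None
```

For example, `detect_template(["instruction", "input", "output"])` returns `"Alpaca"`, while a dataset with no recognized key columns returns `None`, which corresponds to the "if applicable" case of the TEMPLATE field.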

Column Analysis

Each column is analyzed and displayed:
| Metric | Description |
| --- | --- |
| Name | Column name |
| Dtype | Data type (STRING, INTEGER, FLOAT, OBJECT, NULL) |
| Valid | Count of non-null values |
| Null | Count of null/empty values (highlighted if > 0) |
| Avg Length | Average string length (for string columns) |
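
The metrics above can be approximated with a short Python sketch. The dtype labels are taken from the table; the inference rules (for example, treating a mix of integers and floats as FLOAT, or booleans and nested values as OBJECT) and the function names are assumptions, not ForgeAI's documented behavior.

```python
def infer_dtype(values):
    """Map non-null Python values to one of the dtype labels in the table."""
    kinds = set()
    for v in values:
        if isinstance(v, bool):
            kinds.add("OBJECT")  # assumption: booleans have no dedicated label
        elif isinstance(v, int):
            kinds.add("INTEGER")
        elif isinstance(v, float):
            kinds.add("FLOAT")
        elif isinstance(v, str):
            kinds.add("STRING")
        else:
            kinds.add("OBJECT")  # lists, dicts, and anything else
    if not kinds:
        return "NULL"  # no valid values at all
    if kinds == {"INTEGER", "FLOAT"}:
        return "FLOAT"  # assumption: mixed numerics widen to FLOAT
    return kinds.pop() if len(kinds) == 1 else "OBJECT"


def analyze_column(name, rows):
    """Compute the per-column metrics for a list of dict rows."""
    values = [row.get(name) for row in rows]
    # A value counts as null if it is missing, None, or an empty string.
    valid = [v for v in values if v is not None and v != ""]
    dtype = infer_dtype(valid)
    avg_length = None
    if dtype == "STRING":
        avg_length = sum(len(v) for v in valid) / len(valid)
    return {
        "name": name,
        "dtype": dtype,
        "valid": len(valid),
        "null": len(values) - len(valid),
        "avg_length": avg_length,
    }
```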

Data Preview

A scrollable table shows the first rows of the dataset. Long cell values are truncated with an ellipsis for readability.
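
The truncation behavior can be sketched as follows in Python. The row limit and character limit are hypothetical defaults chosen for illustration; the values DataStudio actually uses are not stated here.

```python
def truncate_cell(value, max_chars=40):
    """Render a cell as text, cutting long values and ending them with an ellipsis."""
    text = str(value)
    if len(text) <= max_chars:
        return text
    # Reserve one character for the ellipsis so the result is exactly max_chars long.
    return text[: max_chars - 1] + "…"


def preview(rows, limit=5, max_chars=40):
    """Return the first `limit` rows with every cell truncated for display."""
    return [
        {key: truncate_cell(value, max_chars) for key, value in row.items()}
        for row in rows[:limit]
    ]
```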

Workflow

  1. Choose source: toggle between LOCAL and HUGGINGFACE mode.
  2. Load dataset: browse a local file, or fetch and download from HuggingFace.
  3. Review analysis: check metadata, template detection, and column analysis.
  4. Use in training: the dataset path can be used directly in the Training module.