Merge Methods
M-DNA Forge supports 12 merge strategies, each suited to different scenarios. Methods are organized by difficulty level.
Easy Methods
SLERP
Spherical Linear Interpolation — the recommended method for merging 2 models.
| Parameter | Range | Default | Effect |
|---|---|---|---|
| t | 0–1 | 0.5 | Interpolation factor. 0 = 100% model A, 1 = 100% model B |
SLERP interpolates on the hypersphere rather than linearly, which preserves the magnitude of weight vectors and typically produces more coherent results than simple averaging.
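The core computation, as a minimal NumPy sketch (the function name and the flatten-then-reshape approach are illustrative, not the tool's internal API):

```python
import numpy as np

def slerp(a: np.ndarray, b: np.ndarray, t: float = 0.5, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two weight tensors of the same shape."""
    a_flat, b_flat = a.ravel(), b.ravel()
    a_dir = a_flat / (np.linalg.norm(a_flat) + eps)
    b_dir = b_flat / (np.linalg.norm(b_flat) + eps)
    dot = np.clip(np.dot(a_dir, b_dir), -1.0, 1.0)
    theta = np.arccos(dot)                      # angle between the two weight directions
    if theta < eps:                             # nearly parallel: fall back to linear interpolation
        return (1 - t) * a + t * b
    w_a = np.sin((1 - t) * theta) / np.sin(theta)
    w_b = np.sin(t * theta) / np.sin(theta)
    return (w_a * a_flat + w_b * b_flat).reshape(a.shape)
```

With `t=0` this returns model A's tensor and with `t=1` model B's, matching the parameter table above.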
Best for: 2 models with similar architecture and training data.
Average
Weighted Mean — simplest merge, averages all tensors by weight.
AVERAGE on dissimilar models often produces incoherent output. Use SLERP or Frankenmerge instead.
Weights are normalized to sum to 1.0 automatically.
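A minimal sketch of the weighted mean with automatic normalization (function and argument names are illustrative):

```python
def weighted_average(tensors, weights):
    """Weighted mean of matching parent tensors; weights are normalized to sum to 1."""
    total = sum(weights)
    normalized = [w / total for w in weights]
    return sum(w * t for w, t in zip(normalized, tensors))
```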
Best for: Very similar models (e.g., same model at different training checkpoints).
Passthrough
Direct Copy — copies tensors directly from a single parent.
Useful as a starting point or for simple layer stacking.
Best for: Creating deeper models or debugging merge pipelines.
Task Arithmetic
Task Vector Addition — requires a base model.
Computes task vectors (finetune - base) for each parent, scales them, and adds them back to the base.
| Parameter | Range | Default | Effect |
|---|---|---|---|
| scaling | 0–2 | 1.0 | How strongly task vectors are applied |
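Conceptually, for state dicts keyed by tensor name, the method works roughly like this (a sketch, not the tool's code; `scaling` corresponds to the parameter above):

```python
def task_arithmetic(base, finetunes, scaling=1.0):
    """Add the scaled sum of task vectors (finetune - base) back onto the base weights."""
    merged = {}
    for name, base_tensor in base.items():
        delta = sum(ft[name] - base_tensor for ft in finetunes)   # combined task vector
        merged[name] = base_tensor + scaling * delta
    return merged
```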
Best for: Combining multiple finetunes of the same base model.
Frankenmerge
Layer Cherry-Picking — select which parent provides each output layer.
Requires explicit layer assignments (set in the Layers tab). Each offspring layer is a direct copy from a chosen parent’s layer.
Auto-assign options:
- SPLIT: Equal chunks (first half from parent A, second half from parent B)
- INTERLEAVE: Alternating layers (A, B, A, B…)
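A sketch of how the two auto-assign plans could be generated for a two-parent merge (the function and the plan representation are illustrative, assuming SPLIT means contiguous halves and INTERLEAVE means alternating layers):

```python
def auto_assign(num_layers: int, mode: str) -> list[str]:
    """Return a per-layer parent assignment ('A' or 'B') for a two-parent frankenmerge."""
    if mode == "SPLIT":        # contiguous halves: first half from A, second half from B
        half = num_layers // 2
        return ["A"] * half + ["B"] * (num_layers - half)
    if mode == "INTERLEAVE":   # alternate parents layer by layer: A, B, A, B, ...
        return ["A" if i % 2 == 0 else "B" for i in range(num_layers)]
    raise ValueError(f"unknown auto-assign mode: {mode}")
```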
Best for: Combining strengths from different layers (e.g., reasoning layers from one model + language layers from another).
DARE
Drop And REscale — prunes delta parameters before merging. Requires a base model.
| Parameter | Range | Default | Effect |
|---|---|---|---|
| density | 0–1 | 0.5 | Fraction of delta parameters to keep |
Randomly drops delta parameters and rescales the remainder to maintain expected magnitude.
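Per tensor, the drop-and-rescale step can be sketched like this (illustrative, operating on a single base/finetune pair):

```python
import numpy as np

def dare(base: np.ndarray, finetune: np.ndarray, density: float = 0.5, seed: int = 0) -> np.ndarray:
    """Keep a random `density` fraction of delta parameters and rescale them by 1/density."""
    rng = np.random.default_rng(seed)
    delta = finetune - base
    keep = rng.random(delta.shape) < density              # Bernoulli keep-mask
    return base + np.where(keep, delta, 0.0) / density    # rescaling preserves the expected delta
```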
Best for: Efficient merging with sparsity-based parameter selection.
TIES
Trim, Elect Sign, Merge — resolves interference between task vectors. Requires a base model.
| Parameter | Range | Default | Effect |
|---|---|---|---|
| trim_threshold | 0–1 | 0.2 | Fraction of small-magnitude parameters to trim |
Trims small-magnitude parameters, resolves sign conflicts by majority vote, then merges. Produces cleaner results than Task Arithmetic.
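The three steps map to code roughly as follows (a NumPy sketch per tensor, taking `trim_threshold` as the fraction of smallest-magnitude deltas to zero out, as in the table above):

```python
import numpy as np

def ties(base, finetunes, trim_threshold=0.2):
    """Trim small deltas, elect a per-parameter sign, then average the agreeing deltas."""
    deltas = [ft - base for ft in finetunes]
    trimmed = []
    for d in deltas:
        cutoff = np.quantile(np.abs(d), trim_threshold)        # drop the smallest-magnitude fraction
        trimmed.append(np.where(np.abs(d) >= cutoff, d, 0.0))
    elected = np.sign(sum(trimmed))                            # majority sign, weighted by magnitude
    agreeing = [np.where(np.sign(t) == elected, t, 0.0) for t in trimmed]
    counts = np.maximum(sum((a != 0).astype(float) for a in agreeing), 1.0)
    return base + sum(agreeing) / counts                       # mean of deltas that agree with the sign
```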
Best for: Base + multiple finetunes where task vectors conflict.
Advanced Methods
DeLLa
Density-based Layer-Level Adaptive — adapts merge density per layer. Requires a base model.
| Parameter | Range | Default | Effect |
|---|---|---|---|
| lambda | 0–1 | 0.5 | Lambda interpolation factor |
| della_density | 0–1 | 0.5 | Base density, adapted per layer |
Uses layer-level analysis to set different densities for different layers, giving more weight to layers that differ more.
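One way to picture layer-adaptive density (an illustrative sketch of the idea described above, not the tool's actual heuristic): scale the base `della_density` up for layers whose deltas are larger, then drop and rescale as in DARE.

```python
import numpy as np

def layer_densities(base_density, layer_deltas):
    """Illustrative heuristic: give layers with larger average delta magnitude a higher keep density."""
    magnitudes = np.array([np.mean(np.abs(d)) for d in layer_deltas])
    scale = magnitudes / magnitudes.mean()              # >1 for layers that differ more than average
    return np.clip(base_density * scale, 0.05, 1.0)     # keep densities in a sensible range
```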
Best for: Advanced merges where uniform density across layers is suboptimal.
Component Merge
Per-Component Routing — route attention, MLP, and norm components to different parents.
Instead of merging at the tensor level, assign each component type (attention projections, MLP layers, normalization) to a specific parent.
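In practice this amounts to a routing table from component type to parent; a hypothetical example (the tensor-name patterns are illustrative and architecture-dependent):

```python
# Hypothetical routing plan: every tensor of a component type is copied from one parent.
COMPONENT_ROUTING = {"attention": "parent_a", "mlp": "parent_b", "norm": "parent_a"}

def pick_parent(tensor_name: str) -> str:
    """Classify a tensor by its name and return the parent that should provide it."""
    if any(key in tensor_name for key in ("q_proj", "k_proj", "v_proj", "o_proj")):
        return COMPONENT_ROUTING["attention"]
    if any(key in tensor_name for key in ("gate_proj", "up_proj", "down_proj")):
        return COMPONENT_ROUTING["mlp"]
    return COMPONENT_ROUTING["norm"]
```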
Best for: When different parents excel at different architectural components.
Tensor Surgery
Per-Tensor Source Mapping — individually assign each tensor to a parent.
Maximum control: specify exactly which parent provides each tensor in the output model.
Best for: Expert users who need fine-grained control over every tensor.
Parameter Slice
Dimensional Slicing — slice tensors along specific dimensions across parents.
| Parameter | Range | Default | Effect |
|---|---|---|---|
| slice_dim | 0+ | 0 | Dimension to slice along |
Takes slices of tensor dimensions from different parents to create hybrid tensors.
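As an illustration of dimensional composition (one possible scheme; the tool's actual slicing layout may differ): take the first half of one parent's tensor and the second half of the other's along `slice_dim`.

```python
import numpy as np

def slice_merge(a: np.ndarray, b: np.ndarray, slice_dim: int = 0) -> np.ndarray:
    """Build a hybrid tensor: first half from `a`, second half from `b`, along `slice_dim`."""
    size = a.shape[slice_dim]
    first = np.take(a, range(size // 2), axis=slice_dim)
    second = np.take(b, range(size // 2, size), axis=slice_dim)
    return np.concatenate([first, second], axis=slice_dim)
```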
Best for: Experimental merges with dimensional composition.
MoE Conversion
Mixture-of-Experts — convert dense models to MoE architecture.
| Parameter | Range | Default | Effect |
|---|---|---|---|
| num_experts | 2–8 | 4 | Number of expert models |
| experts_per_token | 1–4 | 2 | Experts active per token |
Uses parent models as expert networks and creates a gating mechanism.
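A toy illustration of the gating idea (not the generated architecture): score each expert per token and keep the top `experts_per_token`.

```python
import numpy as np

def route_tokens(hidden: np.ndarray, gate_weight: np.ndarray, experts_per_token: int = 2) -> np.ndarray:
    """Toy top-k router: hidden is (tokens, dim), gate_weight is (dim, num_experts)."""
    logits = hidden @ gate_weight                                  # per-token score for each expert
    return np.argsort(-logits, axis=-1)[:, :experts_per_token]     # indices of the chosen experts
```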
Best for: Creating sparse models that route tokens to specialized experts.
Quick Reference
| Scenario | Recommended Method |
|---|---|
| 2 similar models | SLERP (t=0.5) |
| Base + 1 finetune | Task Arithmetic or DARE |
| Base + multiple finetunes | TIES |
| Cherry-pick specific layers | Frankenmerge |
| Create deeper model | Passthrough |
| Route components separately | Component Merge |
| Create MoE from dense | MoE Conversion |
| Different-sized models | Any method + Interpolation strategy |
| Quick experiment | Average (with caution) |
Cross-Dimension Resolution Strategies
When merging models with different hidden dimensions or vocab sizes, ForgeAI applies a resolution strategy to adapt tensor shapes before running the merge method.
| Strategy | How It Works | Output Size | Quality |
|---|---|---|---|
| Interpolation | Linear (1D) or nearest-neighbor (2D) resize to target shape | Matches first parent | Medium |
| Zero Padding | Pads smaller tensors with zeros to match largest parent | Matches largest parent | Medium |
| Truncation | Narrows larger tensors to match smallest parent | Matches smallest parent | Low |
Interpolation is auto-selected when a dimension mismatch is detected. You can change the strategy in the compatibility panel before merging.
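For intuition, zero padding and truncation reduce to shape adjustments like the following sketch (interpolation would additionally resize the values; names are illustrative):

```python
import numpy as np

def resolve_shape(tensor: np.ndarray, target_shape: tuple, strategy: str = "zero_pad") -> np.ndarray:
    """Adapt a tensor to `target_shape` by truncating or zero-padding each dimension."""
    if strategy == "truncate":
        return tensor[tuple(slice(0, t) for t in target_shape)]
    if strategy == "zero_pad":
        out = np.zeros(target_shape, dtype=tensor.dtype)
        overlap = tuple(slice(0, min(s, t)) for s, t in zip(tensor.shape, target_shape))
        out[overlap] = tensor[overlap]
        return out
    raise ValueError(f"unknown strategy: {strategy}")
```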
Cross-dimension merging is experimental. Results depend heavily on how similar the model architectures are despite different sizes. Test output quality carefully.
Quick Presets
ForgeAI includes 5 one-click presets:
| Preset | Method | Params | Use Case |
|---|---|---|---|
| Quick Blend | Average | — | Fast simple merge |
| Smooth Merge | SLERP | t=0.5 | Balanced 2-model blend |
| Task Tuner | Task Arithmetic | scaling=1.0 | Add finetune capabilities |
| Sparse Mix | DARE | density=0.5 | Efficient sparse merge |
| Consensus | TIES | trim=0.2 | Resolve task conflicts |