
Merge Methods

M-DNA Forge supports 12 merge strategies, each suited to different scenarios. Methods are organized by difficulty level.

Easy Methods

SLERP

Spherical Linear Interpolation — the recommended method for merging 2 models.
Parameter | Range | Default | Effect
t | 0–1 | 0.5 | Interpolation factor: 0 = 100% model A, 1 = 100% model B
SLERP interpolates on the hypersphere rather than linearly, which preserves the magnitude of weight vectors and typically produces more coherent results than simple averaging. Best for: 2 models with similar architecture and training data.
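A minimal sketch of tensor-level SLERP, assuming checkpoints are loaded as plain dicts of torch tensors; the `slerp` helper name and the parallel-vector fallback threshold are illustrative, not M-DNA Forge's internal API.

```python
import torch

def slerp(a: torch.Tensor, b: torch.Tensor, t: float = 0.5, eps: float = 1e-8) -> torch.Tensor:
    """Spherical linear interpolation between two weight tensors."""
    a_flat, b_flat = a.flatten().float(), b.flatten().float()
    # Angle between the two weight vectors, measured on the unit hypersphere
    dot = torch.clamp(torch.dot(a_flat / (a_flat.norm() + eps),
                                b_flat / (b_flat.norm() + eps)), -1.0, 1.0)
    omega = torch.acos(dot)
    if omega.abs() < 1e-4:                        # nearly parallel: fall back to plain linear interpolation
        return (1 - t) * a + t * b
    sin_omega = torch.sin(omega)
    coeff_a = torch.sin((1 - t) * omega) / sin_omega
    coeff_b = torch.sin(t * omega) / sin_omega
    return (coeff_a * a_flat + coeff_b * b_flat).reshape(a.shape).to(a.dtype)

# merged = {name: slerp(sd_a[name], sd_b[name], t=0.5) for name in sd_a}
```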

Average

Weighted Mean — the simplest merge: averages all tensors, weighted per parent. Weights are normalized to sum to 1.0 automatically.
Averaging dissimilar models often produces incoherent output; use SLERP or Frankenmerge instead.
Best for: Very similar models (e.g., the same model at different training checkpoints).
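A sketch of the weighted mean under the same assumption of state dicts as plain tensor dicts; `weighted_average` is a hypothetical helper name.

```python
import torch

def weighted_average(tensors, weights):
    """Weighted mean of matching tensors; weights are normalized to sum to 1.0."""
    total = sum(weights)
    normalized = [w / total for w in weights]
    return sum(w * t for w, t in zip(normalized, tensors))

# merged = {name: weighted_average([sd_a[name], sd_b[name]], [0.7, 0.3]) for name in sd_a}
```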

Passthrough

Direct Copy — copies tensors directly from a single parent. Useful as a starting point or for simple layer stacking. Best for: Creating deeper models or debugging merge pipelines.
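At the tensor level this is just a copy; the helper below is only an illustration of that.

```python
def passthrough(parent_sd):
    """Copy every tensor unchanged from a single parent state dict."""
    return {name: tensor.clone() for name, tensor in parent_sd.items()}
```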

Intermediate Methods

Task Arithmetic

Task Vector Addition — requires a base model. Computes task vectors (finetune - base) for each parent, scales them, and adds them back to the base.
Parameter | Range | Default | Effect
scaling | 0–2 | 1.0 | How strongly task vectors are applied
Best for: Combining multiple finetunes of the same base model.
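A sketch of task-vector addition, assuming all parents share the base's tensor names and shapes; `task_arithmetic` is an illustrative name, not the tool's API.

```python
import torch

def task_arithmetic(base_sd, finetune_sds, scaling=1.0):
    """Add scaled task vectors (finetune - base) back onto the base weights."""
    merged = {}
    for name, base_w in base_sd.items():
        # Sum the deltas contributed by each finetuned parent
        delta = sum(ft[name] - base_w for ft in finetune_sds)
        merged[name] = base_w + scaling * delta
    return merged
```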

Frankenmerge

Layer Cherry-Picking — select which parent provides each output layer. Requires explicit layer assignments (set in the Layers tab). Each offspring layer is a direct copy of the chosen parent's corresponding layer. Auto-assign options (sketched in the example below):
  • SPLIT: Contiguous equal-sized chunks (the first block of layers from A, the next from B, …)
  • INTERLEAVE: Alternating layers (A, B, A, B, …)
Best for: Combining strengths from different layers (e.g., reasoning layers from one model + language layers from another).
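A sketch of the two auto-assign policies described above, returning a parent index for each output layer; the function name and return format are illustrative assumptions.

```python
def auto_assign_layers(num_layers: int, num_parents: int, mode: str = "INTERLEAVE"):
    """Map each output layer index to the parent that supplies it."""
    if mode == "INTERLEAVE":                      # alternate parents layer by layer: A, B, A, B, ...
        return [i % num_parents for i in range(num_layers)]
    if mode == "SPLIT":                           # contiguous equal chunks: first block from A, next from B, ...
        chunk = -(-num_layers // num_parents)     # ceiling division
        return [min(i // chunk, num_parents - 1) for i in range(num_layers)]
    raise ValueError(f"unknown mode: {mode}")

# auto_assign_layers(8, 2, "INTERLEAVE") -> [0, 1, 0, 1, 0, 1, 0, 1]
# auto_assign_layers(8, 2, "SPLIT")      -> [0, 0, 0, 0, 1, 1, 1, 1]
```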

DARE

Drop And REscale — prunes delta parameters before merging. Requires a base model.
Parameter | Range | Default | Effect
density | 0–1 | 0.5 | Fraction of delta parameters to keep
Randomly drops delta parameters and rescales the remainder to maintain expected magnitude. Best for: Efficient merging with sparsity-based parameter selection.
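A sketch of drop-and-rescale applied to a single tensor's delta; `dare_delta` is a hypothetical helper and the float casts are an assumption.

```python
import torch

def dare_delta(base: torch.Tensor, finetune: torch.Tensor, density: float = 0.5) -> torch.Tensor:
    """Randomly drop delta parameters and rescale survivors to preserve expected magnitude."""
    delta = finetune.float() - base.float()
    mask = torch.bernoulli(torch.full_like(delta, density))   # keep each element with probability `density`
    return (base.float() + (delta * mask) / density).to(base.dtype)
```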

TIES

Trim, Elect Sign, Merge — resolves interference between task vectors. Requires a base model.
Parameter | Range | Default | Effect
trim_threshold | 0–1 | 0.2 | Fraction of small-magnitude parameters to trim
Trims small-magnitude parameters, resolves sign conflicts by majority vote, then merges. Produces cleaner results than Task Arithmetic. Best for: Base + multiple finetunes where task vectors conflict.
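A compact sketch of the trim / elect-sign / merge steps, with trimming done per task vector by magnitude; `ties_merge` and the float casts are illustrative assumptions, not the tool's implementation.

```python
import torch

def ties_merge(base_sd, finetune_sds, trim_threshold=0.2):
    """Trim small deltas, elect a sign per parameter by majority, then merge agreeing deltas."""
    merged = {}
    for name, base_w in base_sd.items():
        base = base_w.float()
        deltas = torch.stack([ft[name].float() - base for ft in finetune_sds])
        # Trim: zero out the smallest `trim_threshold` fraction of each task vector by magnitude
        k = int(trim_threshold * base.numel())
        if k > 0:
            for d in deltas:
                cutoff = d.abs().flatten().kthvalue(k).values
                d[d.abs() <= cutoff] = 0.0
        # Elect sign: the dominant sign (by total magnitude) wins for each parameter
        elected = torch.sign(deltas.sum(dim=0))
        # Merge: average only the deltas whose sign agrees with the elected sign
        agree = (torch.sign(deltas) == elected) & (deltas != 0)
        summed = (deltas * agree).sum(dim=0)
        count = agree.sum(dim=0).clamp(min=1)
        merged[name] = (base + summed / count).to(base_w.dtype)
    return merged
```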

Advanced Methods

DeLLa

Density-based Layer-Level Adaptive — adapts merge density per layer. Requires a base model.
Parameter | Range | Default | Effect
lambda | 0–1 | 0.5 | Interpolation factor
della_density | 0–1 | 0.5 | Base density, adapted per layer
Uses layer-level analysis to set different densities for different layers, giving more weight to layers that differ more. Best for: Advanced merges where uniform density across layers is suboptimal.
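Per-layer adaptation can be implemented in several ways; the sketch below only illustrates the idea: scale a base density by how far each layer's delta diverges from the average, then do a DARE-style drop at that density and apply lambda. All helper names, the epsilon, and the clamping range are assumptions, not DeLLa's exact algorithm.

```python
import torch

def adaptive_densities(base_sd, finetune_sd, base_density=0.5, eps=1e-8):
    """Give layers that diverge more from the base a higher keep-density (clamped to [0.1, 0.9])."""
    norms = {name: (finetune_sd[name].float() - w.float()).norm() for name, w in base_sd.items()}
    mean_norm = torch.stack(list(norms.values())).mean()
    return {name: float((base_density * n / (mean_norm + eps)).clamp(0.1, 0.9))
            for name, n in norms.items()}

def della_style_merge(base_sd, finetune_sd, lam=0.5, base_density=0.5):
    """Per-layer DARE-style drop at an adaptive density, scaled by lambda."""
    densities = adaptive_densities(base_sd, finetune_sd, base_density)
    merged = {}
    for name, base_w in base_sd.items():
        delta = finetune_sd[name].float() - base_w.float()
        d = densities[name]
        mask = torch.bernoulli(torch.full_like(delta, d))
        merged[name] = (base_w.float() + lam * (delta * mask) / d).to(base_w.dtype)
    return merged
```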

Component Merge

Per-Component Routing — route attention, MLP, and norm components to different parents. Instead of merging at the tensor level, assign each component type (attention projections, MLP layers, normalization) to a specific parent. Best for: When different parents excel at different architectural components.
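A sketch of name-based component routing; the keyword patterns and the `component_merge` helper are assumptions, since real checkpoints differ in their naming conventions.

```python
def component_merge(parent_sds, routing, default_parent=0):
    """Copy each tensor from the parent assigned to its component type.

    `routing` maps a substring of the tensor name to a parent index,
    e.g. {"attn": 0, "mlp": 1, "norm": 0}; patterns are illustrative.
    """
    merged = {}
    for name in parent_sds[0]:
        parent = next((p for key, p in routing.items() if key in name), default_parent)
        merged[name] = parent_sds[parent][name].clone()
    return merged

# merged = component_merge([sd_a, sd_b], {"attn": 0, "mlp": 1, "norm": 0})
```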

Tensor Surgery

Per-Tensor Source Mapping — individually assign each tensor to a parent. Maximum control: specify exactly which parent provides each tensor in the output model. Best for: Expert users who need fine-grained control over every tensor.
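At the tensor level this amounts to an explicit name-to-parent map; the sketch and the example tensor name are hypothetical.

```python
def tensor_surgery(parent_sds, tensor_map, default_parent=0):
    """Build the output state dict from an explicit tensor-name -> parent-index map."""
    return {name: parent_sds[tensor_map.get(name, default_parent)][name].clone()
            for name in parent_sds[0]}

# tensor_map = {"model.layers.0.self_attn.q_proj.weight": 1}   # hypothetical tensor name
```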

Parameter Slice

Dimensional Slicing — slice tensors along specific dimensions across parents.
Parameter | Range | Default | Effect
slice_dim | 0+ | 0 | Dimension to slice along
Takes slices of tensor dimensions from different parents to create hybrid tensors. Best for: Experimental merges with dimensional composition.
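A sketch of dimensional slicing, assuming all parents share the same tensor shape; `parameter_slice` is an illustrative helper, not the tool's API.

```python
import torch

def parameter_slice(tensors, slice_dim=0):
    """Build a hybrid tensor from consecutive slices of each parent along one dimension."""
    n = len(tensors)
    size = tensors[0].shape[slice_dim]
    bounds = [i * size // n for i in range(n + 1)]            # n roughly equal slices
    pieces = [t.narrow(slice_dim, bounds[i], bounds[i + 1] - bounds[i])
              for i, t in enumerate(tensors)]
    return torch.cat(pieces, dim=slice_dim)
```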

MoE Conversion

Mixture-of-Experts — convert dense models to MoE architecture.
Parameter | Range | Default | Effect
num_experts | 2–8 | 4 | Number of expert models
experts_per_token | 1–4 | 2 | Experts active per token
Uses parent models as expert networks and creates a gating mechanism. Best for: Creating sparse models that route tokens to specialized experts.
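The sketch below shows how a converted layer might route tokens: a minimal top-k gate over expert MLPs taken from the parents. The class, its freshly initialized gate, and the simple per-expert dispatch loop are illustrative simplifications, not the tool's conversion code.

```python
import torch
import torch.nn as nn

class SimpleMoELayer(nn.Module):
    """Minimal top-k MoE feed-forward: parent MLPs become experts, the gate is newly initialized."""

    def __init__(self, expert_mlps, hidden_size, experts_per_token=2):
        super().__init__()
        self.experts = nn.ModuleList(expert_mlps)          # one MLP copied from each parent model
        self.gate = nn.Linear(hidden_size, len(expert_mlps), bias=False)  # router, trained/calibrated later
        self.k = experts_per_token

    def forward(self, x):                                  # x: [tokens, hidden]
        scores = self.gate(x)                              # [tokens, num_experts]
        weights, idx = torch.topk(scores, self.k, dim=-1)  # pick top-k experts per token
        weights = torch.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                   # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out
```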

Quick Reference

Scenario | Recommended Method
2 similar models | SLERP (t=0.5)
Base + 1 finetune | Task Arithmetic or DARE
Base + multiple finetunes | TIES (or Task Arithmetic)
Cherry-pick specific layers | Frankenmerge
Create deeper model | Passthrough
Route components separately | Component Merge
Create MoE from dense | MoE Conversion
Different-sized models | Any method + Interpolation strategy
Quick experiment | Average (with caution)

Cross-Dimension Resolution Strategies

When merging models with different hidden dimensions or vocab sizes, ForgeAI applies a resolution strategy to adapt tensor shapes before running the merge method.
Strategy | How It Works | Output Size | Quality
Interpolation | Linear (1D) or nearest-neighbor (2D) resize to the target shape | Matches first parent | Medium
Zero Padding | Pads smaller tensors with zeros to match the largest parent | Matches largest parent | Medium
Truncation | Narrows larger tensors to match the smallest parent | Matches smallest parent | Low
Interpolation is auto-selected when a dimension mismatch is detected. You can change the strategy in the compatibility panel before merging.
Cross-dimension merging is experimental. Results depend heavily on how similar the model architectures are despite different sizes. Test output quality carefully.
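A sketch of the three strategies applied to a single tensor, assuming 1D or 2D tensors and a `target_shape` tuple; the `resolve_shape` helper and strategy strings are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def resolve_shape(t: torch.Tensor, target_shape, strategy="interpolation"):
    """Adapt a tensor to target_shape before merging (1D and 2D tensors only in this sketch)."""
    if strategy == "interpolation":
        x = t.float().unsqueeze(0).unsqueeze(0)               # add batch and channel dims for F.interpolate
        mode = "linear" if t.dim() == 1 else "nearest"
        align = False if mode == "linear" else None
        x = F.interpolate(x, size=tuple(target_shape), mode=mode, align_corners=align)
        return x.squeeze(0).squeeze(0).to(t.dtype)
    if strategy == "zero_padding":
        out = torch.zeros(target_shape, dtype=t.dtype)
        region = tuple(slice(0, min(s, ts)) for s, ts in zip(t.shape, target_shape))
        out[region] = t[region]                               # copy the overlapping region, rest stays zero
        return out
    if strategy == "truncation":
        region = tuple(slice(0, ts) for ts in target_shape)
        return t[region].clone()                              # keep only the leading slice in each dimension
    raise ValueError(f"unknown strategy: {strategy}")
```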

Quick Presets

ForgeAI includes 5 one-click presets:
Preset | Method | Params | Use Case
Quick Blend | Average | — | Fast simple merge
Smooth Merge | SLERP | t=0.5 | Balanced 2-model blend
Task Tuner | Task Arithmetic | scaling=1.0 | Add finetune capabilities
Sparse Mix | DARE | density=0.5 | Efficient sparse merge
Consensus | TIES | trim=0.2 | Resolve task conflicts