sft

Motivation

Why we built sft

sft turns agent trajectories into ready-to-train datasets and fine-tunes SWE coding models on them.

It is the supervised-fine-tuning stage of the SWE-Lego-Live pipeline, sitting between trajectory generation (trajgen) and reinforcement learning (rl). It consumes the good rollouts produced upstream, converts them into LLaMA-Factory ShareGPT data, and trains a model with LLaMA-Factory + DeepSpeed ZeRO-3 — all driven by a single config.yaml.

swegen ─▶ trajgen ─▶ sft ─▶ rl

sft provides:

  • One-config runs — source trajectories, conversion settings, dataset, model, hyperparameters, and infrastructure all live in config.yaml
  • Scaffold-aware conversion — Claude Code, OpenCode, OpenHands SDK, and Terminus-2 trajectories convert through the swe_data_process package
  • Quality scoring — rule-based scoring runs automatically; optional LLM scoring layers on top
  • Eval-leak protection — SWE-bench benchmark repos are filtered out of training data by default
  • Multi-GPU training — LLaMA-Factory through torchrun with DeepSpeed ZeRO-3 on a single 8-GPU node
  • A live web dashboard that reads the training output directly, with multi-run comparison and optional wandb

Where to go next

  • Getting Started — set up the environment and run your first training job
  • Core Concepts — trajectories, IM/LF formats, datasets, runs, and checkpoints
  • Data Pipeline — convert trajectories into LLaMA-Factory datasets
  • Training — launch the pipeline, tune the configuration, and read the outputs
  • Dashboard — monitor training runs in real time

On this page