Motivation
Why we built sft
sft turns agent trajectories into ready-to-train datasets and fine-tunes SWE coding models on them.
It is the supervised-fine-tuning stage of the SWE-Lego-Live pipeline, sitting between trajectory generation (trajgen) and reinforcement learning (rl). It consumes the good rollouts produced upstream, converts them into LLaMA-Factory ShareGPT data, and trains a model with LLaMA-Factory + DeepSpeed ZeRO-3 — all driven by a single config.yaml.
swegen ─▶ trajgen ─▶ sft ─▶ rl
sft provides:
- One-config runs — source trajectories, conversion settings, dataset, model, hyperparameters, and infrastructure all live in config.yaml
- Scaffold-aware conversion — Claude Code, OpenCode, OpenHands SDK, and Terminus-2 trajectories convert through the swe_data_process package
- Quality scoring — rule-based scoring runs automatically; optional LLM scoring layers on top
- Eval-leak protection — SWE-bench benchmark repos are filtered out of training data by default
- Multi-GPU training — LLaMA-Factory through torchrun with DeepSpeed ZeRO-3 on a single 8-GPU node
- Live dashboard — a web dashboard that reads the training output directly, with multi-run comparison and optional wandb integration
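To make the conversion and eval-leak steps concrete, here is a minimal Python sketch of what turning a trajectory into a LLaMA-Factory ShareGPT record and dropping benchmark repos might look like. It is not the swe_data_process implementation. The input shape (a dict with `repo` and `messages` keys), the function names, and the two-entry `BENCHMARK_REPOS` set are all assumptions made for illustration. Only the output layout (`system` plus a `conversations` list of `from`/`value` turns) follows LLaMA-Factory's documented ShareGPT format.

```python
from typing import Iterable

# Hypothetical, deliberately incomplete set of SWE-bench repos.
# The real filter list used by sft is not shown in this document.
BENCHMARK_REPOS = {"django/django", "sympy/sympy"}

# ShareGPT uses "human"/"gpt" where chat-style logs use "user"/"assistant".
ROLE_TO_SHAREGPT = {"user": "human", "assistant": "gpt"}


def to_sharegpt(trajectory: dict) -> dict:
    """Convert one trajectory (assumed shape) into a ShareGPT record."""
    system = ""
    conversations = []
    for msg in trajectory["messages"]:
        if msg["role"] == "system":
            # ShareGPT keeps the system prompt in its own top-level field.
            system = msg["content"]
        else:
            conversations.append(
                {"from": ROLE_TO_SHAREGPT[msg["role"]], "value": msg["content"]}
            )
    return {"system": system, "conversations": conversations}


def build_dataset(trajectories: Iterable[dict]) -> list:
    """Drop trajectories from benchmark repos, then convert the rest."""
    return [
        to_sharegpt(t)
        for t in trajectories
        if t["repo"].lower() not in BENCHMARK_REPOS
    ]
```

Calling `build_dataset` on a list of such trajectory dicts returns records that can be written out as a JSON file and registered as a ShareGPT-format dataset in LLaMA-Factory. Tool calls and observations, which real agent trajectories contain, are left out of this sketch.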
Where to go next
- Getting Started — set up the environment and run your first training job
- Core Concepts — trajectories, IM/LF formats, datasets, runs, and checkpoints
- Data Pipeline — convert trajectories into LLaMA-Factory datasets
- Training — launch the pipeline, tune the configuration, and read the outputs
- Dashboard — monitor training runs in real time