Motivation

sft turns agent trajectories into ready-to-train datasets and fine-tunes SWE coding models on them.

It is the supervised-fine-tuning stage of the SWE-Lego-Live pipeline, sitting between trajectory generation (trajgen) and reinforcement learning (rl). It consumes the good rollouts produced upstream, converts them into LLaMA-Factory ShareGPT data, and trains a model with LLaMA-Factory + DeepSpeed ZeRO-3 — all driven by a single config.yaml.

swegen ─▶ trajgen ─▶ sft ─▶ rl

sft provides:

One-config runs — source trajectories, conversion settings, dataset, model, hyperparameters, and infrastructure all live in config.yaml
Scaffold-aware conversion — Claude Code, OpenCode, OpenHands SDK, and Terminus-2 trajectories convert through the swe_data_process package
Quality scoring — rule-based scoring runs automatically; optional LLM scoring layers on top
Eval-leak protection — SWE-bench benchmark repos are filtered out of training data by default
Multi-GPU training — LLaMA-Factory through torchrun with DeepSpeed ZeRO-3 on a single 8-GPU node
A live web dashboard that reads the training output directly, with multi-run comparison and optional wandb

Where to go next

Getting Started — set up the environment and run your first training job
Core Concepts — trajectories, IM/LF formats, datasets, runs, and checkpoints
Data Pipeline — convert trajectories into LLaMA-Factory datasets
Training — launch the pipeline, tune the configuration, and read the outputs
Dashboard — monitor training runs in real time

Motivation

Where to go next

On this page