sft

Reference

Inputs & Outputs

The config-driven input/output contract

sft is configured entirely through config.yaml. It uses config.yaml rather than a separate inputs.yaml because the file describes a full training run profile, not just upstream inputs. Treat config.yaml as the source of truth, and validate it with bash scripts/dryrun.sh before launching a run.

Inputs

Read from config.yaml under runtime_info.input:

| Name | Description | Source |
| --- | --- | --- |
| source.type | Where the LF dataset comes from: harbor_job, hf_lf, or local_lf | external |
| source | Per-type fields: scaffold + job_dir (harbor_job), hf_hub_url + hf_subset + hf_split (hf_lf), lf_path (local_lf) | dependency / external |
| conversion | max_instances, exclude_repos_file, data_name | external |
| dataset | LLaMA-Factory dataset registration (auto-derived from data_name) | derived |
| model | Base model path and trust_remote_code | external |
| training | Stage, deepspeed, template, cutoff, batch size, lr, epochs, … | external |
| infrastructure | n_gpus_per_node | external |
| experiment | WandB mode and run name | external |
| credentials | WandB API key, hf_token (private hf_lf datasets) | external |

When source.type is harbor_job (the default), source.job_dir is the handoff from trajgen, wired via meta_info.dependencies.trajectories_dir (trajgen.output.raw_trajectories_dir). For hf_lf and local_lf, the input is a ready-made LF/ShareGPT dataset and conversion is skipped; see Input sources.
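
The per-type rules above can be sketched as a small pre-flight check. This is a minimal illustration, not the logic of scripts/dryrun.sh: the helper name check_source is hypothetical, the config is assumed to be already parsed into a dict, and treating every listed per-type field as required is an assumption read off the Inputs table.

```python
# Hypothetical helper, not part of the sft scripts. The required-field sets
# are read off the Inputs table; treating all of them as mandatory is an
# assumption.
REQUIRED_SOURCE_FIELDS = {
    "harbor_job": ("scaffold", "job_dir"),
    "hf_lf": ("hf_hub_url", "hf_subset", "hf_split"),
    "local_lf": ("lf_path",),
}


def check_source(source: dict) -> dict:
    """Return the missing per-type fields and whether conversion would run."""
    # harbor_job is the documented default for source.type.
    source_type = source.get("type", "harbor_job")
    if source_type not in REQUIRED_SOURCE_FIELDS:
        raise ValueError(f"unknown source.type: {source_type!r}")
    missing = [
        name
        for name in REQUIRED_SOURCE_FIELDS[source_type]
        if not source.get(name)
    ]
    return {
        "type": source_type,
        "missing": missing,
        # Conversion is skipped for ready-made LF/ShareGPT datasets.
        "needs_conversion": source_type == "harbor_job",
    }


example = check_source({
    "type": "harbor_job",
    "scaffold": "claude-code",
    "job_dir": ".../trajgen/artifacts/jobs/<job>",
})
print(example)
```

A non-empty missing list would be a reason to stop before launching; the authoritative check remains bash scripts/dryrun.sh.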

Active runtime values

Excerpted from config.yaml:

source:
  type: harbor_job             # harbor_job | hf_lf | local_lf
  scaffold: claude-code        # openhands-sdk | claude-code | open-code | terminus2
  job_dir: ".../trajgen/artifacts/jobs/<job>"
  # hf_lf:    hf_hub_url / hf_subset / hf_split
  # local_lf: lf_path
conversion:
  max_instances: 64
  exclude_repos_file: artifacts/data/excluded_repos.txt
  data_name: "m_2.7_swerebenchv2-200-260429_cc_test"
model:
  model_name_or_path: /mnt/public/models/Qwen3-8B
  trust_remote_code: true
training:
  stage: sft
  finetuning_type: full
  deepspeed: artifacts/training_config/deepspeed/ds_z3_config.json
  template: qwen3_nothink
  cutoff_len: 131072
  rope_scaling: yarn
  per_device_train_batch_size: 1
  gradient_accumulation_steps: 8
  learning_rate: 1.0e-4
  num_train_epochs: 4.0
  lr_scheduler_type: cosine
  warmup_ratio: 0.1
  bf16: true
  flash_attn: fa2
infrastructure:
  n_gpus_per_node: 8
experiment:
  wandb_mode: offline

See Configuration for what each training field controls.
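
As a quick sanity check on the excerpt above, the effective batch size per optimizer step can be worked out from three of the listed fields. This sketch assumes a single node (the excerpt does not state a node count) and that every GPU acts as a data-parallel rank; n_nodes is a name introduced here for illustration only.

```python
# Values copied from the config.yaml excerpt.
per_device_train_batch_size = 1
gradient_accumulation_steps = 8
n_gpus_per_node = 8

# Assumption: one node. The excerpt does not state a node count.
n_nodes = 1

# Sequences contributing to each optimizer step, assuming every GPU is a
# data-parallel rank.
effective_batch_size = (
    per_device_train_batch_size
    * gradient_accumulation_steps
    * n_gpus_per_node
    * n_nodes
)
print(effective_batch_size)  # 64
```

Changing any one of these fields without adjusting the others changes the effective batch size, which in turn usually calls for revisiting learning_rate.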

Outputs

Written to config.yaml under runtime_info.output:

| Output | Path / value | Consumer |
| --- | --- | --- |
| checkpoint_path | artifacts/model/<run>/checkpoint-<step>/ | rl, eval |
| training_metrics | final_loss, train_runtime, total_steps | reporting |
| artifacts | train_results.json, training_loss.png, console log | reporting |
| training_curves | wandb run id, when available | reporting |

See Results & Artifacts for the full layout, and Config Variants for running experiments without disturbing the active profile.
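
A downstream consumer such as rl or eval needs a concrete checkpoint directory. The sketch below shows one way to pick the highest-step checkpoint from a run directory that follows the checkpoint-<step> naming in the Outputs table. The helper latest_checkpoint is hypothetical and not part of the sft scripts; when runtime_info.output.checkpoint_path is populated, reading it directly is the simpler route.

```python
import re
from pathlib import Path
from typing import Optional

# Matches directory names of the form checkpoint-<step>.
CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(run_dir: Path) -> Optional[Path]:
    """Return the checkpoint-<step> directory with the highest step, if any."""
    candidates = []
    for child in run_dir.iterdir():
        match = CHECKPOINT_RE.match(child.name)
        if child.is_dir() and match:
            # Compare steps numerically so that step 100 ranks above step 16.
            candidates.append((int(match.group(1)), child))
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0])[1]
```

Steps are compared as integers because a plain string sort would rank checkpoint-8 above checkpoint-100.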

Environment

The training environment is a uv venv at meta_info.environment.sft_uv (default artifacts/env/lf, Python 3.12). Recreate it with bash scripts/install_env.sh. It is excluded from the repo, so its contents are not version-controlled.
