Inputs & Outputs
The config-driven input/output contract
sft is configured entirely through config.yaml. It uses config.yaml rather than a separate inputs.yaml because the file describes a full training run profile, not just upstream inputs. Treat config.yaml as the source of truth before launching a run, and validate it with bash scripts/dryrun.sh.
Inputs
Read from config.yaml → runtime_info.input:
| Name | Description | Source |
|---|---|---|
| source.type | Where the LF dataset comes from: harbor_job, hf_lf, or local_lf | external |
| source | Per-type fields: scaffold+job_dir (harbor), hf_hub_url+hf_subset+hf_split (hf), lf_path (local) | dependency / external |
| conversion | max_instances, exclude_repos_file, data_name | external |
| dataset | LLaMA-Factory dataset registration (auto-derived from data_name) | derived |
| model | Base model path and trust_remote_code | external |
| training | Stage, deepspeed, template, cutoff, batch size, lr, epochs, … | external |
| infrastructure | n_gpus_per_node | external |
| experiment | WandB mode and run name | external |
| credentials | WandB API key, hf_token (private hf_lf datasets) | external |
When source.type is harbor_job (the default), source.job_dir is the handoff from trajgen, wired via meta_info.dependencies.trajectories_dir (trajgen.output.raw_trajectories_dir). For hf_lf and local_lf, the input is a ready-made LF/ShareGPT dataset and conversion is skipped; see Input sources.
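The per-type field rules above can be sketched as a small pre-flight check. This is a minimal illustration, not the logic in scripts/dryrun.sh: the helper names `check_source` and `needs_conversion` are hypothetical, and treating every listed per-type field as required is an assumption.

```python
# Hypothetical helpers for illustration; not part of the sft codebase.
# Assumption: every per-type field listed in the inputs table is required.
REQUIRED_SOURCE_FIELDS = {
    "harbor_job": ("scaffold", "job_dir"),
    "hf_lf": ("hf_hub_url", "hf_subset", "hf_split"),
    "local_lf": ("lf_path",),
}


def check_source(source: dict) -> list[str]:
    """Return a list of problems found in a `source` mapping (empty if none)."""
    # harbor_job is the documented default when source.type is omitted.
    source_type = source.get("type", "harbor_job")
    if source_type not in REQUIRED_SOURCE_FIELDS:
        return [f"unknown source.type: {source_type!r}"]
    return [
        f"source.{name} is required when source.type is {source_type}"
        for name in REQUIRED_SOURCE_FIELDS[source_type]
        if not source.get(name)
    ]


def needs_conversion(source: dict) -> bool:
    """Only harbor_job inputs are converted; hf_lf and local_lf are used as-is."""
    return source.get("type", "harbor_job") == "harbor_job"
```

For example, `check_source({"type": "hf_lf", "hf_hub_url": "org/dataset"})` reports the missing hf_subset and hf_split fields, while a complete local_lf source returns an empty list.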
Active runtime values
Excerpted from config.yaml:

    source:
      type: harbor_job        # harbor_job | hf_lf | local_lf
      scaffold: claude-code   # openhands-sdk | claude-code | open-code | terminus2
      job_dir: ".../trajgen/artifacts/jobs/<job>"
      # hf_lf: hf_hub_url / hf_subset / hf_split
      # local_lf: lf_path
    conversion:
      max_instances: 64
      exclude_repos_file: artifacts/data/excluded_repos.txt
      data_name: "m_2.7_swerebenchv2-200-260429_cc_test"
    model:
      model_name_or_path: /mnt/public/models/Qwen3-8B
      trust_remote_code: true
    training:
      stage: sft
      finetuning_type: full
      deepspeed: artifacts/training_config/deepspeed/ds_z3_config.json
      template: qwen3_nothink
      cutoff_len: 131072
      rope_scaling: yarn
      per_device_train_batch_size: 1
      gradient_accumulation_steps: 8
      learning_rate: 1.0e-4
      num_train_epochs: 4.0
      lr_scheduler_type: cosine
      warmup_ratio: 0.1
      bf16: true
      flash_attn: fa2
    infrastructure:
      n_gpus_per_node: 8
    experiment:
      wandb_mode: offline

See Configuration for what each training field controls.
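As a worked example of how the batch-related fields in this profile combine, the sketch below computes the effective global batch size and a rough optimizer step count. It assumes a single node (the excerpt only specifies n_gpus_per_node) and a hypothetical sample count; the function names are illustrative, and the step estimate ignores trainer-specific handling of partial batches.

```python
import math


def effective_batch_size(per_device: int, grad_accum: int,
                         gpus_per_node: int, n_nodes: int = 1) -> int:
    """Samples consumed per optimizer step across all GPUs."""
    # Assumption: n_nodes defaults to 1; the excerpt does not state a node count.
    return per_device * grad_accum * gpus_per_node * n_nodes


def approx_optimizer_steps(n_samples: int, batch_size: int, epochs: float) -> int:
    """Rough estimate of total optimizer steps; not the trainer's exact count."""
    steps_per_epoch = math.ceil(n_samples / batch_size)
    return math.ceil(steps_per_epoch * epochs)


# Values from the active profile: 1 sample per device, 8 accumulation steps, 8 GPUs.
batch = effective_batch_size(per_device=1, grad_accum=8, gpus_per_node=8)
print(batch)  # 64

# 1000 is a hypothetical dataset size, used only to illustrate the arithmetic.
print(approx_optimizer_steps(n_samples=1000, batch_size=batch, epochs=4.0))  # 64
```

With these settings each optimizer step sees 64 sequences, so changing n_gpus_per_node or gradient_accumulation_steps changes the effective batch size unless the other is adjusted to compensate.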
Outputs
Written to config.yaml → runtime_info.output:
| Output | Path / value | Consumer |
|---|---|---|
| checkpoint_path | artifacts/model/<run>/checkpoint-<step>/ | rl, eval |
| training_metrics | final_loss, train_runtime, total_steps | reporting |
| artifacts | train_results.json, training_loss.png, console log | reporting |
| training_curves | wandb run id, when available | reporting |
See Results & Artifacts for the full layout, and Config Variants for running experiments without disturbing the active profile.
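A downstream consumer such as rl or eval typically wants the most recent checkpoint from a run. The sketch below shows one way to pick it from the checkpoint-<step> directory layout listed above. The helper name `latest_checkpoint` is hypothetical, and how sft itself populates checkpoint_path is not specified here.

```python
import re
from pathlib import Path
from typing import Optional

# Matches directory names of the form checkpoint-<step>.
CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(run_dir: str) -> Optional[Path]:
    """Return the checkpoint-<step> directory with the highest step, or None."""
    root = Path(run_dir)
    if not root.is_dir():
        return None
    best_step, best_path = -1, None
    for child in root.iterdir():
        match = CHECKPOINT_RE.match(child.name)
        if match and child.is_dir():
            # Compare steps numerically so checkpoint-1000 beats checkpoint-200.
            step = int(match.group(1))
            if step > best_step:
                best_step, best_path = step, child
    return best_path
```

Comparing the step as an integer matters: a plain string sort would rank checkpoint-200 above checkpoint-1000.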
Environment
The training environment is a uv venv at meta_info.environment.sft_uv (default artifacts/env/lf, Python 3.12). Recreate it with bash scripts/install_env.sh. It is excluded from the repo, so its contents are not version-controlled.