Scaffolds
Supported agent scaffolds and their converters
The agent scaffold that produced a trajectory determines its raw shape, and that shape selects the swe_data_process converter module that turns it into intermediate (IM) data. Set the active scaffold in config.yaml → runtime_info.input.source.scaffold.
Converter matrix
| Scaffold | Converter module |
|---|---|
| claude-code | swe_data_process.claudecode_opencode.convert_cc_to_im |
| open-code | swe_data_process.claudecode_opencode.convert_oc_to_im |
| openhands-sdk | swe_data_process.openhands.convert_openhands_sdk_to_im |
| terminus2 | swe_data_process.terminus2.convert_terminus2_to_im |
Only these job-dir converters are wired. The refactored swe_data_process package removed the older source-specific converter scripts, so a scaffold outside this matrix has no conversion path.
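The matrix can be read as a plain lookup table. As a minimal sketch, assuming a hypothetical helper named converter_for (it is not part of swe_data_process, and the package's own dispatch may differ), the mapping and its failure mode for an unlisted scaffold might look like this:

```python
# Hypothetical helper: mirrors the converter matrix on this page.
# It is an illustration, not the package's actual dispatch code.
CONVERTERS = {
    "claude-code": "swe_data_process.claudecode_opencode.convert_cc_to_im",
    "open-code": "swe_data_process.claudecode_opencode.convert_oc_to_im",
    "openhands-sdk": "swe_data_process.openhands.convert_openhands_sdk_to_im",
    "terminus2": "swe_data_process.terminus2.convert_terminus2_to_im",
}


def converter_for(scaffold: str) -> str:
    """Return the dotted converter module path for a scaffold name."""
    try:
        return CONVERTERS[scaffold]
    except KeyError:
        supported = ", ".join(sorted(CONVERTERS))
        # A scaffold outside the matrix has no conversion path.
        raise ValueError(
            f"no converter for scaffold {scaffold!r}; supported: {supported}"
        ) from None
```

Failing early with the list of supported names makes a typo in the scaffold setting easier to spot than a later import error would.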
IM format
Every converter emits the same intermediate (IM) shape — one JSONL row per trajectory:
    {
      "version": "2.0.0",
      "meta_info": {
        "unique_info": {
          "_instance_id": "owner__repo-123",
          "_agent_type": "main",
          "_score": {"composite_score": 0.72, "...": "..."}
        }
      },
      "tools": ["..."],
      "messages": [
        {
          "role": "user/assistant/tool",
          "content": "...",
          "reasoning_content": "...",
          "tool_calls": ["..."]
        }
      ]
    }

_instance_id, _agent_type, and _score are stored under meta_info.unique_info on disk. The package's load_jsonl() expands them back to the legacy top-level shape for internal scoring and filtering helpers.
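To make the difference between the on-disk and in-memory shapes concrete, here is a minimal sketch of that expansion. The function expand_unique_info is a hypothetical stand-in, not the package's load_jsonl(), and it assumes the legacy shape simply places the three keys at the top level of each row.

```python
import json

# Keys that this page says live under meta_info.unique_info on disk.
UNIQUE_KEYS = ("_instance_id", "_agent_type", "_score")


def expand_unique_info(row: dict) -> dict:
    """Copy the keys under meta_info.unique_info to the top level of a row.

    Assumption: the legacy shape holds these keys at the top level.
    """
    unique = row.get("meta_info", {}).get("unique_info", {})
    expanded = dict(row)  # shallow copy; the input row is left unchanged
    for key in UNIQUE_KEYS:
        if key in unique:
            expanded[key] = unique[key]
    return expanded


# One JSONL line, parsed and expanded.
line = json.dumps({
    "version": "2.0.0",
    "meta_info": {
        "unique_info": {"_instance_id": "owner__repo-123", "_agent_type": "main"}
    },
    "tools": [],
    "messages": [],
})
row = expand_unique_info(json.loads(line))
print(row["_instance_id"])  # owner__repo-123
```

The sketch leaves meta_info in place, so a row can still be written back in the on-disk shape.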
Reasoning content and templates
When a trajectory carries reasoning_content (chain-of-thought), train with the qwen3 chat template to keep it. The default qwen3_nothink template disables thinking mode and drops it. See Configuration.
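One way to choose between the two templates is to check whether any message in a row carries non-empty reasoning_content. The helper below is a hypothetical sketch of that check and is not part of the package; only the template names qwen3 and qwen3_nothink come from this page.

```python
def pick_chat_template(row: dict) -> str:
    """Return "qwen3" if any message has non-empty reasoning_content."""
    has_reasoning = any(
        message.get("reasoning_content") for message in row.get("messages", [])
    )
    # qwen3 keeps the reasoning; qwen3_nothink would drop it.
    return "qwen3" if has_reasoning else "qwen3_nothink"
```

For a mixed dataset, running this over every row shows whether the default template would silently discard reasoning from any trajectory.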