Claude Code’s complete action trajectories are now available as the MCCT-1K dataset on Hugging Face, containing 1,017 full traces. Researcher choucsan created this dataset to record coding agent tool usage, file edits, and reasoning processes in JSONL format. It serves as primary source material for Code LLM imitation learning and trajectory analysis aimed at researchers and developers.

📑Table of Contents
  1. What is a Claude Code Trajectory Dataset
  2. Main Statistics and Scale of MCCT-1K
  3. Tool Usage Breakdown and Category Distribution
  4. Dataset Structure and Usage
  5. Usage Examples for Code LLM Developers and Researchers
  6. Download and License Information
  7. Differences from Similar Datasets
  8. Frequently Asked Questions
  9. Summary

What is a Claude Code Trajectory Dataset

Claude Code is Anthropic’s autonomous coding agent. When users give natural language instructions, it repeatedly reads and writes files, executes shell commands, and edits code until the task is complete. MCCT-1K captures these sequences as “trajectories.” Each trajectory includes the user’s task, multi-turn messages, tool calls and results, and reasoning content.

Traditional code generation datasets only record the final code, but MCCT-1K includes the intermediate steps. This allows analysis of how the agent selects tools and recovers from failures. It is freely available on the official Hugging Face page at https://huggingface.co/datasets/choucsan/mimo-claude-code-traces-1k.


Main Statistics and Scale of MCCT-1K

MCCT-1K consists of 1,017 traces in total. There are 15,046 total event lines and 11,995 conversation messages. Assistant tool calls number 5,271, with an equal number of tool result messages. 859 traces include tool usage, and all 1,017 traces contain reasoning fields.

The total recorded turns are 4,932, with approximately 20.5 hours of recording time. The total API cost is $163.89, and logged tokens amount to about 127.2 million (8M input, 117M cache reads, 1.9M output). The model used for generation was mimo-v2.5-pro, consuming roughly 400M tokens.

These figures are from the upload around June 21, 2026, on Hugging Face. Source: Hugging Face Datasets page (as of June 2026).


Tool Usage Breakdown and Category Distribution

Looking at tool usage, Bash commands were called 1,805 times, followed by Read (1,480), Write (919), Glob (381), Edit (339), and Grep (163). Adding other tools such as Agent and TodoWrite brings the total to 5,271 tool calls.

Category distribution shows code_generation with 213 traces, algorithms with 157, debugging with 162, and refactoring with 126. shell_devops has 70, math_problems 76, supplement 75, data_processing 58, hf_trace 57, and api_integration 23.

This distribution indicates emphasis on code generation, debugging, and algorithm problems. The table below summarizes it:

Category Traces
code_generation 213
algorithms 157
debugging 162
refactoring 126
shell_devops 70
math_problems 76
supplement 75
data_processing 58
hf_trace 57
api_integration 23

Source: Hugging Face Datasets (https://huggingface.co/datasets/choucsan/mimo-claude-code-traces-1k, June 2026)


Dataset Structure and Usage

The dataset has category subdirectories such as algorithms/, code_generation/, and debugging/ under session/. Each .jsonl file corresponds to one trace containing user task, multi-turn message trace, tool schemas, reasoning, tool calls/outputs, and metadata.

Users can leverage it for code-agent distillation, SFT, trajectory modeling, and tool-use research. The JSONL format makes it easy to load with Python’s jsonlines library. Direct download is available from the official page.


Usage Examples for Code LLM Developers and Researchers

Code LLM developers can use this dataset to learn agent behavior patterns. For example, analyzing tool call sequences and recovery strategies from failures helps build more robust agents. Researchers can use the reasoning fields to quantitatively evaluate how LLMs decompose problems.

Practical examples include imitation learning from 5,271 tool calls or time-efficiency analysis based on 20.5 hours of recordings. Developers can compare their own agents to see which tools are overused or in which categories they struggle.


Download and License Information

MCCT-1K is published on Hugging Face Datasets and can be downloaded for free. Check the license on the dataset page. The release was also announced on X at https://x.com/choucisa/status/2069997970670727205.

It is designed primarily for research and non-commercial use. Commercial use should follow the license terms on the page.


Differences from Similar Datasets

Existing code-related datasets mainly record final code or single-shot generation results. In contrast, MCCT-1K provides complete trajectories of an actual Claude Code agent operating over multiple turns, using tools, editing files, and iterating on reasoning.

This differentiates it as dynamic agent behavior data rather than static code snippets. Having both detailed tool usage logs and reasoning fields is a key strength.


Frequently Asked Questions

Q: Can MCCT-1K be used commercially?

Check the license on the Hugging Face dataset page. It is designed for research purposes, but always read the terms.

Q: What is the size of the dataset?

1,017 JSONL files, 15,046 total event lines, and approximately 20.5 hours of recording time.

Q: How can I load and use it?

Use Python’s jsonlines or pandas to read each .jsonl and analyze tool calls and reasoning fields.

Q: How does it differ from other Claude Code datasets?

It uniquely provides complete multi-turn trajectories along with tool usage logs and reasoning processes.

Q: Which model was used for generation?

mimo-v2.5-pro (MiMo 1.02T parameter MoE model).

Q: How much did it cost to generate?

Total API cost was $163.89 with approximately 127.2 million logged tokens.


Related articles:

Summary

MCCT-1K is a dataset of 1,017 trajectories that detail Claude Code’s actions. By leveraging the tool usage breakdown, category distribution, and complete multi-turn logs, it can accelerate Code LLM research and agent development. Download it now from the official Hugging Face page and apply it to your own projects.

Sources: Hugging Face (https://huggingface.co/datasets/choucsan/mimo-claude-code-traces-1k), X post (https://x.com/choucisa/status/2069997970670727205)

krona23

Author

krona23

Over 20 years in the IT industry, serving as Division Head and CTO at multiple companies running large-scale web services in Japan. Experienced across Windows, iOS, Android, and web development. Currently focused on AI-native transformation. At DevGENT, sharing practical guides on AI code editors, automation tools, and LLMs in three languages.

DevGENT about →

Leave a Reply

Trending

Discover more from DevGENT

Subscribe now to keep reading and get access to the full archive.

Continue reading