Engineered for Reliability
Generating millions of tokens requires more than a simple script. KothaSet is built to handle the chaos of large-scale API consumption.
High-Concurrency Engine
Built on Go's goroutines, the parallel worker pool maximizes throughput while strictly respecting API rate limits. Generate datasets up to 10x faster than typical Python scripts.
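To make the idea concrete, here is a minimal sketch of a rate-limited worker pool in Go. It is not KothaSet's actual implementation: `runPool`, its parameters, and the echo function standing in for an API call are all hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// runPool (hypothetical helper) processes jobs with a fixed number of
// goroutines and dispatches at most one job per interval.
func runPool(jobs []string, workers int, interval time.Duration, call func(string) string) []string {
	results := make([]string, len(jobs))
	idx := make(chan int)
	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range idx {
				// Each index is handled by exactly one worker, so writes never collide.
				results[i] = call(jobs[i])
			}
		}()
	}

	for i := range jobs {
		<-ticker.C // wait for the next rate-limit slot before dispatching
		idx <- i
	}
	close(idx)
	wg.Wait()
	return results
}

func main() {
	prompts := []string{"a", "b", "c"}
	// The echo function is a stand-in for a real API request.
	out := runPool(prompts, 2, 10*time.Millisecond, func(p string) string { return "echo:" + p })
	fmt.Println(out)
}
```

The ticker caps how often requests start, while the worker count caps how many are in flight at once. A real client would typically tune both to the provider's published limits.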
Atomic Checkpointing
Never lose a token. KothaSet writes progress to disk atomically, so an interrupted run resumes exactly where it left off.
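One common way to get atomic writes is to write to a temporary file and rename it over the target. The sketch below shows that pattern in Go. It assumes, rather than confirms, that KothaSet works this way, and the `checkpoint` type, its `Completed` field, and both function names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// checkpoint is a hypothetical progress record.
type checkpoint struct {
	Completed int `json:"completed"`
}

// saveCheckpoint writes to a temp file in the same directory, flushes it,
// then renames it over the target. Readers see either the old file or the
// new one, never a partial write (rename is atomic on POSIX filesystems).
func saveCheckpoint(path string, cp checkpoint) error {
	data, err := json.Marshal(cp)
	if err != nil {
		return err
	}
	tmp, err := os.CreateTemp(filepath.Dir(path), ".checkpoint-*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // cleans up on failure; harmless after a successful rename

	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), path)
}

// loadCheckpoint returns a zero checkpoint when no file exists yet.
func loadCheckpoint(path string) (checkpoint, error) {
	var cp checkpoint
	data, err := os.ReadFile(path)
	if os.IsNotExist(err) {
		return cp, nil
	}
	if err != nil {
		return cp, err
	}
	return cp, json.Unmarshal(data, &cp)
}

func main() {
	path := filepath.Join(os.TempDir(), "kothaset-demo-checkpoint.json")
	if err := saveCheckpoint(path, checkpoint{Completed: 128}); err != nil {
		fmt.Println("save failed:", err)
		return
	}
	cp, err := loadCheckpoint(path)
	fmt.Println(cp.Completed, err)
}
```

On restart, a generator would load the checkpoint and skip the first `Completed` items instead of starting over.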
Provider Agnostic
Native support for OpenAI, DeepSeek, vLLM, and Ollama.
Strict Schemas
Built-in validation for Instruction, Chat, and Preference datasets.
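As a rough illustration of what record validation can look like, the sketch below checks Instruction and Preference records in Go. The struct and field names follow common dataset conventions and are assumptions, not KothaSet's real schema. The Chat format is omitted for brevity.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Instruction and Preference are illustrative record shapes (assumed fields).
type Instruction struct {
	Instruction string
	Output      string
}

type Preference struct {
	Prompt   string
	Chosen   string
	Rejected string
}

// blank reports whether s is empty or whitespace only.
func blank(s string) bool { return strings.TrimSpace(s) == "" }

// Validate rejects instruction records with a missing field.
func (r Instruction) Validate() error {
	if blank(r.Instruction) || blank(r.Output) {
		return errors.New("instruction record: instruction and output are required")
	}
	return nil
}

// Validate rejects preference records with a missing field or identical answers.
func (r Preference) Validate() error {
	if blank(r.Prompt) || blank(r.Chosen) || blank(r.Rejected) {
		return errors.New("preference record: prompt, chosen and rejected are required")
	}
	if r.Chosen == r.Rejected {
		return errors.New("preference record: chosen and rejected must differ")
	}
	return nil
}

func main() {
	bad := Preference{Prompt: "Explain goroutines", Chosen: "same", Rejected: "same"}
	fmt.Println(bad.Validate())
}
```

Rejecting malformed records at generation time keeps them out of the final dataset, which is cheaper than filtering after a long run.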
Define your data. Let KothaSet do the rest.
Configure
Set up your teacher model (e.g., GPT-4o) and output schema in a simple YAML file.
Seed
Provide a list of topics or seed prompts to ensure diversity and coverage across your domain.
Generate
Run the CLI. KothaSet handles retries, rate limits, and validation automatically.
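Automatic retries are often implemented with exponential backoff. The Go sketch below shows that general technique under stated assumptions: `withRetry`, `errTransient`, and the attempt and delay values are hypothetical and do not describe KothaSet's internal retry policy.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTransient stands in for a retryable failure such as a timeout or HTTP 429.
var errTransient = errors.New("transient failure")

// withRetry (hypothetical helper) calls fn up to attempts times,
// doubling the wait between consecutive attempts.
func withRetry(attempts int, base time.Duration, fn func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	delay := base
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(delay)
			delay *= 2 // exponential backoff
		}
	}
	return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	err := withRetry(4, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errTransient // fail the first two attempts
		}
		return nil
	})
	fmt.Println(calls, err)
}
```

Production clients usually add random jitter to the delay and retry only on errors that are known to be transient.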
Ready to build your dataset?
Open source, free to use, and ready for your next fine-tuning project.