What we're imagining.
Three artifacts compose the Kōzu lab — fine-tuned models, the apparatus that trains them, and the datasets that shape them.
Intelligence, distilled.
Deimos A4
A 4.66B reasoning specialist built on Qwen3.5-4B. A terse, internal chain-of-thought yields ~60% fewer tokens, ~36% faster inference, and +40 pt average accuracy on hard math versus the base model. Distilled from 4,338 shortest-correct traces curated from Deimos-A1.
Europa
Our upcoming medium-sized model, designed to explore how intelligence scales per parameter while further refining the experimental reasoning techniques from Deimos.
Ganymede
Our upcoming flagship model — scaling the techniques we've refined into a model tested and validated for real-world scenarios where reasoning efficiency is key.
Instruments of the craft.
Hadron
An LLM distillation framework built around NousResearch's AutoReason tournament refinement. A single teacher answers, critiques, adversarially revises, synthesizes, and blind-Borda ranks itself until "do nothing" wins twice. Distillation labels measurably beat the teacher's own single-shot output. Full reasoning traces per role — drop them straight into process-supervision fine-tuning.
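The refinement loop above can be sketched in miniature. This is an illustrative Python sketch, not Hadron's implementation: `revise` and `rank` are hypothetical stand-ins for teacher-model calls, stubbed deterministically so the control flow is runnable. The core idea shown is the one named above: blind ballots are Borda-scored each round, and the loop halts once "do nothing" (keep the current answer) wins twice.

```python
def borda_scores(ballots):
    """Each ballot ranks candidate indices best-to-worst; Borda awards
    (n - 1 - position) points per placement."""
    n = len(ballots[0])
    scores = [0] * n
    for ballot in ballots:
        for pos, cand in enumerate(ballot):
            scores[cand] += n - 1 - pos
    return scores

def tournament(initial_answer, revise, rank, max_rounds=10):
    """Each round: propose a revision, collect blind ballots over
    [keep current ("do nothing"), take revision], Borda-score them.
    Stop once "do nothing" wins two rounds in a row."""
    current = initial_answer
    noop_wins = 0
    for _ in range(max_rounds):
        candidate = revise(current)
        ballots = rank([current, candidate])  # index 0 = do nothing
        scores = borda_scores(ballots)
        if scores[0] >= scores[1]:
            noop_wins += 1
            if noop_wins == 2:
                break
        else:
            noop_wins = 0
            current = candidate
    return current
```

With stub judges that favor the revision until the answer reaches a fixed length, the loop converges and then terminates on two consecutive "do nothing" wins.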
Tokamak
Extracts reasoning traces from LLM conversations and compresses them into a super token-efficient stream of internal chain-of-thought and concise outputs. Built for generating tight, high-signal training data from long, branching dialogues — without losing the reasoning that got you there.
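One way to picture the compression step: keep only the turns on the path that led to the accepted answer, and reduce each kept turn to its reasoning line. A minimal Python sketch, assuming a hypothetical turn schema (`id`, `parent`, `reasoning`, `text`) — not Tokamak's actual format:

```python
def compress_trace(turns, accepted_id):
    """Walk parent links from the accepted turn back to the root,
    dropping abandoned branches, then emit the surviving reasoning
    lines in order plus the final answer text."""
    by_id = {t["id"]: t for t in turns}
    path = []
    cur = by_id[accepted_id]
    while cur is not None:
        path.append(cur)
        cur = by_id.get(cur["parent"])  # root has parent=None
    path.reverse()
    cot = " -> ".join(t["reasoning"] for t in path if t["reasoning"])
    return {"cot": cot, "answer": path[-1]["text"]}
```

A dead-end sibling branch simply never appears on the parent path, so it costs zero tokens in the compressed stream.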
Stellarator
A control plane for fine-tuning and reinforcement-learning workloads on Tinker. Sandbox runs feed a structured pre-flight gate before promotion to scale, with cost projections, budgets, and live alert streams threaded through every step. A Rust supervisor handles per-job polling and websocket fan-out; an integrated research subsystem cites HF papers, arXiv, and code examples per run.
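The supervisor pattern described above — one poller per job feeding a single fan-out stage — can be sketched compactly. This is a hypothetical Python/asyncio sketch, not Stellarator's Rust code; `fetch_status` stands in for a Tinker API call, and list appends stand in for websocket sends:

```python
import asyncio

async def poll_job(job_id, fetch_status, updates, interval=0.01, rounds=3):
    """Poll one job's status on a fixed interval, pushing each update
    onto a shared queue."""
    for _ in range(rounds):
        updates.put_nowait((job_id, await fetch_status(job_id)))
        await asyncio.sleep(interval)

async def fan_out(updates, subscribers, total):
    """Broadcast each queued update to every subscriber."""
    for _ in range(total):
        item = await updates.get()
        for sub in subscribers:
            sub.append(item)  # stand-in for a websocket send

async def main():
    updates = asyncio.Queue()
    subs = [[], []]  # two connected listeners

    async def fetch_status(job_id):  # stand-in for the real status call
        return "running"

    await asyncio.gather(
        poll_job("job-a", fetch_status, updates, rounds=2),
        poll_job("job-b", fetch_status, updates, rounds=2),
        fan_out(updates, subs, total=4),
    )
    return subs
```

The queue decouples per-job polling cadence from broadcast, so a slow listener never stalls a poller.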
Signal, isolated.
Quark
Our first dataset — built for concise chain-of-thought reasoning (CCoT) and token efficiency. Packs additional reasoning steps inside the same output footprint, so models think further per token instead of spending tokens to think.
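The "think further per token" framing can be made concrete with a toy steps-per-token metric. A hypothetical illustration only — whitespace splitting stands in for real tokenization, and the metric is not from the Quark source:

```python
def steps_per_token(trace, sep="; "):
    """Ratio of discrete reasoning steps to (crudely tokenized) length:
    higher means more reasoning packed into the same footprint."""
    steps = [s for s in trace.split(sep) if s]
    tokens = trace.split()
    return len(steps) / len(tokens)

# Same three-step derivation, verbose vs. concise.
verbose = ("First, we note that x equals 2; "
           "Next, we substitute x into the equation; "
           "Finally, we simplify to get 8")
concise = "x=2; sub into eq; simplify: 8"
```

Both traces carry three steps, but the concise form spends far fewer tokens doing it — the density CCoT training optimizes for.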
Lepton
Theoretical: a drafter dataset tuned for speculative decoding, training tiny draft models to predict well enough to meaningfully accelerate large reasoning LLMs.
Muon
Theoretical: an ephemeral-context dataset focused on fast prompt routing and activation speed in sparse Mixture-of-Experts architectures.
Baryon
Theoretical: a composite reinforcement-learning dataset rich in agentic chain-of-thought structures — reasoning lift without expanding parameter count.
Neutrino
Theoretical: a pruning-aware sparsification dataset. Trains models to hold accuracy while safely dropping inactive weights.
Boson · Gluon · Photon
Names reserved for datasets that haven't earned their scope yet. When one does, it gets promoted out of this cell.