Ship AI-written code you can actually audit.
Ortim is a disciplined, multi-agent AI software factory. Turn a one-paragraph brief into working, reviewed, audit-trailed code — without surrendering control to the LLM.
Why this exists
Bare-LLM coding fails the same way every time
Ortim treats AI coding as a production-engineering problem, not a prompting problem. Left ungoverned, a bare LLM fails in familiar ways:
- It re-implements code you already have because it forgot.
- It tries a fix, the fix breaks something else, and three turns later the original intent is gone.
- It quietly invents library names. It silently skips tests. It says it ran the migration when it didn’t.
- It picks microservices for a CRUD app, then asks you to approve the same decision four times in one session.
- You can’t audit any of it — there’s no record of why a choice was made, only the final diff.
How it works
A deterministic pipeline, not a chat loop
A state machine drives every run. Two human gates stand between the brief and the code.
- Step 1: brief. One paragraph, any language.
- Step 2: PRD. Babel + Analyst (G1 human gate).
- Step 3: RFC. Architect + deterministic tier (G2 human gate).
- Step 4: task DAG. Orchestrator; scope-locked, RFC-validated.
- Step 5: reviewed code. Worker × N + Reviewer chain.
The LLM never picks a tier, and every task DAG is validated at runtime: a cycle, a missing dependency, or an off-RFC module sends the plan back for up to 3 retries, then escalates to a human.
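To make the DAG check concrete, here is a minimal Python sketch of what runtime validation with bounded retries could look like. It is not Ortim's code: `Task`, `validate_dag`, `plan_with_retries`, and the `propose` callback are hypothetical names, and the sketch assumes "up to 3×" means three planning attempts in total.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """Hypothetical task record; the field names are assumptions."""
    task_id: str
    module: str
    depends_on: list[str] = field(default_factory=list)


def validate_dag(tasks: list[Task], rfc_modules: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the DAG is acceptable."""
    by_id = {t.task_id: t for t in tasks}
    problems = []
    for t in tasks:
        if t.module not in rfc_modules:
            problems.append(f"{t.task_id}: module '{t.module}' is not in the RFC")
        for dep in t.depends_on:
            if dep not in by_id:
                problems.append(f"{t.task_id}: missing dependency '{dep}'")

    # Kahn's algorithm: if some tasks can never be scheduled, there is a cycle.
    deps = {t.task_id: {d for d in t.depends_on if d in by_id} for t in tasks}
    indegree = {tid: len(d) for tid, d in deps.items()}
    ready = [tid for tid, n in indegree.items() if n == 0]
    ordered = 0
    while ready:
        current = ready.pop()
        ordered += 1
        for tid, d in deps.items():
            if current in d:
                indegree[tid] -= 1
                if indegree[tid] == 0:
                    ready.append(tid)
    if ordered != len(by_id):
        problems.append("dependency cycle detected")
    return problems


def plan_with_retries(propose, rfc_modules: set[str], max_attempts: int = 3):
    """Ask the planner for a DAG, feeding validation errors back each round."""
    feedback: list[str] = []
    for _ in range(max_attempts):
        tasks = propose(feedback)
        feedback = validate_dag(tasks, rfc_modules)
        if not feedback:
            return "accepted", tasks
    # The budget is spent: hand the unresolved problems to a person.
    return "escalate_to_human", feedback
```

The point of the design is that the validator is plain code: the planner can propose anything, but only a DAG that passes these checks moves forward.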
What you get
Structural fixes, not better prompts
The LLM never picks the architecture
An agent extracts characteristics from the PRD; a rule-based scorer over 12 canonical tiers (web / mobile / desktop) makes the actual choice. Same inputs, same architecture, every time.
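A rough Python sketch of the idea, under loud assumptions: the three tier names, the `Characteristics` fields, and the scoring rules below are invented for illustration and are not Ortim's 12 canonical tiers. What the sketch shows is the property that matters, namely that the choice is a pure function of the extracted characteristics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Characteristics:
    """Hypothetical output of the extraction agent."""
    platform: str        # "web", "mobile", or "desktop"
    expected_users: int
    needs_offline: bool


# (tier name, platform, scoring rule). All three entries are made up.
RULES = [
    ("web-monolith", "web", lambda c: 2 if c.expected_users < 10_000 else 0),
    ("web-modular-services", "web", lambda c: 2 if c.expected_users >= 10_000 else 0),
    ("mobile-offline-first", "mobile", lambda c: 2 if c.needs_offline else 1),
]


def choose_tier(c: Characteristics) -> str:
    """Pick the highest-scoring tier for the platform; no LLM call involved."""
    candidates = [(rule(c), name) for name, platform, rule in RULES
                  if platform == c.platform]
    if not candidates:
        raise ValueError(f"no tier defined for platform '{c.platform}'")
    # Highest score wins; ties break alphabetically so the result is stable.
    _, best_name = min(candidates, key=lambda pair: (-pair[0], pair[1]))
    return best_name
```

Because the scorer has no randomness and no model in the loop, replaying the same PRD characteristics yields the same tier.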
Hash-chained audit log
Every LLM call, state transition, gate decision, and hook output lands in a tamper-evident JSONL. `ortim audit-verify` detects any edit after the fact.
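For readers new to hash chaining, a minimal Python sketch of the general technique follows. The record layout (`prev`, `event`, `hash`) and the function names are assumptions for illustration; this is not Ortim's log format, and it does not describe how `ortim audit-verify` is implemented.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder hash for the first entry


def _digest(prev_hash: str, event: dict) -> str:
    """Hash the event together with the previous entry's hash."""
    payload = json.dumps({"prev": prev_hash, "event": event},
                         sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_entry(lines: list[str], event: dict) -> None:
    """Append one JSONL record that commits to everything before it."""
    prev_hash = json.loads(lines[-1])["hash"] if lines else GENESIS
    record = {"prev": prev_hash, "event": event,
              "hash": _digest(prev_hash, event)}
    lines.append(json.dumps(record, sort_keys=True))


def verify_chain(lines: list[str]):
    """Return the index of the first inconsistent line, or None if intact."""
    prev_hash = GENESIS
    for i, line in enumerate(lines):
        record = json.loads(line)
        expected = _digest(prev_hash, record["event"])
        if record["prev"] != prev_hash or record["hash"] != expected:
            return i
        prev_hash = record["hash"]
    return None
```

Editing any earlier record changes its hash, which no longer matches what later records committed to. A chain on its own does not reveal entries trimmed from the end, so a real system would also need to record the latest hash somewhere else.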
Two mandatory human gates
G1 (PRD) and G2 (RFC) require explicit approval — the LLM never silently commits to a scope or an architecture. Five conditional gates fire only when relevant.
Reviewer chain that can’t fake a pass
Code → Security → Test → Perf. A missing test runner trips a distinct “unverifiable” mode — it can never be laundered into a false approval.
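One way to picture the "unverifiable" mode is as a third verdict that sits beside pass and fail, so a reviewer that cannot check something has no way to report success. The Python sketch below is illustrative only: `Verdict`, `run_chain`, the stub reviewers, and the `test_runner` / `tests_passed` keys are hypothetical.

```python
from enum import Enum


class Verdict(Enum):
    PASS = "pass"
    FAIL = "fail"
    UNVERIFIABLE = "unverifiable"  # could not be checked; never equals PASS


def review_tests(change: dict) -> Verdict:
    """Stub test reviewer: no runner means no opinion, not approval."""
    if change.get("test_runner") is None:
        return Verdict.UNVERIFIABLE
    return Verdict.PASS if change.get("tests_passed") else Verdict.FAIL


def always_pass(change: dict) -> Verdict:
    """Placeholder standing in for the other reviewers."""
    return Verdict.PASS


# Order taken from the text: Code, Security, Test, Perf.
CHAIN = [("code", always_pass), ("security", always_pass),
         ("test", review_tests), ("perf", always_pass)]


def run_chain(reviewers, change: dict):
    """Stop at the first reviewer that does not return PASS."""
    for name, review in reviewers:
        verdict = review(change)
        if verdict is not Verdict.PASS:
            return name, verdict
    return None, Verdict.PASS
```

With this shape, `run_chain(CHAIN, {"test_runner": None})` stops at the test reviewer with `Verdict.UNVERIFIABLE`, and approval requires every reviewer to return `PASS` explicitly.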
Scope-locked, sandboxed tasks
Each task carries a module_scope; the sandbox rejects writes outside it. A 3-attempt budget with reviewer feedback escalates to a human instead of spinning forever.
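A scope lock of this kind can be sketched as a path check performed before any write. The Python below is an assumption-laden illustration, not Ortim's sandbox: it treats `module_scope` as a directory relative to the workspace, and `check_write` and `ScopeViolation` are hypothetical names. It needs Python 3.9+ for `Path.is_relative_to`.

```python
from pathlib import Path


class ScopeViolation(Exception):
    """Raised when a task tries to write outside its module scope."""


def check_write(workspace: Path, module_scope: str, target: str) -> Path:
    """Resolve the target and refuse it unless it lives under the scope."""
    scope_root = (workspace / module_scope).resolve()
    # Resolving first collapses ".." segments, so traversal tricks are caught.
    resolved = (workspace / target).resolve()
    if not resolved.is_relative_to(scope_root):
        raise ScopeViolation(f"{target} is outside scope '{module_scope}'")
    return resolved
```

The comparison is done on path components rather than string prefixes, so a sibling directory such as `src/billing_extra` does not slip through a `src/billing` scope.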
Multi-provider routing
DeepSeek for the cheap bulk, Anthropic where judgement matters, Ollama for zero API cost. A full planning chain costs $0.02–$0.10.
Ortim Cloud
Sync runs, audit trails, and team seats to a shared workspace. The CLI stays the source of truth; the cloud adds collaboration and retention for teams who need to answer “who approved this, and why?”
Open Ortim Cloud
Not an IDE
If you love Cursor or Claude Code for interactive work, Ortim is the opposite end: batch, gated, auditable. Works on greenfield and existing codebases — brownfield mode scans the import graph. 800+ tests. The bet is that teams will trade some interactivity for governance.
Read the concepts