ortim.dev
Open source · FSL-1.1 · Python ≥ 3.11

Ship AI-written code you can actually audit.

Ortim is a disciplined, multi-agent AI software factory. Turn a one-paragraph brief into working, reviewed, audit-trailed code — without surrendering control to the LLM.

$ pip install ortim

Why this exists

Bare-LLM coding fails the same way every time

Ortim treats AI coding as a production-engineering problem, not a prompting problem. Left unsupervised, an LLM fails in predictable ways:

  • It re-implements code you already have because it forgot.
  • It tries a fix, the fix breaks something else, and three turns later the original intent is gone.
  • It quietly invents library names. It silently skips tests. It says it ran the migration when it didn’t.
  • It picks microservices for a CRUD app, then asks you to approve the same decision four times in one session.
  • You can’t audit any of it — there’s no record of why a choice was made, only the final diff.

How it works

A deterministic pipeline, not a chat loop

A state machine drives every run. Two human gates stand between the brief and the code.

  1. brief: one paragraph, any language
  2. PRD: Babel + Analyst, G1 human gate
  3. RFC: Architect + deterministic tier, G2 human gate
  4. task DAG: Orchestrator, scope-locked and RFC-validated
  5. reviewed code: Worker × N + Reviewer chain

The LLM never picks a tier, and every DAG is validated at runtime: one with cycles, missing dependencies, or off-RFC modules is retried up to 3×, then escalated to a human.
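The runtime DAG check can be pictured with a minimal sketch. This is not Ortim's implementation: the task shape (`module`, `depends_on`), the `validate_dag` and `plan_with_retries` helpers, and the `EscalateToHuman` exception are all hypothetical, and "up to 3×" is read here as three attempts in total.

```python
from graphlib import CycleError, TopologicalSorter

MAX_ATTEMPTS = 3  # assumption: "up to 3x" means three attempts in total


class EscalateToHuman(Exception):
    """Hypothetical signal that the planner has run out of attempts."""


def validate_dag(tasks: dict[str, dict], rfc_modules: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the DAG is acceptable."""
    problems: list[str] = []
    for name, task in tasks.items():
        for dep in task["depends_on"]:
            if dep not in tasks:
                problems.append(f"{name}: missing dependency {dep!r}")
        if task["module"] not in rfc_modules:
            problems.append(f"{name}: module {task['module']!r} is not in the RFC")
    graph = {name: set(task["depends_on"]) for name, task in tasks.items()}
    try:
        # static_order() raises CycleError when the graph contains a cycle.
        list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        problems.append(f"cycle detected: {exc.args[1]}")
    return problems


def plan_with_retries(propose, rfc_modules: set[str]) -> dict[str, dict]:
    """Call propose(attempt) until it yields a valid DAG or the budget runs out."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        tasks = propose(attempt)
        if not validate_dag(tasks, rfc_modules):
            return tasks
    raise EscalateToHuman(f"DAG still invalid after {MAX_ATTEMPTS} attempts")
```

The point of the sketch is that validation is ordinary code with a fixed answer, so a bad plan cannot talk its way past it.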

What you get

Structural fixes, not better prompts

The LLM never picks the architecture

An agent extracts characteristics from the PRD; a rule-based scorer over 12 canonical tiers (web / mobile / desktop) makes the actual choice. Same inputs, same architecture, every time.
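To illustrate what "rule-based scorer" means in practice, here is a small sketch. Everything in it is an assumption made for illustration: the three tier names, the fields on `Characteristics`, and the point values are invented, and Ortim's real catalogue of 12 tiers and its scoring rules are not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Characteristics:
    """Hypothetical facts an agent might extract from the PRD."""
    platform: str        # "web", "mobile", or "desktop"
    expected_users: int
    realtime: bool


@dataclass(frozen=True)
class Tier:
    name: str
    platform: str
    max_users: int
    supports_realtime: bool


# Three made-up tiers; names and thresholds are illustrative only.
TIERS = (
    Tier("web-monolith", "web", 10_000, False),
    Tier("web-modular", "web", 1_000_000, True),
    Tier("mobile-offline-first", "mobile", 1_000_000, True),
)


def score(tier: Tier, c: Characteristics) -> int:
    """Pure function of its inputs: no randomness and no model call."""
    if tier.platform != c.platform:
        return -1  # wrong platform, never eligible
    points = 0
    if c.expected_users <= tier.max_users:
        points += 2
    if c.realtime == tier.supports_realtime:
        points += 1
    return points


def choose_tier(c: Characteristics) -> Tier:
    eligible = [t for t in TIERS if score(t, c) >= 0]
    if not eligible:
        raise ValueError(f"no tier for platform {c.platform!r}")
    # Highest score wins; ties break on name so the result is stable.
    return min(eligible, key=lambda t: (-score(t, c), t.name))
```

Because `choose_tier` depends only on its argument and a fixed table, calling it twice with the same characteristics returns the same tier.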

Hash-chained audit log

Every LLM call, state transition, gate decision, and hook output lands in a tamper-evident JSONL. `ortim audit-verify` detects any edit after the fact.
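The general idea of a hash-chained JSONL log can be sketched in a few lines. This is a generic illustration, not Ortim's on-disk format or the logic behind `ortim audit-verify`: the record fields (`event`, `prev`, `hash`), the all-zero genesis value, and the choice of SHA-256 are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first record's "prev"


def _digest(prev_hash: str, event: dict) -> str:
    # Canonical JSON so the same event always hashes to the same value.
    payload = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(log: list[str], event: dict) -> None:
    """Append one JSONL line whose hash covers the previous line's hash."""
    prev_hash = json.loads(log[-1])["hash"] if log else GENESIS
    record = {"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)}
    log.append(json.dumps(record, sort_keys=True))


def verify(log: list[str]) -> bool:
    """Recompute the chain; any edited or reordered line breaks it."""
    prev_hash = GENESIS
    for line in log:
        record = json.loads(line)
        if record["prev"] != prev_hash:
            return False
        if record["hash"] != _digest(prev_hash, record["event"]):
            return False
        prev_hash = record["hash"]
    return True
```

Each record commits to the one before it, so changing an old entry invalidates every hash that follows. A bare chain like this does not notice lines removed from the end; detecting that needs the latest hash stored somewhere else.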

Two mandatory human gates

G1 (PRD) and G2 (RFC) require explicit approval — the LLM never silently commits to a scope or an architecture. Five conditional gates fire only when relevant.
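A gated state machine of this kind might look like the sketch below. The stage names come from the pipeline described on this page; the `State` enum, the `advance` function, and the way approvals are passed in are hypothetical, and the five conditional gates are left out.

```python
from enum import Enum


class State(Enum):
    BRIEF = "brief"
    PRD = "prd"
    RFC = "rfc"
    TASK_DAG = "task_dag"
    REVIEWED_CODE = "reviewed_code"


# Fixed transition table: the run can only move forward one stage at a time.
NEXT = {
    State.BRIEF: State.PRD,
    State.PRD: State.RFC,
    State.RFC: State.TASK_DAG,
    State.TASK_DAG: State.REVIEWED_CODE,
}

# Leaving the PRD stage needs G1; leaving the RFC stage needs G2.
GATES = {State.PRD: "G1", State.RFC: "G2"}


def advance(state: State, approvals: set[str]) -> State:
    """Move to the next stage, refusing if a required gate is unapproved."""
    gate = GATES.get(state)
    if gate is not None and gate not in approvals:
        raise PermissionError(f"{gate} has not been approved by a human")
    if state not in NEXT:
        raise ValueError(f"{state.value} is the final stage")
    return NEXT[state]
```

In this sketch the only way past the PRD or RFC stage is an approval recorded outside the model, which is the property the two mandatory gates are meant to guarantee.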

Reviewer chain that can’t fake a pass

Code → Security → Test → Perf. A missing test runner trips a distinct “unverifiable” mode — it can never be laundered into a false approval.

Scope-locked, sandboxed tasks

Each task carries a module_scope; the sandbox rejects writes outside it. A 3-attempt budget with reviewer feedback escalates to a human instead of spinning forever.
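A scope check of this kind can be sketched with `pathlib`. The field name `module_scope` comes from the text above; the `check_write` function, the `ScopeViolation` exception, and the idea that a scope is a directory path relative to the workspace are assumptions made for the example.

```python
from pathlib import Path


class ScopeViolation(Exception):
    """Hypothetical error raised when a task writes outside its scope."""


def check_write(workspace: Path, module_scope: str, target: str) -> Path:
    """Return the resolved target path, or raise if it escapes the scope."""
    root = (workspace / module_scope).resolve()
    # resolve() collapses ".." segments, so traversal tricks are caught here.
    resolved = (workspace / target).resolve()
    # is_relative_to compares whole path components, so "src/billing-old"
    # is not treated as being inside "src/billing".
    if not resolved.is_relative_to(root):
        raise ScopeViolation(f"{target!r} is outside module_scope {module_scope!r}")
    return resolved
```

For example, a task scoped to `src/billing` may write `src/billing/api.py`, while `src/billing/../auth/tokens.py` is rejected because it resolves to a path under `src/auth`.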

Multi-provider routing

DeepSeek for the cheap bulk, Anthropic where judgement matters, Ollama for zero API cost. A full planning chain costs $0.02–$0.10.
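As a rough picture of routing by call type, consider the sketch below. The provider names come from the text above; the call kinds (`"bulk"`, `"judgement"`), the `local_only` switch, and the `pick_provider` function are hypothetical and say nothing about how Ortim actually configures routing.

```python
# Assumed mapping from the kind of call to the provider that handles it.
ROUTES = {
    "bulk": "deepseek",        # cheap, high-volume calls
    "judgement": "anthropic",  # calls where judgement matters
}
LOCAL_PROVIDER = "ollama"      # local models, zero API cost


def pick_provider(kind: str, local_only: bool = False) -> str:
    """Choose a provider for one call; local_only forces the local model."""
    if local_only:
        return LOCAL_PROVIDER
    try:
        return ROUTES[kind]
    except KeyError:
        raise ValueError(f"unknown call kind {kind!r}") from None
```

Keeping the table as plain data means the routing decision is visible and reviewable, in the same spirit as the rest of the pipeline.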

Ortim Cloud

Sync runs, audit trails, and team seats to a shared workspace. The CLI stays the source of truth; the cloud adds collaboration and retention for teams who need to answer “who approved this, and why?”

Open Ortim Cloud

Not an IDE

If you love Cursor or Claude Code for interactive work, Ortim sits at the opposite end of the spectrum: batch, gated, auditable. It works on greenfield and existing codebases; brownfield mode scans the import graph. The project is backed by 800+ tests. The bet is that teams will trade some interactivity for governance.

Read the concepts

From brief to reviewed code in one command flow

$ pip install ortim