Concepts

Ortim turns an unreliable LLM into a reviewable pipeline with a handful of structural guarantees. This page explains each one.

The pipeline

[brief]
   ↓  Babel        (any language → structured intent, token-frugal)
intent.json
   ↓  Analyst chain (IntentAnalyst + StackAnalyst + PRDAnalyst)
PRD.md + stack.json
   ↓  G1 — human approval (mandatory)
   ↓  Architect    (Call 1: scorer inputs; Call 2: RFC with module breakdown)
RFC.md + golden_path_inputs.json
   ↓  G2 — human approval (mandatory)
   ↓  Orchestrator (TaskDAG; Hard Rule 13: DAG ⊂ RFC modules)
task_dag.json + .ortim/tasks/T-NNN.md
   ↓  Worker × N   (FILE_BLOCK output, git branch per task)
   ↓  Reviewer chain (Code → Security → Test → Perf)
   ↓  Hooks        (pre_commit / pre_deploy)
DONE

The state machine

A deterministic state machine, not a free-form chat loop, drives every run:

intake → babel → intake_dialog → stack_dialog → prd_dialog → prd_drafting
       → prd_awaiting_approval → prd_approved
                 ↑ G1 (mandatory)
       → rfc_drafting → rfc_awaiting_approval → rfc_approved
                              ↑ G2 (mandatory)
       → tasks_generating → tasks_ready → executing → done
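As a rough illustration, the happy path above can be modeled as a lookup table in which each state has exactly one successor and the two mandatory gates block the transition until a human approves. This is a minimal sketch, not Ortim's implementation: the names `STATES`, `NEXT_STATE`, `HUMAN_GATES`, and `advance()` are hypothetical, and the conditional gates and `AWAITING_HITL` are left out.

```python
# Illustrative only: state names come from the diagram above; the table
# layout and the advance() helper are assumptions, not Ortim's code.
STATES = [
    "intake", "babel", "intake_dialog", "stack_dialog", "prd_dialog",
    "prd_drafting", "prd_awaiting_approval", "prd_approved",
    "rfc_drafting", "rfc_awaiting_approval", "rfc_approved",
    "tasks_generating", "tasks_ready", "executing", "done",
]

# Each state maps to its single successor; "done" has none.
NEXT_STATE = dict(zip(STATES, STATES[1:]))

# Entering these states requires the named mandatory gate.
HUMAN_GATES = {"prd_approved": "G1", "rfc_approved": "G2"}


def advance(current: str, human_approved: bool = False) -> str:
    """Return the only legal successor of `current`, enforcing G1 and G2."""
    if current not in NEXT_STATE:
        raise ValueError(f"no transition out of {current!r}")
    target = NEXT_STATE[current]
    gate = HUMAN_GATES.get(target)
    if gate is not None and not human_approved:
        raise PermissionError(f"{gate} needs explicit human approval")
    return target
```

Because the successor is looked up rather than chosen, there is no point at which a model can skip a gate or invent a state.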

Human gates

Two gates are mandatory:

G1 — PRD. Locks scope (MVP vs deferred) before any architecture work.
G2 — RFC. Locks the architecture before any code is written.

Five conditional gates fire only when relevant, each pausing the task in AWAITING_HITL until you run ortim advance <state>_approved:

| Gate | Fires when |
|---|---|
| G3 | A schema / migration is involved |
| G4 | An external API call is introduced |
| G5 | A security finding of severity ≥ medium |
| G6 | A deploy step |
| G7 | A budget cap is reached |
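One way to picture the table is as a set of independent predicates over facts about a task. The sketch below is an assumption-heavy illustration: the `TaskFacts` fields, the `gates_for()` helper, and the four-level severity scale are all hypothetical and only mirror the conditions listed above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed severity scale; the text only says "severity >= medium".
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


@dataclass
class TaskFacts:
    """Hypothetical facts a gate check might look at."""
    touches_schema: bool = False
    adds_external_api_call: bool = False
    max_security_severity: Optional[str] = None  # None means no finding
    is_deploy_step: bool = False
    budget_cap_reached: bool = False


def gates_for(facts: TaskFacts) -> List[str]:
    """Return the conditional gates (G3 to G7) that would fire."""
    gates = []
    if facts.touches_schema:
        gates.append("G3")
    if facts.adds_external_api_call:
        gates.append("G4")
    severity = facts.max_security_severity
    if severity is not None and (
        SEVERITY_ORDER.index(severity) >= SEVERITY_ORDER.index("medium")
    ):
        gates.append("G5")
    if facts.is_deploy_step:
        gates.append("G6")
    if facts.budget_cap_reached:
        gates.append("G7")
    return gates
```

A task with no schema change, no new external call, and only a low-severity finding returns an empty list, so no prompt is shown.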

This kills approval fatigue: you approve scope and architecture once, then only see a prompt again when something genuinely warrants a human.

The LLM never picks the architecture

This is the core invariant. The Architect agent does not choose a tier. Instead:

Architect Call 1 emits parameters extracted from the PRD.
A rule-based scorer (ortim/architecture/golden_paths.py) picks the architecture across 12 canonical tiers — T0–T6 (web), M0–M2 (mobile), D0–D1 (desktop).

Same inputs produce the same architecture, every time. No microservices for a CRUD app.
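The determinism claim follows from the scorer being a pure function of its inputs. The sketch below shows the shape of such a function only: `ScorerInputs`, `complexity_points`, and the "one tier per 10 points" rule are invented for illustration and are not the rules in `golden_paths.py`. The tier names and counts are the ones listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScorerInputs:
    platform: str           # "web", "mobile", or "desktop"
    complexity_points: int  # hypothetical aggregate of PRD-derived parameters


# The 12 canonical tiers named in the text.
TIERS = {
    "web": ["T0", "T1", "T2", "T3", "T4", "T5", "T6"],
    "mobile": ["M0", "M1", "M2"],
    "desktop": ["D0", "D1"],
}


def pick_tier(inputs: ScorerInputs) -> str:
    """Pure, rule-based choice: no randomness and no model call."""
    tiers = TIERS[inputs.platform]
    # Hypothetical rule: one tier per 10 points, clamped to the top tier.
    index = min(max(inputs.complexity_points, 0) // 10, len(tiers) - 1)
    return tiers[index]
```

Whatever the real rules are, the property that matters is the same: the LLM only supplies the inputs, and the mapping from inputs to tier is ordinary, reviewable code.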

Scope-locked, runtime-validated DAGs

The Orchestrator emits a task DAG constrained to the RFC's modules (Hard Rule 13: DAG ⊂ RFC modules). DAGs are validated at runtime: if the LLM emits a cycle, a missing dependency, or an off-RFC module, the validators reject the DAG and generation is retried up to 3×. After that, the run escalates to a human instead of producing broken work.
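The three failure cases named above can each be checked mechanically. The following is a minimal sketch using the standard library's `graphlib`; the `validate_dag()` helper and the task dictionary shape (`module`, `depends_on`) are assumptions for illustration, not the layout of `task_dag.json`.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set


def validate_dag(tasks: Dict[str, dict], rfc_modules: Set[str]) -> List[str]:
    """Return a list of problems; an empty list means the DAG is acceptable."""
    errors = []
    for task_id, task in tasks.items():
        # Off-RFC module: the task targets something the RFC never declared.
        if task["module"] not in rfc_modules:
            errors.append(f"{task_id}: module {task['module']!r} is not in the RFC")
        # Missing dependency: the task points at an unknown task id.
        for dep in task["depends_on"]:
            if dep not in tasks:
                errors.append(f"{task_id}: missing dependency {dep!r}")

    # Cycle check over known tasks only, so a missing dependency is not
    # reported a second time as a graph problem.
    graph = {
        task_id: [dep for dep in task["depends_on"] if dep in tasks]
        for task_id, task in tasks.items()
    }
    try:
        tuple(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        errors.append(f"cycle detected: {exc.args[1]}")
    return errors
```

A caller could feed the returned messages back into the next generation attempt and escalate once the retry budget is spent.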

Each task carries a module_scope; the sandbox rejects writes outside it. Cross-module use happens through imports, not stray file creation.
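A scope check of this kind can be done on path components rather than string prefixes, so that `src/billing_v2` is not mistaken for part of `src/billing`. This is a sketch under assumptions: `is_write_allowed()` is a hypothetical helper, and it treats `module_scope` as a single relative directory, which the text does not specify.

```python
from pathlib import PurePosixPath


def is_write_allowed(path: str, module_scope: str) -> bool:
    """Allow a write only if `path` stays inside the task's module scope."""
    target = PurePosixPath(path)
    # Refuse absolute paths and any attempt to climb out with "..".
    if target.is_absolute() or ".." in target.parts:
        return False
    # Component-wise containment check (Python 3.9+).
    return target.is_relative_to(PurePosixPath(module_scope))
```

For example, `is_write_allowed("src/billing/invoice.py", "src/billing")` is accepted, while a write to `src/auth/session.py` from the same task is rejected.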

The reviewer chain

Every task's output runs through Code → Security → Test → Perf reviewers with rubric-shaped verdicts. The chain treats unverifiable differently from pass: a missing test runner trips a distinct mode, so it can never be laundered into a false approval.
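The distinction between unverifiable and pass amounts to a three-valued verdict in which only an explicit pass from every reviewer counts as approval. The sketch below illustrates that idea; the `Verdict` enum and `chain_verdict()` are hypothetical names, and the real chain may short-circuit or report results differently.

```python
from enum import Enum
from typing import Dict


class Verdict(Enum):
    PASS = "pass"
    FAIL = "fail"
    UNVERIFIABLE = "unverifiable"


# Reviewer order from the text.
CHAIN = ("code", "security", "test", "perf")


def chain_verdict(verdicts: Dict[str, Verdict]) -> Verdict:
    """Combine per-reviewer verdicts; anything short of all-PASS is not a pass."""
    results = [verdicts[name] for name in CHAIN]  # KeyError if a reviewer is absent
    if Verdict.FAIL in results:
        return Verdict.FAIL
    if Verdict.UNVERIFIABLE in results:
        # For example, no test runner was available: report it, never approve it.
        return Verdict.UNVERIFIABLE
    return Verdict.PASS
```

With this shape, a task whose tests could not be run surfaces as `UNVERIFIABLE` even when the other three reviewers pass.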

Failing tasks get a 3-attempt budget with reviewer feedback injected into each retry, then escalate to AWAITING_HITL rather than spinning forever.
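The retry policy can be sketched as a bounded loop that carries reviewer feedback forward. Everything here is illustrative: `run_with_retries()`, the `worker` and `review` callables, and the use of `"DONE"` as the success state name are assumptions, while the 3-attempt budget and `AWAITING_HITL` come from the text.

```python
from typing import Callable, List, Tuple

MAX_ATTEMPTS = 3  # attempt budget stated in the text


def run_with_retries(
    worker: Callable[[List[str]], str],
    review: Callable[[str], Tuple[bool, str]],
) -> str:
    """Run a task up to MAX_ATTEMPTS times, feeding reviewer notes into retries."""
    feedback: List[str] = []
    for _ in range(MAX_ATTEMPTS):
        # The worker sees all feedback gathered so far (a copy, so it cannot edit it).
        output = worker(list(feedback))
        passed, note = review(output)
        if passed:
            return "DONE"
        feedback.append(note)
    # Budget exhausted: hand off to a human instead of looping.
    return "AWAITING_HITL"
```

The loop has a fixed upper bound, so a task that never satisfies the reviewers costs at most three worker calls before a human is asked.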

Hash-chained audit log

Every LLM call, state transition, gate decision, and hook output lands in a hash-chained JSONL at .ortim/audit.jsonl. Because each entry chains the hash of the previous one, any edit after the fact is detectable:

ortim audit-verify

This is what lets a team answer "who approved this, and why?" months later.
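To see why a hash chain makes edits detectable, consider the minimal sketch below. It is not the format of `.ortim/audit.jsonl`: the field names (`prev`, `payload`, `hash`), the all-zero genesis value, and the choice of SHA-256 over canonical JSON are assumptions made for the example.

```python
import hashlib
import json
from typing import List

GENESIS = "0" * 64  # assumed placeholder for "no previous entry"


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the payload together with the previous entry's hash."""
    body = json.dumps(
        {"prev": prev_hash, "payload": payload},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(log: List[dict], payload: dict) -> None:
    """Add an entry whose hash covers the entry before it."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev, "payload": payload, "hash": entry_hash(prev, payload)})


def verify(log: List[dict]) -> bool:
    """Recompute every link; any edited, removed, or reordered entry breaks it."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True
```

Changing one payload changes its hash, which no longer matches the `prev` value stored in the next entry, so the tampering shows up even if the edited entry's own hash is recomputed.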

Greenfield and brownfield

ortim init in an empty directory starts greenfield. In a directory with a recognized manifest (package.json, pyproject.toml, Cargo.toml, go.mod, pubspec.yaml) it enters brownfield mode: import-graph extraction and scope-aware task generation against your existing code.
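The mode decision described above can be sketched as a check for known manifest filenames. The `detect_mode()` helper is hypothetical, and it assumes that any directory without a recognized manifest is treated as greenfield, which the text states only for empty directories.

```python
from typing import Iterable

# Manifests listed in the text.
MANIFESTS = frozenset(
    {"package.json", "pyproject.toml", "Cargo.toml", "go.mod", "pubspec.yaml"}
)


def detect_mode(filenames: Iterable[str]) -> str:
    """Return "brownfield" if any recognized manifest is present."""
    if MANIFESTS.intersection(filenames):
        return "brownfield"
    # Assumption: no recognized manifest means greenfield.
    return "greenfield"
```

In practice the filenames would come from a directory listing, for example `detect_mode(os.listdir("."))`.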