Concepts
Ortim turns an unreliable LLM into a reviewable pipeline with a handful of structural guarantees. This page explains each one.
The pipeline
[brief]
↓ Babel (any language → structured intent, token-frugal)
intent.json
↓ Analyst chain (IntentAnalyst + StackAnalyst + PRDAnalyst)
PRD.md + stack.json
↓ G1 — human approval (mandatory)
↓ Architect (Call 1: scorer inputs; Call 2: RFC with module breakdown)
RFC.md + golden_path_inputs.json
↓ G2 — human approval (mandatory)
↓ Orchestrator (TaskDAG; Hard Rule 13: DAG ⊂ RFC modules)
task_dag.json + .ortim/tasks/T-NNN.md
↓ Worker × N (FILE_BLOCK output, git branch per task)
↓ Reviewer chain (Code → Security → Test → Perf)
↓ Hooks (pre_commit / pre_deploy)
DONE
The state machine
A deterministic state machine drives every run — not a free-form chat loop:
intake → babel → intake_dialog → stack_dialog → prd_dialog → prd_drafting
→ prd_awaiting_approval → prd_approved
↑ G1 (mandatory)
→ rfc_drafting → rfc_awaiting_approval → rfc_approved
↑ G2 (mandatory)
→ tasks_generating → tasks_ready → executing → done
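One way to picture "deterministic" here is a fixed transition table in which every state has at most one successor and the two gated states refuse to be entered without a human decision. The sketch below is illustrative only: the state names are taken from the diagram above, while `NEXT`, `HUMAN_GATED`, and `advance()` are hypothetical names, not Ortim's actual internals.

```python
# Illustrative sketch; state names come from the diagram, everything else is assumed.
STATES = [
    "intake", "babel", "intake_dialog", "stack_dialog", "prd_dialog",
    "prd_drafting", "prd_awaiting_approval", "prd_approved",
    "rfc_drafting", "rfc_awaiting_approval", "rfc_approved",
    "tasks_generating", "tasks_ready", "executing", "done",
]

# Each state maps to exactly one successor, so a run cannot wander.
NEXT = dict(zip(STATES, STATES[1:]))

# Entering these states requires a human decision (gates G1 and G2).
HUMAN_GATED = {"prd_approved": "G1", "rfc_approved": "G2"}


def advance(current: str, human_approved: bool = False) -> str:
    """Return the successor of `current`, enforcing the mandatory gates."""
    if current not in NEXT:
        raise ValueError(f"no transition out of {current!r}")
    target = NEXT[current]
    gate = HUMAN_GATED.get(target)
    if gate and not human_approved:
        raise PermissionError(f"{gate} requires human approval before {target!r}")
    return target
```

In this sketch, `advance("prd_awaiting_approval")` raises until it is called with `human_approved=True`, which mirrors the role of G1 in the diagram.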
Human gates
Two gates are mandatory:
- G1 — PRD. Locks scope (MVP vs deferred) before any architecture work.
- G2 — RFC. Locks the architecture before any code is written.
Five conditional gates fire only when relevant, each pausing the task in AWAITING_HITL
until you run ortim advance <state>_approved:
| Gate | Fires when |
|---|---|
| G3 | A schema / migration is involved |
| G4 | An external API call is introduced |
| G5 | A security finding of severity ≥ medium |
| G6 | A deploy step |
| G7 | A budget cap is reached |
This kills approval fatigue: you approve scope and architecture once, then only see a prompt again when something genuinely warrants a human.
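The trigger conditions in the table above can be read as a pure function from facts about a task to the set of gates that fire. The following is a minimal sketch under stated assumptions: `TaskFacts`, its field names, and the four-level severity scale are hypothetical and chosen only to make the table executable.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed severity scale; the page only says "severity >= medium".
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


@dataclass
class TaskFacts:
    """Hypothetical summary of what a task touches."""
    touches_schema: bool = False
    adds_external_api_call: bool = False
    max_security_severity: Optional[str] = None
    is_deploy_step: bool = False
    budget_cap_reached: bool = False


def gates_for(facts: TaskFacts) -> List[str]:
    """Return the conditional gates (G3 to G7) that fire for these facts."""
    gates = []
    if facts.touches_schema:
        gates.append("G3")
    if facts.adds_external_api_call:
        gates.append("G4")
    severity = facts.max_security_severity
    if severity is not None and (
        SEVERITY_ORDER.index(severity) >= SEVERITY_ORDER.index("medium")
    ):
        gates.append("G5")
    if facts.is_deploy_step:
        gates.append("G6")
    if facts.budget_cap_reached:
        gates.append("G7")
    return gates
```

A task that touches nothing on the list yields an empty result, which is the "no prompt" case.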
The LLM never picks the architecture
This is the core invariant. The Architect agent does not choose a tier. Instead:
- Architect Call 1 emits parameters extracted from the PRD.
- A rule-based scorer (ortim/architecture/golden_paths.py) picks the architecture from 12 canonical tiers: T0–T6 (web), M0–M2 (mobile), D0–D1 (desktop).
Same inputs produce the same architecture, every time. No microservices for a CRUD app.
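To make the split concrete, here is a toy rule-based scorer. It is not the logic in ortim/architecture/golden_paths.py, which this page does not show: the input fields, the thresholds, and the idea that a higher index means a heavier architecture are all invented for illustration. Only the tier families (T0–T6, M0–M2, D0–D1) come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScorerInputs:
    """Hypothetical parameters that Architect Call 1 might extract."""
    platform: str          # "web", "mobile", or "desktop"
    expected_users: int
    needs_realtime: bool


# Tier prefix and highest tier index per family, from the tier list above.
TIER_FAMILY = {"web": ("T", 6), "mobile": ("M", 2), "desktop": ("D", 1)}


def pick_tier(inputs: ScorerInputs) -> str:
    """Pure function of its inputs: no LLM call, no randomness."""
    prefix, top = TIER_FAMILY[inputs.platform]
    score = 0
    # Made-up thresholds, purely illustrative.
    if inputs.expected_users > 1_000:
        score += 1
    if inputs.expected_users > 100_000:
        score += 1
    if inputs.needs_realtime:
        score += 1
    return f"{prefix}{min(score, top)}"
```

Because `pick_tier` has no hidden state, calling it twice with equal inputs returns the same tier, which is the property the invariant relies on.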
Scope-locked, runtime-validated DAGs
The Orchestrator emits a task DAG constrained to the RFC's modules (Hard Rule 13:
DAG ⊂ RFC modules). DAGs are validated at runtime — if the LLM emits a cycle, a missing
dependency, or an off-RFC module, validators retry up to 3×, then escalate to a human
instead of producing broken work.
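The three failure modes named above (cycle, missing dependency, off-RFC module) can each be checked mechanically. The sketch below assumes a simplified task shape, a dict with `module` and `depends_on` keys, which is not necessarily the layout of task_dag.json, and it reads "retry up to 3×" as three attempts in total. `validate_dag`, `generate_validated_dag`, and the `emit` callback are hypothetical names.

```python
from graphlib import CycleError, TopologicalSorter

MAX_ATTEMPTS = 3  # assumption: three attempts in total


def validate_dag(tasks: dict, rfc_modules: set) -> list:
    """Return a list of human-readable problems; empty means valid."""
    errors = []
    for task_id, task in tasks.items():
        if task["module"] not in rfc_modules:
            errors.append(f"{task_id}: module {task['module']!r} is not in the RFC")
        for dep in task["depends_on"]:
            if dep not in tasks:
                errors.append(f"{task_id}: unknown dependency {dep!r}")
    # Only known dependencies go into the graph, so cycle detection
    # is not confused by the missing ones reported above.
    graph = {
        tid: [d for d in t["depends_on"] if d in tasks]
        for tid, t in tasks.items()
    }
    try:
        tuple(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        errors.append("cycle: " + " -> ".join(exc.args[1]))
    return errors


def generate_validated_dag(emit, rfc_modules: set) -> dict:
    """Call emit(feedback) until the DAG validates, then give up loudly."""
    feedback = []
    for _ in range(MAX_ATTEMPTS):
        tasks = emit(feedback)
        feedback = validate_dag(tasks, rfc_modules)
        if not feedback:
            return tasks
    raise RuntimeError("escalate to human: " + "; ".join(feedback))
```

Passing the previous errors back into `emit` is a design choice of this sketch, so that each retry can be told what was wrong with the last attempt.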
Each task carries a module_scope; the sandbox rejects writes outside it. Cross-module
use happens through imports, not stray file creation.
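A scope check of this kind usually comes down to path containment. The following sketch assumes that `module_scope` is a directory path relative to the repository root, which this page does not specify; `is_write_allowed` is a hypothetical helper, not part of Ortim.

```python
from pathlib import Path


def is_write_allowed(repo_root: Path, module_scope: str, target: str) -> bool:
    """True only if `target` resolves to a path inside the module's directory."""
    scope_dir = (repo_root / module_scope).resolve()
    # resolve() normalises ".." segments, so "src/api/../billing/x.py"
    # is judged by where it actually points.
    candidate = (repo_root / target).resolve()
    return scope_dir in candidate.parents
```

Resolving both paths before comparing them is what stops a `..` segment from escaping the scope.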
The reviewer chain
Every task's output runs through the Code → Security → Test → Perf reviewers, each
returning a rubric-shaped verdict. The chain treats unverifiable differently from pass: a
missing test runner trips a distinct mode, so an unchecked result can never be laundered
into a false approval.
Failing tasks get a 3-attempt budget with reviewer feedback injected into each retry,
then escalate to AWAITING_HITL rather than spinning forever.
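The distinction between unverifiable and pass is easiest to see as a three-valued verdict whose aggregation never rounds "could not check" up to "approved". This is a sketch under assumptions: the `Verdict` enum, `chain_verdict`, and `run_task` are hypothetical, and treating an unverifiable result like any other non-pass inside the retry loop is a simplification, since the page does not say what the distinct mode does.

```python
from enum import Enum

MAX_ATTEMPTS = 3  # the 3-attempt budget described above


class Verdict(Enum):
    PASS = "pass"
    FAIL = "fail"
    UNVERIFIABLE = "unverifiable"


def chain_verdict(verdicts: list) -> Verdict:
    """Aggregate reviewer verdicts without promoting 'unknown' to 'pass'."""
    if not verdicts:
        return Verdict.UNVERIFIABLE  # nothing ran, so nothing was verified
    if Verdict.FAIL in verdicts:
        return Verdict.FAIL
    if Verdict.UNVERIFIABLE in verdicts:
        return Verdict.UNVERIFIABLE
    return Verdict.PASS


def run_task(attempt_fn, review_fn) -> str:
    """Retry with reviewer feedback, then hand off to a human."""
    feedback = None
    for _ in range(MAX_ATTEMPTS):
        output = attempt_fn(feedback)
        verdict, feedback = review_fn(output)
        if verdict is Verdict.PASS:
            return "DONE"
    return "AWAITING_HITL"
```

Here `attempt_fn` stands in for a Worker run and `review_fn` for the whole reviewer chain; both are placeholders supplied by the caller.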
Hash-chained audit log
Every LLM call, state transition, gate decision, and hook output lands in a hash-chained
JSONL file at .ortim/audit.jsonl. Because each entry incorporates the hash of the previous
one, any edit after the fact is detectable:
ortim audit-verify
This is what lets a team answer "who approved this, and why?" months later.
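For readers unfamiliar with hash chaining, the mechanism can be sketched in a few lines. This is a generic illustration, not the format of .ortim/audit.jsonl or the implementation behind ortim audit-verify: the entry fields (`prev`, `payload`, `hash`), the all-zero genesis value, and the choice of SHA-256 are assumptions.

```python
import hashlib
import json
from typing import Optional

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """Hash the previous hash together with a canonical form of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append(log: list, payload: dict) -> None:
    """Add an entry whose hash covers both its payload and its predecessor."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev, "payload": payload, "hash": entry_hash(prev, payload)})


def verify(log: list) -> Optional[int]:
    """Return the index of the first broken entry, or None if the chain is intact."""
    prev = GENESIS
    for i, entry in enumerate(log):
        if entry["prev"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return i
        prev = entry["hash"]
    return None
```

Editing any payload changes the hash that entry should have, so `verify` reports that entry as the first break. An attacker would have to recompute every later hash to hide the change.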
Greenfield and brownfield
ortim init in an empty directory starts greenfield. In a directory with a recognized
manifest (package.json, pyproject.toml, Cargo.toml, go.mod, pubspec.yaml) it
enters brownfield mode: import-graph extraction and scope-aware task generation against
your existing code.
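The mode decision described above can be sketched as a check for the listed manifest files. `detect_mode` is a hypothetical helper. Because this page does not say what happens in a non-empty directory with no recognized manifest, the sketch reports that case separately instead of guessing.

```python
from pathlib import Path

# Manifest names taken from the list above.
MANIFESTS = ("package.json", "pyproject.toml", "Cargo.toml", "go.mod", "pubspec.yaml")


def detect_mode(directory: Path) -> str:
    """Classify a directory as brownfield, greenfield, or unspecified."""
    if any((directory / name).is_file() for name in MANIFESTS):
        return "brownfield"
    if not any(directory.iterdir()):
        return "greenfield"
    # Files exist but none is a recognized manifest: behaviour not documented here.
    return "unspecified"
```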