How Raftkit Works
Every phase has a purpose. Here's what happens inside each one — and why structured AI development produces better results than ad-hoc prompting.
Phase 1 — Blueprint
A single collaborative session produces your entire product foundation.
What happens
One conversation produces your entire product foundation — PRD, competitive analysis, database schema, API contracts, design tokens, and product constitution. Not separate meetings with separate teams. One session, all interconnected.
How the brainstorming works
Blueprint uses a 4-Act structure. Each act builds on the previous one, so the AI accumulates context as it goes.
Act 1 — Product Discovery
Raftkit asks 4-8 adaptive questions to understand the product — users, problem, core features, success metrics. Not a form. A conversation that adapts based on your answers. Say "developers who want a fast task tracker" and the follow-up questions shift toward developer workflows, not consumer onboarding.
Act 2 — Tech & Design Discovery
Asks about your existing tech stack, design preferences, and deployment targets. Groups questions into 3 batches to avoid back-and-forth fatigue. Don't have preferences? That's fine — it picks sensible defaults.
Act 3 — Research & Architecture
The AI researches competitors (5-10 with feature matrix), creates user personas, sizes the market (TAM/SAM/SOM), then designs the entire technical foundation — database schema, system architecture, API contracts, design tokens, sample data. All in one pass.
Act 4 — Review & Refine
Section-by-section walkthrough of everything produced. Change anything that doesn't feel right. The AI explains its decisions and adjusts. This isn't a take-it-or-leave-it deliverable — it's a collaborative refinement.
How architecture decisions flow from research
Architecture decisions aren't made in isolation — they flow from the product research. The AI sees the competitive landscape, user needs, and technical constraints simultaneously. The schema reflects the PRD. The API contracts match the features. The design tokens align with the product identity. Everything is cross-referenced because it was all produced in the same context window.
Why use AI for this
Traditional product discovery takes 2-4 weeks across multiple teams — PM, designer, architect. AI can hold the entire context at once: product goals, technical constraints, competitive landscape. It produces a coherent, cross-referenced foundation in one session. Your role is to steer and validate, not to draft.
~14 files produced in one session
brainstorms/prd.md: Product requirements document
brainstorms/competitive-analysis.md: 5-10 competitors with feature matrix
brainstorms/market-analysis.md: TAM/SAM/SOM with assumptions
brainstorms/decisions.md: Decision log with rationale
architecture/db-schema.md: ERD, indexes, constraints
architecture/tech-stack.md: Technology decisions + rationale
architecture/system-diagram.md: Components + data flow
architecture/api-contracts.yaml: OpenAPI/Swagger spec
architecture/design-system.md: Design tokens + UI specs
architecture/sample-data.md: Seed records + TypeScript types
architecture/tech-preferences.md: Your existing stack choices
architecture/third-party-services.md: Service integrations
constitution.md: Testing, security, accessibility rules
status.md: Progress tracking
Phase 2 — Plan
Features get RICE-prioritized and broken into micro-tasks.
What happens
Raftkit reads your blueprint and extracts features. Each feature gets a RICE score. Then each feature is decomposed into micro-tasks — tiny, testable units that take 2-5 minutes each.
How RICE scoring works
RICE stands for Reach × Impact × Confidence ÷ Effort. Each dimension is scored:
Reach: How many users does this feature affect?
Impact: How much does it move the needle? (Minimal to Massive)
Confidence: How sure are we about the estimates? (Low to High)
Effort: How much work is this? (Person-weeks)
High-value, low-effort features ship first. The AI doesn't just list features — it tells you which ones matter most.
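The formula can be sketched as a small function. The scales below are assumptions for illustration (reach in users, impact 0.25-3, confidence 0-1, effort in person-weeks); Raftkit's actual scales may differ:

```typescript
// RICE score: (Reach × Impact × Confidence) ÷ Effort.
// Scales are illustrative assumptions, not Raftkit's actual rubric.
function riceScore(
  reach: number,      // users affected per quarter
  impact: number,     // 0.25 (minimal) to 3 (massive)
  confidence: number, // 0 to 1
  effort: number,     // person-weeks
): number {
  return (reach * impact * confidence) / effort;
}

// Two hypothetical features: the high-value, low-effort one ships first.
const quickWin = riceScore(2000, 2, 0.8, 1);  // 3200
const bigBet = riceScore(5000, 3, 0.5, 10);   // 750
```

Even though the "big bet" reaches more users, dividing by effort surfaces the quick win as the better next move.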
How micro-tasks work
These aren't Jira tickets scoped to 1-3 days. They're tiny, testable units with EARS acceptance criteria:
WHEN [condition] THE SYSTEM SHALL [behavior]
Each task file is self-contained — all context is inlined so the build agent doesn't need to search across files. The task knows which database tables it touches, which API endpoints it creates, and which UI components it affects.
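One way to picture a self-contained task file is as a typed record. The shape and field names below are illustrative assumptions, not Raftkit's actual schema:

```typescript
// Hypothetical shape of a micro-task file. All context the build agent
// needs is inlined: criteria, tables, endpoints, and components.
interface MicroTask {
  id: string;
  earsCriteria: string[]; // WHEN ... THE SYSTEM SHALL ... statements
  dbTables: string[];     // tables this task touches
  apiEndpoints: string[]; // endpoints it creates or modifies
  uiComponents: string[]; // components it affects
}

// An example login task (all values are made up for illustration):
const task: MicroTask = {
  id: "auth-003",
  earsCriteria: [
    "WHEN a user submits valid credentials THE SYSTEM SHALL return a session token",
  ],
  dbTables: ["users", "sessions"],
  apiEndpoints: ["POST /api/login"],
  uiComponents: ["LoginForm"],
};
```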
Why small tasks matter
AI handles 3-minute tasks reliably. 3-day tasks? Hallucinations compound, context drifts, quality drops. Micro-tasks keep the error radius small and the feedback loop tight.
Why use AI for this
Humans tend to plan in large chunks ("build auth system"). AI can decompose systematically — it sees the full PRD and architecture, so it knows exactly which database tables, API endpoints, and UI components each feature needs.
Phase 3 — Build
A fresh AI agent picks up each task, writes tests first, implements code to pass them.
What happens
For each micro-task: a fresh AI agent spins up, reads the task file, writes failing tests based on the EARS criteria, implements minimum code to pass them, captures verification evidence, and commits. Then the next task begins.
How TDD works in practice
The agent reads the task's EARS acceptance criteria, then:
Writes failing tests that encode each criterion.
Implements the minimum code needed to make them pass.
Runs the suite and captures verification evidence.
Commits, and the next task begins.
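A minimal sketch of that red-green loop for one hypothetical criterion (the criterion, types, and function are invented for illustration):

```typescript
// EARS criterion (hypothetical): WHEN a task is marked complete
// THE SYSTEM SHALL record a completion timestamp.

interface Task {
  id: string;
  completedAt?: Date;
}

// Red: the test below is written first and fails until this exists.
// Green: the minimum code to satisfy the criterion, nothing extra.
function markComplete(task: Task, now: Date = new Date()): Task {
  return { ...task, completedAt: now };
}

// The test derived directly from the criterion:
const done = markComplete({ id: "t1" }, new Date("2024-01-01T00:00:00Z"));
console.assert(done.completedAt?.toISOString() === "2024-01-01T00:00:00.000Z");
```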
Fresh agent per task
Each task gets a clean context with only its task file and relevant skills loaded. No accumulated confusion from earlier tasks. Domain-specific skills (React, database, API design) are conditionally injected based on what the task actually touches — a backend task doesn't load the Tailwind skill.
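Conditional skill injection can be sketched as a set of triggers matched against what the task touches. The skill names and file-based triggers below are illustrative assumptions, not Raftkit's actual mechanism:

```typescript
// Hypothetical skill triggers: a skill loads only when the task's
// files indicate it's relevant. Names are made up for illustration.
const skillTriggers: Record<string, (files: string[]) => boolean> = {
  react: (files) => files.some((f) => f.endsWith(".tsx")),
  database: (files) => files.some((f) => f.includes("migrations/")),
  tailwind: (files) => files.some((f) => f.endsWith(".css")),
};

function skillsFor(files: string[]): string[] {
  return Object.keys(skillTriggers).filter((name) => skillTriggers[name](files));
}

// A backend task touching only a migration loads only the database skill:
skillsFor(["migrations/001_users.sql"]); // ["database"]
```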
Why use AI for this
The structured task files give the AI everything it needs in one place — acceptance criteria, architecture context, dependencies. Combined with TDD, the AI can't "pass" a task by writing code that looks right but doesn't work. The tests are the proof.
Phase 4 — Review
Two-stage quality review with up to 23 specialized reviewers running in parallel.
What happens
Every piece of code gets a two-stage quality check. Stage 1 verifies spec compliance — does the code actually do what the task asked for? Stage 2 runs up to 23 specialized reviewers in parallel, each examining the code from a different angle.
How the review system works
Conditional dispatch — only relevant reviewers run. A backend task won't trigger the Tailwind or Accessibility reviewer. Each reviewer has a specific mandate: security looks for OWASP top 10, performance checks for N+1 queries, architecture validates alignment with the blueprint.
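The dispatch logic can be pictured as each reviewer declaring which task categories it applies to. The categories and the reviewer subset below are illustrative assumptions:

```typescript
// Hypothetical conditional dispatch: each reviewer declares its scope,
// and only in-scope reviewers run for a given task category.
type Category = "backend" | "frontend" | "fullstack";

const reviewers: { name: string; appliesTo: Category[] }[] = [
  { name: "security", appliesTo: ["backend", "frontend", "fullstack"] },
  { name: "performance", appliesTo: ["backend", "fullstack"] },
  { name: "tailwind", appliesTo: ["frontend", "fullstack"] },
  { name: "accessibility", appliesTo: ["frontend", "fullstack"] },
];

function dispatch(category: Category): string[] {
  return reviewers
    .filter((r) => r.appliesTo.includes(category))
    .map((r) => r.name);
}

// A backend task skips the Tailwind and Accessibility reviewers:
dispatch("backend"); // ["security", "performance"]
```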
The 23 reviewers
12 Generic Reviewers
11 Stack-Specific Reviewers
Why use AI for this
Human code review catches ~60% of defects. Running 23 specialized reviewers in parallel — each with a narrow mandate — catches issues a single reviewer would miss. The AI doesn't get tired, doesn't skip the security check because it's Friday, and doesn't rubber-stamp because the PR is too large.
Phase 5 — Compound
The AI analyzes learnings across all tasks and gets smarter about your codebase.
What happens
After building, Raftkit analyzes learnings across all tasks. Common patterns get written into the project's CLAUDE.md so the AI gets smarter about this specific codebase over time.
How knowledge compounding works
Each task captures learnings during build — what worked, what failed, what patterns emerged. Compound aggregates these across all tasks, finds cross-task patterns, and proposes CLAUDE.md updates.
Example pattern discovery
"Noticed 3 tasks used the same auth middleware pattern → codify as project standard in CLAUDE.md so future tasks follow it automatically."
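That aggregation step can be sketched as counting how often a pattern appears across per-task learnings and proposing a standard once it crosses a threshold. The threshold and data shape are assumptions for illustration:

```typescript
// Hypothetical cross-task aggregation: patterns seen in `threshold` or
// more tasks become candidate CLAUDE.md standards.
function proposeStandards(learnings: string[][], threshold = 3): string[] {
  const counts = new Map<string, number>();
  for (const taskLearnings of learnings) {
    // Deduplicate within a task so each task votes at most once per pattern.
    for (const pattern of new Set(taskLearnings)) {
      counts.set(pattern, (counts.get(pattern) ?? 0) + 1);
    }
  }
  return [...counts].filter(([, n]) => n >= threshold).map(([p]) => p);
}

// Three tasks independently used the same auth middleware pattern:
proposeStandards([
  ["auth-middleware"],
  ["auth-middleware", "zod-validation"],
  ["auth-middleware"],
]); // ["auth-middleware"]
```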
Why this matters
Without compounding, every AI session starts from scratch. With it, the AI accumulates project-specific knowledge — naming conventions, architectural decisions, testing patterns, common pitfalls. The 50th task benefits from everything learned in the first 49.
Why use AI for this
Humans rarely update documentation. AI can systematically analyze what happened across dozens of tasks and extract patterns that humans would miss or wouldn't bother to write down.
Ready to try it?
Install Raftkit and start your first blueprint.
claude plugin install raftkit