Deep Dive

How Raftkit Works

Every phase has a purpose. Here's what happens inside each one — and why structured AI development produces better results than ad-hoc prompting.

5 phases | 1 methodology | 0 guesswork

Phase 1 — Blueprint

A single collaborative session produces your entire product foundation.

What happens

One conversation produces your entire product foundation — PRD, competitive analysis, database schema, API contracts, design tokens, and product constitution. Not separate meetings with separate teams. One session, all interconnected.

How the brainstorming works

Blueprint uses a 4-Act structure. Each act builds on the previous one, so the AI accumulates context as it goes.

Act 1 — Product Discovery

Raftkit asks 4-8 adaptive questions to understand the product — users, problem, core features, success metrics. Not a form. A conversation that adapts based on your answers. Say "developers who want a fast task tracker" and the follow-up questions shift toward developer workflows, not consumer onboarding.

Act 2 — Tech & Design Discovery

Asks about your existing tech stack, design preferences, and deployment targets. Groups questions into 3 batches to avoid back-and-forth fatigue. Don't have preferences? That's fine — it picks sensible defaults.

Act 3 — Research & Architecture

The AI researches competitors (5-10 with feature matrix), creates user personas, sizes the market (TAM/SAM/SOM), then designs the entire technical foundation — database schema, system architecture, API contracts, design tokens, sample data. All in one pass.

Act 4 — Review & Refine

Section-by-section walkthrough of everything produced. Change anything that doesn't feel right. The AI explains its decisions and adjusts. This isn't a take-it-or-leave-it deliverable — it's a collaborative refinement.

How architecture decisions flow from research

Architecture decisions aren't made in isolation — they flow from the product research. The AI sees the competitive landscape, user needs, and technical constraints simultaneously. The schema reflects the PRD. The API contracts match the features. The design tokens align with the product identity. Everything is cross-referenced because it was all produced in the same context window.

Why use AI for this

Traditional product discovery takes 2-4 weeks across multiple teams — PM, designer, architect. AI can hold the entire context at once: product goals, technical constraints, competitive landscape. It produces a coherent, cross-referenced foundation in one session. Your role is to steer and validate, not to draft.

~14 files produced in one session

brainstorms/prd.md — Product requirements document
brainstorms/competitive-analysis.md — 5-10 competitors with feature matrix
brainstorms/market-analysis.md — TAM/SAM/SOM with assumptions
brainstorms/decisions.md — Decision log with rationale
architecture/db-schema.md — ERD, indexes, constraints
architecture/tech-stack.md — Technology decisions + rationale
architecture/system-diagram.md — Components + data flow
architecture/api-contracts.yaml — OpenAPI/Swagger spec
architecture/design-system.md — Design tokens + UI specs
architecture/sample-data.md — Seed records + TypeScript types
architecture/tech-preferences.md — Your existing stack choices
architecture/third-party-services.md — Service integrations
constitution.md — Testing, security, accessibility rules
status.md — Progress tracking

Phase 2 — Plan

Features get RICE-prioritized and broken into micro-tasks.

What happens

Raftkit reads your blueprint and extracts features. Each feature gets a RICE score. Then each feature is decomposed into micro-tasks — tiny, testable units that take 2-5 minutes each.

How RICE scoring works

RICE stands for Reach, Impact, Confidence, Effort — the score is Reach × Impact × Confidence ÷ Effort. Each dimension is scored:

Reach — How many users does this feature affect?
Impact — How much does it move the needle? (Minimal to Massive)
Confidence — How sure are we about the estimates? (Low to High)
Effort — How much work is this? (Person-weeks)

High-value, low-effort features ship first. The AI doesn't just list features — it tells you which ones matter most.
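The formula above can be sketched in TypeScript. This is an illustration of the standard RICE calculation, not Raftkit's actual schema — the field names, scales, and feature data are assumptions:

```typescript
// Illustrative RICE scoring; scales follow common RICE conventions.
interface RiceInput {
  reach: number;      // users affected per quarter
  impact: number;     // 0.25 (minimal) up to 3 (massive)
  confidence: number; // 0.5 (low) up to 1.0 (high)
  effort: number;     // person-weeks
}

function riceScore({ reach, impact, confidence, effort }: RiceInput): number {
  return (reach * impact * confidence) / effort;
}

// High-value, low-effort features rise to the top of the queue.
const features = [
  { name: "OAuth login", score: riceScore({ reach: 2000, impact: 2, confidence: 0.8, effort: 4 }) },
  { name: "Dark mode",   score: riceScore({ reach: 500,  impact: 0.5, confidence: 1.0, effort: 1 }) },
];
features.sort((a, b) => b.score - a.score);
```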

How micro-tasks work

These aren't Jira tickets scoped to 1-3 days. They're tiny, testable units with EARS acceptance criteria:

WHEN [condition] THE SYSTEM SHALL [behavior]

Each task file is self-contained — all context is inlined so the build agent doesn't need to search across files. The task knows which database tables it touches, which API endpoints it creates, and which UI components it affects.
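A self-contained task might carry its context in a shape like this. This `MicroTask` type and the sample task are hypothetical, sketched from the description above — Raftkit's actual task-file format may differ:

```typescript
// Hypothetical shape of a self-contained micro-task.
interface MicroTask {
  id: string;
  title: string;
  ears: string[];       // EARS acceptance criteria
  tables: string[];     // database tables this task touches
  endpoints: string[];  // API endpoints it creates or modifies
  components: string[]; // UI components it affects
}

const task: MicroTask = {
  id: "T-042",
  title: "Persist task completion state",
  ears: [
    "WHEN a user marks a task complete THE SYSTEM SHALL set completed_at to the current timestamp",
  ],
  tables: ["tasks"],
  endpoints: ["PATCH /api/tasks/:id"],
  components: ["TaskRow"],
};
```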

Why small tasks matter

AI handles 3-minute tasks reliably. 3-day tasks? Hallucinations compound, context drifts, quality drops. Micro-tasks keep the error radius small and the feedback loop tight.

Why use AI for this

Humans tend to plan in large chunks ("build auth system"). AI can decompose systematically — it sees the full PRD and architecture, so it knows exactly which database tables, API endpoints, and UI components each feature needs.

Phase 3 — Build

A fresh AI agent picks up each task, writes tests first, implements code to pass them.

What happens

For each micro-task: a fresh AI agent spins up, reads the task file, writes failing tests based on the EARS criteria, implements minimum code to pass them, captures verification evidence, and commits. Then the next task begins.

How TDD works in practice

The agent reads the task's EARS acceptance criteria, then:

1. Writes failing tests that encode the acceptance criteria
2. Implements the minimum code to make tests pass
3. Runs verification — actual terminal output captured as evidence
4. Anti-rationalization: no "the test is wrong" claims without evidence
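The first two steps of that loop can be illustrated in miniature. The EARS criterion and the `completeTask` function are hypothetical examples, not Raftkit code:

```typescript
// EARS: WHEN a task is completed THE SYSTEM SHALL record a completion timestamp.

// Step 2: the minimum implementation that satisfies the criterion.
function completeTask(task: { done: boolean; completedAt?: Date }) {
  return { ...task, done: true, completedAt: new Date() };
}

// Step 1's test, written first — it fails until completeTask behaves as specified.
const result = completeTask({ done: false });
if (!result.done || !(result.completedAt instanceof Date)) {
  throw new Error("EARS criterion not satisfied");
}
```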

Fresh agent per task

Each task gets a clean context with only its task file and relevant skills loaded. No accumulated confusion from earlier tasks. Domain-specific skills (React, database, API design) are conditionally injected based on what the task actually touches — a backend task doesn't load the Tailwind skill.
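Conditional injection can be sketched as a filter over the files a task touches. The `skillTriggers` mapping here is an assumption for illustration, not Raftkit's actual skill registry:

```typescript
// Illustrative mapping from file patterns to the skills they trigger.
const skillTriggers: Record<string, RegExp> = {
  react: /\.(tsx|jsx)$/,
  tailwind: /\.(tsx|jsx|css)$/,
  database: /(schema|migration|\.sql)/,
};

// Load only the skills whose pattern matches something the task touches.
function skillsFor(files: string[]): string[] {
  return Object.keys(skillTriggers).filter((skill) =>
    files.some((f) => skillTriggers[skill].test(f))
  );
}

// A backend-only task loads no frontend skills.
skillsFor(["src/db/migration_004.sql"]); // → ["database"]
```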

Why use AI for this

The structured task files give the AI everything it needs in one place — acceptance criteria, architecture context, dependencies. Combined with TDD, the AI can't "pass" a task by writing code that looks right but doesn't work. The tests are the proof.

Phase 4 — Review

Two-stage quality review with up to 23 specialized reviewers running in parallel.

What happens

Every piece of code gets a two-stage quality check. Stage 1 verifies spec compliance — does the code actually do what the task asked for? Stage 2 runs up to 23 specialized reviewers in parallel, each examining the code from a different angle.

How the review system works

Conditional dispatch — only relevant reviewers run. A backend task won't trigger the Tailwind or Accessibility reviewer. Each reviewer has a specific mandate: security looks for OWASP top 10, performance checks for N+1 queries, architecture validates alignment with the blueprint.

P1 — Blocks the build. Must be fixed before the code can be committed.
P2 — Should fix. Important issues that should be addressed before shipping.
P3 — Logged for later. Good to know, not urgent.
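The commit gate those severities imply can be sketched as a single check — the `Finding` type and `canCommit` helper are illustrative, not Raftkit's internals:

```typescript
type Finding = { severity: "P1" | "P2" | "P3"; message: string };

// Only P1 findings block the commit; P2 and P3 are surfaced but don't gate.
function canCommit(findings: Finding[]): boolean {
  return !findings.some((f) => f.severity === "P1");
}

canCommit([{ severity: "P2", message: "missing index on tasks.user_id" }]);      // true
canCommit([{ severity: "P1", message: "SQL injection in search endpoint" }]);    // false
```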

The 23 reviewers

12 Generic Reviewers

Security, Code Quality, Performance, Architecture, Test Coverage, Overbuilt, Error Handling, Accessibility, API Contracts, Database, Third Party, Responsive Design

11 Stack-Specific Reviewers

Next.js, TypeScript, Tailwind, Prisma, Drizzle, Python, Go, React, React Native, Hono, Flutter

Why use AI for this

Human code review catches ~60% of defects. Running 23 specialized reviewers in parallel — each with a narrow mandate — catches issues a single reviewer would miss. The AI doesn't get tired, doesn't skip the security check because it's Friday, and doesn't rubber-stamp because the PR is too large.

Phase 5 — Compound

The AI analyzes learnings across all tasks and gets smarter about your codebase.

What happens

After building, Raftkit analyzes learnings across all tasks. Common patterns get written into the project's CLAUDE.md so the AI gets smarter about this specific codebase over time.

How knowledge compounding works

Each task captures learnings during build — what worked, what failed, what patterns emerged. Compound aggregates these across all tasks, finds cross-task patterns, and proposes CLAUDE.md updates.

Example pattern discovery

"Noticed 3 tasks used the same auth middleware pattern → codify as project standard in CLAUDE.md so future tasks follow it automatically."
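That kind of frequency-based discovery can be sketched as a simple aggregation — the `recurringPatterns` helper and its threshold of 3 are illustrative assumptions:

```typescript
// Count identical learnings across tasks; propose the recurring ones
// as candidate project standards for CLAUDE.md.
function recurringPatterns(learnings: string[], minCount = 3): string[] {
  const counts: Record<string, number> = {};
  for (const l of learnings) counts[l] = (counts[l] ?? 0) + 1;
  return Object.keys(counts).filter((p) => counts[p] >= minCount);
}

recurringPatterns([
  "auth middleware wraps route handlers",
  "auth middleware wraps route handlers",
  "auth middleware wraps route handlers",
  "zod validates request bodies",
]); // → ["auth middleware wraps route handlers"]
```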

Why this matters

Without compounding, every AI session starts from scratch. With it, the AI accumulates project-specific knowledge — naming conventions, architectural decisions, testing patterns, common pitfalls. The 50th task benefits from everything learned in the first 49.

Why use AI for this

Humans rarely update documentation. AI can systematically analyze what happened across dozens of tasks and extract patterns that humans would miss or wouldn't bother to write down.

Ready to try it?

Install Raftkit and start your first blueprint.

claude plugin install raftkit