Spec Plan
This document defines the specification breakdown for implementing AI Dev Agent v1, as described in dev-agent-v1.md.
Each spec maps to one or more architecture documents under docs/ and is designed to be independently implementable and deliverable via an SDD framework such as cc-sdd, OpenSpec, or spec-kit.
Design Principles
- Dependency order: specs are listed in implementation order; each spec depends only on those listed before it
- Single responsibility: each spec owns one architectural concern with a clearly defined interface boundary
- Independently testable: each spec can be verified in isolation before integration
- v1 scope: specs 1–10 cover the complete AI Dev Agent v1 feature set; spec 11 is a stretch goal (v1.x)
Architecture Reference
| Spec | Architecture Document(s) |
|---|---|
| spec1: orchestrator-core | system-overview.md, architecture/architecture.md |
| spec2: tool-system | architecture/tool-system-architecture.md |
| spec3: agent-safety | architecture/agent-safety-architecture.md |
| spec4: agent-loop | architecture/agent-loop-architecture.md |
| spec5: memory-system | memory/memory-architecture.md |
| spec6: context-engine | architecture/context-engineering-architecture.md |
| spec7: task-planning | architecture/task-planning-architecture.md |
| spec8: git-integration | architecture/architecture.md (Git Controller section) |
| spec9: implementation-loop | agent/dev-agent-v1.md |
| spec10: self-healing-loop | agent/dev-agent-v1.md |
| spec11: codebase-intelligence | architecture/codebase-intelligence-architecture.md |
Dependency Map
spec1: orchestrator-core
├── spec2: tool-system
│   └── spec3: agent-safety
├── spec4: agent-loop
│   └── spec7: task-planning
│       └── spec9: implementation-loop
│           └── spec10: self-healing-loop
├── spec5: memory-system
│   └── spec10: self-healing-loop
└── spec6: context-engine
    ├── spec7: task-planning
    └── spec9: implementation-loop
spec8: git-integration (depends on spec2: tool-system, spec3: agent-safety)
spec11: codebase-intelligence (v1.x — depends on spec2, spec6)

Note: The tree above shows the full transitive hierarchy, not just direct spec1 children. Specs that appear as branches of spec1 may still require intermediate specs before they can start. Always refer to each spec's Dependencies field for the exact prerequisites.
The following table summarizes which specs can be implemented in parallel:
| Wave | Specs | Prerequisite |
|---|---|---|
| 1 | spec1 | — |
| 2 | spec2, spec5 | spec1 |
| 3 | spec3, spec4, spec6 | spec2 (for spec3, spec4); spec2 + spec5 (for spec6) |
| 4 | spec7, spec8 | spec4 + spec6 (for spec7); spec2 + spec3 (for spec8) |
| 5 | spec9 | spec4 + spec6 + spec7 + spec8 |
| 6 | spec10 | spec5 + spec9 |
| 7 | spec11 (v1.x) | spec2 + spec6 |
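The wave assignment can be derived mechanically from the Dependencies fields. The sketch below is one way to do that, not part of the planned system: the `dependencies` map and the `computeWaves` helper are illustrative names. spec11 is left out because the table schedules it after v1 rather than by its dependencies alone.

```typescript
// Direct prerequisites for the v1 specs, copied from each spec's Dependencies field.
const dependencies: Record<string, string[]> = {
  spec1: [],
  spec2: ["spec1"],
  spec3: ["spec2"],
  spec4: ["spec1", "spec2"],
  spec5: ["spec1"],
  spec6: ["spec1", "spec2", "spec5"],
  spec7: ["spec4", "spec6"],
  spec8: ["spec2", "spec3"],
  spec9: ["spec4", "spec6", "spec7", "spec8"],
  spec10: ["spec5", "spec9"],
};

// A spec's wave is one more than the latest wave among its prerequisites.
function computeWaves(deps: Record<string, string[]>): Map<string, number> {
  const waves = new Map<string, number>();
  const visiting = new Set<string>();

  const waveOf = (spec: string): number => {
    const known = waves.get(spec);
    if (known !== undefined) return known;
    if (visiting.has(spec)) throw new Error(`Dependency cycle at ${spec}`);
    const prereqs = deps[spec];
    if (prereqs === undefined) throw new Error(`Unknown spec: ${spec}`);
    visiting.add(spec);
    // Math.max(0) is 0, so a spec without prerequisites lands in wave 1.
    const wave = 1 + Math.max(0, ...prereqs.map((p) => waveOf(p)));
    visiting.delete(spec);
    waves.set(spec, wave);
    return wave;
  };

  Object.keys(deps).forEach((spec) => waveOf(spec));
  return waves;
}

const waves = computeWaves(dependencies);
```

For the map above, `waves.get("spec6")` is 3 and `waves.get("spec10")` is 6, which matches the table.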
v1 Specs
spec1: orchestrator-core
Architecture: docs/system-overview.md, docs/architecture/architecture.md
Scope: The runnable skeleton of the system. Nothing else can execute without this. Establishes the entry point, the phase-based workflow state machine, the primary SDD adapter, and the LLM provider abstraction.
Sub-components:
- cli — Entry point: aes run <spec-name>, configuration loading, execution trigger, progress reporting
- workflow-engine — State machine managing the 7-phase development lifecycle: SPEC_INIT → SPEC_REQUIREMENTS → VALIDATE_GAP (optional) → SPEC_DESIGN → VALIDATE_DESIGN → SPEC_TASKS (TASK_GENERATION) → SPEC_IMPL (IMPLEMENTATION) → VALIDATE_IMPL (optional) → PULL_REQUEST
- phase-transitions — Phase validation, lifecycle hooks, state persistence, phase-boundary context reset
- cc-sdd-adapter — Adapter that invokes cc-sdd commands to generate requirements, design docs, and task definitions from a spec
- llm-abstraction — LLM provider interface + Claude provider implementation; all LLM calls flow through this abstraction, never directly to provider APIs
Dependencies: none
Success criteria: aes run <spec> triggers the full 7-phase sequence, invokes cc-sdd at each spec phase, uses Claude via the abstraction, and resets context at each phase boundary.
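As a rough illustration of the workflow-engine, the phase sequence can be modelled as an ordered list in which the two optional validation phases are skipped unless enabled. This is a minimal sketch under assumptions: `PHASE_ORDER`, `nextPhase`, `runWorkflow`, and the `onEnter` hook (standing in for the phase-boundary context reset and the cc-sdd invocation) are hypothetical names, not the actual design.

```typescript
type Phase =
  | "SPEC_INIT" | "SPEC_REQUIREMENTS" | "VALIDATE_GAP" | "SPEC_DESIGN" | "VALIDATE_DESIGN"
  | "SPEC_TASKS" | "SPEC_IMPL" | "VALIDATE_IMPL" | "PULL_REQUEST";

// Phase order as listed for the workflow-engine; two phases are optional.
const PHASE_ORDER: { phase: Phase; optional: boolean }[] = [
  { phase: "SPEC_INIT", optional: false },
  { phase: "SPEC_REQUIREMENTS", optional: false },
  { phase: "VALIDATE_GAP", optional: true },
  { phase: "SPEC_DESIGN", optional: false },
  { phase: "VALIDATE_DESIGN", optional: false },
  { phase: "SPEC_TASKS", optional: false },
  { phase: "SPEC_IMPL", optional: false },
  { phase: "VALIDATE_IMPL", optional: true },
  { phase: "PULL_REQUEST", optional: false },
];

// Returns the next phase to run, skipping optional phases that are not enabled.
function nextPhase(current: Phase, enabledOptional: Phase[]): Phase | null {
  const index = PHASE_ORDER.findIndex((entry) => entry.phase === current);
  for (const entry of PHASE_ORDER.slice(index + 1)) {
    if (!entry.optional || enabledOptional.includes(entry.phase)) return entry.phase;
  }
  return null;
}

// Walks the workflow from SPEC_INIT; onEnter is where a real engine would
// reset context and invoke the adapter for that phase (assumption).
function runWorkflow(enabledOptional: Phase[], onEnter: (phase: Phase) => void): Phase[] {
  const visited: Phase[] = [];
  let phase: Phase | null = "SPEC_INIT";
  while (phase !== null) {
    onEnter(phase);
    visited.push(phase);
    phase = nextPhase(phase, enabledOptional);
  }
  return visited;
}

const contextResets: Phase[] = [];
const requiredOnly = runWorkflow([], (phase) => { contextResets.push(phase); });
const withGapCheck = runWorkflow(["VALIDATE_GAP"], () => {});
```

With no optional phases enabled the run visits the seven required phases; enabling VALIDATE_GAP inserts it between SPEC_REQUIREMENTS and SPEC_DESIGN.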
spec2: tool-system
Architecture: docs/architecture/tool-system-architecture.md
Scope: The structured execution interface between the LLM and the development environment. All filesystem, shell, git, code analysis, and knowledge operations go through this system. Provides the deterministic tool interface that the agent loop depends on.
Sub-components:
- tool-interface — Common Tool<Input, Output> interface: name, description, JSON schema, execute function
- tool-context — Execution context injected into every tool: workspaceRoot, workingDirectory, permissions, memory client, logger
- tool-registry — Central registry for tool registration, discovery, and schema retrieval
- tool-executor — Validates input against schema, invokes tool, validates output, enforces timeouts, handles errors
- permission-system — PermissionSet capability flags (filesystemRead, filesystemWrite, shellExecution, gitWrite, networkAccess); execution modes (ReadOnly, Dev, CI, Full)
- tool-categories — Implementations for all five tool categories:
  - Filesystem tools: read_file, write_file, list_directory, search_files
  - Git tools: git_status, git_diff, git_commit, git_branch
  - Shell tools: run_command, run_test_suite, install_dependencies
  - Code analysis tools: parse_typescript_ast, find_symbol_definition, find_references, dependency_graph
  - Knowledge tools: search_memory, retrieve_spec, retrieve_design_doc
- error-handling — Structured ToolError type with "validation" | "runtime" | "permission" categories
Dependencies: spec1 (orchestrator-core)
Success criteria: Tools can be registered and invoked with schema-validated inputs/outputs; permission checks block unauthorized operations; all 5 tool categories are available and functional.
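To make the relationship between the interface, registry, executor, and permission flags concrete, here is a minimal sketch. Several parts are assumptions made to keep it self-contained: `parseInput` stands in for JSON schema validation, `requiredPermissions` is a hypothetical field, the context carries only two of the listed properties, `write_file` writes to an in-memory map, and timeouts and output validation are omitted.

```typescript
interface PermissionSet {
  filesystemRead: boolean;
  filesystemWrite: boolean;
  shellExecution: boolean;
  gitWrite: boolean;
  networkAccess: boolean;
}

interface ToolContext { workspaceRoot: string; permissions: PermissionSet }

type ToolError = { category: "validation" | "runtime" | "permission"; message: string };
type ToolResult = { ok: true; output: unknown } | { ok: false; error: ToolError };

interface Tool<Input, Output> {
  name: string;
  description: string;
  requiredPermissions: (keyof PermissionSet)[]; // hypothetical field
  parseInput(raw: unknown): Input | null;       // stand-in for schema validation
  execute(input: Input, context: ToolContext): Output;
}

function fail(category: ToolError["category"], message: string): ToolResult {
  return { ok: false, error: { category, message } };
}

class ToolRegistry {
  private tools = new Map<string, Tool<any, any>>();

  register(tool: Tool<any, any>): void {
    this.tools.set(tool.name, tool);
  }

  // Executor path: lookup, permission check, input validation, then invocation.
  execute(name: string, rawInput: unknown, context: ToolContext): ToolResult {
    const tool = this.tools.get(name);
    if (!tool) return fail("validation", `Unknown tool: ${name}`);
    const denied = tool.requiredPermissions.filter((flag) => !context.permissions[flag]);
    if (denied.length > 0) return fail("permission", `Missing permissions: ${denied.join(", ")}`);
    const input = tool.parseInput(rawInput);
    if (input === null) return fail("validation", `Invalid input for ${name}`);
    try {
      return { ok: true, output: tool.execute(input, context) };
    } catch (err) {
      return fail("runtime", err instanceof Error ? err.message : String(err));
    }
  }
}

const files = new Map<string, string>(); // in-memory stand-in for the filesystem

const writeFile: Tool<{ path: string; content: string }, number> = {
  name: "write_file",
  description: "Writes content to a file and returns the number of characters written",
  requiredPermissions: ["filesystemWrite"],
  parseInput(raw) {
    if (typeof raw !== "object" || raw === null) return null;
    const { path, content } = raw as { path?: unknown; content?: unknown };
    return typeof path === "string" && typeof content === "string" ? { path, content } : null;
  },
  execute(input) {
    files.set(input.path, input.content);
    return input.content.length;
  },
};

const registry = new ToolRegistry();
registry.register(writeFile);

const readOnly: ToolContext = {
  workspaceRoot: "/workspace",
  permissions: { filesystemRead: true, filesystemWrite: false, shellExecution: false, gitWrite: false, networkAccess: false },
};
const dev: ToolContext = { ...readOnly, permissions: { ...readOnly.permissions, filesystemWrite: true } };

const blocked = registry.execute("write_file", { path: "a.txt", content: "hi" }, readOnly);
const written = registry.execute("write_file", { path: "a.txt", content: "hi" }, dev);
```

Returning a result union instead of throwing keeps the three error categories visible to the caller, which is one plausible way to give the agent loop a deterministic interface.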
spec3: agent-safety
Architecture: docs/architecture/agent-safety-architecture.md
Scope: The operational safety layer that wraps tool execution. Defines policies and guardrails that prevent the agent from causing unintended or destructive changes to the environment.
Sub-components:
- workspace-isolation — Enforces that all file operations stay within the configured workspace root; rejects any path traversal outside the boundary
- filesystem-guardrails — Path normalization, protected file detection (.env, secrets.json, .git/config), write validation
- git-safety — Protected branch enforcement (no direct push to main/production); feature branch naming conventions; change size limits (max files per commit)
- shell-restrictions — Allowlist/blocklist for shell commands; pattern matching to block destructive commands (rm -rf /, shutdown, etc.)
- sandboxing — Containerized or restricted-shell execution environment for untrusted code and test runners
- iteration-limits — Configurable maxIterations and maxRuntime per agent session; triggers graceful stop on breach
- failure-detection — Detects repeated identical failures (threshold: 3 occurrences); pauses execution and requests human review
- destructive-action-detection — Flags high-impact operations (mass file deletion, force-push) and routes to human approval workflow
- rate-limiting — Per-operation frequency limits for tool execution, repository modifications, and external API requests
- audit-logging — Immutable log of every tool invocation: timestamp, tool, parameters, result, errors
- human-approval-workflow — Approval gate for flagged high-risk operations; agent pauses and proposes change; resumes on approval
- emergency-stop — Signal handler for immediate termination of agent loop, tool execution, and background processes
Dependencies: spec2 (tool-system)
Success criteria: Agent cannot write outside the workspace; protected branches and sensitive files are untouched; shell blocklist is enforced; repeated failures pause execution and request human review; all tool invocations are logged.
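The workspace-isolation and filesystem-guardrails checks could look roughly like the following. This is a sketch under stated assumptions: paths are POSIX-style, the workspace root is not the filesystem root, symlinks are not resolved, and `resolvePath` and `checkWrite` are hypothetical helper names. A real implementation would more likely build on the platform's path utilities.

```typescript
// Protected files named in the filesystem-guardrails sub-component.
const PROTECTED_FILES = [".env", "secrets.json", ".git/config"];

// Resolves a POSIX-style path against a base directory, collapsing "." and ".." segments.
function resolvePath(baseDir: string, requested: string): string {
  const joined = requested.startsWith("/") ? requested : `${baseDir}/${requested}`;
  const resolved: string[] = [];
  for (const segment of joined.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") resolved.pop();
    else resolved.push(segment);
  }
  return "/" + resolved.join("/");
}

type WriteDecision = { allowed: true; path: string } | { allowed: false; reason: string };

function checkWrite(workspaceRoot: string, requested: string): WriteDecision {
  const root = resolvePath("/", workspaceRoot);
  const path = resolvePath(root, requested);
  // Compare against root + "/" so that "/workspace/project-evil" does not pass
  // as being inside "/workspace/project".
  if (path !== root && !path.startsWith(root + "/")) {
    return { allowed: false, reason: "outside workspace" };
  }
  const relative = path.slice(root.length + 1);
  const isProtected = PROTECTED_FILES.some(
    (name) => relative === name || relative.endsWith("/" + name),
  );
  if (isProtected) return { allowed: false, reason: "protected file" };
  return { allowed: true, path };
}

const root = "/workspace/project";
const inside = checkWrite(root, "src/index.ts");
const traversal = checkWrite(root, "../../etc/passwd");
const sibling = checkWrite(root, "/workspace/project-evil/x.ts");
const envFile = checkWrite(root, "config/.env");
```

Normalizing before the prefix check is the important step: checking the raw string would let `../` segments escape the boundary.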
spec4: agent-loop
Architecture: docs/architecture/agent-loop-architecture.md
Scope: The cognitive core of the AI Dev Agent — the iterative reasoning and execution cycle that turns a task description into completed work. Operates at the level of individual LLM iterations, below task planning and above raw tool execution.
Sub-components:
- agent-state — Persistent state across iterations: { task, plan, completedSteps, currentStep, observations }
- plan-step — LLM reasons over current state to produce the next ActionPlan: what to do next and why
- act-step — Executes the planned action via the tool system; produces a raw result
- observe-step — Records the tool result as a structured Observation; adds to context for next iteration
- reflect-step — LLM evaluates whether the result was expected, what was learned, and whether the plan needs adjustment
- update-state-step — Updates AgentState: marks completed steps, logs discoveries, updates the working plan
- iteration-control — Enforces maxIterations limit; handles loop termination (task complete / human intervention required / safety limit)
- action-types — Supports four action categories: Exploration (read/search), Modification (write/edit), Validation (test/build/lint), Documentation (update docs/comments)
- error-recovery — Intra-loop recovery: analyze error → identify root cause → attempt fix → re-run validation
- observability — Per-iteration structured logs: iteration number, action, tools invoked, execution time, result status
Dependencies: spec2 (tool-system), spec1 (orchestrator-core)
Success criteria: Given a task, the agent iteratively executes PLAN→ACT→OBSERVE→REFLECT→UPDATE until the task is complete or a stopping condition is reached; iteration logs are produced; errors trigger recovery attempts before escalation.
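The PLAN→ACT→OBSERVE→REFLECT→UPDATE cycle can be sketched as a plain loop. Everything beyond the AgentState field names is an assumption: the `AgentHooks` interface replaces the LLM and the tool system with injectable stubs, the field shapes of `ActionPlan` and `Observation` are invented for illustration, and the "human intervention required" stop condition is left out for brevity.

```typescript
interface ActionPlan { tool: string; input: string; rationale: string }
interface Observation { step: string; tool: string; output: string; success: boolean }

interface AgentState {
  task: string;
  plan: string[];
  completedSteps: string[];
  currentStep: string | null;
  observations: Observation[];
}

// Hypothetical hooks: plan and reflect would be LLM calls, act would go through the tool system.
interface AgentHooks {
  plan(state: AgentState): ActionPlan;
  act(action: ActionPlan): { output: string; success: boolean };
  reflect(state: AgentState, observation: Observation): { stepDone: boolean };
}

interface LoopResult {
  state: AgentState;
  iterations: number;
  reason: "task_complete" | "safety_limit";
}

function runAgentLoop(task: string, steps: string[], hooks: AgentHooks, maxIterations: number): LoopResult {
  const state: AgentState = { task, plan: [...steps], completedSteps: [], currentStep: null, observations: [] };
  let iterations = 0;
  while (state.plan.length > 0) {
    if (iterations >= maxIterations) return { state, iterations, reason: "safety_limit" };
    iterations++;
    const step = state.plan[0];
    state.currentStep = step;
    const action = hooks.plan(state);                                          // PLAN
    const result = hooks.act(action);                                          // ACT
    const observation: Observation = { step, tool: action.tool, ...result };   // OBSERVE
    state.observations.push(observation);
    const { stepDone } = hooks.reflect(state, observation);                    // REFLECT
    if (stepDone) {                                                            // UPDATE
      state.completedSteps.push(step);
      state.plan.shift();
    }
  }
  state.currentStep = null;
  return { state, iterations, reason: "task_complete" };
}

// Stub hooks: the "run tests" step fails once, then succeeds on retry.
let testRuns = 0;
const hooks: AgentHooks = {
  plan: (state) => ({ tool: "run_command", input: state.currentStep ?? "", rationale: "next pending step" }),
  act: (action) => {
    if (action.input !== "run tests") return { output: "ok", success: true };
    testRuns++;
    return testRuns < 2
      ? { output: "1 test failed", success: false }
      : { output: "all tests passed", success: true };
  },
  reflect: (_state, observation) => ({ stepDone: observation.success }),
};

const result = runAgentLoop("add cache", ["write code", "run tests"], hooks, 10);
```

In this run the loop takes three iterations: one for "write code" and two for "run tests", because a failed step stays at the head of the plan until reflection marks it done.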
spec5: memory-system
Architecture: docs/memory/memory-architecture.md
Scope: Persistent knowledge storage that enables the agent to accumulate and reuse information across workflow sessions.
Sub-components:
- short-term-memory — In-process store for active workflow state: current spec, current phase, task progress, working context
- project-memory — File-based store at .memory/ for repository-specific knowledge:
  - project_rules.md — Coding conventions and architectural decisions
  - coding_patterns.md — Recurring implementation approaches
  - review_feedback.md — Feedback from previous review cycles
- knowledge-memory — Reusable implementation patterns and strategies extracted from successful past runs and stored as structured entries
- failure-memory — Structured records of failures: what was attempted, what failed, root cause, and resolution; feeds directly into self-healing-loop
- memory-reader — Retrieves relevant memory entries given a query; supports keyword search and metadata filtering; returns ranked results for context injection
Dependencies: spec1 (orchestrator-core)
Success criteria: Knowledge from previous sessions (patterns, rules, review feedback) is automatically retrievable in new sessions; failure records persist across restarts; memory reader returns relevant results for context injection.
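A memory-reader with keyword search, metadata filtering, and ranking might be sketched as below. The `MemoryEntry` shape, the `kind` and `tags` metadata, and the count-of-matching-keywords score are all assumptions chosen for illustration; the architecture document may define these differently.

```typescript
interface MemoryEntry {
  id: string;
  kind: "project" | "knowledge" | "failure"; // hypothetical metadata field
  text: string;
  tags: string[];
}

interface MemoryQuery {
  keywords: string[];
  kind?: MemoryEntry["kind"];
  limit?: number;
}

// Ranks entries by how many query keywords appear in their text or tags.
function searchMemory(entries: MemoryEntry[], query: MemoryQuery): MemoryEntry[] {
  const keywords = query.keywords.map((keyword) => keyword.toLowerCase());
  return entries
    .filter((entry) => query.kind === undefined || entry.kind === query.kind)
    .map((entry) => {
      const haystack = (entry.text + " " + entry.tags.join(" ")).toLowerCase();
      const score = keywords.filter((keyword) => haystack.includes(keyword)).length;
      return { entry, score };
    })
    .filter((scored) => scored.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, query.limit ?? 5)
    .map((scored) => scored.entry);
}

const entries: MemoryEntry[] = [
  { id: "p1", kind: "project", text: "Use dependency injection for all services", tags: ["conventions"] },
  { id: "f1", kind: "failure", text: "Cache invalidation test failed because the TTL was not mocked", tags: ["cache", "tests"] },
  { id: "k1", kind: "knowledge", text: "Pattern: wrap cache access in a repository class", tags: ["cache"] },
];

const ranked = searchMemory(entries, { keywords: ["cache", "ttl"] });
const knowledgeOnly = searchMemory(entries, { keywords: ["cache"], kind: "knowledge" });
```

Here `ranked` contains `f1` before `k1`, since the failure record matches both keywords and the pattern entry matches one.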
spec6: context-engine
Architecture: docs/architecture/context-engineering-architecture.md
Scope: Constructs the information provided to the LLM at each reasoning step. Determines what goes into every prompt — no more, no less. Critical for reasoning quality and token efficiency across all other specs.
Sub-components:
- context-layers — 7-layer context model assembled per prompt:
  - System instructions (agent role, tool rules, coding standards, safety constraints)
  - Task description
  - Active specification (relevant sections of design/requirements docs)
  - Relevant code context (retrieved by symbol, dependency, or file proximity)
  - Repository state (git status, modified files, current branch)
  - Memory retrieval (injected knowledge from memory-system)
  - Tool results (outputs from current session's tool calls)
- context-planner — Decides which files, memories, and spec sections to retrieve, based on current task and step
- token-budget-manager — Allocates tokens per layer (e.g., system:1000, task:500, spec:2000, code:4000, memory:1500, tools:2000); adapts budget to model limits
- context-compression — Reduces oversized layers: document summarization, function-level code extraction, memory priority filtering
- iterative-expansion — Supports agent-driven context growth mid-iteration (agent discovers it needs an additional file → retrieves and adds to context)
- context-cache — Caches stable layers (system instructions, architecture docs, coding standards) to avoid redundant retrieval
- phase-isolation — Resets accumulated context when the workflow transitions between phases; prevents cross-phase context pollution
- task-isolation — Ensures each task section starts with a fresh minimal context derived only from its own artifacts
Dependencies: spec1 (orchestrator-core), spec2 (tool-system), spec5 (memory-system)
Success criteria: Prompts contain only the relevant layers for each step; token usage stays within configured budget; context does not leak between phases or task sections; compression activates automatically when limits are approached.
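One simple way the token-budget-manager could adapt to a model limit is proportional scaling, with compression triggered for any layer whose usage exceeds its share. This is only a sketch of that idea: the layer names and numbers come from the example allocation above, while `fitBudget`, `layersToCompress`, and the proportional strategy itself are assumptions.

```typescript
type Layer = "system" | "task" | "spec" | "code" | "memory" | "tools";

// Example allocation from the token-budget-manager description (11,000 tokens in total).
const DEFAULT_BUDGET: Record<Layer, number> = {
  system: 1000, task: 500, spec: 2000, code: 4000, memory: 1500, tools: 2000,
};

// Scales every layer proportionally when the total exceeds the model limit.
function fitBudget(budget: Record<Layer, number>, modelLimit: number): Record<Layer, number> {
  const layers = Object.keys(budget) as Layer[];
  const total = layers.reduce((sum, layer) => sum + budget[layer], 0);
  const fitted = { ...budget };
  if (total <= modelLimit) return fitted;
  const scale = modelLimit / total;
  for (const layer of layers) fitted[layer] = Math.floor(budget[layer] * scale);
  return fitted;
}

// Lists the layers whose current usage exceeds their allocation.
function layersToCompress(usage: Record<Layer, number>, budget: Record<Layer, number>): Layer[] {
  return (Object.keys(budget) as Layer[]).filter((layer) => usage[layer] > budget[layer]);
}

const halved = fitBudget(DEFAULT_BUDGET, 5500);
const overBudget = layersToCompress(
  { system: 900, task: 300, spec: 2500, code: 3000, memory: 1500, tools: 2100 },
  DEFAULT_BUDGET,
);
```

With a 5,500-token limit every layer is halved (for example, code drops from 4000 to 2000), and the sample usage flags the spec and tools layers for compression.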
spec7: task-planning
Architecture: docs/architecture/task-planning-architecture.md
Scope: The hierarchical planning layer that sits above the agent loop. Transforms a high-level goal into a structured, executable plan. Guides the sequence of work the agent loop operates on.
Sub-components:
- planning-hierarchy — Four-level structure: Goal → Tasks → Steps → Actions; each level has distinct granularity and lifecycle
- plan-types — TypeScript types: TaskPlan { goal, tasks }, Task { id, title, status, steps }, Step { id, description, status, dependsOn[] }
- initial-plan-generation — LLM generates an initial plan from task description, architecture docs, and repository context
- dynamic-plan-adjustment — Updates plan mid-execution when new information (existing modules, architectural constraints, test failures) changes the approach
- step-execution-model — Each step is handed off to the agent loop; step status is updated (pending → in_progress → completed) based on agent loop outcome
- dependency-tracking — Respects dependsOn relationships between steps; prevents out-of-order execution
- failure-recovery — On step failure: retry → refine implementation → revise plan; escalates to self-healing-loop when retries are exhausted
- plan-validation — Pre-execution check for architectural compatibility, coding standards, and dependency constraints
- plan-persistence — Plans stored at .memory/tasks/task_{id}.json; enables resume after interruption or crash
- human-interaction — Exposes plan for human review before execution of large or high-risk changes; waits for approval before proceeding
Dependencies: spec4 (agent-loop), spec6 (context-engine)
Success criteria: Given a cc-sdd task list, the system generates an executable plan, respects step dependencies, persists plan state, and resumes correctly after interruption; human review gate works for flagged plans.
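The plan types and dependency tracking can be illustrated with a short sketch. The type names and fields follow the plan-types entry; the `runnableSteps` and `completeStep` helpers, the sample plan, and the use of a JSON round trip to mimic persistence are assumptions.

```typescript
type Status = "pending" | "in_progress" | "completed";

interface Step { id: string; description: string; status: Status; dependsOn: string[] }
interface Task { id: string; title: string; status: Status; steps: Step[] }
interface TaskPlan { goal: string; tasks: Task[] }

// A step is runnable when it is pending and every step it depends on is completed.
function runnableSteps(task: Task): Step[] {
  const completed = new Set(
    task.steps.filter((step) => step.status === "completed").map((step) => step.id),
  );
  return task.steps.filter(
    (step) => step.status === "pending" && step.dependsOn.every((id) => completed.has(id)),
  );
}

// Marks a step completed and closes the task once all of its steps are done.
function completeStep(task: Task, stepId: string): void {
  const step = task.steps.find((candidate) => candidate.id === stepId);
  if (!step) throw new Error(`Unknown step: ${stepId}`);
  step.status = "completed";
  if (task.steps.every((candidate) => candidate.status === "completed")) task.status = "completed";
}

const plan: TaskPlan = {
  goal: "Add a caching layer",
  tasks: [{
    id: "t1",
    title: "Implement cache",
    status: "pending",
    steps: [
      { id: "s1", description: "Define cache interface", status: "pending", dependsOn: [] },
      { id: "s2", description: "Implement in-memory cache", status: "pending", dependsOn: ["s1"] },
      { id: "s3", description: "Add tests", status: "pending", dependsOn: ["s1", "s2"] },
    ],
  }],
};

const task = plan.tasks[0];
const first = runnableSteps(task).map((step) => step.id);       // only s1 has no unmet dependencies
completeStep(task, "s1");
const second = runnableSteps(task).map((step) => step.id);      // s2 is unblocked, s3 still waits on s2

// Plans are plain data, so a JSON round trip is enough to persist and restore progress.
const restored: TaskPlan = JSON.parse(JSON.stringify(plan));
const afterRestore = runnableSteps(restored.tasks[0]).map((step) => step.id);
```

Because step status lives in the plan itself, resuming after an interruption amounts to reloading the file and asking for the runnable steps again.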
spec8: git-integration
Architecture: docs/architecture/architecture.md (Git Controller section), docs/agent/dev-agent-v1.md
Scope: All repository operations required for an automated development pipeline. Fully encapsulated behind a Git controller interface; all other components call this via tools from the tool-system.
Sub-components:
- branch-manager — Creates feature branches from the configured base branch; names branches from spec and task metadata (e.g., agent/cache-implementation)
- commit-automation — Detects staged changes, generates descriptive commit messages using the LLM, validates against safety limits (change size), and commits
- push — Pushes the feature branch to the configured remote after safety checks (not a protected branch, not a force push)
- pull-request-creator — Creates pull requests via repository API with LLM-generated title and body; includes spec reference and implementation summary
Dependencies: spec2 (tool-system), spec3 (agent-safety)
Success criteria: After implementation completes, the system creates a feature branch, commits all changes with meaningful messages, pushes, and opens a pull request — with no manual intervention and no writes to protected branches.
spec9: implementation-loop
Architecture: docs/agent/dev-agent-v1.md
Scope: Orchestrates the execution of each task section from the task plan. Drives the agent loop through an implement → review → improve → commit cycle per task section, and coordinates with the review engine to enforce quality gates.
Sub-components:
- task-section-executor — Iterates through task sections from the plan; for each section: initializes context, invokes agent loop, evaluates outcome
- review-engine — Automated review of generated output against:
  - Requirement alignment (does the implementation satisfy the spec?)
  - Design consistency (does it follow the architecture?)
  - Code quality (linting, test coverage, naming conventions)
- implement-review-improve-commit — Per-section cycle:
  - implement — Agent loop writes code for the section
  - review — Review engine evaluates output and generates feedback
  - improve — Agent loop applies review feedback to fix issues
  - commit — Git integration commits the approved changes
- iteration-control — Tracks retry count per section; configurable threshold (e.g., 3 cycles); escalates to self-healing-loop on threshold breach
- quality-gate — Defines review pass/fail criteria; a section cannot proceed to commit until the gate is satisfied
Dependencies: spec4 (agent-loop), spec7 (task-planning), spec6 (context-engine), spec8 (git-integration)
Success criteria: Each task section is implemented, passes automated review, and is committed; the cycle retries up to the configured threshold; sections that exceed the threshold escalate correctly to self-healing.
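The per-section cycle and its retry threshold can be sketched as follows. The `SectionHooks` interface is a hypothetical stand-in for the agent loop, review engine, and git integration, and treating "improve" as a re-run of implement with the previous feedback is an assumption about how the cycle is wired.

```typescript
interface ReviewResult { passed: boolean; feedback: string[] }

// Hypothetical hooks standing in for the agent loop, review engine, and git integration.
interface SectionHooks {
  implement(section: string, feedback: string[]): string;
  review(section: string, output: string): ReviewResult;
  commit(section: string, output: string): void;
}

type SectionOutcome =
  | { status: "committed"; cycles: number }
  | { status: "escalated"; cycles: number; feedback: string[] };

function runSection(section: string, hooks: SectionHooks, maxCycles: number): SectionOutcome {
  let feedback: string[] = [];
  for (let cycle = 1; cycle <= maxCycles; cycle++) {
    // Cycle 1 is "implement"; later cycles are "improve" because feedback is passed in.
    const output = hooks.implement(section, feedback);
    const result = hooks.review(section, output);
    if (result.passed) {
      // Quality gate satisfied: only now may the section be committed.
      hooks.commit(section, output);
      return { status: "committed", cycles: cycle };
    }
    feedback = result.feedback;
  }
  // Threshold reached without passing review: hand over to the self-healing-loop.
  return { status: "escalated", cycles: maxCycles, feedback };
}

const commits: string[] = [];
const hooks: SectionHooks = {
  implement: (section, feedback) =>
    feedback.includes("add tests") ? `${section} with tests` : section,
  review: (_section, output) =>
    output.endsWith("with tests")
      ? { passed: true, feedback: [] }
      : { passed: false, feedback: ["add tests"] },
  commit: (_section, output) => { commits.push(output); },
};

const outcome = runSection("cache layer", hooks, 3);
const stuck = runSection(
  "cache layer",
  { ...hooks, review: () => ({ passed: false, feedback: ["still failing"] }) },
  3,
);
```

The first call commits on the second cycle, once the review feedback has been applied; the second call never passes review and escalates after three cycles without committing anything.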
spec10: self-healing-loop
Architecture: docs/agent/dev-agent-v1.md, docs/architecture/agent-loop-architecture.md (Error Recovery section)
Scope: Activates when the implementation-loop exceeds its retry threshold or when the agent enters a stuck state. Analyzes the failure, identifies missing knowledge, updates rules, and resumes with improved context.
Sub-components:
- failure-detection — Triggered by: retry threshold breach from implementation-loop; repeated identical errors in agent-loop; agent reporting inability to proceed
- root-cause-analysis — LLM-driven analysis of full failure context: what was attempted across all retries, what failed each time, patterns in the errors
- gap-identification — Determines which rule, pattern, or knowledge is absent from the current rule set that would have prevented the failure
- rule-update — Writes targeted updates to rule files:
  - rules/coding_rules.md
  - rules/review_rules.md
  - rules/implementation_patterns.md
- failure-record — Writes a structured failure record to failure-memory (memory-system): task context, root cause, gap identified, rule changes made
- self-healing-retry — Resumes the failed task section with updated rules injected into context; logs outcome (resolved / escalated to human)
Dependencies: spec9 (implementation-loop), spec5 (memory-system)
Success criteria: Repeated failures trigger automatic rule file updates; the agent successfully completes tasks it previously failed after self-healing; failure records persist and are retrievable; tasks that cannot be resolved after self-healing escalate cleanly to human review.
v1.x Spec (Stretch Goal)
spec11: codebase-intelligence
Architecture: docs/architecture/codebase-intelligence-architecture.md
Scope: Enables the agent to understand and reason about large existing software repositories. Provides scalable code retrieval that feeds into the context-engine. The architecture is fully documented but this spec is excluded from the initial v1 delivery.
Sub-components:
- file-scanner — Discovers source files, detects changes, filters irrelevant directories (node_modules/, dist/, .git/)
- parser-layer — Converts source files to structured representations (AST, symbol definitions, imports, function signatures) using TypeScript compiler API, Tree-sitter, or Rust parsers
- symbol-index — Stores symbol definitions (functions, classes, interfaces, types) with file location and metadata
- dependency-graph — Represents inter-module relationships: imports, type references, module dependencies; supports impact analysis
- semantic-index — Embeds code fragments (functions, classes, documentation) for meaning-based retrieval; supports queries like "user authentication logic"
- query-engine — Unified retrieval API: symbol lookup, reference search, dependency traversal, semantic search; combines and ranks results from all indices
- incremental-indexer — Re-parses only modified files; updates symbol index and dependency graph incrementally
- code-chunker — Splits large files into independently retrievable chunks (per function, class, module) for semantic indexing
Dependencies: spec2 (tool-system), spec6 (context-engine)
Success criteria: The agent can find relevant source files and symbols by name and meaning; dependency paths are traversable; context-engine retrieves code snippets from the query engine rather than loading entire files.
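As an illustration of the symbol-index and dependency-graph working together, the sketch below stores symbols by name and keeps reverse import edges so that impact analysis is a breadth-first traversal. The `CodeIndex` class, its method names, and the sample file paths are hypothetical; parsing and the semantic index are out of scope here.

```typescript
interface SymbolEntry {
  name: string;
  kind: "function" | "class" | "interface" | "type";
  file: string;
  line: number;
}

class CodeIndex {
  private symbols = new Map<string, SymbolEntry[]>();
  // Reverse edges: file -> files that import it.
  private importedBy = new Map<string, Set<string>>();

  addSymbol(entry: SymbolEntry): void {
    const existing = this.symbols.get(entry.name) ?? [];
    this.symbols.set(entry.name, [...existing, entry]);
  }

  addImport(fromFile: string, toFile: string): void {
    const importers = this.importedBy.get(toFile) ?? new Set<string>();
    importers.add(fromFile);
    this.importedBy.set(toFile, importers);
  }

  findSymbol(name: string): SymbolEntry[] {
    return this.symbols.get(name) ?? [];
  }

  // Impact analysis: every file that directly or transitively imports the changed file.
  impactedFiles(changedFile: string): string[] {
    const seen = new Set<string>();
    const queue: string[] = [changedFile];
    while (queue.length > 0) {
      const file = queue.shift() as string;
      const importers = Array.from(this.importedBy.get(file) ?? []);
      for (const importer of importers) {
        if (importer !== changedFile && !seen.has(importer)) {
          seen.add(importer);
          queue.push(importer);
        }
      }
    }
    return Array.from(seen).sort();
  }
}

const index = new CodeIndex();
index.addSymbol({ name: "CacheService", kind: "class", file: "src/cache.ts", line: 10 });
index.addImport("src/api.ts", "src/cache.ts");
index.addImport("src/server.ts", "src/api.ts");
index.addImport("src/cli.ts", "src/server.ts");

const definitions = index.findSymbol("CacheService");
const impacted = index.impactedFiles("src/cache.ts");
```

Storing reverse edges makes the "what breaks if this file changes" question cheap to answer, which is the query the context-engine would most plausibly need.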
Implementation Order
1. spec1: orchestrator-core — CLI, workflow state machine, cc-sdd adapter, LLM abstraction
2. spec2: tool-system — tool interface, registry, executor, 5 tool categories
3. spec3: agent-safety — workspace isolation, guardrails, sandboxing, human approval
4. spec4: agent-loop — PLAN→ACT→OBSERVE→REFLECT→UPDATE, agent state, iteration control
5. spec5: memory-system — project memory, knowledge memory, failure memory, retrieval
6. spec6: context-engine — 7-layer context, planner, token budget, compression, isolation
7. spec7: task-planning — goal→task→steps→actions, dynamic revision, persistence
8. spec8: git-integration — branch, commit, push, pull request
9. spec9: implementation-loop — implement→review→improve→commit, quality gate
10. spec10: self-healing-loop — failure analysis, rule updates, retry
--- v1 complete ---
11. spec11: codebase-intelligence — file scanner, parser, symbol index, dependency graph, semantic search
--- v1.x complete ---