sd0x-dev-flow

The harness layer for Claude Code — a reference implementation of harness engineering with hook-enforced dual review, state-machine gates that survive context compaction, and fail-closed safety where it counts. Quality gates that AI can't skip.

Stars: 156
License: MIT
Last Updated: 2026-05-20
Source: GitHub

Language: English | 繁體中文 | 简体中文 | 日本語 | 한국어 | Español

The harness layer for Claude Code.

Quality gates that AI can’t skip. A reference implementation of AI Agent Harness Engineering for Claude Code — hook-enforced dual review, state-machine gates that survive context compaction, and fail-closed safety where it counts.

96 bundled · 96 public skills · 15 agents — ~4% of Claude’s context window

What This Harness Does

Harness engineering is the discipline of engineering everything around the LLM — tool loops, context management, hooks, state machines, safety layers — as opposed to training the model itself. Mitchell Hashimoto coined the term in Feb 2026; Anthropic engineering and Martin Fowler have published on it; arXiv 2603.05344 formalizes it.

sd0x-dev-flow is a reference implementation. Each row below maps a canonical harness sub-problem to concrete code you can study:

| # | Harness sub-problem | sd0x-dev-flow implementation | Code evidence |
|---|---|---|---|
| 1 | Tool loop control | /codex-review-fast → /precommit auto-loop with sentinel-driven transitions | rules/auto-loop.md + hooks/post-tool-review-state.sh |
| 2 | Sentinel-driven state machine | ✅ Ready / ⛔ Blocked / ✅ All Pass gate markers parsed into durable state | scripts/emit-review-gate.sh (producer) + hooks/post-tool-review-state.sh (parser) |
| 3 | Context recovery across compaction | [AUTO_LOOP_RESUME] stdout injection after SessionStart(compact) | hooks/post-compact-auto-loop.sh |
| 4 | Lifecycle interceptors | 5 hook event types dispatched to 8 scripts: PreToolUse / PostToolUse / Stop / SessionStart / UserPromptSubmit | hooks/ (8 scripts) + .claude/settings.json |
| 5 | Capability-based tool gating | Skill frontmatter allowed-tools (e.g., /ask has no Edit/Write) | 86 of 95 public skills declare allowed-tools |
| 6 | Defense-in-depth safety | 5 layers: pre-edit-guard → commit-msg-guard → pre-push-gate → stop-guard → sidecar fail-closed marker | scripts/pre-push-gate.sh + scripts/commit-msg-guard.sh + hooks/stop-guard.sh |
| 7 | Generator-evaluator split | Dual review: Codex (primary) + Claude (secondary) dispatched in parallel on every review cycle | rules/codex-invocation.md + rules/auto-loop.md (Dual Review Mode) |
| 8 | Incremental progress tracking | iteration_history.current_round + max_rounds + convergence plateau detection | rules/auto-loop.md (exit conditions + strategic reset) |
| 9 | Human-in-the-loop safety gates | /dev/tty confirmation + AskUserQuestion for destructive ops | scripts/pre-push-gate.sh + skills/push-ci/SKILL.md |
| 10 | Self-improvement loop | Correction → record lesson → promote to rule after 3+ recurrences | rules/self-improvement.md |

Most harness projects cover 2–4 of these. sd0x-dev-flow covers all 10 — which makes the code useful as a study target, not just a tool.
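
To make row 2 concrete, here is a minimal Python sketch of a sentinel parser. It is not the project's code (the real parser is the shell script hooks/post-tool-review-state.sh). The marker strings come from the table; the state names, the `parse_gate` function, and the rule that the last marker wins are assumptions made for illustration.

```python
from enum import Enum
from typing import Optional


class GateState(Enum):
    # Hypothetical state names; the project may label its states differently.
    READY = "ready"
    BLOCKED = "blocked"
    ALL_PASS = "all_pass"


# Marker strings as listed in the table above.
SENTINELS = [
    ("✅ All Pass", GateState.ALL_PASS),
    ("✅ Ready", GateState.READY),
    ("⛔ Blocked", GateState.BLOCKED),
]


def parse_gate(output: str) -> Optional[GateState]:
    """Return the state of the last sentinel found in tool output, or None."""
    state = None
    for line in output.splitlines():
        for marker, candidate in SENTINELS:
            if marker in line:
                # Assumption: a later marker overrides an earlier one.
                state = candidate
                break
    return state
```

Returning `None` when no marker is present lets the caller treat "no verdict yet" differently from an explicit pass, which is what a fail-closed hook would need.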

Why sd0x-dev-flow?

| Without guardrails | With sd0x-dev-flow |
|---|---|
| AI skips review when context is long | Hook-enforced: stop-guard blocks incomplete reviews |
| Single reviewer misses issues | Dual dispatch: Codex + secondary in parallel |
| "Fixed it" without re-verification | Auto-loop: fix → re-review → pass → continue |
| Review state lost after compact | State tracking: SessionStart hook re-injects |

Quick Start

# Install plugin
/plugin marketplace add sd0xdev/sd0x-dev-flow
/plugin install sd0x-dev-flow@sd0xdev-marketplace

# Configure your project
/project-setup

One command auto-detects the framework, package manager, database, entrypoints, and scripts. It installs a subset of rules and hooks; the full plugin bundles 14 rules + 9 hooks.

Use --lite to only configure CLAUDE.md (skip rules/hooks).

How It Works

flowchart LR
    P["🎯 Plan"] --> B["🔨 Build"]
    B --> G["🛡️ Gate"]
    G --> S["🚀 Ship"]

    P -.- P1["/codex-brainstorm<br/>/feasibility-study<br/>/tech-spec"]
    B -.- B1["/feature-dev<br/>/bug-fix<br/>/codex-implement"]
    G -.- G1["/codex-review-fast<br/>/precommit<br/>/codex-test-review"]
    S -.- S1["/smart-commit<br/>/push-ci<br/>/create-pr<br/>/pr-review"]

The auto-loop engine enforces quality gates automatically — after code edits, the review command dispatches dual review (Codex MCP + secondary reviewer in parallel) in the same reply. Findings are deduplicated, severity-normalized, and aggregated into a single gate. In strict mode, hooks enforce fail-closed semantics: if the aggregate gate is incomplete, stop-guard blocks. See docs/hooks.md for mode and dependency details.
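
The fail-closed rule can be illustrated with a short Python sketch. Everything in it is hypothetical: the `aggregate_gate` function, the reviewer names, and the choice of P0/P1 as blocking severities are assumptions, not the plugin's actual logic. The point is only that in strict mode a missing reviewer result blocks rather than passes.

```python
from typing import Dict, List, Optional

# Assumption for illustration: P0 and P1 findings block the gate.
BLOCKING = {"P0", "P1"}


def aggregate_gate(
    results: Dict[str, Optional[List[str]]],
    strict: bool = True,
) -> str:
    """Combine per-reviewer severity lists into one gate verdict.

    `results` maps a reviewer name to the severities it reported, or to
    None when that reviewer has not produced a result.
    """
    missing = [name for name, found in results.items() if found is None]
    if strict and (missing or not results):
        return "BLOCKED"  # fail closed: an incomplete gate never passes
    for found in results.values():
        if found and BLOCKING.intersection(found):
            return "BLOCKED"
    return "READY"
```

With `strict=False` the same incomplete input returns `"READY"`, which shows why the strict default matters: the permissive path silently treats "no answer" as "no problems".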

Detailed: Dual-Review Sequence Diagram
sequenceDiagram
    participant D as Developer
    participant C as Claude
    participant X as Codex MCP
    participant T as Secondary Reviewer
    participant H as Hooks

    D->>C: Edit code
    H->>H: Track file change
    C->>H: emit-review-gate PENDING
    par Dual Review
        C->>X: Codex review (sandbox)
    and
        C->>T: Task(code-reviewer)
    end
    X-->>C: Findings (primary)
    T-->>C: Findings (secondary)
    C->>C: Aggregate + dedup + gate
    C->>H: emit-review-gate READY/BLOCKED

    alt Issues found
        C->>C: Fix all issues
        C->>X: --continue threadId
        X-->>C: Re-verify
    end

    C->>C: /precommit (auto)
    C-->>D: ✅ All gates passed

    Note over H: Strict mode: incomplete gate → blocked
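
The fix and re-verify loop in the diagram, combined with the max_rounds and plateau-detection exit conditions listed in the harness table, might look like the Python sketch below. The `run_auto_loop` name, the callback signatures, and the definition of a plateau (the finding count failing to drop for `patience` consecutive rounds) are assumptions; rules/auto-loop.md defines the real exit conditions.

```python
from typing import Callable, List


def run_auto_loop(
    review: Callable[[], int],
    fix: Callable[[], None],
    max_rounds: int = 5,
    patience: int = 2,
) -> str:
    """Alternate review and fix until the gate passes or an exit condition hits.

    `review` returns the number of open findings; `fix` attempts to resolve them.
    """
    history: List[int] = []
    for _ in range(max_rounds):
        open_findings = review()
        history.append(open_findings)
        if open_findings == 0:
            return "pass"
        # Plateau: the last `patience` rounds brought no improvement
        # over the round that preceded them.
        if len(history) > patience and min(history[-patience:]) >= history[-patience - 1]:
            return "plateau"
        fix()
    return "max_rounds"
```

Bounding the loop in two ways (a hard round limit and a no-progress check) keeps an agent from burning context on fixes that are not converging.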

Feature Spotlight: Dual-Reviewer Architecture

v2.0 dispatches two independent reviewers in parallel — dual-review by default with degraded fallback modes:

| Reviewer | Role | Fallback |
|---|---|---|
| Codex MCP | Primary (sandbox, full diff) | Single-reviewer mode if unavailable |
| Secondary (pr-review-toolkit) | Confidence-scored review | strict-reviewer → single mode |

Findings are severity-normalized (P0-Nit), deduplicated (file + issue key, ±5 line tolerance), and source-attributed (codex | toolkit | both).

Gate: ✅ Ready or ⛔ Blocked — in strict mode, incomplete gate = blocked.
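
As a rough illustration of the deduplication rule (same file and issue key, within ±5 lines) and of source attribution, consider the Python sketch below. The `Finding` fields and the `dedupe` helper are hypothetical names chosen for this example, and the greedy first-match merge is an assumption rather than the plugin's documented algorithm.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Finding:
    file: str
    issue_key: str  # hypothetical normalized identifier, e.g. "null-deref"
    line: int
    source: str     # "codex" or "toolkit"


def dedupe(findings: List[Finding], tolerance: int = 5) -> List[Finding]:
    """Merge findings that share file and issue key within `tolerance` lines."""
    merged: List[Finding] = []
    for f in findings:
        for kept in merged:
            same_issue = kept.file == f.file and kept.issue_key == f.issue_key
            if same_issue and abs(kept.line - f.line) <= tolerance:
                if kept.source != f.source:
                    kept.source = "both"  # reported by both reviewers
                break
        else:
            # No existing finding matched, so keep a copy of this one.
            merged.append(Finding(f.file, f.issue_key, f.line, f.source))
    return merged
```

The line tolerance matters because two reviewers often point at slightly different lines for the same defect; an exact-line match would report it twice.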

How We Compare

| Capability | sd0x-dev-flow | gstack | Generic prompts |
|---|---|---|---|
| Enforced review gates | Hook + behavior layer | Suggestion only | None |
| Dual-reviewer | Codex + secondary (parallel) | Single /review | None |
| Auto-fix loop | Fix → re-review → pass | Manual | None |
| Multi-agent research | /deep-research (3 agents) | None | None |
| Adversarial validation | Nash equilibrium debate | None | None |
| Self-improvement | Lesson log + rule promotion | /retro stats only | None |
| Cross-tool support | Codex/Cursor/Windsurf | Claude/Codex/Gemini/Cursor | N/A |

When to Use

| Good Fit | Not Ideal |
|---|---|
| Solo or small-team projects with Claude Code | Teams not using Claude Code |
| Projects needing automated review gates | One-off scripts with no CI |
| Codex CLI / Cursor / Windsurf users (skills subset) | Projects requiring custom LLM providers |
| Repos where quality gates prevent regressions | Repos with no test infrastructure |

Install

Codex CLI / Other AI Agents

# Install individual skills via Agent Skills standard
npx skills add sd0xdev/sd0x-dev-flow

# Generate AGENTS.md + install hooks (in Claude Code)
/codex-setup init

| Method | Tools | Coverage |
|---|---|---|
| Plugin install | Claude Code | Full (96 bundled skills, hooks, rules, auto-loop) |
| npx skills add | Codex CLI, Cursor, Windsurf, Aider | Skills only (96 public skills) |
| /codex-setup init | Codex CLI | AGENTS.md kernel + git hooks |

Requirements: Claude Code 2.1+ | Codex MCP (optional — /codex-* skills require it; without it, review falls back to single-reviewer mode)

Workflow Tracks

| Workflow | Commands | Gate | Enforced By |
|---|---|---|---|
| Feature | /feature-dev → /verify → /codex-review-fast → /precommit | ✅/⛔ | Hook + Behavior |
| Bug Fix | /issue-analyze → /bug-fix → /verify → /precommit | ✅/⛔ | Hook + Behavior |
| Auto-Loop | Code edit → /codex-review-fast → /precommit | ✅/⛔ | Hook |
| Doc Review | .md edit → /codex-review-doc | ✅/⛔ | Hook |
| Planning | /codex-brainstorm → /feasibility-study → /tech-spec | | |
| Onboarding | /project-setup → /repo-intake | | |

Visual: Workflow Flowcharts
flowchart TD
    subgraph feat ["🔨 Feature Development"]
        F1["/feature-dev"] --> F2["Code + Tests"]
        F2 --> F3["/verify"]
        F3 --> F4["/codex-review-fast"]
        F4 --> F5["/precommit"]
        F5 --> F6["/update-docs"]
    end

    subgraph fix ["🐛 Bug Fix"]
        B1["/issue-analyze"] --> B2["/bug-fix"]
        B2 --> B3["Fix + Regression test"]
        B3 --> B4["/verify"]
        B4 --> B5["/codex-review-fast"]
        B5 --> B6["/precommit"]
    end

    subgraph docs ["📝 Docs Only"]
        D1["Edit .md"] --> D2["/codex-review-doc"]
        D2 --> D3["Done"]
    end

    subgraph plan ["🎯 Planning"]
        P1["/codex-brainstorm"] --> P2["/feasibility-study"]
        P2 --> P3["/tech-spec"]
        P3 --> P4["/codex-architect"]
        P4 --> P5["Implementation ready"]
    end

    subgraph ops ["⚙️ Operations"]
        O1["/project-setup"] --> O2["/repo-intake"]
        O2 --> O3["Develop"]
        O3 --> O4["/project-audit"]
        O3 --> O7["/best-practices"]
        O3 --> O5["/risk-assess"]
        O4 --> O6["/next-step --go"]
        O5 --> O6
        O7 --> O6
    end

Cookbook

Real-world scenarios showing which skills to combine and in what order.

| Scenario | Flow |
|---|---|
| First day in a repo | /project-setup → /repo-intake → /next-step |
| Implement a new feature | /feature-dev → /verify → /codex-test-review → /codex-review-fast → /precommit |
| Resolve PR review comments | /load-pr-review → fix → /codex-review-fast → /push-ci |
| Security pre-merge pass | /codex-security → /dep-audit → /risk-assess → /pre-pr-audit |
| Showcase: Validate direction | /deep-research → /best-practices → /feasibility-study → /codex-brainstorm |
| Showcase: Adversarial design | /codex-brainstorm (Nash equilibrium debate) → /codex-architect |

All 10 scenarios →

What’s Included

| Category | Count | Examples |
|---|---|---|
| Skills | 96 public (96 bundled) | /project-setup, /codex-review-fast, /verify, /smart-commit, /deep-research |
| Agents | 15 | strict-reviewer, verify-app, coverage-analyst, architecture-designer |
| Hooks | 9 | pre-edit-guard, auto-format, review state tracking, stop guard, namespace hint, post-compact-auto-loop, post-skill-auto-loop, user-prompt-review-guard, session-init |
| Rules | 14 | auto-loop, auto-loop-project, codex-invocation, security, testing, git-workflow, self-improvement, context-management |
| Scripts | 13 | precommit runner, verify runner, dep audit, namespace hint, skill runner, commit-msg guard, pre-push gate, utils (shared lib), emit-review-gate, build-codex-artifacts, resolve-feature (CLI + shell), feature-resolver, readme-catalog |

Minimal Context Footprint

~4% of Claude’s 200k context window — 96% remains for your code.

| Component | Tokens | % of 200k |
|---|---|---|
| Rules (always loaded) | 5.1k | 2.6% |
| Skills (on-demand) | 1.9k | 1.0% |
| Agents | 791 | 0.4% |
| Total | ~8k | ~4% |

Skills load on-demand. Idle skills cost zero tokens.
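
The percentages in the table follow directly from the token counts. The Python check below uses only the figures above and the 200k window. `Decimal` with half-up rounding is used because the rules share is exactly 2.55%, which binary floating point would round down to 2.5.

```python
from decimal import Decimal, ROUND_HALF_UP

CONTEXT_WINDOW = 200_000

# Token counts taken from the table above.
components = {
    "Rules (always loaded)": 5_100,
    "Skills (on-demand)": 1_900,
    "Agents": 791,
}


def share(tokens: int) -> Decimal:
    """Percentage of the context window, rounded half-up to one decimal."""
    pct = Decimal(tokens) * 100 / CONTEXT_WINDOW
    return pct.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


total = sum(components.values())
for name, tokens in components.items():
    print(f"{name}: {tokens} tokens = {share(tokens)}%")
print(f"Total: {total} tokens = {share(total)}%")
```

The total comes to 7,791 tokens, or 3.9% of the window, which the table rounds to ~8k and ~4%.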

Skill Reference

| Skill | Use when |
|---|---|
| /project-setup | First-time project configuration |
| /bug-fix | Fixing bugs and resolving issues |
| /feature-dev | Implementing new features end-to-end |
| /smart-commit | Committing changes with smart grouping |
| /push-ci | Pushing code and monitoring CI |
| /create-pr | Creating GitHub pull requests |
| /codex-review-fast | Quick code review (diff only) |
| /codex-review-doc | Reviewing documentation changes |
| /codex-security | OWASP Top 10 security audit |
| /verify | Running full test verification chain |
| /precommit | Pre-commit quality gate (lint + build + test) |
| /precommit-fast | Quick pre-commit (lint + test, no build) |
| /codex-brainstorm | Adversarial brainstorming (Nash equilibrium) |
| /tech-spec | Writing technical specifications |
| /pr-review | PR self-review before merge |

All 96 public skills

Development (33)

| Skill | Description |
|---|---|
| /ask | Context-aware Q&A with auto context gathering. |
| /bug-fix | Bug fix workflow. |
| /bump-version | Bump package and plugin version in sync. |
| /code-explore | Pure Claude code investigation. |
| /code-investigate | Dual-perspective code investigation. |
| /codex-architect | Codex architecture consulting. |
| /codex-implement | Implement features via Codex MCP. |
| /codex-setup | Initialize sd0x-dev-flow infrastructure for Codex CLI and other non-Claude agents. |
| /create-pr | Create or update GitHub PR with gh CLI. |
| /debug | Interactive debugging workflow with hypothesis-driven probe loop. |
| /deep-explore | Multi-wave parallel code exploration orchestrator. |
| /epic-merge | Sequential squash-merge of stacked PR chains into an epic branch. |
| /feature-dev | Feature development workflow. |
| /feature-verify | Feature verification (READ-ONLY, P0-P5). |
| /git-investigate | Git history investigation. |
| /git-profile | Git identity and GPG signing profile manager. |
| /install-hooks | Install plugin hooks into project .claude/ for persistent use without plugin loaded |
| /install-rules | Install plugin rules into project .claude/rules/ for persistent use without plugin loaded |
| /install-scripts | Install plugin runner scripts into project .claude/scripts/ for persistent use without plugin loaded |
| /issue-analyze | GitHub Issue and PR review thread deep analysis with Codex blind verdict. |
| /jira | Jira integration: view issues, generate branches, create tickets, transition status. |
| /load-pr-review | Load GitHub PR review comments into AI session: analyze, triage, plan. |
| /merge-prep | Pre-merge analysis and preparation. |
| /next-step | Change-aware next step advisor. |
| /post-dev-test | Post-development test completion. |
| /pr-comment | Post friendly review comments to a GitHub PR: prepare locally, preview, then submit as atomic review. |
| /project-setup | Project configuration initialization. |
| /push-ci | Push to remote and monitor CI. |
| /remind | Lightweight model correction with context-aware rule loading. |
| /repo-intake | Project initialization inventory (one-time). |
| /smart-commit | Smart batch commit. |
| /smart-rebase | Smart partial rebase for squash-merge repositories. |
| /watch-ci | Monitor GitHub Actions CI runs until completion. |

Review (Codex MCP) (14)

| Skill | Description | Loop Support |
|---|---|---|
| /codex-cli-review | Code review via Codex CLI with full disk access. | - |
| /codex-code-review | Code review using Codex MCP. | - |
| /codex-explain | Explain complex code via Codex MCP. | - |
| /codex-review | Full second-opinion using Codex MCP (with lint:fix + build). | --continue <threadId> |
| /codex-review-branch | Fully automated review of an entire feature branch using Codex MCP | - |
| /codex-review-doc | Review documents using Codex MCP. | --continue <threadId> |
| /codex-review-fast | Quick second-opinion using Codex MCP (diff only, no tests). | --continue <threadId> |
| /codex-security | OWASP Top 10 security review using Codex MCP. | --continue <threadId> |
| /codex-test-gen | Generate unit tests for specified functions using Codex MCP | - |
| /codex-test-review | Review test case sufficiency using Codex MCP, suggest additional edge cases. | --continue <threadId> |
| /doc-review | Document review via Codex MCP. | - |
| /security-review | Security review via Codex MCP. | - |
| /seek-verdict | Independent second-opinion verification for any finding. | - |
| /test-review | Test coverage review via Codex MCP. | - |

Verification (13)

| Skill | Description |
|---|---|
| /best-practices | Industry best practices conformance audit with mandatory adversarial debate. |
| /check-coverage | Comprehensive assessment of Unit / Integration / E2E three-layer test coverage, identify gaps and provide actionable … |
| /dep-audit | Audit dependency security risks |
| /dev-security-audit | Comprehensive developer workstation security audit: scans for exposed credentials, compromised application data, per… |
| /necessity-audit | Necessity audit for over-designed spec elements. |
| /pre-pr-audit | Pre-PR confidence audit with 5-dimension scoring. |
| /precommit | Pre-commit checks: lint:fix -> build -> test |
| /precommit-fast | Quick pre-commit checks: lint:fix -> test |
| /project-audit | Project health audit with deterministic scoring. |
| /risk-assess | Uncommitted code risk assessment with breaking change detection, blast radius analysis, and scope metrics. |
| /test-deep | Context-aware test orchestration. |
| /test-health | Holistic test coverage measurement. |
| /verify | Verification loop: lint -> typecheck -> unit -> integration -> e2e |

Planning (16)

| Skill | Description |
|---|---|
| /architecture | Architecture design and documentation. |
| /codex-brainstorm | Adversarial brainstorming via Claude+Codex debate. |
| /deep-analyze | Deep-dive analysis of an initial proposal: research code implementation, produce an actionable roadmap and alternatives |
| /deep-research | Universal multi-source research orchestration. |
| /feasibility-study | Feasibility analysis from first principles. |
| /fp-brief | First-principles briefing from technical documents. |
| /post-dev-recap | Post-development recap wrapper. |
| /project-brief | Convert a technical spec into a PM/CTO-readable executive summary. |
| /recap-ask | Interactive Q&A over an existing recap document. |
| /recap-doc | Post-development recap document generator. |
| /req-analyze | Requirements analysis: problem decomposition, stakeholder scan, requirement structuring. |
| /request-tracking | Request tracking knowledge base. |
| /review-spec | Review technical spec documents from completeness, feasibility, risk, and code consistency perspectives. |
| /tech-brief | Technical briefing for developer sharing. |
| /tech-spec | Tech spec generation and review. |
| /ui-first-principles | First-principles UI/IA reasoning: turns a <scenario> + API field set into JTBD analysis, principle-anchored field-p… |

Documentation & Tooling (20)

| Skill | Description |
|---|---|
| /claude-health | Claude Code config health check + plugin sync. |
| /contract-decode | EVM contract error and calldata decoder. |
| /create-request | Create, update, or scan per-task request tickets for progress tracking. |
| /de-ai-flavor | Remove AI artifacts from documents. |
| /doc-refactor | Refactor documents: simplify without losing information, visualize flows with sequenceDiagram. |
| /generate-runner | Generate a customized precommit runner for any ecosystem. |
| /obsidian-cli | Obsidian vault integration via official CLI. |
| /op-session | Initialize 1Password CLI session for Claude Code. |
| /portfolio | Portfolio system knowledge base. |
| /pr-review | PR self-review: review changes, produce checklist, update rules |
| /pr-summary | List open PRs, filter automation PRs, group by ticket ID, format as Markdown. |
| /refactor | Multi-target refactoring orchestrator. |
| /runbook | Generate/update feature release runbook |
| /safe-remove | Safely remove plugin assets (skill/agent/rule/script/hook) with dependency detection and reference cleanup. |
| /sharingan | Replicate knowledge from any source as sd0x-dev-flow skill definition. |
| /simplify | Wrap-up refactoring: simplify code, eliminate duplication, preserve behavior |
| /skill-health-check | Validate skill quality against routing, progressive loading, and verification criteria. |
| /statusline-config | Customize Claude Code statusline. |
| /update-docs | Research current code state then update corresponding docs, ensuring docs stay in sync with code. |
| /zh-tw | Rewrite the previous reply in Traditional Chinese |

Rules & Hooks

14 rules (always-loaded conventions) + 9 hooks (automated guardrails).

Customization: Edit auto-loop-project.md to override auto-loop behavior per project. Plugin updates won’t conflict — see Rule Override Pattern.

For full rules, hooks, and environment variable reference, see docs/rules.md and docs/hooks.md.

Customization

Run /project-setup to auto-detect and configure all placeholders, or manually edit .claude/CLAUDE.md:

| Placeholder | Description | Example |
|---|---|---|
| {PROJECT_NAME} | Your project name | my-app |
| {FRAMEWORK} | Your framework | MidwayJS 3.x, NestJS, Express |
| {CONFIG_FILE} | Main config file | src/configuration.ts |
| {BOOTSTRAP_FILE} | Bootstrap entry | bootstrap.js, main.ts |
| {DATABASE} | Database | MongoDB, PostgreSQL |
| {TEST_COMMAND} | Test command | yarn test:unit |
| {LINT_FIX_COMMAND} | Lint auto-fix | yarn lint:fix |
| {BUILD_COMMAND} | Build command | yarn build |
| {TYPECHECK_COMMAND} | Type checking | yarn typecheck |
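
Conceptually, configuring the project means substituting each placeholder in the CLAUDE.md template. The Python sketch below illustrates that idea only. The `fill_placeholders` helper and the sample template are hypothetical, and /project-setup detects these values itself rather than reading them from a dictionary.

```python
import re
from typing import Dict

# Matches tokens such as {PROJECT_NAME} or {TEST_COMMAND}.
PLACEHOLDER = re.compile(r"\{([A-Z_]+)\}")


def fill_placeholders(template: str, values: Dict[str, str]) -> str:
    """Replace {NAME} tokens with configured values; leave unknown ones intact."""
    def substitute(match: "re.Match[str]") -> str:
        return values.get(match.group(1), match.group(0))
    return PLACEHOLDER.sub(substitute, template)


# Hypothetical template fragment and values, using examples from the table.
template = "Project: {PROJECT_NAME}\nTest: {TEST_COMMAND}\nBuild: {BUILD_COMMAND}"
values = {"PROJECT_NAME": "my-app", "TEST_COMMAND": "yarn test:unit"}
print(fill_placeholders(template, values))
```

Leaving unknown placeholders untouched (here `{BUILD_COMMAND}`) makes any value that was not configured easy to spot when editing the file by hand.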

Showcase: Multi-Agent Research

Run /deep-research to orchestrate 2-3 parallel researcher agents across web sources, codebase, and community knowledge — with claim registry synthesis and conditional adversarial debate.

| Feature | Details |
|---|---|
| Agents | 2-3 parallel (web + code + community) |
| Synthesis | Claim registry with consensus detection |
| Validation | Conditional /codex-brainstorm debate |
| Scoring | 4-signal completeness model |

Full documentation

Architecture

Command (entry) → Skill (capability) → Agent (environment)
  • Commands: User-triggered via /...
  • Skills: Knowledge bases loaded on demand
  • Agents: Isolated subagents with specific tools
  • Hooks: Automated guardrails (format, review state, stop guard)
  • Rules: Always-on conventions (auto-loaded)

For advanced architecture details (agentic control stack, control loop theory, sandbox rules), see docs/architecture.md.

Contributing

PRs welcome. Please:

  1. Follow existing naming conventions (kebab-case)
  2. Include When to Use / When NOT to Use in skills
  3. Add disable-model-invocation: true for dangerous operations
  4. Test with Claude Code before submitting

License

MIT
