My Claude Code Harness Is Public. Don't Copy It.
I open-sourced my Claude Code harness for Mac, Jetson, and Windows. Read the reasoning, skip the configs. The honest answer is don't build.
I spent most of last month watching myself do the same dance every time I opened Claude Code. Each session ate 20 to 30 minutes up front, depending on how Claude Code was performing that day, and I’d spend that time re-stating trust boundaries, re-configuring tooling, and reminding a fresh session what the project was. I was doing it on three machines (Mac, Jetson AGX Orin, Windows), five to ten times a week. Before I’d written a line of code, I was burning two to five hours a week on a problem I’d already solved twice and forgotten how.
The “fix it in code review” answer for security findings fell apart around the same time, once I’d read enough of the benign-prompt vulnerability data on frontier models to understand what I was accepting by deferring. If the model’s shipping vulnerable code at a non-trivial rate even when nobody’s trying to make it do so, “we’ll catch it in PR” is wishful thinking with a JIRA ticket attached.
That was the moment. I stopped patching the symptom. I built my harness from scratch on the Mac, ported the reasoning to the Jetson and Windows, and wrote down why I made every choice. The repo’s a reasoning trail with the code attached as evidence.
What I’m publishing lives at github.com/rocklambros/harness-engineering. The README says it plainly: this isn’t a clone-and-run template, and personal-specific configuration is the point. If you read it expecting a drop-in setup, you’ll come away disappointed. If you read it expecting to see how a harness gets reasoned into existence, you’ll come away with a frame for arguing with mine and building yours.
Harness engineering isn’t what most people think it is
Prompt engineering got the marketing budget. Harness engineering didn’t, and most Claude Code users skip past it because it doesn’t feel like coding. It feels like ops, and nobody writes posts about ops decisions.
Here’s the working definition I’ve landed on. A harness is the configured environment around an agent (in this case, a coding agent) that determines what it can and can’t do, what guidance it follows by default, and what guardrails it can’t talk its way past. Harness engineering is the discipline of designing that environment on purpose, with reasoning you can defend, instead of accepting whatever defaults shipped in the box.
In Claude Code terms, the harness is everything outside the chat turn. The project-level CLAUDE.md the model reads at session start. The settings.json that defines permission modes and hook registrations. The deterministic rules the model can’t override, even if it tries. The skills that load advisory guidance on demand. The hooks that fire on tool use to validate, scan, and audit. The agents you delegate specialized tasks to.
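As a rough illustration of the settings.json piece, here is a minimal sketch that builds a settings structure with a permission block and one hook registration. The field names (`permissions`, `defaultMode`, `hooks`, `PostToolUse`, `matcher`) follow the Claude Code settings format as commonly documented, so check them against the version you run. The rule strings and the hook script path are hypothetical and are not taken from the repo.

```python
import json


def build_settings() -> dict:
    """Return a settings.json-shaped dict. All values are illustrative."""
    return {
        "permissions": {
            # Permission mode the session starts in.
            "defaultMode": "default",
            # Hypothetical deny rules: block reading .env and any curl call.
            "deny": ["Read(./.env)", "Bash(curl:*)"],
            # Hypothetical allow rule for a harmless command.
            "allow": ["Bash(git status)"],
        },
        "hooks": {
            # Run a command after every Write or Edit tool call.
            "PostToolUse": [
                {
                    "matcher": "Write|Edit",
                    "hooks": [
                        {
                            "type": "command",
                            # Hypothetical script path.
                            "command": "python3 .claude/hooks/scan.py",
                        }
                    ],
                }
            ]
        },
    }


# Serialize the way the file would appear on disk.
settings_text = json.dumps(build_settings(), indent=2)
```

Writing `settings_text` to `.claude/settings.json` would give a project-level file. Which rules belong in it depends on the threat model, and that part does not transfer between setups.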
If you’re running Claude Code with a default settings.json, no hooks, no skills beyond what shipped, and a CLAUDE.md that someone else wrote, you don’t have a harness. You have a session. The model is making decisions about what’s safe to run, what tools to invoke, and what your codebase should look like, with zero guardrails you can defend in a postmortem.
For a vibe-coding indie dev shipping a side project, no harness might be fine. The blast radius is one repo, possibly with no production users. For anyone shipping code that matters, the absence of a harness means the model is making decisions about what’s safe with zero documented constraints, and you’re trusting the defaults to do work you’d never trust an unverified junior to do.
Most of the “10 tips for Claude Code” content I’ve read is harness suggestion without harness reasoning, which means surface configs without the why. That’s why those posts age out within a minor-version bump. The configs survive maybe four weeks before an upstream change breaks the assumption they were built on, and the reader has no idea which assumption broke or how to fix it. The reasoning is what survives the upgrade. The configs are what fall out.
The honest answer is: don’t build
Most of you should adopt, not build. The README says this directly, and I want to repeat it before anyone gets the wrong idea from the announcement:
“The honest answer for most people reading this is: don’t build. Adopt.”
The cost of building isn’t in the writing. It’s in the maintenance against Claude Code itself, which ships breaking changes on minor version bumps. The TTL cache regression in March 2026 was the canonical example. A behavior change in the cache layer silently halved the economic value of half the harnesses in circulation, and most of the people running those harnesses didn’t notice for weeks. If your harness assumes a Claude Code behavior that later changes in a release, every part of your reasoning trail that depended on that assumption needs re-evaluation. That’s a non-trivial tax to pay if your day job isn’t building harnesses.
Who should build, then? The conditions are narrow, and all four must be true.
First, you operate across multiple machines, and the off-the-shelf options don’t survive the cross-platform parity test. Second, you have a non-trivial security posture, and “fix it in code review” isn’t a defensible answer for the work you ship. Third, you don’t trust the trust boundaries that ship in the existing community harnesses, either because they’re underspecified or because they’re calibrated to a different threat model than yours. Fourth, you can afford the maintenance cost of keeping a reasoning trail up to date as Claude Code evolves.
If any of those four don’t apply, adopt. There are good public harnesses in the community right now. Pick one whose reasoning you can read and whose tradeoffs you can defend. That’s a faster path to a harness you can trust than building your own.
I built mine because all four applied: three machines, an AI security threat model I don’t want negotiated by a maintainer I’ve never met, a low tolerance for trust boundaries I can’t trace, and the time budget to keep the reasoning current. Most of you don’t have all four. Reading my repo to argue with my reasoning is useful. Copying my configs into a project that doesn’t share my four conditions is the same kind of mistake as cloning someone else’s threat model and hoping it covers yours.
If you read this section and think, “but my situation is special,” it probably isn’t. The cases that earn building are rarer than people think, and the cases where adopting is the smart move look pretty similar to mine from the outside.
What’s in the repo, and what it does
The repo is organized as one foundation section, three platform sections (Mac, Jetson AGX Orin, Windows), and a research section. Foundation holds the parts that are identical across platforms: the Quality Contract that binds every artifact, the threat model, the architectural principles, the seed evaluation methodology, and the research references.
The Mac section is the validated reference build. All six phases (Phase 0 goals through Phase 5 release) are written and tested against my actual machine. The Jetson and Windows sections mirror the structure. Phases 0 through 2 are written and ready. Phases 3 through 5 are scaffolded with explicit “needs validation when ported” markers because I haven’t run them against those environments yet. The capability surface is identical to Mac. Tools differ where they have to.
Each platform’s harness has the same five-layer shape. The project-level CLAUDE.md sits under 200 lines and covers seven sections: the role the model is operating in, the code standards I expect it to honor, the security rules it can’t bypass, the core constraints on the project, the things that break (failure modes I’ve already hit), an operational section for day-to-day commands, and a status section that captures where the build currently is. A settings.json template defines permission modes, hook registrations, and trust-boundary policy. A deterministic rules directory lists path deny patterns, command deny patterns, and secret patterns that get consumed by hooks rather than interpreted by the model. A skills directory holds lazy-loaded advisory guidance. A hooks and agents directory holds the deterministic gates and the specialized subagents.
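To make "consumed by hooks rather than interpreted by the model" concrete, here is a minimal sketch of a deterministic check over a tool call. The three pattern lists are hypothetical stand-ins for the rules directory, and the `tool_input` keys (`file_path`, `command`, `content`) are assumptions about the shape of the hook payload. The repo's actual patterns are not reproduced here.

```python
import fnmatch
import re

# Hypothetical rule sets standing in for the deterministic rules directory.
PATH_DENY = ["*.pem", ".env", ".env.*", "*/.ssh/*"]
COMMAND_DENY = [r"\brm\s+-rf\s+/", r"\bcurl\b.*\|\s*(ba)?sh\b"]
SECRET_PATTERNS = [r"AKIA[0-9A-Z]{16}"]  # shape of an AWS access key ID


def check_tool_call(tool_name: str, tool_input: dict) -> list[str]:
    """Return the reasons to block a tool call; an empty list means allow."""
    reasons = []

    # Path rules are matched against the full path and the bare file name.
    path = tool_input.get("file_path", "")
    if path:
        name = path.rsplit("/", 1)[-1]
        for pattern in PATH_DENY:
            if fnmatch.fnmatch(path, pattern) or fnmatch.fnmatch(name, pattern):
                reasons.append(f"path matches deny pattern {pattern!r}")

    # Command rules only apply to shell invocations.
    if tool_name == "Bash":
        command = tool_input.get("command", "")
        for pattern in COMMAND_DENY:
            if re.search(pattern, command):
                reasons.append(f"command matches deny pattern {pattern!r}")

    # Secret rules scan whatever content is about to be written.
    content = tool_input.get("content", "")
    for pattern in SECRET_PATTERNS:
        if re.search(pattern, content):
            reasons.append(f"content matches secret pattern {pattern!r}")

    return reasons
```

The check is plain pattern matching with no model in the loop, so the same input always produces the same verdict. That property is what makes the rule defensible after the fact.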
The piece I’m most willing to defend is the three-layer security stack that cuts across the skills and hooks layers. Layer one is pre-generation guidance: a security-review skill seeded from the Arcanum-Sec sec-context anti-pattern taxonomy (CC BY 4.0, Jason Haddix), with 10 pattern files for the Mac build that match the skill’s manifest one-to-one. The skill loads pattern sections based on file type, so the context tax stays small. Layer two is commit-time hardening: a Semgrep PostToolUse hook that fires on every Write or Edit and feeds findings back to Claude in the same session, implementing the SecureForge methodology from Liu et al. (arXiv:2605.08382, MIT). The published paper reports a roughly 48% reduction in CWE rate from this layer alone. Layer three is post-generation validation: a pinned pre-commit gate running gitleaks for secrets, Semgrep for SAST, shellcheck for hook scripts, and a local drift check for reference integrity. It’s the same Semgrep engine as layer two, running in a different invocation context. The redundancy is intentional.
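The layer-two feedback loop can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the repo's hook. It assumes the hook receives the tool call as JSON on stdin with a `tool_input.file_path` field, that `semgrep scan --config auto --json --quiet` is an acceptable invocation for your rule packs, and that exiting with status 2 surfaces stderr back to the model. Verify all three against the Claude Code and Semgrep versions you run.

```python
import json
import subprocess
import sys


def format_findings(semgrep_json: dict) -> list[str]:
    """Turn Semgrep's JSON report into one line per finding."""
    lines = []
    for result in semgrep_json.get("results", []):
        extra = result.get("extra", {})
        line_no = result.get("start", {}).get("line")
        lines.append(
            f"{result.get('path')}:{line_no} "
            f"[{extra.get('severity')}] {result.get('check_id')}: "
            f"{extra.get('message')}"
        )
    return lines


def main() -> int:
    """Scan the file that was just written and report findings on stderr."""
    event = json.load(sys.stdin)
    file_path = event.get("tool_input", {}).get("file_path")
    if not file_path:
        return 0  # nothing to scan for this tool call

    proc = subprocess.run(
        ["semgrep", "scan", "--config", "auto", "--json", "--quiet", file_path],
        capture_output=True,
        text=True,
    )
    try:
        report = json.loads(proc.stdout)
    except json.JSONDecodeError:
        return 0  # this sketch fails open if the scanner output is unusable

    findings = format_findings(report)
    if findings:
        print("\n".join(findings), file=sys.stderr)
        return 2  # assumed convention: stderr is fed back to the session
    return 0


# To use this as a hook script, end the file with: sys.exit(main())
```

Failing open on a scanner error is itself a design choice. A stricter posture would return a non-zero status there and accept the extra friction.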
The one piece I’d point to first if you want to see how the reasoning trail format works is JOURNEY.md. It’s a running narrative of the build, written as prose checkpoints. Reasoning lives in JOURNEY.md, decisions land in commits, locked decisions land in foundation docs. That separation is doing real work. The commit history is part of the artifact, not just a side effect of using git.
Decisions I made that won’t transfer to your setup
The repo is a reasoning trail, not a config to copy. Here are the load-bearing decisions in it that won’t survive translation to your environment unchanged.
The Windows section runs Semgrep in WSL2 rather than the native Windows binary. The native binary has spotty coverage on some of the rule packs I care about, and forcing parity across platforms outweighed the convenience of running Semgrep natively on Windows. If your security posture cares about different rule packs than mine does, your decision might run the other way. The same goes for the broader WSL2 call. I picked it because it gave me a Linux-shaped tool environment without dual-booting. If you’re already deep into PowerShell and Windows-native tooling, you’d pick differently, and you’d be right.
The Jetson section assumes Tegra Python and the apt-plus-Jetson-SDK package management posture. If you’re running a Jetson but you’ve layered conda over the top, or you’re using a different L4T release than mine, the Phase 0 inventory output won’t match yours, and the downstream phases will need adjustment. The reasoning still applies. The specific tool versions won’t.
The seven-section CLAUDE.md under 200 lines is calibrated to my context-tax tolerance, not yours. I write CLAUDE.md to be the smallest thing that’s still useful, because every line in it is paid for on every turn in every session. If your projects are larger or smaller than mine, your CLAUDE.md should be too. If your tolerance for context tax is higher than mine (some people will trade more setup tokens for less in-session friction), your CLAUDE.md will be longer than mine.
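A line budget only holds if something checks it. Here is a minimal sketch of such a check, with the budget as a parameter so it can be recalibrated. The heading names are hypothetical paraphrases of the seven sections described earlier, not the repo's actual headings.

```python
MAX_LINES = 200

# Hypothetical heading names for the seven sections.
REQUIRED_SECTIONS = [
    "Role",
    "Code Standards",
    "Security Rules",
    "Core Constraints",
    "Things That Break",
    "Operations",
    "Status",
]


def check_claude_md(text: str, max_lines: int = MAX_LINES) -> list[str]:
    """Return a list of problems; an empty list means the file passes."""
    problems = []
    lines = text.splitlines()

    # "Under 200 lines" means a file of exactly max_lines already fails.
    if len(lines) >= max_lines:
        problems.append(f"{len(lines)} lines; budget is under {max_lines}")

    # Collect heading text at any level, case-insensitively.
    headings = {
        line.lstrip("#").strip().lower()
        for line in lines
        if line.startswith("#")
    }
    for section in REQUIRED_SECTIONS:
        if section.lower() not in headings:
            problems.append(f"missing section: {section}")

    return problems
```

Calling `check_claude_md(text, max_lines=400)` is how a different context-tax budget would be expressed without touching the rest of the check.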
The pattern prose in the security-review skill has been rewritten from the Arcanum-Sec sec-context taxonomy to reflect my voice and selection logic. The attribution is preserved, but the prose isn’t theirs anymore. If you adopt the skill as a starting point, you should rewrite it again. The selection logic is mine, the priorities are mine, and the file-type triggers reflect what I write the most of. If your language mix is different, you’ll want different triggers and a different priority order.
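File-type triggers with a priority order can be pictured as a small lookup table. The sketch below is illustrative only: the extensions, pattern file names, and the `limit` parameter are hypothetical, and the real skill's manifest will differ.

```python
from pathlib import PurePosixPath

# Hypothetical trigger table: file extension -> pattern files, highest
# priority first. The order encodes what matters most for that language.
TRIGGERS = {
    ".py": ["injection.md", "secrets.md", "deserialization.md"],
    ".sh": ["command-injection.md", "secrets.md"],
    ".ts": ["xss.md", "injection.md"],
}


def patterns_for(file_path: str, limit: int = 2) -> list[str]:
    """Return at most `limit` pattern files for the given file's type."""
    suffix = PurePosixPath(file_path).suffix.lower()
    # Unknown file types load nothing, which keeps the context cost at zero.
    return TRIGGERS.get(suffix, [])[:limit]
```

A different language mix changes both the keys and the ordering inside each list, which is why the table is worth rewriting rather than inheriting.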
The Quality Contract section IDs and threat IDs are stable across my repo, which means hooks and skills can cite them by ID, and a drift check can verify the citations resolve. If you adopt the structure, you’ll want to renumber to your own threat model. Don’t inherit my IDs and pretend they’re yours. The whole point of the reasoning trail format is that the citations track to something real, and ID inheritance breaks that the first time you forget which threat ID came from where.
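A drift check of this kind can be sketched in a few lines. The ID format (`QC-01`, `T-03`) is a hypothetical stand-in, since the repo's actual ID scheme is not shown here, and the documents are passed in as strings to keep the example self-contained.

```python
import re

# Hypothetical ID format: "QC-" or "T-" followed by two digits.
ID_PATTERN = re.compile(r"\b(?:QC|T)-\d{2}\b")


def find_unresolved(
    defined: set[str], documents: dict[str, str]
) -> dict[str, list[str]]:
    """Map each document name to the cited IDs that are not defined anywhere."""
    unresolved = {}
    for name, text in documents.items():
        cited = set(ID_PATTERN.findall(text))
        missing = sorted(cited - defined)
        if missing:
            unresolved[name] = missing
    return unresolved
```

An empty result means every citation resolves. Anything else names the file and the dangling ID, which is the failure that inherited IDs tend to produce.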
What I’d do differently if I started over
Two things, and I’ll know about a third by the time I finish the Jetson and Windows validations.
Lock the foundation docs and the Quality Contract before any platform work. I built the Mac section in parallel with the foundation, which meant some early Mac decisions had to be revisited as the Quality Contract sharpened. Each revisit costs a commit cycle and a small amount of confidence in the validity of earlier work. Doing the foundation first and the platform second would have made the reasoning trail cleaner, and the Mac reference build wouldn’t have had a handful of decisions that needed an asterisk.
Write the JOURNEY.md format on day one. I started JOURNEY.md after the initial batch of artifacts had already landed, which meant the reasoning for the first batch had to be reconstructed from commit messages rather than captured live. Commit messages are good for landing decisions. They aren’t the same thing as a running narrative that captures the questions you were sitting with as you made those decisions. Future me will thank present me for any reasoning that gets captured live instead of being reconstructed later. Past me did not get that gift.
The third thing I’m watching for: I suspect the Phase 4 security-review skill will need a different structure once I validate it against the Jetson and Windows environments. The Mac pattern selection assumes a tool mix I haven’t proven survives the port. If it doesn’t, the lesson will be “design the skill structure against the hardest target first, not the easiest.” I don’t know yet. The JOURNEY.md entry that resolves it will say so.
How to read the repo
Read foundation/00-quality-contract.md first. It binds everything else in the repo, and if you’re going to argue with my reasoning, you need to argue from the same starting point I’m arguing from. After that, pick your path. USER_GUIDE.md walks through the wiring if you want a quick start for adopting the harness in your own project. HARNESS_GUIDE.md is the technical reference across all three platforms. If you want the full validated build with all the reasoning intact, read mac/ start to finish in commit order.
What I want from readers isn’t forks of the configs. It’s forks of the thinking. If your harness ends up looking nothing like mine because you have a different threat model, different platforms, a different language mix, or a different context-tax budget, that’s the right outcome. If your harness ends up looking exactly like mine, one of us is wrong, and the math says it’s probably you.
The question I’m leaving open
Most Claude Code users I’ve talked to are running with default permission modes on production codebases and calling that ops maturity. They have no hooks, no skills beyond what shipped, and a CLAUDE.md that someone else wrote or that doesn’t exist at all. If you can’t name the three layers of your security stack without checking, and you can’t say what gets enforced deterministically versus what’s only advisory, you don’t have a harness. You have a session.
What’s in your harness, and could you defend it on a panel?
The repo’s at github.com/rocklambros/harness-engineering. The license is MIT. Use the patterns and argue with me in the comments or in your own JOURNEY.md.
👉 For ongoing analysis of agentic AI governance frameworks, the conversation continues at RockCyber Musings.
👉 Visit RockCyber.com to learn more about how we can help with your traditional Cybersecurity and AI Security and Governance journey.
👉 Want to save a quick $100K? Check out our AI Governance Tools at AIGovernanceToolkit.com