Sharp piece, Rock, the MSA carve-out point is one I wish more CISOs internalized before approving open-weight pilots.
One thing I'd raise about the four architectural moves: they all live external to the model. Signed weights, runtime inventory, scoped authorization, continuous evals, that's a perimeter rebuild, and necessary, but the evidence you cite cuts in an interesting direction. Badllama strips Llama 3 in five minutes for fifty cents. Refusal rates collapse from ~100% to under 20% after light fine-tuning across every major closed model. I don’t see that as a perimeter failure, it’s evidence the layer being stripped was always a behavioral overlay rather than a structural property of the model.
If that read is right, runtime perimeter work is necessary but insufficient. It observes consequences of substrate instability rather than preventing causes. The constraints that were supposed to be load-bearing turn out to be cosmetic, and the perimeter ends up absorbing the failure modes the substrate should have prevented in the first place.
Curious whether you see room in the open-weight enterprise architecture for substrate-side stabilization sitting underneath AAGATE-class runtime controls or whether you see the model-as-fixed assumption as the right operating constraint given current tooling.
Truly wonder if "Preventing causes" isn't reachable at the substrate level for downloadable weights as well, which makes "necessary but insufficient" a measurement against a bar that doesn't exist on either side. What would build for a substrate-side control I can actually deploy?
For downloadable weights, I don’t think full “cause prevention” is reachable unless you intervene during training. But a deployable substrate-side control is still possible.
It would sit below policy and above raw weights: activation telemetry, basin detection, trajectory/logit intervention, fallback routing, and an audit trail of the internal signals that triggered the control.
So the first practical build isn’t “perfect prevention.” It’s a structural governor that detects when the model is entering an unsafe or false-grounded formation path and interrupts before the behavior fully forms.
That’s much closer to substrate control than an output filter.
tl;dr If you're deploying it yourself, you need to secure it yourself.
Which means having someone review the deployment for security issues who knows what they're doing.
Most people deploying local AI and especially AI agents don't. See here:
Breaking: Autonomous Agents are a Shitshow
Brace for chaos
https://garymarcus.substack.com/p/breaking-autonomous-agents-are-a
Sharp piece, Rock, the MSA carve-out point is one I wish more CISOs internalized before approving open-weight pilots.
One thing I'd raise about the four architectural moves: they all live external to the model. Signed weights, runtime inventory, scoped authorization, continuous evals, that's a perimeter rebuild, and necessary, but the evidence you cite cuts in an interesting direction. Badllama strips Llama 3 in five minutes for fifty cents. Refusal rates collapse from ~100% to under 20% after light fine-tuning across every major closed model. I don’t see that as a perimeter failure, it’s evidence the layer being stripped was always a behavioral overlay rather than a structural property of the model.
If that read is right, runtime perimeter work is necessary but insufficient. It observes consequences of substrate instability rather than preventing causes. The constraints that were supposed to be load-bearing turn out to be cosmetic, and the perimeter ends up absorbing the failure modes the substrate should have prevented in the first place.
Curious whether you see room in the open-weight enterprise architecture for substrate-side stabilization sitting underneath AAGATE-class runtime controls or whether you see the model-as-fixed assumption as the right operating constraint given current tooling.
— Royce
Truly wonder if "Preventing causes" isn't reachable at the substrate level for downloadable weights as well, which makes "necessary but insufficient" a measurement against a bar that doesn't exist on either side. What would build for a substrate-side control I can actually deploy?
That’s the right bar.
For downloadable weights, I don’t think full “cause prevention” is reachable unless you intervene during training. But a deployable substrate-side control is still possible.
It would sit below policy and above raw weights: activation telemetry, basin detection, trajectory/logit intervention, fallback routing, and an audit trail of the internal signals that triggered the control.
So the first practical build isn’t “perfect prevention.” It’s a structural governor that detects when the model is entering an unsafe or false-grounded formation path and interrupts before the behavior fully forms.
That’s much closer to substrate control than an output filter.