Staged rollouts for agent deployments

The standard web-services rollout playbook — feature flag, 1% canary, expand by traffic share — leaks safety properties when applied to agents. The reason is that the failure modes aren't independent across requests. One bad tool call can persist state that poisons the next ten.

A four-stage alternative

Shadow. The new agent runs alongside production but its outputs are discarded. You compare its tool-call plans against the live agent's actual calls.
Sandbox. Real inputs, real reasoning, but tools execute against a mocked side-effect layer. Catches "would-have-done-the-wrong-thing" failures cheaply.
Gated. Real tools, but every irreversible action is human-confirmed. You measure how often the human disagrees.
General. Default path, with the gates lowered to the irreversibles you still don't fully trust.

The stages are sequential per release, not per user. Skipping a stage to ship faster is the most common cause of agent incidents we see.