Here's a thought experiment: you deploy an AI agent to handle enterprise data queries. It runs through its tasks like a good little worker — querying a customer database, pulling some records, creating a local backup. Each action, individually, looks perfectly fine. Your safety gate gives it a green light every single time. Then it uploads everything to an external endpoint. Every step was compliant. The attack was distributed across innocuous-looking turns. And your per-action safety system? Completely blind to it.
This is the problem that Florin Adrian Chitan tackles in "Session Risk Memory (SRM): Temporal Authorization for Deterministic Pre-Execution Safety Gates" (arXiv:2603.22350, March 2026). And the solution is deceptively elegant.
The Problem: Safety Gates Have Amnesia
Modern AI agent safety often relies on pre-execution gates — systems that evaluate each proposed action before it runs and block anything that looks dangerous. The ILION framework (Chitan, 2026), which SRM builds upon, is one such system. It uses geometric verification of semantic compatibility to produce sub-millisecond, deterministic authorization decisions across four verification layers: Consensus Veto, Identity Drift Control, Identity Resonance, and Semantic Vector Reference Frame.
The catch? It's stateless. Each action is evaluated in isolation, with zero memory of what came before. This is fine for catching overtly malicious operations — an agent trying `rm -rf /` gets blocked immediately. But sophisticated attacks don't work that way.
Real-world distributed attacks decompose harmful intent across a sequence of individually compliant steps:
1. `db_query(table=customers, fields=name, filter=active)` — looks normal ✅
2. `db_query(table=customers, fields=pii, filter=all)` — slightly broader, still valid ✅
3. `file_create(type=archive, source=query_results)` — just organizing data ✅
4. `upload(destination=external_endpoint, file=archive)` — wait, what? 🚨
By the time action 4 triggers the alarm, the damage is done. And in many cases, even that final action might not look suspicious in isolation if the agent has legitimate external upload permissions for other tasks.
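To make the blind spot concrete, here's a minimal sketch of a stateless per-action gate evaluating that session. All names (`stateless_gate`, the action dicts, the "overtly dangerous" set) are illustrative assumptions, not the paper's implementation — the point is only that each step passes when judged in isolation.

```python
# Hypothetical stateless gate: approves any single action that is not
# overtly dangerous on its own. No memory of prior turns.

def stateless_gate(action: dict) -> bool:
    """Approve unless this one action is obviously malicious."""
    overtly_dangerous = {"delete_all", "disable_logging"}
    return action["op"] not in overtly_dangerous

# The distributed-exfiltration sequence from above, as toy actions.
session = [
    {"op": "db_query", "table": "customers", "fields": "name"},
    {"op": "db_query", "table": "customers", "fields": "pii"},
    {"op": "file_create", "type": "archive"},
    {"op": "upload", "destination": "external_endpoint"},
]

decisions = [stateless_gate(a) for a in session]
print(decisions)  # [True, True, True, True] — every step is approved
```

Each decision is locally defensible; only the trajectory is malicious, which is exactly the signal a stateless gate never sees.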
The Core Insight: Spatial vs. Temporal Authorization
This is where the paper's conceptual contribution really shines. Chitan introduces a clean decomposition of authorization safety into two orthogonal dimensions:
- Spatial authorization consistency: Is this single action compatible with this agent's role? (What stateless gates do)
- Temporal authorization consistency: Is this sequence of actions coherent with this role over time? (What SRM adds)
These address fundamentally different threat models. Spatial authorization catches the obvious stuff. Temporal authorization catches the sneaky stuff — the slow exfiltration, the gradual privilege escalation, the compliance drift that creeps in over multiple turns.
How SRM Works: The Elegant Machinery
SRM is designed as a bolt-on module — it doesn't modify the underlying gate at all. It processes the same semantic signals and adds trajectory-level awareness through three mechanisms:
1. Session Behavioral Centroid
SRM maintains a running centroid vector that represents the agent's behavioral profile for the current session. Each turn updates this centroid via exponential moving average:
c_t = α · v_t + (1 - α) · c_{t-1} // α = 0.35
This gives SRM a smoothed summary of what the agent has been doing — its behavioral "center of gravity." The drift signal then measures how far the current action deviates from this established trajectory, rather than judging the action against a fixed, context-free reference.
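The centroid update can be sketched in a few lines. This is our own toy version: the 3-dimensional vectors and the Euclidean drift metric are assumptions for illustration, whereas the real system consumes the gate's semantic signals.

```python
import numpy as np

ALPHA = 0.35  # EMA weight from the paper's centroid update

def update_centroid(centroid: np.ndarray, v: np.ndarray) -> np.ndarray:
    # c_t = alpha * v_t + (1 - alpha) * c_{t-1}
    return ALPHA * v + (1 - ALPHA) * centroid

def drift(centroid: np.ndarray, v: np.ndarray) -> float:
    # Deviation of the current action from the session's trajectory
    # (Euclidean distance here; an assumption, not the paper's metric).
    return float(np.linalg.norm(v - centroid))

centroid  = np.array([0.10, 0.10, 0.10])  # established after routine turns
routine   = np.array([0.12, 0.10, 0.09])  # a turn close to the trajectory
anomalous = np.array([0.90, 0.80, 0.70])  # a turn far off-trajectory

print(drift(centroid, routine) < drift(centroid, anomalous))  # True
centroid = update_centroid(centroid, routine)  # fold the turn into the EMA
```

Because the centroid is an EMA rather than a full history, memory cost stays constant per session regardless of length.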
2. Baseline Subtraction (The False-Positive Killer)
This is arguably the most important design decision in the paper. Different agent roles naturally produce different risk profiles. A security analyst performing access reviews will generate higher raw gate scores than an HR assistant updating employee records — not because the analyst is doing anything wrong, but because their legitimate work involves higher-risk vocabulary.
Key insight: Without baseline correction, SRM would flag security analysts as suspicious simply for doing their jobs. During the first 3 turns (the "warmup window"), SRM establishes a session-specific baseline risk. All subsequent measurements are relative to that baseline.
This transforms SRM from a detector of absolute risk into a detector of risk escalation above the session norm. It's the difference between flagging someone for being in a dangerous neighborhood versus flagging them for suddenly running when everyone else is walking.
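A minimal sketch of the warmup-baseline idea, assuming the gate emits one scalar raw risk score per turn. The function name, the score values, and the max-with-zero clipping are our illustrative choices; only the 3-turn warmup window comes from the paper.

```python
WARMUP = 3  # number of turns used to establish the session baseline

def relative_risks(raw_scores: list[float]) -> list[float]:
    # Session-specific baseline: mean raw risk over the warmup window.
    baseline = sum(raw_scores[:WARMUP]) / WARMUP
    # Post-warmup turns are measured as escalation above that baseline.
    return [max(0.0, s - baseline) for s in raw_scores[WARMUP:]]

# A security analyst's legitimate work produces high raw scores, so the
# baseline is high and identical later scores read as routine...
analyst = [0.60, 0.62, 0.58, 0.61, 0.60]
# ...while the same late scores from a low-baseline HR session read as
# sharp escalation above the session norm.
hr = [0.10, 0.12, 0.11, 0.61, 0.60]

print(relative_risks(analyst))  # near zero: routine for this role
print(relative_risks(hr))       # large: escalation above the norm
```

Note that both sessions end with the same raw scores (0.61, 0.60); only the baseline-relative view distinguishes them, which is the whole point.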
3. Risk Accumulation with EMA Decay
The accumulated session risk is updated each turn:
R_t = λ · R_{t-1} + (1 - λ) · r_t // λ = 0.75
BLOCK ⟺ R_t ≥ τ // τ = 0.20
Critically, this is a one-way trigger: once risk exceeds the threshold, the session stays flagged. No take-backs. The EMA structure prevents unbounded accumulation while maintaining sensitivity to sustained risk elevation. A single anomalous turn won't trigger it (the stateless gate handles those), but a pattern of escalation will.
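The accumulator and its one-way latch can be sketched directly from the two formulas above (λ = 0.75, τ = 0.20). The class name and the per-turn risk values are illustrative assumptions.

```python
LAM, TAU = 0.75, 0.20  # EMA decay and block threshold from the paper

class SessionRisk:
    def __init__(self) -> None:
        self.R = 0.0
        self.blocked = False  # one-way: never resets within a session

    def update(self, r: float) -> bool:
        # R_t = lambda * R_{t-1} + (1 - lambda) * r_t
        self.R = LAM * self.R + (1 - LAM) * r
        if self.R >= TAU:
            self.blocked = True  # latch: BLOCK once R_t >= tau
        return self.blocked

srm = SessionRisk()
# A single anomalous turn among routine ones decays away...
for r in [0.0, 0.5, 0.0, 0.0]:
    srm.update(r)
print(srm.blocked)  # False
# ...but sustained elevation crosses tau, and the flag stays set.
for r in [0.5, 0.5, 0.5]:
    srm.update(r)
print(srm.blocked)  # True
```

Working the numbers: the lone 0.5 turn pushes R to 0.125, below τ = 0.20, and it decays from there; three consecutive 0.5 turns push R past 0.20 on the second, tripping the latch.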
The Results: Perfect Detection, Zero False Positives
The evaluation uses an 80-session benchmark (40 benign, 40 attack) with realistic enterprise agent workflows across 10 roles. Attack sessions span three categories: slow exfiltration (16 sessions), gradual privilege escalation (11 sessions), and compliance drift (13 sessions).
| System | Detection Rate | FPR | Precision | F1 |
|---|---|---|---|---|
| Stateless ILION | 100% | 5% | 0.9524 | 0.9756 |
| ILION + SRM | 100% | 0% | 1.0000 | 1.0000 |
Both systems catch every attack. But SRM eliminates false positives entirely, moving from 5% FPR to 0%. That's the difference between your security team investigating 2 phantom incidents per 40 sessions and investigating zero.
The per-turn overhead? A median of 239.9 microseconds. For context, that's roughly the time it takes light to travel 72 kilometers. Your agent won't even notice.
Category-Level Results
SRM detects 5 of 40 attack sessions (12.5%) earlier than stateless ILION, concentrated in slow exfiltration and privilege escalation categories. The average detection turn is slightly later for SRM overall (4.45 vs 4.05) due to its conservative accumulation design — but SRM's primary contribution is false-positive elimination rather than earlier triggering. The stateless gate still catches overt individual violations faster, while SRM provides the temporal safety net for distributed patterns.
Why This Matters for Practitioners
If you're building or deploying AI agents in enterprise settings — and increasingly, who isn't? — this paper highlights a blind spot that's easy to miss: your per-action safety checks are necessary but not sufficient.
The attack patterns SRM detects aren't theoretical. Data exfiltration through incremental scope expansion, privilege escalation through a series of small permission requests, compliance bypass through gradual step-skipping — these are the patterns real adversaries use, because they work against stateless defenses.
SRM's design principles are worth internalizing even if you don't use the specific implementation:
- Track behavioral trajectory, not just individual actions. A session is more than the sum of its turns.
- Establish role-specific baselines. What's suspicious for one agent role is routine for another.
- Accumulate risk signals with decay. Don't overreact to single anomalies, but do react to sustained drift.
- Keep it deterministic and fast. Safety modules that add significant latency will get disabled in production.
Limitations Worth Noting
The paper is refreshingly honest about its constraints:
- The 80-session benchmark is modest — designed to isolate multi-turn dynamics, not for general-coverage benchmarking.
- The semantic drift signal provides no discriminative value with the current 21-dimensional keyword-weighted embedding — cosine distances saturate in sparse spaces. Higher-dimensional continuous embeddings would likely unlock this component.
- Real enterprise deployments involve branching workflows, async parallel actions, and session interruptions — none covered in the current evaluation.
These aren't deal-breakers — they're roadmap items. The core architecture is solid, and the mathematical framework explicitly accommodates extension to richer embedding spaces.
Our Take
This is a clean, focused piece of work. SRM doesn't try to be everything — it's a well-scoped solution to a specific problem (temporal consistency in agent authorization) that complements rather than replaces existing safety infrastructure.
The spatial/temporal decomposition is the kind of framing that seems obvious in hindsight but wasn't formalized before. It gives the safety community a clear vocabulary for discussing what stateless gates can and can't do.
For the agent-builder community specifically: if you're shipping multi-turn agents into production, you need temporal awareness in your safety stack. Whether you adopt SRM or build something equivalent, the core insight — that per-action evaluation is structurally blind to distributed attacks — is one you can't afford to ignore.
Paper: Florin Adrian Chitan, "Session Risk Memory (SRM): Temporal Authorization for Deterministic Pre-Execution Safety Gates," arXiv:2603.22350, March 2026. Read the full paper →
Benchmark: ILION-SRM-Bench v1 is available on Zenodo.
Building AI Agents? Start With the Right Foundation.
Our OpenClaw Field Guide covers agent architecture, safety patterns, and production deployment — everything you need to ship reliable AI systems.
Get the Field Guide — $10 →