Needlepath

Selective state compression for production agentic AI systems.

Needlepath turns context management into an infrastructure primitive: select the right state, preserve grounding, fail open when needed, and measure every high-volume agent workflow.

27.38% final-path input token reduction across the 100-task benchmark
83/100 tasks with usefulness preserved or improved versus baseline
0 silent unsafe compressions in the benchmark target
24/100 fail-open cases where baseline context was safer

Why context optimization is now infrastructure

Production agents do not fail only because they lack context. They also fail because they carry stale chat history, distractor retrieval chunks, unused tool schemas, redundant memories, workflow traces, file metadata, and partial results into model calls. Needlepath treats context selection as an infrastructure concern: preserve grounding-critical evidence, remove low-signal state, and fail open when optimization would be risky.

How Needlepath works

Needlepath sits between the agent runtime and the model call. It evaluates the next action, scores candidate state records against the task, preserves constraints such as citations, identifiers, workflow dependencies, policies, and output contracts, then returns either an optimized context package or a safe baseline pass-through decision.
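The flow described above can be sketched in a few lines. This is an illustrative sketch, not Needlepath's implementation: the function names, the `min_score` threshold, and the keyword-overlap scorer are all assumptions chosen to keep the example self-contained. A real scorer would presumably be far richer.

```python
import re


def overlap_score(task: str, text: str) -> float:
    """Hypothetical relevance score: fraction of task terms found in the record."""
    task_terms = set(re.findall(r"[a-z0-9]+", task.lower()))
    if not task_terms:
        return 0.0
    text_terms = set(re.findall(r"[a-z0-9]+", text.lower()))
    return len(task_terms & text_terms) / len(task_terms)


def build_context(task, records, pinned_ids, min_score=0.2):
    """Return (mode, record_ids) for the next model call.

    records: dict mapping record id -> text (candidate state).
    pinned_ids: ids that must survive (citations, policies, output contracts).
    """
    # Pinned records are kept regardless of score; others must clear the threshold.
    kept = [
        rid for rid, text in records.items()
        if rid in pinned_ids or overlap_score(task, text) >= min_score
    ]
    # If a pinned record is not available at all, or nothing survives,
    # pass the baseline context through unchanged instead of guessing.
    missing = [rid for rid in pinned_ids if rid not in records]
    if missing or not kept:
        return "baseline", list(records)
    return "optimized", kept


records = {
    "policy": "Refunds allowed within 30 days",
    "order": "order 1234 shipped on Monday",
    "weather": "sunny tomorrow",
}
mode, ids = build_context("refund order 1234", records, pinned_ids={"policy"})
```

In this example the policy record survives because it is pinned, the order record survives on relevance, and the unrelated record is dropped.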

Where it fits in the optimization stack

Needlepath operates before token-level compression. It decides which state belongs in the run at all. Compact encodings, provider caching, and prompt compression can then make the retained payload denser, cheaper, or faster. That distinction matters because the dominant production risk is often not excessive text alone, but the wrong state being sent to the model.

What teams should validate

The paper argues for a conservative production rollout: replay real customer traces; track savings, latency, fallback rate, and quality by workflow; and promote only the workflows that clear governance gates. The goal is not token reduction in isolation. The goal is lower cost and latency while preserving correctness, grounding, policy boundaries, and output completeness.
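A per-workflow promotion gate of this kind might look like the sketch below. The field names, the thresholds (`min_savings`, `max_regression_rate`), and the sample numbers are hypothetical placeholders. Real gates would be set by each team's governance process.

```python
from dataclasses import dataclass


@dataclass
class WorkflowReplay:
    """Aggregated results from replaying recorded traces for one workflow."""
    name: str
    baseline_tokens: int
    optimized_tokens: int
    runs: int
    fallbacks: int            # runs where baseline context was used instead
    quality_regressions: int  # runs judged worse than baseline
    unsafe_compressions: int  # runs where required state was silently dropped


def token_savings(r: WorkflowReplay) -> float:
    """Fraction of input tokens saved relative to baseline."""
    return 1 - r.optimized_tokens / r.baseline_tokens


def passes_gate(r: WorkflowReplay, min_savings=0.10, max_regression_rate=0.02) -> bool:
    """Promote only if savings are material and quality and safety hold."""
    return (
        r.unsafe_compressions == 0
        and token_savings(r) >= min_savings
        and r.quality_regressions / r.runs <= max_regression_rate
    )


# Hypothetical replay results, for illustration only.
triage = WorkflowReplay("support_triage", 1000, 700, 100, 20, 1, 0)
billing = WorkflowReplay("billing_lookup", 1000, 950, 100, 5, 0, 0)
promoted = [r.name for r in (triage, billing) if passes_gate(r)]
```

Here the second workflow is held back because its savings fall below the assumed threshold, even though its quality did not regress.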

Typed state records

Every candidate record should carry type, source, provenance, freshness, authority, dependency, and risk metadata so the optimizer can reason over state rather than raw text.
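One way to represent such a record is a small typed structure. The sketch below is an assumption about shape only: the class name, field names, enum members, and value ranges are illustrative and are not taken from Needlepath's schema.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class RecordType(Enum):
    CHAT_TURN = "chat_turn"
    RETRIEVAL_CHUNK = "retrieval_chunk"
    TOOL_SCHEMA = "tool_schema"
    MEMORY = "memory"
    WORKFLOW_TRACE = "workflow_trace"
    FILE_METADATA = "file_metadata"
    PARTIAL_RESULT = "partial_result"


@dataclass(frozen=True)
class StateRecord:
    record_id: str
    type: RecordType
    source: str                       # system that produced the record
    provenance: str                   # how the content was derived
    age_seconds: float                # freshness, measured at selection time
    authority: float                  # assumed scale: 0.0 (weak) to 1.0 (canonical)
    depends_on: tuple[str, ...] = ()  # ids of records this one relies on
    risk: str = "low"                 # assumed levels: "low", "medium", "high"
    text: str = ""

    def is_stale(self, max_age_seconds: float) -> bool:
        """True when the record is older than the allowed freshness window."""
        return self.age_seconds > max_age_seconds


memory = StateRecord(
    record_id="m1",
    type=RecordType.MEMORY,
    source="memory_store",
    provenance="summarized from earlier session",
    age_seconds=7200.0,
    authority=0.4,
)
```

With metadata attached this way, a selector can filter on fields such as `is_stale` or `risk` without parsing the record text.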

Constraint-first selection

Needlepath protects evidence, citations, identifiers, workflow state, policy context, and output contracts before optimizing for token budget.
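The ordering matters: protected records are admitted first, and only the remaining budget is spent on optional state. A minimal sketch of that ordering follows, assuming a hypothetical `Candidate` type with precomputed token counts and relevance scores. The greedy fill is a stand-in, not the product's selection algorithm.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    record_id: str
    tokens: int      # estimated token cost of including the record
    score: float     # relevance to the next action
    protected: bool  # evidence, citation, identifier, policy, output contract


def select_constraint_first(candidates, token_budget):
    """Return (selected_ids, fits_budget).

    Protected records are always selected, even if they alone exceed the
    budget; in that case fits_budget is False and the caller can fall back.
    """
    protected = [c for c in candidates if c.protected]
    optional = sorted(
        (c for c in candidates if not c.protected),
        key=lambda c: c.score,
        reverse=True,
    )
    selected = list(protected)
    used = sum(c.tokens for c in protected)
    # Greedily add the highest-scoring optional records that still fit.
    for c in optional:
        if used + c.tokens <= token_budget:
            selected.append(c)
            used += c.tokens
    return [c.record_id for c in selected], used <= token_budget


candidates = [
    Candidate("policy", tokens=50, score=0.1, protected=True),
    Candidate("chunk_a", tokens=40, score=0.9, protected=False),
    Candidate("chunk_b", tokens=40, score=0.5, protected=False),
]
ids, fits = select_constraint_first(candidates, token_budget=100)
```

Note that the low-scoring policy record is kept ahead of both retrieval chunks, which is the point of selecting on constraints before budget.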

Auditable fallback

When optimization is unsafe, the system should explain why it declined and use baseline context instead of silently degrading the run.
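A fallback is auditable when the decision carries its reasons with it. The sketch below assumes a hypothetical `ContextDecision` type and two simple safety checks. The actual checks and reason strings in any real deployment would differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ContextDecision:
    mode: str                      # "optimized" or "baseline"
    record_ids: tuple[str, ...]
    reasons: tuple[str, ...] = ()  # empty when optimization was accepted


def decide(baseline_ids, optimized_ids, required_ids) -> ContextDecision:
    """Accept the optimized context only if every safety check passes."""
    reasons = []
    missing = sorted(set(required_ids) - set(optimized_ids))
    if missing:
        reasons.append("required records dropped: " + ", ".join(missing))
    if not optimized_ids:
        reasons.append("optimized context is empty")
    if reasons:
        # Fail open: use the baseline and record why optimization was declined.
        return ContextDecision("baseline", tuple(baseline_ids), tuple(reasons))
    return ContextDecision("optimized", tuple(optimized_ids))


declined = decide(["a", "b", "c"], ["a"], required_ids=["a", "c"])
accepted = decide(["a", "b", "c"], ["a", "c"], required_ids=["a", "c"])
```

Logging `declined.reasons` alongside the run gives reviewers a record of each fallback rather than an unexplained drop in savings.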