Tool Forge

A validation-carrying toolchain for governed agentic execution.

Tool Forge turns capability intent into sandbox-verified tool artifacts and exposes the right tools to agents through a token-efficient governed router.


0.908 aggregate micro-F1 across 83 router benchmark cases

99.49% estimated task-flow tool-context reduction versus naive schema exposure

25/25 tool bundles generated in the end-to-end generation probe

23/25 live sandbox validations passed in the reported probe

The problem: agents can only act through tools

As agents move from text generation to operational work, the tool layer becomes the trust boundary. A plausible generated script is not a production capability unless its inputs, outputs, dependencies, credentials, tests, runtime behavior, and lifecycle state are validated.

Tool Forge as a validation-carrying toolchain

Tool Forge converts natural-language capability intent into governed, sandbox-verified, cataloged tool artifacts. A tool is treated as a capsule: intent, contract, implementation, dependency policy, tests, documentation, validation evidence, lifecycle state, credential bindings, and routing metadata.
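To make the capsule idea concrete, here is a minimal Python sketch of a record with the ten parts listed above. The class name `ToolCapsule`, the `LifecycleState` values, and the `is_trusted` rule are illustrative assumptions, not Tool Forge's actual data model.

```python
from dataclasses import dataclass, field
from enum import Enum


class LifecycleState(Enum):
    # Hypothetical states: the text only says a capsule carries a lifecycle state.
    DRAFT = "draft"
    VALIDATED = "validated"
    APPROVED = "approved"
    BLOCKED = "blocked"


@dataclass
class ToolCapsule:
    # One field per capsule part named in the text.
    intent: str
    contract: dict
    implementation: str
    dependency_policy: dict = field(default_factory=dict)
    tests: list = field(default_factory=list)
    documentation: str = ""
    validation_evidence: list = field(default_factory=list)
    lifecycle_state: LifecycleState = LifecycleState.DRAFT
    credential_bindings: dict = field(default_factory=dict)
    routing_metadata: dict = field(default_factory=dict)

    def is_trusted(self) -> bool:
        # Assumed rule: trust needs approval plus at least one piece of evidence.
        return (
            self.lifecycle_state is LifecycleState.APPROVED
            and len(self.validation_evidence) > 0
        )


capsule = ToolCapsule(
    intent="Look up the current weather for a city",
    contract={"name": "get_weather", "parameters": {"city": "string"}},
    implementation="def get_weather(city): ...",
)
print(capsule.is_trusted())  # False: a fresh capsule is an unvalidated draft
```

Keeping evidence and lifecycle state on the same record as the code means a consumer cannot see the implementation without also seeing whether it has been validated.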

Token-efficient tool routing

Instead of exposing every full tool schema to the model, Tool Forge Router exposes a small MCP-compatible surface that can search, resolve, describe, and call tools. Full schemas are loaded lazily, and only for the selected subset, so the catalog can grow large while the model-facing decision surface stays small.
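The lazy-loading pattern can be sketched in a few lines of Python. This is a toy under stated assumptions: `LazyRouter`, its method signatures, and the keyword-overlap search are hypothetical stand-ins, and nothing here is the Tool Forge Router API or an MCP client.

```python
from typing import Callable


class LazyRouter:
    """Toy router: short summaries up front, full schemas only on demand."""

    def __init__(self) -> None:
        self._summaries: dict[str, str] = {}
        self._schema_loaders: dict[str, Callable[[], dict]] = {}
        self._handlers: dict[str, Callable[..., object]] = {}
        self.schemas_loaded: list[str] = []  # which full schemas were fetched

    def register(self, name: str, summary: str,
                 schema_loader: Callable[[], dict],
                 handler: Callable[..., object]) -> None:
        self._summaries[name] = summary
        self._schema_loaders[name] = schema_loader
        self._handlers[name] = handler

    def search(self, query: str, limit: int = 5) -> list[str]:
        # Naive keyword overlap; a real router would rank far more carefully.
        wanted = set(query.lower().split())
        scored = []
        for name, summary in self._summaries.items():
            tokens = set(summary.lower().split()) | {name.lower()}
            score = len(wanted & tokens)
            if score:
                scored.append((score, name))
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [name for _, name in scored[:limit]]

    def resolve(self, name: str) -> str:
        if name not in self._summaries:
            raise KeyError(f"unknown tool: {name}")
        return name

    def describe(self, name: str) -> dict:
        # The full schema is built only here, for a tool that was selected.
        self.resolve(name)
        self.schemas_loaded.append(name)
        return self._schema_loaders[name]()

    def call(self, name: str, **arguments: object) -> object:
        self.resolve(name)
        return self._handlers[name](**arguments)


router = LazyRouter()
router.register(
    "get_weather", "current weather for a city",
    schema_loader=lambda: {"type": "object",
                           "properties": {"city": {"type": "string"}},
                           "required": ["city"]},
    handler=lambda city: {"city": city, "forecast": "unknown"},
)
router.register(
    "send_email", "send an email message",
    schema_loader=lambda: {"type": "object",
                           "properties": {"to": {"type": "string"},
                                          "body": {"type": "string"}}},
    handler=lambda to, body: {"sent": True},
)

matches = router.search("weather city")
schema = router.describe(matches[0])
result = router.call(matches[0], city="Oslo")
print(matches, router.schemas_loaded)
```

In this run only the `get_weather` schema is ever materialized; the `send_email` schema stays unloaded because the search never selected it.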

Why governance is part of the artifact

Generated tools, imported MCP tools, and third-party integrations should not become trusted merely because they exist. Tool Forge makes validation, approval, sandbox results, audit metadata, and lifecycle state part of the artifact that agents and workflows reason over.

Capability contract

Tool Forge first maps intent into a structured contract: name, parameters, credentials, output shape, runtime class, failure handling, and source evidence.
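As an illustration, the seven contract fields can be held in a small immutable record, with a helper that reports which ones are still unfilled. The field types, the `contract_gaps` helper, and the choice of which fields may legitimately be empty are all assumptions made for this sketch.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class CapabilityContract:
    # Field names follow the list in the text; the types are assumptions.
    name: str
    parameters: dict
    credentials: tuple
    output_shape: dict
    runtime_class: str
    failure_handling: str
    source_evidence: tuple


# Assumption: a tool may take no parameters or credentials, so these may be empty.
MAY_BE_EMPTY = {"parameters", "credentials"}


def contract_gaps(contract: CapabilityContract) -> list:
    """Return the names of required fields that are still empty."""
    return [
        f.name
        for f in fields(contract)
        if f.name not in MAY_BE_EMPTY and not getattr(contract, f.name)
    ]


draft = CapabilityContract(
    name="get_weather",
    parameters={"city": "string"},
    credentials=("WEATHER_API_KEY",),  # hypothetical credential name
    output_shape={"temperature_c": "number", "summary": "string"},
    runtime_class="network",
    failure_handling="retry twice, then return a structured error",
    source_evidence=(),  # nothing cited yet
)
print(contract_gaps(draft))  # ['source_evidence']
```

A gap check like this lets contract mapping fail early, before any implementation is generated against an incomplete contract.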

Sandbox validation

Generated bundles are checked through deterministic review, tests, CLI validation, dependency policy, and live sandbox execution before becoming trusted capabilities.

Governed catalog

Approved tools can be searched, resolved, pinned, blocked, credential-mapped, and exposed through scoped router sessions rather than global schema dumps.
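The difference between a scoped session and a global schema dump can be sketched as follows. `GovernedCatalog` and `ScopedSession` are hypothetical names, the version-selection rule is an assumption, and credential mapping is left out to keep the example short.

```python
class GovernedCatalog:
    def __init__(self) -> None:
        self._approved: dict[str, set[str]] = {}  # tool name -> approved versions
        self._blocked: set[str] = set()
        self._pins: dict[str, str] = {}

    def approve(self, name: str, version: str) -> None:
        self._approved.setdefault(name, set()).add(version)

    def block(self, name: str) -> None:
        self._blocked.add(name)

    def pin(self, name: str, version: str) -> None:
        if version not in self._approved.get(name, set()):
            raise ValueError(f"{name} {version} is not approved")
        self._pins[name] = version

    def is_available(self, name: str) -> bool:
        return name in self._approved and name not in self._blocked

    def resolve(self, name: str) -> tuple[str, str]:
        if not self.is_available(name):
            raise LookupError(f"{name} is not an approved, unblocked tool")
        if name in self._pins:
            return name, self._pins[name]
        # Assumed rule: without a pin, pick the highest dotted numeric version.
        newest = max(self._approved[name],
                     key=lambda v: tuple(int(part) for part in v.split(".")))
        return name, newest


class ScopedSession:
    """Exposes only the tools this session was granted."""

    def __init__(self, catalog: GovernedCatalog, allowed: set[str]) -> None:
        self._catalog = catalog
        self._allowed = set(allowed)

    def visible_tools(self) -> list[str]:
        return sorted(n for n in self._allowed if self._catalog.is_available(n))

    def resolve(self, name: str) -> tuple[str, str]:
        if name not in self._allowed:
            raise PermissionError(f"{name} is outside this session's scope")
        return self._catalog.resolve(name)


catalog = GovernedCatalog()
catalog.approve("get_weather", "1.0.0")
catalog.approve("get_weather", "1.1.0")
catalog.approve("send_email", "2.0.0")
catalog.approve("delete_records", "1.0.0")
catalog.block("delete_records")
catalog.pin("get_weather", "1.0.0")

session = ScopedSession(catalog, allowed={"get_weather", "delete_records"})
print(session.visible_tools())         # ['get_weather']
print(session.resolve("get_weather"))  # ('get_weather', '1.0.0')
```

The session never sees `send_email`, which is approved but out of scope, and never sees `delete_records`, which is in scope but blocked.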