Claude Managed Agents & ‘Dreaming’ Review (2026): Self-Improving AI Agents

Most agent updates this year have focused on adding capabilities. Anthropic’s recent wave is more interesting because part of it is about agents getting better on their own. Alongside practical enterprise infrastructure, Anthropic introduced a technique it calls “dreaming,” one of the more thought-provoking ideas in agent development we’ve seen in 2026.

What’s new in Claude Managed Agents

Anthropic expanded its managed-agent offering on two fronts. On the infrastructure side, it added public-beta self-hosted sandboxes and a research-preview “MCP tunnels” feature. On the capability side, it expanded beta access to tools that let agents coordinate sub-agents and evaluate their own work using rubric-based outcomes — part of a broader push toward autonomous agents that handle long-running workflows in coding, finance, and law.
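To make “rubric-based outcomes” concrete, here is a minimal Python sketch of the general idea: score an output against weighted pass/fail criteria. It is not Anthropic’s API or implementation. The `Criterion` class, the `score_against_rubric` function, and the example criteria are all hypothetical names we made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Criterion:
    # Hypothetical rubric entry: a label, a weight, and a pass/fail check.
    name: str
    weight: float
    check: Callable[[str], bool]  # returns True when the output meets the criterion


def score_against_rubric(output: str, rubric: list[Criterion]) -> float:
    """Return the weighted fraction of rubric criteria the output satisfies (0.0 to 1.0)."""
    total = sum(c.weight for c in rubric)
    if total == 0:
        raise ValueError("rubric has no weight")
    earned = sum(c.weight for c in rubric if c.check(output))
    return earned / total


# Illustrative criteria only; a real rubric would be task-specific.
rubric = [
    Criterion("cites a source", 2.0, lambda text: "source:" in text.lower()),
    Criterion("under 50 words", 1.0, lambda text: len(text.split()) <= 50),
    Criterion("has a recommendation", 1.0, lambda text: "recommend" in text.lower()),
]

draft = "We recommend approving the filing. Source: Q3 ledger."
score = score_against_rubric(draft, rubric)
```

In a real agent the checks would likely be model-graded rather than simple string tests, but the shape is the same: the agent gets a number it can compare against a threshold before deciding whether to revise its work.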

The “dreaming” technique, explained

This is the headline idea. “Dreaming” lets autonomous systems review their prior behavior between sessions, identify patterns, and improve future performance. Rather than starting each task cold, an agent can reflect on what it did before and carry forward lessons — loosely analogous to how sleep consolidates human learning.

If conventional agents are stateless contractors, a “dreaming” agent is one that actually learns from last week’s mistakes before showing up Monday.

It’s launching as a research preview, so this is early. But the direction matters: self-improvement between sessions is exactly the capability that separates a genuinely useful long-running agent from one that repeats the same errors indefinitely.
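Anthropic has not published how “dreaming” works internally, so the following is only a toy Python sketch of the general pattern described above: review logs from earlier sessions, find mistakes that recur, and carry them forward as notes. The log format, the `consolidate_lessons` function, and the `min_repeats` parameter are our assumptions, not part of any Anthropic product.

```python
from collections import Counter


def consolidate_lessons(session_logs: list[list[dict]], min_repeats: int = 2) -> list[str]:
    """Scan past sessions and return notes for mistakes seen in at least min_repeats sessions."""
    sessions_with_error = Counter()
    for log in session_logs:
        # Count each error type once per session so a single noisy run does not dominate.
        errors = {step["error"] for step in log if step.get("error")}
        sessions_with_error.update(errors)
    return [
        f"Seen in {count} sessions: avoid '{error}'"
        for error, count in sessions_with_error.most_common()
        if count >= min_repeats
    ]


# Hypothetical logs from two earlier sessions.
past_sessions = [
    [{"action": "run_tests", "error": "missing env var"}, {"action": "deploy", "error": None}],
    [{"action": "run_tests", "error": "missing env var"}, {"action": "lint", "error": "timeout"}],
]

lessons = consolidate_lessons(past_sessions)
```

Here the repeated “missing env var” failure becomes a lesson, while the one-off timeout is ignored. The notes could then be prepended to the next session’s context, which is the “showing up Monday” part of the analogy.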

Self-hosted sandboxes & MCP tunnels

The infrastructure additions are aimed squarely at regulated enterprises. Self-hosted sandboxes let tool execution run on customer-managed or partner compute (like Cloudflare, Daytona, Modal, or Vercel) instead of the provider’s cloud. MCP tunnels let agents call internal MCP servers through an outbound-only encrypted gateway.

Together they solve a practical dilemma: how to use a managed agent orchestration layer while keeping sensitive data and credentials inside your own security perimeter. For banks, law firms, and healthcare — the exact sectors Anthropic is targeting — that’s often the difference between “interesting demo” and “approved for production.”

[Figure: How Claude Managed Agents work. Agent runs task (long-horizon workflow) → Self-hosted sandbox (execution in your perimeter) → MCP tunnel (reaches internal services safely) → ‘Dreaming’ (learns between sessions)]
Figure 1: Claude’s security-first execution model, ending with cross-session learning via ‘dreaming’.

Pros & cons

What we loved

  • ‘Dreaming’ enables genuine cross-session learning
  • Self-hosted sandboxes keep data in your perimeter
  • MCP tunnels protect internal credentials
  • Sub-agent coordination for complex workflows
  • Rubric-based self-evaluation of outputs
  • Aimed at high-value coding, finance, legal work

Where it falls short

  • ‘Dreaming’ is a research preview — very early
  • Several features still in public beta
  • Real setup complexity for the security features
  • Best value needs in-house engineering maturity

Who it’s for

It’s for engineering teams building long-running, autonomous agents — especially in regulated industries that need execution and data to stay inside their own perimeter. Hold off if you just need a simple task bot; the power here is in infrastructure and learning that most lightweight use cases won’t tap.

How it compares

Capability              | Claude Managed Agents   | OpenAI Frontier    | Gemini / Antigravity
Cross-session learning  | “Dreaming” (preview)    | Optimization loops | Limited
Self-hosted execution   | Yes (beta)              | Limited            | Partial
Sub-agent orchestration | Yes                     | Yes                | Yes (Antigravity)
Best fit                | Regulated, long-running | Large enterprise   | Google ecosystem

Anthropic’s distinctive angle is the combination of a security-first execution model with an early but real bet on agents that improve themselves. Frontier is broader for enterprise management; Google’s Antigravity is strongest inside its own stack.

[Figure: Claude Managed Agents, our scorecard. Capability 8.8; Security model 9.2; Innovation 9.0; Production-readiness 7.2]
Figure 2: Our category scores. The security model leads; production-readiness is the weakest area.

Frequently asked questions

What does ‘dreaming’ actually do?
It lets an agent review its prior behavior between sessions, spot patterns, and improve future performance: a form of self-improvement that carries lessons forward rather than starting each task fresh. It’s currently a research preview.

What problem do self-hosted sandboxes solve?
They let tool execution run on your own or partner compute, so sensitive data and credentials stay inside your security perimeter while you still use a managed orchestration layer.

What are MCP tunnels?
A research-preview feature that lets agents reach your internal MCP servers through an outbound-only encrypted gateway, avoiding direct exposure of internal services.

Is this production-ready?
Parts are in public beta and ‘dreaming’ is a research preview, so treat it as early. The security infrastructure is the most mature piece today.

The OneAppleFall Team

We independently test every AI agent and tool we review — on our own dime, on real work. We never accept payment for a score, and we disclose affiliate links clearly. Read our review methodology →
