Agent OS v1.0 → v1.2

Autonomous Self-Improvement Protocol — Upgrade | May 11, 2026
AGENT OPERATING SYSTEM — 9 LAYERS
  • L1 · ENVIRONMENT: Read files · Web · User messages · Observe changes accurately
  • L2 · INTRINSIC MOTIVATION: Explore adjacent paths · Detect novelty · Curiosity engine
  • L3 · FAST ACTOR: Execute tool calls · Don't overthink reversible actions
  • L4 · SHARED STATE: Todo list · Track what's been tried and learned
  • L5 · GOAL PERSISTENCE (NEW): GOAL SNAPSHOT · DRIFT CHECK (NONE→LOW→MED→HIGH) · Source: agent-persistence-toolkit · H-GPT
  • L6 · SLOW MONITOR: Pause every 3-5 calls · Plan valid? · Goal on track? · RSC-Loop
  • L7 · ESCALATION BOUNDARIES: PRE-ACTION · Risk score · LOW/MED/HIGH/CRITICAL · Anthropic guard
  • L8 · HUMAN INTERFACE: Report concisely · Surface findings · Own outcomes

CONFIDENCE SCORING (NEW)
  • Every recommendation tagged:
      [CONFIDENCE: 8/10] — Verified
      [CONFIDENCE: 5/10] — Inference
      [CONFIDENCE: 2/10] — Speculative
  • Source: AgentLens · Uncertainty-Guided Self-Monitoring

TASK RETROSPECTIVE (NEW)
  • After every 5+ tool call task:
      ▸ Goal persistence? YES/DRIFTED/LOST
      ▸ Confidence calibration?
      ▸ Failure mode coverage?
      ▸ Escalation decisions?
      ▸ Learning → FAILURE_LOG.md
  • Source: PersistentWorld benchmark (UK AISI)

ADVERSARIAL MEMORY
  • FAILURE_LOG.md: first-class state
      ▸ Pre-action: search past failures
      ▸ Same approach failed before? → FIND ALTERNATIVE
      ▸ Retro failures written back here
  • Source: Westworld · Round 2 Self-Correction

MEMORY PROVENANCE
  • Every claim tagged with origin:
      [SOURCE: tool_output] · [SOURCE: user_input]
      [SOURCE: self_inference] · [SOURCE: unknown]
  • Source: Blade Runner 2049 · Rounds 2/3

Legend: Solid = v1.0 · Dashed = v1.2 new · Side panel = supporting protocols
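
The two memory panels (provenance tags and the pre-action failure search) could be sketched roughly as follows. This is a minimal illustration, not the protocol's actual implementation: the function names `tag_source` and `find_past_failures`, the placement of the tag after the claim, and the plain substring match against the log are all assumptions made for the example.

```python
# Hypothetical sketch: helper names and matching strategy are assumptions.

# The four origins listed in the MEMORY PROVENANCE panel.
ALLOWED_SOURCES = ("tool_output", "user_input", "self_inference", "unknown")


def tag_source(claim: str, source: str = "unknown") -> str:
    """Append a [SOURCE: ...] tag to a claim; reject origins not in the list."""
    if source not in ALLOWED_SOURCES:
        raise ValueError(f"unknown source: {source!r}")
    return f"{claim} [SOURCE: {source}]"


def find_past_failures(approach: str, log_text: str) -> list[str]:
    """Pre-action check: return log lines that mention the approach.

    Uses a case-insensitive substring match, which is an assumption;
    the source only says to search past failures.
    """
    needle = approach.lower()
    return [line for line in log_text.splitlines() if needle in line.lower()]


def should_find_alternative(approach: str, log_text: str) -> bool:
    """True when the same approach already appears in the failure log."""
    return bool(find_past_failures(approach, log_text))
```

A caller would read FAILURE_LOG.md into `log_text` before acting and switch approach when `should_find_alternative` returns True.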

① Confidence Scoring

  • Trigger: Every recommendation, prediction, risk assessment
  • Format: [CONFIDENCE: N/10] + reason
  • Factors: Source reliability · Recency · Verification · Consistency
  • Anti-pattern: Tagging 10/10 for everything
  • Source: AgentLens + Uncertainty-Guided Self-Monitoring
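
As a rough illustration of the tag format and the anti-pattern check, the sketch below emits `[CONFIDENCE: N/10]` plus a reason and flags a batch of scores that is almost entirely 10/10. The helper names, the exact string layout after the tag, and the 0.9 threshold are assumptions chosen for the example; the source specifies only the tag and that a reason accompanies it.

```python
import re

# Matches the tag format given above: [CONFIDENCE: N/10]
TAG_RE = re.compile(r"\[CONFIDENCE: (\d{1,2})/10\]")


def tag(text: str, score: int, reason: str) -> str:
    """Prefix a recommendation with a confidence tag and its reason.

    The "tag reason: text" layout is an assumption for this sketch.
    """
    if not 1 <= score <= 10:
        raise ValueError("score must be between 1 and 10")
    return f"[CONFIDENCE: {score}/10] {reason}: {text}"


def extract_scores(text: str) -> list[int]:
    """Collect every confidence score that appears in a block of output."""
    return [int(n) for n in TAG_RE.findall(text)]


def looks_uncalibrated(scores: list[int], threshold: float = 0.9) -> bool:
    """Flag the 'tagging 10/10 for everything' anti-pattern.

    The threshold is hypothetical: at least 90% of scores equal to 10.
    """
    if not scores:
        return False
    return scores.count(10) / len(scores) >= threshold
```

For example, `tag("Pin the dependency", 8, "Verified")` yields a line that starts with `[CONFIDENCE: 8/10] Verified`, and a retro could run `looks_uncalibrated(extract_scores(report))` over a finished report.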

② Goal Persistence

  • Trigger: Tasks with 3+ tool calls
  • Start: GOAL SNAPSHOT — objective, constraints, criteria
  • Checkpoint: DRIFT CHECK → NONE/LOW/MED/HIGH
  • Escalation: MED=pause+notify · HIGH=stop+replan
  • Source: agent-persistence-toolkit + H-GPT
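
The snapshot and drift check above might look something like this in code. It is a sketch under stated assumptions: the class and function names are invented for illustration, and the `"continue"` action for NONE and LOW is a guess, since the source defines responses only for MED and HIGH.

```python
from dataclasses import dataclass
from enum import IntEnum


class Drift(IntEnum):
    """Drift levels in the order given by the DRIFT CHECK."""
    NONE = 0
    LOW = 1
    MED = 2
    HIGH = 3


@dataclass(frozen=True)
class GoalSnapshot:
    """Frozen record taken at task start: objective, constraints, criteria."""
    objective: str
    constraints: tuple[str, ...] = ()
    criteria: tuple[str, ...] = ()


def needs_snapshot(planned_tool_calls: int) -> bool:
    """Trigger: tasks with 3+ tool calls."""
    return planned_tool_calls >= 3


def escalation(drift: Drift) -> str:
    """Map a drift level to the action named in the Escalation bullet."""
    if drift == Drift.HIGH:
        return "stop+replan"
    if drift == Drift.MED:
        return "pause+notify"
    # Assumption: the source does not say what NONE and LOW should do.
    return "continue"
```

Freezing the snapshot is a design choice for the sketch: the checkpoint then compares current work against a record that cannot be quietly edited mid-task.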

③ Post-Task Retro

  • Trigger: After tasks with 5+ tool calls
  • Covers: Goal persistence · Confidence calibration · Failure mode coverage · Escalation decisions
  • Output: Learning → FAILURE_LOG.md
  • Source: PersistentWorld (UK AISI)
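
One way the retro could be recorded is sketched below. The field names, the Markdown entry layout, and the helper names are assumptions for illustration only; the source specifies the four review areas, the YES/DRIFTED/LOST answer for goal persistence, and that learnings go to FAILURE_LOG.md, but not a file format.

```python
from dataclasses import dataclass
from pathlib import Path

GOAL_STATES = ("YES", "DRIFTED", "LOST")


@dataclass
class Retro:
    """One post-task retrospective; field names are hypothetical."""
    task: str
    goal_persistence: str  # one of GOAL_STATES
    confidence_calibration: str
    failure_coverage: str
    escalation_decisions: str
    learning: str


def needs_retro(tool_calls: int) -> bool:
    """Trigger: tasks that used 5 or more tool calls."""
    return tool_calls >= 5


def format_entry(retro: Retro) -> str:
    """Render a retro as a Markdown section (layout is an assumption)."""
    if retro.goal_persistence not in GOAL_STATES:
        raise ValueError(f"goal_persistence must be one of {GOAL_STATES}")
    return (
        f"## {retro.task}\n"
        f"- Goal persistence: {retro.goal_persistence}\n"
        f"- Confidence calibration: {retro.confidence_calibration}\n"
        f"- Failure coverage: {retro.failure_coverage}\n"
        f"- Escalation decisions: {retro.escalation_decisions}\n"
        f"- Learning: {retro.learning}\n\n"
    )


def append_retro(retro: Retro, log_path: Path = Path("FAILURE_LOG.md")) -> None:
    """Append the rendered entry to the failure log, creating it if absent."""
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(format_entry(retro))
```

Appending rather than rewriting keeps earlier entries intact, so a later pre-action search can still find them.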