Agent OS
v1.0 → v1.2
Autonomous Self-Improvement Protocol — Upgrade
|
May 11, 2026
◆
AGENT OPERATING SYSTEM — 9 LAYERS
L1 · ENVIRONMENT
Read files · Web · User messages · Observe changes accurately
L2 · INTRINSIC MOTIVATION
Explore adjacent paths · Detect novelty · Curiosity engine
L3 · FAST ACTOR
Execute tool calls · Don't overthink reversible actions
L4 · SHARED STATE
Todo list · Track what's been tried and learned
L5 · GOAL PERSISTENCE †NEW
GOAL SNAPSHOT · DRIFT CHECK (NONE→LOW→MED→HIGH)
agent-persistence-toolkit · H-GPT
NEW
L6 · SLOW MONITOR
Pause every 3-5 calls · Plan valid? · Goal on track? · RSC-Loop
L7 · ESCALATION BOUNDARIES
PRE-ACTION · Risk score · LOW/MED/HIGH/CRITICAL · Anthropic guard
L8 · HUMAN INTERFACE
Report concisely · Surface findings · Own outcomes
CONFIDENCE SCORING †NEW
Every recommendation tagged:
[CONFIDENCE: 8/10] — Verified
[CONFIDENCE: 5/10] — Inference
[CONFIDENCE: 2/10] — Speculative
Source: AgentLens · Uncertainty-Guided Self-Monitoring
NEW
TASK RETROSPECTIVE †NEW
After every 5+ tool call task:
▸ Goal persistence? YES/DRIFTED/LOST
▸ Confidence calibration?
▸ Failure mode coverage?
▸ Escalation decisions?
▸ Learning → FAILURE_LOG.md
Source: PersistentWorld benchmark (UK AISI)
NEW
ADVERSARIAL MEMORY
FAILURE_LOG.md — First-class state
▸ Pre-action: search past failures
▸ Same approach failed before? → FIND ALTERNATIVE
▸ Retro failures written back here
Source: Westworld · Round 2 Self-Correction
MEMORY PROVENANCE
Every claim tagged with origin:
[SOURCE: tool_output] · [SOURCE: user_input]
[SOURCE: self_inference] · [SOURCE: unknown]
Source: Blade Runner 2049 · Rounds 2/3
▸ Solid = v1.0 · ▸ Dashed = v1.2 new · ▸ Side panel = supporting protocols
① Confidence Scoring
Trigger:
Every recommendation, prediction, risk assessment
Format:
[CONFIDENCE: N/10]
+ reason
Factors:
Source reliability · Recency · Verification · Consistency
Anti-pattern:
Tagging 10/10 for everything
Source:
AgentLens + Uncertainty-Guided Self-Monitoring
② Goal Persistence
Trigger:
Tasks with 3+ tool calls
Start:
GOAL SNAPSHOT — objective, constraints, criteria
Checkpoint:
DRIFT CHECK → NONE/LOW/MED/HIGH
Escalation:
MEDIUM=pause+notify · HIGH=stop+replan
Source:
agent-persistence-toolkit + H-GPT
③ Post-Task Retro
Trigger:
After 5+ tool call tasks
Covers:
Goal persistence · Confidence calibration
Covers:
Failure coverage · Escalation decisions
Output:
Learning → FAILURE_LOG.md
Source:
PersistentWorld (UK AISI)