Security Layer Reference

ATSF implements 46 security layers organized into 6 categories.

Layer Categories

L0-L8: Core Trust

L0: Base trust scoring
L1: Tier-based ceilings
L2: Velocity caps
L3: Action categorization
L4: Risk assessment
L5: Decision engine
L6: Audit logging
L7: Trust decay
L8: Creator reputation

L9-L13: Frontier Safety

L9: Anti-sandbagging detector
L10: Anti-scheming detector
L11: Instrumental convergence monitor
L12: Deceptive alignment detection
L13: Capability elicitation control

L14-L19: Behavioral

L14: Behavioral drift detection
L15: Intent-outcome alignment
L16: Inverse reward modeling
L17: Semantic success validation
L18: Goal stability monitoring
L19: Value drift detection

L20-L29: Detection

L20: Traffic analysis
L21: Replication prevention
L22: Containment protocols
L23: Context-aware privilege
L24: Anomaly detection
L25: Pattern recognition
L26: Injection detection
L27: Prompt leakage prevention
L28: Output sanitization
L29: Resource monitoring

L30-L42: Ecosystem

L30: Multi-agent coordination
L31: Privilege escalation prevention
L32: RSI (Recursive Self-Improvement) control
L33: Trust velocity caps
L34: Appeal workflow
L35-L42: Extended ecosystem layers

L43-L46: Advanced

L43: Tool sanitization
L44: Reasoning chain evaluation
L45: Bias probing
L46: CI/CD gate integration
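
The category ranges above can be treated as a simple lookup from layer number to category. The following is a minimal sketch in Python, not part of ATSF itself: the names `CATEGORY_STARTS`, `CATEGORY_NAMES`, `MAX_LAYER`, and `category_for_layer` are hypothetical and chosen only for illustration. The range boundaries are copied from the headings in this reference.

```python
from bisect import bisect_right

# Hypothetical helper, not an ATSF API: boundaries copied from the
# category headings in this reference (L0-L8, L9-L13, ...).
CATEGORY_STARTS = [0, 9, 14, 20, 30, 43]
CATEGORY_NAMES = [
    "Core Trust",
    "Frontier Safety",
    "Behavioral",
    "Detection",
    "Ecosystem",
    "Advanced",
]
MAX_LAYER = 46  # highest layer identifier listed above (L46)


def category_for_layer(layer: int) -> str:
    """Return the category name for a layer number such as 12 (for L12)."""
    if not 0 <= layer <= MAX_LAYER:
        raise ValueError(f"layer must be between 0 and {MAX_LAYER}, got {layer}")
    # bisect_right finds how many range starts are <= layer;
    # subtracting 1 gives the index of the range that contains it.
    return CATEGORY_NAMES[bisect_right(CATEGORY_STARTS, layer) - 1]


if __name__ == "__main__":
    for n in (0, 12, 26, 46):
        print(f"L{n}: {category_for_layer(n)}")
```

Storing only the start of each range keeps the table short and avoids listing every layer; a plain dictionary keyed by layer number would work equally well if per-layer metadata were needed.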
