Security Layer Reference
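ATSF implements 46 security layers organized into 6 categories. Each layer is identified as L<n>, and each category covers a contiguous range of layer identifiers.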
Layer Categories
L0-L8: Core Trust
- L0: Base trust scoring
- L1: Tier-based ceilings
- L2: Velocity caps
- L3: Action categorization
- L4: Risk assessment
- L5: Decision engine
- L6: Audit logging
- L7: Trust decay
- L8: Creator reputation
L9-L13: Frontier Safety
- L9: Anti-sandbagging detector
- L10: Anti-scheming detector
- L11: Instrumental convergence monitor
- L12: Deceptive alignment detection
- L13: Capability elicitation control
L14-L19: Behavioral
- L14: Behavioral drift detection
- L15: Intent-outcome alignment
- L16: Inverse reward modeling
- L17: Semantic success validation
- L18: Goal stability monitoring
- L19: Value drift detection
L20-L29: Detection
- L20: Traffic analysis
- L21: Replication prevention
- L22: Containment protocols
- L23: Context-aware privilege
- L24: Anomaly detection
- L25: Pattern recognition
- L26: Injection detection
- L27: Prompt leakage prevention
- L28: Output sanitization
- L29: Resource monitoring
L30-L42: Ecosystem
- L30: Multi-agent coordination
- L31: Privilege escalation prevention
- L32: Recursive self-improvement (RSI) control
- L33: Trust velocity caps
- L34: Appeal workflow
- L35-L42: Extended ecosystem layers
L43-L46: Advanced
- L43: Tool sanitization
- L44: Reasoning chain evaluation
- L45: Bias probing
- L46: CI/CD gate integration
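
Because every category covers a contiguous range of layer identifiers, the table above can be expressed as a small lookup structure. The following is a minimal sketch, not part of ATSF itself: the names `Category`, `CATEGORIES`, `category_of`, and `layers_in` are hypothetical and chosen for illustration only. The ranges are copied from the category headings in this reference.

```python
import re
from typing import NamedTuple


class Category(NamedTuple):
    """A named, inclusive range of layer numbers (hypothetical helper type)."""
    name: str
    first: int
    last: int


# Ranges copied from the category headings in this reference.
CATEGORIES = [
    Category("Core Trust", 0, 8),
    Category("Frontier Safety", 9, 13),
    Category("Behavioral", 14, 19),
    Category("Detection", 20, 29),
    Category("Ecosystem", 30, 42),
    Category("Advanced", 43, 46),
]


def category_of(layer_id: str) -> str:
    """Return the category name for a layer identifier such as 'L12'."""
    match = re.fullmatch(r"L([0-9]+)", layer_id)
    if match is None:
        raise ValueError(f"not a layer identifier: {layer_id!r}")
    number = int(match.group(1))
    for category in CATEGORIES:
        if category.first <= number <= category.last:
            return category.name
    raise ValueError(f"layer out of range: {layer_id!r}")


def layers_in(category_name: str) -> list[str]:
    """Return every layer identifier that belongs to the named category."""
    for category in CATEGORIES:
        if category.name == category_name:
            return [f"L{n}" for n in range(category.first, category.last + 1)]
    raise ValueError(f"unknown category: {category_name!r}")


if __name__ == "__main__":
    print(category_of("L12"))     # Frontier Safety
    print(layers_in("Advanced"))  # ['L43', 'L44', 'L45', 'L46']
```

Deriving identifiers from the ranges, rather than hard-coding a total, keeps the lookup consistent with the headings. Note that the ranges as listed span the identifiers L0 through L46, which is 47 identifiers in total.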