Security Layer Reference

ATSF implements 47 security layers, numbered L0 through L46, organized into 6 categories.

Layer Categories

L0-L8: Core Trust

  • L0: Base trust scoring
  • L1: Tier-based ceilings
  • L2: Velocity caps
  • L3: Action categorization
  • L4: Risk assessment
  • L5: Decision engine
  • L6: Audit logging
  • L7: Trust decay
  • L8: Creator reputation

L9-L13: Frontier Safety

  • L9: Anti-sandbagging detector
  • L10: Anti-scheming detector
  • L11: Instrumental convergence monitor
  • L12: Deceptive alignment detection
  • L13: Capability elicitation control

L14-L19: Behavioral

  • L14: Behavioral drift detection
  • L15: Intent-outcome alignment
  • L16: Inverse reward modeling
  • L17: Semantic success validation
  • L18: Goal stability monitoring
  • L19: Value drift detection

L20-L29: Detection

  • L20: Traffic analysis
  • L21: Replication prevention
  • L22: Containment protocols
  • L23: Context-aware privilege
  • L24: Anomaly detection
  • L25: Pattern recognition
  • L26: Injection detection
  • L27: Prompt leakage prevention
  • L28: Output sanitization
  • L29: Resource monitoring

L30-L42: Ecosystem

  • L30: Multi-agent coordination
  • L31: Privilege escalation prevention
  • L32: RSI (Recursive Self-Improvement) control
  • L33: Trust velocity caps
  • L34: Appeal workflow
  • L35-L42: Extended ecosystem layers

L43-L46: Advanced

  • L43: Tool sanitization
  • L44: Reasoning chain evaluation
  • L45: Bias probing
  • L46: CI/CD gate integration
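The category ranges above can be written down as a small lookup table. The sketch below is illustrative only: `LAYER_CATEGORIES`, `parse_layer_id`, and `category_for_layer` are hypothetical names chosen for this example and are not part of any documented ATSF API. It assumes layer identifiers take the form `L<number>` as used on this page.

```python
from typing import Optional

# Category ranges transcribed from the lists above.
# Hypothetical structure for illustration; not an ATSF API.
LAYER_CATEGORIES = [
    (range(0, 9), "Core Trust"),         # L0-L8
    (range(9, 14), "Frontier Safety"),   # L9-L13
    (range(14, 20), "Behavioral"),       # L14-L19
    (range(20, 30), "Detection"),        # L20-L29
    (range(30, 43), "Ecosystem"),        # L30-L42
    (range(43, 47), "Advanced"),         # L43-L46
]


def parse_layer_id(layer_id: str) -> int:
    """Convert an identifier such as 'L12' to the integer 12."""
    if not layer_id.startswith("L") or not layer_id[1:].isdigit():
        raise ValueError(f"not a layer identifier: {layer_id!r}")
    return int(layer_id[1:])


def category_for_layer(layer_id: str) -> Optional[str]:
    """Return the category name for a layer, or None if out of range."""
    number = parse_layer_id(layer_id)
    for layer_range, name in LAYER_CATEGORIES:
        if number in layer_range:
            return name
    return None


if __name__ == "__main__":
    for example in ("L0", "L10", "L26", "L32", "L46"):
        print(example, "->", category_for_layer(example))
```

Keeping the ranges in one table means a lookup for any identifier, including the unnamed L35-L42 entries, resolves to its category without listing every layer individually.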