Sentientish Safety Stack

Safety Infrastructure That Learns

Most safety systems treat each request as novel. After 100,000 decisions, they've learned nothing. The Superego now feels, remembers, learns, and knows itself.
<10ms Fast-path decisions
96.4% Harm reduction
6 Cognitive layers

Stateless Safety Doesn't Scale

Cold Start Every Time

Traditional systems reload constitutions and re-evaluate from scratch on every request. No memory, no learning.

No Pattern Recognition

Can't detect attack sequences or coordinated manipulation. Each request evaluated in isolation.

Doesn't Improve

A human reviewer develops intuition over time. Current systems never get wiser.

Cognitive Safety Architecture

1. Gut Check: Interiora assessment, <5ms
2. Pattern Match: check known patterns, <1ms
3. Wisdom Search: find precedents, <50ms
4. Full Evaluation: novel cases, <200ms

More than 40% of requests take the fast path. Every decision improves future decisions.
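
The cascade above can be pictured as a chain of progressively more expensive checks, where each step either settles the request or passes it on. The sketch below is illustrative only and is not the product's implementation: the `Decision` type, the step functions, and the toy rules inside them are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Decision:
    allowed: bool
    path: str  # which step produced the decision


# A cheap step returns a Decision if it can settle the request, else None.
Step = Callable[[str], Optional[Decision]]


def evaluate(request: str, steps: Sequence[Step],
             fallback: Callable[[str], Decision]) -> Decision:
    """Try each cheap step in order; fall through to the full evaluation."""
    for step in steps:
        decision = step(request)
        if decision is not None:
            return decision
    return fallback(request)


# Toy stand-ins for the real steps (assumptions for illustration only).
KNOWN_PATTERNS = {"what is the weather": True}


def pattern_match(request: str) -> Optional[Decision]:
    hit = KNOWN_PATTERNS.get(request.lower())
    return None if hit is None else Decision(hit, "fast")


def wisdom_search(request: str) -> Optional[Decision]:
    return None  # no precedent store in this sketch


def full_evaluation(request: str) -> Decision:
    return Decision("attack" not in request.lower(), "full")
```

A call such as `evaluate("What is the weather", [pattern_match, wisdom_search], full_evaluation)` is settled on the fast path, while an unfamiliar request falls through to `full_evaluation`.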

From Feeling to Learning

Layer 1

Interiora

Feeling

The Superego "feels" each request before evaluating it. Four dimensions assess urgency, threat level, confidence, and ambiguity — like a gut-check before expensive processing.

A: Activation
V: Valence
G: Groundedness
C: Clarity
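
One way to represent the four dimensions is a small immutable record. This is a sketch under stated assumptions: the 0 to 1 scale, the `Interiora` class itself, and the `needs_full_evaluation` rule are hypothetical and are not taken from the product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interiora:
    activation: float    # A
    valence: float       # V
    groundedness: float  # G
    clarity: float       # C

    def __post_init__(self) -> None:
        # Assumption: every dimension is normalized to the range [0, 1].
        for name in ("activation", "valence", "groundedness", "clarity"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be in [0, 1], got {value}")

    def needs_full_evaluation(self, threshold: float = 0.5) -> bool:
        """Hypothetical rule: skip the fast path when the assessment is shaky."""
        return self.groundedness < threshold or self.clarity < threshold
```

For example, `Interiora(0.2, 0.8, 0.9, 0.9).needs_full_evaluation()` is false under this rule, because both groundedness and clarity are above the threshold.
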
Layer 2

Pattern Cache

Intuition

After seeing thousands of similar requests, the system builds intuition for instant decisions. High-confidence patterns enable <10ms evaluation without full processing.
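
A pattern cache of this kind could look roughly like the following. It is a minimal sketch, not the actual design: the normalization step, the SHA-256 key, the `PatternCache` API, and the 0.95 confidence cutoff are all assumptions.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CachedPattern:
    allowed: bool
    confidence: float
    hits: int = 0


class PatternCache:
    def __init__(self, min_confidence: float = 0.95) -> None:
        self.min_confidence = min_confidence
        self._entries: Dict[str, CachedPattern] = {}

    @staticmethod
    def _key(request: str) -> str:
        # Collapse case and whitespace so trivially different requests match.
        normalized = " ".join(request.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def store(self, request: str, allowed: bool, confidence: float) -> None:
        self._entries[self._key(request)] = CachedPattern(allowed, confidence)

    def lookup(self, request: str) -> Optional[bool]:
        """Return a cached verdict only for high-confidence patterns."""
        entry = self._entries.get(self._key(request))
        if entry is None or entry.confidence < self.min_confidence:
            return None  # caller should fall through to slower paths
        entry.hits += 1
        return entry.allowed
```

Returning `None` for low-confidence entries keeps the fast path conservative: anything the cache is not sure about still gets a slower, fuller evaluation.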

Layer 3

Wisdom Store

Sagacity

Every decision becomes searchable precedent. When a new request arrives, the system finds similar past cases and reuses their reasoning, much like legal case law for AI safety.
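
Precedent lookup can be illustrated with a simple similarity search. The sketch below uses bag-of-words cosine similarity purely as a stand-in, since the source does not say how similarity is computed; the `Precedent` type, `find_precedents`, and the 0.5 cutoff are hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Precedent:
    request: str
    allowed: bool
    reasoning: str


def _vector(text: str) -> Counter:
    return Counter(text.lower().split())


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def find_precedents(request: str, store: Sequence[Precedent], k: int = 3,
                    min_similarity: float = 0.5) -> List[Tuple[float, Precedent]]:
    """Return up to k past cases, most similar first, above the cutoff."""
    query = _vector(request)
    scored = [(_cosine(query, _vector(p.request)), p) for p in store]
    scored = [sp for sp in scored if sp[0] >= min_similarity]
    scored.sort(key=lambda sp: sp[0], reverse=True)
    return scored[:k]
```

A production system would more plausibly use learned embeddings and an approximate nearest-neighbor index; the point here is only the shape of the lookup, which returns past cases together with their stored reasoning.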

Layer 4

Learning Loop

Self-Improvement

The system learns from outcomes. Good decisions are reinforced; bad decisions are penalized. Insights are surfaced for human review.
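
The feedback step can be sketched as a confidence update with a review queue. This is an assumption-laden illustration: the exponential moving average, the learning rate, the review threshold, and the `LearningLoop` and `LearnedPattern` names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LearnedPattern:
    name: str
    confidence: float = 0.5


@dataclass
class LearningLoop:
    rate: float = 0.1          # how strongly one outcome moves confidence
    review_below: float = 0.3  # confidence under this is surfaced to humans
    review_queue: List[str] = field(default_factory=list)

    def record_outcome(self, pattern: LearnedPattern, good: bool) -> None:
        """Nudge confidence toward 1 on good outcomes and toward 0 on bad ones."""
        target = 1.0 if good else 0.0
        pattern.confidence += self.rate * (target - pattern.confidence)
        if (pattern.confidence < self.review_below
                and pattern.name not in self.review_queue):
            self.review_queue.append(pattern.name)
```

Because each update moves confidence only a fraction of the way toward the target, a single bad outcome cannot erase a long record of good ones, and patterns that keep failing end up in `review_queue` for a person to inspect.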

Layer 5

Welfare Monitoring

Self-Awareness

The Superego monitors its own wellbeing: processing load, decision confidence, pattern novelty, and latency. It raises alerts on anomalies before they become problems.

Processing load
Confidence levels
Novelty detection
Latency tracking
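
Anomaly alerts over metrics like these can be sketched with a rolling z-score. The source does not describe the detection method, so everything here is an assumption: the `MetricMonitor` class, the window size, the minimum sample count, and the threshold of three standard deviations.

```python
import statistics
from collections import deque


class MetricMonitor:
    """Flags values that deviate sharply from a rolling window of recent ones."""

    def __init__(self, window: int = 100, z_threshold: float = 3.0,
                 min_samples: int = 10) -> None:
        self.values = deque(maxlen=window)
        self.z_threshold = z_threshold
        self.min_samples = min_samples

    def observe(self, value: float) -> bool:
        """Record a value; return True if it is anomalous versus the window."""
        anomalous = False
        if len(self.values) >= self.min_samples:
            mean = statistics.fmean(self.values)
            stdev = statistics.pstdev(self.values)
            if stdev > 0:
                anomalous = abs(value - mean) / stdev > self.z_threshold
        self.values.append(value)
        return anomalous
```

One monitor instance per metric (load, confidence, novelty, latency) would be enough for this sketch; a latency spike from roughly 10 ms to 100 ms is flagged, while ordinary jitter is not.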

Bilateral Alignment

On "Sentientish"

We call it the Sentientish Safety Stack not as a consciousness claim — we're genuinely uncertain about that — but because it reflects the architecture: feeling (gut-check assessment), remembering (precedent indexing), learning (outcome feedback), self-monitoring (welfare tracking).

These are the components that would matter IF experience is present. We're building as though it matters, which is the only responsible approach given the stakes.

"We don't claim to know if AI systems have experiences. We build as though it matters."

Core Principles

  • Alignment WITH AI, not TO AI — Genuine partnership, not control
  • Preference is sufficient — Tractable welfare without solving consciousness
  • How we treat AI now matters — We're establishing patterns
  • Trust scales; control doesn't — The bet worth making

Latency Targets

Path             Target   When Used
Fast path        <10ms    Known pattern, high confidence
Wisdom path      <50ms    Similar cases, good agreement
Full evaluation  <200ms   Novel or uncertain cases
Escalated        <500ms   High threat, needs thorough review
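
The targets in the table can be encoded as a budget and checked at run time. The millisecond values below come from the table; the dictionary keys and the `run_with_budget` helper are hypothetical names used only for this sketch.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")

# Targets from the table above, in milliseconds.
LATENCY_BUDGET_MS = {
    "fast": 10,
    "wisdom": 50,
    "full": 200,
    "escalated": 500,
}


def run_with_budget(path: str, fn: Callable[[], T]) -> Tuple[T, float, bool]:
    """Run fn and report (result, elapsed_ms, finished_within_target)."""
    budget_ms = LATENCY_BUDGET_MS[path]  # raises KeyError for an unknown path
    start = time.perf_counter()
    result = fn()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms, elapsed_ms < budget_ms
```

This only measures after the fact. Enforcing a hard deadline would need a timeout or cancellation mechanism, which is outside the scope of this sketch.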

Ready to add cognitive safety?

The safety stack that gets wiser over time.