# 0006 — PostgreSQL for durable state, Redis for ephemeral state - **Status:** Accepted - **Date:** 2026-05-06 ## Context and problem statement The system has two distinct data access patterns: - **Durable state** — users, survivors, mission history, mission logs. Must survive crashes, restarts, and deploys. Read patterns are infrequent (mostly on session start) but writes need full ACID guarantees. - **Ephemeral state** — active mission state, mission lobbies, tick locks, `nextTickAt` timestamps, rate limit counters. Read and written every poll cycle (~5 seconds), can be reconstructed from durable state if lost, must be very fast. What datastore strategy supports both? ## Decision drivers - **Performance for hot path.** Tick processing reads and writes mission state every 5 seconds per mission. A traditional SQL roundtrip per access is wasteful. - **Durability for cold path.** Mission history, logs, and user records must survive any failure mode. - **Operational complexity.** Each datastore added is another system to monitor, back up, and reason about during incidents. - **Crash recovery.** Ephemeral state should be recoverable from durable state, so loss of the ephemeral store is degraded-but-not-broken. - **Atomic operations.** Tick scheduling needs `SET NX` semantics for distributed locks, which Postgres can simulate but Redis does natively. ## Considered options 1. **Postgres only, with `pg_notify` and advisory locks.** Single datastore, all state durable, ephemeral access via in-memory cache. 2. **Redis only, with periodic snapshot to disk.** Single datastore, ephemeral by nature, durability via Redis persistence (RDB/AOF). 3. **Postgres + Redis split.** Each datastore plays to its strengths. 4. **Postgres + in-memory state in the API process.** No second datastore, but loses state on restart and doesn't support multi-instance. ## Decision outcome **Chosen: PostgreSQL for durable state, Redis for ephemeral state.** Postgres tables (durable): - `users` — internal ID, Twitch opaque user ID, created at. - `survivors` — FK to user, stats, perk slots, current lifecycle state. - `missions` — FK to survivor or group, difficulty, status, timestamps. - `mission_logs` — FK to mission, tick index, encounter ID, rendered text, seed, modifiers applied. Redis keys (ephemeral): - `active_mission:{missionId}` — JSON snapshot of in-progress mission state. - `mission_lobby:{lobbyId}` — lobby member list and ready flags. - `tick_lock:{missionId}` — distributed lock (see ADR-0005). - `rate_limit:{userId}:{endpoint}` — rate limiting counters. Rule: anything in Redis must be reconstructable from Postgres. Loss of Redis means active missions resume from their last persisted tick on next worker poll, after a brief delay. ## Consequences ### Positive - **Hot-path performance.** Tick processing operates against Redis with sub-millisecond latency, only writing to Postgres at end-of-tick (the durable log entry). - **Native primitives where useful.** `SET NX PX` for locks, `SETEX` for TTLs, sorted sets for "missions due" queries — all clean in Redis, awkward in SQL. - **Failure isolation.** A Postgres slowdown doesn't immediately stop tick processing (Redis state continues); a Redis outage doesn't lose mission history (Postgres persists). - **Familiar operational tooling.** Both Postgres and Redis have decades of operational maturity. ### Negative - **Two datastores to operate.** Backups, monitoring, capacity planning, security hardening multiplied by two. - **Consistency boundary.** Redis can drift from Postgres if the API crashes between Redis write and Postgres write. Mitigated by treating Postgres as authoritative on every cold-start reconciliation. - **Schema discipline.** The "what lives where" rule must be documented and respected — accidentally putting durable data only in Redis is a class of bug that's invisible until something restarts. ### Neutral - This split is a common pattern in real-time systems and is well-understood. Hiring or onboarding contributors with experience in either or both is straightforward. - We deliberately avoid more exotic stores (event sourcing, time-series databases, document stores) until the data model demonstrably needs them. ## Implementation notes - `mission_logs` rows are append-only; never updated after creation. This makes them trivially safe under concurrent writes and supports replay/debug. - Plan retention/archival from day one — `mission_logs` will grow fast. Default: partition by month, archive partitions older than N months to cold storage. - Consider Redis ACL setup before production deploy. Local dev runs without auth; production must not.