0006 — PostgreSQL for durable state, Redis for ephemeral state

Status: Accepted
Date: 2026-05-06

Context and problem statement

The system has two distinct data access patterns:

Durable state — users, survivors, mission history, mission logs. Must survive crashes, restarts, and deploys. Read patterns are infrequent (mostly on session start) but writes need full ACID guarantees.
Ephemeral state — active mission state, mission lobbies, tick locks, nextTickAt timestamps, rate limit counters. Read and written every poll cycle (~5 seconds), can be reconstructed from durable state if lost, must be very fast.

What datastore strategy supports both?

Decision drivers

Performance for hot path. Tick processing reads and writes mission state roughly every 5 seconds per mission. A SQL round trip on every access adds latency and database load that this path does not need.
Durability for cold path. Mission history, logs, and user records must survive any failure mode.
Operational complexity. Each datastore added is another system to monitor, back up, and reason about during incidents.
Crash recovery. Ephemeral state should be recoverable from durable state, so loss of the ephemeral store is degraded-but-not-broken.
Atomic operations. Tick scheduling needs SET NX semantics for distributed locks, which Postgres can simulate but Redis does natively.
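
The SET NX requirement in the last driver can be illustrated with a minimal sketch. Everything named here (`LockStore`, `InMemoryLockStore`, the method names) is hypothetical and not taken from the codebase; the in-memory class only stands in for Redis so the example runs on its own. Against a real Redis, `setIfAbsent` would correspond to `SET key token NX PX ttlMs`, and a token-checked release is usually done atomically on the server (for example with a small Lua script).

```typescript
// Hypothetical sketch: acquire-if-absent locking with a TTL, i.e. the
// semantics of Redis `SET key token NX PX ttlMs`. Not a real Redis client.
interface LockStore {
  // True only if the key was absent (or expired) and is now held by `token`.
  setIfAbsent(key: string, token: string, ttlMs: number, nowMs: number): boolean;
  // Deletes the key only if it is still live and held by `token`.
  deleteIfMatches(key: string, token: string, nowMs: number): boolean;
}

class InMemoryLockStore implements LockStore {
  private entries = new Map<string, { token: string; expiresAt: number }>();

  setIfAbsent(key: string, token: string, ttlMs: number, nowMs: number): boolean {
    const existing = this.entries.get(key);
    if (existing !== undefined && existing.expiresAt > nowMs) {
      return false; // another holder has a live lock
    }
    this.entries.set(key, { token, expiresAt: nowMs + ttlMs });
    return true;
  }

  deleteIfMatches(key: string, token: string, nowMs: number): boolean {
    const existing = this.entries.get(key);
    if (existing === undefined || existing.expiresAt <= nowMs) {
      this.entries.delete(key); // drop expired entries, nothing to release
      return false;
    }
    if (existing.token !== token) {
      return false; // never release a lock held by someone else
    }
    this.entries.delete(key);
    return true;
  }
}
```

The TTL matters as much as the NX check: if a worker crashes while holding the lock, the entry expires on its own and another worker can take over the mission's ticks.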

Considered options

Postgres only, with pg_notify and advisory locks. Single datastore, all state durable, ephemeral access via in-memory cache.
Redis only, with periodic snapshot to disk. Single datastore, ephemeral by nature, durability via Redis persistence (RDB/AOF).
Postgres + Redis split. Each datastore plays to its strengths.
Postgres + in-memory state in the API process. No second datastore, but state is lost on restart and the design does not support multiple API instances.

Decision outcome

Chosen: PostgreSQL for durable state, Redis for ephemeral state.

Postgres tables (durable):

users — internal ID, Twitch opaque user ID, created at.
survivors — FK to user, stats, perk slots, current lifecycle state.
missions — FK to survivor or group, difficulty, status, timestamps.
mission_logs — FK to mission, tick index, encounter ID, rendered text, seed, modifiers applied.

Redis keys (ephemeral):

active_mission:{missionId} — JSON snapshot of in-progress mission state.
mission_lobby:{lobbyId} — lobby member list and ready flags.
tick_lock:{missionId} — distributed lock (see ADR-0005).
rate_limit:{userId}:{endpoint} — rate limiting counters.
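
One way to keep these key patterns consistent is to build them in a single place. The sketch below is illustrative only: the `redisKeys` object and its function names are assumptions, while the key formats are the ones listed above.

```typescript
// Hypothetical helpers that centralise the Redis key patterns from this ADR.
// Only the key formats come from the document; the names are illustrative.
const redisKeys = {
  activeMission: (missionId: string): string => `active_mission:${missionId}`,
  missionLobby: (lobbyId: string): string => `mission_lobby:${lobbyId}`,
  tickLock: (missionId: string): string => `tick_lock:${missionId}`,
  rateLimit: (userId: string, endpoint: string): string =>
    `rate_limit:${userId}:${endpoint}`,
};
```

Routing every key through helpers like these also gives reviewers one file to check when auditing what lives in Redis.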

Rule: anything in Redis must be reconstructable from Postgres. Loss of Redis means active missions resume from their last persisted tick on next worker poll, after a brief delay.
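
The resume behaviour in this rule can be sketched as a pure function over persisted log rows. This is an assumption-heavy illustration, not the project's code: `MissionLogRow`, `ResumePoint`, and `resumePointFromLogs` are hypothetical names, the field names are guesses at how the mission_logs columns might be mapped, and tick indices are assumed to start at 0.

```typescript
// Hypothetical shapes: the ADR lists the mission_logs columns but not their
// exact names or types, so these fields are assumptions.
interface MissionLogRow {
  missionId: string;
  tickIndex: number;
}

interface ResumePoint {
  missionId: string;
  lastPersistedTick: number | null; // null when no tick has been logged yet
  nextTickIndex: number;
}

// Derive where a mission should resume using durable rows only, so the
// result does not depend on anything that was stored in Redis.
function resumePointFromLogs(
  missionId: string,
  rows: readonly MissionLogRow[],
): ResumePoint {
  let last: number | null = null;
  for (const row of rows) {
    if (row.missionId !== missionId) continue;
    if (last === null || row.tickIndex > last) last = row.tickIndex;
  }
  return {
    missionId,
    lastPersistedTick: last,
    // Assumption: ticks are numbered from 0 and are contiguous.
    nextTickIndex: last === null ? 0 : last + 1,
  };
}
```

A worker that finds no `active_mission:{missionId}` key could call something like this before rebuilding the snapshot, which keeps Postgres as the only input to recovery.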

Consequences

Positive

Hot-path performance. Tick processing operates against Redis, where individual operations typically complete in under a millisecond, and writes to Postgres only at the end of each tick (the durable log entry).
Native primitives where useful. SET NX PX for locks, SETEX for TTLs, sorted sets for "missions due" queries — all clean in Redis, awkward in SQL.
Failure isolation. A Postgres slowdown doesn't immediately stop tick processing (Redis state continues); a Redis outage doesn't lose mission history (Postgres persists).
Familiar operational tooling. Both Postgres and Redis are mature, widely deployed systems with well-established operational tooling.
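
The "missions due" query mentioned above can be sketched with an in-memory stand-in for a sorted set scored by nextTickAt. The `DueQueue` class and its method names are hypothetical. With a real Redis the equivalents would be `ZADD` to schedule, `ZRANGEBYSCORE key -inf <now>` (or `ZRANGE ... BYSCORE` on newer versions) to list due missions, and `ZREM` to remove one.

```typescript
// Hypothetical in-memory stand-in for a Redis sorted set whose score is the
// mission's nextTickAt timestamp in milliseconds. Not a real Redis client.
class DueQueue {
  private scores = new Map<string, number>(); // missionId -> nextTickAt (ms)

  // Analogous to ZADD: insert the member or update its score.
  schedule(missionId: string, nextTickAtMs: number): void {
    this.scores.set(missionId, nextTickAtMs);
  }

  // Analogous to ZRANGEBYSCORE key -inf now: members whose score is <= now,
  // ordered by score, with ties broken by member name.
  due(nowMs: number): string[] {
    return Array.from(this.scores.entries())
      .filter(([, at]) => at <= nowMs)
      .sort((a, b) => a[1] - b[1] || (a[0] < b[0] ? -1 : a[0] > b[0] ? 1 : 0))
      .map(([id]) => id);
  }

  // Analogous to ZREM: drop a mission that has finished.
  remove(missionId: string): void {
    this.scores.delete(missionId);
  }
}
```

Rescheduling is just another `schedule` call with the new timestamp, which is why a sorted set fits this access pattern better than a polling query with an index on a timestamp column.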

Negative

Two datastores to operate. Backups, monitoring, capacity planning, security hardening multiplied by two.
Consistency boundary. Redis can drift from Postgres if the API crashes between the Redis write and the Postgres write. Mitigated by treating Postgres as authoritative whenever state is reconciled on cold start.
Schema discipline. The "what lives where" rule must be documented and respected — accidentally putting durable data only in Redis is a class of bug that's invisible until something restarts.

Neutral

This split is a common pattern in real-time systems and is well-understood. Hiring or onboarding contributors with experience in either or both is straightforward.
We deliberately avoid more specialized stores and patterns (event sourcing, time-series databases, document stores) until the data model demonstrably needs them.

Implementation notes

mission_logs rows are append-only; never updated after creation. This makes them trivially safe under concurrent writes and supports replay/debug.
Plan retention/archival from day one — mission_logs will grow fast. Default: partition by month, archive partitions older than N months to cold storage.
Set up Redis ACLs before the production deploy. Local dev runs without auth; production must not.
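
The monthly partitioning default could be supported by a small helper that generates the DDL for one partition. This is a sketch under stated assumptions: it presumes mission_logs is declared with `PARTITION BY RANGE` on a timestamp column, and both the function name `monthlyPartitionDdl` and the `mission_logs_YYYY_MM` naming scheme are hypothetical.

```typescript
// Hypothetical helper: build the DDL for one monthly partition of
// mission_logs. Assumes the parent table uses PARTITION BY RANGE on a
// timestamp column; the partition naming scheme is also an assumption.
function monthlyPartitionDdl(year: number, month: number): string {
  if (!Number.isInteger(month) || month < 1 || month > 12) {
    throw new RangeError(`month must be between 1 and 12, got ${month}`);
  }
  const pad = (n: number): string => (n < 10 ? `0${n}` : String(n));
  // The upper bound of a range partition is exclusive, so it is the first
  // day of the following month (rolling over the year after December).
  const nextYear = month === 12 ? year + 1 : year;
  const nextMonth = month === 12 ? 1 : month + 1;
  const name = `mission_logs_${year}_${pad(month)}`;
  const from = `${year}-${pad(month)}-01`;
  const to = `${nextYear}-${pad(nextMonth)}-01`;
  return (
    `CREATE TABLE IF NOT EXISTS ${name} PARTITION OF mission_logs ` +
    `FOR VALUES FROM ('${from}') TO ('${to}');`
  );
}
```

Creating partitions ahead of time (for example, next month's partition from a scheduled job) avoids insert failures at the month boundary, and archiving then operates on whole partitions rather than on row ranges.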
