
Security

Defense-in-depth approach: multiple layers protect data on every request.

Threat model

In-scope threats:

  • Cross-tenant data leak (user A sees user B's data)
  • Stolen JWT (token replay attack)
  • Stolen ADMIN_HASH (admin operations)
  • Service role key leak (full DB access)
  • File upload abuse (malicious files, XXE, oversized)
  • Prompt injection in AI calls (low priority for now: each chat is single-tenant and scoped to one report)
  • Rate-limit bypass (DoS, cost burn)
  • Insecure storage paths (path traversal)

Out-of-scope threats:

  • Supply chain attacks on transitive dependencies (Dependabot security alerts help)
  • DDoS (Cloudflare protects the FE; the backend sits behind a single IP, but rate limiting mitigates)
  • Insider threat (no formal access control over who has dev access)

Protection layers

1. Network — HTTPS only

  • Cloudflare Workers: HTTPS automatically
  • Sliplane managed cert (Let's Encrypt)
  • Backend → Supabase přes TLS 1.3
  • Backend ↔ Worker ↔ Redis: internal Sliplane network (no internet)

2. CORS

# app/main.py
from fastapi.middleware.cors import CORSMiddleware

app.add_middleware(
    CORSMiddleware,
    allow_origins=settings.origins_list,  # explicit list, no "*"
    allow_credentials=True,
    allow_methods=["GET", "POST", "PUT", "DELETE", "PATCH", "OPTIONS"],
    allow_headers=["Authorization", "Content-Type", ...],
)

# config.py: the validator rejects "*"
from pydantic import field_validator

@field_validator("allowed_origins")
@classmethod
def _no_wildcard_origin(cls, v: str) -> str:
    if "*" in v:
        raise ValueError("...")
    return v
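A minimal sketch of how settings.origins_list might be derived, assuming allowed_origins is a comma-separated string. The parse_origins name and the trailing-slash normalization are hypothetical, not taken from config.py. It also shows why the wildcard check matters: combined with allow_credentials=True, a wildcard would let any site make credentialed cross-origin requests.

```python
def parse_origins(raw: str) -> list[str]:
    """Split a comma-separated origin string into an explicit CORS allowlist."""
    if "*" in raw:
        # A wildcard defeats the explicit allowlist; together with
        # allow_credentials=True it would admit credentialed cross-site calls.
        raise ValueError("wildcard origins are not allowed")
    origins: list[str] = []
    for part in raw.split(","):
        # Browsers send the Origin header without a trailing slash,
        # so a configured "https://x/" would otherwise never match.
        origin = part.strip().rstrip("/")
        if origin and origin not in origins:
            origins.append(origin)
    return origins
```

For example, parse_origins("https://a.example, https://b.example/") returns ["https://a.example", "https://b.example"].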

3. JWT verification

  • PyJWT[crypto]: MIT-licensed, regularly updated, recommended by Supabase
  • JWKS cache with a 1 h TTL: fetches the current public key from {supabase_url}/auth/v1/.well-known/jwks.json
  • Algorithm allowlist: RS256, ES256 (no none, no symmetric algorithms)
  • Required claims: exp, iat, sub, aud="authenticated"
  • Timing-safe comparison via hmac.compare_digest
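In the backend, PyJWT does this work (jwt.decode with algorithms, audience and options={"require": [...]}) and also verifies the signature against the JWKS key. The stdlib-only sketch below merely illustrates the policy listed above: the algorithm allowlist, the required claims, and a 1 h TTL cache for the JWKS document. It does not verify signatures, and all helper names (check_header, check_claims, JWKSCache) are hypothetical.

```python
from __future__ import annotations

import base64
import json
import time
from typing import Any, Callable

ALLOWED_ALGS = {"RS256", "ES256"}  # asymmetric only: no "none", no HS256
REQUIRED_CLAIMS = ("exp", "iat", "sub", "aud")


def _b64url_json(segment: str) -> dict[str, Any]:
    # JWT segments are unpadded base64url; restore padding before decoding.
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def check_header(token: str) -> dict[str, Any]:
    """Reject tokens whose header names an algorithm outside the allowlist."""
    header = _b64url_json(token.split(".")[0])
    if header.get("alg") not in ALLOWED_ALGS:
        raise ValueError(f"disallowed alg: {header.get('alg')!r}")
    return header


def check_claims(claims: dict[str, Any], now: float | None = None) -> None:
    """Enforce the required claims, aud="authenticated" and expiry."""
    now = time.time() if now is None else now
    missing = [c for c in REQUIRED_CLAIMS if c not in claims]
    if missing:
        raise ValueError(f"missing claims: {missing}")
    aud = claims["aud"]
    audiences = aud if isinstance(aud, list) else [aud]
    if "authenticated" not in audiences:
        raise ValueError("unexpected audience")
    if claims["exp"] <= now:
        raise ValueError("token expired")


class JWKSCache:
    """Holds one JWKS document and refetches it once ttl seconds have passed."""

    def __init__(self, fetch: Callable[[], dict], ttl: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._value: dict | None = None
        self._expires_at = 0.0

    def get(self) -> dict:
        if self._value is None or self._clock() >= self._expires_at:
            self._value = self._fetch()
            self._expires_at = self._clock() + self._ttl
        return self._value
```

Injecting fetch and clock keeps the cache testable without network access or real waiting.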

4. Rate limiting

slowapi per-IP limits with a Redis backend. Defaults in config.py:

Endpoint          Limit
Default           60/minute
/ingest           20/hour
/migrate/claim    5/minute
/chat             30/minute
/retrieve         60/minute

Note: rate limiting is per-IP, not per-user. For production, consider per-user limits keyed on a value derived from the JWT.
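slowapi's Limiter takes a key_func, so per-user limits mostly come down to choosing the key. The stdlib sketch below is a hypothetical illustration, not slowapi's internals: it parses limit strings like those in the table and counts hits in fixed windows, keyed by the JWT subject when one is known. The subject must come from an already verified token; keying on an unverified sub would let a caller exhaust another user's bucket. The in-memory dict stands in for Redis and, unlike Redis keys with a TTL, never expires old windows.

```python
from __future__ import annotations

import time
from collections import defaultdict
from typing import Callable

_PERIODS = {"second": 1, "minute": 60, "hour": 3600, "day": 86400}


def parse_limit(spec: str) -> tuple[int, int]:
    """Parse a spec such as '20/hour' into (max_hits, window_seconds)."""
    count, period = spec.split("/")
    return int(count), _PERIODS[period.strip()]


def rate_key(verified_user_id: str | None, client_ip: str) -> str:
    # Prefer the verified user so users behind one NAT don't share a bucket;
    # fall back to the client IP for anonymous requests.
    return f"user:{verified_user_id}" if verified_user_id else f"ip:{client_ip}"


class FixedWindowLimiter:
    def __init__(self, spec: str, clock: Callable[[], float] = time.time) -> None:
        self.limit, self.window = parse_limit(spec)
        self._clock = clock
        self._counts: dict[tuple[str, int], int] = defaultdict(int)

    def hit(self, key: str) -> bool:
        """Record one request; return False once the key is over its limit."""
        bucket = int(self._clock() // self.window)
        self._counts[(key, bucket)] += 1
        return self._counts[(key, bucket)] <= self.limit
```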

5. Row Level Security (RLS)

All nemoreport.* tables have RLS enabled. Helper:

private.user_has_tenant_access(p_tenant_id uuid) RETURNS boolean

Checks:

  • JWT claim personal_tenant_id = p_tenant_id
  • JWT claim active_tenant_id = p_tenant_id
  • EXISTS (SELECT 1 FROM tenant_members WHERE tenant_id = p_tenant_id AND user_id = auth.uid())

See Database for details.

6. Defense-in-depth grants

For sensitive tables (API keys, claim emails, worker-managed tables):

REVOKE INSERT, UPDATE, DELETE ON nemoreport.<table> FROM authenticated;
GRANT SELECT, INSERT, UPDATE, DELETE ON nemoreport.<table> TO service_role;

Rationale: two layers. Even if a permissive RLS policy were added by mistake, the revoked table grants would still block writes by authenticated.

7. Storage RLS (path-based)

Migration 0004: bucket policies use a path prefix:

CREATE POLICY "<bucket>_user_path" ON storage.objects FOR ... TO authenticated
  USING (bucket_id = '<bucket>' AND name LIKE auth.uid()::text || '/%');

Users see only {tenant_id}/... paths. The service role bypasses RLS.
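A Python equivalent of the policy's USING predicate, handy for reasoning about which object names a user can reach. The function name is hypothetical, and owner_id stands for whatever value the policy compares against (auth.uid()::text in the policy above). Because canonical UUID text contains no LIKE wildcards (% or _), the pattern reduces to a plain prefix check; the sketch insists on canonical lowercase UUID text, as produced by Postgres's uuid::text, so that the equivalence holds.

```python
import uuid


def in_owner_scope(object_name: str, owner_id: str) -> bool:
    """Mirror of: object_name LIKE owner_id || '/%'."""
    # Only canonical UUID text is safe to treat as a literal prefix:
    # hex digits and hyphens, no '%' or '_' wildcards.
    if str(uuid.UUID(owner_id)) != owner_id:
        raise ValueError("owner_id must be a canonical lowercase UUID")
    return object_name.startswith(owner_id + "/")
```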

8. Defense-in-depth grants: private schema

Migration 0009: explicit GRANT USAGE ON SCHEMA private TO supabase_auth_admin. Without it, SECURITY DEFINER functions in the private schema fail with "permission denied for schema private".

9. Sequence USAGE grants

Phase A.1 hotfix (migration 0012): an explicit GRANT USAGE TO authenticated on every PK sequence. ALTER DEFAULT PRIVILEGES does not cover this for future SERIAL columns.

The pgTAP test 04_grants.sql includes sequence USAGE checks so that future migrations don't forget these grants.

10. HMAC for the Nette integration

Phase B.11: POST /reports/{id}/attachments/system authenticates via an HMAC over a canonical string:

<report_id>|<nette_id>|<attachment_type>|<sha256(file_bytes)>

Why a canonical string rather than the raw multipart body: the multipart boundary differs between clients (PHP curl vs. node-fetch vs. axios), so an HMAC over the raw bytes would not be reproducible.

The signature is compared with hmac.compare_digest (timing-safe).
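A minimal sketch of signing and verifying the canonical string. It assumes HMAC-SHA256, hex encoding for both the file hash and the signature, and UTF-8 for the canonical string; the function names are hypothetical. Because the file hash is computed from the received bytes, the multipart boundary never enters the MAC.

```python
import hashlib
import hmac


def canonical_string(report_id: str, nette_id: str, attachment_type: str,
                     file_bytes: bytes) -> bytes:
    """Build <report_id>|<nette_id>|<attachment_type>|<sha256(file_bytes)>."""
    fields = [report_id, nette_id, attachment_type]
    if any("|" in f for f in fields):
        # A '|' inside a field would let two different inputs sign identically.
        raise ValueError("fields must not contain '|'")
    file_hash = hashlib.sha256(file_bytes).hexdigest()
    return "|".join([*fields, file_hash]).encode("utf-8")


def sign(secret: bytes, message: bytes) -> str:
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify(secret: bytes, message: bytes, signature_hex: str) -> bool:
    # Compare as bytes: compare_digest rejects non-ASCII str arguments.
    expected = sign(secret, message).encode("ascii")
    return hmac.compare_digest(expected, signature_hex.encode("utf-8"))
```

The sender (Nette) and the backend both call canonical_string with the same four inputs; only the secret and the resulting hex signature differ between a valid and a forged request.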

See Authentication for details.

11. File upload validation

app/routers/ingestion.py:upload_for_ingest:

  1. Size check: storage.validate_size(len(raw)) → 413 if > 50 MB
  2. MIME detection via python-magic (libmagic1): reads the file header, not the extension
  3. Allowlist check against ALLOWED_CONTENT_TYPES → 415 if not allowed
  4. UUID-generated report_id (no user-controlled path components)
  5. Storage path = {tenant_id}/reports/{report_id}/original.{ext}, scoped by RLS

Path traversal is not possible: the path contains a generated UUID, not user input.
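A stdlib sketch of the five steps. It swaps python-magic for a tiny hand-written signature table purely so the example runs without libmagic; DOCX (a ZIP) and MHTML need deeper inspection and are left out. The names, the MIME-to-extension map, MAX_BYTES as 50 MiB, and deriving ext from the detected type are all assumptions, not the code in app/routers/ingestion.py.

```python
import uuid

MAX_BYTES = 50 * 1024 * 1024  # "50 MB"; MiB vs. MB is an assumption here

# Hypothetical allowlist: detected MIME type -> stored file extension.
ALLOWED_EXT = {
    "application/pdf": "pdf",
    "image/png": "png",
    "image/jpeg": "jpg",
    "image/webp": "webp",
}


class UploadError(Exception):
    def __init__(self, status: int, detail: str) -> None:
        super().__init__(detail)
        self.status = status


def sniff_mime(raw: bytes) -> str:
    """Stand-in for libmagic: inspect leading bytes, never the filename."""
    if raw.startswith(b"%PDF-"):
        return "application/pdf"
    if raw.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if raw.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if raw[:4] == b"RIFF" and raw[8:12] == b"WEBP":
        return "image/webp"
    return "application/octet-stream"


def plan_upload(raw: bytes, tenant_id: uuid.UUID,
                max_bytes: int = MAX_BYTES) -> tuple[uuid.UUID, str]:
    """Validate an upload and return (report_id, storage_path)."""
    if len(raw) > max_bytes:                  # step 1: size
        raise UploadError(413, "file too large")
    mime = sniff_mime(raw)                    # step 2: content, not extension
    ext = ALLOWED_EXT.get(mime)               # step 3: allowlist
    if ext is None:
        raise UploadError(415, f"unsupported type: {mime}")
    report_id = uuid.uuid4()                  # step 4: server-generated id
    # Step 5: every path component is server-controlled, so no traversal.
    return report_id, f"{tenant_id}/reports/{report_id}/original.{ext}"
```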

12. SQL injection: none

  • The backend uses supabase-py, whose queries are parameterized (PostgREST builds the SQL)
  • RPC calls (e.g. search_chunks_by_folder): all arguments are parameterized
  • No raw SQL string concatenation

13. XXE / XML injection

  • We call Mistral OCR through its API (no XML processing on our side)
  • python-docx parses DOCX (a ZIP archive with XML inside); known to be safe for reasonable input
  • BS4 + trafilatura for MHTML: hardened against standard XML attacks
  • libmagic file-type detection is a C library with its own hardening

14. ADMIN_HASH protection

  • The hash is the SHA-1 of a secret (3bddbafdec...), shared by all admins
  • Checked via hmac.compare_digest (timing-safe)
  • Stored in the Sliplane env with secret=true
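A sketch of the check, assuming the admin client sends the ADMIN_HASH value itself and the server compares it with the env value (how the value travels, e.g. in which header, is not specified here). new_admin_hash is a hypothetical rotation helper that produces a value in the documented SHA-1 hex format; since its input is random, hashing only fixes the format and adds no security of its own.

```python
from __future__ import annotations

import hashlib
import hmac
import secrets


def new_admin_hash() -> str:
    """Generate a fresh value in the documented format: SHA-1 hex of a secret."""
    return hashlib.sha1(secrets.token_bytes(32)).hexdigest()


def is_admin(provided: str | None, expected: str) -> bool:
    """Timing-safe comparison of a client-supplied value with ADMIN_HASH."""
    if not provided or not expected:
        # Never let an unset ADMIN_HASH match an empty credential.
        return False
    return hmac.compare_digest(provided.encode("utf-8"), expected.encode("utf-8"))
```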

Known limitations:

  • Single shared secret: if it leaks, it must be rotated
  • No per-admin revocation
  • No audit log for admin actions

Recommended for production: redesign around the Supabase user table with an is_admin boolean, plus audit logging to user_events.

Security notes per component

libmagic1 (Phase B, B.4 review)

python-magic is a ctypes wrapper around libmagic.so.1 (from the file/file project maintained by Christos Zoulas; actively maintained for ~30 years, ~150 commits/year).

A read-only operation (reads the file header; no deep parsing). Minimal attack surface compared with, e.g., pdfplumber.

In addition, the allowlist in app/storage.py (ALLOWED_CONTENT_TYPES) acts as defense in depth: even if libmagic misclassifies a file, the allowlist rejects any type outside PDF/PNG/JPEG/WEBP/MHTML/DOCX.

Verdict: safe, suitable, and battle-tested (Apache, ClamAV, AWS Lambda, GitHub Linguist).

Mistral OCR (vendor managed)

The API receives a signed URL, and processing is the vendor's responsibility. No custom code runs on our side.

Pydantic AI Agent (LLM calls)

For Phase B figure annotation:

  • Output schema validation: Pydantic enforces the FigureAnnotation shape and rejects hallucinated LLM output with wrong types
  • Strict output type: agent.run() returns a typed object, not raw text, so there is no injection vector

Embedding inputs (Phase C)

The embedding query is a user-controlled string, embedded into a halfvec(1536). No code path gives the embedding side effects beyond the SQL similarity search.

Secret management

Sliplane env vars

  • secret=true flag: masked in API responses (the value itself still persists)
  • Per-service env (backend and worker each keep separate copies)

Frontend env vars

NEXT_PUBLIC_* variables are bundled into the client JS and are visible in the browser. Only these are allowed:

  • NEXT_PUBLIC_SUPABASE_URL (public)
  • NEXT_PUBLIC_SUPABASE_ANON_KEY (public by Supabase design)
  • NEXT_PUBLIC_BACKEND_URL (public)

Never expose SUPABASE_SERVICE_ROLE_KEY to the FE.
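The rule above can be enforced mechanically. Below is a hypothetical pre-build check (not part of the existing tooling) that flags any NEXT_PUBLIC_* variable outside the allowlist, any public variable whose value equals the service role key when that key is supplied for comparison, and the presence of SUPABASE_SERVICE_ROLE_KEY itself in the frontend environment.

```python
from __future__ import annotations

from collections.abc import Mapping

ALLOWED_PUBLIC = {
    "NEXT_PUBLIC_SUPABASE_URL",
    "NEXT_PUBLIC_SUPABASE_ANON_KEY",
    "NEXT_PUBLIC_BACKEND_URL",
}


def audit_frontend_env(env: Mapping[str, str],
                       service_role_key: str | None = None) -> list[str]:
    """Return the problems found in a frontend build environment."""
    problems: list[str] = []
    for name in sorted(env):
        # Anything prefixed NEXT_PUBLIC_ ends up in the client bundle.
        if not name.startswith("NEXT_PUBLIC_"):
            continue
        if name not in ALLOWED_PUBLIC:
            problems.append(f"unexpected public variable: {name}")
        if service_role_key and env[name] == service_role_key:
            problems.append(f"{name} holds the service role key")
    if "SUPABASE_SERVICE_ROLE_KEY" in env:
        problems.append("SUPABASE_SERVICE_ROLE_KEY must not be set in the frontend env")
    return problems
```

A CI step could call it with dict(os.environ) and fail the build when the returned list is non-empty.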

Local .env.local

.env.local is gitignored in dev. Sample in .env.example:

SUPABASE_URL=
SUPABASE_ANON_KEY=
SUPABASE_SERVICE_ROLE_KEY=
GEMINI_API_KEY=
ADMIN_HASH=test_local
APP_ENV=development

Rotation

For regular rotation (recommended quarterly):

  1. GEMINI_API_KEY: generate a new key in Google Cloud Console, update the Sliplane env, restart the service
  2. MISTRAL_API_KEY: same flow via the Mistral console
  3. COHERE_API_KEY: same via the Cohere console
  4. SUPABASE_SERVICE_ROLE_KEY: Supabase Settings → API → Reset, update Sliplane
  5. ADMIN_HASH: generate a new value, update Sliplane, distribute it manually
  6. NETTE_HMAC_SECRET: coordinate with the Nette team

GDPR + data retention

User data

  • Personal data: auth.users.email, nemoreport.user_profiles
  • User content: nemoreport.reports + attachments + parsed_sections + figures + chunks + storage objects

Right to erasure

nemoreport.handle_delete_user(), a BEFORE DELETE trigger on auth.users:

  1. Hard-deletes the personal tenant
  2. CASCADE FK chain → reports → attachments → parsed_sections → figures → chunks
  3. Storage objects → orphaned (admin cleanup TODO, TBD pre-production)

Limitation: storage cleanup is manual. For production, a cron job that detects orphaned storage paths and deletes them is recommended.
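The detection half of such a cron job can be a pure function. This sketch assumes every object lives under {tenant_id}/ (as in the upload path from step 11) and that the job has already listed object names and the surviving tenant IDs with the service role; listing and deletion are omitted, and find_orphans is a hypothetical name.

```python
from collections.abc import Iterable


def find_orphans(object_names: Iterable[str], live_tenant_ids: set[str]) -> list[str]:
    """Return storage paths whose leading {tenant_id} segment no longer exists."""
    orphans: list[str] = []
    for name in object_names:
        tenant_id, sep, _rest = name.partition("/")
        if not sep:
            # Not in {tenant_id}/... form: leave it for manual review
            # rather than deleting something unexpected.
            continue
        if tenant_id not in live_tenant_ids:
            orphans.append(name)
    return orphans
```

Running it first as a dry run that only logs the candidate paths gives a chance to catch a wrong tenant list before anything is deleted.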

Audit log

The nemoreport.user_events table exists for audit records. It is currently underpopulated, since Phase A did not plan for a full audit. For production, record:

  • Login/logout events
  • Admin operations
  • Data exports
  • Permission changes

Data retention

No retention policy is defined yet. For production:

  • Reports: keep indefinitely (users delete them via the delete button)
  • retrieval_log: rotation (e.g. 90 days for analytics, then archive)
  • Audit logs: at least 1 year (legal compliance)
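A sketch of the retrieval_log rotation rule, assuming each row carries a timezone-aware created_at timestamp (a hypothetical column name) and a 90-day window. In production this would more likely be a SQL DELETE or INSERT ... SELECT with the same cutoff; the Python version just makes the boundary explicit: a row exactly 90 days old is still kept.

```python
from datetime import datetime, timedelta


def split_by_retention(rows: list[dict], now: datetime,
                       keep_days: int = 90) -> tuple[list[dict], list[dict]]:
    """Split rows into (keep, archive) around now - keep_days."""
    cutoff = now - timedelta(days=keep_days)
    keep = [r for r in rows if r["created_at"] >= cutoff]
    archive = [r for r in rows if r["created_at"] < cutoff]
    return keep, archive
```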

Compliance

SOC2 / ISO 27001

Inherited from vendors:

  • Supabase: SOC 2 Type II
  • Cloudflare: SOC 2 Type II + ISO 27001
  • Sliplane: unknown (smaller player)
  • Mistral, Google, Cohere: all hold compliance certifications

For the product itself:

  • No formal audit yet
  • A third-party pen test is recommended before the pilot

EU AI Act (2024+)

NemoReport AI is an AI system for real estate evaluation:

  • Risk level: limited risk (transparency obligations); users know the answers are AI-generated
  • Required: disclose to users (in the frontend UI) that they are chatting with an AI, not a human expert
  • NOT high-risk (no automated employment, credit, or legal decisions)

For details, see the EU AI Act compliance guide (TBD).

Czech law

For an application aimed at the CZ market:

  • GDPR: EU regulation; EU data residency for Supabase + Resend is OK
  • Zákon o kybernetické bezpečnosti (Cybersecurity Act): pre-production review TBD
  • Občanský zákoník (Civil Code, consumer protection): Terms of Service review

Reporting security issues

For security disclosures:

  • Email: jiri@slimarik.cz (for now; production should use security@nemoreport.cz)
  • Severity classification: critical (data leak) / high / medium / low
  • Response time: acknowledge within 48 hours, triage within 7 days

Security release checklist

  • All dependencies up to date (no known CVEs)
  • pgTAP tests pass (134/134)
  • Smoke tests pass (auth, RLS, defense-in-depth grants)
  • Backend health check OK
  • Backend logs free of unexpected errors
  • No secrets in git history (check .gitignore + audit)
  • Env vars on Sliplane audited (no staging values in production)
  • CORS allowlist matches the production frontend origin
  • HTTPS only (no plain HTTP redirects)