Security¶
A defense-in-depth approach: every request is protected by multiple layers.
Threat model¶
In-scope threats:
- Cross-tenant data leak (user A sees user B's data)
- Stolen JWT (token replay attack)
- Stolen ADMIN_HASH (admin operations)
- Service role key leak (full DB access)
- File upload abuse (malicious files, XXE, oversized files)
- Prompt injection in AI calls (low priority for now: single-tenant chat per report)
- Rate-limit bypass (DoS, cost burn)
- Insecure storage paths (path traversal)
Out-of-scope threats:
- Supply chain attacks on transitive dependencies (Dependabot security alerts help)
- DDoS (Cloudflare protects the FE; the backend sits on a single IP, but rate limiting mitigates this)
- Insider threat (there is no formal access control over who has dev access)
Protection layers¶
1. Network — HTTPS only¶
- Cloudflare Workers serve HTTPS automatically
- Sliplane managed cert (Let's Encrypt)
- Backend → Supabase over TLS 1.3
- Backend ↔ Worker ↔ Redis: internal Sliplane network (no internet exposure)
2. CORS¶
# app/main.py
app.add_middleware(
    CORSMiddleware,
    allow_origins=settings.origins_list,  # explicit list, no "*"
    allow_credentials=True,
    allow_methods=["GET", "POST", "PUT", "DELETE", "PATCH", "OPTIONS"],
    allow_headers=["Authorization", "Content-Type", ...],
)

# config.py: the validator rejects "*"
@field_validator("allowed_origins")
@classmethod
def _no_wildcard_origin(cls, v: str) -> str:
    if "*" in v:
        raise ValueError("...")
    return v
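The wildcard ban matters because browsers reject a `*` origin when credentials are allowed. As a minimal stdlib sketch of how `allowed_origins` might become `origins_list`, assuming the setting is a comma-separated string (the separator and the `parse_origins` name are assumptions, not taken from the project):

```python
def parse_origins(allowed_origins: str) -> list[str]:
    """Split a comma-separated origin string into an explicit allowlist."""
    # Reject wildcards up front, mirroring the validator above.
    if "*" in allowed_origins:
        raise ValueError("wildcard origins are not allowed")
    # Drop surrounding whitespace and empty entries (e.g. a trailing comma).
    return [o.strip() for o in allowed_origins.split(",") if o.strip()]
```

For example, `parse_origins("https://a.example, https://b.example")` yields a two-item list, while any value containing `*` raises `ValueError` at startup rather than at request time.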
3. JWT verification¶
- PyJWT[crypto]: MIT-licensed, regularly updated, recommended by Supabase
- JWKS cache with a 1h TTL: fetches the current public key from {supabase_url}/auth/v1/.well-known/jwks.json
- Algorithms whitelist: RS256, ES256 (no "none", no symmetric algorithms)
- Required claims: exp, iat, sub, aud="authenticated"
- Timing-safe compare via hmac.compare_digest
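The 1h JWKS cache can be illustrated with a small stdlib sketch. This is not the project's implementation: the `JwksCache` class, the injected `fetch` callable, and the injectable clock are assumptions made so the example is self-contained. The real fetch would be an HTTP GET against the JWKS URL above.

```python
import time
from typing import Any, Callable, Optional


class JwksCache:
    """Caches a JWKS document and refetches it once the TTL has elapsed."""

    def __init__(
        self,
        fetch: Callable[[], dict[str, Any]],  # hypothetical: returns the parsed JWKS JSON
        ttl_seconds: float = 3600.0,  # 1h TTL, as in the list above
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._jwks: Optional[dict[str, Any]] = None
        self._expires_at = 0.0

    def get(self) -> dict[str, Any]:
        """Return the cached JWKS, refetching if it is missing or expired."""
        now = self._clock()
        if self._jwks is None or now >= self._expires_at:
            self._jwks = self._fetch()
            self._expires_at = now + self._ttl
        return self._jwks

    def key_for(self, kid: str) -> Optional[dict[str, Any]]:
        """Return the JWK whose "kid" matches, or None if there is none."""
        for key in self.get().get("keys", []):
            if key.get("kid") == kid:
                return key
        return None
```

A token whose `kid` is not in the cached set returns `None` here; whether to force an early refetch in that case (to pick up rotated keys) is a design choice this sketch leaves open.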
4. Rate limiting¶
slowapi, keyed per IP, with a Redis backend. Defaults in config.py:
| Endpoint | Limit |
|---|---|
| Default | 60/minute |
| /ingest | 20/hour |
| /migrate/claim | 5/minute |
| /chat | 30/minute |
| /retrieve | 60/minute |
Note: the rate limit is per-IP, not per-user. For production, consider per-user limits via a JWT-derived key.
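A JWT-derived key could look like the sketch below. The `rate_limit_key` function and the `user:` / `ip:` prefixes are hypothetical. The sketch assumes the claims passed in have already been signature-verified.

```python
from typing import Mapping, Optional


def rate_limit_key(verified_claims: Optional[Mapping[str, object]], client_ip: str) -> str:
    """Return the bucket key used for rate limiting.

    verified_claims must come from a token whose signature was already
    checked; keying on an unverified "sub" would let a caller mint a fresh
    bucket per request simply by forging tokens.
    """
    if verified_claims:
        sub = verified_claims.get("sub")
        if isinstance(sub, str) and sub:
            return f"user:{sub}"
    # Anonymous or invalid tokens fall back to the per-IP bucket.
    return f"ip:{client_ip}"
```

slowapi's `Limiter` accepts a `key_func` callable that receives the request, so a thin wrapper that extracts the verified claims and the client IP from the request could delegate to a function like this.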
5. Row Level Security (RLS)¶
All nemoreport.* tables have RLS enabled. A helper function checks:
- JWT claim personal_tenant_id = p_tenant_id
- JWT claim active_tenant_id = p_tenant_id
- EXISTS (SELECT 1 FROM tenant_members WHERE tenant_id = p_tenant_id AND user_id = auth.uid())
For details, see Database.
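To make the helper's logic concrete, here is a Python model of the three checks. It assumes they are combined with OR (any one grants access), which the text above does not state explicitly. The `can_access_tenant` name and the in-memory `memberships` rows are illustrative only; the real check runs in SQL.

```python
from typing import Iterable, Mapping, Tuple


def can_access_tenant(
    claims: Mapping[str, str],
    user_id: str,
    memberships: Iterable[Tuple[str, str]],  # (tenant_id, user_id) rows of tenant_members
    p_tenant_id: str,
) -> bool:
    """Model of the RLS helper: True if any of the three checks passes (assumed OR)."""
    if claims.get("personal_tenant_id") == p_tenant_id:
        return True
    if claims.get("active_tenant_id") == p_tenant_id:
        return True
    # Fallback: an explicit membership row for this user and tenant.
    return any(t == p_tenant_id and u == user_id for t, u in memberships)
```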
6. Defense-in-depth grants¶
For sensitive tables (API keys, claim emails, worker-managed tables):
REVOKE INSERT, UPDATE, DELETE ON nemoreport.<table> FROM authenticated;
GRANT SELECT, INSERT, UPDATE, DELETE ON nemoreport.<table> TO service_role;
Rationale: two layers. Even if a permissive RLS policy were added by mistake, the missing table grant would still block writes from authenticated.
7. Storage RLS (path-based)¶
Migration 0004: bucket policies use a path prefix:
CREATE POLICY "<bucket>_user_path" ON storage.objects FOR ... TO authenticated
USING (bucket_id = '<bucket>' AND name LIKE auth.uid()::text || '/%');
A user sees only {tenant_id}/... paths. The service role bypasses RLS.
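The LIKE predicate above is a plain prefix match. Below is a Python equivalent of that one condition, with a hypothetical function name. It assumes the id contains no LIKE wildcards (`%` or `_`), which holds for UUID text.

```python
def path_visible_to(uid: str, object_name: str) -> bool:
    """Mirrors: name LIKE auth.uid()::text || '/%'."""
    # The trailing slash matters: without it, uid "u1" would also match "u10/...".
    return object_name.startswith(uid + "/")
```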
8. Defense-in-depth grants — private schema¶
Migration 0009: explicit GRANT USAGE ON SCHEMA private TO supabase_auth_admin. Without it, SECURITY DEFINER functions in the private schema fail with permission denied for schema private.
9. Sequence USAGE grants¶
Phase A.1 hotfix (migration 0012): an explicit GRANT USAGE TO authenticated for every PK sequence. ALTER DEFAULT PRIVILEGES does not cover this for future SERIAL columns.
pgTAP 04_grants.sql includes sequence USAGE checks so that future migrations do not forget the grant.
10. HMAC pro Nette integration¶
Phase B.11: POST /reports/{id}/attachments/system authenticates via an HMAC over a canonical string.
Why a canonical string (not the raw multipart body): the boundary differs across clients (PHP curl vs. node-fetch vs. axios), so an HMAC over the raw bytes would not be reproducible.
hmac.compare_digest = timing-safe.
For details, see Authentication.
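A minimal sketch of signing and verifying a canonical string with the standard library follows. The field set and order in `canonical_string`, the newline separator, and the choice of SHA-256 are all assumptions for illustration; the actual canonical format is defined in the Authentication page, not here.

```python
import hashlib
import hmac


def canonical_string(method: str, path: str, timestamp: str, body_sha256: str) -> str:
    # Hypothetical fields and order; both sides must build this identically.
    return "\n".join([method.upper(), path, timestamp, body_sha256])


def sign(secret: bytes, canonical: str) -> str:
    """Hex HMAC-SHA256 of the canonical string (the hash choice is an assumption)."""
    return hmac.new(secret, canonical.encode("utf-8"), hashlib.sha256).hexdigest()


def verify(secret: bytes, canonical: str, provided_signature: str) -> bool:
    """Timing-safe comparison of the expected and the provided signature."""
    expected = sign(secret, canonical)
    # Compare as bytes so non-ASCII input cannot raise TypeError.
    return hmac.compare_digest(expected.encode("utf-8"), provided_signature.encode("utf-8"))
```

Hashing the file content into the canonical string, instead of signing the multipart body, is what keeps the signature independent of the client's boundary.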
11. File upload validation¶
app/routers/ingestion.py:upload_for_ingest:
- Size check: storage.validate_size(len(raw)) → 413 if > 50 MB
- MIME detection via python-magic (libmagic1): reads the file header, not the extension
- Allowlist check against ALLOWED_CONTENT_TYPES → 415 if not allowed
- UUID-generated report_id (no user-controlled path components)
- Storage path = {tenant_id}/reports/{report_id}/original.{ext} (RLS scoping)
No path traversal is possible (the only variable path component is a UUID).
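The validation steps above can be sketched as one function. This is not the code in `upload_for_ingest`: `plan_upload`, `UploadRejected`, the extension map, and the exact byte value of "50 MB" are assumptions. `detected_type` stands for the MIME type that libmagic reports for the file header.

```python
import uuid

MAX_UPLOAD_BYTES = 50 * 1024 * 1024  # assumed reading of the 50 MB limit

# Hypothetical subset of ALLOWED_CONTENT_TYPES, mapped to the stored extension.
EXTENSION_BY_TYPE = {
    "application/pdf": "pdf",
    "image/png": "png",
    "image/jpeg": "jpg",
}


class UploadRejected(Exception):
    """Carries the HTTP status code the endpoint should return."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code


def plan_upload(tenant_id: str, raw: bytes, detected_type: str) -> str:
    """Return the storage path for an upload, or raise UploadRejected (413/415)."""
    if len(raw) > MAX_UPLOAD_BYTES:
        raise UploadRejected(413, "file too large")
    ext = EXTENSION_BY_TYPE.get(detected_type)
    if ext is None:
        raise UploadRejected(415, "unsupported media type")
    # The extension comes from the allowlist and the id from uuid4, so no
    # user-supplied string ever becomes a path component.
    report_id = uuid.uuid4()
    return f"{tenant_id}/reports/{report_id}/original.{ext}"
```

Deriving the extension from the detected type, not from the uploaded filename, is what keeps user input out of the path entirely.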
12. SQL injection: none¶
- The backend uses supabase-py with parameterized queries (PostgREST builds proper SQL)
- RPC calls (e.g. search_chunks_by_folder): all arguments are parameterized
- No raw SQL string concatenation
13. XXE / XML injection¶
- Mistral OCR is used via its API (no XML processing on our side)
- python-docx parses DOCX (a zip with XML internals); known to be safe for reasonable input
- BS4 + trafilatura for MHTML; hardened against standard XML attacks
- libmagic file type detection is a C library with its own hardening
14. ADMIN_HASH protection¶
- The hash is the SHA1 of a secret (3bddbafdec...) and is shared by all admins
- Checked via hmac.compare_digest (timing-safe)
- Stored in the Sliplane env with secret=true
Known limits:
- Single shared secret: if it leaks, it must be rotated
- No per-admin revocation
- No audit log for admin actions
For production, a redesign is recommended: a Supabase user table with an is_admin boolean, plus audit logging to user_events.
Security notes per component¶
libmagic1 (Phase B B.4 review)¶
python-magic is a ctypes wrapper around libmagic.so.1 (from Christos Zoulas's file/file project, actively maintained for ~30 years, ~150 commits/year).
It performs a read-only operation (reads the file header, no deep parsing), so the attack surface is minimal compared with, say, pdfplumber.
The allowlist in app/storage.py (ALLOWED_CONTENT_TYPES) adds defense in depth: even if libmagic misclassified a file, the allowlist catches anything other than PDF/PNG/JPEG/WEBP/MHTML/DOCX.
Verdict: safe, suitable, battle-tested (Apache, ClamAV, AWS Lambda, GitHub Linguist).
Mistral OCR (vendor managed)¶
The API receives a signed URL, so processing is the vendor's responsibility. No custom code executes on our side.
Pydantic AI Agent (LLM calls)¶
For Phase B figure annotation:
- Output schema validation: Pydantic enforces the FigureAnnotation shape and rejects LLM hallucinations with wrong types
- Strict output type: agent.run() returns a typed object, not raw text, so the output is not an injection vector
Embedding inputs (Phase C)¶
The embed query is a user-controlled string that is embedded into a halfvec(1536). There is no code path where the embedding has side effects beyond the SQL similarity search.
Secret management¶
Sliplane env vars¶
- secret=true flag → masked in API responses (but the value persists)
- Per-service env (backend and worker have separate copies)
Frontend env vars¶
NEXT_PUBLIC_* variables are bundled into the client JS and are visible in the browser. Only these are allowed:
- NEXT_PUBLIC_SUPABASE_URL (public)
- NEXT_PUBLIC_SUPABASE_ANON_KEY (public per Supabase design)
- NEXT_PUBLIC_BACKEND_URL (public)
NEVER put SUPABASE_SERVICE_ROLE_KEY on the FE.
Local .env.local¶
.env.local is gitignored in dev. A sample lives in .env.example:
SUPABASE_URL=
SUPABASE_ANON_KEY=
SUPABASE_SERVICE_ROLE_KEY=
GEMINI_API_KEY=
ADMIN_HASH=test_local
APP_ENV=development
Rotation¶
For regular rotation (quarterly recommended):
- GEMINI_API_KEY: generate a new key in the Google Cloud Console, update the Sliplane env, restart the service
- MISTRAL_API_KEY: same flow via the Mistral console
- COHERE_API_KEY: same flow via the Cohere console
- SUPABASE_SERVICE_ROLE_KEY: Supabase Settings → API → Reset, then update Sliplane
- ADMIN_HASH: generate a new one, update Sliplane, distribute manually
- NETTE_HMAC_SECRET: coordinate with the Nette team
GDPR + data retention¶
User data¶
- Personal data: auth.users.email, nemoreport.user_profiles
- User content: nemoreport.reports + attachments + parsed_sections + figures + chunks + storage objects
Right to erasure¶
nemoreport.handle_delete_user() is a BEFORE DELETE trigger on auth.users:
- Hard-deletes the personal tenant
- CASCADE FK chain → reports → attachments → parsed_sections → figures → chunks
- Storage objects → orphaned (admin cleanup TODO, TBD pre-production)
Limitation: storage cleanup is manual. For production, a cron job that detects orphaned storage paths and deletes them is recommended.
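The detection half of such a cron job could be sketched as below. It assumes storage paths start with a `{tenant_id}/` component. The function name is hypothetical, and listing storage objects and live tenants is left to the caller.

```python
from typing import Iterable, List, Set


def find_orphan_paths(storage_paths: Iterable[str], live_tenant_ids: Set[str]) -> List[str]:
    """Return storage paths whose leading {tenant_id} component has no live tenant."""
    orphans = []
    for path in storage_paths:
        # Everything before the first "/" is treated as the tenant id.
        tenant_id, _, _rest = path.partition("/")
        if tenant_id not in live_tenant_ids:
            orphans.append(path)
    return orphans
```

Keeping detection separate from deletion makes it possible to run the job in a report-only mode first and review the list before anything is removed.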
Audit log¶
The nemoreport.user_events table exists for audit records. It is currently under-populated, because Phase A did not anticipate full auditing. For production it should record:
- Login/logout events
- Admin operations
- Data exports
- Permission changes
Data retention¶
No policy is defined yet. For production:
- Reports: keep indefinitely (user choice via the delete button)
- retrieval_log: rotation (e.g. 90 days for analytics, then archive)
- Audit logs: 1 year minimum (legal compliance)
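The 90-day rotation for retrieval_log reduces to a cutoff comparison. Below is a small sketch with hypothetical function names; how rows are archived afterwards is out of scope.

```python
from datetime import datetime, timedelta, timezone


def retention_cutoff(now: datetime, keep_days: int) -> datetime:
    """Rows created before this instant are past their retention window."""
    return now - timedelta(days=keep_days)


def is_expired(created_at: datetime, now: datetime, keep_days: int = 90) -> bool:
    """True if a row is older than keep_days and is due for archiving."""
    return created_at < retention_cutoff(now, keep_days)
```

Passing `now` explicitly, instead of reading the clock inside the function, keeps the rule easy to test and lets a single job run use one consistent cutoff.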
Compliance¶
SOC2 / ISO 27001¶
Inherited from vendors:
- Supabase: SOC2 Type II
- Cloudflare: SOC2 Type II + ISO 27001
- Sliplane: unknown (smaller player)
- Mistral, Google, Cohere: all hold compliance certifications
For the product itself:
- No formal audit yet
- A third-party pen test is recommended before the pilot
EU AI Act (2024+)¶
NemoReport AI is an AI system for real estate evaluation:
- Risk level: limited risk (transparency obligations); the user knows that the answer is AI-generated
- Required: disclosure to the user that they are chatting with an AI, not a human expert (frontend UI)
- NOT high-risk (no automated employment/credit/legal decisions)
For details, see the EU AI Act compliance guide (TBD).
Czech law¶
The application targets the CZ market:
- GDPR: EU regulation; Supabase + Resend EU data residency is OK
- Cybersecurity Act (Zákon o kybernetické bezpečnosti): pre-production review TBD
- Civil Code (Občanský zákoník, consumer protection): Terms of Service review
Reporting security issues¶
For security disclosures:
- Email: jiri@slimarik.cz (for now; production should use security@nemoreport.cz)
- Severity classification: critical (data leak) / high / medium / low
- Response time: acknowledgement within 48 hours, triage within 7 days
Security checklist for release¶
- All dependencies up to date (no known CVEs)
- pgTAP tests passed (134/134)
- Smoke tests passed (auth, RLS, defense-in-depth grants)
- Backend health check OK
- Backend logs free of unexpected errors
- No secrets in git history (check .gitignore + audit)
- Env vars on Sliplane audited (no staging values in production)
- CORS allowlist matches the production frontend origin
- HTTPS only (no plain HTTP redirects)