Architectural Decisions (ADR)

Architecture Decision Records: why we did what, and when. They help us understand the trade-offs in retrospect, when the code looks odd six months later.

Phase A: Foundation (April 2026)

A1: Schema `nemoreport` instead of `public`

Status: Accepted

Decision: Use an explicit, dedicated Postgres schema rather than the default public.

Reason: We share the Supabase project with CodeLens (codelens schema). A dedicated schema gives explicit isolation and cleaner permissions, and keeps objects from creeping into public by default.

Trade-off: Every migration must use the nemoreport. prefix, supabase-py calls need schema('nemoreport'), and PostgREST exposure goes through the api.schemas config.

A2: PyJWT[crypto] instead of python-jose

Status: Accepted

Decision: PyJWT with the [crypto] extra for RS256/ES256 support.

Reason: Recommended by Supabase. Smaller attack surface and actively maintained; python-jose has had several CVEs.

A3: Custom JWT hook for tenant claims

Status: Accepted

Decision: private.custom_access_token_hook() injects the personal_tenant_id and active_tenant_id claims when the JWT is issued.

Reason: The claims sit directly in the JWT, so the backend needs no per-request DB lookup for tenant info. Better performance and simpler backend code.

Implementation: Migration 0007, a SECURITY DEFINER function. Migration 0009 fixes USAGE on the private schema for supabase_auth_admin.
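On the backend, reading the tenant then reduces to a lookup on the token payload. A minimal sketch, assuming the JWT has already been verified and decoded elsewhere; TenantContext and tenant_context are hypothetical names, and only the two claim names come from the decision above.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class TenantContext:
    personal_tenant_id: str
    active_tenant_id: str


def tenant_context(claims: Mapping[str, Any]) -> TenantContext:
    """Build the tenant context from a verified JWT payload (no DB lookup)."""
    missing = [
        name
        for name in ("personal_tenant_id", "active_tenant_id")
        if not claims.get(name)
    ]
    if missing:
        # Rejecting tokens without the hook's claims is an assumption
        # of this sketch, not a documented policy.
        raise ValueError(f"JWT is missing tenant claims: {', '.join(missing)}")
    return TenantContext(
        personal_tenant_id=str(claims["personal_tenant_id"]),
        active_tenant_id=str(claims["active_tenant_id"]),
    )
```

A request handler would call tenant_context(payload) once and pass the result down, instead of querying the tenant tables.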

A4: Custom magic link template (token_hash query flow)

Status: Accepted

Decision: Override the default Supabase magic link template and use the token_hash query parameter instead of the implicit hash flow.

Reason: The default puts #access_token=... in the URL fragment, which a server route cannot read (browsers do not send the fragment to the server). The token_hash query flow enables server-side verifyOtp.

Implementation: supabase/templates/magic_link.html with ?token_hash={{ .TokenHash }}&type=magiclink. Custom branded UI (Czech).

A5: Defense-in-depth grants

Status: Accepted

Decision: Sensitive tables (API keys, claim emails) get a table-level REVOKE from authenticated plus default-deny RLS.

Reason: Two layers of protection. Even if a permissive RLS policy were added by mistake, the missing table grant would still block access.

Implementation: Migration 0011. Applied to llm_providers, handle_claims_preauth, migration_events, user_events, import_manifest.

Phase B/C extension: The same pattern is applied to parsed_sections, figures, parsed_tables, ingestion_jobs, chunks, retrieval_log.

A6: CF Workers instead of CF Pages

Status: Accepted (after a pivot during the 23 April deploy)

Decision: The frontend is deployed as a CF Worker, not on CF Pages.

Reason: CF Pages with a static export is incompatible with Supabase SSR auth (middleware and server components need a Node-like runtime). Pages is purely static plus edge functions, not a full Next runtime.

Implementation: @opennextjs/cloudflare adapter. The build is 6.3 MB / 1.3 MB gzipped (well under the 10 MB CF Workers limit).

Trade-off: We lose the CF Pages git integration (auto-deploy on push); we deploy manually with npm run deploy via wrangler.

A7: No proxy.ts middleware

Status: Accepted

Decision: Passive session refresh, not centralized middleware.

Reason: The Next 16 proxy does not support the Edge runtime, and CF Workers do not provide Node.js, so centralized middleware is not possible.

Implementation: @supabase/ssr cookie handling on every server-component request. getClaims() in the root page.tsx, plus protected routes and an apiFetch 401 fallback, gives working auth gating without middleware.

Phase B: Ingestion (April 2026)

B1: Worker framework taskiq + RedisStreamBroker

Status: Accepted

Decision: taskiq with a Redis Stream consumer group.

Reason: Durable, ack-based delivery (vs basic pub/sub). MIT license. Modern async Python API.

Alternatives considered:
- arq (MIT): had memory broker issues, less mature
- Celery (BSD): older, heavier, RabbitMQ by default

B2: Worker as a separate Sliplane service

Status: Accepted

Decision: Backend and worker are two separate services sharing one codebase.

Reason: Crash isolation, independent deploys, resource isolation (the worker holds 5-minute OCR connections).

Implementation: app/worker_entry.py runs uvicorn on :8000 (healthcheck) plus a taskiq subprocess with SIGTERM propagation.

B3: 5-stage idempotent pipeline

Status: Accepted

Decision: scan → parse → annotate → embed → finalize; each stage is a separate taskiq task with retry_on_error=True.

Reason: Per-stage retry granularity (a parse failure does not reset annotate). Stages are idempotent because each one DELETEs its existing data before a re-run.
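The delete-then-insert idea can be illustrated without a database. This is a sketch under stated assumptions: the in-memory Store stands in for the real per-stage tables, and run_stage is a hypothetical helper, not the project's taskiq task.

```python
from typing import Callable, Dict, List

# Stage order quoted from the decision above.
STAGES = ("scan", "parse", "annotate", "embed", "finalize")

# Hypothetical in-memory stand-in for the per-stage output tables.
Store = Dict[str, List[dict]]


def run_stage(store: Store, stage: str, target_id: str,
              produce: Callable[[], List[dict]]) -> None:
    """Run one stage idempotently: delete the target's old rows, then insert."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    rows = store.setdefault(stage, [])
    # DELETE existing data for this target before the (re-)run.
    rows[:] = [row for row in rows if row["target_id"] != target_id]
    # INSERT fresh output. If produce() raises, this stage is left empty for
    # the target and a retry repopulates it; other stages are untouched.
    rows.extend({"target_id": target_id, **row} for row in produce())
```

Running the same stage twice for the same target leaves exactly one copy of its rows, which is what makes a blind retry safe.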

B4: Mistral OCR for PDF + images

Status: Accepted

Decision: mistral-ocr-latest for both PDFs and raster images.

Reason: Best-in-class Czech OCR, structured bbox and document annotations with a Pydantic schema. Apache 2.0 SDK license.

Trade-off: Vendor lock-in. Alternatives (Google Document AI, AWS Textract) are available, but Mistral is EU-based and tuned for Czech.

B5: Skip Mistral annotations for large files (>500 KB)

Status: Accepted (after 2 timeouts post-deploy)

Decision: If a PDF is larger than 500 KB, skip Mistral bbox and document annotations on the first attempt. Retry without annotations if the first call fails.

Reason: Large PDFs with annotations needed 5-15 min of processing, so the 60 s default SDK timeout failed. Two-layer fix:
1. mistral_timeout_ms = 900_000 (15 min)
2. mistral_skip_annotations_above_kb = 500; the Gemini fallback in annotate_target fills in annotations per figure
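One way to read the two rules together is sketched below. The constants carry the values quoted above; wants_annotations, ocr_with_fallback and the call_ocr callback are hypothetical names, the real Mistral SDK call is deliberately left out, and treating 1 KB as 1024 bytes is an assumption.

```python
from typing import Callable

# Values quoted in the ADR; names mirror the settings mentioned there.
MISTRAL_TIMEOUT_MS = 900_000
MISTRAL_SKIP_ANNOTATIONS_ABOVE_KB = 500


def wants_annotations(size_bytes: int,
                      threshold_kb: int = MISTRAL_SKIP_ANNOTATIONS_ABOVE_KB) -> bool:
    """Request annotations only for files at or below the threshold."""
    return size_bytes <= threshold_kb * 1024  # assumption: 1 KB = 1024 bytes


def ocr_with_fallback(size_bytes: int,
                      call_ocr: Callable[[bool, int], dict]) -> dict:
    """Call OCR once; on failure of an annotated call, retry without annotations.

    call_ocr(annotate, timeout_ms) is a hypothetical wrapper around the OCR request.
    """
    annotate = wants_annotations(size_bytes)
    try:
        return call_ocr(annotate, MISTRAL_TIMEOUT_MS)
    except Exception:
        if not annotate:
            raise  # already the cheapest variant, let the stage retry handle it
        return call_ocr(False, MISTRAL_TIMEOUT_MS)
```

Figures that come back without annotations are the ones picked up later by the Gemini fallback.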

B6: BS4 + trafilatura for MHTML (port of v1)

Status: Accepted

Decision: NemoReport reports from Nette are MHTML files. We parse them with BS4 + trafilatura (clean text extraction) plus a heading-aware section split (Czech headings: "Souhrnný přehled", "Riziko povodní"...).

Reason: Mistral OCR has no MHTML support (PDF/image only). v1 already had a BS4 path, so we ported it directly.

B7: Gemini fallback for figure annotation

Status: Accepted

Decision: Figures with annotation_source='pending' (the Mistral annotation was thin or skipped) go to a Gemini multimodal fallback.

Reason: Mistral bbox annotations are sometimes thin (~40-character summary). Large PDFs skip annotations entirely. MHTML/DOCX never go through Mistral. As a result, ~85 % of figures in the dataset have source='gemini'.

Implementation: Pydantic AI Agent with output_type=FigureAnnotation, BinaryContent(image_bytes) plus report context and an address hint.

B8: Folder model (post-deploy)

Status: Accepted (29 April post-deploy iteration)

Decision: A report is a container, not a single file. One report has 0..N attachments.

Reason: A real NemoReport is a main MHTML plus 5-15 attachments (statements, geo plan, photos). The user wanted a "whole folder" view.

Implementation:
- The DB schema already supported it (FK attachments.report_id; figures.report_id is always the parent)
- Added backend endpoints: POST /ingest/{id}/uploads, GET /attachments, DELETE /attachments/{aid}
- Frontend /reports/[id] full detail page

B9: `_Target` dataclass abstraction

Status: Accepted

Decision: Worker stages operate generically on _Target (kind ∈ {report, attachment}).

Reason: With the folder model, the main report and its attachments run through the same pipeline. Without a generic abstraction the code would be duplicated.

Implementation: figures.report_id is ALWAYS the parent (even for attachment-derived figures). attachment_id is set only for kind='attachment'. Folder retrieval is therefore a single SELECT with WHERE report_id = X.
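A rough guess at the shape of such an abstraction, to make the invariant concrete. Target here is an illustrative re-creation, not the actual _Target class; figure_row and figure_no are hypothetical.

```python
from dataclasses import dataclass
from typing import Literal, Optional


@dataclass(frozen=True)
class Target:
    kind: Literal["report", "attachment"]
    report_id: str                      # always the parent report
    attachment_id: Optional[str] = None  # set only for kind == "attachment"

    def __post_init__(self) -> None:
        # Enforce: attachment_id is present if and only if kind is "attachment".
        if (self.kind == "attachment") != (self.attachment_id is not None):
            raise ValueError("attachment_id must be set only for kind='attachment'")

    def figure_row(self, figure_no: int) -> dict:
        """Row for the figures table; report_id points at the parent either way."""
        return {
            "report_id": self.report_id,
            "attachment_id": self.attachment_id,
            "figure_no": figure_no,
        }
```

Because every row carries the parent report_id, a folder-level query never has to join through attachments.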

Phase C: Vector RAG (April 2026)

C1: Gemini Embedding 2 GA

Status: Accepted (29 April, after a web-search re-review)

Decision: gemini-embedding-2 (GA, not preview).

Reason: Released 2026-03-10 as a preview, GA around April 2026. Natively multimodal (text + image + ...), #1 on MTEB multilingual, trained on Czech, 3072 native dimensions.

Trade-off: Vendor lock-in (Google). Alternatives:
- OpenAI text-embedding-3-large: text only
- Cohere embed-multilingual-v3: text only, 1024 dims
- VoyageAI: text only

For multimodal we are committed to Gemini.

C2: halfvec(1536) instead of vector(3072)

Status: Accepted (D1 user decision, 29 April)

Decision: Matryoshka truncation 3072 → 1536, then L2 normalization, stored as halfvec(1536).

Reason: Index build is 2× faster, storage is 4× smaller (~9 GB vs 36 GB at 3M chunks), and MTEB recall is roughly the same thanks to Gemini-2's Matryoshka training. Recoverable via re-embedding if we ever want to upgrade.
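The truncation step and the storage figures can be checked with a few lines. A minimal sketch: truncate_and_normalize and raw_storage_gb are hypothetical helper names, and the estimate counts raw vector bytes only (4 bytes per dimension for vector, 2 for halfvec), ignoring row and index overhead.

```python
import math
from typing import List, Sequence


def truncate_and_normalize(vec: Sequence[float], dims: int = 1536) -> List[float]:
    """Keep the first `dims` components, then rescale to unit L2 length."""
    if len(vec) < dims:
        raise ValueError(f"need at least {dims} dimensions, got {len(vec)}")
    head = list(vec[:dims])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in head]


def raw_storage_gb(rows: int, dims: int, bytes_per_dim: int) -> float:
    """Raw vector payload in GB (10^9 bytes), without row or index overhead."""
    return rows * dims * bytes_per_dim / 1e9


full = raw_storage_gb(3_000_000, 3072, 4)   # vector(3072), float32
half = raw_storage_gb(3_000_000, 1536, 2)   # halfvec(1536), float16
```

With these inputs, full comes to about 36.9 GB and half to about 9.2 GB, which matches the 4× figure above. Re-normalizing after truncation matters because the truncated prefix of a unit vector is no longer unit length.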

C3: Single chunks table with inline embedding

Status: Accepted (D2 user decision, 29 April)

Decision: No separate embeddings table. Everything is inline in nemoreport.chunks.

Reason: No JOIN at retrieval time, simpler FK cascade, simpler re-embed migration.

C4: czech_unaccent BM25 (simple + unaccent)

Status: Accepted

Decision: Postgres tsvector config = simple + unaccent (strips diacritics, no stemmer).

Reason: ispell_czech would require self-hosted PG (managed Supabase does not allow it), which means operational overhead.

Trade-off: 3-8 % recall loss on keyword-heavy queries (declensions). Compensated by:
1. Cohere Rerank 4.0 cross-encoder
2. Prefix wildcards (obcansk:*) in the _build_prefix_tsquery() helper
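How a prefix query of that shape could be built is sketched below. This is not the project's _build_prefix_tsquery(); build_prefix_tsquery, the trim parameter and min_len are assumptions made for illustration, and the accent stripping only approximates what the Postgres unaccent dictionary does.

```python
import re
import unicodedata


def strip_accents(text: str) -> str:
    """Drop combining marks after Unicode decomposition (č → c, í → i)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_prefix_tsquery(query: str, trim: int = 0, min_len: int = 3) -> str:
    """Turn free text into an AND of prefix terms for to_tsquery('simple', ...).

    trim (hypothetical) cuts that many trailing characters from each token so
    that inflected endings still match; tokens shorter than min_len are dropped.
    """
    terms = []
    for token in re.findall(r"[a-z0-9]+", strip_accents(query).lower()):
        if trim and len(token) - trim >= min_len:
            token = token[: len(token) - trim]
        if len(token) >= min_len:
            terms.append(f"{token}:*")
    return " & ".join(terms)
```

For example, build_prefix_tsquery("občanský", trim=1) yields obcansk:*, which matches občanský, občanského and občanském alike once the indexed text is unaccented.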

C5: Worker `embed_target` between annotated and ready

Status: Accepted (C22 in the Draft 2 plan)

Decision: The embed stage runs between annotated and ready, not after finalize.

Reason: If chunks were missing while the report was already ready, retrieval/chat would fail. Embedding must run before the ready flag is set.

C6: Multimodal figure embedding (priority)

Status: Accepted (D6 user decision, 29 April)

Decision: Multimodal embedding for figures goes into the C.4 spine now rather than being deferred.

Reason: Explicit user priority. Gemini-2's native multimodality is the key value-add of Phase C. Without it retrieval is text-only (maps and drawings would not help).

C7: Soft-fail D7

Status: Accepted (D7 user decision, 29 April)

Decision: An embed stage failure sets parsed_metadata.embedding_status='failed', but the report still moves to ready.

Reason: Phase D chat can fall back to a full-report dump for reports without chunks. A UX win over strict-fail.

Implementation:
- The stage is skipped if GEMINI_API_KEY is missing
- Per-chunk failure → INSERT with embedding=NULL (the BM25 leg still works; the partial HNSW index ignores the row)
- Total failure → embedding_status='failed'; partial failure → 'partial'
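The status rule in the last bullet can be written as a small pure function. A sketch only: the function name, the 'ok' value for full success, and treating an empty chunk list as failed are assumptions; 'failed' and 'partial' are the values named above.

```python
from typing import List, Optional, Sequence


def embedding_status(embeddings: Sequence[Optional[List[float]]]) -> str:
    """Summarize per-chunk results; None marks a chunk stored with embedding=NULL."""
    total = len(embeddings)
    embedded = sum(1 for e in embeddings if e is not None)
    if total == 0 or embedded == 0:
        return "failed"   # assumption: no chunks at all also counts as failed
    if embedded < total:
        return "partial"
    return "ok"           # hypothetical name for the all-embedded case
```

Whatever the result, the report proceeds to ready; the status only tells later phases whether vector retrieval can be trusted for that report.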

C8: RRF fusion k=60

Status: Accepted

Decision: Reciprocal Rank Fusion with the constant k=60 for hybrid retrieval.

Reason: The standard choice in the IR literature. Consistently outperforms weighted fusion in benchmarks and needs no score normalization or per-leg weight tuning; k is the only constant.

Implementation: score = 1/(60 + vector_rank) + 1/(60 + bm25_rank). A FULL OUTER JOIN ON id keeps chunks that appear in one leg but not in the other.
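The same fusion in plain Python, as a sketch of what the SQL computes. rrf_fuse is a hypothetical name; a chunk missing from one leg simply contributes nothing for that leg, which mirrors the outer join, and ties are broken by id here only to keep the output deterministic.

```python
from typing import Dict, List, Sequence, Tuple


def rrf_fuse(vector_ids: Sequence[str], bm25_ids: Sequence[str],
             k: int = 60) -> List[Tuple[str, float]]:
    """Fuse two ranked id lists (best first) with Reciprocal Rank Fusion."""
    scores: Dict[str, float] = {}
    for ranked in (vector_ids, bm25_ids):
        for rank, chunk_id in enumerate(ranked, start=1):
            # Each leg adds 1 / (k + rank); absent ids add nothing.
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


fused = rrf_fuse(["a", "b", "c"], ["b", "d"])
```

Here b ranks first because it appears in both legs (1/62 + 1/61), followed by a (1/61), d (1/62) and c (1/63).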

C9: Cohere Rerank 4.0 (ENV-flagged)

Status: Accepted (D4 + user trial key, 29 April)

Decision: rerank-v4.0-pro as the default. The COHERE_RERANK_ENABLED ENV flag allows a graceful disable.

Reason: SOTA multilingual cross-encoder, $0.0025/search. Returns the top-K candidates with a relevance score. Czech is supported.

Trade-off: 600-700 ms latency overhead, the dominant share of E2E latency (1085 ms total).

Fallback: If the API is down, rate-limited, or times out (8 s), fall back to the hybrid RRF order and return reranked=false in the response.
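The flag and the fallback fit in one small wrapper. A minimal sketch, assuming the actual Cohere call sits behind an injected callable that enforces its own timeout; rerank_or_fallback and the callable's signature are hypothetical.

```python
from typing import Callable, List, Tuple

RERANK_TIMEOUT_S = 8.0  # timeout quoted in the ADR


def rerank_or_fallback(
    rrf_ordered: List[str],
    rerank: Callable[[List[str], float], List[str]],
    enabled: bool = True,
) -> Tuple[List[str], bool]:
    """Return (ordered ids, reranked flag); never raise because of the reranker."""
    if not enabled:  # e.g. COHERE_RERANK_ENABLED is off
        return list(rrf_ordered), False
    try:
        return rerank(rrf_ordered, RERANK_TIMEOUT_S), True
    except Exception:
        # API down, rate limit, or timeout: keep the hybrid RRF order.
        return list(rrf_ordered), False
```

The boolean maps directly onto the reranked field of the response, so clients can tell which ordering they received.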

C10: HyDE conditional

Status: Accepted (C16)

Decision: HyDE activates for queries shorter than 4 words or for the multi_report scope.

Reason: Short queries carry little signal for retrieval. An LLM-generated hypothetical document widens the semantic context.

Trade-off: ~300 ms LLM call overhead. Skipped for long queries, where the embedded query is a sufficient signal.
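The activation rule is a one-line predicate. A sketch, assuming words are whitespace-separated and that the scope arrives as a string; should_use_hyde is a hypothetical name, while multi_report is the scope value named above.

```python
def should_use_hyde(query: str, scope: str, min_words: int = 4) -> bool:
    """Use HyDE for short queries or whenever the scope spans multiple reports."""
    return len(query.split()) < min_words or scope == "multi_report"
```

So should_use_hyde("riziko povodní", "folder") is true (two words), while a six-word question in folder scope skips the extra LLM call.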

C11: Per-source diversity in folder scope

Status: Accepted (C.10)

Decision: If the scope is folder, top_k ≥ 4, and there is more than one source_type, swap the lowest-scored chunk of the dominant_type for the highest-scored chunk of a missing_type. Capped at 2 swaps.

Reason: Without diversity, the top-K would be dominated by one source (e.g. the main report). The AI would not see cross-source context (attachments, figures).
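One possible reading of the swap rule, written out. This is a sketch under assumptions: Chunk and diversify are hypothetical, the candidate list is assumed sorted by score (best first), "more than one source_type" is checked across all candidates, and a type is never swapped out when only one of its chunks remains.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Chunk:
    id: str
    source_type: str
    score: float


def diversify(ranked: Sequence[Chunk], top_k: int, scope: str = "folder",
              max_swaps: int = 2) -> List[Chunk]:
    """Return top_k chunks, swapping in missing source types (at most max_swaps)."""
    top, rest = list(ranked[:top_k]), list(ranked[top_k:])
    types_available = {c.source_type for c in ranked}
    if scope != "folder" or top_k < 4 or len(types_available) < 2:
        return top
    for _ in range(max_swaps):
        present = Counter(c.source_type for c in top)
        missing = [c for c in rest if c.source_type not in present]
        if not missing:
            break
        dominant_type, count = present.most_common(1)[0]
        if count < 2:
            break  # do not remove the last chunk of a type
        incoming = max(missing, key=lambda c: c.score)
        outgoing = min((c for c in top if c.source_type == dominant_type),
                       key=lambda c: c.score)
        top.remove(outgoing)
        rest.remove(incoming)
        top.append(incoming)
    return sorted(top, key=lambda c: c.score, reverse=True)
```

With four main-report chunks leading and one attachment and one figure chunk further down, two swaps replace the two weakest main-report chunks, so the final top 4 covers all three source types.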

Shared decisions

Pricing model deferred

Status: User explicit (29 April)

Decision: Cost is tracked in haléře (1/100 CZK) in ingestion_cost_cents. For production this will be rewritten to "tokens spent" (token resale, user credit balance).

Reason: A future feature mentioned by the user: "tokens as a payment system: users buy credits and spend them on the platform". Phase C keeps the raw cost; Phase D+ rewrites it.

Admin section consolidation

Status: User explicit (29 April)

Decision: Cost / analytics / ops views belong under /admin/*, gated by ADMIN_HASH.

Reason: User: "a platform administration section is gradually emerging, where we will collect these things".

Implementation:
- /admin/cost/* (Phase B, B.13)
- /admin/embed/* (Phase C, C.13 backfill)
- /admin/retrieve/* (planned, retrieval_log analytics)
- Future: token mint/refund, user balances