Přeskočit obsah

Databáze

Detail Postgres schema, migrace a RLS policies.

Connection info

Project ref: cubdrgjdkatyecrgckwp URL: https://cubdrgjdkatyecrgckwp.supabase.co Postgres: 17 (Supabase managed) Schema: nemoreport

Migrace

Všechny migrace jsou v supabase/migrations/ v backend repu, applied via:

# Local dev
supabase db reset --linked

# Production
supabase db push --linked

NEBO inline přes Supabase MCP apply_migration s project_id="cubdrgjdkatyecrgckwp".

# Soubor Co dělá
0001 0001_auth_tenants.sql tenants + tenant_members + user_profiles + handle_new_user trigger + RLS
0002 0002_reports_conversations.sql reports + conversations + messages + feedback
0003 0003_llm_providers.sql llm_providers admin tabulka
0004 0004_storage_buckets.sql 3 R2 buckety + path-based RLS
0005 0005_migration_tables.sql handle_claims + migration_events + user_events + import_manifest
0006 0006_import_v1_function.sql nemoreport.import_v1(handle, user_id, tenant_id) PL/pgSQL
0007 0007_jwt_hook.sql private.custom_access_token_hook(event) injektuje claims
0008 (skipnuto) (původně schema cleanup, později merged jinam)
0009 0009_grant_private_usage_to_auth_admin.sql GRANT USAGE ON SCHEMA private TO supabase_auth_admin
0010 0010_grant_service_role_nemoreport.sql service_role full CRUD na celém schema
0011 0011_revoke_authenticated_from_admin_tables.sql Defense-in-depth REVOKE od authenticated
0012 0012_grant_messages_seq_to_authenticated.sql USAGE on messages_id_seq
Phase B
0013 0013_ingestion_tables.sql attachments + parsed_sections + figures + parsed_tables + ingestion_jobs + reports rozšíření
Phase C
0014 0014_chunks_setup.sql pgvector 0.8 + unaccent + czech_unaccent ts config + chunks tabulka + HNSW partial index
0015 0015_search_chunks_rpc.sql search_chunks_by_folder() RPC (vector-only, MVP)
0016 0016_hybrid_search_chunks.sql hybrid_search_chunks_by_folder() RPC (BM25 + vector + RRF)
0017 0017_hybrid_search_prefix_match.sql _build_prefix_tsquery() helper, fix declension matching
0018 0018_retrieval_log.sql retrieval_log table

RLS policies

Všechny nemoreport.* tabulky mají RLS enabled.

Helper function

private.user_has_tenant_access(p_tenant_id uuid) RETURNS boolean

Kontroluje JWT claims personal_tenant_id + active_tenant_id NEBO tenant_members row pro current auth.uid().

Standard pattern per tabulka

ALTER TABLE nemoreport.<table> ENABLE ROW LEVEL SECURITY;

CREATE POLICY "<table>_select" ON nemoreport.<table> FOR SELECT
  TO authenticated USING (private.user_has_tenant_access(tenant_id));

-- Pro user-writable tabulky (např. attachments user_upload):
CREATE POLICY "<table>_insert" ON nemoreport.<table> FOR INSERT
  TO authenticated WITH CHECK (
    private.user_has_tenant_access(tenant_id)
    AND <conditions>
  );

CREATE POLICY "<table>_update" ON nemoreport.<table> FOR UPDATE
  TO authenticated USING (...) WITH CHECK (...);

CREATE POLICY "<table>_delete" ON nemoreport.<table> FOR DELETE
  TO authenticated USING (...);

-- Pro worker-managed tabulky: žádná INSERT/UPDATE/DELETE policy (default deny).
-- + defense-in-depth REVOKE:
REVOKE INSERT, UPDATE, DELETE ON nemoreport.<table> FROM authenticated;
GRANT SELECT, INSERT, UPDATE, DELETE ON nemoreport.<table> TO service_role;

Per-tabulka summary

Tabulka SELECT pro authenticated INSERT/UPDATE/DELETE
tenants own tenant service_role only
tenant_members own service_role
user_profiles own service_role
reports tenant scoped tenant scoped (user write) + service_role
attachments tenant scoped tenant + WITH CHECK source='user_upload' (Nette = service_role)
parsed_sections tenant scoped service_role only
figures tenant scoped service_role only
parsed_tables tenant scoped service_role only
ingestion_jobs tenant scoped service_role only
chunks tenant scoped service_role only
retrieval_log tenant scoped service_role only
conversations tenant scoped tenant + service_role
messages tenant scoped tenant + service_role
feedback tenant scoped tenant + service_role
llm_providers (RLS deny vše + REVOKE) service_role only
handle_claims_preauth (RLS deny + REVOKE) service_role only
migration_events (RLS deny + REVOKE) service_role only
user_events (RLS deny + REVOKE) service_role only
import_manifest (RLS deny + REVOKE) service_role only

Indexy

chunks (Phase C — vector RAG)

chunks_pkey                    -- (id) PRIMARY KEY btree
idx_chunks_tsv_gin             -- gin(tsv) — BM25 leg
idx_chunks_report              -- (report_id, order_in_doc) — folder retrieval ordering
idx_chunks_tenant              -- (tenant_id)
idx_chunks_section_filter      -- (report_id, section_slug, content_type) PARTIAL where section_slug NOT NULL
idx_chunks_attachment          -- (attachment_id) PARTIAL where attachment_id NOT NULL
idx_chunks_embedding_hnsw      -- hnsw(embedding halfvec_cosine_ops) WITH (m=16, ef_construction=64) PARTIAL where embedding NOT NULL

reports

idx_reports_tenant_created     -- (tenant_id, created_at DESC) PARTIAL where deleted_at IS NULL
idx_reports_status_pending     -- (status, created_at) PARTIAL where status IN (active states)
idx_reports_nette_external     -- UNIQUE (tenant_id, external_id) PARTIAL where source='nette' AND external_id IS NOT NULL

attachments

idx_attachments_report
idx_attachments_tenant
idx_attachments_status_pending  -- (status, created_at) PARTIAL
idx_attachments_nette_unique    -- UNIQUE (report_id, nette_id) PARTIAL where source='nette' AND nette_id IS NOT NULL

parsed_sections / figures / parsed_tables / ingestion_jobs / retrieval_log

Standard btree indexy: _report, _tenant, partial _pending, _section/_attachment partial.

Postgres extensions

Extension Verze Použití
vector 0.8.0 pgvector — halfvec(1536), HNSW index, cosine similarity
unaccent 1.1 strip diakritiku v czech_unaccent ts config
pgcrypto 1.3 gen_random_uuid()
uuid-ossp 1.1 UUID legacy support
pgtap 1.3.3 Unit testing (test only)
pg_stat_statements 1.11 Query performance monitoring
supabase_vault 0.3.1 Secret storage (Supabase managed)
postgis (avail) NEPOUŽÍVÁME (GIS coordinates jsou v jsonb, ne PostGIS)

RPC functions (Postgres SECURITY DEFINER)

nemoreport.import_v1(p_handle text, p_user_id uuid, p_tenant_id uuid) returns jsonb

Phase A.1 — claim flow pro v1 testery. Importuje reporty z public.* schema do nemoreport.*.

nemoreport.search_chunks_by_folder(p_query_vec halfvec(1536), p_report_id uuid, p_top_k int, p_ef_search int) returns table

Phase C.5 — vector-only retrieval (legacy MVP, Phase C.6 přepsala na hybrid).

nemoreport.hybrid_search_chunks_by_folder(p_query_text text, p_query_vec halfvec(1536), p_report_id uuid, p_top_k int, p_pre_fusion_n int, p_ef_search int, p_rrf_k int) returns table

Phase C.6 — hlavní retrieval RPC. Vrací top-K chunks s per-leg ranks + RRF score.

nemoreport._build_prefix_tsquery(p_query text) returns tsquery

Phase C.6 — interní helper, builduje prefix tsquery z user input (split tokens, OR, :* wildcard).

private.custom_access_token_hook(event jsonb) returns jsonb

Phase A — JWT custom hook. Injektuje personal_tenant_id + active_tenant_id claims.

nemoreport.set_updated_at() returns trigger

Helper pro BEFORE UPDATE triggers na tabulkách s updated_at.

nemoreport.handle_new_user() returns trigger

Phase AAFTER INSERT ON auth.users trigger. Auto-creates personal tenant + user_profiles + tenant_members row.

nemoreport.handle_delete_user() returns trigger

Phase A — GDPR BEFORE DELETE ON auth.users. Hard-delete personal tenant + cascade na všechno.

Realtime publication

ALTER PUBLICATION supabase_realtime ADD TABLE nemoreport.reports;
ALTER PUBLICATION supabase_realtime ADD TABLE nemoreport.attachments;

Frontend subscribe na postgres_changes UPDATE events. NIKDY neemituje: - chunks (frontend pulluje on-demand) - figures (pulluje při status='ready') - parsed_sections (read po finalize) - retrieval_log (admin only)

pgTAP testy

Test files v tests/db/:

  • setup.sql — pre-test fixtures (User A/B + jejich reporty)
  • 01_rls_tenants.sql — RLS isolation pen tests (22 tests)
  • 02_storage_rls.sql — bucket path scoping (6 tests)
  • 03_migration.sql — claim flow (14 tests)
  • 04_grants.sql — defense-in-depth + sweep test (38 tests)
  • 05_ingestion.sql — Phase B (28 tests)
  • 06_chunks.sql — Phase C (26 tests)

Total: 134 testů.

Spuštění: - Inline: přes mcp__claude_ai_Supabase__execute_sql (no Docker) - Docker: supabase test db --linked - psql: tests/db/run_remote.sh (přes connection string)

Detail viz Testování.

Backup + recovery

Supabase managed: - PITR (Point-in-time recovery) na Pro tier (default 7 dní retention) - Daily backups — 1 týden retention - Read replica — aktuálně NEPOUŽÍVÁME (1 cluster master pro write + read)

Pro produkci doporučeno upgrade na Pro plán + 14d PITR.

Performance

Indexes coverage

Všechny user-facing queries jsou indexed (verifikované přes pg_stat_statements):

  • SELECT FROM reports WHERE tenant_id = X → idx_reports_tenant_created
  • SELECT FROM attachments WHERE report_id = X → idx_attachments_report
  • SELECT FROM chunks WHERE report_id = X ORDER BY embedding <=> Y LIMIT K → idx_chunks_embedding_hnsw + idx_chunks_report

HNSW tuning

SET LOCAL hnsw.iterative_scan = 'relaxed_order';  -- partial index recovery
SET LOCAL hnsw.ef_search = 100;                    -- folder scope default

ef_search per scope: - section: 80 (small candidate pool) - folder: 100 (default) - multi_report: 150 (larger pool needed)

Vacuum + autovacuum

Supabase auto-managed. Per-table tuning není potřeba pro current scale (290 chunks).

Pro produkci 100K+ chunks: monitorovat pg_stat_user_tables a tweak autovacuum_vacuum_scale_factor per chunks tabulku.