# Tech Debt Audit & Remediation Plan > Status: **Plan / not yet started.** Audit performed 2026-06-03 against `main`. > Context: app has been through 11 versions. Dev runs on **SQLite (Windows)**; > production is mid-migration to **PostgreSQL**. Six modules: Mix Calculator, > Product Costing, Editor, Throughput, Reporting, Settings. > > Decisions already taken: > - **Migrations:** adopt **Alembic** (replaces the startup `create_all` + ad-hoc `ALTER` scheme). > - **Approach:** full audit first (this document), then execute in phases. --- ## Findings, ranked by severity ### P0 — correctness / data integrity #### P0.1 — Money is stored as `Float` everywhere Every cost / price / margin column is SQLAlchemy `Float`: - `backend/app/models/product_costing.py` — `cleaned_product_cost_per_kg`, `grading_cost_per_kg`, `bagging_cost_per_kg`, `cracking_cost_per_kg`, `bag_cost_per_unit`, `freight_cost_per_unit`, `finished_product_delivered_cost`, `distributor_price`, `wholesale_price`, `cost_per_kg`, `distributor_margin`, `wholesale_margin`, `cost`, … - `backend/app/models/assumption.py` — `grading_cost`, `bagging_cost`, `cracking_cost`, `bag_cost`, `cost_per_unit`. - `backend/app/models/mix.py` — `quantity_kg`. - `backend/app/models/mix_calculator.py` — `batch_size_kg`, `total_bags`, `total_kg`, `product_unit_size_kg`, `required_kg`, `mix_percentage`. - `backend/app/models/product.py` — `distributor_margin`, `wholesale_margin`, `quantity_kg`. **Why it matters:** for a costing-and-pricing tool, binary floating point introduces rounding drift in money. It is also a **SQLite ↔ Postgres divergence point** — SQLite stores loose floats, Postgres `Numeric` is exact, so the same calculation can produce different stored/displayed values across environments. **Fix:** migrate money/quantity columns to `Numeric(12, 4)` (tune precision/scale per field) and use `Decimal` in the calculation engine. Guard with the existing formula-parity tests. #### P0.2 — Frontend silently shows mock data when the API fails `frontend/src/lib/api.ts` → `fetchJson(path, fallback, ...)` returns the `fallback` (mock data) on **any** fetch error: - `api.ts:151` and `api.ts:158` — `return fallback;` on failure. - Fallbacks are real mock datasets: `mockRawMaterials`, `mockCosts`, `mockProducts`, `mockMixes`, `mockScenarios`, `mockMixCalculatorOptions`, `mockMixCalculatorSessions`, `mockClientAccess` — imported from `$lib/mock` (`api.ts:4-13`, used at `api.ts:296+`). **Why it matters:** a backend hiccup makes the UI display **fabricated prices/costs** with no error shown to the user. In a pricing application this is the most dangerous item in this audit — a user could quote or decide off fake numbers. **Fix:** remove the mock-on-error fallback path. Surface real API errors in the UI (error/empty states). Keep `mock.ts` for tests only. #### P0.3 — Schema management is `create_all` + ad-hoc `ALTER`, no versioning `backend/app/db/migrations.py` runs on **every startup** via `bootstrap_schema()` (`backend/app/main.py:105`, inside `ensure_database_ready()`): - `ensure_metadata_tables()` → `metadata.create_all()` for any missing tables. - `ensure_tenant_columns()` → adds `tenant_id` to a hardcoded `TENANT_TABLES` list. - `ensure_legacy_columns()` → a hand-maintained `_LEGACY_COLUMN_PATCHES` tuple of `ALTER TABLE ... ADD COLUMN` statements. - `sync_tenant_ids()` → ~250 lines of near-identical `UPDATE` backfills. - `sync_product_visibility()` → data backfill. **Why it matters:** this can only **create tables and add columns**. It can never change a column type, add an index / constraint / FK, drop a column, or do a NOT-NULL backfill in a controlled way. A fresh Postgres gets the *current* model via `create_all`, while an upgraded SQLite has columns bolted on by `ALTER` in whatever order/type they accreted — the two **drift apart silently**, and SQLite's loose typing hides mismatches until production. Across 11 versions the only escape hatch has been appending more manual patches (unbounded, fragile). The one-shot SQLite→Postgres move (`deploy/migrate-to-postgres.sh`) uses `SET session_replication_role` and manual sequence resets — fine for a single cutover, but not a repeatable/testable migration path. **Fix:** adopt **Alembic** (see Phase 1). --- ### P1 — maintainability #### P1.1 — Copy-pasted helpers, no shared util No shared formatting/number module. Duplicated implementations: - `formatDate` — **9** files: `lib/components/ClientAccessWorkspace.svelte`, `lib/components/mix-calculator/MixCalculatorResultsPanel.svelte`, `lib/components/MixCalculatorPrintDocument.svelte`, `routes/+page.svelte`, `routes/admin/+page.svelte`, `routes/client-access/+page.svelte`, `routes/mix-calculator/+page.svelte`, `routes/raw-materials/+page.svelte`, `routes/throughput/+page.svelte`. - `formatNumber` — **5** files: `lib/components/mix-calculator/MixCalculatorEditor.svelte`, `lib/components/mix-calculator/MixCalculatorResultsPanel.svelte`, `lib/components/MixCalculatorPrintDocument.svelte`, `routes/mix-calculator/+page.svelte`, `routes/throughput/+page.svelte`. - `toNum` — **2** files: `routes/throughput/+page.svelte`, `routes/throughput/add/+page.svelte`. **Symptom already hit:** the `toNum` `value.trim is not a function` bug (Svelte coerces `` bindings to `number`/`null`, not `string`). Fixed in both files 2026-06-03, but this class of bug will recur until there is a single source of truth. **Fix:** add `frontend/src/lib/format.ts` (`formatDate`, `formatNumber`, `formatCurrency`, `toNum`) and replace the duplicates. #### P1.2 — Monolith route files Largest route components (LOC): | File | LOC | | --- | --- | | `routes/+page.svelte` (dashboard) | 2238 | | `routes/product-costing/+page.svelte` | 1557 | | `routes/throughput/+page.svelte` | 1232 | | `routes/editor/+page.svelte` | 1163 | | `routes/raw-materials/+page.svelte` | 1062 | | `routes/client-access/+page.svelte` | 851 | | `routes/reporting/+page.svelte` | 518 | **Why it matters:** hard to test, hard to change safely, encourages more copy-paste. **Fix:** decompose incrementally — extract components and `+page.ts` load logic. Dashboard and product-costing first. #### P1.3 — `migrations.py` conflates DDL + data backfill Schema DDL (`ensure_*`) and data backfill (`sync_*`) live in one module, including ~250 lines of near-identical `UPDATE` blocks in `sync_tenant_ids()`. Alembic will absorb most of this into versioned schema steps + explicit data-migration steps. --- ### P2 — hygiene - **P2.1** — `backend/tests/_repro_throughput_post.py` is a debug repro, not a real test. Remove. - **P2.2** — ~20 `TODO`/`FIXME`/legacy markers across `backend/app` and `frontend/src`. Triage and burn down. - **P2.3** — `backend/app/seed.py` is 1325 LOC. Split by module. - **P2.4** — `CLAUDE.md` still says "PostgreSQL recommended / SQLite acceptable only for prototype", stale vs the live Postgres migration. Refresh. --- ## P0.4 — Three overlapping authentication / user-type systems The app has accreted **three separate auth systems**, with **two cookies**, **three login endpoints**, **three role namespaces**, and two parallel permission models. They overlap awkwardly and one of them is already dead in the UI. This is the single largest piece of structural debt in the codebase. ### The three systems **1. Internal / "lean" system** — `users` / `roles` / `permissions` / `role_permissions` - Code: `app/core/access.py`, `app/models/access.py`, `app/api/access.py` (`/api/access/*`), `app/seed_access.py`. - Per-user password hash (`User.password_hash`, PBKDF2). Role → permission keys (`view_raw_materials`, `edit_products`, …). Fail-closed `require_permission(...)` dependencies. - Token carries `sub=INTERNAL_USER_SUBJECT`; session `role="internal"`. - Tenant is **hardcoded** to a constant: `INTERNAL_USER_TENANT_ID = "hunter-premium-produce"` (`core/access.py:37`). - Uses `CLIENT_AUTH_COOKIE`. - **This is the actual primary login.** The root page (`routes/+page.svelte:57`) calls `api.internalLogin()` → `/api/access/login`. **2. Client-portal system** — `client_accounts` / `client_users` / `client_feature_access` / `client_user_module_permissions` / `client_access_audit_events` - Code: `app/models/client_access.py`, `app/services/client_access_service.py`, `app/api/auth.py` (`/api/auth/client/*`), `app/api/client_access.py`. - **Multi-tenant** (`tenant_id` on every table), per-account feature flags, per-user module access levels (`none`/`view`/`edit`/`manage`), `client_role` in {superadmin, admin, viewer, …}. - Session `role="client"`. Uses `CLIENT_AUTH_COOKIE`. - **Authentication is broken-by-design: a single shared password.** `client_login` checks `payload.password != settings.client_password` (`api/auth.py:79`) — *one* password for *all* client users; the per-user record only supplies identity, not a credential. - **The login UI is dead.** `api.clientLogin` is referenced only by `api.ts` (definition) and `api.test.ts` — no component calls it. The endpoints, tables, tenant plumbing, and the `require_client_session` / `module_access_map` path are all still live and still wired into the shared route dependencies. **3. Admin system** — environment-variable single credential - Code: `app/api/auth.py` (`/api/auth/admin/*`), `require_admin_session` in `app/api/deps.py`. - No DB row. Authenticates against `settings.admin_email` / `settings.admin_password`. Session `role="admin"` → **blanket access** (`session.ts:105` `hasModuleAccess` returns `true` for admin; `require_admin_session` gates the admin-only routes). - Uses a **second cookie**, `ADMIN_AUTH_COOKIE`. - Drives the separate `/admin` + `/admin/client-access` UI (`routes/admin/+page.svelte` calls `api.adminLogin()`), which exists to manage client users / feature flags / Power BI preview — i.e. the "management behind the scenes" layer we no longer want. ### How they tangle - **Frontend** routes by URL: `/admin*` → `AdminShell`, everything else → `ClientShell` (`routes/+layout.svelte:15,42-49`). Two `localStorage` session stores (`data-entry-app-client-session`, `data-entry-app-admin-session`) in `session.ts`. - **Shared route deps bend to accept two token shapes.** `require_client_session` and `require_client_module_access` (`api/deps.py:97-184`) special-case `role=="internal"` to skip the `ClientUser` DB lookup and read permissions from a role-derived map, while still supporting `role=="client"`. `core/access.py:_PERMISSION_TO_MODULE_LEVEL` exists purely to translate the internal permission keys into the legacy client module/level shape so the same routes accept both. - **Two permission models run in parallel**: role→permission-keys (internal) vs per-user module→access-level rows + per-account feature flags (client). `permissions_to_module_map` bridges them. - **`tenant_id` is smeared across ~25 tables and ~25 backend files** (heaviest: `db/migrations.py` 70 refs, `seed.py` 52, `api/product_costing.py` 36, plus every service/model), for multi-tenancy we no longer want. ### Target architecture (per product direction) > One login for everyone. User type `lean` = full access. `client` = its own permissions. > No multi-tenant. No separate behind-the-scenes management app, except `lean` may have a few > extra settings (e.g. change logo). - **One login endpoint + one login page** for all users. - **One user store**: keep `users` / `roles` / `permissions` / `role_permissions`. Everyone is a `User` with a role. Add a **`lean`** role = all permissions (full access, including the extra settings like logo). Define a **`client`** role with its own permission set. Operations etc. stay as additional roles. - **One cookie**, one session shape, one frontend session store. - **Remove multi-tenancy**: drop `tenant_id` from models/queries/migrations/seed; collapse to a single implicit tenant. - **Retire the env-var admin login** and the **dead client-portal login** + its tables/service, folding any still-needed capability (e.g. managing users) into permission-gated routes inside the single app. `lean`-only settings (logo, etc.) become permission-gated, not a separate shell. ### Decoupling / migration approach (proposed) 1. **Confirm the dead path is dead** (done: `clientLogin` has no UI caller) and snapshot any client data worth keeping (`client_users`, module permissions) so it can be re-expressed as `users` + `roles` if needed. 2. **Unify on the `users`/`roles`/`permissions` model.** Introduce `lean` and `client` roles in `seed_access.py` with the right permission sets. Migrate any real client users into `users`. 3. **Single login**: make `/api/access/login` the only login; one cookie; one session store; one login page. Remove `/api/auth/admin/*`, `/api/auth/client/*`, `ADMIN_AUTH_COOKIE`, the admin/client localStorage split, and the `/admin*` shell routing (fold any surviving admin screens into permission-gated routes in the main app). 4. **Collapse the dual permission model**: drop `_PERMISSION_TO_MODULE_LEVEL` bridging and the `role=="internal"` / `role=="client"` special-casing in `deps.py`; every route depends on `require_permission(...)` (or a thin module-level wrapper) only. 5. **Drop multi-tenancy**: Alembic migration removing `tenant_id` columns (or leaving them nullable and unused first, then dropping), plus removing `tenant_id` filters from services/queries and the `sync_tenant_ids` backfill. **Sequence this on top of Phase 1 (Alembic)** so the column drops are versioned and run identically on SQLite and Postgres. 6. **Delete the client-portal subsystem** once nothing references it: `models/client_access.py`, `client_access_service.py`, `api/auth.py`, `api/client_access.py`, related schemas, and the `ClientShell`/`AdminShell` split. ### Risk notes - This touches **authentication** — stage it carefully behind tests; do not delete the old endpoints until the unified login is proven in dev against both SQLite and (a Postgres copy of) prod. - The shared password (`P0.4`/system 2) and the env-admin credential should be considered a **security cleanup**, not just structure: per-user hashed passwords for everyone is the target. - Dropping `tenant_id` is irreversible data-wise — do it as a dedicated, reviewed Alembic step with a backup, after the login unification has settled. --- ## Remediation plan (phased) ### Phase 0 — Safety net (no behavior change) - Add a schema-parity smoke test: fresh-SQLite `create_all` vs `Base.metadata` so later phases cannot silently drift. - Remove `backend/tests/_repro_throughput_post.py`. ### Phase 1 — Adopt Alembic *(foundation; most moving parts)* - Add `alembic` to `backend/pyproject.toml`; `alembic init`. - Wire `env.py` to read `DATABASE_URL` (via `app.core.config.settings`) and `Base.metadata`. - **Critical for this setup:** enable `render_as_batch=True` so `ALTER` works on **SQLite (Windows dev)**; Postgres handles `ALTER` natively. - Autogenerate a **`0001_baseline`** migration from current models. - `alembic stamp 0001_baseline` on existing dev **and** prod DBs so they are recognized without rebuilding. - Fold `_LEGACY_COLUMN_PATCHES`, `sync_tenant_ids`, and `sync_product_visibility` into versioned migrations (schema steps + explicit data-migration steps). - Replace the startup `bootstrap_schema(...)` call (`main.py:105`) with `alembic upgrade head` (or an explicit deploy step). - Update `deploy/migrate-to-postgres.sh` **Phase 5** to run `alembic upgrade head` instead of calling `bootstrap_schema`. ### Phase 2 — Money correctness - `Float → Numeric(12, 4)` (tune per field) across the money/quantity columns listed in P0.1. - Use `Decimal` in `services/costing_engine.py` and `services/product_costing_service.py`. - Dedicated Alembic migration; guard with `tests/test_costing_engine.py` formula-parity tests. ### Phase 3 — Frontend shared utils - New `frontend/src/lib/format.ts`: `formatDate`, `formatNumber`, `formatCurrency`, `toNum`. - Replace the 9 / 5 / 2 duplicate implementations. Eliminates the `toNum`-style bug class. ### Phase 4 — Remove mock-on-error fallback *(quick, high-value correctness fix)* - Remove the `fallback` return path in `api.ts` `fetchJson`. - Surface real API errors / empty states in the UI. - Keep `mock.ts` for tests only. ### Phase 5 — Unify authentication & user types *(addresses P0.4; large, cross-cutting)* Sits on top of Phase 1 (Alembic) because the column drops must be versioned. Order within the phase: 1. Snapshot/migrate any real client users into `users` + `roles`; add `lean` and `client` roles in `seed_access.py`. 2. Single login: make `/api/access/login` the only login; one cookie; one session store; one login page. Remove `/api/auth/admin/*`, `/api/auth/client/*`, `ADMIN_AUTH_COOKIE`, and the `/admin*` shell split. 3. Collapse the dual permission model — every route on `require_permission(...)`; delete the `internal`/`client` special-casing and `_PERMISSION_TO_MODULE_LEVEL` bridge in `deps.py`/`access.py`. 4. Drop multi-tenancy (`tenant_id`) via a dedicated Alembic migration + query cleanup; remove `sync_tenant_ids`. 5. Delete the dead client-portal subsystem (`models/client_access.py`, `client_access_service.py`, `api/auth.py`, `api/client_access.py`, `AdminShell`). 6. `lean`-only extras (logo change, etc.) become permission-gated settings in the single app. ### Phase 6 — Decompose monolith routes - Incrementally extract components + `+page.ts` load logic. Start with dashboard (`+page.svelte`) and product-costing. ### Phase 7 — Hygiene - Burn down `TODO`/legacy markers. - Split `seed.py` by module. - Refresh `CLAUDE.md` DB guidance. --- ## Suggested sequencing note Phase 1 (Alembic) is the foundation the SQLite-dev / Postgres-prod split most depends on. However, **Phase 4 (mock-on-error)** is the scariest correctness bug and a ~20-minute fix — a reasonable quick win to do first, before Phase 1. ## Progress log - 2026-06-03 — Audit completed; plan written. `toNum` bug fixed in `routes/throughput/+page.svelte` and `routes/throughput/add/+page.svelte` (precursor to Phase 3). - 2026-06-03 — Auth/user-type investigation added (P0.4 + Phase 5). Found three overlapping auth systems; the client-portal login (`/api/auth/client/login`, shared password) is already dead in the UI (`clientLogin` has no component caller). Target: single login, `lean`/`client` roles, no multi-tenant, no separate admin shell. - 2026-06-04 — Phase 4 quick win started: removed production `api.ts` mock-on-error fallback so failed reads throw normalized API errors instead of returning fabricated mock pricing/costing data. Removed `backend/tests/_repro_throughput_post.py` debug repro file. - 2026-06-04 — Phase 0 safety net started: added a fresh SQLite schema smoke test that checks model metadata tables and columns are created as declared. - 2026-06-04 — Phase 3 shared utils started: added `frontend/src/lib/format.ts`, covered it with unit tests, and replaced the duplicated `toNum` helper plus the mix-calculator/throughput number and date formatters touched in recent work.