Tech Debt Audit & Remediation Plan
Status: Plan / not yet started. Audit performed 2026-06-03 against `main`.
Context: the app has been through 11 versions. Dev runs on SQLite (Windows); production is mid-migration to PostgreSQL. Six modules: Mix Calculator, Product Costing, Editor, Throughput, Reporting, Settings.
Decisions already taken:
- Migrations: adopt Alembic (replaces the startup `create_all` + ad-hoc `ALTER` scheme).
- Approach: full audit first (this document), then execute in phases.
Findings, ranked by severity
P0 — correctness / data integrity
P0.1 — Money is stored as Float everywhere
Every cost / price / margin column is SQLAlchemy `Float`, and so are the quantity columns that feed those calculations:
- `backend/app/models/product_costing.py`: `cleaned_product_cost_per_kg`, `grading_cost_per_kg`, `bagging_cost_per_kg`, `cracking_cost_per_kg`, `bag_cost_per_unit`, `freight_cost_per_unit`, `finished_product_delivered_cost`, `distributor_price`, `wholesale_price`, `cost_per_kg`, `distributor_margin`, `wholesale_margin`, `cost`, …
- `backend/app/models/assumption.py`: `grading_cost`, `bagging_cost`, `cracking_cost`, `bag_cost`, `cost_per_unit`.
- `backend/app/models/mix.py`: `quantity_kg`.
- `backend/app/models/mix_calculator.py`: `batch_size_kg`, `total_bags`, `total_kg`, `product_unit_size_kg`, `required_kg`, `mix_percentage`.
- `backend/app/models/product.py`: `distributor_margin`, `wholesale_margin`, `quantity_kg`.
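Why it matters: for a costing-and-pricing tool, binary floating point introduces rounding
drift in money. It is also a SQLite ↔ Postgres divergence point: SQLite typing is loose (a column
declared `Float` or even `NUMERIC` only gets a type affinity, and non-integer values end up as 8-byte
floats), whereas Postgres enforces column types and its `numeric` is exact, so the same calculation
can produce different stored/displayed values across environments.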
Fix: migrate money/quantity columns to Numeric(12, 4) (tune precision/scale per field) and use
Decimal in the calculation engine. Guard with the existing formula-parity tests.
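A minimal stdlib sketch of the drift and of the `Decimal` alternative. `to_money` and `delivered_cost_per_kg` are hypothetical names, not functions from `services/costing_engine.py`. The 4-decimal scale mirrors the proposed `Numeric(12, 4)`, and `ROUND_HALF_UP` is an assumption (the `decimal` module's default context rounds `ROUND_HALF_EVEN`), so the rounding rule should be confirmed against the pricing formulas.

```python
from decimal import Decimal, ROUND_HALF_UP
from typing import Union

# Four decimal places, mirroring the proposed Numeric(12, 4) columns (assumption).
SCALE = Decimal("0.0001")

Number = Union[str, int, float, Decimal]


def to_money(value: Number) -> Decimal:
    """Hypothetical helper: convert an input value to a Decimal at the storage scale."""
    if isinstance(value, float):
        # Decimal(0.1) would capture the binary error (0.1000000000000000055...);
        # going through str() uses the shortest round-tripping repr, "0.1".
        value = str(value)
    return Decimal(value).quantize(SCALE, rounding=ROUND_HALF_UP)


def delivered_cost_per_kg(*components: Number) -> Decimal:
    """Hypothetical: sum per-kg cost components (e.g. cleaning, grading, bagging)."""
    total = sum((to_money(c) for c in components), Decimal("0"))
    return total.quantize(SCALE, rounding=ROUND_HALF_UP)


print(0.1 + 0.2)                            # 0.30000000000000004
print(delivered_cost_per_kg("0.1", "0.2"))  # 0.3000
```

In the real change, SQLAlchemy `Numeric` columns return `Decimal` values by default (`asdecimal=True`), so the engine would mostly receive `Decimal` from the ORM rather than converting floats itself.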
P0.2 — Frontend silently shows mock data when the API fails
frontend/src/lib/api.ts → fetchJson(path, fallback, ...) returns the fallback (mock data) on
any fetch error:
- `api.ts:151` and `api.ts:158`: `return fallback;` on failure.
- Fallbacks are real mock datasets: `mockRawMaterials`, `mockCosts`, `mockProducts`, `mockMixes`, `mockScenarios`, `mockMixCalculatorOptions`, `mockMixCalculatorSessions`, `mockClientAccess`, imported from `$lib/mock` (`api.ts:4-13`, used at `api.ts:296+`).
Why it matters: a backend hiccup makes the UI display fabricated prices/costs with no error shown to the user. In a pricing application this is the most dangerous item in this audit: a user could quote prices or make decisions based on fake numbers.
Fix: remove the mock-on-error fallback path. Surface real API errors in the UI (error/empty
states). Keep mock.ts for tests only.
P0.3 — Schema management is create_all + ad-hoc ALTER, no versioning
backend/app/db/migrations.py runs on every startup via bootstrap_schema()
(backend/app/main.py:105, inside ensure_database_ready()):
- `ensure_metadata_tables()` → `metadata.create_all()` for any missing tables.
- `ensure_tenant_columns()` → adds `tenant_id` to a hardcoded `TENANT_TABLES` list.
- `ensure_legacy_columns()` → a hand-maintained `_LEGACY_COLUMN_PATCHES` tuple of `ALTER TABLE ... ADD COLUMN` statements.
- `sync_tenant_ids()` → ~250 lines of near-identical `UPDATE` backfills.
- `sync_product_visibility()` → data backfill.
Why it matters: this can only create tables and add columns. It can never change a column
type, add an index / constraint / FK, drop a column, or do a NOT-NULL backfill in a controlled way.
A fresh Postgres gets the current model via create_all, while an upgraded SQLite has columns
bolted on by ALTER in whatever order/type they accreted — the two drift apart silently, and
SQLite's loose typing hides mismatches until production. Across 11 versions the only escape hatch
has been appending more manual patches (unbounded, fragile).
The one-shot SQLite→Postgres move (deploy/migrate-to-postgres.sh) uses
SET session_replication_role and manual sequence resets — fine for a single cutover, but not a
repeatable/testable migration path.
Fix: adopt Alembic (see Phase 1).
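To make the difference from `create_all` + `ALTER` concrete, here is a stdlib-only sketch of what versioned migrations add: an ordered list of revisions, a table recording which revision the database is at, a `stamp` that marks an existing database as already at the baseline, and an upgrade that runs only what is pending. This is not Alembic's API (Alembic keeps the current revision in an `alembic_version` table and discovers revisions from files); the revision names, the `schema_version` table and all helper functions are hypothetical.

```python
import sqlite3
from typing import Callable, List, Optional, Tuple

Step = Callable[[sqlite3.Connection], None]


# Hypothetical revisions; real ones would be Alembic revision files.
def _0001_baseline(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT NOT NULL, price REAL)")


def _0002_add_sku(conn: sqlite3.Connection) -> None:
    conn.execute("ALTER TABLE product ADD COLUMN sku TEXT")


MIGRATIONS: List[Tuple[str, Step]] = [
    ("0001_baseline", _0001_baseline),
    ("0002_add_sku", _0002_add_sku),
]


def _ensure_version_table(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (revision TEXT NOT NULL)")


def _set_revision(conn: sqlite3.Connection, revision: str) -> None:
    conn.execute("DELETE FROM schema_version")
    conn.execute("INSERT INTO schema_version (revision) VALUES (?)", (revision,))
    conn.commit()


def current_revision(conn: sqlite3.Connection) -> Optional[str]:
    _ensure_version_table(conn)
    row = conn.execute("SELECT revision FROM schema_version").fetchone()
    return row[0] if row else None


def stamp(conn: sqlite3.Connection, revision: str) -> None:
    """Record a revision as applied without running it (the idea behind `alembic stamp`)."""
    if revision not in dict(MIGRATIONS):
        raise ValueError(f"unknown revision {revision!r}")
    _ensure_version_table(conn)
    _set_revision(conn, revision)


def upgrade_head(conn: sqlite3.Connection) -> List[str]:
    """Apply every revision after the recorded one, in order (cf. `alembic upgrade head`)."""
    names = [name for name, _ in MIGRATIONS]
    current = current_revision(conn)
    start = names.index(current) + 1 if current is not None else 0
    applied = []
    for name, step in MIGRATIONS[start:]:
        step(conn)
        _set_revision(conn, name)  # record progress right after each step
        applied.append(name)
    return applied
```

Re-running `upgrade_head` is a no-op, and a database that predates versioning is stamped at the baseline rather than rebuilt, which is the same shape as the `alembic stamp 0001_baseline` step in Phase 1.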
P1 — maintainability
P1.1 — Copy-pasted helpers, no shared util
No shared formatting/number module. Duplicated implementations:
- `formatDate`: 9 files: `lib/components/ClientAccessWorkspace.svelte`, `lib/components/mix-calculator/MixCalculatorResultsPanel.svelte`, `lib/components/MixCalculatorPrintDocument.svelte`, `routes/+page.svelte`, `routes/admin/+page.svelte`, `routes/client-access/+page.svelte`, `routes/mix-calculator/+page.svelte`, `routes/raw-materials/+page.svelte`, `routes/throughput/+page.svelte`.
- `formatNumber`: 5 files: `lib/components/mix-calculator/MixCalculatorEditor.svelte`, `lib/components/mix-calculator/MixCalculatorResultsPanel.svelte`, `lib/components/MixCalculatorPrintDocument.svelte`, `routes/mix-calculator/+page.svelte`, `routes/throughput/+page.svelte`.
- `toNum`: 2 files: `routes/throughput/+page.svelte`, `routes/throughput/add/+page.svelte`.
Symptom already hit: the `toNum` `value.trim is not a function` bug (Svelte coerces
`<input type="number">` bindings to number/null, not string). Fixed in both files
2026-06-03, but this class of bug will recur until there is a single source of truth.
Fix: add frontend/src/lib/format.ts (formatDate, formatNumber, formatCurrency, toNum)
and replace the duplicates.
P1.2 — Monolith route files
Largest route components (LOC):
| File | LOC |
|---|---|
| `routes/+page.svelte` (dashboard) | 2238 |
| `routes/product-costing/+page.svelte` | 1557 |
| `routes/throughput/+page.svelte` | 1232 |
| `routes/editor/+page.svelte` | 1163 |
| `routes/raw-materials/+page.svelte` | 1062 |
| `routes/client-access/+page.svelte` | 851 |
| `routes/reporting/+page.svelte` | 518 |
Why it matters: hard to test, hard to change safely, encourages more copy-paste.
Fix: decompose incrementally — extract components and +page.ts load logic. Dashboard and
product-costing first.
P1.3 — migrations.py conflates DDL + data backfill
Schema DDL (ensure_*) and data backfill (sync_*) live in one module, including ~250 lines of
near-identical UPDATE blocks in sync_tenant_ids(). Alembic will absorb most of this into
versioned schema steps + explicit data-migration steps.
P2 — hygiene
- P2.1: `backend/tests/_repro_throughput_post.py` is a debug repro, not a real test. Remove.
- P2.2: ~20 `TODO`/`FIXME`/legacy markers across `backend/app` and `frontend/src`. Triage and burn down.
- P2.3: `backend/app/seed.py` is 1325 LOC. Split by module.
- P2.4: `CLAUDE.md` still says "PostgreSQL recommended / SQLite acceptable only for prototype", which is stale given the live Postgres migration. Refresh.
P0.4 — Three overlapping authentication / user-type systems
The app has accreted three separate auth systems, with two cookies, three login endpoints, three role namespaces, and two parallel permission models. They overlap awkwardly and one of them is already dead in the UI. This is the single largest piece of structural debt in the codebase.
The three systems
1. Internal / "lean" system — users / roles / permissions / role_permissions
- Code: `app/core/access.py`, `app/models/access.py`, `app/api/access.py` (`/api/access/*`), `app/seed_access.py`.
- Per-user password hash (`User.password_hash`, PBKDF2). Role → permission keys (`view_raw_materials`, `edit_products`, …). Fail-closed `require_permission(...)` dependencies.
- Token carries `sub=INTERNAL_USER_SUBJECT`; session `role="internal"`.
- Tenant is hardcoded to a constant: `INTERNAL_USER_TENANT_ID = "hunter-premium-produce"` (`core/access.py:37`).
- Uses `CLIENT_AUTH_COOKIE`.
- This is the actual primary login. The root page (`routes/+page.svelte:57`) calls `api.internalLogin()` → `/api/access/login`.
2. Client-portal system — client_accounts / client_users / client_feature_access /
client_user_module_permissions / client_access_audit_events
- Code: `app/models/client_access.py`, `app/services/client_access_service.py`, `app/api/auth.py` (`/api/auth/client/*`), `app/api/client_access.py`.
- Multi-tenant (`tenant_id` on every table), per-account feature flags, per-user module access levels (`none`/`view`/`edit`/`manage`), `client_role` in {`superadmin`, `admin`, `viewer`, …}.
- Session `role="client"`. Uses `CLIENT_AUTH_COOKIE`.
- Authentication is broken by design: a single shared password. `client_login` checks `payload.password != settings.client_password` (`api/auth.py:79`), so one password serves all client users; the per-user record supplies only identity, not a credential.
- The login UI is dead. `api.clientLogin` is referenced only by `api.ts` (definition) and `api.test.ts`; no component calls it. The endpoints, tables, tenant plumbing, and the `require_client_session`/`module_access_map` path are all still live and still wired into the shared route dependencies.
3. Admin system — environment-variable single credential
- Code: `app/api/auth.py` (`/api/auth/admin/*`), `require_admin_session` in `app/api/deps.py`.
- No DB row. Authenticates against `settings.admin_email`/`settings.admin_password`. Session `role="admin"` → blanket access (`session.ts:105` `hasModuleAccess` returns `true` for admin; `require_admin_session` gates the admin-only routes).
- Uses a second cookie, `ADMIN_AUTH_COOKIE`.
- Drives the separate `/admin` + `/admin/client-access` UI (`routes/admin/+page.svelte` calls `api.adminLogin()`), which exists to manage client users / feature flags / Power BI preview, i.e. the "management behind the scenes" layer we no longer want.
How they tangle
- Frontend routes by URL: `/admin*` → `AdminShell`, everything else → `ClientShell` (`routes/+layout.svelte:15,42-49`). Two `localStorage` session stores (`data-entry-app-client-session`, `data-entry-app-admin-session`) in `session.ts`.
- Shared route deps bend to accept two token shapes. `require_client_session` and `require_client_module_access` (`api/deps.py:97-184`) special-case `role=="internal"` to skip the `ClientUser` DB lookup and read permissions from a role-derived map, while still supporting `role=="client"`. `core/access.py:_PERMISSION_TO_MODULE_LEVEL` exists purely to translate the internal permission keys into the legacy client module/level shape so the same routes accept both.
- Two permission models run in parallel: role → permission keys (internal) vs per-user module → access-level rows plus per-account feature flags (client). `permissions_to_module_map` bridges them.
- `tenant_id` is smeared across ~25 tables and ~25 backend files (heaviest: `db/migrations.py` 70 refs, `seed.py` 52, `api/product_costing.py` 36, plus every service/model), in support of multi-tenancy we no longer want.
Target architecture (per product direction)
One login for everyone. User type `lean` = full access; `client` = its own permissions. No multi-tenancy. No separate behind-the-scenes management app, except that `lean` may have a few extra settings (e.g. change logo).
- One login endpoint + one login page for all users.
- One user store: keep `users`/`roles`/`permissions`/`role_permissions`. Everyone is a `User` with a role. Add a `lean` role = all permissions (full access, including the extra settings like logo). Define a `client` role with its own permission set. Operations etc. stay as additional roles.
- One cookie, one session shape, one frontend session store.
- Remove multi-tenancy: drop `tenant_id` from models/queries/migrations/seed; collapse to a single implicit tenant.
- Retire the env-var admin login and the dead client-portal login + its tables/service, folding any still-needed capability (e.g. managing users) into permission-gated routes inside the single app.
- `lean`-only settings (logo, etc.) become permission-gated, not a separate shell.
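A sketch of the target permission model in plain Python, assuming permissions stay string keys as in the existing role → permission-key scheme. `view_raw_materials` and `edit_products` come from this audit; `view_reporting`, `edit_branding`, the contents of the `client` set, and the helper names are hypothetical. The point it illustrates is that `lean` is defined as the full permission set rather than special-cased, so every route goes through one fail-closed check.

```python
from typing import Dict, FrozenSet

# view_raw_materials / edit_products appear in the audit; the other keys are hypothetical.
ALL_PERMISSIONS: FrozenSet[str] = frozenset(
    {"view_raw_materials", "edit_products", "view_reporting", "edit_branding"}
)

ROLE_PERMISSIONS: Dict[str, FrozenSet[str]] = {
    # lean maps to the full set, so any key added to ALL_PERMISSIONS is granted to lean.
    "lean": ALL_PERMISSIONS,
    # client has its own explicit set (example values only).
    "client": frozenset({"view_raw_materials", "view_reporting"}),
}


class PermissionDenied(Exception):
    """Raised when a role lacks a permission (a route would map this to HTTP 403)."""


def permissions_for(role: str) -> FrozenSet[str]:
    # Fail closed: an unknown role gets an empty permission set.
    return ROLE_PERMISSIONS.get(role, frozenset())


def check_permission(role: str, permission: str) -> None:
    """Hypothetical single check every route would use; no per-role special cases."""
    if permission not in permissions_for(role):
        raise PermissionDenied(f"role {role!r} lacks permission {permission!r}")
```

In the FastAPI app this check would sit behind the existing `require_permission(...)` dependency; the sketch leaves that wiring out rather than guess its signature.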
Decoupling / migration approach (proposed)
- Confirm the dead path is dead (done: `clientLogin` has no UI caller) and snapshot any client data worth keeping (`client_users`, module permissions) so it can be re-expressed as `users` + `roles` if needed.
- Unify on the `users`/`roles`/`permissions` model. Introduce `lean` and `client` roles in `seed_access.py` with the right permission sets. Migrate any real client users into `users`.
- Single login: make `/api/access/login` the only login; one cookie; one session store; one login page. Remove `/api/auth/admin/*`, `/api/auth/client/*`, `ADMIN_AUTH_COOKIE`, the admin/client `localStorage` split, and the `/admin*` shell routing (fold any surviving admin screens into permission-gated routes in the main app).
- Collapse the dual permission model: drop the `_PERMISSION_TO_MODULE_LEVEL` bridging and the `role=="internal"`/`role=="client"` special-casing in `deps.py`; every route depends only on `require_permission(...)` (or a thin module-level wrapper).
- Drop multi-tenancy: an Alembic migration removing the `tenant_id` columns (or leaving them nullable and unused first, then dropping), plus removing `tenant_id` filters from services/queries and the `sync_tenant_ids` backfill. Sequence this on top of Phase 1 (Alembic) so the column drops are versioned and run identically on SQLite and Postgres.
- Delete the client-portal subsystem once nothing references it: `models/client_access.py`, `client_access_service.py`, `api/auth.py`, `api/client_access.py`, related schemas, and the `ClientShell`/`AdminShell` split.
Risk notes
- This touches authentication: stage it carefully behind tests, and do not delete the old endpoints until the unified login is proven in dev against both SQLite and a Postgres copy of prod.
- The shared password (P0.4, system 2) and the env-admin credential should be treated as a security cleanup, not just a structural one: the target is per-user hashed passwords for everyone.
- Dropping `tenant_id` is irreversible data-wise: do it as a dedicated, reviewed Alembic step with a backup, after the login unification has settled.
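The "per-user hashed passwords for everyone" target can be illustrated with the standard library. The internal system already stores a PBKDF2 hash in `User.password_hash`, and its existing format and iteration count should be reused; the `pbkdf2_sha256$iterations$salt$hash` layout, the 600,000-iteration default, and the function names below are assumptions made for this sketch. Unlike `payload.password != settings.client_password`, every user gets their own salt and hash, and the comparison is constant-time.

```python
import hashlib
import hmac
import os
from typing import Optional

# Assumption: reuse whatever iteration count app/core/access.py already uses.
DEFAULT_ITERATIONS = 600_000


def hash_password(password: str, *, iterations: int = DEFAULT_ITERATIONS,
                  salt: Optional[bytes] = None) -> str:
    """Return a self-describing PBKDF2-SHA256 hash string with a random per-user salt."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return f"pbkdf2_sha256${iterations}${salt.hex()}${digest.hex()}"


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt/iterations and compare in constant time."""
    try:
        algorithm, iterations, salt_hex, digest_hex = stored.split("$")
    except ValueError:
        return False  # malformed record: fail closed
    if algorithm != "pbkdf2_sha256":
        return False
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))
```

Storing the iteration count inside the hash string lets the count be raised later without invalidating existing users' hashes.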
Remediation plan (phased)
Phase 0 — Safety net (no behavior change)
- Add a schema-parity smoke test: fresh-SQLite `create_all` vs `Base.metadata`, so later phases cannot silently drift.
- Remove `backend/tests/_repro_throughput_post.py`.
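The parity check reduces to comparing declared tables/columns with what the database reports. A stdlib sketch against SQLite, assuming `expected` is a `{table: {column, ...}}` mapping; in the real test it could be built from SQLAlchemy metadata, e.g. `{t.name: set(t.columns.keys()) for t in Base.metadata.sorted_tables}`. The `schema_drift` name is hypothetical.

```python
import sqlite3
from typing import Dict, List, Set


def schema_drift(conn: sqlite3.Connection, expected: Dict[str, Set[str]]) -> List[str]:
    """List differences between declared tables/columns and what SQLite reports."""
    problems: List[str] = []
    actual_tables = {
        row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    }
    for table, columns in sorted(expected.items()):
        if table not in actual_tables:
            problems.append(f"missing table {table}")
            continue
        # Table names come from trusted model metadata, so inlining them is acceptable here.
        actual_cols = {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
        problems += [f"missing column {table}.{c}" for c in sorted(columns - actual_cols)]
        problems += [f"unexpected column {table}.{c}" for c in sorted(actual_cols - columns)]
    return problems
```

Pointed at an upgraded dev database instead of a fresh one, the same comparison would also surface the `create_all` vs `ALTER` drift described in P0.3.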
Phase 1 — Adopt Alembic (foundation; most moving parts)
- Add `alembic` to `backend/pyproject.toml`; run `alembic init`.
- Wire `env.py` to read `DATABASE_URL` (via `app.core.config.settings`) and `Base.metadata`.
- Critical for this setup: enable `render_as_batch=True` so autogenerated migrations use batch mode, which rebuilds tables on SQLite (Windows dev) for changes its limited `ALTER TABLE` cannot express; Postgres handles `ALTER` natively.
- Autogenerate a `0001_baseline` migration from current models.
- Run `alembic stamp 0001_baseline` on existing dev and prod DBs so they are recognized without rebuilding.
- Fold `_LEGACY_COLUMN_PATCHES`, `sync_tenant_ids`, and `sync_product_visibility` into versioned migrations (schema steps + explicit data-migration steps).
- Replace the startup `bootstrap_schema(...)` call (`main.py:105`) with `alembic upgrade head` (or an explicit deploy step).
- Update `deploy/migrate-to-postgres.sh` Phase 5 to run `alembic upgrade head` instead of calling `bootstrap_schema`.
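SQLite's `ALTER TABLE` cannot change a column's type, which is exactly what Phase 2 needs. Batch mode works around this by creating a new table, copying the rows, dropping the old table and renaming the new one. The stdlib sketch below performs that rebuild by hand for a hypothetical `product.price` column going from `REAL` to `NUMERIC(12,4)`. In an Alembic revision the equivalent would be written with `op.batch_alter_table(...)` and `batch_op.alter_column(..., type_=sa.Numeric(12, 4))`, and Alembic generates the copy-and-rename steps itself. This hand-written version ignores indexes and foreign keys, which a real rebuild must also carry over.

```python
import sqlite3


def rebuild_price_as_numeric(conn: sqlite3.Connection) -> None:
    """Change the hypothetical product.price from REAL to NUMERIC(12,4) via copy-and-rename."""
    # executescript() commits any pending transaction first, then runs the script;
    # the explicit BEGIN/COMMIT keeps the rebuild in a single transaction.
    conn.executescript(
        """
        BEGIN;
        CREATE TABLE _product_new (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            price NUMERIC(12,4)
        );
        INSERT INTO _product_new (id, name, price)
            SELECT id, name, price FROM product;
        DROP TABLE product;
        ALTER TABLE _product_new RENAME TO product;
        COMMIT;
        """
    )
```

Note that SQLite still has no exact decimal storage: `NUMERIC` is only a type affinity, so a copied `1.25` comes back as a float. Exactness in dev therefore has to come from `Decimal` arithmetic in Python, while Postgres `numeric` stores the value exactly.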
Phase 2 — Money correctness
- `Float` → `Numeric(12, 4)` (tune per field) across the money/quantity columns listed in P0.1.
- Use `Decimal` in `services/costing_engine.py` and `services/product_costing_service.py`.
- Dedicated Alembic migration; guard with the `tests/test_costing_engine.py` formula-parity tests.
Phase 3 — Frontend shared utils
- New `frontend/src/lib/format.ts`: `formatDate`, `formatNumber`, `formatCurrency`, `toNum`.
- Replace the 9 / 5 / 2 duplicate implementations. Eliminates the `toNum`-style bug class.
Phase 4 — Remove mock-on-error fallback (quick, high-value correctness fix)
- Remove the `fallback` return path in `api.ts` `fetchJson`.
- Surface real API errors / empty states in the UI.
- Keep `mock.ts` for tests only.
Phase 5 — Unify authentication & user types (addresses P0.4; large, cross-cutting)
Sits on top of Phase 1 (Alembic) because the column drops must be versioned. Order within the phase:
- Snapshot/migrate any real client users into `users` + `roles`; add `lean` and `client` roles in `seed_access.py`.
- Single login: make `/api/access/login` the only login; one cookie; one session store; one login page. Remove `/api/auth/admin/*`, `/api/auth/client/*`, `ADMIN_AUTH_COOKIE`, and the `/admin*` shell split.
- Collapse the dual permission model: every route on `require_permission(...)`; delete the `internal`/`client` special-casing and the `_PERMISSION_TO_MODULE_LEVEL` bridge in `deps.py`/`access.py`.
- Drop multi-tenancy (`tenant_id`) via a dedicated Alembic migration + query cleanup; remove `sync_tenant_ids`.
- Delete the dead client-portal subsystem (`models/client_access.py`, `client_access_service.py`, `api/auth.py`, `api/client_access.py`, `AdminShell`).
- `lean`-only extras (logo change, etc.) become permission-gated settings in the single app.
Phase 6 — Decompose monolith routes
- Incrementally extract components and `+page.ts` load logic. Start with the dashboard (`routes/+page.svelte`) and product-costing.
Phase 7 — Hygiene
- Burn down `TODO`/legacy markers.
- Split `seed.py` by module.
- Refresh `CLAUDE.md` DB guidance.
Suggested sequencing note
Phase 1 (Alembic) is the foundation the SQLite-dev / Postgres-prod split most depends on. However, Phase 4 (mock-on-error) is the scariest correctness bug and a ~20-minute fix — a reasonable quick win to do first, before Phase 1.
Progress log
- 2026-06-03: Audit completed; plan written. `toNum` bug fixed in `routes/throughput/+page.svelte` and `routes/throughput/add/+page.svelte` (precursor to Phase 3).
- 2026-06-03: Auth/user-type investigation added (P0.4 + Phase 5). Found three overlapping auth systems; the client-portal login (`/api/auth/client/login`, shared password) is already dead in the UI (`clientLogin` has no component caller). Target: single login, `lean`/`client` roles, no multi-tenancy, no separate admin shell.
- 2026-06-04: Phase 4 quick win started. Removed the production `api.ts` mock-on-error fallback so failed reads throw normalized API errors instead of returning fabricated mock pricing/costing data. Removed the `backend/tests/_repro_throughput_post.py` debug repro file.
- 2026-06-04: Phase 0 safety net started. Added a fresh-SQLite schema smoke test that checks model metadata tables and columns are created as declared.
- 2026-06-04: Phase 3 shared utils started. Added `frontend/src/lib/format.ts`, covered it with unit tests, and replaced the duplicated `toNum` helper plus the mix-calculator/throughput number and date formatters touched in recent work.