v0.1.12
# Tech Debt Audit & Remediation Plan

> Status: **Plan / not yet started.** Audit performed 2026-06-03 against `main`.
> Context: app has been through 11 versions. Dev runs on **SQLite (Windows)**;
> production is mid-migration to **PostgreSQL**. Six modules: Mix Calculator,
> Product Costing, Editor, Throughput, Reporting, Settings.
>
> Decisions already taken:
> - **Migrations:** adopt **Alembic** (replaces the startup `create_all` + ad-hoc `ALTER` scheme).
> - **Approach:** full audit first (this document), then execute in phases.

---

## Findings, ranked by severity

### P0 — correctness / data integrity

#### P0.1 — Money is stored as `Float` everywhere
Every cost, price, margin, and quantity column is a SQLAlchemy `Float`:

- `backend/app/models/product_costing.py` — `cleaned_product_cost_per_kg`, `grading_cost_per_kg`,
  `bagging_cost_per_kg`, `cracking_cost_per_kg`, `bag_cost_per_unit`, `freight_cost_per_unit`,
  `finished_product_delivered_cost`, `distributor_price`, `wholesale_price`, `cost_per_kg`,
  `distributor_margin`, `wholesale_margin`, `cost`, …
- `backend/app/models/assumption.py` — `grading_cost`, `bagging_cost`, `cracking_cost`,
  `bag_cost`, `cost_per_unit`.
- `backend/app/models/mix.py` — `quantity_kg`.
- `backend/app/models/mix_calculator.py` — `batch_size_kg`, `total_bags`, `total_kg`,
  `product_unit_size_kg`, `required_kg`, `mix_percentage`.
- `backend/app/models/product.py` — `distributor_margin`, `wholesale_margin`, `quantity_kg`.

**Why it matters:** for a costing-and-pricing tool, binary floating point introduces rounding
drift in money. It is also a **SQLite ↔ Postgres divergence point**: SQLite stores loose floats,
while Postgres `Numeric` is exact, so the same calculation can produce different stored/displayed
values across environments.

**Fix:** migrate money/quantity columns to `Numeric(12, 4)` (tune precision/scale per field) and use
`Decimal` in the calculation engine. Guard with the existing formula-parity tests.
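
The rounding drift described above can be reproduced in a few lines. This is a minimal sketch using only the standard library; the two cost components are made-up values chosen to make the effect visible, not figures from the app.

```python
from decimal import Decimal

# Hypothetical per-kg cost components, chosen only to make the drift visible.
components = ["0.10", "0.20"]

# Summing as binary floats: neither 0.1 nor 0.2 has an exact binary representation.
float_total = sum(float(c) for c in components)

# Summing as Decimal: the decimal values are kept exactly.
decimal_total = sum((Decimal(c) for c in components), Decimal("0"))

print(float_total == 0.3)               # False
print(decimal_total == Decimal("0.3"))  # True
```

The same kind of mismatch can surface when a float computed in Python is compared against an exact `Numeric` value read back from the database.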

#### P0.2 — Frontend silently shows mock data when the API fails
`frontend/src/lib/api.ts` → `fetchJson(path, fallback, ...)` returns the `fallback` (mock data) on
**any** fetch error:

- `api.ts:151` and `api.ts:158` — `return fallback;` on failure.
- Fallbacks are real mock datasets: `mockRawMaterials`, `mockCosts`, `mockProducts`, `mockMixes`,
  `mockScenarios`, `mockMixCalculatorOptions`, `mockMixCalculatorSessions`, `mockClientAccess` —
  imported from `$lib/mock` (`api.ts:4-13`, used at `api.ts:296+`).

**Why it matters:** a backend hiccup makes the UI display **fabricated prices/costs** with no error
shown to the user. In a pricing application this is the most dangerous item in this audit — a user
could quote or decide off fake numbers.

**Fix:** remove the mock-on-error fallback path. Surface real API errors in the UI (error/empty
states). Keep `mock.ts` for tests only.

#### P0.3 — Schema management is `create_all` + ad-hoc `ALTER`, no versioning
`backend/app/db/migrations.py` runs on **every startup** via `bootstrap_schema()`
(`backend/app/main.py:105`, inside `ensure_database_ready()`):

- `ensure_metadata_tables()` → `metadata.create_all()` for any missing tables.
- `ensure_tenant_columns()` → adds `tenant_id` to a hardcoded `TENANT_TABLES` list.
- `ensure_legacy_columns()` → a hand-maintained `_LEGACY_COLUMN_PATCHES` tuple of
  `ALTER TABLE ... ADD COLUMN` statements.
- `sync_tenant_ids()` → ~250 lines of near-identical `UPDATE` backfills.
- `sync_product_visibility()` → data backfill.

**Why it matters:** this can only **create tables and add columns**. It can never change a column
type, add an index / constraint / FK, drop a column, or do a NOT-NULL backfill in a controlled way.
A fresh Postgres gets the *current* model via `create_all`, while an upgraded SQLite has columns
bolted on by `ALTER` in whatever order/type they accreted — the two **drift apart silently**, and
SQLite's loose typing hides mismatches until production. Across 11 versions the only escape hatch
has been appending more manual patches (unbounded, fragile).

The one-shot SQLite→Postgres move (`deploy/migrate-to-postgres.sh`) uses
`SET session_replication_role` and manual sequence resets — fine for a single cutover, but not a
repeatable/testable migration path.

**Fix:** adopt **Alembic** (see Phase 1).

---

### P1 — maintainability

#### P1.1 — Copy-pasted helpers, no shared util
No shared formatting/number module. Duplicated implementations:

- `formatDate` — **9** files: `lib/components/ClientAccessWorkspace.svelte`,
  `lib/components/mix-calculator/MixCalculatorResultsPanel.svelte`,
  `lib/components/MixCalculatorPrintDocument.svelte`, `routes/+page.svelte`,
  `routes/admin/+page.svelte`, `routes/client-access/+page.svelte`,
  `routes/mix-calculator/+page.svelte`, `routes/raw-materials/+page.svelte`,
  `routes/throughput/+page.svelte`.
- `formatNumber` — **5** files: `lib/components/mix-calculator/MixCalculatorEditor.svelte`,
  `lib/components/mix-calculator/MixCalculatorResultsPanel.svelte`,
  `lib/components/MixCalculatorPrintDocument.svelte`, `routes/mix-calculator/+page.svelte`,
  `routes/throughput/+page.svelte`.
- `toNum` — **2** files: `routes/throughput/+page.svelte`, `routes/throughput/add/+page.svelte`.

**Symptom already hit:** the `toNum` `value.trim is not a function` bug (Svelte coerces
`<input type="number">` bindings to `number`/`null`, not `string`). Fixed in both files
2026-06-03, but this class of bug will recur until there is a single source of truth.

**Fix:** add `frontend/src/lib/format.ts` (`formatDate`, `formatNumber`, `formatCurrency`, `toNum`)
and replace the duplicates.

#### P1.2 — Monolith route files
Largest route components (LOC):

| File | LOC |
| --- | --- |
| `routes/+page.svelte` (dashboard) | 2238 |
| `routes/product-costing/+page.svelte` | 1557 |
| `routes/throughput/+page.svelte` | 1232 |
| `routes/editor/+page.svelte` | 1163 |
| `routes/raw-materials/+page.svelte` | 1062 |
| `routes/client-access/+page.svelte` | 851 |
| `routes/reporting/+page.svelte` | 518 |

**Why it matters:** hard to test, hard to change safely, encourages more copy-paste.

**Fix:** decompose incrementally — extract components and `+page.ts` load logic. Dashboard and
product-costing first.

#### P1.3 — `migrations.py` conflates DDL + data backfill
Schema DDL (`ensure_*`) and data backfill (`sync_*`) live in one module, including ~250 lines of
near-identical `UPDATE` blocks in `sync_tenant_ids()`. Alembic will absorb most of this into
versioned schema steps + explicit data-migration steps.
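
As an illustration of how near-identical `UPDATE` blocks can collapse into one explicit data step, here is a minimal table-driven sketch using the standard-library `sqlite3` module. The table names, the default tenant value, and the `backfill_tenant_ids` helper are all hypothetical; the real `sync_tenant_ids()` logic may differ per table.

```python
import sqlite3

# Hypothetical subset of the tenant-scoped tables and a hypothetical default value.
TENANT_TABLES = ("mix", "product")
DEFAULT_TENANT_ID = "default-tenant"


def backfill_tenant_ids(conn: sqlite3.Connection) -> int:
    """Set tenant_id on rows where it is NULL; return the number of rows updated."""
    updated = 0
    # Table names come from the fixed tuple above, never from user input,
    # so interpolating them into the statement is safe here.
    for table in TENANT_TABLES:
        cursor = conn.execute(
            f"UPDATE {table} SET tenant_id = ? WHERE tenant_id IS NULL",
            (DEFAULT_TENANT_ID,),
        )
        updated += cursor.rowcount
    return updated
```

Because the step only touches rows where `tenant_id IS NULL`, running it a second time updates nothing, which is the property a versioned data migration needs.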

---

### P2 — hygiene

- **P2.1** — `backend/tests/_repro_throughput_post.py` is a debug repro, not a real test. Remove.
- **P2.2** — ~20 `TODO`/`FIXME`/legacy markers across `backend/app` and `frontend/src`. Triage and burn down.
- **P2.3** — `backend/app/seed.py` is 1325 LOC. Split by module.
- **P2.4** — `CLAUDE.md` still says "PostgreSQL recommended / SQLite acceptable only for prototype",
  stale vs the live Postgres migration. Refresh.

---

## P0.4 — Three overlapping authentication / user-type systems

The app has accreted **three separate auth systems**, with **two cookies**, **three login
endpoints**, **three role namespaces**, and two parallel permission models. They overlap awkwardly
and one of them is already dead in the UI. This is the single largest piece of structural debt in
the codebase.

### The three systems

**1. Internal / "lean" system** — `users` / `roles` / `permissions` / `role_permissions`
- Code: `app/core/access.py`, `app/models/access.py`, `app/api/access.py` (`/api/access/*`),
  `app/seed_access.py`.
- Per-user password hash (`User.password_hash`, PBKDF2). Role → permission keys
  (`view_raw_materials`, `edit_products`, …). Fail-closed `require_permission(...)` dependencies.
- Token carries `sub=INTERNAL_USER_SUBJECT`; session `role="internal"`.
- Tenant is **hardcoded** to a constant: `INTERNAL_USER_TENANT_ID = "hunter-premium-produce"`
  (`core/access.py:37`).
- Uses `CLIENT_AUTH_COOKIE`.
- **This is the actual primary login.** The root page (`routes/+page.svelte:57`) calls
  `api.internalLogin()` → `/api/access/login`.

**2. Client-portal system** — `client_accounts` / `client_users` / `client_feature_access` /
`client_user_module_permissions` / `client_access_audit_events`
- Code: `app/models/client_access.py`, `app/services/client_access_service.py`,
  `app/api/auth.py` (`/api/auth/client/*`), `app/api/client_access.py`.
- **Multi-tenant** (`tenant_id` on every table), per-account feature flags, per-user module
  access levels (`none`/`view`/`edit`/`manage`), `client_role` in {superadmin, admin, viewer, …}.
- Session `role="client"`. Uses `CLIENT_AUTH_COOKIE`.
- **Authentication is broken-by-design: a single shared password.** `client_login` checks
  `payload.password != settings.client_password` (`api/auth.py:79`) — *one* password for *all*
  client users; the per-user record only supplies identity, not a credential.
- **The login UI is dead.** `api.clientLogin` is referenced only by `api.ts` (definition) and
  `api.test.ts` — no component calls it. The endpoints, tables, tenant plumbing, and the
  `require_client_session` / `module_access_map` path are all still live and still wired into the
  shared route dependencies.

**3. Admin system** — environment-variable single credential
- Code: `app/api/auth.py` (`/api/auth/admin/*`), `require_admin_session` in `app/api/deps.py`.
- No DB row. Authenticates against `settings.admin_email` / `settings.admin_password`.
  Session `role="admin"` → **blanket access** (`session.ts:105` `hasModuleAccess` returns `true`
  for admin; `require_admin_session` gates the admin-only routes).
- Uses a **second cookie**, `ADMIN_AUTH_COOKIE`.
- Drives the separate `/admin` + `/admin/client-access` UI (`routes/admin/+page.svelte` calls
  `api.adminLogin()`), which exists to manage client users / feature flags / Power BI preview —
  i.e. the "management behind the scenes" layer we no longer want.

### How they tangle

- **Frontend** routes by URL: `/admin*` → `AdminShell`, everything else → `ClientShell`
  (`routes/+layout.svelte:15,42-49`). Two `localStorage` session stores
  (`data-entry-app-client-session`, `data-entry-app-admin-session`) in `session.ts`.
- **Shared route deps bend to accept two token shapes.** `require_client_session` and
  `require_client_module_access` (`api/deps.py:97-184`) special-case `role=="internal"` to skip the
  `ClientUser` DB lookup and read permissions from a role-derived map, while still supporting
  `role=="client"`. `core/access.py:_PERMISSION_TO_MODULE_LEVEL` exists purely to translate the
  internal permission keys into the legacy client module/level shape so the same routes accept both.
- **Two permission models run in parallel**: role→permission-keys (internal) vs
  per-user module→access-level rows + per-account feature flags (client). `permissions_to_module_map`
  bridges them.
- **`tenant_id` is smeared across ~25 tables and ~25 backend files** (heaviest:
  `db/migrations.py` 70 refs, `seed.py` 52, `api/product_costing.py` 36, plus every service/model),
  for multi-tenancy we no longer want.

### Target architecture (per product direction)

> One login for everyone. User type `lean` = full access. `client` = its own permissions.
> No multi-tenant. No separate behind-the-scenes management app, except `lean` may have a few
> extra settings (e.g. change logo).

- **One login endpoint + one login page** for all users.
- **One user store**: keep `users` / `roles` / `permissions` / `role_permissions`. Everyone is a
  `User` with a role. Add a **`lean`** role = all permissions (full access, including the extra
  settings like logo). Define a **`client`** role with its own permission set. Operations etc. stay
  as additional roles.
- **One cookie**, one session shape, one frontend session store.
- **Remove multi-tenancy**: drop `tenant_id` from models/queries/migrations/seed; collapse to a
  single implicit tenant.
- **Retire the env-var admin login** and the **dead client-portal login** + its tables/service,
  folding any still-needed capability (e.g. managing users) into permission-gated routes inside the
  single app. `lean`-only settings (logo, etc.) become permission-gated, not a separate shell.
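
The single role-to-permissions model above can be pictured with a small sketch. It is deliberately framework-free: `check_permission` is a hypothetical stand-in for the app's `require_permission(...)` dependency, and `edit_settings` is an invented key for the `lean`-only settings (only `view_raw_materials` and `edit_products` appear in this audit).

```python
# Hypothetical permission catalogue; "edit_settings" is illustrative only.
ALL_PERMISSIONS = frozenset({"view_raw_materials", "edit_products", "edit_settings"})

# One table for every user type: lean gets everything, client gets its own subset.
ROLE_PERMISSIONS = {
    "lean": ALL_PERMISSIONS,
    "client": frozenset({"view_raw_materials"}),
}


class PermissionDenied(Exception):
    """Raised when a role does not hold the requested permission."""


def check_permission(role: str, permission: str) -> None:
    """Fail closed: unknown roles and unknown permissions are both rejected."""
    granted = ROLE_PERMISSIONS.get(role, frozenset())
    if permission not in granted:
        raise PermissionDenied(f"{role!r} lacks {permission!r}")


check_permission("lean", "edit_settings")  # passes silently
```

With this shape there is no separate admin or client code path: a new user type is one more entry in the role table rather than another login system.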

### Decoupling / migration approach (proposed)

1. **Confirm the dead path is dead** (done: `clientLogin` has no UI caller) and snapshot any client
   data worth keeping (`client_users`, module permissions) so it can be re-expressed as `users` +
   `roles` if needed.
2. **Unify on the `users`/`roles`/`permissions` model.** Introduce `lean` and `client` roles in
   `seed_access.py` with the right permission sets. Migrate any real client users into `users`.
3. **Single login**: make `/api/access/login` the only login; one cookie; one session store; one
   login page. Remove `/api/auth/admin/*`, `/api/auth/client/*`, `ADMIN_AUTH_COOKIE`, the
   admin/client localStorage split, and the `/admin*` shell routing (fold any surviving admin
   screens into permission-gated routes in the main app).
4. **Collapse the dual permission model**: drop `_PERMISSION_TO_MODULE_LEVEL` bridging and the
   `role=="internal"` / `role=="client"` special-casing in `deps.py`; every route depends on
   `require_permission(...)` (or a thin module-level wrapper) only.
5. **Drop multi-tenancy**: Alembic migration removing `tenant_id` columns (or leaving them nullable
   and unused first, then dropping), plus removing `tenant_id` filters from services/queries and the
   `sync_tenant_ids` backfill. **Sequence this on top of Phase 1 (Alembic)** so the column drops are
   versioned and run identically on SQLite and Postgres.
6. **Delete the client-portal subsystem** once nothing references it: `models/client_access.py`,
   `client_access_service.py`, `api/auth.py`, `api/client_access.py`, related schemas, and the
   `ClientShell`/`AdminShell` split.

### Risk notes

- This touches **authentication** — stage it carefully behind tests; do not delete the old endpoints
  until the unified login is proven in dev against both SQLite and (a Postgres copy of) prod.
- Replacing the shared password (`P0.4`/system 2) and the env-admin credential should be treated as
  a **security cleanup**, not just a structural one: per-user hashed passwords for everyone is the
  target.
- Dropping `tenant_id` is irreversible data-wise — do it as a dedicated, reviewed Alembic step with
  a backup, after the login unification has settled.
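
To make the difference between a shared password and per-user hashed credentials concrete, here is a minimal PBKDF2 sketch using only `hashlib`, `hmac`, and `os`. The iteration count, salt size, and helper names are assumptions for illustration; the parameters actually used for `User.password_hash` are not stated in this audit.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor; tune to the deployment's hardware
SALT_BYTES = 16


def hash_password(password, salt=None):
    """Return (salt, digest) for a password, generating a fresh salt if none is given."""
    salt = salt or os.urandom(SALT_BYTES)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)
```

Each user stores their own salt and digest, so one leaked or guessed password no longer unlocks every account, which is the failure mode of the shared `settings.client_password` check.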

---

## Remediation plan (phased)

### Phase 0 — Safety net (no behavior change)
- Add a schema-parity smoke test: fresh-SQLite `create_all` vs `Base.metadata` so later phases
  cannot silently drift.
- Remove `backend/tests/_repro_throughput_post.py`.
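
A schema-parity check of this kind can be sketched with the standard library alone. In the snippet below, the `DECLARED` mapping is a hypothetical stand-in for what `Base.metadata` declares, and the table and column sets are illustrative; a real test would read them from the models instead.

```python
import sqlite3

# Hypothetical stand-in for the columns declared on the models (Base.metadata in the app).
DECLARED = {
    "mix": {"id", "name", "quantity_kg"},
    "product": {"id", "name", "wholesale_price"},
}


def actual_columns(conn: sqlite3.Connection, table: str) -> set:
    """Column names SQLite reports for a table (empty set if the table is missing)."""
    # Table names come from the trusted DECLARED mapping, not from user input.
    return {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}


def schema_drift(conn: sqlite3.Connection) -> dict:
    """Map each drifting table to columns that are declared-but-missing or present-but-undeclared."""
    drift = {}
    for table, declared in DECLARED.items():
        difference = declared ^ actual_columns(conn, table)
        if difference:
            drift[table] = difference
    return drift
```

A smoke test would then assert that `schema_drift(conn)` is empty against a freshly created database, so any later phase that changes a model without a matching migration fails loudly.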

### Phase 1 — Adopt Alembic *(foundation; most moving parts)*
- Add `alembic` to `backend/pyproject.toml`; `alembic init`.
- Wire `env.py` to read `DATABASE_URL` (via `app.core.config.settings`) and `Base.metadata`.
- **Critical for this setup:** enable `render_as_batch=True` in `env.py` so that `ALTER` operations
  run in batch mode on **SQLite (Windows dev)**, whose `ALTER TABLE` support is limited;
  Postgres handles `ALTER` natively.
- Autogenerate a **`0001_baseline`** migration from current models.
- `alembic stamp 0001_baseline` on existing dev **and** prod DBs so they are recognized without rebuilding.
- Fold `_LEGACY_COLUMN_PATCHES`, `sync_tenant_ids`, and `sync_product_visibility` into versioned
  migrations (schema steps + explicit data-migration steps).
- Replace the startup `bootstrap_schema(...)` call (`main.py:105`) with `alembic upgrade head`
  (or an explicit deploy step).
- Update `deploy/migrate-to-postgres.sh` **Phase 5** to run `alembic upgrade head` instead of
  calling `bootstrap_schema`.
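
To show why batch mode matters on SQLite, the sketch below performs by hand the kind of copy-and-rename that a batch migration automates: SQLite cannot change a column's type in place, so the table is rebuilt. It uses only the standard-library `sqlite3` module; the `product` table and its `cost` column are hypothetical, and this is an illustration of the idea rather than Alembic's actual generated SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, cost REAL)")  # hypothetical table
conn.execute("INSERT INTO product (cost) VALUES (1.25)")

# SQLite has no ALTER COLUMN ... TYPE, so a type change is done by
# creating a new table, copying the rows across, and swapping the names.
conn.executescript(
    """
    CREATE TABLE product_new (id INTEGER PRIMARY KEY, cost NUMERIC(12,4));
    INSERT INTO product_new (id, cost) SELECT id, cost FROM product;
    DROP TABLE product;
    ALTER TABLE product_new RENAME TO product;
    """
)

# Inspect the result: declared column types and the surviving rows.
declared_type = {row[1]: row[2] for row in conn.execute("PRAGMA table_info(product)")}
rows = conn.execute("SELECT id, cost FROM product").fetchall()
```

On Postgres the same change is a single `ALTER TABLE ... ALTER COLUMN ... TYPE` statement, which is why one migration file has to be able to render differently for the two backends.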

### Phase 2 — Money correctness
- `Float → Numeric(12, 4)` (tune per field) across the money/quantity columns listed in P0.1.
- Use `Decimal` in `services/costing_engine.py` and `services/product_costing_service.py`.
- Dedicated Alembic migration; guard with `tests/test_costing_engine.py` formula-parity tests.
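
One way to keep the calculation engine aligned with a `Numeric(12, 4)` column is to normalize every incoming value at the boundary. The helper below is a minimal sketch: the name `to_money` and the choice of `ROUND_HALF_UP` are assumptions, and the rounding mode in particular should be confirmed against the existing formula-parity tests.

```python
from decimal import Decimal, ROUND_HALF_UP

FOUR_PLACES = Decimal("0.0001")  # matches the scale of Numeric(12, 4)


def to_money(value) -> Decimal:
    """Convert an int, float, str, or Decimal to a Decimal rounded to 4 decimal places."""
    # Going through str() turns a float such as 0.1 into Decimal("0.1")
    # instead of its long binary expansion.
    return Decimal(str(value)).quantize(FOUR_PLACES, rounding=ROUND_HALF_UP)


total = to_money(0.1) + to_money(0.2)
print(total)  # 0.3000
```

Values that pass through such a helper have the same scale in Python as they do in the database, so a round trip through storage does not change them.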

### Phase 3 — Frontend shared utils
- New `frontend/src/lib/format.ts`: `formatDate`, `formatNumber`, `formatCurrency`, `toNum`.
- Replace the 9 / 5 / 2 duplicate implementations. Eliminates the `toNum`-style bug class.

### Phase 4 — Remove mock-on-error fallback *(quick, high-value correctness fix)*
- Remove the `fallback` return path in `api.ts` `fetchJson`.
- Surface real API errors / empty states in the UI.
- Keep `mock.ts` for tests only.

### Phase 5 — Unify authentication & user types *(addresses P0.4; large, cross-cutting)*
Sits on top of Phase 1 (Alembic) because the column drops must be versioned. Order within the phase:
1. Snapshot/migrate any real client users into `users` + `roles`; add `lean` and `client` roles in
   `seed_access.py`.
2. Single login: make `/api/access/login` the only login; one cookie; one session store; one login
   page. Remove `/api/auth/admin/*`, `/api/auth/client/*`, `ADMIN_AUTH_COOKIE`, and the `/admin*`
   shell split.
3. Collapse the dual permission model — every route on `require_permission(...)`; delete the
   `internal`/`client` special-casing and `_PERMISSION_TO_MODULE_LEVEL` bridge in `deps.py`/`access.py`.
4. Drop multi-tenancy (`tenant_id`) via a dedicated Alembic migration + query cleanup; remove
   `sync_tenant_ids`.
5. Delete the dead client-portal subsystem (`models/client_access.py`, `client_access_service.py`,
   `api/auth.py`, `api/client_access.py`, `AdminShell`).
6. `lean`-only extras (logo change, etc.) become permission-gated settings in the single app.

### Phase 6 — Decompose monolith routes
- Incrementally extract components + `+page.ts` load logic. Start with dashboard (`+page.svelte`)
  and product-costing.

### Phase 7 — Hygiene
- Burn down `TODO`/legacy markers.
- Split `seed.py` by module.
- Refresh `CLAUDE.md` DB guidance.

---

## Suggested sequencing note
Phase 1 (Alembic) is the foundation the SQLite-dev / Postgres-prod split most depends on. However,
**Phase 4 (mock-on-error)** is the scariest correctness bug and a ~20-minute fix — a reasonable
quick win to do first, before Phase 1.

## Progress log
- 2026-06-03 — Audit completed; plan written. `toNum` bug fixed in
  `routes/throughput/+page.svelte` and `routes/throughput/add/+page.svelte` (precursor to Phase 3).
- 2026-06-03 — Auth/user-type investigation added (P0.4 + Phase 5). Found three overlapping auth
  systems; the client-portal login (`/api/auth/client/login`, shared password) is already dead in
  the UI (`clientLogin` has no component caller). Target: single login, `lean`/`client` roles, no
  multi-tenant, no separate admin shell.
- 2026-06-04 — Phase 4 quick win started: removed production `api.ts` mock-on-error fallback so
  failed reads throw normalized API errors instead of returning fabricated mock pricing/costing data.
  Removed `backend/tests/_repro_throughput_post.py` debug repro file.
- 2026-06-04 — Phase 0 safety net started: added a fresh SQLite schema smoke test that checks
  model metadata tables and columns are created as declared.
- 2026-06-04 — Phase 3 shared utils started: added `frontend/src/lib/format.ts`, covered it with
  unit tests, and replaced the duplicated `toNum` helper plus the mix-calculator/throughput number
  and date formatters touched in recent work.