2026-05-26 23:30:22 +12:00
parent 135a5a3b83
commit 91b22c6d60
27 changed files with 2401 additions and 88 deletions
@@ -27,6 +27,7 @@ Reference, audit, and planning documents for the Goodwalk site. Project-level ru
- [deployment.md](deployment.md) — production deploy flow, server layout, nginx cutover
- [webp-conversion.md](webp-conversion.md) — one-time WebP setup for hero images
- [onboarding.md](onboarding.md) — client onboarding flow, lifecycle status, legacy CSV migration, Postgres target schema
- [server-side-analytics.md](server-side-analytics.md) — first-party `/api/track` endpoint that forwards to GA4 when ad blockers kill client-side gtag
## Archive
@@ -0,0 +1,116 @@
# Server-side analytics + visitor journey
Two things in one pipeline:
1. **Ad-block-resistant GA4** — forward events server-to-server when client-side `gtag.js` is blocked.
2. **Visitor journey reconstruction** — record every event into our own DB, and when a visitor submits the booking form, link their journey to that submission so the owner can review it in the CP dashboard.
## Why each piece exists
### Ad-block fallback
Ad blockers and tracking-protection tools (uBlock, Brave, Safari ITP, AdGuard, Pi-hole) block requests to `googletagmanager.com` and `google-analytics.com`. For NZ consumer traffic that's roughly **20-40% of visits silently lost**. The fix is a first-party endpoint on our own domain that blocklists don't match, which forwards to GA4 via the Measurement Protocol.
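As a rough illustration of the forwarding step, the sketch below builds a Measurement Protocol request without sending it. The endpoint and the `client_id` / `events` body shape follow Google's published GA4 Measurement Protocol. The function name, the `TrackedEvent` type, and the choice to reuse the anonymous browser ID as `client_id` are assumptions made for this example, not a description of what `src/routes/api/track/+server.ts` actually does.

```typescript
// Hypothetical event shape; the real type in the codebase may differ.
interface TrackedEvent {
  name: string;
  params: Record<string, string | number>;
}

interface Ga4Request {
  url: string;
  body: string;
}

// Builds (but does not send) a GA4 Measurement Protocol request.
// Assumption: the anonymous browser ID doubles as GA4's client_id.
function buildGa4Request(
  measurementId: string,
  apiSecret: string,
  anonId: string,
  event: TrackedEvent
): Ga4Request {
  const url =
    'https://www.google-analytics.com/mp/collect' +
    `?measurement_id=${encodeURIComponent(measurementId)}` +
    `&api_secret=${encodeURIComponent(apiSecret)}`;
  const body = JSON.stringify({
    client_id: anonId,
    events: [{ name: event.name, params: event.params }]
  });
  return { url, body };
}
```

The server would then `POST` `body` to `url` with a JSON content type. Keeping the builder separate from the network call makes the payload easy to unit test.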
### Journey reconstruction
GA4 is aggregate. The owner can see "200 hero CTA clicks this week" but not "this specific submission's journey was /pricing → /about → hero CTA → form." For a small services business, knowing what a *specific lead* engaged with before submitting is more useful than another aggregate dashboard.
## Architecture
```
Browser ── trackEvent() ──┬─► gtag (when not blocked) ─► GA4
└─► /api/track (always) ─┬─► session_events table
└─► GA4 (only when gtag missing)
Browser also keeps a rolling sessionStorage buffer of the last 30 events
as a fallback (in case /api/track is itself blocked at the network layer).
On booking form submit success:
Browser ── promoteJourney(email) ──► /api/track/promote
├─► reads session_events for this anon_id
├─► reads sessionStorage buffer from request body
└─► writes one row to submission_journeys
Owner opens enquiry in CP dashboard:
AdminDashboard ── /api/owner/client-enquiry?email=... ──► mail-api
├─► returns enquiry record
└─► returns submission_journeys row
```
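The rolling sessionStorage buffer in the diagram could look roughly like the sketch below. The storage key, the `BufferedEvent` fields, and both function names are hypothetical. The store is passed in as a small interface (which `window.sessionStorage` satisfies) so the logic can run outside a browser.

```typescript
// Minimal storage interface; window.sessionStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Assumed event fields, chosen to match the name|page_path|ts de-dup key.
interface BufferedEvent {
  name: string;
  page_path: string;
  ts: number;
}

const BUFFER_KEY = 'gw_journey'; // hypothetical key name
const BUFFER_LIMIT = 30; // "last 30 events" per the diagram

function readBuffer(store: KeyValueStore): BufferedEvent[] {
  try {
    const raw = store.getItem(BUFFER_KEY);
    const parsed: unknown = raw ? JSON.parse(raw) : [];
    return Array.isArray(parsed) ? (parsed as BufferedEvent[]) : [];
  } catch {
    return []; // corrupt JSON or storage disabled: start fresh
  }
}

function pushToBuffer(store: KeyValueStore, event: BufferedEvent): BufferedEvent[] {
  // Append, then keep only the newest BUFFER_LIMIT entries.
  const next = [...readBuffer(store), event].slice(-BUFFER_LIMIT);
  try {
    store.setItem(BUFFER_KEY, JSON.stringify(next));
  } catch {
    // Quota exceeded or storage unavailable: analytics must never break the page.
  }
  return next;
}
```

On submit success, the client would send `readBuffer(sessionStorage)` in the body of the `/api/track/promote` request.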
## Tables (`docker/postgres/init/004-session-events.sql`)
### `session_events`
Every analytics event, keyed by the `anonId` cookie set in `src/hooks.server.ts`. **Pruned after 24h** by a probabilistic cleanup inside `/api/track`: roughly 1 in 200 inserts triggers a delete along the lines of `DELETE FROM session_events WHERE created_at < now() - interval '24 hours'`. No cron container is needed because cleanup runs naturally with traffic.
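A minimal sketch of the probabilistic trigger, assuming the handler decides per insert whether to prune. `cleanupCutoff` and both constants are hypothetical names. The random source is injectable so the decision can be tested deterministically.

```typescript
const CLEANUP_ODDS = 200; // roughly 1 in 200 inserts
const RETENTION_MS = 24 * 60 * 60 * 1000; // 24 hours

// Returns the cutoff timestamp when this insert should trigger a cleanup,
// or null when it should not. The caller would then delete rows with
// created_at older than the returned cutoff.
function cleanupCutoff(now: Date, random: () => number = Math.random): Date | null {
  if (random() >= 1 / CLEANUP_ODDS) return null;
  return new Date(now.getTime() - RETENTION_MS);
}
```

With these odds, a cleanup runs on average once per 200 inserts, so on a quiet day some rows can outlive the 24h mark until the next cleanup fires. That is worth bearing in mind alongside the retention promise in the privacy policy.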
### `submission_journeys`
Promoted journeys keyed by email. **Not auto-pruned.** Owner-facing data. Contains:
- `events` — snapshot of `session_events` rows at promotion time (server-captured)
- `client_events` — the sessionStorage buffer the client posted (fallback)
The merge happens in the CP UI (`AdminDashboard.mergedJourneyEvents`), de-duped by `name|page_path|ts`.
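The merge can be sketched as follows. This is an illustration of de-duplicating on `name|page_path|ts`, not the actual `mergedJourneyEvents` code. The `JourneyEvent` shape and the function name are assumptions, as is the choice to let the server copy win and to sort by timestamp.

```typescript
// Assumed minimal event shape; real rows likely carry more fields.
interface JourneyEvent {
  name: string;
  page_path: string;
  ts: number; // epoch milliseconds (assumed)
}

function mergeJourneyEvents(
  serverEvents: JourneyEvent[],
  clientEvents: JourneyEvent[]
): JourneyEvent[] {
  const seen = new Map<string, JourneyEvent>();
  // Server rows go first, so they win when both sides recorded the same event.
  for (const ev of [...serverEvents, ...clientEvents]) {
    const key = `${ev.name}|${ev.page_path}|${ev.ts}`;
    if (!seen.has(key)) seen.set(key, ev);
  }
  // Present the journey in chronological order.
  return Array.from(seen.values()).sort((a, b) => a.ts - b.ts);
}
```

Note that this key only collapses duplicates when both copies carry the same `ts`, which assumes the client-generated timestamp is the one stored server-side. If the server stamps its own time, the same event would appear twice.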
## De-duplication
- **GA4** receives each event once: from the client when gtag is loaded, from the server when it isn't. The `forward_ga4` flag in the `/api/track` body tells the server which case applies.
- **session_events** receives every event once (always written server-side).
- **Journey display** merges server + client events with key `name|page_path|ts`.
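The GA4 rule above can be sketched from the client's side as a small pure function. `buildTrackBody` and the `TrackBody` type are hypothetical. How the real client detects a blocked gtag is not stated in this document, so `gtagLoaded` is left as an input.

```typescript
// Assumed request body for /api/track; only forward_ga4 is named in the doc.
interface TrackBody {
  name: string;
  params: Record<string, string | number>;
  forward_ga4: boolean;
}

// When gtag is loaded, the browser reports to GA4 itself and the server must
// not forward. When it is missing, the server forwards on the browser's behalf.
function buildTrackBody(
  name: string,
  params: Record<string, string | number>,
  gtagLoaded: boolean
): TrackBody {
  return { name, params, forward_ga4: !gtagLoaded };
}
```

One caveat on detection: the standard GA snippet defines a `gtag()` stub inline, so checking `typeof gtag === 'function'` alone does not prove that `gtag.js` was fetched. A check that depends on the library having actually loaded is likely more reliable.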
## Privacy
Disclosed in `src/lib/content/privacy-policy.ts` under the **Analytics** section. The key promises:
- Browsing record contains pages, clicks, timestamps, and a random browser ID — never name/email/phone or form contents.
- Unsubmitted journeys are deleted within 24h.
- Submitted journeys are linked to the enquiry email, visible only to the Goodwalk team, never shared or used for advertising.
- Users can request deletion at info@goodwalk.co.nz.
**Update the policy in the same PR** if you ever change what's stored or how long.
## Configuration
```bash
GA4_MEASUREMENT_ID=G-K7TLSFJVP1 # already in deploy.env.template
GA4_API_SECRET=<from GA4 admin> # required for the GA4 forward
```
To get the API secret: GA4 admin → Data Streams → web stream → Measurement Protocol API secrets → Create. Without it, `/api/track` still records to `session_events` (journey works) — only the GA4 forward is off.
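The "record always, forward only when configured" behaviour could be gated with a helper like the one below. This is a sketch: `readGa4Config` and `Ga4Config` are hypothetical names, and the env object is passed in rather than read from a specific SvelteKit module.

```typescript
interface Ga4Config {
  measurementId: string;
  apiSecret: string;
}

// Returns null when either variable is missing or blank, which the caller
// can treat as "skip the GA4 forward, still write to session_events".
function readGa4Config(env: Record<string, string | undefined>): Ga4Config | null {
  const measurementId = (env['GA4_MEASUREMENT_ID'] ?? '').trim();
  const apiSecret = (env['GA4_API_SECRET'] ?? '').trim();
  if (measurementId === '' || apiSecret === '') return null;
  return { measurementId, apiSecret };
}
```

Returning `null` instead of throwing keeps a missing secret from breaking ingest, matching the behaviour described above.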
## Files
- `src/routes/api/track/+server.ts` — main ingest, persists + forwards
- `src/routes/api/track/promote/+server.ts` — links journey to submission email
- `src/lib/analytics.ts` — client `trackEvent`, sessionStorage buffer, `promoteJourney(email)`
- `src/lib/components/BookingWizard.svelte` — calls `promoteJourney(email)` on submit success
- `mail-api/db.py` — `get_submission_journey(email)` reader
- `mail-api/main.py` — `/owner/client-enquiry` returns `{enquiry, journey}`
- `src/lib/components/admin-dashboard/AdminDashboard.svelte` — renders the **Visitor journey** section in the enquiry modal
- `src/lib/content/privacy-policy.ts` — disclosure
- `docker/postgres/init/004-session-events.sql` — table definitions
## Testing locally
Without env vars set (no GA4 forwarding):
```bash
curl -X POST http://localhost:5173/api/track \
-H 'content-type: application/json' \
-H 'user-agent: Mozilla/5.0' \
-d '{"name":"test_event","params":{"label":"manual","page_path":"/"}}'
```
Then check the row landed:
```bash
docker exec -it goodwalk_svelte_db psql -U goodwalk -d goodwalk \
-c "select event_name, page_path, created_at from session_events order by id desc limit 5;"
```
To test the full journey flow locally, submit a booking through the wizard with a test email. Then open `cp.goodwalk.local` (or use `?preview=cp` on localhost) and open the enquiry for that email. The **Visitor journey** panel should list every page view and click that led to the submission.
## What this does NOT do
- **Meta Pixel / Facebook Ads** — same blocker problem, different fix (Conversions API). Not built.
- **Real-time owner notifications** — journey is visible only after submission, not as a live feed of who's on the site.
- **Cross-device journey** — anon_id is per-browser. A visitor who researches on phone then submits on laptop produces two separate (mostly empty) journeys.
- **Consent banner** — NZ has no explicit cookie law today. If we ever serve EU/UK traffic, we need Consent Mode v2 before this pipeline is legal there for the GA4 forward.
- **Pruning of `submission_journeys`** — these are kept indefinitely. If you want a max retention (e.g. delete journeys older than 12 months), add a cron or extend the probabilistic cleanup in `/api/track`.