gw-svelte/docs/server-side-analytics.md
2026-05-26 23:30:22 +12:00

# Server-side analytics + visitor journey
Two things in one pipeline:
1. **Ad-block-resistant GA4** — forward events server-to-server when client-side `gtag.js` is blocked.
2. **Visitor journey reconstruction** — record every event into our own DB, and when a visitor submits the booking form, link their journey to that submission so the owner can review it in the CP dashboard.
## Why each piece exists
### Ad-block fallback
Ad blockers and privacy tools (uBlock, Brave, Safari ITP, AdGuard, Pi-hole) block requests to `googletagmanager.com` and `google-analytics.com`. For NZ consumer traffic that's roughly **20–40% of visits silently lost**. The fix is a first-party endpoint on our own domain that blocklists don't match, which forwards to GA4 via the Measurement Protocol.
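As a rough illustration of the server-to-server forward, the sketch below builds a Measurement Protocol request. The endpoint and the `client_id` / `events` payload shape follow GA4's public Measurement Protocol; the helper name `buildGa4Request`, its arguments, and the choice to reuse the anon ID as `client_id` are assumptions for illustration, not a copy of what `/api/track` actually does.

```typescript
// Hypothetical helper: names and argument order are illustrative only.
interface Ga4Event {
  name: string;
  params?: Record<string, string | number>;
}

interface Ga4Request {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildGa4Request(
  measurementId: string, // e.g. the value of GA4_MEASUREMENT_ID
  apiSecret: string,     // e.g. the value of GA4_API_SECRET
  clientId: string,      // assumption: the anonymous browser ID
  events: Ga4Event[],
): Ga4Request {
  // The Measurement Protocol takes both credentials as query parameters.
  const query = new URLSearchParams({
    measurement_id: measurementId,
    api_secret: apiSecret,
  });
  return {
    url: `https://www.google-analytics.com/mp/collect?${query.toString()}`,
    init: {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({ client_id: clientId, events }),
    },
  };
}
```

The result can be passed straight to `fetch(url, init)` from server code. Keeping the builder separate from the network call makes it easy to unit-test without hitting GA4.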
### Journey reconstruction
GA4 is aggregate. The owner can see "200 hero CTA clicks this week" but not "this specific submission's journey was /pricing → /about → hero CTA → form." For a small services business, knowing what a *specific lead* engaged with before submitting is more useful than another aggregate dashboard.
## Architecture
```
Browser ── trackEvent() ──┬─► gtag (when not blocked) ─► GA4
                          └─► /api/track (always) ─┬─► session_events table
                                                   └─► GA4 (only when gtag missing)

Browser also keeps a rolling sessionStorage buffer of the last 30 events
as a fallback (in case /api/track is itself blocked at the network layer).

On booking form submit success:
  Browser ── promoteJourney(email) ──► /api/track/promote
                                       ├─► reads session_events for this anon_id
                                       ├─► reads sessionStorage buffer from request body
                                       └─► writes one row to submission_journeys

Owner opens enquiry in CP dashboard:
  AdminDashboard ── /api/owner/client-enquiry?email=... ──► mail-api
                                                            ├─► returns enquiry record
                                                            └─► returns submission_journeys row
```
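The rolling 30-event buffer in the diagram could look roughly like the sketch below. The storage key `journey_buffer`, the event shape, and both function names are hypothetical; the real logic lives in `src/lib/analytics.ts` and may differ.

```typescript
// Assumed event shape; the real client may store more fields.
interface BufferedEvent {
  name: string;
  page_path: string;
  ts: number; // assumption: epoch milliseconds
}

const MAX_BUFFERED_EVENTS = 30;
const BUFFER_KEY = "journey_buffer"; // hypothetical storage key

// Pure helper: append one event and keep only the newest 30.
function appendToBuffer(buffer: BufferedEvent[], event: BufferedEvent): BufferedEvent[] {
  return [...buffer, event].slice(-MAX_BUFFERED_EVENTS);
}

// Structural type so the same code works with window.sessionStorage or a test stub.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

function recordInStore(store: KeyValueStore, event: BufferedEvent): void {
  let current: BufferedEvent[] = [];
  try {
    const raw = store.getItem(BUFFER_KEY);
    const parsed: unknown = raw ? JSON.parse(raw) : [];
    if (Array.isArray(parsed)) current = parsed as BufferedEvent[];
  } catch (err) {
    // Corrupt JSON: start over with an empty buffer rather than throwing.
    current = [];
  }
  store.setItem(BUFFER_KEY, JSON.stringify(appendToBuffer(current, event)));
}
```

In the browser this would be called as `recordInStore(window.sessionStorage, event)` alongside the `/api/track` request, so the buffer still fills when that request is blocked.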
## Tables (`docker/postgres/init/004-session-events.sql`)
### `session_events`
Every analytics event, keyed by the `anonId` cookie set in `src/hooks.server.ts`. **Pruned after 24h** by a probabilistic cleanup inside `/api/track` (~1 in 200 inserts triggers a `DELETE WHERE created_at < now() - 24h`). No cron container needed — cleanup runs naturally with traffic.
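A minimal sketch of the probabilistic cleanup, assuming the decision is a single random draw per insert. `shouldPrune`, `pruneCutoff`, and `PRUNE_SQL` are illustrative names; the SQL uses the `session_events` table and `created_at` column named in this document, but the exact statement in `/api/track` may differ.

```typescript
const PRUNE_PROBABILITY = 1 / 200;         // ~1 in 200 inserts triggers cleanup
const RETENTION_MS = 24 * 60 * 60 * 1000;  // 24 hours

// The random source is injectable so the behaviour can be tested deterministically.
function shouldPrune(random: () => number = Math.random): boolean {
  return random() < PRUNE_PROBABILITY;
}

// Rows created before this instant are eligible for deletion.
function pruneCutoff(now: Date): Date {
  return new Date(now.getTime() - RETENTION_MS);
}

// Equivalent Postgres statement, letting the database compute the cutoff.
const PRUNE_SQL =
  "DELETE FROM session_events WHERE created_at < now() - interval '24 hours'";
```

With this approach a prune runs on average once every 200 inserts, so on a quiet day rows can outlive 24h until enough traffic arrives to trigger the next cleanup.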
### `submission_journeys`
Promoted journeys keyed by email. **Not auto-pruned.** Owner-facing data. Contains:
- `events` — snapshot of `session_events` rows at promotion time (server-captured)
- `client_events` — the sessionStorage buffer the client posted (fallback)
The merge happens in the CP UI (`AdminDashboard.mergedJourneyEvents`), de-duped by `name|page_path|ts`.
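The merge can be sketched as below. This is not the actual `mergedJourneyEvents` code: the function name, the event shape, the numeric `ts`, and the final sort by time are assumptions. Only the `name|page_path|ts` key comes from this document.

```typescript
// Assumed event shape shared by the server snapshot and the client buffer.
interface JourneyEvent {
  name: string;
  page_path: string;
  ts: number; // assumption: epoch milliseconds
}

function mergeJourneyEvents(
  serverEvents: JourneyEvent[],
  clientEvents: JourneyEvent[],
): JourneyEvent[] {
  const seen = new Set<string>();
  const merged: JourneyEvent[] = [];
  // Server events come first, so they win when both sides have the same key.
  for (const event of serverEvents.concat(clientEvents)) {
    const key = `${event.name}|${event.page_path}|${event.ts}`;
    if (seen.has(key)) continue;
    seen.add(key);
    merged.push(event);
  }
  // Present the journey in chronological order.
  return merged.sort((a, b) => a.ts - b.ts);
}
```

Note that this key only de-dupes if both sides record the same timestamp for an event; if the server stamps its own time on arrival, the same event would appear twice.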
## De-duplication
- **GA4** receives each event once, never twice: from the client when gtag is loaded, from the server when it isn't. The `forward_ga4` flag in the `/api/track` body controls this.
- **session_events** receives every event once (always written server-side).
- **Journey display** merges server + client events with key `name|page_path|ts`.
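To make the `forward_ga4` rule concrete, here is a small sketch of both sides of the decision. The body fields `name` and `params` match the curl example under "Testing locally"; the function names are hypothetical, and how the real client detects a blocked gtag is not described in this document, so it is passed in as a plain boolean here.

```typescript
interface TrackBody {
  name: string;
  params: Record<string, string | number>;
  forward_ga4: boolean;
}

// Client side: ask the server to forward only when gtag could not deliver the event.
function buildTrackBody(
  name: string,
  params: Record<string, string | number>,
  gtagLoaded: boolean,
): TrackBody {
  return { name, params, forward_ga4: !gtagLoaded };
}

// Server side: forward only when asked to and when an API secret is configured.
// The event is written to session_events regardless of this result.
function shouldForwardToGa4(body: TrackBody, apiSecret: string | undefined): boolean {
  return body.forward_ga4 === true && Boolean(apiSecret);
}
```

Because exactly one side sends to GA4 for any given event, GA4 never counts it twice, while `session_events` gets a row either way.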
## Privacy
Disclosed in `src/lib/content/privacy-policy.ts` under the **Analytics** section. The key promises:
- Browsing record contains pages, clicks, timestamps, and a random browser ID — never name/email/phone or form contents.
- Unsubmitted journeys are deleted within 24h.
- Submitted journeys are linked to the enquiry email, visible only to the Goodwalk team, never shared or used for advertising.
- Users can request deletion at info@goodwalk.co.nz.
**Update the policy in the same PR** if you ever change what's stored or how long.
## Configuration
```bash
GA4_MEASUREMENT_ID=G-K7TLSFJVP1 # already in deploy.env.template
GA4_API_SECRET=<from GA4 admin> # required for the GA4 forward
```
To get the API secret: GA4 admin → Data Streams → web stream → Measurement Protocol API secrets → Create. Without it, `/api/track` still records to `session_events` (journey works) — only the GA4 forward is off.
## Files
- `src/routes/api/track/+server.ts` — main ingest, persists + forwards
- `src/routes/api/track/promote/+server.ts` — links journey to submission email
- `src/lib/analytics.ts` — client `trackEvent`, sessionStorage buffer, `promoteJourney(email)`
- `src/lib/components/BookingWizard.svelte` — calls `promoteJourney(email)` on submit success
- `mail-api/db.py` — `get_submission_journey(email)` reader
- `mail-api/main.py` — `/owner/client-enquiry` returns `{enquiry, journey}`
- `src/lib/components/admin-dashboard/AdminDashboard.svelte` — renders the **Visitor journey** section in the enquiry modal
- `src/lib/content/privacy-policy.ts` — disclosure
- `docker/postgres/init/004-session-events.sql` — table definitions
## Testing locally
Without env vars set (no GA4 forwarding):
```bash
curl -X POST http://localhost:5173/api/track \
  -H 'content-type: application/json' \
  -H 'user-agent: Mozilla/5.0' \
  -d '{"name":"test_event","params":{"label":"manual","page_path":"/"}}'
```
Then check the row landed:
```bash
docker exec -it goodwalk_svelte_db psql -U goodwalk -d goodwalk \
  -c "select event_name, page_path, created_at from session_events order by id desc limit 5;"
```
To test the full journey flow locally, submit a booking through the wizard with a test email, then open `cp.goodwalk.local` (or use `?preview=cp` on localhost), open the enquiry for that email, and the **Visitor journey** panel should list every page view and click that led to the submission.
## What this does NOT do
- **Meta Pixel / Facebook Ads** — same blocker problem, different fix (Conversions API). Not built.
- **Real-time owner notifications** — journey is visible only after submission, not as a live feed of who's on the site.
- **Cross-device journey** — anon_id is per-browser. A visitor who researches on phone then submits on laptop produces two separate (mostly empty) journeys.
- **Consent banner** — NZ has no explicit cookie law today. If we ever serve EU/UK traffic, we need Consent Mode v2 before this pipeline is legal there for the GA4 forward.
- **Pruning of `submission_journeys`** — these are kept indefinitely. If you want a max retention (e.g. delete journeys older than 12 months), add a cron or extend the probabilistic cleanup in `/api/track`.