SHEQ Analysis Tool

A local Python web application that loads three SHEQ data sources — Events, Safety Energy, and LLC Data — and produces a comprehensive DOCX safety performance report suitable for executive and board-level reporting.


What the Tool Does

The tool has two modes:

  1. Events Explorer — interactive browser-based charts for filtering and exploring incident data in real time.
  2. Full Safety Report — a one-click DOCX report covering ten analysis sections:
    • Executive Summary
    • Data Quality and Coverage
    • Events Analysis (full-window trends, type breakdown, CRP, root causes, serious-event hotspots, motor vehicle insights)
    • Safety Energy Leading Activity Overview (LLC / CCC / OCC trends, topics, leaders, two-year quality view)
    • Effectiveness of Leading Activities (BU-level comparison, monthly correlation)
    • At-Risk Behaviours (theme extraction from free text)
    • Relationship Between Safety Energy and Events (monthly overlay, spike detection)
    • Leader Focus Areas (declining BUs, activity gaps, high-volume / low-value hotspots)
    • Recommended Actions (auto-generated from findings)
    • Methodology and Caveats

The report now includes a dedicated rolling two-year Safety Energy trend and quality analysis focused on whether CCC, OCC, and LLC activity appears meaningful and informative, or whether parts of the dataset are drifting toward low-value administrative completion.


Required Input Files

Place all three files in the project root directory (or configure paths via environment variables):

| File | Description |
| --- | --- |
| Events.xlsx | Incident and event records exported from the Ventia safety management system |
| Safety_Energy.xlsx | Combined leading activity export: LLC, CCC, and OCC records |
| LLC_Data.xlsx | Supplementary LLC export with richer free-text fields (topics, observations) |

Expected File Structures

Events.xlsx — key columns used:

| Column | Notes |
| --- | --- |
| EventDate | Date of event (accepts "Monday, 25 March 2024" or ISO format) |
| EventType / Event Type | Category (Injury/Illness, Motor Vehicle, Close Call, etc.) |
| Actual Consequence | Negligible / Minor / Moderate / Major / Substantial |
| Status | Open / Closed |
| Business Unit | Organisational unit |
| Project | Project name |
| CRP Involved | Critical Risk Protocol(s) involved |
| Root Cause Category | Top-level root cause |
| Ventia Injury Classification | FAT / LTI / MTI, etc. |
| Bodily Location | Comma-separated body parts |
| Brief Description | Free-text (used for theme extraction) |
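
The two EventDate formats noted above could be handled along the following lines. This is a minimal sketch, not the loader's actual code: the function name `parse_event_date` and the `DATE_FORMATS` list are assumptions made for illustration, and the real logic in data_loader.py may differ (for example, it may rely on pandas).

```python
from datetime import datetime

# Formats tried in order. This list is an assumption for illustration only.
# Note: %A and %B are locale-dependent; an English locale is assumed here.
DATE_FORMATS = ("%A, %d %B %Y", "%Y-%m-%d", "%Y-%m-%dT%H:%M:%S")


def parse_event_date(value):
    """Return a datetime for a long-form or ISO date, or None if unparseable."""
    if isinstance(value, datetime):
        return value
    text = str(value).strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None  # plays the role of NaT in a pandas-based loader
```

Values that match none of the formats come back as `None`, which mirrors the reduced row counts described under Analysis Limitations.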

Safety_Energy.xlsx — key columns used:

| Column | Notes |
| --- | --- |
| EventDate | Date of activity |
| ModuleType | Leader Learning Conversation / Critical Control Check / Operational Control Check |
| ModuleName | Specific activity name |
| ModulePrefix | Short code (LLC, CCC1, OCC2, etc.) |
| CompletedByName | Leader who conducted the activity |
| Business Unit | Organisational unit |
| Project | Project name |
| At Risk Aspects | Count of at-risk items identified |
| Total Questions | Total checklist items assessed |
| Actions | Number of corrective actions raised |

Additional Safety_Energy fields are now used when available to improve quality and theme analysis, including:

  • Immediate Actions Taken / Comments
  • Instruction
  • Top practices
  • Top improvement opportunities
  • Review & Action
  • Best practices shared with site leaders
  • Activity/Task
  • Was a critical risk identified and controls verified as effective and in place?
  • Specific Location / Location
  • Shift

LLC_Data.xlsx — key columns used:

| Column | Notes |
| --- | --- |
| EventDate | Date conducted |
| LLC Topic | Conversation topic |
| Conducted by | Leader name |
| Business Unit | Organisational unit |
| CRP in Focus | CRP discussed during the LLC |
| At risk work practices observed | Flag count |
| At risk situation/observation | Free-text description |

How Safety Energy is Interpreted

Safety Energy is treated as the combined analytical domain covering all leading activity types:

  • LLC (Leader Learning Conversation): A structured conversation between a leader and a worker or work group, focused on safety topics, risk identification, and critical controls.
  • CCC (Critical Control Check): A field verification that critical controls for high-risk activities are in place and effective.
  • OCC (Operational Control Check): A broader operational inspection covering a range of risk topics.

Note on "OCC" labelling: In some legacy documentation, the term "OCC" was used broadly to cover items now separated into CCC and OCC in the current Safety_Energy export. The current Safety_Energy.xlsx file correctly separates these using the ModuleType column. No manual deduplication is required. This decision is documented in config.py.
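
As an illustration of separating activity types by ModuleType, the sketch below maps the three ModuleType values listed earlier to their short labels. The dictionary and function names are hypothetical; the actual mapping is documented in config.py and may be structured differently.

```python
# Hypothetical mapping for illustration; the real one is documented in config.py.
MODULE_TYPE_TO_ACTIVITY = {
    "leader learning conversation": "LLC",
    "critical control check": "CCC",
    "operational control check": "OCC",
}


def activity_type(module_type):
    """Return LLC / CCC / OCC for a ModuleType value, or None if unrecognised."""
    if not isinstance(module_type, str):
        return None  # blank cells typically arrive as None or NaN
    return MODULE_TYPE_TO_ACTIVITY.get(module_type.strip().lower())
```

Matching on a trimmed, lower-cased value is a design choice in this sketch so that minor whitespace or casing differences between exports do not produce unclassified rows.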

LLC_Data and Safety_Energy are complementary exports. Safety_Energy provides authoritative counts for all three activity types. LLC_Data provides richer free-text content for topic and theme analysis. Where both contain LLC records, they are used independently for their respective strengths.


What Was Added

The analysis engine has been expanded to add a rolling two-year Safety Energy review that goes beyond activity counts and looks at likely activity value.

New outputs include:

  • monthly and quarterly activity mix across LLC / CCC / OCC
  • year-on-year change indicators by activity type
  • monthly quality trend lines by activity type
  • recurring themes and rising / declining focus areas over the last two years
  • CCC-specific recurring module analysis
  • Business Unit snapshots showing where quality appears stronger or weaker
  • identification of high-volume / low-value hotspots
  • leadership watchouts focused on shallow, repetitive, reactive, or low-follow-up records

This is intended to help answer questions such as:

  • What are our CCCs really telling us?
  • Are CCCs / OCCs / LLCs surfacing meaningful risk and learning?
  • Where do records look preventive and high value?
  • Where does the dataset suggest compliance-only behaviour?

How Events is Compared Against Leading Activities

The analysis engine compares Safety Energy data against Events on three levels:

  1. Business Unit level: Total activities and total events per BU are tabulated. BUs with high activity counts and low event counts are flagged as positive patterns; BUs with high activity counts and high event counts are flagged for review (possible reactive patterns).

  2. Monthly level: Monthly activity counts and monthly event counts are plotted together on a dual-axis chart. Periods where events spike while activities are below average are flagged as spike months.

  3. Theme level: LLC conversation topics are compared against event root causes and free-text descriptions. Gaps between what is being discussed in LLCs and what is actually causing events are surfaced as alignment gaps.
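
The monthly-level check can be sketched as follows. The README does not define what counts as a "spike", so this example assumes a month spikes when its event count exceeds the mean by more than one population standard deviation; the function name, the `z` parameter, and the month-key format are all assumptions, not the analysis engine's actual rule.

```python
from statistics import mean, pstdev


def find_spike_months(events_by_month, activities_by_month, z=1.0):
    """Return months where events spike while activities are below average.

    Both arguments map a month key (e.g. "2024-03") to a count.
    Assumed spike rule: events > mean + z * population standard deviation.
    """
    months = sorted(set(events_by_month) & set(activities_by_month))
    if len(months) < 2:
        return []  # not enough overlap to define an average
    events = [events_by_month[m] for m in months]
    activities = [activities_by_month[m] for m in months]
    event_threshold = mean(events) + z * pstdev(events)
    activity_mean = mean(activities)
    return [
        m for m in months
        if events_by_month[m] > event_threshold
        and activities_by_month[m] < activity_mean
    ]
```

Only months present in both sources are compared, so a gap in either export silently shortens the comparison window.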


How to Run Locally

Prerequisites

Python 3.10+

Install dependencies

pip install -r requirements.txt

Place data files

Copy Events.xlsx, Safety_Energy.xlsx, and LLC_Data.xlsx into the project root.

Start the application

python app.py

Open http://localhost:5000 in your browser.

Using the Events Explorer

  1. Adjust the date range and filter selections in the left sidebar.
  2. Click Apply Filters — charts load in the main panel.

Generating the Full Report

  1. In the sidebar under Full Safety Report, set:
    • Analysis Start Date — earliest date to include (e.g. 2024-01-01)
  2. Click Download Full Report.
  3. The app loads all three files, runs the analysis (typically 20 to 60 seconds), and downloads a .docx file to your browser's download folder.

The full report now automatically computes a rolling two-year Safety Energy window ending on the latest date in Safety_Energy.xlsx. This deeper trend view runs alongside the existing broader report logic.

Environment Variables

Override default file paths without editing code:

| Variable | Default | Description |
| --- | --- | --- |
| SHEQ_EVENTS_FILE | Events.xlsx | Path to Events file |
| SHEQ_SE_FILE | Safety_Energy.xlsx | Path to Safety Energy file |
| SHEQ_LLC_FILE | LLC_Data.xlsx | Path to LLC Data file |
| SHEQ_OUTPUT_DIR | output/ | Directory for generated reports and charts |

Example:

SHEQ_EVENTS_FILE=data/Events_2025.xlsx python app.py
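
For readers who want to see how such overrides are commonly resolved, here is a minimal sketch. The variable names and defaults come from the table above; the `DEFAULTS` dictionary and the `resolve_paths` helper are hypothetical and are not taken from config.py.

```python
import os
from pathlib import Path

# Variable names and defaults as listed in the README table.
DEFAULTS = {
    "SHEQ_EVENTS_FILE": "Events.xlsx",
    "SHEQ_SE_FILE": "Safety_Energy.xlsx",
    "SHEQ_LLC_FILE": "LLC_Data.xlsx",
    "SHEQ_OUTPUT_DIR": "output/",
}


def resolve_paths(environ=None):
    """Return a dict of Paths, preferring environment values over defaults."""
    env = os.environ if environ is None else environ
    # `or default` also covers variables that are set but empty.
    return {name: Path(env.get(name) or default) for name, default in DEFAULTS.items()}
```

Passing a plain dictionary as `environ` makes the helper easy to exercise without touching the real environment.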

Project Structure

sheq/
├── app.py                  # Flask web application (routes and server)
├── config.py               # Column mappings, constants, brand colours
├── data_loader.py          # Load and normalise all three data sources
├── analysis_engine.py      # Analysis logic (trends, effectiveness, at-risk themes)
├── report_builder.py       # DOCX report generation
├── analysis.py             # Legacy Events-only report (preserved for backwards compatibility)
├── requirements.txt        # Python dependencies
├── DESIGN.md               # Ventia brand guidelines (typography, colours)
├── templates/
│   └── index.html          # Web UI
├── static/                 # Static assets (if any)
└── output/                 # Generated reports land here (gitignored)

Sample Output

The generated DOCX includes:

  1. Title page with data coverage dates
  2. Executive Summary with full-window event KPIs and Safety Energy totals
  3. Data Quality tables showing row counts, date coverage, and null rates
  4. Events Analysis — monthly trend chart, consequence breakdown, root causes, serious-event hotspots, timing, and motor vehicle insights
  5. Safety Energy Overview — activity mix donut, monthly stacked bar, BU breakdown, LLC topics, CRP focus, top leaders, and two-year quality view
  6. Effectiveness — monthly overlay chart (activities vs events), BU comparison table, correlation note
  7. At-Risk Behaviours — combined theme frequency chart, LLC vs events theme comparison, alignment gaps
  8. Safety Energy ↔ Events Relationship — BU activity-to-event ratio table, spike months, topic alignment
  9. Leader Focus Areas — declining activity BUs, BU summary table
  10. Recommended Actions — auto-generated list based on findings
  11. Methodology & Caveats — data source descriptions, activity type definitions, analytical approach

All charts and tables follow the Ventia brand colour palette and Source Sans Pro typography as specified in DESIGN.md.

Additional report content now includes:

  • rolling two-year quality trend chart for LLC / CCC / OCC
  • quality summary table by activity type
  • top recurring Safety Energy themes
  • CCC / OCC / LLC value signals
  • high-volume / low-value hotspot chart and table
  • leadership watchouts derived from two-year patterns

How the Two-Year Trend Analysis Works

The two-year analysis is anchored to the latest Safety_Energy.xlsx record and looks back 24 calendar months. If the dataset contains fewer than 24 months, the tool uses the available period and reports the actual window used.
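
The window selection described above might look like the sketch below. It assumes "24 calendar months" means the same day of the month 24 months earlier (clamped to the month's last day); the helper names are hypothetical and the tool's exact boundary handling may differ.

```python
import calendar
from datetime import date


def shift_months(d, months):
    """Shift a date by whole calendar months, clamping the day to the month's end."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    last_day = calendar.monthrange(year, month0 + 1)[1]
    return date(year, month0 + 1, min(d.day, last_day))


def rolling_window(record_dates, window_months=24):
    """Return (start, end) anchored to the latest record, or None if no dates.

    If the data covers less than the requested window, the start falls back
    to the earliest available record so the actual window can be reported.
    """
    dates = [d for d in record_dates if d is not None]
    if not dates:
        return None
    end = max(dates)
    start = max(shift_months(end, -window_months), min(dates))
    return start, end
```

Returning the actual start date, rather than the requested one, is what allows the report to state the window that was really used.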

Data points used

The deeper analysis looks across more than just headline counts. Depending on which fields are populated, it uses:

  • activity type, module name, module prefix
  • business unit, project, location, shift, leader
  • at-risk aspects, total questions, actions, ATL actions
  • at-risk CRP and critical-risk verification fields
  • LLC topic, at-risk observations, positive observations
  • immediate actions / comments, instructions, review & action notes
  • top practices and top improvement opportunities
  • free-text narrative fields and repeated wording patterns

How quality is inferred

“Quality” is a proxy score, not an audit result. The tool scores each Safety Energy record using a weighted blend of signals such as:

  • text richness: longer, more descriptive entries score higher
  • specificity: records with more unique wording, concrete detail, and named themes score higher
  • input depth: rows with more meaningfully populated fields across observations, actions, topics, and context score higher as a supporting signal
  • action orientation: actions raised, close-out wording, and action verbs lift the score
  • learning evidence: coaching, feedback, lesson, or best-practice wording lifts the score
  • hazard / risk recognition: at-risk aspects, critical-risk language, and control verification lift the score
  • follow-up depth: review, monitor, close-out, owner, or escalation language lifts the score
  • low-value indicators: generic wording, very short entries, and repeated duplicated narratives reduce the score

The score is then used to classify records into broad bands such as:

  • High value
  • Meaningful
  • Mixed
  • Shallow

These bands are intended to guide leadership attention, not replace manual review of the underlying entries.
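
To make the idea of a weighted blend and banding concrete, here is a minimal sketch. Every number in it is an assumption: the README names the signals and the bands but not the weights, the penalty, or the thresholds, which live in config.py (QUALITY_SCORE_BANDS) and the analysis engine.

```python
# Hypothetical weights for the signals named above; they sum to 1.0.
WEIGHTS = {
    "text_richness": 0.20,
    "specificity": 0.20,
    "input_depth": 0.10,
    "action_orientation": 0.15,
    "learning_evidence": 0.10,
    "risk_recognition": 0.15,
    "follow_up_depth": 0.10,
}
LOW_VALUE_PENALTY = 0.30  # hypothetical maximum deduction

# Hypothetical band floors, checked from highest to lowest.
BANDS = [(0.75, "High value"), (0.50, "Meaningful"), (0.25, "Mixed"), (0.0, "Shallow")]


def _clamp(x):
    return min(max(x, 0.0), 1.0)


def quality_score(signals, low_value=0.0):
    """Blend per-record signals (each 0..1) into a 0..1 proxy score."""
    positive = sum(w * _clamp(signals.get(name, 0.0)) for name, w in WEIGHTS.items())
    return _clamp(positive - LOW_VALUE_PENALTY * _clamp(low_value))


def quality_band(score):
    """Map a 0..1 score to one of the four bands."""
    for floor, label in BANDS:
        if score >= floor:
            return label
    return "Shallow"
```

In this sketch the low-value indicators act as a deduction rather than as another weighted term, so a record cannot earn credit merely for the absence of generic wording.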

The tool now also calculates a separate input depth metric for each Safety Energy row. This measures how many useful inputs are actually populated, after excluding empty, generic, or placeholder values. The report compares input depth against overall quality so leaders can see whether “more complete rows” are a practical supporting proxy for better-quality records.
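
An input depth measure of this kind can be sketched as the share of fields that hold a meaningful value. The placeholder list and function names below are assumptions for illustration; the tool's actual exclusion rules are not documented here.

```python
# Hypothetical placeholder values treated as "not populated".
PLACEHOLDERS = {"", "n/a", "na", "nil", "none", "-", "tbc", "."}


def is_meaningful(value):
    """Return True if a cell holds something other than empty or placeholder text."""
    if value is None:
        return False
    if isinstance(value, float) and value != value:  # NaN from a blank spreadsheet cell
        return False
    return str(value).strip().lower() not in PLACEHOLDERS


def input_depth(row, fields):
    """Return the fraction (0..1) of `fields` meaningfully populated in `row`."""
    if not fields:
        return 0.0
    filled = sum(1 for field in fields if is_meaningful(row.get(field)))
    return filled / len(fields)
```

For example, a row with a real Instruction and Shift but "N/A" in Top practices and nothing in Review & Action would score 0.5 across those four fields.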

What the two-year outputs are trying to detect

  • activity volume changes over time
  • whether activity mix is shifting toward or away from CCC / OCC / LLC
  • whether quality is improving or drifting
  • whether certain themes or modules keep reappearing without stronger evidence of learning
  • whether some teams produce high volumes of low-detail records
  • whether entries look more preventive, reactive, repetitive, or shallow
  • where leadership attention should go next

Key Questions the Tool Helps Answer

  • Are our leading activities effective, or do we have the same event rates despite high activity volumes?
  • Which Business Units have both high activity and high event counts (reactive pattern)?
  • Which Business Units have declining leading-activity engagement?
  • Which projects and locations appear strongest when Safety Energy activity is compared against event volume?
  • Which projects and locations are carrying the heaviest serious-event burden?
  • What time of day are serious events occurring?
  • What do the motor vehicle events tell us about road type, road condition, and vehicle mix?
  • Are the topics we discuss in LLCs aligned with the actual causes of events?
  • Which CRPs are being focused on in field conversations, and do they match the CRPs appearing in events?
  • Who are the most active leaders, and who may need engagement to increase their activity cadence?
  • In which months did events spike while activities were below average?
  • What at-risk behaviour themes are most prominent across all data sources?

Analysis Limitations

  • Correlation ≠ causation: Statistical associations between activity counts and event counts are indicative only and do not prove causal relationships.
  • Under-reporting: Activity counts depend on accurate data entry. Under-reporting in any source will affect all analyses that use that source.
  • Text analysis: At-risk theme extraction uses keyword matching only. Nuanced or ambiguously worded entries may be missed or miscategorised.
  • Quality scoring is inferential: The new leading-activity quality score is a practical proxy based on record content. It is useful for triage and trend monitoring, but it does not prove whether an individual activity was genuinely high quality in the field.
  • Business Unit comparisons: BUs vary in headcount, contract scope, and operational risk profile. Raw count comparisons should be interpreted in context.
  • Short time windows: Correlation analysis requires at least 4 overlapping months. Shorter windows will not produce a correlation result.
  • Date format variance: Dates in the source files may use long-form formats ("Monday, 25 March 2024"). The data loader handles these automatically, but unusual formats may result in NaT values and reduced row counts.
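
The keyword-matching limitation under Text analysis is easier to see with a small example. The keyword groups below are invented for illustration and are not the contents of AT_RISK_KEYWORDS in config.py; the matching rule (whole-word, case-insensitive) is likewise an assumption.

```python
import re
from collections import Counter

# Invented keyword groups for illustration only.
EXAMPLE_KEYWORDS = {
    "Working at height": ["ladder", "scaffold", "harness"],
    "Vehicles and driving": ["reversing", "speeding", "seatbelt"],
    "PPE": ["gloves", "glasses", "hard hat"],
}


def extract_themes(text, keywords=EXAMPLE_KEYWORDS):
    """Return the set of themes whose keywords appear as whole words in `text`."""
    lowered = text.lower()
    return {
        theme
        for theme, words in keywords.items()
        if any(re.search(r"\b" + re.escape(w) + r"\b", lowered) for w in words)
    }


def theme_counts(texts, keywords=EXAMPLE_KEYWORDS):
    """Count how many entries mention each theme, skipping non-text cells."""
    counts = Counter()
    for text in texts:
        if isinstance(text, str):
            counts.update(extract_themes(text, keywords))
    return counts
```

With these groups, "Worker on ladder without gloves" matches two themes, while "Two ladders left unsecured" matches none because the plural is not in the keyword list. That is the kind of miss the limitation describes.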

Troubleshooting

| Issue | Resolution |
| --- | --- |
| FileNotFoundError on report generation | Confirm all three .xlsx files are present in the project root (or check environment variable paths) |
| Report generates but charts are missing | Check output/ folder for chart .png files; matplotlib may have failed silently, so check terminal output |
| ModuleNotFoundError | Run pip install -r requirements.txt |
| Dates parsing as NaT | Open the xlsx in Excel and verify the date column format; the loader handles ISO and long-form formats |
| Empty sections in report | A section is empty when the relevant columns are absent or entirely null in the source data. Check column names against config.py |
| "No overlapping data" in correlation | The date ranges of Events.xlsx and Safety_Energy.xlsx don't overlap. Check the start_date parameter |
| App runs but filters return no data | The Events.xlsx date column name may differ. Check config.py EVENTS_COL_MAP and adjust if needed |
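
When column names are the suspected cause, it can help to check which candidate names actually resolve against a file's headers. The sketch below assumes a column map shaped as canonical name to list of candidate headers; that shape, the example entries, and the `resolve_columns` helper are all assumptions, so compare against the real EVENTS_COL_MAP in config.py before relying on it.

```python
# Assumed shape: canonical name -> candidate header names, in priority order.
EXAMPLE_COL_MAP = {
    "event_date": ["EventDate", "Event Date"],
    "event_type": ["EventType", "Event Type"],
    "business_unit": ["Business Unit", "BusinessUnit"],
}


def resolve_columns(available, col_map):
    """Return {canonical: actual header} for the first matching candidate.

    Matching ignores case and surrounding whitespace. Canonical names with
    no matching header are left out, which makes gaps easy to spot.
    """
    lookup = {name.strip().lower(): name for name in available}
    resolved = {}
    for canonical, candidates in col_map.items():
        for candidate in candidates:
            actual = lookup.get(candidate.strip().lower())
            if actual is not None:
                resolved[canonical] = actual
                break
    return resolved
```

Any canonical name missing from the result points to a header that needs adding to the candidates list.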

Configuration

All column-name mappings, file paths, brand colours, and analysis thresholds are in config.py.

Key settings to review:

  • EVENTS_COL_MAP — if Events column names change between exports, update the candidates list
  • SE_COL_MAP / LLC_COL_MAP — same for Safety Energy and LLC files
  • AT_RISK_KEYWORDS — add or edit keyword groups to tune theme extraction
  • TWO_YEAR_WINDOW_MONTHS — rolling window length for deeper Safety Energy trend analysis
  • QUALITY_SCORE_BANDS — thresholds used to label records as high value, meaningful, mixed, or shallow
  • LEADER_MIN_ACTIVITIES — threshold for flagging low-activity leaders (default: 5)
  • CORR_MIN_MONTHS — minimum months required before reporting a correlation (default: 4)
  • DEFAULT_START_DATE / DEFAULT_SPLIT_DATE — default date parameters in the UI
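
As an illustration of how a threshold such as CORR_MIN_MONTHS gates the correlation result, the sketch below computes a Pearson coefficient only when enough months overlap. The README does not state which correlation coefficient the tool uses, so Pearson is an assumption here, as is the function name; the default of 4 comes from the setting above.

```python
from math import sqrt

CORR_MIN_MONTHS = 4  # default stated in the settings list above


def monthly_correlation(activities_by_month, events_by_month, min_months=CORR_MIN_MONTHS):
    """Return Pearson r over overlapping months, or None if it cannot be reported."""
    months = sorted(set(activities_by_month) & set(events_by_month))
    if len(months) < min_months:
        return None  # too few overlapping months
    xs = [activities_by_month[m] for m in months]
    ys = [events_by_month[m] for m in months]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None  # a constant series has no defined correlation
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)
```

Returning `None` rather than zero keeps "no result" distinct from "no relationship", which matters when the report decides whether to print a correlation note.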