2026-04-20 15:22:31 +12:00

SHEQ Analysis Tool

A local Python web application that loads three SHEQ data sources — Events, Safety Energy, and LLC Data — and produces a comprehensive DOCX safety performance report suitable for executive and board-level reporting.

What the Tool Does

The tool has two modes:

Events Explorer — interactive browser-based charts for filtering and exploring incident data in real time.
Full Safety Report — a one-click DOCX report covering ten analysis sections:
- Executive Summary
- Data Quality and Coverage
- Events Analysis (full-window trends, type breakdown, CRP, root causes, serious-event hotspots, motor vehicle insights)
- Safety Energy Leading Activity Overview (LLC / CCC / OCC trends, topics, leaders, two-year quality view)
- Effectiveness of Leading Activities (BU-level comparison, monthly correlation)
- At-Risk Behaviours (theme extraction from free text)
- Relationship Between Safety Energy and Events (monthly overlay, spike detection)
- Leader Focus Areas (declining BUs, activity gaps, high-volume / low-value hotspots)
- Recommended Actions (auto-generated from findings)
- Methodology and Caveats

The report now includes a dedicated rolling two-year Safety Energy trend and quality analysis focused on whether CCC, OCC, and LLC activity appears meaningful and informative, or whether parts of the dataset are drifting toward low-value administrative completion.

Required Input Files

Place all three files in the project root directory (or configure paths via environment variables):

File	Description
`Events.xlsx`	Incident and event records exported from the Ventia safety management system
`Safety_Energy.xlsx`	Combined leading activity export: LLC, CCC, and OCC records
`LLC_Data.xlsx`	Supplementary LLC export with richer free-text fields (topics, observations)

Expected File Structures

Events.xlsx — key columns used:

Column	Notes
`EventDate`	Date of event (accepts "Monday, 25 March 2024" or ISO format)
`EventType` / `Event Type`	Category (Injury/Illness, Motor Vehicle, Close Call, etc.)
`Actual Consequence`	Negligible / Minor / Moderate / Major / Substantial
`Status`	Open / Closed
`Business Unit`	Organisational unit
`Project`	Project name
`CRP Involved`	Critical Risk Protocol(s) involved
`Root Cause Category`	Top-level root cause
`Ventia Injury Classification`	FAT / LTI / MTI etc.
`Bodily Location`	Comma-separated body parts
`Brief Description`	Free-text (used for theme extraction)
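
The two date formats noted for `EventDate` could be handled along these lines. This is a minimal sketch using only the standard library; `DATE_FORMATS` and `parse_event_date` are hypothetical names, and the actual loader in data_loader.py may work differently (for example via pandas, where an unparseable value becomes NaT rather than `None`).

```python
from datetime import datetime
from typing import Optional

# Assumed list of accepted formats; the real loader may accept others.
DATE_FORMATS = (
    "%A, %d %B %Y",        # long form, e.g. "Monday, 25 March 2024"
    "%Y-%m-%d",            # ISO date, e.g. "2024-03-25"
    "%Y-%m-%d %H:%M:%S",   # ISO date with time
)


def parse_event_date(value: str) -> Optional[datetime]:
    """Try each known format in turn; return None when none match."""
    text = value.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None


print(parse_event_date("Monday, 25 March 2024"))
print(parse_event_date("2024-03-25"))
print(parse_event_date("25/03/24"))  # unsupported format
```

Rows whose date cannot be parsed would then be excluded from date-based analysis, which is why unusual formats reduce row counts.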

Safety_Energy.xlsx — key columns used:

Column	Notes
`EventDate`	Date of activity
`ModuleType`	`Leader Learning Conversation` / `Critical Control Check` / `Operational Control Check`
`ModuleName`	Specific activity name
`ModulePrefix`	Short code (LLC, CCC1, OCC2, etc.)
`CompletedByName`	Leader who conducted the activity
`Business Unit`	Organisational unit
`Project`	Project name
`At Risk Aspects`	Count of at-risk items identified
`Total Questions`	Total checklist items assessed
`Actions`	Number of corrective actions raised

Additional Safety_Energy fields are now used when available to improve quality and theme analysis, including:

Immediate Actions Taken / Comments
Instruction
Top practices
Top improvement opportunities
Review & Action
Best practices shared with site leaders
Activity/Task
Was a critical risk identified and controls verified as effective and in place?
Specific Location / Location
Shift

LLC_Data.xlsx — key columns used:

Column	Notes
`EventDate`	Date conducted
`LLC Topic`	Conversation topic
`Conducted by`	Leader name
`Business Unit`	Organisational unit
`CRP in Focus`	CRP discussed during the LLC
`At risk work practices observed`	Flag count
`At risk situation/observation`	Free-text description

How Safety Energy is Interpreted

Safety Energy is treated as the combined analytical domain covering all leading activity types:

LLC (Leader Learning Conversation): A structured conversation between a leader and a worker or work group, focused on safety topics, risk identification, and critical controls.
CCC (Critical Control Check): A field verification that critical controls for high-risk activities are in place and effective.
OCC (Operational Control Check): A broader operational inspection covering a range of risk topics.

Note on "OCC" labelling: In some legacy documentation, the term "OCC" was used broadly to cover items now separated into CCC and OCC in the current Safety_Energy export. The current Safety_Energy.xlsx file correctly separates these using the ModuleType column. No manual deduplication is required. This decision is documented in config.py.
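
As an illustration of how the ModuleType column separates the three activity types, the sketch below maps each ModuleType value to its short code and counts records per type. `MODULE_TYPE_TO_CODE`, `activity_code`, and `count_by_activity` are hypothetical names for this example, not the identifiers used in config.py.

```python
from collections import Counter

# ModuleType values as listed for Safety_Energy.xlsx; the mapping name is assumed.
MODULE_TYPE_TO_CODE = {
    "Leader Learning Conversation": "LLC",
    "Critical Control Check": "CCC",
    "Operational Control Check": "OCC",
}


def activity_code(module_type: str) -> str:
    """Return LLC / CCC / OCC, or 'Unknown' for an unrecognised ModuleType."""
    return MODULE_TYPE_TO_CODE.get(module_type.strip(), "Unknown")


def count_by_activity(rows):
    """Count records per activity type from dict-like rows."""
    return Counter(activity_code(row["ModuleType"]) for row in rows)


sample_rows = [
    {"ModuleType": "Critical Control Check"},
    {"ModuleType": "Operational Control Check"},
    {"ModuleType": "Critical Control Check"},
    {"ModuleType": "Leader Learning Conversation"},
]
print(count_by_activity(sample_rows))
```

Because each row carries exactly one ModuleType, a CCC record is never also counted as an OCC record, which is why no manual deduplication is needed.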

LLC_Data and Safety_Energy are complementary exports. Safety_Energy provides authoritative counts for all three activity types. LLC_Data provides richer free-text content for topic and theme analysis. Where both contain LLC records, they are used independently for their respective strengths.

What Was Added

The analysis engine has been expanded to add a rolling two-year Safety Energy review that goes beyond activity counts and looks at likely activity value.

New outputs include:

monthly and quarterly activity mix across LLC / CCC / OCC
year-on-year change indicators by activity type
monthly quality trend lines by activity type
recurring themes and rising / declining focus areas over the last two years
CCC-specific recurring module analysis
Business Unit snapshots showing where quality appears stronger or weaker
identification of high-volume / low-value hotspots
leadership watchouts focused on shallow, repetitive, reactive, or low-follow-up records
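
A year-on-year change indicator of the kind listed above can be computed as a simple percentage change per activity type. The sketch below is an assumption about the arithmetic, not the tool's actual code; `year_on_year_change` is a hypothetical helper, and it returns `None` where there is no prior-year baseline.

```python
def year_on_year_change(current, previous):
    """Percentage change per activity type between two 12-month counts.

    Returns None for a type with no prior-year records, since a
    percentage change from zero is undefined.
    """
    changes = {}
    for activity in sorted(set(current) | set(previous)):
        prev = previous.get(activity, 0)
        cur = current.get(activity, 0)
        changes[activity] = None if prev == 0 else (cur - prev) / prev * 100.0
    return changes


latest_12_months = {"LLC": 120, "CCC": 90, "OCC": 50}
prior_12_months = {"LLC": 100, "CCC": 100}
print(year_on_year_change(latest_12_months, prior_12_months))
```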

This is intended to help answer questions such as:

What are our CCCs really telling us?
Are CCCs / OCCs / LLCs surfacing meaningful risk and learning?
Where do records look preventive and high value?
Where does the dataset suggest compliance-only behaviour?

How Events is Compared Against Leading Activities

The analysis engine compares Safety Energy data against Events on three levels:

Business Unit level: Total activities and total events per BU are tabulated. BUs with high activity counts and low event counts are flagged as positive patterns; BUs with high activity counts and high event counts are flagged for review (possible reactive patterns).
Monthly level: Monthly activity counts and monthly event counts are plotted together on a dual-axis chart. Periods where events spike while activities are below average are flagged as spike months.
Theme level: LLC conversation topics are compared against event root causes and free-text descriptions. Gaps between what is being discussed in LLCs and what is actually causing events are surfaced as alignment gaps.
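
The monthly-level check can be sketched as follows. The exact definition of a "spike" is not stated here, so this example assumes a spike month is one where the event count is above its monthly average while the activity count is below its monthly average; `find_spike_months` is a hypothetical function name.

```python
from statistics import mean


def find_spike_months(events_by_month, activities_by_month):
    """Return months where events are above average and activities below average.

    Only months present in both inputs are compared. The 'above the mean'
    rule for a spike is an assumption made for this sketch.
    """
    months = sorted(set(events_by_month) & set(activities_by_month))
    if not months:
        return []
    avg_events = mean(events_by_month[m] for m in months)
    avg_activities = mean(activities_by_month[m] for m in months)
    return [
        m for m in months
        if events_by_month[m] > avg_events and activities_by_month[m] < avg_activities
    ]


events = {"2024-01": 4, "2024-02": 9, "2024-03": 5}
activities = {"2024-01": 30, "2024-02": 12, "2024-03": 33}
print(find_spike_months(events, activities))  # ['2024-02']
```

A stricter rule (for example, events more than one standard deviation above the mean) would flag fewer months; the right threshold depends on how noisy the monthly counts are.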

How to Run Locally

Prerequisites

Python 3.10+

Install dependencies

pip install -r requirements.txt

Place data files

Copy Events.xlsx, Safety_Energy.xlsx, and LLC_Data.xlsx into the project root.

Start the application

python app.py

Open http://localhost:5000 in your browser.

Using the Events Explorer

Adjust the date range and filter selections in the left sidebar.
Click Apply Filters — charts load in the main panel.

Generating the Full Report

In the sidebar under Full Safety Report, set:
- Analysis Start Date — earliest date to include (e.g. 2024-01-01)
Click Download Full Report.
The app loads all three files, runs the analysis (typically 20–60 seconds), and downloads a .docx file to your browser's download folder.

The full report now automatically computes a rolling two-year Safety Energy window ending on the latest date in Safety_Energy.xlsx. This deeper trend view runs alongside the existing broader report logic.

Environment Variables

Override default file paths without editing code:

Variable	Default	Description
`SHEQ_EVENTS_FILE`	`Events.xlsx`	Path to Events file
`SHEQ_SE_FILE`	`Safety_Energy.xlsx`	Path to Safety Energy file
`SHEQ_LLC_FILE`	`LLC_Data.xlsx`	Path to LLC Data file
`SHEQ_OUTPUT_DIR`	`output/`	Directory for generated reports and charts

Example:

SHEQ_EVENTS_FILE=data/Events_2025.xlsx python app.py
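
For reference, an override of this kind is typically read with a default fallback. The sketch below uses the variable names and defaults from the table above; `resolve_paths` is a hypothetical helper and may not match how config.py actually reads them.

```python
import os
from pathlib import Path


def resolve_paths(env=None):
    """Return input and output paths, preferring environment overrides."""
    env = os.environ if env is None else env
    return {
        "events": Path(env.get("SHEQ_EVENTS_FILE", "Events.xlsx")),
        "safety_energy": Path(env.get("SHEQ_SE_FILE", "Safety_Energy.xlsx")),
        "llc": Path(env.get("SHEQ_LLC_FILE", "LLC_Data.xlsx")),
        "output_dir": Path(env.get("SHEQ_OUTPUT_DIR", "output/")),
    }


# Passing a plain dict instead of os.environ makes the behaviour easy to check.
print(resolve_paths({"SHEQ_EVENTS_FILE": "data/Events_2025.xlsx"}))
```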

Project Structure

sheq/
├── app.py                  # Flask web application (routes and server)
├── config.py               # Column mappings, constants, brand colours
├── data_loader.py          # Load and normalise all three data sources
├── analysis_engine.py      # Analysis logic (trends, effectiveness, at-risk themes)
├── report_builder.py       # DOCX report generation
├── analysis.py             # Legacy Events-only report (preserved for backwards compatibility)
├── requirements.txt        # Python dependencies
├── DESIGN.md               # Ventia brand guidelines (typography, colours)
├── templates/
│   └── index.html          # Web UI
├── static/                 # Static assets (if any)
└── output/                 # Generated reports land here (gitignored)

Sample Output

The generated DOCX includes:

Title page with data coverage dates
Executive Summary with full-window event KPIs and Safety Energy totals
Data Quality tables showing row counts, date coverage, and null rates
Events Analysis — monthly trend chart, consequence breakdown, root causes, serious-event hotspots, timing, and motor vehicle insights
Safety Energy Overview — activity mix donut, monthly stacked bar, BU breakdown, LLC topics, CRP focus, top leaders, and two-year quality view
Effectiveness — monthly overlay chart (activities vs events), BU comparison table, correlation note
At-Risk Behaviours — combined theme frequency chart, LLC vs events theme comparison, alignment gaps
Safety Energy ↔ Events Relationship — BU activity-to-event ratio table, spike months, topic alignment
Leader Focus Areas — declining activity BUs, BU summary table
Recommended Actions — auto-generated list based on findings
Methodology & Caveats — data source descriptions, activity type definitions, analytical approach

All charts and tables follow the Ventia brand colour palette and Source Sans Pro typography as specified in DESIGN.md.

Additional report content now includes:

rolling two-year quality trend chart for LLC / CCC / OCC
quality summary table by activity type
top recurring Safety Energy themes
CCC / OCC / LLC value signals
high-volume / low-value hotspot chart and table
leadership watchouts derived from two-year patterns

How the Two-Year Trend Analysis Works

The two-year analysis is anchored to the latest Safety_Energy.xlsx record and looks back 24 calendar months. If the dataset contains fewer than 24 months, the tool uses the available period and reports the actual window used.
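
The window logic described above can be sketched with the standard library. `subtract_months` and `two_year_window` are hypothetical names; the default of 24 mirrors the TWO_YEAR_WINDOW_MONTHS setting, and clamping the day to the end of a shorter month is an assumption of this example.

```python
import calendar
from datetime import date


def subtract_months(d: date, months: int) -> date:
    """Step back whole calendar months, clamping the day to the month's length."""
    total = d.year * 12 + (d.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def two_year_window(record_dates, window_months: int = 24):
    """Return (start, end) anchored to the latest record date.

    If the data covers less than the full window, the start falls back
    to the earliest available date, so the actual window is reported.
    """
    latest = max(record_dates)
    earliest = min(record_dates)
    start = max(subtract_months(latest, window_months), earliest)
    return start, latest


print(two_year_window([date(2022, 1, 10), date(2025, 3, 31)]))
print(two_year_window([date(2024, 6, 1), date(2025, 3, 31)]))
```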

Data points used

The deeper analysis looks across more than just headline counts. Depending on which fields are populated, it uses:

activity type, module name, module prefix
business unit, project, location, shift, leader
at-risk aspects, total questions, actions, ATL actions
at-risk CRP and critical-risk verification fields
LLC topic, at-risk observations, positive observations
immediate actions / comments, instructions, review & action notes
top practices and top improvement opportunities
free-text narrative fields and repeated wording patterns

How quality is inferred

“Quality” is a proxy score, not an audit result. The tool scores each Safety Energy record using a weighted blend of signals such as:

text richness: longer, more descriptive entries score higher
specificity: records with more unique wording, concrete detail, and named themes score higher
input depth: rows with more meaningfully populated fields across observations, actions, topics, and context score higher as a supporting signal
action orientation: actions raised, close-out wording, and action verbs lift the score
learning evidence: coaching, feedback, lesson, or best-practice wording lifts the score
hazard / risk recognition: at-risk aspects, critical-risk language, and control verification lift the score
follow-up depth: review, monitor, close-out, owner, or escalation language lifts the score
low-value indicators: generic wording, very short entries, and duplicated narratives reduce the score

The score is then used to classify records into broad bands such as:

High value
Meaningful
Mixed
Shallow

These bands are intended to guide leadership attention, not replace manual review of the underlying entries.
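
To make the weighted blend and banding concrete, here is a minimal sketch. The weights, the penalty, and the band thresholds are illustrative assumptions only; the actual values live in config.py (see QUALITY_SCORE_BANDS) and are not reproduced here. Each signal is assumed to be pre-scaled to the range 0 to 1.

```python
# Hypothetical weights for the positive signals (they sum to 1.0).
SIGNAL_WEIGHTS = {
    "text_richness": 0.20,
    "specificity": 0.20,
    "input_depth": 0.10,
    "action_orientation": 0.15,
    "learning_evidence": 0.10,
    "risk_recognition": 0.15,
    "follow_up_depth": 0.10,
}
LOW_VALUE_PENALTY = 0.30  # hypothetical weight for low-value indicators

# Hypothetical thresholds, checked from highest to lowest.
BANDS = [(0.75, "High value"), (0.50, "Meaningful"), (0.25, "Mixed"), (0.0, "Shallow")]


def quality_score(signals):
    """Blend 0-1 signals into a 0-1 proxy score, minus a low-value penalty."""
    positive = sum(w * signals.get(name, 0.0) for name, w in SIGNAL_WEIGHTS.items())
    penalty = LOW_VALUE_PENALTY * signals.get("low_value", 0.0)
    return max(0.0, min(1.0, positive - penalty))


def quality_band(score):
    """Map a score to the first band whose threshold it meets."""
    for threshold, label in BANDS:
        if score >= threshold:
            return label
    return "Shallow"


record = {"text_richness": 1.0, "specificity": 1.0}
print(quality_band(quality_score(record)))  # Mixed
```

Keeping the weights and thresholds in one place makes it straightforward to recalibrate the bands after a manual review of sample records.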

The tool now also calculates a separate input depth metric for each Safety Energy row. This measures how many useful inputs are actually populated, after excluding empty, generic, or placeholder values. The report compares input depth against overall quality so leaders can see whether “more complete rows” are a practical supporting proxy for better-quality records.
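
One way to picture the input depth metric is as the share of tracked fields that hold a meaningful value. In this sketch, `PLACEHOLDER_VALUES`, `is_meaningful`, and `input_depth` are hypothetical, and the placeholder list is an assumption rather than the tool's actual list.

```python
# Assumed set of values treated as empty, generic, or placeholder.
PLACEHOLDER_VALUES = {"", "n/a", "na", "nil", "none", "-", "ok", "no issues"}


def is_meaningful(value) -> bool:
    """True when a field holds something other than an empty or placeholder value."""
    if value is None:
        return False
    return str(value).strip().lower() not in PLACEHOLDER_VALUES


def input_depth(row, fields) -> float:
    """Share of the given fields that are meaningfully populated (0.0 to 1.0)."""
    if not fields:
        return 0.0
    populated = sum(1 for field in fields if is_meaningful(row.get(field)))
    return populated / len(fields)


row = {
    "Instruction": "N/A",
    "Top practices": "Spotter used during reversing",
    "Shift": "",
}
fields = ["Instruction", "Top practices", "Shift", "Activity/Task"]
print(input_depth(row, fields))  # 0.25
```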

What the two-year outputs are trying to detect

activity volume changes over time
whether activity mix is shifting toward or away from CCC / OCC / LLC
whether quality is improving or drifting
whether certain themes or modules keep reappearing without stronger evidence of learning
whether some teams produce high volumes of low-detail records
whether entries look more preventive, reactive, repetitive, or shallow
where leadership attention should go next

Key Questions the Tool Helps Answer

Are our leading activities effective, or do we have the same event rates despite high activity volumes?
Which Business Units have both high activity and high event counts (reactive pattern)?
Which Business Units have declining leading-activity engagement?
Which projects and locations appear strongest when Safety Energy activity is compared against event volume?
Which projects and locations are carrying the heaviest serious-event burden?
What time of day are serious events occurring?
What do the motor vehicle events tell us about road type, road condition, and vehicle mix?
Are the topics we discuss in LLCs aligned with the actual causes of events?
Which CRPs are being focused on in field conversations, and do they match the CRPs appearing in events?
Who are the most active leaders, and who may need engagement to increase their activity cadence?
In which months did events spike while activities were below average?
What at-risk behaviour themes are most prominent across all data sources?

Analysis Limitations

Correlation ≠ causation: Statistical associations between activity counts and event counts are indicative only and do not prove causal relationships.
Under-reporting: Activity counts depend on accurate data entry. Under-reporting in any source will affect all analyses that use that source.
Text analysis: At-risk theme extraction uses keyword matching only. Nuanced or ambiguously worded entries may be missed or miscategorised.
Quality scoring is inferential: The new leading-activity quality score is a practical proxy based on record content. It is useful for triage and trend monitoring, but it does not prove whether an individual activity was genuinely high quality in the field.
Business Unit comparisons: BUs vary in headcount, contract scope, and operational risk profile. Raw count comparisons should be interpreted in context.
Short time windows: Correlation analysis requires at least 4 overlapping months. Shorter windows will not produce a correlation result.
Date format variance: Dates in the source files may use long-form formats ("Monday, 25 March 2024"). The data loader handles these automatically, but unusual formats may result in NaT values and reduced row counts.
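
The short-time-window limitation can be illustrated with a small correlation helper. This sketch assumes a Pearson coefficient over the months present in both series, which is not confirmed here; `monthly_correlation` is a hypothetical name, and the default of 4 mirrors CORR_MIN_MONTHS.

```python
from math import sqrt

CORR_MIN_MONTHS = 4  # default stated in the Configuration section


def monthly_correlation(events_by_month, activities_by_month, min_months=CORR_MIN_MONTHS):
    """Pearson correlation over overlapping months, or None if it cannot be reported."""
    months = sorted(set(events_by_month) & set(activities_by_month))
    if len(months) < min_months:
        return None  # too few overlapping months
    xs = [activities_by_month[m] for m in months]
    ys = [events_by_month[m] for m in months]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # a constant series has no defined correlation
    return cov / sqrt(var_x * var_y)


activities = {"2024-01": 10, "2024-02": 20, "2024-03": 30, "2024-04": 40}
events = {"2024-01": 8, "2024-02": 6, "2024-03": 4, "2024-04": 2}
print(monthly_correlation(events, activities))
```

Even a strong coefficient from a handful of months is indicative only, for the reasons given in the first limitation above.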

Troubleshooting

Issue	Resolution
`FileNotFoundError` on report generation	Confirm all three .xlsx files are present in the project root (or check environment variable paths)
Report generates but charts are missing	Check `output/` folder for chart .png files; matplotlib may have failed silently — check terminal output
`ModuleNotFoundError`	Run `pip install -r requirements.txt`
Dates parsing as NaT	Open the xlsx in Excel and verify the date column format; the loader handles ISO and long-form formats
Empty sections in report	A section is empty when the relevant columns are absent or entirely null in the source data — check column names against `config.py`
"No overlapping data" in correlation	The date ranges of Events.xlsx and Safety_Energy.xlsx don't overlap; check the start_date parameter
App runs but filters return no data	The Events.xlsx date column name may differ — check `config.py` EVENTS_COL_MAP and adjust if needed
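
Several of the issues above come down to column names not matching. The sketch below shows one way a candidates-based column map could be resolved against the headers actually present in a file. The shape of `EVENTS_COL_MAP` shown here and the `resolve_columns` helper are assumptions for illustration; check config.py for the real structure.

```python
# Hypothetical shape: logical name -> list of candidate header names.
EVENTS_COL_MAP = {
    "event_date": ["EventDate"],
    "event_type": ["EventType", "Event Type"],
    "business_unit": ["Business Unit"],
}


def resolve_columns(available_headers, col_map):
    """Map each logical name to the first candidate found in the file headers.

    Matching ignores case and surrounding whitespace. Logical names with
    no matching header are simply left out of the result.
    """
    normalised = {name.strip().lower(): name for name in available_headers}
    resolved = {}
    for logical_name, candidates in col_map.items():
        for candidate in candidates:
            match = normalised.get(candidate.strip().lower())
            if match is not None:
                resolved[logical_name] = match
                break
    return resolved


headers = ["EventDate", "Event Type", "Status"]
print(resolve_columns(headers, EVENTS_COL_MAP))
```

Any logical name missing from the result points to a column that needs a new candidate added to the map.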

Configuration

All column-name mappings, file paths, brand colours, and analysis thresholds are in config.py.

Key settings to review:

EVENTS_COL_MAP — if Events column names change between exports, update the candidates list
SE_COL_MAP / LLC_COL_MAP — same for Safety Energy and LLC files
AT_RISK_KEYWORDS — add or edit keyword groups to tune theme extraction
TWO_YEAR_WINDOW_MONTHS — rolling window length for deeper Safety Energy trend analysis
QUALITY_SCORE_BANDS — thresholds used to label records as high value, meaningful, mixed, or shallow
LEADER_MIN_ACTIVITIES — threshold for flagging low-activity leaders (default: 5)
CORR_MIN_MONTHS — minimum months required before reporting a correlation (default: 4)
DEFAULT_START_DATE / DEFAULT_SPLIT_DATE — default date parameters in the UI
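
As an example of how keyword groups drive theme extraction, the sketch below tags free text with every theme whose keywords appear in it. The groups shown are invented for illustration and are not the contents of AT_RISK_KEYWORDS; `extract_themes` and `theme_frequency` are hypothetical helpers.

```python
from collections import Counter

# Illustrative keyword groups only; the real groups are defined in config.py.
EXAMPLE_KEYWORDS = {
    "Line of fire": ["line of fire", "struck by", "pinch point"],
    "Working at height": ["ladder", "harness", "fall from"],
    "Driving": ["speeding", "reversing", "seatbelt"],
}


def extract_themes(text, keyword_groups):
    """Return the themes whose keywords occur in the text (case-insensitive)."""
    lowered = text.lower()
    return [
        theme for theme, keywords in keyword_groups.items()
        if any(keyword in lowered for keyword in keywords)
    ]


def theme_frequency(texts, keyword_groups):
    """Count how many entries mention each theme."""
    counts = Counter()
    for text in texts:
        counts.update(extract_themes(text, keyword_groups))
    return counts


notes = [
    "Worker on ladder without harness",
    "Vehicle reversing near pinch point",
]
print(theme_frequency(notes, EXAMPLE_KEYWORDS))
```

Plain substring matching is simple to tune but will miss paraphrases and can over-match short keywords, which is the trade-off noted under Analysis Limitations.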
