sheq-analysis-tool/__pycache__/data_loader.cpython-314.pyc
2026-04-20 15:23:18 +12:00
data_loader.py — Load and normalise the three SHEQ data sources.
Each loader returns a pandas DataFrame with normalised column names
(defined in config.py) so that downstream analysis code is insulated
from changes to the source file schema.
Public API
----------
load_events(filepath) -> pd.DataFrame
load_safety_energy(filepath) -> pd.DataFrame
load_llc_data(filepath) -> pd.DataFrame
load_all(events_path, se_path, llc_path) -> dict[str, pd.DataFrame]

[Compiled bytecode. Only embedded names and strings can be recovered. Module-level names: annotations (from __future__), Path, Optional, a module-level logger named log, and the configuration names EVENTS_COL_MAP, SE_COL_MAP, LLC_COL_MAP, MODULE_TYPE_LABELS, EVENTS_FILE, SAFETY_ENERGY_FILE and LLC_FILE. A warnings filter with action "ignore" and module "openpyxl" silences openpyxl warnings.]

_resolve_col(df: pd.DataFrame, candidates: list[str], key: str) -> Optional[str]
Return the first candidate column that exists in df, or None.
If no candidate matches, it logs at debug level: "Column key '%s' not found (tried: %s)".

_parse_dates(series: pd.Series) -> pd.Series
Parse a date series that may contain:
- ISO strings "2024-01-15"
- Long-form strings "Monday, 15 January 2024"
- Excel datetime objects
Returns a tz-naive datetime64 series; unparseable values become NaT.
NcóÄ\P!V4'd\P#\V4P 4pRV9d\\ VP
R4^,P
44^8Xd(VP
R^4^,P 4p\P!VRR7# \d\Pu#i;i)Ú,T)Údayfirst) ÚpdÚisnaÚNaTrÚstripÚlenÚsplitÚ to_datetimeÚ Exception)ÚvalÚss& rÚ
_parse_oneÚ _parse_dates.<locals>._parse_oneAsÜ
7Š73<Š<Ü—66ˆ ‹HN‰NÓ ˆà !Œ8œ˜AŸG™G C›L¨Ô˜˜Q“ Õ*ˆ Ü—>> !¨dÔ 3øÜô Ü—6‘6ŠMð úsÂ)CÃCÃC)r+ÚapiÚtypesÚis_datetime64_any_dtypeÚdtÚtzÚ tz_localizeÚmap)r%r5s& rÚ _parse_datesr>6sZô
‡vv‡||×+¨F×3Ø.4¯i©i¯l©lÒ.Fˆvy‰y×$ RÈFÐ
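Below is a minimal sketch of these parsing rules. It is reconstructed from the names in the bytecode, not from the original source. The name parse_dates_sketch is hypothetical, and the comma handling (keep only the text after the first comma) is an assumption about how the weekday is dropped. The final pd.to_datetime call is an addition that guarantees a datetime64 dtype.

```python
import pandas as pd


def parse_dates_sketch(series: pd.Series) -> pd.Series:
    """Approximation of _parse_dates: return a tz-naive datetime64 series."""
    # Already datetime-like (e.g. Excel cells read by openpyxl): only drop any timezone.
    if pd.api.types.is_datetime64_any_dtype(series):
        return series.dt.tz_localize(None) if series.dt.tz is not None else series

    def _parse_one(val):
        if pd.isna(val):
            return pd.NaT
        s = str(val).strip()
        # Assumption: "Monday, 15 January 2024" -> "15 January 2024".
        if "," in s:
            s = s.split(",", 1)[1].strip()
        try:
            return pd.to_datetime(s, dayfirst=True)
        except Exception:  # any unparseable value becomes NaT
            return pd.NaT

    # map() yields Timestamps/NaT; to_datetime makes the datetime64 dtype explicit.
    return pd.to_datetime(series.map(_parse_one))
```

With dayfirst=True, an ambiguous string such as "03/04/2024" is read as 3 April, which matches the New Zealand day-month convention of the long-form examples.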
ð :‰: !rcó$V^8„dQhRRRRRR/#)rrrÚcol_mapzdict[str, list[str]]rr)rs"rrrPs"÷ñˆÐ&:ð¸rcóÐVP4pVP4FAwr4\WV4pVeW0P9d
W,W#&K1VfK7W,W#&KC V#)a:
Build a new DataFrame with normalised column names.
For each key in col_map, find the first matching source column and
copy it to a column named after the key. Source columns are not
dropped: the result starts as a copy of df, so the original columns
remain available under their original names, allowing callers to
access additional fields if needed.
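The candidate lookup used by _resolve_col and _remap can be sketched as below. EXAMPLE_COL_MAP and its header names are hypothetical (the real maps live in config.py and are not shown here). The function names resolve_col and remap are stand-ins. The loop copies each matched column under its normalised name and keeps the originals, which is what the df.copy() call visible in the bytecode suggests.

```python
import logging
from typing import Optional

import pandas as pd

log = logging.getLogger(__name__)

# Hypothetical map: normalised name -> candidate source headers, in priority order.
EXAMPLE_COL_MAP: dict[str, list[str]] = {
    "date": ["Event Date", "Date"],
    "business_unit": ["Business Unit", "BU"],
    "hipo": ["HiPo", "High Potential"],
}


def resolve_col(df: pd.DataFrame, candidates: list[str], key: str) -> Optional[str]:
    """Return the first candidate column present in df, or None."""
    for c in candidates:
        if c in df.columns:
            return c
    log.debug("Column key '%s' not found (tried: %s)", key, candidates)
    return None


def remap(df: pd.DataFrame, col_map: dict[str, list[str]]) -> pd.DataFrame:
    """Copy each resolved source column under its normalised name."""
    result = df.copy()  # original columns are kept alongside the new ones
    for norm_name, candidates in col_map.items():
        src = resolve_col(df, candidates, norm_name)
        if src is not None:
            result[norm_name] = df[src]
    return result
```

A key with no matching header (hipo in this example) is simply absent from the result. Later steps therefore have to check whether a column exists before touching it.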

_null_rate(series: pd.Series) -> float
Return fraction of null / empty values (0–1), computed as series.isna().mean().

_profile(df: pd.DataFrame, label: str) -> dict
Return a simple quality profile dict for logging, with the keys source, rows, cols and date_nulls (the null rate of the date column).
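Here is a sketch of the two quality helpers. It assumes _null_rate is series.isna().mean(), as the bytecode suggests. That means only missing values are counted; empty strings count only after they have been converted to NA, as the loaders do for their text columns. The names null_rate and profile are hypothetical stand-ins.

```python
import pandas as pd


def null_rate(series: pd.Series) -> float:
    """Fraction of missing values (0-1); assumed to be series.isna().mean()."""
    return float(series.isna().mean())


def profile(df: pd.DataFrame, label: str) -> dict:
    """Small quality summary for logging, mirroring the keys of _profile."""
    return {
        "source": label,
        "rows": len(df),
        "cols": len(df.columns),
        # An empty object Series stands in when there is no "date" column.
        "date_nulls": null_rate(df.get("date", pd.Series(dtype="object"))),
    }
```

For a frame with no date column, date_nulls comes out as NaN because the mean of an empty Series is NaN. It does not come out as 0.0.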

load_events(filepath) -> pd.DataFrame
Load Events.xlsx and return a normalised DataFrame.
Normalised columns (see EVENTS_COL_MAP):
date, event_type, consequence, status, business_unit, project,
location, crp, root_cause_cat, root_cause_sub, injury_class,
body_part, brief_desc, event_desc, days_to_enter, event_lag,
report_lag, investigation_done, hipo, critical_event
Also adds:
year, month, year_month (Period[M]), dow (weekday name from dt.day_name())

Steps recoverable from the bytecode:
- raise FileNotFoundError ("Events file not found: " followed by the path) if the file is missing
- read the workbook with pd.read_excel, logging "Loading Events from %s" and "Raw shape: %s rows × %s cols"
- remap columns with EVENTS_COL_MAP
- parse date with _parse_dates and drop rows without a date, logging "Dropped %d rows with missing date"
- derive year, month, year_month and dow
- strip whitespace from event_type, consequence, business_unit, project, root_cause_cat and injury_class, turning "nan", "None" and "" into pd.NA
- log "Loaded %d events | BUs: %s"
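The post-remap normalisation for Events can be sketched as follows. normalise_events and TEXT_COLS are hypothetical names. The column list and the "nan"/"None"/"" replacement come from the constants in the bytecode. The sketch assumes date has already been parsed to datetime64.

```python
import logging

import pandas as pd

log = logging.getLogger(__name__)

# Text columns the Events loader appears to clean (from the bytecode's constants).
TEXT_COLS = ("event_type", "consequence", "business_unit", "project",
             "root_cause_cat", "injury_class")


def normalise_events(df: pd.DataFrame) -> pd.DataFrame:
    """Post-remap steps, assuming df["date"] is already datetime64."""
    n_before = len(df)
    df = df.dropna(subset=["date"]).copy()
    if len(df) < n_before:
        log.warning("  Dropped %d rows with missing date", n_before - len(df))

    # Calendar fields used for grouping and trend analysis.
    df["year"] = df["date"].dt.year
    df["month"] = df["date"].dt.month
    df["year_month"] = df["date"].dt.to_period("M")
    df["dow"] = df["date"].dt.day_name()

    for col in TEXT_COLS:
        if col in df.columns:  # optional columns may be absent after remapping
            df[col] = df[col].astype(str).str.strip()
            # astype(str) can turn missing values into "nan"/"None"; map them back to NA.
            df[col] = df[col].replace({"nan": pd.NA, "None": pd.NA, "": pd.NA})
    return df
```

Converting with astype(str) first lets the strip step work on mixed-type columns. On pandas versions before 3.0, it also turns NaN and None into the strings "nan" and "None", which is why the replace step is needed. A pd.NA value would become "<NA>", which this replacement map does not catch.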

load_safety_energy(filepath) -> pd.DataFrame
Load Safety_Energy.xlsx and return a normalised DataFrame.
Safety Energy is the combined analytical domain covering all leading
activity types: LLC (Leader Learning Conversations), CCC (Critical
Control Checks), and OCC (Operational Control Checks).
Normalised columns (see SE_COL_MAP):
date, module_name, module_prefix, module_type, activity_type
(short label: LLC/CCC/OCC), leader, business_unit, project,
location, at_risk_aspects, total_questions, actions, atl_actions,
at_risk_crp, llc_topic, at_risk_obs, positive_obs, participants
Also adds:
year, month, year_month (Period[M])
activity_type — shortened label from MODULE_TYPE_LABELS

Steps recoverable from the bytecode:
- raise FileNotFoundError ("Safety Energy file not found: " followed by the path) if the file is missing
- read the workbook with pd.read_excel, logging "Loading Safety Energy from %s" and the raw shape
- remap columns with SE_COL_MAP, parse date and drop rows without a date
- set activity_type by mapping module_type through MODULE_TYPE_LABELS (unmapped values appear to fall back to the original module_type)
- derive year, month and year_month
- strip whitespace from business_unit, project and leader, turning "nan", "None" and "" into pd.NA
- coerce at_risk_aspects, total_questions, actions and atl_actions to numbers with pd.to_numeric(..., errors="coerce")
- log "Loaded %d activities | types: %s" with the activity_type counts
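The following sketches the two steps specific to Safety Energy. MODULE_TYPE_LABELS here is a hypothetical stand-in: its keys are invented, and only the LLC/CCC/OCC labels come from the docstring. add_activity_fields is a hypothetical name, and the fallback to the raw module_type for unmapped values is an assumption.

```python
import pandas as pd

# Hypothetical stand-in for MODULE_TYPE_LABELS in config.py (real keys unknown).
MODULE_TYPE_LABELS = {
    "Leader Learning Conversation": "LLC",
    "Critical Control Check": "CCC",
    "Operational Control Check": "OCC",
}
NUMERIC_COLS = ("at_risk_aspects", "total_questions", "actions", "atl_actions")


def add_activity_fields(df: pd.DataFrame) -> pd.DataFrame:
    """Add the short activity label and coerce count columns to numbers."""
    df = df.copy()
    # Short label where the module type is known; otherwise keep the raw value.
    df["activity_type"] = df["module_type"].map(MODULE_TYPE_LABELS).fillna(df["module_type"])
    for col in NUMERIC_COLS:
        if col in df.columns:
            # Non-numeric entries such as "n/a" become NaN instead of raising.
            df[col] = pd.to_numeric(df[col], errors="coerce")
    return df
```

Falling back to the raw value keeps unexpected module types visible in the "types" log line instead of silently turning them into NaN.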

load_llc_data(filepath) -> pd.DataFrame
Load LLC_Data.xlsx and return a normalised DataFrame.
LLC_Data is a supplementary export of Leader Learning Conversations,
often containing richer free-text fields (topic, at-risk observations,
review & action notes) than the Safety_Energy export.
Normalised columns (see LLC_COL_MAP):
date, topic, leader, business_unit, project, location,
crp_focus, at_risk_obs, positive_obs, at_risk_flag, participants
Also adds:
year, month, year_month (Period[M])
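
Steps recoverable from the bytecode:
- raise FileNotFoundError ("LLC Data file not found: " followed by the path) if the file is missing
- read the workbook with pd.read_excel, logging "Loading LLC Data from %s" and the raw shape
- remap columns with LLC_COL_MAP, parse date and drop rows without a date
- derive year, month and year_month
- strip whitespace from business_unit, project, leader, topic and crp_focus, turning "nan", "None" and "" into pd.NA
- if at_risk_flag is present, coerce it to a number with errors="coerce"
- log "Loaded %d LLC records | BUs: %s"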
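All three loaders open with the same prologue: check the path, log, read the workbook and log its shape. A hypothetical shared helper, read_source, could factor this out. Its reader parameter is an addition that is not in the original. It defaults to pd.read_excel and lets the function be exercised without a workbook on disk.

```python
import logging
from pathlib import Path
from typing import Callable, Union

import pandas as pd

log = logging.getLogger(__name__)


def read_source(
    filepath: Union[str, Path],
    label: str,
    reader: Callable[[Path], pd.DataFrame] = pd.read_excel,
) -> pd.DataFrame:
    """Hypothetical shared prologue of the three loaders."""
    path = Path(filepath)
    if not path.exists():
        # Fail early with a message that names the data source.
        raise FileNotFoundError(f"{label} file not found: {path}")
    log.info("Loading %s from %s", label, path)
    raw = reader(path)
    log.info("  Raw shape: %s rows × %s cols", *raw.shape)
    return raw
```

With labels "Events", "Safety Energy" and "LLC Data", this reproduces the three error and log messages seen in the bytecode while removing the duplicated code.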