Features / AI intelligence

AI built into the work — not a chatbot bolted on.

SE's AI capability is a dedicated, isolated layer the rest of the platform reaches only through a typed contract — the application never embeds an AI vendor's library. Chat and embedding services sit behind a switch that runs canned offline-resilient responses in dev and a real model in production. Per-tenant daily cost caps are enforced before any call reaches the model. An outage-resilient pipeline absorbs transient AI downtime; a versioned prompt registry centralises every prompt so quality regressions surface in an automated evaluation harness. Grounded extractors (obligations, policy clauses, hazard candidates — each with a citation back to the source document) and advisory suggestions (RCA why-step, fishbone cause, similar-incident retrieval, recordability advisor) all surface the same way: click-to-accept, never auto-fill. This page documents the foundation; the per-capability AI features live in their respective spokes.

AI-extracted hazard candidates with grounded SDS citations, reviewed and approved inline — the running product against Apex Manufacturing demo data. On-screen captions narrate each step.

See it in action

Walk-throughs against live demo data.

Short, captioned clips of the running product against the Apex Manufacturing demo tenant.

AI usage + daily cap — the per-tenant token-budget governance panel.

AI candidates + grounded citations — reviewing AI-extracted hazard candidates with their source-SDS citation.

What's in it

The capability surface.

Chat + embedding services

Production chat and embedding services back the extractors and similarity search; the model provider is swappable behind the contract.
Development and test run on canned responses keyed by prompt fingerprint. The contract is identical to production, so the platform runs offline with no AI credentials.
Every call's token usage, latency, and outcome is captured for cost accounting and telemetry.
Model responses are normalised into typed objects before they cross the AI boundary, so a malformed model reply never reaches the application.
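The canned-versus-real switch described above can be sketched as follows. This is a minimal illustration, not SE's implementation: the names `ChatService`, `CannedChatService`, `fingerprint`, and `build_chat_service` are hypothetical, and the choice of SHA-256 over whitespace-normalised prompt text is an assumption about what a "prompt fingerprint" could be.

```python
import hashlib
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class ChatReply:
    text: str
    tokens_used: int


class ChatService(Protocol):
    """Hypothetical contract shared by the canned and real services."""

    def complete(self, prompt: str) -> ChatReply: ...


def fingerprint(prompt: str) -> str:
    # Assumption: collapse whitespace so cosmetic edits keep the same key.
    normalised = " ".join(prompt.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()


class CannedChatService:
    """Dev/test implementation: no network, no credentials."""

    def __init__(self, canned: dict[str, str]) -> None:
        self._by_fingerprint = {fingerprint(p): r for p, r in canned.items()}

    def complete(self, prompt: str) -> ChatReply:
        # Unknown prompts yield an empty reply instead of raising.
        text = self._by_fingerprint.get(fingerprint(prompt), "")
        return ChatReply(text=text, tokens_used=0)


def build_chat_service(mode: str, canned: dict[str, str]) -> ChatService:
    if mode == "canned":
        return CannedChatService(canned)
    # A production service would satisfy the same Protocol.
    raise NotImplementedError("real-model service is out of scope for this sketch")
```

Because callers depend only on the `ChatService` shape, swapping the canned service for a real one would not change calling code.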

A typed AI contract

The application talks to AI through one typed contract — no AI vendor library leaks into the rest of the platform, so models, prompts, and providers change without touching application code.
Extractor operations — obligations, policy clauses, and hazard candidates. Every candidate carries a grounded citation back to the source document page, paragraph, and excerpt.
Suggestion operations — RCA why-step, fishbone cause, similar-incident retrieval, and recordability advice.
Embedding-lifecycle operations — persist and diagnose the per-tenant similarity index.
Operational surfaces — AI health, daily usage, and per-tenant provisioning.
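To make the idea of typed candidates with grounded citations concrete, here is a hedged sketch of how a raw model reply might be normalised before it crosses the AI boundary. The `Citation` and `Candidate` types, the JSON field names, and `parse_candidates` are assumptions for illustration; the actual contract is not shown in this page.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    page: int
    paragraph: int
    excerpt: str


@dataclass(frozen=True)
class Candidate:
    text: str
    confidence: float
    citation: Citation


def parse_candidates(raw_reply: str) -> list[Candidate]:
    """Turn a raw model reply into typed candidates; drop anything malformed."""
    try:
        items = json.loads(raw_reply)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    candidates = []
    for item in items:
        try:
            cite = item["citation"]
            candidates.append(Candidate(
                text=str(item["text"]),
                confidence=float(item["confidence"]),
                citation=Citation(int(cite["page"]), int(cite["paragraph"]), str(cite["excerpt"])),
            ))
        except (KeyError, TypeError, ValueError):
            continue  # a malformed entry never reaches the application
    return candidates
```

In this sketch a candidate without a citation is simply dropped, which is one way to guarantee that everything downstream carries a source reference.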

Per-tenant cost governance

A per-tenant daily token cap is enforced before any call reaches the model. Calls that would exceed the cap short-circuit to an empty / fallback response — the tenant's bill and the platform's economics are both protected.
Per-tenant daily-token and per-feature usage are rolled up and persisted.
Per-model rate cards drive an accurate cost calculation on every call.
An admin dashboard surfaces a date picker, the tenant summary, and a per-feature breakdown of where the AI spend is going.
The cap is configured per tenant in tenant settings.
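The cap-before-call behaviour and the rate-card cost calculation could look roughly like this. Everything named here (`TenantBudget`, `RATE_CARD`, `guarded_call`, the model name, and the price) is hypothetical, and the sketch assumes the caller can estimate token usage before the call.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Callable


@dataclass
class TenantBudget:
    daily_cap_tokens: int
    used_today: int = 0
    by_feature: dict[str, int] = field(default_factory=dict)


# Hypothetical rate card: price per 1,000 tokens for a made-up model name.
RATE_CARD = {"example-model": Decimal("0.002")}


def cost_of(model: str, tokens: int) -> Decimal:
    return RATE_CARD[model] * tokens / 1000


def guarded_call(
    budget: TenantBudget,
    feature: str,
    estimated_tokens: int,
    call: Callable[[], tuple[str, int]],
) -> str:
    """Run `call` only if the estimate fits under the cap; otherwise fall back."""
    if budget.used_today + estimated_tokens > budget.daily_cap_tokens:
        return ""  # short-circuit: the model is never reached
    text, actual_tokens = call()
    budget.used_today += actual_tokens
    budget.by_feature[feature] = budget.by_feature.get(feature, 0) + actual_tokens
    return text
```

The check happens before `call()` runs, so a tenant at its cap incurs no further model spend; the per-feature dictionary is what a usage dashboard would aggregate.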

Resilience + outage testing

Retry, timeout, and circuit-breaker handling absorbs transient AI-provider outages on the production path.
Outage simulations are part of the automated test suite — they verify graceful degradation: suggestions return empty rather than erroring, and the click-to-accept UI surfaces nothing rather than failing the flow.
Because dev/test and production share one contract, an AI-provider outage doesn't break the platform — the user flow stays alive.
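A simplified sketch of retry plus circuit-breaker handling that degrades to an empty result is shown below. The class and function names, thresholds, and cool-down are assumptions; per-call timeouts are omitted for brevity and would normally be enforced by the HTTP client.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 30.0) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        # After the cool-down, let a trial call through (half-open).
        return time.monotonic() - self.opened_at >= self.reset_after_s

    def record(self, ok: bool) -> None:
        if ok:
            self.failures, self.opened_at = 0, None
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()


def suggest_with_fallback(
    fetch: Callable[[], list[str]],
    breaker: CircuitBreaker,
    retries: int = 2,
) -> list[str]:
    """Return suggestions, or an empty list when the provider is unavailable."""
    if not breaker.allow():
        return []  # breaker open: skip the provider entirely
    for _ in range(retries + 1):
        try:
            result = fetch()
            breaker.record(ok=True)
            return result
        except Exception:
            breaker.record(ok=False)
    return []  # retries exhausted: degrade, do not raise
```

The caller always receives a list, so a UI that renders suggestion chips shows nothing during an outage instead of surfacing an error.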

Document extraction pipeline

The production extraction path loads the source document, reads its text page by page, chunks it, runs the obligation / policy-clause / hazard-candidate extractions in parallel, reconciles duplicates across chunks, and independently verifies that each cited excerpt actually exists in the source.
A failure on one chunk doesn't fail the run — partial extraction is better than nothing.
Provenance is stamped per candidate — manual vs AI source, model version, confidence, and a grounded flag.
Extraction is per-tenant; there is no cross-tenant pattern learning at launch.
The obligation registry (Compliance), the policy-clause registry (Behavioural Safety), and the hazard-candidate pipeline (Hazards) all flow through this single extraction foundation.
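The pipeline stages above (parallel extraction, tolerance of a failed chunk, duplicate reconciliation, and independent excerpt verification) can be sketched in a few lines. This is an illustration under simplifying assumptions: each page is treated as one chunk, duplicates are detected by normalised text, and grounding is a case-insensitive substring check. `Extracted`, `run_extraction`, and `extract_page` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Extracted:
    text: str
    page: int
    excerpt: str
    grounded: bool = False


def _normalise(s: str) -> str:
    return " ".join(s.lower().split())


def run_extraction(
    pages: list[str],
    extract_page: Callable[[int, str], list[Extracted]],
) -> list[Extracted]:
    def safe(job: tuple[int, str]) -> list[Extracted]:
        page_no, text = job
        try:
            return extract_page(page_no, text)
        except Exception:
            return []  # one failed chunk must not fail the run

    with ThreadPoolExecutor(max_workers=4) as pool:
        per_page = list(pool.map(safe, enumerate(pages, start=1)))

    seen: set[str] = set()
    results: list[Extracted] = []
    for cand in (c for batch in per_page for c in batch):
        key = _normalise(cand.text)
        if key in seen:
            continue  # duplicate reconciled across chunks
        seen.add(key)
        # Verify the claimed excerpt against the cited page, not the model's word.
        source = _normalise(pages[cand.page - 1]) if 1 <= cand.page <= len(pages) else ""
        grounded = bool(cand.excerpt.strip()) and _normalise(cand.excerpt) in source
        results.append(Extracted(cand.text, cand.page, cand.excerpt, grounded))
    return results
```

The grounded flag is computed from the source text after extraction, so a hallucinated excerpt is flagged rather than trusted.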

Similarity search

Embedding generation, retrieval, and result ranking are handled end-to-end.
The similarity index is pluggable — in-memory for tests, durable in production.
RCA embeddings are persisted, with a diagnostic surface to inspect the index.
"Find similar incidents" uses tenant-scoped embeddings only — privacy is preserved across tenants.
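As an illustration of tenant-scoped retrieval, the sketch below shows an in-memory index of the kind the text describes for tests. The class name, method names, and the use of cosine similarity are assumptions; the production index and ranking are not described in this page.

```python
import math
from collections import defaultdict


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryIndex:
    """Hypothetical test-only index; every lookup is keyed by tenant first."""

    def __init__(self) -> None:
        self._by_tenant: dict[str, list[tuple[str, list[float]]]] = defaultdict(list)

    def add(self, tenant_id: str, incident_id: str, vector: list[float]) -> None:
        self._by_tenant[tenant_id].append((incident_id, vector))

    def search(self, tenant_id: str, query: list[float], top_k: int = 3) -> list[tuple[str, float]]:
        # Only this tenant's vectors are ever scored.
        scored = [(iid, cosine(query, vec)) for iid, vec in self._by_tenant.get(tenant_id, [])]
        return sorted(scored, key=lambda s: s[1], reverse=True)[:top_k]
```

Because the tenant identifier selects the partition before any scoring happens, another tenant's incidents cannot appear in the results.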

Recordability advisor

The advisor sits between the rule-based 1904.7 recordability cascade and the human-review surface — it never replaces the rule engine.
A case is routed to the advisor only when the rule engine returns Needs Review and the tenant has opted in; the advisor's suggestion is surfaced only when its confidence clears the threshold.
The advisor's output is persisted with a prior-state snapshot for audit.
The advisor is built end-to-end; production model selection and prompt tuning are pending customer data (see "What's next" below).
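One reading of the routing rules above is a two-gate check: the rule outcome and tenant opt-in decide whether the advisor is asked at all, and the confidence threshold decides whether its answer is surfaced. The sketch below follows that reading. The names, the `"NEEDS_REVIEW"` string, the 0.8 threshold, and the model version label are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Advice:
    recordable: bool
    confidence: float
    reasoning: str


@dataclass(frozen=True)
class AuditEntry:
    prior_state: str      # snapshot of the rule engine's outcome
    advice: Advice
    model_version: str


def advise(
    rule_outcome: str,
    tenant_opted_in: bool,
    ask_advisor: Callable[[], Advice],
    threshold: float = 0.8,
    model_version: str = "example-v1",
) -> Optional[AuditEntry]:
    # Gate 1: only Needs Review cases from opted-in tenants reach the advisor.
    if rule_outcome != "NEEDS_REVIEW" or not tenant_opted_in:
        return None
    advice = ask_advisor()
    # Gate 2: below-threshold advice is not surfaced to the admin.
    if advice.confidence < threshold:
        return None
    return AuditEntry(prior_state=rule_outcome, advice=advice, model_version=model_version)
```

The rule engine's outcome is never overwritten here; the advice travels alongside it as a separate record for the human reviewer.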

Versioned prompt registry + evaluation harness

Every prompt the platform uses is centralised — no prompts buried in code. Each one has a name, a version, and a test surface.
A separate evaluation harness captures regression behaviour for prompt and response quality, with a runner that can score prompt quality against a new model version before it ships.
Test sets are portable, so the same quality bar runs in continuous integration.
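A versioned registry with a small evaluation runner might be sketched like this. `Prompt`, `PromptRegistry`, and `pass_rate` are hypothetical, and the scoring rule (expected text appears in the reply) is a deliberately crude stand-in for whatever quality metric the real harness uses.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Prompt:
    name: str
    version: int
    template: str


class PromptRegistry:
    def __init__(self) -> None:
        self._prompts: dict[tuple[str, int], Prompt] = {}

    def register(self, prompt: Prompt) -> None:
        key = (prompt.name, prompt.version)
        if key in self._prompts:
            # Published versions are immutable; changes need a new version.
            raise ValueError(f"{prompt.name} v{prompt.version} already registered")
        self._prompts[key] = prompt

    def latest(self, name: str) -> Prompt:
        versions = [p for (n, _), p in self._prompts.items() if n == name]
        return max(versions, key=lambda p: p.version)


def pass_rate(
    prompt: Prompt,
    model: Callable[[str], str],
    cases: list[tuple[dict[str, str], str]],
) -> float:
    """Fraction of test cases whose expected text appears in the model's reply."""
    hits = sum(expected in model(prompt.template.format(**inputs)) for inputs, expected in cases)
    return hits / len(cases)
```

Running `pass_rate` for the same cases against two model callables gives a simple before-and-after comparison when a new model version is under consideration.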

Per-tenant AI provisioning

Each tenant gets its own isolated AI storage and its own similarity-index partition, provisioned on onboarding.
The provisioning operation is surfaced as an admin action.
Tenant isolation is the default — no shared AI storage across tenants.

RCA AI suggestions (advisory only)

Why-step and fishbone-cause suggestions — click-to-fill chips populated from the tenant's own historical RCA sessions matching similar subjects.
Three-tier confidence scoring (high / medium / low); low-confidence suggestions render subdued.
Always click-to-accept — never auto-fill, never write to a record without explicit user OK.
Graceful degrade — suggestions return empty (not an error) when the AI service is unreachable; the user flow continues uninterrupted.
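The three-tier rendering and the click-to-accept rule can be illustrated with a short sketch. The cut-offs (0.8 and 0.5), the `Chip` type, and the `accept` helper are assumptions made for the example only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    text: str
    tier: str      # "high", "medium", or "low"
    subdued: bool  # low-confidence chips render subdued


def tier_for(confidence: float) -> str:
    # Hypothetical cut-offs, chosen only for illustration.
    if confidence >= 0.8:
        return "high"
    if confidence >= 0.5:
        return "medium"
    return "low"


def to_chips(suggestions: list[tuple[str, float]]) -> list[Chip]:
    # An empty suggestion list (for example during an outage) yields no chips.
    return [Chip(text, tier_for(conf), tier_for(conf) == "low") for text, conf in suggestions]


def accept(record: dict[str, str], field_name: str, chip: Chip, user_clicked: bool) -> dict[str, str]:
    """Write a suggestion into the record only on an explicit user click."""
    if not user_clicked:
        return record
    return {**record, field_name: chip.text}
```

The only path that changes the record is `accept` with `user_clicked=True`, which mirrors the rule that suggestions are never auto-filled.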

Representative workflows

What this looks like in practice.

Policy PDF upload → grounded clauses + obligation candidates → human review → live

A tenant admin uploads a 90-page employee handbook. Extraction reads the document page by page, chunks it, runs the obligation and policy-clause extractions in parallel on the outage-resilient path, reconciles duplicates, and independently verifies the AI's claimed excerpts against the source. Each candidate persists with a grounded flag and provenance (AI source, model version, run identifier). A failed chunk is logged but doesn't fail the run. The domain expert reviews with the extraction on the left and the source on the right: approving what the model got right, editing where the quote needs cleanup, and rejecting the noise. Approved clauses become available to Behavioural Safety (observation-template authoring) and the discipline engine (clause-strict matching). Token usage flows through the per-tenant daily-cap roll-up and the cost dashboard.

RCA session → AI suggests why-step from tenant's own history → click-to-accept

A safety officer opens a 5-Whys RCA session against a near-miss on a manufacturing line. Why 1: "Why did the operator bypass the machine guard?" The platform asks the advisor for why-step suggestions using the question, the incident context, and the tenant's prior session history. The advisor returns three: one high-confidence (operator reported guard activation slowed throughput on a prior similar event) and two medium (sister-plant incidents showing training-recency and shift-pressure factors). All three render as click-to-fill chips below the answer field; a low-confidence suggestion would render subdued, and nothing is auto-filled. The officer accepts the high-confidence one, edits one medium suggestion for fit, ignores the other, and writes an answer of their own. If the AI service were down, the chips would render empty and the session would continue (graceful degradation is part of the automated test suite). The AI never writes to the record without the officer's click.

Recordability cascade hits Needs Review → the advisor routes → audit-stamped suggestion

An incident lands at a multi-site tenant. The rule-based 1904.7 cascade runs through 1904.1 / 1904.2 / 1904.5 / 1904.7 / 1904.7(b)(5)(ii) and returns Needs Review — work-relatedness needs judgement. The tenant has opted in, so the platform routes the case to the advisor with the incident facts + the rule engine's reasoning chain. The advisor returns recordable-vs-not plus reasoning citing the applicable regulation paragraphs, with a confidence score. Only suggestions that clear the threshold surface to the admin; low-confidence is hidden. Suggestion + rule reasoning + confidence + model version all persist. The admin reviews, accepts or overrides; the determination flows to Form 300 / 300A / 301 with the full chain audited — "rule determined Needs Review → AI advised X with confidence Y → admin decided W." Three years on, an OSHA inspector questioning the determination sees data, not recollection.

How this is different

What sets the AI surface apart.

Most EHS platforms ship AI as a chatbot or a feature checkbox — call a model, parse the response, render the result. SE shipped AI as a dedicated, isolated layer behind a typed contract, with a canned-vs-real-model switch for dev / test, a per-tenant cost-governance pipeline, an outage-resilient path, a versioned prompt registry, and a separate evaluation harness. The differences below are direct consequences of holding AI to the same standards as the rest of the platform — auditability, testability, cost governance, graceful degradation — rather than treating it as a sprinkle of magic across existing UI.

AI is isolated behind one typed contract

AI lives in its own deployable layer; the application never embeds an AI vendor's library. One typed contract is all the rest of the platform sees — extraction, suggestions, recordability advice, similarity search. Model version, prompt, resilience handling, response parsing all live behind that boundary. Swapping models, changing prompts, adding extractors — none of it touches application code. The discipline that keeps AI maintainable past the first ship.

Click-to-accept, never auto-fill

High-confidence prominent, medium muted, low below-threshold + dashed-border. The user accepts what fits — the platform never writes to an audit-recorded field on their behalf.

Every AI surface renders suggestions as click-to-accept chips, not auto-filled fields. Hazards' SDS extraction uses an Approve / Reject lifecycle. Behavioural Safety's RCA why-step and fishbone suggestions render as confidence-tiered chips below the answer field. The recordability advisor surfaces a suggestion only when it clears its threshold; the human admin accepts or overrides; the suggestion persists separately with a prior-state snapshot. The platform never writes to an audit-recorded field without an explicit human OK — structural, in the data model, not policy documentation.

Cost governance is first-class, not an afterthought

A per-tenant daily token cap, a usage recorder, per-model rate cards, and a usage dashboard mean a tenant's AI usage can't quietly run away. Calls that would exceed the cap short-circuit to an empty / fallback response before they reach the model — protecting the tenant's bill and the platform's economics. Per-feature roll-up surfaces "extraction X tokens, similarity Y tokens, advisor Z tokens" so admins see where their AI spend goes. Cost telemetry at this level is uncommon outside enterprise observability vendors.

Grounded citations, not vibes

Every AI-extracted candidate carries a citation back to the source page, paragraph, and excerpt, with a grounded flag stamped by an independent verifier that checks the model's claimed excerpt actually exists in the source text. Approve preserves the citation on the resulting record. When a regulator asks "how did you know this?" the answer is the citation, not a narrative reconstruction. For evidentiary defence, that's the floor.

Dev and test run with no AI credentials

Every test in the codebase runs without an AI key. Canned responses are keyed by prompt fingerprint; production services have identical contracts. CI runs the full suites and the evaluation harness in canned mode for speed and reproducibility; production switches to the real model. This lets the AI capability be tested and iterated on without burning tokens or coupling tests to a live service.

Graceful degradation is tested, not hoped for

Automated outage simulations verify graceful degradation: RCA suggestions render empty chips rather than error toasts, extraction runs queue rather than fail, and the recordability advisor abstains rather than blocking the case. The user flow continues uninterrupted. Most platforms add this kind of testing as a maturity layer years in; SE wired it from the first AI-feature ship because EHS workflows can't afford "AI is down today, please try again." The resilience pipeline does the work; outage tests verify the work was done correctly.

Adjacencies

What the AI surface connects to.

The AI capability is foundational — every capability area on the platform that ships AI features consumes the same AI contract. This page documents the foundation; the per-capability AI features live in their respective spokes.

Hazards — SDS-driven candidate extraction

Uploaded Safety Data Sheets route through hazard-candidate extraction, producing candidates with grounded citations (see Hazards spoke).

Compliance — Obligation extraction + recordability advisor

Regulatory / policy documents flow through obligation extraction, producing obligation candidates; the recordability advisor sits between the 1904.7 rule cascade and human review (see Compliance spoke).

Behavioural Safety — policy-clause extraction + RCA suggestions

Policy documents flow through policy-clause extraction; RCA why-step and fishbone-cause suggestions populate from the tenant's own historical sessions (see Behavioural Safety spoke).

Incidents — similarity search + recordability advisor

Similar-incident retrieval runs on the similarity search; Needs Review cases route through the recordability advisor (see Incident Management spoke).

Tenant onboarding — per-tenant provisioning

New tenants get isolated AI storage and their own similarity-index partition on onboarding; an admin action surfaces the operation.

Cross-cutting — notifications + auditing

The same cost-cap pattern is mirrored for notification cost governance. AI calls flow through the same audit log as the rest of the platform.

On the roadmap

What's next for AI intelligence.

The AI advisors above are shipped, production-capable, and ready to demo. These are the focused next additions, sequenced by customer demand and by the data that sharpens each one.

Recordability-advisor model tuning. The advisor is shipped and production-capable, with the routing logic, confidence threshold, and decision audit trail all in place; ongoing model and prompt tuning against accumulated customer recordability data sharpens it further.

Voice incident capture. Speech-to-text capture on the mobile flow — speak the incident in the field, structured fields drafted from the transcript for review. Builds on the shipped AI surface and offline mobile capture.

AI-suggested observation templates from clauses. AI extracts policy clauses today; the next coupling — AI proposing observation templates derived from approved clauses — is sequenced next.

Red-flag detection for repeat injury patterns. Not shipped as a dedicated detector. Pattern detection across incidents + body parts + departments depends on enough customer data accumulation to identify meaningful signal vs noise. Builds on the existing similarity-search foundation when it lands.

Facility-level risk scoring. Not shipped as a dedicated scorer. Aggregated risk-score computation across an establishment's incident + hazard + observation history with contributing-factor weighting is roadmap. The underlying data (establishment + site scope + hazards scored via the 5×5 matrix + a contributing-factor field on incidents) is already in place.

Claim-cost estimation from historical claim data. Lands with the workers'-comp claims module on the roadmap — cost estimation from the historical claim distribution, on the same evidence-cited AI surface as the rest of the advisors.

Confidence-tiered review for AI suggestions across all surfaces. SDS extraction, RCA suggestions, and the recordability advisor each carry confidence. The cross-surface UX pattern (consistent rendering of high / medium / low confidence; consistent click-to-accept; consistent low-confidence subduing) is partially shipped; the remaining polish pass is on the roadmap.

Multimodal AI (photo / video → AI suggestions). Mobile capture ships photo + GPS + offline queue today. Multimodal AI taking a captured photo or short video and suggesting matching hazard categories / observation outcomes / clause citations is the natural extension when production vision-model selection lands.

AI auto-conclude RCA assist. Drafting a Conclusion (root-cause statement + 2-3 recommendations) from a near-complete why-chain is a bigger trust conversation. Deferred to a follow-on milestone after the click-to-accept pattern has accumulated more tenant usage.

Cross-tenant pattern learning. Extraction is per-tenant at launch. Cross-tenant aggregate insights (anonymised pattern surfacing) are a future feature gated on tenant consent and an appropriate privacy architecture.

Multi-language prompt + response support. Form-rendering language is fixed per jurisdiction today (English for US/UK, Dutch for NL); AI prompts and responses follow the same posture. Full multi-language AI prompt and response support is a later addition.

Public AI-health dashboard. The AI-health data exists today; a public-facing dashboard surfacing per-tenant AI status (provider connectivity / cap utilisation / per-feature success rate / model version) is roadmap polish.

See the AI capability in the running product.

A 30-minute walk-through against your actual AI-evaluation criteria — your tenant-isolation posture, your cost-governance expectations, your evidentiary-citation needs, your prompt-quality regression process. We'll show the extraction pipeline + the similarity search + the recordability advisor + the cost-monitoring dashboard, all on a single audit trail.

Request a demo