
“The risk isn’t the incident. The risk is whether you can prove what happened when the clock starts.”

For a supervisory team, the central challenge is not simply whether firms claim their controls work — it is whether those controls can be demonstrated under scrutiny.

As AI-driven systems increasingly influence customer outcomes, operational decisions, and risk management processes, regulators face a growing information asymmetry. Firms may provide explanations, policies, and summaries of model behaviour, but verifying how decisions actually occurred — under which controls and at what moment in time — can be slow, subjective, and dependent on reconstructed narratives.

PARCIS-style evidence changes that dynamic. Decision-level artefacts, replayable traces, and structured evidence packs allow supervisors to test claims against verifiable records rather than interpretation. By reducing the gap between what firms report and what can be independently examined, oversight becomes more objective, supervisory challenges become faster, and regulatory confidence becomes grounded in evidence rather than assurance.

A Supervisor’s Empathy Quadrant

Says:

“Send me the pack.”

“Not a narrative. Decision-level evidence I can verify.”

“Show me what ran, when, under what policy regime, and what changed.”

“Was the control operating, not just designed?”

“‘Retention varies’ isn’t an answer. What can you replay?”

“I need to make a proportionate decision I can defend.”

Thinks:

The real risk is information asymmetry: the firm holds reality; I receive a curated version.

Fragmented evidence turns supervision into negotiation, and negotiation is slow.

If I intervene too early I overreach; too late and consumers are harmed.

Standardised, verifiable artefacts reduce subjectivity and speed up challenge.

Cross-firm comparability is the force-multiplier: thematic reviews should be evidence exercises, not dialect translation.

Feels:

Pressure to be fair and precise under time constraint.

Frustration at the “PDF theatre” loop and the slow drip of partial answers.

Unease when firms can’t reconstruct what happened (even if they’re not evasive).

Relief when evidence arrives in a falsifiable, independently verifiable structure.

Calm confidence when the intervention decision becomes defensible.

Does:

Issues a targeted request for a decision-level evidence pack rather than more documentation.

Samples decisions and checks lineage: version, policy regime, gate outcomes, integrity anchors.

Tests operating effectiveness using receipts, not interviews and screenshots.

Asks for scoped replay on the decisions that matter, then constrains continued operation if needed.

Clusters signals across firms using standardised packs to spot “rhyming” patterns earlier.

Briefs seniors with “the evidence shows” rather than “the firm says”.

Send Me the Pack – An FCA Supervisor’s Story

It’s 16:35 on a Friday, and Amara’s inbox looks like a small floodplain.

Three firms. Three “urgent” updates. One board-level consumer issue that’s been escalating quietly for a week. And a statistical wobble in outcomes data from a firm that recently deployed an AI-driven decisioning pathway—might be drift, might be noise, might be something uglier. Amara doesn’t know yet.

That’s the problem.

The Supervisory Asymmetry

She’s been a supervisor for nine years. The part of the job that nobody outside the authority understands is the asymmetry. The firm has the system. The firm has the data. The firm has the engineers who built it and the risk team who approved it.

Amara has what the firm is willing to tell her, wrapped in PDFs with careful language and selective metrics. Her job is to look at that curated version of reality and decide—on behalf of the public—whether the firm’s controls are adequate. And she has to get it right. Intervene too early on weak evidence and she’s overreaching. Intervene too late and consumers are harmed.

The window between those two failures is narrow, and it’s built entirely on the quality of the evidence she can access.

The Old Supervisory Cycle

She’s lived the old version of this Friday too many times.

She sends the firm a formal information request. The firm takes a week to align internally on what they’re comfortable sharing. They send logs from one system, a model card from another, and a narrative document that explains what the controls are designed to do without showing whether they actually operated.

Amara asks for decision-level evidence. The firm says retention varies. She asks what version of the model was running when the outcomes shifted. The firm says they’ll check.

They come back with a spreadsheet that doesn’t quite reconcile with the logs they sent the previous week. She asks whether the policy gate was active during the period. Nobody is sure.

Three weeks in, Amara has a thick folder of partial information and a growing suspicion that the firm isn’t being evasive—they genuinely can’t reconstruct what happened.

When Good Firms Can’t Prove They’re Good

And that’s the outcome that keeps supervisors awake: not bad actors, but good firms that can’t prove they’re good. Because when the evidence is fragmented, every supervisory conversation becomes a negotiation about what the data means rather than what actually happened.

Intervention starts to feel subjective. And subjective intervention is the one thing a regulator can’t defend—not to the firm, not to the tribunal, not to the public.

But this Friday is different.

Asking for Evidence, Not Narrative

Amara doesn’t ask for another narrative. She asks for evidence in a format that can be verified.

“Send me the pack.”

Not a policy document. Not a model card. A decision-level evidence pack: what went in, what ran, which checks passed, who changed what, and how to replay the decision under the same conditions. The firm replies: “We can do that. We’re running PARCIS XAI-Lite.”

The Evidence Pack

XAI-Lite wraps the firm’s AI stack at the decision boundary without touching the model. Every governed decision emits a QiTraceID—a cryptographic receipt backed by a tamper-evident audit spine. The governance view is derived from the same integration hooks and decision context as the underlying AI, not a shadow copy assembled after the request arrived.

The pack arrives as both machine-readable JSON and a signed human-readable bundle, and Amara feels something rare in supervisory work: relief. Because instead of “trust us,” the pack gives falsifiable structure.

For each sampled decision, there’s a replayable proof capsule with the anatomy a supervisor actually needs:

QiTraceID with timestamps, entity, and jurisdiction.

Model and version lineage.

The policy regime in force at decision time, with applicable controls, operator bounds, and Ethics Gate status.

A role-appropriate rationale.

Cryptographic integrity: a ledger anchor plus an evidence integrity hash, so a third party can verify the pack without relying on the firm’s word.
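That integrity claim can be checked mechanically rather than taken on trust. A minimal sketch in Python, assuming a simplified capsule layout and a SHA-256 digest over canonical JSON — the field names and schema below are illustrative assumptions, not the published PARCIS format:

```python
import hashlib
import json

def evidence_hash(capsule: dict) -> str:
    """Digest the capsule body over a canonical JSON serialisation, so any
    re-serialisation by an independent verifier yields the same value."""
    body = {k: v for k, v in capsule.items() if k != "evidence_integrity_hash"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def verify_capsule(capsule: dict) -> bool:
    """Recompute the digest and compare it with the value recorded at
    decision time (and, in the full design, anchored to a ledger)."""
    return evidence_hash(capsule) == capsule.get("evidence_integrity_hash")

# Illustrative capsule with the anatomy described above (all fields assumed).
capsule = {
    "qi_trace_id": "QT-2025-000123",
    "timestamp": "2025-10-03T16:35:00Z",
    "entity": "Firm A",
    "jurisdiction": "UK",
    "model_version": "decisioning-4.2.1",
    "policy_version": "consumer-duty-2025.2",
    "gate_outcomes": {"ethics_gate": "pass"},
    "rationale": "Declined: affordability threshold not met.",
}
capsule["evidence_integrity_hash"] = evidence_hash(capsule)

assert verify_capsule(capsule)       # an untouched capsule verifies
capsule["model_version"] = "4.2.2"   # any later edit to the record...
assert not verify_capsule(capsule)   # ...makes verification fail
```

The point of the canonical serialisation is that the verifier never has to trust the firm’s byte stream: any party holding the capsule can recompute the digest independently.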

Sharper Questions, Kinder Supervision

Amara’s questions become sharper. And, unexpectedly, kinder. Sharper because they’re anchored to evidence. Kinder because the firm can answer them without assembling a war room.

She asks: “Show me whether the control was operating, not whether it exists.” The pack answers with gate outcomes and policy references recorded at the boundary, tied to the exact policy version in force at decision time. Not a document that says “we do X.” Receipts showing X happened.

She asks: “Did anything change during the period the outcomes shifted?” The evidence shows the model version and policy version per decision, with promotion records that prevent silent upgrades from going unnoticed. Amara can see whether a vendor update or configuration change coincides with the wobble. Not a hand-wavy explanation. Traceability.

Proportionate Supervision

Now she can make a proportionate decision (intervene, require remediation, or allow continued operation with constraints) grounded in evidence rather than inference.

The firm’s tiered evidence model helps: they run Tier 1 day-to-day on the in-scope decisioning route, providing inspectable, exportable receipts plus an encrypted payload vault for documentary replay when legal or audit thresholds require it.

Tier 2, the time-bounded forensic kit, is enabled on-demand for scoped incident windows to produce richer artefacts and a defensible timeline under an explicit incident basis.

Amara can request Tier 1 replay on a scoped set immediately and request Tier 2 capture for the next window without forcing the firm into permanent over-collection.

Proportionate supervision, matched by proportionate evidence.

Seeing Across Firms

And here’s the part that changes Amara’s job structurally, not just on this case.

The following week, a second firm reports a similar pattern. Not identical, but rhyming. Under the old approach, this would look like a coincidence until it becomes a headline. Under the evidence-pack approach, Amara can cluster the signals—groups of QiTraceIDs with time-aligned deltas and replay pointers—without demanding raw data transfers.

Because the pack schema is standardised, she can compare like-for-like across firms and systems. She stops adjudicating presentation quality and starts adjudicating control effectiveness.

Thematic reviews become evidence exercises, not dialect translation.

Evidence, Not Assertion

By Monday morning, Amara’s briefing to her seniors is no longer built on “the firm says.”

It’s built on “the evidence shows”: what happened at the case level, when it happened in the time-indexed regime, under what controls with policy references and gate outcomes, and what changed with versioned lineage—all backed by artefacts that can be independently verified.

Closing the Supervisory Gap

Here’s what Amara knows after nine years of supervision: regulators don’t fail because they lack authority. They fail because information asymmetry turns every intervention into a judgement call that’s hard to defend.

The firm knows more than the supervisor. The supervisor knows less than the public assumes. And in that gap, harm accumulates while evidence is negotiated.

Fix the evidence—make it decision-time, tamper-evident, standardised, and independently verifiable—and you don’t just speed up supervision. You make it objective.

Interventions become defensible because they’re grounded in receipts, not persuasion. And the firms that invest in that evidence infrastructure aren’t just compliant. They’re supervisable.

In the end, that’s the thing that protects everyone.

Get in touch now for more information
