Agent Assurance

Installment 1 · 8 July 2026

Chapter 1: The Capability Overhang

The observed condition

A person running a business can now instruct software to build software. The instruction is typed in plain language into a chat window, the same window where that person drafts emails and summarizes meeting notes. What comes back, often within the hour, is a working system: a customer database, a cash-flow dashboard, a pipeline that reads invoices and writes ledger entries, a public website that takes orders. Until a few years ago, each of these was a project. It had a budget, a timeline, a vendor or an in-house developer, and at least one person whose job included worrying about what could go wrong with it. Now each is an afternoon.

The people doing this are mostly not technologists. This book calls them operators: owner-managers, practice principals, finance directors, department heads, the people who run the work of a firm rather than its infrastructure. An operator who adopts these tools acquires, for the price of a software subscription, a capability that used to arrive only inside an employment contract or a vendor relationship. That difference in packaging matters more than it first appears.

Capability has rarely travelled alone. When a firm hires a bookkeeper, it does not receive raw bookkeeping capability; it receives a person trained in a tradition: reconciliation, a separation between the person who records and the person who approves, a month-end close. When it buys accounting software from an established vendor, the vendor's controls come in the box: access rights, an audit history, backups, a support desk that has seen every failure before. The concepts that keep capability safe have always been bundled with the capability itself, carried by professions, vendors, and institutions, and mostly invisible to the buyer, who inherits them.

Every operator already lives inside such inheritances without noticing. The bank requires a second person to release a payment above a threshold. The payroll system refuses to run the same pay cycle twice. The accountant asks the same tiresome questions every year-end, and the questions turn out to be a checklist refined by other people's disasters. None of this was designed by the operator, and none of it needed to be understood to be benefited from. It came with the bank account, the software license, the professional engagement. That is what a governance tradition looks like from the inside: mostly friction, all of it somebody else's accumulated experience arriving pre-installed.

A generated system arrives with no such bundle. The prompt produces the capability and nothing else. There is no tradition in the box.

The result is the condition this chapter records: AI has placed enterprise-grade capability generation in the hands of operators who lack the governance concepts that capability requires. The book calls this condition the capability overhang. The sections that follow establish the condition from the public record and show why it will not close on its own.

The condition, measured

The most direct measurement available comes from a workforce survey. In 2025, KPMG and the University of Melbourne published a global study of attitudes to and use of AI at work. Two of its findings describe the condition in numbers. Forty-four percent of employees reported having used AI at work in ways that contravene their organization's policies and guidelines. Almost half admitted uploading sensitive company information to AI platforms their employer had not sanctioned.1

These figures come from a single survey and should be read as that survey's estimate, not as settled fact. Their order of magnitude is the useful part. The behavior they describe is not a fringe activity conducted by a firm's most reckless employees; on this evidence it is closer to ordinary practice, running at a scale no policy document has caught up with. The second figure, stated plainly, means that the sensitive information of the business is being uploaded to systems the business has never evaluated, by people acting in good faith to get their work done.

In practice the behavior is mundane, which is part of why it spreads. The bookkeeper pastes the aged debtors report into a chatbot to get a summary for the partners. The office manager uploads the staff contracts to ask a question about notice periods. The analyst builds a forecasting model by describing it, then keeps using the model because it works. Each act is a person solving today's problem with the most capable tool in reach, as people have always done. What is different is what the tool does with the material, where the material now resides, and what the person has built without meaning to build anything.

Surveys record the behavior. What the behavior produces, and what it costs when it fails, is better read from incidents. For that, the record of the last capable-but-ungoverned tool is the place to look.

The spreadsheet record

There is no need to speculate about what operators do with powerful tools that arrive without governance concepts attached. It has been observable for forty years. The spreadsheet gave every desk in every business the ability to build financial models, operational registers, and de facto databases, with no training requirement, no review process, and no controls beyond whatever the author chose to impose on themselves. Most of what was built works. The failures are documented wherever an incident forced a post-mortem, and the documented failures come from the most heavily supervised institutions in the world.

In March 2012, inside JPMorgan's Chief Investment Office, traders were reporting valuations on a large derivatives portfolio that were increasingly distant from what the positions were worth. The episode became known as the London Whale. The United States Senate subcommittee that investigated it recorded a detail the loss figure obscures: the traders tracked the gap between the values they were reporting and more realistic values in a spreadsheet they maintained themselves. By the traders' own spreadsheet, the hidden losses exceeded 400 million dollars within five days, and the gap eventually exceeded 600 million.2 Inside one of the largest banks on earth, wrapped in layers of risk management and a full-time regulator, the record that most accurately described the position was a personal artifact, maintained by the people it incriminated, visible to no control function, while the formal systems held the numbers everyone was supposed to see.

On 5 October 2020, the Secretary of State for Health and Social Care told the House of Commons that Public Health England had identified, on the Friday night just past, that 15,841 positive COVID-19 test results from the previous eight days had not been included in the reported daily case counts. The cause he gave was a failure in the automated transfer of files from the laboratories to PHE's data systems. Everyone who had tested positive had been told their result in the normal way; what the failure severed was the step after. The results were not transferred to the contact tracing system, and contact tracing of the affected cases began only that Saturday, with the oldest of the missed results by then more than a week behind. Pressed on the mechanism, the minister acknowledged a maximum file size error in a legacy system already scheduled for replacement, and went no further; the press accounts that followed attributed the failure to the row limit of an aging spreadsheet file format.3
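The mechanism the press described is simple enough to sketch. The older Excel .xls worksheet format holds at most 65,536 rows, and a transfer step that writes into a capped format can lose whatever does not fit without raising an error. The Python below is an illustration only, not a reconstruction of PHE's systems: the function names and the record volume are hypothetical. It shows a capped transfer and the reconciliation check, a comparison of records sent against records received, that would surface the loss.

```python
XLS_MAX_ROWS = 65_536  # row cap of the older Excel .xls worksheet format


def transfer_with_row_cap(records, max_rows=XLS_MAX_ROWS):
    """Hypothetical transfer step: rows beyond the cap are silently dropped."""
    return records[:max_rows]


def reconcile(sent_count, received_count):
    """Return the number of records lost in transit; 0 means the counts agree."""
    if received_count > sent_count:
        raise ValueError("received more records than were sent")
    return sent_count - received_count


# Made-up volume, chosen only so that it exceeds the cap.
source = [f"result-{i}" for i in range(70_000)]
destination = transfer_with_row_cap(source)
missing = reconcile(len(source), len(destination))
print(f"{missing} records did not arrive")
```

The point of the sketch is the second function rather than the first: the transfer itself reports nothing, and only a count taken at both ends reveals that anything was lost.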

One caution about this record. It is thick with banks, regulators, and public bodies not because improvised systems concentrate there but because disclosure does. A supervised institution that loses money to a spreadsheet failure eventually produces a Senate exhibit or a regulator's final notice; a twelve-person firm that does the same produces, at most, a difficult board meeting. The public record documents the corner of the phenomenon that happens to be lit. There is no reason to believe the unlit part behaves differently, and the survey figures above suggest it is large.

Regulated firms, improvised systems

The regulators' own findings show something further: not just that spreadsheets fail, but that improvised systems can sit at the center of a regulated firm's obligations for years, treated as temporary by everyone and replaced by no one.

In January 2019, Metro Bank announced to the market an adjustment of roughly 900 million pounds to its assessment of its risk-weighted assets, the figure on which a bank's capital requirements rest. The Prudential Regulation Authority's final notice, which fined the bank 5,376,000 pounds, records the mechanism. The calculation remained largely manual throughout the period, with no automated process to validate or check the underlying data; such checking as occurred relied on the manipulation of many spreadsheets, which created operational risk and key-person dependencies on the small number of individuals familiar with them.4 The notice records a further detail: the bank's interpretations of the regulatory rules themselves, the reasoning that determined how the calculation should work, were for years documented nowhere except embedded inside those spreadsheets and working papers.

The Financial Conduct Authority's 2025 notice against Nationwide Building Society describes the pattern's usual life cycle. At the start of the period under review, in October 2016, Nationwide's system for risk-assessing its customers was, in the regulator's words, an unsophisticated, interim solution: unless a customer fell into certain limited categories, they were automatically classed as standard risk. The intended replacement was not fully operational until early 2019, additional data was not fed into it until August 2020, and that data was not used for risk scoring until April 2021.5 The interim solution held the post for roughly five years. The shape is familiar inside firms of every size: the stopgap that outlives its author's intentions, because it works well enough that replacing it never becomes urgent until a supervisor makes it so.

A 2025 notice against Barclays records the same pattern in miniature, compressed into a single check. When Barclays opened a client money account for a wealth management firm called WealthTek in January 2021, the only specific check it performed was whether the customer had been assigned one of Barclays' own internal codes indicating it was pre-determined to be eligible. The FCA's public Financial Services Register, the authoritative source, was one lookup away and was not consulted; had it been, it would have shown that WealthTek was barred from holding client money at all.6 An improvised internal proxy stood in for the authoritative source, and the proxy answered a different question from the one that mattered.

The three cases correct a common assumption about how such systems persist. None of them was hidden. The Metro Bank spreadsheets were the bank's recognized process for calculating its capital figures. Nationwide's interim solution was known to be interim; that was its name. The Barclays code check was the documented procedure. In each case the improvised system was in plain sight, doing its job adequately by every visible measure, and the deficiency became legible only when a supervisor traced a failure back to it and wrote the tracing down.

These notices were written by supervisors about supervised firms, which is why they exist as public documents. The firms this book concerns itself with, the ones below the supervisory waterline, run the same interim solutions and improvised checks with no supervisor ever scheduled to look.

An old phenomenon

None of this is a new anxiety. The information-systems research community has studied the improvised system for decades under the names shadow IT, feral systems, and workarounds. A 2020 systematic review of the field searched the major research indexes, screened 449 results, and settled on a corpus of 77 papers for detailed analysis, more than half of them organizational case studies.7 The phenomenon is established enough to have a literature, a taxonomy, and a running argument about definitions.

It also has at least one hard prevalence measurement. A 2014 study in a peer-reviewed security journal audited the installed software across a Fortune 500 organization of more than ten thousand employees and found that roughly 15 percent of everything installed was unapproved: 2,965 unique unauthorized application versions among 19,633 applications scanned, installed more than half a million times across ten thousand devices.8 The figure is from the desktop era, before browser-based tools made installation unnecessary, and should be read with its date attached. It measures a world in which acquiring unsanctioned capability still required effort: finding the software, installing it, sometimes paying for it, always leaving a trace on a machine the firm owned. Each subsequent wave of tooling has lowered that effort. Browser-based services removed the installation; free tiers removed the payment; personal accounts removed the trace. The measured 15 percent belongs to the hardest era in which to do this. No audited measurement of comparable rigor exists for the present era, so the honest statement is directional: the barriers that held the 2014 figure to 15 percent (installation, payment, and the trace on a firm-owned machine) have since been removed.
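The 15 percent can be checked against the two counts quoted above, on the assumption that the percentage is simply the ratio of unauthorized application versions to applications scanned. The variable names below are illustrative.

```python
unauthorized_versions = 2_965
applications_scanned = 19_633

# Share of scanned applications that were unauthorized.
share = unauthorized_versions / applications_scanned
print(f"{share:.1%}")  # prints 15.1%
```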

The precedent matters because it settles a question about human behavior. Given a gap between what sanctioned systems provide and what the work in front of them requires, some fraction of people in every organization will close the gap themselves, without asking, using whatever capable tool is nearest to hand. That regularity has held across four decades of tooling. There is no reason to expect it to lapse now, when the nearest capable tool is more capable than it has ever been.

The new level

What has changed is the ceiling. A spreadsheet, whatever it grows into, remains one artifact on one machine, legible in principle to anyone who opens it. The current tools remove that boundary. An operator describing what they want in a chat window can now produce a multi-user application with a database, user accounts, integrations into the firm's other systems, and a public address on the internet. The working style has picked up the name vibe coding: building software by conversational instruction, accepting what comes back without reading it. The name is flippant; the artifacts are production systems holding real data.

The gap between what the operator sees and what the operator has created is the defining feature of the class. What the operator sees is the screen: the working form, the dashboard that updates, the site that loads. What exists is everything beneath it: a database with access rules someone must have set, a hosting arrangement in some jurisdiction, credentials stored somewhere, dependencies on services with their own terms and their own failures, and a security posture that was decided, one way or another, by whatever the generating system did by default. With a spreadsheet, the visible artifact and the actual artifact are the same thing. With a generated application they have come apart, and the operator holds only the visible half.

In 2025 this class of system acquired its first prominent entry in the public vulnerability record. A flaw logged in the United States National Vulnerability Database, rated 9.3 out of 10 for severity, affected applications generated on a popular AI app-building platform: the database security policies the platform generated were insufficient, such that a remote attacker, with no authentication and no user interaction, could read or write the database tables of generated sites.9 The vendor formally disputed the record, arguing that securing application data is each customer's responsibility. The dispute is itself the instructive part. The customers were people who had adopted the platform because they could not build such systems themselves; the vendor's position was that a database access-control policy was the customer's to own; the customer, in most cases, had never encountered the concept. Between a vendor disclaiming the control and a customer unequipped to hold it, the control belonged to no one.
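The control in dispute can be shown in miniature. The Python below is a toy model, not the platform's code and not the policy language of any real database: the table, the row fields, and both functions are hypothetical. It contrasts a read path that returns every row to any caller with one that applies a per-row rule and denies unauthenticated callers by default.

```python
# Hypothetical table of customer records.
ROWS = [
    {"id": 1, "owner": "alice", "email": "alice@example.com"},
    {"id": 2, "owner": "bob", "email": "bob@example.com"},
]


def read_without_policy(caller):
    """No row-level rule: any caller, even an anonymous one, sees every row."""
    return list(ROWS)


def read_with_policy(caller):
    """Row-level rule: deny by default, and return only the caller's own rows."""
    if caller is None:  # unauthenticated request
        return []
    return [row for row in ROWS if row["owner"] == caller]


print(len(read_without_policy(None)))  # anonymous caller reads the whole table
print(len(read_with_policy(None)))     # anonymous caller reads nothing
print(len(read_with_policy("alice")))  # alice reads only her own row
```

Both versions look identical from the screen the operator sees. The difference exists only in a rule that someone either wrote or did not.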

The documented instances above stand behind the class. The recurring forms the class takes are stated here as archetypes, the shapes a practitioner learns to recognize, and this book will return to them. The improvised customer database, assembled in an afternoon, holding personal information the firm is legally answerable for. The finance dashboard whose figures no live system feeds, maintained by hand behind a professional-looking front. The chain of no-code automations, built tool by tool by someone who has since left, that moves client data between services in ways no one remaining can enumerate. The data pipeline that exists only as a conversation transcript with a chat assistant, re-run by pasting. The founder-built public website, holding customer accounts, with no concept of patching, backup, or uptime attached.

A further archetype, the audit cliff, is the moment the overhang stops being invisible to the operator. A firm built on generated and improvised systems runs for months or years without any occasion to describe those systems to an outsider. Then an occasion arrives: a larger customer's procurement process, a supplier's due-diligence questionnaire, an insurer's proposal form, a bank's onboarding review. The questionnaire asks who has access to customer data and how access is revoked. It asks where data is stored and in which jurisdiction, how long it is retained, how it would be recovered, when the arrangements were last tested. These are not exotic questions; they are the standard grammar of one business deciding whether another is safe to depend on. The operator, reading them, discovers two things at once: that the firm has committed itself, in substance, to obligations it has never enumerated, and that the honest answer to most of the questions is that nobody has ever thought about it. The questionnaire assumes a tradition. The afternoon in which the system was built did not include one. What happens at that cliff edge, and what kind of help the operator reaches for, is where the rest of this book begins.

The asymmetry

The condition described so far could, in principle, be temporary. Tools arrive, institutions adjust, the gap closes. The reason to expect otherwise, and the reason this book exists at all, is an asymmetry in how the two sides of the gap move.

Capability now diffuses at the speed of a subscription. There is no procurement cycle, no installation, no training requirement; the distance between never having used an AI tool and operating one on live company data is a sign-up form and a sentence. Each new model release raises what the same sentence returns. The capability side of the gap compounds, and it compounds inside a consumer product that reaches everyone at once.

The concepts required to govern capability move through entirely different channels, and those channels are slow. Separation of duties, reconciliation, access control, change control, the audit trail: none of these is complicated to state, yet each took decades to become an unremarkable habit of business life, because concepts of this kind are not adopted from documentation. They are carried by professions that train their members, embedded in software by vendors who have absorbed the incidents of their industry, imposed by regulators and insurers and auditors, and passed between practitioners as the folklore of what goes wrong. That transmission works on the timescale of careers, not product cycles. It is, in the strict sense, generational: the concepts become universal when the people trained under them become the people in charge.

Consider how the last generation of business software acquired its safety. Accounting packages encode double-entry bookkeeping, a control so old it reads as arithmetic rather than as a control. Payroll systems refuse illegal states because vendors absorbed decades of payroll disasters into validation rules. The permissions screens in every serious business application are the residue of incidents at other firms, in other years, priced in by vendors who could amortize the lesson across every customer. That absorption took the industry decades per lesson, and it worked because the vendor stood between the incident and the user, accumulating. A system generated privately for one firm has no vendor in that position. Nobody is accumulating on the operator's behalf.
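What an absorbed lesson looks like once it is written into software can be sketched briefly. The Python below is a minimal illustration, not any vendor's implementation: the class, the function, the names, and the threshold value are all hypothetical. It encodes two of the controls this chapter has mentioned, a payroll that refuses to run the same cycle twice and a payment that needs a second, different person above a threshold.

```python
class ControlError(Exception):
    """Raised when a request would put the system into a forbidden state."""


class PayrollLedger:
    """Hypothetical payroll record that remembers which cycles have been paid."""

    def __init__(self):
        self._completed_cycles = set()

    def run_cycle(self, cycle_id):
        # Refuse the illegal state: the same cycle paid twice.
        if cycle_id in self._completed_cycles:
            raise ControlError(f"pay cycle {cycle_id} has already been run")
        self._completed_cycles.add(cycle_id)
        return f"cycle {cycle_id} paid"


def release_payment(amount, requested_by, approved_by=None, threshold=10_000):
    """Separation of duties: above the threshold, a different person must approve."""
    if amount > threshold and (approved_by is None or approved_by == requested_by):
        raise ControlError("payment above threshold needs a second approver")
    return f"released {amount}"


ledger = PayrollLedger()
print(ledger.run_cycle("2026-06"))
try:
    ledger.run_cycle("2026-06")  # second attempt at the same cycle
except ControlError as err:
    print(f"refused: {err}")
```

Neither rule is difficult to write. The difficulty the chapter describes is knowing that the rule is needed, which is the part a vendor learns from other firms' incidents and a privately generated system does not.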

Nothing in the current wave shortens the second channel. The tools do not teach governance to their users, and the vendor incentive runs toward removing friction rather than adding tradition. The professions that carry these concepts have not yet extended them to cover agent-built systems; later chapters record how such extensions have historically been built, and by whom. In the meantime the asymmetry does the arithmetic: one side of the gap compounds monthly, the other moves generationally. Left alone, the overhang widens.

The population

The rest of the book is addressed to the situation of the people this condition creates.

The capability overhang produces a population of operators who have adopted capability faster than they have adopted the concepts required to govern it. On the survey evidence, they are not a deviant minority; behavior of this kind approaches the norm.1 They are not negligent in any sense that would survive comparison with their peers, and the sequence they are living is the historically ordinary one: capability first, governance after. What distinguishes their position is concrete. They now operate systems that perform real work, on real data, under real obligations, whose failure classes they cannot name. Where a system was inherited from a vendor or a profession, somebody upstream had already priced its failures into controls the operator never had to think about. The systems they have generated for themselves carry no such inheritance. Ask what happens when the row limit is hit, when the departed founder's automation chain breaks, when the questionnaire arrives, and the honest answer is that the operator does not know, and does not know that there is a category of person whose job is to know.

The population is not defined by firm size. Its most visible member is the small firm's principal, because in a small firm the improvised system has no one else to belong to. But the survey evidence above was gathered largely inside organizations big enough to have policies worth contravening, and the regulator findings earlier in this chapter show the same shapes inside institutions with entire departments dedicated to preventing them. The defining condition is positional: a person operating consequential capability outside the reach of whatever governance function their context does or does not have. The department head running an unsanctioned automation inside a bank belongs to this population as surely as the founder whose whole firm runs on one.

The concepts, therefore, have to be supplied from outside: by people who hold the governance tradition and can attach it to the new capability, deliberately, the way professions and vendors once attached it invisibly. Whether such supply is possible, what it consists of, and why the operator cannot simply verify the work themselves are the subject of the next chapter. What this chapter records is the demand: a large population, running consequential systems, on the wrong side of a gap that is widening on its own.

Notes

  1. KPMG and the University of Melbourne, Trust, Attitudes and Use of AI: A Global Study, 2025. Ledger: ch01-e01 (prevalence of use against policy), ch01-e02 (sensitive information uploaded to unsanctioned platforms).
  2. United States Senate, Permanent Subcommittee on Investigations, hearing record on the JPMorgan Chase "London Whale" trades, March 2013. Ledger: ch01-e03.
  3. Hansard, House of Commons, Covid-19 Update, 5 October 2020: statement of the Secretary of State for Health and Social Care and subsequent questions. Ledger: ch01-e24. The row-limit mechanism is a press attribution, widely reported as the older Excel .xls format's row cap; the official account acknowledged only a "maximum file size error" in a legacy system, without naming the format. Ledger: ch01-e18.
  4. Prudential Regulation Authority, Final Notice to Metro Bank plc, 21 December 2021. Ledger: ch01-e04 (the adjustment and fine), ch01-e05 (the manual spreadsheet process), ch01-e06 (rule interpretations embedded in spreadsheets).
  5. Financial Conduct Authority, Final Notice to Nationwide Building Society, 11 December 2025. Ledger: ch01-e16.
  6. Financial Conduct Authority, Final Notice to Barclays Bank UK plc, 14 July 2025. Ledger: ch01-e12 (the internal code check), ch01-e13 (the unconsulted Financial Services Register).
  7. Systematic literature review of shadow IT and workaround research, Information Technology and Control, 2020. Ledger: ch01-e19.
  8. Study of unauthorized software prevalence in a Fortune 500 organization, Computers & Security, 2014. Ledger: ch01-e21.
  9. CVE-2025-48757, National Vulnerability Database, published 29 May 2025: insufficient row-level security in applications generated on the Lovable platform, CVSS 9.3, formally disputed by the vendor. Ledger: ch01-e22.