What Happens When Regulators Ask, “What’s in the Lake?”

Nick Pollard
Managing Director, EMEA

The corporate data lake: A place where data goes to disappear.
A few months ago, I found myself talking to someone in financial services about DORA—the Digital Operational Resilience Act. They seemed calm. Suspiciously calm.
"Yeah, we’re across it," they said.
I nodded.
"So… you know what’s in your data lake?" I asked.
A long pause.
"Well… no. Not exactly. But we log everything."
Ah. Logging everything—the corporate equivalent of hoarding every receipt you've ever received just in case one of them matters someday.
And now, thanks to DORA, someone (most likely from compliance) is going to come along, rifle through all of it, and ask if you even know what you’ve got, why you’re keeping it, and what you’d do if something went wrong.
At this point, most companies stare blankly into the middle distance.
DORA: The Compliance Tsunami That’s About to Hit
The Digital Operational Resilience Act (DORA) is the EU’s latest attempt to stop financial institutions from collapsing the moment something goes wrong.
It’s about digital risk—cyberattacks, data breaches, third-party failures—and how banks, insurance companies, and investment firms manage and recover from digital disruptions.
On the surface, it sounds like standard regulatory fare:
- ✅ Risk management – Be aware of the risks in your IT estate.
- ✅ Incident reporting – Spot problems quickly, and report them.
- ✅ Resilience testing – Regularly test that your systems can handle shocks.
- ✅ Third-party risk management – Ensure vendors don’t compromise security.
It all seems reasonable until you realise something deeply inconvenient: you cannot manage, report on, or test something if you don’t know what’s in it.
And this is where the data lake problem begins.
The Myth of the Organised Data Lake
At some point in the last 15 years, someone in IT had a bright idea.
"Let’s build a data lake!" they said. "It’ll be structured! Governed! Full of valuable insights!"
Fast-forward to today, and that data lake has become a swamp—a place where data goes to disappear, never to be seen again. It’s an unsearchable, ungoverned mass of logs, transactions, emails, documents, and customer records, scattered across multiple storage systems, with no real classification or lifecycle management.
Which brings us to the central problem: DORA doesn’t just ask whether you store data securely—it asks if you can prove what’s in the lake.
And suddenly, financial institutions have to answer some very uncomfortable questions:
- What’s in the lake? (Nobody really knows.)
- Which of it is actually important? (Unclear.)
- What happens if a regulator asks you to find something specific? (Panic.)
It turns out that "we log everything" is not a strategy—it’s just a slow-motion disaster waiting to unfold.
Lake Michigan, but Digital
To illustrate just how absurdly big these data lakes are, let’s talk numbers.
We recently spoke to a financial institution with an estimated 70 petabytes of data spread across multiple silos. They weren’t exactly sure of the number.
That’s 35 trillion pages of documents. If you printed them all out and stacked them up, they’d reach the moon and back, multiple times.
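For anyone who wants to check the arithmetic, here is a rough back-of-the-envelope sketch. The pages-per-petabyte figure is simply what the 35 trillion number implies (about 500 billion pages per petabyte). The 0.1 mm sheet thickness and the 384,400 km average Earth-Moon distance are assumptions added for illustration, not figures from the institution.

```python
# Back-of-the-envelope check; every constant here is a rough assumption.
PAGES_PER_PB = 500_000_000_000   # implied by 35 trillion pages / 70 PB
SHEET_THICKNESS_MM = 0.1         # assumed: typical office paper, one page per sheet
MOON_DISTANCE_KM = 384_400       # average Earth-Moon distance

lake_size_pb = 70
pages = lake_size_pb * PAGES_PER_PB
stack_km = pages * SHEET_THICKNESS_MM / 1_000_000  # millimetres to kilometres
round_trips = stack_km / (2 * MOON_DISTANCE_KM)

print(f"{pages:,} pages")
print(f"Stack height: {stack_km:,.0f} km")
print(f"Moon round trips: {round_trips:.1f}")
```

Under those assumptions the stack comes out at somewhere between four and five round trips to the moon, which is consistent with "multiple times".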
If someone in compliance asks you to search through all of that for a specific transaction from 2018, you have two options:
- Try to find it and watch your infrastructure melt in real time.
- Spend a small fortune on cloud services to brute-force the search.
Neither is a great outcome.
The Cost of Keeping Everything "Just in Case"
The problem isn’t that financial services firms don’t want to be compliant. It’s that compliance at this scale is horrifically expensive. Let’s break it down.
1PB of Storage = £1M+ Over Five Years
Storing 1PB of data for five years costs upwards of £1 million in infrastructure, cloud costs, energy, and security.
Now multiply that by 70. At that rate, a 70PB data lake costs upwards of £70 million over five years.
And yet, companies persist with the belief that they should store everything forever, just in case it turns out to be important later. It’s a bit like keeping every email you’ve ever received, including that one from 2011 about the office Christmas party that you never attended.
Why DORA is a Wake-Up Call
Under DORA, financial institutions need to demonstrate:
- What’s in their data lake (Good luck.)
- How they manage risk in real time (Difficult if blind.)
- How they’ll respond to an incident (See above.)
DORA isn’t just a cybersecurity regulation—it’s about proving that your entire digital estate is functional, recoverable, and actually understood. Which means the strategy of “just storing everything” no longer works.
How to Fix the Mess (Before a Regulator Forces You To)
The solution isn’t deleting everything (although that would be satisfying). It’s about actually understanding what’s in your data lake, classifying it properly, and ensuring that when something bad happens, you can act fast. That means:
- ✅ Data Classification: If you don’t know what you’ve got, you can’t manage it.
- ✅ Smart Search & Automation: Because no human is manually reviewing 70PB of data.
- ✅ Resilience Testing: It has to actually work under pressure.
- ✅ Regulator-Ready Reporting: If you don’t want to be the next headline, you need real-time insights.
This is what financial institutions should have been doing all along—but now, thanks to DORA, it’s no longer optional.
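To make the classification and search points a little more concrete, here is a minimal sketch of a rule-based catalogue. Everything in it is hypothetical: the category names, the keyword rules, and the file paths are invented for illustration, and a real classifier would inspect content and metadata rather than just path names.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical categories and keyword rules, for illustration only.
RULES = {
    "transaction": ("payment", "trade", "settlement"),
    "customer_record": ("kyc", "account", "customer"),
    "system_log": ("log", "audit", "trace"),
}


@dataclass(frozen=True)
class CatalogueEntry:
    path: str
    created: date
    category: str


def classify(path: str) -> str:
    """Return the first category whose keyword appears in the path."""
    lowered = path.lower()
    for category, keywords in RULES.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "unclassified"


def build_catalogue(objects):
    """Turn (path, created) pairs into classified catalogue entries."""
    return [CatalogueEntry(path, created, classify(path)) for path, created in objects]


def find(catalogue, category, year):
    """Answer a question like 'show me the 2018 transactions'."""
    return [e for e in catalogue if e.category == category and e.created.year == year]


# Invented sample objects standing in for what sits in the lake.
objects = [
    ("lake/raw/payments/2018/batch-0412.parquet", date(2018, 4, 12)),
    ("lake/raw/kyc/customer-88121.json", date(2021, 9, 3)),
    ("lake/tmp/export_final_v2.bin", date(2016, 1, 20)),
]

catalogue = build_catalogue(objects)
hits = find(catalogue, "transaction", 2018)
unknown = [e.path for e in catalogue if e.category == "unclassified"]
```

The point of the sketch is the shape, not the rules: once every object carries a category and a date, a regulator's request becomes a lookup against the catalogue rather than a scan of the whole lake, and the "unclassified" pile tells you how much you still don't understand.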
The Lake is Rising
If you really want to get a sense of where your organisation stands, ask your IT team:
"If a regulator asked us to retrieve specific data from our lake within 24 hours, how would we do it?"
If the answer is "We wouldn’t know where to start," it’s probably time to start looking.
Nick Pollard is Managing Director (EMEA) for Harmony House Technology. He is a seasoned leader with more than 20 years of experience working in real-time investigation, legal and compliance workflows across highly regulated environments.