AI & Innovation · May 25, 2025

How can you trust your Digital Twin if the data is unreliable?

Nick Pollard

Managing Director, EMEA


The map is not the territory.

I’ve used discovery technology for years to uncover what organisations didn’t want to see. Legacy PST files, forgotten file shares, redundant records held long past their legal expiry. Data that was duplicated, misplaced, misclassified, or flat-out misleading.

It’s never clean, and frankly, it’s rarely trusted.

Recently, I have been involved in preparing data for a Digital Twin. Knowing the kind of risk (and mess) that lurks in unstructured data made me genuinely nervous that the project was flawed from the start.

Digital Twins look impressive. Animated, responsive, interactive. They simulate storms, market crashes, disaster recovery, supply chain delays, you name it! They can produce dashboards, forecasts, and perhaps even some regulatory comfort.

But they only work if the data feeding them is clean, current, and complete. And most of the time, let's face it, it’s none of those things.

The simulation is built on ROT and shadows

This isn’t theory. I’ve spent most of my career in enterprise data estates, and I know the truth: around 30% of what’s sitting in unstructured storage is ROT (Redundant, Outdated, or Trivial). A further 50% or more is dark: no one knows what it is, who owns it, or why it's there.

That means up to 80% of your Digital Twin’s inputs could be data junk. Data you don’t need. Data you haven’t reviewed. Data that introduces noise, bias, risk, and cost.

So how exactly do you expect the Twin to give you the truth, when most of what it’s learning from is either irrelevant or unknown?

Six brutal questions your twin can’t answer

Before you put your trust in a Twin, ask:

1. Provenance

Where did this data come from? If your answer involves a shared drive and a shrug, you don’t have provenance. You have a risk.

2. Liveness

Is this data still alive? Real-time simulation needs real-time inputs. If your feeds are cold, your predictions are frozen in the past.

3. Ownership

Who owns this data? If you don’t know who maintains it, you don’t know who’s responsible when it goes wrong.

4. Completeness

What’s missing? Data doesn’t warn you when it’s incomplete. But your model will assume it’s the full picture.

5. Duplication

Your storage may well do block-level deduplication, but that doesn't stop logical duplication: identical files saved under different names, paths, or shares. How much bloat are you about to ingest?

6. Trust

Why do we trust this? Because the dashboard is pretty? Or because the data has been audited, validated, and classified?
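To make the duplication point concrete: block deduplication at the storage layer saves disk, but anything ingesting at the file level still sees every copy. A minimal sketch (illustrative only, not any vendor's actual method) finds logical duplicates by grouping files on a content hash:

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def find_logical_duplicates(root: str) -> dict[str, list[Path]]:
    """Group files under `root` by SHA-256 of their content.

    Groups with more than one entry are logical duplicates: the same
    bytes stored under different names or paths, which block-level
    dedup hides from you but a file-level ingest will count twice.
    """
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(path)
    return {h: paths for h, paths in groups.items() if len(paths) > 1}
```

A real scan at petabyte scale would hash incrementally and shortlist by size first, but the principle is the same: identity is decided by content, not by where the file happens to live.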

Lightning IQ – Understand and de-risk data at scale

Lightning IQ doesn’t build the Twin. It makes sure the data going into it is actually fit for purpose.

We scan unstructured data at scale, petabytes of it. We classify, deduplicate, trace lineage, detect ROT, flag sensitive content, and expose blind spots. We tell you what should never be allowed into your planning models. And we do it fast enough that your project doesn’t stall while we audit.

In short, we help you earn the right to trust your Twin.

By doing this, you essentially make the Digital Twin's inputs the superior data set: clean, classified, and accounted for. So perhaps now is the time to turn your attention back to the wider enterprise data estate. What are you missing? What could you fix now, not just simulate later?

The Final Question

Before you ask what the twin can predict, ask this:

What makes you so sure it deserves your trust, and can you prove that every dataset it consumes is accurate, current, complete, secure, and legally compliant?

Because if you can’t answer that, your Twin isn’t modelling the future. It’s recycling the past.


Nick Pollard

Nick Pollard is Managing Director (EMEA) for Harmony House Technology. He is a seasoned leader with more than 20 years of experience in real-time investigations and legal and compliance workflows across highly regulated environments.
