
A Lot of Your Data Is Redundant, Obsolete, or Trivial (ROT), and It Puts You at Risk
You can’t govern or secure data you’ve lost track of.
Bruno Kurtic
President and CEO, Co-founder
Every organization carries a growing weight of redundant, obsolete, and trivial data, also known as ROT. It lives in object stores, databases, data warehouses, cloud drives, and file shares. It adds cost, slows down systems, and hides sensitive information. Most teams underestimate its cost and impact until something is exposed.
- Redundant data includes outdated objects in stores like AWS S3, Azure Blob, or Google Cloud Storage (GCS), repeated exports in data warehouses, duplicate files in cloud drives, and outdated versions of documents in file shares.
- Obsolete data shows up as deprecated schemas in databases, archived logs in object stores past their retention window, or legacy reports that are no longer referenced.
- Trivial data includes personal media files uploaded to shared drives, temporary analytics outputs dumped into object stores, abandoned Jupyter notebooks, or scratch tables created during development.
Historically, ROT was treated as a storage problem. Now that AI models, agents, copilots, and enterprise search tools crawl everything, it has become a provenance, security, and compliance issue.
Without clear provenance, outdated or incorrect datasets may be used in analytics, dashboards, or AI systems. This leads to flawed outputs, risky decisions, and long-term loss of trust in data-driven initiatives.
ROT also undermines compliance. Regulations like GDPR and CPRA emphasize data minimization and retention control. But you can’t govern what you don’t know exists. ROT expands the attack surface, introduces ambiguity, and makes classification and incident response more difficult.
ROT also inflates infrastructure, backup, and scanning costs. It often makes up more than half of what enterprises store across these systems.
Deleting ROT manually is a losing game. Objects, tables, and files are copied, renamed, saved, and forgotten every day. Instead, organizations need a persistent way to detect ROT, assess risk, and reduce its footprint over time.
The key is continuous discovery, classification, and fingerprinting. AI-powered tools can identify ROT based on business context, duplication, or similarity to other datasets. This goes beyond regex or keywords. It’s about knowing what the data is, not just what strings it contains. It means fingerprinting data to identify derivatives, even if they are not full copies. It means understanding data taxonomy to identify trivial data that is not relevant to the business.
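To make the fingerprinting idea concrete, here is a minimal sketch, not a description of any particular product. It assumes plain-text content, uses a SHA-256 digest to catch exact copies, and uses word shingles with Jaccard similarity to flag likely derivatives. The function names, the shingle size, and the 0.5 threshold are all illustrative assumptions.

```python
from __future__ import annotations

import hashlib


def exact_fingerprint(data: bytes) -> str:
    """SHA-256 digest: identical bytes always yield the same fingerprint."""
    return hashlib.sha256(data).hexdigest()


def shingles(text: str, k: int = 3) -> set[str]:
    """Overlapping k-word windows of the lowercased, whitespace-split text."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of shingles the two sets have in common (0.0 to 1.0)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def classify_pair(a: str, b: str, threshold: float = 0.5) -> str:
    """Label b relative to a. The threshold is an assumed, tunable cutoff."""
    if exact_fingerprint(a.encode()) == exact_fingerprint(b.encode()):
        return "duplicate"
    similarity = jaccard(shingles(a), shingles(b))
    return "derivative" if similarity >= threshold else "distinct"


if __name__ == "__main__":
    original = "the quick brown fox jumps over the lazy dog near the river bank"
    edited = original + " today"
    print(classify_pair(original, original))  # duplicate
    print(classify_pair(original, edited))    # derivative
```

At real scale, pairwise comparison is too slow, so a production system would more likely use a sketching technique such as MinHash with locality-sensitive hashing. The sketch above only shows the underlying idea.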
Lineage and duplication analysis let you trace which dataset or file is the source and which are just copies or derivatives. Contextual tagging reveals who owns it, who’s using it, and whether it contains sensitive data.
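As a rough illustration of duplication analysis, the sketch below groups objects by content hash and treats the earliest-created object in each group as the presumed source. That heuristic is an assumption made for the example. Real lineage usually needs job metadata or access logs, and the `DataObject` type and its fields are hypothetical.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class DataObject:
    """Hypothetical record for a stored object; fields are illustrative."""
    path: str
    content: bytes
    created: datetime
    owner: str | None = None


def presumed_sources(
    objects: list[DataObject],
) -> list[tuple[DataObject, list[DataObject]]]:
    """Return (presumed source, later copies) for each group of identical content.

    Assumption: the oldest object with a given content hash is the source.
    """
    groups: dict[str, list[DataObject]] = defaultdict(list)
    for obj in objects:
        groups[hashlib.sha256(obj.content).hexdigest()].append(obj)

    result = []
    for members in groups.values():
        ordered = sorted(members, key=lambda o: o.created)
        result.append((ordered[0], ordered[1:]))
    return result
```

Objects that come back with an empty copy list are unique. Groups with several copies and no owner are natural candidates for review.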
With this context, you can apply governance policies that retain, archive, or delete data based on risk and value. Instead of relying on manual tags or shared folders, you enforce rules consistently across the environment.
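A policy of this kind can be expressed as a small rule function. The sketch below is one hypothetical rule set, not a recommended standard: the `DatasetContext` fields, the rule ordering, and the choice to delete sensitive duplicates are assumptions chosen to show how context drives a retain, archive, or delete decision.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from enum import Enum


class Action(Enum):
    RETAIN = "retain"
    ARCHIVE = "archive"
    DELETE = "delete"


@dataclass(frozen=True)
class DatasetContext:
    """Hypothetical context gathered by discovery and classification."""
    name: str
    is_duplicate: bool        # a copy of another, authoritative dataset
    contains_sensitive: bool  # e.g. PII was detected
    last_accessed: date
    retention_days: int       # allowed idle time before action is taken


def decide(ctx: DatasetContext, today: date) -> Action:
    """Apply illustrative rules in priority order."""
    expired = (today - ctx.last_accessed).days > ctx.retention_days

    # Redundant copies that are risky or stale add exposure without value.
    if ctx.is_duplicate and (ctx.contains_sensitive or expired):
        return Action.DELETE
    # Authoritative but stale data moves to cheaper, restricted storage.
    if expired:
        return Action.ARCHIVE
    return Action.RETAIN
```

Because the rules live in code and not in folder names or manual tags, the same decision is applied to every dataset the discovery process finds.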
ROT management isn’t about deleting everything old. It’s about reducing noise and improving fidelity. A smaller, better-understood data footprint means fewer false positives, a narrower risk surface, and better protection against unwanted exposure.
One real example: an S3 bucket used for analytics exports contains 20 versions of a customer revenue dataset. Each was generated by a different job, at a different time, with no clear ownership or tagging. Some are obsolete, some are redundant, and one contains misclassified PII. If it is unclear which version is authoritative, the wrong one may be used for model training or inference. The result is inaccurate AI output that may also expose sensitive data. This is ROT, but it’s also risk.
Managing ROT is foundational. You can’t govern or secure data you’ve lost track of. ROT may be invisible, but its consequences aren’t.
What is your biggest concern and why: redundant, obsolete, or trivial data?
#DataSecurity #DataGovernance #ROT