Your Data Retention Policy Is the Most Ignored Policy You Have

Summary

Most organizations assume keeping data is safe and deleting it is risky. The legal and business reality is the opposite. When Nathan Ulery received a breach notification from Columbia University for data the school had held for 30 years without a policy basis to keep it, the incident illustrated a problem most organizations share: data retention policies that exist on paper but aren't enforced in practice. This piece examines the legal trajectory pushing organizations toward storage limitation, the three categories of risk that stale data creates, and a practical exercise any CIO or CISO can run today to measure how much unnecessary risk their organization is currently carrying.

A thirty-year-old name on a marketing list

Last week I opened the mailbox to find a letter from Columbia University. 

My son is starting his college search, so I assumed it was a recruiting mailer addressed to the wrong member of the household. It was not. It was a notice of security incident. An attacker had taken my name and Social Security number from Columbia’s systems.

But here is the real question. Why did Columbia University have my Social Security number in the first place?

I never applied to Columbia. I went to DePauw University in Greencastle, Indiana, a long way from Manhattan in more than just miles. 

The most plausible explanation dates to 1995. That year I took the ACT and the SAT. The organizations behind both tests used Social Security numbers as a primary student identifier at the time, and both ran student search programs that sold prospective student data to colleges. ACT has since stopped collecting Social Security numbers, which is itself a quiet admission that they were never needed.

So Columbia almost certainly bought my data thirty years ago to send me a recruiting brochure, never deleted it, and lost it in a breach. 

Their own data retention policy says they shouldn’t have had it

The argument against Columbia is not hard to make, because Columbia made it for me. 

Columbia’s published retention policy is plain. Records of admitted-but-not-enrolled applicants and denied applicants are maintained for ten years, then quarantined. 

I was never admitted. I was never denied. I was never an applicant. I was a name on a marketing list. And even if I had been an applicant, that record should have been quarantined twenty years ago. 

The data Columbia lost was data they had no policy basis to keep, no legal reason to retain, and no business purpose to hold. 

The data privacy law direction is one-way

Organizations often default to the assumption that holding data is safe and deleting it is risky. The legal trajectory is moving in the other direction, and quickly.

California’s CPRA was the first US law to explicitly require both purpose limitation and storage limitation: collect only what you need, retain only as long as you need it. Virginia, Colorado, Connecticut, and Texas have followed with broadly similar provisions. New York, Columbia’s home state, has the SHIELD Act, which holds organizations accountable for data they keep beyond legitimate need. More states are coming, and federal direction is consistent with them. 

And FERPA, the law universities most often cite when pressed on student data, doesn’t even apply to applicants who never enrolled. Columbia couldn’t have invoked FERPA to justify keeping my record. There is no FERPA record. There’s just a thirty-year-old name with a Social Security number that someone bought, no one questioned, and no one deleted.

The pattern across these laws is consistent. Compliance and ethics have moved past deleting data on request. The questions that now define compliant, ethical data stewardship are whether you should have collected the data in the first place and whether you have any defensible reason to still be holding it.

For most organizations, on most fields, the honest answer is no. 

The risks of data you do not need

It’s easy to see why organizations hoard data: Classifying it is hard, and enforcing deletion takes governance. Both require deciding what data should be kept and what can be let go of. And that decision requires accepting that someday someone will say, “I wish we had kept that one thing.” 

So they keep everything “to be safe,” and the cost feels free. Until they’re mailing out letters and they realize they were wrong. 

There are three flavors of that risk, and the frequency of each is increasing. 

The first is the breach. Every record kept past its useful life is a record that may eventually appear in a notice letter. 

The second is AI exposure. A well-intentioned employee summarizes a file using an AI tool. A vendor trains a model on data that should not have been retained at all. A shadow AI workflow no one approved gets wired into a business process. Data you forgot you had does not stay forgotten when an AI tool comes looking for it. 

The third is the insider problem. Sometimes it is a rogue employee. Sometimes it is someone hired under false pretenses, as in several well-documented cases involving cybersecurity firms hiring what turned out to be foreign operatives. The more sensitive data sitting accessible inside your environment, the larger the attack surface that misplaced trust exposes.

None of these risks scales with the size of your organization. They scale with the volume and sensitivity of the data you keep.

A breach is a brand event

Before joining Resultant, I led a B2B marketing agency. I know how much time, money, and discipline CMOs invest in building brand affinity. I’ve also seen how a single security incident can erase years of that investment in a matter of weeks. The damage is sharper still when the breach involves data the organization shouldn’t have been holding in the first place. 

Columbia’s lack of care for my data does not make me want to send my son there. That’s one application Columbia is unlikely to receive. Some version of the same calculation is happening, quietly, in some of the other 870,000 households that got the same letter. 

A breach is not only a security event. It is a brand event mailed to every person whose data was lost. Every recipient is asking the same question Columbia is struggling to answer: why did you have my data, and why did you keep it? I know they are struggling. I called the 800-number, emailed the address on their website, and I am still waiting for a response. The breach is the first failure. The response is the second. 

The same dynamic applies in the public sector. Constituents who had no choice about which agency held their data still notice when that data ends up in the news, and the trust public institutions rely on won’t survive many letters in mailboxes. 

What to do about it

The most secure data is the data you no longer have. It cannot be breached. It cannot be fed into an AI tool by mistake. It cannot be used by a rogue employee. 

Here’s a straightforward exercise in ethical data stewardship for any CIO or CISO. Choose one system in your organization. Assess what data it truly needs to produce your desired results and how long it needs to keep each piece. Then identify what that system should already have gotten rid of. The amount of unnecessary data that system is hanging onto is a measure of how much risk your organization is currently carrying, and how much trust you stand to lose when that risk becomes a notification to a constituent or customer.
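For teams that want to try the exercise against a real inventory, here is a minimal sketch of what it might look like in Python. Everything in it is an assumption made for illustration: the field names, the ten-year window (borrowed from the applicant-record period in Columbia's policy quoted above), the sample inventory, and the rule that a field with no documented need is flagged immediately. Treat it as a starting point for a conversation with whoever owns the system, not as a finished audit tool.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class FieldRule:
    """Hypothetical retention rule for one field in the system under review."""
    needed: bool                    # does the system need this field to do its job?
    retention_years: Optional[int]  # how long it may be kept; None if not needed


@dataclass
class StoredValue:
    """One stored value, as it might appear in a data inventory export."""
    record_id: str
    field_name: str
    collected_on: date


def expiry_date(collected_on: date, years: int) -> date:
    """Return the date a value ages out, `years` after it was collected."""
    try:
        return collected_on.replace(year=collected_on.year + years)
    except ValueError:
        # Collected on Feb 29 and the target year is not a leap year.
        return collected_on.replace(year=collected_on.year + years, day=28)


def should_be_gone(value: StoredValue, rules: dict, today: date) -> bool:
    """A value is flagged if no rule justifies it or its window has passed."""
    rule = rules.get(value.field_name)
    if rule is None or not rule.needed or rule.retention_years is None:
        return True  # assumption: no documented need means no reason to hold it
    return today >= expiry_date(value.collected_on, rule.retention_years)


def audit(inventory: list, rules: dict, today: date):
    """Return the flagged values and their share of the whole inventory."""
    flagged = [v for v in inventory if should_be_gone(v, rules, today)]
    share = len(flagged) / len(inventory) if inventory else 0.0
    return flagged, share


# Illustrative rules and inventory; none of this comes from a real system.
SAMPLE_RULES = {
    "name": FieldRule(needed=True, retention_years=10),
    "ssn": FieldRule(needed=False, retention_years=None),
}
SAMPLE_INVENTORY = [
    StoredValue("r1", "name", date(1995, 6, 1)),
    StoredValue("r1", "ssn", date(1995, 6, 1)),
    StoredValue("r2", "name", date(2022, 3, 1)),
]

flagged, share = audit(SAMPLE_INVENTORY, SAMPLE_RULES, date(2025, 1, 1))
for v in flagged:
    print(f"{v.record_id}.{v.field_name} collected {v.collected_on} should be gone")
print(f"Share of inventory with no basis to keep: {share:.0%}")
```

The share it reports is a rough proxy at best. It counts values, not sensitivity, so a single Social Security number weighs the same as a single name. A real version would likely weight fields by how damaging their loss would be.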

Most organizations won’t run that exercise until the letter’s already been printed. 
