Stop Being Data-Driven

Why we are fooled by data and how to stop it

Cai Parry-Jones

Published in

Towards Data Science

Often, when making decisions based on data, we feel as if we have made a more intelligent, accurate choice. The reality is a little different. From using it to purchase property, to deciding to go on a diet, to selecting a new board member for your company, data can be an amazing catalyst to help make the worst decisions.

Using real-world examples, this article covers common ways data can drive us over a cliff of misinformation. Fortunately, this article also gives actionable, easy-to-use advice on how to avoid such scenarios by following a four-step ladder to assess the quality of data-driven insights:

evaluating the data source,
considering the bias of the data presenter,
recognising the bias of the data reader,
identifying any logical missteps between the data and the insight.

TL;DR

Always question the source of the data. If the source isn’t credible or isn’t provided, don’t be shy in rejecting the entire analysis.
Don’t hesitate to fact-check claims, especially when they seem counterintuitive.
Be aware of the data presenter’s bias. Ask yourself if they’d present the same analysis if it led to the opposite conclusion.
Be aware of confirmation bias. Ask yourself if you’d accept the same analysis if it led to the opposite conclusion.
Consider whether the analysis makes logical sense. Just because two things are correlated doesn’t mean one causes the other.
Recognise that data interpretation often involves assumptions and leaps in logic. It’s crucial to identify and evaluate these leaps using your own judgment.

Question One — Where did the data come from?

Here is an excerpt from an article published in The Spectator, titled: ‘The unfashionable truth about the riots’. Its subject is the 2024 anti-immigration riots in the UK.

…I decided to do some checking on the employment stats for some of the northern towns that have seen the worst rioting in the past week. I also checked the 2011 statistics and then compared the two. I should warn you in advance that if you’re easily depressed, you should look away now.

Back in 2011, the proportion on out-of-work benefits (including incapacity benefit) in Sunderland was 18 per cent; today it is 19 per cent. In 2011 the unemployment figure in Rotherham was 16 per cent; today it is 18 per cent. In Hartlepool, it was 21 per cent; today, 23 per cent.

— The unfashionable truth about the riots, The Spectator

So, where did this data come from?

In the example, all we have is “the employment stats”… What the heck does that mean? In the UK there isn’t a centralised place for employment statistics, so it’s far from obvious what data the author is referring to. You might be content with dismissing the data and thus dismissing the rest of the article, since its premise is based on this data. However, I’m a try-hard, so I put on my best Sherlock Holmes hat and set about finding the un-cited data.

Sadly, I am not Sherlock Holmes. Despite considerable online sleuthing, I couldn’t find any data that matched the article’s. I did, however, find other “employment stats”: specifically, the Office for National Statistics (ONS) measure of unemployment by region. It paints the polar opposite picture. The article presents all three areas as having lost employment since 2011; the ONS finds that employment increased in all three.* I’m afraid the only unfashionable truth illuminated by this article is the degradation of news media. (Or was it always like this?)

*Data links: Sunderland, Rotherham, and Hartlepool employment stats.

Question Two — What if the opposite result were true?

…Using laboratory and field experiments, we find that signing before–rather than after–the opportunity to cheat makes ethics salient when they are needed most and significantly reduces dishonesty.

— Signing at the beginning makes ethics salient and decreases dishonest self-reports in comparison to signing at the end, Lisa L. Shu et al.

This quote comes from the abstract of a highly influential research paper in Behavioural Science. It is written in the classic academic obfuscation style, so in plain terms: signing your name at the top of a document makes you less likely to lie in that document than signing at the bottom.

The data came from a group of researchers based at highly regarded universities, most notably Francesca Gino, who was a professor at Harvard Business School at the time of publication. For me, this passes our first question about the data (‘Where did the data come from?’).

Next we need to ask: ‘Would the data be presented if the opposite result were true?’ To answer this we first need to understand a little about the people and/or institutions presenting the data. In this case it is a group of Behavioural Science academics. Academia is a cut-throat industry, with an exceptionally high failure rate. Therefore, it is critical for academics to be considered ‘successful’ as early as possible to avoid being culled. The key factor in determining an academic’s success is, unsurprisingly, their work. But how do you measure the quality of their work? A common method is citation count, that is, the number of times an academic’s work is mentioned in the work of other academics.

The issue with this is that you are far more likely to be cited if you have an unexpected/interesting result in your paper. Going back to the original quote, do you think a result showing no difference in honesty between the group who signed at the top and the group who signed at the bottom of a document would have been nearly as influential? No, because that would have been the expectation. The conclusion is that there is a strong incentive for academics to generate unexpected/interesting results. Therefore, in our example, we should understand that the authors have a strong bias towards claiming a (significant) difference in honesty between the two groups.
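To see why this incentive matters, here is a minimal sketch of the selection effect in Python. It is not the method of the paper above, and every number in it is an assumption made up for illustration: 2,000 hypothetical studies, 50 participants per group, a true effect of exactly zero, and a rule that only results with |z| > 1.96 (roughly p < 0.05) get "published". The function name `run_study` is also hypothetical.

```python
import random
import statistics


def run_study(rng, n=50):
    """Simulate one experiment in which signing position has NO real effect."""
    # Both groups are drawn from the same distribution (hypothetical honesty scores).
    top = [rng.gauss(0.0, 1.0) for _ in range(n)]
    bottom = [rng.gauss(0.0, 1.0) for _ in range(n)]
    diff = statistics.mean(top) - statistics.mean(bottom)
    # Standard error of the difference in means.
    se = (statistics.variance(top) / n + statistics.variance(bottom) / n) ** 0.5
    return diff, diff / se


rng = random.Random(0)  # fixed seed so the run is repeatable
studies = [run_study(rng) for _ in range(2000)]

# Assumption: only "significant" results (|z| > 1.96) get published.
published = [diff for diff, z in studies if abs(z) > 1.96]

share_published = len(published) / len(studies)
mean_abs_all = statistics.mean([abs(diff) for diff, _ in studies])
mean_abs_published = statistics.mean([abs(diff) for diff in published])

print(f"Share of studies published: {share_published:.1%}")
print(f"Average |effect| across all studies: {mean_abs_all:.2f}")
print(f"Average |effect| among published studies: {mean_abs_published:.2f}")
```

Even though nothing real is going on, around one study in twenty clears the bar by chance, and every one of those reports a sizeable difference between the groups. A reader who only ever sees the published pile would conclude the effect is real.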

This sort of bias is strong enough that a large share of published research, in some fields reportedly the majority, cannot be reproduced by fellow academics. Unreproducible means that when another academic repeats the method or experiment of a published paper, they end up with significantly different results from the original. Turning once more to our case study: following the paper’s success, a number of other studies attempted to run the same or a similar experiment. None of them found a link between honesty and where the document was signed. It turned out that the original paper’s finding was very likely the result of data fraud, as were the findings of a number of other influential papers co-authored by the prestigious Harvard professor Francesca Gino.

This sort of skin-in-the-game bias isn’t only found in academia. From tech vendors posting their latest performances vs competitors, to investment banks presenting to prospective clients that they are no. 1 in the league tables*, bias is rife. And it isn’t only other people you should be concerned about. The strongest bias of all is most likely you—yes, YOU. Commonly referred to as confirmation bias, it is the tendency to search for, interpret, favour, and recall information in a way that confirms or supports your prior beliefs.

Ever wondered why such a mass of people could be so stupid on so many political topics? Sadly, it is more likely a result of your own confirmation bias than the collective dim-wittedness of millions of people.

If you want to avoid the pitfalls of your own biases, as well as the biases of the data presenter, you need to ask not only ‘Would the data be presented if the opposite result were true?’ but also ‘Would I consider the data presented if the opposite result were true?’ It may be a tough pill to swallow, but overcoming your own bias may enlighten you more than all the research in the world.

*May Contain Lies, Alex Edmans. Page 130.

Question Three — Is there a flaw with the leap from data to insight?

For most university graduates, having a degree pays.

Over the course of a lifetime, estimates suggest women can expect to earn about £250,000 more if they have a degree, while the figure is roughly £170,000 for men.

— The degrees that make you rich… and the ones that don’t, BBC News

This quote comes from the news department of the prestigious British Broadcasting Corporation (BBC). Specifically, their article titled: ‘The degrees that make you rich… and the ones that don’t’. First things first, is there a reliable data source?

The source. BBC News, The degrees that make you rich… and the ones that don’t

Great news, the answer is (probably*) yes! The article cites data from another prestigious organisation, the Institute for Fiscal Studies. You may want to delve deeper into the exact report behind the data, but for a surface-level analysis, I’m happy with where this raw data has come from.

Second, we want to ask ourselves: ‘Would this information be presented if the opposite result were true?’ Given the BBC has (to my knowledge) no reason to support higher education institutions, it’s fair to assume that the BBC would have published an article highlighting no financial benefits to studying at university if the data suggested it.

Unfortunately, we still can’t trust the BBC’s findings just yet. We have one final important question to ask: ‘Does the analysis make sense?’ After looking at the article’s chart on average earnings by subject, have a go at answering the following question before moving on to the next paragraph: ‘Given people with Medicine degrees earn more than the average graduate, does it make sense to assume that the Medicine degree was the cause of the higher salary?’

My answer: Medicine degrees are necessary to become a qualified doctor, and doctors generally earn a high income. This supports the idea that a Medicine degree is the specific reason for higher average future earnings. On the other hand, you don’t just need to pick Medicine, you also need to be picked for Medicine. Given that the average A-level results of a UK Medicine graduate are an outstanding AAA, you could argue that the high average salaries arise because the average Medicine student is exceptionally cognitively capable (in academia at least), which has nothing to do with whether they actually chose Medicine as a degree or not.

What about the degree with the second highest average salary, Economics? Unlike Medicine, you don’t strictly need an Economics degree to qualify for any specific high-paying job. I’d argue an Economics degree is much less likely to be a direct factor in increasing a graduate’s future salary. For example, someone who is genuinely interested in money at school will be more likely to base their career decisions on what pays more. That same person may also be more likely to pick Economics at university.** If that’s true, then pushing someone who is not interested in money to study Economics may not benefit their future career or financial prospects as much as the article implies.

Similarly, someone who earns a university degree in general may be little to no better off than if they hadn’t gone to university at all. The higher earnings may simply be a side effect of something else. For example, in the UK children with wealthy parents are significantly more likely to attend university. The wealth of their parents may also improve the child’s prospects of landing a high-paying job; for example, the parents might pass on invaluable knowledge about how to interview, climb the greasy corporate ladder, and/or network. Given this thought process, the article’s finding that “For most university graduates, having a degree pays” is, at the very least, not a proven fact, and should not be treated as such.
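The parental-wealth argument can be sketched as a toy simulation in Python. To be clear, this is not the BBC’s or the Institute for Fiscal Studies’ analysis, and every figure in it is invented for illustration: the share of wealthy parents, the chances of getting a degree, and the salaries are all assumptions. In this made-up world a degree has zero causal effect on earnings; only parental wealth matters.

```python
import random
import statistics


def simulate_people(n=20000, seed=1):
    """Build a hypothetical population of (wealthy_parents, has_degree, earnings)."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        wealthy = rng.random() < 0.3  # assumed share with wealthy parents
        # Assumption: wealthy parents make a degree more likely.
        has_degree = rng.random() < (0.7 if wealthy else 0.3)
        # Assumption: earnings depend on parental wealth only, NOT on the degree.
        earnings = 30000 + (10000 if wealthy else 0) + rng.gauss(0, 5000)
        people.append((wealthy, has_degree, earnings))
    return people


def degree_gap(people):
    """Average earnings of graduates minus average earnings of non-graduates."""
    grads = [e for _, has_degree, e in people if has_degree]
    others = [e for _, has_degree, e in people if not has_degree]
    return statistics.mean(grads) - statistics.mean(others)


people = simulate_people()
naive_gap = degree_gap(people)
gap_wealthy = degree_gap([p for p in people if p[0]])
gap_not_wealthy = degree_gap([p for p in people if not p[0]])

print(f"Graduate premium, everyone: {naive_gap:,.0f}")
print(f"Graduate premium, wealthy parents only: {gap_wealthy:,.0f}")
print(f"Graduate premium, non-wealthy parents only: {gap_not_wealthy:,.0f}")
```

Graduates out-earn non-graduates by a wide margin overall, yet the gap all but vanishes once you compare like with like within each wealth group. Real life is messier than this sketch, and a degree may well have a genuine effect, but a raw earnings gap on its own cannot tell you how much.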

Deciding what ‘makes sense’ is subjective; for example, you may not have been convinced by my arguments above. However, it’s important to recognise the leaps in assumptions people make from data to insights, and to do your own ‘makes sense’ assessment before accepting them.

*It’s important to acknowledge that limited time constrains the depth of our investigation. For very serious decisions you will want to go deeper, but pragmatically that isn’t possible for every data analysis we see.

**You may be thinking: ‘hang on a minute, you didn’t provide any data to back up that theory’. And you’re right! However, we often don’t have all the data available to us, so we simply have to use our own experiences and intuition. As an Economics graduate who has also been fascinated with finance from an early age, the theory makes sense to me.

Wrapping up

In conclusion, data should not be the driver of your decisions. Like an unpredictable friend, data belongs firmly in the passenger seat, preferably with the child-lock on. It can provide suggestions, but ultimately you need to be sceptical before making any decision based on it.

Remember, when looking into data, statistics, and/or research, you need to ask yourself:

Where did this data come from?
Would this data be presented if the opposite result were true?
Would I consider the data if the opposite result were true?
Is there a flaw with the leap from data to insight?

Final note for data scientists/analysts

For data scientists and analysts, these questions are not just a safeguard; they’re a responsibility. As gatekeepers of data-driven insights, your role is crucial in ensuring that decisions are informed by sound reasoning, not just raw numbers and algorithms. By applying this mindset, you can avoid being that unpredictable friend in your organisation. And who knows, one day you may even be invited into the driving seat.
