One of the more popular courses you could take at my college to fulfill the finance major requirements was Behavioral Finance. The main “textbook” was Inefficient Markets, and we learned how there are qualitative ways to value a security beyond what the efficient market hypothesis purports. During the financial crisis of 2008, psychology professor and behavioral economist Dan Ariely published Predictably Irrational to much fanfare. The gist of the book is that humans are less rational than economic theory tells us. Armed with the knowledge that humans are irrational (what a surprise) when it comes to investing and other aspects of life, a capitalist would try to find the edge in a situation and turn a profit. That is, until recent reports surfaced showing that the results of Dan Ariely’s experiments were fabricated (Ariely partially admits to it). This episode looks at how the data was potentially fabricated to skew the final results.
Background on the controversy surrounding Dan Ariely’s fabricated data
In short, Ariely’s main experiment coming under fire is one he ran with an auto insurance company. The auto insurance company asks customers to provide odometer readings. Ariely claims that if you “nudge” the customer first by having them sign an “honesty declaration” at the top of the form saying they won’t lie on the odometer reading, they will provide more accurate (higher) readings.
I was a fan of Predictably Irrational. It was an easy read, and Ariely’s storytelling in his TED talk from 15 years ago is compelling. I first heard that Ariely’s experiments were coming under scrutiny from this Planet Money episode called Did two honesty researchers fabricate their data? The episode walks through how Ariely became a thought leader and used his status to get paid behavioral economics consulting gigs and to give talks. Apparently the Israeli Ministry of Finance paid Ariely to look into ways to reduce traffic congestion. In the Planet Money episode, they talk about how other behavioral scientists like Professor Michael Sanders applied Ariely’s findings in a project with the Guatemalan government, encouraging businesses to accurately report taxes. Sanders was the one who originally questioned the efficacy of Ariely’s findings. Here is part of the abstract from the paper Sanders wrote with his co-authors:
The trial involves short messages and choices presented to taxpayers as part of a CAPTCHA pop-up window immediately before they file a tax return, with the aim of priming honest declarations. […] Treatments include: honesty declaration; information about public goods; information about penalties for dishonesty, questions allowing a taxpayer to choose which public good they think tax money should be spent on; or questions allowing a taxpayer to state a view on the penalty for not declaring honestly. We find no impact of any of these treatments on the average amount of tax declared. We discuss potential causes for this null effect and implications for ‘online nudges’ around honesty priming.
If you want to dive deeper into Dan Ariely’s story, how he rose to fame, and the events surrounding this controversy, this New Yorker article by Gideon Lewis-Kraus is well researched and reported. NPR also did a podcast episode about this a few months ago. This undergraduate student only has one video in his YouTube account, but it tells the story about Ariely quite well:
Instead of discussing Ariely’s career and his character, I’m going to focus on the data irregularities in the Excel file Ariely used to come up with the findings from the auto insurance experiment. This podcast/newsletter is about data analysis, after all.
Instead of dissecting the Excel file myself, I’m basically going to re-hash the findings from this Data Colada blog post. Data Colada is a blog run by three behavioral scientists: Uri Simonsohn, Leif Nelson, and Joe Simmons. Their posts demonstrate how “p-hacking” is used to massage data to get the results you want.
Irregularity #1: Uniform distribution vs. normal distribution of miles driven
This is the raw driving dataset from the experiment (download the file here). Each row represents an individual insurance policy and each column shows the odometer reading for each car in the policy before and after the form was presented to the customer.
The average number of miles driven per year, irrespective of this experiment, is around 13,000. In this dataset, you would expect to see a lot of numbers around 13,000, with a few below 1,000 and a few above 50,000 (as an example). This is what a normal distribution, or bell curve, looks like:
In Ariely’s dataset, there is a uniform distribution of miles driven. This means the number of people driving 1,000 miles per year is similar to the number who drove 13,000 miles/year and the number who drove 50,000 miles/year.
No bell curve. No normal distribution. This by itself makes the dataset very suspect. One could argue that the data points were cherry-picked to massage the data a certain way, but the other irregularities will show that something more sinister was at play. You’ll also notice in the chart created by Data Colada that the data abruptly stops at 50,000 miles per year. Although 50,000 miles driven per year is a lot, it’s highly unlikely that there are no observations above 50,000.
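The shape difference is easy to check numerically. Here is a toy sketch (simulated data, not the actual insurance dataset) using one simple diagnostic: in a normal distribution, roughly 68% of observations fall within one standard deviation of the mean, while in a uniform distribution only about 58% do.

```python
import random
import statistics

random.seed(42)

def share_within_one_sd(sample):
    """Fraction of observations within one standard deviation of the mean.
    Roughly 0.68 for a normal distribution, ~0.58 for a uniform one."""
    mean = statistics.mean(sample)
    sd = statistics.pstdev(sample)
    return sum(abs(x - mean) <= sd for x in sample) / len(sample)

# Hypothetical stand-ins for the odometer data (these numbers are made up):
normal_miles = [random.gauss(13_000, 8_000) for _ in range(10_000)]
uniform_miles = [random.uniform(0, 50_000) for _ in range(10_000)]

print(round(share_within_one_sd(normal_miles), 2))   # ≈ 0.68
print(round(share_within_one_sd(uniform_miles), 2))  # ≈ 0.58
```

A real forensic analysis would use a formal test (e.g., a Kolmogorov–Smirnov test), but even this crude check distinguishes a bell curve from the flat distribution Data Colada found.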
Irregularity #2: Mileage reported after people were shown the form is not rounded and RANDBETWEEN() was used
People in the experiment were asked to recall their mileage driven and write the number on a piece of paper. If you were to report a large number, you’d probably round it to the nearest 100 or 1,000. In the screenshot below, you’ll see how some of the reported mileages are indeed rounded. What’s peculiar is that the mileages reported after people were shown the form (Column D) are generally not rounded at all:
Did these customers all of a sudden remember their mileage driven down to the single digit? Highly suspect. Data Colada suggests that the RANDBETWEEN() function in Excel was used to fabricate the mileage in Column D. The reasoning is that RANDBETWEEN() returns uniformly random integers, which are almost never round numbers.

Even the numbers in Column C (mileage reported before people were shown the form) seem suspect given how precise most of them are. If Ariely or members of his lab did in fact use RANDBETWEEN() to generate the mileage in Column D, they could’ve at least tried to hide it better with the ROUND() function, which would have let them round the numbers to the nearest 100 or 1,000. This is just pure laziness.
This chart from Data Colada further shows how the last digit in the baseline mileage (before people were shown the form) is disproportionately 0, which supports the idea that those numbers were genuinely reported by humans who round. The last digit in the updated mileage (after people were shown the form) again has a uniform distribution, further adding to the evidence that the numbers were fabricated.
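The last-digit check is a simple, mechanical test you can run on any suspect column. A toy sketch (again, simulated numbers, not the real dataset): human-rounded readings end in 0 almost every time, while RANDBETWEEN()-style integers land on each final digit about 10% of the time.

```python
import random
from collections import Counter

random.seed(0)

def last_digit_share(readings, digit=0):
    """Share of readings whose last digit equals `digit`."""
    counts = Counter(r % 10 for r in readings)
    return counts[digit] / len(readings)

# Toy human-style readings: people round to the nearest 1,000 or 100
human = [random.randrange(1, 50) * 1_000 for _ in range(7_000)] + \
        [random.randrange(1, 500) * 100 for _ in range(3_000)]

# Toy fabricated readings: uniform integers, like RANDBETWEEN(0, 50000)
fabricated = [random.randint(0, 50_000) for _ in range(10_000)]

print(last_digit_share(human))       # 1.0 -- every rounded reading ends in 0
print(last_digit_share(fabricated))  # ≈ 0.1 -- each last digit equally likely
```

Real humans don’t round 100% of the time, but the gap between “mostly zeros” and “uniform across 0–9” is exactly the pattern in Data Colada’s chart.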
Irregularity #3: Two fonts randomly used throughout Excel file
This is by far the most amateur mistake when it comes to judging the validity of any dataset. When you open the Excel file, something instantly feels off about the data. That’s because half of the rows are in Calibri (the default Excel font) and the other half are in Cambria (a companion serif font from the same Microsoft ClearType font collection).
Were some of the rows copied and pasted from another Excel file into the main file and then sorted in some fashion? Did someone incorrectly select half the data and set it to Cambria?
According to Data Colada, the numbers probably started out in Calibri, and the RANDBETWEEN() function was used again to generate a number between 0 and 1,000 to be added to the Calibri number. The resulting number is in Cambria:
To recap what the data hacking looks like with this irregularity:
- The ~13,000 baseline odometer readings are split between Calibri and Cambria (almost exactly 50/50)
- ~6,500 “accurate” observations are in Calibri
- ~6,500 new observations were fabricated in Cambria
- To mask the new observations, a random number between 0 and 1,000 was added to the original numbers in Calibri to form the fabricated numbers in Cambria
In the screenshot above, this pattern of the Cambria number being almost identical to the Calibri number is what leads Data Colada to believe that the Cambria numbers (half the dataset) are fabricated.
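The alleged scheme, and its tell, can be sketched in a few lines of Python (a toy reconstruction under the assumptions above; the values are simulated, not taken from the actual file):

```python
import random

random.seed(7)

# Hypothetical reconstruction of the alleged scheme: start from genuine
# "Calibri" readings, then add RANDBETWEEN(0, 1000) to mint "Cambria" twins.
calibri = [random.randrange(1, 50) * 1_000 for _ in range(6_500)]
cambria = [c + random.randint(0, 1_000) for c in calibri]

# The tell: every Cambria reading sits within 1,000 miles of its Calibri
# counterpart, something independent drivers would essentially never produce.
diffs = [b - a for a, b in zip(calibri, cambria)]
print(all(0 <= d <= 1_000 for d in diffs))  # True
```

Sorting the file in a way that interleaves the pairs would hide the duplication from a casual glance, but the tight 0–1,000 gap between font-matched rows survives any sort order.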
To put the cherry on top of this font irregularity, very few of the numbers in Cambria are rounded. As discussed in irregularity #2 above, using RANDBETWEEN() without ROUND() leads to numbers that aren’t rounded. Not having rounded numbers is, again, highly suspicious when you consider that these mileage numbers were reported by humans, who tend to round large numbers.
Why did Ariely allegedly fabricate the numbers?
Easy. Fame, notoriety, and consulting gigs. Again, I’d read the New Yorker piece to learn more about Ariely’s background and character. The narrative Ariely wanted to tell was that nudges have an outsize impact on behavior, and the data was skewed to prove this.
Ariely actually acknowledged Data Colada’s analysis and basically responded over email with “I’ll check my data better next time.” The New Yorker article raises the possibility that someone at the auto insurance company fabricated the data before it was sent to Ariely, which means Ariely can claim he had no hand in fabricating it.
Even if that were the case, wouldn’t you at least scroll through the dataset and see–I don’t know–that the data is in two different fonts? Your future TED talks, published books, and paid consulting gigs depend on your findings from this Excel file, and you don’t bother to check its validity? The file is just over 13,000 rows long, so it’s not even that huge of a dataset. While not on the same scale, this narrative feels similar to what happened with Theranos. Like Elizabeth Holmes, Ariely claims he can’t recall who sent him datasets or how the data was transformed (as reported in the New Yorker).
Excel mistakes are different from fabricating data
I’ve dissected a few Excel blunders on the podcast such as the error that led to a $6.2B loss at JPMorgan Chase, Enron’s spreadsheet woes, the DCF spreadsheet error leading to a mistake with a Tesla acquisition, and many others. In these cases, the pilot simply misused the instrument which led to a massive mistake.
With the fabricated data in Ariely’s experiment, Ariely, members of his lab, or someone at the auto insurance company knowingly massaged the data with the intention of not getting caught. Better auditing or controls cannot prevent data dredging of this magnitude.
Perhaps Ariely (or whoever fabricated the data) knew that if they could tell the narrative that “nudging” does indeed lead to changes in human behavior, there would be a sizeable financial payout somewhere down the line.
Blowing the whistle on Ariely
In the Planet Money episode referenced earlier, Professor Michael Sanders is credited with first calling bullshit on Ariely’s findings after his own failed project with the Guatemalan government. Data Colada’s blog post really made clear what issues existed in Ariely’s spreadsheet.
Data Colada kind of reminds me of the European Spreadsheet Risks Interest Group (EuSpRIG), a group of individuals who document all these Excel errors in the hopes that analysts won’t make the same mistakes. By detailing Ariely’s spreadsheet tactics, Data Colada will hopefully make issues like this easier to spot in the future.
The New Yorker article shows that it’s hard to evaluate the true intentions of each party in this case. It’s easy to point fingers at Ariely and say he committed spreadsheet fraud for his own personal gain. But what about Data Colada? While the behavioral scientists behind the blog seem like upstanding citizens, who knows what benefit they stand to gain from uncovering these issues and calling out fraud? Simmons, Nelson, and Simonsohn also get their share of the limelight in this recent WSJ article highlighting the impact of the group’s research.
Like Ariely, maybe they get more consulting gigs thrown their way based on their ability to take down high-profile authors and scientists? Remember when Hindenburg Research came out with the hit piece on Nikola, leading to the resignation of the CEO? Not only did Hindenburg stand to gain from short-selling the stock, they also drew more attention to their investment research services. They also probably got more inbound interest from people who have an axe to grind with some other company CEO and want to take down the company.
Open source wins the day
I’ve been a fan of open source ever since I got into software because, well, the whole fucking Internet runs on it. One of my favorite data cleaning tools (OpenRefine) is completely free to use and is just as powerful as Microsoft Power Query for cleaning data.
The beautiful thing about open source is that anyone can analyze and investigate how the code really works. There is no narrative about what the tool or library can do. These same values should also be applied to researchers and scientists. I really like how the Data Colada team ended their post on Ariely’s spreadsheet issues:
There will never be a perfect solution, but there is an obvious step to take: Data should be posted. The fabrication in this paper was discovered because the data were posted. If more data were posted, fraud would be easier to catch. And if fraud is easier to catch, some potential fraudsters may be more reluctant to do it. Other disciplines are already doing this. For example, many top economics journals require authors to post their raw data. There is really no excuse. All of our journals should require data posting. Until that day comes, all of us have a role to play. As authors (and co-authors), we should always make all of our data publicly available. And as editors and reviewers, we can ask for data during the review process, or turn down requests to review papers that do not make their data available. A field that ignores the problem of fraud, or pretends that it does not exist, risks losing its credibility. And deservedly so.
Hopefully this episode nudges you in the right direction.
Other Podcasts & Blog Posts
In the 2nd half of the episode, I talk about some episodes and blogs from other people I found interesting: