Dear Analyst #125: How to identify Taylor Swift’s most underrated songs using data with Andrew Firriolo

Sometimes pop culture and data analysis meet and the result is something interesting, thought-provoking, and of course controversial. How can one use data to prove definitively which Taylor Swift songs are the most underrated? Isn’t this a question for your heart to answer? Andrew Firriolo sought to answer this question over the last few months and the results are interesting (if you’re a Taylor Swift fan). A Swiftie (the moniker for Taylor Swift fans) since 2006, Andrew wanted to find a way to bridge his passions for Taylor Swift and data analysis. He’s currently a senior data analyst at BuzzFeed, and published his findings on BuzzFeed to much reaction from the Swiftie community. In the words of Taylor Swift, Andrew’s methodology and analysis just “hits different.”

From comp sci to data analytics

Andrew studied computer science at New Jersey Institute of Technology but realized he liked the math parts of his degree over the engineering parts. Like many guests on this podcast, he made a transition to data analytics. Interestingly, it wasn’t a job that propelled him into the world of data analytics, but rather going to graduate school at Georgia Institute of Technology (Georgia Tech). Georgia Tech has some really affordable online technical programs, including data analytics. After getting his master’s degree, he worked at Rolling Stone as a data analyst. This was the beginning of Andrew’s exploration into the Spotify API to see the data behind music. You can see some of the articles Andrew published while at Rolling Stone here.

Source: Pocketmags

After Rolling Stone, Andrew landed his current role at BuzzFeed building internal dashboards and doing internal analysis. In both of his roles, he talks about using a lot of SQL and R. A big part of his job is explaining the analyses he’s doing to his colleagues. This is where the data storytelling aspect of a data analyst’s job comes into play. I call this the “soft” side of analytics but some would argue that it’s the most important part of a data analyst’s job. In most data analyst roles you aren’t just sitting at your desk writing SQL queries and building Excel models. You’re a business partner with other people in the organization, and communication skills are more important than technical skills.

Answering a Taylor Swift question with data

Andrew became a Taylor Swift fan through his sister in 2006. They both listened to the world premiere of Taylor’s first album. Given his background in data, Andrew decided to answer a question about Taylor Swift that’s been on his mind for a while: what are Taylor Swift’s most underrated songs?

To read Andrew’s full article, go to this BuzzFeed post.

Andrew’s hypothesis was that there’s a way to use data to prove which songs in Taylor’s discography are most underrated. Classifying something as “underrated” is usually a decision you make with your gut. But it’s always interesting to see the data (and the methodology) for determining if something is truly “underrated.”

Multiple iterations in song streaming analysis

As mentioned earlier, Andrew made good use of Spotify’s API. The API gives you a plethora of information about songs such as how “danceable” or “acoustic” a song is. Characteristics like these are measured on a scale of 0 to 1.

For the first iteration of Andrew’s analysis, he simply compared a given song’s streaming performance to the album’s median streaming performance. The hypothesis here is that the less-streamed songs are considered the underrated songs. The result of this analysis was a lot of Taylor’s deluxe tracks.
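To make the first iteration concrete, here’s a minimal Python sketch of comparing each song to its album’s median. This is not Andrew’s actual R code: the function name, track names, and stream counts are all made up for illustration.

```python
from statistics import median


def pct_diff_from_album_median(streams_by_song):
    """Return each song's percent difference from the album's median streams."""
    album_median = median(streams_by_song.values())
    return {
        song: (streams - album_median) * 100 / album_median
        for song, streams in streams_by_song.items()
    }


# Made-up stream counts for a hypothetical five-track album.
album = {
    "Track A": 900,
    "Track B": 500,
    "Track C": 400,
    "Track D": 300,
    "Track E": 100,
}

diffs = pct_diff_from_album_median(album)

# Most negative first: the songs streamed least relative to the album median.
ranked = sorted(diffs, key=diffs.get)
print(ranked)
```

Songs at the front of `ranked` sit furthest below the album median, which is what this iteration treats as a signal of being underrated.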

Source: Genius

The second iteration was to look beyond the streaming performance of the album the song is on. Andrew compared the song’s performance relative to the albums released before and after the current album. This surfaced some more underrated songs.
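One way to picture the second iteration is to pool the streams of the song’s album with the albums released just before and after it, then compare the song to the pooled median. The Python sketch below assumes that pooling approach, which may not match exactly how Andrew handled neighboring albums; the function name and numbers are hypothetical.

```python
from statistics import median


def pct_diff_with_neighbors(song_streams, albums, index):
    """Percent difference between a song's streams and the median of a pool
    made of its own album plus the albums released just before and after.

    `albums` is a list of per-album stream lists in release order, and
    `index` is the position of the song's own album in that list.
    """
    start = max(index - 1, 0)
    pool = [s for album in albums[start:index + 2] for s in album]
    pooled_median = median(pool)
    return (song_streams - pooled_median) * 100 / pooled_median


# Made-up stream counts, one list per album, in release order.
albums = [
    [100, 200, 300],  # earlier album
    [400, 500, 600],  # the song's album
    [700, 800, 900],  # later album
]

result = pct_diff_with_neighbors(400, albums, index=1)
print(result)
```

A song can look average within its own album and still fall well below the median once the neighboring albums are included, which is why this view can surface different songs than the first iteration.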

Getting the opinion of Swifties

While Andrew’s analysis so far yielded some interesting songs, he found that these songs weren’t all that loved by other Swifties.

In his final iteration, Andrew added a quality score to his analysis. This is a more subjective number that takes into account the opinion of experts.

At Rolling Stone, they had a rolling list of expert opinions that were published in various places. He had a data set of 1,000 opinions on different Taylor Swift songs that he could use to qualify a song. The big question is, how much weight do you give the quality score? In the end, Andrew decided on a weight of 33% for each metric he tracked:

  1. Percent difference between its lifetime Spotify streams and the median streams of its album
  2. Percent difference between its lifetime Spotify streams and the median streams, including neighboring albums
  3. Average of six rankings of Taylor’s discography from media publications (quality score)
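The equal weighting above can be sketched as a simple weighted average. This Python example is an illustration only: it assumes each metric has already been rescaled to a 0-to-1 range where higher means “more underrated,” a step the episode doesn’t describe. The function name and values are made up.

```python
def underrated_score(album_gap, neighbor_gap, quality, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted average of three metrics.

    Assumption: each metric is pre-scaled to 0..1, higher = more underrated.
    """
    metrics = (album_gap, neighbor_gap, quality)
    return sum(w * m for w, m in zip(weights, metrics))


# Made-up, pre-scaled values for two hypothetical songs:
# (gap vs. album median, gap vs. neighboring albums, quality score)
songs = {
    "Song X": (0.9, 0.6, 0.3),
    "Song Y": (0.5, 0.7, 0.9),
}

scores = {name: underrated_score(*values) for name, values in songs.items()}
top = max(scores, key=scores.get)
print(top, round(scores[top], 2))
```

Here “Song X” has the bigger streaming gap, but “Song Y” comes out on top once the quality score counts for a third of the total. That’s the effect of bringing expert opinion into the ranking.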

The quality score basically took into account the wisdom of the Swiftie community.

Source: Know Your Meme

Getting to the #1 most underrated song: Holy Ground (Red)

Andrew was able to use R, a tool he’s already using every day on his job, to do this analysis. After dumping all the data from the Spotify API into a CSV, he used the Tidyverse R packages to crunch the numbers. One of the most commonly used packages for data visualization in the Tidyverse is ggplot2. But superimposing the images of Taylor Swift’s albums onto the charts created by ggplot2 required a new script Andrew had to write in R. I asked Andrew if he had to learn any new skills for this Taylor Swift analysis, and the main skill Andrew said he had to learn was data visualization. Here’s an example of a visual from Andrew’s blog post for the #1 most underrated Taylor Swift song:

Source: Republic Records / Tidyverse / Andrew Firriolo / BuzzFeed

To make sure he was on the right track, Andrew asked other Swifties what their #1 most underrated Taylor Swift song was. To Andrew’s delight, two co-workers said Holy Ground. This qualitative feedback gave Andrew more confidence in his results.

On the BuzzFeed article, half of the commenters agree that Holy Ground is indeed the most underrated song. The other half talk about other songs that should be on the list. When Andrew posted his analysis on LinkedIn, most people commented on his methodology and thought process (like we did in this episode).

Using science to see which re-releases of Taylor’s songs most resemble the original song

Of course, “science” is used a bit loosely here. But similar to Andrew’s underrated song analysis, this analysis utilized the Spotify API to see which Taylor’s Version song most closely matches the original song. This was Andrew’s first Taylor Swift analysis, published late last year.

Read the BuzzFeed article for the full details on the methodology. Andrew also used R and various packages like the HTTP request package to pull the data from Spotify. To skip right to the results: the #1 song where Taylor’s version is most similar to the original is Welcome to New York.

Source: Republic Records/Big Machine Records/Tidyverse/Andrew Firriolo/BuzzFeed

Euclidean Pythagorean distance scores and Taylor Swift

When Andrew first brought up this concept I just scratched my head. It sounds advanced, and if someone is bringing up Euclid in a Taylor Swift analysis, you trust that it must be thorough and accurate.

In reality, this concept harkens back to your high school geometry/algebra days. The distance formula simply measures the distance between two points on an X-Y plot:

Source: HowStuffWorks
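For reference, this is the standard two-dimensional distance formula from geometry class, written for two points $(x_1, y_1)$ and $(x_2, y_2)$ (the symbol names are just the usual textbook convention):

```latex
d = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}
```

It’s the Pythagorean theorem in disguise: the horizontal and vertical gaps between the two points are the legs of a right triangle, and $d$ is the hypotenuse.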

In this analysis, Andrew utilized 7 metrics from the Spotify API for each version of Taylor’s songs. To picture it with just two metrics, each version of a song could be plotted as a point on an X-Y plot where the X might be the song’s acousticness and the Y its danceability. The distance between the original’s point and the Taylor’s Version point then tells you how alike the two recordings are: the smaller the distance, the closer the match. The beauty of this formula is that it can find the distance between two points in N dimensions. I definitely went down the rabbit hole on this one to learn more about this formula I originally learned in high school. Here’s an explanation of the distance formula in 3-D space (something we can comprehend visually):

But in this analysis, there are 7 metrics. That means each version of a song is a point in 7 dimensions. How do we even visualize that many dimensions? This explanation discusses a solution to this problem of how to think about plotting points beyond three dimensions. Math and linear algebra for the win!
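You don’t need to visualize 7 dimensions to compute the distance, though. Here’s a minimal Python sketch of the same formula generalized to any number of dimensions. The feature values are made up, and the episode doesn’t say exactly which 7 Spotify metrics Andrew used or how he scaled them, so treat the tuples below as placeholders.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two points with the same number of dimensions."""
    if len(a) != len(b):
        raise ValueError("both points need the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Sanity check in 2-D: the classic 3-4-5 right triangle.
print(euclidean_distance((0, 0), (3, 4)))

# Made-up values for seven hypothetical 0-to-1 audio features.
original = (0.6, 0.8, 0.1, 0.0, 0.3, 0.5, 0.2)
taylors_version = (0.6, 0.7, 0.1, 0.0, 0.3, 0.5, 0.2)

distance = euclidean_distance(original, taylors_version)
print(round(distance, 3))
```

The two made-up versions differ on only one feature, so the distance is small. Python 3.8+ also ships `math.dist`, which does the same calculation in one call.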

I asked Andrew what the next Taylor Swift analysis will be. He said once he sees enough people asking a question about Taylor Swift that can potentially be answered by data, he’ll start an exploratory analysis (most likely with the Spotify API).

Getting your big break in data analytics

Andrew’s #1 advice for landing a job in data analytics or transitioning to a career in data is getting your master’s degree. We haven’t heard this advice too much on the podcast, but Andrew is a shining example of how a master’s degree in data can help, especially at a university like Georgia Tech where the cost is quite low relative to a traditional university.

Andrew also discussed the importance of knowing SQL as the key technical skill for a data analytics role. Who knew that a database query language from the 1970s would still be in high demand today?

Source: Medium / Çağatay Kılınç

The final piece of advice Andrew gave regarding skills you need for a career in data analytics is communication. Specifically, knowing how to communicate your analysis to a non-technical audience. At the beginning of his career at BuzzFeed, Andrew received feedback that his explanations were too technical. He realized that not everyone needed to know how the SQL query was constructed; people just cared about the trends and final results.

Other Podcasts & Blog Posts

No other podcasts or blog posts mentioned in this episode!