Dear Analyst #65: Eliminating biases in sports data and doing a data science bootcamp with Caiti Donovan

When you think of sports and data, you may think about all the data collected on player performance and game stats. There’s another world of sports data that is usually overlooked: the fans. In this episode, I speak with Caiti Donovan, the VP of Data & Insights at Sports Innovation Lab, a sports market research firm. Caiti started her career in marketing and business development at Viacom and Spotify where she used data storytelling to work with advertisers and partners. More recently she learned how to build the data systems she was once only a consumer of. We’ll discuss how she made the transition to data, getting a data science certification at The Fu Foundation School of Engineering and Applied Science at Columbia University, and current projects she’s working on at Sports Innovation Lab.

Working with data at ViacomCBS and Spotify

Caiti spent 15 years in marketing and sales roles where data was a core part of her day-to-day projects. She used a lot of proprietary data systems and even helped build some of these systems. Using the data available to her, she’d take different datasets and turn the data into a format useful for data storytelling. These stories would be used for partnership development or working with advertisers. Data storytelling is a common theme on this podcast. See episode 62 with Janie Ho, episode 56 with John Napolean-Kuofie, and episode 35 on the Shape of Dreams.

At ViacomCBS, Caiti would look at the data behind shows like Jersey Shore and SpongeBob to see what type of revenue opportunities her team could create based on the audience of these shows. The data could also be analyzed to help inform content development for these shows. The goal was to understand their younger fans and figure out what it meant to have conversations with the fans of these shows.

After a stint working with a few startups in a consulting capacity, Caiti eventually landed at Spotify. At the time, Spotify had a hard time turning all the data they were sitting on into narratives in a B2B and B2C context. She worked with clients like the NBA, Ford, and Nike. Whatever data stories she was telling her clients from a B2B perspective, she also had to make sure they carried over to the B2C side (Spotify subscribers).

From there, Caiti made a big jump from entertainment to sports. She realized her “purpose meets passion” moment is finding ways to use data to have an impact on the world. She wanted to tackle challenges faced by women in sports and also find a way to better connect with the fans of women’s sports. Caiti eventually co-founded the non-profit SheIS Sport to bring together every single professional women’s sports league. Through this experience, Caiti learned a lot about the biases and inequities in data in the sports world. She realized she needed more technical expertise to have a direct impact on how data is collected and analyzed in this world, and went back to school for data science (more on this later).

Spotify’s billion points of data per day

When Caiti was at Spotify, one of her projects was figuring out how to translate the billion points of data generated by Spotify users into product opportunities. In addition to product opportunities, the ad sales team needed to have stories they could tell to their clients that were backed up by data.

She started evaluating how her team could clean and dissect the data to productize what Spotify was generating and storing every day. Using proprietary algorithms, her team analyzed people’s music listening behavior to figure out what a listener might be doing at the time they were listening to a song. This became known as “moment marketing,” which carried a lot of context about the subscriber. This context allowed advertisers to tap into the moment the subscriber was in, like if they were at the gym, in their car, or at a party. Some of the metrics the team analyzed included bpm, device-level data, and types of playlists people were creating. What better time for Nike to target a consumer with new shoes than when the consumer might be doing a workout or training for a sport?
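Spotify’s actual algorithms are proprietary and were not described in the episode. Purely as an illustration, a toy rule-based way of guessing a listener’s “moment” from the kinds of signals mentioned above (bpm, device, playlist name) might look something like the sketch below. The function name, thresholds, device labels, and keywords are all hypothetical.

```python
def guess_moment(bpm: float, device: str, playlist_name: str) -> str:
    """Toy heuristic for labeling a listening "moment".

    Every threshold, device label, and keyword here is an assumption
    made up for illustration; this is not Spotify's method.
    """
    name = playlist_name.lower()

    # Assumed: the device type alone is a strong hint for driving.
    if device == "car":
        return "driving"

    # Assumed: fast tempo plus a fitness-sounding playlist suggests a workout.
    workout_words = ("workout", "gym", "run")
    if bpm >= 120 and any(word in name for word in workout_words):
        return "workout"

    # Assumed: a playlist explicitly named for a party suggests a party.
    if "party" in name:
        return "party"

    return "unknown"


if __name__ == "__main__":
    print(guess_moment(140, "phone", "Morning Run"))
```

A real system would likely learn these patterns from data rather than hard-code them, but the sketch shows how a few simple signals can be combined into a context an advertiser could act on.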

Wanting to build her own data systems

To get closer to the data systems she was using, Caiti made the decision to go back to school and learn more about data science. She was accepted into a data science bootcamp at Columbia’s Fu School of Engineering and Applied Science. The topics covered in the bootcamp included Python, ETL processes, machine learning, and different tools to build data systems.

It took Caiti 6-7 years to make the decision to go back to school for data science. The catalysts for her decision included the data discrepancies she saw in the sports world and the pandemic.

When Caiti was at SheIS Sport, her team created a campaign report showing that 4% of sports media coverage focuses on women’s sports. The campaign ended up receiving half a billion impressions, 4.2 million engagements online, and 25K people posting their stories. She realized this 4% number only covers linear TV and no digital channels. Without proper data, advertisers, partners, and leagues cannot evaluate the opportunity available in women’s sports. It’s a chicken and egg scenario where fans want more media coverage, and advertisers are saying they’ll get more involved if they see more eyeballs and people going to these games.

Experience at a data science bootcamp

Caiti had already been accepted into the Columbia program at the end of 2019 and just deferred to the spring semester in 2020. She also looked at schools like Flatiron and some other programs in New York. What drew her to Columbia’s program was the mix of backend technical topics along with related tools like Tableau and Hadoop.

Caiti’s data science bootcamp was the first bootcamp to go completely virtual. Given the intensity of the program, she stepped out of day-to-day operations at SheIS Sport to focus on her classes. The schedule was very tough and she was spending 15-20 hours per week outside of class doing homework. The difficulty with doing this virtually (as many knowledge workers can attest to) is not being able to lean over to see your colleague’s screen and say “try out this function here in your code” to make the learning process more fluid.

The final project at her bootcamp had to use machine learning in some capacity. Her group needed to have a big data source and they ended up using multiple APIs. They wanted to evaluate how COVID affects player performance. The questions they wanted to answer included: What if there are no fans in the audience? Would this impact player performance? One study from the NBA I found interesting was the bubble’s impact (or lack thereof) on home court advantage.

Getting data on the NBA and WNBA and training a machine learning model

The NBA was easy since the whole season was in a bubble in 2020, but the WNBA was mixed. The NBA has this great API that goes back 10 years. For the WNBA, her team had to pull data from the Sports Reference website. This involved manually pulling down CSVs and uploading them into their model.

At the end of the day, Caiti’s team was not able to fully train any of the machine learning models because of data inconsistencies. It’s difficult to get consistent player data because players move to different teams, get new teammates, and get injured during the season. Instead of training a more complex model, her team just did a linear regression on the data available. They saw a correlation: when most of the players were in the bubble, NBA and WNBA players played better.
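For readers who haven’t run one before, here is a minimal sketch of a simple linear regression (ordinary least squares with one predictor) plus a correlation coefficient, written in plain Python. This is not the team’s actual code or data: the function names are hypothetical and the numbers are made up solely to show the mechanics.

```python
import math


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def pearson_r(xs, ys):
    """Pearson correlation coefficient between xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Made-up illustrative values, NOT real NBA/WNBA data:
    # share of a roster inside the bubble vs. a generic performance score.
    bubble_share = [0.0, 0.25, 0.5, 0.75, 1.0]
    performance = [10.0, 11.0, 11.5, 13.0, 14.0]

    slope, intercept = fit_line(bubble_share, performance)
    r = pearson_r(bubble_share, performance)
    print(f"slope={slope:.2f}, intercept={intercept:.2f}, r={r:.2f}")
```

A positive slope and a correlation close to 1 would be consistent with the kind of relationship Caiti’s team described, though a correlation alone does not show that the bubble caused the better play.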

Current projects at Sports Innovation Lab

Caiti is currently looking at fan data and how to democratize data for the sports industry to bring more equity to women’s sports. Ultimately, she wants to make sure the hypotheses and trends claimed in the sports industry are backed up with data. Advanced systems have been created to track player and game analytics since there are a lot of second-order effects on industries like sports betting and fantasy sports. On the business side, which focuses on fan metrics, the industry is still 5 years behind.

In the entertainment and retail industries, we are seeing a lot more innovation in how to get data from customers and consumers. Sports hasn’t done as much with data from fans. If you don’t have an understanding of fan behavior, you’re missing out on a huge contextual piece on how a team or league may appear to brands and partners.

Data tools Caiti is excited about

At the end of our conversation, Caiti shared some tools she’s super excited about learning and using with her data projects. She mentioned a nice mix of open-source and commercial tools:

  • She started using Shiny a lot to build internal dashboards. It allows her team to visualize structured data and also gives them the ability to poke holes in their data. This helps them find ways to further clean up and transform the raw data.
  • Tableau is a juggernaut in the data visualization space. It has acted as a connector between the sales team and Caiti’s team, which is a little more in the weeds with the data. Tableau streamlines things so Caiti’s sales team can explore data with potential clients easily.
  • A final tool is RStudio which one of Caiti’s colleagues works in a lot.

Sports Innovation Lab is hiring engineers and analysts. If you believe in their mission, contact them about potential opportunities.

Other Podcasts & Blog Posts

No other podcasts!