Dear Analyst #123: Telling data stories about rugby and the NBA with Ben Wylie

When you think of data journalism, you might think of The New York Times’ nifty data visualizations and the Times’ embrace of data literacy for all its journalists. Outside of The New York Times, I hadn’t met anyone who does data journalism and data storytelling full-time until I spoke with Ben Wylie. Ben is the lead financial journalist at a financial publication in London. Like many data analysts, he cut his teeth in Excel, earned the UK equivalent of a CPA, and received his master’s degree in journalism. In this episode, we discuss how his side passion (sports analytics) led him to pursue a career in data journalism and how he approaches building sports data visualizations.

Playing with rugby data on lunch breaks

When Ben worked for an accounting firm, he would pull rugby data during his lunch breaks and just analyze it for fun. One might say this started Ben’s passion for data storytelling because he started a blog called The Chase Rugby to share his findings. The blog was a labor of love, and as of the end of 2019 he had focused only on rugby. After building an audience, he realized data journalism could be a promising career path, so he did some freelance sports journalism at the end of his master’s course. At the end of 2022, he started Plot the Ball (still a side project) where the tagline is “Using data to tell better stories about sport.”

Learning new data skills from writing a newsletter

Ben spoke about how writing Plot the Ball forced him to learn new tools and techniques for cleaning and visualizing data. All the visualizations on the blog are done in R, and the specific R package Ben uses to scrape data from websites is rvest. Through the blog, Ben learned how to scrape, import, and clean data before he even started building any visualizations. The sports data all came from Wikipedia.
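If you’ve never done the scrape, import, and clean steps yourself, here’s a rough sketch of what they look like. To be clear, this is not Ben’s code: he works in R with rvest, and this is a Python stand-in that only uses the standard library. The HTML table, the team names, and the `TableParser` and `scrape_table` names are all made up for illustration, and a real Wikipedia page would need more cleanup than this.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every table cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # row currently being read, if any
        self._cell = None   # text fragments of the cell being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Clean step 1: strip stray whitespace around the cell text.
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical stand-in for a scraped page (not real results).
SAMPLE_HTML = """
<table>
  <tr><th>Team</th><th>Played</th><th>Won</th></tr>
  <tr><td> Team A </td><td>5</td><td>4</td></tr>
  <tr><td>Team B</td><td>5</td><td>1</td></tr>
</table>
"""


def scrape_table(html):
    """Turn the first row into column names and the rest into records."""
    parser = TableParser()
    parser.feed(html)
    header, *body = parser.rows
    records = []
    for row in body:
        # Clean step 2: convert whole-number strings to integers.
        record = {
            column: int(value) if value.isdigit() else value
            for column, value in zip(header, row)
        }
        records.append(record)
    return records


records = scrape_table(SAMPLE_HTML)
for record in records:
    print(record)
```

Once the data is a list of tidy records like this, it’s ready for whatever charting tool you prefer.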

I’ve spoken before about how the best way to show an employer you want a job in analytics is to create a portfolio of your data explorations. Nothing is better than starting a blog where you can just showcase stuff you’re interested in.

How the NBA became a global sport

One of my favorite posts from Plot the Ball is one entitled Wide net. It’s a short post, but the visualization tells a captivating story about how the NBA became global over the last 30 years. Here’s the main visualization from the post:

Source: Plot the Ball

Ben first published a post about NBA phenom Victor Wembanyama in June 2023 (see the post for another great visualization). Ben talks about this post being a good data exercise because there is no good NBA data in tabular form. This “waffle” chart was Ben’s preferred visualization since it allows you to better see the change in the subgroups. A stacked bar chart would’ve been fine as well, but since each “row” of data represents a roster of 15 players, the individual squares abstract the team composition each year.
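To make the one-square-per-player idea concrete, here’s a rough text-only sketch in Python. It is not the R code behind Ben’s chart. The season labels, group names, and counts are placeholders I invented (not real NBA data), and `waffle_row` is a hypothetical helper; the only thing taken from the post is that each row holds 15 squares.

```python
ROW_SIZE = 15  # one square per player in a 15-player row


def waffle_row(counts, symbols):
    """Return one waffle row: one symbol per player, grouped by category."""
    total = sum(counts.values())
    if total != ROW_SIZE:
        raise ValueError(f"expected {ROW_SIZE} players, got {total}")
    return "".join(symbols[group] * n for group, n in counts.items())


# Placeholder categories and counts, purely illustrative.
SYMBOLS = {"group_a": "#", "group_b": "o"}
seasons = {
    "Season 1": {"group_a": 14, "group_b": 1},
    "Season 2": {"group_a": 10, "group_b": 5},
}

lines = [
    f"{label}  {waffle_row(counts, SYMBOLS)}"
    for label, counts in seasons.items()
]
print("\n".join(lines))
```

Because every row has the same width, your eye can compare the share of each group across seasons by counting squares instead of reading an axis, which is the same thing a stacked bar does in a less literal way.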

Home Nations closing the gap with Tri Nations in rugby

Ben talked about another popular post from his blog entitled Heading South. The post started as a data exploration exercise where Ben was simply trying to find trends instead of telling a story. For some background, rugby has traditionally been dominated by a few teams (e.g. Australia, New Zealand, and South Africa). The most recent final was between New Zealand and South Africa, and these two teams have won a majority of World Cups.

Ben was interested in seeing how these elite teams and other teams were trending over time. Ireland and France have started doing well over the last few years, but there is no bird’s-eye view of how these teams are performing as a whole. So Ben decided to create this visualization:

Source: Plot the Ball

Cognitive overload is a concept many data visualization professionals care about. When a visualization has more information than an individual has the mental capacity to process, the message and story get lost. A few factors in the visualization above ease the path to understanding the story:

  1. Gridline color is muted
  2. Data labels only show up at the end of the line charts
  3. The colors of the lines match the series name in the title of the chart

If it’s not clear what the trend is, the main header of the chart even tells you the key takeaway.

Other Podcasts & Blog Posts

No other podcasts or blog posts mentioned in this episode!