Dear Analyst #83: Cleaning up and doing an exploratory data analysis of all Friends episodes raw data in Excel
Podcast: Play in new window | Download
Subscribe: Google Podcasts | Spotify | Stitcher | TuneIn | RSS
I was playing around with a dataset on TikTok recently to show a PivotTable trick. The raw data is a list of all Friends episodes that have aired starting in 1994. I noticed that the raw data requires some cleanup so decided to take the dataset and to some cleaning operations and also try out some exploratory data analysis. I also explored building a simple dashboard using PivotTable Slicers in Excel. This is the process I might follow with cleaning up any raw dataset. The data isn’t too complex so it might be a good dataset to practice on. If you want to use the sample data, download the raw data here. This episode might be better for watching than listening, so you can see the full YouTube video below as well.
A quick view of the data
The dataset is relatively straightforward. It’s just a list of all Friends episodes with various attributes associated with each episode:
Issues with the dataset
You’ll notice some issues with this raw data that will make it difficult to do any exploratory analysis:
Episodecolumn is in dates instead of integers
Rating/Sharecolumn is one value and ideally they should be separated into two unique columns
U.S. viewerscolumn is actually a text column where the “number” is written out as “21.5 million”
There are some other formatting issues with the dataset but these three issues prevent us from putting this data into a PivotTable where we can filter and sort the data. We want numbers to be actual numbers so that we can apply a
AVERAGE to the column when summarizing the column in a PivotTable. The tutorial walks through solving these issues with the
Simple dashboarding with PivotTable Slicers
In addition to doing some exploratory analysis, we also want to make this data usable for our colleagues. By adding some basic PivotTable Slicers, we can turn the analysis into an interactive dashboard that anyone can filter and sort without having to go into the PivotTable options. Slicers are also available in Google Sheets too!
Other Podcasts & Blog Posts
No other episodes mentioned!
[…] see in school. When you’re working with real world data, most of the time you’re just cleansing data to get the data in a usable […]
[…] talked about data cleaning in multiple episodes and I’ve discovered a few things about the process over the […]