Dear Analyst: a show made for analysts covering data, data analysis, and software.
https://www.thekeycuts.com/category/podcast/
This is a podcast made by a lifelong analyst. I cover topics including Excel, data analysis, and tools for sharing data. In addition to data analysis topics, I may also cover topics related to software engineering and building applications. I also do a roundup of my favorite podcasts and episodes.

Dear Analyst #69: Import data from another Google Sheet and filter the results to show just what you need
https://www.thekeycuts.com/dear-analyst-69-import-data-from-another-google-sheet-and-filter-the-results-to-show-just-what-you-need/
Mon, 10 May 2021

You may be filtering and sorting a big dataset in a Google Sheet and want to see that dataset in another Google Sheet without having to copy and paste the data each time the “source” data is updated. To solve this problem, you need to somehow import the data from the “source” worksheet into your “target” worksheet. When the source worksheet is updated with new sales or customer data, your target worksheet gets updated as well. On top of that, the data that shows up in your target worksheet should be filtered so you only see the data that matters to you. The key to doing this is the IMPORTRANGE() function in conjunction with the FILTER() or QUERY() functions. I’ll go over two methods for importing data from another Google Sheet and talk about the pros and cons of each. You can use this “source” Google Sheet as the raw data and see this “target” Google Sheet which contains the formulas.

Watch a video tutorial of this post/episode here: https://youtu.be/7QLnAP0zHIM

Your Google Sheet is your database

No matter which team you work on, at one point or another your main “database” or “source of truth” was some random Google Sheet. This Google Sheet might have been created by someone on your operations or data engineering team. It may be a data dump from your company’s internal database, and whether you like it or not, it contains business-critical data and your team can’t operate without it. The Google Sheet might contain customer data, marketing campaign data, or maybe bug report data exported from your team’s Jira workspace.

The reason people default to using Google Sheets as their “database” is that anyone can access it in their browser and, more importantly, you can share that Sheet easily with people as long as you have their email address. This is probably your security team’s worst nightmare, but at this point too many teams rely on this Google Sheet, so it’s hard to break away from it as a solution.

Credit card customer data

Before we get into the solution, let’s take a look at our dataset. Our “source” dataset is a bunch of credit card customer data (5,000 rows) with each customer’s demographics and credit card spending data:

There are a ton of columns in this dataset I don’t care about. I also only want to see the rows where the Education_Level is “Graduate” and the Income_Category is “$80K-$120K.” Perhaps I’m doing an analysis on credit card customers who are high earners and hold a graduate degree. How do I get that filtered data of graduates earning $80K-$120K into this “target” Sheet:

Google Sheets is not the ideal solution for a database, but you gotta live with it, so let’s see how we can get the data we need from our source Google Sheet over to the target. The money function is IMPORTRANGE(), but there are multiple ways of using it, as I describe below.

Method 1: The long way with FILTER() and INDEX()

When you use the IMPORTRANGE() function on its own, you will just get all the data from your source Sheet into your target Sheet. In this formula below, I just get all the data from columns A:U in my source Sheet with all the credit card customer data:
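
Written out (using the same source Sheet ID that shows up in the other formulas in this post), that bare IMPORTRANGE() call looks like this:

=importrange("1H5JljkscteL2qRMJ8ky342uTeP839jjDGg81c8Eg0es","A:U")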

The first parameter can be the full URL of the Google Sheet, but you can also just use the Sheet ID from the URL to make the formula shorter. The second parameter is the range of columns you want to pull into your target Sheet.

Again, this will basically give you an exact copy of the source Sheet into your current Sheet. When data is updated in the source, your target Sheet gets updated too. For a lot of scenarios this might be all you need! But let’s go further and try to get a filtered dataset from the source Sheet.

The first thing you’ll probably think of is to use the FILTER() function. The question is what do we put for the second parameter in the FILTER() function?

For the first parameter we’ll just use our IMPORTRANGE() function, but for the second parameter we need to filter by the column we’re interested in. Something like this should get only the rows where the Education_Level is “Graduate”:

=filter(importrange("1H5JljkscteL2qRMJ8ky342uTeP839jjDGg81c8Eg0es","A:U"), F:F="Graduate")

This doesn’t work because F:F is referencing the current worksheet. Our dataset is pulling from a different worksheet and there’s no way to filter that source before it gets into our current worksheet.

The solution is to use the INDEX() function with the FILTER() function like this:

=filter(importrange("1H5JljkscteL2qRMJ8ky342uTeP839jjDGg81c8Eg0es","A:U"),index(importrange("1H5JljkscteL2qRMJ8ky342uTeP839jjDGg81c8Eg0es","A:U"),0,6)="Graduate")

This INDEX() function is telling Google Sheets to look at the source data and focus on the 6th column and see which rows have “Graduate” in them.

We want to filter the data that not only has “Graduate” as the education level but also customers who have a salary of “$80K-$120K.” We can just add additional conditions to our FILTER() formula using this INDEX() trick:

=filter(importrange("1H5JljkscteL2qRMJ8ky342uTeP839jjDGg81c8Eg0es","A:U"),index(importrange("1H5JljkscteL2qRMJ8ky342uTeP839jjDGg81c8Eg0es","A:U"),0,6)="Graduate",index(importrange("1H5JljkscteL2qRMJ8ky342uTeP839jjDGg81c8Eg0es","A:U"),0,8)="$80K - $120K")

We now have a filtered list of about 300 rows:

Pros and cons of this method

The main benefit of this method is that it’s using functions that you may already be familiar with. The main trick is to know how to use the INDEX() function within the FILTER() function.

There are several cons to this method, which is why I wouldn’t recommend it (especially if you have a large dataset). Just from filtering on two columns, you have to call the IMPORTRANGE() function three times: once for the data itself and once inside INDEX() for each filter condition! Imagine filtering on 10 columns. There has to be a more scalable method than nesting the IMPORTRANGE() function multiple times in the FILTER() function. This method will definitely get slow over time for large datasets.

Another downside is that you can’t control the number of columns that get returned. Our source data has 21 columns and all 21 get returned. What’s the point of filtering your dataset if you can’t also filter the columns that get returned? You’ll end up hiding a bunch of columns that don’t matter to you in your target worksheet, which doesn’t feel right.

Finally, the column headers in this method are manually entered. Our formula in this method actually gets entered in cell A2 to allow us to copy/paste the column headers into row 1. This means if new columns get added to the source data, you’ll have to remember to add those column headers in your target worksheet. Also not the best method in terms of maintaining this Google Sheet long-term:

Method 2 (preferred): Using QUERY() with a little bit of SQL

The QUERY() function is a relatively advanced function in Google Sheets. Episode 32 was all about how to use the QUERY() function. The reason it’s not used as much is that it requires you to know a little bit of SQL. To filter our source data to the customers who are “Graduates” and earn “$80K-$120K,” the formula looks like this:

=query(importrange("1H5JljkscteL2qRMJ8ky342uTeP839jjDGg81c8Eg0es","A:U"),"SELECT Col1,Col3,Col6,Col7,Col8,Col9 WHERE Col6='Graduate' and Col8='$80K - $120K'",1)

Just like the FILTER() function, our IMPORTRANGE() is the first parameter. The second parameter is where we have to do a little SQL magic to pull the data we need. All those columns after the SELECT clause are simply the columns we want to pull into our target sheet. This already makes this method more powerful than the first method because we can specify which columns we want from our source Google Sheet. Usually when you use the QUERY() function, you can reference the column by referring to the column letter. With IMPORTRANGE() you have to use the “Col” prefix.

After that, you add the conditions after the WHERE clause. The trick here is to figure out the positions of the columns you want to filter on. In this case, “Col6” is Education_Level and “Col8” is Income_Category.

What’s that last “1” before the closing parentheses? That just tells Google Sheets that our source data has headers so we can pull back our filtered data and the relevant column names. We now get this nice filtered dataset with only the columns we care about:

Pros and cons of this method

In addition to being a much shorter formula, the QUERY() function will bring in the column names. You can enter the formula in cell A1 of your target Google Sheet and the data and column names will dynamically update as the source data changes, so you never have to worry about copying and pasting the column names from the source Google Sheet. Long-term maintenance of your target Sheet becomes much easier.

The main cons:

  • QUERY() is a hard function to learn. Learning a new syntax is difficult, so if you want to do more advanced filtering and sorting with QUERY() you’ll have to learn more SQL (see the sketch after this list).
  • Column numbers can change. This risk exists with the first method too, but you’ll have to keep track of the column numbers in the source Google Sheet. If new columns get added, you’ll have to adjust your SELECT clause to “pick” the right columns to pull into your target Google Sheet.
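
Here’s a rough sketch of the kind of “more SQL” I mean. The ORDER BY column and the row limit below are just illustrative choices (Col9 is one of the already-selected columns, sorted descending and capped at 100 rows), not something from the original report:

=query(importrange("1H5JljkscteL2qRMJ8ky342uTeP839jjDGg81c8Eg0es","A:U"),"SELECT Col1,Col3,Col6,Col7,Col8,Col9 WHERE Col6='Graduate' and Col8='$80K - $120K' ORDER BY Col9 DESC LIMIT 100",1)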

Final words on using Google Sheets as your database

I could spend another episode on the pros and cons of using Google Sheets as your team or company’s database, but will try to keep my final words short.

Those who don’t use Google Sheets and Excel every day cringe when they see workarounds like this to get the data that we need. The sooner one accepts that business-critical data will inevitably land in an Excel file or Google Sheet, the sooner we can get our jobs done. I’ve written about the unconventional use cases of spreadsheets before and this scenario is no different.

We know our database lives in a Google Sheet. That’s not going to change. Let’s just try to find the most painless way of getting that data out into another Sheet so we can do the more interesting analyses that matter for our business. If you care about the data living in a database and analysts being able to query the data using a separate BI tool, then you should probably consider getting into data engineering and be the change agent within your organization to move everyone off of spreadsheets. It’s a gargantuan task and in most cases an uphill battle.

Other Podcasts & Blog Posts

In the 2nd half of the episode, I talk about some episodes and blogs from other people I found interesting:

Dear Analyst #68: Generate unique IDs for your dataset for building summary reports in Google Sheets
https://www.thekeycuts.com/dear-analyst-68-generate-unique-ids-for-your-dataset-for-building-summary-reports-google-sheets/
Tue, 04 May 2021

If your dataset doesn’t have a unique identifier (e.g. customer ID, location ID, etc.), sometimes you have to make one up. The reason you need this unique ID is to summarize your dataset into a nice report to be shared with a client or internal stakeholders. Usually your dataset will have some kind of unique identifier like customer ID or transaction ID because that row of data might be used with some other dataset. It’s rare these days not to have one. Here are a few methods for creating your own unique identifiers using this list of customer transaction data (Google Sheets for this episode here). You can also watch a video tutorial of this post/episode here: https://youtu.be/fjkO0kHbbKw

Method 1: Create a sequential list of numbers as unique IDs

Each of these transactions is from a unique customer on a unique date for a unique product. We could do something as simple as creating a sequential list of numbers to “mark” each transaction. Maybe we can prefix this new transaction ID column with “tx-” so each unique ID will look something like this:

This method involves creating a dummy column (column I) of sequential numbers. Then in column A, you write “tx-” followed by the number you created in column I, and you have a unique ID. This unique ID is only relevant for this dataset, however. If there are other tables of data related to customers and transactions, those tables won’t know about this new transaction ID you just created on the fly.
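
As a quick sketch (assuming your first transaction sits on row 2 and the sequence number lives in column I as described above), the formula in A2 could be as simple as:

="tx-"&I2

If you’d rather skip the dummy column, something like ="tx-"&(ROW()-1) produces a sequential number too, though it ties the ID to the row position.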

Method 2: Create random numbers as unique ID

This method will make your unique IDs feel a little more “unique” since the numbers are randomized:

Notice how we take the result of the RAND() function and multiply it by 100,000 to get a random number with up to 5 digits. Our dataset is only 1,000 rows long so the chances of duplicate values are low, but the possibility still exists.
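
A minimal sketch of that formula (the ROUND() wrapper is my own addition to drop the decimals; the idea is just multiplying RAND() by 100,000):

=ROUND(RAND()*100000,0)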

This is probably the least preferred solution because there could be duplicate values (there are formula hacks to get around it). Another reason this isn’t a great solution is that you have to remember to copy and paste the random numbers as values into another column. The RAND() function is a volatile function (it basically recalculates every time you reload the Sheet), so you would lose your unique ID every time the Sheet loads. This means you have to remember to paste just the values, perhaps in the next column over, before referencing that value as your unique ID.

Finally, if your dataset has timestamps like this, chances are the unique IDs are meant to be sequential (using Method 1). Assigning random unique IDs to each transaction might make reconciling the data in the future more difficult.

Method 3: Concatenate (add) columns together to create unique ID

This method involves concatenating (adding) together different columns to create a unique ID. The reason I like this method is that it makes creating reports a bit easier, since you can type the values into a cell for a lookup to reference. For instance, the unique IDs in our dataset are created by combining the Customer ID, SKU_Category, and SKU columns:

We put a dash “-” in between each of the cell references so it’s a bit easier to see all the different characters in this “unique ID.” The issue is this: what if there are multiple transactions with the same Customer ID, SKU_Category, and SKU? We insert a COUNTIF column in between columns B and C to count the number of times that unique ID appears in column B:
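
As a sketch, and assuming Customer ID, SKU_Category, and SKU sit in columns C, D, and E (the exact column letters aren’t spelled out here), the unique ID in B2 and the COUNTIF check in the inserted column could look like this:

=C2&"-"&D2&"-"&E2

=COUNTIF($B:$B,B2)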

And then do a quick filter to see if there are any values greater than 1 in this column:

Well that sucks. Looks like we have 8 transactions that don’t have unique IDs using this method. The tricky thing with this method is figuring out what other columns can add “uniqueness” to the unique ID. The Date column can’t be used because it looks like some of these transactions happened on the same date. Maybe we can combine the Quantity and Sales_Amount columns to create a unique ID? Even that wouldn’t work because the last two rows have the same quantity and sales amount. This is where this method falls apart because as the dataset grows, you need to constantly check to see if the unique ID column you created is still in fact unique.

Great for creating summary reports

Let’s assume that we were able to create a unique ID for every transaction in this table. Now if I want to create a summary table that looks at the Sales_Amount, for instance, creating the formula might look like this:
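
As a hedged sketch of what that formula could look like (the cell references and the SUMIF() approach here are my own illustration, not necessarily the exact formula in the sheet), you could rebuild the unique ID from hard-coded values and sum the matching Sales_Amount:

=SUMIF($B:$B,$H$2&"-"&$H$3&"-"&$H$4,$G:$G)

Here H2, H3, and H4 would hold the hard-coded Customer ID, SKU_Category, and SKU values, column B holds the unique IDs, and column G holds Sales_Amount.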

You’re probably wondering why we would write such a complicated formula using the unique ID column versus just using the columns themselves. In the future, you might want to do a lookup to a specific transaction ID, and knowing the columns that contribute to the uniqueness of that ID makes it easy to write out the hard-coded values to do the lookup.

For instance, I might know that a customer with the ID “5541” is important and I can have that Customer_ID in my summary table somewhere. Then I know that “8ETY5” is an important SKU my company is tracking, and that could be another value I hard-code in my summary table somewhere. Knowing that the unique ID for the transaction includes these values might make it easier to reference that row in my summary report in the future (or perhaps in a PivotTable too).

Other Podcasts & Blog Posts

In the 2nd half of the episode, I talk about some episodes and blogs from other people I found interesting:

Dear Analyst #67: Automating tedious tasks with scripts and solving problems software can’t fix
https://www.thekeycuts.com/dear-analyst-67-automating-tedious-tasks-with-scripts-and-solving-problems-software-cant-fix/
Mon, 19 Apr 2021

This episode is actually a recap of a talk I gave at a meetup. After reflecting a bit on the subject matter, I wanted to discuss some other topics that are more important than writing VBA scripts or doing stuff in Excel. At the meetup, I discussed a VBA script and Google Apps Script I wrote for filling values down in a column. I actually published these scripts in a previous episode, but went in-depth during the meetup on how the scripts work. If I step back for a minute and ask myself why I created these scripts in the first place, the answer is: to solve a simple problem that I’m sure many analysts come across. More importantly, it’s a problem that doesn’t have a clear solution which our current software (Excel and Google Sheets) can fix easily.

Software that fixes your problems

For those of you who are:

  1. Using a recent version of Excel
  2. On a PC
  3. Subscribed to Microsoft 365 (depending on your package)

Congratulations! You are able to use Power Query to transform and clean “dirty” data and the problem described in this episode is easily solved with the software. All you have to do is click this option in Power Query to fill values down:

For the rest of us (Mac Excel or Google Sheets users), we’re stuck doing this manually. Why does this feature have to be reserved for a (relatively speaking) small group when the problem is faced by thousands of people who may not have the same access as someone who works in the enterprise?

Cleaning data is part of any analyst’s job and we should be able to do these tasks as quickly as possible so that we can move on to more interesting projects. The fact that you need Power Query to fill values down like this is annoying to me. Do you ever go out of your way to prove a point, even if it’s an extremely inefficient use of your time? Creating these VBA and Google Apps Script solutions was just that for me. Instead of relying on the software to do the job for me, I hacked up an inelegant but simple solution to hopefully give people more access to simple tools for cleaning up data.

Building for an audience of one

I might be over-estimating the number of people who have this fill values down problem. Maybe it’s a few hundred people? Maybe less than 100? Who knows. The important thing is that I had the problem and needed to solve the problem for myself.

Perhaps you are in a position where you can’t spend a few hours to learn how to write a script to automate one aspect of your job. That’s understandable. You need to crank out reports and time spent away from cranking means you’ll have to work after hours to get your job done.

I used to be on that hamster wheel, until I stepped back and saw the forest for the trees. Excel is just one tool in your vast array of tools to analyze and visualize data. There’s a whole world of databases, data pipelines, machine learning, and more for you to explore. Just staying in the “Excel lane” is how one gets pigeonholed into a job, a career, a life.

Learning how to write scripts changed my perspective on more than just Excel. I realized I could build tools that help others save time because I knew it saved me time. By building for an audience of one, you are in fact building for an audience of many.

Meetup recap

This write-up definitely meandered a bit but I think that’s ok. You can watch the recap of the meetup below and get lost in the details on how I loop through arrays to make the script work. The important lesson I hope you’ll walk away with is thinking outside of what Excel or Google Sheets has to offer into the other platforms and tools that come before or after your spreadsheet.

Slides from the meetup

Other Podcasts & Blog Posts

No other podcasts!

Dear Analyst #66: How to update and add new data to a PivotTable with ramen ratings data
https://www.thekeycuts.com/dear-analyst-66-how-to-update-and-add-new-data-to-a-pivottable-with-ramen-ratings-data/
Mon, 12 Apr 2021

PivotTables have been on my mind lately (you’ll see why in a couple weeks). An issue you may face with PivotTables is how to change the source data for a PivotTable you’ve meticulously set up. You have some new data being added to your source data, and you have to change the PivotTable source data to reference the additional rows that show up at the bottom of your source data. This may not be a big issue for you because maybe you’re not getting new data added often so manually going into the PivotTable settings and changing the reference to the source data doesn’t feel onerous. If you have new data coming in every day or every hour, you may want to automate this process.

Here are a few methods to accomplish this in both Excel and Google Sheets. My preferred method is to turn your source data into a table in Excel or reference the entire columns in Google Sheets. Download the Excel file or copy the Google Sheets with the dataset for this episode. You can also watch a video tutorial of this post/episode here: https://www.youtube.com/watch?v=i2BI0RaEuYQ

Ramen ratings from ramenphiles

I’m a big fan of niche datasets like the one for this episode. It’s a list of ramen products and their ratings created by a website called The Ramen Rater. The list consists of 2,500 ramen products along with each product’s country of origin, the style (Pack or Bowl), and of course the rating. It appears the ratings are all done by one person. More importantly, the list contains the full name of the ramen product, which means you can do some interesting text analysis to see what words are used most often in ramen products, how words might correlate with ratings, etc. For our purposes, this dataset is great for creating a PivotTable with the rating as the main metric to analyze.

Method 1: Reference the entire column

Excel PivotTables

As shown in the first screenshot, the source data for the PivotTable in the Excel file comes from the “ramen-ratings” worksheet from cells $A$1:$G$2581. As you add more data to the source data, you’ll have to change the source reference to reference a higher row number. If you add 10 more ramen ratings, you’ll have to change the PivotTable reference to $A$1:$G$2591. We want to avoid having to change the reference every time we add new data, so we can just reference the entire columns in $A:$G:

The problem is that the PivotTable we have in the “Ramen Pivot Table” worksheet now has this “(blank)” item in both the columns and rows fields of our PivotTable. Why? Because we’re referencing a bunch of empty rows with blank countries and ramen styles:

This isn’t a huge issue, because we can just remove the “(blank)” via the row and column filters:

Now when you add new rows of ramen ratings to the source data and then you refresh the PivotTable, the PivotTable will automatically pick up all the new rows of data since it’s referencing the entire columns from column A to column G.

Google Sheets PivotTables

The same solution applies to Google Sheets:

I find the user interface much easier to use in Google Sheets for a variety of reasons:

  1. Fewer clicks – Right when you click on the PivotTable (as shown in the above gif), you can see and edit the source data in the top right of the PivotTable field settings. In Excel, you have to click on the PivotTable Analyze tab in the ribbon and then “Change Data Source.”
  2. Can use left/arrow keys in cell reference – It’s a small annoyance in Excel, but notice how in the above gif you can just use the right arrow key to move the cursor to the right in the cell reference? This makes it easy to delete the row numbers. In Excel, using the left/right arrow keys changes the cell reference based on where your active cursor is in the spreadsheet. 9 times out of 10, you end up creating an incorrect formula and have to exit out of the menu, undo, or a combination of those two.
  3. PivotTable automatically refreshes – This is less about the UI and more of a core feature of PivotTables in Google Sheets: they automatically refresh when you add or edit data in your source. In Excel, you have to right-click and click “Refresh” or refresh via the ribbon every time you want to refresh the PivotTable. I’m sure there’s some pivot cache or performance reason why Excel doesn’t refresh automatically, but Google Sheets just gets it right on this one. I know there are some settings in Excel like refreshing the PivotTable every time the file opens or refreshing it at some interval you define (e.g. every 10 minutes), but they just add additional overhead for the user who wants to see their PivotTable updated in real time. This is 2021.

Overview of this method

For most use cases of PivotTables, I’d argue this solution is fine. This Excel file is pretty basic with one data source and one PivotTable. The dataset is also not super huge so you don’t have to worry about performance issues with referencing the entire columns of data with all those empty rows.

If you work in a corporate environment and you’re tasked with analyzing multiple datasets and have multiple data sources and PivotTables in your file, you may need something more scalable. This is where method 2 comes into play.

Method 2: Turn source data into a table (recommended)

Excel PivotTables

If you turn the source data into an Excel table and give the table a name, new data that gets added to the source will automatically get included in the table “reference.” Once you’re in the data source, press CTRL+T and hit ENTER to turn the data into an Excel table:

While your cursor is still in the newly created table, rename the table name to “Ramen” in the top-left:

Then we go back to the main ramen PivotTable, and change the source to equal this new Ramen table by just typing =Ramen in the Location field:

Now when you add new ramen ratings to the source data, the table reference automatically “expands” to include these new rows of data. In the gif below, I’m just copying some additional rows of data from another sheet and pasting it at the bottom of the Ramen source data table:

Notice how when you paste in the new data, the Excel table automatically expands the alternating row colors to include this new data. This shows that Excel was able to add this additional data to the table reference. If you refresh the PivotTable, it will automatically include the rows that got added since the source is still =Ramen.

Advantages of turning your PivotTable data source into an Excel table

Keep in mind: method 1 above is a totally acceptable solution for most simple PivotTable use cases. It’s really the edge cases where method 1 starts to break down. With method 2, not only do you eliminate some of these edge cases, but you get some additional benefits as well:

  1. (From method 1) Always need to deselect (blank) – If you’re doing any sort of bigger analysis, you’re going to be building multiple PivotTables. As you copy and paste the first table you created into new worksheets, that “(blank)” will always need to be deselected in the columns and rows. That shouldn’t be a problem in most use cases, but as you hit “select all” in the PivotTable filters as you’re doing your analysis, you’ll need to remember to scroll down to always keep that (blank) value deselected. It’s just some additional overhead that you don’t want to worry about.
  2. Easy to read table reference – Just as you may have multiple PivotTables in your file, you will probably have multiple data sources your PivotTables are built on. Instead of referring to the data source with the traditional A1:B2 cell references, it’s easier to just read a table reference as Ramen and know that it’s referencing your ramen dataset. If you accidentally name the worksheet something generic like source_data, you’ll have to double-check that your traditional cell reference is indeed referencing the ramen ratings data you’re interested in.
  3. See all table references in one place – Building off of the previous benefit, you can quickly see all the table references driving your PivotTables via the “Define Name” button on the Formulas tab in the ribbon. If you need to see the exact cell reference for your tables, this is the main place to see those cell references:

Google Sheets PivotTables

Tables don’t exist in Google Sheets :(.

I’m baffled as to why this feature doesn’t exist in Google Sheets, but I’m sure the team will build this functionality at some point to get to feature parity with Excel. In my opinion, the fact that Google Sheets PivotTables auto-refresh as you edit or add data outweighs the benefits of turning your source data into a table. Most Google Sheets PivotTables I’m creating these days are pretty simple in nature, so I’m not working with many PivotTables or data sources in one Sheet.

Now there are some formula tricks you can do with the FILTER(), OFFSET(), and COUNTA() functions to replicate the features of Excel tables, but it’s not as simple as the Excel tables feature. It probably also isn’t very performant on larger datasets when you’re using these functions to reference the source data correctly. But it’s possible!
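
As a rough sketch of that workaround: on a helper tab, either of the formulas below spills a live copy of the ramen data (seven columns, A:G, from the “ramen-ratings” worksheet), and the PivotTable can then point at the helper tab’s entire columns. The exact ranges are assumptions based on the source data described above.

=filter('ramen-ratings'!A:G,'ramen-ratings'!A:A<>"")

=offset('ramen-ratings'!A1,0,0,counta('ramen-ratings'!A:A),7)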

Other Podcasts & Blog Posts

In the 2nd half of the episode, I talk about some episodes and blogs from other people I found interesting:

Dear Analyst #65: Eliminating biases in sports data and doing a data science bootcamp with Caiti Donovan
https://www.thekeycuts.com/dear-analyst-65-eliminating-biases-in-sports-data-and-doing-a-data-science-bootcamp-with-caiti-donovan/
Mon, 29 Mar 2021

When you think of sports and data, you may think about all the data collected on player performance and game stats. There’s another world of sports data that is usually overlooked: the fans. In this episode, I speak with Caiti Donovan, the VP of Data & Insights at Sports Innovation Lab, a sports market research firm. Caiti started her career in marketing and business development at Viacom and Spotify, where she used data storytelling to work with advertisers and partners. More recently she learned how to build the data systems she was once only a consumer of. We’ll discuss how she made the transition to data, getting a data science certification at The Fu Foundation School of Engineering and Applied Science at Columbia University, and current projects she’s working on at Sports Innovation Lab.

Working with data at ViacomCBS and Spotify

Caiti spent 15 years in marketing and sales roles where data was a core part of her day-to-day projects. She used a lot of proprietary data systems and even helped build some of these systems. Using the data available to her, she’d take different datasets and turn the data into a format useful for data storytelling. These stories would be used for partnership development or working with advertisers. Data storytelling is a common theme on this podcast. See episode 62 with Janie Ho, episode 56 with John Napolean-Kuofie, and episode 35 on the Shape of Dreams.

At ViacomCBS, Caiti would look at the data behind shows like Jersey Shore and SpongeBob to see what type of revenue opportunities her team could create based on the audience of these shows. The data could also be analyzed to help inform content development for these shows. The goal was to understand their younger fans and figure out what it meant to have conversations with the fans of these shows.

After a stint working with a few startups in a consulting capacity, Caiti eventually landed at Spotify. At the time, Spotify had a hard time turning all the data they were sitting on into narratives in a B2B and B2C context. She worked with clients like the NBA, Ford, and Nike. For the data stories she was telling her clients from a B2B perspective, she also had to make sure they carried over to the B2C side (Spotify subscribers).

From there, Caiti made a big jump from entertainment to sports. She realized her “purpose meets passion” moment was finding ways to use data to have an impact on the world. She wanted to tackle challenges faced by women in sports and also find a way to better connect with the fans of women’s sports. Caiti eventually co-founded the non-profit SheIS Sport to bring together every single professional women’s sports league. Through this experience, Caiti learned a lot about the biases and inequities in data in the sports world. She realized she needed more technical expertise to have a direct impact on how data is collected and analyzed in this world, and went back to school for data science (more on this later).

Spotify’s billion points of data per day

When Caiti was at Spotify, one of her projects was figuring out how to translate the billion points of data generated by Spotify users into product opportunities. In addition to product opportunities, the ad sales team needed to have stories they could tell to their clients that were backed up by data.

She started evaluating how her team could clean and dissect the data to productize what Spotify was generating and storing every day. Using proprietary algorithms, her team analyzed people’s music listening behavior to figure out what a listener might be doing at the time they were listening to a song. This became known as “moment marketing,” which carried a lot of context about the subscriber. This context allowed advertisers to tap into the moment the subscriber was in, like whether they were at the gym, in their car, or at a party. Some of the metrics the team analyzed included bpm, device-level data, and the types of playlists people were creating. What better time for Nike to target a consumer with new shoes than when the consumer might be doing a workout or training for a sport?

Wanting to build her own data systems

To get closer to the data systems she was using, Caiti made the decision to go back to school and learn more about data science. She was accepted into a data science bootcamp at Columbia’s Fu School of Engineering and Applied Science. The topics covered in the bootcamp included Python, ETL processes, machine learning, and different tools to build data systems.

It took Caiti 6-7 years to make the decision to go back to school for a degree in data science. The catalysts for her decision included the data discrepancies she sees in the sports world and the pandemic.

When Caiti was at SheIS Sport, her team created a campaign report showing that 4% of sports media coverage focuses on women’s sports. The campaign ended up receiving half a billion impressions, 4.2 million engagements online, and 25K people posting their stories. She realized this 4% number only covers linear TV and no digital channels. Without proper data, advertisers, partners, and leagues cannot evaluate the opportunity available in women’s sports. It’s a chicken-and-egg scenario where fans want more media coverage, and advertisers say they’ll get more involved if they see more eyeballs and people going to these games.

Experience at a data science bootcamp

Caiti had already been accepted into the Columbia program at the end of 2019 and deferred to the spring semester of 2020. She also looked at schools like Flatiron and some other programs in New York. What drew her to Columbia’s program was the mix of backend technical topics and learning about related tools like Tableau and Hadoop.

Caiti’s data science bootcamp was the first bootcamp to go completely virtual. Given the intensity of the program, she stepped out of day-to-day operations at SheIS Sport to focus on her classes. The schedule was very tough and she was spending 15-20 hours per week outside of class doing homework. The difficulty with doing this virtually (as many knowledge workers can attest to) is not being able to lean over, see your colleague’s screen, and say “try out this function here in your code” to make the learning process more fluid.

The final project at her bootcamp had to use machine learning in some capacity. Her group needed to have a big data source and they ended up using multiple APIs. They wanted to evaluate how COVID affects player performance. Questions to be answered included what if there are no fans in the audience? Would this impact player performance? One study from the NBA I found interesting was the bubble’s impact (or lack thereof) on home court advantage.

Getting data on the NBA and WNBA and training a machine learning model

The NBA was easy since the whole season was in a bubble in 2020 but the WNBA was mixed. The NBA has this great API that goes back 10 years. For WNBA, her team had to scrape the Sports Reference website. This involved manually pulling down CSVs and uploading them into their model.

At the end of the day, Caiti’s team was not able to fully train any of the machine learning models because of data inconsistencies. It’s difficult to get consistent player data because players move to different teams, have new teammates, and get injured during the season. Instead of training the model, her team did a linear regression on the data available. They saw a correlation suggesting that NBA and WNBA players played better when most of the players were in the bubble.

Current projects at Sports Innovation Lab

Caiti is currently looking at fan data and how to democratize data for the sports industry to bring more equity to women’s sports. Ultimately, she wants to make sure the hypotheses and trends claimed in the sports industry are backed up with data. Advanced systems have been created to track player and game analytics since there are a lot of second-order effects on industries like sports betting and fantasy sports. On the business side, which focuses on fan metrics, the industry is still about five years behind.

The entertainment and retail industries are seeing a lot more innovation in how to get data from customers and consumers. Sports hasn’t done as much with data from fans. If you don’t have an understanding of fan behavior, you’re missing out on a huge piece of context about how a team or league may appear to brands and partners.

Data tools Caiti is excited about

At the end of our conversation, Caiti shared some tools she’s super excited about learning and using with her data projects. She mentioned a nice mix of open-source and commercial tools:

  • She started using Shiny a lot to build internal dashboards. It allows her team to visualize structured data and gives them the ability to poke holes in their data. This helps them find ways to further clean up and transform the raw data.
  • Tableau is a juggernaut in the data visualization space. It has acted as a connector between the sales team and Caiti’s team, which is a little more in the weeds with the data. Tableau streamlines things so Caiti’s sales team can easily explore data with potential clients.
  • A final tool is RStudio, which one of Caiti’s colleagues works in a lot.

Sports Innovation Lab is hiring engineers and analysts. If you believe in their mission, contact them about potential opportunities.

Other Podcasts & Blog Posts

No other podcasts!

Dear Analyst #64: Architecting revenue data pipelines and switching to a career in analytics with Lauren Adabie of Netlify https://www.thekeycuts.com/dear-analyst-64-architecting-revenue-data-pipelines-and-switching-to-a-career-in-analytics-with-lauren-adabie-of-netlify/ https://www.thekeycuts.com/dear-analyst-64-architecting-revenue-data-pipelines-and-switching-to-a-career-in-analytics-with-lauren-adabie-of-netlify/#respond Mon, 22 Mar 2021 10:56:00 +0000 https://www.thekeycuts.com/?p=50767 Transforming Netlify’s data pipeline one SQL statement at a time. Lauren Adabie started her career analyzing data and answering questions about the data at Zynga. As a data analyst at Netlify, she’s doing more than just exploratory analysis. She’s also helping build out Netlify’s revenue data pipeline; something she’s never done before. We discuss how […]

The post Dear Analyst #64: Architecting revenue data pipelines and switching to a career in analytics with Lauren Adabie of Netlify appeared first on .

]]>
Transforming Netlify’s data pipeline one SQL statement at a time. Lauren Adabie started her career analyzing data and answering questions about the data at Zynga. As a data analyst at Netlify, she’s doing more than just exploratory analysis. She’s also helping build out Netlify’s revenue data pipeline; something she’s never done before. We discuss how her team is transforming data with SQL, how to get her stakeholders to have confidence in the data, and the path that led her to a career in data analytics.

Re-architecting a Revenue Pipeline

Lauren joined the Netlify team near the beginning of this revenue pipeline project. Currently, the pipeline is a combination of a few workflows: hourly processes that export the data to CSVs, and Databricks jobs that load and aggregate the data and then produce topic-specific tables. Lauren is currently helping migrate this workflow to dbt. With the current pipeline, if there’s a failure downstream, it’s hard to find when and where the failure is happening.
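
To make the “load and aggregate, then produce topic-specific tables” idea concrete, here is a minimal sketch of one such step in Python/pandas. The file and column names are hypothetical, and Netlify’s real pipeline runs on Databricks and dbt rather than a one-off script like this:

    import pandas as pd

    # Hypothetical sketch of a "load and aggregate" step: read an hourly CSV export
    # and produce a topic-specific table (here, revenue by plan per day).
    raw = pd.read_csv("hourly_export.csv", parse_dates=["charged_at"])

    revenue_by_plan = (
        raw.assign(charge_date=raw["charged_at"].dt.date)   # bucket charges by day
           .groupby(["charge_date", "plan"], as_index=False)["amount_usd"]
           .sum()
    )
    revenue_by_plan.to_csv("revenue_by_plan_daily.csv", index=False)

Part of the appeal of moving logic like this into dbt is that each topic-specific table becomes a tested, version-controlled model instead of a step buried inside a job.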

Lauren’s first task was bringing raw data into the “staging” layer (data lake). She initially tackled it by pulling all the data into the staging layer right away. Looking back, she would have done it differently now that she knows more about the tools and processes. The goal is to help her team monitor and catch issues before they reach the business stakeholders. As we saw with Canva’s data pipeline, the benefit for the data team and the people who rely on the data is saved time and frustration.

A good data pipeline is one that doesn’t have many issues. More importantly, when issues do come up, it should be very easy for the data team to diagnose them. The impact of this revenue pipeline project is reducing time spent triaging issues, increasing the speed and ease of accessing data, and enabling analysis at various levels. Additionally, the team can reduce communication difficulties with a version-controlled dictionary of their metrics (similar to the data dictionary Education Perfect is creating).

Learning the tools of the trade

As a data analyst, you may not be diving into GitHub and the various workflows engineers typically use for reviewing and pushing code. Lauren’s team is a huge proponent of GitHub Issues for managing internal processes (she had an outstanding GitHub issue to work on as we were speaking). If engineers add new products to Netlify’s product line, they add a new GitHub issue for Lauren’s team to address.

I was curious how Lauren gained the skills for some of the tools she uses every day. When you think of the tools a data analyst uses, you might think of Excel, SQL, R, etc. These are not necessarily tools or platforms you take classes for in college, so what was Lauren’s learning path?

Lauren has learned most tools on the job. She learned Python after graduating college.

I learned [python] partially because I was trying to do things in Excel that were frustrating. I was pushing Excel to do too much with VLOOKUPs, references, etc.
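
For anyone wondering what “replacing a VLOOKUP with Python” tends to look like in practice, here is a small, hedged example using a pandas merge. The tables and column names are made up for illustration; this is not Lauren’s actual workbook:

    import pandas as pd

    # A rough pandas stand-in for a VLOOKUP: attach a plan name to each order
    # by joining on a key column.
    orders = pd.DataFrame({"order_id": [1, 2, 3], "plan_id": ["a", "b", "a"]})
    plans = pd.DataFrame({"plan_id": ["a", "b"], "plan_name": ["Starter", "Pro"]})

    # Equivalent in spirit to a VLOOKUP on plan_id filled down an entire column.
    orders_with_plan = orders.merge(plans, on="plan_id", how="left")
    print(orders_with_plan)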

Here’s a tool you don’t hear about every day: in college, Lauren learned Fortran 90 because people in her environmental engineering department were still using it. She ended up learning SQL from a book solely because she wanted to go into analytics. One thing she said about the tools she uses is that the nuance and control you have over a tool is what makes you stick with it. It’s the small things that keep you going back to that tool long term.

It’s all about nuance when working with stakeholders

Lauren explained that there is sometimes a mismatch between how we communicate and what we mean when explaining metrics. Sometimes you need to sit down, explain where the data is coming from, and show why the numbers are what they are. Something she’s doing more of now is explaining the specific nuances of the data her team produces.

As analysts, we need to think in the big picture and the nuances.

Stakeholders need confidence in the numbers, but analysts also need to validate the numbers against the other data sources the stakeholders are looking at. Sometimes the stakeholder you need to win over is yourself.

When Lauren was doing an experimental analysis at a previous company, she was expecting to see more clicks on a certain report. The hardest part of this experiment was that product managers typically run experiments and the analytics team just assists with driving the outcomes. The initial hypothesis about the numbers is driven by the PMs, not by analytics.

When you’re working with business stakeholders and trying to get them to have confidence in the numbers, simply being a kind person and a good communicator can help. Lauren likes to remind herself that people typically mean well and everyone is coming to the table with the same information. These conversations about why the numbers don’t look the way they should (from the perspective of the stakeholder) can be uncomfortable and not always fun. If you’re having trouble communicating, come to it with kindness and transparency.

From wastewater treatment to data analytics

We also talked about how Lauren started her career in data analytics which she discussed at length at a talk with the Society of Women Engineers. During college, Lauren majored in environmental engineering and thought she was going to be a civil engineer after graduation. Specifically, she wanted to go into wastewater treatment.

After working at a wastewater treatment plant, however, Lauren discovered she was more passionate about answering questions about the data in the wastewater treatment space. At the time, she didn’t even realize data analytics existed as a potential career path.

I think it’s difficult to find any job where working with data is not part of the job responsibilities. Lauren’s advice for people who want to get into a data analytics role but may not have the relevant experience is to reframe what you learned in school or at a current job for the role you’re interested in. For instance, Lauren took various math courses in college and was able to map the technical language from her studies and her environmental science job onto the skills required for a data analytics role at Zynga.

Lauren also talked about the power of connections and meeting people. If she could change one thing about how she got started in data analytics, it would be participating in and contributing to various analytics communities. In particular, many of the tools she uses have thriving communities where like-minded people hang out and discuss product improvements, questions on how to do things, etc. Lauren plans on being more active in some of these communities, which includes conferences like PyCon.

Tools for 2021

Finally, we discussed the tools Lauren is excited to try out and use this year. She’s a big fan of dbt because of its ability to implement tests and its various documentation features. She’s also excited to start using transform.io to help with Netlify’s data dictionary. Another crowd favorite is Mode Analytics. One more area she’s excited to learn about in the next year is microservices, and building analyses on top of what she builds there.

From a personal perspective, Lauren is thinking about starting a data blog. As an amateur blogger myself, I’ll vouch for that :).

Other Podcasts & Blog Posts

No other podcasts!

Dear Analyst #63: Cleaning Bitcoin Tweet data with OpenRefine, a free and open source alternative to Power Query https://www.thekeycuts.com/dear-analyst-63-cleaning-bitcoin-tweet-data-with-openrefine-a-free-and-open-source-alternative-to-power-query/ https://www.thekeycuts.com/dear-analyst-63-cleaning-bitcoin-tweet-data-with-openrefine-a-free-and-open-source-alternative-to-power-query/#respond Mon, 15 Mar 2021 04:22:00 +0000 https://www.thekeycuts.com/?p=50738 Numerous studies claim that data scientists spend too much time cleaning and preparing data (although this article claims it is a bullshit measure). I agree with some points in that article in that you should get your hands dirty with cleaning data to understand what eventually goes into the analysis. You may already be cleaning […]

The post Dear Analyst #63: Cleaning Bitcoin Tweet data with OpenRefine, a free and open source alternative to Power Query appeared first on .

]]>
Numerous studies claim that data scientists spend too much time cleaning and preparing data (although this article claims it is a bullshit measure). I agree with some points in that article in that you should get your hands dirty with cleaning data to understand what eventually goes into the analysis. You may already be cleaning up messy data today with Power Query, which started as an Excel add-in ten years ago and is now its own standalone tool. For those who don’t have Office 365 or a recent version of Excel, or who are on a Mac, what tool can you use for cleaning up data? The main tool I’ve been using is OpenRefine. The main reason I reach for this tool: it’s free. It’s like Power Query for the masses. I’ve been wanting to do this episode for a while, so get your messy dataset ready. The Google Sheet for the examples in this episode is here.

If you want to watch just the tutorial portion of this episode, see the video below: https://www.youtube.com/watch?v=0Fyh5x1QjMs

OpenRefine history

You can read more about OpenRefine’s history on this blog post. The tool started as an open source project in 2010 before Google bought the company that created the tool (Metaweb). The tool was renamed to Google Refine for two years, but Google eventually stopped supporting it in 2012. The blog post cites a few reasons why Google stopped supporting the tool. I think one of the main reasons is that it’s a desktop application and not run in the cloud. This probably conflicted with Google’s own cloud ambitions for what is now Google Cloud Platform where they have data cleaning tools all in the cloud.

Since Google dropped support in 2012, it’s exciting to see a good number of contributors to the project and an active mailing list. One feature that I think will keep OpenRefine relevant among analysts and data scientists who need to clean their data is the reconciliation service (similar to Excel’s rich data types). More on this later.

Clean messy data, not organize clean data

As I’ve been using OpenRefine over the years, I’ve found that I reach for OpenRefine for specific use cases. It doesn’t aim to be an all-in-one tool. When you first launch OpenRefine, you’ll see the main tagline for the tool in the top left:

A power tool for working with messy data.

It just says it like it is. It doesn’t do PivotTables, charts, or other things you might find in a spreadsheet tool. It does one thing and one thing well: clean messy data. I also think it does a good job of exploring outliers of your data, but it’s all in service of ridding your dataset of inconsistencies.

This post from Alex Petralia is a good read with regards to how you should think about OpenRefine:

In fact, what differentiates clean data from messy data is not organizational structure but data consistency. While clean datasets may not be organized as you’d like (eg. datetimes are stored as strings), they are at least consistent. Messy datasets, on the other hand, are defined by inconsistency: typos abound and there is no standardization on how the data should be input. You will only ever find messy datasets, practically by definition, when humans input the data themselves.

If you are working with data that is being produced by a computer, chances are OpenRefine will not be that helpful in terms of transforming your data into something you need to use for a downstream analysis. On the other hand, if you are working with a lot of user-generated data (as is the case with our Bitcoin Tweets data), OpenRefine is a perfect tool for the job and on par with Power Query.

Bitcoin Tweets

This is another dataset I pulled from Kaggle which shows recent Bitcoin Tweets from February and March 2021. There are a little under 50K rows in the dataset, representing a little over 30 days of Tweets. The main qualifier for these Tweets is that the Twitter user had to include the hashtag #Bitcoin or #BTC in the Tweet.

We’ve seen Bitcoin rise above $60K this past week, and some of the rise might be due to all the chatter on social media. This dataset is a great example of messy data since you have the user_location column, which isn’t standardized (you can put whatever value you want in the location field in Twitter). There’s also a hashtags column which of course can contain a variety of spellings of Bitcoin like “btc,” “BTC,” etc. So there’s a lot of messy data to try to clean up and standardize using OpenRefine. More importantly for the Bitcoin speculators, you can try to spot trends or patterns in what people on the Internet are saying about Bitcoin and see if it correlates with Bitcoin’s price.

Cleaning up date formats

After loading the data, the first thing you can do to better sort, filter, and facet the dataset is to set the date format for certain columns. In this case, I’m just transforming the date column to be a “date” format. Pretty simple stuff you can do in Excel or Google Sheets:

Here’s where things can get a little interesting. You’ll notice that the user_created column contains dates with inconsistent formats. Sometimes the date will be in the U.S. date format with a timestamp (e.g. “3/19/19 21:33:01”) but other times it will be in the European format with the day followed by the month (e.g. “19/03/19”). I changed the dataset a little bit so that only the rows that have a date of February 10th have the European date format in the user_created column. So our goal is to convert those dates in the European format to the U.S. format.

The first thing we do is isolate the rows that have February 10th in the date column. This can be done by filtering and faceting, two of the most common operations in OpenRefine. If you do the timeline facet, you’ll get this nice scatterplot in the left sidebar showing how your data is distributed based on the date column. Unfortunately, we can’t exactly pinpoint February 10th on this facet:

As you move the left and right handles in the timeline plot, you’ll start seeing the number of records getting smaller. It’s a nice way of filtering out the rows you don’t care about. I’m going to do a basic Text filter and just look at the dates that contain “2021-02-10.” I now have 3,526 rows out of ~50K that match this criterion:

Now I can start transforming the user_created column so that the date is in the format I want. After you click on “Transform” in the column settings, you’ll see a field for OpenRefine’s own expression language called GREL (General Refine Expression Language). It feels very similar to JavaScript. We can start transforming the column of data by using the value variable to get the date format we want. As you type the expression in the box, you see what the output looks like in the preview pane below:

After you apply the transformation, OpenRefine changes the date for you in that column to the format that we want. You can remove the filter or facet and then apply the “To date” transformation to this column so we have a clean date column to work with.
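
If you’d rather see the same filter-and-transform logic outside of OpenRefine, here is a rough pandas equivalent. The date and user_created column names come from the dataset above, but the file name is hypothetical and the actual GREL expression isn’t reproduced here:

    import pandas as pd

    tweets = pd.read_csv("bitcoin_tweets.csv")  # hypothetical export of the Kaggle dataset

    # Isolate the rows whose tweet date falls on February 10th, 2021
    # (the text-filter step in OpenRefine).
    feb_10 = tweets["date"].astype(str).str.startswith("2021-02-10")

    # Re-parse only those user_created values, telling pandas the day comes first
    # (European format), then write them back in the U.S.-style format.
    fixed = pd.to_datetime(tweets.loc[feb_10, "user_created"], dayfirst=True, errors="coerce")
    tweets.loc[feb_10, "user_created"] = fixed.dt.strftime("%m/%d/%y %H:%M:%S")

    # Finally, parse the whole column as dates (the "To date" step in OpenRefine).
    tweets["user_created"] = pd.to_datetime(tweets["user_created"], errors="coerce")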

Adding column from examples in Power Query

By filtering and faceting your data and then applying transformations with GREL, you’ll be able to do a majority of data cleaning tasks that you might do in Excel or Power Query. This totally could’ve been done in Excel, but you’d be creating a couple columns to store the correctly formatted data and doing MID() formulas left and right.

In Power Query, the Add Column from examples feature basically does the date cleaning task I just showed above in a more user-friendly way. Instead of writing out your own expression, you start typing in the date you actually want next to the “dirty” date, and then Power Query infers what the date format should be and fills that transformation down for you to all your “dirty” dates. Behind the scenes, Power Query writes the expression for you in its own M formula language. This prevents you from having to write it all out yourself. A little more magic and a little less control.

Clustering and editing groups of values

This is the main feature I use in OpenRefine when dealing with messy data. Nothing is worse than having the city “London” spelled in 10 different ways when you’re trying to build a report based on, well, London. What if the “L” isn’t capitalized, or the person shortened the spelling to “Lon?” This is exactly what the Cluster and Edit feature aims to solve.

The user_location column in our dataset is filled with inconsistent city and country spellings, so this is a great use case for Cluster and Edit. Once you apply this feature, you can filter on the number of rows in the cluster (among other filters) to quickly fix the major data inconsistencies in the dataset. Surprisingly, a large number of Twitter users cite “Lagos, Nigeria” as their location. Once you see that there is a consistent spelling of a city name, you can merge the inconsistent spellings to start cleaning up the data:

At the top of the menu, you’ll see the Method and Keying Function dropdowns. These are different algorithms you can use to group the data if the current algorithm doesn’t appear to capture all the inconsistencies. I really like this feature because I don’t aim to get a perfectly clean dataset if the number of clusters is very large (as is the case with this dataset). I just care about cleaning up the major problems, and dragging the handlebars on the right allows me to find those problem values.
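
To give a sense of what a keying function does under the hood, here is a minimal Python sketch of key-collision clustering, similar in spirit to OpenRefine’s default fingerprint method. It’s an illustration of the idea, not OpenRefine’s exact algorithm:

    import re
    from collections import defaultdict

    # Normalize each value to a key, then group values that share a key
    # so they can be merged into one consistent spelling.
    def fingerprint(value: str) -> str:
        value = value.strip().lower()
        value = re.sub(r"[^\w\s]", "", value)   # drop punctuation
        tokens = sorted(set(value.split()))     # dedupe and sort tokens
        return " ".join(tokens)

    locations = ["Lagos, Nigeria", "lagos nigeria", "Nigeria, Lagos", "London", "london, UK"]

    clusters = defaultdict(list)
    for loc in locations:
        clusters[fingerprint(loc)].append(loc)

    for key, values in clusters.items():
        if len(values) > 1:
            print(f"{values} -> candidates to merge into one spelling")

Notice that “London” and “london, UK” land in different clusters; that’s exactly why OpenRefine offers multiple methods and keying functions for the cases a simple fingerprint misses.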

The Cluster values feature in Power Query allows you to do something similar, but I think OpenRefine’s multiple algorithms and ability to filter down to the clusters you care about make OpenRefine more robust for handling misspellings.

Reconciliation and rich data types

A lot of people in the Excel community went cuckoo for Cocoa Puffs when the data types feature was released. Instead of copying and pasting additional data about stocks or geography into your Excel file, data types allow you to pull this information automatically from Wolfram.

OpenRefine’s analogous feature is called reconciliation. Not going to lie, I think the naming of this feature could be better. Feature name notwithstanding, you can “enrich” your existing data with numerous reconciliation services. From doing a quick scan of the different services, it does feel like there’s an academic bent to the types of libraries available. I’m going to use a basic Wikidata service to see what additional data we can find based on the user_location column in our dataset. After you click on Reconcile and then Start Reconciling in the column settings, you can add services by adding the URL of the service. With the Wikidata service, I’m going to see if I can make the user_location column a rich “data type”:

The “use relevant details” setting gives you the ability to include additional columns in the request to the service so that it can better find a match for you. I’m going to leave that alone for now and see what this does for our dataset:

For some locations, it found perfect matches like “Atlanta” and “London.” For values like “Europa,” we have the option to click on the box with the one or two checkmarks. This is applying data cleaning to the data enrichment process. Perhaps I only want row 5 to be the “Europa” rich value (in which case I would click the box with one checkmark). If I want all 50 instances of “Europa” to resolve to the rich value Wikidata suggests, I would click on the box with two checkmarks.

If you click on the new value in this location column, you’ll see the Wikidata page for that value. Let’s try to project out some values from this “enriched” data. After clicking on “Add columns from reconciled values,” you’ll see a list of available properties you can add to the dataset. At this stage, you can click a property and preview what the values might look like before committing the operation. After adding the “head of government” column, we get another rich data type:

On the left sidebar, you can further filter the “best candidate’s score” so that the new head of government column includes only the best matches based on the location provided to the service. This is another great data cleaning feature to remove any false positives where the fuzzy match didn’t work out as well as we would’ve liked.
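
OpenRefine talks to services like Wikidata through its own reconciliation protocol, which isn’t reproduced here. As a rough illustration of the kind of lookup involved, the sketch below queries Wikidata’s public wbsearchentities API for a free-text location and returns candidate matches with their IDs and descriptions:

    import requests

    # Illustration only: this is not the reconciliation endpoint OpenRefine uses,
    # just a direct query against Wikidata's public search API.
    def wikidata_candidates(location: str, limit: int = 3):
        response = requests.get(
            "https://www.wikidata.org/w/api.php",
            params={
                "action": "wbsearchentities",
                "search": location,
                "language": "en",
                "format": "json",
                "limit": limit,
            },
            timeout=10,
        )
        return [(hit["id"], hit.get("label", ""), hit.get("description", ""))
                for hit in response.json().get("search", [])]

    print(wikidata_candidates("Lagos"))

The score-based filtering described above is OpenRefine doing the equivalent of ranking these candidates and letting you decide how fuzzy a match you’re willing to accept.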

Other features for reaching parity with Power Query

Before you go off and start saying OpenRefine is great for data cleaning, how does it compare to the other features available in Power Query?

Recording steps in data transformation process

One powerful feature in Power Query is the ability to see the different “steps” in the data transformation process. OpenRefine also has these steps that let you go forward and backward in the process. It’s kind of like going to a step in a macro:

If you’ll be applying the same “steps” to clean up your data in the future, you can export the steps and apply them to another instance of OpenRefine. This way you don’t have to do each manual step all over again. You get this JSON-like code of transformations which you can save into a text file:

Export into multiple formats

Just like Power Query, you can export the final cleaned dataset into an Excel file, but OpenRefine allows many other formats as well. My guess is that others in the OpenRefine community have built exporters to connect your output to other online tools that you might use in the workplace:

Merging tables not available

One feature that’s not available in OpenRefine is the ability to merge different datasets together with a point-and-click interface. Specifically, this is the ability to denormalize or “unpivot” your data so you can get one long stats table. This is possible if you use the cross function in the GREL language, but it requires coding the transformation you’re looking for versus clicking on a few dropdowns in Power Query. Again, the tension between the tool doing magic and the tool giving you control.

Other Podcasts & Blog Posts

In the 2nd half of the episode, I talk about some episodes and blogs from other people I found interesting:

Dear Analyst #62: Using data storytelling to close billions of dollars worth of deals at LinkedIn with Janie Ho https://www.thekeycuts.com/dear-analyst-62-using-data-storytelling-to-close-billions-of-dollars-worth-of-deals-at-linkedin-with-janie-ho/ https://www.thekeycuts.com/dear-analyst-62-using-data-storytelling-to-close-billions-of-dollars-worth-of-deals-at-linkedin-with-janie-ho/#comments Mon, 01 Mar 2021 11:47:11 +0000 https://www.thekeycuts.com/?p=50707 This episode is all about data storytelling at a “traditional” enterprise company like LinkedIn and also at a major news publication. Janie Ho is a former global account analyst at LinkedIn in NYC where she facilitated data-driven presentations to close revenue deals for LinkedIn’s top global strategic accounts. Currently, she is a senior editor in […]

The post Dear Analyst #62: Using data storytelling to close billions of dollars worth of deals at LinkedIn with Janie Ho appeared first on .

]]>
This episode is all about data storytelling at a “traditional” enterprise company like LinkedIn and also at a major news publication. Janie Ho is a former global account analyst at LinkedIn in NYC where she facilitated data-driven presentations to close revenue deals for LinkedIn’s top global strategic accounts. Currently, she is a senior editor in growth and audience at the New York Daily News under Tribune Publishing. This episode goes into best practices for creating data-driven presentations, learning new skills in non-traditional methods, and tools journalists use to find new stories to pursue.

Upleveling skills: from SEO to data

As a former journalist at various publications like ABC News and Businessweek, Janie forged a non-traditional path to a career in data.

In New York there was a popular platform called Mediabistro where they held these one-night courses. Many of them were free, and Janie took as many free courses as she could. She took many courses on SEO, and her SEO skills ended up being her gateway into data analytics.

I always find it interesting how people from all different backgrounds end up getting into data whether it’s learning Excel, SQL, or some other data tool. It further shows that no matter what your role is, you will come across a spreadsheet at one point or another. In the world of SEO, you have tons of data around keyword performance, traffic estimates, rank, and more to play with.

LinkedIn: an enterprise behemoth

Janie eventually found herself at LinkedIn in 2011 as the first analyst in her group focused on global revenue accounts. When she left LinkedIn three years later, there were 50 analysts. Most of the analysts were recruited from management consulting so these analysts most likely had some data experience. Luckily, LinkedIn emphasized professional development so Janie was able to not only learn data skills, but also how to build data-driven presentations.

Most people don’t realize that LinkedIn is expensive enterprise software that powers a lot of hiring functions around the world. Seats for the software cost $10K/year and above. When Janie joined LinkedIn, the company was in high-growth mode since there was so much demand for the product on the enterprise side.

LinkedIn was basically hiring salespeople as fast as they could, and the salespeople were expected to start selling the next day. There wasn’t an extended onboarding period; they just needed people to sell. With all these salespeople doing QBRs and creating new pitch decks for the C-suite, LinkedIn needed many analysts like Janie to help produce these presentations at a fast rate.

Concise business review data presentations

In order to create these presentations, Janie and her fellow analysts were basically downloading LinkedIn usage data and slicing and dicing the data in Excel. She had to show LinkedIn’s top strategic clients how things were going during these QBRs, but also where the opportunities were to spend more on LinkedIn.

Internally, LinkedIn had a program called Data-Driven University which was created by former Bain consultants. Janie would learn the key data storytelling skills from this “university” and turn around and train salespeople. Some examples of slides that Janie would create are below. These are the “after” slides that show how the data could tell a better story where there’s only one key takeaway per slide:

Compare these slides to the slide below where there are too many elements on the slide and the key takeaway for the audience is not clear:

One-click data-driven presentations

The insights team at LinkedIn ended up creating a tool called Merlin that was built on Tableau. All you needed as an analyst was the client’s company ID, and all the visualizations would get created with one click. The output was a 50-slide deck with takeaways written in plain English.

One of the neat features of this one-click dashboard was that it would create an “icebreaker” game in each deck depending on which clients you were talking to. You could just plug all the names attending the meeting into the tool, and it would create a slide asking the meeting attendees who the most popular person is on LinkedIn, since the tool obviously had access to all meeting attendees’ LinkedIn information.

LinkedIn’s sales data—sometimes close to a petabyte or more—exists among internal databases, Google Analytics, Salesforce.com, and third party tools. Previously, one analyst on LinkedIn’s team serviced daily sales requests from over 500 salespeople, creating a reporting queue of up to 6 months.

In response, the business analytics team centralized this disparate data into Tableau Server to create a series of customer success dashboards. LinkedIn embeds Tableau Server into their internal analytics portal, nicknamed “Merlin.”

Today, thousands of sales people visit the portal on a weekly basis—equivalent to up to 90% of LinkedIn’s sales team—to track customer churn, risk indicators, and sales performance.

Source: Tableau

Janie still had to download additional usage data and do custom reports and PivotTables to get her clients the data they needed. She eventually learned SQL to further automate her data needs. Nonetheless, this solution in Tableau really helped salespeople get the slides they needed to tell data-driven stories and close deals.

Data visualization best practices

Through her training at LinkedIn, Janie learned all types of best practices for how to tell data-driven stories. One of the key questions she would ask herself is this: Can you explain the slide in plain English to someone who is not in that specific industry?

If you can’t, chances are the slide could be simplified and data can be removed. We talked about all types of best practices in this episode, but here were a few that stood out:

  • Slide headlines should be in the same position on each slide so your audience isn’t scanning the slide for the headline and instead focuses on the body of the slide.
  • Use colors and charts sparingly: you should have one specific bar, line, or color you want the audience to focus on to grasp the key takeaway from the slide.
  • 3-5 second rule: if you look at the slide for 3-5 seconds, you should be able to understand the takeaway.

The slides are not for you. They are for your audience.

In this following slide, the audience is drawn to one specific bar and color to understand the key takeaway of the slide:

Janie saw parallels between her experience at LinkedIn and her former journalist days. You’re tempted to add more data and visualizations to the slides, but you don’t want your audience’s attention to be distracted. You want that one key trend or number to be stamped into your audience’s head which is like writing a really catchy news headline.

Learning and teaching Google Sheets/Excel

According to Janie, 80% of a data analyst’s job is cleaning data despite all the expensive tools and AI that have been developed over the years. Even with the Merlin tool at LinkedIn, analysts still had to use Excel. That’s why she had to learn how to automate as much as she could in Excel and SQL and then pass these tools on to incoming analysts.

They say the best developer is a lazy developer.

After LinkedIn, Janie started working for smaller companies such as nonprofits and would report directly to the CEO. A lot of them were in Google Sheets all day and couldn’t write formulas like VLOOKUP. They were doing things by hand across thousands of rows and manually changing the formatting with the paintbrush tool in Excel.

To teach these CEOs how to use Excel, she would first walk them through the formulas she was building and the final product in Excel. Then she would revert all her changes and ask them to do the exact same thing, telling them they had to recreate the output she had just shown them.

They don’t know what they don’t know.

Speaking of acquiring skills, Janie made an interesting point about how many people learned web programming skills back in the early 2000s. This was during the heyday of Myspace and Xanga. Myspace users were teaching themselves HTML, CSS, and JavaScript just to do simple things with their Myspace pages. That same need to learn how to edit a website is not as common now with platforms like Facebook.

People were learning these 6-figure skills just to get a unicorn to pop out from their Myspace profiles.

Audience development at The New York Daily News

Janie oversees many different assets at The New York Daily News including the homepage, social media platforms, podcasts, breaking news emails, mobile alerts, and newsletters, just to name a few.

Data is still an important part of what she does in her current role. Tools like Chartbeat and Tableau are used for reporting purposes. OneSignal is used for pushing mobile/web alerts. All the data generated from these platforms is pushed into Google Analytics 360 dashboards built by the national Tribune team.

Twice daily, Janie reports on the best “meta” headlines to NY Daily News journalists (these are the SEO titles from top performing articles). For her team, the One Metric that Matters (OMTM) is getting new subscribers. I think many teams call their OMTM their “north star metric” or something similar. In the world of SaaS, that might be MAUs or DAUs. Here is an example of a chart Janie might show her team during one of these meetings showing the performance of stories:

We talked about how Janie’s team helps journalists predict which stories will be “hits.” The New York Daily News’ biggest source of stories is still news about NYC. They don’t do feature stories on Broadway openings and restaurants anymore given the size of the team. The stats Janie presents are only one half of what journalists rely on to figure out which stories and beats to pursue.

Ultimately, it’s an art and science to find a story to pitch the editors.

You can find Janie on Twitter at @janieho16.

Other Podcasts & Blog Posts

No other podcasts or blog posts this week!

Dear Analyst #61: Empowering businesses and individuals with data literacy skills with Oz du Soleil https://www.thekeycuts.com/dear-analyst-61-empowering-businesses-and-individual-with-data-literacy-skills-with-oz-du-soleil/ https://www.thekeycuts.com/dear-analyst-61-empowering-businesses-and-individual-with-data-literacy-skills-with-oz-du-soleil/#comments Mon, 22 Feb 2021 05:16:00 +0000 https://www.thekeycuts.com/?p=50671 Oz is one of the best creators of Excel content I know with his Excel on Fire YouTube channel. Unlike traditional “how-to” videos, his videos blend education with entertainment making the learning process feel like binging your favorite Netflix show. Oz and I met on Google+ way back in the day and in person at […]

The post Dear Analyst #61: Empowering businesses and individuals with data literacy skills with Oz du Soleil appeared first on .

]]>
Oz is one of the best creators of Excel content I know with his Excel on Fire YouTube channel. Unlike traditional “how-to” videos, his videos blend education with entertainment making the learning process feel like binging your favorite Netflix show. Oz and I met on Google+ way back in the day and in person at the 2014 Modeloff competition. While Oz is an Excel MVP and Excel trainer on LinkedIn, our conversation goes deeper into data literacy and understanding where your data is coming from before it gets into the spreadsheet.

Know just enough Excel to get your job done

When I first met Oz at the Modeloff in 2014, he told me a story about how he discovered the power of Excel for changing people’s lives. This story really shows the human side of a spreadsheet program that is typically associated with business and enterprise use.

Oz was teaching Excel at a medical school and helping the students in his class automate their reports. He met one student who was simply copying and pasting cells up and down the spreadsheet and spending an hour on these manual operations. He realized the student just needed one formula to automate the task she was doing; she just didn’t know what that formula was.

I started learning about people who needed to know how to use certain features in Excel, but didn’t need to learn how to use everything in Excel.

Once the student saw how the formula could eliminate all the tedious work she was doing, it changed how she worked and gave her so much more time to focus on more important aspects of her job.

I think a lot of people approach their tools and software with a similar mindset. You know there is probably a better or faster way of doing something, but you go with what you know. There’s a bit of the JTBD (jobs-to-be-done) framework here. Knowledge workers need to know just enough to solve the problems they face on the job, and can leave the rest of the software’s feature set for the power users.

You’ll work with data no matter what role you have

Prior to our conversation, Oz mentioned to me that he wanted to talk about more than just Excel tips and tricks. These topics are covered ad nauseam by other content creators, and for good reason, as people need and want this training (yours truly has benefited from creating this type of content). What really tickles my fancy are the topics surrounding Excel, and there is no one better to go in-depth with me on these topics than Oz.

Analyst might not be in your title.

Nonetheless, you are or will be sorting, filtering, and summarizing data no matter what department or level you work in. Excel is merely a tool to get you from the raw data to the story you tell to your internal stakeholders to launch X feature, or to external clients to purchase your product.

Oz talks about how taking an Excel class will get people feeling comfortable with the tool, but it only goes so far. As you get real-world experience, you’ll start to ask questions about data quality and the data source(s). These are topics that go beyond Excel and into the realm of databases, data transformation, and data pipelines; topics I’m trying to cover more of on this podcast.

Oz opined about the dilemma one faces with duplicate data. Do you de-duplicate at the source (perhaps in a view in a database), or do you do it in the spreadsheet? Most analysts (present company included) will make the necessary changes in Excel or Google Sheets for one reason: it’s fast. Harkening back to the previous section’s takeaway: I just need to get a job done and don’t care (for now) how it gets completed.
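
For what it’s worth, the “do it downstream” option is just as quick in code. Here is a minimal pandas sketch of de-duplicating after a data dump lands, as opposed to handling it upstream in a database view; the table and column names are hypothetical:

    import pandas as pd

    # Hypothetical data dump with a duplicated customer record.
    customers = pd.DataFrame({
        "email":       ["a@example.com", "a@example.com", "b@example.com"],
        "signup_date": ["2021-01-02", "2021-01-02", "2021-02-14"],
    })

    # Drop exact duplicates on the key we care about, keeping the first occurrence.
    deduped = customers.drop_duplicates(subset=["email"], keep="first")
    print(deduped)

The trade-off Oz describes still applies: fixing it here is fast, but the duplicates will keep coming back until someone addresses them at the source.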

Before data storytelling, there’s data literacy

I’ve talked about data storytelling on numerous episodes (see the data storytelling episode with the New York Times). It’s a hot topic for a lot of companies as they start incorporating software into their product offerings (if you’re a SaaS company, you’re already swimming in a big data lake).

Before one can create these masterful data-driven stories, Oz believes there is a more fundamental skill one needs to acquire: data literacy. When you look at a report, you should be able to answer questions like “Can I trust the data source?” and “What am I really looking at with this data?”.

A recent article by Sara Brown at MIT Sloan highlights the following data literacy skills today’s knowledge worker should have:

  • Read with data, which means understanding what data is and the aspects of the world it represents.
  • Work with data, including creating, acquiring, cleaning, and managing it.
  • Analyze data, which involves filtering, sorting, aggregating, comparing, and performing other analytic operations on it.
  • Argue with data, which means using data to support a larger narrative that is intended to communicate some message or story to a particular audience.

The article goes on to explain the different steps a company can take to build an effective data literacy plan. An interesting stat Brown highlights is this one from a survey conducted by Accenture:

In a survey of more than 9,000 employees in a variety of roles, 21% were confident in their data literacy skills.

Should we be surprised by this finding? I think not.

Did you ever need to take an Intro to Data Literacy course in middle or high school? Was learning spreadsheets part of the curriculum? Things change a bit at the university level as deans and presidents realize their students are not meeting the demands of hiring managers. I reference an episode of Freakonomics in episode 22 where they break down the deficiencies in the U.S.’s math curriculum. Key takeaway: a majority of what you learn in the K-12 system does not prepare you for a job requiring data literacy.

Empowering small businesses to use Excel

Oz made a great point about not just the content produced about Excel, but the features many bloggers and trainers decide to demonstrate in their content.

I worry that so much conversation has enterprises in mind, or the start-ups that want to get huge. But there are a lot of small businesses, and they’re lost in conversations that they don’t know aren’t meant for them.

Naturally, the type of professional who can spend a few hundred or a few thousand dollars on a comprehensive Excel training probably works at a large enterprise or well-funded startup. But there are millions of flower shops, retail stores, and non-profits who may still be using Excel the way Oz’s student was using it at that medical school.

This is an area Oz is passionate about and there is clearly a need to provide Excel training for this demographic. Chances are the flower shop won’t need to do complex VLOOKUPs and mess with Power Query. They just need to know the features–hope you’re starting to see the theme here–to get their jobs done.

Is Excel a database?

For many of these small businesses, yes.

Oz has seen small five-person companies with a database platform installed that no one in the company uses because no one knows how to. He saw a non-profit where the DBA was a woman who worked half a day a week. If anyone needed to get data out of or into that database, they had to wait for the four hours a week she was available to handle their requests.

While it pains many of you (I include myself here) to see businesses inefficiently store their data in Excel or a Google Sheet, we must come to accept that not every business scenario needs to have auto-refreshing PivotTables and VBA macros.

Oz talks about the need to have more honesty and empowerment around what is possible with Excel. He hears the database vendors and data science crowd talk about using the latest and greatest database platforms or programming in R or JavaScript. These are all great solutions for the enterprise, but who is going to implement these solutions at the flower shop? Perhaps this is the realm for no-code platforms like Shopify to make e-commerce as simple as possible.

At the end of the day, Oz realized (like many analysts) that his Excel skills are necessary for many businesses whose data is trapped in databases. He would be in conversations with companies that need to create detailed reports, but then argue about which cost center is going to “fund” the project. Then you have green-light committees who need to approve the SOW.

You’ll find these types of internal battles at corporations all over the world. But Oz knows that if he just gets the data dump from the database, he can clean up the data and get the business the reports and stats they need with his knowledge of Excel and, more importantly, his understanding of the business logic.

Build vs buy

At the very end we talked a bit about a podcast I listened to recently (see Other Podcasts section below) where the classic dichotomy between build vs. buy was brought up. The main idea is that software engineers are not always great at putting a dollar value on the time it takes to build an application (versus just buying the off-the-shelf version).

Like Oz, I agree that Excel and Google Sheets should be treated as development platforms. Oz talked about working on a consulting project where the client was paying something like $60K/year for an industry-specific software application. The issue was that his client was only using a fraction of the features the software offered. When you purchase expensive software like this, you may also need to purchase customer support for situations where the software breaks.

Instead, Oz was able to develop a prototype in Excel that had just the features the client needed and was using from the expensive enterprise software.

So there are situations where building can be more beneficial than buying the shiny software that’s targeted for your use case and industry. Additionally, you become the customer support because you know the ins and outs of the solution you created, which is an empowering feeling.

Other Podcasts & Blog Posts

In the 2nd half of the episode, I talk about some episodes and blogs from other people I found interesting:

The post Dear Analyst #61: Empowering businesses and individuals with data literacy skills with Oz du Soleil appeared first on .

]]>
https://www.thekeycuts.com/dear-analyst-61-empowering-businesses-and-individual-with-data-literacy-skills-with-oz-du-soleil/feed/ 2 Oz is one of the best creators of Excel content I know with his Excel on Fire YouTube channel. Unlike traditional "how-to" videos, his videos blend education with entertainment making the learning process feel like binging your favorite Netflix show. Oz and I met on Google+ way back in the day and in person at the 2014 Modeloff competition. While Oz is an Excel MVP and Excel trainer on LinkedIn, our conversation goes deeper into data literacy and understanding where your data is coming from before it gets into the spreadsheet.
Dear Analyst 61 48:22 50671
Dear Analyst #60: Going from a corporate accountant to building an Excel training academy with John Michaloudis https://www.thekeycuts.com/dear-analyst-60-going-from-a-corporate-accountant-to-building-an-excel-training-academy-with-john-michaloudis/ https://www.thekeycuts.com/dear-analyst-60-going-from-a-corporate-accountant-to-building-an-excel-training-academy-with-john-michaloudis/#respond Mon, 15 Feb 2021 15:57:53 +0000 https://www.thekeycuts.com/?p=50657 It’s a story we’ve all heard before. You’re working a full-time job, and you have more fun doing your side hustle than your 9 to 5. This is what happened to John Michaloudis. He was a financial controller at General Electric but found his passion sharing Excel tips and tricks on an internal GE newsletter […]

The post Dear Analyst #60: Going from a corporate accountant to building an Excel training academy with John Michaloudis appeared first on .

]]>
It’s a story we’ve all heard before. You’re working a full-time job, and you have more fun doing your side hustle than your 9 to 5. This is what happened to John Michaloudis. He was a financial controller at General Electric but found his passion sharing Excel tips and tricks on an internal GE newsletter which his colleagues ate up. John decided to become an entrepreneur and built an Excel training company from the ground up. We chatted about how he got started, his favorite marketing tactics, and of course, why he loves Excel.

10,000 followers on an internal company blog

At General Electric, there was an internal blog called Colab where employees could write and publish articles only for GE employees to see. As a financial controller, John became well-versed in Excel and decided to contribute to the internal blog. He started posting Excel tips, and eventually he had a weekly column devoted to being better at Excel.

GE’s Colab

John quickly amassed more than 10,000 subscribers to his column as he saw how hungry people were for Excel knowledge. But it was only his side gig at GE.

I liked doing the blog more than my actual job. I felt the subscribers valued me more than my boss valued me.

After getting this positive feedback from his colleagues around the world, he wanted to find a way to take his Excel column to the next level. For the next 12 months, he went off and created a course all about PivotTables. He asked his boss at the time if he could sell his course to the 10K subscribers to his column, but of course compliance told him no. He decided to leave GE, and as a last salvo sent out a message to his followers about a webinar he was going to host about his PivotTables course.

Creating a library of Excel content

Based on the feedback he got from subscribers to his weekly Excel column, John was able to find a few topics to build additional Excel classes around. Keyboard shortcuts, a perennial favorite of mine, were high on the list. Creating charts was also a big topic since most of his students work at companies, and presenting data in a compelling way is important.

John is constantly learning new Excel features but ultimately the content he produces is determined by what his students, the customers, want to learn. He periodically sends his students a survey and asks them what they want to learn about. These topics are what you’ll see on MyExcelOnline, John’s Excel training company.

Taking the leap to become an entrepreneur

While the idea of going off on your own and being your own boss is a romantic one, for many the decision is a matter of dollars and cents. John was (and currently still is) working in Spain, and started earning a few grand from offering his courses on Udemy. He realized this was enough for him and his family to live on, and went full-time on his training company in January 2015.

His advice for aspiring entrepreneurs: don’t just leave with nothing. Create a product and test it out. Use cheap methods like AdWords to validate your idea, 4-Hour Workweek style.

On Udemy, John’s PivotTable course originally earned him about $2K/month, but this went up to $7K/month. The problem was he was also selling his course on his website for $290. If his customers who bought the course from his website found out they could get it cheaper on Udemy, it would result in a bad customer experience. So he decided to pull his course off of Udemy.

These online education platforms are a blessing and a curse. While you can earn a lot more from publishing your courses on your own website and domain, these MOOCs spend the money and time to acquire customers for you. I’ve been teaching Excel on Skillshare since 2014 and have always thought about starting my own course off my website, but Skillshare just makes it so easy to tap into a “built-in” audience that I can focus on creating the educational content.

Early marketing tactics to get customers

For new entrepreneurs, the key to the early game is distribution. For John, the marketing tactics he employed for the start of MyExcelOnline revolved around affiliate sales. A tried and true method.

Some of those affiliates included Chandoo, My Online Training Hub, Excel Campus, and Contextures. The Excel community and the Excel training community especially are small and tight knit. From reading blog posts and attending webinars over the years from many of these content creators and trainers, I can tell how much dedication and work goes into creating these valuable resources.

To further build interest in his classes, John also hosts free webinars that give students a taste of what they can learn in his Excel classes. He’s been doing these webinars for the last five years, and they’ve driven the most interest in his classes. Then there is the coveted email newsletter, which gives you (the content creator) a direct line of access to current and potential students.

We also chatted a bit about the creative ways other Excel trainers are using social media platforms to reach their target audiences. For instance, Kat Norton runs a TikTok channel called Miss Excel and creates super entertaining videos with Excel tips (she also has her own Excel course linked in her bio):

https://www.tiktok.com/@miss.excel/video/6888079232983993605

John is all about experimenting with new channels and social media strategies, but his target customer is not using platforms like TikTok. His customers are a bit older, and most likely using platforms like Facebook and LinkedIn.

The other factor to consider is that the younger audience on TikTok might not turn into paying customers at a high rate compared to a “traditional” marketing channel like an email list. Nonetheless, it’s great to see so many young people wanting to learn Excel tips and tricks via short video content on TikTok.

New and unknown Excel features

One of John’s annual podcasts is the Excel tips roundup for the year (see the 2020 roundup here). It’s a collection of audio clips from some of the top Excel content creators sharing their favorite Excel tips.

Most of the tips John already knew, but one that stood out for him was importing from PDF using Power Query. This is a relatively unknown feature because it requires you to have Office 365. Exporting and importing from PDF is a huge topic and a lot of people over the years have built custom add-ins to do this in Excel (and made money doing it). Microsoft finally decided to build a native feature and put this into Power Query directly. I tend to think that Power Query and Power BI feel like separate applications from Excel, but they really extend the power and functionality of Excel in new ways.

Source: MyExcelOnline
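For anyone curious what this looks like under the hood, here is a minimal sketch of the kind of M query the From PDF connector generates (the file path and table Id below are hypothetical, and the exact steps depend on your Excel build and on how the PDF is laid out):

let
    // point Power Query at the PDF and let it detect the tables on each page
    Source = Pdf.Tables(File.Contents("C:\reports\monthly-sales.pdf")),
    // pick the first detected table and grab its data
    FirstTable = Source{[Id="Table001"]}[Data],
    // treat the first row of that table as the column headers
    Promoted = Table.PromoteHeaders(FirstTable, [PromoteAllScalars=true])
in
    Promoted

From there you can clean the data with the usual Power Query steps (change types, filter rows) before loading it into a worksheet or the data model.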

Near the end of the episode we talked a bit about strategies to improve the speed and performance of your models based on a blog post I read a few weeks ago (see the other podcasts and blog posts section below). John’s advice? Put your data into a PivotTable to build your model versus using formulas to summarize everything.

I’ve never tried this myself, but you could build an entire P&L from a PivotTable, and in the cases where you can’t do it in the PivotTable directly, you can use the GETPIVOTDATA formula to pull the data you need out of the PivotTable for analysis.
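As a hedged illustration (the field and item names below are hypothetical, and $A$3 is just wherever your PivotTable happens to start): if your PivotTable summarizes an "Amount" field by "Account" and "Month", a single P&L line item could reference it like this:

=GETPIVOTDATA("Amount", $A$3, "Account", "Revenue", "Month", "Jan")

The nice part is that the formula keeps pointing at the right value even when the PivotTable is refreshed or re-sorted, which is exactly what you want when the PivotTable is acting as your model’s summary layer.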

Other Podcasts & Blog Posts

In the 2nd half of the episode, I talk about some episodes and blogs from other people I found interesting:

The post Dear Analyst #60: Going from a corporate accountant to building an Excel training academy with John Michaloudis appeared first on .

]]>
https://www.thekeycuts.com/dear-analyst-60-going-from-a-corporate-accountant-to-building-an-excel-training-academy-with-john-michaloudis/feed/ 0 It's a story we've all heard before. You're working a full-time job, and you have more fun doing your side hustle than your 9 to 5. This is what happened to John Michaloudis. He was a financial controller at General Electric but found his passion sharing Excel tips and tricks on an internal GE newsletter which his colleagues ate up. John decided to become an entrepreneur and built an Excel training company from the ground up. We chatted about how he got started, his favorite marketing tactics, and of course, why he loves Excel.
Dear Analyst 60 35:54 50657