Dear Analyst
A show made for analysts: data, data analysis, and software. This is a podcast made by a lifelong analyst. I cover topics including Excel, data analysis, and tools for sharing data. In addition to data analysis topics, I may also cover topics related to software engineering and building applications. I also do a roundup of my favorite podcasts and episodes.

Dear Analyst #74: Quick hack to count the number of words in a cell with LEN and SUBSTITUTE
Mon, 19 Jul 2021

While this little Excel/Google Sheets trick is a pretty straightforward hack, it led me to think about how we use our tools, how we stretch their capabilities, and how we think outside the box to arrive at a solution. Seems like a lot for a formula trick on counting the number of words in a cell to bring to the table. Perhaps I'm reading into it too much. Perhaps it's just me rambling and opining on something meaningless. Or perhaps it might cause you to stop, think, and reflect for even a minute about something as trivial as counting words in a cell. Come on this journey with me and learn how a stupid formula trick triggered my synapses to fire in a million directions. Link to the Google Sheet with the formula is here.

Video tutorial of the formula to count the number of words in a cell: https://youtu.be/JRp5BSr5kuI

Why would you want to count words in a cell?

It's a great question. Maybe you need to see how many words are in a paragraph before you submit an online form that only allows answers of 150 characters or 50 words or less. I've probably had to count the number of words in a cell a handful of times, and it was probably for data cleaning purposes (more on this later). What may be more common is counting the number of characters in a cell to detect anomalies. In any event, the formula for counting the number of words in a cell is similar to how you might count the number of characters in a cell. If you want to skip straight to the answer (or you arrived from a Google search and maybe this is highlighted in yellow):

=len(A2)-len(substitute(A2," ",""))+1

This formula should work in both Excel and Google Sheets, where A2 is the cell with the words you are trying to count. Let's break this down a bit more, because breaking things down is where the real learning takes place, and it may spur other ideas you can incorporate into your spreadsheets.

Counting and substituting stuff

Composability gives formulas some pretty amazing capabilities. Greater than the sum of its parts kind of thing. On their own, the LEN() and SUBSTITUTE() functions do pretty standard things. The LEN function simply counts the number of characters (including spaces) in a cell:
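As a quick made-up example, if A2 contains the text "count these words", then

=LEN(A2)

returns 17 (15 letters plus 2 spaces).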

SUBSTITUTE acts as you might expect. Turn all the "A"s in a cell into "X"s. Turn all the 5s into 9s. The first argument is the cell that contains the data, the second argument is the text you are searching for, and the third argument is what that text should be replaced with. In the example below, we are looking for all the spaces in cell A5 and replacing them with an empty string. Note the syntax here. A space (what we are looking for) is denoted by two double quotes with a space in between: " ". An empty string is two double quotes next to each other with no space in between them: "". The result, in this case, is a sentence with no spaces in it:
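Continuing the made-up example, if A5 contains "count these words", then

=SUBSTITUTE(A5," ","")

returns "countthesewords", the same text with the two spaces stripped out.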

Composability is where the magic happens

What happens when you combine the two formulas? You may be an Excel or Google Sheets guru who has built advanced nested formulas before. But when I step back, I see how these formulas, when combined, create interesting results that you wouldn't have expected. People say we're just number crunchers who simply know when and how to use formulas correctly.

I'd say the composability of formulas is what inspires creativity and makes building a model a creative endeavor. When you deconstruct someone else's deeply nested formula, you learn something new about the formulas and a use case you wouldn't have otherwise figured out on your own (unless you're Googling stuff, of course). Interestingly, the top result on Google right now for "composability" is an article on DeFi (decentralized finance). This is ahead of links to the dictionary definition, Wikipedia, and HP Enterprise. I only bring this up because I'm a believer in and user of various DeFi platforms, and I dig into this stuff in the 2nd half of episode 72.

Ok back to the magic. We have this random text with no spaces in it. Let’s put this formula inside the LEN function and see what happens. We get the length of the sentence or paragraph with no spaces in it:

Why is this important? Well, if we have the length of the sentence with spaces, and the length of the sentence without spaces, we can subtract the latter from the former and get pretty close to the actual count of words in the sentence:

If you look at the text in cell A2, there are 11 words in “It’s easy to forget that as recently as six days ago.” There are 10 spaces in between those words. That’s the result of the formula in column D. It’s one less than what we need, so we just add a 1 to the end of the formula and we’re done:
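To make the arithmetic concrete (my own character counts, so double-check them against your sheet):

=LEN(A2) returns 53 (all characters, including the 10 spaces)
=LEN(SUBSTITUTE(A2," ","")) returns 43 (the same text with the spaces removed)

53 - 43 + 1 = 11 words.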

Are we counting spaces or counting words?

The answer is: does it really matter? Just because the goal is to count words, does that mean you can't count spaces to get to the solution?

During WWII, the Germans had a bomber called the Dornier 17. Most of these bombers were destroyed during the war, but divers found one at the bottom of the North Sea off the UK coast in 2008. A team from Imperial College London was tasked with protecting the salvaged plane from further corrosion so that it could be showcased at a museum. The team tried all sorts of solutions and methods, and found that spraying the plane with lemon water would not only halt the corrosion but also clean the plane. This was a long-winded example to show that there are many ways to get to a solution :).

Within the first 5 seconds of being told you need to count the number of words in a cell, did you think about which Excel or Google Sheets function would actually count the words in a sentence? Maybe you thought of a creative use of the COUNTA() function, or maybe there's some sort of loop you can push all the words from the cell into, incrementing a counter every time the loop finds a word.
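As an aside, Google Sheets (though not Excel) does have a more literal way to count the words themselves: split the cell on spaces and count the pieces. A minimal sketch, assuming your text is in A2:

=COUNTA(SPLIT(A2," "))

SPLIT breaks the text into one word per cell and COUNTA counts the non-empty results, so this version also happens to tolerate repeated spaces.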

Like many other hacks in Google Sheets and Excel, the solution involves detecting a pattern in the data and then figuring out which functions allow you to manipulate that pattern for your use case. The first step is just thinking that counting spaces may be a more reliable solution than counting the actual words. The next step is knowing that you can count spaces and characters with functions we have at our disposal in Google Sheets.

The best tool for the JTBD (job to be done)

Going back to the original question: why would you need to count the number of words in a cell? Let’s assume you have a business reason for counting the words in the cell because you need to take that number and use it somewhere else in your spreadsheet. From a tool perspective, I’d argue that Google Sheets is not the right tool for counting the number of words in your sentence or paragraph.

The formula itself seems a bit inefficient too, right? You have to find the length of a paragraph by counting its characters, do this whole substitution thing, and then count that long string of characters again. Isn't this what tools like Word and various online word counters are built to do? Even as I'm typing this post in WordPress, I can see the word count update as new words are typed.

Even if Google Sheets and Excel are not the right tools for counting the words in your paragraph(s), I think it shows how much we are willing to stretch our tools to find a solution to our problem. For diehard spreadsheet users, coming up with a formula for unconventional use cases is what makes using spreadsheets “fun” and creative like I stated earlier. There’s joy in knowing you’re exploring the frontiers of your tools and have discovered new treasures and lands that others may not have found.

Other word count algorithms

If Microsoft Word or online word counters are indeed the "right" tools for counting the words in your paragraph, how do they count them? Maybe they also count spaces? Surely they use some advanced algorithm that is 10X better than this LEN and SUBSTITUTE solution.

The answer is I don’t really know. After some quick research, it looks like the algorithm in Microsoft Word also counts spaces, so perhaps its algorithm is not too far from our formula! Word counting is actually a pretty challenging problem because handling special characters and other edge cases can throw off the count. Microsoft even shows inconsistent word counts depending on what parts of the UI in Windows and Word you are using.

Some people have built their own algorithms (like this one in C#) that mimic the word count algorithm in Word. This page actually walks through the algorithm for counting words step-by-step. I liked this pseudo code the author wrote in Java to describe the algorithm:

public class WordCount {
  // Characters that mark the boundary between words
  static char[] separators = {' ', '.', ','};

  static int countWords(String str) {
    // true while we are between words, i.e. looking for the start of the next word
    boolean state = true;
    int wordCount = 0;

    for (int i = 0; i < str.length(); i++) {
      if (separatorArrayContains(str.charAt(i)) || str.charAt(i) == '\n' || str.charAt(i) == '\t') {
        state = true;
      } else if (state == true) {
        // First non-separator character after a separator: a new word starts here
        state = false;
        wordCount++;
      }
    }
    return wordCount;
  }

  static boolean separatorArrayContains(char c) {
    for (int k = 0; k < separators.length; k++) {
      if (separators[k] == c) {
        return true;
      }
    }
    return false;
  }
}

At a high level, the algorithm is also looking at the separators and delimiters in your text to count the number of words. This means our solution of counting spaces to get the number of words is most likely similar to how other applications count words too! That a simple one-line nested formula with LEN and SUBSTITUTE can have the same power as this pseudo Java code is pretty darn cool.
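One caveat with the LEN/SUBSTITUTE approach: leading, trailing, or doubled spaces each add a phantom word, and an empty cell still returns 1. A slightly hardened sketch of the same formula wraps the cell in TRIM (which in both Excel and Google Sheets also collapses runs of internal spaces) and guards against the empty cell:

=IF(TRIM(A2)="",0,LEN(TRIM(A2))-LEN(SUBSTITUTE(TRIM(A2)," ",""))+1)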

Other Podcasts & Blog Posts

In the 2nd half of the episode, I talk about some episodes and blogs from other people I found interesting:

Dear Analyst #73: From a career in the U.S. Navy to data analytics YouTuber with Luke Barousse
Tue, 22 Jun 2021

The path to a career in data analytics can be full of twists and turns. Along the way, you pick up tools like Excel, Python, Tableau, and R. What about learning how to use YouTube and growing an audience of 50,000+ from publishing videos about data analytics? I'm always fascinated by people who are able to combine the technical aspects of being a data analyst with other careers like science, art, and even wastewater treatment. Luke Barousse is a data analyst and YouTuber, and we chatted about how he learned Excel, built a portfolio of his data work, and became a YouTuber.

From the U.S. Navy to Excel

Prior to joining the U.S. Navy, Luke took a C++ course as an undergrad and got a taste of coding. After joining the Navy and working in various roles, he didn’t get a chance to utilize some of the coding skills he learned at university. Eventually he went to get his MBA and took an Excel course taught by Professor Elliot Bendoly (who also wrote the book Excel Basics to Blackbelt). Luke started seeing the potential of Excel as he dug into VBA and some of the coding capabilities in Excel.

In terms of content creation, Luke’s day-to-day experience as a data analyst influences the videos he creates. The main reason he started getting into creating videos was because his colleagues wanted him to show them how to do things in Excel and other tools. Instead of teaching each of his colleagues one by one, he created videos to avoid the repetitive nature of teaching in person.

I realized that content creation is a way to automate teaching.

Using social media to share a Google Sheets template

For the class' capstone project, Luke built a meal prep Excel file. Even after the class was over, he took the meal prep Excel file and tried to turn it into an application. He transferred it from Excel to Google Sheets and it started to gain some traction. He used Instagram to create content around meal prep and drew more attention to the Google Sheet template. Eventually he started using Python and Django to try to create an application since people were always messing up the formulas in the Google Sheet.

I find it interesting that Google Sheets templates become the lowest common denominator when you need to create and share a simple tool and are OK with it being rough around the edges. It may be too costly (and frankly overkill) to build a custom application with code when a Google Sheets template will suffice.

Realizing Excel is not the solution

Luke realized the Excel file he created for his meal prep use case was going to be an issue because Excel is not the best medium for distributing his template (hence the move to Google Sheets). As Luke started using Excel more at work, he found his Excel files were constantly hitting row limits. His team worked with different suppliers, and new data would get ingested into the Excel file every week. He was also building formula on top of formula, and the result was an Excel file that took 2-3 minutes to load and was buggy because of the cobbled-together formulas. Luke details all of this in the video below: https://www.youtube.com/watch?v=3TBwY4VjLX8

Even after he left that group, his old teammates still asked him to update the Excel file since they didn’t know how. In situations like this, you and your team need to make a decision on whether to go with Excel that is “good enough” for the job or start from scratch with better tools for the job. Luke wanted to explore putting the data into SQL but moved to a different team before he could tackle that project.

Differentiating yourself when applying to data analyst jobs

Luke discusses some of his strategies for landing a job as a data analyst (he has some YouTube videos about this too). He talked about the job hunting process being a very humbling experience because you are being rejected left and right. From his business school days, however, he knew that you have to come up with ways to stand out from your competition.

To differentiate himself, Luke created an online portfolio of his data projects to showcase his creativity and skills. He published a project where he did some analysis on the script from The Office using Python. As a huge fan of The Office, I think there are so many ways you could analyze the script and come up with a creative analysis. How many times does Michael say "That's what she said!"? Was it said in sentences with positive or negative sentiment? Which characters on the show was Michael saying this to the most? The key takeaway is that some recruiter out there may also be a fan of The Office and come across your project. Having this public portfolio gives the recruiter a chance to see that you have actual experience analyzing a data set and creating data visualizations.

Google data analytics certificate

Luke’s most popular video is about the new Google Data Analytics Professional Certificate in partnership with Coursera. Some believe this certificate could disrupt the college education system since the class gives students and professionals the actual skills they need to do data analytics in the real world. Since publishing the video, Luke’s views about the certificate have changed a little bit.

Luke said he was interviewing a civil engineer who wanted to get into engineering analytics. This certificate gives the candidate an opportunity to learn about data analytics and combine it with his education in engineering. This is another way for college students and professionals who are looking for a career change to differentiate themselves. The certificate by itself won't guarantee you a job as a data analyst, but it will broaden your mind to new skills you may not have picked up from previous roles or in school.

More importantly, the certificate will help you figure out if you even like working in data analytics. It feels like everyone wants to be a software engineer today because the salaries and benefits are great, but do you really enjoy the actual day-to-day responsibilities of a software engineer? The same could be said about data analytics, in my opinion. If you are able to combine your interests from a different field with data analytics, you will have a really unique skillset to showcase to potential employers and recruiters.

Being productive while working at home

Luke tries to get on his mountain bike and do CrossFit once a day. Luke realized he isn't really that productive in the office; at home, he has "sprints" of 2-3 hours of deep work. If he starts checking his phone or social media, that's when he knows it's time for a change of scenery, and he'll go out on the mountain bike or go to his local box.

Instead of focusing on the absolute number of hours you spend in the office or at your desk, Luke believes these 2-3 hour sprints allow him to be more productive because the work is much more focused. He may be putting in fewer hours, but those hours are much more productive compared to the 8 hours you're sitting at the office wasting time. Taking a break and going outside or to the gym gives your brain time to solve problems other than the ones you're given at work. As I finish writing this post, I feel like it's time to take a break myself and come back later for another block of focused work :).

Best ways to get in touch with Luke are through his YouTube channel (he responds to every comment) and LinkedIn.

Other Podcasts & Blog Posts

In the 2nd half of the episode, I talk about some episodes and blogs from other people I found interesting:

No other podcasts/blog posts mentioned in this episode!

Dear Analyst #72: A simple trick to be faster in Excel on the Mac (like you are on the PC)
Mon, 14 Jun 2021

The impetus for this episode is a new Google Sheets (and Excel) tip I just shared on Instagram and TikTok (I never thought I'd join these platforms to start posting tips, but alas, this is how people learn these days). After I learned how to be productive in the PC version of Excel, I opened my MacBook and realized all my favorite shortcuts didn't carry over. Back in the day (whatever that means), you had a ThinkPad at work where you did your "serious work" and your personal Mac was for the "personal stuff." I found the PC-equivalent shortcuts for the Mac and was able to be dangerous again in Excel and Google Sheets on the Mac. But there was one group of shortcuts I couldn't quite duplicate until I changed one little setting on my Mac.

Function keys on the Macbook

By default, the function keys on your MacBook do things in macOS like increase brightness (F2), see/search Mac apps (F4), or decrease volume (F11).

If you're coming from using Excel on the PC, you know that these function keys are coveted tools for being faster in Excel (and Google Sheets). The most useless key is probably F1 because it brings up the help menu, and you might hit it by accident when you're debugging a formula and alternating between pressing F2 and ESC. Excel users will go as far as popping the F1 key out of their keyboards so they don't accidentally hit it and have to wait a few seconds for the help menu to open, only to close it right away:

Change the default behavior of the functions keys for Mac Excel

To "unlock" the power of the function keys on your Mac so that they do what you expect them to do (like on a PC), follow these steps (this is for macOS Big Sur v11.4):

1. Click the Apple icon in the top-left and open System Preferences

2. Click on “Keyboard”

3. Check the "Use F1, F2, etc. keys as standard function keys" checkbox

With this simple setting checked, you can now use the function keys like you're used to on the PC. The downside (of course, everything has a tradeoff) is that if you want to increase the brightness on your Mac, you now need to press the FN key PLUS the F2 key. With the function keys enabled, some of these common operations become available to you (using Google Sheets as an example):

Enter “edit” mode in a cell formula (F2)

By tapping F2 you can go into the cell and start editing the formula. Once you’re in the formula you can use the arrow keys to move around and when you’re done editing the formula, just hit ESC. In the gif below, I’m just alternating between hitting F2 and ESC to get into the formulas in B2 and C4:

Alternate between locking and unlocking cell references (F4)

One of my favorite shortcuts when working on a big hairy formula is locking and unlocking the cell reference without having to manually type in the dollar sign in front of the column or row. The manual way of doing this is using your left and right arrows to move to the different cell references and typing in the dollar sign (or deleting them one by one):

With the F4 key, you can move your cursor to the cell reference and cycle between locking the row, the column, both the row & column, or nothing at all and keeping the reference as relative:
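As a concrete (made-up) example, if your cursor is on the A2 reference inside a formula, each press of F4 cycles the reference through the four states:

=B2*A2 → =B2*$A$2 → =B2*A$2 → =B2*$A2 → =B2*A2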

I tried to show the power of the F4 key in a pithy way in my first Instagram post. Let me know if this is interesting…or not:

Go to special cells (F5)

This shortcut only applies to Excel, but it brings up the "Go to" menu. This is helpful if you want to quickly go to a named reference or an Excel table. I like to use it for formatting purposes by going to blank cells and formatting them a certain way. This beats manually selecting each empty cell by holding CTRL (PC) or COMMAND (Mac) as you click the empty cells:

Hopefully this one little trick for “unlocking” the function keys on your Mac can save you some time from clicking around in Excel. More importantly, I hope it shows that Excel or Google Sheets on the Mac are just as powerful as the PC and you should feel confident you can get your real work done on your Mac.

2021 Excel Tables class on Skillshare

This week I launched my 2nd advanced Excel course, on Excel tables, called Mastering Excel Tables: How to Make and Use Them Like a Pro. As I mention in the intro video, Excel tables are a relatively under-utilized feature in Excel but can really speed up your dashboard creation, make your formulas less error-prone, and make it easier for your colleagues to understand how you built your model. In less than 40 minutes, you can learn all the essentials of Excel tables as well as some advanced features for building a robust data-capturing system.

Other Podcasts & Blog Posts

In the 2nd half of the episode, I talk about some episodes and blogs from other people I found interesting:

Dear Analyst #71: Benn Stancil, Co-Founder and Chief Analytics Officer at Mode Analytics on all things analytics
Mon, 24 May 2021

Learning the tips and tricks for doing data analysis in Excel is great and all, but stepping back to see the bigger picture leads to better questions (and answers) you can ask as an analyst. Benn Stancil is one of the founders at Mode Analytics, a data visualization platform you may have used before. Benn has done a ton of typical “analyst” work (check out his newsletter and previous Medium blog) but in this episode we talk about building Mode in the early days, asking good questions, and of course a little bit about Excel and the tools he uses.

Life before and founding story of Mode Analytics

Benn was a math and economics major in college. He worked in DC for a few years at a think tank doing economic policy research. It was an interesting time to do this type of analysis because it was right around the start of the financial crisis in 2008. Frustrated that his research couldn't make an impact given the glacial pace at which government adopts change, Benn ended up finding a job at Yammer (a social network for professionals, sound familiar?). Yammer was acquired by Microsoft in 2012, and Benn and some folks from the analytics team at Yammer left Microsoft to start Mode.

I would take a look at this blog post from Benn to get his reflections on the early days at Mode, but here is a quick summary. The founding team at Mode includes Derek Steer (CEO), Josh Ferguson (CTO), and Benn (the analytics guy). Before there was really a product, Derek was off talking to investors and Josh was talking to the engineers. Benn's expertise is in doing data-related stuff, but the problem was that there wasn't a lot of data to analyze or explore and not many customers to get data from.

In the early days, Benn wrote blog posts that were generally about data but not about data products. He was basically doing content marketing for a small niche of data professionals. Today this is called data journalism, and Benn was writing data stories related to sports (before 538 became a thing). Once Mode had more customers, Benn's role changed often as he did tours of duty through marketing, customer support, sales, and a variety of other roles.

You have a "job title" and a "role," and those end up being two very different things.

We talked a bit about a blog post Benn wrote in 2015 that had some traction in the data community on Facebook’s “magic metric” of getting 7 friends in 10 days. The key takeaway is that Benn was doing the analysis in Excel, R, and a little python for web scraping. Benn had taught himself how to use some of these tools before. Now that he had a goal to work towards (creating these blog posts), this was the extra push for him to get over the hump to learn these tools more in-depth.

Source: Mode blog

Key skills for doing exploratory data analysis

I liked Benn’s answer to this open-ended question because it doesn’t involve mastering X tool or taking advanced statistics classes:

First and foremost you have to be curious. Be relentless in knowing there’s a better answer out there.

Most analysts get a dataset and just look at the data to start generating questions as they do the analysis. Benn suggests going the other way: generate questions you have about the dataset before you get into the analysis. This leads to answers that are expected or unexpected. Regardless, this strategy will lead you to keep asking questions.

The most interesting questions are the ones that you don’t start with.

I’d recommend checking out some of Benn’s older posts where he documents some of his exploratory analyses like this one about the price of weed.

Learning tools other than Excel like SQL and Python

During Benn’s stint in economic policy research, he was primarily using Excel to do analysis. While it’s the first tool analysts reach for, Benn said that you can still be a good analyst even if you don’t use Excel. Some of the best analysts he knows aren’t using Excel every day but are asking good questions about the data.

Having said that, learning other tools outside of Excel also opens the possibilities for you to do your analysis faster and opens the door to more interesting questions. Using Excel for all your data analysis needs just means you might have to do more manual work than what’s necessary.

One caution Benn brought up is that there may be a danger in learning the advanced features of different data tools. It’s easy to go down the rabbit hole and get addicted to these advanced features. What ends up happening is that you try to do everything in that tool when another tool would’ve been better for that use case.

Benn references a time when he was learning D3 for data visualization. Before he knew it, he was trying to use D3 for all his data visualization needs since he knew the platform so well, when a simple chart in Google Sheets might have sufficed.

How tools might be influencing how you think about data

I’ve noticed that the proliferation of online SaaS tools can influence how you think about work. Tools you use for communication, design, marketing, and data analysis have built-in “opinions” about how the companies behind those tools view the world. By buying into the tool, you are implicitly buying into their ways of working and being productive.

I asked Benn on his thoughts about this topic as it relates to data tools. He brought up an interesting example with a tool that’s been around for some time: Tableau.

Tableau is kind of like a giant PivotTable. By using Tableau, you get nudged into thinking about your data a certain way. If you want to do a time series analysis, you could bend Tableau to make it do the analysis you want. But it wasn’t made to do time series analysis easily, and there might be a better tool for the job (e.g. Excel or Google Sheets). Same can be said about using SQL for statistical analysis. SQL makes you think about structuring your data tables a certain way for easy querying, but it’s nothing compared to R for statistical analysis.

Let the questions shape the tool.

The future of Mode

I would take a listen near the end of the episode to hear Benn’s take on what the future of Mode looks like, but the quick takeaways are:

  • Finding ways to let analysts answer questions quickly
  • Extending Mode into other departments beyond the data group so that the sales team can start asking questions about their data (side note: we’re seeing this with a lot of other online tools that are moving from “single-player” to “multi-player” like Figma for design teams)
  • Mode may be competitive with other BI tools like Looker and Tableau, but can be overlapping with these tools as well in terms of consuming data

Other Podcasts & Blog Posts

No other podcasts or blog posts!

Dear Analyst #70: New advanced PivotTable class and a PivotTable calculated field trick for percentages
Tue, 18 May 2021

I've been planning a few advanced Excel classes with Skillshare and I'm excited to launch my first one today, called Advanced PivotTable Techniques for Analyzing and Presenting Data Faster. I use PivotTables on and off depending on the task at hand. In preparation for this class, I had the opportunity to research and learn some advanced techniques that I personally didn't even know. I then pulled out the skills I think analysts would need the most (80/20 baby!) to be productive in their jobs and put them into this fast-paced 1-hour advanced PivotTable class. As a small teaser, I go through a calculated field technique for calculating percentages in your PivotTables below. To see some of my beginner Excel classes, take a look here. I'll be creating more bite-sized content on Instagram as well.

Click below to learn more and sign up for my advanced PivotTable class:

Credit card customer attrition data

This example is actually from the workbook used in the class project of my Advanced PivotTable class. This is the Google Sheet that shows the problem we’re trying to solve. Let’s take a quick look at the data:

It’s a list of credit card customers and some demographic information about them. The most important column to note is the Attrition column because it indicates whether that specific customer churned or attrited (had to look up the past tense of attrition). This type of customer data would be great to summarize and analyze in a PivotTable like so:

You can get some summary stats about your customers, but what about the Attrition? If you throw that column into the PivotTable, you’ll get something like this:

Not very helpful because our Attrition column consists of “Yes” and “No” as values. What I really care about is finding the Attrition % no matter how I set up my PivotTable. You could do something like this where you drag the Attrition column into the columns of the PivotTable. This would get you the Attrition % but it’s a manual calculation and you can’t see the Attrition % by different columns and properties in your PivotTable:

The minute you change up the PivotTable, column E will potentially get overwritten and you’ll have to re-write the Attrition % for the cut of the data you care about. In order to get the Attrition % you may be thinking the calculated field is the way to go. That’s partially right, so let’s explore that option.

Adding a calculated field in Google Sheets for Attrition %

Adding a calculated field to your PivotTable in Google Sheets is similar to Excel. You have to go through the right sidebar instead of the ribbon:

A little known fact about PivotTables in Google Sheets or Excel (something I go over in my Advanced PivotTables class) is that you can add IF() statements to calculated fields. If we try to create a calculated field for Attrition %, however, we don’t have the right data type to create this percentage. Additionally, the columns you put into the calculated field are summed. I tried experimenting with a few variations, but ultimately I couldn’t find a formula to create a calculated field given the data we have:

The issue is that we want to divide the customers who have attrited by the total number of customers based on the current PivotTable fields. Even with the IF() formula, we can't use the Attrition column by itself to calculate the percentage. This is where we have to do a little augmentation to our dataset. I don't particularly like this solution because it's not sustainable (e.g., when you have new data coming in every day), but it works for one-off analyses.

Solution: Create a calculated field off columns with numbers in them

Watch the video tutorial of this solution below:

There are two new columns to add to the dataset: COUNT and Attrition Flag. The COUNT column is literally an entire column of 1s and the Attrition Flag is an IF() function that outputs 1 or 0 depending on the “Yes” or “No” in the Attrition column:
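As a sketch of those two helper columns (assuming the Yes/No Attrition values live in column D of the raw data and your data starts in row 2; adjust the references to match your sheet):

COUNT (column F): =1, or just type the number 1 and fill it down
Attrition Flag (column G): =IF(D2="Yes",1,0)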

The key insight here is to create columns that output a number so that you can use those columns in the calculated field. Back in our PivotTable, you should change your source data to include columns F and G in the raw data worksheet and the calculated field formula looks like this:
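The screenshot of the field isn't shown here, but reconstructing it from the SUM() explanation that follows, it is presumably:

='Attrition Flag'/COUNT

(Note that in a Google Sheets calculated field, a field name containing a space is wrapped in single quotes.)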

I believe that by default, Google Sheets and Excel automatically sum the columns you add to the calculated field formula. Our Attrition Flag is a bunch of 1s and 0s and COUNT is always 1, so this formula is effectively:

=SUM('Attrition Flag')/SUM(COUNT)

This means you’ll get the right Attrition % no matter how you create your PivotTable:

One final plug

–> Learn more advanced PivotTable techniques like this in my Advanced PivotTables class on Skillshare <–

Other Podcasts & Blog Posts

No other podcasts/blog posts mentioned in this episode!

Dear Analyst #69: Import data from another Google Sheet and filter the results to show just what you need
https://www.thekeycuts.com/dear-analyst-69-import-data-from-another-google-sheet-and-filter-the-results-to-show-just-what-you-need/
Mon, 10 May 2021

You may be filtering and sorting a big dataset in a Google Sheet and want to see that dataset in another Google Sheet without having to copy and paste the data each time the “source” data is updated. To solve this problem, you need to somehow import the data from the “source” worksheet to your “target” worksheet. When the source worksheet is updated with new sales or customer data, your target worksheet gets updated as well. On top of that, the data that shows up in your target worksheet should be filtered so you only see the data that you need and that matters to you. The key to doing this is the IMPORTRANGE() function in conjunction with the FILTER() or QUERY() functions. I’ll go over two methods for importing data from another Google Sheet and talk about the pros and cons of each. You can use this “source” Google Sheet as the raw data and see this target Google Sheet which contains the formulas.

Watch a video tutorial of this post/episode below:

https://youtu.be/7QLnAP0zHIM

Your Google Sheet is your database

No matter which team you work on, at one point or another your main “database” or “source of truth” was some random Google Sheet. This Google Sheet might have been created by someone on your operations or data engineering team. It may be a data dump from your company’s internal database, and whether you like it or not, it contains business-critical data and your team can’t operate without it. The Google Sheet might contain customer data, marketing campaign data, or maybe bug report data exported from your team’s Jira workspace.

The reason why people default to using Google Sheets as their “database” is that anyone can access it in their browser, and more importantly, you can share that Sheet easily with people as long as you have their email address. This is probably your security team’s worst nightmare, but at this point too many teams rely on this Google Sheet so it’s hard to break away from it as a solution.

Credit card customer data

Before we get into the solution, let’s take a look at our data set. Our “source” dataset is a bunch of credit card customer data (5,000 rows) with a customer’s demographic and credit card spending data:

There are a ton of columns in this dataset I don’t care about. I also only want to see the rows where the Education_Level is “Graduate” and the Income_Category is “$80K-$120K.” Perhaps I’m doing an analysis on credit card customers who are high earners and have graduated from college. How do I get that filtered data of graduates earning $80K-$120K into this “target” Sheet:

Google Sheets is not the most ideal solution as a database, but you gotta live with it so let’s see how we can get the data we need from our source Google Sheet over to the target. The money function is IMPORTRANGE() but there are multiple ways of using IMPORTRANGE() as I describe below.

Method 1: The long way with FILTER() and INDEX()

When you use the IMPORTRANGE() function on its own, you will just get all the data from your source Sheet into your target Sheet. In this formula below, I just get all the data from columns A:U in my source Sheet with all the credit card customer data:
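Reconstructed from the formulas later in this post (the long string is the source Sheet’s ID), it looks like this:

=importrange("1H5JljkscteL2qRMJ8ky342uTeP839jjDGg81c8Eg0es","A:U")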

The first parameter can be the full URL of the Google Sheet, but you can also just use the Sheet ID from the URL to make the formula shorter. The second parameter is the range of columns you want to pull into your target Sheet.

Again, this will basically give you an exact copy of the source Sheet into your current Sheet. When data is updated in the source, your target Sheet gets updated too. For a lot of scenarios this might be all you need! But let’s go further and try to get a filtered dataset from the source Sheet.

The first thing you’ll probably think of is to use the FILTER() function. The question is what do we put for the second parameter in the FILTER() function?

For the first parameter we’ll just use our IMPORTRANGE() function, but for the second parameter we need to filter by the column we’re interested in, with something like this to get only the rows where the Education_Level is “Graduate”:

=filter(importrange("1H5JljkscteL2qRMJ8ky342uTeP839jjDGg81c8Eg0es","A:U"), F:F="Graduate")

This doesn’t work because F:F is referencing the current worksheet. Our dataset is pulling from a different worksheet and there’s no way to filter that source before it gets into our current worksheet.

The solution is to use the INDEX() function with the FILTER() function like this:

=filter(importrange("1H5JljkscteL2qRMJ8ky342uTeP839jjDGg81c8Eg0es","A:U"),index(importrange("1H5JljkscteL2qRMJ8ky342uTeP839jjDGg81c8Eg0es","A:U"),0,6)="Graduate")

This INDEX() function tells Google Sheets to look at the source data and focus on the 6th column to see which rows contain “Graduate.” Setting the row argument to 0 tells INDEX() to return the entire 6th column instead of a single cell, which is what FILTER() needs to compare against.

We want to filter the data that not only has “Graduate” as the education level but also customers who have a salary of “$80K-$120K.” We can just add additional conditions to our FILTER() formula using this INDEX() trick:

=filter(importrange("1H5JljkscteL2qRMJ8ky342uTeP839jjDGg81c8Eg0es","A:U"),index(importrange("1H5JljkscteL2qRMJ8ky342uTeP839jjDGg81c8Eg0es","A:U"),0,6)="Graduate",index(importrange("1H5JljkscteL2qRMJ8ky342uTeP839jjDGg81c8Eg0es","A:U"),0,8)="$80K - $120K")

We now have a filtered list of about 300 rows:

Pros and cons of this method

The main benefit of this method is that it’s using functions that you may already be familiar with. The main trick is to know how to use the INDEX() function within the FILTER() function.

There are several cons to this method, which is why I wouldn’t recommend it (especially if you have a large dataset). Just from filtering on two columns, the formula above calls the IMPORTRANGE() function three times: once for the data itself and once for each filter condition. Imagine filtering on 10 columns. There has got to be a more scalable method than nesting the IMPORTRANGE() function multiple times inside the FILTER() function. This method will definitely get slow over time for large datasets.

Another downside is that you can’t control the number of columns that get returned. Our source data has 21 columns and all 21 get returned. What’s the point of filtering your dataset if you can’t filter the columns that get returned too? You’ll end up hiding a bunch of columns that don’t matter to you in your target worksheet, which doesn’t feel right.

Finally, the column headers in this method are manually entered. Our formula in this method actually gets entered in cell A2 to allow us to copy/paste the column headers into row 1. This means if new columns get added to the source data, you’ll have to remember to add those column headers in your target worksheet. Also not the best method in terms of maintaining this Google Sheet long-term:

Method 2 (preferred): Using QUERY() with a little bit of SQL

The QUERY() function is a relatively advanced function in Google Sheets. Episode 32 was all about how to use the QUERY() function. The reason why it’s not used as much is because it requires you to know a little bit of SQL. To filter our source data to the customers who are “Graduates” and earn “$80K-$120K,” the formula looks like this:

=query(importrange("1H5JljkscteL2qRMJ8ky342uTeP839jjDGg81c8Eg0es","A:U"),"SELECT Col1,Col3,Col6,Col7,Col8,Col9 WHERE Col6='Graduate' and Col8='$80K - $120K'",1)

Just like with the FILTER() method, our IMPORTRANGE() is the first parameter. The second parameter is where we have to do a little SQL magic to pull the data we need. The columns after the SELECT clause are simply the columns we want to pull into our target Sheet. This already makes this method more powerful than the first one, because we can specify which columns we want from our source Google Sheet. Usually when you use the QUERY() function, you reference a column by its column letter; with IMPORTRANGE() you have to use the “Col” prefix instead.

After that, you add your conditions after the WHERE clause. The trick here is to count the columns in the source range to figure out the right column numbers to filter on. In this case, “Col6” is Education_Level and “Col8” is Income_Category.

What’s that last “1” before the closing parenthesis? That just tells Google Sheets that our source data has a header row, so our filtered data comes back with the relevant column names. We now get this nice filtered dataset with only the columns we care about:

Pros and cons of this method

In addition to being a much shorter formula, the QUERY() function will bring in the column names. You can enter the formula in cell A1 of your target Google Sheet and the data and column names will dynamically update as the source data changes. You never have to worry about copying and pasting the column names from the source Google Sheet, and long-term maintenance of your target Sheet will be much easier.

The main cons:

  • QUERY() is a hard function to learn. Learning a new syntax is difficult, so if you want to do more advanced filtering and sorting with QUERY() you’ll have to learn more SQL (see the sketch after this list).
  • Column numbers can change. This issue also exists with the first method: you’ll have to keep track of the column numbers in the source Google Sheet. If new columns get added, you’ll have to adjust your SELECT clause to “pick” the right columns to pull into your target Google Sheet.
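As a hedged sketch of that more advanced SQL, here’s the same formula from above with an ORDER BY clause tacked on to sort the results by the third column in descending order (Col3 is just an illustrative choice; swap in whichever column you want to sort by):

=query(importrange("1H5JljkscteL2qRMJ8ky342uTeP839jjDGg81c8Eg0es","A:U"),"SELECT Col1,Col3,Col6,Col7,Col8,Col9 WHERE Col6='Graduate' and Col8='$80K - $120K' ORDER BY Col3 DESC",1)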

Final words on using Google Sheets as your database

I could spend another episode on the pros and cons of using Google Sheets as your team or company’s database, but will try to keep my final words short.

Those who don’t use Google Sheets and Excel every day cringe when they see workarounds like this to get the data that we need. The sooner one accepts that business-critical data will inevitably land in an Excel file or Google Sheet, the sooner we can get our jobs done. I’ve written about the unconventional use cases of spreadsheets before and this scenario is no different.

We know our database lives in a Google Sheet. That’s not going to change. Let’s just try to find the most painless way of getting that data out into another Sheet so we can do the more interesting analyses that matter for our business. If you care about the data living in a database and analysts being able to query the data using a separate BI tool, then you should probably consider getting into data engineering and be the change agent within your organization to move everyone off of spreadsheets. It’s a gargantuan task and in most cases an uphill battle.

Other Podcasts & Blog Posts

In the 2nd half of the episode, I talk about some episodes and blogs from other people I found interesting:

Dear Analyst #68: Generate unique IDs for your dataset for building summary reports in Google Sheets
https://www.thekeycuts.com/dear-analyst-68-generate-unique-ids-for-your-dataset-for-building-summary-reports-google-sheets/
Tue, 04 May 2021

If your dataset doesn’t have a unique identifier (e.g. customer ID, location ID, etc.), sometimes you have to make one up. The reason you need this unique ID is to summarize your dataset into a nice report to be shared with a client or internal stakeholders. Usually your dataset will have some kind of unique identifier like customer ID or transaction ID because that row of data might be used with some other dataset. It’s rare these days not to have one. Here are a few methods for creating your own unique identifiers using this list of customer transaction data (Google Sheets for this episode here).

Watch a video tutorial of this post/episode below:

https://youtu.be/fjkO0kHbbKw

Method 1: Create a sequential list of numbers as unique IDs

Each of these transactions is from a unique customer on a unique date for a unique product. We could do something as simple as creating a sequential list of numbers to “mark” each transaction. Maybe we can prefix this new transaction ID column with “tx-” so each unique ID will look something like this:
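As a sketch (assuming headers in row 1 and the sequential numbers living in column I), the formulas might be:

=ROW()-1 in I2, filled down, to generate 1, 2, 3, and so on
="tx-"&I2 in A2 to build the unique ID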

This method involves creating a dummy column (column I) of sequential numbers. Then in column A, you write “tx-” followed by the number you created in column I, and you have a unique ID. This unique ID is only relevant for this dataset, however. If there are other tables of data related to customers and transactions, those tables won’t know about this new transaction ID you just created on the fly.

Method 2: Create random numbers as unique ID

This method will make your unique IDs feel a little more “unique” since the numbers are randomized:
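The formula is likely something along these lines (the exact wrapping is my assumption; INT() chops off the decimals so you get a whole number between 0 and 99,999):

=INT(RAND()*100000)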

Notice how we take the result of the RAND() function and multiply it by 100,000 to get a random number with 5 digits. Our dataset is only 1,000 rows long, so the chances of duplicate values are low, but the possibility still exists.

This is probably the least preferred solution because there could be duplicate values (there are formula hacks to get around it). Another reason this isn’t a great solution is that you have to remember to copy and paste the values from the random numbers into another column. The RAND() function is a volatile function (it basically recalculates every time you reload the Sheet), so you would lose your unique IDs every time the Sheet loads. This means you have to remember to paste just the values, perhaps in the next column over, before referencing that value as your unique ID.

Finally, if your dataset has timestamps like this, chances are the unique IDs are meant to be sequential (using Method 1). Assigning random unique IDs to each transaction might make reconciling the data in the future more difficult.

Method 3: Concatenate (add) columns together to create unique ID

This method involves concatenating (adding) together different columns to create a unique ID. The reason I like this method is that it makes creating reports a bit easier, since you can write the values into a cell for a lookup to reference. For instance, the unique IDs in our dataset are created by combining the Customer ID, SKU_Category, and SKU columns:
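A sketch of the concatenation (the column positions are my assumptions: say Customer ID, SKU_Category, and SKU live in columns D, E, and F, with the unique ID going in column B):

=D2&"-"&E2&"-"&F2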

We put a dash “-” in between each of the cell references so it’s a bit easier to see all the different components in this “unique ID.” The issue is this: what if there are multiple transactions with the same Customer ID, SKU_Category, and SKU? We insert a COUNTIF column in between columns B and C to count the number of times each unique ID appears in column B:
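That check is likely a one-line COUNTIF like this (assuming the unique IDs are in column B, per the description above):

=COUNTIF(B:B,B2)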

And then do a quick filter to see if there are any values greater than 1 in this column:

Well that sucks. Looks like we have 8 transactions that don’t have unique IDs using this method. The tricky thing with this method is figuring out what other columns can add “uniqueness” to the unique ID. The Date column can’t be used because it looks like some of these transactions happened on the same date. Maybe we can combine the Quantity and Sales_Amount columns to create a unique ID? Even that wouldn’t work because the last two rows have the same quantity and sales amount. This is where this method falls apart because as the dataset grows, you need to constantly check to see if the unique ID column you created is still in fact unique.

Great for creating summary reports

Let’s assume that we were able to create a unique ID for every transaction in this table. Now if I want to create a summary table that looks at the Sales_Amount, for instance, creating the formula might look like this:
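The exact formula depends on your layout, but as a hedged sketch (assuming the unique IDs are in column B, Sales_Amount is in column H, and “FOOD” is a made-up SKU_Category purely for illustration):

=SUMIF(B:B,"5541"&"-"&"FOOD"&"-"&"8ETY5",H:H)

This sums the Sales_Amount for the transaction whose unique ID matches the hard-coded key.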

You’re probably wondering why we would make such a complicated formula using the unique ID column versus just using the columns themselves. In the future, you might want to do a lookup to a specific transaction ID and knowing the columns that contribute to that uniqueness of that ID makes it easy to write out the hard-coded value to do the lookup.

For instance, I might know that a customer with the ID “5541” is important, and I can have that Customer_ID in my summary table somewhere. Then I know that the “8ETY5” SKU is an important SKU my company is tracking, and that could be another value I hard-code in my summary table somewhere. Knowing that the unique ID for the transaction includes these values might make it easier to reference that row in my summary report in the future (or perhaps in a PivotTable too).

Other Podcasts & Blog Posts

In the 2nd half of the episode, I talk about some episodes and blogs from other people I found interesting:

Dear Analyst #67: Automating tedious tasks with scripts and solving problems software can’t fix
https://www.thekeycuts.com/dear-analyst-67-automating-tedious-tasks-with-scripts-and-solving-problems-software-cant-fix/
Mon, 19 Apr 2021

This episode is actually a recap of a talk I gave at a meetup. After reflecting a bit on the subject matter, I wanted to discuss some other topics that are more important than writing VBA scripts or doing stuff in Excel. At the meetup, I discussed a VBA script and Google Apps Script I wrote for filling values down in a column. I actually published these scripts in a previous episode, but went in-depth during the meetup on how the scripts work. If I step back for a minute and ask myself “why did I create these scripts in the first place?”, the answer is: to solve a simple problem that I’m sure many analysts come across. More importantly, it’s a problem that doesn’t have a clear solution that our current software (Excel and Google Sheets) can fix easily.

Software that fixes your problems

For those of you who are:

  1. Using a recent version of Excel
  2. On a PC
  3. Have a Microsoft 365 subscription (depending on your package)

Congratulations! You are able to use Power Query to transform and clean “dirty” data and the problem described in this episode is easily solved with the software. All you have to do is click this option in Power Query to fill values down:

For the rest of us (Mac Excel or Google Sheets users), you’re stuck doing this manually. Why does this feature have to be reserved for a small group (relatively speaking) when this problem is faced by thousands of people who may not have the same access as someone who works in the enterprise?

Cleaning data is part of any analyst’s job and we should be able to do these tasks as quickly as possible so that we can move on to more interesting projects. The fact that you need Power Query to fill values down like this is annoying to me. Do you ever go out of your way to prove a point, even if it’s an extremely inefficient use of your time? Creating these VBA and Google Apps Scripts was just that for me. Instead of relying on the software to do the job for me, I hacked up an inelegant but simple solution to hopefully give people more access to simple tools for cleaning up data.
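As an aside, if scripts aren’t an option at all, there’s a well-known helper-column workaround (not from the meetup, just a common trick): assuming the sparsely filled column is A with data starting in row 2, enter this in B2, fill it down, then copy column B and paste it back as values:

=IF(A2="",B1,A2)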

Building for an audience of one

I might be over-estimating the number of people who have this fill values down problem. Maybe it’s a few hundred people? Maybe less than 100? Who knows. The important thing is that I had the problem and needed to solve the problem for myself.

Perhaps you are in a position where you can’t spend a few hours to learn how to write a script to automate one aspect of your job. That’s understandable. You need to crank out reports and time spent away from cranking means you’ll have to work after hours to get your job done.

I used to be on that hamster wheel, until I stepped back and saw the forest for the trees. Excel is just one tool in your vast array of tools to analyze and visualize data. There’s a whole world of databases, data pipelines, machine learning, and more for you to explore. Just staying in the “Excel lane” is how one gets pigeonholed into a job, a career, a life.

Learning how to write scripts changed my perspective on more than just Excel. I realized I could build tools that help others save time because I knew it saved me time. By building for an audience of one, you are in fact building for an audience of many.

Meetup recap

This write-up definitely meandered a bit but I think that’s ok. You can watch the recap of the meetup below and get lost in the details on how I loop through arrays to make the script work. The important lesson I hope you’ll walk away with is to think beyond what Excel or Google Sheets has to offer, to the other platforms and tools that come before or after your spreadsheet.

Slides from the meetup

Other Podcasts & Blog Posts

No other podcasts!

Dear Analyst #66: How to update and add new data to a PivotTable with ramen ratings data
https://www.thekeycuts.com/dear-analyst-66-how-to-update-and-add-new-data-to-a-pivottable-with-ramen-ratings-data/
Mon, 12 Apr 2021

PivotTables have been on my mind lately (you’ll see why in a couple weeks). An issue you may face with PivotTables is how to change the source data for a PivotTable you’ve meticulously set up. You have some new data being added to your source data, and you have to change the PivotTable source data to reference the additional rows that show up at the bottom of your source data. This may not be a big issue for you because maybe you’re not getting new data added often so manually going into the PivotTable settings and changing the reference to the source data doesn’t feel onerous. If you have new data coming in every day or every hour, you may want to automate this process.

Here are a few methods to accomplish this in both Excel and Google Sheets. My preferred method is to turn your source data into a table in Excel or reference the entire columns in Google Sheets. Download the Excel file or copy the Google Sheet with the dataset for this episode. You can also watch a video tutorial of this post/episode here: https://www.youtube.com/watch?v=i2BI0RaEuYQ

Ramen ratings from ramenphiles

I’m a big fan of niche datasets like the one for this episode. It’s a list of ramen products and their ratings created by a website called The Ramen Rater. The list consists of 2,500 ramen products along with each product’s country of origin, the style (Pack or Bowl), and of course the rating. It appears the ratings are all done by one person. More importantly, the list contains the full name of the ramen product, which means you can do some interesting text analysis to see what words are used most often in ramen products, how words might correlate with ratings, etc. For our purposes, this dataset is great for creating a PivotTable with the rating being the main metric to analyze.

Method 1: Reference the entire column

Excel PivotTables

As shown in the first screenshot, the source data for the PivotTable in the Excel file comes from the “ramen-ratings” worksheet from cells $A$1:$G$2581. As you add more data to the source data, you’ll have to change the source reference to reference a higher row number. If you add 10 more ramen ratings, you’ll have to change the PivotTable reference to $A$1:$G$2591. We want to avoid having to change the reference every time we add new data, so we can just reference the entire columns in $A:$G:

The problem is the PivotTable we have in the “Ramen Pivot Table” worksheet now has this “(blank)” item in both the columns and rows fields of our PivotTable. Why? Because we’re referencing a bunch of empty rows of empty countries and ramen styles:

This isn’t a huge issue, because we can just remove the “(blank)” via the row and column filters:

Now when you add new rows of ramen ratings to the source data and then you refresh the PivotTable, the PivotTable will automatically pick up all the new rows of data since it’s referencing the entire columns from column A to column G.

Google Sheets PivotTables

The same solution applies to Google Sheets:

I find the user interface much easier to use in Google Sheets for a variety of reasons:

  1. Fewer clicks – Right when you click on the PivotTable (as shown in the above gif), you can see and edit the source data in the top right of the PivotTable field settings. In Excel, you have to click on the PivotTable Analyze tab in the ribbon and then “Change Data Source.”
  2. Can use left/right arrow keys in the cell reference – It’s a small annoyance in Excel, but notice how in the above gif you can just use the right arrow key to move the cursor in the cell reference? This makes it easy to delete the row numbers. In Excel, using the left/right arrow keys changes the cell reference based on where your active cursor is in the spreadsheet. Nine times out of 10, you end up creating an incorrect formula and have to exit out of the menu, undo, or both.
  3. PivotTable automatically refreshes – This one is less about the UI and more of a core feature: PivotTables in Google Sheets automatically refresh when you add or edit data in your source. In Excel, you have to right-click and click “Refresh” (or refresh via the ribbon) every time you want to refresh the PivotTable. I’m sure there’s some pivot cache or performance reason why Excel doesn’t refresh automatically, but Google Sheets just gets it right on this one. I know there are settings in Excel like refreshing the PivotTable every time the file opens or refreshing at some interval you define (e.g. every 10 minutes), but that just adds overhead for the user who wants to see their PivotTable updated in real time. This is 2021.

Overview of this method

For most use cases of PivotTables, I’d argue this solution is fine. This Excel file is pretty basic with one data source and one PivotTable. The dataset is also not super huge so you don’t have to worry about performance issues with referencing the entire columns of data with all those empty rows.

If you work in a corporate environment and you’re tasked with analyzing multiple datasets and have multiple data sources and PivotTables in your file, you may need something more scalable. This is where method 2 comes into play.

Method 2: Turn source data into a table (recommended)

Excel PivotTables

If you turn the source data into an Excel table and give the table a name, new data that gets added to the source will automatically get included in the table “reference.” Once you’re in the data source, press CTRL+T and hit ENTER to turn the data into an Excel table:

While your cursor is still in the newly created table, rename the table name to “Ramen” in the top-left:

Then we go back to the main ramen PivotTable, and change the source to equal this new Ramen table by just typing =Ramen in the Location field:

Now when you add new ramen ratings to the source data, the table reference automatically “expands” to include these new rows of data. In the gif below, I’m just copying some additional rows of data from another sheet and pasting it at the bottom of the Ramen source data table:

Notice how when you paste in the new data, the Excel table automatically expands the alternating row colors to include this new data. This shows that Excel was able to add this additional data to the table reference. If you refresh the PivotTable, it will automatically include the rows that got added since the source is still =Ramen.

Advantages of turning your PivotTable data source into an Excel table

Keep in mind: method 1 above is a totally acceptable solution for most simple PivotTable use cases. It’s really the edge cases where method 1 starts to break down. With method 2, not only do you eliminate some of these edge cases, but you get some additional benefits as well:

  1. (From method 1) You always need to deselect (blank) – If you’re doing any sort of bigger analysis, you’re going to be building multiple PivotTables. As you copy and paste the first table you created into new worksheets, that “(blank)” will always need to be deselected in the columns and rows. That shouldn’t be a problem in most use cases, but as you hit “select all” in the PivotTable filters during your analysis, you’ll need to remember to scroll down and keep that (blank) value deselected. It’s just some additional overhead that you don’t want to worry about.
  2. Easy-to-read table reference – Just as you may have multiple PivotTables in your file, you will probably have multiple data sources your PivotTables are built on. Instead of referring to the data source with the traditional A1:B2 cell references, it’s easier to just read a table reference as Ramen and know that it’s referencing your ramen dataset. If you accidentally name the worksheet something generic like source_data, you’ll have to double-check that your traditional cell reference is indeed referencing the ramen ratings data you’re interested in.
  3. See all table references in one place – Building off of the previous benefit, you can quickly see all the table references driving your PivotTables via the “Define Name” button on the Formulas tab in the ribbon. If you need to see the exact cell reference for your tables, this is the main place to see those cell references:

Google Sheets PivotTables

Tables don’t exist in Google Sheets :(.

I’m baffled as to why this feature doesn’t exist in Google Sheets, but I’m sure the team will build this functionality at some point to get to feature parity with Excel. In my opinion, the fact that Google Sheets PivotTables auto-refresh as you edit or add data outweighs the benefits of turning your source data into tables. Most Google Sheets PivotTables I’m creating these days are pretty simple in nature, so I’m not working with many PivotTables or data sources in one Sheet.

Now there are some formula tricks you can do with the FILTER(), OFFSET(), and COUNTA() functions to replicate the features of Excel tables, but it’s not as simple as the Excel tables feature. It probably also isn’t very performant on larger datasets when you’re using these functions to reference the source data correctly. But it’s possible!
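As a hedged sketch of that trick (my own reconstruction, assuming the source data lives in Sheet1!A:G and column A has no blank cells), a dynamic range that expands with new rows could look like:

=OFFSET(Sheet1!$A$1,0,0,COUNTA(Sheet1!$A:$A),7)

The COUNTA() counts the filled cells in column A to set the range’s height, and the 7 sets the width to cover columns A through G.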

Other Podcasts & Blog Posts

In the 2nd half of the episode, I talk about some episodes and blogs from other people I found interesting:

Dear Analyst #65: Eliminating biases in sports data and doing a data science bootcamp with Caiti Donovan
https://www.thekeycuts.com/dear-analyst-65-eliminating-biases-in-sports-data-and-doing-a-data-science-bootcamp-with-caiti-donovan/
Mon, 29 Mar 2021

When you think of sports and data, you may think about all the data collected on player performance and game stats. There’s another world of sports data that is usually overlooked: the fans. In this episode, I speak with Caiti Donovan, the VP of Data & Insights at Sports Innovation Lab, a sports market research firm. Caiti started her career in marketing and business development at Viacom and Spotify, where she used data storytelling to work with advertisers and partners. More recently, she learned how to build the data systems she was once only a consumer of. We’ll discuss how she made the transition to data, getting a data science certification at The Fu Foundation School of Engineering and Applied Science at Columbia University, and current projects she’s working on at Sports Innovation Lab.

Working with data at ViacomCBS and Spotify

Caiti spent 15 years in marketing and sales roles where data was a core part of her day-to-day projects. She used a lot of proprietary data systems and even helped build some of these systems. Using the data available to her, she’d take different datasets and turn the data into a format useful for data storytelling. These stories would be used for partnership development or working with advertisers. Data storytelling is a common theme on this podcast. See episode 62 with Janie Ho, episode 56 with John Napolean-Kuofie, and episode 35 on the Shape of Dreams.

At ViacomCBS, Caiti would look at the data behind shows like Jersey Shore and SpongeBob to see what type of revenue opportunities her team could create based on the audience of these shows. The data could also be analyzed to help inform content development for these shows. The goal was to understand their younger fans and figure out what it meant to have conversations with the fans of these shows.

After a stint working with a few startups in a consulting capacity, Caiti eventually landed at Spotify. At the time, Spotify had a hard time turning all the data they were sitting on into narratives in a B2B and B2C context. She worked with clients like the NBA, Ford, and Nike. For the data stories she was telling her clients from a B2B perspective, she also had to make sure they carried over to the B2C side (Spotify subscribers).

From there, Caiti made a big jump from entertainment to sports. She realized her “purpose meets passion” moment is finding ways to use data to have an impact on the world. She wanted to tackle challenges faced by women in sports and also find a way to better connect with the fans of women’s sports. Caiti eventually co-founded the non-profit SheIS Sport to bring together every single professional women’s sports league. Through this experience, Caiti learned a lot about the biases and inequities in data in the sports world. She realized she needed more technical expertise to have a direct impact on how data is collected and analyzed in this world, and went back to school for data science (more on this later).

Spotify’s billion points of data per day

When Caiti was at Spotify, one of her projects was figuring out how to translate the billion points of data generated by Spotify users into product opportunities. In addition to product opportunities, the ad sales team needed to have stories they could tell to their clients that were backed up by data.

She started evaluating how her team could clean and dissect the data to productize the data Spotify was generating and storing every day. Using proprietary algorithms, her team analyzed people’s music listening behavior to figure out what a listener might be doing at the time they were listening to a song. This became known as “moment marketing,” which carried a lot of context about the subscriber. This context allowed advertisers to tap into the moment the subscriber was in, like if they were at the gym, in their car, or at a party. Some of the metrics the team analyzed included BPM, device-level data, and the types of playlists people were creating. What better time for Nike to target a consumer with new shoes than when the consumer might be doing a workout or training for a sport?

Wanting to build her own data systems

To get closer to the data systems she was using, Caiti made the decision to go back to school and learn more about data science. She was accepted into a data science bootcamp at Columbia’s Fu School of Engineering and Applied Science. The topics covered in the bootcamp included Python, ETL processes, machine learning, and different tools to build data systems.

It took Caiti 6-7 years to make the decision to go back to school for a degree in data science. The catalysts for her decision included the data discrepancies she sees in the sports world and the pandemic.

When Caiti was at SheIS Sport, her team created a campaign report showing that 4% of sports media coverage focuses on women’s sports. The campaign ended up receiving half a billion impressions, 4.2 million engagements online, and 25K people posting their stories. She realized this 4% number only covers linear TV and no digital channels. Without proper data, advertisers, partners, and leagues cannot evaluate the opportunity available in women’s sports. It’s a chicken-and-egg scenario: fans want more media coverage, and advertisers say they’ll get more involved if they see more eyeballs and more people going to these games.

Experience at a data science bootcamp

Caiti had already been accepted into the Columbia program at the end of 2019 and deferred to the spring semester in 2020. She also looked at schools like Flatiron and some other programs in New York. What drew her to Columbia’s program was the mix of backend technical topics and learning about related tools like Tableau and Hadoop.

Caiti’s data science bootcamp was the first bootcamp to go completely virtual. Given the intensity of the program, she stepped out of day-to-day operations at SheIS Sport to focus on her classes. The schedule was very tough and she was spending 15-20 hours per week outside of class doing homework. The difficulty with doing this virtually (as many knowledge workers can attest to) is not being able to lean over to see your colleague’s screen and say “try out this function here in your code” to make the learning process more fluid.

The final project at her bootcamp had to use machine learning in some capacity. Her group needed a big data source, and they ended up using multiple APIs. They wanted to evaluate how COVID affects player performance. Questions to be answered included: what if there are no fans in the audience? Would this impact player performance? One study from the NBA I found interesting was on the bubble’s impact (or lack thereof) on home court advantage.

Getting data on the NBA and WNBA and training a machine learning model

The NBA was easy since the whole 2020 season was in a bubble, but the WNBA was mixed. The NBA has this great API that goes back 10 years. For the WNBA, her team had to scrape the Sports Reference website, which involved manually pulling down CSVs and uploading them into their model.

At the end of the day, Caiti’s team was not able to fully train any of the machine learning models because of data inconsistencies. It’s difficult to get consistent player data because players move to different teams, they have new teammates, and they get injured during the season. Instead of training the model, her team just ran a linear regression on the data available. They saw a correlation suggesting that when most of the players were in the bubble, NBA and WNBA players played better.

Current projects at Sports Innovation Lab

Caiti is currently looking at fan data and how to democratize data for the sports industry to bring more equity to women’s sports. Ultimately, she wants to make sure the hypotheses and trends claimed in the sports industry are backed up with data. Advanced systems have been created to track player and game analytics since there are a lot of second-order effects on industries like sports betting and fantasy sports. On the business side, which focuses on fan metrics, the industry is still 5 years behind.

We are seeing a lot more innovation in the entertainment and retail industries around collecting data from customers and consumers. Sports hasn’t done as much with data from fans. If you don’t have an understanding of fan behavior, you’re missing a huge contextual piece of how a team or league appears to brands and partners.

Data tools Caiti is excited about

At the end of our conversation, Caiti shared some tools she’s super excited about learning and using with her data projects. She mentioned a nice mix of open-source and commercial tools:

  • She started using Shiny a lot to build internal dashboards. It allows her team to visualize structured data and gives them the ability to poke holes in their data, which helps them find ways to further clean up and transform the raw data.
  • Tableau is a juggernaut in the data visualization space. It has acted as a connector between the sales team and Caiti’s team, which is a little more in the weeds with the data. Tableau streamlines things so the sales team can easily explore data with potential clients.
  • A final tool is RStudio, which one of Caiti’s colleagues works in a lot.

Sports Innovation Lab is hiring engineers and analysts. If you believe in their mission, contact them about potential opportunities.

Other Podcasts & Blog Posts

No other podcasts!

The post Dear Analyst #65: Eliminating biases in sports data and doing a data science bootcamp with Caiti Donovan appeared first on .

When you think of sports and data, you may think about all the data collected on player performance and game stats. There’s another world of sports data that is usually overlooked: the fans. In this episode, I speak with Caiti Donovan, the VP of Data & Insights at Sports Innovation Lab, a sports market research firm. Caiti started her career in marketing and business development at Viacom and Spotify, where she used data storytelling to work with advertisers and partners. More recently, she learned how to build the data systems she was once only a consumer of. We’ll discuss how she made the transition to data, getting a data science certification at The Fu Foundation School of Engineering and Applied Science at Columbia University, and current projects she’s working on at Sports Innovation Lab.

Working with data at ViacomCBS and Spotify

Caiti spent 15 years in marketing and sales roles where data was a core part of her day-to-day projects. She used a lot of proprietary data systems and even helped build some of them. Using the data available to her, she’d take different datasets and turn them into a format useful for data storytelling. These stories would be used for partnership development or working with advertisers. Data storytelling is a common theme on this podcast: see episode 62 with Janie Ho, episode 56 with John Napolean-Kuofie, and episode 35 on the Shape of Dreams.

At ViacomCBS, Caiti would look at the data behind shows like Jersey Shore and SpongeBob to see what types of revenue opportunities her team could create based on each show’s audience. The data could also be analyzed to help inform content development. The goal was to understand these shows’ younger fans and figure out what it meant to have conversations with them.

After a stint working with a few startups in a consulting capacity, Caiti eventually landed at Spotify. At the time, Spotify had a hard time turning all the data it was sitting on into narratives in both B2B and B2C contexts. She worked with clients like the NBA, Ford, and Nike. The data stories she was telling her clients from a B2B perspective also had to carry over to the B2C side (Spotify subscribers).

From there, Caiti made a big jump from entertainment to sports. She realized her “purpose meets passion” moment was finding ways to use data to have an impact on the world. She wanted to tackle challenges faced by women in sports and also find ways to better connect with the fans of women’s sports. Caiti eventually co-founded the non-profit SheIS Sport to bring together every single professional women’s sports league. Through this experience, Caiti learned a lot about the biases and inequities in sports data. She realized she needed more technical expertise to have a direct impact on how data is collected and analyzed in this world, and went back to school for data science (more on this above).
