Dear Analyst
A show made for analysts: data, data analysis, and software.
https://www.thekeycuts.com/category/podcast/

This is a podcast made by a lifelong analyst. I cover topics including Excel, data analysis, and tools for sharing data. In addition to data analysis topics, I may also cover topics related to software engineering and building applications. I also do a roundup of my favorite podcasts and episodes.

Dear Analyst #95: Nobody ever got fired for choosing Google Sheets
https://www.thekeycuts.com/dear-analyst-95-nobody-ever-got-fired-for-choosing-google-sheets/
Mon, 23 May 2022

The motivation for this post/episode is a selfish one (scroll to the very bottom or skip to the end of the podcast to see why). As I thought about the framing of this post during my normal “thinking” moments (commuting, on the toilet, during useless meetings), I realized I’m going to take a different approach to why people should and shouldn’t use Google Sheets. I’m not going to list all the features and do a pros/cons list. This episode is more about how software (in this case Google Sheets) makes you feel. Yup, that’s right. The gooey emotions you feel from watching a rom com can also be applied to the software you use at work. This could be a stretch, but come along for the ride and let’s see where the rabbit hole takes us.

Reliability over innovation

The title of this post is a riff on the old adage “Nobody ever got fired for buying IBM.” Through the mid-20th century, IBM sold its computers and mainframes to governments and enterprises. They were seen as innovators and consequently grew into one of the largest corporations of that era. Their brand was associated with excellent customer service and, of course, top-of-the-line machines.

Source: Science Photo Library

Over time, the innovation slowed at IBM. Nevertheless, companies continued buying from IBM. The brand and product were seen as a reliable choice over startups and new entrants. In a nutshell, you could de-risk the decision by going with a vendor that everyone else knew and trusted.

Fast forward to the present: what are the IBMs of our generation? Microsoft Office? The Google Suite? Do we reach for Google Sheets because we know everyone else on our team uses it or because it’s actually the best tool to get the job done? I think it’s a mix of both.

Just good enough to get the job done

Maybe your company doesn’t have an Office 365 subscription and you need a little more of those collaboration and sharing features. So you reach for Google Sheets. It mimics enough of Excel’s features that you’re comfortable putting important company data in it. Like Excel, it’s a purely utilitarian tool: all you care about is that the formulas work and that Google Sheets doesn’t go down. You don’t want to feel anything from using Google Sheets. It’s just good enough to be your CRM or event planner.

What it’s actually great for is modeling and financial analysis to help you make a decision. So it’s a decision-making tool. But since so much of our work is about making non-financial decisions, we use Google Sheets for everything because it’s just good enough. We’re ok with it not having the bells and whistles of other tools.

That’s the rub. Good enough. It doesn’t excite us like getting a match on Tinder does. It doesn’t tickle our senses like a sizzling steak does when it comes out of the kitchen. Should we feel ok with “settling” for something we have to use every day? More data is being generated by our products and services; is the only way to make sense of it all to put it into Google Sheets?

For most people, settling for Google Sheets is absolutely fine. You don’t want your spreadsheet to incite any emotions because it’s there to do one job and one job only: keep your data organized and help you make decisions. Anything more or less is a waste of your time.

The allure of free

No matter what function you work in, chances are you have to export your data from Salesforce, Jira, or some other internal tool into a Google Sheet. Your data is “trapped” in these SaaS tools and it’s difficult to make decisions about what bugs to work on or which email campaign to send when you don’t have the right stats in front of you.

Everyone has a Google account, and Google Docs and Google Sheets are free. Why bother using anything besides Google Sheets? It has most of Excel’s features, it’s free, and, like I said above, it’s good enough.

For small businesses and startups, free and “good enough” are sufficient qualities. When I had my own startup I also used Google Sheets and Google Docs to get shit done.

But for the people out there working at big corporations who have a budget to spend on software, Google Sheets is still the default because of its ease of use. More importantly, it doesn’t have to go through the IT or procurement wringer because it’s free. You won’t ever get fired for picking up free software, right?

I really like this megathread in the /r/projectmanagement subreddit where the mod basically rails on spreadsheets. You can tell he’s seen multiple big companies try to use spreadsheets for project management and there’s always a sad ending to the story:

You just work here and want to keep your job

The last place you expect to use your creative muscles is in that shared Google Sheet everyone uses to update OKRs or leave notes about inventory. Maybe you put in a little extra work to format the Google Sheet or add data validation so that people don’t accidentally enter incorrect data. The goal here is to maintain the status quo and not introduce new processes or tools that would require everyone to learn something new. We have important jobs to do and features to ship. What’s the point of making the spreadsheet better (or picking a new tool) when it’s merely supporting the real things you care about?
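
If you do want those guardrails, the logic behind a data validation rule is simple enough to reason about outside of Sheets. Here’s a minimal sketch in Python of the kind of checks such a rule enforces on a row before it lands in a shared tracker; the column names and rules are made up for illustration.

  # Hypothetical columns and rules, purely to illustrate what "data validation" enforces.
  from datetime import datetime

  ALLOWED_STATUSES = {"Not started", "In progress", "Done"}

  def validate_row(row):
      """Return a list of problems with a row dict; an empty list means the row is fine."""
      problems = []
      if row.get("status") not in ALLOWED_STATUSES:
          problems.append("status must be one of: " + ", ".join(sorted(ALLOWED_STATUSES)))
      try:
          datetime.strptime(row.get("due_date", ""), "%Y-%m-%d")
      except ValueError:
          problems.append("due_date must look like YYYY-MM-DD")
      if not row.get("owner"):
          problems.append("owner is required")
      return problems

  # Three problems: bad status, bad date, missing owner.
  print(validate_row({"status": "Donee", "due_date": "2022-13-01", "owner": ""}))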

At the end of the day, you aren’t going to risk your job on a better CRM tool or project management tracker. Even though the new and shiny software might actually solve some of your business workflow problems, the potential damage to your reputation and ego outweighs the blandness of a free Google Sheet.

That’s the problem, innit? We all just work somewhere. If you’re not just at XYZ company for a higher salary and actually feel a sense of ownership over your work, you would be less risk-averse. You would want to make the lives of your colleagues better. And that comes down to the experience each of your teammates has when they launch Slack, Teams, or Outlook. Is it too farfetched to say that using our “work” applications should feel like using our “personal” applications (minus the addictiveness)? We are in these applications all day anyway, so why not have a user experience that delights us and makes us actually enjoy updating our project statuses or aggregating a list of marketing assets?

This is starting to spiral into a soliloquy about psychological safety in the workplace and finding your purpose. We are still just talking about Google Sheets here. But the Google Sheet (like IBM) could be a symbol of the risk-aversion at your company.

The need to share

Or as the SaaS venture capital world calls it: “multiplayer mode.” The biggest value proposition for Google Sheets when it first came out in 2006 was the ability to share your spreadsheet and have your teammates collaborate with you. Take a look at the SaaS applications or software you use today and you’ll see how they are becoming more multiplayer in nature.

I think this real-time nature of Google Sheets spurred teams to apply the lean startup model to their spreadsheets. Instead of working independently on your spreadsheet for days and weeks on end and shipping your final “product” to your team, you shared your Google Sheet internally with a giant “WIP” somewhere on it. You share the Google Sheet, get feedback, iterate, and the cycle continues. This results in a better output and makes your teammates feel more invested in the final model or analysis.

It’s all fun and games until the Google Sheet breaks

All parties win until the Google Sheet is only understood by a few people or maybe the one analyst who set up the whole Google Sheet. If you go back to bullet #4 in the screenshot I shared above from the /r/projectmanagement subreddit, the person who understands the formulas and structure of that Google Sheet becomes the single point of failure for that entire system.

I can’t tell you how many times I’ve been shown a Google Sheet or Google Apps Script and the person showing me it says: “Mary was our analyst who created this spreadsheet, but she just left last week.” When Mary is on the team and she’s actively managing the Google Sheet, everything is great! The team can rely on Mary. The minute she jumps ship, panic, confusion, and even resentment may creep in because you don’t know enough about the construction of the Google Sheet to fix it.

This is why SaaS applications built for project management, inventory management, and customer relationship management make up a $144B market!

In episode #46, I walk through a project management template I created in Google Sheets. The beauty of this Google Sheet template is that it can handle task dependencies and output a hacky Gantt chart/timeline. This episode/blog post is one of my top-performing blog posts. It’s scary to think that teams may be using this template to manage real projects. There might be a few individuals on your team who understand how these formulas work and how to edit the template, but the rest of the team are just “consumers” of the template.

Screenshot of the project management template from episode #46
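
I won’t reproduce the template’s formulas here, but the core idea behind dependency-aware scheduling is easy to show outside of a spreadsheet. Below is a rough Python sketch (the tasks and durations are invented, and the actual template does this with formulas rather than code): each task starts once its dependencies finish, and the print loop fakes a tiny text Gantt chart.

  # Invented tasks; the real template expresses this logic as spreadsheet formulas.
  tasks = {
      "Draft plan":   {"duration": 2, "depends_on": []},
      "Get approval": {"duration": 1, "depends_on": ["Draft plan"]},
      "Build":        {"duration": 5, "depends_on": ["Get approval"]},
      "Announce":     {"duration": 1, "depends_on": ["Get approval"]},
      "Launch":       {"duration": 1, "depends_on": ["Build", "Announce"]},
  }

  def schedule(tasks):
      """Return {task: (start_day, end_day)}, assuming day 0 is the project start."""
      done = {}
      while len(done) < len(tasks):
          for name, t in tasks.items():
              if name not in done and all(dep in done for dep in t["depends_on"]):
                  start = max((done[dep][1] for dep in t["depends_on"]), default=0)
                  done[name] = (start, start + t["duration"])
      return done

  for name, (start, end) in schedule(tasks).items():
      print(f"{name:<14}" + " " * start + "-" * (end - start))  # crude text Gantt chart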

I can see the appeal of this template for a team that wants to get started with a project tool quickly. The low barrier to adoption for your teammates (it’s free) and the likelihood that someone on your team understands how it works are enough to get you started. But the minute a customization or additional features are needed, you are relying on someone who loves doing stuff in Google Sheets (there are a lot of them out there!) to pick up the slack.

A half-hearted way of getting work done

Is there more to your SaaS tools and workplace software than just getting your work done? Can it make you feel something? Can it inspire you to be more creative? I think we are at a moment in time where SaaS applications are so commodified that these features that tug on your emotions are the only way to differentiate.

But it’s an uphill battle. The allure of the risk-averse and free Google Sheet will keep you going back to what you feel comfortable with, or what your team or company feels comfortable using. Can you take a risk with trying that new tool you heard about without feeling like a complete loser if your team ultimately doesn’t use it?

It’s such a half-hearted way to work. You spend most of your day in a tool that you know everyone hates to a certain degree. But it’s good enough, free enough, and enough to make you fit in. If you decide not to settle and take the risk, perhaps you and your team will change for the better.

How to better use Google Sheets for your team

I know that most of you who listen to this episode or read this post will think I’m crazy. We’re just talking about Google Sheets here. You won’t be risking your career on new tools or platforms and will continue using Google Sheets for all the aforementioned reasons.

Knowing that, I recently created two online classes to learn Google Sheets (specifically for a team setting). I mentioned this in the very first sentence of this episode/post. After everything that I’ve said, I have a vested interest in you using Google Sheets and wanting to learn more tips and tricks on how to use it for your team. I launched both of these classes on Skillshare, and you can learn more about them by clicking these links. The first one is a beginner class and the second one is an advanced class:

Other Podcasts & Blog Posts

In the 2nd half of the episode, I talk about some episodes and blogs from other people I found interesting:

Dear Analyst #94: Helen Mary Barrameda on having a “portfolio career” prior to being a data analyst, winning the NASA Space Apps challenge, and wfh tips
https://www.thekeycuts.com/helen-mary-barrameda-on-having-a-portfolio-career-prior-to-being-a-data-analyst-winning-the-nasa-space-apps-challenge-and-wfh-tips/
Mon, 16 May 2022

A common theme I’ve noticed from talking with many data analysts and data engineers is that they didn’t come from a “data” background. Helen Mary Barrameda is someone who exemplifies this theme. She is based in the Philippines and started her career as a freelance writer in 2004 writing lifestyle pieces. She used her earnings from her writing gigs to pay for her engineering education, and eventually became a geodata engineer working in the field of Geographic Information Systems (GIS). In this episode, she discusses falling in love with Python, how her engineering background helps with a career in data, working from home before everyone did it, winning a NASA challenge, and more.

How web development can help with your data analyst skills

Helen started her career in geomatics. Prior to this conversation I didn’t know much about this industry:

Geomatics is the integrated approach of measurement, analysis, and management of the descriptions and locations of geospatial data.

Source: University of Florida

Over time, Helen realized there wasn’t a lot of creativity in the geomatics space. Most of her time was spent doing research and publishing for scientific journals. It was time for a change.

Due to personal reasons, she decided to look for jobs and projects she could do entirely at home before the working-from-home trend really started. Given her engineering background, she started picking up projects in web development, marketing automation, and data analysis. She realized she could be a lot more creative with these web development and data jobs versus her old career in geomatics.

Falling into data engineering

As Helen continued taking on projects while working from home, data became more prominent in her projects. She worked with an ad tech company where she helped with getting data from their website into their CRM. She was doing a lot of data cleaning and was actually doing ETL (Extract Transform Load) for her clients. Before she knew it, she was doing data engineering work.
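
Helen didn’t go into the exact stack she used, but the shape of that website-to-CRM work is classic ETL. Here’s a bare-bones sketch of what it might look like in Python; the file names, fields, and cleaning rules are all invented for illustration.

  # Bare-bones ETL sketch with invented file and field names.
  import csv

  def extract(path):
      with open(path, newline="") as f:
          return list(csv.DictReader(f))

  def transform(rows):
      cleaned = []
      for row in rows:
          email = row.get("email", "").strip().lower()
          if "@" not in email:   # drop obviously broken records
              continue
          cleaned.append({
              "email": email,
              "name": row.get("name", "").strip().title(),
              "source": row.get("utm_source", "") or "unknown",
          })
      return cleaned

  def load(rows, out_path):
      with open(out_path, "w", newline="") as f:
          writer = csv.DictWriter(f, fieldnames=["email", "name", "source"])
          writer.writeheader()
          writer.writerows(rows)

  load(transform(extract("website_signups.csv")), "crm_import.csv")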

Helen decided to find some data science workshops in the Philippines and started working on a Master’s degree in data science. While her geodata engineering experience helped with some of the coding skills required in her data science projects, she felt that learning algorithms was still difficult.

Like many people I’ve spoken with on this podcast, Helen fell into data analysis and data engineering. You work on data projects without realizing you are actually doing things that a data scientist does. And if you want to formalize the skills you are acquiring, you can go back to school like Helen did. In Helen’s words:

I learned the practical application of data skills before learning the theory.

Building a “portfolio career”

As Helen discussed her various roles and projects, she brought up a phrase I haven’t heard before when you are progressing through your career: the “portfolio career.”

In HR speak, I think this is analogous to people who call themselves “generalists.” Helen has gotten exposure to a variety of industries and people. She brought up an interesting point about how most data analysts progress through their careers. They are laser focused on their field or industry, and maybe don’t have the time, need, or desire to interact with the C-suite, for instance. Helen talked about the benefits of working with all types of people in your organization to help move your projects along; especially if politics or bureaucracy is slowing your project down.

I’ve spoken before on the benefits of being a generalist in your career. Take a listen to this old episode from 2019 where I discuss David Epstein’s book Range: Why Generalists Triumph In A Specialized World. Cliff notes from the book via Four Minute Books:

  1. To become excellent, don’t specialize early in life, experiment with many different paths. 
  2. You will be better at innovating and more successful if you have a breadth of experience. 
  3. The more famous you become for being an expert in one area, the more likely it is that you will be terrible at making accurate predictions about your field. 

The traditional thinking behind being a specialist is that you can become the only person that understands how X works and therefore you maximize your “rate” for being an expert. I don’t think there’s a right answer, but personally I’ve found decent success from being a generalist.

Pregnancy, weightlifting, and Excel

The conversation took an interesting turn as Helen started talking about her pregnancy. While she was pregnant, Helen collected a ton of data about her health and vitals. Most of the data she collected was from analog devices like a blood pressure reader. With all these data points, she eventually created some descriptive statistics that she could share with her doctor during her pregnancy. Imagine being the doctor who gets to analyze data that’s already well organized and formatted!

Helen’s interest in tracking her health carried over into her weightlifting passion. She got addicted to weightlifting during the pandemic and started tracking the calories she was burning in Excel. This eventually led her into the field of biohacking.

Most of the data she was tracking came from her Apple Watch. She found a way to export the data from her Apple Watch into a CSV and could analyze her data in Excel or some other tool once the raw data became available.
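
I haven’t seen Helen’s actual export, but once the data is sitting in a CSV, descriptive statistics are only a few lines away in something like pandas. A sketch with guessed column names (the real Apple Watch export uses different fields):

  # Guessed column names; a real Apple Watch export will look different.
  import pandas as pd

  workouts = pd.read_csv("apple_watch_export.csv", parse_dates=["start_time"])

  # Descriptive stats (count, mean, std, min, quartiles, max) across all workouts.
  print(workouts[["active_calories", "avg_heart_rate", "duration_min"]].describe())

  # Weekly calorie totals, the kind of rollup you'd otherwise build with a pivot table.
  weekly = workouts.set_index("start_time")["active_calories"].resample("W").sum()
  print(weekly.tail())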

Winning the 2020 NASA Space Apps Challenge

During the pandemic, Helen was working for a company called Cirrolytix and the company sponsored a team for the 2020 NASA Space Apps challenge. At the time, Helen was doing her Master’s degree in data science. Helen eventually joined a team along with some other data science students from her program.

NASA’s Space Apps challenge focused on solving challenges brought on by the pandemic. The hackathon was done virtually for the first time. Out of 2,000 teams that entered, Helen’s team was one of the 6 finalists selected. Helen’s project was called G.I.D.E.O.N. (Global Impact Detection from Emitted Light, Onset of Covid-19, and Nitrogen Dioxide). Here’s a summary of the project from their website:

GIDEON is an integrated public policy information portal that aims to measure the impact of COVID on various countries and its effect on economic and environmental terms. The countries that are able to contain COVID while keeping their economy afloat with minimal impact to the environment stand the best chance of sustainably bouncing back after this crisis.

The result was a traffic light system for whether a country could open back up given the current spread of COVID. Here’s an example of the country-level dashboard the tool could create using Helen’s home country:

Source

Even though the hackathon was only 2 days, Helen’s team was able to tie a bunch of datasets together to help create this traffic light system. It’s interesting how the team used emitted light and nitrogen dioxide levels obtained from satellite imagery to look for trends. Take a look at their methodology and insights on their project website.
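
Their actual scoring methodology is documented on the project site, but the general shape of a traffic light system is easy to sketch: normalize a handful of indicators, combine them into a score, and bucket the score. The indicators, weights, and thresholds below are entirely made up for illustration and are not GIDEON’s real model.

  # Toy traffic-light scorer; indicators, weights, and cutoffs are invented,
  # not GIDEON's real methodology.
  def traffic_light(case_growth, economic_activity, no2_rebound):
      """Each input is assumed to already be normalized to a 0-1 scale."""
      # Lower case growth and NO2 rebound are good; higher economic activity is good.
      score = 0.5 * (1 - case_growth) + 0.3 * economic_activity + 0.2 * (1 - no2_rebound)
      if score >= 0.7:
          return "green"   # reopen with monitoring
      if score >= 0.4:
          return "yellow"  # partial reopening
      return "red"         # keep restrictions

  print(traffic_light(case_growth=0.2, economic_activity=0.6, no2_rebound=0.3))  # green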

How to be productive while working at home

As the episode wrapped up, Helen gave some tips on how to be productive while working from home. At the end of the day, she said it’s all about energy management instead of time management. A few things she does to manage her energy:

  • The aesthetics of your workstation impact how productive you are
  • She breaks up the day with menial tasks like creating a grocery list
  • Meditation and breathing work

She mentioned an app called Focusmate that matches you with strangers who also want to be more productive and get work done. Accountability is the name of the game here and that’s how Focusmate has helped Helen when she’s really feeling a lack of productivity.

Source: CCSalesPro

Other Podcasts & Blog Posts

No other podcasts mentioned in this episode!

Dear Analyst #93: How to bring data literacy to schools and teaching Python with Sean Tibor and Kelly Schuster-Paredes
https://www.thekeycuts.com/dear-analyst-93-how-to-bring-data-literacy-to-schools-and-teaching-python-with-sean-tibor-and-kelly-schuster-paredes/
Mon, 09 May 2022

This episode is quite different from other episodes for a few reasons. One, it’s the first time I’ve had two guests on the show at the same time. Second, it’s the first time I’ve had educators on the show. Third, the guests have a podcast about Python so they taught me a thing or two about interviewing guests on a show :). Kelly Schuster-Paredes started teaching Python in middle school about four years ago. Sean Tibor also taught Python in middle school but transitioned to a cloud engineering role earlier this year. We chat about teaching data literacy in middle schools, developing empathy, the AP Computer Science exam, and the Teaching Python podcast.

Data literacy and Python for middle school students

Hearing the words “Python,” “data literacy,” and “middle school” in the same sentence is foreign to me. When I was in middle school in the late 90s, the only exposure we had to computers was the one computer in every classroom we sometimes got to play computer games on. In high school, there was only one computer science class and the only language you could learn was C++.

This might just be my “get off my front lawn” moment. The classroom has obviously changed a lot since the late 90s, and Kelly and Sean are at the forefront of this change.

They talked about the rigid rules you typically come across when it comes to learning math and science. You might use a graphing calculator in a math class, but then use a different set of math “tools” in science class. At the end of the day, it’s all just data and how you store and manipulate it to get the results you need. Instead of using a calculator in physics, perhaps you could write a simple program to solve the problem.
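
To make that concrete, here’s the kind of “simple program” a student could write instead of reaching for the calculator. This is a generic projectile-range calculation, not something pulled from Kelly and Sean’s actual curriculum.

  # Generic kinematics example (not from Kelly and Sean's curriculum):
  # how far does a ball travel when launched at a given speed and angle?
  import math

  def projectile_range(speed_m_s, angle_deg, g=9.81):
      angle = math.radians(angle_deg)
      return speed_m_s ** 2 * math.sin(2 * angle) / g

  for angle in (15, 30, 45, 60):
      print(f"{angle:>2} degrees -> {projectile_range(20, angle):.1f} m")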

Kelly and Sean eventually developed a curriculum centered around Python. They don’t teach Python to their students the way you might normally learn computer science at university. At university and in bootcamps, you’re usually given the practical knowledge and skills to be proficient in solving problems. In middle school, students learn with their “entire being,” while adults learn the concrete things to get the job done. It’s all about making a connection with the students, according to Kelly and Sean.

How does data fit into math and science classes?

Both science and math classes involve collecting and analyzing a lot of data. But how is that data stored and interpreted? Kelly and Sean talk about how the only class that involves storing data in tables is in science class. In math and science classes, you might draw graphs on paper or on a graphing calculator. But how do you go from that paradigm of teaching to millions of rows of data stored in a database? In math class, it’s a bit tougher to integrate data subjects because the goal of the class is to eventually be good at calculus.

Teaching skills that are used in the workplace

One of my favorite Freakonomics episodes is #391 where Stephen Dubner talks to various experts and academics about the math curriculum taught at middle and high schools. The theme of the episode is that teachers are still teaching math like we are preparing students for going to the moon. In reality, students just need to learn how to use Excel, PowerPoint, and Google Sheets since these are the tools they would use every day in the workplace.

I used to lean more heavily on the side of teaching the practical skills in middle and high schools. From talking with Kelly and Sean, I’ve started shifting my position to somewhere in the middle. Sean talks about how you still need to have English, Art, and Social Studies to create a balanced student. Adults, on the other hand, are more pragmatic (hence the rise of these data science bootcamps).

New methods and strategies for teaching math in the 21st century

Kelly brought up a really neat website that teaches math and data science in a unique way. Stanford’s Graduate School of Education has created YouCubed, a collection of activities and tasks to help K-12 students learn data fundamentals. One Kindergarten exercise Kelly talked about is this Popular Fruits exercise showing fruits in different sized circles. This exercise aims to show the power of data visualization:

Source: Youcubed

I love how one of the questions in this exercise is “what do you wonder?” How often are we asked that anymore during our Zoom calls?

Coincidentally, Jo Boaler, one of the creators of YouCubed, is mentioned in the Freakonomics episode on changing the math curriculum in schools. She was part of the Math Wars in the early 2000s. This was a debate between reformists and traditionalists on how math should be taught in schools.

Source: Stanford Graduate School of Education

So, teaching is always very hard to change because people learn it from their own school days, and then they want to become the maths teacher they had. Well, maths teachers do anyway. And when people have tried to change, they’ve really received aggressive pushback, which has caused some of them to sort of withdraw and go back into teaching the way that they were.

Jo Boaler

Boaler goes on in the episode to discuss why the current “traditionalist” curriculum hasn’t caught up with the data and computing skills students need to learn to succeed:

When we look at the world out there and the jobs students are going to have, many students will be working with big data sets. So, we haven’t adapted to help students in the most important job many people will do, which is to work with data sets in different ways. So, statistics is really important, as a course, but is under-played. This is a fifth of the curriculum in England and has been for decades. But here in the U.S., it’s sort of a poor cousin to calculus.

Jo Boaler

The College Board and the AP Computer Science exam

We chatted a bit about the AP CS exam and how it’s not supposed to focus on a specific language, but rather the discipline of computer science. Today, the AP CS exam does require the knowledge of Java. When I took it, it was C++. Sean talked about how the discipline of computer science is very different from being able to solve problems with code or technology. If the goal is to test this specific skill, you would need open-ended questions on the exam and that would be too difficult to grade.

Students ask Sean and Kelly about how they can look more attractive to colleges and universities, and Sean and Kelly have to explain to students that they are more than just an AP or SAT score. Standardized tests are starting to change, but Sean has some suggestions on how students can showcase their talents beyond these scores. Building an online portfolio where you analyze data and present your insights shows the public what you are capable of. Since more than 80% of 4-year universities have dropped the SAT requirement, students should develop new ways to stand out beyond scores that don’t completely show what their potential could be.

Teaching Python on teaching Python

As I mentioned earlier, Kelly and Sean have a podcast called Teaching Python all about…teaching Python. Here’s a quick blurb about the podcast from their website:

A podcast by Kelly Paredes and Sean Tibor about their adventures teaching middle school computer science, problem-solving, handling failure, frustration, and victory through the lens of the Python programming language.

I asked why Python became the programming language of choice, and Sean described Python as being the “2nd best option” behind spreadsheets for data processing. Kelly and Sean started the podcast about 3 years ago, and their favorite episode is the first episode they ever recorded, aptly named Hello World.

Source: Hackernoon

There are many ways to learn Python, and what makes their podcast different is that they are teaching teachers how to teach Python. Some other episodes they mentioned that are worth listening to:

When I asked what their goals are with the podcast, Kelly summed it up very nicely:

I just want to meet amazing people around the world

Other Podcasts & Blog Posts

In the 2nd half of the episode, I talk about some episodes and blogs from other people I found interesting:

Dear Analyst #92: Generating insights from vehicle telemetry data and crafting a data strategy with Victor Rodrigues
https://www.thekeycuts.com/dear-analyst-92-generating-insights-from-vehicle-telemetry-data-and-crafting-a-data-strategy-with-victor-rodrigues/
Mon, 25 Apr 2022

Data can come from different places, and one area I don’t hear about too often is vehicles. Victor Rodrigues is from Brazil and transitioned into a career in data six years ago. Before that, he was working in various IT roles, including network and infrastructure administration. He eventually relocated to Dublin, working as a cloud specialist for Microsoft and helping organizations with digital transformation. We discuss a specific data project involving vehicle telemetry data, when data and the real world collide, and selling data solutions into the enterprise.

Pulling performance data about bus fleets and trucks

Who knew that cars generated so much data? With all the sensors and chips installed in cars these days, data is constantly being generated in real time. According to Statista, 25GB of data is generated every hour by modern connected cars:

One of Victor’s first roles in the data field was as a data engineer setting up data pipelines for collecting data from vehicles. The startup he worked at helped its customers collect telemetry data about their buses, trucks, and other vehicles. They would deliver insights from this data back to the customer. The North Star goal was to reduce the cost per kilometer by any means possible.

Vehicles have various apps and IoT devices collecting and producing data. The data collected would answer questions like: How long does a vehicle sit in traffic? How often is the engine running? Victor’s job involved building ETL/ELT pipelines to collect this raw data and transform it into a data model that could be used for analytics and reporting.
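
Victor didn’t walk through his pipeline code, but the transform step for telemetry like this usually boils down to rolling raw events up into per-vehicle metrics that feed a cost-per-kilometer number. A sketch with an invented event schema:

  # Invented event schema: roll raw telemetry events up into per-vehicle totals.
  from collections import defaultdict

  events = [
      {"vehicle": "bus-12", "km": 1.4, "idle_min": 0, "fuel_l": 0.6},
      {"vehicle": "bus-12", "km": 0.0, "idle_min": 7, "fuel_l": 0.1},
      {"vehicle": "bus-07", "km": 2.1, "idle_min": 0, "fuel_l": 0.9},
  ]

  totals = defaultdict(lambda: {"km": 0.0, "idle_min": 0, "fuel_l": 0.0})
  for e in events:
      t = totals[e["vehicle"]]
      t["km"] += e["km"]
      t["idle_min"] += e["idle_min"]
      t["fuel_l"] += e["fuel_l"]

  FUEL_PRICE_PER_L = 1.50  # made-up price
  for vehicle, t in totals.items():
      cost_per_km = (t["fuel_l"] * FUEL_PRICE_PER_L) / t["km"] if t["km"] else float("inf")
      print(vehicle, t, f"cost/km = {cost_per_km:.2f}")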

Don’t sleep on data strategy and architecture

Before Victor could get into the fun parts of analyzing the data, he had to build out a data strategy and architecture. This is the part where you have to decide which tool is best for a specific part of the data pipeline.

Do you go with Google Cloud or Microsoft Azure? What is the best architecture? What’s the most cost-effective solution? I remember when I was studying for the AWS Solutions Architect exam, I came across AWS’ Well-Architected Framework. These are playbooks for picking the right tools (within the AWS ecosystem) for various use cases and scenarios:

In setting the data strategy and architecture, the main variable that affected Victor’s decision was cost. His team first started with Google Cloud and piping data into BigQuery, Google Cloud’s main data warehouse solution. All of the big data warehouse tools allow you to trial the platform before throwing all your data in. He found that BigQuery was the most cost-effective solution, but ingesting data into Google Cloud wasn’t as smooth as with other cloud providers.

The ultimate architecture looked something like this:

  • Ingest billions of rows of data in Microsoft Azure
  • Pipe the data into Google Cloud Bigquery for data modeling and analytics
  • Use Tableau and PowerBI for data visualization
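
I don’t know the exact plumbing Victor’s team used between Azure and BigQuery, but once the modeled data is sitting in BigQuery, the analytics step can be as simple as running SQL through the Python client. The project, dataset, and column names below are placeholders.

  # Placeholder project/table/column names; assumes google-cloud-bigquery is
  # installed and credentials are already configured in the environment.
  from google.cloud import bigquery

  client = bigquery.Client()
  query = """
      SELECT vehicle_id,
             SUM(km_driven)   AS total_km,
             SUM(fuel_liters) AS total_fuel
      FROM `my-project.telemetry.daily_vehicle_metrics`
      GROUP BY vehicle_id
      ORDER BY total_km DESC
      LIMIT 10
  """
  for row in client.query(query).result():
      print(row.vehicle_id, row.total_km, row.total_fuel)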

Finding insights from the data and delivering impact

Victor had all this data streaming into his multi-cloud architecture, so what happens next? He helped figure out what KPIs to track and what insights his team would deliver to the customer. Here are a few insights Victor gleaned from the data and the recommendations they suggested to the customer.

1) Hitting the clutch too often

One of his customers managed a fleet of buses. Through the data, Victor found that certain bus drivers were pressing the clutch too often. This would lead to the clutch wearing out and ultimately hurting the engine on the bus, which means more costs for maintaining the buses. The simple recommendation was to press the clutch less often, perhaps in the form of new training for bus drivers.

Source: Road & Track

2) Running air conditioning while vehicle is idle

It gets pretty hot in Brazil. Victor found that delivery trucks would sometimes sit idle but the engine and air conditioning in the truck would still be running. This doesn’t seem too strange if a delivery person is making a quick delivery. They drive to their destination, get out of the truck, make the delivery, and get on with their way. It wouldn’t be out of the ordinary for the engine and AC to stay running for 5-10 minutes while these activities are happening.

What is out of the ordinary is the engine and AC running for an hour or more while the truck is idling. Turns out delivery drivers would stop for their lunch break and keep the AC running so that when they got back in their trucks, the cabin was nice and cool. Across a fleet of trucks, this behavior would obviously add to gas costs. The tough recommendation here is to deal with the heat when you return to the truck and keep the AC off while you’re at lunch.

Source: FieldVibe
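
Spotting that pattern in the data mostly comes down to flagging long stretches where the engine is on but the vehicle isn’t moving. A simplified version of that check (the field names and threshold are made up):

  # Simplified idle-detection check; field names and the threshold are made up.
  def flag_long_idles(segments, max_idle_min=15):
      """segments: list of dicts with engine_on, speed_kmh, and minutes fields."""
      return [
          seg for seg in segments
          if seg["engine_on"] and seg["speed_kmh"] == 0 and seg["minutes"] > max_idle_min
      ]

  segments = [
      {"truck": "T-4", "engine_on": True, "speed_kmh": 0, "minutes": 8},   # quick delivery
      {"truck": "T-4", "engine_on": True, "speed_kmh": 0, "minutes": 62},  # lunch, AC running
  ]
  print(flag_long_idles(segments))  # only the 62-minute stretch is flagged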

With both of these insights, Victor says it’s one thing to see the data in a data viz tool on your screen. It’s a whole other world when you comprehend the real-world impact and effects of the data. In the air conditioning example, the data shows the engine running for long periods of time. As an analyst, you have to be a detective and figure out what the real underlying cause is. I think it’s easy to forget that this data is usually driven by human beings taking normal human actions.

Creating and selling data solutions for organizations

At Microsoft, Victor’s role involves helping customers figure out what they can do with their data at lower costs. Like many who work in enterprise sales, customer discovery is the number one priority. The tools, platform, and technology are secondary.

Victor says what he sees in the field a lot are organizations that are simply collecting data, but not doing anything with the data. He meets with C-level executives who are very bullish on digital transformation at their respective organizations. The problem is that some of the people who join these organizations may not have come from a data background (similar to Victor). At the C-level, it’s a lot of education in terms of tying data solutions to business goals. While these conversations are happening, Victor also works “bottoms-up” by getting the organization’s developers on board.

Source: TechCrunch

As Victor reflects on his wide-ranging career in data, he offers some advice to those who are also thinking about transitioning to a career in data. You gain all this product experience in each organization you join, and it could culminate in being a data consultant for one of the big public cloud companies.

Building your data career through skills, certifications, and community

One thing I didn’t value before is the ability to sell something.

Getting a job in data is like any job. You’re going to be selling your skills (listen to episode #90 with Tyler Vu to learn more about this). As a potential shortcut, Victor suggests getting certifications. All the big cloud companies like AWS and Google offer industry-accepted certifications that show you have the fundamentals to get the job done. Furthermore, Victor suggests participating in the various communities behind the tools and platforms you’re learning like Python and SQL.

Speaking from experience, the Excel community is full of intelligent, helpful, and creative people. I started blogging about Excel almost 10 years ago because I saw others doing it in the community. One of my favorite blog posts from the archives is my recap of the 2013 Modeloff competition where the Excel community came together to solve Excel riddles. This image with Mr. Excel always brings me joy:

Bill Jelen (Mr. Excel) kicking off a head-to-head competition at Modeloff 2013.

What’s coming up next for Microsoft

I’ve been pretty impressed with the pace of updates from the Excel team. You’ll generally see updates every few weeks. For instance, SPLIT() is a function that was available in Google Sheets for a while. Excel launched their own version called TEXTSPLIT() last month (among other updates). I asked Victor what new shiny toys are coming out on the Azure side, and he talked about data mesh, HoloLens, metaverse apps, 5G, and more.

Other Podcasts & Blog Posts

No other podcasts mentioned in this episode!

Dear Analyst #91: Growing Peloton’s product analytics team and growth funnel experimentation at Superhuman with Elena Dyachkova
https://www.thekeycuts.com/growing-pelotons-product-analytics-team-and-growth-funnel-experimentation-at-superhuman-with-elena-dyachkova/
Mon, 18 Apr 2022

I first heard Elena speak on another podcast and was shocked to hear an analyst talk about one of the biggest companies to emerge during the pandemic: Peloton. Someone from the inside, as it were, was talking about topics that Peloton would likely want to keep confidential. Due to PR and the restrictions that come with an NDA, Elena couldn't come on the podcast to talk about the data projects she was working on last year. A few months ago, Elena became a principal data scientist at Superhuman, and is now able to share a little more about her experience at Peloton. As an avid user of Peloton's bike and app, I was extremely excited to dig into the types of projects Elena was working on that shaped Peloton's product roadmap. We dive into the world of product analytics at Elena's former employer and at her current gig at Superhuman.

Joining Peloton as the first product analyst in 2018

Hard to believe, but Peloton was just a bike company when Elena first joined. Peloton was working on the Tread product at the time, but it hadn't been released yet. The product team wanted to make more data-driven decisions to help inform what product features to build next. There was an existing business intelligence team that was building reports around sales and marketing campaigns, but there were no product analysts to help guide the product roadmap. Elena was hired in 2018 to help build a practice around product analytics.

Source: Buzzfeed

Surprisingly, Elena didn’t really have direct product analytics experience. In her previous roles, she was more of a product owner and did things more akin to a product manager. It was an opportunity for Elena to define the purpose of the product analytics team and how they interacted with the rest of the business.

Answering data questions from the business

As one might imagine, the product team at Peloton is always asking what features might increase repeat engagement on the app. It was Elena's job to answer these types of questions. Like at many organizations, the analytics function gets inundated with questions, and Elena found herself with a backlog of important questions to answer.

There are so many great quotes and gifs from Peloton to include in this post.

As Peloton grew, so did the number of questions. Elena started hiring a team and led the process for how questions get asked, who gets to ask them, and how the product analytics team engages with internal stakeholders. Elena eventually grew the product analytics team to 14. As the team grew, Elena, as the product analytics manager, also had to maintain a ratio of product managers to analysts.

Elena mentioned a few resources that helped her with learning about product analytics, engagement, and user activation:

  • Sequoia Capital's "Building Data-Informed Products" Medium posts - Learn about growth accounting, stickiness metrics, and how to build a data team
  • Amplitude's Blog - Amplitude also happens to be the main product analytics vendor at Peloton
  • Reforge's Product Management courses - Paid course used by a lot of Silicon Valley product managers

Establishing cross-functional KPIs and metrics definitions

In addition to getting asked data questions, Elena’s team was in charge of building dashboards showing stats about product usage and metrics. This series of events might also sound familiar to many of you who spend your days building dashboards in Mode, Looker, or Amplitude:

  1. Executives ask you to create a dashboard looking at 20 different metrics
  2. You and your team go off and build the dashboard
  3. You present the nice shiny dashboard to the executives
  4. Executives say they would like to see the dashboard cut and sliced in another 20 different ways
  5. Repeat steps 1-4 every few months

The above process is obviously not sustainable.

It takes a special type of person to step back and detach from the everyday humdrum of data requests and re-define what the product analytics team does. During this planning phase, this person also needs to take on the monumental task of getting alignment on metrics definitions.

Elena's strategy was to tie the KPIs to Peloton's product strategy. Feels like common sense for a company that produces hardware and software products, right? But this goes deeper than answering the question: what's on our product roadmap for next quarter?

The bigger questions to answer include: What is the mission of our product? What are our users' beliefs about our products? We're getting into the realm of user research and feedback. But if there are KPIs to be established, you best believe that your users should be a big factor.

Breaking down Peloton’s metrics and KPIs

Elena’s team utilized the North Star framework in order to create KPIs everyone could agree on:

The North Star Framework is a model for managing products by identifying a single, crucial metric (the North Star Metric) that, according to Sean Ellis “best captures the core value that your product delivers to [its] customers.”

Amplitude Blog
Source: Amplitude Blog

What I like about this framework is that it combines both leading and lagging indicators to help your team figure out if you are driving customer value and moving toward your team's North Star.
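
As a toy illustration of the leading/lagging idea, a North Star metric is usually just a simple roll-up of a few input metrics the team can actually move. The metric names and numbers below are invented, not Peloton's:

```python
# Invented metric names and numbers -- just to show how leading input metrics
# roll up into a single North Star figure (revenue/retention being the lagging side).
input_metrics = {
    "new_members_activated_this_week": 12_000,
    "existing_members_who_worked_out": 310_000,
    "avg_workouts_per_active_member": 4.2,
}

weekly_member_workouts = (
    input_metrics["new_members_activated_this_week"]
    + input_metrics["existing_members_who_worked_out"]
) * input_metrics["avg_workouts_per_active_member"]

print(f"North Star (weekly member workouts): {weekly_member_workouts:,.0f}")
```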

Elena put together a cross-functional team consisting of engineering, product, and design leaders. During this phase, no dashboards were shown. The goal was to simply brainstorm on what these input metrics might be.

For instance, Peloton's regular paid users would be a leading indicator for revenue in this model. As stated earlier, other input metrics relate to how you might keep users engaged with the bike and app. Other metrics Elena's team looked at:

  • User acquisition
  • Growth funnels
  • Free to paid user conversion rate

I discovered a few things about the world of product analytics from talking with Elena:

  1. The metrics may not necessarily directly lead to dollars. The KPIs are pointing at the North Star of increasing user engagement.
  2. KPIs are different from OKRs. KPIs are “health” metrics for how the product is doing. OKRs are quantifiable goals based on the company’s big strategic bets.

Doing data exploration on non-bike exercises or fitness disciplines

For those new to Peloton, there are many exercise modalities other than biking. Running, strength, and meditation are just a few of the other exercise "disciplines" Peloton offers to its members. It was interesting hearing Elena use the word "discipline" since I happened to come across this word when doing my own data exploration of Peloton's API.

Peloton’s upcoming Guide camera system. Source: The Verge

One data exploration Elena’s team did was looking into exercise variety. The question to answer:

Does pushing notifications of other types of fitness disciplines to the member result in higher retention?

If you’re someone who always uses the bike or only does strength classes, would being suggested a running exercise lead you to come back to the app more? Seems like an important question to answer since it leads directly to Peloton’s North Star metric. It also seems important given Peloton’s investment in the content and instructors for all these other fitness disciplines.

The result? Suggesting a second fitness discipline did, in fact, help increase retention. But suggesting that the member do a different fitness discipline every day was not beneficial for driving retention.
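
For the curious, an experiment readout like this often boils down to comparing retention rates between a test and a control group. Here's a hedged sketch with completely made-up numbers; this is not Peloton's data or necessarily their method:

```python
# Fabricated counts for illustration only.
from statsmodels.stats.proportion import proportions_ztest

retained = [4_350, 4_010]    # still active after 30 days: [suggested 2nd discipline, control]
exposed = [10_000, 10_000]   # members in each group

z_stat, p_value = proportions_ztest(count=retained, nobs=exposed)
lift = retained[0] / exposed[0] - retained[1] / exposed[1]
print(f"lift = {lift:.1%}, z = {z_stat:.2f}, p = {p_value:.4f}")
```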

For all you active Peloton members out there, I'm sure you have a bunch of other ideas Peloton could try for improving user retention and overall stickiness. As armchair analysts, we can only come up with hypotheses. But Elena's former team (and the broader product team) can conduct tests on Peloton's millions of members and get quantifiable results. It's like having the power to change the font or color on the Google search page to see if it leads to more searches and clicks.

The grand-daddy tool for dashboarding: Google Sheets

As we were discussing the culmination of Elena's KPI and metrics definitions, Elena brought up that Google Sheets was the tool of choice for data visualization. You'd think with all the advanced data viz tooling out there that Peloton might use some fancy vendor for this cross-functional KPI dashboard. Spreadsheets will never die. I had to dig deeper into this fancy dashboard in Google Sheets.

I'm always fascinated by large companies that continue to use spreadsheets for non-accounting/finance purposes. Elena is a big fan of the customization of Google Sheets (as am I). She was using the Looker API to push data into Google Sheets. The dashboard also had some basic week-over-week changes, conditional formatting, and sparklines. At the end of the day, Elena felt the Google Sheets dashboard was easier to share and more customizable than Peloton's internal BI tool.

The reason for deferring to Google Sheets was also a resourcing issue. Elena's team didn't have the data engineering help to build out the backend and dashboard in their BI tool. From a permissions perspective, Google Sheets is also "unbreakable" since you can have the source data in one Google Sheet and use the IMPORTRANGE() function to import that data into another Google Sheet. Your colleagues can mess up the data in that target Google Sheet, but the raw source data is left unscathed.
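
If you're wondering what "pushing data into Google Sheets" looks like in practice, here's a minimal sketch using the gspread library. The spreadsheet name, worksheet name, and credentials file are placeholders, and in Elena's setup the rows would come from a Looker API call rather than being hard-coded:

```python
# Placeholder names throughout; not Peloton's actual setup.
import gspread

gc = gspread.service_account(filename="service_account.json")  # Google service account creds
sh = gc.open("KPI Dashboard - Source Data")
ws = sh.worksheet("raw")

rows = [
    ["week", "metric", "value"],
    ["2022-04-11", "weekly_workouts", 1234567],
]
# Keyword arguments avoid the positional-order change between gspread versions.
ws.update(values=rows, range_name="A1")
```

A separate, shared Google Sheet can then pull from this source tab with IMPORTRANGE(), which is the "unbreakable" pattern described above.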

Onboarding flows and growth funnels at Superhuman

Elena left Peloton earlier this year after being inspired by some of her direct reports. She saw all the analytical tools and techniques her reports were working with, like Bayesian theory and causal inference, and wanted to become an individual contributor again. She started learning tools and frameworks like dbt, and eventually became a principal analyst at Superhuman.

Source: The Verge

Superhuman is a paid email app known for its keyboard shortcuts. It's also notorious for its onboarding process, where you have to do a 30-minute "coaching session" in order to start using the app.

Unlike Peloton, Superhuman doesn't have all the self-serve analytics tools for internal stakeholders. This means the marketing and support teams are asking Elena for custom analysis and reports. Internally, Superhuman is using Metabase, which isn't as self-serve as other analytics tools.

The North Star metric at Superhuman? A smooth and delightful experience during the onboarding process. These days, Elena does a lot of SQL writing and experimentation on the onboarding specialist funnel. They tested removing a long survey that the user receives during the onboarding process since it didn't add to a smooth and delightful experience.
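
To give a flavor of that funnel work, here's an illustrative calculation in pandas (a stand-in for the SQL Elena actually writes); the step names and counts are made up:

```python
# Made-up funnel steps and counts, purely for illustration.
import pandas as pd

funnel = pd.DataFrame({
    "step": ["signed_up", "booked_onboarding_call", "attended_call", "sent_first_email"],
    "users": [10_000, 6_200, 5_400, 4_900],
})
funnel["conv_from_prev_step"] = funnel["users"] / funnel["users"].shift(1)  # NaN for the first step
funnel["conv_from_top"] = funnel["users"] / funnel["users"].iloc[0]
print(funnel)
```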

Advice for aspiring data analysts

We ended the conversation with advice for new data analysts and what Elena looks for in resumes. Elena said this is the number one skill a data analyst should have:

Be able to take a vague problem statement and break it down into metrics and inputs.

Other soft skills include curiosity and tenacity. I think over-indexing on the soft skills will help you long-term. Most people who want to get into a data role can take classes or training programs to get better at Excel, SQL, and Python. Being able to tell a story about the data and constantly ask questions to find the truth are tougher skills to acquire.

Source: Y&L Consulting

Elena also brought up an important distinction between small startups and large companies (something Santiago Viquez also brought up in episode #88). At early-stage companies, having the technical skills will be more important since there most likely won't be many learning opportunities internally. You'll be expected to hit the ground running. You'll be a generalist, defining your own problems, gathering the data, and rounding up the stakeholders.

At a large company, you’ll have tons of mentorship opportunities and chances to learn. When I started off as a financial analyst at a big company, we went through the same 10-day Excel training that consultants and bankers receive when they start their new careers.

Other Podcasts & Blog Posts

No other podcasts mentioned in this episode!


Dear Analyst #90: Biostatistics, public health, and the #1 strategy to land a job in data with Tyler Vu (Mon, 11 Apr 2022) https://www.thekeycuts.com/dear-analyst-90-biostatistics-public-health-and-the-1-strategy-to-land-a-job-in-data-with-tyler-vu/


You go to a family gathering and everyone is fawning over your cousin who has a cushy stats job at Harvard. Knowing your cousin, you think to yourself: if my cousin can do it, so can I. Next thing you know, you are a research fellow at Harvard University. Tyler Vu was studying applied math at Cal State Fullerton and didn't realize he had a passion for biostatistics until his fellowship at Harvard. He is currently getting his PhD in Biostatistics at UCSD and is the youngest person to ever pursue a PhD in Biostats at UCSD. In this episode we talk about doing network analysis for the public health sector, facial/voice recognition, and the #1 strategy Tyler thinks everyone should use to land their next job or internship in data.

Predicting HIV rates when you are missing data

As a neophyte to the data science and machine learning space, I found Tyler veering into concepts that were quite foreign to me as he discussed his current PhD thesis. His thesis involves analyzing social networks, within the context of public health, knowing that there's a lot of missing data. We talk about why estimating the HIV rate from a sample is different from other metrics you could get from a sample.

For instance, if you want to get the average height of people in the U.S., you pick a random sample of people, find the average height, and extrapolate this to the rest of the population (roughly). This is a straightforward analysis since each person's height is independent of everyone else's.
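
Here's a quick sketch of why the height example is the "easy" case: with independent observations, the sample mean plus a standard error gets you a decent estimate. The data below is synthetic, obviously:

```python
# Synthetic population; the point is the independence assumption, not the numbers.
import numpy as np

rng = np.random.default_rng(42)
population = rng.normal(loc=170, scale=10, size=1_000_000)  # heights in cm

sample = rng.choice(population, size=500, replace=False)
mean = sample.mean()
std_err = sample.std(ddof=1) / np.sqrt(len(sample))
print(f"Estimated mean height: {mean:.1f} cm ± {1.96 * std_err:.1f} cm (95% CI)")
```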

In the case of public health, people are connected via social networks. With HIV, predicting whether someone tests positive or negative depends on the people you are connected with and whether those people have tested positive or negative. In this type of analysis there's a lot of bias and "non-parametric estimation of network properties," according to Tyler. I'm not even going to pretend I know what these terms mean. There's actually very little published work on this subject, so Tyler's thesis would add a lot to the current research.

Source: Alteryx community

Training a voice and face machine learning model

Tyler has a history of working on one-of-a-kind projects. During his undergrad years, he worked on a project that combined face and voice recognition. Kind of like having a double authenticator system if you wanted to unlock an iPhone, for instance. Since you’re combining both image and voice features to train a model, it creates a “highly dimensional problem.”

Tyler helped with coding the project all in MATLAB. Given the tools and frameworks available, Tyler was pleasantly surprised by the speed with which they were able to go from hypothesis to working app on this project.

Predicting “fragile” countries

During Tyler’s research at Harvard, he worked on a project to help predict which countries will become “fragile.” This is the definition of a “fragile state” according to the United States Institute of Peace:

Each fragile state is fragile in its own way, but they all face significant governance and economic challenges. In fragile states, governments lack legitimacy in the eyes of citizens, and institutions struggle or fail to provide basic public goods—security, justice, and rudimentary services—and to manage political conflicts peacefully. 

The project's aim was basically to predict which countries might become fragile so that governments could better plan for these issues in the future.

Tyler's project involved using the super learner machine learning method created by Mark J. van der Laan. Eventually his team took an Occam's Razor approach to finding a model that would help them predict future fragile states. The winning model was a simple classification tree, which had a 90% test accuracy.
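
For reference, a "simple classification tree" is only a few lines of scikit-learn. This sketch trains on synthetic data, not the actual fragile-states dataset, so it only shows the shape of the approach:

```python
# Synthetic data stand-in; not the real fragile-states features or labels.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

tree = DecisionTreeClassifier(max_depth=3, random_state=0)  # shallow = interpretable
tree.fit(X_train, y_train)
print(f"Test accuracy: {tree.score(X_test, y_test):.2f}")
```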

Tyler brought up an interesting point about simplicity and machine learning models. Usually the model will be super simple if the data was well collected and accurate. In the absence of good data, Tyler says this is where you start doing the more advanced neural network type of analysis.

The #1 strategy to get your next internship, job, or grad school program

We shifted the conversation from data and machine learning to landing a job in data. Tyler had a lot to say on this subject and I think any aspiring data analysts and data scientists could learn a thing or two from Tyler’s strategy.

Tyler describes the current state of affairs: blind resume submissions. Recruiters have to sift through hundreds of resumes for popular internships and jobs, and the only way you can stand out during this screening phase is:

  1. You went to an Ivy League school or
  2. You interned or worked at a FANG company (Facebook, Apple, Netflix, Google)
Source: George Pipis, Medium

Tyler says the job search is less about finding a job and more about standing out. The question is: how far are you willing to go to stand out?

Tyler didn't go to an Ivy League school when applying for his current program and also didn't have big tech experience. He says the way any applicant can stand out is by finding the email of the hiring manager and sending them a cold email indicating why you would be a good candidate for the position. Here's an interesting (although somewhat devious) strategy to find the correct email address for the hiring manager, according to Tyler:

  1. Create a fake resume that has the top credentials like attended Harvard and was a SWE at Google
  2. Submit that fake resume to the job or internship you’re interested in
  3. This fake resume will most likely get past the recruiter screen so you get an email from the recruiter or hiring manager on next steps
  4. You then email the hiring manager from your real email with a personalized message

Can’t knock the hustle!

Building a sales agency on the side

As if doing a PhD in Biostatistics doesn’t keep Tyler busy enough, he also found time to start a side business helping marketing agencies close deals. This came out of left field but further shows how scrappy Tyler is when it comes to executing on an idea.

Tyler was scrolling through Twitter and saw some tweets from people talking about making money from remote sales work. This is not your ordinary insurance or car sales type of work. This is the type of sale where a company is trying to land a new client who will spend thousands of dollars on services. When marketing agencies and coaches need help closing new clients, they use Tyler's network of sales people to seal the deal.

We ended the conversation talking about Sarah Silverman and shooting your shot, whether that's playing pickup basketball or doing standup comedy. From getting into a PhD program at UCSD to strategies to land your dream internship, it's clear Tyler believes in shooting shots.

Other Podcasts & Blog Posts

In the 2nd half of the episode, I talk about some episodes and blogs from other people I found interesting:


Dear Analyst #89: Leading high performing data teams and deciphering the data stack with David Jayatillake (Mon, 04 Apr 2022) https://www.thekeycuts.com/dear-analyst-89-leading-high-performing-data-teams-and-deciphering-the-data-stack-with-david-jayatillake/


In most episodes I have the privilege of speaking with analysts who are in the trenches using tools and doing analyses. In this episode, we look at the role of data from a manager/director of data's perspective. David Jayatillake is currently the Chief Product & Strategy Officer at Avora, an augmented analytics solution that helps companies make better decisions. Not to say David doesn't get into the weeds when he has to (his GitHub profile tells that story), but it was interesting hearing how to build high-performing data teams. We get into topics relating to data strategy, the bundling and unbundling of data tools, and ways to be an effective manager.

Moving from data to product

For most of David’s career, he’s led business intelligence and analytics teams in various industries including retail, credit, and e-commerce. That’s why I thought it was interesting when he moved into a product role at Avora.

As a leader of data teams, David was using various data tools to analyze, store, and transform data. Over time, you might start to see some of the deficiencies in these tools, and even have ideas for new product features. As an end user of some of these tools in my organization, I’m quite familiar with the little quirks of these tools that just make you shrug a little bit when you have to open the tool up in your browser.

David had been advising Avora on building various product features before moving full-time into the Chief Product Officer role at the company. In terms of transitioning to a product role, David had various PMs he could observe to learn the rituals of leading a product team, like running a standup and creating a product roadmap.

Source: Product Coalition

The great bundling debate

A few months ago, David wrote this blog post about the bundling and unbundling of tools for the data engineering profession. For those new to this world of data engineering (like me), this analogy from the Analytics Engineering Roundup sums it up nicely:

Unbundling means taking a platform that aggregates many different verticals, such as Craigslist, and splitting off one or more services into stand-alone businesses that are more focused on the experience of that niche customer base. Sometimes, as was the case with Airbnb, that experience is so much better it eclipses the market cap of the original business.

Anna Filippova, Analytics Engineering Roundup

Taking this concept to the modern data stack, David’s blog post discusses the unbundling of Airflow into many off-the-shelf and open-source tools. I’ve never worked in a data engineering role and was worried that this part of the discussion would go way over my head. I hear about ETL, streaming, and automation from being in the space but have only lightly touched some of these tools in this data stack:

Source: Analytics Engineering Roundup

David’s blog post also discusses the role the data engineer plays in a world of unbundled software:

For most companies, the cost of hiring and keeping data engineers working on pipelines that could be built with specialist “unbundled” tooling is unwarranted. The real value is in the next steps of data-enabled decision making, whether human or automated: Is this sneaker actually the same as this other sneaker? Should we lend to this customer? Should we deploy this product feature? Should we spend more on this marketing channel than the next? The curation of clean incrementally loaded data into well-defined data models and feature stores, to enable this decision making, is still a key value of analytics engineering that is not easily automated.

Cloud vendors to the rescue?

David recently spoke on the Analytics Engineering podcast with Benn Stancil, founder at Mode (listen to episode #71 with Benn), about the bundling vs. unbundling debate. He brought up companies like Oracle, which were these all-in-one platforms for enterprises that weren't necessarily that great for analytics teams.

Public cloud companies like AWS, Azure, and Google Cloud are kind of becoming these all-in-one platforms that interact with different parts of the data stack. These cloud vendors have data warehouse solutions that solve the most common use cases for companies, but they are still missing market-leading data tools around ETL, data transformation, and business intelligence. For instance, if you've ever used AWS QuickSight or Google Data Studio, they functionally get the job done but are not best in class, according to David. Data teams have to adapt to changes in their businesses, so newer needs around observability and data cataloguing have emerged, along with complementary tools to serve these needs.

As I mentioned earlier, I’m a neophyte to this world of data engineering tools. Like many of you, I’m part of the group that utilizes the tools built by data engineers to get our work done. To learn more about this big and beautiful world of data engineering and data pipelines, I’d recommend listening to episode 34 with Priyanka Somrah of Work-Bench (she also runs a great newsletter called The Data Source) and episode 58 with Krisha Naidu, a data engineer at Canva.

Unbundling the manager

We moved from data tools to David's ideas on how to be a great manager. Consider this structure, which is common at most hierarchical organizations: the line manager acts as the project manager to figure out what work needs to be done by the team. These managers are also responsible for the career development and mentoring of their direct reports.

These days, many organizations have already stripped the “project manager” role from the line manager’s responsibilities. The product manager, for instance, is in charge of product prioritization. People managers are still responsible for career coaching and mentorship. David sees value in splitting these responsibilities apart so that you have people who specialize in your career development and people who specialize in mentorship:

It allows individual contributors who don’t want to do career coaching to invest in other colleagues via mentoring. They often do this regardless, but it recognises the value of it and also allows access to more of these individual contributors for mentoring roles. It allows career coaching to be taken on by non-technical staff when engineering resource is scarce and expensive. Often HR specialists could do this role really well, as they are trained in how employee life cycle processes should happen: 360 feedback, career ladders, pay review, promotion, probation, redundancy, learning… it’s a long list!

Specialized data analyst roles

We spoke about the rise of the analytics engineer role in the last few years and what future roles today’s data and business analysts should consider. From David’s perspective, analysts are becoming more specialized. When I first started as a financial analyst, just knowing Excel meant you were “good enough” to be an analyst in HR, sales, and other departments.

Today, you need to understand the tools and business logic for these different departments. That's why you'll see specific skill sets required for HR data analysts who can crank through a river of employee data, and for sales analysts who know how to work with Salesforce and help set quotas for a sales team.

Data analyst career path (Source: CareerFoundry)

During David's time at Lyst, there were teams focused on building out Lyst's checkout product. There were data analytics engineers focused on serving just this specific product at Lyst. You have product analysts who know how to use tools like Amplitude and Mixpanel, and then analysts on the marketing side who specialize in using Google's various ad tools.

Learning through writing

David started writing his newsletter after Benn Stancil inspired a group of people to start writing at the end of 2021. David's newsletter was originally going to be about career transitions as he moved from organization to organization. He realized he has benefited greatly from being in the data space, and his newsletter is a way to share his data leadership experience with the broader community. Like him, I believe publishing a newsletter or creating a podcast helps with organizing your thoughts and learning about new and unfamiliar topics (as this episode clearly shows). Check out David's response to this post about people with a side business:

Other Podcasts & Blog Posts

No other podcasts mentioned on this episode!


Dear Analyst #88: How to learn data science and machine learning from scratch with Santiago Viquez (Mon, 21 Mar 2022) https://www.thekeycuts.com/dear-analyst-88-how-to-learn-data-science-and-machine-learning-from-scratch-with-santiago-viquez/


Companies are generating more big data these days, so dumping the data into a CSV for analysis just doesn’t cut it anymore. Sure you could use Power Query or Power BI, but more analysts are turning to Python and platforms built for big data processing. The next step is to use machine learning to help predict what the future might look like. Santiago Viquez is currently a data analytics mentor at Springboard, an education platform helping students prepare for new careers. On the side, Santiago has built a ton of cool projects related to data science, natural language processing, and more. In this conversation we dig into how Santiago learned data science from scratch during the pandemic, and how he thinks analysts should learn data science.

Santiago Viquez

Started at the bottom now we’re at a multinational corporation

Santiago studied physics in Costa Rica, but realized he didn’t want to pursue a career in physics. After doing some research, he realized a career in data analytics and data science would be more suitable. Having known a little bit of Python, he started applying to a few positions and eventually got his data analytics career started as an intern at a small startup. His internship turned into a full-time role as a data analyst which he kept for two years.

Santiago left the startup and went in the complete opposite direction in terms of company size. He had roles in data analysis and data science at large corporations like Walmart and UPS working remotely the entire time. During his time at Walmart, he started working part-time at Springboard helping students land careers in data analytics.

The experience working at a startup versus a large company is night and day. We’ve seen stories of people like Preksha in episode 85 and Lauren in episode 64 make completely new transitions to a career in data. But we don’t hear about the data analytics professional who moves from startup to large company too often.

One example Santiago brought up is how corporations frame problems. You typically have clear success metrics, KPIs, stakeholders, and data sources to work with. At a startup, you are defining the problem by yourself. It's just you. You're in charge of collecting the data sources, providing analyses to key stakeholders, and owning the entire model or analysis end-to-end.

Reducing food waste for restaurants in Costa Rica with data science

When Santiago was a consultant, he was helping a big restaurant group in Costa Rica figure out ways to reduce food waste. The restaurant group consisted of 30-40 restaurants (which is big for Costa Rica). Each restaurant had its own manager, and each manager would request food from various suppliers. The problem was that some managers were good at forecasting how much food they would need for the next 10-15 days; others were not so good.

Santiago’s goal was to create a tool that would help each manager predict how much food to order from the suppliers. The first phase of the project was gathering data. In this case, Santiago had to get the recipes from each restaurant manager. These recipes were then joined with each restaurant’s sales data to see the volume of ingredients required.

The interesting thing is that each recipe had to be broken down to its most granular ingredients. If it was a taco recipe, this meant getting tortillas. In order to make tortillas, you need flour. So the ingredient to procure from the supplier would be flour.
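
The join Santiago describes is essentially a recipe "explosion." Here's a small pandas sketch with invented recipes and sales figures to show the idea:

```python
# Invented recipes and sales figures; only the join/aggregation pattern matters.
import pandas as pd

recipes = pd.DataFrame({
    "menu_item": ["taco", "taco", "quesadilla"],
    "ingredient": ["flour", "beef", "flour"],
    "kg_per_serving": [0.05, 0.10, 0.08],
})
sales = pd.DataFrame({
    "restaurant": ["r1", "r1", "r2"],
    "menu_item": ["taco", "quesadilla", "taco"],
    "servings_sold": [400, 150, 300],
})

demand = sales.merge(recipes, on="menu_item")
demand["ingredient_kg"] = demand["servings_sold"] * demand["kg_per_serving"]
print(demand.groupby(["restaurant", "ingredient"])["ingredient_kg"].sum())
```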

Source: Eater

After Santiago collected the data, the fun part came. He modeled and forecasted which ingredients were essential to the recipes. There are many other variables that impact how much raw ingredient to order, like how long the ingredient can sit on the shelf before it goes bad. His team would just find information on the web to see how long the shelf life was for a certain ingredient.

At the end of the day, he set up benchmarks for each restaurant, and they were able to reduce food waste by 15-20% per restaurant.

Tips on how to learn data science if he were to do it all over again

Santiago wrote this awesome blog post right when the pandemic hit. It's all about how he would learn data science from scratch if he were to do it all over again. The reason he wrote the post was that he was isolated in his house and just got to thinking: I got into the data science field kind of randomly. I gained most of my skills on the job. What would I have done differently to learn data science?

The way I like to learn is by doing.

Santiago likes to start at high-level concepts and then get deeper into specific topics. He might start with watching a YouTube video on neural networks instead of trying to learn a neural network model right away. The YouTube videos and blog posts would spark his curiosity to want to dig deeper into a topic.

Source: Medium

Here’s a step-by-step on the tools and skills he would learn for aspiring data scientists:

  1. Learn Python through online courses or through Kaggle
  2. Learn data viz tools. People forget this is an important skill and just want to go straight into modeling stuff.
  3. Start implementing models with libraries like scikit-learn
  4. Try your hand at Kaggle competitions
  5. Go deeper into neural networks and more advanced topics

I would start with learning Python through courses or Kaggle. Then I’d learn how to visualize things. A lot of people forget this step and just want to model stuff. After you know the basics of Python and visualization, I’d start learning about implementing models like scikit-learn. Then you move onto Kaggle competitions.

I’d highly recommend reading Santiago’s full blog post if you’re interested in learning data science from scratch. It’s Santiago’s most popular blog post by orders of magnitude. More than 200,000 people have read the blog post. After he published the post, famous YouTubers started creating videos similar to Santiago’s post.

I love posts like this because they let you learn a new topic from someone who has already made the mistakes, instead of having to make them all yourself.

Building a data science trivia game to help you prep for data science interviews

Santiago has always wanted to create a physical card game. Instead of making a physical game, he created a data science trivia game to help people prepare for data science interviews.

Source: datasciencetrivia.com

The way Santiago built the game is pretty interesting. He collected 200 questions from people in the RStudio community, the Apple community, and other online communities. He also reached out to Kaggle, who sent him a bunch of great interview questions. His wife designed the cards, from the colors to the typography, and did all this in Figma. He put all the questions in a Google Sheet. There happens to be a Figma-Google Sheets plugin that lets you sync data from Google Sheets to your designs in Figma.

He put the game up on Gumroad and to date has made 500 sales. Santiago believes the success of the game was due to the communities he worked with to get the questions, testimonials from customers, and building his game in public. It was the first time Santiago promoted his own project and got involved with different communities, instead of just being a participant or viewer from the sidelines.

Create your own Harry Potter fan fiction with a bot

One last project Santiago built on the side was a Harry Potter story generator using machine learning and natural language processing. Santiago has been a fan of the Harry Potter series since university. Before he knew about data analysis or machine learning, he'd read stories about how people would teach a bot to write a fictional story for famous books or TV shows like Game of Thrones. With his new data science skills, he wanted to do the same thing for Harry Potter.

Source: The Indian Express

The project involves getting all the text from every Harry Potter book. This text then feeds into a neural network. He then used a platform called Streamlit (an open source framework for building and sharing data apps) to build the actual "data app."

On the app, you might say you want your new story to include Dumbledore and that the "temperature" of the story should be "normal" or "weird." The "temperature" is a scale for how close the story would feel to the actual Harry Potter series versus something more outlandish.
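
If "temperature" is new to you, here's a rough illustration of what it does when sampling the next word. This isn't Santiago's actual code, just the general idea, and the logits are made up:

```python
# Toy example of temperature sampling; the logits are invented.
import numpy as np

def sample_next_word(logits: np.ndarray, temperature: float) -> int:
    """Higher temperature flattens the distribution, producing 'weirder' choices."""
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    return int(np.random.default_rng().choice(len(probs), p=probs))

logits = np.array([2.0, 1.0, 0.2, -1.0])           # model scores for 4 candidate words
print(sample_next_word(logits, temperature=0.7))   # "normal": sticks close to the books
print(sample_next_word(logits, temperature=1.5))   # "weird": more surprising picks
```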

Source: Harry Potter And The Deep Learning Experiment

Building and learning in public

I’ve talked about building and learning in public in a variety of episodes, and it was awesome hearing Santiago share how it has impacted his side projects. He went from seeing others build in public to actively participating in the movement. I’d say KP from On Deck coined the term a few years ago.

Source: buildinpublic.xyz

When Santiago was building his Harry Potter story generator, he already knew how to use Streamlit and had some basic experience with neural networks through online classes. But the online classes didn’t compare to the experience he gained from applying the skills to a real side project.

Similar to his project reducing food waste, he had to set his own project metrics, define data sources, and more importantly, figure out how to get his project seen. This is where the actual learning happens. You get error messages you’ve never seen, you Google stuff, and read a lot of Stack Overflow. The next time you come across these errors, however, you’ll have the experience of knowing how to handle the error or know that you came across a Stack Overflow post on how to solve it.

If you are learning, learn in public. Talk about the stuff you're learning because it will not only help you, but also help others who are looking to learn the same thing.

Other Podcasts & Blog Posts

No other blog posts/podcasts mentioned in this episode!


Dear Analyst #87: What we can learn about Enron’s downfall from their internal spreadsheet errors (Mon, 14 Mar 2022) https://www.thekeycuts.com/dear-analyst-87-what-we-can-learn-about-enrons-downfall-from-their-internal-spreadsheet-errors/


Everyone is probably familiar with the 600,000 Enron emails released after the company's scandal right at the turn of the century. A lot of different analyses were done on those emails, but there's one interesting analysis that I didn't see until recently: the emails with spreadsheets as attachments. Felienne Hermans, a computer scientist at Delft University of Technology in the Netherlands, scoured all the emails that had Excel spreadsheets attached to them and analyzed 15,000 internal Enron spreadsheets to see what patterns existed in the models, formulas, and yes, the errors. After reading her paper, my opinion is that Enron's spreadsheet errors were not unique to Enron, but could happen at any large company.

24% of Enron’s spreadsheets contain errors

One of the key takeaways from Hermans’ paper that has been cited elsewhere is that 24% of the spreadsheets she analyzed contained an error. These are spreadsheets with a “runtime” error, that is, the errors you typically see when you divide by 0 or make a mistake in your formula. Here’s a table from the report showing how many of these errors appear in these spreadsheets:

The sheer number of errors is startling:

In total, we have found 2,205 spreadsheets that contained at least one Excel error, which amounts to 24% of all spreadsheets with formulas (14% of all spreadsheets). They together contain 1,662,340 erroneous formulas (49,796 unique ones), which is 585.5 (17.5 unique ones) on average. There were 755 files with over a hundred errors, with the maximum number of errors in one file being 83,273.

755 files with over one hundred errors. It’s easy to say the people creating these spreadsheets weren’t skilled Excel users, or that they were deliberately making these errors. Putting the ethical argument aside, I’d say these Excel errors are only one side of the story.

A lot of times you might not have a finished model or haven’t collected all the data yet, so there may be formula errors until the final model or report is finished. I don’t believe Hermans scanned for the version of these files, so it’s likely that many of them were simply incomplete. I’d argue this is the predicament at many companies: you have files at various stages of completeness or readiness, so looking only at the files with these “runtime” errors can only tell you so much about the intention of the person who worked on the spreadsheet.

Auditing Enron’s files for accuracy beyond formulas with runtime errors

One type of formula error that wouldn’t show up in Hermans’ research is a formula that references incorrect cells. This can be due to human error or a lack of understanding of how the model should reflect the business.

In a separate study from 2016, Thomas Schmitz and Dietmar Jannach looked closely at formulas that didn’t have normal runtime errors to see if they referenced incorrect cells. This is a much more difficult analysis because you have to know a bit about the business situation the model is based on, but most of the files in their analysis are quite straightforward.

Take, for instance, this Southpoint Gas Model which appears to show how much gas Transwestern (a pipeline subsidiary of Enron) is using on an hourly basis:

The Total Gas Usage column is a straightforward formula and one would think that the same formula is applied for all the rows in that screenshot. However, once you get to row 25 in the file, something not that uncommon happens with the formula:

The formula incorrectly multiplies the value from row 24 instead of row 25. In the grand scheme of things, this error probably didn’t cause Enron much trouble (they had many other issues going on) and wasn’t a particularly costly mistake.

I think the bigger issue is how an analyst or someone who inherits this spreadsheet would have uncovered this formula error without doing some serious auditing of the spreadsheet.

How do these incorrect row reference errors happen?

It’s hard to say definitively, of course. One would think that when you first create the formula at the top of the list, you would just drag it all the way down to the last row of data.

Perhaps the original creator added some new data starting in row 25, and just made a simple mistake of referencing the wrong cell. Then when they dragged the formula down, every subsequent cell has this formula reference error:

Another way this error might have happened (which I think is more likely) is that the creator inserted a new value but shifted the cells down instead of the entire row. For instance, let’s say I go into cell G15, insert a new value, and choose to shift cells down:

You’ll notice that it affects all the formulas in the Total Gas Usage column by incorrectly referencing the wrong rows.

Why might someone insert a value like this? There could be a lot of reasons. Maybe they realized they forgot to enter a value in their list of data and just did a regular insert without thinking to shift the entire row down. Or they were only responsible for data entry and didn’t realize there were formulas in other columns that depended on cells being “aligned” correctly.

There are all sorts of ways to audit formulas like this, but I’d argue this is one of the more difficult formula “errors” to catch. The first thing that came to mind was pressing CTRL+` to view all the formulas on the sheet:

This is by no means an error-proof way of detecting this formula error. You’re basically looking for row inconsistencies in the formulas, and you can see that starting with row 25, you’re multiplying the previous row’s value with the current row’s value. In a bigger model with more complex formulas, this error checking would naturally be more difficult.
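If you wanted to script this audit rather than eyeball CTRL+`, here's a rough sketch using openpyxl. The file name is made up, and the rule that "a formula should only reference cells on its own row" is my simplifying assumption, not something from Hermans' or Schmitz and Jannach's work; it only makes sense for row-wise models like this gas usage sheet, but it would flag the row 24/25 mix-up above:

```python
import re
from openpyxl import load_workbook

# Very rough matcher for A1-style references like G25 or $H$24
# (ignores ranges, references to other sheets, and function names that look like refs)
CELL_REF = re.compile(r"\$?[A-Z]{1,3}\$?([0-9]+)")

# Hypothetical file name; with the default data_only=False,
# formula cells come back as "=..." strings instead of cached values.
wb = load_workbook("southpoint_gas_model.xlsx")

for ws in wb.worksheets:
    for row in ws.iter_rows():
        for cell in row:
            if isinstance(cell.value, str) and cell.value.startswith("="):
                rows_referenced = {int(r) for r in CELL_REF.findall(cell.value)}
                # Heuristic: in a row-wise model, a formula should only reference
                # cells on its own row. Anything else is worth a manual look.
                if rows_referenced and rows_referenced != {cell.row}:
                    print(f"{ws.title}!{cell.coordinate}: {cell.value} "
                          f"references rows {sorted(rows_referenced)}")
```

Anything this prints still needs a human to decide whether the cross-row reference is a bug or intentional (a running total, for example, legitimately references the previous row).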

Lack of formula diversity and “smelly” formulas

Another interesting table from Hermans’ report shows how there is very little diversity in the formulas used in Enron’s spreadsheets. This finding also wasn’t very surprising to me. I’d say this is also pretty common at most companies where there is no need for complex or advanced formulas. This table shows that 63.6% of Enron’s spreadsheets only contained 9 functions:

Most models don’t need the FREQUENCY or MID functions when all they’re trying to do is SUM up a few numbers. A lot of analysts think knowing advanced formulas is the key to understanding Excel; the real skill is knowing how to take real-world scenarios and translate them into rows and columns. Another key takeaway from this table is to learn the granddaddy of all formulas: VLOOKUP.

Hermans also came up with a “smell” factor, which she coined in a previous analysis of the EUSES corpus of spreadsheets. I think the best way to describe these “smelly” formulas is that they are hard to debug when something goes wrong. Hermans specifically calls out that Enron’s formulas have “Long Calculation Chains”: formulas that take other formulas as inputs. I’m not sure I would call this kind of formula “smelly,” since the whole point of a model is to create building blocks that build on top of one another.

Learning from Enron’s spreadsheet errors

We’ve seen how spreadsheet errors can lead to large financial losses. In Enron’s case, you wonder how much spreadsheet errors contributed to the company’s demise, or whether these errors are just the regular output of a big company trying to get stuff done.

Out of all the Enron emails that contained spreadsheet attachments, 6% contain words relating to spreadsheet errors. These messages are quite mundane and further show that people were just trying to get accurate numbers:

“This was the original problem around the pipe option spreadsheets which we discovered yesterday and the reason why the numbers did not match between the old and new processes.”

“Dear Louise and Faith, I had a review of the spreadsheet and noticed an error in allocation of value for the Central Maine Power deal.”

“For yet another day we seem to be having problems including all the schedules in our EPE schedule sheet.”

Human error and lack of experience naturally lead to more spreadsheet errors. When you need to model out different scenarios or plan a budget, you may be asking people at your company to use Excel in situations where Excel is not the right tool. I’ll leave you with this quote from Tim Harford’s Financial Times piece, The tyranny of spreadsheets:

When used by a trained accountant to carry out double-entry bookkeeping, a long-established system with inbuilt error detection, Excel is a perfectly professional tool. But when pressed into service by genetics researchers or contact tracers, it’s like using your Swiss Army Knife to fit a kitchen because it’s the tool you have closest at hand. Not impossible but hardly advisable.

Other Podcasts & Blog Posts

In the 2nd half of the episode, I talk about some episodes and blogs from other people I found interesting:

The post Dear Analyst #87: What we can learn about Enron’s downfall from their internal spreadsheet errors appeared first on .

Dear Analyst 87 33:58 51540
Dear Analyst #86: One Important Excel Feature to Know to Do Your Best Data Analysis https://www.thekeycuts.com/dear-analyst-86-one-important-excel-feature-to-know-to-do-your-best-data-analysis/ https://www.thekeycuts.com/dear-analyst-86-one-important-excel-feature-to-know-to-do-your-best-data-analysis/#respond Mon, 28 Feb 2022 18:47:34 +0000 https://www.thekeycuts.com/?p=51492 Nothing like a click-baity headline to get your spreadsheet emotion all riled up amirite? Earlier in my data analysis career, I thought knowing advanced Excel formulas and writing macros made you a good analyst. If you’ve been following this podcast/newsletter, you’ve probably discovered that there is no one magic Excel feature that automatically makes you […]

The post Dear Analyst #86: One Important Excel Feature to Know to Do Your Best Data Analysis appeared first on .

Nothing like a click-baity headline to get your spreadsheet emotion all riled up amirite? Earlier in my data analysis career, I thought knowing advanced Excel formulas and writing macros made you a good analyst. If you’ve been following this podcast/newsletter, you’ve probably discovered that there is no one magic Excel feature that automatically makes you a good data analyst. The key to good data analysis is a soft skill: asking good questions (see this episode with the co-founder of Mode, Benn Stancil). Having said that, there is one Excel feature that I learned early in my career that really helped me improve my data analysis: PivotTables. Why PivotTables and what features about PivotTables make them so good for data analysis? Read on for more. You can also download the Excel file used in this episode here.

Michael Jordan: an expert user of the pivot

Video tutorial of this episode: https://www.youtube.com/watch?v=WXdu4ijNly8

1. Getting data to look right for PivotTables

This is a requirement for PivotTables that also teaches you important lessons about how to structure your data. It’s tempting to see a long list of data and just say “throw a PivotTable on it!” and not think through if the PivotTable will actually “work.” What I mean by this is that the underlying source data has to be laid out correctly in order for you to do any type of exploratory data analysis on your data set. Take this common layout of data you might see in a PowerPoint presentation or some deck (video game sales):

You have years (or some time period) along the columns and then some measure in the rows. Perfectly fine table of data for seeing sales, in this case, for different video game companies. Now there are a ton of other companies and devices to report on, so maybe for the purposes of data analysis, you want to put this into a PivotTable. Once you do that, this is what the PivotTable settings look like:

Not super useful for quickly seeing sales trends by year, seeing which devices had the most sales, etc. You have to “drag in” each year into the Columns field to see the actual sales.

Transforming and massaging the summary table

Making this data look right requires a bit of manual massaging. There are ways you could automate this with a macro, but it's important to understand why the data should look this way before you "throw a PivotTable" on the data:

A few things to note about this dataset:

  1. Years are no longer across the top in the columns. It’s just one column with the value being the year itself.
  2. The sales metric is also its own column and not spread out across columns.

Given that more of the data you're analyzing is coming from databases, it will most likely be structured like this to begin with. But when you're manually aggregating data from reports or other sources in Excel and need to get it into a format ready for PivotTables, this is the transformation you need to do in order for it to look right for data analysis.
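If you'd rather script this reshaping than do it by hand, it's essentially an "unpivot" (wide-to-long) transformation. Here's a minimal pandas sketch with made-up company names and numbers:

```python
import pandas as pd

# Wide "presentation" layout: one row per company, one column per year (made-up numbers)
wide = pd.DataFrame({
    "Company": ["Nintendo", "Sony", "Sega"],
    "1993": [5.1, 0.0, 2.3],
    "1994": [6.4, 1.2, 2.0],
    "1995": [5.8, 35.92, 1.1],
})

# Long "PivotTable-ready" layout: one Year column and one Sales column
long_form = wide.melt(id_vars="Company", var_name="Year", value_name="Sales ($M)")
print(long_form)
```

Inside Excel itself, Power Query's Unpivot Columns command does the same wide-to-long transformation without writing a macro.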

2. Doing deeper data analysis after you spot a trend

One of the main benefits of PivotTables is being able to quickly do exploratory data analysis. You can easily drag and drop columns from your original data table to look into trends and see if something is worth digging deeper into.

For instance, let's say with our video game sales dataset, we want to see which platforms were trending in the early 1990s:

As you can see, SNES (Super Nintendo) has a clear lead in terms of global sales from 1990-1995. It is interesting to see PS (PlayStation) make a splash in the market in 1994 and 1995. That $35.92M number in 1995 is quite interesting: it beat out SNES sales and is more than 5X the previous year's sales. Which games contributed to that sales number in 1995? Which video game publishers created those games for PlayStation?

You can double-click the number in the PivotTable and literally dig deeper into the individual rows that "make up" that number. This is one of my favorite features of PivotTables and is essentially like finding a trend and then being able to query your database for the specifics:

After you double-click, a new worksheet gets created showing you the underlying rows that make up the $35.92M number. The table that shows up might have many rows so you may need to filter and sort it a bit to further look for the answers to your data analysis questions. Doing a quick sort shows that the top games sold in 1995 on PlayStation were Namco Museum Vol. 1 (Sony), Tekken (Sony), and Rayman (Ubisoft). Ahh this brings back memories:

Early 1990s Sony Playstation games
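Back to the drill-down for a second: under the hood, double-clicking a PivotTable value is just filtering the source rows behind that one cell. If the same data lived in a pandas DataFrame, the equivalent would be a filter plus a sort; the rows and sales numbers below are made up for illustration:

```python
import pandas as pd

# A tiny, made-up slice of the video game sales source data
games = pd.DataFrame({
    "Name": ["Namco Museum Vol. 1", "Tekken", "Rayman", "Donkey Kong Country 2"],
    "Platform": ["PS", "PS", "PS", "SNES"],
    "Publisher": ["Sony", "Sony", "Ubisoft", "Nintendo"],
    "Year": [1995, 1995, 1995, 1995],
    "Global_Sales": [3.6, 3.5, 2.9, 5.2],  # illustrative numbers, not the real dataset
})

# The rows "behind" the 1995 PlayStation cell: filter, then sort like you would
# after double-clicking into the PivotTable value
drilldown = (
    games[(games["Platform"] == "PS") & (games["Year"] == 1995)]
    .sort_values("Global_Sales", ascending=False)
)
print(drilldown[["Name", "Publisher", "Global_Sales"]])
```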

3. Create your own calculations for data analysis

When you work with a dataset that has many columns, you will probably have many facts and metrics that your company tracks. These are usually things like sales, number of customers, or orders, and they're typically the columns you drag into the "Values" field of the PivotTable:

If your data looked like the table above, the two columns you would drag into the “Values” field in the PivotTable would most likely be Quantity and Unit price. You would only care about these columns because you want to be able to answer questions like:

  • In which year did we sell the lowest quantity of products?
  • What was the average unit price for cookies?
  • Which customer purchased the most products?

What if your dataset doesn’t contain the metric you’re looking for? If we go back to our video game sales data, you’ll notice we have a Units Sold column, but what if I wanted to see what the average retail price was by year?

The quickest solution you might think of is simply adding another column to your source data called Average Sales Price and dividing the sales by units sold:

You never want to manually create any totals, averages, or other summary metrics in your source data. I'll repeat that: don't create any totals or averages in your source data. If you do this and then throw a PivotTable on top, the metrics won't be accurate as you slice and dice your data in the PivotTable.

One of the benefits of PivotTables is that you can create your own calculations if one doesn't exist yet (like the Average Sales Price in our video game sales example). These are called Calculated Fields in Excel. Calculated fields are dynamic: as you drag in columns from your source data, the calculated field updates just like any other metric in your source data. In our video games example, we create a calculated field for average sales price by taking the total sales, multiplying it by 1,000,000 (since our sales figures are in $M), and dividing by the Units Sold column:
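To see why the calculated field matters (and why a precomputed Average Sales Price column in the source data goes wrong), here's a small pandas sketch with made-up numbers. The calculated field effectively divides the sum of sales by the sum of units within each group, rather than averaging a per-row price column where every row counts equally:

```python
import pandas as pd

# Made-up sales data: Sales_M is in $M, Units_Sold is raw units
df = pd.DataFrame({
    "Year":       [1994, 1994, 1995, 1995],
    "Platform":   ["SNES", "PS", "SNES", "PS"],
    "Sales_M":    [8.0, 6.0, 7.0, 35.92],
    "Units_Sold": [200_000, 100_000, 175_000, 700_000],
})

# Wrong: averaging a precomputed per-row price column treats every row equally,
# regardless of how many units each row represents
df["Row_Price"] = df["Sales_M"] * 1_000_000 / df["Units_Sold"]
wrong = df.groupby("Year")["Row_Price"].mean()

# Right (what the calculated field does): total sales / total units within each group
totals = df.groupby("Year")[["Sales_M", "Units_Sold"]].sum()
right = totals["Sales_M"] * 1_000_000 / totals["Units_Sold"]

print(wrong)
print(right)
```

Running this, the two approaches give different averages per year, which is exactly the inaccuracy you'd bake into the PivotTable by summarizing in the source data.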

👉 Learn more about PivotTables for data analysis

I have to shamelessly plug an advanced PivotTable course I created last year where you can learn some of the tips and tricks mentioned in this episode. I’m planning on producing more advanced PivotTable courses and publishing them later this year on Skillshare since I think they are an invaluable tool for data analysis. If you are looking to increase your skills with PivotTables, give the course below a look:

Other Podcasts & Blog Posts

In the 2nd half of the episode, I talk about some episodes and blogs from other people I found interesting:

The post Dear Analyst #86: One Important Excel Feature to Know to Do Your Best Data Analysis appeared first on .

Dear Analyst 86 28:43 51492