Dear Analyst: a show made for analysts covering data, data analysis, and software (https://www.thekeycuts.com/category/podcast/). This is a podcast made by a lifelong analyst. I cover topics including Excel, data analysis, and tools for sharing data. In addition to data analysis topics, I may also cover topics related to software engineering and building applications. I also do a roundup of my favorite podcasts and episodes.

Dear Analyst Episode #116: Will Microsoft's AI Copilot for Excel replace the need for analysts?
https://www.thekeycuts.com/dear-analyst-episodes-116-will-microsofts-ai-copilot-for-excel-replace-the-need-for-analysts/
Mon, 27 Mar 2023

This news is a bit old, but I figured it's juicy enough to talk about its future implications for Excel and artificial intelligence in general. In mid-March 2023, Microsoft announced Copilot, its artificial intelligence bet that will supposedly change the way we work. The video discusses how Copilot integrates with Office 365 and all your Microsoft apps, including Excel. Around minute 18:00, they show a demo of how Copilot helps you find trends, make adjustments to your models, and more. It's quite impressive. You can watch just that segment from the presentation below. I watched the video a few times and wondered: will Copilot eliminate the need for entry-level data analysts? Only time will tell.

https://www.youtube.com/watch?v=I-waFp6rLc0

Breaking down the features in Copilot for Excel

This is the corporate marketing blurb from the Microsoft blog post announcing Copilot for Excel:

Copilot in Excel works alongside you to help analyze and explore your data. Ask Copilot questions about your data set in natural language, not just formulas. It will reveal correlations, propose what-if scenarios, and suggest new formulas based on your questions—generating models based on your questions that help you explore your data without modifying it. Identify trends, create powerful visualizations, or ask for recommendations to drive different outcomes. Here are some example commands and prompts you can try:

  • Give a breakdown of the sales by type and channel. Insert a table.
  • Project the impact of [a variable change] and generate a chart to help visualize.
  • Model how a change to the growth rate for [variable] would impact my gross margin.

The video shows the above 3 bullet points using a dataset of product sales by country:

Finding key trends with Copilot for Excel

The first demo involves giving Copilot a prompt like “analyze the data and give me 3 trends.” The output is something you might expect if you’ve done anything with ChatGPT:

This feature in Copilot is table stakes and a version of this came out in Google Sheets in 2017. The Explore panel in Google Sheets can provide similar summary trends on your data and suggest charts you should add to your analysis. Google Sheets has slowly been adding AI-like features over the last few years, so don’t sleep on Google Workspace’s own AI announcement. Below is a dataset of hotels and their locations and I simply clicked on the Explore option in the bottom-right of the Google Sheet:

The trends don’t come in a free-form text format but the different widgets are interesting. The first widget shows additional questions you might ask of your dataset (and Google Sheets spits out the answer). Then the most common visualizations like Pivot Tables and charts are displayed afterwards which makes it easy to analyze and visualize your data. This leads into the next feature in Copilot for Excel: visualizing your data.

Visualizing your data with Copilot for Excel

What’s old is new. As I explained in the previous section, Google Sheets’ Explore panel already has a flavor of this feature. The next prompt for Copilot is “Show me a breakdown of Proseware sales growth.” Yes, it’s natural language. Yes, humans are lazy and it’s easy just to ask a question in plain English and get an answer back. But the summary and data and charts already exist in Google Sheets. This just happens to be Excel’s implementation of the Explore feature and the AI is the entry point to this feature:

I like how Copilot responds to the prompt by saying:

Remember to check for accuracy.

That doesn’t inspire much confidence in you, Copilot! Nonetheless, Copilot does a few things that are interesting:

  • A chart is created with a title that has selective formatting (assuming the AI made the word "Sales" green)
  • The tables are nicely formatted with clear headers, formatted percentages, and growth rates
  • The background colors for all the cells are white (common formatting trick for making your visualizations stand out more)
  • Columns are re-sized to fit the width of the products and the growth rates
  • Column A and Row 1 are very narrow in width and height, respectively (another common trick to making dashboards look cleaner)

Was this all AI or just smoke and mirrors?

It’s hard to say which of the above formatting operations were done by the AI versus a human who just cleaned up the spreadsheet for a demo.

Does the AI know that a summary table looks better when the background color cells are all white?

Does the AI know that analysts like to make column A and row 1 super narrow/short so that the charts and tables are flush against the edges of the spreadsheet?

If Copilot knew all this, that’s pretty slick. But this just so happens to be the vanilla formatting you’ll see in a dashboard devoid of any custom coloring or branding. It will be interesting to see how an analyst would train Copilot to create visualizations that match the theme and brand guidelines for existing reports.

The next prompt is "Help me visualize what contributed to the decline in sales growth?" The interesting leap that Copilot makes here is translating a very simple business question into a feature (conditional formatting to highlight what contributed to the decline):

But simply applying conditional formatting to a table of numbers is not nearly as impressive as all the formatting steps the AI did in the previous step to create the table in the first place.

What-if scenario analysis with Copilot for Excel

This is probably the most interesting part of the demo. The next prompt is:

What would have happened if Reusable Containers had maintained the prior quarter’s growth rate?

Before Copilot, you'd have to duplicate your summary table and set up cell references to swap the current growth rate for another number. Assuming this is not some human playing around with data for the demo, Copilot does the whole thing for you:

What’s impressive is that Copilot was able to copy the original summary table and paste it directly to the right of it. This makes comparing the growth rates easy. It was also able to change the title to reflect the answer to the original prompt. Finally, the step-by-step bullet points tell you exactly what Copilot did to create the analysis.
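
For comparison, the manual version of this what-if setup is mostly formula plumbing. As a rough sketch (the cell references here are hypothetical; assume last quarter's sales sit in cell C5 and the prior quarter's growth rate sits in cell E2), the projected value would be something like:

=C5*(1+$E$2)

You would fill a formula like that down a duplicated copy of the summary table and then compare the projected figures against the original table.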

Perhaps this type of analysis is “easy” for Copilot since you have a relatively simple summary table with clearly spelled out products and growth rates. What if there are more variables involved or there are other one-off factors that would impact the analysis? According to the longer Copilot demo, Copilot has access to the full corpus of data for your organization so it should have the domain expertise that someone who works in the business knows. This means you could ask Copilot questions whose answers are tucked away in some Outlook email, Teams thread, or PowerPoint slide. That’s pretty freaking cool.

The question still remains: Will Copilot replace the need for data analysts?

Source: The Wall Street Journal

If the analysis is as simple as what Microsoft showed in this demo, I think the answer is yes.

If you're an entry-level analyst, this type of task is not uncommon. You have a dataset where you need to build summary tables and put them into PowerPoint decks to present during meetings. Your manager tells you: "Hey, what would growth look like for Reusable Containers if we didn't completely tank last quarter and used historical growth rates?" You would probably follow a similar step-by-step process as the above screenshot shows. Copilot appears to be able to do the basic analyst grunt work and format the analysis in a clear visualization.

Why Copilot won’t replace analysts at large enterprises

While Copilot does look impressive, it definitely won’t replace human data analysts who understand nuance, context, and business knowledge at large enterprises. If you are a startup and building a model from scratch, Copilot might be a good solution to get something off the ground and running. The Microsoft demo clearly shows that this is possible. I can foresee a few situations where Copilot would not be used in a large enterprise:

  1. A lot of money is on the line – The Copilot prompt already tells you to “check for accuracy.” If you are working on a multi-million dollar deal, you best be sure you have a human taking a look at the numbers.
  2. Company culture may not be captured in Microsoft applications – As much as our knowledge is “written down” in Word, Outlook, and Teams, there is a lot that is not formally written down in these applications. Humans understand the nuances about company culture and how that can impact the analyses and dashboards analysts create.
  3. Existing templates have already been created – In a large enterprise, you are most likely copying an existing file to build a model or dashboard. That institutional knowledge has resulted in well-formatted dashboards where Copilot may not add much value (if formatting is a big part of the task).

Long story short, I’d love to see Copilot tackle a more complicated task that can’t be solved with a simple template. If you’re well versed in Excel, doing what this demo did by “hand” might take all of 15 minutes and you build the knowledge on how to do this analysis in the future. This knowledge makes debugging and troubleshooting models easier.

Other Podcasts & Blog Posts

In the 2nd half of the episode, I talk about some episodes and blogs from other people I found interesting:

Dear Analyst #115: How to count the number of colored cells or formatted cells in Google Sheets
https://www.thekeycuts.com/dear-analyst-115-how-to-count-the-number-of-colored-cells-or-formatted-cells-in-google-sheets/
Mon, 27 Feb 2023

Counting the number of colored cells or formatted cells in Google Sheets or Excel seems like it should be a basic operation. Unfortunately, after much Googling, it doesn't seem as easy as it looks. I came across this Mr. Excel forum thread where someone asks how to count the number of rows where there is a colored cell. The answers range from VBA to writing formulas that indicate whether a cell should be colored to the usual online snark. I think the basic issue is this: a majority of Excel or Google Sheets users will have a list of data and they will color-code cells to make it easier to read or comprehend the data. No fancy formulas or PivotTables. Just coloring and formatting cells so that important ones stick out. I thought this would be a simple exercise, but after reading the thread, I came up with two solutions that work but have drawbacks. The Google Sheet for this episode is here.

Video walkthrough: https://www.youtube.com/watch?v=h-hdZPGDbDg

Color coding HR data

In the Mr. Excel thread, the original poster talks about their HR data set and the rules their team uses to color-code their data set. Many people in the thread talk about setting up rules for conditional formatting (which I agree with). But it sounds like people just look through the data set and manually color code the cells based on the “Color Key” mentioned in the post:

I think this manual color coding of cells is very common. Yes, someone could write conditional formatting logic to automate the formatting and color coding of these cells. But for most people, I’d argue just eyeballing the dataset and quickly switching the background or foreground color of the cell is easier, faster, and more understandable for a beginner spreadsheet user. If there isn’t that much data, then manually color coding cells feels less onerous.

I put a subset of the data into this Google Sheet and manually color-coded some of the cells into column B below:

Method #1 for counting colored cells: Filter by color and the SUBTOTAL formula

The quickest way to count the number of cells that have a certain color format is to filter the column by color. After applying the filter to all the column headers, you can filter a column by the cell’s background color through the column header menu. Filter by color -> Fill color -> Desired color:

Let’s say I filter this column by the yellow background color. You’ll see this results in a filtered data set with 9 rows remaining:

In order to actually count the number of cells in this filtered data set, you might be tempted to do a COUNTA() formula, but let’s see what happens when I put this into cell B51:

The formula counts all the rows in the data set including the rows that have been filtered out. Instead, you can use the SUBTOTAL() formula which magically returns the sum, count, etc. for a filtered data set. The key is to use the value “3” for the first parameter to tell Google Sheets to count only the cells in the filtered data set:
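
To make the difference concrete (assuming the data runs from B2 to B50, which is just an illustrative range), the two formulas behave like this:

=COUNTA(B2:B50) counts every non-empty cell in the range, including the rows hidden by the filter.
=SUBTOTAL(3, B2:B50) counts only the cells left visible after filtering, which is the number we actually want.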

I don’t think this is the usual use case for the SUBTOTAL formula. But like many formulas in Google Sheets/Excel, it works! To recap on this method:

Pros

  • Easy to use and implement
  • Doesn’t require the use of VBA or Google Apps Script
  • Since it’s a formula, it’s dynamic and can change as your data changes (with caveats)

Cons

  • Requires a few steps to get it to work (e.g. filter your data set by a color)
  • Each time you want to count the number of formatted cells, you need to re-filter by a different color
  • Since your data is filtered, you can't easily update the source data, and you have to re-filter by a color after making changes

Method #2: Filtered views to allow for dynamic updating of data with the SUBTOTAL formula

This is an extension of method #1. One of the cons of method #1 is that once you’ve filtered your data set, you need to un-filter the data set if you want to add or remove formatting from your cells. For instance, in column B we have a bunch of yellow colored cells. If you want to highlight another cell as yellow and then re-count the number of cells that are colored yellow, you have to un-filter the data set, highlight the cell that needs to be colored yellow, re-filter the column, and re-write the SUBTOTAL formula (assuming you put it at the bottom of column B):

To avoid filtering and un-filtering the data set, you can create a filtered view of the data set. Additionally, you can put the SUBTOTAL formula somewhere that's not at the bottom of the data set. Let's first create a filtered view just on the background color yellow and we'll call it "Yellow Cells":

Now you can quickly switch between the filtered view of yellow-colored cells and the unfiltered data set:

Then we can put the SUBTOTAL formula somewhere below the bottom of the data set. Notice now how when we switch between the filtered view and the unfiltered data set, the SUBTOTAL formula automatically updates:

While this method is an improvement on method #1, it still has some drawbacks. A recap of this method:

Pros

  • Easily switch between the filtered and unfiltered data set
  • Update cells with new colors and have that flow into the SUBTOTAL formula dynamically

Cons

  • Filtered views are not an easily discoverable feature in Google Sheets
  • Still requires you to go through the Data menu and flip back and forth when you want to count the number of colored cells

Method #3: A macro to count the number of colored or formatted cells in a range

Almost all the other solutions for counting the number of colored or formatted cells on the Internet refer to a VBA script for Excel. This is a macro for Google Sheets using Google Apps Script. You can copy and paste the script from this gist. When you run the CountFormattedCells macro in Google Sheets, it counts all the cells that have a background color in column B below. It then outputs the count of cells in cell C52 after you've selected a range of cells where you want to count the colored cells:

If you want to specify a color to count, you can color cell C53 with the color you want to count. Let's say I want to count only the green cells. I would color cell C53 green, select all the cells where I want to find the color green, and then run the macro:

The key to making this work is setting up some variables in the script. The two variables you have to set in the script are outputNumberOfFormattedCells and cellWithFormatToCount. The cells you pick will depend on the specific spreadsheet you're working with. In the script below, you'll see that you have to edit the first two variables to fit the needs of your Google Sheet:


function CountFormattedCells() {

  // Output the number of formatted cells somewhere in your spreadsheet
  var outputNumberOfFormattedCells = 'C52';

  // Cell that contains the color you want to count. Default is blank (count any non-white cell).
  var cellWithFormatToCount = 'C53';

  var spreadsheet = SpreadsheetApp.getActive();
  // Get the background colors of every cell in the range you currently have selected
  var currentRangeColors = spreadsheet.getActiveRange().getBackgrounds();
  if (cellWithFormatToCount !== '') {
    var cellWithFormat = spreadsheet.getRange(cellWithFormatToCount).getBackground();
  }
  var formattedCellCount = 0;
  // Loop through every cell in the selected range
  for (var i in currentRangeColors) {
    for (var j in currentRangeColors[i]) {
      // No target color set: count any cell that isn't plain white
      if (currentRangeColors[i][j] !== '#ffffff' && cellWithFormatToCount == '') {
        formattedCellCount++;
      // Target color set: count only cells that match the color in cellWithFormatToCount
      } else if (cellWithFormatToCount !== '' && currentRangeColors[i][j] == cellWithFormat) {
        formattedCellCount++;
      }
    }
  }
  if (outputNumberOfFormattedCells != '') {
    spreadsheet.getRange(outputNumberOfFormattedCells).setValue(formattedCellCount);
  }
}

The macro is very easy to use, but it does require knowing how to add macros to your Google Sheet and how to edit the script in Google Apps Script. The recap for this method:

Pros

  • Script is easy to copy and paste into Google Apps Script and works right out of the box
  • Just two variables to customize
  • Doesn’t require any filtering of your data set or any formulas
  • Can assign a keyboard shortcut to the macro to quickly run the macro
  • Could assign a time-based trigger to the macro so that it runs every minute or hour to give you a "dynamic" count (see the sketch after the cons below)

Cons

  • Requires knowledge of macros and editing a Google Apps Script
  • May need to change the location of the cell where you output the count of colored cells if your data changes a lot over time
  • Requires running the macro each time you want to get an updated count of the colored cells
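
As an example of that time-based trigger, here's a minimal Google Apps Script sketch (the 5-minute interval is just an assumption; Apps Script only accepts certain intervals such as 1, 5, 10, 15, or 30 minutes):

// Run this once to schedule CountFormattedCells every 5 minutes
function createCountTrigger() {
  ScriptApp.newTrigger('CountFormattedCells')
    .timeBased()
    .everyMinutes(5)
    .create();
}

One caveat: a time-driven trigger runs without a user selection, so getActiveRange() may not point at the range you care about. You would likely want to hard-code the range to count (e.g. spreadsheet.getRange('B2:B50'), a hypothetical range) in the macro before scheduling it.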

Bottom line

None of these methods are that simple or easy to use in my opinion. Usually I have a preferred method for solving some Google Sheets or Excel problem, but in this case I can’t say I like or dislike a method over another one. If I had to pick one, I’d use method #3 since I’m comfortable with macros and editing Google Apps Scripts. But the Google Apps Script solution is far from easy to use for a beginner to Google Sheets.

The SUBTOTAL formula is indeed much easier to implement, but also comes with the added inconvenience of constantly filtering and unfiltering your data set.

Other Podcasts & Blog Posts

In the 2nd half of the episode, I talk about some episodes and blogs from other people I found interesting:

Dear Analyst Episode #114: How a small real estate investment company uses modern data and cloud tools to make data-driven decisions
https://www.thekeycuts.com/episode-114-how-a-small-real-estate-investment-company-uses-modern-data-and-cloud-tools-to-make-data-driven-decisions/
Tue, 17 Jan 2023

When you think of data pipelines, data warehouses, and ETL tools, you may be thinking about some large enterprise that is collecting and processing data from IoT devices or from a mobile app. These companies are using tools from AWS and Google Cloud to build complex workflows to get data to where it needs to be. In this episode, you'll hear about a relatively small company that is using modern cloud and data tools rivaling those of these aforementioned enterprises. Elite Development Group is a real estate investment and construction company based in York, Pennsylvania, with fewer than 50 employees. Doug Walters is the Director of Strategy and Technology at Elite, and he discusses how data at Elite was trapped in Quickbooks and in their various tools like property management software. He spearheaded projects to build data connectors that aggregate various data sources into a modern data stack for making real estate decisions.

Data is stuck in silos

Elite Development Group consists of a few divisions: HVAC, home performance, energy efficiency, etc. All the typical functions you’d expect a real estate company to have. Doug first started working in IT support and realized their company didn’t have easy access to their data to make data-driven decisions. You’ve probably heard this phrase over and over again:

Data is trapped in silos.

You buy some off-the-shelf software (in this case property management) that is meant for one specific use case. Over time, that data needs to be merged with your customer data or sales data. You end up exporting the data in these silos to CSVs to further combine these data sources down the line. For Elite, data was trapped in property management software, Quickbooks, you name it.

Starting the process to export data

After doing a survey of their tools, Doug realized that there weren’t many APIs to easily extract data from the source. So he helped set up data scrapers to get data off of the HTML pages. He also used tools like Docparser to extract data from Word docs and PDFs.

Most data was either in XLS or CSV format, so Doug was able to set up an automated system where every night he’d get an email with a CSV dump from their property management system. This data then ended up in a Google Sheet for everyone to see and collaborate on. After doing this with property management, Doug started exploring getting the data out from their work order tracking system.

Creating accurate construction cost estimates

One activity Doug wanted to shine the data lens on was cost estimates as they relate to construction. Hitting budgets is a big part of the construction process. You have multiple expenditures from a job and each job needs to have a specific estimate tied to it. This could all be done in Excel or Google Sheets, but given the importance of this data, Doug decided to create something more durable. He created an internal database where each cost estimate gets a specific Estimate ID, a unique identifier assigned to that cost estimate.

Since Elite uses Quickbooks for their accounting, each project had to be tied to a unique Estimate ID established previously. Then each work order had a unique Work Order ID. Now Elite is able to run reports on all their projects to see what the cost estimates and actual expenditures were for a job. Now they could do a traditional budget to actual variance analysis.

The result? Project teams could start to see when they were about to hit their budgets in real time.

More importantly, this started Doug down a journey of seeing how far he could automate the data extraction and reporting for his company. With the initial implementation, the data could only get refreshed every 24 hours. He eventually set up the system so that any user could click a button to refresh a report. The data workflow evolved from exporting data into Excel and Google Sheets to using complex data connectors and business intelligence software.

Income lost due to vacancy metric

When Elite prioritizes which projects to work on, they look at a metric called "income lost due to vacancy." Without the different data connectors and systems Doug helped set up, this metric wouldn't exist. This metric essentially helps a property owner figure out how much income they are losing due to vacancies.

When looking at a portfolio of properties to improve, Elite can use this metric to figure out which project would have more high-rent units available. Previously, they would have to rely on intuition to figure out where to invest more time and money into projects.

Building out the data stack

The list of tools Elite uses to extract and process data rivals that of large enterprises. Here is a rundown of Elite’s data stack:

  • Fivetran for data loading and extraction
  • AWS Redshift as the data warehouse
  • Google Cloud functions to run one-off tasks
  • dbt for transformation and for pushing data into a datamart
  • Sisense to create actionable insights

There are multiple data connectors involved for doing the ETL process as well. With all these modern tools, Elite is able to get the most up-to-date data every 5-15 minutes.

As Elite went through this data journey, Doug and his team started to ask some of their vendors to develop an API so they could get more data out. Their data vendors would push back and say they’ve never seen these requests from such a small company. Typically these data requests are coming from their large customers which shows how deeply Doug’s team has thought about automating their data workflows.

Advice for small companies working with big data

Doug gives some practical advice on how to use some of these tools that are supposedly meant for large enterprises. The first thing is to experiment with spreadsheets before diving deep into a complicated workflow. Doing your due diligence in a spreadsheet is low stakes and helps you uncover all the various relationships between your data.

In terms of learning how to use these tools, Doug mentioned that most of these vendors have their own free or paid workshops and tutorials. I'm always surprised by how much general data training these vendors provide that may not even be about their software. You can learn about databases, SQL, and data analysis from these vendors.

At a high level, Doug says that the data you collect and visualize needs to be tied to some business strategy. These overall goals might include increasing revenue, increasing customer satisfaction, or ensuring your employees are developing new skills. At Elite, the data has allowed the team to look at their portfolio of real estate at the 30,000-foot level all the way down to individual transactions. Data is actually helping them solve real business problems.

And one last plug for Google Sheets: Doug talked about how you would have to hire someone who was an “Excel guru” or a data analyst to help you decipher your Google Sheets files. Now Google Sheets has become so robust, extensible, and–dare I say–easy to use that anyone in the company can pick it up and mold it to their needs. No one ever gets fired for using a Google Sheet 😉.

Other Podcasts & Blog Posts

No other podcasts mentioned in this episode!

Dear Analyst #113: Top 5 data analytics predictions for 2023
https://www.thekeycuts.com/dear-analyst-113-top-5-data-analytics-trends-for-2023/
Tue, 27 Dec 2022

It's that time of the year again, when data professionals look at their data predictions from 2022, decide what they were wrong about, and think: "this must be the year for XYZ." Aside from the fact that these types of predictions are 100% subjective and nearly impossible to verify, it's always fun to play armchair quarterback and make a forecast about the future (see why forecasts are flawed in this episode about Superforecasting). The tricky thing about predicting what will happen in 2023 is that my predictions are based on what other people are talking about, not necessarily what they are doing. The only data point I have on what's actually happening within organizations is what I see happening in my own organization. So take everything with a grain of salt and let me know if these predictions resonate with you!

1) Artificial intelligence and natural language processing don't eat your lunch

How could a prediction for 2023 not include something about artificial intelligence? It seems like the tech world was mesmerized by ChatGPT in the second half of 2022, and I can't blame them. The applications and use cases are pretty slick and mind-blowing. Internally at my company, we've already started testing out this technology for summarizing meeting notes; it works quite well and saves a human from having to manually summarize the notes. My favorite application of AI shared on Twitter (where else do you discover new technologies? Scientific journals?) is this bot that argues with a Comcast agent and successfully gets a discount on an Internet plan:

https://twitter.com/jbrowder1/status/1602353465753309195

These examples are all fun and cute and may help you save on your phone bill, but I’m more interested in how AI will be used inside organizations to improve data quality.

Data quality is always an issue when you're collecting large amounts of data in real time every day. Historically, analysts and data engineers run SQL queries to find data with missing values or duplicate values. With AI, could some of this manual querying and these UPDATE and INSERT commands be replaced with a system that intelligently fills in the data for you? In a recent episode with Korhonda Randolph, Korhonda talks about fixing data by sometimes calling up customers to get their correct info, which then gets inputted into a master data management system. David Yakobovitch talks about some interesting companies in episode 101 that smartly help you augment your data using AI.

We’ve also seen examples of AI helping people code via Codex, for example. I think this might be an interesting trend to look out for as the demand for data engineers from organizations outpaces supply. Could an organization cut some corners and rely on Codex to develop some of this core infrastructure for their data warehouse? Seems unlikely if you ask me, but given the current funding environment for startups, who knows what a startup founder might do as runways shrink.

2) Enforcing data privacy and regulation in your user database

This trend has been going on since the introduction of GDPR in 2018. As digital transformation pushes all industries to move online, data privacy laws like GDPR and CCPA force these companies to make data security and governance the number one priority for all the data they store. User data is of particular concern. Any company that has a website where you can transact allows you to create a user account. Most municipalities have a dedicated app where you can buy bus and metro tickets straight from the app. Naturally, they ask you to create a profile where your various payment methods are stored.

When it comes to SaaS tools, the issue of data privacy becomes even more tricky to navigate. Many user research and user monitoring services tout their abilities to give organizations the ability to see what your users and customers are “doing” on these organizations’ websites and apps. Every single click, mouseover, and keystroke can be tracked. How much of this information do you store? What do you anonymize? It’s a cat and mouse game where user monitoring software vendors claim they can track everything about your customers, but then you have to temper what information you actually process and store. The data team at my own company is constantly checking these data privacy regulations to ensure that we implement data storage policies that reflect current legislation.

Source: DIGIT

A closely related area to data privacy is data governance. The number of data governance vendors who help your organization ensure your data strategy is compliant has increased dramatically over the years as a result of data regulation and protection laws.

To bring this back to a personal use case, type your email address into haveibeenpwned.com. This website basically tells you which companies have had data breaches and whether your personal information may have been compromised. To take this another step, try Googling your name and your phone number or address in quotes (e.g. "John Smith 123-123-1234"). You'll be surprised by how many of these "people finder" websites have your personal information and that of your family members. One of the many websites you've signed up for probably had a breach, and this information is now out there being aggregated by these websites; you have to manually ask these websites to take your information out of their databases. Talk about data governance.

3) Data operations and observability tools manage the data lifecycle

I'm seeing this happen within my own company and others. DevOps not only monitors the health of your organization's website and mobile app, but also its databases and data warehouse. It's becoming more important for companies undergoing digital transformation to maintain close to 100% uptime so that customers can access their data whenever they want. Once you give your customers and users a taste of accessing their data no matter where they are, you can't go back.

I think it’s interesting to think about treating your “data as code” and apply concepts of versioning from software engineering to your data systems. Sean Scott talks about data as code in episode #96. The ETL process is completely automated and a data engineer or analyst can clone the source code for how transformations happen to the underlying data.

I’m a bit removed from my own organization’s data systems and tooling, but I do know that the data pipeline consists of many microservices and dependencies. Observability tools help you understand this whole system and ensure that if a dependency fails, you have ways to keep your data flowing to the right endpoints. I guess the bigger question is whether microservices is the right architecture for your data systems vs. a monolith. Fortunately, this type of question is way beyond my pay grade.

Source: DevCamp

4) Bringing ESG data to the forefront

You can see this trend happening more and more, especially in consumer transportation. Organizations are more conscious about their impact on the environment and have launched various ESG initiatives. In order to ensure organizations are following new regulations, the SEC and other regulatory bodies rely on quality data to ensure compliance.

One can guess which industries will be most impacted by providing this ESG data, but I imagine other ancillary industries will be affected too. Perhaps more data vendors will pop up to help with auditing this data so that organizations can meet compliance standards. Who knows. All I know is that consumers are asking for it, and as a result this data is required to be disclosed.

Google Flights showing CO2 emissions

We know that cloud computing and storage gets cheaper every year (e.g. Moore’s Law). Cheap from a monetary perspective, but what about the environmental impact? An interesting thought exercise is tracing the life of a query when you open Instagram on your phone and start viewing your timeline of photos. The storage and compute resources are monetarily cheap to serve that request, but there is still a data center that runs on electricity and water that needs to process that request. Apparently 1.8% of electricity and 0.5% of greenhouse gas emissions are caused by data centers in the United States (source).

When I think about all the cronjobs and DAGs that run every second to patch up a database or serve up photos to one's Instagram feed, I wonder how many of these tasks are unnecessarily taxing our data centers. I have created a few Google Apps Scripts over the years (like creating events from email or syncing Google Sheets with Coda). You could have these scripts run every minute or 5 minutes, but is it necessary? Considering that Google Apps Script is a 100% free service, it's hard to understand the "cost" of running a script that hits a Google data center somewhere and may be moving gigabytes of data from one server to another. I started thinking about the cost of keeping these scripts alive for simple personal productivity hacks like creating calendar events from email. Sure, my personal footprint is small, but when you have millions of people running scripts, that naturally becomes a much bigger problem.
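
If you want to audit your own footprint, one small step is periodically reviewing the triggers that keep your scripts running. A minimal Google Apps Script sketch (treat it as an illustration, since it deletes every time-driven trigger in the project):

// List the project's triggers and remove the time-driven (clock) ones
function removeTimeDrivenTriggers() {
  var triggers = ScriptApp.getProjectTriggers();
  for (var i = 0; i < triggers.length; i++) {
    // Only delete clock-based triggers; leave onEdit/onOpen and other triggers alone
    if (triggers[i].getEventType() === ScriptApp.EventType.CLOCK) {
      ScriptApp.deleteTrigger(triggers[i]);
    }
  }
}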

I still have a lot to learn about this area and my views are influenced by simple visualizations like the one above. It all starts with quality ESG data!

5) Organizations help employees acquire data literacy and data storytelling skills

This trend is a bit self-serving as I teach various online classes about Excel and Google Sheets. But as data tools like Mode, Looker, and Google Data Studio spread through organizations, it's no longer just analysts who are expected to know how to use and understand these tools. Unfortunately, data skills are not always taught in middle school or high school (they certainly weren't taught when I was growing up). Yet the top skills we need when entering the workforce are related to using spreadsheets and analyzing data (I talk about this subject in episode 22, referencing this Freakonomics episode). This episode with Sean Tibor and Kelly Schuster-Paredes is also worth a listen, as Sean and Kelly were teachers who incorporated Python into the classroom.

In 2019, The New York Times provided a “data bootcamp” for reporters so that they could better work with data and tell stories with data. The Google Sheets files and training material from this bootcamp are still publicly available here. You can read more about this initiative by Lindsey Cook–an editor for digital storytelling and training at The Times–here. The U.S. Department of Education also believes that basic data literacy skills should be introduced earlier in the curriculum and they created this whole deck on why these skills are important. This is one of my favorite slides from that deck:

Source: U.S. Department of Education

What does this mean for organizations in 2023? Upskilling employees in data literacy and storytelling could mean online classes or simply a 1- or 2-day training with your data team. Interestingly, data vendors provide a ton of free training already. While some of this training can be specific to the data platform itself (like Google's Analytics Academy), other platforms provide general training on databases, SQL, and Excel. So if you don't pay for the training, at least utilize the free training provided by Mode, Looker, Google Data Studio, Tableau, etc.

Other Podcasts & Blog Posts

In the 2nd half of the episode, I talk about some episodes and blogs from other people I found interesting:

Dear Analyst #112: Greatest lessons learned from building solutions and processes for humans (re-broadcast of Secret Ops)
https://www.thekeycuts.com/dear-analyst-112-greatest-lessons-learned-from-building-solutions-and-processes-for-humans-re-broadcast-of-secret-ops/
Mon, 19 Dec 2022

The post Dear Analyst #112: Greatest lessons learned from building solutions and processes for humans (re-broadcast of Secret Ops) appeared first on .

]]>
This is a re-broadcast of an episode of the Secret Ops podcast hosted by my friend Ariana Cofone. Ariana is an operations consultant and her podcast is all about business operations. While I’ve never officially held a “business operations” role, I’ve worked in roles that are related to the operations world. In this conversation, we dive into what operations is, how I approach building workflows and solutions for people, and of course, why knowing Excel and SQL are crucial to getting into the world of business operations.

Check out the Secret Ops podcast here if you’re interested in learning more about operations!

Other Podcasts & Blog Posts

No other podcasts mentioned in this episode!

The post Dear Analyst #112: Greatest lessons learned from building solutions and processes for humans (re-broadcast of Secret Ops) appeared first on .

]]>
https://www.thekeycuts.com/dear-analyst-112-greatest-lessons-learned-from-building-solutions-and-processes-for-humans-re-broadcast-of-secret-ops/feed/ 0
Dear Analyst 112 35:48 52783
Dear Analyst #111: Master data management at AutoTrader and working with data in a merger with Korhonda Randolph https://www.thekeycuts.com/dear-analyst-111-master-data-management-at-autotrader-and-working-with-data-in-a-merger-with-korhonda-randolph/ https://www.thekeycuts.com/dear-analyst-111-master-data-management-at-autotrader-and-working-with-data-in-a-merger-with-korhonda-randolph/#comments Mon, 12 Dec 2022 06:05:00 +0000 https://www.thekeycuts.com/?p=51917 One topic that hasn’t been covered on Dear Analyst is master data management (MDM). I’m surprised it took this long before someone brought it up. I’d never heard of the term before and it looks like it’s a core strategy for many large corporations for managing their data. Korhonda Randolph studied systems engineering at the […]

The post Dear Analyst #111: Master data management at AutoTrader and working with data in a merger with Korhonda Randolph appeared first on .

]]>
One topic that hasn’t been covered on Dear Analyst is master data management (MDM). I’m surprised it took this long before someone brought it up. I’d never heard of the term before, and it looks like it’s a core strategy many large corporations use for managing their data. Korhonda Randolph studied systems engineering at the University of Pennsylvania and started her career in engineering. She started specializing in master data management at companies like AutoTrader, Cox Automotive, and SunTrust/BB&T (merger). In this episode, Korhonda discusses what master data management is, cleansing CRM data at AutoTrader, and the various data issues you have to work through during a merger between two banks.

A “master” record in master data management

The definition of master data management according to Wikipedia is pretty generic:

Master data management (MDM) is a technology-enabled discipline in which business and information technology work together to ensure the uniformity, accuracy, stewardship, semantic consistency and accountability of the enterprise’s official shared master data assets.

After doing some quick research, MDM is closely associated with data quality and data governance. The cynical side of me says this is one of those disciplines that was created by data vendors way back when. But given the size and scope of the projects MDM is used for, it’s more likely that I’ve simply never worked with people who practice the discipline.

Source: Info-Tech Research Group

At a high level, the goal of MDM is very simple: create a “master” record for a customer, product, or some other entity that doesn’t change very much. Korhonda discusses working on customer data where properties like the first and last names of a customer would be an output of MDM. This data should stay consistent no matter what team or department is looking at the customer data.
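As a toy illustration of that idea (not Korhonda's actual system, and with made-up field names), here's one common survivorship rule for building a master record: for each field, keep the most recently updated non-null value seen across the source systems.

import pandas as pd

# Hypothetical: the same customer as seen by three different source systems
sources = pd.DataFrame({
    "customer_id": [101, 101, 101],
    "first_name":  ["Kim", "Kimberly", None],
    "last_name":   ["Lee", None, "Lee"],
    "updated_at":  pd.to_datetime(["2022-01-03", "2022-06-15", "2021-11-20"]),
})

def golden_record(group: pd.DataFrame) -> pd.Series:
    # Survivorship rule: for each field, take the most recent non-null value
    ordered = group.sort_values("updated_at")
    return ordered.ffill().iloc[-1]

master = sources.groupby("customer_id").apply(golden_record)
print(master)
# first_name survives as "Kimberly" (newest non-null), last_name as "Lee"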

Data cleaning CRM data at AutoTrader

AutoTrader was a trendsetter in the field of data. Early on, its data architects created their own MDM systems to manage customer data. If the MDM system isn’t built properly, other systems downstream don’t function correctly. Korhonda’s team used Hadoop because AutoTrader works with many car dealerships who need data to help them run their businesses.

Korhonda started as a project manager at AutoTrader helping coordinate all the moving parts of AutoTrader’s MDM system. Eventually she became a solutions architect on the data side.

I’ve talked about data cleaning in multiple episodes and I’ve discovered a few things about the process over the years:

  1. Excel and SQL are still the main tools used for data cleansing
  2. The same type of data problems exist at startups and large corporations alike

At AutoTrader, they were trying to figure out if client A in the sales system was also client A in another system. There was missing data across systems, and AutoTrader would try to find third-party data sources to fill the gaps in the customer data. They might even contact the customer directly to get the data they need. At the end of the day, this type of data problem is not unique to AutoTrader. To this day, it still surprises me how simple and universal these data quality issues are.
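The matching step is the interesting part. The sketch below is not AutoTrader's logic, just an illustration of the problem using fuzzy string similarity from Python's standard library (a real MDM tool would match on many more attributes than a normalized name):

from difflib import SequenceMatcher

# Hypothetical client lists from two different systems
sales_system = ["Acme Auto Group LLC", "Bayside Motors", "Cedar Hill Cars"]
billing_system = ["ACME AUTO GROUP", "Bay Side Motors Inc.", "Downtown Dealers"]

def normalize(name: str) -> str:
    # Lowercase and drop spaces/common suffixes so "LLC" or "Inc." don't hurt the match
    name = name.lower()
    for suffix in (" llc", " inc.", " inc"):
        name = name.removesuffix(suffix)
    return name.replace(" ", "")

def best_match(name: str, candidates: list[str]) -> tuple[str, float]:
    scored = [(c, SequenceMatcher(None, normalize(name), normalize(c)).ratio()) for c in candidates]
    return max(scored, key=lambda pair: pair[1])

for client in sales_system:
    match, score = best_match(client, billing_system)
    verdict = "likely the same client" if score > 0.8 else "needs review / 3rd-party data"
    print(f"{client:20} -> {match:20} ({score:.2f}, {verdict})")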

Korhonda also discusses “systems of engagement.” These are the interfaces (e.g. a form on a website) where data is entered by a customer. These systems of engagement have to ensure that all the required information, such as a birthday, is captured. It’s like Amazon validating that you entered your address correctly before shipping you a package.
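A system of engagement's validation layer can be as simple as a required-field check before the record is ever accepted. This is a bare-bones sketch with hypothetical field names, just to show the idea:

from datetime import date

REQUIRED_FIELDS = ("first_name", "last_name", "birthday", "address")

def validate_submission(form: dict) -> list[str]:
    # Return a list of problems; an empty list means the record can be accepted
    errors = [f"{field} is required" for field in REQUIRED_FIELDS if not form.get(field)]
    birthday = form.get("birthday")
    if isinstance(birthday, date) and birthday > date.today():
        errors.append("birthday cannot be in the future")
    if form.get("address") and len(str(form["address"])) < 5:
        errors.append("address looks too short to be a real address")
    return errors

print(validate_submission({"first_name": "Ana", "last_name": "Lopez", "address": "12 Main St"}))
# -> ['birthday is required']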

“Analysts make the data flow”

Once the MDM system was in place, AutoTrader had a single source of truth for things like customers and dealerships. There was no more duplicate data. According to Korhonda, this had a profound operational impact on the business. That feeling when your data is all cleaned up can only be summed up as:

Korhonda talks about how data analysts are becoming more important at organizations where there are tons of data that needs to be analyzed. She says data analysts are just as important as the data engineers who are creating the back-end systems.

Analysts make the data flow.

Engineers are great at building systems, but knowing the right data to include in the system is where business owners come into play. Business owners are subject-matter experts who know about the business rules in the organization, and what type of data would make sense to include in the system.

Merging client data between SunTrust and BB&T

In 2019, BB&T Corporation and SunTrust Banks merged to become Truist Financial Corporation. SunTrust and BB&T were banks based primarily in the southeast of the U.S. These two banks had an overlapping footprint, so there were many customers who belonged to both banks. Behind the scenes, Korhonda was in charge of merging the customer data between these two banks. The customer data had missing birthdays, missing names, and overall there were a lot of legacy processes creating dirty data. Needless to say, it was a mess.

There are a variety of bank regulations that I don’t care about getting into, but it’s interesting to note how these regulations impact the data processes Korhonda dealt with. For instance, there are federal rules about how much a customer can deposit at a bank. If the customer deposits too much, they get added to a special report. As a result, a clean list of customers was needed for the regulators before the merger could go through.

Korhonda acted as the project manager and worked with business stakeholders to sign off on all the rules for the MDM system that was being developed. Each bank had thousands of processes for collecting and storing data, and small differences had a large impact on the project.

For instance, one system might have a 40-character limit for an address while the other system has a 50-character limit. Do you increase the field size to the larger 50 characters? Do you truncate longer addresses? Korhonda and her team had to make decisions like this thousands of times, taking into account feedback from a variety of stakeholders.
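With made-up limits and addresses, a migration rule for that example might look like the sketch below: load what fits, and flag anything that had to be truncated so stakeholders can review it rather than silently losing data.

TARGET_LIMIT = 40  # the stricter of the two systems' address limits

addresses = [
    "12 Main St",
    "4500 North-West Industrial Parkway, Building 7, Suite 210",
]

def migrate_address(addr: str, limit: int = TARGET_LIMIT) -> tuple[str, bool]:
    # Return the value to load plus a flag saying whether it was truncated
    if len(addr) <= limit:
        return addr, False
    return addr[:limit], True

for addr in addresses:
    value, truncated = migrate_address(addr)
    note = "TRUNCATED - route to stakeholders for review" if truncated else "ok"
    print(f"{value!r} ({note})")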

Advice for companies working with dirty data

We ended the conversation on advice Korhonda has for organizations working with a lot of data that needs to be queried and cleaned up. Data lineage is a hot buzzword in the data infrastructure world (see episode #59 for a list of some companies in the data lineage space). In a nutshell, data lineage tools help you visualize how your data flows from the source all the way to when it gets consumed (typically by data analysts and business users). Referring to the merger example, Korhonda said having a robust data lineage platform would help you with issues like field lengths changing.

Source: Octopai

In addition to maintaining these data flow diagrams, Korhonda made a final plug for having MDM professionals maintain an organization’s MDM systems. Sometimes the MDM systems are owned by a systems architect or a DBA, but these people may not see or know the overall picture of the data system.

In terms of advice for data analysts, Korhonda said that it’s about more than just knowing how to write SQL. You have to know how to tell the story if you want to make an impact. The data storytelling skill has come up quite a few times in previous episodes.

Be a visionary and display data in a way that’s easy to understand.

Other Podcasts & Blog Posts

No other podcasts mentioned in this episode!

The post Dear Analyst #111: Master data management at AutoTrader and working with data in a merger with Korhonda Randolph appeared first on .

]]>
https://www.thekeycuts.com/dear-analyst-111-master-data-management-at-autotrader-and-working-with-data-in-a-merger-with-korhonda-randolph/feed/ 1
Dear Analyst 111 27:39 51917
Dear Analyst # 110: A tutorial on how to fill values down with Excel VBA and Google Apps Script (Vancouver Power BI/Modern Excel re-broadcast) https://www.thekeycuts.com/dear-analyst-110-a-tutorial-on-how-to-fill-values-down-with-excel-vba-and-google-apps-script-vancouver-power-bi-modern-excel-re-broadcast/ https://www.thekeycuts.com/dear-analyst-110-a-tutorial-on-how-to-fill-values-down-with-excel-vba-and-google-apps-script-vancouver-power-bi-modern-excel-re-broadcast/#respond Mon, 21 Nov 2022 05:39:00 +0000 https://www.thekeycuts.com/?p=52571 Have you ever faced a spreadsheet where one column contains values for each “section” of the spreadsheet but you want to fill those values down through the rest of the column? This is a common problem when you get a data dump from a database or perhaps a copy/paste from a PivotTable. You have values […]

The post Dear Analyst # 110: A tutorial on how to fill values down with Excel VBA and Google Apps Script (Vancouver Power BI/Modern Excel re-broadcast) appeared first on .

]]>
Have you ever faced a spreadsheet where one column contains values for each “section” of the spreadsheet but you want to fill those values down through the rest of the column? This is a common problem when you get a data dump from a database or perhaps a copy/paste from a PivotTable. You have values interspersed in the column and want to fill the empty cells below those values with the value above it, but not all the way down. See this screenshot below to see what I mean:

This episode is a re-broadcast of a presentation I gave at the Vancouver Power BI & Modern Excel User Group in April 2021, hosted by Ken Puls. The full video of the presentation is below, where I walk through two scripts: a VBA script for Excel and a Google Apps Script for Google Sheets. You can see the original episode/post I did about filling values down programmatically in episode #42. Filling values down is apparently a really common operation and problem faced by analysts, because this post about filling values down to the last row of data has become the most popular post on my blog.
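The scripts in the video are VBA and Google Apps Script; as a rough point of comparison (not the code from the presentation), the same fill-down operation in pandas is essentially a one-liner, and because it only operates on existing rows it never fills past the last row of data. The column names here are made up:

import pandas as pd

# Hypothetical dump where "Region" only appears on the first row of each section,
# the way a copy/paste from a PivotTable often looks
df = pd.DataFrame({
    "Region": ["East", None, None, "West", None],
    "Sales":  [100, 250, 75, 300, 125],
})

# Forward-fill the blanks below each value; ffill stops at the last row of the frame
df["Region"] = df["Region"].ffill()
print(df)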

The post Dear Analyst # 110: A tutorial on how to fill values down with Excel VBA and Google Apps Script (Vancouver Power BI/Modern Excel re-broadcast) appeared first on .

]]>
https://www.thekeycuts.com/dear-analyst-110-a-tutorial-on-how-to-fill-values-down-with-excel-vba-and-google-apps-script-vancouver-power-bi-modern-excel-re-broadcast/feed/ 0




https://www.youtube.com/watch?v=sQuSRx9RZnw

]]>
Dear Analyst 110 58:15 52571
Dear Analyst #109: Data strategy and optimizing the vaccine supply chain at Johnson & Johnson with Sarfraz Nawaz https://www.thekeycuts.com/dear-analyst-109-data-strategy-and-optimizing-the-vaccine-supply-chain-at-johnson-johnson-with-sarfraz-nawaz/ https://www.thekeycuts.com/dear-analyst-109-data-strategy-and-optimizing-the-vaccine-supply-chain-at-johnson-johnson-with-sarfraz-nawaz/#comments Mon, 14 Nov 2022 06:45:00 +0000 https://www.thekeycuts.com/?p=51851 Johnson & Johnson is one of the largest corporations in the world and they produce everything from medical devices to baby powder. They were also on the front lines of developing a vaccine during the pandemic. Internally, J&J is also at the forefront of digital transformation. Sarfraz Nawaz studied computer science and built data analytics […]

The post Dear Analyst #109: Data strategy and optimizing the vaccine supply chain at Johnson & Johnson with Sarfraz Nawaz appeared first on .

]]>
Johnson & Johnson is one of the largest corporations in the world and they produce everything from medical devices to baby powder. They were also on the front lines of developing a vaccine during the pandemic. Internally, J&J is also at the forefront of digital transformation. Sarfraz Nawaz studied computer science and built data analytics and decision intelligence platforms inside companies across different industries. Sarfraz currently does product and digital management for J&J’s supply chain team. In this episode, we discuss data analytics platforms, supply chain platforms, and optimization problems in the context of vaccine distribution.

Building a supply chain platform at Johnson & Johnson

Johnson & Johnson’s supply chain supports 1.2 billion consumers every day. It’s a staggering number. Building and optimizing a supply chain platform that supports so many people must be a huge problem, but could also be a fun one if you love to see things scale.

Sarfraz discusses the one thing underpinning a successful supply chain: a data strategy. You may not normally associate a supply chain with data analytics. According to Sarfraz, J&J’s data strategy lays the foundation for how other technologies at J&J are enabled. He’s referring to things like cloud, governance, and machine learning.

Increasing visibility into the supply chain is something Sarfraz works on a lot. For instance, one concept we talked about in this episode is Available to Promise, or ATP. This concept basically ensures there is enough inventory available when a customer places an order, and that there isn’t too much inventory sitting around either. There’s a lot of esoteric software I’ve never heard of that helps corporations like J&J with ATP, like Logiwa and Cogoport. Even SAP has an ATP platform, showing how important this concept is for companies with big supply chains. Behind these ATP platforms are, of course, a ton of data. And more of that data is coming from customers.
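A textbook ATP check (made-up numbers, not J&J's actual system) is essentially arithmetic over three inputs: what's on hand, what's scheduled to arrive, and what's already committed to other orders.

def available_to_promise(on_hand: int, scheduled_receipts: int, committed: int) -> int:
    # Units that can still be promised to new orders
    return on_hand + scheduled_receipts - committed

def can_promise(order_qty: int, on_hand: int, scheduled_receipts: int, committed: int) -> bool:
    return order_qty <= available_to_promise(on_hand, scheduled_receipts, committed)

# Hypothetical inventory position
print(available_to_promise(on_hand=1200, scheduled_receipts=800, committed=1500))  # 500
print(can_promise(600, on_hand=1200, scheduled_receipts=800, committed=1500))      # False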

Competing with Amazon

Demand forecasting and planning is a constant challenge for J&J. However, with J&J’s digital transformation initiatives and data strategy in place, the corporation is getting better at forecasting every day. An important signal for demand forecasting is customer engagement.

Sarfraz discusses the various inputs that go into this demand forecasting model. Imagine the model is an Excel file, and there are various inputs that feed into it (an over-simplified analogy). There are data points coming from the manufacturing division, inventory levels, transportation, and many more inputs that go into the model. This is similar to the model Amazon Prime has built for its customers. You can go one level deeper and get data points from building control towers, the temperature inside the vehicles that carry products, and the constant feedback loop that arises from these data “producers.”

Source: Business Insider

Optimizing COVID-19 vaccine distribution

Sarfraz spoke a bit about the research and planning that went into developing and distributing the J&J vaccine. The key takeaway from this experience (like many problems related to big data) is scale. How does J&J ensure they have the right processes in place to produce over 500 million vaccine doses in a year?

Source: Reuters

The first step, according to Sarfraz, was simply identifying and cataloguing all the core platforms that are part of developing and distributing a vaccine. Once those core systems are identified, you then need to figure out how to tweak each component to serve an immediate need for that product. Of course, the supply chain behind this operation has to run under the strictest health and safety protocols (which is itself a supply chain problem).

Improving decision making in the marketing analytics world

We also discussed one of Sarfraz’s previous roles developing a multi-touch marketing attribution platform. Those who work in marketing analytics can attest to the challenge of coalescing multiple data sources to properly give attribution to a marketing channel that drives conversion for a company. The space Sarfraz focused on was addressable media and how you can use historical data to predict future marketing spend.

Sarfraz talked about a model he built that incorporated spend data, user behavior, and other inputs that the marketing and analytics teams would use to optimize marketing spend. Large retailers like Macy’s and Neiman Marcus would feed their data into this model and the model would help figure out attribution. The output would show Macy’s that if they spent X dollars on paid search, email, and other channels, it would have Y effect on customer purchase behavior. Big retailers had direct relationships with platforms like Pinterest and Snap, which allowed them to have more insight into how their marketing spend was leading to conversions.
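Sarfraz's platform was far more sophisticated than this, but as a baseline, the simplest multi-touch approach is linear attribution: spread each conversion's value evenly across every channel that touched it. The channels and numbers below are made up:

from collections import defaultdict

# Hypothetical conversion paths: the channels a customer touched before purchasing
conversions = [
    {"value": 120.0, "touchpoints": ["paid_search", "email"]},
    {"value": 80.0,  "touchpoints": ["social", "email", "paid_search"]},
    {"value": 200.0, "touchpoints": ["email"]},
]

def linear_attribution(paths: list[dict]) -> dict[str, float]:
    # Give each touched channel an equal share of each conversion's value
    credit: dict[str, float] = defaultdict(float)
    for conv in paths:
        share = conv["value"] / len(conv["touchpoints"])
        for channel in conv["touchpoints"]:
            credit[channel] += share
    return dict(credit)

print(linear_attribution(conversions))
# {'paid_search': 86.67, 'email': 286.67, 'social': 26.67} (rounded)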

Source: Hootsuite

Systems thinking and key takeaways

We ended the episode by talking about an episode from Hanselminutes with Inés Sombra, VP of Engineering at Fastly. Scott and Inés talk about what it takes to go fast in an organization. They also talk about systems thinking and viewing systems under the lens of technical infrastructure and people. We chatted a little about how Sarfraz and his team apply systems thinking at J&J.

In terms of key takeaways, Sarfraz highlights the importance of data, engineering, and STEM education and how it’s been transformational within various industries. As a proponent of digital transformation, Sarfraz talks about how this trend is just getting started.

Other Podcasts & Blog Posts

In the 2nd half of the episode, I talk about some episodes and blogs from other people I found interesting:

The post Dear Analyst #109: Data strategy and optimizing the vaccine supply chain at Johnson & Johnson with Sarfraz Nawaz appeared first on .

]]>
https://www.thekeycuts.com/dear-analyst-109-data-strategy-and-optimizing-the-vaccine-supply-chain-at-johnson-johnson-with-sarfraz-nawaz/feed/ 1
Dear Analyst 109 33:47 51851
Dear Analyst #108: Skills needed for a successful data analytics career (Glich podcast re-broadcast) https://www.thekeycuts.com/dear-analyst-108-skills-needed-for-a-successful-data-analytics-career-glich-podcast-re-broadcast/ https://www.thekeycuts.com/dear-analyst-108-skills-needed-for-a-successful-data-analytics-career-glich-podcast-re-broadcast/#respond Mon, 31 Oct 2022 05:26:00 +0000 https://www.thekeycuts.com/?p=52575 This episode is a re-broadcast of a podcast I did with Bassem Dghaidi, a senior software engineer at GitHub. Bassem has a podcast and YouTube channel called Glich. On his show, he covers moderate to advanced engineering topics on coding, architecture and management. Data analytics is a little outside of the topics he generally talks […]

The post Dear Analyst #108: Skills needed for a successful data analytics career (Glich podcast re-broadcast) appeared first on .

]]>
This episode is a re-broadcast of a podcast I did with Bassem Dghaidi, a senior software engineer at GitHub. Bassem has a podcast and YouTube channel called Glich. On his show, he covers moderate to advanced engineering topics on coding, architecture and management. Data analytics is a little outside of the topics he generally talks about, but there are many crossovers between data analytics and engineering as you have probably heard on Dear Analyst.

We cover a lot of basic concepts like what a data analyst does, tools analysts use, and why Excel is still so dang popular. Then we get into some more “engineering” topics like whether data analysts should learn how to code. We then end on some tips for aspiring data analysts on the skills you need to become a data analyst and what a typical data analyst career path looks like. You can see the full video interview below:

Other Podcasts & Blog Posts

No other podcasts mentioned in this episode!

The post Dear Analyst #108: Skills needed for a successful data analytics career (Glich podcast re-broadcast) appeared first on .

]]>
https://www.thekeycuts.com/dear-analyst-108-skills-needed-for-a-successful-data-analytics-career-glich-podcast-re-broadcast/feed/ 0

https://www.youtube.com/watch?v=uk_1Go_kDvA

]]>
Dear Analyst 108 48:27 52575
Dear Analyst #107: Using Twitch to teach people about analytics and launching a food tech startup with Matthew Brandt https://www.thekeycuts.com/dear-analyst-107-using-twitch-to-teach-people-about-analytics-and-launching-a-food-tech-startup-with-matthew-brandt/ https://www.thekeycuts.com/dear-analyst-107-using-twitch-to-teach-people-about-analytics-and-launching-a-food-tech-startup-with-matthew-brandt/#respond Mon, 17 Oct 2022 10:30:49 +0000 https://www.thekeycuts.com/?p=51915 Matthew Brandt didn’t know analytics was a potential career option until he started managing websites with Google Analytics. Matthew is Canadian, spent most of his life in Switzerland, and went to high school in Japan. He attended the EHL hospitality school at Lausanne, and was part of the 70% who left the hospitality field after […]

The post Dear Analyst #107: Using Twitch to teach people about analytics and launching a food tech startup with Matthew Brandt appeared first on .

]]>
Matthew Brandt didn’t know analytics was a potential career option until he started managing websites with Google Analytics. Matthew is Canadian, spent most of his life in Switzerland, and went to high school in Japan. He attended the EHL hospitality school at Lausanne, and was part of the 70% who left the hospitality field after graduation to start his career in analytics. From data engineering to data architecture to reverse ETL, Matthew has done and seen it all. To merge his love for hospitality and technology, Matthew founded a food tech startup based in Zurich. In this episode, you’ll hear about Matthew’s wide-ranging experience including livestreaming about analytics on Twitch, working at a fintech SaaS company, and co-founding a food-sharing startup.

Predicting customer churn at a fintech SaaS company

Before Matthew entered the world of livestreaming, he had a “normal” job working for a SaaS company. The company provides accounting software for small businesses in Switzerland. Swiss tax law is very complicated, so this company helps businesses get all of this back-office tax stuff figured out. Matthew was originally hired as a marketing analyst. Out of 40 people in marketing, Matthew was the only one actively working with data. Originally, he was cleaning up data, auditing data, figuring out business workflows, and understanding relationships between entities. A jack of all trades.

The company wanted to reduce churn but didn’t have the technical infrastructure to get there. Matthew introduced Salesforce and the ETL model to the company to create a clean set of tables and schemas. He ended up using machine learning to model different outcomes so that the company knew which parts of their operation to fine-tune and optimize. The model would help them identify which customers were most likely to churn. For instance, customers who hadn’t logged in for a long time were most likely to churn.

This was an interesting behavioral exercise since the customer already made the psychological decision to cancel. But we were able to identify these customers before they made that decision.
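This isn't Matthew's model, but a minimal churn sketch built on just that one signal (days since last login) shows the shape of the approach; a real model would use many more behavioral features.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical training data: days since last login, and whether the customer churned
days_since_login = np.array([[1], [3], [7], [14], [30], [60], [90], [120]])
churned = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(days_since_login, churned)

# Score a few customers: the longer the inactivity, the higher the churn risk
for days in (5, 45, 100):
    risk = model.predict_proba([[days]])[0, 1]
    print(f"{days:>3} days inactive -> churn probability {risk:.2f}")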

Keeping a foot in the analytics world with conferences and livestreaming

While Matthew’s full-time job is working on a food tech startup (more on this later), he still keeps a foot in the analytics world. He runs AnalyticsCamp, for instance. AnalyticsCamp is an “unconference” focused on data analytics, data visualization, business intelligence, and UX research. If you are based in Switzerland, this might be the perfect conference for you to attend if you’re in the data analytics field.

What’s more interesting are Matthew’s forays into the livestreaming space. I’ve never met someone who is providing education and creating a community around data analytics through Twitch. During the pandemic, Twitch grew by 82% in terms of hours watched.

Like many people stuck at home, Matthew went onto Twitch and started watching music, science, and technology streams. He discovered that many people were developers and many people were doing live coding streams. Something about teaching analytics on Twitch interested Matthew, so he thought he’d give it a shot. He discovered it had to be edu-tainment. I jumped on one of Matthew’s livestreams and it’s just as Matthew said: it’s a mix of learning and entertainment. I think this might be the first time I’ve mentioned a Twitch livestream as a way to learn about analytics and tech:

Source: Matty_TwoShoes Twitch account

I asked Matthew how he figured out what type of content to stream on Twitch and what the feedback has been. He spends a few hours researching and planning out his content for every stream. One of his first streams was investigating how Uber fares have changed in NYC. You can get the Uber fare dataset straight from Kaggle. The stream consists of Matthew going through the steps to do the analysis. He shows how you can put the data into a Postgres database, create a Docker container, and ultimately analyze and visualize the data.

So far Matthew has found a niche for himself. Initially there wasn’t a lot of chat in his livestreams. Over time, the stream would grow to 30-40 viewers in real-time and he would get derailed by comments in the chat. He says he is one of 50 or so livestreamers who focus on streaming for the data community.

Building a real-time analytics platform for Twitch livestreamers

Matthew started meeting more livestreamers teaching different technical subjects in Twitch communities. He was invited to join a team of developers who livestream on Twitch called The Claw. This team consists of 22 livestreamers and their tag-line is awesome:

Source: The Claw Team on Twitch

Matthew started talking to his fellow livestreamers about an idea he had to utilize his analytics background. Most livestreamers he talked to didn’t really know what goes on during their livestream besides the normal gifting, comments, and follows on Twitch. What if streamers could get real-time analytics about their streams?

Matthew approached this project much like a startup. He got feedback from livestreamers, incrementally worked on features, and built a waitlist of 70-80 people who were eager to try it out. Harkening back to Twitch’s original name (Justin.tv), Matthew called his project Justin Numbers.

Source: CNET

Real-time analytics is quite a popular subject these days because of the big data coming from all our devices and all the apps we use. All the big cloud providers like Google Cloud and AWS have data streaming and analytics solutions, like Google Cloud’s Dataflow. I wouldn’t be surprised if Twitch comes out with their own real-time analytics given how prevalent these streaming tools are.

Making home-cooked meals accessible to everyone in Zurich

When Matthew started talking about his current startup, I was expecting it to be something related to analytics. Instead, the startup he’s working on is a bit out of left field. Cook Eat‘s mission is to bring home-cooked meals to the masses and make it easier for people to eat lunch together. The meals are not prepared by chefs at restaurants but rather by individuals. Think Airbnb for meals.

Source: Cook Eat website

This idea has been tried before and there are a lot of operational and quality issues associated with this model. Matthew and his co-founder think they can succeed by starting with just the market in Zurich first. They manually vet each cook to verify their skills and quality of food the cook creates.

Matthew met Ela, his co-founder, 9 years ago. They were both working at the same company and batted around this idea for a few years. 4 years ago, they decided to invest some money in the idea and built an MVP. Unfortunately, there were no significant developments since they didn’t do any marketing. Last year, they quit their jobs and raised some money from 4 angel investors to work on Cook Eat full-time. At the time of the recording, they were doing an impact study by working with the Zurich housing authority to see what impact food sharing has on the environment.

We chatted about the life of a startup founder and Matthew’s experience will probably resonate with you if you’re a startup founder. You work on 50 different things at any point in time and are being pushed outside your comfort zone. Sometimes Matthew doesn’t feel like he gets to work on analytics-related projects so that’s why he continues to do livestreaming and hosting the AnalyticsCamp unconference.

Other Podcasts & Blog Posts

No other podcasts mentioned in this episode!

The post Dear Analyst #107: Using Twitch to teach people about analytics and launching a food tech startup with Matthew Brandt appeared first on .

]]>
https://www.thekeycuts.com/dear-analyst-107-using-twitch-to-teach-people-about-analytics-and-launching-a-food-tech-startup-with-matthew-brandt/feed/ 0
51915