Dear Analyst
https://www.thekeycuts.com/category/podcast/
A show made for analysts: data, data analysis, and software. This is a podcast made by a lifelong analyst covering Excel, data analysis, and tools for sharing data, along with topics related to software engineering and building applications.

Dear Analyst #51: Who is the real audience for custom data types in Excel?
https://www.thekeycuts.com/dear-analyst-51-who-is-the-real-audience-for-custom-data-types-in-excel/
Mon, 30 Nov 2020

To much fanfare, custom data types in Excel were released late last month (October 2020). This feature started off as a way to see “rich data” relating to stocks and geography in a cell, and now Microsoft is letting you define your own data types. Perhaps you want to see all attributes for a customer such as the location, region, and account rep in a cell without seeing these columns in your worksheet (see screenshot below). In this episode, I want to dig deeper into how this feature fits into existing workflows. More importantly, I want to know who is the audience for Excel custom data types?

Source: The Verge

Redefining the data engineer role

I can’t tell if data engineers who are working in SQL are rejoicing or rolling their eyes at this new feature. On one hand, a data engineer might be happy because if a business user in their organization needs to see additional data about a customer, they don’t need to contact the engineer to add this to the SQL database. In theory, the business user can add this column to the “Customers” data type in Power Query and that column is now available for anyone to use.

On the other hand, perhaps the data engineer needs to learn how to use Power Query and Power BI now because no one in the organization knows how to do the right types of joins in Power Query. Granted, setting up joins and cleaning up data in Power Query is probably easier than writing an Airflow DAG. If you’re an analyst who is excited about using this feature, perhaps you’ll need to up-level your skills to become a proficient user of Power Query first and learn how to use joins:

https://www.youtube.com/watch?v=-kle5a7vbRA

Who maintains these data types and who uses them?

My skepticism for this feature started as I saw people doing tutorials on how this feature works. The end product is quite tantalizing. But there are two separate audiences Microsoft needs to convince to use this feature: the admins (most likely the data engineers) and the business users (anyone with “analyst” in their job title).

Microsoft has made it clear that this feature is still in development and the feature has only rolled out to a subset of Office 365 subscribers. Microsoft most likely released this feature because the most ardent users of the stocks and geography data types probably sent feedback saying they want to create their own data types. If you are one of those users, I’m super curious about what your use case is and how you are using the feature. I see two camps of people:

The admin

If your organization deploys this feature, undoubtedly there will be someone who has to maintain all the data types. Here is the workflow for how the admin might maintain a “Product” data type:

  1. Set up the data connection with a database table that contains all your company’s products
  2. Do some data cleanup in Power Query to ensure the right columns show up in the data type
  3. Merge with other tables as necessary to get all properties a business user may want to see
  4. When a new property is needed, decide whether this change should be made in the underlying database or done through another merge in Power Query
  5. Set up a cross-functional meeting so other business users know that a change may be coming to the organization’s data types
  6. Start at step 2 again and rinse and repeat
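Conceptually, steps 2–3 above are just joins. Here is a minimal Python sketch (hypothetical tables and column names, and plain Python rather than Power Query’s M language) of the kind of merge an admin would do before publishing a “Product” data type:

```python
# Illustrative lookup tables an admin might merge before
# exposing a "Product" data type (made-up data and columns).
products = [
    {"sku": "A-100", "name": "Widget", "category_id": 1},
    {"sku": "B-200", "name": "Gadget", "category_id": 2},
]
categories = {1: "Hardware", 2: "Electronics"}

def merge_product_properties(products, categories):
    """Left-join category names onto products, like a Power Query merge."""
    merged = []
    for p in products:
        row = dict(p)  # copy so the source table is untouched
        row["category"] = categories.get(p["category_id"], "Unknown")
        merged.append(row)
    return merged

merged = merge_product_properties(products, categories)
```

The `.get(..., "Unknown")` is the left-join behavior: products with no matching category still come through, which is usually what you want in a lookup-style master table.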

Maybe I have some of these steps wrong, but it doesn’t feel that much different from what a data engineer or DBA is doing today. In fact, custom data types might complicate the workflow because instead of exporting results to a CSV from a database, business users now require the data moving through Power Query and Power BI to be just as accurate as the data stored in the database.

Source: devRant

The business user

My hunch is that Microsoft wants the business user to be the real audience of this feature. All the articles and videos point to how awesome this will be for all you analysts out there.

I think the big caveat will be this: while you, the business analyst, are the end user, you’ll have to learn how to use Power Query and Power BI for this feature to have a meaningful impact on your workflows. This is great for Microsoft because they have more users of Power Query/BI and maybe some more $$ for additional licenses.

In terms of data visualization and data cleansing, Power BI competes with tools like Tableau, Mode, and Looker. Even OpenRefine might be a competitor in terms of data cleaning and transformation (I’m a big fan of OpenRefine since it’s free and plan on doing an episode about it in the future). Remember Microsoft Access? Still a trusty database that won’t die, but Power Query is obviously a step up. Needless to say, it behooves Microsoft to get more users onto Power Query/BI.

This is what your workflow might look like with custom data types:

  1. Go to the data engineer or admin who is managing data types for your organization and explain the analysis you need to do
  2. Have the data engineer create a view in Power Query that merges all the tables you need access to in the data type
  3. During the analysis, you realize you need an extra property
  4. In the interest of time, you pull the data separately from another Excel file rather than getting the property added to the “official” data type

As custom data types become more widely used in your organization, there will be more controls in place to ensure changes to data types are tracked and properly communicated out. This means editing a custom data type may be just as laborious as adding new columns to a table in a SQL database.

Custom data types versus traditional lookups

I know I may simply be averse to change which is leading to my skepticism for this feature. Here is the current workflow I believe most analysts utilize when pulling data:

  1. A SQL query runs in Snowflake, Mode, Looker, or some other public cloud database
  2. You combine your lookup tables into various worksheets in a workbook
  3. To “join” data, you do some VLOOKUPs or INDEX/MATCHs in an intermediate worksheet
  4. A separate worksheet contains your model, report, or one-off analysis
  5. If report needs to be updated weekly/monthly, you run the SQL query again and paste the new data in one of the original lookup worksheets
  6. (Optional) You have a cron job that outputs a CSV into a Sharepoint folder on a recurring basis and you have a PivotTable that points to that CSV which you refresh every week/month to get the latest data
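Step 6’s refresh loop can be approximated outside of Excel as well. Here is a minimal Python sketch (hypothetical CSV layout and column names) of a script that reads the latest export and rebuilds a small pivot by region, the same job the PivotTable does when you hit refresh:

```python
import csv
import io
from collections import defaultdict

# Stand-in for the CSV a cron job drops into a SharePoint folder
# each week (hypothetical columns: region, revenue).
latest_export = io.StringIO(
    "region,revenue\n"
    "East,100\n"
    "West,250\n"
    "East,50\n"
)

def pivot_revenue_by_region(csv_file):
    """Sum revenue per region, like a PivotTable pointed at the CSV."""
    totals = defaultdict(float)
    for row in csv.DictReader(csv_file):
        totals[row["region"]] += float(row["revenue"])
    return dict(totals)

totals = pivot_revenue_by_region(latest_export)  # {'East': 150.0, 'West': 250.0}
```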

While this workflow does feel a bit manual and stitched together and may be subject to human error, it’s fast and requires little overhead from the business user.

The benefits of custom data types is that all these extra lookup columns are kind of “hidden” inside the cell. You can get these extra properties by doing dot notation in the formula. For instance, if cell A1 contained your “Product” custom data type, you could get the Category property by writing a formula like this:

=A1.Category

As you start working with tables of data, your formulas will get more complicated as you refer to table names and custom data types. For instance, this formula below would reference the Category property of the Product custom data type in the Products table:

=Products[Product].Category

This syntax may feel foreign to some Excel users. As you start converting your lists of data to tables and work with property names instead of cell references, it will be important for you to learn this syntax.

Are lookups really that bad?

I don’t think so. In my opinion, the main tradeoff with using custom data types is this:

Fewer columns cluttering your spreadsheet, with the added complexity of learning Power Query.

Should you learn Power Query? I think so. Learning Power Query will only make you more knowledgeable about your organization’s data sources and data pipelines. It will also give you insight into how your backend databases are structured.

The issue is that business users shouldn’t have to learn Power Query for the sake of having custom data types. Again, this may be my aversion to change, but I like to see all the data that I’m doing lookups to in my workbook. With custom data types, you have to jump into Power Query, select which columns you want to include in the data type, and then close and load back into Excel. Having to navigate another interface just to save some space in my worksheet doesn’t feel necessary (for most of my use cases).

While the hype and marketing around custom data types is targeted at the business user, I think this feature is really a Power Query/BI feature since most of the work will be done in Power Query versus Excel. Here is a description from the Microsoft Excel forums on when your organization might want to use custom data types:

Lookup-style tables that are commonly used in your organization such as product masters, customer lists, facilities, supplier lists, or asset tables are good examples of what you can now share through Excel data types.

If you already have a workflow that uses “lookup-style tables” in your worksheets, is it really necessary to move these tables to Power Query so that they can be loaded back into Excel as a custom data type? I’d really like to hear from people who have the need for this workflow. I’ve only seen YouTube examples of custom data types in action, so perhaps I just haven’t seen the light yet. Hence my skepticism around who the target audience is for this feature.

What’s next for custom data types

The feature is still in development, and the Excel community has already expressed feature requests they would like to see in the product. Some features that are table stakes in my opinion:

  • When you add a column and do a VLOOKUP on a custom data type column, that new value you bring in via the lookup currently doesn’t get added to the data type. These lookup columns need to get added to the data type.
  • You can’t select a table of data in Excel and simply “convert” that table to a custom data type. No matter what, you have to go into Power Query to set up the data type. Allowing someone to create a data type straight from Excel would be like creating a named range that anyone in your organization can use and reference.

The interesting thing about these two features is that it just gives the business user more flexibility to create and manage custom data types. This adds risk to the organization, however. If any business user can change the definition of a custom data type and that data type is shared across the organization, then this will affect other analysts who rely on that custom data type. You can see why your organization would need an admin to put controls in place for who can edit these data types.

Having a data engineer or admin manage your organization’s use of Power Query and Power BI is not out of line with what Microsoft wants. If your organization is already using Azure SQL Database or even Synapse, custom data types might feel like a natural feature to give to your business users (especially if you are already using Power Query and Power BI).

Other Podcasts & Blog Posts

In the 2nd half of the episode, I talk about some episodes and blogs from other people I found interesting:


Dear Analyst #50: Walking through a VBA script for trading billions of dollars worth of derivatives with Shawn Wang
https://www.thekeycuts.com/dear-analyst-50-walking-through-a-vba-script-for-trading-billions-of-dollars-worth-of-derivatives-with-shawn-wang/
Mon, 16 Nov 2020

This little podcast/newsletter started as a little experiment last year. I never thought I would make it to episode number 50, but here we are! Thank you to the few of you out there who listen/read my ramblings about spreadsheets.

I decided to give you all a break and invite my first guest to the podcast: Shawn Wang (aka @swyx). Shawn currently works in developer experience at AWS, but has a really diverse background (check out his site to learn more). I’ve mentioned Shawn in previous episodes (25 and 49) and was honored he agreed to be the first guest on Dear Analyst. We dig into a variety of topics including negotiating your salary, Javascript frameworks, creating, and whatever else tickled my fancy.

Shawn Wang (source: swyx.io)

Becoming a Jedi

I was particularly interested in a 4,000-line Excel VBA script he wrote while working as a trader in a previous job. You can learn a lot about someone from looking at their code, and that’s exactly what we did during this episode. Shawn was kind enough to share a VBA script he built back in 2012 for his team to price billions of dollars’ worth of derivatives. I honestly don’t understand 90% of this script, but Shawn walked through a lot of the derivative concepts he had to translate into this VBA script. You can see some of his thoughts about this script in the Tweet thread below:

https://twitter.com/swyx/status/1327041894853922816

I think it’s amazing that his bank relied on traders using this homegrown script to price everything from interest rates to mortgages.

One of the main takeaways from our walkthrough of this script is that the code isn’t pretty. Shawn had a problem that he needed to solve, picked up the tool that could solve that problem, and started hacking away at the solution. Shawn shared a story from his senior trader at the time on building tools for yourself:

One of the rites of passage for becoming a Jedi is building a lightsaber. Once you have the lightsaber, you just use it, and stop building it.

—Shawn Wang

For the benefit of other traders out there, Shawn also believes in learning in public. Releasing this script is just one example of that. By producing content and acknowledging gaps in your knowledge, you’ll learn faster than being a “lurker,” as Shawn puts it.

No-code is a lie

We talked a bit about an article he wrote called No code is a lie, and how programmers sometimes need to get over themselves. Programmers may get caught up in the style of their code, but the end-user just cares about whether the thing works and solves their problem.

After finance, Shawn moved from Excel and VBA scripts to Haskell, Python, and Javascript. He still has a soft spot for Excel, however. With Excel, you have your database and user interface right in front of you. This not only gives people an easy way to create, but makes creation more inclusive.

Excel is creation over code. I don’t define myself as coding, I define myself creating.

—Shawn Wang

Taming the Javascript community

Shawn got really involved with the ReactJS community and eventually became one of the moderators of the subreddit after Dan Abramov asked him to help with the community.

Shawn recently stepped down from moderating the community as he started coding with Svelte, another Javascript framework. In terms of moving from community to community, Shawn made an interesting point on encouraging renewal in communities. Mods, leaders, managers, and political figures should have limited terms to encourage innovation and different perspectives. Plus, I think when you are new to a community, you get a chance to learn from the ground up from others who are more experienced. Once you’re at the top, it’s time to find a new place and rinse, lather, and repeat.

Getting $50,000 added to his salary

We both talked about our interests in Haseeb Qureshi’s blog posts on salary negotiation. If you were a developer 4-5 years ago, you most likely came across Haseeb’s posts because it shows step-by-step how Haseeb went from finishing a coding bootcamp to getting a 6-figure salary at Airbnb.

Shawn also cited Patrick McKenzie’s post and Josh Doody’s guide on salary negotiation as good resources. I remember when I was interviewing, I relied on Haseeb’s concepts to get me through the negotiation process. Long story short? You should always negotiate.

The fallacy of measuring developer advocacy programs

I’ve read various blog posts and listened to podcasts about this subject, so figured I’d ask Shawn what he thinks about measuring developer advocacy efforts since he works at one of the largest companies on the planet. Rest assured! His team has not come up with the perfect formula either. Guess where they keep track of all their speaking engagements and content? You guessed it: in a spreadsheet.

Shawn mentioned one startup called Orbit that is trying to crack this nut. They dub themselves as the “operating system of vibrant developer communities.” Their orbit model is a bit cheesy but does attempt to quantify someone’s engagement in a community:

  • Love is a member’s level of engagement and activity in the community.
  • Reach is a measure of a community member’s sphere of influence.
  • Gravity is the attractive force of a community that acts to retain existing members and attract new ones.
  • Orbit levels are a practical tool for member segmentation and used to design different programs for each level of the community.

I’m currently working on a similar program and commend them on tackling this problem :).

Other projects

Shawn finally shared what he’s working on these days:

  • Wrote a book called Coding Career Handbook and maintaining a community for that
  • Growing the Svelte society on Twitter
  • Angel investing
  • Scouting for a VC fund
  • Writing on his blog

He talked about being disappointed in his writing and I completely agree with that sentiment. Writing these posts definitely takes time, but I always feel like more time could be put into the writing to make it more clear, structured, and precise. Having said that, I’ll take a page out of Shawn’s notebook and #LearnInPublic 😀 .

Dear Analyst #49: DCF spreadsheet error leads to $400M difference in Tesla acquisition
https://www.thekeycuts.com/dear-analyst-49-dcf-spreadsheet-error-leads-to-400m-difference-in-tesla-acquisition/
Mon, 09 Nov 2020

I’ve been posting quite a few episodes about spreadsheet errors as of late. The two Harvard professors and their faulty economic policies, the JPMorgan whale trader, and the EuSpRIG community who tries to catch all these errors. This is a quick story about how Tesla’s acquisition of SolarCity in 2016 was botched due to a spreadsheet error involving double-counting SolarCity’s debt. You can read more about how this “computational error” was communicated to Tesla and SolarCity in the S-4 here. Elon must’ve been a happy man after the deal.

Two friendly CEOs

It’s 2016, and Tesla is in talks to buy SolarCity, a solar panel company. The deal is complicated because of two factors:

  1. Elon is the largest shareholder in both Tesla and SolarCity
  2. Lyndon Rive, SolarCity’s CEO, is Elon’s first cousin

To make the process transparent and not make it seem like Elon was forcing the deal to happen between the two parties, SolarCity set up a special committee of outside advisors to help with the deal. One of the advisors was Lazard, the investment bank.

Lazard gets a bunch of spreadsheets from SolarCity and starts doing a typical discounted cash flow (DCF) analysis for the deal. SolarCity has to provide Lazard with some of their spreadsheets, but it turns out their spreadsheets contained one major flaw:

The spreadsheet double-counted some of SolarCity’s projected indebtedness.

As such, Lazard valued SolarCity’s equity between $14.75-$30.00 per share instead of $18.75-$37.75 per share. This resulted in SolarCity being discounted by $400M in the deal.

Source: ValueScope

Now Lazard is a big investment bank, and they said that after they discovered the “computational error,” it wouldn’t affect the terms of the deal. If you’re a shareholder in SolarCity, you would probably think that a 16% discount on the original deal terms is a big deal (the final purchase price was $2.6B).
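The mechanics behind those numbers are simple equity math: equity value is enterprise value minus net debt, so subtracting the same debt twice understates equity. Here is a Python sketch with illustrative figures (only the roughly $400M debt amount echoes the deal; the enterprise value and share count are made up for the example):

```python
def equity_per_share(enterprise_value, debt, shares, double_count=False):
    """Equity value per share; double-counting debt subtracts it twice."""
    equity = enterprise_value - debt
    if double_count:
        equity -= debt  # the spreadsheet error: debt subtracted again
    return equity / shares

ENTERPRISE_VALUE = 3.0e9  # illustrative enterprise value
DEBT = 0.4e9              # roughly the $400M at issue
SHARES = 100e6            # illustrative share count

correct = equity_per_share(ENTERPRISE_VALUE, DEBT, SHARES)          # $26.00/share
erroneous = equity_per_share(ENTERPRISE_VALUE, DEBT, SHARES, True)  # $22.00/share
```

With these made-up inputs, double-counting $400M of debt across 100M shares knocks $4.00 off every per-share estimate, which is the same shape as the gap between Lazard’s original and corrected valuation ranges.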

Whose spreadsheet is it anyway?

Was Lazard responsible for detecting this double-counting error in SolarCity’s spreadsheets before building their discounted cash flow analysis? Or was SolarCity responsible for this garbage-in-garbage-out scenario?

Given that SolarCity set up this special committee of 3rd-party advisors just to make sure the deal was valued correctly and transparently, I can see the argument for Lazard being responsible for auditing SolarCity’s spreadsheets. As with many of the other spreadsheet errors mentioned previously, time constraints, laziness, and human error ultimately are responsible for these errors.

Other Podcasts & Blog Posts

In the 2nd half of the episode, I talk about some episodes and blogs from other people I found interesting:

  • ShopTalk Show #432: SWYX

Dear Analyst 49 18:06 50121 Dear Analyst #48: Working with numbers formatted as text with avocado sales https://www.thekeycuts.com/dear-analyst-48-working-with-numbers-formatted-as-text-with-avocado-sales/ https://www.thekeycuts.com/dear-analyst-48-working-with-numbers-formatted-as-text-with-avocado-sales/#respond Mon, 26 Oct 2020 04:08:00 +0000 https://www.thekeycuts.com/?p=49960 For spreadsheet newbies, number formatting may seem like a pretty innocuous matter. As you become more familiar with Excel or Google Sheets, you’ll find that improper number formats will lead to formulas that don’t output what you expect or formulas that straight up don’t work. This episode explores what happens when you unknowingly have numbers […]

The post Dear Analyst #48: Working with numbers formatted as text with avocado sales appeared first on .

For spreadsheet newbies, number formatting may seem like a pretty innocuous matter. As you become more familiar with Excel or Google Sheets, you’ll find that improper number formats will lead to formulas that don’t output what you expect or formulas that straight up don’t work. This episode explores what happens when you unknowingly have numbers formatted as text. I also explore the various ways you can try to debug the errors that come from numbers formatted as text. You can copy the Google Sheet here for this episode.

AutoSums and avocados

The topic for this episode came from a question about AutoSum in the Microsoft Excel community forum which has close to 40,000 views and 25 replies (some of them are quite spicy, I might add). I’ll be referencing this thread quite a bit in this episode since some of the best ideas come from–you guessed it–the comments. Props to Excel MVP Sergei Baklan for jumping into this thread and trying to help answer a somewhat ambiguous question.

No one: Spending Friday nights perusing the Microsoft Excel forums for interesting questions 🙂

The data set for this episode is a fun one: avocado sales across different cities (learn more about the dataset on Kaggle here). The prologue for this data set is amazing and will ring true for all millennials out there:

It is a well known fact that Millenials LOVE Avocado Toast. It’s also a well known fact that all Millenials live in their parents basements. Clearly, they aren’t buying homes because they are buying too much Avocado Toast! But maybe there’s hope… if a Millenial could find a city with cheap avocados, they could live out the Millenial American Dream.

-Justin Kiggins, Product Manager, Chan Zuckerberg Initiative

One can only make an episode about formatting numbers in Excel/Google Sheets so interesting, so this was my best attempt. And on we go!

Numbers formatted as text mess up formulas

The first thing you’ll notice with numbers formatted as text is that they will mess up formulas by giving you an output you would not expect. For instance, in our avocado dataset, cell C53 is simply a sum of all the “numbers” in column C, but the result of the SUM formula is 0:

How is this possible? If you click on the column C header and go to Format->Number, you’ll notice that all the cells in this column are formatted as “Plain text.”

This means any numbers you type into the cells in this column will be treated as text. Since these cells aren’t formatted as numbers, Google Sheets doesn’t know how to treat these values which means the SUM formula is trying to sum up a bunch of text values. In Excel, this is similar to the cells being formatted as “Text”:
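The behavior can be mimicked outside of a spreadsheet. Here's a rough Python analogy (the `sheet_sum` helper is hypothetical, standing in for how SUM treats text):

```python
def sheet_sum(cells):
    # Like SUM over a range: text values are skipped entirely, not coerced
    return sum(c for c in cells if isinstance(c, (int, float)))

as_numbers = [64236.62, 54876.98, 118220.22]
as_text = ["64236.62", "54876.98", "118220.22"]  # same digits, stored as text

print(sheet_sum(as_numbers))  # sums normally
print(sheet_sum(as_text))     # 0 -- every "number" was ignored as text
```

Same digits, wildly different result, and no error message anywhere to tip you off.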

Detecting cells that are formatted as text

If you are really diligent, you can use the ISTEXT function to test whether a cell’s value is being treated as text. However, when you inherit an Excel model, you are just hoping the previous modeler did things correctly so you can plug in numbers and move on with your life. How can you figure out whether some cells are improperly formatted before pulling your hair out?

Unfortunately, no answer exists. If you know of one, please comment below. For new spreadsheet users, this is probably the most frustrating issue to experience. As you pick up more experience and start to debug your spreadsheets, seeing weird outputs with your formulas should lead you to consider that your cells are improperly formatted (assuming the formula is built correctly).
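Short of a built-in detector, one workable pattern is a helper column of ISTEXT checks, or the equivalent scan in script form. Here's a hedged sketch (the column values are made up):

```python
def find_text_cells(column):
    # Analogous to filling a helper column with =ISTEXT(...) and
    # filtering for TRUE: report the 1-based rows holding text
    return [row for row, value in enumerate(column, start=1)
            if isinstance(value, str)]

col_c = [64236.62, "54876.98", 118220.22, "78992.15"]
print(find_text_cells(col_c))  # [2, 4] -- rows holding text, not numbers
```

It's not elegant, but it beats eyeballing alignment across 50,000 rows.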

In this example, we have a visual indicator letting us know that the numbers in the Total Volume column look kind of fishy. All these “numbers” are left-aligned while the rest of the spreadsheet’s numbers are right-aligned:

However, if you auto-fit the columns and remove the margins on the column headers and cells, that alignment becomes less apparent:

If you didn’t notice that the “numbers” in column C are left-aligned, you’d be stuck wondering why your SUM formula doesn’t work. A good first step when you see numbers missing their thousands separators, or carrying decimals you don’t need, is to simply fix that in the formatting menu. As you make this fix, you’ll see that the formatting is indeed “Plain text,” and shifting to the proper number format (or to “Automatic” in Google Sheets) will make the formula work again:
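The same fix can be expressed as a script, assuming the only problem with the values is a thousands separator (in a Sheet you'd switch the format to Automatic, or wrap the cells in VALUE()). The `coerce_to_number` helper is hypothetical:

```python
def coerce_to_number(value):
    # Strip thousands separators and cast; mirrors reformatting the cell
    # from "Plain text" to a proper number format
    if isinstance(value, str):
        return float(value.replace(",", ""))
    return value

raw = ["64,236.62", "54,876.98", 118220.22]
cleaned = [coerce_to_number(v) for v in raw]
print(sum(cleaned))  # now sums like a normal number column
```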

Extra cell formatting issues in Excel with formulas

When you have the “Automatic” format in Google Sheets or “General” format in Excel, Sheets and Excel will try their best to turn whatever values you enter into the cells into the correct format you need for data analysis and reporting. This doesn’t always work as intended, and can have some drastic consequences. I talk about the human gene naming issues in episode 40 and how the HUGO Gene Nomenclature Committee had to rename genes just so researchers could stop dealing with names being coerced into date formats.

In Excel, if you have a cell formatted as text and you try to enter a formula into the cell, you’ll get into a world of hurt. As you are typing the formula, it looks like everything is working the way it should since you’re able to reference cells like you normally would, but the formula gets entered as text and Excel doesn’t coerce the text into the formula you want:

Only after you change the format of that cell to “General” will the formula actually calculate. Actually, even after you change the cell format to “General,” you have to go back into edit cell formula mode and commit the formula again by pressing ENTER:

As you can see, there are nuances to changing the format of cells from text to numbers. Luckily, in Google Sheets, when you type a formula into a cell formatted as “Plain text,” the formula still calculates, even though the cell format remains “Plain text.”

Knowing how to use your tools

The Excel forum thread is full of other commenters saying AutoSum doesn’t work for reasons orthogonal to cell formatting:

  • A commenter forgets to put the cell references in the SUM formula and happens to be using international formatting for decimals (comma) bringing more confusion to number formatting
  • One accountant discusses inheriting a .XLSX file but is using Office 365, and says the formatting options in Office 365 are different leading to questions about backwards compatibility
  • One commenter talks about merged cells not working with AutoSum, which leads to a discussion about merged cells vs. center across selection (you should be doing the latter)
  • A commenter discusses his strategy: copy the numbers from Excel, do a Paste Special into Word and select “Unformatted text,” and then copy/paste the numbers back into Excel

That last technique seems like the most ridiculous workaround, but I have to admit I’ve tried it myself to strip all formatting from numbers and words. The fact that there are all these stumbling blocks with the simple AutoSum function speaks to the bigger challenge of learning Excel:

Knowing how to use your tools correctly involves a lot of trial and error.

The right way to learn how to use spreadsheets and get over these simple formatting blunders is through real-world experience. And a lot of Googling :). This thread has close to 40,000 views from folks struggling to figure out why their AutoSumming isn’t working. It’s both enlightening (and disappointing) that the answer involves formatting your numbers as, well, numbers.

Other Podcasts & Blog Posts

In the 2nd half of the episode, I talk about some episodes and blogs from other people I found interesting:

Dear Analyst 48 36:28 49960
Dear Analyst #47: Spreadsheet horror stories from the European Spreadsheet Interests Group https://www.thekeycuts.com/dear-analyst-47-spreadsheet-horror-stories-from-the-european-spreadsheet-interests-group/ https://www.thekeycuts.com/dear-analyst-47-spreadsheet-horror-stories-from-the-european-spreadsheet-interests-group/#respond Mon, 19 Oct 2020 04:12:02 +0000 https://www.thekeycuts.com/?p=49902 The episode about how a rogue trader cost JPMorgan Chase $6.2B due to an Excel error struck a chord with folks. This episode explores three horror stories (and a recent one related to COVID) where people made simple spreadsheet errors and cost their companies and organizations millions of dollars. I don’t get too in-depth with […]

The post Dear Analyst #47: Spreadsheet horror stories from the European Spreadsheet Interests Group appeared first on .

The episode about how a rogue trader cost JPMorgan Chase $6.2B due to an Excel error struck a chord with folks. This episode explores three horror stories (and a recent one related to COVID) where people made simple spreadsheet errors and cost their companies and organizations millions of dollars. I don’t get too in-depth with the actual spreadsheet error in each story like I did with the JPMorgan Chase story, but do provide a quick analysis and lessons to be learned from each story. At the end of the day, these stories are not about the deficiencies of Excel itself, but rather human error and oversight.

EuSpRIG for the win

All the stories below come from EuSpRIG’s website, where some of the stories go back to the mid-90s. I came across the European Spreadsheet Risks Interest Group (EuSpRIG for short) while watching a webinar about auditing Excel workbooks by Paula Guilfoyle. During the webinar, she mentioned these Excel “horror stories” on the EuSpRIG website, and lo and behold, there’s a rich archive of all these horror stories which the team has been consistently updating for what looks to be over 20 years. You know the group and the content must be legit since the website still looks like a site from the late 90s:

EuSpRIG website

In all seriousness, Patrick O’Beirne (chair of EuSpRIG) has created an amazing community and resource all spreadsheet users should peruse to learn from past spreadsheet mistakes. Nicole Kobie at Wired recently wrote a great story about these “Excel warriors,” and I think this quote from the story highlights the main issues all you analysts out there should heed:

Part of the challenge of this work is that spreadsheet defenders must not only be Excel experts but know the industry that they’re working in.

-Nicole Kobie, Wired

From what I can tell, all the horror stories on the EuSpRIG website (and really any time you see a story in the media about a spreadsheet error) highlight something negative that happened to the company. These types of stories are the only ones reporters pick up; they lead to clicks, ad dollars, and that whole thing. Rarely do you see a story about a masterfully crafted spreadsheet formula that leads to a positive result for the company.

Story #1: $2.6B erased from Fidelity’s Magellan fund

This story originated from a thread in an e-mail listserv from 1995 called The Risks Digest. Stepping back for a bit, it’s amazing that these spreadsheet mishaps were “documented” this far back on Microsoft Excel ’95 (I learned Excel on version 2003 which seems lightyears ahead of 1995):

Excel 95, Source: Version Museum

The story goes like this: In November 1994, Fidelity was planning on making a distribution from their fund in the amount of $4.32/share. Fidelity cancelled the distribution because a tax accountant forgot to put a minus sign in front of a $1.3B net capital loss, which resulted in a positive dividend estimate that was off by $2.6B.
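The arithmetic behind the $2.6B figure is simple but worth making explicit: flipping the sign of a $1.3B loss moves the estimate by twice the magnitude of the loss. A quick sketch:

```python
# Why a missing minus sign on a $1.3B loss produces a $2.6B error:
net_capital_loss = -1.3  # $B, what should have been entered in the cell
entered_value = abs(net_capital_loss)  # the minus sign was dropped

swing = entered_value - net_capital_loss
print(swing)  # 2.6 -- the estimate is off by twice the loss's magnitude
```

A one-character mistake, doubled by its own sign flip.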

Analysis & lessons learned

Can you imagine being the tax accountant having to tell your boss that you forgot to put a minus sign in a cell? This error made it all the way up to then Fidelity CEO J. Gary Burkhead who sent a personal email (or maybe a physical letter–heck it’s 1995) to all shareholders about the mistake:

We have taken several steps designed to ensure that this error should not happen again. We will subject initial estimates to the same rigorous verification process that we use in preparing the distributions that the funds actually pay. This will include a thorough review not only by our own fund accountants but also by the fund’s treasurer and independent auditors. In addition, estimates will be reviewed by each fund’s portfolio manager.

What’s missing from the story, naturally, are what systems Fidelity used to output the financial records of the fund into the file the tax accountant used. Why did the tax accountant need to calculate the net capital loss in a separate Excel file?

Not much of a lesson to learn here except that if you’re copying and pasting numbers from a PDF to a spreadsheet, make sure that negative symbols are preserved. Especially for financial data (where negative numbers are typically written in between parentheses), making a simple mistake like this can lead to you having to explain your mistake to the CEO.

Story #2: Misaligned rows results in 10% profit erased ($24M)

I really wish we could see what the actual spreadsheet looked like in this story. It’s April 2003, and TransAlta Corp. (an electricity power generator company based in Alberta) needs to submit its bids for purchasing May contracts in the New York power market. Once the bids are submitted, you can’t change them, based on the rules of the power market. Of course, the bids are submitted in an Excel file, and someone did a bad cut-and-paste job and rows got misaligned.

  • The result: TransAlta bid higher for certain contracts they shouldn’t have, and ultimately overpaid for them.
  • The kicker: TransAlta knew about the error for a month and could not say anything, because if their competitors had found out about the incorrect bids, TransAlta could’ve lost a lot more than $24M.

It was literally a cut-and-paste error in an Excel spreadsheet that we did not detect when we did our sorting and ranking of bids prior to submission.

-Steve Snyder, TransAlta president in 2003

Analysis & lessons learned

Copy and paste errors happen all the time. My guess is that the analyst who did the paste was doing this from another spreadsheet. Aside from doing error checks to ensure numbers add up correctly (and they match the originating spreadsheet), one idea this sparks is the type of paste you use.

For creating dashboards and models, you are typically doing a Paste Special of Values, Formats, or Formulas. In this case, doing a vanilla copy and paste might have led the analyst to realize the error they were making. Let’s assume the source cells were really crappily formatted, like the cells below. The empty cells to the right, where you’re supposed to paste the bids, have no formatting, so when you do a regular paste, it’s easier to see that you have misaligned the cells because the source formatting comes along with the paste.

The actual spreadsheet the TransAlta analyst submitted was much more complicated than this, so I may have completely missed the mark. But the lesson to be learned here is that cell formatting can give you a quick visual cue on whether or not you are doing the paste correctly.
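Another cheap safeguard is a post-paste audit: re-check that each pasted row still lines up with its source row on a key column. Here's a sketch with hypothetical contract IDs (the `misaligned_rows` helper and the data are made up for illustration):

```python
def misaligned_rows(source, pasted, key=0):
    # Compare the key column (e.g. a contract ID) row by row after pasting;
    # return the 1-based rows where the keys no longer match
    return [row for row, (s, p) in enumerate(zip(source, pasted), start=1)
            if s[key] != p[key]]

source = [("MAY-01", 42.00), ("MAY-02", 38.50), ("MAY-03", 40.10)]
pasted = [("MAY-01", 42.00), ("MAY-03", 40.10), ("MAY-02", 38.50)]  # rows 2-3 swapped

print(misaligned_rows(source, pasted))  # [2, 3]
```

In a spreadsheet, the equivalent is an extra check column comparing IDs between the source range and the submission range before anything goes out the door.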

Story #3: Overestimating graduate student enrollment leads to $2.4M less revenue

The University of Toledo is preparing for the 2004-2005 academic year. UT was already told that due to state budget cuts, they would get $1.5M less funding this year. It’s a tough position to be in as UT officials decide what their budget will be for the upcoming school year. Do they raise tuition? Get their donors to pitch in some more? An analyst in the institutional research office saves the day. He or she says that graduate student enrollment is expected to increase by 10% next year, even though official UT projections point in the opposite direction. With the extra tuition from these graduate students, UT can carry on with their plans for hiring more full-time faculty and other strategic initiatives.

The kicker: The official UT projections of a 10% decline in graduate student enrollment were accurate. The analyst may not have formatted the decrease correctly, and officials believed it was an increase instead.

Analysis & lessons learned

This story shows the perfect confluence of a spreadsheet error, FP&A budgeting constraints, and a little bit of confirmation bias all mingling together at a party.

Imagine all the UT officials trying to figure out how they are going to manage a budget for the upcoming year and commit to all the strategic investments they want to make. You either cut costs or raise prices, and lo and behold, an analyst discovers a trend in the data that upper management failed to see. It’s not too crazy to think that management is not as close to spreadsheet formulas as the analyst is, and once they hear “10% increase in graduate enrollment = $2.4M projected revenue,” their eyes light up. All their problems are solved, and they can continue growing the university like they originally planned.

It’s hard to be the bearer of bad news and reveal that the unbridled optimism is unfounded, and that this projected $2.4M is not real. The budgeting team doesn’t want to hear it, and neither do the UT officials. Truth hurts. Why investigate whether one cell in the spreadsheet is positive or negative when everyone is happy?

Moral hazard with a twist

I had to learn about ethics during a finance class at university. The professor talked about the concept of moral hazard and doing what’s right for shareholders and the community. The main lesson he gave:

Don’t do something today that you would regret reading about on the front page of the Wall Street Journal the next morning.

These spreadsheet errors are all (allegedly) “honest mistakes.” As I talked about in the JPMorgan Chase episode, these are examples of Hanlon’s Razor. Yet, these stories end up on the front page of newspapers because the media likes to show that simple human error can cause real financial harm and taints reputations.

Should analysts be held responsible for these mistakes like a stockbroker trading on inside information? Is it worth taking a risk knowing that at the end of the day, you can blame your spreadsheet or lack of proper tools and controls to get work done?

The UK story about COVID cases tracked in .XLS files

Related to the three stories above, a recent story about spreadsheets gone wrong made headlines since it had to do with COVID. Public Health England (the organization responsible for processing data about COVID cases) was not reporting positive COVID cases and contact tracing details correctly because they were using .XLS files (the file format from Excel 2003, hooray!) instead of .XLSX. This older file format can only hold 65,536 rows per worksheet, and PHE had far more than that to report once the contact tracing data was included.
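The failure mode is easy to reproduce: anything past the legacy worksheet's row cap gets silently dropped on export. Here's a sketch of the kind of guard the pipeline was missing (the case counts are hypothetical):

```python
XLS_MAX_ROWS = 65_536  # per-worksheet row limit in the legacy .xls format

def split_for_xls(rows, limit=XLS_MAX_ROWS - 1):  # reserve one row for a header
    # Chunk records across multiple worksheets instead of silently truncating
    return [rows[i:i + limit] for i in range(0, len(rows), limit)]

cases = list(range(131_000))  # hypothetical case records
sheets = split_for_xls(cases)
print([len(s) for s in sheets])  # [65535, 65465] -- nothing lost
```

Or, better yet, skip the chunking entirely and use a file format that isn't old enough to vote.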

My hot take: if the UK can’t even get their reporting right, imagine how other countries must be faring. Spreadsheet errors leading to losing millions is one thing, but to risk lives is pushing this issue to a whole other level. I came across this video from Matt Parker who describes the situation with great satire:

Other Podcasts & Blog Posts

In the 2nd half of the episode, I talk about some episodes and blogs from other people I found interesting:

Dear Analyst 47 33:36 49902
Dear Analyst #46: Building a project management workflow with task dependencies in Google Sheets https://www.thekeycuts.com/dear-analyst-46-building-a-project-management-workflow-with-task-dependencies-in-google-sheets/ https://www.thekeycuts.com/dear-analyst-46-building-a-project-management-workflow-with-task-dependencies-in-google-sheets/#respond Mon, 05 Oct 2020 04:05:00 +0000 https://www.thekeycuts.com/?p=49799 I wrote a lengthy blog post comparing using Google Sheets as a project management “platform” relative to other dedicated project management software on the market. This post explores how to build a project management platform in Google Sheets. The functionality in this Google Sheets tool rivals that of some of the more popular project management […]

The post Dear Analyst #46: Building a project management workflow with task dependencies in Google Sheets appeared first on .

I wrote a lengthy blog post comparing using Google Sheets as a project management “platform” relative to other dedicated project management software on the market. This post explores how to build a project management platform in Google Sheets. The functionality in this Google Sheets tool rivals that of some of the more popular project management platforms on the market (e.g. Microsoft Project). Here’s the Google Sheet with the workflow fully built out.

Source: TaskRay

An unconventional tool for project management

Google Sheets is not the “platform” you might think of when it comes to typical project management workflows. In episode 43, I talked about how spreadsheets can be “extended” beyond their core use cases (e.g. accounting, financial analysis) into tools and applications. This project management “tool” is a perfect example of that concept.

Google Sheets might be the first thing your team reaches for because it’s free and allows real-time collaboration. With some basic formulas, you can build some pretty advanced functionalities and workflows similar to other SaaS tools and software. Here are the main features of the Google Sheet and how it works.

Step 1: Creating a task dependency column

Most projects have tasks with dependencies. In column A, we have all our tasks. In column B, we have the name of the task that the current task depends on. Instead of copying and pasting a task from column A into column B every time we want to change the dependency, we can use data validation to get a dropdown of all the tasks in column A:

Now, any task that shows up in column A also shows up in the dropdown in column B:

The issue is that if you change the spelling or name of a task in column A, the change won’t carry through to the data validation dropdowns in column B. So if “Instructor Shoot” changes to “Instructor Shooting,” cell B3 will still say “Instructor Shoot.” You have to manually click the dropdown again and select the new “Instructor Shooting” dependency. Not the end of the world if your dependencies don’t change often, but it could be annoying for projects whose tasks and dependencies change frequently.

Step 2: Calculating task end dates

Column G, or the Task End Date, is simply the Task Start Date plus the Duration (Days) column. While this seems like a trivial formula, it’s actually not that “easy” in other project management platforms. The reason is that you are mixing a date format with a number format. In other platforms, you have to create a special formula column that uses some special function like DATEADD(). In Google Sheets, it just kind of works with adding a number (representing the number of days the task takes) to the Task Start Date:
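Since the formula screenshot isn’t reproduced here, a sketch of what this could look like (the Duration column letter is an assumption; the actual sheet may differ). With Task Start Date in column F and Duration (Days) in, say, column E, the Task End Date formula in G2 is just:

```
G2: =F2 + E2
```

Google Sheets stores dates as serial day numbers under the hood, which is why adding a plain number of days to a date just works.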

Step 3: Calculating task dependency start and end dates

Moving over to columns H and I, we want to calculate the start and end dates of each task’s dependency task. The reason it’s important to calculate these dates is so that we know when the current task can start. We are assuming a basic Finish-to-Start dependency type where the current task cannot start until its dependency (i.e. predecessor) ends. In order to calculate the start and end dates of the task’s dependency, we do a VLOOKUP onto the same table using column B (the dependency task) as the lookup value, where the lookup table starts with column A (the main task) and ends with column G (the task’s end date):

The dependency’s end date is similar to the formula above, except it returns column G (the 7th index in the lookup table).
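As a hedged sketch of the two lookups (the screenshots aren’t shown here, so the exact column positions are assumptions), with Task Start Date as the 6th column and Task End Date as the 7th column of the A:G lookup table:

```
Dep Start Date (H2): =VLOOKUP($B2, $A:$G, 6, FALSE)
Dep End Date (I2):   =VLOOKUP($B2, $A:$G, 7, FALSE)
```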

Step 4: Calculate task start date

We do column F last because the Task Start Date depends on when that task’s dependency ends. Therefore, Task Start Date simply equals the Dep End Date column. Some project managers may bake in a lag between the dependency task and the current task. This scenario assumes the current task can start right when the dependency task ends:
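A sketch of what this could look like (row numbers are illustrative), with Dep End Date in column I:

```
F2: =I2        (start as soon as the dependency ends)
F2: =I2 + 1    (variant if you want to bake in a one-day lag)
```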

The reason we have all these dates is that changing the Task Start Date for the first task will automatically cause all the subsequent task dates to recalculate. Cell F1 is a hard-coded date since that task does not have a dependency (hence why it’s the “kickoff” task).

Creating this “waterfall” date effect is difficult in some other project management platforms because it may result in a recursive formula situation. In project management platforms that have specific features around dependencies, this waterfall of dates is built into the software. On other platforms, you have to manually select the start and end dates of each task. This can be annoying if you want the same set of tasks and dependencies for each project you run.

Step 5: A hacky gantt chart visualization of tasks

Most project management platforms have a fully-featured gantt chart view of your tasks and dependencies. Typically these gantt charts allow you to drag-and-drop the ends of each bar (representing a task) to edit its start and end dates. This is useful if you want to control those dates manually, but in our case, we already know each task’s start and end dates from its dependencies and duration.

To build a simple visualization, you can add numbers across the first row (representing each day of the month). Then in the empty cells below, you have a formula that checks to see if the current day of the month falls within the task’s start and end dates. If it does, it puts an “x” in the cell:
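One possible version of that formula (assuming the day numbers run across row 1 starting in column K; the exact layout is a guess since the screenshot isn’t shown):

```
K2: =IF(AND(K$1 >= DAY($F2), K$1 <= DAY($G2)), "x", "")
```

Note that comparing against DAY() only works cleanly when all tasks fall within a single month.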

Since each of these cells contains an “x” or is blank, you can apply some basic conditional formatting to get a waterfall view of all the tasks:

Fitting a square peg into a round hole

While Google Sheets or Excel may not be the best platform for this project management use case, it’s still a compelling solution when you need something that “just works.” For spreadsheet gurus out there, it’s kind of fun to try to write formulas and stretch Google Sheets beyond what it’s meant to do in order to solve a use case. Each of these scenarios poses a challenge for us to build a tool in an unconventional way.

Other Podcasts & Blog Posts

In the 2nd half of the episode, I talk about some episodes and blogs from other people I found interesting:

The post Dear Analyst #46: Building a project management workflow with task dependencies in Google Sheets appeared first on .

Dear Analyst #45: Thinking long-term for structuring your dataset using U.S. public food assistance data https://www.thekeycuts.com/dear-analyst-45-thinking-long-term-for-structuring-your-dataset-using-us-public-food-assistance-data/ https://www.thekeycuts.com/dear-analyst-45-thinking-long-term-for-structuring-your-dataset-using-us-public-food-assistance-data/#respond Mon, 28 Sep 2020 04:05:00 +0000 https://www.thekeycuts.com/?p=49721 When you need to capture some data in a structured way, you’ll open up an Excel file or Google Sheet and just start throwing data into the spreadsheet. Not much thinking; just copy and paste. As that dataset grows, the original structure you had set up for that spreadsheet may not be ideal. Specifically, the […]

The post Dear Analyst #45: Thinking long-term for structuring your dataset using U.S. public food assistance data appeared first on .

When you need to capture some data in a structured way, you’ll open up an Excel file or Google Sheet and just start throwing data into the spreadsheet. Not much thinking; just copy and paste. As that dataset grows, the original structure you had set up for that spreadsheet may not be ideal. Specifically, the dataset is not ideal for putting into a PivotTable. Long-term, I’d argue that all your spreadsheets should be structured in a way that’s suitable for a PivotTable (which also makes the data ready for storing in a traditional database). This post explores how you can take a dataset that looks like 99% of the data out there and restructure it so you can analyze it in a PivotTable. Link to the Google Sheet is here.

Video walkthrough of the Google Sheet: https://www.youtube.com/watch?v=qFhluLJQpfA

Why this is important

Telling someone that their data should be structured is a platitude like “such is life” and “forgive and forget.” Let’s be more specific in how this statement can impact your work.

To be specific: 9 times out of 10, structure your data so that it can always be analyzed in a PivotTable.

Consider this scenario:

  1. Your accounting team needs your group to start forecasting expenses for next month’s budget
  2. You start gathering the data and throw it into a spreadsheet
  3. Every month new data gets added to the spreadsheet, and perhaps the CFO wants to get more granular analyses on the forecast
  4. You start adding additional columns to the spreadsheet and perhaps summary tables in other sheets in the file
  5. Other teams now need to see your data to understand how your team’s decisions will impact their decisions
  6. This spreadsheet ends up being too hard to maintain, so there’s an internal project to put this data into a real database (some ERP solution)
  7. One quarter of planning goes by, and another quarter for implementation
  8. 6 months later, the business has changed, the structure of the database needs to be adjusted, and the data engineer role still needs to be filled

This concocted scenario is quite extreme, but the key lesson is this:

Focusing on the schema and structure of your spreadsheet today takes time and requires you to think about how your data will be used and maintained in the future.

U.S. public food assistance dataset

I’ve started browsing Kaggle to find interesting datasets recently, and this one caught my attention since it looks at spending and household participation related to a public food assistance program called SNAP. As the creator of the dataset discusses, there are many issues with collecting government datasets. Data is spread out across different agencies, there are multiple formats, and data is sometimes aggregated. This makes consolidating the data a pain. These problems may sound familiar if you’re working at a large organization.

The “Raw” sheet in the Google Sheet simply shows the cost, households participating, and total people associated with the SNAP program for the 2019 fiscal year across four states (CA, IL, LA, NY):

In your organization, this could be sales data, headcount data, COGS, whatever. The key thing about this dataset is that you have all the numbers organized by month across the top. This table would be great for a simple time series analysis where you may want to see the cost per household for California over time. But what if you need to build out a more dynamic dashboard looking at various metrics for just a few months or a subset of states?

Pivoting this data

If you create a PivotTable with this data, you’ll run into the issue of having to select individual month names to put into the Values section of the PivotTable builder. We only have twelve months of data for FY19; imagine if we had to do this for ten years’ worth of data going back to FY09.

Some people asked me what a “denormalized dataset” means in the context of Excel/Google Sheets since I mentioned this term in the previous episode. We need to “denormalize” this data so that it’s easier to pivot off of. This means some values may repeat themselves down a certain column, but it helps with structuring the data properly for a PivotTable.

In Excel, there is a hacky way of denormalizing your data, and it involves going through the antiquated PivotTable wizard (which I believe you can only access via old Excel keyboard shortcuts). I don’t think the PivotTable wizard is available in the ribbon in recent versions of Excel.

The video below shows you how to do it. It involves checking a radio button for “Multiple consolidation ranges” and then double-clicking the grand total of the sum of Values in the PivotTable. It’s not pretty, but it works:

Unfortunately for Google Sheets users, that PivotTable wizard isn’t available. If you find a similar workaround let me know.

Moving time periods to rows in Google Sheets

Whenever you see time periods (in this case, months in 2018 and 2019) organized across the columns, think about how you can put those time periods into one column. This starts the process of denormalization. You want something that looks like this:
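The target shape is five columns, with one row per state-month combination. A sketch of the layout (values elided, and assuming the fiscal year runs October through September):

```
State | Period   | Cost | Households | Persons
CA    | Oct 2018 | …    | …          | …
CA    | Nov 2018 | …    | …          | …
...
NY    | Sep 2019 | …    | …          | …
```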

When you pivot off of the Period column in the PivotTable, you can then filter for and group your values by specific dates:

Moving metrics from rows to columns

In the original data set, there’s a Metric column which contains metrics we care about for each state (Cost, Households, and Persons). This structure will make a PivotTable very hard to organize and analyze because you will have to filter for a specific metric in order to get any meaningful statistics from your dataset. Additionally, this structure is mixing data types (e.g. Cost is in dollars and Households is a number).

Whenever you see metrics organized in this manner, think about moving each individual metric to its own column:

Now, each of these columns is a value you can drag and drop into the “Values” section of the PivotTable. This means you can get summary results or drill down into a specific state’s numbers:

Transposing the data

Setting up the data structure to look like the structure in the “Solution” sheet of the Google Sheet does take a little spreadsheet gymnastics. The easiest method I’ve found is to apply the TRANSPOSE function to the original dataset and then do some copying/pasting. Here’s what a TRANSPOSE looks like:

The nice thing about this function is that it puts all your time periods (months in this case) into their own column. Each metric is also organized in a top-to-bottom fashion. The problem is that each state’s data is still organized across the top. At this point, you do a copy and paste to consolidate the 13 columns that result from the TRANSPOSE function into the 5 columns we ultimately care about: State, Period, Cost, Households, and Persons.
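The TRANSPOSE call itself is just a single formula pointed at the raw table (the range here is illustrative; point it at wherever your raw data actually lives):

```
=TRANSPOSE(Raw!A1:N40)
```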

Setting things up for a database

You may be wondering what other benefits there are for having this data structure besides the ease of creating a PivotTable. If your data ever ends up in a regular database (e.g. SQL), this is the ideal data structure for that tool.

I’ve seen scenarios at different organizations where an Excel file or Google Sheet has hundreds of thousands of rows that represent critical business data cobbled together over time. There comes a point in time from an organizational perspective where that data needs to be put into a database for ease of querying. A data engineer will have to do some data manipulation or run an ETL process to convert the data into a suitable format for a database. Guess what? You can help your data engineer out by getting this structure correct from day one.

Data down good :), data right bad 🙁

To summarize how your data should “grow” over time (big data ain’t going nowhere), your data should NOT grow right:

Instead, it should grow down:

Other Podcasts & Blog Posts

In the 2nd half of the episode, I talk about some episodes and blogs from other people I found interesting:

The post Dear Analyst #45: Thinking long-term for structuring your dataset using U.S. public food assistance data appeared first on .

Dear Analyst #44: Referencing CO₂ emissions data with INDIRECT and FILTER to build a model https://www.thekeycuts.com/dear-analyst-44-referencing-co2-emissions-data-with-indirect-and-filter-to-build-a-model/ https://www.thekeycuts.com/dear-analyst-44-referencing-co2-emissions-data-with-indirect-and-filter-to-build-a-model/#comments Mon, 21 Sep 2020 15:41:19 +0000 https://www.thekeycuts.com/?p=49661 I recently had the opportunity to build a growth model at work, and it’s been fun getting back to my roots in Excel/Google Sheets. Been a while since I’ve built a model from scratch so I of course referenced previous models my colleagues have built. Interesting to see the use of FILTER and INDIRECT in […]

The post Dear Analyst #44: Referencing CO₂ emissions data with INDIRECT and FILTER to build a model appeared first on .

I recently had the opportunity to build a growth model at work, and it’s been fun getting back to my roots in Excel/Google Sheets. It’s been a while since I’ve built a model from scratch, so of course I referenced previous models my colleagues have built. It was interesting to see the use of FILTER and INDIRECT in the models for referencing data, so I’m sharing how you can use these two functions when building a model (specifically, for referencing raw data). The example Google Sheet uses an open data set for CO₂ emissions found on Kaggle.

The data set

The data set simply looks at all CO₂ emissions for every country by year (in some cases going back to the 1800s). Not quite sure how Our World In Data was able to get data going back this far, but it’s there. The data set consists of four columns and is denormalized (just one long stats table):
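For reference, the four columns in this kind of Our World In Data CO₂ dataset look roughly like this (column names are approximate, from memory of the public dataset):

```
Entity | Code | Year | Annual CO₂ emissions
```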

About a year ago, National Geographic released a report card on how various countries are tracking towards emissions targets following the Paris Agreement a few years earlier. National Geographic put a few countries in the following buckets: “Top of the class,” “Shows some promise,” and “Barely trying”:

Our raw data set has emission data for every country in the world, and our goal is to build a simple model that outputs the emissions for a few select countries for, let’s say, the most recent 10 years of data. With this summary data set, we can then start looking at trends, growth patterns, and more. Basically, how do we get something that looks like this (and we can plug in whatever country we want in the first column)?

Modeling with named ranges vs. PivotTables

Since we have a denormalized data set, the easiest way to get the data into the “summary” table structure in the screenshot above is to create a PivotTable:

You can play with the filters for Year and Country to show just the data you want. As much as I like PivotTables, they’re not easy to manipulate when you want to change the filters on the fly, so instead we can create a separate summary table that references the raw data using formulas. This is where named ranges come into play.

Applying named ranges to columns in the data set

This is a technique I hadn’t used back when I was more actively building models, but you’ll see the flexibility once you see how named ranges can work with the FILTER function. In the gif below, I’ve named the Country, Year, and Annual CO₂ emissions columns. Each named range just represents the entire column. You can access named ranges by going to the “Data” menu in the toolbar.

Referencing named ranges in summary table

Back on the “Model” spreadsheet, I have my countries broken out into the three buckets listed in the National Geographic article. This is a screenshot of the summary table I need to fill out:

Since I have the country in column C and years from columns D onward, I can write a formula to pull in the data I need from the raw emissions data to fill out the table. Here’s what that formula looks like:
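Since the formula screenshot isn’t reproduced here, a sketch of what it could look like (the row and column positions are assumptions; country, year, and emissions are the named ranges applied to the raw data):

```
=FILTER(INDIRECT($A4), country = $C4, year = D$3)
```

The absolute column references on the named-range name ($A4) and the country ($C4), combined with the absolute row reference on the year (D$3), let you fill this one formula across the whole summary table.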

Let’s break down why this formula works.

Use of INDIRECT

The INDIRECT function references column A and column A just has the word “emissions” copied down the column. INDIRECT is able to take whatever text you have in a cell and “convert” it to a cell reference. Remember how we made “emissions” a named range in our data set? This INDIRECT function is telling Google Sheets to take column D from our raw data set and turn it into the data we want to reference in this formula.

Another advantage of using INDIRECT in this scenario is that you may have other data sets you want to pull into your main summary table. Let’s say you had population data or energy data for these countries. You would name those columns just like you named the “emissions” column and put the words “population” and “energy” in column A so that Google Sheets knows which data you would like to filter.

Use of FILTER

The FILTER function sits outside as the main function, and the first argument it takes is the actual data you’re trying to filter. In this case, it’s just our emissions data (referenced using the INDIRECT function). The rest of the arguments are how you actually want to filter your data.

Remember how we also applied named ranges to Year and Country in our raw data? We can now tell the FILTER formula to filter these named ranges (e.g. columns) by the actual year and country in our summary table. Then by using absolute and relative references on our year and country references on the summary table, we can quickly fill the rest of the summary table.

This can be done with GETPIVOTDATA in a PivotTable

Returning back to PivotTables, this summary table could’ve been filled out by referencing a PivotTable as well. Those of you using PivotTables in Excel are probably very familiar with the GETPIVOTDATA function. You could build a PivotTable off of the raw data set, and then use the GETPIVOTDATA function on the summary table to output the data you need.

There are pros and cons with this approach. The con is that you have to create this intermediate PivotTable that sits on another sheet just so you can reference it using the GETPIVOTDATA function. The pro is that your PivotTable constantly updates as you make changes to your raw data set, but this functionality already exists with the FILTER function as described above.
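A hedged sketch of what the GETPIVOTDATA version could look like (the value-field name, the pivot’s sheet and location, and the cell references are all assumptions):

```
=GETPIVOTDATA("SUM of Annual CO₂ emissions", Pivot!$A$1, "Country", $C4, "Year", D$3)
```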

Conclusion

I find the FILTER function in conjunction with named ranges a much cleaner solution because there is no intermediate PivotTable and you can reference the columns you want to filter by their actual names versus a column reference. If you have multiple data sets you need to summarize, having many named ranges may complicate your model, but overall I think having unique names for your columns of data makes your model more readable.

Spreadsheet Day

October 17th is National Pasta Day and National Pay Back A Friend Day. It also happens to be Spreadsheet Day because VisiCalc, the first spreadsheet for personal computers, was released on October 17th, 1979. Debra Dalgleish declared this day Spreadsheet Day back in 2010, so the holiday has been going strong for 10 years!

To celebrate Spreadsheet Day, the MS Excel Toronto meetup group is hosting a special meetup with Excel heavy hitters like Bill Jelen (Mr. Excel), Dan Fylstra (co-founder of VisiCorp, the publisher of VisiCalc), Rob Collie (Power Pivot creator), and David Monroy (Senior Program Manager for Microsoft Excel). In general, I’ve found the meetup really educational and have learned a few things from their webinars. Celia Alves has done an amazing job managing this community and meetup.

Other Podcasts & Blog Posts

In the 2nd half of the episode, I talk about some episodes and blogs from other people I found interesting:

The post Dear Analyst #44: Referencing CO₂ emissions data with INDIRECT and FILTER to build a model appeared first on .

Dear Analyst #43: Setting up workflows that scale – from spreadsheets to tools & applications https://www.thekeycuts.com/dear-analyst-43-setting-up-workflows-that-scale-from-spreadsheets-to-tools-applications/ https://www.thekeycuts.com/dear-analyst-43-setting-up-workflows-that-scale-from-spreadsheets-to-tools-applications/#comments Mon, 14 Sep 2020 04:05:00 +0000 https://www.thekeycuts.com/?p=49633 This episode is the audio from a presentation I gave a few weeks ago to members of Betaworks based in NYC. Betaworks is a startup accelerator, co-working space, and community of founders. No-code is a pretty hot topic right now, and in this presentation I talk about how spreadsheets is one of the first no-code […]

The post Dear Analyst #43: Setting up workflows that scale – from spreadsheets to tools & applications appeared first on .

]]>
This episode is the audio from a presentation I gave a few weeks ago to members of Betaworks based in NYC. Betaworks is a startup accelerator, co-working space, and community of founders. No-code is a pretty hot topic right now, and in this presentation I talk about how the spreadsheet is one of the first no-code “platforms” and how your spreadsheet skills can be extended to build real tools. The presentation is adapted from a talk I gave last year at Webflow’s No-Code Conference. I embedded the “slides” at the bottom of the post, and here is a link to the slides if you want to look on your own.

Summary of presentation

  1. The skills you’ve learned in Excel/Google Sheets — including data structuring — translate to building workflows for any part of your business
  2. Thinking beyond spreadsheets as a way to do data analysis or “number crunching”
  3. Any tool that helps automate or solve some workflow at your company can be built with spreadsheets
  4. Why learning spreadsheets can set you up well for learning “no-code” tools

Spreadsheet examples from presentation

During the presentation, I showed actual spreadsheets (Excel and Google Sheets) I’ve built in the past for freelance clients and friends. The main concept I’m trying to convey is that each of these spreadsheets looks and feels more like an application than a model that forecasts out certain values. Each of these examples consists of three core elements:

  1. Database – A place to store information
  2. User Input – Fields and forms for someone to fill out
  3. Calculations/Display – Formulas (i.e. “business logic”) to make the spreadsheet output something for you (the administrator) or the user

My 2 cents: When you’re building an application in a spreadsheet, you’re extending the original purpose and audience Excel and Google Sheets were meant to serve: financial models for accountants. But this is what makes the spreadsheet so versatile. The fact that an analyst can string together formulas to make a spreadsheet look and feel like an application is what gives the spreadsheet power. This innovation also pushes Microsoft, Google, and other platforms to release new features that give analysts the ability to build tools, not just models.

I’ve written extensively about this subject in the past, so I’ll leave my soliloquy at that. On to the examples.

Bachelorette planning Google Sheet

The first example I discuss is this bachelorette party planning Google Sheet I built for a friend. This spreadsheet has been duplicated quite a few times by friends of friends, and all it does is help a bride-to-be figure out which weekend works best to have a bachelorette party.

The key insight is that the database is everything from column B onwards and row 3 and below. All the availability for each person is captured in each of these cells and there’s some conditional formatting to give the bride a visual indicator to see when a weekend is available.

The user input is the ability for each friend the Google Sheet is shared with to edit the cells. “Yes,” “No,” and “Maybe” are the only inputs that matter for this Google Sheet. Finally, the calculations are in rows 31-33, which tally up the user inputs for each weekend so the bride can see which weekend is the “most free” for her friends.
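A tally like the one described above can be done with a COUNTIF per response type. This is a sketch under assumed ranges (the actual sheet's layout may differ), with the availability cells for one weekend in B3:B30:

```excel
=COUNTIF(B$3:B$30, "Yes")
```

One such formula per response ("Yes", "No", "Maybe") in rows 31-33, dragged across the weekend columns, gives the bride the at-a-glance counts.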

There are countless iPhone and Android apps you can download to do this exact same thing, but this spreadsheet just does one thing and one thing well: help brides figure out which weekend to plan a bachelorette party.

Splitting costs with friends

This splitting costs with friends blog post is by far the most popular post on my blog since I published it in 2014 (thanks Google search!). Every day I still get requests to give people edit access to the Google Sheet (please just make a copy of it instead of requesting edit access). Here’s the Google Sheet if you want to make a copy for yourself.

Similar to the previous example, the database is all the items, costs, and who participated in the cost from rows 2 and down. The user input consists of the cells themselves, but the most important part of the Google Sheet is the 1s and 0s from column C onward. Those 1s and 0s represent whether a friend or family member “participated” in the cost. This allows the spreadsheet to do some basic calculations to figure out who owes what.

Rows 26-28 are the calculations that let the trip organizer see at a glance who owes money and who is owed. Again, there are numerous apps and custom tools you can pay for or download to split costs with friends, and this Google Sheet mimics the features of those apps in a more bare-bones way.
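The "who owes what" math boils down to: for each item, divide its cost by the number of participants, then give each person their share only if their flag is 1. A sketch of one person's total, assuming costs in column B, that person's 1/0 flags in column C, and a helper column J counting participants per item (all ranges are assumptions, not the actual sheet's layout):

```excel
=SUMPRODUCT($B$2:$B$24 / $J$2:$J$24, C$2:C$24)
```

SUMPRODUCT multiplies each item's per-person cost by the flag and sums the results; dragging it across gives every friend's total in one row.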

Patient intake system

This example shows what happens when the spreadsheet is really extended beyond what it was intended to do. This was for one of my consulting clients, who needed a new CRM system for managing new patients at their clinic.

The Excel file basically lets the operations manager at the clinic quickly “move” new patients from one spreadsheet to another using a VBA macro. To mimic the look and feel of an application, I drew these blue and green buttons using the shape feature in Excel and tied a macro to each button. The database consists of patient details, the user input is simply each row of data, and the calculations involve these macros that move data from one spreadsheet to another.
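A "move this row to another sheet" macro of the kind described above might look something like the sketch below. This is a hypothetical reconstruction, not the client's actual code; the sheet names and the use of the active cell are assumptions.

```vba
' Hypothetical sketch: move the selected patient's row from the
' intake sheet to the active-patients sheet, then delete the original.
Sub MovePatient()
    Dim src As Worksheet, dst As Worksheet
    Dim srcRow As Long, dstRow As Long

    Set src = ThisWorkbook.Worksheets("New Patients")
    Set dst = ThisWorkbook.Worksheets("Active Patients")

    srcRow = ActiveCell.Row                                  ' row the manager selected
    dstRow = dst.Cells(dst.Rows.Count, 1).End(xlUp).Row + 1  ' first empty row on the destination

    src.Rows(srcRow).Copy Destination:=dst.Rows(dstRow)
    src.Rows(srcRow).Delete
End Sub
```

Tying a Sub like this to a drawn shape (right-click the shape, then Assign Macro) is what gives the file its button-driven, application-like feel.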

This gets into an important concept that an Excel file or Google Sheet is not that great for: workflows. Since everything is usually calculated in real-time in a spreadsheet, it can be difficult to do an if-this-then-that type of workflow without using a macro or script (see my last post on automating a tedious filling-values-down task).

“Slides” from Betaworks presentation

The rest of the presentation includes tools and tips for building applications with other no-code tools. Slides are below:

Original talk from Webflow’s No-Code Conference in 2019:

Other Podcasts & Blog Posts

In the 2nd half of the episode, I talk about some episodes and blogs from other people I found interesting:

  • No other podcasts for this episode given how long this episode is!

The post Dear Analyst #43: Setting up workflows that scale – from spreadsheets to tools & applications appeared first on .

]]>
https://www.thekeycuts.com/dear-analyst-43-setting-up-workflows-that-scale-from-spreadsheets-to-tools-applications/feed/ 1
Dear Analyst 43 50:56 49633
Dear Analyst #42: Filling values down into empty cells programmatically with Google Apps Script & VBA tutorial https://www.thekeycuts.com/dear-analyst-filling-values-down-into-empty-cells-programmatically-with-google-apps-script-vba-tutorial/ https://www.thekeycuts.com/dear-analyst-filling-values-down-into-empty-cells-programmatically-with-google-apps-script-vba-tutorial/#respond Mon, 07 Sep 2020 08:45:00 +0000 https://www.thekeycuts.com/?p=49588 SPACs (Special Purpose Acquisition Companies) or “blank check” companies have been in the news recently, so I used some real SPAC data for this episode. Your spreadsheet has empty cells in column A, and these empty cells should be filled with values. Your task is to fill values down up until you find another cell […]

The post Dear Analyst #42: Filling values down into empty cells programmatically with Google Apps Script & VBA tutorial appeared first on .

]]>
SPACs (Special Purpose Acquisition Companies) or “blank check” companies have been in the news recently, so I used some real SPAC data for this episode. Your spreadsheet has empty cells in column A, and these empty cells should be filled with values. Your task is to fill values down until you find another cell with a value, at which point you fill that value down instead. This episode walks through how to do this programmatically with a script in Google Apps Script (for Google Sheets) and VBA (for Excel). This is the Google Sheet associated with the episode. The Google Apps Script is here and the VBA script is here. See a quick example of what the issue is in the gif below and how the script “fills in” the values for you.


See the video below if you want to jump straight to the tutorial:

https://www.youtube.com/watch?v=t-32QkyjKVE&feature=youtu.be


Why is this data structure a problem?

You’ve inherited a spreadsheet and the data structure looks like this:

It’s a list of data but there are empty cells in column A. This is usually a category or dimension in your data set that needs to be “filled down” so that the data set is complete. In the Google Sheet, each row represents one person that is associated with a given SPAC, but the SPAC Ticker column is incomplete. You’ll usually get this type of data structure through the following:

  • Data was manually created by someone who didn’t fill down the values in column A since they thought it was a “category”
  • You are working with a data set that originally came from a PivotTable but you only have the “values” from the PivotTable, not the PivotTable itself

This data structure is a problem because if you want to do any type of analysis on this data, it will be extremely difficult since you have missing values in column A. Sorting, filtering, and PivotTables are all out of the question if your data set looks like that screenshot.

Solving this with keyboard shortcuts

Totally doable for this Google Sheet. This is what you could do:

All I’m doing above is the following (on PC):

  1. SHIFT+CONTROL+DOWN ARROW – Select all the empty cells from the current cell with a value up until the next cell with a value
  2. SHIFT+UP ARROW – Reduce the selection by one row
  3. CONTROL+D – Fill the value from the first cell in the selection down
  4. CONTROL+DOWN ARROW – Skip to the next value that needs to be filled down

The obvious tradeoff here is time vs. human error. Every time I have to do this task on a spreadsheet, I think about whether it's worth filling the values down “manually” using keyboard shortcuts or using a VBA script (in Excel) to do this programmatically. It really depends on the number of rows. For the example SPAC Google Sheet, doing this with keyboard shortcuts takes 10 seconds tops. If this spreadsheet was 1,000,000 rows, then we have a problem.

Don’t worry, I got you. Here’s the script you can use to do this programmatically.

Using Google Apps Script in Google Sheets

First off, here’s the script you can use for Google Sheets (gist here). Just 14 lines of code and you’re good to go:

Never used macros or Google Apps Script before? It’s super simple. First go to Tools, then Script Editor:

You may be asked to authenticate your Google account so just hit Yes to all those screens. Copy/paste the script into the editor:

Go to File and Save in order to save the script into the Google Apps Script project. Go back to Google Sheets and go to Tools, Macros, and click Import to import the fillValuesDown function into Google Sheets. Now you can use this function as a macro in your Google Sheet:

You can close out the Google Apps Script editor and now click on Tools, Macros, and click on fillValuesDown to run the script on your dataset:

How does the script work?

The script utilizes the Spreadsheet service for Google Apps Script to access the data object for your Google Sheet (more on that below). The script is only a dozen or so lines long and does the following, in order:

  1. Sets the spreadsheet variable so that we can use the active worksheet you’re on
  2. Sets the currentRange variable to start from A2 to the last row in the table
  3. Two more variables are set: newRange to store the new range of values we want to put into column A, and newFillValue which is kind of like an intermediate variable used in the loop
  4. The script goes through all values in currentRange (including the blank ones) and adds all the correct values to the newRange array
  5. The currentRange is then set equal to newRange to get all the “correct” values into column A

On the backend, the currentRange array looks like this:

[['HZAC'], [''], ['FST'], [''], [''], ['']...]

The purpose of newRange is to create a new array that is a complete list of values:

[['HZAC'], ['HZAC'], ['FST'], ['FST'], ['FST'] , ['FST']...]
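The loop at the heart of the script can be sketched in plain JavaScript, outside of the Apps Script environment. Blank cells are represented as empty strings here, which is how Range.getValues() returns them; the exact variable names mirror the steps above but are otherwise assumptions about the gist's code.

```javascript
// Fill-down logic: replace each blank entry with the last non-blank
// value seen above it, mirroring the steps described for fillValuesDown.
function fillValuesDown(currentRange) {
  const newRange = [];
  let newFillValue = '';          // intermediate variable: last non-blank value seen
  for (const [value] of currentRange) {
    if (value !== '') {
      newFillValue = value;       // found a new value, start filling it down
    }
    newRange.push([newFillValue]);
  }
  return newRange;
}

// Example with the SPAC ticker column from the post
const filled = fillValuesDown([['HZAC'], [''], ['FST'], [''], ['']]);
// filled is [['HZAC'], ['HZAC'], ['FST'], ['FST'], ['FST']]
```

In the real script, currentRange comes from the sheet via getRange(...).getValues() and the result is written back with setValues(newRange).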

Recording macros vs. programming Google Sheets

When I first started learning macros, the first thing I did was record my keystrokes and break down what the backend “code” looked like. Here’s what recording a macro looks like:

When you open up the script editor, you’ll see this:

There are a lot of activate() and getCurrentCell() calls. You can then deconstruct all these keystrokes to build a script that accomplishes the task. But here’s the key difference between recording keystrokes and working with the data object:

You are programming keystrokes instead of the Google Sheets application.

Other advantages of programming the application instead of the keystrokes:

  • Utilizes less compute resources and runs faster
  • Easier to debug
  • Easier to adapt to more scenarios and use cases

In the keystroke world, you are literally telling Google Sheets to select cells, select ranges, and move the cursor around, which doesn’t seem like a big deal. When you are working with hundreds of thousands of rows, though, this can cause serious performance issues. Since Google Apps Script runs in the cloud, you may not see these performance deficiencies, but you’ll definitely see them in your Excel workbooks.

Speaking of Excel workbooks…

Using the VBA script for Excel

The structure of the VBA script is pretty similar to the Google Apps Script; the syntax is just a little different. I’m not going to walk through how to set this up since it’s pretty similar to Google Sheets. In the VBA script, you do end up doing some “cell selection,” like in line 8. Most of the script, however, works with the Excel data object model, so the script should run pretty quickly regardless of the size of your Excel file.

Other Podcasts & Blog Posts

In the 2nd half of the episode, I talk about some episodes and blogs from other people I found interesting:

The post Dear Analyst #42: Filling values down into empty cells programmatically with Google Apps Script & VBA tutorial appeared first on .

]]>
https://www.thekeycuts.com/dear-analyst-filling-values-down-into-empty-cells-programmatically-with-google-apps-script-vba-tutorial/feed/ 0