Dear Analyst #54: 5 lessons learned in 2020 and 5 skills for data analysts to learn in 2021

With the last episode for 2020, I wanted to take a look back and pull out some of the main themes and topics from the podcast. As people rushed to pick up new data skills to adapt to our changing environment this year, I think a precedent has been set for how one can learn almost anything from home. You may have picked up Excel and data analysis skills this year. What are some skills you should think about for 2021? This episode provides five skills for data analysts to consider adding to their toolbelt in 2021.

First, a look back at some of the main topics from 2020:

1. Excel errors will still happen

Aside from the latest features Microsoft or Google releases for Excel and Google Sheets, the only spreadsheet news that makes it to mainstream media involves mistakes that lead to financial losses.

It all started with episode 38 where I spoke about the JPMorgan Chase trader who caused a $6.2B loss. Given the popularity of this episode, I did a few follow-up episodes, including episode 40 on the two Harvard professors who created a report that may have led to incorrect austerity policies. There was also episode 49 all about how Tesla might have underpaid by $400M for the acquisition of SolarCity. Many of these stories can be found on the EuSpRIG website (which I covered in episode 47).

These stories go back to the early 90s. After 30 years, I expect them to keep making headlines. When analysts are lazy, under time constraints, or hastily putting together spreadsheets, these errors will undoubtedly occur.

2. Excel’s custom data types, what’s the hype?

Excel’s custom data types came onto the scene a few months ago. The Excel community raved about the new feature, but I think the jury is still out on the real use cases and audience for it. Touted as going “beyond text and numbers,” the feature is still in active development and missing some crucial capabilities. I dig deeper into this feature in episode 51. A more recent feature announcement called LAMBDA may give analysts the ability to extend Excel beyond what it’s meant to do. Just like the custom data types feature, we’ll wait to see how this feature actually gets used in the workplace.

It’s rare to see new features for Excel completely change the way analysts do their work. This video from MAKRO makes a good point about how new features can end up dumbing down the existing features that power users have come to love (and master).

3. Google Apps Script for the win

This year was all about Google Apps Script for me. Having done some heavy VBA scripting during my financial analyst days, writing Google Apps Scripts reminded me of how powerful scripts can be when you have a specific problem to solve that the native features in Google Sheets cannot.

Specifically, I wanted to sync data from Google Sheets into Coda and vice versa. I detailed my experience writing this Google Apps Script to sync data in episode 31, along with some use cases I think this script unlocks for the workplace. Given the robustness of the Google Sheets API and the fact that the scripting language feels like JavaScript, I think it’s a relatively simple platform for data analysts to pick up. Another Google Apps Script I talked about was in episode 42, where I showed how a script can fill values down (and showed the VBA equivalent as well).
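If you’re curious what the bones of a sync like this look like, here is a minimal sketch of a one-way push from a sheet into Coda using UrlFetchApp. The full script from episode 31 does more than this; the doc ID, table ID, API token, and sheet name below are placeholders, and you should confirm the request format against Coda’s API documentation.

```javascript
// A minimal sketch of a one-way sync from Google Sheets to Coda.
// The doc ID, table ID, API token, and sheet name below are placeholders;
// confirm the payload shape against Coda's API docs before relying on it.
function syncSheetToCoda() {
  const sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('Tasks');
  const values = sheet.getDataRange().getValues();
  const headers = values.shift(); // first row holds the column names

  // Build one Coda row per spreadsheet row, mapping each header to a cell
  const rows = values.map(function(row) {
    return {
      cells: headers.map(function(header, i) {
        return { column: header, value: row[i] };
      })
    };
  });

  const url = 'https://coda.io/apis/v1/docs/YOUR_DOC_ID/tables/YOUR_TABLE_ID/rows';
  const response = UrlFetchApp.fetch(url, {
    method: 'post',
    contentType: 'application/json',
    headers: { Authorization: 'Bearer YOUR_API_TOKEN' },
    payload: JSON.stringify({ rows: rows })
  });
  Logger.log(response.getContentText());
}
```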

While scripting is not usually the first skill analysts think of when it comes to creating models and analyses, it is an invaluable tool for building workflows.

4. Filling formulas down

This next theme is a more surprising one for me. Filling formulas down a column seems like a pretty mundane operation in a spreadsheet. In episode 28, I discuss four methods for filling formulas down to the last row with data. Since it was published, this episode/blog post continues to be one of the most visited posts on the blog.

My theory is that analysts who become better at spreadsheets will start running into the edge cases of what the spreadsheet can do. This results in searching Google for very specific questions that allow them to perform some operation in the spreadsheet faster. While filling formulas down is not that difficult, doing it in the context of not “overshooting” your last row of data is not something that’s built natively into Excel or Google Sheets. Sometimes the most mundane things in a spreadsheet (and life) can have an outsize impact on your overall happiness with the tool.
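For the scripting-inclined, here is a rough sketch of that idea in Google Apps Script, one approach alongside the four in-spreadsheet methods covered in episode 28. The sheet name and columns are placeholders, and it assumes your data starts in row 2 with the formula to fill living in C2.

```javascript
// A sketch of "fill the formula down without overshooting" in Google Apps Script.
// Assumes data starts in row 2 and the formula to fill lives in C2.
// Sheet and column references are placeholders.
function fillFormulaToLastRow() {
  const sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('Data');
  const lastRow = sheet.getLastRow(); // last row that contains any content

  if (lastRow <= 2) return; // nothing to fill

  // Copy the formula in C2 down through the last row with data, and no further
  const source = sheet.getRange('C2');
  const destination = sheet.getRange(3, 3, lastRow - 2, 1); // C3:C<lastRow>
  source.copyTo(destination);
}
```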

5. Learning in public with Shawn Wang

The final highlight for 2020 would have to be episode 50 where I interviewed Shawn Wang about his 4,000-line VBA script for trading derivatives. So far this is the only episode where I had a guest on the podcast (but I plan on doing more of these in 2021).

While the VBA script was interesting to walk through, the more interesting part of this episode was Shawn’s journey from Excel to Python and eventually to JavaScript. While many analysts become masters of the spreadsheet, there are many more opportunities for those analysts who go on to web development, data science, and other related fields. Shawn’s “learn in public” attitude is also something I see more of in the developer tools community as people tinker with new web frameworks. Through this active learning and building in public, you get the interesting cross-pollination of ideas and innovation that is akin to academics who write and publish papers for their peers.

Whatever your thing is, make the thing you wish you had found when you were learning. Don’t judge your results by “claps” or retweets or stars or upvotes – just talk to yourself from 3 months ago. I keep an almost-daily dev blog written for no one else but me. Guess what? It’s not about reaching as many people as possible with your content. If you can do that, great, remember me when you’re famous. But chances are that by far the biggest beneficiary of you trying to help past you is future you. If others benefit, that’s icing.

-Shawn Wang

Now onto five skills and tips for data analysts to think about acquiring in 2021:

1. Master the tools of your craft: Excel, Google Sheets, SQL

Most of you are already proficient in Excel or Google Sheets, but I would highly recommend getting proficient in SQL as well. Knowing how to pull your data from whatever database technology you use will prevent you from having to ask your engineering or data science counterparts to write queries for you. This also gives you an additional skill to add to your toolbelt.
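As a hypothetical example of what this looks like in practice, here is a sketch that runs a basic SQL query against a MySQL database using Apps Script’s JDBC service and drops the results into a sheet. The connection string, credentials, and table and column names are all placeholders.

```javascript
// A sketch of pulling data yourself with SQL instead of asking engineering.
// Uses Apps Script's JDBC service against a MySQL database; the connection
// string, credentials, and table/column names are all placeholders.
function pullSignupsByMonth() {
  const conn = Jdbc.getConnection('jdbc:mysql://your-db-host:3306/analytics', 'user', 'password');
  const stmt = conn.createStatement();
  const results = stmt.executeQuery(
    "SELECT DATE_FORMAT(created_at, '%Y-%m') AS month, COUNT(*) AS signups " +
    "FROM users GROUP BY month ORDER BY month"
  );

  // Collect the query results, one row per month (values come back as strings)
  const rows = [['Month', 'Signups']];
  while (results.next()) {
    rows.push([results.getString(1), results.getString(2)]);
  }
  results.close();
  stmt.close();
  conn.close();

  // Write the results into a sheet so the analysis lives where you work
  const sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('Signups');
  sheet.getRange(1, 1, rows.length, 2).setValues(rows);
}
```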

Database platforms and the data visualization companies built on top of them (Mode, Looker, Tableau) are becoming more accessible to non-engineers (at least that’s how these platforms market themselves). This means anyone can query the databases as long as they have access to them. This gives you agency and control over your own data workflow.

Knowing the basics of these tools in 2021 is a given, but I would challenge you to go deeper so that you end up searching Google for very specific things like how to fill formulas down, a sign that you are getting deep into your tools. I wrote a bit about mastering your tools earlier this year on my Coda profile, which might be relevant for this tip.

2. The art of data storytelling

You may not think your job is to tell stories, but this is probably the most important skill to learn for 2021 since it directly impacts the rest of your organization.

In addition to building PivotTables, setting up dashboards, and creating a scalable reporting system, the best data analysts are able to summarize their findings in a way that leads to inspiration. The New York Times actually holds an internal data bootcamp to teach their reporters and journalists the fundamentals of data analytics (I talk about this in episode 16). Their reporters are already amazing storytellers. Equipping them with the ability to tell data-driven stories only makes their stories more compelling and interesting to read.

Source: New York Times

3. Think like a coder

In my opinion, the lines between data analysts, data engineers, and software engineers will continue to blur. This is happening because the tools we use continue to democratize who has access to build on them.

If you are used to just modeling or building out your reports in spreadsheets or some data visualization platform, I encourage you to “think like a coder.” What that means is asking yourself the question: “What would happen to my model/analysis/tool if someone else had to run it 100 times? Would it break?”

When your spreadsheet needs to be used by your colleagues in a repeatable fashion, you’ll start thinking about all the edge cases where the tool might break. You’ll consider adding error checks and abstracting certain parts of the tool behind dropdowns or checkboxes. All of these prevent the end user from inputting something they shouldn’t. Extending this analogy further, you’re turning your spreadsheet into something that could be used by millions of people for a long time. This is just like the software we use today that is constantly stress-tested for bugs and errors.
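To make this concrete, here is a small hypothetical sketch of two such guardrails in Google Apps Script: a dropdown that rejects invalid input and a protected formula range. The sheet name, ranges, and list values are placeholders.

```javascript
// A sketch of two simple guardrails for a spreadsheet "tool":
// a dropdown that rejects invalid input, and a protected formula column.
// Sheet name, ranges, and list values are placeholders.
function addGuardrails() {
  const sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('Model');

  // Dropdown: only allow known scenario names in B2, reject anything else
  const rule = SpreadsheetApp.newDataValidation()
    .requireValueInList(['Base case', 'Upside', 'Downside'], true)
    .setAllowInvalid(false)
    .setHelpText('Pick a scenario from the list.')
    .build();
  sheet.getRange('B2').setDataValidation(rule);

  // Protect the formula column so end users cannot overwrite it by accident
  const protection = sheet.getRange('D2:D100').protect();
  protection.setDescription('Calculated outputs, do not edit');
}
```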

4. Understand the full data pipeline

Similar to the previous tip, data analysts should understand the full end-to-end workflow for how their models and reports are created. I sometimes show the following image during my data analysis trainings:

When I was an analyst, I was mostly concerned with the “presentation” or “data warehouse” portion of this diagram. This means building reports in spreadsheets or perhaps Tableau. Given that all parts of the pipeline have moved to the cloud, anyone within your organization can have access to the ingestion tools, data prep tools, and database tools (assuming you have security access).

The reason you need to understand the full data pipeline is to increase your efficiency with pulling data and to know the data provenance for your analysis. If your company has questions about how your data was collected, cleaned, or transformed, you can say more than just “it came from this database.” This will specifically help with answering questions about data discrepancies between the different reporting systems your organization uses. If you have control over your data pipeline, you can set up your data structures for long-term success (see episode 45).

5. Always be curious and ask questions

This skill doesn’t just apply to 2021. I want to end on this skill because it’s something I’ve carried with me throughout the years, and it has served me well from two perspectives:

  1. Finding the key drivers and trends behind a model or analysis
  2. Learning new tools and platforms outside of spreadsheets

This goes back to the previous tip of understanding the full data pipeline. If you aren’t curious about your data pipeline or data lineage, you won’t feel the need to learn how to use AWS features like S3 or Lambda. By asking questions about the data pipeline, you’ll naturally start learning how your systems are glued together.

From a soft skills perspective, I spoke about the skills a data analyst should have in episode 5, where I reviewed a blog post by Mode CEO Derek Steer. In the blog post, one of the skills Derek talks about is asking good follow-up questions. It’s almost like being a detective or reporter trying to find the true culprit behind a crime, even when the data might convince you that something else is actually the cause of the problem. My former manager once told me that a good analyst “always asks questions until they get to the truth.” So that’s what you should do in 2021: find the truth.

Other Podcasts & Blog Posts

In the second half of the episode, I talk about some episodes and blog posts from other people that I found interesting: