Dear Analyst #51: Who is the real audience for custom data types in Excel?

To much fanfare, custom data types in Excel were released late last month (October 2020). This feature started off as a way to see “rich data” relating to stocks and geography in a cell, and now Microsoft is letting you define your own data types. Perhaps you want to see all attributes for a customer such as the location, region, and account rep in a cell without seeing these columns in your worksheet (see screenshot below). In this episode, I want to dig deeper into how this feature fits into existing workflows. More importantly, I want to know who is the audience for Excel custom data types?

Source: The Verge

Redefining the data engineer role

I can’t tell if data engineer who are working in SQL are rejoicing or rolling their eyes at this new feature. One one hand, a data engineer might be happy because if a business user in their organization needs to see additional data about a customer, they don’t need to contact the engineer to add this to the SQL database. In theory, the business user can add this column to the “Customers” data type in Power Query and that column is now available for anyone to use.

On the other hand, perhaps the data engineer needs to learn how to use Power Query and Power BI now because no one in the organization knows how to do the right types of joins in Power Query. Granted, setting up joins and cleaning up data in Power Query is probably easier than writing an Airflow DAG. If you’re an analyst who is excited about using this feature, perhaps you’ll need to up-level your skills to become a proficient user of Power Query first and learn how to use joins:

Who maintains these data types and who uses them?

My skepticism for this feature started as I saw people doing tutorials on how this feature works. The end product is quite tantalizing. But there are two separate audiences Microsoft needs to convince to use this feature: the admins (most likely the data engineers) and the business user (anyone with “analyst” in their job titles).

Microsoft has made it clear that this feature is still in development and the feature has only rolled out to a subset of Office 365 subscribers. Microsoft most likely released this feature because the most ardent users of the stocks and geography data types probably sent feedback saying they want to create their own data types. If you are one of those users, I’m super curious about what your use case is and how you are using the feature. I see two camps of people:

The admin

If your organization deploys this feature, undoubtably there will be someone who has to maintain all the data types. Here is the workflow for how the admin might maintain a “Product” data type:

  1. Set up the data connection with a database table that contains all your company’s products
  2. Do some data cleanup in Power Query to ensure the right columns show up in the data type
  3. Merge with other tables as necessary to get all properties a business user may want to see
  4. When a new property is needed, decide whether this change should be made in the underlying database or done through another merge in Power Query
  5. Set up a cross-functional meeting so other business users know that a change may be coming to the organization’s data types
  6. Start at step 2 again and rinse and repeat

Maybe I have some of these steps wrong, but it doesn’t feel that much different from what a data engineer or DBA is doing today. In fact, custom data types might complicate the workflow because instead of exporting results to a CSV from a database, business users now require the data moving through Power Query and Power BI to be just as accurate as the data stored in the database.

Source: devRant

The business user

My hunch is that Microsoft wants the business user to be the real audience of this feature. All the articles and videos point to how awesome this will be for all you analysts out there.

I think the big caveat will be this: while you, the business analyst is the end user, you’ll have to learn how to use Power Query and Power BI for this feature to have a meaningful impact on your workflows. This is great for Microsoft because they have more users of Power Query/BI and maybe some more $$ for additional licenses.

In terms of data visualization and data cleansing, Power BI competes with tools like Tableau, Mode, and Looker. Even OpenRefine might be a competitor in terms of data cleaning and transformation (I’m a big fan of OpenRefine since it’s free and plan on doing an episode about it in the future). Remember Microsoft Access? Still a trusty database that won’t die, but Power Query is obviously a step up. Needless to say, it behooves Microsoft to get more users onto Power Query/BI.

This is what you’re workflow might look like with custom data types:

  1. Go to data engineer or admin who is managing data types for your organization and explain analysis you need to do
  2. Have data engineer create a view in Power Query that merges all the tables you need access to in the data type
  3. During analysis, you realized you need an extra property.
  4. In the interest of time, you pull the data separately from another Excel file rather than getting the property added to the “official” data type

As custom data types become more widely used in your organization, there will be more controls in place to ensure changes to data types are tracked and properly communicated out. This means editing a custom data type may be just as laborious as adding new columns to a table in a SQL database.

Custom data types versus traditional lookups

I know I may simply be averse to change which is leading to my skepticism for this feature. Here is the current workflow I believe most analysts utilize when pulling data:

  1. A SQL query runs in Snowflake, Mode, Looker, or some other public cloud database
  2. You combine your lookup tables into various worksheets in a workbook
  3. To “join” data, you do some VLOOKUPs or INDEX/MATCHs in an intermediate worksheet
  4. A separate worksheet contains your model, report, or one-off analysis
  5. If report needs to be updated weekly/monthly, you run the SQL query again and paste the new data in one of the original lookup worksheets
  6. (Optional) You have a cron job that outputs a CSV into a Sharepoint folder on a recurring basis and you have a PivotTable that points to that CSV which you refresh every week/month to get the latest data

While this workflow does feel a bit manual and stitched together and may be subject to human error, it’s fast and requires little overhead from the business user.

The benefits of custom data types is that all these extra lookup columns are kind of “hidden” inside the cell. You can get these extra properties by doing dot notation in the formula. For instance, if cell A1 contained your “Product” custom data type, you could get the Category property by writing a formula like this:

=A2.Category

As you start working with tables of data, your formulas will get more complicated as you refer to table names and custom data types. For instance, this formula below would reference the Category property of the Product custom data type in the Products table:

=Products[Product].Category

This syntax may feel foreign to some Excel users. As you start converting your lists of data to tables and work with property names instead of cell references, it will be important for you to learn this syntax.

Are lookups really that bad?

I don’t think so. In my opinion, the main tradeoff with using custom data types is this:

Less columns cluttering your spreadsheet with the added complexity of learning Power Query.

Should you learn Power Query? I think so. Learning Power Query will only make you more knowledgeable about your organization’s data sources and data pipelines. It will also give you insight into how your backend databases are structured.

The issue is that business users shouldn’t have to learn Power Query for the sake of having custom data types. Again, this may be my aversion to change, but I like to see all the data that I’m doing lookups to in my workbook. With custom data types, you have to jump into Power Query, select which columns you want to include in the data type, and then close and load back into Excel. Having to navigate another interface just to save some space in my worksheet doesn’t feel necessary (for most of my use cases).

While the hype and marketing around custom data types is targeted at the business user, I think this feature is really a Power Query/BI feature since most of the work will be done in Power Query versus Excel. Here is a description from the Microsoft Excel forums on when you’re organization might want to use custom data types:

Lookup-style tables that are commonly used in your organization such as product masters, customer lists, facilities, supplier lists, or asset tables are good examples of what you can now share through Excel data types.

If you already have a workflow that uses “lookup-style tables” in your worksheets, is it really necessary to move these tables to Power Query so that they can be loaded back into Excel as a custom data type? I’d really like to hear from people who have the need for this workflow. I’ve only seen YouTube examples of custom data types in action, so perhaps I just haven’t seen the light yet. Hence my skepticism around who the target audience is for this feature.

What’s next for custom data types

The feature is still in development, and the Excel community has already expressed feature requests they would like to see in the product. Some features that are table stakes in my opinion:

  • When you add a column and do a VLOOKUP on a custom data type column, that new value you bring in via the lookup currently doesn’t get added to the data type. These lookup columns need to get added to the data type.
  • You can’t select a table of data in Excel and simply “convert” that table to a custom data type. No matter what you have to go into Power Query to set up the data type. Allowing someone to create a data type straight from Excel is like creating a named range that anyone in your organization can use and reference.

The interesting thing about these two features is that it just gives the business user more flexibility to create and manage custom data types. This adds risk to the organization, however. If any business user can change the definition of a custom data type and that data type is shared across the organization, then this will affect other analysts who rely on that custom data type. You can see why your organization would need an admin to put controls in place for who can edit these data types.

Having a data engineer or admin manage your organization’s use of Power Query and Power BI is not out of line with what Microsoft wants. If your organization is already Azure SQL Databases or even Synapse, custom data types might feel like a natural feature to give to your business users (especially if you are already using Power Query and Power BI).

Other Podcasts & Blog Posts

In the 2nd half of the episode, I talk about some episodes and blogs from other people I found interesting: