Dear Analyst #48: Working with numbers formatted as text with avocado sales

For spreadsheet newbies, number formatting may seem like a pretty innocuous matter. As you become more familiar with Excel or Google Sheets, you’ll find that improper number formats will lead to formulas that don’t output what you expect or formulas that straight up don’t work. This episode explores what happens when you unknowingly have numbers formatted as text. I also explore the various ways you can try to debug the errors that come from numbers formatted as text. You can copy the Google Sheet here for this episode.

AutoSums and avocados

The topic for this episode came from a question about AutoSum in the Microsoft Excel community forum which has close to 40,000 views and 25 replies (some of them are quite spicy, I might add). I’ll be referencing this thread quite a bit in this episode since some of the best ideas come from–you guessed it–the comments. Props to Excel MVP Sergei Baklan for jumping into this thread and trying to help answer a somewhat ambiguous question.

No one: Spending Friday nights perusing the Microsoft Excel forums for interesting questions 🙂

The data set for this episode is a fun one: avocado sales across different cities (learn more about the dataset on Kaggle here). The prologue for this data set is amazing and will ring true for all millennials out there:

It is a well known fact that Millenials LOVE Avocado Toast. It’s also a well known fact that all Millenials live in their parents basements. Clearly, they aren’t buying homes because they are buying too much Avocado Toast! But maybe there’s hope… if a Millenial could find a city with cheap avocados, they could live out the Millenial American Dream.

-Justin Kiggins, Product Manager, Chan Zuckerberg Initiative

One can only make an episode about formatting numbers in Excel/Google Sheets so interesting, so this was my best attempt. And we go on!

Numbers formatted as text mess up formulas

The first thing you’ll notice with numbers formatted as text is that they will mess up formulas by giving you an output you would not expect. For instance, in our avocado dataset, cell C53 is simply a sum of all the “numbers” in column C, but the result of the SUM formula is 0:

How is this possible? If you click on the column C header and go to Format->Number, you’ll notice that all the cells in this column are formatted as “Plain text.”

This means any numbers you type into the cells in this column will be treated as text. Since these cells aren’t formatted as numbers, Google Sheets doesn’t know how to treat these values which means the SUM formula is trying to sum up a bunch of text values. In Excel, this is similar to the cells being formatted as “Text”:

Detecting cells that are formatted as text

If you are really diligent, you can use the ISTEXT formula to test whether a cell is indeed formatted as text. However, when you inherit an Excel model, you are just hoping that the previous modeler did things correctly and you can just plug in numbers and move on with your life. How can you figure out if some cells are improperly formatted before pulling your hair out?

Unfortunately, no answer exists. If you know of one, please comment below. For new spreadsheet users, this is probably the most frustrating issue to experience. As you pick up more experience and start to debug your spreadsheets, seeing weird outputs with your formulas should lead you to consider that your cells are improperly formatted (assuming the formula is built correctly).

In this example, we have a visual indicator letting us know that the numbers in the Total Volume column look kind of fishy. All these “numbers” are left-aligned while the rest of the spreadsheet’s numbers are right-aligned:

However, if you auto-fit the columns and remove the margins on the column headers and cells, that alignment becomes less apparent:

If you didn’t notice that the “numbers” in column C are left-aligned, you’d be stuck wondering why your SUM formula doesn’t work. A best practice when seeing numbers that don’t have commas in the thousandths and with decimals you don’t need is to simply fix that in formatting menu. As you make this fix, you’ll see that the formatting is indeed “Plain text” and shifting to the proper number format (or to “Automatic” in Google Sheets) will make the formula work again:

Extra cell formatting issues in Excel with formulas

When you have the “Automatic” format in Google Sheets or “General” format in Excel, Sheets and Excel will try their best to turn whatever values you enter into the cells into the correct format you need for data analysis and reporting. This doesn’t always work as intended, and can have some drastic consequences. I talk about the human gene naming issues in episode 40 and how the HUGO Gene Nomenclature Committee had to rename genes just so researchers can stop dealing with names being coerced into date formats.

In Excel, if you have a cell formatted as text and you try to enter a formula into the cell, you’ll get into a world of hurt. Are you are typing the formula, it looks like everything is working the way it should since you’re able to reference cells like you normally would, but the formula gets entered as text and Excel doesn’t coerce the text into the formula you want:

Only after you change the format of that cell to “General” will the formula actually calculate. Actually, even after you change the cell format to “General,” you have to go back into edit cell formula mode and commit the formula again by pressing ENTER:

As you can see, there are nuances to changing the format of cells from text to numbers. Luckily in Google Sheets, when you type a formula in a cell that is formatted as “Plain text,” the formula still calculates but the cell format still remains as “Plain text.”

Knowing how to use your tools

The Excel forum thread is full of other commenters saying AutoSum doesn’t work for reasons orthogonal to cell formatting:

  • A commenter forgets to put the cell references in the SUM formula and happens to be using international formatting for decimals (comma) bringing more confusion to number formatting
  • One accountant discusses inheriting a .XLSX file but is using Office 365, and says the formatting options in Office 365 are different leading to questions about backwards compatibility
  • One commenter talks about merged cells not working with AutoSum, which leads to a discussion about merged cells vs. center across selection (you should be doing the latter)
  • A commenter discusses his strategy: copy the numbers from Excel, do a Paste Special into word and select “Unformatted text,” and then copy/paste the numbers back into Excel

That last technique seems like the most ridiculous workaround but I’d have to admit I’ve tried it before myself to strip all formatting from numbers and words. The fact that there are all these stumbling blocks with the simple AutoSum function speaks to the bigger challenge of learning Excel:

Knowing how to use your tools correctly involves a lot of trial and error.

The right way to learn how to use spreadsheets and get over these simple formatting blunders is through real-world experience. And a lot of Googling :). This thread has close to 40,000 views from folks struggling to figure out why their AutoSumming isn’t working. It’s both enlightening (and disappointing) that the answer involves formatting your numbers as, well, numbers.

Other Podcasts & Blog Posts

In the 2nd half of the episode, I talk about some episodes and blogs from other people I found interesting: