The post Convert Text to Columns With Multiple Lines appeared first on .

This is probably one of the most common use cases of Text-to-Columns I’ve seen. You have text in the format “Last Name, First Name” and you want to split this into two columns with one column being the First Name and the next column being the Last Name. But what happens when a single cell holds multiple values, each entered on a new line, like this?

This is all in one cell and each text is separated by a new line. In this case, it looks like this was a database dump and all the text is put into one cell and our job is to put each value into a new column. The problem is, there is no delimiter! You could add a comma after each value but that would take forever if you had a cell with say 50 lines of text. How do we solve this?

The simple answer is to use the SUBSTITUTE() function. It doesn’t seem intuitive, but what we need to do is reformat the cell with multiple lines of text so that the Text-to-Columns operation can work on it. We basically want the text in this cell to look like this:

Notice the commas after every value? Once the text in the cell looks like this, we are ready to use the Text-to-Columns button to split the text up by the commas that separate each value. The key with using the SUBSTITUTE() function is that we want to replace each new line with a comma. The ASCII character code for a new line break is 10 for PCs and 13 for Macs. In Excel, you can use the CHAR() function to represent different ASCII codes, so we can use CHAR(10) to represent a line break. So in a cell next to the cell with all your text, you can write the following formula to replace all the line breaks with a comma:

=SUBSTITUTE(B2,CHAR(10),",")

Let’s discuss what this formula does. All the SUBSTITUTE() function does is replace a character or characters in a text with another character. In this formula, the cell with the data in multiple lines is B2. The second value in the SUBSTITUTE() function is the actual text we are trying to find in the cell. In this case, the “text” we are trying to find is a new line break which is represented by CHAR(10) which we just discussed. Finally, we want to replace all occurrences of the new line break with a comma which is the last value in the SUBSTITUTE() function. If you apply this formula to this text, the result will look like this:

You’ll notice all our text goes onto one line with each value separated by a comma. This is exactly what we want because now we can use the Text-to-Columns operation to split this long text into columns.
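For readers who think in code, here is a minimal Python sketch of the same two steps: swap the line breaks for commas, then split on the commas. The cell contents are made up for illustration, and `chr(10)` plays the role of CHAR(10).

```python
# Hypothetical cell contents: three values separated by line feeds (CHAR(10)).
cell_b2 = "Apple\nBanana\nCherry"

# Rough equivalent of =SUBSTITUTE(B2,CHAR(10),","): swap every line feed for a comma.
substituted = cell_b2.replace(chr(10), ",")

# Rough equivalent of Text-to-Columns with a comma delimiter.
columns = substituted.split(",")

print(substituted)
print(columns)
```

If your data came from a Mac and uses character code 13 instead, the same idea applies with `chr(13)`.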

If you try to use the Text-to-Columns operation on the cell where you have the SUBSTITUTE() function you’ll notice you’ll get this in the dialog box:

This isn’t correct because we don’t want to do text to columns on the text of the SUBSTITUTE() function, but rather on the *resulting text* of the function. All you need to do at this point is a Copy and Paste Special Values so that we get just the *values* from the function rather than the formula. After you do the Copy and Paste Special as Values, make sure you select “Delimited” in Step 1 of the Text-to-Columns dialog box. Then you want to select Commas in Step 2:

You’ll see a preview of what the data will look like and the result looks exactly like what we want: the text (separated by commas) is split into multiple columns. The result in Excel should look like this:

There are multiple ways of solving problems in Excel, and this example shows how you can use different text hacks to get the result you want. This exercise was actually something asked of me in a workshop I gave a few weeks ago, and I didn’t know the answer until after I found that new line breaks are represented by CHAR(10) in Excel. Once I figured this out, I knew using the SUBSTITUTE() function, Paste Special as Values, and the Text-to-Columns would solve the problem of getting the source text into new columns.


The post Calculate Average Trends in Excel appeared first on .

Based on the data, here are the objectives of the exercise:

- Calculate each country’s yearly change trend, by taking the average of its year-over-year difference
- Project each country’s 2011 value, based on its 2010 value and its average trend value

When I think of “yearly change trend,” I think of the percentage change in value. For instance, the yearly change from 2006 to 2007 is -15.9% (5.29/6.29 – 1). Now that we know the year-over-year change for 2006-2007, we should find the same changes for 2007-2008, 2008-2009, etc.

In cell G15, the formula I entered was =AVERAGE((C15/B15-1),(D15/C15-1),(E15/D15-1),(F15/E15-1)). We are simply taking the average of each year-over-year change. This comes out to 9.1% for Afghanistan, which means Afghanistan’s tax revenue as a % of GDP grew by an average of 9.1% per year from 2006 to 2010. Now if we want to project this to 2011 based on the trend we just found, the formula in H15 would simply be =F15*(1+G15), which comes out to 9.07 as the tax revenue for Afghanistan (as a % of GDP). Here is what the table looks like once all the values are filled out:
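The same calculation can be sketched in Python. The yearly values below are hypothetical (a series that grows exactly 10% a year), not the figures from the worksheet, and the function names are my own.

```python
def average_pct_trend(values):
    """Average of the year-over-year percentage changes, like the AVERAGE() formula in G15."""
    changes = [later / earlier - 1 for earlier, later in zip(values, values[1:])]
    return sum(changes) / len(changes)


def project_next(values):
    """Apply the average trend to the last known value, like =F15*(1+G15)."""
    return values[-1] * (1 + average_pct_trend(values))


# Hypothetical five-year series standing in for columns B through F.
sample = [100.0, 110.0, 121.0, 133.1, 146.41]
trend = average_pct_trend(sample)
projection = project_next(sample)
print(round(trend, 4), round(projection, 2))
```

Because every step in this made-up series is a 10% increase, the average trend is also 10%, which makes it easy to sanity-check the projection by hand.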

Another way to think about this instead of yearly change as a % is the absolute value change for the tax revenue. Now our formula in G15 looks like this: =AVERAGE((C15-B15),(D15-C15),(E15-D15),(F15-E15)) which equals 0.51 for Afghanistan in cell G15. This value represents the absolute change from year to year in tax revenue as a % of GDP instead of the percentage change. Is this correct?

Since these differences are values in tax revenue *as a percent of GDP*, a 2% change in value is different from a 2% *increase* in the tax revenue. Therefore, to compare apples to apples, you must take the percentage change year-over-year in order to accurately find a benchmark across all countries. Consequently, by applying the average of the differences to 2010’s number to find the 2011 projected tax revenue, you are not accurately applying a growth factor to the 2010 number. Then, when you are comparing projections across the entire list of countries, the benchmarks are all different since you are not applying a *percentage* growth factor to 2010’s number but rather a *value* growth factor. This is what the *incorrect* data looks like:


The post Excel Formula Final Question #4 from Modeloff 2013 Released with Explanation (4 of 4) appeared first on .

Challenge number 4 was the most difficult challenge at Modeloff 2013. So hard, in fact, that none of the contestants were able to solve it in the allotted time. The teams were furiously experimenting with different formulas, but as the clock started ticking down, a few simply gave up with exasperated looks on their faces, fingers resting on the keyboard instead of tirelessly typing new formulas to solve the challenge. If you can solve this challenge, you are in the top 0.1% of modelers out there, and you should definitely pre-register for Modeloff 2014! If you want to take a look at the first 3 challenges from the 2013 Modeloff series, here are the questions along with the explanations:

As always, thanks to Dan Mayoh, the creator of all these challenges for providing us with the answers. Dan’s consulting practice is online now, so go check out Fintega Pty Ltd for more info. Now, onto the challenge!

Take a look at the first post in this series to familiarize yourself with the rules of the challenge. No VBA, Defined Names, or any other Excel black magic. Your formula must be entered into the grid of blue cells below:

There is also a 5 X 2 table of dates and duration periods that will help you with writing the formula:

Don’t forget! There is also a table of answers in the Excel file (download) that will turn green if the formula you entered into the blue answer cells is correct:

Write a formula that counts the number of cumulative flags to date that have been “set off” for each period end date (above the blue answer cells) relative to the first flag date. For instance, if the period end date is December 31st, 2014, and the first flag date is December 31st, 2013, and the duration of the flag is 1 year, the answer for the number of flags set off is 2. The answer is 2 because December 31st, 2013 is in the past (relative to the period end date), and 1 year has passed since December 31st, 2013, so the flag was set off again on December 31st, 2014.

As another example, consider the period end date to be June 30th, 2015. The first flag date is June 30th, 2014, but the duration is 0.5 years. This means every 6 months the flag gets set off. The answer in this scenario is 3 since the flag was set off on June 30th, 2014, December 29th, 2014, and finally June 30th, 2015. In cases where the period end date is BEFORE the first flag date, the answer would simply be 0 since the flag would never be set off in the first place.

Ready to get your ass kicked? Or become a legendary financial modeler? Download the Excel file here that the contestants were given and try to solve the challenge yourself! The goal is to find the formula that satisfies the conditions of the challenge using the least number of characters possible in the formula.

Knowing that none of the contestants were able to solve this challenge, I was easily discouraged from spending too much time banging my head to figure out the solution. I did take a few stabs at writing the formula, as I knew it would involve dividing the difference between the period end date and the first flag date by 365. For instance, in cell O19 I started with this:

=(O$17-$E19)/365

You need to add the “$” to the row in O17 and to the column in E19 so that when you copy the formula across the blue cells, the references to the period end date and first flag date do not move around in the formula and stay fixed. I also knew that the formula would probably involve dividing the number of days by the duration cell, so the formula would look something like this in O19:

=(O$17-$E19)/365/$F19

This is pretty much where I got stuck as there are a few things you need to account for in the formula which remained unanswered to me:

- How do you account for the edge cases where the flag needs to increment by 1 given that subtracting the period end date from the flag date and then dividing by 365 still results in 1?
- How do you ensure that the answer is 0 for the period end dates that are before the flag date (columns H-J)?
- Piggy backing off of that last question, do we use the MIN() function somewhere to make sure we get the right answer for when the period end date is before the flag date?

There are still many unknowns, and the main problem I had was handling the edge cases (as is the case with most of the challenges in this series). Just finding a solution that works is difficult enough, forget trying to find the formula with the least number of characters.

The final solution contains 34 characters and only uses the INT() and MIN() functions:

=-INT(MIN($E19-H$17-9,1)/365/$F19)

This formula is entered into cell H19 (not array-entered) and copied across and down to show all the flags in the blue answer cells:

I was surprised myself that the final formula only utilized the INT() and MIN() functions, but through the clever use of these two functions we are able to see the edge cases handled elegantly.

Let’s take a look at each segment of the final formula to see how this formula is able to handle all the edge cases. Let’s take the formula in cell Q19 as an example:

=-INT(MIN($E19-Q$17-9,1)/365/$F19)

$E19 equals December 31st, 2013 and Q$17 equals the period end date, June 30th, 2015. Let’s forget the “-9” for now and see what happens when we just evaluate =MIN($E19-Q$17,1). The result is -546, which is the difference in days between December 31st, 2013 and June 30th, 2015. MIN() evaluates to -546 because it takes the two arguments, -546 and 1, and returns the lesser of the two. Now let’s divide by 365 and we see the answer is -1, or -1.50 if you show two decimal places.

This is starting to make sense now. The difference between December 31st, 2013 and June 30th, 2015 is indeed 1.50 years. If we divide the -1.50 by the duration, we will still get an answer of -1.5 since the flag duration is 1 year. Now let’s add in the INT() function and see how it affects the formula:

=INT(MIN($E19-Q$17-9,1)/365/$F19)

The result is -2, so we are definitely on the right track here. Why does =INT(-1.5) evaluate to -2? You usually use the INT() function to get the integer part of a decimal number. For instance, the INT() function would return 5 for the number 5.25. However, INT() always rounds *down*, so for *negative* numbers it returns the nearest integer that is less than or equal to the expression. You would think that =INT(-1.5) would result in -1 (the integer part of -1.5), but the closest integer that is less than -1.5 is -2. Even if the expression is -1.001, =INT(-1.001) still evaluates to -2.
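If you want to see this rounding behavior outside of Excel, Python’s `math.floor()` behaves the same way as INT() for these inputs. The helper name `excel_int` is just a label I made up for this sketch, and `math.trunc()` is included only for contrast because it chops toward zero instead.

```python
import math


def excel_int(value):
    """Model of Excel's INT(): always round down toward negative infinity."""
    return math.floor(value)


for value in (5.25, -1.5, -1.001):
    # Columns: the input, the INT()-style result, and the truncated result.
    print(value, excel_int(value), math.trunc(value))
```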

The final solution has a “-” in front of it so this would turn the -2 to a 2 which is the correct value for cell Q19.

This leads us to the question about the “-9” which we omitted in our explanation above. It looks like this formula yields the correct answer for us:

=-INT(MIN($E19-Q$17,1)/365/$F19)

If you copy this formula across the rest of the blue answer cells, however, you’ll start to notice some errors in the table of answers. The reason is because the -9 helps us with edge cases when the period end date is the same as the flag date or one or two years ahead of the flag date. For instance, let’s look at the formula in cell Q20 *without* the -9:

=-INT(MIN($E20-Q$17,1)/365/$F20)

The result is 2 but this is incorrect since the actual answer is 3. The flag date is June 30th, 2014 and the period end date is June 30th, 2015, exactly one year ahead of the flag date. The duration is only 0.50 years in this case. This means that the flag has been set off 3 times: once on June 30th, 2014, again on December 31st, 2014, and one final time on June 30th, 2015. The issue with the formula above is that $E20-Q$17 results in -365. When we take -365 and divide it by 365 and then by the duration, 0.50, we get the -2 answer which we currently see. What we need is a number to decrease the -365 so that when we divide by 365 the answer is not exactly -1, but rather a little less than -1. Then, when we incorporate the INT() function with the given expression, the result will be the lesser negative number, or -3 in this case.

Let’s add in the -9 to the formula and see what happens:

=MIN($E20-Q$17-9,1)/365/$F20

The result is -2.05, which is only slightly less than -2 but makes a world of difference in terms of what the INT() function evaluates the expression to. INT() rounds -2.05 down to -3, and when we apply the minus sign at the beginning of the formula, we get the 3 that we need in cell Q20. There is no specific reason why the solution uses a “-9”; we could also use “-1” and the answers in the blue cells would still be correct. The important concept here is to decrease the negative result of the flag date minus the period end date by a little bit so that the INT() function evaluates the expression correctly.
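To double-check the walkthrough, here is a small Python sketch of the final formula. It assumes Excel dates can be modeled with `datetime.date` (subtracting two dates gives the day gap, just like subtracting serial numbers), and the function and parameter names are hypothetical. The `nudge` argument stands in for the “-9”.

```python
import math
from datetime import date


def flags_to_date(first_flag, period_end, duration_years, nudge=9):
    """Model of =-INT(MIN(flag - period_end - 9, 1)/365/duration)."""
    day_gap = (first_flag - period_end).days - nudge
    # MIN(..., 1) caps the gap so period ends before the first flag give 0.
    return -math.floor(min(day_gap, 1) / 365 / duration_years)


# The two worked examples from the text, plus a period end before the flag date.
print(flags_to_date(date(2013, 12, 31), date(2015, 6, 30), 1))
print(flags_to_date(date(2014, 6, 30), date(2015, 6, 30), 0.5))
print(flags_to_date(date(2014, 6, 30), date(2013, 12, 31), 0.5))

# Without the nudge, the exact-anniversary case comes up one flag short.
print(flags_to_date(date(2014, 6, 30), date(2015, 6, 30), 0.5, nudge=0))
```

This only models the cases discussed here; it is not a substitute for checking the formula against the full answer table in the workbook.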

This challenge shows how a formula may appear to work, but once you analyze the edge cases, you’ll begin to see how the formula breaks down and you need to incorporate other functions or elements to make the formula work. The INT() function wouldn’t even be incorporated into the formula if it wasn’t for the “-9” since the purpose of the INT() is to find the lesser integer from the expression. If we assumed the formula worked without the “-9,” the resulting formula would’ve been a lot shorter and wouldn’t have caused so much anguish among the Modeloff contestants.

Through all these Modeloff challenges, the one common theme I see is the need to pay attention to the details of the data that is fed into the formulas. What may seem like a straightforward formula may become very complicated as you deal with the edge cases, as we saw with the final challenge. Hopefully these challenges helped you learn a little more about building advanced formulas in Excel, and some of the real-life issues analysts, consultants, and modelers face on their job.


The post Modeloff 2013 Excel Challenge #3 Follow Up Explanation appeared first on .

We mentioned the step where we needed to convert the TRUE/FALSE array into numbers in order to sum them. We were spot on with why it works when you do SUM((E20:S59>10)+(E20:S59<-10)), but not with =SUM(ABS(E20:S59)>10).

When Excel evaluates equalities and inequalities (expressions using the ‘=’, ‘<’ or ‘>’ operators for example), that little part of the equation evaluates to TRUE or FALSE. Typically we then want Excel to resolve the TRUE/FALSE values into 1/0 values. The SUM() function alone won’t do it, but the ‘+’ operator will.

This is actually a really common occurrence in financial modelling. In this example, we needed to resolve the TRUE/FALSE values into 1/0 values so we could then sum them. More common in financial modelling might be a situation where we are modelling a row of indicators or flags, such as “is the current period before the operations start date (in which case return 0) or after the operations start date (in which case return 1)?”. Additionally, the model will look neater if it displays the results as “1” and “0” rather than “TRUE” and “FALSE”.

If the formula was written as =IF(ColumnPeriodEndDate > OpsStartDate, 1 , 0) then there is no issue. But a savvy financial modeler will not use an IF() function here. Since only results of 1 or 0 are returned, which are equivalent to TRUE and FALSE, the IF() function is a bit redundant. Much better (and quicker from a computational point of view) is to simply convert the TRUE/FALSE results into 1/0. Hence they will base the formula around the inequality =(ColumnPeriodEndDate > OpsStartDate), and then convert the Boolean result into a number.

There are a few ways to do this. The most common is to perform a mathematical operation, such as ‘+’ or ‘*’. Specifically, it is common to either:

- **Include “1*” in front or “*1” at the end of the expression.** Multiplying the TRUE/FALSE value by 1 converts it into 1 or 0.
- **Include “--” in front of the expression.** That is, “minus minus” or two negative signs. This essentially performs multiplication by negative 1 twice, the same as multiplying by 1. And indeed this is why in financial models you’ll sometimes see formulas starting with “minus minus”.
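Here is a rough Python model of why the coercion matters. It assumes, as described above, that SUM() skips logical values inside an array and only adds real numbers; `excel_sum` is a made-up helper that mimics that one behavior and nothing else, and the sample integers are for illustration only.

```python
def excel_sum(values):
    """Rough model of SUM() over an array: logical values are skipped, numbers are added."""
    return sum(v for v in values if not isinstance(v, bool))


# An array of TRUE/FALSE values, like the result of ABS(range)>10.
flags = [abs(x) > 10 for x in (9, 22, 0, 15, -14, -18)]

skipped = excel_sum(flags)                         # booleans are ignored
times_one = excel_sum([f * 1 for f in flags])      # the "*1" trick
minus_minus = excel_sum([-(-f) for f in flags])    # the "minus minus" trick

print(skipped, times_one, minus_minus)
```

Python itself happily adds booleans as numbers, which is exactly why the helper has to filter them out to behave like the spreadsheet.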

Doing the “*1” in front of a Boolean expression is something we totally forgot about! The “minus minus” tip is something we have never encountered in our models, but this is definitely a simple solution to quickly convert the Boolean to a number. – KeyCuts Team Comment

As we discovered from doing a quick Google search, another way to achieve the same result is using the N() function, and it is 1 character shorter than “1*” or “--”! But in real world practice, it’s not common to use the N() function here.

The N() function IS sometimes used in real world financial models. It can be a useful way of including an annotation or comment WITHIN a formula to help describe what the formula is doing. If the formula is a bit unusual or complicated, N() allows the author to provide some guidance. You simply include at the end of the formula the expression +N("your choice of explanatory text"). The N() function will convert any text value to 0, meaning that we are simply adding 0 on to the end of the formula, which won’t change the result. But it allows us to leave a written record within the formula.

This is genius! Instead of leaving the typical yellow sticky comment, this is an in-line method to write comments within your model. – KeyCuts Team Comment

Understanding the nuances of why an Excel formula works is interesting and gets you thinking about optimization, but the real world application is all that really matters when your job is on the line. Excel can be all fun and games but not thinking about how your colleagues may use your Excel file or keeping the file size down in the event of large amounts of data crunching may lead to disaster. Even worse, you get caught up with writing the best formula and huge mistakes like the JPMorgan VaR error can occur.


The post Excel Formula Question #3 from Modeloff 2013 Released with Explanation (3 of 4) appeared first on .

Ground rules are the same as before: create a formula that solves the challenge in the least number of characters without using VBA and all that fancy stuff. Read the first post in this series if you want the full rules. This week, the challenge is only one blue cell. Yup. That’s it. Just one cell to enter your kick ass formula.

There is also a large 40X15 block of numbers that you will utilize to solve the challenge:

The numbers in this block are a bunch of random integers.

Write a formula that counts how many cells in the number block are less than -10 or greater than +10. Ready, set, gooooo! During the Modeloff 2013 award ceremonies, the contestants easily created formulas to solve this challenge, but the problem was trying to find the *shortest* formula. I’ll give you a hint, the number of cells that have values less than -10 or greater than +10 is 158.

Give it a shot! Download this file that represents the data discussed above and see if you can create the formula.

I’m going to do things a little differently for this post, and walk you through my thought process for finding the solution before posting the actual solution as per Dan Mayoh. If you simply want to see the 22-character solution and don’t give a crap about my meandering thoughts, then scroll to the bottom and get instantly gratified.

As stated earlier, the solution definitely involves the use of array-entered formulas to test all the numbers in the block against the condition greater than +10 or less than -10. It also involves the SUM() function in order to add up the number of cells that meet the condition. Ok let’s get started.

The first formula I wrote was this:

=SUM((E20:S59>10)+(E20:S59<-10))

This formula is array-entered (CTRL+SHIFT+ENTER) into cell E17 on the worksheet and it will result in the correct answer of 158. Why does this work? Well let’s first look at what the two conditions we are testing for: E20:S59>10 and E20:S59<-10. Nothing too crazy here, we are simply checking the entire range of random integers to see which ones are less than -10 and which ones are greater than 10.

The interesting thing about array formulas is that you can put OR() and AND() operators to test your conditions. For array formulas, the syntax to use is the “+” sign to tell Excel that we want to see which numbers are less than -10 OR greater than 10. If we used the “*” sign in between our two conditions, this would tell Excel to look for values that are less than -10 AND greater than 10. This conditional makes no sense because it is impossible for a number to be both greater than 10 and less than -10. Duh.

If we array-enter this formula =SUM((E20:S59>10)+(E20:S59<-10)), we will get the correct result of 158. However, the number of characters in this formula is 32, a whopping 10 more characters than the actual solution. There must be a better way to optimize, so I thought about using the ABS() function which returns the absolute value of whatever number you give it. This would allow me to not compare the range of values to both 10 and -10, but rather just one number.

The next iteration looks like this: =SUM(ABS(E20:S59)>10). I got really excited because this formula only contains *21* characters, which is one less than the correct solution. When I array-entered the formula, the result made me realize I am still just a regular Excel geek compared to the top financial modelers out there. The result of this array-entered formula is 0, and I struggled to figure out why. Here’s what I discovered.

ABS(E20:S59)>10 yields an array of TRUE and FALSE values indicating which values are indeed less than -10 or greater than 10. The array would look something like this (abbreviated to save space):

{FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, [...]}

Which is based on the first few values in the number block:

{9, 22, 0, 15, -14, -18, [...]}

While we know which values are indeed less than -10 and greater than 10, we cannot simply sum up the resulting array of TRUE and FALSE values. I am not completely sure about this next statement, but the reason why I think =SUM((E20:S59>10)+(E20:S59<-10)) works is because when you introduce a conditional on top of two other conditions (using the “+” sign), the result is the numeric representation of TRUE and FALSE. You’ll get an array of 0s and 1s which you can easily sum up. If you are only working with one condition, as in the above formula where I used the ABS() function, the resulting array is the non-number values of TRUE and FALSE. If you have a better explanation for why this occurs, please leave a comment below!

Ok, so I know I need to somehow convert these TRUE and FALSE values to numbers. I cheated a little here and Googled for the answer and found a lesser known function called N() which converts non-number values to a number, dates to serials, etc. Exactly what we need! Now the formula looks like this with the N() function doing its job:

=SUM(N(ABS(E20:S59)>10))

Array-enter that bad boy and you get 158. However, the character count is at 24 (including the equals “=” sign). Still. Need. To. Optimize. At this point I was getting restless and just wanted to see the solution. I think if I spent another 10 minutes on this sucker I could have come up with the solution.

In all its 22-character glory:

=SUM(N(E20:S59^2>100))

So simple and elegant. It’s very similar to my solution except the condition is cleverly written to cut down on characters. The E20:S59^2 portion of the formula tells Excel to raise each value to the power of 2, which basically eliminates the need for the ABS() function since any integer raised to the power of 2 can never be negative, and comparing the square to 100 is the same as comparing the absolute value to 10. Of course, we still need our trusty single-character N() function to convert the array of TRUE and FALSE values to 0s and 1s. After you array-enter this guy, you’ll get your solution of 158.
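A quick Python sketch makes it easy to convince yourself that squaring and comparing to 100 gives the same answer as taking the absolute value and comparing to 10. The function name is hypothetical and the sample list is only a handful of integers, not the full 40X15 block.

```python
def count_outside_band(values):
    """Count values below -10 or above +10, in the spirit of =SUM(N(range^2>100))."""
    return sum(1 for v in values if v ** 2 > 100)


sample = [9, 22, 0, 15, -14, -18]
print(count_outside_band(sample))

# Check that the squared test and the ABS() test agree on every integer in a range.
agree = all((v ** 2 > 100) == (abs(v) > 10) for v in range(-50, 51))
print(agree)
```

Note that exactly 10 and exactly -10 are not counted by either test, which matches the “less than -10 or greater than +10” wording of the challenge.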

In the real world, rarely will you encounter the need to optimize your formula to this level of detail. On a project you are usually working under a deadline and whatever formula you come up with that simply works and gets the job done will suffice. There are still two points I want to finish on that this challenge can teach us:

- **Speed up large Excel files** – You will work on a file that has thousands or millions of rows, and writing an inefficient formula will actually *decrease* your productivity since Excel will take a long time to think and spit out the answer. This can also lead to your file crashing and data getting lost and you end up playing the whole AutoSave game to recover your file. When you are optimizing a MySQL query, these shortcuts and optimizations can mean the difference between a successful and unsuccessful product.
- **Formula sex factor** – There’s no other way to say it. Seeing a really simple formula that can break down a complex problem and spit out the answer is just *sexy*. I may be taking this too far, but understanding why a formula works allows you to think about other data problems differently. In computer programming parlance, this is like learning a new module and being able to use it over and over again in another part of your program.

Whew, that was a lot of talking from me for a 22-character formula. Did you come up with an alternative solution that is less than 22 characters?


The post Excel Formula Question #2 from Modeloff 2013 Released with Explanation (2 of 4) appeared first on .

In case you missed the ground rules for the group challenge, read the first post in this series of Excel formula challenges. Similar to last week, you have a range of blue cells where you need to enter in the solution formula:

There is also a range of numbers next to the blue cells which you must use in the challenge:

**The Objective:** Write a formula to rotate the values in the 4*4 square through 180 degrees.

All the numbers need to be “flipped” along both the X-axis and Y-axis. Here is what the final table of answers should look like in the blue cells:

Here is a side-by-side comparison between the original range of numbers and what the final output should look like in the blue cells:

Give it a shot! Download this file that represents the data discussed above and see if you can create the formula!

When I first looked at this challenge, I knew it involved using array formulas. For those not familiar with array formulas, here is a good guide from Mr. Excel that explains how array formulas work and why they are useful. I was seriously stumped by the question, and struggled to figure out how to “flip” the numbers in an automated way.

The solution involves using the simple SUM() function in an array formula in an elegant fashion. The final formula consists of **33 characters**:

=SUM($E$18:$H$21*(B15:E18=$E$18))

This is the formula you enter in the first cell of the blue cells in the array format (by pressing CTRL+SHIFT+ENTER) after you’ve written the formula. You then copy over to all the cells in the range to yield the final table of answers.

This formula compares two ranges of values through an array formula. If you simply entered the formula as is and pressed ENTER, you would get a #VALUE error in the cell since you are supplying the SUM() function with a conditional statement.

Let’s back up and see how array formulas work in the context of this solution. Let’s analyze the $E$18:$H$21 portion of the formula. This is simply the range of numbers we need to flip to get our solution. By entering this formula as an array formula, this specific portion of the formula returns an *array* of values like this:

{100, 111, 112, 114, 118, 126, 135, 136, 142, 147, 149, 151, 174, 186, 193, 197}

The B15:E18=$E$18 portion of the formula is a little different since we are testing a condition against an array of values. In this case, we are testing each value in the B15:E18 range to see if that value matches the value in $E$18. The B15:E18 range of values returns an array that is mostly empty except for the last value:

{"", "", "", "", "", "", "", "", "", "", "", "", "", "", "", 100}

The reason why there are so many blanks is simple. The range B15:E18 is what I like to call a “dummy” range since it’s created mostly to help with making the final formula work. Cells B15, C15, D15, etc. are all empty in the worksheet. The B15:E18=$E$18 condition actually returns an array of TRUE and FALSE values, or 0s and 1s, based on whether or not the values in the B15:E18 range equal cell $E$18, or 100. Thus, the array of values returned from testing B15:E18 with the value 100 is this:

{FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE}

Mathematically, this array actually looks like this:

{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1}

So now we have two arrays: one with the values in $E$18:$H$21 and one with the 0s and 1s from testing range B15:E18 with the value 100. The “*” in the formula multiplies the first array by the second array. The product of this is…you guessed it! Another array of 16 values.

{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 197}

The array above is what’s returned from $E$18:$H$21*(B15:E18=$E$18) when entered as an array formula. Why is the last value 197? Remember the first array $E$18:$H$21 consists of all the numbers in the original square. When we multiply all those numbers with the second array of 0s and 1s, the first 15 numbers in the resulting array are 0 (due to the FALSE values) and the last number 197 gets multiplied by 1 (the only TRUE value in the second array). The SUM() function simply adds up everything in the array, which results in the value 197. This is the correct number in the top left of the blue answer cells.
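If it helps to see the multiply-then-sum step outside of Excel, here is a minimal Python sketch of it. This is only an illustration, not how Excel evaluates the formula internally; the names `values`, `mask`, and `products` are made up for this example, and the lists simply copy the arrays shown above.

```python
# Values of $E$18:$H$21 read row by row, as listed above.
values = [100, 111, 112, 114, 118, 126, 135, 136,
          142, 147, 149, 151, 174, 186, 193, 197]

# Result of B15:E18=$E$18: only the last cell (E18 itself) equals 100.
# Python treats True as 1 and False as 0 when multiplying, much like Excel.
mask = [False] * 15 + [True]

# Multiply the two arrays element by element, then add everything up.
products = [v * m for v, m in zip(values, mask)]
total = sum(products)

print(products)
print(total)  # 197
```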

The real magic occurs when you copy this formula to all the cells in the blue answer cells. Let’s look at the formula in cell K19 of the answer cells:

SUM($E$18:$H$21*(C16:F19=$E$18))

You’ll notice that the only thing that changed about this array formula is the relative range C16:F19. All the other referenced cells are absolute references. As you copy the formulas over to the other cells in the answer table, the “dummy” range of cells begins to shift to help create this “flipped” axis of numbers. If we break down the two arrays that are multiplied together, you’ll see how this works. $E$18:$H$21 remains the same:

{100, 111, 112, 114, 118, 126, 135, 136, 142, 147, 149, 151, 174, 186, 193, 197}

When we test C16:F19 against $E$18 now, the C16:F19 dummy range starts to include some of the numbers in the original table of numbers. This effectively helps us “count” which position in the second array contains the TRUE value. The resulting array of C16:F19=$E$18 is:

{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0}

The 11th value in the above array is the TRUE value because that’s when the value in range C16:F19 equals 100. When we multiply this array of 0s and 1s with the original table of numbers, the resulting array is this:

{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 149, 0, 0, 0, 0, 0}

Of course, when we sum up the numbers in this array, the result is 149, which is the correct number in cell K19 of the blue answer cells.
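To see how the shifting dummy range fills in the whole answer table, the sketch below models the worksheet in Python. It is a rough simulation under a few assumptions: the worksheet is a dictionary keyed by (row, column number) with A=1, any cell not in the dictionary is empty, and the names `sheet` and `flipped_cell` are hypothetical helpers invented for this example.

```python
# The original square in E18:H21, read row by row.
values = [100, 111, 112, 114, 118, 126, 135, 136,
          142, 147, 149, 151, 174, 186, 193, 197]

# Hypothetical worksheet model: (row, column number) -> value, with E=5 ... H=8.
sheet = {}
for i, v in enumerate(values):
    sheet[(18 + i // 4, 5 + i % 4)] = v


def flipped_cell(row_shift, col_shift):
    """Mimic SUM($E$18:$H$21*(dummy=$E$18)) for one answer cell.

    The dummy range starts at B15 (row 15, column 2) and shifts by the
    same number of rows and columns as the answer cell shifts from J18.
    """
    target = sheet[(18, 5)]  # $E$18, which holds 100
    total = 0
    for r in range(4):
        for c in range(4):
            dummy = sheet.get((15 + row_shift + r, 2 + col_shift + c))
            if dummy == target:                   # TRUE counts as 1
                total += sheet[(18 + r, 5 + c)]   # matching cell of E18:H21
    return total


answer = [[flipped_cell(r, c) for c in range(4)] for r in range(4)]
for line in answer:
    print(line)
```

The first row printed starts with 197 (cell J18) and the second value of the second row is 149 (cell K19), matching the two cells worked through above.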

**Conclusion**

Two other formulas deserve honorable mentions. We won’t go into detail about how they work, but they show how you can get creative in Excel and solve problems with different functions. Both of these formulas clock in at **35 characters**:

INDEX(E18:H21,{4;3;2;1},{4,3,2,1})

OFFSET($E$18,21-ROW(),13-COLUMN())

The first formula is array-entered into all 16 cells in the blue answer cells simultaneously. The second formula is entered into the first cell of the answer table (Cell J18) and copied across and down to the rest of the cells in the range.
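For readers who want to check that both honorable mentions produce the same flipped table, here is a small Python sketch of the indexing logic. It assumes the answer table occupies J18:M21 (rows 18 to 21, columns 10 to 13 when A=1); the names `table`, `by_index`, and `by_offset` are made up for this illustration.

```python
values = [100, 111, 112, 114, 118, 126, 135, 136,
          142, 147, 149, 151, 174, 186, 193, 197]
table = [values[i:i + 4] for i in range(0, 16, 4)]  # the four rows of E18:H21

# INDEX(E18:H21,{4;3;2;1},{4,3,2,1}): take rows 4..1 and columns 4..1.
# INDEX is 1-based, so subtract 1 for Python lists.
by_index = [[table[r - 1][c - 1] for c in (4, 3, 2, 1)] for r in (4, 3, 2, 1)]

# OFFSET($E$18,21-ROW(),13-COLUMN()) evaluated in each answer cell.
# OFFSET counts rows and columns away from E18, so the offsets are 0-based.
by_offset = [[table[21 - row][13 - col] for col in range(10, 14)]
             for row in range(18, 22)]

print(by_index == by_offset)
```

Both approaches simply read the original square backwards, last row first and last column first.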

Were you able to come up with other solutions? How long did it take you to answer this question?

The post Excel Formula Question #2 from Modeloff 2013 Released with Explanation (2 of 4) appeared first on .

]]>The post Excel Formula Challenge – Question #1 from Modeloff 2013 Released (with explanation) appeared first on .

]]>The goal of the challenge is to write the most “efficient” Excel formula that solves the problem, i.e. the formula with the fewest characters. The teams were generally given 10 minutes or less to create their final formulas. The formula must be copied across all the cells in the answer range to solve the objective. No VBA, Defined Names, Helper Cells, or references to the Table of Answers or Check Sum cells may be used.

You have a bunch of empty cells as shown below with these start dates and end dates above them:

You also have these Start and End dates in yellow directly to the left of those blue outlined cells:

**The Objective:** Write a formula that counts the number of days between the Start and End dates in the above table that fall within the Start and End dates listed above the first table.

For instance, in the first cell in the blue-outlined table, the number **69** should be the output, like this:

Why is that first cell 69? The period in the blue-outlined table is January 1st to March 31st, 2013 (one quarter). The Start date in the yellow table is January 22nd, 2013, which is later than the period start date (January 1st, 2013). The End date from the yellow table, January 21st, 2014, is later than the period end date. Therefore, the only overlap is from January 22nd to March 31st, 2013 (69 days total, counting both the first and last day).

The correct number in the second column for the blue-outlined table is **91**. Why? The period start and end dates for this cell are April 1st, 2013 and June 30th, 2013. The Start and End dates in the yellow table are April 1st, 2013 and June 25, 2030. This 17-year date range covers the entire period from the blue-outlined table, so the resulting answer is the entire quarter, or 91 days. If there is no overlap, then the formula should return nothing in the blue-outlined cells.
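If you want to double-check the two day counts by hand, the short Python sketch below does the date arithmetic with the standard `datetime` module. It only verifies the numbers 69 and 91 from the examples above; the variable names are made up for this illustration.

```python
from datetime import date

# First cell: the overlap runs from January 22 to March 31, 2013.
# Subtracting two dates gives the gap between them, so add 1 to count both ends.
first_cell = (date(2013, 3, 31) - date(2013, 1, 22)).days + 1

# Second example: the whole quarter from April 1 to June 30, 2013.
second_cell = (date(2013, 6, 30) - date(2013, 4, 1)).days + 1

print(first_cell)   # 69
print(second_cell)  # 91
```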

The Table of Answers shown in the Excel file tells you if the output in the blue-outlined cells is indeed the correct calculation (Green = correct, Red = incorrect):

The Table of Answers quickly helps you figure out if the formula you are copying across and down in the blue-outlined table is correct. After creating the correct formula and copying it across all the cells in the range, your table should look like this:

Give it a shot! Download this file that represents the data discussed above and see if you can create the formula!

The very first thought that comes to my mind is to use a bunch of IF statements to solve the problem. While this brute force method will work, it’s not the most efficient solution since the challenge calls for writing the shortest formula.

The solution involves using the MAX() and MIN() functions in a creative way. The final formula only uses **38 characters** as reported by Dan:

MAX(,MIN(H$17,$F18)-MAX(H$16,$E18)+1)

This is the formula you enter in the first cell of the blue-outlined cells and then copy over to all the cells in the range.

Why does this work? Let’s first focus on the MIN(H$17,$F18) part of the formula. MIN() simply returns the smallest value you provide the function; in this case it returns H$17 or $F18, whichever is smaller. H$17 refers to March 31st, 2013 in the blue-outlined cells, and $F18 refers to January 21st, 2014 in the yellow cells. So we know that this MIN() formula will return **March 31st, 2013** (cell H$17).

For the MAX(H$16,$E18) function, MAX() does the opposite of MIN(), and returns the greater value of these two inputs. H$16 refers to January 1st, 2013, and $E18 refers to January 22nd, 2013. The greater of these two dates is, of course, **January 22nd, 2013**.

Now, let’s step back and look at what this does: MIN(H$17,$F18)-MAX(H$16,$E18)+1

We are taking March 31st, 2013 and subtracting from it January 22nd, 2013. The difference is 68 days, and adding 1 gives 69 days. The reason we add the 1 at the end is to make sure the result is inclusive of both the start and end dates.

If we simply copy this formula MIN(H$17,$F18)-MAX(H$16,$E18)+1 across the range of blue-outlined cells, we’ll get something like this:

This is obviously incorrect since there are negative numbers in parentheses, which don’t make sense if we are trying to find the overlap. These negative numbers should in fact be empty cells since the dates from the yellow table do not overlap with the period dates from the blue-outlined table.

Here is where the MAX() function from the outside of the formula we found comes into play: MAX(,MIN(H$17,$F18)-MAX(H$16,$E18)+1)

We are asking Excel to give us the greater of MIN(H$17,$F18)-MAX(H$16,$E18)+1 or 0. If there is indeed an overlap, then we will always get a positive number from the MIN(H$17,$F18)-MAX(H$16,$E18)+1 part of the formula. When that part returns a negative number, the outside MAX() function returns 0 instead, which is the 0 or the blank we are looking for in the answer set.

Interestingly, the formula could have been written like this: MAX(0,MIN(H$17,$F18)-MAX(H$16,$E18)+1).

You see that extra 0 (zero)? That can simply be deleted and Excel automatically knows to compare against the number 0 as stated in the final solution. This also allows us to save one character from the formula!
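The same logic translates almost word for word into other languages. Below is a minimal Python sketch of the formula, assuming the dates are `datetime.date` values; the function name `overlap_days` and the non-overlapping sample dates are hypothetical and not part of the challenge file. Unlike Excel, Python’s `max()` needs the 0 written out explicitly.

```python
from datetime import date


def overlap_days(period_start, period_end, start, end):
    """Days shared by [period_start, period_end] and [start, end], inclusive.

    Mirrors MAX(0, MIN(period_end, end) - MAX(period_start, start) + 1).
    """
    span = (min(period_end, end) - max(period_start, start)).days + 1
    return max(0, span)  # a negative span means no overlap at all


# First cell of the challenge: Q1 2013 against Jan 22, 2013 to Jan 21, 2014.
print(overlap_days(date(2013, 1, 1), date(2013, 3, 31),
                   date(2013, 1, 22), date(2014, 1, 21)))  # 69

# Made-up example with no overlap: the dates start after the quarter ends.
print(overlap_days(date(2013, 1, 1), date(2013, 3, 31),
                   date(2013, 7, 1), date(2013, 12, 31)))  # 0
```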

This is the first question the teams answered and most teams finished it in less than 10 minutes. How long did it take for you? Were you able to come up with a shorter formula?


]]>