The Greatest Formula in the World, Part 3

 
Our time machine that we developed in part one and part two still needs some explanation, so let’s dive in.

First Things First:  The MAXX Wasn’t Needed

Looking back, I noticed that I was overcomplicating things needlessly.  The final formula I had involved the following FILTER() term, in which I used a MAXX:

FILTER(ALL(Periods),
       Periods[NextYear Period]<=
          MAXX(VALUES(Sales[Period Num]), Sales[Period Num])
)

Turns out that MAX() works just fine, so let’s replace that clause and simplify things a bit.  Here’s the new formula:

CALCULATE([Total Sales],
   ALL(Periods),
   FILTER(ALL(Periods),
      Periods[Year]=MAX(Periods[Year])-1
   ),
   FILTER(ALL(Periods),
          Periods[NextYear Period]<=MAX(Sales[Period Num])
   ),       
   VALUES(Periods[MerchPeriod])
)

OK, with that, we can move on to explanation:  How does this formula work??

I used to call this technique “expand then filter”

Well actually I still do, in my own head.  It’s just that the GFITW is a catchier title.

OK, so the “expand” part is just that first ALL():

CALCULATE([Total Sales],
          ALL(Periods),

The first step in the formula, then, is basically just telling the calc engine to throw away all filters on the Periods table.  In other words, “forget all concept of time, pretend the pivot is not filtered at all with respect to time.”

We do this so that we have a clean slate.  Then, in the subsequent FILTER clauses, we build up a new filter context for time.

One more time for clarity:  the way this formula works is to first throw out all time filters, and then in subsequent steps, we build up new filters to match the time period that we want, which in this case is last year.

Once you understand that, this overall formula starts to get pretty simple.  Each piece of the formula is quite straightforward in its own right.  ALL() is pretty straightforward for sure, and so are the subsequent FILTERS().

OK, we’ve expanded.  Now on to the filters!

The first filter says, “hey, now that we’ve thrown out all time filters, let’s filter time back down to just be last year.”

FILTER(ALL(Periods),
       Periods[Year]=MAX(Periods[Year])-1
)

Let’s talk about the FILTER() function itself for a moment.

How does FILTER() Work?

Honestly this function has deserved its own post for a long time.  I’ll give a brief explanation here.

The syntax for the FILTER function is FILTER(TableToFilter, FilterExpression).   Pretty simple.  Here’s some more detail:

  1. FILTER() takes a TableToFilter and a FilterExpression, and returns all rows from that TableToFilter that match the FilterExpression.
    1. In the example above, TableToFilter is ALL(Periods)
    2. and FilterExpression is Periods[Year]=MAX(Periods[Year])-1
  2. FILTER() steps through the TableToFilter one row at a time. 
    1. And for each row, it evaluates the FilterExpression.  If the expression evaluates to true, the row is “kept.”  If not, it is filtered out.
    2. Because FILTER() goes one row at a time, it can be quite slow if you use it against a large table.  When I say “large” that is of course subjective.  A few thousand rows is fine in my experience.  A million is not.  Do not use FILTER() against your fact table.
  3. The FilterExpression typically takes the form of Table[Column] = <expression>
    1. The comparison operator doesn’t have to be “=”.  It can also be <, >, <=, >=, <>
    2. The expression on the right hand side of FilterExpression can be “rich.”  This is VERY useful.  In a simple CALCULATE, the right side of each filter expression has to be simple, like a literal number (9) or a string (“Standard”).  The fact that FILTER() allows for rich expressions here is one of the most common reasons I use FILTER().
    3. The Table[Column] in the filter expression is a column in the TableToFilter.  If you are filtering the Periods table, it makes sense that you are testing some property of each row in Periods.  I can’t think of a sensible reason to use a column here that is NOT from TableToFilter.  (Insert “boot signal” here, maybe the Italians can address this).
  4. FILTER() ignores everything else going on in your formula and acts completely on its own.
    1. For example, our overall formula sets ALL(Periods) as the first argument to CALCULATE.
    2. The FILTER()’s that come after that do NOT pay any attention to other arguments however, including that ALL(Periods).
    3. In other words, the FILTER() functions are still operating against the original filter context from the pivot!  If the pivot is sliced to Year=2009, then the FILTER() function starts with the Periods table already pre-filtered to just 2009.
    4. This is why each of my FILTER()’s uses ALL(Periods) for TableToFilter.  I have to repeat the “expand” step so that my FILTER() is also working from a clean slate.
  5. Even though each FILTER() operates on its own, their results then “stack up” in the overall formula.
    1. Even though FILTER() RETURNS a set of rows that matched the FilterExpression, it actually REMOVES rows from the overall filter context.
    2. This sounds tricky but really it isn’t.
    3. Let’s say our TableToFilter contains 6 rows:  A, B, C, D, E, and F.
    4. And our overall formula contains two FILTER() clauses that both operate on the same TableToFilter, just like our overall formula near the beginning of this post.
    5. Let’s also say that the first FILTER() returns rows A, B, C, and D.
    6. And the second FILTER() returns rows C, D, E, and F.
    7. The net result is that only rows C and D are left “alive” in the overall filter context of the formula.
    8. So one way to think of this is that FILTER()s “stack up” on top of each other.
    9. Another way to think of it is that even though the first filter RETURNED rows A, B, C, and D, its real effect was to REMOVE all other rows (E and F) from consideration.

OK, back to that first filter!

Here it is again:

FILTER(ALL(Periods),
       Periods[Year]=MAX(Periods[Year])-1
)

Let’s revisit points 1-5 above for this FILTER expression to see how it all works.  And let’s examine just a single cell of the pivot to see how this FILTER operates for that one cell:

image

Focusing on The Orange-Circled LASTYRSALES Cell As An Example
With Its Period Filter Context Highlighted in Green

In that picture above, the orange cell we are looking at has a filter context “coming in” from the pivot.  It has Period[Year] set to 2011 and Period[MerchPeriod] set to 1, as highlighted in green.

Given the detailed description of FILTER() from points 1-5 above, we can see that:

  1. We set ALL(Periods) as our TableToFilter so that we are starting from a clean slate with respect to time.  So our Periods table now has “all rows alive.”
  2. Then our FilterExpression tests against the Periods[Year] column.
  3. MAX(Periods[Year]) – 1 still operates independently!, so it still picks up Periods[Year]=2011 from the pivot.  Therefore it returns 2011 – 1 = 2010!
  4. Since we started with ALL(Periods) as the TableToFilter, and the FilterExpression only “keeps” rows where Year=2010, we are left with all 2010 rows “alive” after evaluating this FILTER().
  5. If we didn’t do ALL(Periods) for TableToFilter, and instead just used Periods without the ALL(), our FILTER would start out with only rows from 2011 (since that is what the pivot is telling us).
    1. And then in the next step when we go back a year to 2010, FILTER() would find no rows.  There are no rows that match Periods[Year] = 2010 and Periods[Year]=2011. 
    2. So our FILTER would return no rows, which means it would have the effect of REMOVING all rows from Periods in the overall formula, and our measure would return blank for all cells in the pivot.

That’s a lot of explanation, I know.  Walk through it a few times.  It’s actually pretty intuitive once you’ve done it a few times.  The tricky part, for me, was discovering all of these details for myself.  And since I’ve done all of that, you don’t have to.

That’s enough for this time.  I think you can probably figure out how the second FILTER() evaluates based on the above, but I will step through it next time. 

I will also explain why we use that last VALUES() in the formula, and probably also share some of the answers I got from the Italians, and from David Churchward, in response to my question “did I need to add that calc column in the Periods table?”

8 Responses to The Greatest Formula in the World, Part 3

  1. Carsten Bieker says:

    Is the first “All(Periods)” really necessary?

    “Filter(all(periods)” returns the Periods table minus some rows, but nevertheless with all columns, so for each column there is a new filter, overriding any other filters that may exists in the initial Pivot context?

    • powerpivotpro says:

      FILTER() only subtracts rows from the filter context, it can never add rows back. For that reason, the first ALL(Periods) is necessary.

      • Carsten Bieker says:

        Filter(Dates;…) would further reduce the current context, but filter(all(Dates);..) can increase the filter context).

        If you have a Picot cell that is filtered by date, than the Dates table would be filtered to a single row like 2013-03-28, March, 2013, Thursday.., depending on the fields of your date table. Filter(Dates;) could not add rows to that, at best it could Keep the existing row.

        Now (Filter(all(Dates);Month = March) would run through the whole date table again and return all rows where the Month is March. So the resulting table would consist of several rows. each row consisting of a value for Date, for, Month, for year. You get a value for each field of the date table, so any filter in the Initial filter context taken from the date table will be overriden by this “Filter table”.

        Try a calculate([Total Sales];All(Dates)) and compare the result with calculate([Total Sales];Filter(all(Dates);1). Both return the same result.

      • Carsten Bieker says:

        Or, a slightly shorter version.
        All(Dates) returns all rows of the Dates tables
        Filter(All(Dates) returns a subset of the rows of the Dates table.
        Both combined will aways return only the rows that are in the subset returned by filter, so the All(Dates) does not have an influence on the filter context on which the caculate is based.
        The whole clear and refilter-bit is already happening inside Filter(), I think. Well, at least it works this way on the model I am working on right now.

  2. Carsten Bieker says:

    Maybe one last word, then I’ll go away. :)

    Filter(all(Dates[Year];….) needs an outside All(dates); as it only returns one column, any filter in the initial filter context on another column still “survives” and influences the result.

    Filter(All(Dates)) returns all columns of the Dates table minus some rows and uses them to flter the dates table, so the whole query context /initial filter context overriden.

  3. Claire says:

    Hi, just discovering PowerPivot and went directy to the GFITW :-) Using the given sample file to replicate the formulas in my project. It appears though that it’s NOT working with the combined selection: calculation “Current” & Comparison “Last Year” when, in a given peeriod|month, you have several weeks within a period. It returns the sales of the last week of the period but not the sum of the 4 weeks actually… any thoughts on how to solve that that? Thanks in advance for your help

  4. Claire says:

    Hello, well, I found the solution :-) :-) but entering the blog post earlier, i was really getting desperate… I replaced the Last Year formula by copying the Quarter To Date Last Year and replacing “Quarter” by Period so it does a “Period To Date” calculation. By doing so, the last week of the period (which is selected by defalt) contains the right sales number… It may hoepfully help other “newbies” for whome this subtility is not integrated yet. Cheers

  5. Matt says:

    Trying to do something similar to this but with no success, so don’t know if anyone can suggest a solution – my problem is that I want to show the previous years data – but only from the years that are selected in the filter.
    For example if 2011 and 2010 are selected then alongside 2011 data I want to show the 2010 sales figures as a comparison (which I can do from the above).
    If the end user filters for 2012 and 2010 then I want the 2010 sales data to be shown alongside 2012′s
    I don’t want to hard code in a two year lag (or anything similar), as I want the user to be able to select any set of years that they want.
    I’ve tried different versions of the following calculation to determine what the comparable year would be
    =CALCULATE(MAX(Calendar[Year]),FILTER(ALL(Calendar),Calendar[Year]<MAX(Calendar[Year]))), but the problem being that to get the values across all years I need to use ALL – but then that ignores the filters that have been applied to the year, so it always ends up returning the previous year minus one value – rather than the previous filtered year (i.e. in the case of 2012 and 2010 selected it returns 2011 rather than 2010 as the previous year).
    Any ideas?

Leave a Reply