Becoming one with CALCULATE()

Guest Post by Scott Senkeresty

Intro from Rob

Hey, it starts out simple and powerful:  CALCULATE is the SUMIF you always wished you’d had.  It works in pivots.  It’s the “anything IF.”  It’s amazing, really, how many doors it opens.

Of course, CALCULATE is designed to be powerful in ways we can’t even IMAGINE in our first day/week/month of using it.  You can spend years discovering all the things it can do – and that’s a good thing!  But sooner or later you’re going to hit something with CALCULATE that makes you scratch your head – why is it returning THOSE results?

I myself entered this twilight zone with the Precedence Project – a series of posts that I quickly abandoned.  It turns out that, practically speaking, you don’t need to achieve deep theoretical understanding of this stuff in order to achieve great results.

Below, however, Scott does a great job of resolving those mysteries.  And he does so by “channeling” two old friends who live at the base of the Alps.  Take it away, Scott…

Going to “Graduate School”

image

All right, so I’ve read Rob’s book a few times.  (Heck, I am credited as tech editor on it.)  I’ve devoured PowerPivotPro University.  So now what, I ask Rob?

“Go forth and conquer – data is your ocean,” is his answer.  He’s a practical sort of guy.  Me, though?  I’m never satisfied until I’ve completely torn the machine apart.

So, as I hinted in my last post, I went to graduate school and spent a few intense days engrossed in Marco and Alberto’s book.

 

I had to read and re-read the chapter on CALCULATE() three complete times, and for some paragraphs, probably closer to a dozen times.  But then it happened.  I became one with CALCULATE.

So, in the post, I am not really giving new info.  Rob does a great job of the material in his book, again from that practical perspective.  The Italians obviously do a great job.  There are blog posts from Rob and others that discuss it as well.   But this stuff is so fundamental to your understanding of DAX, I think it is worth retelling again.  I know for me, I had to read different material again and again before I was one with the Matrix.   So maybe my retelling will help you too.

The Boolean Case

CALCULATE() is simply the way we execute an expression in a modified filter context.  The nerdy looking syntax:  CALCULATE(<expression>,<filter1>,<filter2>…)Avg Sales By Day

Let’s jump into an example.  You already have this measure against your sales data.

[Avg Sales] := SUM(Sales[ExtendedAmount]) / COUNTROWS(Calendar)

We drop it on a pivot table broken out by “IsWeekend” and “DayOfWeek” and get something like what you see on the right.

We see that Saturday has our highest average sales, and decide we want to start building out a new measure… say [% of Saturday].  As a step in that direction, we pull out CALCULATE() and get to work!

[Sat Avg Sales] := CALCULATE([Avg Sales], Calendar[ShortWeekdayName] = “Sat”)

Before we look at the results, let’s review the “rules of calculate”.

  1. If a filter parameter is against an existing filter context, it replaces the existing filter.
  2. Otherwise, a new filter is added to the filter context.

Since our pivot table is split out by ShortWeekdayName (that is “on rows”), our filter parameter is just going to replace that filter with ShortWeekdayName = “Sat”.  Right?

imageWell, Yes!   That is exactly what happened!   What we didn’t mention is that IsWeekday is also in the filter context, and we made no effort to replace that filter… so while the expected value of $26,319 shows up in both rows where it’s the weekend… we just get blanks for weekdays.

Well, that’s easy to fix…

=CALCULATE([Avg Sales], Calendar[ShortWeekdayName] = “Sat”, ALL(Calendar))

We simply add a second filter parameter to CALCULATE (you can pass as many as you like), this time replacing the filter context on Calendar.   We could have just as well passed ALL(Calendar[IsWknd]) for the same impact on this pivot table, but it would breakimage again once somebody drops Year/Quarter/etc on a row, column or slicer… so, we just strip away the filter on the whole darn table.  We are crazy like that.

Note this is a different “style” of filter parameter to calculate.  We are not passing a boolean expression, we passed it a table.  That means I am typing in the wrong section and better move along…

The Table Case

Recall there are restrictions when using boolean parameters to CALCULATE().  For example, you can not reference more than one column (as the engine would get confused on which filters to remove).  These restrictions (which allow for great performance benefits) are removed by passing a table as the parameter.   When you pass a table, you are explicitly telling the engine “Hey DAX engine for this filter, here are all the rows I want you to use.  No more, no less”.

We already saw ALL() used above, where we said “Hey DAX, clear off any filters from the Calendar table”.  Another common usage is passing the results of FILTER().  What I want to do is compare the same logic we used above (in the boolean case)… but do it in the table case via FILTER().

=CALCULATE([Avg Sales], FILTER(Calendar, Calendar[ShortWeekdayName] = “Sat”))image

Well, that’s… interesting.   We have the same problem with IsWeekend, but now we have a new problem… there isn’t even a value for Sunday!?

Here is the thing.  You see that Calendar parameter to FILTER?  It is still under the influence of the original filter context.  When we are evaluating Sunday the filter context is { IsWeekend=1, ShortWeekdayName=”Sun” }.  From that set of rows, which of them will evaluate  Calendar[ShortWeekdayName] = “Sat”  to true?   None of them!  So, you get a blank for that Sunday row.

Okay… what if I add another filter parameter to calculate?

=CALCULATE([Avg Sales], ALL(Calendar), FILTER(Calendar, Calendar[ShortWeekdayName] = “Sat”))

Will that help us?  It certainly “reads nice”, but no.  The result would look identical.

We have created a new filter context (with ALL) that strips filters off the Calendar table, but we still have the other filter where ShortWeekdayName must be Saturday.  Both filters are applied.  That is how CALCULATE works… every filter parameter is added to the filter context and basically AND’ed together.  They don’t interact in any way.

If you do something silly like pass two filters, one where WeekdayName is Sat, and one is WeekdayName is Sun… you simply get no rows, because no row is both Sat and Sun.

The solution here is to move that ALL() to surround the first parameter to FILTER().

[Filter All Sat Avg Sales] := CALCULATE([Avg Sales], FILTER(ALL(Calendar), Calendar[ShortWeekdayName] = “Sat”))

Then we happily get rows everywhere.  Let’s step through this a bit. image

  1. CALCULATE() is going to execute our expression [Avg Sales] for each row.  On each execution the filter context will be different (each of the combinations of IsWeekend and ShortWeekdayName).
  2. The filter context from 1, flows into each of the filter parameters passed to CALCULATE().
  3. Each of the filter parameters to CALCULATE() then modifies the filter context in the same way as the boolean case.  If the current filter context already has a filter on the column, it replaces it, otherwise it adds a new filter.
  4. In our case, FILTER() accepts the filtered (say Weekend=Sun, Weekend=1) then nukes it by calling ALL(), then applies the new filter [ShortWeekdayName] = “Sat”… and as a result returns this new table back to CALCULATE().  “Hey DAX, for this filter, here are the rows I want you to use.  No more.  No less”.

Note that instead of FILTER(ALL(Calendar)… we could have passed FILTER(ALL(Calendar[ShortWeekdayName]) … and the results would be the same as the 2nd pivot table in this post.  We would not have cleared the filter on IsWeekend, so Mon-Fri would be blank rows.

Conclusion

NeoFor me, this all clicked when I formed the mental model that for each filter parameter to CALCULATE(), a filter context flows in, and then flows out.  In my head there are little arrows flowing into the parameter, then flowing back out.   That mental image aids my understanding that my parameters (frequently a call to FILTER()) are always impacted by the existing filter context, but then my parameter can add/clear/change however it wishes before returning the filter context to CALCULATE().

If you keep that in mind, along with the understanding that all filter params to CALCULATE() will be AND’ed togther… you too will be one with CALCULATE()!

19 Responses to Becoming one with CALCULATE()

  1. Eric Hutton says:

    Is there any performance difference between

    =CALCULATE([Avg Sales], Calendar[ShortWeekdayName] = “Sat”, ALL(Calendar))

    and

    = CALCULATE([Avg Sales], FILTER(ALL(Calendar), Calendar[ShortWeekdayName] = “Sat”))?

    In other words, is there any compelling reason to select one over the other, or is it just a matter of style?

    • Scott Senkeresty says:

      I do believe that functionally, you will get the same results. In general, you should always prefer the boolean (true/false) style of the first one for performance reasons.

      • Marco Russo says:

        Scott is right. Writing
        =CALCULATE([Avg Sales], Calendar[ShortWeekdayName] = “Sat”, ALL(Calendar))

        is like

        =CALCULATE([Avg Sales], FILTER ( ALL ( Calendar[ShortWeekdayName] ), Calendar[ShortWeekdayName] = “Sat” ), ALL(Calendar))

        In general, every time you write a filter argument in CALCULATE such as:

        table[column] =

        you are in reality writing:

        FILTER ( ALL ( table[column] ), table[column] = )

        The performance depends on the cardinality (number of distinct values) of table[column]. Usually a column has a lower number of distinct values than the number of rows of the entire table, and for this reason it’s faster.

        • Oxenskiold says:

          As far as I know, when you use All() as a top-level function argument to CALCULATE ALL() doesn’t really return a table. It merely removes all filters from that table that existed before the first filter argument of Calculate is processed.

          A top-level function argument is an argument that isn’t nested inside another function like FILTER().
          So in essence CALCULATE( … , ALL(Calender), …) doesn’t use the entire Calendar table to filter the final calculation dataset, it just make sure that all of the filters set on the calendar table is removed. (a filter might get added back in again by another filter argument of course as is the case here)

          Jeffrey Wang writes shortly about this in his epic blog post:
          http://mdxdax.blogspot.dk/2011/03/logic-behind-magic-of-dax-cross-table.html

          For this reason CALCULATE([Avg Sales], Calendar[ShortWeekdayName] = “Sat”, ALL(Calendar)) ought to be the fastest alternative.

          That is because:
          1.
          Calendar[ShortWeekdayName] = “Sat” is transformed into FILTER(ALL(Calendar[ShortWeekdayName]) , Calendar[ShortWeekdayName] = “Sat”) and the ALL(..) here can get the distinct values of Calendar[ShortWeekdayName] from a column index instead of traversing the entire Calendar table. Here the ALL(..) is used with a table field argument instead of the entire table. It’s not a top-level argument since it’s inside the FILTER() function however, but can make use of a column index. The result of the FILTER() will be a one-row-one-column filter table with “Sat” as its value.
          2.
          The All(calendar) argument is a top-level argument to the CALCULATE in question and it doesn’t traverse the Calendar table nor is the Calendar table inserted as a filter table as explained above.
          3.
          The alternative CALCULATE expression: CALCULATE([Avg Sales], FILTER(ALL(Calendar), Calendar[ShortWeekdayName] = “Sat”)) would have to traverse the entire calendar table because ALL() is now not a top-level argument to CALCULATE but is nested inside the FILTER() function AND it uses the entire calendar table as the argument. It would have to examine every row to find those with Calendar[ShortWeekdayName] = “Sat”.

          —-
          Admittedly I don’t know if the rule from Jeffrey Wang’s blog post still applies nor whether the implementation of DAX make efficiently use of column indices. However should that be the case cardinality shouldn’t really matter in this case. (It would be nice to have this confirmed by the people having the right tools to do so though)

          • Marco Russo says:

            At point (3) you say that this expression has to iterate the entire Calendar table:
            CALCULATE([Avg Sales], FILTER(ALL(Calendar), Calendar[ShortWeekdayName] = “Sat”))
            In reality, internal optimizations might optimize such iteration, so even if the FILTER returns the rows from Calendar, the time required to obtain such a result does not require the test on ShortWeekdayName to be performend on each row of Calendar.
            But this is not a general rule and a more complex logical condition will require complete iteration.

  2. Mike Girvin says:

    When you say “Filter Context and Arrows in, then Arrows Out” is it equally okay to just think of it as “AND Criteria” where all conditions must be TRUE?

    • Scott Senkeresty says:

      Kind of 2 different issues. The arrows in/out is to help me remember that for each filter param, the current/outer filter context is applied before my param does it’s magic… then my updated filter context flows out (and into the big AND). I hope to work on a visualization/graphic for this today to help…

  3. Chris Gilbert says:

    Note: Just in case you try to copy formulas from the blog (to save some time), if there are double-quotes in the formula, you’ll need to (I had to) replace the copied double-quotation characters with ones you type in from your keyboard.

    Could be a country-code setting, but it always seems to catch me off-guard.

  4. sam says:

    @Scot
    Could you kindly explain how the filters work in the below measures
    The Data is the Adventure Works Database
    The Task is to find Sales from thoose customers who visited 2 or more times

    1) [mSales] = SUM(Sales[ExtendedAmount])
    2) [mOrders] = DISTINCTCOUNT(Sales[SalesOrderNumber])
    3) [mSalesCustRepeatMorethan2] =CALCULATE([mSales],FILTER(Customers,[mOrders]>=2))

    In the 3rd Measure are we not using a Measure defined on the Fact table to filter a LookUp Table -i.e are the filters not moving uphill.

    I asked this to question to Rob as well – but could not get response

    However the measure below also works and I do understand how

    mSalesCustRepeatMorethan2] =CALCULATE([mSales],FILTER(VALUES(Sales[CustomerKey]),[mOrders]>=2))

    • Marco Russo says:

      When you apply a filter on a table (or on a column of a table) having one-to-many relationships with other tables, such a filter propagates to the other tables (those on the “many” side). Thus, if you apply a filter to Customers table, you are also filtering the Sales table considering only rows belonging to the filtered customers.

      • sam says:

        @Marco
        My point is if you say Filter(Customer, [Some Column or Measure in Customer Table] [Some operator] [Some Value]) and this results in less number of rows left standing in the customers Table and therefore filters the Sales Table – This is Fine – The filters flow down hill

        But in my example above – [mOrders] is a measure defined on the Sales Table
        How come it is being used to Filter the Customers Table – which results in less number of Customer ID’s and this then flows back to the Sales Table !!!
        So is the filter context not travelling up (from Sales to Customer) and then Travelling down (Customer to Sales) !!! – Like a Circular Reference !!!

        Rob convinced us that Filters only flow Down Hill :-)

        • Marco Russo says:

          When you write

          FILTER(Customers,[mOrders]>=2)

          you are saying “iterate the customers table and, for each Customer, check that the result of mOrders measure is greater than or equal to 2″
          If you expand the measure, you have:

          FILTER(Customers, CALCULATE ( SUM ( Sales[ExtendedAmount] ) ) >=2)

          which means “iterate the customers table and, for each Customer, check that the sum of the ExtendedAmount value in Sales table for the rows of the customer is greater than or equal to 2″

          The presence of CALCULATE (which is automatically involved when you have a reference to a measure, such as [mOrders] ) invoke what we call a “context transition”. The current row in Customer becomes a filter context, which propagates to the Sales table, filtering only the rows in Sales that correspond to the “current” Customer. This operation is repeated for all the rows of the Customers table iterated by the FILTER statement.

          Technically, you are not propagating the row context back and forth between Customers and Sales. The FILTER loop defines a new filter context for each Customer thanks to the implicit CALCULATE function.
          You do a propagation of the filter context when you apply the many-to-many pattern (see several examples of this syntax here: http://www.sqlbi.com/articles/optimize-many-to-many-calculation-in-dax-with-summarize-and-cross-table-filtering)

          • sam says:

            @Marco
            “iterate the customers table and, for each Customer, check that the result of mOrders measure is greater than or equal to 2″

            Thanks – That is a “Eureka” moment for me for calculate…Everything is now crystal clear

  5. sam says:

    Sorry the last couple of line should read

    “So is the filter context IS NOW travelling up (from Sales to Customer) and then Travelling down (Customer to Sales) !!! – Like a Circular Reference !!!”

  6. Keith Barney says:

    Any chance of getting a copy of the workbook with this example? Thanks.

  7. Clara says:

    Hello. Working with calculate, is it possible to reference a measure in the filter arguments so it changes when the measre results changes ? or do you have to use a filter function for that.?

Leave a Comment or Question