SUMX() – The 5-point palm, exploding fxn technique

 
“AGAIN!”

-Pai Mei

 

 

 

SUMX() – the great iterator

Have you ever written an array formula in Excel?  (Don’t worry, most people haven’t).  Have you ever written a FOR loop in a programming language?  (Again, don’t worry, there’s another question coming).  Have you ever repeated something over and over again, slowly building up to a final result?

That’s what SUMX() does.  It loops through a list, performs a calc at each step, and then adds up the results of each step.  That’s a pretty simple explanation, but it produces some results in pivots that are nothing short of spectacular.
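If you think better in loops than in formulas, here is a rough Python sketch of that description. It is only a mental model, not how the engine actually works, and the `sumx` helper and the three sample rows are made up for illustration.

```python
def sumx(table, expression):
    """Evaluate `expression` once per row of `table` and add up the results."""
    total = 0
    for row in table:              # loop through the list...
        total += expression(row)   # ...perform a calc at each step, keep a running sum
    return total


# Hypothetical three-row table, just to exercise the loop.
rows = [{"Qty": 2}, {"Qty": 3}, {"Qty": 5}]

result = sumx(rows, lambda row: row["Qty"])
print(result)  # 10
```

The second argument is a function rather than a value because it has to be re-evaluated at every step, which is the whole point of an iterator.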

Anatomy of the function

     SUMX(<Table>, <Expression>)

THE BRIDE: “What, pray tell, is a five-point palm, exploding function technique?”
BILL: “Quite simply, the deadliest blow in all of the analytical martial arts.”
THE BRIDE: “Did he teach you that?”
BILL: “No. He teaches no one the five-point palm, exploding function technique.”

That’s kinda how I feel about the description of SUMX in the Beta release:  “Returns the sum of an expression evaluated for each row in a table.”  It merely hints at the power within.

Oddly, the best way to show you what I mean is to start with some useless examples and then build up to useful ones.  For all examples, I will use the following simple table, Table1:

[Image: sample table for SUMX, Table1]

Useless Example #1:  By the whole table

     SUMX(Table1, Table1[Qty])

Returns:  35, which is the total of the Qty column.  Might as well just use SUM([Qty]).

Why:  Well, it iterates over every row in Table1, and adds up [Qty] at each step, just like the description says it would.

Useless Example #2:  By a single column

     SUMX(Table1[Product], Table1[Qty])

Returns:  An Error

Why:  Table1[Product] is not a Table, it’s a Column.  And SUMX demands a Table as the first param.

Useless Example #3:  By distinct values of a column, sum another

OK, I’ll wrap the [Product] column in DISTINCT(), since that returns a single-column table:

     SUMX(DISTINCT(Table1[Product]), Table1[Qty])

Returns:  An Error

Why:  [Qty] is not a column in the single-column table DISTINCT([Product]).  Only [Product] is.  Why did I even try this?

That’s where I gave up awhile back.  Until I learned…

Almost-Useful Example:  The Second Param Can Be a Measure!

And even better, that measure CAN access other columns even if you use DISTINCT.  First let’s define a [Sum of Qty] measure:

     [Sum of Qty] = SUM(Table1[Qty])

And then re-try the previous example with the measure, not the column:

     SUMX(DISTINCT(Table1[Product]), [Sum of Qty])

Returns:  35.  Yes, the total, again.  But this time, the “Why” is worth paying attention to.

Why:  Let’s step through it.  Remember, for each value of the first param, SUMX evaluates the expression in the second param, and then adds that to its running total.
Distinct Products

Step One:  SUMX evaluates DISTINCT(Table1[Product]), which yields a single-column table of the unique values in [Product]:

 

Step Two:  SUMX then filters Table1 (not just the [Product] column!) to the first value in its single-column list, [Product] = Apples.

Table1 Filtered to Apples by SUMX
Then it evaluates the  [Sum of Qty] measure against that table, which returns 17.

Steps Three and Four:  The process repeats for Oranges and Pears, which return 13 and 5:

Table1 Filtered to Oranges by SUMX

Table1 Filtered to Pears by SUMX
Last Step:  SUMX then adds the three results it obtained:  17, 13, and 5, which yields 35.

A lot of work to get the same result that the [Sum of Qty] measure can get on its own, but now that you know how it operates, let’s do something else.
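For the loop-minded, the steps above can be mimicked in a few lines of Python. Treat this as a sketch under assumptions: the rows in `TABLE1` are invented so that they match the totals quoted in the walkthrough (17, 13, 5, and 35), and `sum_of_qty` and `sumx_distinct` are hypothetical stand-ins for the measure and for SUMX over DISTINCT, not real DAX.

```python
# Hypothetical stand-in for Table1; only the per-product totals come from the post.
TABLE1 = [
    {"Store": "Store A", "Product": "Apples",  "Qty": 5},
    {"Store": "Store A", "Product": "Apples",  "Qty": 4},
    {"Store": "Store B", "Product": "Apples",  "Qty": 8},
    {"Store": "Store A", "Product": "Oranges", "Qty": 6},
    {"Store": "Store B", "Product": "Oranges", "Qty": 3},
    {"Store": "Store B", "Product": "Oranges", "Qty": 4},
    {"Store": "Store A", "Product": "Pears",   "Qty": 2},
    {"Store": "Store A", "Product": "Pears",   "Qty": 3},
]


def sum_of_qty(rows):
    """Plays the role of the [Sum of Qty] measure: total Qty of the rows it can see."""
    return sum(r["Qty"] for r in rows)


def sumx_distinct(table, column, measure):
    """Model of SUMX(DISTINCT(table[column]), measure)."""
    # Step One: the distinct values of the column, in first-seen order.
    distinct_values = []
    for r in table:
        if r[column] not in distinct_values:
            distinct_values.append(r[column])

    # Steps Two to Four: filter the WHOLE table to each value, then run the measure.
    steps = {}
    for value in distinct_values:
        filtered = [r for r in table if r[column] == value]
        steps[value] = measure(filtered)

    # Last Step: add up the per-step results.
    return steps, sum(steps.values())


steps, total = sumx_distinct(TABLE1, "Product", sum_of_qty)
print(steps)  # {'Apples': 17, 'Oranges': 13, 'Pears': 5}
print(total)  # 35
```

The detail worth noticing is that the measure receives full rows, not just the [Product] value, which is why it can still reach [Qty].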

And now, the Useful Example!

Let’s define another measure, which is the count of unique stores:

     [Count of Stores] = COUNTROWS(DISTINCT(Table1[Store]))

For the overall Table1, that returns 2, because there are only 2 unique stores.

Let’s then use that measure as the second param:

     SUMX(DISTINCT(Table1[Product]), [Count of Stores])

Distinct Products

Step One:  same as previous example, get the one-column result from DISTINCT:

 

Step Two:  filter to Apples, as above:

Table1 Filtered to Apples by SUMX

…and the [Count of Stores] measure evaluates to 2 – 2 unique stores have sold Apples.

Step Three:  Oranges

Table1 Filtered to Oranges by SUMX

…again, the measure evaluates to 2.  2 unique stores sold Oranges.

Step Four:  Pears

Table1 Filtered to Pears by SUMX

…hey look, only one unique store sold Pears.  So the measure evaluates to 1 here.

Last Step:  Add them all up.  2 + 2 + 1 = 5.  SUMX returns 5.  This basically means that there are 5 unique combinations of stores and products that they sell.
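Here is the same walkthrough as a small Python sketch. The `SALES` rows are an assumption: they are invented to be consistent with the numbers above (2 stores overall; 2, 2, and 1 stores per product), and `count_of_stores` is a hypothetical stand-in for the [Count of Stores] measure.

```python
# Hypothetical (Store, Product) rows, chosen only to match the counts in the post.
SALES = [
    ("Store A", "Apples"),
    ("Store A", "Apples"),
    ("Store B", "Apples"),
    ("Store A", "Oranges"),
    ("Store B", "Oranges"),
    ("Store B", "Oranges"),
    ("Store A", "Pears"),
    ("Store A", "Pears"),
]


def count_of_stores(rows):
    """Plays the role of [Count of Stores]: number of distinct stores in the rows."""
    return len({store for store, _product in rows})


# One step per distinct product: filter the rows, then evaluate the measure.
products = sorted({product for _store, product in SALES})
per_product = {
    product: count_of_stores([row for row in SALES if row[1] == product])
    for product in products
}

total = sum(per_product.values())
distinct_pairs = len(set(SALES))  # unique (store, product) combinations

print(count_of_stores(SALES))  # 2
print(per_product)             # {'Apples': 2, 'Oranges': 2, 'Pears': 1}
print(total, distinct_pairs)   # 5 5
```

The last line is the "5 unique combinations" claim checked from the other direction: counting distinct (store, product) pairs directly gives the same 5.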

Why is that useful?

Well, I can’t share the precise case I was working on, because it belongs to a reader’s business.  But trust me, you are going to find yourself wanting this sooner or later.

Things to keep in mind

  1. SUMX responds to pivot context just like anything else.  So if you slice down to just a particular year, your results will reflect only what Stores sold in that year.
  2. AVERAGEX, MINX, MAXX, and COUNTAX all work the same way.  So if you want to iterate through just like SUMX but apply a different aggregation across all of the steps, you can.  Those would return (5/3), 1, 2, and 3, respectively in our example.
  3. The fields referenced in SUMX do NOT have to be present in your pivot view.  In my case, SUMX was working against [Store] and [Product].  But my pivot could just be broken out by [Region] on rows and sliced by [Year], and the measure still works.  (I like to think of it as a stack of invisible cells underneath each pivot cell that you can see, and SUMX is rolling up a lot of logic across those invisible cells to return a simple number to the top cell you can see.)
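To check the numbers in point 2, here is a tiny Python sketch that starts from the three per-step results of the store-count example (2, 2, and 1) and swaps the final aggregation. The variable names are made up, and this only models the last "roll up the steps" stage, not the filtering.

```python
from fractions import Fraction

# Per-step results of [Count of Stores] for Apples, Oranges, and Pears.
step_results = [2, 2, 1]

sumx = sum(step_results)                                    # 5
averagex = Fraction(sum(step_results), len(step_results))   # 5/3
minx = min(step_results)                                    # 1
maxx = max(step_results)                                    # 2
countax = len([r for r in step_results if r is not None])   # 3 non-blank results

print(sumx, averagex, minx, maxx, countax)  # 5 5/3 1 2 3
```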

More to come!

Yeah, I am not even done with SUMX.  Like Jules told you, it’s some serious gourmet DAX :)

16 Responses to SUMX() – The 5-point palm, exploding fxn technique

  1. Frank says:

    Super-cool, but why does SUMX behave differently when the second param is a measure? You show it filtering the table revealing the other table columns in the almost-useful example – but why does this not happen in example 3?

    • I had the same head-scratching moment. In fact, if example 3 had worked for me the first time I tried it, I would have discovered examples 4 and 5 sooner.

      I suspect that no matter how much we select and filter, measures ALWAYS have access to all of the columns. But columns are either there or they are not. We’ll have to test that theory on other cases to see if it is true.

      As intrepid explorers of the new frontier, I now realize that we are halfway learning DAX, and halfway discovering it. It’s simply way too rich for any one person to grasp entirely yet, even for members of the product team.

  2. I think that this is one of the most significant DAX posts to date. I have a lot of measures using SUMX. I’ve struggled to work out some of these measures (and got help with many of them) because the crucial facts about the inner workings of SUMX have never been well documented (until now). Thanks a million.

    I’ve observed that folks on the DAX team have a preference for using VALUES in SUMX formulas (except in measures that use COUNTROWS). Although I understand how VALUES work, I find it to be one of the most unintuitive names for a function that I’ve come across. There’s no hint in the name that the function has anything to do with distinct values. If the only difference between VALUES and DISTINCT is that VALUES handle unknown members, could this not have been a parameter in the DISTINCT function? OK, OK, this is a topic for another discussion :)

  3. [...] a sum of Table1[amount] over a filtered table, more on sumx at this blog post of Rob [...]

  4. [...] a sum of Table1[amount] over a filtered table, more on sumx at this blog post of Rob [...]

  5. [...] the MIN function, the function MIN only takes a column, not a measure. So again we resort to the The 5-point palm, exploding fxn technique in this case [...]

  6. [...] a DAX function that needed the The 5-point palm, exploding fxn technique as described by Rob in his PowerPivotPro blog post.  I have used SUMX and COUNTX with success a few times before but this time I had a hard time [...]

  7. [...] had a problem that needed The 5-point palm, exploding fxn technique as described by Rob in his blog post.  I have used SUMX and COUNTX with success a few times before but this time I had a hard time [...]

  8. [...] CALCULATE, ALL, and maybe even SUMX (in that order!) before digging into the DAX chapters.  The book introduces those functions in [...]

  9. [...] be a MAX at the lowest level of the pivot but then a SUM at higher levels?  If so, I recommend you try SUMX [...]

  10. 100tsky says:

    Hi, Rob

    If we try Useless Example #2: By a single column
    SUMX(Table1[Product], Table1[Qty])

    in calculated column we do not get an error, but get
    =Table1[Qty] for each row *Count(Table1[Product]) for all table
    can you explain this behaviour please!

    Thank you

  11. Frank says:

    I believed in DAX, but the following is weird:

    The Useful Example ‘logically’ returns 5.
    [Count of Stores] = COUNTROWS(DISTINCT(Table1[Store]))
    SUMX(DISTINCT(Table1[Product]), [Count of Stores])

    However, the following measure, which IMO is actually the same, returns 6.
    SUMX(DISTINCT(Table1[Product]), COUNTROWS(DISTINCT(Table1[Store])))

    And what is even more strange:
    Apples=2
    Oranges=2
    Pears=1
    but Total=6
    and there are no blanks …

    Maybe, you have an explanation!?

  12. Mahmoud Fouz says:

    Experiencing the same problem that Frank mentioned. Getting a different value if a measure is used or if the same formula is written out in sumx. Looks like a bug or is there an explanation for such a behavior?

    • powerpivotpro says:

      Mahmoud and Frank, here’s the “answer.” But I agree that it’s weird, and when I’m done with the answer, I’m not sure anyone will feel much better.

      1) The “X” functions (and the FILTER function) actually *create* a row context for each step of their evaluation. In other words, inside an X function or a FILTER function, your formulas behave like calculated columns rather than like measures.

      2) And, in a calculated column, an aggregate function like COUNTROWS will automatically “include” the entire table. Go try that out in a calc column right now – it doesn’t operate against the current row like a regular arithmetic expression like [Col1] + [Col2]. It counts the entire table.

      3) So that “explains” why Frank’s SUMX(…COUNTROWS…) formula returns a bigger number than expected. But why does [Count of Stores] return a smaller number?

      4) Go back to step 2 and the calc column you wrote. Now “wrap” your COUNTROWS in a CALCULATE. Literally just do CALCULATE(COUNTROWS(Table)). No filter inputs in the CALCULATE. See what happens.

      5) Now you get 1 for every row in your calc column! WTF??? (What the Formula???)

      6) That’s because CALCULATE takes a row context and turns it into a filter context. In other words, the aggregation function (COUNTROWS in this case) starts behaving like it would in a measure – and our filter context is just this row, so when COUNTROWS runs, it only sees one row.

      7) One last twist! When you reference a measure by name, that IMPLICITLY introduces a CALCULATE behind the scenes! Yes, even though your original measure did NOT use a CALCULATE function, when you reference a measure by name, inside a FILTER or an X function, now you get a “magic” CALCULATE introduced behind the scenes.

      So that all REALLY sucks doesn’t it? SUPER confusing. I hate it. But the good news is, you don’t really have to know all of that in order to do 99% of the cool things you want to do. I don’t encounter this problem very often, and it doesn’t hint at other problems lurking below the surface. This is by far the weirdest thing in DAX, in my experience.

      • Frank says:

        Rob, I know about row context and that CALCULATE – whether it is hidden or not – turns row context into filter context. But in the pivottable the ONLY difference is in the bottom line, the total, where there is no filter on rows.

        ad 2) if it counts the entire table the result would be 8. But it is 6, obviously the number of distinct stores multiplied by the number of distinct products. But, being an iterator (!) SUMX should give back 5.

        For me, it is still not clear why the sum of the parts here is 6. Why 6 and not 8 or maybe 7?

        6 only makes sense without a row context, i.e. 2 distinct stores for each row independent of the product.

        Anyway, I appreciate your comments and will be waiting for further hints as time goes by.

  13. Matt says:

    I love you for this. Finally, the clouds have lifted and I see the light….
