Writing a Subtotal Calc Column, AKA The Simplest Use of the Earlier() Function

 
I am a bit behind the 8-ball today, so here’s a quick topic.  It’s actually an excerpt from an older post, but one that has long deserved to be its own standalone post.

If you want to calculate a column in a table that is the total of all “similar” values in that table – meaning the total of all rows that have the same value as the current row for a particular column (or columns), this is what you need.

Say I have the following VERY simple table like this:

image

And I want to add a third column that is the total for each customer:

A PowerPivot column that is the total of all “similar” values in that table – meaning the total of all rows that have the same value as the current row for a particular column (or columns)

The calc column formula for that third column is this:

=CALCULATE(SUM([Amt]),
           FILTER(
‘Table’,
                  ‘Table’[Customer]=EARLIER(‘Table’[Customer])
                 )
          )

When I am writing a calculated column and use the FILTER function, within the FILTER function, all of my references to columns in ‘Table’ will have “forgotten” all notion of “current row” and will instead be references to the entire column.

I repeat:  within the FILTER function, references to columns from the table being FILTERed (the table specified in the first parameter to FILTER), will not know what the current row is.  Column references outside of the FILTER function will remember the current row, just like they always do.

So the EARLIER() function is my escape hatch, to be used within the FILTER function, that allows me to go back and inspect the current row’s value.

This line of the formula:

  ‘Table’[Customer]=EARLIER(‘Table’[Customer])

Can be understood as:

  ‘Table’[Customer]=CurrentRow(‘Table’[Customer])

EARLIER is Probably Best Understood as CURRENTROW

In fact that’s a better name for EARLIER 99% of the time.  Just think of it as a CURRENTROW function, useful only in calculated columns, and only when you are performing FILTERS on ALL(Table), when you need to “jump back out” of the ALL and fetch a value from the current row.

Yes, it IS useful in other cases.  But I suspect that those other cases are rare enough that rather than a general purpose function like EARLIER, we should have been given a dedicated, easy to understand version named CURRENTROW.  Or maybe we should have both.  Yeah, both.  That would be good.

22 Responses to Writing a Subtotal Calc Column, AKA The Simplest Use of the Earlier() Function

  1. John Bradley says:

    Hi Rob,

    I have used this function quite often but I have never used the ALL portion of the function and get the same result. It seems that you are really highlighting the ALL here. =CALCULATE(sum([Amt]),filter(Table1,Table1[Customer]=EARLIER(Table1[Customer]))) What is the significant difference if any between this formula and with the all if the calculated column provides the same results? Would love some clarfication.

    • powerpivotpro says:

      Serves me right for excerpting an old post without double checking. There is no benefit to the ALL(). I will revise the post :)

      There *may* have been a reason I needed ALL() in the larger context of what I was doing before – I will also have to go check that.

      Thanks for catching this so quickly John.

    • powerpivotpro says:

      Fixed. All better :) Thanks again John and sorry for the confusion.

  2. Siraj Samsudeen says:

    Rob, this is brilliant. Yes, I also felt many times in the past that this is the way to just reference the current row. You have just expressed what was on my mind.

  3. Henson says:

    Awesome! Hadn’t actually used EARLIER() before, but I can now see several instances of where I could (should have.) One question — The customer column would need to be sorted/ordered for this to work right?

    • powerpivotpro says:

      Nope. Sort is not needed. As a blanket statement, sorting or filtering applied in the PowerPivot window NEVER impacts formulas (or what you see in pivots). Sort and filter in the PowerPivot window should be thought of as tools to help you navigate and inspect your data, nothing more.

      • Henson says:

        Wow.. I even changed the customer order in the data to a, b, a, b and it still works. Because of the filter. I get what you mean now by saying that EARLIER can be thought of as CurrentRow. Thanks!

  4. Oleg Gvozdenko says:

    Until now, my formula, in your example, would have been:

    =CALCULATE(SUM([Amt]),
    ALL(Table),
    Table[Customer]=EARLIER(Table[Customer]))

    It was working fine, as long as I operated on a single table. Recently, I had to do a similar formula, but for values residing in a lookup table. I would get a “current context” error without FILTER() in my formula, so I had to do a major workaround, adding 10′s of MB’s to the size of the workbook.

    I just tried your FILTER() around EARLIER() trick and it resolved that problem, not 100% why it works, but I guess it has something to do with FILTER() creating a virtual table to identify needed rows in related tables, instead of regular CALCULATE() relying on the existing relationships between tables and therefore delivering a current context error.

    • powerpivotpro says:

      Oleg – I think for most of us, DAX gets a little mysterious out on the fringes. In a different workbook, I myself am looking at two formulas right now and wondering why one works and one doesn’t.

      I think the answer to your particular question lies shrouded in the mists of “row context vs. filter context.” Since FILTER() operates on a “one row at a time” basis, it actually DOES have a row context available when you do comparisons (within the second param of FILTER).

      But I fully admit that the nuances of writing CALCULATE() formulas in calc columns, especially across tables, represents “soft ground underfoot” for me in general.

      There’s also the LOOKUPVALUE function, which i have not used extensively at this point in time but is probably relevant here.

      • Oleg Gvozdenko says:

        Yes, I tried LOOKUPVALUE() and it works great and is a much more elegant formula with understandable syntax, unfortunately I am stuck in PP v1 at work until a corporate-wide upgrade, so I could only test on a home laptop.
        So, for now, FILTER( EARLIER()) it is!

  5. MikeG says:

    I have this formula in my notes as the equivalent to Excel’s SUMIF or SUMIFS.

    I used this same approach on a project and added an additional filter expression in the filter function. In my case, I wanted to sum all amounts for a particular “ticket” but limit the sum to payments up through a certain date. (their can be more than one payment related to any one ticket and the data contains payments for all dates )

    So, I created a “date flag” for each row/payment.

    I’m going of memory here (workbooks is at the office):

    =calculate(sum(AMT),filter(Table,Table[TicketID] = earlier(Table[TicketID])) && Table[DateFlag] “FuturePmt”))

    This worked and I was so happy and blown away when it did.

    However, this workbook has two tables with roughly 700k rows each. Once I added this calculated column, I could no longer refresh with my data source (an OLAP cube) and kept getting memory allocation errors. Has anyone else had memory issues with a workbook of this size?

    I know PowerPivot is supposed to handle “millions” of rows but I’m pretty bummed with its failure in this case. I don’t recall the specs now but my machine is one of the more robust machines in my office..not sure what the issue is.

    • Oleg Gvozdenko says:

      If it is only one formula that breaks the file, you may want to try deleting that formula before database refresh, and then pasting it back once you bring in the data. System tries to bring in the data and do the calc column calculations at the same time. If you split the two, you might just squeeze throught the limitations of your system.
      But this will probably only be a short term solution, since it sounds like you are on a 32 bit system and you will be bumping into this problem as your datasets grow. If you are serious about PowerPivot, upgrade to 64 bit.

  6. David Hager says:

    The EARLIER function is very memory intensive. To use it with a “big” PowerPivot workbook, you really need to have a 64-bit system.

  7. circledchicken says:

    Hi,

    This was a helpful example for me, thankyou.

    In this single table calculated column context, I think the following SUMX version returns the same result for me:

    =SUMX(
    FILTER(Table1, Table1[Customer] = EARLIER( Table1[Customer]) ),
    Table1[Amt])

    Do you know if this is going to be slower than the CALCULATE version? Or is there another advantage to using the CALCULATE version in this case?

    • powerpivotpro says:

      I believe sumx is probably slower. Both formulas use FILTER which is a slow, row by row iterating function. But in my version, once the FILTER is done, the SUM is allowed to take advantage of all of powerpivot’s compression and indexing shortcuts. SUMX by contrast is yet another row by row iterator. But this all goes a bit beyond my confidence level so lets see if anyone else weighs in.

      • circledchicken says:

        Ok, thanks for your reply. It does make sense, although I would have thought that internally SUM would still need to iterate over the values? Or perhaps it only needs to iterate over distinct values as per the shortcuts you mentioned. (I know very little about the topic so I might be talking rubbish…!).

        • powerpivotpro says:

          Yeah it might be able to take advantage of the distinct values. The other thing is that I’m pretty sure SUMX runs in the formula engine rather than the storage engine – the latter of which is quite a bit faster. (SUM runs in the storage engine). But really this is all just informed speculation. Over time, MS is improving PowerPivot so that more and more expressions/functions are being handled in the storage engine, This conversation is trying to hit a moving target behind a cloud of smoke :) We need to try it on a large data set and time it :)

          • circledchicken says:

            That’s Interesting, I’ll try and learn more about the storage engine vs formula engine.

            Haha, that’s a good point. I tested it and I think you are correct. I generated random letters for customer name and assigned random values between 1 and 50 to each.

            For 1 million rows the times seemed similar. For 2 million it also seemed similar. For 3 million I got ‘Memory error: Allocation failure’, using the SUMX version (I am using a 32-bit version). The CALCULATE version had no problems (returned in a couple of seconds).

            Not exactly accurate measurement…but it illustrated your points for me.

            Many thanks for your help.

          • powerpivotpro says:

            GREAT test! Thanks for reporting back :)

  8. Jean-Pierre Girardot says:

    Hi, may be another quicker way –
    =calculate(sum(Table1[Amt]);allexcept(Table1,Table1[Customer]))

  9. Patricia says:

    Thanks for your post! How can I obtain the same results with a calculated field?
    I’ve tried something similar to the following on a large dataset and I’m encountering performance issues. Is there a better way of getting to the same results?

    =SUMX(‘Table’, CALCULATE(SUMX(FILTER(‘Table’,
    ‘Table’[Customer]=‘Table’[Customer]), ‘Table’[Amt])
    )
    )
    )

    Thanks!

Leave a Reply