Counting “Active” Rows in a Time Period: Guest Post from Chris Campbell

  • image

  • “There’s a Fight Club up in Delaware City.”

  •     “Yeah, I heard.”

  • “There’s one in Penns Grove too.”

  •     “Bob even found one up in New Castle.”

  • “Did you start that one?”

  •     “No, I thought you did.”

 

 

It Keeps Spreading Smile

About a week ago I was talking to Chris Campbell and some of the other folks at Blue Granite.  Chris mentioned that he has been teaching some PowerPivot classes at the Microsoft Technology Centers, sometimes even in my neck of the woods, but I didn’t know until he told me.

Which, of course, instantly reminded me of the scene above in Fight Club.  I’m sure everyone else makes the same connection right? Smile with tongue out

Anyway, Chris asked if I would be interested in him writing a guest post and I said heck yeah!  So, without further delay, I give you the PowerPivotPro.com debut of Chris Campbell.

-Rob

The Problem

Recently, a customer sent me a question regarding a DAX problem they were working on. They have a Members table in their model that includes attributes of “Start Date” and “End Date” for each member. The question they needed to answer was “How many active members did we have in [fill in the blank]?” I thought this was a pretty interesting question and it seemed like it ought to be pretty easy to do in DAX.

They were trying to get it to work using relationships between the start and end dates columns on the Members table and their Date table. Their date table also has “Month Start” and “Month End” columns. They were trying to use these columns in conjunction with the RELATED function to determine if the member was active by comparing the member start and end dates to the month start and end dates.

How I Reproduced the Problem

I’m not local to this particular customer and their model is large and I didn’t have easy access to it. It seemed easier to try to replicate their issue with a dataset I know well and I thought it might be useful to include that process here. If you’re not interested in that and just want to get to the solution, you can skip this section. If you’d like to see my thought process, read on.

I emulated the model they built using the ContosoRetailDW dataset that is available from Microsoft here. The Contoso model includes a Promotions table (DimPromotion) that includes a start and end date column for each Promotion. I used that table as the basis for my example. To replicate what the customer had on their table, I added a couple of calculated columns to create the MonthStart and MonthEnd month. Then I related DimPromotion to DimDate on the Promotion StartDate in order to have a similar model to what my customer had.

Here’s what it looked like:

clip_image002

The DimDate table in the Contoso dataset doesn’t have a column with just the month number (e.g. 4 for April) so I created a column for that. Then adding MonthStart and MonthEnd to the DimDate table was simple with a couple of calculated columns. I supposed I could’ve embedded the calculation for MonthNumber in the other two columns but I like to compartmentalize things.

MonthNumber

=MONTH([Datekey])

MonthStart

The MonthStart column just uses the DATE function, with the Year

=DATE([CalendarYear], [MonthNumber], 1)

MonthEnd

=EOMONTH([MonthStart],0)

The DimDate table (with other columns hidden) ends up looking like this.

clip_image004

The MonthNumber column makes use of the MONTH function to extract the month number from the Datekey column. The MonthStart column uses the DATE function to take the values from CalendarYear and MontNumber columns plus the value “1” to get the first date of the month. Finally, the MonthEnd column uses the EOMONTH function to get the last date of the month.

The last thing I did was to create a hierarchy on the calendar columns in the DimDate table so I could drill from Year to Quarter to Month.

clip_image006

This gave me a working environment that looked enough like their model that I could replicate their issue.

On to the Real Problem

I converted the DAX from the original “Active Members” measure that I got from my customer to use the DimPromotion table in my new model.

The DimPromotion table looks like this:

clip_image008

In order to answer the “how many promotions were active in [fill in the blank]?” question, the measure uses the relationship between the DimDate table and the DimPromotion table to get the start and end date for the month. It then compares them to the start and end date for the promotion.

Active Promotions Wrong:=CALCULATE(COUNTROWS(DimPromotion),
   
FILTER(DimPromotion, (DimPromotion[StartDate] <= RELATED(DimDate[MonthEnd])
        && DimPromotion[EndDate] >= RELATED(DimDate[MonthStart]))))

The first thing I had to do was think through the logic. To determine if a row is “active” we have to check to see that the promotion started before the last date of the month and did not end before the first date of the month. That’s what the FILTER function does in the expression above. The RELATED function follows the relationship between the tables and retrieves the MonthEnd and MonthStart values.

Browsing this measure in a pivot table gives you this:

clip_image010

The problem is that the start and end dates are not necessarily (or usually) in the same month that you may be slicing on. However, when you put this measure in a pivot table, a row only gets counted if the promotion start date is in the same month as the related row on the DimDate table. For example, in May 2008, the pivot table only shows 1 promotion active. If you look at the data in the DimPromotion table, you will find that there are actually 2 promotions that were active. The “No Discount” Promotion has a start date of 1/1/2003 and an end date of 12/31/2010 and the “Asian Summer Promotion” has a start date of 5/1/2008 and an end date of 6/30/2008. Both of these should be counted as “active” in May of 2008.

How to fix it? Get rid of the relationship.

I’m a database guy so I like my data to have relationships. So much so that I spent a bunch of time trying to make this work keeping the relationships intact. However, sometimes in PowerPivot, working without a relationship is the appropriate way to handle an issue. Thanks to Josh Fennessy (b | t) for the conversation around this. Both Rob and Kasper have a number of posts that cover this concept too, like here and here.

So finally I thought, “Oh! If I just remove the relationship, it’ll work!” But that wasn’t exactly right because without the relationship, I needed to get the first and last dates of the month some other way. Fortunately, DAX comes to the rescue with a couple of handy functions: FIRSTDATE and LASTDATE. They return the first and last dates in a column of dates constrained by the current context.

The new version of the Active Promotions measure looks like this:

Active Promotions:=CALCULATE(COUNTROWS(DimPromotion),
    FILTER(DimPromotion, ([StartDate] <= LASTDATE(DimDate[Datekey])
        && [EndDate>= FIRSTDATE(DimDate[Datekey]))))

Using this new measure in the pivot table gave me the result I was looking for:

clip_image012

It also has the added benefit of not being tied to a month. I can slice the pivot table by any level of my calendar hierarchy and get a correct result thanks to the FIRSTDATE and LASTDATE functions.

This measure probably isn’t perfect yet. It assumes that there will always be a start date and end date for every record. In many cases, end dates are left empty for an item that is still active. Changing the expression to include an IF function to check for that should be an easy matter.

One last note on removing relationships. You will have to deal with the nagging “Relationships between tables may be needed” message. In PowerPivot 2010, you can turn this off by disabling automatic relationship detection. In PowerPivot 2013, you have to click the “X” button in the message box.

clip_image014

10 Responses to Counting “Active” Rows in a Time Period: Guest Post from Chris Campbell

  1. Eric Hutton says:

    I’ve been trying to do something like this – you just saved me a big headache!! I probably would never have gotten over the hump of losing the relationship to make it work. If this article were by Rob, that would have been the place for this:

    Kenobi: [gets up and takes a blast helmet] I suggest you try it again, Luke. Only this time, let go your conscious self and act on instinct. [puts the helmet on Luke, which covers his eyes]
    Skywalker: But with the blast shield down, I can’t even see! How am I supposed to fight?
    Kenobi: Your eyes can deceive you. Don’t trust them. [remote shoots Luke] Stretch out with your feelings! [Watches Luke succeed in blocking the lasers] You see? You can do it.

  2. Chris Campbell says:

    Glad it helped! You guys are killing me…a Fight Club quote AND a Star Wars quote in one post! I’m in nerd heaven!

    • Eric Hutton says:

      You also made me my ears perk up when you said “create a hierarchy on the calendar columns” – a what now?
      I had to go look that up – and now I know why I have never seen that feature: it can only be accessed from the diagram view, and I rarely use diagram view (I had sheets that consistently crashed when trying to enter diagram view, probably in an older version of PP). I can’t see why this feature is not also made accessible from the data view.

  3. Chris Campbell says:

    Yep, hierarchies are a great new feature that was added in V2. I guess the reason they’re only available in the diagram view is to make the editing experience easier.

  4. Chris, nice post! Good topic for discussion – anything involving a table with 2 date columns is a pain!

    This is untested but I’m pretty sure that this small alteration means your measure works with the relationship active (but effectively ignored):

    = CALCULATE(
    COUNTROWS(DimPromotion),
    ALL(DimPromotion),
    FILTER(ALL(DimPromotion), [StartDate] = FIRSTDATE(DimDate[Datekey]))
    )

    Its effectively a reproduction of Rob’s ‘Greatest Formula in the World’.

    Jacob

  5. Damn thing cut off part of my formula but I you get what I mean……..

  6. OK, 3rd time lucky? Tested my version and it didn’t work, something to do with the way FIRSTDATE()/LASTDATE() work that isn’t the same as the MIN()/MAX() I usually prefer:

    =CALCULATE(
    COUNTROWS(DimPromotion),
    ALL(DimPromotion),
    FILTER(ALL(DimPromotion), [StartDate] #= MAX(DimDate[Datekey])
    && [EndDate]~= MIN(DimDate[Datekey]))
    )

    NB I swapped the less than and greater than for # and ~ respectively to avoid formula being curtailed by the site.

    Jacob

  7. Carsten B. says:

    That leads to one of my favorite questions:

    Is the first All(DimPromotion) really necessary?

    All(DimPromotion) returns all rows of the DimPromotion table.

    Filter(all(DimPromotion)..) returns a subset of all rows of the DimPromotion table.

    Both combined will always return only the rows defined by the second filter, the Filter(All..) condition.

    It would be different if you were just filtering one column, like (Filter(All(dimPromo[StartDate])…), because this would just return values for one column, so that filters from the inital filter context could still be valid.

    However, if you are filtering the whole table, than the whole table will be returned as a filter condition, i.e. values for each and every column, effectively overriding/clearing any other filter in the Initial filter context. In a way, filter(table) is like Select t.* from table t where t.[c]=… .

    Carsten.
    P.S.: I hope I got this right, my apologies for being so stubborn Rob, maybe I will try and get some enlightenment from Italy ;)

  8. Michiel Rozema says:

    Hi all, don’t know if I’ve ever left a reply here, but been a reader since the early days…

    In my book on PowerPivot (in Dutch) I wrote about a similar case with three relationships between a Sales table and Date table (DateOrdered, DateDelivered, DatePaid), answering the question about sales orders delivered but not yet paid. The relationship on DateOrdered is active.

    My approach is two-step, calculating orders delivered, and paid, respectively, since the beginning of time :-) until the last date in the current context (which is sliced on DateOrdered). Payments due is the difference between the two.

    OrdersDelivered:=CALCULATE(COUNTROWS(Sales),USERELATIONSHIP(Sales[DateDelivered],Date[Date]),DATESBETWEEN(Date[Date],BLANK(),LASTDATE(Date[Date])))

    OrdersPaid:=CALCULATE(COUNTROWS(Sales),USERELATIONSHIP(Sales[DatePaid],Date[Date]),DATESBETWEEN(Date[Date],BLANK(),LASTDATE(Date[Date])))

    regards,
    Michiel

  9. Scott Barcz says:

    I am trying to determine a slight variation on this problem. First, knowing show many member were “active” during specific promotion periods. Adding one setup, is there any easy way to CALCULATE how many days during the promotion period the member was active?

Leave a Comment or Question