New Customers Per Day Generalized to “New Customers per Month,” etc.

 
 
A Generalized New Customers (or unique visitors) in Time Period - per Month, Year, Etc. in PowerPivot

A Generalized “New Customers in Time Period” Solution, Inspired by Tuesday’s Post

David Hager’s post on Tuesday really planted a seed in my brain.  And then a comment on that post from Charlie got me thinking further.

How can we extend the “New Customers per Day” concept to become “New Customers in <Any Period of Time>?”  New Customers per Month for instance.


Step One:  Make That Calendar Table!

I think it’s a general habit we should all get into:  if you’re using the Time Intelligence functions in DAX (like DATESBETWEEN), it is going to be MUCH safer all the way around if you create a separate Calendar/Date table first.

Quick technique for doing this is to use a flat pivot:

image  image

Create a Flat Pivot, Use All Date-Related Columns from Your Original Table on Rows,
Turn Off Subtotals, then Copy the Pivot

image

Paste as New Table in the PowerPivot Window

Yielding…

image

Calendar Table in PowerPivot

Which I then relate to my original Table1…

image

Step Two:  Measures!

[DistinctCustomers] =
DISTINCTCOUNT(Table1[CustomerID])

[PreExisting Customers] =
CALCULATE([DistinctCustomers],
          DATESBETWEEN(Calendar[Date],
                       0,
                       FIRSTDATE(Calendar[Date])-1
                      )
         )

The first measure is just a distinct count of customer ID, and the second one calculates how many such disctinct customers “existed” from date 0 (the beginning of time, heh heh) up through the last day before the selected time period.

(Note how we do not need to use the ALL function in the second measure, unlike in Tuesday’s post.  This is because of the separate Calendar table we’re using – but that’s not the only benefit of the separate Calendar table).

So if I am looking at March 2009 on my pivot, FIRSTDATE(Calendar[Date]) returns March 1, 2009 and then I subtract 1 to get Feb 28, 2009 – the last day before the selected month started.

Here it is:

A Generalized New Customers (or unique visitors) in Time Period - per Month, Year, Etc. in PowerPivot

The number of pre-existing customers can never go DOWN over time, so the fact that this shows a nice smooth increase is a good sign.

Now I add another measure:

[Customers Life to Date] =
CALCULATE([DistinctCustomers],
          DATESBETWEEN(Calendar[Date],
                       0,
                       LASTDATE(Calendar[Date])
                      )
         )

It’s just like the [PreExisting Customers] measure except for that highlighted portion – we extend up through the last date in the selected time period.

A Generalized New Customers (or unique visitors) in Time Period - per Month, Year, Etc. in PowerPivot

And then our last measure:

[New Customers in Period] =
[Customers Life to Date] – [PreExisting Customers]

A Generalized New Customers (or unique visitors) in Time Period - per Month, Year, Etc. in PowerPivot

Portable!

If I want to use Days or Years on rows instead, I just rearrange the pivot, and the formulas still work:

A Generalized New Customers (or unique visitors) in Time Period - per Month, Year, Etc. in PowerPivot

Or, say, Year:

A Generalized New Customers (or unique visitors) in Time Period - per Month, Year, Etc. in PowerPivot

Always want to emphasize this as a HUGE benefit of PowerPivot:  PowerPivot formulas are portable.  They adapt to new report shapes with no additional effort.

Download the New Workbook

Workbook for this post is available here.

17 Responses to New Customers Per Day Generalized to “New Customers per Month,” etc.

  1. Siraj Samsudeen says:

    This is cool.The table really look very nice with conditional formatting.

  2. Kevin says:

    Every time I try this I get an error of “MdxScript(Model)(7, 11) Calculation error in measure ‘Table1′[Preexisting Customers]: an invalid numeric representation of a date value was encountered”.

    My date values are coming directly from SQL Server, so I believe they should be valid, and I have copied and pasted the formula directly from your example. Any idea on what could be causing this?

    • powerpivotpro says:

      In the powerpivot window have you verified that the data type is indeed date/time?

      • Kevin says:

        Thanks for the response!

        Yes, the date column in both my fact table and my date dimension table are both set to ‘Data Type: Date’. My date dimension table is also set as the ‘Date Table’, and other time based functions like TOTALYTD/TOTALMTD are working correctly with it.

        • powerpivotpro says:

          Well at this point I’d suggest sending me the workbook, if that is an option. rob. at a place called pivotstream. dot com.

        • Kevin says:

          I think I may have got this working. It appears there was an issue with my Calendar table and missing dates (it was built off of distinct dates in my fact table). I pulled in a date dimension from an un-related DW that has contiguous dates between 1900-2032, set it up to use that, and I no longer get the error. That said, when I try to display distinct/pre-existing for actual days (there are about 1000 distinct days) Excel then eats up 6-7GB of memory and doesn’t seem to get anywhere after 15 minutes of processing. The same thing works for display those measures for months though.

          • powerpivotpro says:

            Ah nevermind my previous comment then about sending me the workbook. Yes I have noticed the same problem – with a lot of different values on rows (like dates), it can be very slow. DISTINCTCOUNT is much better than COUNTROWS(DISTINCT()) in this regard but it’s still doing a *lot* of work.

  3. Shawn says:

    The dates need to be distinct too. Chances are you might have customers that were created at the same time.

    A date column containing duplicate dates was specified in the call to function ‘DATESBETWEEN’. This is not supported.

    • Kevin says:

      With regard to the distinct dates, I think that’s why he used a pivot of the existing dates – which would be distinct. I did run into an additional step that was required though, as our column is a datetime – so I had to drop the time portion.

      By the way, Rob – I tried ordering your book through MrExcel the other day, both the ebook and physical copies. Were you aware that they charge a minimum of $135 for international shipping?!?

  4. Jude Hoffner says:

    I’m getting the same error message as Kevin in January, Every time I try this I get an error of “MdxScript(Model)(7, 11) Calculation error in measure ‘Table1′[Preexisting Customers]: an invalid numeric representation of a date value was encountered”.

    Everything on my date table checks out. Cannot for the life of me figure out what is driving this. Does the date table have to be of the exact same date range as my Orders table? Any help is appreciated.

    • powerpivotpro says:

      Just from reading the resolutions in the comment thread above, between Kevin and Shawn, it sounds like you should:

      1) Make sure you calendar table is indeed using a Date data type
      2) Remove any time component of your dates
      3) Make sure there are no gaps and no duplicates in your Calendar table
      4) Make sure you are using fields from your Calendar table on the pivot, and NOT date related fields from your data table.

    • powerpivotpro says:

      Failing that you can also send me the workbook and I will take a look. rob. at a place called powerpivotpro. dot com.

      • Curtis says:

        I know the comments were posted quite a while ago but I came up against this same problem and tried all of the suggestions listed above, None of these resolved my issue “an invalid numeric representation of a date value was encountered”. I managed to find that using BLANK() instead of 0 as the StartDate argument resolved this issue. This maybe of use to someone else.

        =CALCULATE([DistinctCustomers],
        DATESBETWEEN(Calendar[Date],
        BLANK(),
        LASTDATE(Calendar[Date])))

  5. wyl2000 says:

    Is it possible to calculate the sales contributed by new/ returning customers then?

  6. grumelo says:

    The Date Table is key for this solution but most of the time you use a predefine Date Table not one extracted from your data.

    To avoid calculation of [PreExisting Customers] and [Customers Life to Date] for periodes where there is no data it may be useful to check if there are data first.

    I’m suggesting these two new fomulas:

    PreExisting Customers:=if(COUNTROWS(Table1)>0,CALCULATE([DistinctCustomers],
    DATESBETWEEN(DimDate[Date],
    0,
    FIRSTDATE(DimDate[Date])-1
    )
    ),blank())

    Customer Life to Date:=if(COUNTROWS(Table1)>0,CALCULATE([DistinctCustomers],
    DATESBETWEEN(DimDate[Date],
    0,
    LASTDATE(DimDate[Date])
    )
    ),[PreExisting Customers])

    With these formulas you can use a “standard” Date Table.

Leave a Comment or Question