Converting from Peaks to Frequencies

February 16, 2012

image32

Sniff something? Did ya, rat boy?”

Continuity!

Folks I just can’t resist a quick followup on the Rat Sniffing Project.  I…  just… can’t.  Plus I am absolutely worn down and don’t have the energy tonight for anything that isn’t inherently entertaining.

All of those calculated columns from the last post, remember, were just the setup so we could start doing some REAL observational stuff.

You know, something like this:

Measuring Event Frequency in Hz in PowerPivot

This wasn’t difficult at all.

Remember that I now had a column that flagged each row as “1” if it represented the peak of an inhalation:

image

Each Row is 0.01 second, Flagged as 1 if
it’s a Peak Inhale, aka a “Sniff”

So now I really just need some measures.

The First Measure I’ve Ever Written in the Unit of Hertz!

(Hmmm.  There’s a joke in here somewhere, the first line of which is something like “what do you do if your unit Hertz?”  Listen, I told you I was tired.)

Since I have a 1 in the RobPeak column, summing it will yield the number of peaks in an interval – I don’t need to do a CALCULATE(COUNTROWS()) with a filter set to 1.

So the measure formula for Hz (events per second) is:

(SUM(Data[RobPeak]) / COUNTROWS(Data)) * 100

Why times 100?  Because each row is actually 0.01 second.

Now THIS is a *Time* Table

This is another first:  creating a separate time table that is NOT measured in days.  In this case…  hundredths of a second:

A Time Table in PowerPivot - Not Dates.  Seconds.

Not a Calendar Table – a Time Table.  In  0.01 Second Increments

OK, it’s not a Calendar table but it is VERY similar.  Just like a Calendar table has columns like DayofWeek, DayOfMonth, CalendarYear, etc. – those columns represent properties of dates.  Well, the columns I have here like Second and FiveSecond are the same sort of thing.

And since I have 6 rats in the experiment, it is wasteful to duplicate those properties in my Data table.  Plus there’s that whole speed thing.

So I created this separate table and related it to my Data table.

And this table then “powers” my slicers like these:

Slicers Based on a Time Table Measured in Minutes and Seconds

Slicers Based on a Time Table Measured in Minutes and Seconds

These slicers represent a navigation method more than they do a “filter” in the classic sense.  I say that because fields from the time table are on the axis of the chart as well:

image

The net effect is that I can use my slicers to quickly move around the data and examine “windows” that are interesting or relevant.  Couple of quick clicks and I am now looking at minute 4 instead of minute 3, and just the first 10 seconds of that minute:

image

Quickly “Jumped” to the 4:00-4:10 Interval

It’s ALMOST like the slider control on a YouTube video:

image

YouTube Has a Time Slicer Control Too!

Wouldn’t it be neat if Excel gave us a YouTube-style slicer for time?  We can dream.

Side Note:  Rats Breathe FAST!

Check those charts out.  Between 5 and 9 Hz???  Really?  Those little rascals inhale between 5 and 9 times per second, on a SUSTAINED basis?  Wow.  I’m easily amused I guess, but wow.


Peak Detection: A Surprising Usage of PowerPivot

February 14, 2012

 
image

“Who are YOU, who are so WISE in the ways of SCIENCE?”

It’s all about following through…

In a recent post I covered a very simple scientific scenario.  It was an interesting diversion from the normal biz-style scenarios but it left me feeling hollow in two ways.  First, it was too simple and didn’t account for the possibility of multiple different treatment types, so I pinged the Italians.  (They responded, as they always do, and I owe a post on their reply, probably Thursday).

But the other problem was that, while claiming to be a scientific scenario, it was manufactured by me, and hey, I’m no scientist.  Then it hit me…

Hey!  My Neighbor is a REAL Scientist!

Yeah, Dan Wesson is a “he-runs-his-own-research-lab-at-the-university, is-published-in-prestigious-science-journals” kind of scientist.  So I decided to bounce the question off of him – was my example scenario at all useful, and if not, what WOULD be useful?  Boy, am I glad I asked.

Perhaps My Favorite PowerPivot Visualization Yet

PowerPivot Used for Peak Detection - My New Dining Room Mural

Click for Full Size!

First of all, HOW COOL IS THAT?  It looks like the double helix or something.  I joked with my wife that I was gonna have it turned into an eight-foot mural for the dining room.  (Yeah, I was joking sweetie.  Totally joking, I would never do something like that).

What do the Values Mean?

image

These are measurements of respiration taken from rats in Dan’s lab.  Negative values indicate “inhale” and positive indicate “exhale.”

Each value represents one-one-hundredth of a second – how’s that for time intelligence? Smile

What Does “Peak Detection” Mean?

In order to make sense of the data, Dan needs to measure the frequency of how often the rats “sniff.”  Sniffing is an indicator of all sorts of things apparently – a sign of interest, a sign of cognition, a sign of dominance…  I had no idea.

Anyway, he needs to measure the number of “sniffs” the rats take in a given time frame, and in order to do that, he needs to count the negative peaks in the breathing data:

image

PowerPivot Can Do This?  Yes, Better Than the Professional Tools!

I went into this mostly expecting that PowerPivot was not going to be a good tool for his needs.  And as usual, PowerPivot surprised me.

Not only can PowerPivot do this, it outperforms expensive specialized software.  Software that is designed to do precisely this task.  Let that sink in for a moment:

PowerPivot Outperforms Expensive and Specialized Scientific Software

$10,000 Specialized Scientific Software Often “Misses” the Peaks,
PowerPivot is Dead On

How Did I Do It?

It actually wasn’t that hard.

Given a data set like this with 180 thousand rows:

PowerPivot Six Rats’ Breathing Measured 100 Times per Second for Five Minutes

Six Rats’ Breathing Measured 100 Times per Second for Five Minutes

I needed to generate a 1/0, True/False column identifying whether a given row was a “sniff” peak.

I ended up creating a TimeID column so that I could work in integers:

image

Multiply by 100 So I Can Work in Integer Time Increments

And then I wrote the following calc column formula:

[IsPeakCandidate]

=IF(OR([TimeID]=0,[value]=0),0,
IF([value] > 0,0,   
    IF(
       AND(
           CALCULATE
             (MIN([value]),
             FILTER(ALL(Data), Data[Rat]=EARLIER(Data[Rat])),
             FILTER(ALL(Data),
                Data[TimeID]=EARLIER(Data[TimeID])-1)
             )
           >[value]
           ,

           CALCULATE
             (MIN([value]),
             FILTER(ALL(Data), Data[Rat]=EARLIER(Data[Rat])),
             FILTER(ALL(Data),
                Data[TimeID]=EARLIER(Data[TimeID])+1)
             )
           >[value],

          )
       ,1
       ,0
      )
   )

)

What is that formula doing?

Ok that looks complex.  But all it’s doing is asking three questions:

  1. Is the CURRENT row’s value less than zero?
  2. Was the PRIOR row’s value greater than this row’s value?
  3. Is the NEXT row’s value also greater than this row’s value?

If all three are “yes,” then this row represents a negative peak.

Flaw:  A “Hiccup” Becomes a Peak

Unsurprisingly, it turns out that formula wasn’t quite good enough:

image

In Basketball This is Called This a “Pump Fake”
(In American Football, a “Juke,” in Hockey, a “Deke…”)

So I added another calculated column:

[MinOverInterval]

=CALCULATE(MIN(Data[value]),
           ALL(Data),
           FILTER(ALL(Data),
                  Data[Rat]=EARLIER(Data[Rat])
           ),
           FILTER(ALL(Data),
                  Data[TimeID] <= EARLIER(Data[TimeID]) +5 &&
                  Data[TimeID] >= EARLIER(Data[TimeID]) –5
           )
)

For each row in my data that calculates what is the minimum value over an 11-row interval – five rows before, five rows after, and the row itself:

image

MinOverInterval:  The Smallest Value in the 11-Row Window

One Last Column!

Then, tying it all together, the last calc column is this:

[IsRealPeak]

=IF(
   AND(
       Data[PeakCandidate]=1,
       Data[MinOverInterval]=Data[Value]
      )
   ,1,0
)

This column says “if this row was flagged as a peak candidate already, AND it is the most negative row in its 11-row interval, then we bless it as a REAL peak row.”

And if I use that column instead, that fixes the false peaks:

image

Some Final Notes

First, I suspect that I can simplify my calc columns a bit.  There was some exploratory trial and error in this process and it might be that the “MinOverInterval” test is ALL I need, since there’s no way a row can be its minimum in the interval and NOT be more negative than its immediate neighbors.

But hey, I wanted to get this post out ok? Smile

Second, yes, these calc columns are something I’d ideally want to have done in a database.  But since I want to give Dan a completely self-contained toolset, I had to do it in calc columns.

Lastly, this “peak detection” is NOT the end of the road!  Far from it.  It’s just the beginning.  Now that we have a reliable “peak flag” column in our data, we can start doing the normal PowerPivot thing – measures of frequency, slicing by properties of the rat itself (age, sex, etc.), properties of the trial… 

So I expect Pivotstream will be doing some real work for Dan’s lab, helping support his Alzheimer’s research.  All starting from a quick conversation in Dan’s backyard over the weekend.  How cool is that?


Comparing scientific and other data across trials

January 31, 2012

Enjoying the new forum!

I’ve been spending some time on the Mr. Excel PowerPivot forum over the past few days and I must say I am enjoying the experience.  The blog is essentially a one-way form of communication, given that far less than 1% of views result in comments.

But a forum!  Aha!  That is different!  I encourage you to head over there and drop a question, or maybe even try your hand at answering a question or two when they pop up.

Comparison Across Trials:  A Simple and Useful Example

Here’s one that I particularly liked because it’s practical and ultimately pretty simple on the DAX Spicy Scale after a few posts of higher complexity (but also extreme utility). 

A user over there on the forum named “TopDown” asked a question that I will paraphrase here.  Imagine you have a dataset like this:

Scientific Trial Data in PowerPivot

Measurements Table:  A Series of Measurements Loaded into PowerPivot

Basically you have a set of sensors (111 through 115 in this data) and you take a set of measurements at each sensor (temperature and pressure in this example).  Then you repeat the measurements over multiple experiments – trials 1, 2, 3, 4, etc.

Topdown had a few questions:

  1. How do I compare the results of each trial back to the first trial?  Absolute difference versus trial #1 and percent difference versus trial 1.
  2. Do I have to create calculated columns for the absolute and percent difference for each measurement like temperature?  (Topdown said he has MANY more than two measurements)
  3. And if the answer to 2 is “yes,” then how is PowerPivot useful?  Once I have the calc columns, I can create this in a normal pivot.

I think intuitively, Topdown sensed that this is a great place to use PowerPivot, and just couldn’t put his finger on the how.  He was right.  And I really liked this problem because it is so different from the sorts of problems I work on most of the time.

Solution #1:  Nothing Fancy

Hey, if you ALWAYS want trial #1 to be the baseline, this is pretty easy.  Let’s focus on Temperature for a moment, with the understanding that Pressure will be solved the same way.

First we create a basic measure, AvgTemp:

[AvgTemp] = AVERAGE(Measurements[Temperature])

Then we create a version of that measure that ALWAYS returns the value for Trial 1:

[Trial1Temp] =
CALCULATE([AvgTemp], Measurements[Trial Number]=1)

Put both of those on a pivot with Trial Number on rows:

image

The AvgTemp measure respects the Trial Number filter on rows, just like normal measures do.  But the Trial1Temp measure, by virtue of CALCULATE(), overrides that filter and sets it to Trial 1.  The only place the two measures “agree” is for Trial 1, as expected.

A couple of quick subtraction and division measures follow:

[Delta Temp] = [AvgTemp]-[Trial1Temp]

[Pct Change Temp] = [DeltaTemp]/[Trial1Temp]

Yielding:

image

Again, that first row (Trial 1) is pretty useless, but the others are doing what we expect.  If that’s distracting, we can filter the pivot to NOT display Trial 1, or we can put Trial on a slicer.

No calc columns required.  You just can’t do this in “normal” Excel.

We can stop here and have a quite useful solution, but let’s get just a bit fancier Smile

Solution #2:  Let the report consumer choose which trials to compare!

Folks, I am addicted, yes, ADDICTED to what I call “disconnected slicer tables.”  What do I mean by that?  I mean, tables that I import into PowerPivot for the sole purpose of parameterization.  Let the report consumer make a selection, I “harvest” that selection via a simple formula, and then I feed that parameter into my other measures.

I call these tables “disconnected” because I do NOT create relationships between them and the other tables – they are islands.

Here’s one I just pasted into the model.  Note that TrialSlicer2 is identical.

image

TrialSlicer1 Disconnected Slicer Table
(TrialSlicer2 Table is IDENTICAL)

Put both of these new single column tables on slicers and they have ZERO impact on my pivot, since there are no relationships:

image

No matter what I select on those slicers, [AvgTemp] keeps returning 50.65, because there is no relationship to the Measurements table.  The field list is warning me about that, but I don’t want to create one.

Instead, I write measures to “harvest” the user’s selections on those slicers, so that I can use them as parameters:

[SelectedTrial1] =
IF(COUNTROWS(VALUES(TrialSlicer1[Compare To Trial]))>1,BLANK(),
    MAX(TrialSlicer1[Compare To Trial]))

[SelectedTrial2] =
IF(COUNTROWS(VALUES(TrialSlicer2[Compare To Trial]))>1,BLANK(),
   MAX(TrialSlicer2[Compare To Trial]))

The measures are identical except for the tables they reference, and which tables I “anchor” them to in the field list.  The results look like:

image

OK, now that I have those two parameter measures, I can modify the [Trial1Temp] measure from above to no longer be “hard-coded” to Trial 1.  Now I can make it variable by parameterizing it using [SelectedTrial1]:

[Trial1Temp] =
CALCULATE([AvgTemp],
   FILTER(Measurements,
          Measurements[Trial Number]=[SelectedTrial1]
   )
)

Note that I only have to use FILTER() inside the CALCULATE simply because the “normal” parameters to calculate will NOT accept a measure name on the right side of the equals sign.

This new measure responds nicely to the selections made on the first slicer, whereas AvgTemp keeps ignoring it as always:

image

Now I can write a new [TrialXTemp] measure that is exactly the same as Trial1Temp except that it respects [SelectedTrial2] instead of [SelectedTrial1].  Update my Delta and Pct Delta measures and now I have a completely interactive pivot:

image

Neat huh?  And still pretty simple. Smile

Download the workbook, more variations, etc.

Download the workbook here

And for another example of disconnected slicer tables, see here.

And naturally, if you wanted to write a measure that automatically returns the delta between trial N versus trial N-1, you can always use the Greatest Formula in the World.


Data of Different Grains: A Followup

January 19, 2012

 
image

Sales Measure Returns the Same Value for Everything:  We Solved the Calendar/Periods
Problem But Now We Have a Products/Category Problem

Quick Recap:  We Created a Separate “Periods” Table

In last week’s post, I explained how PowerPivot can very easily help you solve the “Budget vs. Actuals” problem or any other problem where you have data sets of different granularities that you want to compare in a single report.

The crux of the problem in my example was that my Budget table only budgeted down to the Month level whereas my Sales table went down to the day level.

To solve that, I created a separate table that only contained months (no sales or budget data, just months), and assigned a “PeriodID” to each month.  (I could have just as easily called it MonthId).

image

The Newly-Created Periods Table

I then linked that PeriodID column to the PeriodID column that already existed in my Budget table:

image

And then created a new PeriodID column in my Sales table using a formula:

image

Resulting in a three-table setup that looks like this:

image

Which then allows me to use fields from that Periods table on slicers (or on rows or columns) while using measures from both Sales and Budget in the same pivot:

image

And that, my friends, is a very useful thing indeed.

Could I have just added the PeriodID column to Sales and Stopped?

This is an important question for us to stop and consider.  Once I had a PeriodID column in both Sales and Budget, could I have skipped the creation of the Periods table and just moved on?

No.  I could not.  And there are two reasons.

Reason #1:  I can’t create a relationship between Sales and Budget.

image

Each PeriodID Appears More than Once in Each Table, So You Can’t Relate
Sales Directly to Budget or Vice Versa

That’s just the way relationships work.  The matching column needs to be unique (no value appears more than once) in at least one of the two tables.  No need to really worry about why, but if you think about how VLOOKUP works when you set the last argument to FALSE, it’s pretty similar.

Reason #2:  Without a relationship, PeriodID from one table doesn’t work with measures from the other.

Look what happens to Budget measures if I use PeriodID from the Sales table:

image

PeriodID from Sales Table Results in Broken Budget Measures
(and a relationship warning)

Yep, all busted.  And if I use PeriodID from the Budget table, I’ll get good Budget numbers but the Sales numbers will be hosed instead.

THAT is why the third, new, separate Periods table is required.  When I use fields from THAT table on my pivot, I can use measures from both Sales and Budget at the same time and nothing is broken.

So we use the Periods table as a master filter of sorts, one that can drive filters down into both Sales and Budget.

“Master Filter” Table is a MUCH better description than “Bridge”

I debated using a “bridge” metaphor to describe the role of the Periods table but I think that’s misleading – a bridge helps you travel from point A to point B:  A –> Bridge –> B.  That would make you think that we’re “starting” on the Sales table and “traversing” the Periods table to get to the Budget table.

But that’s not how it works.  Neither Sales nor Budget is “in charge” here – neither is the starting point.  There are two separate paths here:  Periods –> Sales, and Periods –> Budget.  The diagram drives that home:

image

So you should think of the Periods table as being “in charge” here.  Which makes sense, since you have to use it on the pivot, and can’t use PeriodID from either Sales or Budget.

Extending that Rule to Fields Other than PeriodID

This is the crux of what I left out of my previous post, and it’s an important rule to drive home:

When you have separate data tables that cannot be combined into a single table (as is the case with Sales and Budget), you CANNOT use row/column/slicer fields from one table in a pivot with measures from the other table.  You MUST have a separate “master filter” table, and use fields from THAT table in your pivot.

So does that apply to fields other than PeriodID?  You betcha.  Guess what happens if you use a Product Subcategory or similar field from the Budget table in a pivot that contains a Sales measure (or vice versa)?  Yeah, same problem as trying to use PeriodID from Sales as we did above:

image

What’s the solution?  A separate master filter table for Products!

And that may be as simple as a single column:

image

Wash, Rinse, Repeat

Once you have measures from two different data tables, every row/column/slicer field you use requires a separate master filter table.

It may sound tedious but it’s actually quite quick.  To easily create such a table you can even just use  a pivot: 

  1. Put the ProductSubCat field from either Sales or Budget on rows.
  2. Copy/paste that column of unique values into PowerPivot as a new table.
  3. Create the relationships to Sales and Budget.  
  4. Use fields from the newly created master filter table in your pivot.  Done.

(Better to get someone to add a new view or table to a database for you if that’s available, but copy/paste works well otherwise).

Also Useful for Single Table Situations!

One last point:  even when you only have a single table of data, like Sales, it’s often still quite useful to create separate master filter tables.

Why is that?

Because when you create master filter tables, it gives you the opportunity to remove lots of columns from your “big” table.

Imagine a sales table that, in addition to columns like Quantity and Amount, also contains columns like CustomerID, CustomerGender, CustomerAge, CustomerAddress1, CustomerAddress2, etc.

If you leave CustomerID in the sales table but “move” all of the other customer columns into a separate Customer table, and then link the two tables by Customer ID, you can reduce the number of columns in your Sales table by a significant number.

And that can sometimes shrink your files dramatically, AND speed them up.  See this post for an example.


Sales/Budget: Integrating Data of Different “Grains”

January 12, 2012

 
image

Latest Article for CIMA Insight

Some of you know that I’ve been writing a series of articles for the Chartered Institute of Management Accounts, aka CIMA.  Up until this point those articles have either focused on the organizational impact, the “why” of PowerPivot.  Things like how it contrasts with traditional BI and traditional spreadsheets.  How it saves costs and delivers more.

And then I started to introduce basic topics that have already been covered at length here on this blog, just simple how-to-get-started stuff.

But this month is the first time I wrote something for CIMA that I have not really covered here.

The Problem:  60k Sales Rows vs. 2k Budget Rows

PowerPivot Sales Data Very Granular Day Level

Sales Table with 60 Thousand Rows

PowerPivot Budget Table Less Granular Month Level

Budget Table with Two Thousand Rows

The Desired Result:  A Single Unified Report

PowerPivot Report Combining Day Level Actual Sales and Month Level Budget Granularities

The Solution

Diagram of How to Integrate Different Granularities of Data in PowerPivot

Here’s where I have simplified it a bit for the introductory audience.  I likely have a Calendar table in a well-designed model, and that doesn’t appear in the diagram.  I also likely have to solve for more than just Time granularity – the Budget is also likely less granular at the Product level as well.

But to introduce this powerful concept, I think a simple example is best so I kept it clean.

Click here to read the full (short) article and see the details of how I create the linkage between the two tables.

Popular Topic This Month!

Everyone has this on the brain this month it seems Smile

First, in the exact same issue of CIMA, there’s an article about handling Sales vs. Budget using traditional Excel.

And Marco Russo has written one about this same topic in PowerPivot and BISM Tabular.  As usual, his approach is excellent, and quite different from what my more primitive mind conjures.

I didn’t even have to use the Boot Signal:


The Greatest Formula in the World, Part 3

December 13, 2011

 
Our time machine that we developed in part one and part two still needs some explanation, so let’s dive in.

First Things First:  The MAXX Wasn’t Needed

Looking back, I noticed that I was overcomplicating things needlessly.  The final formula I had involved the following FILTER() term, in which I used a MAXX:

FILTER(ALL(Periods),
       Periods[NextYear Period]<=
          MAXX(VALUES(Sales[Period Num]), Sales[Period Num])
)

Turns out that MAX() works just fine, so let’s replace that clause and simplify things a bit.  Here’s the new formula:

CALCULATE([Total Sales],
   ALL(Periods),
   FILTER(ALL(Periods),
      Periods[Year]=MAX(Periods[Year])-1
   ),
   FILTER(ALL(Periods),
          Periods[NextYear Period]<=MAX(Sales[Period Num])
   ),       
   VALUES(Periods[MerchPeriod])
)

OK, with that, we can move on to explanation:  How does this formula work??

I used to call this technique “expand then filter”

Well actually I still do, in my own head.  It’s just that the GFITW is a catchier title.

OK, so the “expand” part is just that first ALL():

CALCULATE([Total Sales],
          ALL(Periods),

The first step in the formula, then, is basically just telling the calc engine to throw away all filters on the Periods table.  In other words, “forget all concept of time, pretend the pivot is not filtered at all with respect to time.”

We do this so that we have a clean slate.  Then, in the subsequent FILTER clauses, we build up a new filter context for time.

One more time for clarity:  the way this formula works is to first throw out all time filters, and then in subsequent steps, we build up new filters to match the time period that we want, which in this case is last year.

Once you understand that, this overall formula starts to get pretty simple.  Each piece of the formula is quite straightforward in its own right.  ALL() is pretty straightforward for sure, and so are the subsequent FILTERS().

OK, we’ve expanded.  Now on to the filters!

The first filter says, “hey, now that we’ve thrown out all time filters, let’s filter time back down to just be last year.”

FILTER(ALL(Periods),
       Periods[Year]=MAX(Periods[Year])-1
)

Let’s talk about the FILTER() function itself for a moment.

How does FILTER() Work?

Honestly this function has deserved its own post for a long time.  I’ll give a brief explanation here.

The syntax for the FILTER function is FILTER(TableToFilter, FilterExpression).   Pretty simple.  Here’s some more detail:

  1. FILTER() takes a TableToFilter and a FilterExpression, and returns all rows from that TableToFilter that match the FilterExpression.
    1. In the example above, TableToFilter is ALL(Periods)
    2. and FilterExpression is Periods[Year]=MAX(Periods[Year])-1
  2. FILTER() steps through the TableToFilter one row at a time. 
    1. And for each row, it evaluates the FilterExpression.  If the expression evaluates to true, the row is “kept.”  If not, it is filtered out.
    2. Because FILTER() goes one row at a time, it can be quite slow if you use it against a large table.  When I say “large” that is of course subjective.  A few thousand rows is fine in my experience.  A million is not.  Do not use FILTER() against your fact table.
  3. The FilterExpression typically takes the form of Table[Column] = <expression>
    1. The comparison operator doesn’t have to be “=”.  It can also be <, >, <=, >=, <>
    2. The expression on the right hand side of FilterExpression can be “rich.”  This is VERY useful.  In a simple CALCULATE, the right side of each filter expression has to be simple, like a literal number (9) or a string (“Standard”).  The fact that FILTER() allows for rich expressions here is one of the most common reasons I use FILTER().
    3. The Table[Column] in the filter expression is a column in the TableToFilter.  If you are filtering the Periods table, it makes sense that you are testing some property of each row in Periods.  I can’t think of a sensible reason to use a column here that is NOT from TableToFilter.  (Insert “boot signal” here, maybe the Italians can address this).
  4. FILTER() ignores everything else going on in your formula and acts completely on its own.
    1. For example, our overall formula sets ALL(Periods) as the first argument to CALCULATE.
    2. The FILTER()’s that come after that do NOT pay any attention to other arguments however, including that ALL(Periods).
    3. In other words, the FILTER() functions are still operating against the original filter context from the pivot!  If the pivot is sliced to Year=2009, then the FILTER() function starts with the Periods table already pre-filtered to just 2009.
    4. This is why each of my FILTER()’s uses ALL(Periods) for TableToFilter.  I have to repeat the “expand” step so that my FILTER() is also working from a clean slate.
  5. Even though each FILTER() operates on its own, their results then “stack up” in the overall formula.
    1. Even though FILTER() RETURNS a set of rows that matched the FilterExpression, it actually REMOVES rows from the overall filter context.
    2. This sounds tricky but really it isn’t.
    3. Let’s say our TableToFilter contains 6 rows:  A, B, C, D, E, and F.
    4. And our overall formula contains two FILTER() clauses that both operate on the same TableToFilter, just like our overall formula near the beginning of this post.
    5. Let’s also say that the first FILTER() returns rows A, B, C, and D.
    6. And the second FILTER() returns rows C, D, E, and F.
    7. The net result is that only rows C and D are left “alive” in the overall filter context of the formula.
    8. So one way to think of this is that FILTER()s “stack up” on top of each other.
    9. Another way to think of it is that even though the first filter RETURNED rows A, B, C, and D, its real effect was to REMOVE all other rows (E and F) from consideration.

OK, back to that first filter!

Here it is again:

FILTER(ALL(Periods),
       Periods[Year]=MAX(Periods[Year])-1
)

Let’s revisit points 1-5 above for this FILTER expression to see how it all works.  And let’s examine just a single cell of the pivot to see how this FILTER operates for that one cell:

image

Focusing on The Orange-Circled LASTYRSALES Cell As An Example
With Its Period Filter Context Highlighted in Green

In that picture above, the orange cell we are looking at has a filter context “coming in” from the pivot.  It has Period[Year] set to 2011 and Period[MerchPeriod] set to 1, as highlighted in green.

Given the detailed description of FILTER() from points 1-5 above, we can see that:

  1. We set ALL(Periods) as our TableToFilter so that we are starting from a clean slate with respect to time.  So our Periods table now has “all rows alive.”
  2. Then our FilterExpression tests against the Periods[Year] column.
  3. MAX(Periods[Year]) – 1 still operates independently!, so it still picks up Periods[Year]=2011 from the pivot.  Therefore it returns 2011 – 1 = 2010!
  4. Since we started with ALL(Periods) as the TableToFilter, and the FilterExpression only “keeps” rows where Year=2010, we are left with all 2010 rows “alive” after evaluating this FILTER().
  5. If we didn’t do ALL(Periods) for TableToFilter, and instead just used Periods without the ALL(), our FILTER would start out with only rows from 2011 (since that is what the pivot is telling us).
    1. And then in the next step when we go back a year to 2010, FILTER() would find no rows.  There are no rows that match Periods[Year] = 2010 and Periods[Year]=2011. 
    2. So our FILTER would return no rows, which means it would have the effect of REMOVING all rows from Periods in the overall formula, and our measure would return blank for all cells in the pivot.

That’s a lot of explanation, I know.  Walk through it a few times.  It’s actually pretty intuitive once you’ve done it a few times.  The tricky part, for me, was discovering all of these details for myself.  And since I’ve done all of that, you don’t have to.

That’s enough for this time.  I think you can probably figure out how the second FILTER() evaluates based on the above, but I will step through it next time. 

I will also explain why we use that last VALUES() in the formula, and probably also share some of the answers I got from the Italians, and from David Churchward, in response to my question “did I need to add that calc column in the Periods table?”


The Greatest Formula in the World, Part Two

December 7, 2011

image

Our Time Machine From Part 1 Needs a Couple of Fixes

OK, if you recall from part one, we had a “year over year” time machine calculation going on, and it was built against a data set that lacked a “real” calendar table.  Very cool.  Very resourceful.  And very necessary.

But the time machine had two problems.  Let’s fix them.

Problem #1:  Meaningless Grand Total

Let’s start with the easy one:

Year over year in PowerPivot - Meaningless Grand Total

Grand Total is Meaningless for Last Year Sales

It is meaningless to have a “grand total” value for a measure that returns “last year’s sales.”  What year would that BE, actually?  It’s nonsense.  So we use an old trick, one of the many flavors of IF(VALUES()):

=IF(COUNTROWS(VALUES(Periods[Year]))>1,BLANK(),
   CALCULATE([Total Sales],
      ALL(Periods),
      FILTER(ALL(Periods), Periods[Year]=MAX(Periods[Year])-1),
      VALUES(Periods[MerchPeriod])
   )
)

Where the new IF “wrapper” is highlighted and the original formula is in normal font.

For more on that “IF VALUES” technique, please see this post.

The results are as desired, the grand total is now blank:

image

Problem solved.  Moving on…

Problem #2:  2011 isn’t complete but we’re getting all of 2010 sales

Remember, our sales data for 2011 only goes through the first 6 months.  So we do NOT want “last year sales” for months 7-12:

image

That can be solved a number of ways.  The simplest is just to add another IF(), and have the Last Year Sales measure return BLANK() whenever there are no Sales:

=IF(COUNTROWS(VALUES(Periods[Year]))>1,BLANK(),
   IF([Total Sales] = BLANK(), BLANK(),
      CALCULATE([Total Sales],
         ALL(Periods),
         FILTER(ALL(Periods),
            Periods[Year]=MAX(Periods[Year])-1
         ),
         VALUES(Periods[MerchPeriod])
      )
  )
)

That trims the results to just the months desired:

image

But note that the subtotal for 2011 is still too high:

image

“Last Year Sales” for 2011 Should Only be Returning
2010 Sales Through the First Six Months

This one is trickier.  I struggled to find a good answer before coming up with the following:

Step 1:  Add a “Next Year Period Num” Column to the Periods Table

Recall that my Sales table has a “Period Num” column that is the basis for the relationship with the Periods table:

image

And that matches up with a similar column in the Periods table:

image

Note that Period Num does NOT reset to 1 with each new year (unlike MerchPeriod).  So that is the absolute unique ID for a given Month/Year combo.

Well, I created a new column in the Periods table that tells me, for a given Period Num, what the equivalent Period Num will be NEXT year:

image

Now, in my FILTER statements, I can choose to match on THAT column instead.

Bringing it Home

OK, now that I have that column, I can add a new FILTER clause to my measure:

=IF(COUNTROWS(VALUES(Periods[Year]))>1,BLANK(),
   CALCULATE([Total Sales],
      ALL(Periods),
      FILTER(ALL(Periods),
         Periods[Year]=MAX(Periods[Year])-1
      ),
      FILTER(ALL(Periods),
         Periods[NextYear Period]<=
            MAXX(
                VALUES(PeriodSales[Period Num]),
                PeriodSales[Period Num]
                )
      ),
       
      VALUES(Periods[MerchPeriod])
   )
)

OK, what does that do?  Simply put, it further filters the Periods table to NOT go beyond any periods for which we currently have records in the Sales table.

And the results:

Last Year Sales Measure in PowerPivot With a Custom Calendar

Last Year Sales Measure Doing the Right Thing

And now if you want YOY Growth, it’s straightforward:

=([Total Sales] – [LY Sales Finished]) / [Total Sales]

Which yields:

Year over Year / Year on Year Growth Percentage Measure in PowerPivot With a Custom Calendar

Year over Year / Year on Year Growth Percentage
Measure in PowerPivot With a Custom Calendar

OK, I cheated and wrapped another IF(COUNTROWS(VALUES)) around the outside of the formula so that the grand total cell is blank again, because again, that’s a meaningless value to report.  Details.

Isn’t that a lot of work?

Depends on how you look at it.  First of all, it’s a lot harder to grasp the first time than it is as you get used to it, trust me.

But even better, this investment pays off forever.  Normal Excel formulas may be easier to write the first time, but then you have to re-write and adjust them forever, every time your data changes.  And every time your desired report shape changes.

This one is a portable formula.  It goes wherever you want it to go, and eats whatever data you feed it.  Forever.  It’s worth a little extra effort.

OK but do I really have to understand all of that?

No, not really.  Not right away.  Think of it this way:  this is what I do all day, every day.  And I can tell you that figuring this out is not something you want to do in your spare time.

But once someone gives you the pattern?  Wash, rinse, repeat.  Make small adjustments as needed.

Honestly, that whole formula above could be wrapped up in a function provided by Microsoft, and all of the nasty stuff hidden.  I told them as much last time I visited Redmond.  And if it were a function rather than a formula, we wouldn’t care at all how it worked.

So… treat this as a pattern.  Copy/paste and modify to fit your needs.  Seriously.

Unfinished Business

I actually WILL retrace my steps here though and explain a number of things though, so that you CAN understand.  I have a couple of posts in the queue aimed at just that.

Were there alternate ways to write this measure?

Yes, there were, especially in that last step where I needed to get the “Last year sales” measure correct for the 2011 subtotal.  I could have used SUMX to make the year subtotal equal to the sum of its underlying months.  That is worthy of a post for sure.

But I was wondering if there was still another way.  A way that doesn’t use SUMX and doesn’t require a new calc column.

And for that, we need the Italians.  So I’m gonna try out my new toy, the Boot Signal:

image

Calling Marco and Alberto

Get it?  *Boot* Signal?


The Greatest Formula in the World, Part One

December 6, 2011

 
image

Play Write the greatest song formula in the world, or I’ll eat your souls.”
-From “Tribute,” by Tenacious DAX

A Do-It-Yourself Time Machine

You remember the Great Precedence Project?  Well, I have shelved it for now, for two reasons:

  1. Creating an exhaustive list of all the precedence rules in PowerPivot formulas was going to be…  exhausting.
  2. I didn’t really need to boil the philosophical ocean in search of deep underlying meaning, because on further inspection, I already had the Greatest Formula in the World, and needed no other.

So what is the GFITW about?  It’s about time navigation.  It’s useful when you have a custom calendar and the time intelligence functions are therefore not so helpful.

The GFITW is a do-it-yourself time machine.

Year on Year Sales With a Custom Calendar?  Yes We Can!

In that post linked above, which happened to be the last installment of the precedence project, I was calculating a running total, and doing so without benefit of the fancier functions like DATESYTD etc. (which rely on the “real” calendar, not your business calendar).

OK, so now let’s do a “Year on Year” sales measure, sometimes called a “Year over Year” measure.  And again, our “calendar” table isn’t a true calendar at all:

Semantically this IS a calendar but functions like PREVIOUSYEAR can't use it

A Table of Periods (Not Dates) Means You Cannot Use the Built-In
Time Intelligence Functions to Calculate Year-on-Year Sales

And the Sales table is very simple:

Sales Table linked to a period number, not linked to a date

Sales Table is Linked to the Periods Table by the Period Num Column

And I have this basic pivot already set up:

image

Year and MerchPeriod on Rows

So, how do I write a measure that “fetches” the sales from last year?

Let’s Skip Some Steps

I am going to be honest with you:  it took me a very long time to figure this out.  The formula looks reasonably simple, and it is, but I had to try a million variations before I got it right.  Literally, it took me about a week to refine it.

So rather than walk you through that painful process, let’s skip ahead and show you the GFITW in all its glory.  Then in the next post I can show you what NOT to do, and we can also fine tune it.  OK?  OK.

So here it is, the Greatest Formula.  In the World:

=CALCULATE([Total Sales],
   ALL(Periods),
   FILTER(ALL(Periods),
      Periods[Year]=MAX(Periods[Year])-1),
   VALUES(Periods[MerchPeriod])
)

And its results:

image

No, it’s not perfect yet.  We will need to calculate percentage growth of course, rather than merely fetching last year’s sales.  And that given that 2011 is not yet complete, I’m a little squeamish about Last Year Sales returning ALL of 2010’s sales (the $5.9M number).

But for now, let’s count our blessings.  This formula DOES fetch the sales from last year, and it works (mostly) both at the year level and the MerchPeriod (aka Month) level.

Dissecting the Formula

Here is the formula again:

=CALCULATE([Total Sales],
   ALL(Periods),
   FILTER(ALL(Periods),
      Periods[Year]=MAX(Periods[Year])-1),
   VALUES(Periods[MerchPeriod])
)

But really, the GFITW is not a formula.  It is a pattern that you can re-use and modify.  So let’s look at it that way:

=CALCULATE(Original Measure,

   ALL(Your Custom Calendar Table),
  
   FILTER(ALL(Your Custom Calendar Table),
      Year Column Or Similar =
        
Expression that “moves” the Year Column
  
),
  
   VALUES(
Time Column That is More Granular
          Than Year, Like Month or MerchPeriod

  
)

)

Where everything in blue is “fixed” as part of the pattern, and everything in italics is something you can change.

Like I said, this is running a bit long for one post.  Come back Thursday for the rest Smile