Converting from Peaks to Frequencies

February 16, 2012

image32

Sniff something? Did ya, rat boy?”

Continuity!

Folks I just can’t resist a quick followup on the Rat Sniffing Project.  I…  just… can’t.  Plus I am absolutely worn down and don’t have the energy tonight for anything that isn’t inherently entertaining.

All of those calculated columns from the last post, remember, were just the setup so we could start doing some REAL observational stuff.

You know, something like this:

Measuring Event Frequency in Hz in PowerPivot

This wasn’t difficult at all.

Remember that I now had a column that flagged each row as “1” if it represented the peak of an inhalation:

image

Each Row is 0.01 second, Flagged as 1 if
it’s a Peak Inhale, aka a “Sniff”

So now I really just need some measures.

The First Measure I’ve Ever Written in the Unit of Hertz!

(Hmmm.  There’s a joke in here somewhere, the first line of which is something like “what do you do if your unit Hertz?”  Listen, I told you I was tired.)

Since I have a 1 in the RobPeak column, summing it will yield the number of peaks in an interval – I don’t need to do a CALCULATE(COUNTROWS()) with a filter set to 1.

So the measure formula for Hz (events per second) is:

(SUM(Data[RobPeak]) / COUNTROWS(Data)) * 100

Why times 100?  Because each row is actually 0.01 second.

Now THIS is a *Time* Table

This is another first:  creating a separate time table that is NOT measured in days.  In this case…  hundredths of a second:

A Time Table in PowerPivot - Not Dates.  Seconds.

Not a Calendar Table – a Time Table.  In  0.01 Second Increments

OK, it’s not a Calendar table but it is VERY similar.  Just like a Calendar table has columns like DayofWeek, DayOfMonth, CalendarYear, etc. – those columns represent properties of dates.  Well, the columns I have here like Second and FiveSecond are the same sort of thing.

And since I have 6 rats in the experiment, it is wasteful to duplicate those properties in my Data table.  Plus there’s that whole speed thing.

So I created this separate table and related it to my Data table.

And this table then “powers” my slicers like these:

Slicers Based on a Time Table Measured in Minutes and Seconds

Slicers Based on a Time Table Measured in Minutes and Seconds

These slicers represent a navigation method more than they do a “filter” in the classic sense.  I say that because fields from the time table are on the axis of the chart as well:

image

The net effect is that I can use my slicers to quickly move around the data and examine “windows” that are interesting or relevant.  Couple of quick clicks and I am now looking at minute 4 instead of minute 3, and just the first 10 seconds of that minute:

image

Quickly “Jumped” to the 4:00-4:10 Interval

It’s ALMOST like the slider control on a YouTube video:

image

YouTube Has a Time Slicer Control Too!

Wouldn’t it be neat if Excel gave us a YouTube-style slicer for time?  We can dream.

Side Note:  Rats Breathe FAST!

Check those charts out.  Between 5 and 9 Hz???  Really?  Those little rascals inhale between 5 and 9 times per second, on a SUSTAINED basis?  Wow.  I’m easily amused I guess, but wow.


The Greatest Formula in the World, Part 3

December 13, 2011

 
Our time machine that we developed in part one and part two still needs some explanation, so let’s dive in.

First Things First:  The MAXX Wasn’t Needed

Looking back, I noticed that I was overcomplicating things needlessly.  The final formula I had involved the following FILTER() term, in which I used a MAXX:

FILTER(ALL(Periods),
       Periods[NextYear Period]<=
          MAXX(VALUES(Sales[Period Num]), Sales[Period Num])
)

Turns out that MAX() works just fine, so let’s replace that clause and simplify things a bit.  Here’s the new formula:

CALCULATE([Total Sales],
   ALL(Periods),
   FILTER(ALL(Periods),
      Periods[Year]=MAX(Periods[Year])-1
   ),
   FILTER(ALL(Periods),
          Periods[NextYear Period]<=MAX(Sales[Period Num])
   ),       
   VALUES(Periods[MerchPeriod])
)

OK, with that, we can move on to explanation:  How does this formula work??

I used to call this technique “expand then filter”

Well actually I still do, in my own head.  It’s just that the GFITW is a catchier title.

OK, so the “expand” part is just that first ALL():

CALCULATE([Total Sales],
          ALL(Periods),

The first step in the formula, then, is basically just telling the calc engine to throw away all filters on the Periods table.  In other words, “forget all concept of time, pretend the pivot is not filtered at all with respect to time.”

We do this so that we have a clean slate.  Then, in the subsequent FILTER clauses, we build up a new filter context for time.

One more time for clarity:  the way this formula works is to first throw out all time filters, and then in subsequent steps, we build up new filters to match the time period that we want, which in this case is last year.

Once you understand that, this overall formula starts to get pretty simple.  Each piece of the formula is quite straightforward in its own right.  ALL() is pretty straightforward for sure, and so are the subsequent FILTERS().

OK, we’ve expanded.  Now on to the filters!

The first filter says, “hey, now that we’ve thrown out all time filters, let’s filter time back down to just be last year.”

FILTER(ALL(Periods),
       Periods[Year]=MAX(Periods[Year])-1
)

Let’s talk about the FILTER() function itself for a moment.

How does FILTER() Work?

Honestly this function has deserved its own post for a long time.  I’ll give a brief explanation here.

The syntax for the FILTER function is FILTER(TableToFilter, FilterExpression).   Pretty simple.  Here’s some more detail:

  1. FILTER() takes a TableToFilter and a FilterExpression, and returns all rows from that TableToFilter that match the FilterExpression.
    1. In the example above, TableToFilter is ALL(Periods)
    2. and FilterExpression is Periods[Year]=MAX(Periods[Year])-1
  2. FILTER() steps through the TableToFilter one row at a time. 
    1. And for each row, it evaluates the FilterExpression.  If the expression evaluates to true, the row is “kept.”  If not, it is filtered out.
    2. Because FILTER() goes one row at a time, it can be quite slow if you use it against a large table.  When I say “large” that is of course subjective.  A few thousand rows is fine in my experience.  A million is not.  Do not use FILTER() against your fact table.
  3. The FilterExpression typically takes the form of Table[Column] = <expression>
    1. The comparison operator doesn’t have to be “=”.  It can also be <, >, <=, >=, <>
    2. The expression on the right hand side of FilterExpression can be “rich.”  This is VERY useful.  In a simple CALCULATE, the right side of each filter expression has to be simple, like a literal number (9) or a string (“Standard”).  The fact that FILTER() allows for rich expressions here is one of the most common reasons I use FILTER().
    3. The Table[Column] in the filter expression is a column in the TableToFilter.  If you are filtering the Periods table, it makes sense that you are testing some property of each row in Periods.  I can’t think of a sensible reason to use a column here that is NOT from TableToFilter.  (Insert “boot signal” here, maybe the Italians can address this).
  4. FILTER() ignores everything else going on in your formula and acts completely on its own.
    1. For example, our overall formula sets ALL(Periods) as the first argument to CALCULATE.
    2. The FILTER()’s that come after that do NOT pay any attention to other arguments however, including that ALL(Periods).
    3. In other words, the FILTER() functions are still operating against the original filter context from the pivot!  If the pivot is sliced to Year=2009, then the FILTER() function starts with the Periods table already pre-filtered to just 2009.
    4. This is why each of my FILTER()’s uses ALL(Periods) for TableToFilter.  I have to repeat the “expand” step so that my FILTER() is also working from a clean slate.
  5. Even though each FILTER() operates on its own, their results then “stack up” in the overall formula.
    1. Even though FILTER() RETURNS a set of rows that matched the FilterExpression, it actually REMOVES rows from the overall filter context.
    2. This sounds tricky but really it isn’t.
    3. Let’s say our TableToFilter contains 6 rows:  A, B, C, D, E, and F.
    4. And our overall formula contains two FILTER() clauses that both operate on the same TableToFilter, just like our overall formula near the beginning of this post.
    5. Let’s also say that the first FILTER() returns rows A, B, C, and D.
    6. And the second FILTER() returns rows C, D, E, and F.
    7. The net result is that only rows C and D are left “alive” in the overall filter context of the formula.
    8. So one way to think of this is that FILTER()s “stack up” on top of each other.
    9. Another way to think of it is that even though the first filter RETURNED rows A, B, C, and D, its real effect was to REMOVE all other rows (E and F) from consideration.

OK, back to that first filter!

Here it is again:

FILTER(ALL(Periods),
       Periods[Year]=MAX(Periods[Year])-1
)

Let’s revisit points 1-5 above for this FILTER expression to see how it all works.  And let’s examine just a single cell of the pivot to see how this FILTER operates for that one cell:

image

Focusing on The Orange-Circled LASTYRSALES Cell As An Example
With Its Period Filter Context Highlighted in Green

In that picture above, the orange cell we are looking at has a filter context “coming in” from the pivot.  It has Period[Year] set to 2011 and Period[MerchPeriod] set to 1, as highlighted in green.

Given the detailed description of FILTER() from points 1-5 above, we can see that:

  1. We set ALL(Periods) as our TableToFilter so that we are starting from a clean slate with respect to time.  So our Periods table now has “all rows alive.”
  2. Then our FilterExpression tests against the Periods[Year] column.
  3. MAX(Periods[Year]) – 1 still operates independently!, so it still picks up Periods[Year]=2011 from the pivot.  Therefore it returns 2011 – 1 = 2010!
  4. Since we started with ALL(Periods) as the TableToFilter, and the FilterExpression only “keeps” rows where Year=2010, we are left with all 2010 rows “alive” after evaluating this FILTER().
  5. If we didn’t do ALL(Periods) for TableToFilter, and instead just used Periods without the ALL(), our FILTER would start out with only rows from 2011 (since that is what the pivot is telling us).
    1. And then in the next step when we go back a year to 2010, FILTER() would find no rows.  There are no rows that match Periods[Year] = 2010 and Periods[Year]=2011. 
    2. So our FILTER would return no rows, which means it would have the effect of REMOVING all rows from Periods in the overall formula, and our measure would return blank for all cells in the pivot.

That’s a lot of explanation, I know.  Walk through it a few times.  It’s actually pretty intuitive once you’ve done it a few times.  The tricky part, for me, was discovering all of these details for myself.  And since I’ve done all of that, you don’t have to.

That’s enough for this time.  I think you can probably figure out how the second FILTER() evaluates based on the above, but I will step through it next time. 

I will also explain why we use that last VALUES() in the formula, and probably also share some of the answers I got from the Italians, and from David Churchward, in response to my question “did I need to add that calc column in the Periods table?”