PowerPivot V2 is Released!

March 8, 2012

 
Well the wait is over.  It is now released, for real.  No longer a beta or RC.  The real deal.

Download it here.

Other important links:

  1. My first glimpse at new features here.
  2. Important warnings about compatibility with V1.

Auto-Modify Every Pivot in a Workbook

March 6, 2012

 
image

It’s *ALMOST* That Easy Smile

It’s been awhile since I’ve talked about Macros (also known as VBA).  I think it’s overdue.

Macros are nothing short of amazing.  We couldn’t live without them at Pivotstream.

Wait Rob, Macros Don’t Run on the Server!

That’s right, they don’t.  And everything we do at Pivotstream eventually lands on the server.  So why do we use macros?

Simple:  we use macros to more efficiently create and modify our workbooks.  Macros are a “design time” tool for us, not a “run time” tool.

And they have saved us probably decades of work.  I’m not exaggerating.

An Example:  The “Change Every Pivot” Macro

Here’s a pretty simple macro that runs through every PivotTable on every visible worksheet and does whatever you want to each pivot:

Sub ModifyAllPivots()
    Dim Pivot As PivotTable
    Dim Sheet As Worksheet
   
    For Each Sheet In ActiveWorkbook.Worksheets
        If Sheet.Visible = xlSheetVisible Then
            For Each Pivot In Sheet.PivotTables
                AutoPadPivot Sheet.Name, Pivot.Name, 10
                GrandTotalsBottomOnly Sheet.Name, Pivot.Name

            Next
        End If
    Next

End Sub

I emphasized the “payload” of the macro – for each pivot the macro finds, it “pads” the columns of the pivot to be wide enough, and sets the pivot to display grand totals only on the bottom of the pivot (and never on the right).

Those two lines are macros that I also wrote, and I will include them below.

Auto Pad Pivot Columns Macro

At Pivotstream, we tend to ALWAYS turn off the “auto-fit columns on update” setting on our pivots:

Making Pivots NOT Change Column Widths On Slicer Clicks Etc

Making Pivots NOT Change Column Widths On Slicer Clicks Etc

This yields a much more pleasant “application-like” result – clicking a slicer never results in things jumping around.  I highly recommend it.

But this DOES lead to a problem.  If your numbers suddenly grow by a digit in the future, you can get something like this:

One of the Drawbacks of Turning off Autofit Column Width

One of the Drawbacks of Turning off Autofit Column Width

To account for this, we have a macro that runs through every pivot in the workbook and “pads” the column width by a percentage:

Sub AutoPadPivot(sSheet As String, sPivot As String, iPct As Integer)
    Dim oPivot As PivotTable
    Dim oSheet As Worksheet
    Dim r As Range
    Dim rCurr As Range
    Dim iCol As Integer
  
    Set oSheet = ActiveWorkbook.Worksheets(sSheet)
    Set oPivot = oSheet.PivotTables(sPivot)
   
    Set r = oPivot.DataBodyRange.Rows(1)
    For Each rCurr In r.Cells
        iCol = rCurr.Column
       
        oSheet.Columns(iCol).EntireColumn.AutoFit
        oSheet.Columns(iCol).ColumnWidth = oSheet.Columns(iCol).ColumnWidth * (1 + (iPct / 100))
    Next
   
   
End Sub

At the beginning of this post, you saw an example where I called this macro to pad each column by 10 percent.  But I can pad by 5, 15, whatever I want.  And you can easily imagine a version that pads by an absolute amount rather than a percentage.

Grand Totals Bottom Only Macro

This one is more self-explanatory:

Sub GrandTotalsBottomOnly(sSheet As String, sPivot As String)
    Dim oPivot As PivotTable
    Dim oSheet As Worksheet
   
    Set oSheet = ActiveWorkbook.Worksheets(sSheet)
    Set oPivot = oSheet.PivotTables(sPivot)
   
    oSheet.PivotTables(sPivot).RowGrand = False
End Sub

Want one more?  OK.  You talked me into it.  We have dozens, many of which are quite ambitious.  Simpler macros make for better blog posts though, so…

Create Page Filter For Each Slicer On Pivot Macro

Sub CreatePageFilterForEachSlicerOnPivot(sSheet As String, sPivot As String)
    Dim oSlicer As Slicer
    Dim oSlicerCache As SlicerCache
    Dim sField As String
    Dim oPivot As PivotTable
    Dim oSheet As Worksheet
   
    Set oSheet = ActiveWorkbook.Sheets(sSheet)
    Set oPivot = oSheet.PivotTables(sPivot)
   
    For Each oSlicer In oPivot.Slicers
        sField = oSlicer.SlicerCache.SourceName
        oPivot.CubeFields(sField).Orientation = xlPageField
    Next
   
End Sub

Why would I want to create a page filter for very slicer on the pivot?  Well, primarily so you can “harvest” the slicer selections in formulas.  But there are other reasons you might do this as well, which I will likely cover in the future.

Learn to Record Macros Folks!

What’s that, you say?  You never need to pad pivots, switch their grand total settings, or add page filters for every slicer?  Not satisfied eh?  Well, you can make your own!

If you have never recorded a macro, seriously, it’s SOOOO easy.  You should try it. 

First you will need to enable the Developer ribbon by going to File|Options|Customize the Ribbon.

Once you’ve done that, here’s how you get started:

Going Into Macro Recording Mode

Going Into Macro Recording Mode
(Note That I Named the Macro Based On What I Am Going to Do Next)

Now I do something to the pivot.  In this case, I switch my Pivot to one of the “Medium Green” Styles:

image

Changing My Pivot to a Different Style
(While the Macro Recorder Watches My Every Move)

Now I can stop recording:

Stop Recording the Macro

Stop Recording the Macro

Inspecting Your Freshly-Recorded Macro

Inspecting Your Freshly-Recorded Macro

Just click the Macros button, select your macro, and then click Edit

And here it is:

Sub ChangeToGreenStyle()

‘ ChangeToGreenStyle Macro


    ActiveSheet.PivotTables(“PivotTable3″).TableStyle2 = “PivotStyleMedium4″
End Sub

There’s really only one line in the macro that DOES anything, so I highlighted it.

Note that it’s “tied” to a pivot named “PivotTable3.”  Not all of your pivots will be named that of course.  And it only works on the ACTIVE sheet.  So it won’t work if you try to loop through using the ModifyAllPivots macro.

So, you can modify it to look like the other macros I showed above (GrandTotalsBottomOnly, etc).  Change the macro to be:

Sub ChangeToGreenStyle (sSheet As String, sPivot As String)

   Dim oPivot As PivotTable
   Dim oSheet As Worksheet

   Set oSheet = ActiveWorkbook.Worksheets(sSheet)
   Set oPivot = oSheet.PivotTables(sPivot)


   oSheet.PivotTables(sPivot).TableStyle2 = “PivotStyleMedium4″

End Sub

The stuff in grey is just copied from the other macros.  The red is a replacement (also copied from the other macros) for the part the recorder set to Active Sheet and PivotTable3.

That macro can now be called from within ModifyEveryPivot just like the others were:


      For Each Pivot In Sheet.PivotTables
         AutoPadPivot Sheet.Name, Pivot.Name, 10
         GrandTotalsBottomOnly Sheet.Name, Pivot.Name

         ChangeToGreenStyle Sheet.Name, Pivot.Name

      Next

Off you go Smile


Subtotals and Grand Totals That Add Up “Correctly”

March 1, 2012

 
Busy week here at the MVP Summit in Redmond.  As usual, I can’t tell you anything I have learned this week – it’s strictly under NDA.  But I have seen some really exciting things – at one point today I screamed out loud “YES!” in a packed room, and a little while later I said “I don’t think we can be stopped if we had something like this.”  Where the “we” means us – me and you.

Enough of that.  Here’s one from my archived list of “topics to cover on the blog at some point.”

Sales per Day Measure

Check out this relatively simple pivot:

image

Note that the subtotals for [Sales per Day] do not equal the sum of their parts.  2002’s total is $17,891 but if I add Mountain Bikes plus Road Bikes myself, I get a number closer to $19,000:

PowerPivot totals don't add up how do I fix it?

Totals don’t add up!

The formula for [Sales per Day] is:

[Sales per Day] =
[Sales] / COUNTROWS(DISTINCT(Sales[OrderDate]))

Which, in English, equates to “My Sales measure divided by the number of days for which I have sales transactions.”

Why Doesn’t it Add Up?

Well, if the “number of days for which I have sales transactions” is different for Mountain Bikes vs. Road Bikes, that will cause this problem.  I can illustrate by adding that as a separate measure:

[Days I have Transactions] =
COUNTROWS(DISTINCT(Sales[OrderDate]))

image

The Reason Why Sales per Day Doesn’t “Add Up”

So I only sold Mountain Bikes on 96 days in 2001, and Road Bikes on 180 days, but on 181 days, I sold at least one bike of any sort.  This is why it doesn’t add up.

“Shouldn’t you divide by Calendar[Date] instead of Sales[OrderDate]?”

Some of you will no doubt have this question already:  Isn’t it more accurate for [Sales per Day] to be based on the number of days in the calendar rather than the number of days on which it sold?”

And my answer is “probably, but it depends, AND this is a good example of a problem you WILL hit sooner or later in completely legitimate cases.”

For instance, if I opened my store in July 2001 and didn’t offer Mountain Bikes at all until September, I don’t want to divide either of my Sales amounts by 365, and I certainly don’t want to divide Mountain Bike sales by 180.

Even then though, I should probably have a separate table like Inventory or maybe [Start Date] and [End Date] columns in my Products table, and use those to create a measure named [Days Offered] measure, and use THAT as my denominator instead.  That would be the fairest/most accurate approach. 

But it would STILL have this same “doesn’t add up” problem.  So let’s move on to a fix.

Forcing the Totals to Add Up

Let’s write a new measure:

[Sales per Day FIXED] =
IF(COUNTROWS(VALUES(Category[Name]))=1,
   [Sales per Day],
   SUMX(VALUES(Category[Name]), [Sales per Day])
)

In English, this says

“If I am in a pivot cell that corresponds to a single product category like Mountain Bikes, then just use the normal [Sales per Day] measure.  But if the pivot cell corresponds to more than 1 category, that means I am on a total, and then I want to have the total be the sum of all of the individual categories beneath it.”

For details on the whole “IF COUNTROWS” thing, see this post.

For details on SUMX, see this post.

Does it work?  You bet:

PowerPivot totals fixed to add up

The Second Measure Works

Pretty slick.  There are shortcomings to this of course.  If I put something other than Category on the pivot, the measure won’t work right.  It is “tied” to the Category field.  So it isn’t quite as portable as most measures, but it still is amazingly useful when you need it.


Correlating “Fuzzy Time” Events in One Table with Precise Measurements in Another Table

February 26, 2012

 
Precisely-Timed Measurements vs. Imprecisely-Timed Events in PowerPivot - How Do We Relate Across Ranges of Time?

Precisely-Timed Measurements vs. Imprecisely-Timed Events: 
How Do We Correlate/Relate Them?

The Streak is Broken…  and a new streak begins.

Well it was a full six-month run of posting every Tuesday and Thursday, Cal Ripken-like consistency.  Then last week it ended.  I was just drained.  Oh well, time to start a new streak!  Back to Tuesday and Thursday.  So let’s dive in…

Calculated columns.  I am continually realizing how much there is to know about them – stuff I haven’t really been forced to learn since I rely so heavily on Pivotstream’s database team.  My recent work with scientific and medical data sets, starting with the really simple fake data set and then moving into the rat sniffing data, has really driven this home.

So let’s do a quick treatment of fetching data from one table into another, using calculated columns, and in different situations.

When you have a relationship:  =RELATED()

OK, I’ve covered this one before.  When you have two tables and a relationship between them, you can fetch data from the “lookup” or “reference” table into your data table using the RELATED() function.

But there are really only a few places where you SHOULD do this.  You can already include the “lookup” column in your pivots without fetching it into the data table, so there’s little benefit of fetching it.  And doing so will just make your file bigger AND your pivots slower.

So really, I just do this when I’m debugging or exploring my data set.  When I’m done, I delete the column.

When you don’t have a relationship…

OK, back to the rat data Smile.  This will parallel many other real-world examples, such as marketing campaigns etc.  So bear with me even if you don’t find scientific measurements to be interesting.

I have one table called “Sniffs” which records how often (and how deeply) the rats were inhaling:

image

Sniffs Table:  Coded by RatID, SessionID, and TimeID
(Where Each TimeID Represents 1/100 of a Second)

And then I have an Events table.  “Events,” in this case, are things that the rats are doing.  Like…  sniffing each other.  In various places.

image

Events Data – Entered by Humans Watching Time-Coded Video of the Rats
(Note that Time is Recorded in Seconds, not Hundredths of Seconds As in the Sniffs Table)

Event type 4, for instance, is “Rearing” – the rat raising up on its hind legs and sniffing its surroundings.  This event is recorded by someone in the lab watching time-coded video.

THE GOAL:  We want to see how breathing frequency and intensity (data from the Sniffs table) changes during such events – does a rat sniff more or less when Rearing than normal?  (And in Retail, the parallel would be something like “do we get decent lift from our ad campaigns” or “how do spending habits change around a particular event like the Super Bowl?”)

Well, you can’t create a relationship between these two tables, for a few reasons.

Problem 1:  Events are Coded at the Second Level, Sniffs at 0.01 Second

In a single second of elapsed time, there will be 100 rows of data in the Sniffs table, and at most 1 row in the Events table.  Right there, we have a problem.

Then again, each 0.01 second DOES map to a “full” second, in much the same way as a day maps to a year.  The real problem here is…

Problem 2:  Both Tables Are Non-Unique in Time Column

Even if we lined up each 0.01 second from Sniffs with a full second from Events, well, I can have multiple rows in Events that all have the same value for TimeInS, because each row is only unique when you consider RatID and SessionID.

Hmmm…  so how about I just concatenates RatID, SessionID, and TimeInS to form a unique composite column in Events, and then do something similar in the Sniffs table, and THEN I create my relationship?

There are two problems with that – one minor, and one fatal.

The minor problem:  this will make my file quite a bit larger, by adding a column to my largest table (Sniffs).  And it’s the worst kind of column to boot:  a calculated column with a lot of unique values.  Plus, it will form the basis for a relationship, which means my pivot performance will potentially be slow.

The fatal problem?

Problem 3:  The TimeInS Value in the Events Data is Imprecise!

First of all, the video of the rats’ behavior will never be precisely aligned with the scientific equipment that is measuring the rats’ breathing patterns.  There is some “wiggle” in that synchronization, maybe as much as half a second.

Secondly, this is compounded by human error and variability – the act of a rat rearing up to sniff its environment might cover a full two seconds elapsed, but the human being coding the video must pick a single second to flag the behavior.

Taken together, we can see that there’s really a fuzzy “range” defining when the event occurs.

The Solution Part One:  Adding “Time Window” Columns to Events

Let’s say I define the “Event” as lasting one second – 0.5 second before and after the time recorded in the Events table.

Then I add calc columns to my Events table that reflect that:

image

Where the *100 is to convert to the “centiseconds” granularity that I use in my other table (the Sniff table).  PostTimeID is the same, except +0.5 rather than –0.5

The Solution Part Two:  CALCULATE?  In a Calc Column?

Then I can add a calc column to my Sniffs table that “fetches” a corresponding event ID from the Events table whenever there was an event “in progress” for the current sniff row:

=CALCULATE(MAX(Events[EventType]),
   FILTER(Events,
          Events[RatID]=EARLIER(Sniff[RatID]) &&
          Events[SessionID]=EARLIER(Sniff[SessionID]) &&
          Events[PostTimeID]>=EARLIER(Sniff[TimeID]) &&
          Events[PreTimeID]<=EARLIER(Sniff[TimeID])
         )
)

***UPDATE:  I realized that EARLIER() was NOT needed in this formula.  See this post for an explanation.

Notes on this formula:

  1. I had to choose some aggregation function, and I chose MAX, but I could have chosen MIN just as easily.  SUM and AVERAGE would NOT have worked out for me though.  More on this later.
  2. Everything after that is a FILTER expression – with four clauses all “anded” together with the && operator
  3. The first two filter clauses basically just say “only grab rows from Events where the Session and Rat match the Session and Rat on the current row in Sniff”
  4. The function documentation for EARLIER() has a very abstract definition, but in this case you can think of EARLIER() as just meaning “grab the value from the current row in Sniff.”
  5. The last two filter clauses make sure we only grab rows from Events whose time window includes the current row of Sniff.  In other words, if an Event “contains” the time of the current Sniff, it counts
  6. Going back to the MAX() at the beginning of the formula, the MAX essentially breaks ties in favor of events with larger ID’s.  That is of course completely arbitrary.  But in truth there are very, very few overlapping events so it hardly matters.  If I cared more, I could narrow the window to less than +/- 0.5 seconds, or I could pay more attention to my Event ID’s to make sure that higher numbered ID’s truly were the most significant events.

That gives me a result in my Sniff table that basically looks like I had a relationship between the two tables and used =RELATED()

image

OK folks.  I’m tired and about to drop.  But hopefully this inspires some thinking, or at least some questions Smile


Thursday Post Delayed Until Friday

February 23, 2012

 
Hi folks.  I’m on a plane right now, coming back from conducting another on-site, two-day training session.  Always an awesome experience – it’s seriously something that spoils me.  I like to joke that hanging out with the “early adopter set” for the past two years like I have can give the false impression that the entire world is actually REALLY smart.

Lots of fun quotes from these great folks.  Some that jump out at me:

“Our old version of this report was 22 sheets.  We just got it down to 2.”

“OK Rob, save that workbook right there.  That’s like, ALL of my work for the next two days.”  (Not said in a joking manner).

“Tell ya what, why don’t you just…  put that file on my thumbdrive before you leave.”  (Also not joking).

“The weather in Columbus, Ohio is not bad at all.”  (Ironically, not joking).

“Hey a few of us are going to the bar, wanna come?”

(And if my friends in San Diego are reading this…  no.  Not even close.  But May is just around the corner isn’t it?  I think it’s almost time I come visit again.)

I also got to meet one of our other clients in person on this trip – they very kindly took me to dinner at an awesome burger joint and bar.  Thanks AH and B!  Very nice and intelligent people – even more so in person than in email.

Anyway, it’s midnight and I’m still in the air (flight delayed) with a busy morning.  So I’m gonna give myself one day of respite and go live with a real post on Friday.

Next week, back to Tuesday/Thursday schedule.


My Experiences with Hosted PowerPivot, Part One

February 21, 2012

Guest post by David Churchward

Pivotstream Dashboard Application

One of Azzurri Communications Ltd’s PowerPivot
Applications Running in the Browser

Six Months Ago:  The “Lightbulb” Moment

Almost exactly six months ago, after being a long time reader of this blog, I emailed Rob and asked him a question regarding something that I just couldn’t get my head around in DAX – Banding!  He kindly responded, and his answer solved my problem, so I asked him if I could return the favour somehow.  He asked if I’d mind writing it up as a guest post, which I did.

Now, double-digit guest posts later, I’m amazed at how far I’ve come in short order.  Something definitely “clicked” for me that day, and my grasp of PowerPivot’s capabilities expanded rapidly.  It felt like that moment that I imagine Pianists reach where they can suddenly play by ear, because whilst I could conquer most things in DAX, it didn’t seem to quite “flow” – until that day!

Up until that point I had viewed PowerPivot as a “private” tool – something that was useful for me in my work, a supplement to other tools and methods.  But starting six months ago, I started to understand that PowerPivot could, and SHOULD, be used to improve or replace most of our existing Business Intelligence and Analytics tools.

Step One:  Azzurri Deploys its First “On Premise” SharePoint Server

At Azzurri, I am fortunate to enjoy two critical flavours of support:

  1. Our executive team is open-minded to progressive ways of doing things (so long as there is a solid value proposition).
  2. My tech team is a crack outfit who will bring me the moon if I ask for it, but tend to make reasonable alternative suggestions such as building data warehouses.

In other words, Azzurri is the perfect sort of place to deploy PowerPivot for SharePoint, bordering on a textbook example.  There aren’t many companies of Azzurri’s size where I could explain the benefits of a PowerPivot server, win people over, and have a server deployed two weeks later.  But that is precisely what happened Smile

Step Two:  Start Emailing Rob Again Smile

We didn’t just deploy the server, we immediately began USING it for serious work.  And that led to questions – questions about performance.  Questions about hardware.  Questions about customisation.  Questions about refresh.  Questions about “core and thin” workbooks.

Rob and I had a friendly correspondence going at that point, so I started sending those questions over.  I even looped him into email chains with our tech team, and we talked through a number of issues and optimisations.

Step Three:  Try Out Hosted PowerPivot in Parallel

Everything I do in Systems Development, especially with my Finance background, is about Cost V Benefit, ROI, IRR and payback.  With this in mind, I started wondering whether it made sense for us to develop PowerPivot for SharePoint expertise of our own.

We had originally decided to go with our own SharePoint deployment because we had the required licences and a particularly clever team who I had every faith could deliver.  This seemed obvious as SharePoint was already in operation at Azzurri.  My initial view was that it must be relatively straight forward to bring all of the BI tools into the equation.

Two weeks into the process, however, I was already seeing that things might not be as straight forward as I had first hoped.  Performance was the first major barrier that I hit and I couldn’t be entirely sure what kind of investment in hardware might be required to alleviate this.  Out of nowhere, PowerPivot gallery started playing games which turned out to be an IE9 issue and then I was introduced to Kerberos which, it turns out, isn’t a breakfast cereal that I was yet to encounter!

I knew about the Pivotstream Hosted Solution of course, and I still wasn’t ready to commit to hosting, but I decided that running a trial in parallel made a lot of sense, especially since I was particularly aware that my tech team needed to be doing other things.

I’m very glad that we decided on a trial, because step four was to switch over full-time.

Goodbye “Do it Yourself”, Hello SaaS

The journey I’ve been on as a customer of Pivotstream has validated for me that the SaaS model together with the capability of PowerPivot makes for a more compelling business solution for reporting and BI than any alternative that I can find.

I’ve been particularly conscious of making sure that my tech team spend their time where they can really drive business value – building Data Warehouses, ETL and efficient business processes.  It was clear that time spent developing SharePoint Server was time not spent adding value elsewhere.  There’s no doubt in my mind that they would have delivered, but I knew that they could deliver more value elsewhere to more than offset any cost of hosting.

Summary of Our Experience

Once I had taken the decision to try out Pivotstream’s hosted solution, it became clear that “elapsed time” taken was no longer going to be a constraint to the project.  On that same day, Azzurri had it’s own Pivotstream site in full working order with admin and consumer accounts setup for the trial.  It was now down to me to start making this a fully functional dashboard.

Naturally, I had workbooks at the ready and I loaded a few up immediately.  I started sniffing some of the additional features that I could now start playing with.  Before I knew it, I was canvassing Pivotstream for direction on Query String URL filtering (an awesome attribute to drill across to other dashboards).  A handy guidance document found it’s way into my inbox and I was away.

I was supplied with a program to split core and thin workbooks, another gem that just saves time and aggravation.  I served up a query with web part layouts and, next day, I get a new page layout deployed straight to our site.

Immediately, the focus of what I was delivering was about end user usability as opposed to finding ways around potential (and in some cases very evident) performance issues.  Performance was immediately apparent on the Pivotstream solution, as could be immediately seen by some of my more “chunky” analysis that didn’t even make it flinch.  My in-house SharePoint Server could take upwards of a minute to open these workbooks whereas the hosted solution barely registers seconds.

Within a matter of days, I realised that the limits of this solution only existed to the extent of the limits of my imagination in creating dashboards.  All of a sudden, my focus was turned on making sure that full value was derived and, to that end, I started spreading the word within Azzurri.  Some initial training took place and I immediately recognised that these clever individuals that I was working with had even more insight bursting to get out and the fact that we were playing in Excel meant that they could immediately relate to what they were being shown.  I had hit that fantastic point in the project where momentum starts taking over and this is probably less than two weeks after starting the trial.

Speed (both of implementation and application), elimination of complexity and additional value adding applications delivered in a scalable data-centre model with an OPEX cost model sums it up for me.  Now, it’s just about making the dashboards deliver the real benefit to the business – insight!

As I’ve been writing this, Rob’s reminded me of a comment I made back towards when this whole thing started:

“My key driver is laziness so I’m always looking for quicker and better ways to do things. In doing so, I find myself working non-stop so I may have my driver wrong or I’m failing miserably!”

The reality is that Hosted PowerPivot does do it quicker and better.  I’m working non-stop because the results speak for themselves and I fundamentally “get it”.  The reality is that my driver was wrong!


Converting from Peaks to Frequencies

February 16, 2012

image32

Sniff something? Did ya, rat boy?”

Continuity!

Folks I just can’t resist a quick followup on the Rat Sniffing Project.  I…  just… can’t.  Plus I am absolutely worn down and don’t have the energy tonight for anything that isn’t inherently entertaining.

All of those calculated columns from the last post, remember, were just the setup so we could start doing some REAL observational stuff.

You know, something like this:

Measuring Event Frequency in Hz in PowerPivot

This wasn’t difficult at all.

Remember that I now had a column that flagged each row as “1” if it represented the peak of an inhalation:

image

Each Row is 0.01 second, Flagged as 1 if
it’s a Peak Inhale, aka a “Sniff”

So now I really just need some measures.

The First Measure I’ve Ever Written in the Unit of Hertz!

(Hmmm.  There’s a joke in here somewhere, the first line of which is something like “what do you do if your unit Hertz?”  Listen, I told you I was tired.)

Since I have a 1 in the RobPeak column, summing it will yield the number of peaks in an interval – I don’t need to do a CALCULATE(COUNTROWS()) with a filter set to 1.

So the measure formula for Hz (events per second) is:

(SUM(Data[RobPeak]) / COUNTROWS(Data)) * 100

Why times 100?  Because each row is actually 0.01 second.

Now THIS is a *Time* Table

This is another first:  creating a separate time table that is NOT measured in days.  In this case…  hundredths of a second:

A Time Table in PowerPivot - Not Dates.  Seconds.

Not a Calendar Table – a Time Table.  In  0.01 Second Increments

OK, it’s not a Calendar table but it is VERY similar.  Just like a Calendar table has columns like DayofWeek, DayOfMonth, CalendarYear, etc. – those columns represent properties of dates.  Well, the columns I have here like Second and FiveSecond are the same sort of thing.

And since I have 6 rats in the experiment, it is wasteful to duplicate those properties in my Data table.  Plus there’s that whole speed thing.

So I created this separate table and related it to my Data table.

And this table then “powers” my slicers like these:

Slicers Based on a Time Table Measured in Minutes and Seconds

Slicers Based on a Time Table Measured in Minutes and Seconds

These slicers represent a navigation method more than they do a “filter” in the classic sense.  I say that because fields from the time table are on the axis of the chart as well:

image

The net effect is that I can use my slicers to quickly move around the data and examine “windows” that are interesting or relevant.  Couple of quick clicks and I am now looking at minute 4 instead of minute 3, and just the first 10 seconds of that minute:

image

Quickly “Jumped” to the 4:00-4:10 Interval

It’s ALMOST like the slider control on a YouTube video:

image

YouTube Has a Time Slicer Control Too!

Wouldn’t it be neat if Excel gave us a YouTube-style slicer for time?  We can dream.

Side Note:  Rats Breathe FAST!

Check those charts out.  Between 5 and 9 Hz???  Really?  Those little rascals inhale between 5 and 9 times per second, on a SUSTAINED basis?  Wow.  I’m easily amused I guess, but wow.


Peak Detection: A Surprising Usage of PowerPivot

February 14, 2012

 
image

“Who are YOU, who are so WISE in the ways of SCIENCE?”

It’s all about following through…

In a recent post I covered a very simple scientific scenario.  It was an interesting diversion from the normal biz-style scenarios but it left me feeling hollow in two ways.  First, it was too simple and didn’t account for the possibility of multiple different treatment types, so I pinged the Italians.  (They responded, as they always do, and I owe a post on their reply, probably Thursday).

But the other problem was that, while claiming to be a scientific scenario, it was manufactured by me, and hey, I’m no scientist.  Then it hit me…

Hey!  My Neighbor is a REAL Scientist!

Yeah, Dan Wesson is a “he-runs-his-own-research-lab-at-the-university, is-published-in-prestigious-science-journals” kind of scientist.  So I decided to bounce the question off of him – was my example scenario at all useful, and if not, what WOULD be useful?  Boy, am I glad I asked.

Perhaps My Favorite PowerPivot Visualization Yet

PowerPivot Used for Peak Detection - My New Dining Room Mural

Click for Full Size!

First of all, HOW COOL IS THAT?  It looks like the double helix or something.  I joked with my wife that I was gonna have it turned into an eight-foot mural for the dining room.  (Yeah, I was joking sweetie.  Totally joking, I would never do something like that).

What do the Values Mean?

image

These are measurements of respiration taken from rats in Dan’s lab.  Negative values indicate “inhale” and positive indicate “exhale.”

Each value represents one-one-hundredth of a second – how’s that for time intelligence? Smile

What Does “Peak Detection” Mean?

In order to make sense of the data, Dan needs to measure the frequency of how often the rats “sniff.”  Sniffing is an indicator of all sorts of things apparently – a sign of interest, a sign of cognition, a sign of dominance…  I had no idea.

Anyway, he needs to measure the number of “sniffs” the rats take in a given time frame, and in order to do that, he needs to count the negative peaks in the breathing data:

image

PowerPivot Can Do This?  Yes, Better Than the Professional Tools!

I went into this mostly expecting that PowerPivot was not going to be a good tool for his needs.  And as usual, PowerPivot surprised me.

Not only can PowerPivot do this, it outperforms expensive specialized software.  Software that is designed to do precisely this task.  Let that sink in for a moment:

PowerPivot Outperforms Expensive and Specialized Scientific Software

$10,000 Specialized Scientific Software Often “Misses” the Peaks,
PowerPivot is Dead On

How Did I Do It?

It actually wasn’t that hard.

Given a data set like this with 180 thousand rows:

PowerPivot Six Rats’ Breathing Measured 100 Times per Second for Five Minutes

Six Rats’ Breathing Measured 100 Times per Second for Five Minutes

I needed to generate a 1/0, True/False column identifying whether a given row was a “sniff” peak.

I ended up creating a TimeID column so that I could work in integers:

image

Multiply by 100 So I Can Work in Integer Time Increments

And then I wrote the following calc column formula:

[IsPeakCandidate]

=IF(OR([TimeID]=0,[value]=0),0,
IF([value] > 0,0,   
    IF(
       AND(
           CALCULATE
             (MIN([value]),
             FILTER(ALL(Data), Data[Rat]=EARLIER(Data[Rat])),
             FILTER(ALL(Data),
                Data[TimeID]=EARLIER(Data[TimeID])-1)
             )
           >[value]
           ,

           CALCULATE
             (MIN([value]),
             FILTER(ALL(Data), Data[Rat]=EARLIER(Data[Rat])),
             FILTER(ALL(Data),
                Data[TimeID]=EARLIER(Data[TimeID])+1)
             )
           >[value],

          )
       ,1
       ,0
      )
   )

)

What is that formula doing?

Ok that looks complex.  But all it’s doing is asking three questions:

  1. Is the CURRENT row’s value less than zero?
  2. Was the PRIOR row’s value greater than this row’s value?
  3. Is the NEXT row’s value also greater than this row’s value?

If all three are “yes,” then this row represents a negative peak.

Flaw:  A “Hiccup” Becomes a Peak

Unsurprisingly, it turns out that formula wasn’t quite good enough:

image

In Basketball This is Called This a “Pump Fake”
(In American Football, a “Juke,” in Hockey, a “Deke…”)

So I added another calculated column:

[MinOverInterval]

=CALCULATE(MIN(Data[value]),
           ALL(Data),
           FILTER(ALL(Data),
                  Data[Rat]=EARLIER(Data[Rat])
           ),
           FILTER(ALL(Data),
                  Data[TimeID] <= EARLIER(Data[TimeID]) +5 &&
                  Data[TimeID] >= EARLIER(Data[TimeID]) –5
           )
)

For each row in my data that calculates what is the minimum value over an 11-row interval – five rows before, five rows after, and the row itself:

image

MinOverInterval:  The Smallest Value in the 11-Row Window

One Last Column!

Then, tying it all together, the last calc column is this:

[IsRealPeak]

=IF(
   AND(
       Data[PeakCandidate]=1,
       Data[MinOverInterval]=Data[Value]
      )
   ,1,0
)

This column says “if this row was flagged as a peak candidate already, AND it is the most negative row in its 11-row interval, then we bless it as a REAL peak row.”

And if I use that column instead, that fixes the false peaks:

image

Some Final Notes

First, I suspect that I can simplify my calc columns a bit.  There was some exploratory trial and error in this process and it might be that the “MinOverInterval” test is ALL I need, since there’s no way a row can be its minimum in the interval and NOT be more negative than its immediate neighbors.

But hey, I wanted to get this post out ok? Smile

Second, yes, these calc columns are something I’d ideally want to have done in a database.  But since I want to give Dan a completely self-contained toolset, I had to do it in calc columns.

Lastly, this “peak detection” is NOT the end of the road!  Far from it.  It’s just the beginning.  Now that we have a reliable “peak flag” column in our data, we can start doing the normal PowerPivot thing – measures of frequency, slicing by properties of the rat itself (age, sex, etc.), properties of the trial… 

So I expect Pivotstream will be doing some real work for Dan’s lab, helping support his Alzheimer’s research.  All starting from a quick conversation in Dan’s backyard over the weekend.  How cool is that?


More Live PowerPivot Web Apps!

February 9, 2012

 
Excel Pro Turned Web Developer with a SINGLE CLICK!  MUHAHAHAHA!

“IT’S ALIVE!!!!”
(Just your average Excel Pro after converting
his first PowerPivot Workbook into a Web App)

I’ve got another article about to go live on CIMA Insight, but I’m gonna jump the gun a bit and post basically the entire thing here ahead of time.

At Pivotstream we recently went live with our first full-time demo site for Hosted PowerPivot.  We’re going to be adding to it over time, but it’s got enough on it already that I think it’s worth looking at – it shows that “spreadsheets have become live web applications” thing that has to be seen, live, to really sink in.

Customizable Homepage

Hosted PowerPivot Demo - PowerPivot for SharePoint Made Simple

Pretty self-explanatory – customizable content plus a menu of applications.  Link below. 

(OK, one note:  I say “customizable” but customization is only allowed for Authors/Owners of the site.  What you are seeing here in the Consumer experience, and Consumers cannot customize this home page).

https://insights.hostedpowerpivot.com/sites/Demo/Pages/default.aspx 

App #1: Based on Microsoft’s “AdventureWorks” Sample Data Set

This is the workbook from the Budget vs. Actuals Part One and Part Two posts. Here’s a picture of that same workbook after it’s been saved to a SharePoint web server and then accessed in my browser:

Hosted PowerPivot Demo - PowerPivot for SharePoint Made Simple

Notes on this application:

  1. This is just an XLSX file, created with PowerPivot and then saved to the server
  2. But it’s rendering in my browser (Firefox in this case), and the XLSX is NOT being downloaded to my computer
  3. I do NOT have to have Excel installed on my computer in order for this browser application to function
  4. This picture is of the menu page of the app, which is just a worksheet with the gridlines and headers turned off
  5. The graphics are two image files inserted into the sheet via Insert.Picture on the ribbon
  6. The four hyperlinks are merely links to other worksheets within the workbook

If I click the “Sales vs. Budget” hyperlink, I am taken to the report I built in last month’s article:

Hosted PowerPivot Demo - PowerPivot for SharePoint Made Simple

Click the slicer – it works Smile

Try this application out here:

https://insights.hostedpowerpivot.com/sites/Demo/Pages/Adventureworks-Sample.aspx

Application #2: Retailer Overlap Competitive Analysis

This application is based on two real-world data sets. One is a list of the addresses of almost every retail food/drug store in the United States. The other is a detailed list of demographic information about every ZIP code (postal code) in the United States.

Blend them together in PowerPivot and you get an application like this:

Hosted PowerPivot Demo - PowerPivot for SharePoint Made Simple

Note that I have selected the two warmest temperature ranges, circled in orange. If I click the link at the top of the sheet I then see the following analysis:

Hosted PowerPivot Demo - PowerPivot for SharePoint Made Simple

Notes:

  1. I have the application set as if “I” were Walgreens, and evaluating my competitors’ geographic overlap with my Walgreens retail locations
  2. The first column indicates to me that CVS competes with me (Walgreens) for 41.5% of the potential customers that I try to reach
  3. In other words, 41.5% of the people who live near a Walgreens also live near a CVS
  4. The second column reports that Walgreens competes with me for 47.3% of my customers in Warm locations
  5. So the first column ignores the slicer selections I made on the menu page, and the second column respects them!
  6. The third column represents the delta between columns 1 and 2. Interesting, for instance, that Rite Aid does NOT compete with me at all really in warm locations
  7. Try it out, slice away – there are many ways to slice and analyze this data set

Application is located here:

https://insights.hostedpowerpivot.com/sites/Demo/Pages/Retailer%20Overlap%20Analysis.aspx

Application #3: CRM Analyzer

This one is also based off of a sample data set, but it is one pulled from a popular CRM package and therefore represents real-world value.

Hosted PowerPivot Demo - PowerPivot for SharePoint Made Simple

Note that there are multiple reports in this application as well, plus a menu page, but I’m just showing one here for space reasons.

Application can be tried out here:

https://insights.hostedpowerpivot.com/sites/Demo/Pages/CRM-Analyzer.aspx

Application #4: UFO Sightings

Many of you have seen this one already, on Mr. Excel’s Hosted PowerPivot site, but we put it on this demo site as well.

Hosted PowerPivot Demo - PowerPivot for SharePoint Made Simple

Application is located here:

https://insights.hostedpowerpivot.com/sites/Demo/Pages/UFO-Sightings-Real-Data,-Humorous-Conclusions.aspx


Lookups based on date ranges

February 7, 2012

Another one from the Mr. XL Forum!

…and the word “forum” gives me an excuse for a blast from the past…

PowerPivot Court

“What you are about to witness is real. The participants are not actors.  They have agreed to dismiss their tech support cases and have their questions settled here, in our forum:
THE POWERPIVOT COURT

(Click here for theme music!)

The much-neglected calculated column

OK, neglected by me, not by everyone else.  At Pivotstream I have the benefit of a great SQL team.  Generally speaking, when I need a calculated column, I ask them to provide it.  The benefits of doing that are documented several places on this site, including here and continued here.

But hey, not everyone has a database rapid-response team at their disposal Sad smile

So questions like this one come up a lot on the forums, reminding me of my blind spot:

You have two tables of data.  In one, you have a list of your clients and their effective discount rate during particular date ranges – marked by a start and an end date.  If the End date is blank, that discount rate is assumed to still be active.

image

Rates Table:  Discount Rate per Client,
Over Distinct Date Ranges (Start and End Effective Date)

Then there’s a second table, listing Clients and the days that you called each of them:

Multiple Clients, and Multiple Calls to Each Client in PowerPivot

Calls Table:  Multiple Clients, and Multiple Calls to Each Client

Desired Result

You want to add a calculated column to that Calls table which contains that client’s effective discount rate on the date of the call:

Calc Column in PowerPivot Looking Up a Value Based on Date Ranges in Another Table

We Want to Add the Highlighted Calc Column – What’s the Formula?

So… what’s the formula?

=CALCULATE(AVERAGE(Rates[Rate]),
   FILTER(Rates,
          Rates[Start]<=Calls[ContactDate] &&
          Rates[EffectiveEnd] >= Calls[ContactDate] &&
          Rates[Client]=Calls[Client]
   )
)

Hey, That’s a Lot Like a Measure!

Yeah, it IS a lot like a measure.  It uses CALCULATE, the wonder function.

But in this case, we’re just using CALCULATE to apply filters in our calculated column.

Works basically the same way, though.  The three clauses in the FILTER() function are all AND’d together using the && operator, so that only rows from the Rates table meeting the following three criteria are “kept” and then “fed” to the AVERAGE function:

  1. The Start date in the rates table must be prior to (or equal to) the Contact Date in the Calls table.
  2. The End date in the rates table must be after (or equal to) the Contact Date in the Calls table.
  3. The Client ID in the rates table must match the Client ID in the Calls table

Note for those who desire deeper understanding:  The biggest difference between this calc column formula and a similar formula we’d have to write in a measure is that in a measure, we’d have to use the VALUES() function wrapped around the right hand side of the criteria – VALUES(Calls[ContactDate]) rather than just Calls[ContactDate]).  And we’d have to “protect” those VALUES functions with an IF(COUNTROWS()) since you can’t use VALUES() as part of an “=” test, except when there is only one value returned, or maybe use MAX() instead of VALUES().  The point is that in a calc column, we DO have a current row in the table, so we don’t need the wrapper function like we do in a measure.

What’s that EffectiveEndDate column?

Yeah I sneaked that one in.  It’s another calc column I added to the Rates table:

image

You can see its formula in the formula bar in the image above.  I just wanted to put a non-blank value in whenever End was blank.  You could hard-code a date in there if you wanted, or use TODAY() and a much bigger number than 365.  I just figured that setting a date one year in the future from today is good.  Keep in mind though that TODAY() will only be re-evaluated when you refresh your data or force a re-calc.

Is There a Relationship Between Those Two Tables?

No, there is not.  Neat huh?  And really, there couldn’t be anyway – each Client ID appears more than once in each table, and PowerPivot doesn’t handle relationships on “many to many” like that.  The date columns are even less “relationship friendly” since they don’t match up at all, and are based on ranges in one table and single dates in the other.

Download the Workbook

I’m trying to do this more often, especially with the forum questions.

Download the workbook here