Well the wait is over. It is now released, for real. No longer a beta or RC. The real deal.
Other important links:
Well the wait is over. It is now released, for real. No longer a beta or RC. The real deal.
Other important links:
It’s *ALMOST* That Easy ![]()
It’s been awhile since I’ve talked about Macros (also known as VBA). I think it’s overdue.
Macros are nothing short of amazing. We couldn’t live without them at Pivotstream.
That’s right, they don’t. And everything we do at Pivotstream eventually lands on the server. So why do we use macros?
Simple: we use macros to more efficiently create and modify our workbooks. Macros are a “design time” tool for us, not a “run time” tool.
And they have saved us probably decades of work. I’m not exaggerating.
Here’s a pretty simple macro that runs through every PivotTable on every visible worksheet and does whatever you want to each pivot:
Sub ModifyAllPivots()
Dim Pivot As PivotTable
Dim Sheet As Worksheet
For Each Sheet In ActiveWorkbook.Worksheets
If Sheet.Visible = xlSheetVisible Then
For Each Pivot In Sheet.PivotTables
AutoPadPivot Sheet.Name, Pivot.Name, 10
GrandTotalsBottomOnly Sheet.Name, Pivot.Name
Next
End If
Next
End Sub
I emphasized the “payload” of the macro – for each pivot the macro finds, it “pads” the columns of the pivot to be wide enough, and sets the pivot to display grand totals only on the bottom of the pivot (and never on the right).
Those two lines are macros that I also wrote, and I will include them below.
At Pivotstream, we tend to ALWAYS turn off the “auto-fit columns on update” setting on our pivots:
Making Pivots NOT Change Column Widths On Slicer Clicks Etc
This yields a much more pleasant “application-like” result – clicking a slicer never results in things jumping around. I highly recommend it.
But this DOES lead to a problem. If your numbers suddenly grow by a digit in the future, you can get something like this:
One of the Drawbacks of Turning off Autofit Column Width
To account for this, we have a macro that runs through every pivot in the workbook and “pads” the column width by a percentage:
Sub AutoPadPivot(sSheet As String, sPivot As String, iPct As Integer)
Dim oPivot As PivotTable
Dim oSheet As Worksheet
Dim r As Range
Dim rCurr As Range
Dim iCol As Integer
Set oSheet = ActiveWorkbook.Worksheets(sSheet)
Set oPivot = oSheet.PivotTables(sPivot)
Set r = oPivot.DataBodyRange.Rows(1)
For Each rCurr In r.Cells
iCol = rCurr.Column
oSheet.Columns(iCol).EntireColumn.AutoFit
oSheet.Columns(iCol).ColumnWidth = oSheet.Columns(iCol).ColumnWidth * (1 + (iPct / 100))
Next
End Sub
At the beginning of this post, you saw an example where I called this macro to pad each column by 10 percent. But I can pad by 5, 15, whatever I want. And you can easily imagine a version that pads by an absolute amount rather than a percentage.
This one is more self-explanatory:
Sub GrandTotalsBottomOnly(sSheet As String, sPivot As String)
Dim oPivot As PivotTable
Dim oSheet As Worksheet
Set oSheet = ActiveWorkbook.Worksheets(sSheet)
Set oPivot = oSheet.PivotTables(sPivot)
oSheet.PivotTables(sPivot).RowGrand = False
End Sub
Want one more? OK. You talked me into it. We have dozens, many of which are quite ambitious. Simpler macros make for better blog posts though, so…
Sub CreatePageFilterForEachSlicerOnPivot(sSheet As String, sPivot As String)
Dim oSlicer As Slicer
Dim oSlicerCache As SlicerCache
Dim sField As String
Dim oPivot As PivotTable
Dim oSheet As Worksheet
Set oSheet = ActiveWorkbook.Sheets(sSheet)
Set oPivot = oSheet.PivotTables(sPivot)
For Each oSlicer In oPivot.Slicers
sField = oSlicer.SlicerCache.SourceName
oPivot.CubeFields(sField).Orientation = xlPageField
Next
End Sub
Why would I want to create a page filter for very slicer on the pivot? Well, primarily so you can “harvest” the slicer selections in formulas. But there are other reasons you might do this as well, which I will likely cover in the future.
What’s that, you say? You never need to pad pivots, switch their grand total settings, or add page filters for every slicer? Not satisfied eh? Well, you can make your own!
If you have never recorded a macro, seriously, it’s SOOOO easy. You should try it.
First you will need to enable the Developer ribbon by going to File|Options|Customize the Ribbon.
Once you’ve done that, here’s how you get started:
Going Into Macro Recording Mode
(Note That I Named the Macro Based On What I Am Going to Do Next)
Now I do something to the pivot. In this case, I switch my Pivot to one of the “Medium Green” Styles:
Changing My Pivot to a Different Style
(While the Macro Recorder Watches My Every Move)
Now I can stop recording:
Stop Recording the Macro
Just click the Macros button, select your macro, and then click Edit
And here it is:
Sub ChangeToGreenStyle()
‘
‘ ChangeToGreenStyle Macro
‘
‘
ActiveSheet.PivotTables(“PivotTable3″).TableStyle2 = “PivotStyleMedium4″
End Sub
There’s really only one line in the macro that DOES anything, so I highlighted it.
Note that it’s “tied” to a pivot named “PivotTable3.” Not all of your pivots will be named that of course. And it only works on the ACTIVE sheet. So it won’t work if you try to loop through using the ModifyAllPivots macro.
So, you can modify it to look like the other macros I showed above (GrandTotalsBottomOnly, etc). Change the macro to be:
Sub ChangeToGreenStyle (sSheet As String, sPivot As String)
Dim oPivot As PivotTable
Dim oSheet As Worksheet
Set oSheet = ActiveWorkbook.Worksheets(sSheet)
Set oPivot = oSheet.PivotTables(sPivot)
oSheet.PivotTables(sPivot).TableStyle2 = “PivotStyleMedium4″
End Sub
The stuff in grey is just copied from the other macros. The red is a replacement (also copied from the other macros) for the part the recorder set to Active Sheet and PivotTable3.
That macro can now be called from within ModifyEveryPivot just like the others were:
…
For Each Pivot In Sheet.PivotTables
AutoPadPivot Sheet.Name, Pivot.Name, 10
GrandTotalsBottomOnly Sheet.Name, Pivot.Name
ChangeToGreenStyle Sheet.Name, Pivot.Name
Next
…
Off you go ![]()
Busy week here at the MVP Summit in Redmond. As usual, I can’t tell you anything I have learned this week – it’s strictly under NDA. But I have seen some really exciting things – at one point today I screamed out loud “YES!” in a packed room, and a little while later I said “I don’t think we can be stopped if we had something like this.” Where the “we” means us – me and you.
Enough of that. Here’s one from my archived list of “topics to cover on the blog at some point.”
Check out this relatively simple pivot:
Note that the subtotals for [Sales per Day] do not equal the sum of their parts. 2002’s total is $17,891 but if I add Mountain Bikes plus Road Bikes myself, I get a number closer to $19,000:
Totals don’t add up!
The formula for [Sales per Day] is:
[Sales per Day] =
[Sales] / COUNTROWS(DISTINCT(Sales[OrderDate]))
Which, in English, equates to “My Sales measure divided by the number of days for which I have sales transactions.”
Well, if the “number of days for which I have sales transactions” is different for Mountain Bikes vs. Road Bikes, that will cause this problem. I can illustrate by adding that as a separate measure:
[Days I have Transactions] =
COUNTROWS(DISTINCT(Sales[OrderDate]))
The Reason Why Sales per Day Doesn’t “Add Up”
So I only sold Mountain Bikes on 96 days in 2001, and Road Bikes on 180 days, but on 181 days, I sold at least one bike of any sort. This is why it doesn’t add up.
Some of you will no doubt have this question already: Isn’t it more accurate for [Sales per Day] to be based on the number of days in the calendar rather than the number of days on which it sold?”
And my answer is “probably, but it depends, AND this is a good example of a problem you WILL hit sooner or later in completely legitimate cases.”
For instance, if I opened my store in July 2001 and didn’t offer Mountain Bikes at all until September, I don’t want to divide either of my Sales amounts by 365, and I certainly don’t want to divide Mountain Bike sales by 180.
Even then though, I should probably have a separate table like Inventory or maybe [Start Date] and [End Date] columns in my Products table, and use those to create a measure named [Days Offered] measure, and use THAT as my denominator instead. That would be the fairest/most accurate approach.
But it would STILL have this same “doesn’t add up” problem. So let’s move on to a fix.
Let’s write a new measure:
[Sales per Day FIXED] =
IF(COUNTROWS(VALUES(Category[Name]))=1,
[Sales per Day],
SUMX(VALUES(Category[Name]), [Sales per Day])
)
In English, this says
“If I am in a pivot cell that corresponds to a single product category like Mountain Bikes, then just use the normal [Sales per Day] measure. But if the pivot cell corresponds to more than 1 category, that means I am on a total, and then I want to have the total be the sum of all of the individual categories beneath it.”
For details on the whole “IF COUNTROWS” thing, see this post.
For details on SUMX, see this post.
Does it work? You bet:
The Second Measure Works
Pretty slick. There are shortcomings to this of course. If I put something other than Category on the pivot, the measure won’t work right. It is “tied” to the Category field. So it isn’t quite as portable as most measures, but it still is amazingly useful when you need it.
Precisely-Timed Measurements vs. Imprecisely-Timed Events:
How Do We Correlate/Relate Them?
Well it was a full six-month run of posting every Tuesday and Thursday, Cal Ripken-like consistency. Then last week it ended. I was just drained. Oh well, time to start a new streak! Back to Tuesday and Thursday. So let’s dive in…
Calculated columns. I am continually realizing how much there is to know about them – stuff I haven’t really been forced to learn since I rely so heavily on Pivotstream’s database team. My recent work with scientific and medical data sets, starting with the really simple fake data set and then moving into the rat sniffing data, has really driven this home.
So let’s do a quick treatment of fetching data from one table into another, using calculated columns, and in different situations.
OK, I’ve covered this one before. When you have two tables and a relationship between them, you can fetch data from the “lookup” or “reference” table into your data table using the RELATED() function.
But there are really only a few places where you SHOULD do this. You can already include the “lookup” column in your pivots without fetching it into the data table, so there’s little benefit of fetching it. And doing so will just make your file bigger AND your pivots slower.
So really, I just do this when I’m debugging or exploring my data set. When I’m done, I delete the column.
OK, back to the rat data
. This will parallel many other real-world examples, such as marketing campaigns etc. So bear with me even if you don’t find scientific measurements to be interesting.
I have one table called “Sniffs” which records how often (and how deeply) the rats were inhaling:
Sniffs Table: Coded by RatID, SessionID, and TimeID
(Where Each TimeID Represents 1/100 of a Second)
And then I have an Events table. “Events,” in this case, are things that the rats are doing. Like… sniffing each other. In various places.
Events Data – Entered by Humans Watching Time-Coded Video of the Rats
(Note that Time is Recorded in Seconds, not Hundredths of Seconds As in the Sniffs Table)
Event type 4, for instance, is “Rearing” – the rat raising up on its hind legs and sniffing its surroundings. This event is recorded by someone in the lab watching time-coded video.
THE GOAL: We want to see how breathing frequency and intensity (data from the Sniffs table) changes during such events – does a rat sniff more or less when Rearing than normal? (And in Retail, the parallel would be something like “do we get decent lift from our ad campaigns” or “how do spending habits change around a particular event like the Super Bowl?”)
Well, you can’t create a relationship between these two tables, for a few reasons.
In a single second of elapsed time, there will be 100 rows of data in the Sniffs table, and at most 1 row in the Events table. Right there, we have a problem.
Then again, each 0.01 second DOES map to a “full” second, in much the same way as a day maps to a year. The real problem here is…
Even if we lined up each 0.01 second from Sniffs with a full second from Events, well, I can have multiple rows in Events that all have the same value for TimeInS, because each row is only unique when you consider RatID and SessionID.
Hmmm… so how about I just concatenates RatID, SessionID, and TimeInS to form a unique composite column in Events, and then do something similar in the Sniffs table, and THEN I create my relationship?
There are two problems with that – one minor, and one fatal.
The minor problem: this will make my file quite a bit larger, by adding a column to my largest table (Sniffs). And it’s the worst kind of column to boot: a calculated column with a lot of unique values. Plus, it will form the basis for a relationship, which means my pivot performance will potentially be slow.
The fatal problem?
First of all, the video of the rats’ behavior will never be precisely aligned with the scientific equipment that is measuring the rats’ breathing patterns. There is some “wiggle” in that synchronization, maybe as much as half a second.
Secondly, this is compounded by human error and variability – the act of a rat rearing up to sniff its environment might cover a full two seconds elapsed, but the human being coding the video must pick a single second to flag the behavior.
Taken together, we can see that there’s really a fuzzy “range” defining when the event occurs.
Let’s say I define the “Event” as lasting one second – 0.5 second before and after the time recorded in the Events table.
Then I add calc columns to my Events table that reflect that:
Where the *100 is to convert to the “centiseconds” granularity that I use in my other table (the Sniff table). PostTimeID is the same, except +0.5 rather than –0.5
Then I can add a calc column to my Sniffs table that “fetches” a corresponding event ID from the Events table whenever there was an event “in progress” for the current sniff row:
=CALCULATE(MAX(Events[EventType]),
FILTER(Events,
Events[RatID]=EARLIER(Sniff[RatID]) &&
Events[SessionID]=EARLIER(Sniff[SessionID]) &&
Events[PostTimeID]>=EARLIER(Sniff[TimeID]) &&
Events[PreTimeID]<=EARLIER(Sniff[TimeID])
)
)
***UPDATE: I realized that EARLIER() was NOT needed in this formula. See this post for an explanation.
Notes on this formula:
That gives me a result in my Sniff table that basically looks like I had a relationship between the two tables and used =RELATED()
OK folks. I’m tired and about to drop. But hopefully this inspires some thinking, or at least some questions ![]()
Hi folks. I’m on a plane right now, coming back from conducting another on-site, two-day training session. Always an awesome experience – it’s seriously something that spoils me. I like to joke that hanging out with the “early adopter set” for the past two years like I have can give the false impression that the entire world is actually REALLY smart.
Lots of fun quotes from these great folks. Some that jump out at me:
“Our old version of this report was 22 sheets. We just got it down to 2.”
“OK Rob, save that workbook right there. That’s like, ALL of my work for the next two days.” (Not said in a joking manner).
“Tell ya what, why don’t you just… put that file on my thumbdrive before you leave.” (Also not joking).
“The weather in Columbus, Ohio is not bad at all.” (Ironically, not joking).
“Hey a few of us are going to the bar, wanna come?”
(And if my friends in San Diego are reading this… no. Not even close. But May is just around the corner isn’t it? I think it’s almost time I come visit again.)
I also got to meet one of our other clients in person on this trip – they very kindly took me to dinner at an awesome burger joint and bar. Thanks AH and B! Very nice and intelligent people – even more so in person than in email.
Anyway, it’s midnight and I’m still in the air (flight delayed) with a busy morning. So I’m gonna give myself one day of respite and go live with a real post on Friday.
Next week, back to Tuesday/Thursday schedule.
Guest post by David Churchward
One of Azzurri Communications Ltd’s PowerPivot
Applications Running in the Browser
Almost exactly six months ago, after being a long time reader of this blog, I emailed Rob and asked him a question regarding something that I just couldn’t get my head around in DAX – Banding! He kindly responded, and his answer solved my problem, so I asked him if I could return the favour somehow. He asked if I’d mind writing it up as a guest post, which I did.
Now, double-digit guest posts later, I’m amazed at how far I’ve come in short order. Something definitely “clicked” for me that day, and my grasp of PowerPivot’s capabilities expanded rapidly. It felt like that moment that I imagine Pianists reach where they can suddenly play by ear, because whilst I could conquer most things in DAX, it didn’t seem to quite “flow” – until that day!
Up until that point I had viewed PowerPivot as a “private” tool – something that was useful for me in my work, a supplement to other tools and methods. But starting six months ago, I started to understand that PowerPivot could, and SHOULD, be used to improve or replace most of our existing Business Intelligence and Analytics tools.
At Azzurri, I am fortunate to enjoy two critical flavours of support:
In other words, Azzurri is the perfect sort of place to deploy PowerPivot for SharePoint, bordering on a textbook example. There aren’t many companies of Azzurri’s size where I could explain the benefits of a PowerPivot server, win people over, and have a server deployed two weeks later. But that is precisely what happened ![]()
We didn’t just deploy the server, we immediately began USING it for serious work. And that led to questions – questions about performance. Questions about hardware. Questions about customisation. Questions about refresh. Questions about “core and thin” workbooks.
Rob and I had a friendly correspondence going at that point, so I started sending those questions over. I even looped him into email chains with our tech team, and we talked through a number of issues and optimisations.
Everything I do in Systems Development, especially with my Finance background, is about Cost V Benefit, ROI, IRR and payback. With this in mind, I started wondering whether it made sense for us to develop PowerPivot for SharePoint expertise of our own.
We had originally decided to go with our own SharePoint deployment because we had the required licences and a particularly clever team who I had every faith could deliver. This seemed obvious as SharePoint was already in operation at Azzurri. My initial view was that it must be relatively straight forward to bring all of the BI tools into the equation.
Two weeks into the process, however, I was already seeing that things might not be as straight forward as I had first hoped. Performance was the first major barrier that I hit and I couldn’t be entirely sure what kind of investment in hardware might be required to alleviate this. Out of nowhere, PowerPivot gallery started playing games which turned out to be an IE9 issue and then I was introduced to Kerberos which, it turns out, isn’t a breakfast cereal that I was yet to encounter!
I knew about the Pivotstream Hosted Solution of course, and I still wasn’t ready to commit to hosting, but I decided that running a trial in parallel made a lot of sense, especially since I was particularly aware that my tech team needed to be doing other things.
I’m very glad that we decided on a trial, because step four was to switch over full-time.
The journey I’ve been on as a customer of Pivotstream has validated for me that the SaaS model together with the capability of PowerPivot makes for a more compelling business solution for reporting and BI than any alternative that I can find.
I’ve been particularly conscious of making sure that my tech team spend their time where they can really drive business value – building Data Warehouses, ETL and efficient business processes. It was clear that time spent developing SharePoint Server was time not spent adding value elsewhere. There’s no doubt in my mind that they would have delivered, but I knew that they could deliver more value elsewhere to more than offset any cost of hosting.
Once I had taken the decision to try out Pivotstream’s hosted solution, it became clear that “elapsed time” taken was no longer going to be a constraint to the project. On that same day, Azzurri had it’s own Pivotstream site in full working order with admin and consumer accounts setup for the trial. It was now down to me to start making this a fully functional dashboard.
Naturally, I had workbooks at the ready and I loaded a few up immediately. I started sniffing some of the additional features that I could now start playing with. Before I knew it, I was canvassing Pivotstream for direction on Query String URL filtering (an awesome attribute to drill across to other dashboards). A handy guidance document found it’s way into my inbox and I was away.
I was supplied with a program to split core and thin workbooks, another gem that just saves time and aggravation. I served up a query with web part layouts and, next day, I get a new page layout deployed straight to our site.
Immediately, the focus of what I was delivering was about end user usability as opposed to finding ways around potential (and in some cases very evident) performance issues. Performance was immediately apparent on the Pivotstream solution, as could be immediately seen by some of my more “chunky” analysis that didn’t even make it flinch. My in-house SharePoint Server could take upwards of a minute to open these workbooks whereas the hosted solution barely registers seconds.
Within a matter of days, I realised that the limits of this solution only existed to the extent of the limits of my imagination in creating dashboards. All of a sudden, my focus was turned on making sure that full value was derived and, to that end, I started spreading the word within Azzurri. Some initial training took place and I immediately recognised that these clever individuals that I was working with had even more insight bursting to get out and the fact that we were playing in Excel meant that they could immediately relate to what they were being shown. I had hit that fantastic point in the project where momentum starts taking over and this is probably less than two weeks after starting the trial.
Speed (both of implementation and application), elimination of complexity and additional value adding applications delivered in a scalable data-centre model with an OPEX cost model sums it up for me. Now, it’s just about making the dashboards deliver the real benefit to the business – insight!
As I’ve been writing this, Rob’s reminded me of a comment I made back towards when this whole thing started:
“My key driver is laziness so I’m always looking for quicker and better ways to do things. In doing so, I find myself working non-stop so I may have my driver wrong or I’m failing miserably!”
The reality is that Hosted PowerPivot does do it quicker and better. I’m working non-stop because the results speak for themselves and I fundamentally “get it”. The reality is that my driver was wrong!

“Sniff something? Did ya, rat boy?”
Folks I just can’t resist a quick followup on the Rat Sniffing Project. I… just… can’t. Plus I am absolutely worn down and don’t have the energy tonight for anything that isn’t inherently entertaining.
All of those calculated columns from the last post, remember, were just the setup so we could start doing some REAL observational stuff.
You know, something like this:
This wasn’t difficult at all.
Remember that I now had a column that flagged each row as “1” if it represented the peak of an inhalation:
Each Row is 0.01 second, Flagged as 1 if
it’s a Peak Inhale, aka a “Sniff”
So now I really just need some measures.
(Hmmm. There’s a joke in here somewhere, the first line of which is something like “what do you do if your unit Hertz?” Listen, I told you I was tired.)
Since I have a 1 in the RobPeak column, summing it will yield the number of peaks in an interval – I don’t need to do a CALCULATE(COUNTROWS()) with a filter set to 1.
So the measure formula for Hz (events per second) is:
(SUM(Data[RobPeak]) / COUNTROWS(Data)) * 100
Why times 100? Because each row is actually 0.01 second.
This is another first: creating a separate time table that is NOT measured in days. In this case… hundredths of a second:
Not a Calendar Table – a Time Table. In 0.01 Second Increments
OK, it’s not a Calendar table but it is VERY similar. Just like a Calendar table has columns like DayofWeek, DayOfMonth, CalendarYear, etc. – those columns represent properties of dates. Well, the columns I have here like Second and FiveSecond are the same sort of thing.
And since I have 6 rats in the experiment, it is wasteful to duplicate those properties in my Data table. Plus there’s that whole speed thing.
So I created this separate table and related it to my Data table.
And this table then “powers” my slicers like these:
Slicers Based on a Time Table Measured in Minutes and Seconds
These slicers represent a navigation method more than they do a “filter” in the classic sense. I say that because fields from the time table are on the axis of the chart as well:
The net effect is that I can use my slicers to quickly move around the data and examine “windows” that are interesting or relevant. Couple of quick clicks and I am now looking at minute 4 instead of minute 3, and just the first 10 seconds of that minute:
Quickly “Jumped” to the 4:00-4:10 Interval
It’s ALMOST like the slider control on a YouTube video:
YouTube Has a Time Slicer Control Too!
Wouldn’t it be neat if Excel gave us a YouTube-style slicer for time? We can dream.
Check those charts out. Between 5 and 9 Hz??? Really? Those little rascals inhale between 5 and 9 times per second, on a SUSTAINED basis? Wow. I’m easily amused I guess, but wow.
“Who are YOU, who are so WISE in the ways of SCIENCE?”
In a recent post I covered a very simple scientific scenario. It was an interesting diversion from the normal biz-style scenarios but it left me feeling hollow in two ways. First, it was too simple and didn’t account for the possibility of multiple different treatment types, so I pinged the Italians. (They responded, as they always do, and I owe a post on their reply, probably Thursday).
But the other problem was that, while claiming to be a scientific scenario, it was manufactured by me, and hey, I’m no scientist. Then it hit me…
Yeah, Dan Wesson is a “he-runs-his-own-research-lab-at-the-university, is-published-in-prestigious-science-journals” kind of scientist. So I decided to bounce the question off of him – was my example scenario at all useful, and if not, what WOULD be useful? Boy, am I glad I asked.
Click for Full Size!
First of all, HOW COOL IS THAT? It looks like the double helix or something. I joked with my wife that I was gonna have it turned into an eight-foot mural for the dining room. (Yeah, I was joking sweetie. Totally joking, I would never do something like that).
These are measurements of respiration taken from rats in Dan’s lab. Negative values indicate “inhale” and positive indicate “exhale.”
Each value represents one-one-hundredth of a second – how’s that for time intelligence? ![]()
In order to make sense of the data, Dan needs to measure the frequency of how often the rats “sniff.” Sniffing is an indicator of all sorts of things apparently – a sign of interest, a sign of cognition, a sign of dominance… I had no idea.
Anyway, he needs to measure the number of “sniffs” the rats take in a given time frame, and in order to do that, he needs to count the negative peaks in the breathing data:
I went into this mostly expecting that PowerPivot was not going to be a good tool for his needs. And as usual, PowerPivot surprised me.
Not only can PowerPivot do this, it outperforms expensive specialized software. Software that is designed to do precisely this task. Let that sink in for a moment:
$10,000 Specialized Scientific Software Often “Misses” the Peaks,
PowerPivot is Dead On
It actually wasn’t that hard.
Given a data set like this with 180 thousand rows:
Six Rats’ Breathing Measured 100 Times per Second for Five Minutes
I needed to generate a 1/0, True/False column identifying whether a given row was a “sniff” peak.
I ended up creating a TimeID column so that I could work in integers:
Multiply by 100 So I Can Work in Integer Time Increments
And then I wrote the following calc column formula:
[IsPeakCandidate]
=IF(OR([TimeID]=0,[value]=0),0,
IF([value] > 0,0,
IF(
AND(
CALCULATE
(MIN([value]),
FILTER(ALL(Data), Data[Rat]=EARLIER(Data[Rat])),
FILTER(ALL(Data),
Data[TimeID]=EARLIER(Data[TimeID])-1)
)
>[value]
,
CALCULATE
(MIN([value]),
FILTER(ALL(Data), Data[Rat]=EARLIER(Data[Rat])),
FILTER(ALL(Data),
Data[TimeID]=EARLIER(Data[TimeID])+1)
)
>[value],
)
,1
,0
)
)
)
Ok that looks complex. But all it’s doing is asking three questions:
If all three are “yes,” then this row represents a negative peak.
Unsurprisingly, it turns out that formula wasn’t quite good enough:
In Basketball This is Called This a “Pump Fake”
(In American Football, a “Juke,” in Hockey, a “Deke…”)
So I added another calculated column:
[MinOverInterval]
=CALCULATE(MIN(Data[value]),
ALL(Data),
FILTER(ALL(Data),
Data[Rat]=EARLIER(Data[Rat])
),
FILTER(ALL(Data),
Data[TimeID] <= EARLIER(Data[TimeID]) +5 &&
Data[TimeID] >= EARLIER(Data[TimeID]) –5
)
)
For each row in my data that calculates what is the minimum value over an 11-row interval – five rows before, five rows after, and the row itself:
MinOverInterval: The Smallest Value in the 11-Row Window
Then, tying it all together, the last calc column is this:
[IsRealPeak]
=IF(
AND(
Data[PeakCandidate]=1,
Data[MinOverInterval]=Data[Value]
)
,1,0
)
This column says “if this row was flagged as a peak candidate already, AND it is the most negative row in its 11-row interval, then we bless it as a REAL peak row.”
And if I use that column instead, that fixes the false peaks:
First, I suspect that I can simplify my calc columns a bit. There was some exploratory trial and error in this process and it might be that the “MinOverInterval” test is ALL I need, since there’s no way a row can be its minimum in the interval and NOT be more negative than its immediate neighbors.
But hey, I wanted to get this post out ok? ![]()
Second, yes, these calc columns are something I’d ideally want to have done in a database. But since I want to give Dan a completely self-contained toolset, I had to do it in calc columns.
Lastly, this “peak detection” is NOT the end of the road! Far from it. It’s just the beginning. Now that we have a reliable “peak flag” column in our data, we can start doing the normal PowerPivot thing – measures of frequency, slicing by properties of the rat itself (age, sex, etc.), properties of the trial…
So I expect Pivotstream will be doing some real work for Dan’s lab, helping support his Alzheimer’s research. All starting from a quick conversation in Dan’s backyard over the weekend. How cool is that?
“IT’S ALIVE!!!!”
(Just your average Excel Pro after converting
his first PowerPivot Workbook into a Web App)
I’ve got another article about to go live on CIMA Insight, but I’m gonna jump the gun a bit and post basically the entire thing here ahead of time.
At Pivotstream we recently went live with our first full-time demo site for Hosted PowerPivot. We’re going to be adding to it over time, but it’s got enough on it already that I think it’s worth looking at – it shows that “spreadsheets have become live web applications” thing that has to be seen, live, to really sink in.
Pretty self-explanatory – customizable content plus a menu of applications. Link below.
(OK, one note: I say “customizable” but customization is only allowed for Authors/Owners of the site. What you are seeing here in the Consumer experience, and Consumers cannot customize this home page).
https://insights.hostedpowerpivot.com/sites/Demo/Pages/default.aspx
This is the workbook from the Budget vs. Actuals Part One and Part Two posts. Here’s a picture of that same workbook after it’s been saved to a SharePoint web server and then accessed in my browser:
Notes on this application:
If I click the “Sales vs. Budget” hyperlink, I am taken to the report I built in last month’s article:
Click the slicer – it works ![]()
Try this application out here:
https://insights.hostedpowerpivot.com/sites/Demo/Pages/Adventureworks-Sample.aspx
This application is based on two real-world data sets. One is a list of the addresses of almost every retail food/drug store in the United States. The other is a detailed list of demographic information about every ZIP code (postal code) in the United States.
Blend them together in PowerPivot and you get an application like this:
Note that I have selected the two warmest temperature ranges, circled in orange. If I click the link at the top of the sheet I then see the following analysis:
Notes:
Application is located here:
https://insights.hostedpowerpivot.com/sites/Demo/Pages/Retailer%20Overlap%20Analysis.aspx
This one is also based off of a sample data set, but it is one pulled from a popular CRM package and therefore represents real-world value.
Note that there are multiple reports in this application as well, plus a menu page, but I’m just showing one here for space reasons.
Application can be tried out here:
https://insights.hostedpowerpivot.com/sites/Demo/Pages/CRM-Analyzer.aspx
Many of you have seen this one already, on Mr. Excel’s Hosted PowerPivot site, but we put it on this demo site as well.
Application is located here:
…and the word “forum” gives me an excuse for a blast from the past…
“What you are about to witness is real. The participants are not actors. They have agreed to dismiss their tech support cases and have their questions settled here, in our forum:
THE POWERPIVOT COURT
OK, neglected by me, not by everyone else. At Pivotstream I have the benefit of a great SQL team. Generally speaking, when I need a calculated column, I ask them to provide it. The benefits of doing that are documented several places on this site, including here and continued here.
But hey, not everyone has a database rapid-response team at their disposal ![]()
So questions like this one come up a lot on the forums, reminding me of my blind spot:
You have two tables of data. In one, you have a list of your clients and their effective discount rate during particular date ranges – marked by a start and an end date. If the End date is blank, that discount rate is assumed to still be active.
Rates Table: Discount Rate per Client,
Over Distinct Date Ranges (Start and End Effective Date)
Then there’s a second table, listing Clients and the days that you called each of them:
Calls Table: Multiple Clients, and Multiple Calls to Each Client
You want to add a calculated column to that Calls table which contains that client’s effective discount rate on the date of the call:
We Want to Add the Highlighted Calc Column – What’s the Formula?
So… what’s the formula?
=CALCULATE(AVERAGE(Rates[Rate]),
FILTER(Rates,
Rates[Start]<=Calls[ContactDate] &&
Rates[EffectiveEnd] >= Calls[ContactDate] &&
Rates[Client]=Calls[Client]
)
)
Yeah, it IS a lot like a measure. It uses CALCULATE, the wonder function.
But in this case, we’re just using CALCULATE to apply filters in our calculated column.
Works basically the same way, though. The three clauses in the FILTER() function are all AND’d together using the && operator, so that only rows from the Rates table meeting the following three criteria are “kept” and then “fed” to the AVERAGE function:
Note for those who desire deeper understanding: The biggest difference between this calc column formula and a similar formula we’d have to write in a measure is that in a measure, we’d have to use the VALUES() function wrapped around the right hand side of the criteria – VALUES(Calls[ContactDate]) rather than just Calls[ContactDate]). And we’d have to “protect” those VALUES functions with an IF(COUNTROWS()) since you can’t use VALUES() as part of an “=” test, except when there is only one value returned, or maybe use MAX() instead of VALUES(). The point is that in a calc column, we DO have a current row in the table, so we don’t need the wrapper function like we do in a measure.
Yeah I sneaked that one in. It’s another calc column I added to the Rates table:
You can see its formula in the formula bar in the image above. I just wanted to put a non-blank value in whenever End was blank. You could hard-code a date in there if you wanted, or use TODAY() and a much bigger number than 365. I just figured that setting a date one year in the future from today is good. Keep in mind though that TODAY() will only be re-evaluated when you refresh your data or force a re-calc.
No, there is not. Neat huh? And really, there couldn’t be anyway – each Client ID appears more than once in each table, and PowerPivot doesn’t handle relationships on “many to many” like that. The date columns are even less “relationship friendly” since they don’t match up at all, and are based on ranges in one table and single dates in the other.
I’m trying to do this more often, especially with the forum questions.