PowerPivot Compression: Mysterious Ways

March 29, 2012

 
image

Johnny take a tour of your data, and tune…
…up your columns and your rows, to free up some room
You’ve been following some rules, with the former Vertipaq
Now you’re seeing some results, that you don’t understand
(Click image for the original music and lyrics)

Modifying Lyrics From U2’s Best Album Ever

Yes, “Achtung Baby” is U2’s best album ever.  Yes, I am deliberately trying to pick a fight.

(“Former” VertiPaq?  Why yes, Microsoft recently renamed the storage engine under the hood of PowerPivot to be xVelocity.  Yes, VertiPaq was a better name.  Yes, there were marketing reasons for the rename, to create an “umbrella brand” over many technologies.  No, this does not help any of us out here in the real world.  I wonder how long I will keep calling it VertiPaq.  Probably a long time.)

An Exception to One “Rule,” and the Reinforcement of Another

Longtime readers of this blog have seen me tout the benefits of reducing columns in your data, even if that means adding a lot more rows (see also David covering it here).

Well, I recently discovered an exception to that “rule.”  And that same data set also yielded a VERY dramatic example of the benefits of sorting your data prior to import.

Let’s cover one at a time.  But first, some perspective.

Disclaimer:  This Post is Probably Just Brain Candy

The data sets I am working with are in the tens of millions of rows and therefore don’t reflect “normal” usage for most people reading this (although I’d argue that is changing, to an extent).  What I’m seeing in this data is very unlikely to repeat itself in smaller data sets.

That said, the impact of reducing columns (even at the expense of more rows) is not something you will typically notice with small data sets.  If you only have, say, 20 thousand rows of data in a table, I would not worry too much about reducing column sets – you still shouldn’t import columns that you don’t need, but I wouldn’t go to great lengths to reshape your data set.

So file this post under “interesting brain candy that may be relevant someday” rather than “something you intrinsically need to know.”  This is 90% fun, with a slice of 10% relevance.  OK, time to dig in.

Original Data Set:  Two Wide Tables

I had two wide tables in my data.  Yes, they look VERY similar, but trust me, they are different tables:

image

image

Total File Size:  561 MB

OK, Now Make Them Tall and Narrow!

I was naturally tempted to try importing these same tables as tall and narrow, by introducing an amount type column.

image

image

Total File Size: 1,207 MB

Whoa, it Doubled the File Size!

Yes, a normally-reliable trick has betrayed us here.  Why?

Well, I have not confirmed this with my former colleagues at Microsoft, but I suspect it has something to do with VertiPaq storing data in 1M-row partitions.

That’s right – if you have more than one million rows in a table, they are not all compressed and stored in a single unit.  Your data is first broken into partitions of one million rows each, and then each partition is compressed and stored individually (a “chunk” in Rob-speak).

“Chunking” does reduce the effective compression rate.  In some cases we would get MUCH better overall compression if all the data were stored in a single chunk, but that comes with other tradeoffs that I don’t have room to go into here.

Why Does the One Million Row “Chunk Size” Matter?

Well first of all, in the original “wide” data set, I had 45M and 44M rows.  So, a total of 89 partitions/chunks.

In the new data set, I have 366 chunks.  So each chunk would need to be 4x smaller than the original chunk size just to result in a similar file size.

Given that the file size doubled, we can guess that each new chunk was half the size of the original chunks.

Impact of Sort Order

Let’s take that knowledge of 1M row chunks and consider the impact of sort order as you import data.

My database is sorted by the leftmost columns in the table – Date, PeriodType, Product, etc.

That means my new AmountType column is basically unsorted – look at the pictures above and you will see that even in the tiny number of rows I snapshotted, AmountType is varying whereas all of the columns to the left display the same values over and over.

That means my AmountType column is very “fragmented” in each of the 1M row chunks, and I am not getting as much savings from the shape change as I would if it were not fragmented.

So, let’s try sorting the data by AmountType before importing it!

Sorting by Amount Type Decreases File Size by 35%!

image

Note that This Was Just Table #1
(I Deleted Table #2 Before Running This Sorting Test)

Marco and Alberto were the first people to bring sort order to my attention.  In their experiment, they found a 25% savings on sorted versus unsorted data, but not much difference when they sorted by different columns.

In my case, the data was already sorted.  Changing which column I sorted by, BEFORE import, made a 35% difference.  So clearly, your mileage may vary.

Only Sort BEFORE Import Matters!

Just want to be overly clear here:  once you import the data, sorting in the PowerPivot window has NO impact on file size.  Actually, it has no impact whatsoever, except that it helps you navigate your data in the PowerPivot window.

Only sorting the data in the source, before importing into PowerPivot, has an impact.

In This Case, I Went Back to Wide Tables

All right, even with the 35% reduction from sorting, my resulting files would still be bigger than the original “wide” format.

I also ran some quick performance tests and both formats were returning my test pivot in about 4 seconds.

So in this one case, going tall/narrow didn’t help.  But that won’t discourage me from trying it again next time.  I am pretty sure this data set is an outlier – it currently only has two values in the date column for instance.

The moral of the story:  the tall/narrow trick is generally reliable.  But always test to make sure.


VLOOKUP Week: Who Needs VLOOKUP Anymore?

March 27, 2012

 
image

VLOOKUP Week – Brought to You by Mr. Excel
(Click Image for the Official Site of VLOOKUP Week)

What is the “Scariest” Feature in Excel?

A couple of years ago at lunch, Bill remarked to me that VLOOKUP was the “scariest” feature in Excel, and that PowerPivot’s introduction of relationships was going to make Excel a lot “friendlier” to the average user.

This sparked a few minutes of friendly debate, as I had always considered pivots themselves to be the “scariest” feature in Excel.  To be “scary,” a feature must be very useful if you know how to use it, and yet 80% or so of the Excel audience doesn’t know how to use it.  VLOOKUP and Pivots both clearly meet those criteria, so it was an interesting discussion.

Should PowerPivot be Named SimplePivot?

In traditional pivots, VLOOKUP is often a required step to prepare your data before pivoting it (combining multiple tables into one).  So I remain hopeful that Bill is correct.  Wouldn’t it be ironic if PowerPivot ended up being a Simpler way to create pivots, and didn’t just dramatically increase the power of pivots, but also broadly expanded the audience that even uses pivots?

That would rock.  Is there anyone out there who can corroborate Bill’s theory?  Were you put off by VLOOKUP before, and now use pivots thanks to PowerPivot?  Let me know.

Anyway, time to do my part for VLOOKUP week – a full week of posts by Bill and the Excel community focused on that fearsome monster, VLOOKUP.

My take?  With PowerPivot, you literally do not need VLOOKUP.  Ever.

From the Archives #1:  Relationships as Alternative to VLOOKUP

image

That Looks a Lot Easier than VLOOKUP…  Because it IS

This article I wrote in CIMA Insight is probably the best intro I’ve written to relationships in PowerPivot:

http://www.cimaglobal.com/Thought-leadership/Newsletters/Insight-e-magazine/Insight-2011/Insight-December-2011/Excel-extras-revolutionary-features/

From the Archives #2:  Using =RELATED() to Inspect Your Data

This post comes from all the way back in 2009 and is an example of me using a new function, RELATED(), that does exactly the same thing as VLOOKUP, but only takes one argument:

image
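
For reference, here’s a minimal sketch of the kind of calc column I mean – the table and column names are invented purely for illustration:

// Calc column added to a hypothetical Sales table
=RELATED(Products[Category])

One argument.  The relationship does the work that VLOOKUP’s other three arguments used to do.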

Note that once you are done with your =RELATED calc columns for inspection purposes, in most cases it makes more sense to then delete those columns and just use the columns from the other table in your pivots.

Full post here.


Poll Results

March 24, 2012

Even though it is still open, the one-question poll results have stabilized.  In fact, the percentages were pretty stable even from the beginning.

image

When I first “opened” the blog more than two years ago, the readership was overwhelmingly “BI” plus a smattering of SharePoint folks.  There weren’t enough Excel folks aware of PowerPivot to really even register back then.

The growth of the Excel audience here from essentially zero to approximately 70% of the readership actually parallels, perfectly, the total growth in page views.

Makes sense.  If you are part of the Microsoft BI community, you heard the PowerPivot message early and often.  The Excel audience is MUCH harder to reach, and has had to discover PowerPivot via social/viral means.

Non-BI = Excel?

Yeah, is that a stretch?  I don’t think so, not at all.  If you are crunching numbers and you are not BI, what are you?  Excel, that’s what.  Or working/researching on behalf of an Excel audience.

What about “Other?”  Those answers were overwhelmingly Excel-focused, too.  Here’s a sample:

image

One Last Note

How did I perform the analysis?  Why, I exported to Excel of course :)

image

…and then into PowerPivot.


What’s Your Background?

March 22, 2012

***UPDATE:  Results summary here.

Gonna do something a little different today.  Blog traffic has been spiking quite a bit and I want to confirm or dispel a theory.

So let’s keep it simple.  A one-question survey.  What’s your background?


The 3rd Most Common Button in Data Apps is…

March 20, 2012

 
A little bit of food for thought today, delivered in a pictorial/humorous style but with a serious point.

image

A Running Joke That Never Gets Old

I’m quite fond of the following one-liner that I’ve been using in presentations for a while now:

“Export to Excel is the third most common button in data and business intelligence apps…  after OK and Cancel.”

That generally brings laughs.  It’s ironic of course, since so many business intelligence suites are marketed in part on the principle of “stamping out spreadsheets.”  But even the proudest BI suite succumbs to that ever-present force:  customer demands.

And so they find themselves, over and over, opening a door to the “dark side.”  Let’s take a tour shall we?

Crystal Reports

image

And I really like this marketing point here, about how version 9 added richer support:

image

Business Objects

image

Cognos

Cognos doesn’t mess around.  Why settle for simple export when you can write an entire addin to push data into Excel?

image

Et tu, Qlikview?

(There is no extra charge for rhymes of such quality)

image

I like that they use the XL icon.

PivotLink

I’m including them because a few members of our team came from PivotLink, and because Pivotstream was using PivotLink until, um, yeah…  until I showed up.

image

One-upping QlikView, they not only use the XL icon, they give it top billing for the entire export dropdown.

Microsoft

Hey, in fairness, Microsoft products do it, too.  Here’s Reporting Services:

image

And PerformancePoint:

image

Even Excel Services itself offers “Download Snapshot” which is really just Export to Excel…  from Excel!

image

Why So Prevalent?

Why does everything have this button?  Why is export to Excel such a “must have,” even for tools that claim to “break” an organization’s dependence on Excel?

  1. Excel is widely known.  Everyone knows it, or at least when compared to any other BI tool.  But everyone knowing it isn’t quite enough to explain why it is so important to take data out of other tools and land it in Excel.  It has to provide capabilities lacking in the other tools, which brings us to reason 2…
  2. Excel is more flexible than any of these tools.  Excel isn’t really a tool so much as it is a cleverly disguised programming language.  Its capabilities are insanely broad, and those capabilities are all “composable” with one another – it is not a collection of distinct features, but of building blocks that can be combined in myriad ways.
  3. Excel is results-oriented and agile.  Even when the other tools CAN do something, and the user DOES know how to achieve that, they still often export to Excel because it’s just a lot faster to get the result they want.

My Conclusions

First, this reinforces for me that Excel remains the common denominator in business analysis and reporting.  If that sounds controversial, try building and selling a BI or reporting tool that lacks export to Excel, and let me know how that goes.  The companies above all produce good products, and they are smart companies.  They implicitly acknowledge what I am saying (with their feature decisions) without explicitly endorsing it (there’s no upside in them doing so, for sure).

Export to Excel also implies an undeniable “drift” away from “one version of the truth.”  When I export from a BI tool to Excel, and perform some calculations “in the margin” of the exported data, that is a forked, one-time analysis.  It is not captured in any public data model.  It lives on my desktop.  It represents a “drift” away from the rightly-coveted single version of the truth.

PowerPivot, of course, is not immune to that problem of drift.  As I showed above, even server-based Excel has the ability to export snapshots.

But there is an opportunity here, with PowerPivot, that the other tools will always lack.  With PowerPivot, an analyst with proper permissions can build their “in the margin” calcs directly into the PowerPivot workbook itself.  All the flexibility and familiarity of Excel can be harnessed, but then shared and maintained centrally on the server.

Just as with the other tools, it is undeniably better to build that incremental business logic into the PowerPivot model itself (in the form of new measures, tables, etc.).  But sometimes that just isn’t practical.  Often that is true, in fact.  If it wasn’t true, these other BI tools wouldn’t have the export button!

And once we acknowledge that truth – that “last mile” calculations and similar extensions are common and inevitable – a tool that embraces them directly holds an advantage.  I find that pleasantly and rebelliously ironic:  an Excel-based BI tool can hold an intrinsic advantage in “one version of the truth,” which a few years from now we will refer to as “a former weakness of Excel-driven BI.”


Friday Bonus: My Son “Invents” the Spreadsheet

March 16, 2012

 
image

Row-Wise AND Column-Wise Calcs!
(Click for Bigger Version)

My 9-Year Old Son Has Never Touched Excel…

No, I’ve never sat my son down in front of Excel.  Does that reflect good parenting or poor parenting?  I can see both sides of that one.

But it doesn’t matter.  Life finds a way, doesn’t it?

image

He put this together to “optimize” his army in his Lego Battles game.  Each row is a series of yes/no properties for a character – does it have each ability or not – followed by a count of the “yes’s” as that character’s score.  He then decides whether that score is good enough to earn the character a spot in his army, by ranking it against all the other scores:

image

When I asked him what he was doing, his response was basically “these are my numbers dad, I make them go up.”

Fair and Balanced

For obvious reasons, I can’t do a post about my son without also featuring my daughter.  Whereas he got a picture of his spreadsheet, she gets a picture of herself:

image

Like her dad, she is a huge movie quoter.  She can recite you just about any line from any Star Wars movie.  She is in charge of the daily Star Wars “simulation” at recess on the school playground.  I am very lucky to have these particular young people in my life.


Another Angle to “The Three Seconds of Now”

March 15, 2012

 
image

Yesterday, Tom LaRock posted an insightful followup to my post on Moore’s Law and Inflation.

He’s a lot better at brevity than I am, so I encourage you to read it.  I really liked this part:

“…he took my idea and was able to quickly gather data, perform analysis, postulate his own theory, and return the entire results back in a very short amount of time.

This is the future for data professionals. It isn’t about racking servers. It’s about getting people access to the data they want, in a meaningful way, and quickly.”

Another Secret to PowerPivot’s Success

In my full post today, I described how the “three seconds of now” should guide your construction of reports.

But the things Tom said reminded me of something else, too:

When you are building something in PowerPivot, the line between “report” and “model” is perfectly blurred.  You transition between them seamlessly, very much in the moment of now.  You try something out on a report, see the results, and realize you need a new measure all within a 3-second window.

Because everything is (generally) so fast and seamless, you never leave that moment.  “Now” consumes your entire work session.  And because that suits our biology to a tee, the results are phenomenal.

“PowerPivot:  in tune with the rhythms of the soul.”  Too much?  Probably.  Anyway I am late for my drum circle.


Analysis in the Three Seconds of Now

March 15, 2012

 
image

“NOW”, by Alison Farmer
(Click to visit her site)

Our Perception of “Now” is Three Seconds in Length

For several years now this has been one of my absolute favorite paintings.  I liked it long before I knew the meaning, but when it was explained to me I was doubly hooked:  the string that she is holding is divided into three equal lengths:  three seconds.  The moment of “now.”

Apparently there is a lot of research indicating that three seconds is the “magic” window in MANY human and animal behaviors, from hugs to goodbye waves to babies’ babbling. 

Go read this quick article in Science.  It’s short, fun, and interesting.  I will wait.

Where I Used to See It

Why is Tommy Chong Sitting in Donald's Office???

I used to see this painting nearly every day, at Microsoft, in Donald Farmer’s office.  Yes, that Donald Farmer.  The one who works as a stunt double for Tommy Chong (or vice versa).  The genetic donor for PowerPivot Yoda.  And he is also the one who last year left Microsoft for QlikView.

Since he is married to the artist, he gets to use his office as a private gallery of sorts.  And he was kind enough to send me some pictures of the painting last night when I asked.

 

Why This is Relevant

Someone at Microsoft recently emailed me to ask my opinion: 

“For large PowerPivot workbooks, how long do you think users will expect to wait when they click a slicer?”

My answer was:

“They don’t care that there is a lot of data behind it.  If it isn’t fast, they won’t engage.  The limits of human patience are not the least bit sympathetic to our data volume problems.”

In other words, people expect the click to finish “now,” in three seconds or less, no matter what.  They don’t even think about it – this is biological.

Two Things Happen When Something Takes Too Long…

image

If a slicer click or related interaction takes too long, two things happen:

1. The user’s train of thought is broken while waiting on the click to complete.  Their mind wanders off topic and they often flip back over to email while they wait.  They sometimes forget to come back. 

They do not “commit” to the experience, do not get absorbed, and generally decide to remain “shallow” in their thoughts toward it.

2. If they grow to expect “long” wait times, they will ultimately decide not to click at all.  If they know that conducting a slicer click exploration of the data is going to take 15 seconds a click, and they may have to execute 10 clicks over the course of their exploration, they simply decide not to do it at all. 

Yeah, if they had invested that 2.5 minutes, they might have discovered something amazing or revolutionary in the data.  Tough.  Humans aren’t built for that.  They want their three seconds, even more than they want their MTV.

Let that sink in for a moment…

The speed of the report heavily impacts the quality of the thinking someone will do when using it. 

It also will impact whether they use the report at all.

It does NOT just impact how long it takes someone to get something done with your report.

Moral of the Story:  Make Your Reports Fast

If you are building interactivity into your reports for others to consume, don’t skimp on the speed.  If something is slow, here are a few simple tricks to consider:
 

  1. Slicer cross-filtering is very often the #1 source of slowdown – consider turning it off where you don’t need it.
  2. Make your bigger tables tall and narrow.
  3. Don’t use calc columns in your big tables if you can avoid it (do the calc columns in the db if you have one).

It may also be time for better hardware.  We certainly spend a lot of time on hardware selection at Pivotstream.  I like to say that every unit of time or money invested in speed pays off.  I never stop pressing for speed.

Gotta keep things in the moment of now.


Moore’s Law and Inflation

March 13, 2012

 
SQL Rockstar, aka Tom LaRock (Blog | Twitter) sent me a fascinating data set the other day:  a table of different computing devices over the years, their “horsepower” in calculations per second, and how much they cost:

image 

Source:  The Rise of the Machines

The Cost of a Million Calcs per Second, Over History

Tom remarked that some modern devices, like his iPad2, have more computing power per dollar than even some of the recent supercomputers.

To validate this, I added a calc column called “Cost per MCalc”

  =Computers[Cost]/(Computers[Comp per Second]/1000000)

And indeed, his iPad2 is cheaper per million calcs than, say, 2002’s Earth Simulator:

image

By a lot, too.  Like 40x cheaper per MCalc.

But then he had an astute follow-on question:  how would that change if we took inflation into account?  That’s when he tagged me in.

Enter Shadowstats!

Tom’s idea finally gave me an excuse to subscribe to Shadowstats.com.  About six months ago I even emailed them and asked them whether they had considered setting themselves up on Azure Datamarket.  (Their answer:  not yet.  My answer:  I’ll be back to convince you later.)

Shadowstats provides historical data on things like inflation and employment, and provides it in a convenient format (which is in itself quite valuable – have you ever tried to make sense of the data from .gov sites?  It’s a labyrinthine mess.)

image

CPI Data From Shadowstats.com, Pasted
Into PowerPivot as a New Table

Wow, 11% inflation in the US in 1974, my birth year?  Wow.  That’s intense.  And it piles up quickly when you have a few years in a row of high inflation.

Cumulative Impact of Inflation

To measure cumulative impact, I added a new column:

image

What that shows us is this:  Prices were 65% higher at the end of 1977 than they were at the beginning of 1970.  A 65% increase in the cost of living in just eight years.
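
The exact formula lives in the screenshot rather than the text, but as a rough sketch, one way to build a cumulative column like that in the CPI table – assuming made-up column names of CPI[Year] and CPI[InflationPct] (as a decimal, so 0.11 for 11%) – would be:

// Cumulative price-level factor: 1.65 means prices are 65% higher
// than at the start of the data.  EXP/SUMX/LN compounds (1 + rate),
// since this version of DAX has no product-style aggregation.
=EXP(
    SUMX(
        FILTER(CPI, CPI[Year] <= EARLIER(CPI[Year])),
        LN(1 + CPI[InflationPct])
    )
)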

Let’s chart it:

image

Inflation:  In 2011 It Took $6 to Buy What $1 Bought in 1970
(Official US Govt Numbers)

Factoring Inflation Into Price per MCalc:  2011 Adjustment Factor

image

Tom wanted to convert everything into 2011 dollars, which makes sense.  In order to do that, I created two measures.  Cumulative Inflation Official is just the Average of the same column in the CPI table of my source data, and the Adj Factor is:

=CALCULATE([Cumulative Inflation Official], Years[Year]=2011)
    / [Cumulative Inflation Official]

In other words “take the Cumulative Inflation value for 2011 and divide it by the Cumulative Inflation value for the current year.”
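
(For completeness, the first measure is nothing fancy – a sketch, again with an assumed column name:

=AVERAGE(CPI[Cumulative Inflation Official])

…just the average of the cumulative inflation column, which collapses to a single year’s value whenever the pivot or a CALCULATE filter narrows it to one year.)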

Now I can use that factor to give me a 2011-Adjusted cost per MCalc:

image

Where that second measure is just:

=[$ per M Calc]*[Adj Factor for 2011 Dollars Official]

Which shows us that the iPad2 is an even better deal (in terms of MCalc’s), compared to the Earth Simulator, than we had originally thought – more like 50x cheaper as opposed to our original 40x cheaper.

What’s with the Playstation3?

It rounds to $0.00 per MCalc even when adjusted to 2011 dollars?  I know game systems are sold at a loss, but really?  Well, the Playstation IS the only gaming/graphics system in the data, and those are of course dedicated renderers of polygons, and NOT general-purpose calculators.  So its calcs-per-second number is indeed a lot higher than the other machines’ – gaming systems just offer an insane number of (polygon) calcs per second, which is why there is so much interest these days in using GPUs for business calc purposes – if you can “transform” a biz problem into a polygon problem, you’re off to the races.

Tying it All Together

Here you go, the “final” results:

image

Cost of a Million Calcs per Second, Adjusted to 2011 Dollars

Why the heck does it go back UP????

Isn’t that interesting?  We see a steady and sharp decline in price per MCalc from the 1970’s all the way into 2006, but then prices start to RISE again?

The data IS skewed a bit by the fact that we only are looking at 26 computers.  So if we happened to have a supercomputer in the data in 2009 but no PC or server, that can throw us off.

Let’s take a look by type of computer, then:

image

All Three Computer Types Show a Price per MCalc INCREASE in the late 2000’s
(Click for Larger Version)

OK, the shapes are a bit different, and the 2006 “plunge” in price for PC/Gaming/Handheld IS indeed due to 2006 being the year where the PS3 shows up…  but all three charts DO show some form of recent price increase per M Calcs.

So what’s going on?  Our Quest for Power Meets Physics.

I’m no expert but I’ve read enough over time to have a decent idea what’s causing that rise.

For the most part over computing history, our quest hasn’t really been for “cheap.”  It has been a quest for power.  My Dell PC in 1992 was about $3,000 and offered 33 M Calcs.  Five years later I bought a $2,500 machine that offered about 300 M Calcs.  So in five years the price fell a little but the power grew a lot.

image

In theory, at that point I could have bought the 1992 level of power (33 MCalcs) for $300 or so.  But that’s not what I did.  I bought a more powerful machine rather than a cheaper machine.  My roommates would have made fun of me.  Conor lorded his fancy 3d graphics card over me every time we played Quake.  I needed 300 MCalcs!

Moore’s Law is based on our ability to continually cram ever-more transistors onto a single chip.  And as we’ve gotten closer and closer to the size of atoms, we’ve hit a bit of a limit in that regard.  Moore’s Law is “stalling.”

When Moore’s Law stalls, do we stop chasing power?  Nope, we just go in a different direction.  We start going multi-core.  Multi-CPU.  And while that DOES deliver more MCalcs, at the moment, it’s a more expensive way of doing it than the old way of “keep shrinking the transistors.”

Next Up…

One of the REALLY cool things about PowerPivot is its “mashup” capability.  I’ve shown it over and over.  But now that I have a good source of inflation statistics, I can dollar-correct ANYTHING.

image

I Can Now Mash This CPI Model Into ANY Other Data Source
That is Associated with a Year or Date!

I’ll have another wrinkle to share about inflation later this week, and the hint is the difference between “official” and “real” inflation – you may even see some measures in the field list above.


The Correct Usage of EARLIER()

March 8, 2012

 
image

Egg on My Face For Improper Use of EARLIER()

Tonight I was looking at one of my recent posts, the one about “fuzzy” time relationships in calculated columns, and I discovered that I had used the EARLIER() function in places that I did not need to.

OK, well, it’s not a LOT of egg.  I mean, my formulas were all returning the correct answers.  They were just needlessly complex.

Here was the calculated column formula in question:

=CALCULATE(MAX(Events[EventType]),
   FILTER(Events,
          Events[RatID]=EARLIER(Sniff[RatID]) &&
          Events[SessionID]=EARLIER(Sniff[SessionID]) &&
          Events[PostTimeID]>=EARLIER(Sniff[TimeID]) &&
          Events[PreTimeID]<=EARLIER(Sniff[TimeID])
         )
)

Every one of those highlighted EARLIER() functions is 100% unneeded.

If I remove all of the EARLIER()’s, the formula returns the same result.

=CALCULATE(MAX(Events[EventType]),
   FILTER(Events,
          Events[RatID]=Sniff[RatID] &&
          Events[SessionID]=Sniff[SessionID] &&
          Events[PostTimeID]>=Sniff[TimeID] &&
          Events[PreTimeID]<=Sniff[TimeID]
         )
)

And here’s the proof that the EARLIER()’s were NOT needed:

image

Old Formula, Fixed Formula,
and Comparison Column That Proves They Are Equivalent

Why Weren’t They Needed?

Let’s look at the FILTER() in my formula, and focus on just one of the comparisons:

FILTER(Events,
   Events[RatID]=Sniff[RatID]
   …

I am filtering the Events table to rows where the RatID is the same RatID in the Sniff table.

And this calculated column is in the Sniff table.  And there is no relationship between the Events and Sniff tables.

That FILTER() is quite straightforward then – it looks at the RatID in the current row of the Sniff table (since this is a calc column in the Sniff table) and then goes and finds rows in the Events table that have the same ID.

EARLIER() is only needed when you are “jumping back out” of something.  And there isn’t anything to jump back out of here.  Let’s revisit an example of where EARLIER() IS needed:

Flashback:   Simple Use of the EARLIER Function

This is an excerpt from a post last month, “reprinted” here for convenience.

Say I have a VERY simple table like this:

image

And I want to add a third column that is the total for each customer:

image

The calc column formula for that third column is this:

=CALCULATE(SUM([Amt]),
           FILTER(ALL('Table'),
             'Table'[Customer]=EARLIER('Table'[Customer])
                 )
          )

Note the highlighted part:  in that formula we are filtering on ALL(Table) rather than just the “raw” Table.  Here’s the crux:

When I say FILTER(ALL(Table)), all of my references to columns in Table will have “forgotten” all notion of “current row” and will instead be references to the entire column.  That is because of the ALL().

So the EARLIER() function is my escape hatch that allows me to go back and inspect the current row’s value.

This line of the formula:

  'Table'[Customer]=EARLIER('Table'[Customer])

Can be understood as:

  'Table'[Customer]=CurrentRow('Table'[Customer])

EARLIER is Probably Best Understood as CURRENTROW

In fact that’s a better name for EARLIER 99% of the time.  Just think of it as a CURRENTROW function, useful only in calculated columns, and only when you are performing FILTERS on ALL(Table), when you need to “jump back out” of the ALL and fetch a value from the current row.

Yes, it IS useful in other cases.  But I suspect that those other cases are rare enough that rather than a general purpose function like EARLIER, we should have been given a dedicated, easy to understand version named CURRENTROW.  Or maybe we should have both.  Yeah, both.  That would be good.

Why My Formula Didn’t Need EARLIER

Quite simply, I had no ALL() to “undo.” 

FILTER(Events,
   Events[RatID]=EARLIER(Sniff[RatID])
   …

I was just filtering on the Events table.  Not ALL(Events). 

And even if I had been filtering on ALL(Events), that wouldn’t have impacted the Sniff table, WHICH IS WHERE THIS CALC COLUMN LIVES.  So even ALL(Events) would NOT have required me to use EARLIER(Sniff[RatID]) to get the current row from Sniff – a simple Sniff[RatID] is sufficient.

Whew.  Glad I caught this before anyone noticed. 

But I suspect some people did, and were just polite :)


PowerPivot V2 is Released!

March 8, 2012

 
Well the wait is over.  It is now released, for real.  No longer a beta or RC.  The real deal.

Download it here.

Other important links:

  1. My first glimpse at new features here.
  2. Important warnings about compatibility with V1.

Auto-Modify Every Pivot in a Workbook

March 6, 2012

 
image

It’s *ALMOST* That Easy :)

It’s been a while since I’ve talked about Macros (also known as VBA).  I think it’s overdue.

Macros are nothing short of amazing.  We couldn’t live without them at Pivotstream.

Wait Rob, Macros Don’t Run on the Server!

That’s right, they don’t.  And everything we do at Pivotstream eventually lands on the server.  So why do we use macros?

Simple:  we use macros to more efficiently create and modify our workbooks.  Macros are a “design time” tool for us, not a “run time” tool.

And they have saved us probably decades of work.  I’m not exaggerating.

An Example:  The “Change Every Pivot” Macro

Here’s a pretty simple macro that runs through every PivotTable on every visible worksheet and does whatever you want to each pivot:

Sub ModifyAllPivots()
    Dim Pivot As PivotTable
    Dim Sheet As Worksheet
   
    For Each Sheet In ActiveWorkbook.Worksheets
        If Sheet.Visible = xlSheetVisible Then
            For Each Pivot In Sheet.PivotTables
                AutoPadPivot Sheet.Name, Pivot.Name, 10
                GrandTotalsBottomOnly Sheet.Name, Pivot.Name

            Next
        End If
    Next

End Sub

I emphasized the “payload” of the macro – for each pivot the macro finds, it “pads” the columns of the pivot to be wide enough, and sets the pivot to display grand totals only on the bottom of the pivot (and never on the right).

Those two lines are macros that I also wrote, and I will include them below.

Auto Pad Pivot Columns Macro

At Pivotstream, we tend to ALWAYS turn off the “auto-fit columns on update” setting on our pivots:

image

Making Pivots NOT Change Column Widths On Slicer Clicks Etc

This yields a much more pleasant “application-like” result – clicking a slicer never results in things jumping around.  I highly recommend it.
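
If you’d rather not click through that dialog for every pivot, the setting is reachable from a macro as well.  Here’s a minimal sketch (the macro name is mine) – I believe the checkbox corresponds to the pivot’s HasAutoFormat property, but treat that as an assumption and verify it on a test workbook:

Sub TurnOffAutofitOnUpdate(sSheet As String, sPivot As String)
    Dim oPivot As PivotTable
    Dim oSheet As Worksheet

    Set oSheet = ActiveWorkbook.Worksheets(sSheet)
    Set oPivot = oSheet.PivotTables(sPivot)

    'Assumed to correspond to the "Autofit column widths on update" checkbox
    oPivot.HasAutoFormat = False
End Sub

Drop a call to it inside ModifyAllPivots and every visible pivot in the workbook gets the setting.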

But this DOES lead to a problem.  If your numbers suddenly grow by a digit in the future, you can get something like this:

image

One of the Drawbacks of Turning off Autofit Column Width

To account for this, we have a macro that runs through every pivot in the workbook and “pads” the column width by a percentage:

Sub AutoPadPivot(sSheet As String, sPivot As String, iPct As Integer)
    Dim oPivot As PivotTable
    Dim oSheet As Worksheet
    Dim r As Range
    Dim rCurr As Range
    Dim iCol As Integer
  
    Set oSheet = ActiveWorkbook.Worksheets(sSheet)
    Set oPivot = oSheet.PivotTables(sPivot)
   
    Set r = oPivot.DataBodyRange.Rows(1)
    For Each rCurr In r.Cells
        iCol = rCurr.Column
       
        oSheet.Columns(iCol).EntireColumn.AutoFit
        oSheet.Columns(iCol).ColumnWidth = oSheet.Columns(iCol).ColumnWidth * (1 + (iPct / 100))
    Next
   
   
End Sub

At the beginning of this post, you saw an example where I called this macro to pad each column by 10 percent.  But I can pad by 5, 15, whatever I want.  And you can easily imagine a version that pads by an absolute amount rather than a percentage.
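
For instance, here’s a sketch of that absolute flavor – the same loop as above with the width math swapped out (the macro name and the “extra width in characters” parameter are my own invention):

Sub AutoPadPivotAbsolute(sSheet As String, sPivot As String, dExtraWidth As Double)
    Dim oPivot As PivotTable
    Dim oSheet As Worksheet
    Dim rCurr As Range
    Dim iCol As Integer

    Set oSheet = ActiveWorkbook.Worksheets(sSheet)
    Set oPivot = oSheet.PivotTables(sPivot)

    For Each rCurr In oPivot.DataBodyRange.Rows(1).Cells
        iCol = rCurr.Column

        'Autofit first, then add a fixed number of character widths
        oSheet.Columns(iCol).EntireColumn.AutoFit
        oSheet.Columns(iCol).ColumnWidth = oSheet.Columns(iCol).ColumnWidth + dExtraWidth
    Next
End Sub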

Grand Totals Bottom Only Macro

This one is more self-explanatory:

Sub GrandTotalsBottomOnly(sSheet As String, sPivot As String)
    Dim oPivot As PivotTable
    Dim oSheet As Worksheet
   
    Set oSheet = ActiveWorkbook.Worksheets(sSheet)
    Set oPivot = oSheet.PivotTables(sPivot)
   
    oSheet.PivotTables(sPivot).RowGrand = False
End Sub

Want one more?  OK.  You talked me into it.  We have dozens, many of which are quite ambitious.  Simpler macros make for better blog posts though, so…

Create Page Filter For Each Slicer On Pivot Macro

Sub CreatePageFilterForEachSlicerOnPivot(sSheet As String, sPivot As String)
    Dim oSlicer As Slicer
    Dim oSlicerCache As SlicerCache
    Dim sField As String
    Dim oPivot As PivotTable
    Dim oSheet As Worksheet
   
    Set oSheet = ActiveWorkbook.Sheets(sSheet)
    Set oPivot = oSheet.PivotTables(sPivot)
   
    For Each oSlicer In oPivot.Slicers
        sField = oSlicer.SlicerCache.SourceName
        oPivot.CubeFields(sField).Orientation = xlPageField
    Next
   
End Sub

Why would I want to create a page filter for every slicer on the pivot?  Well, primarily so you can “harvest” the slicer selections in formulas.  But there are other reasons you might do this as well, which I will likely cover in the future.

Learn to Record Macros Folks!

What’s that, you say?  You never need to pad pivots, switch their grand total settings, or add page filters for every slicer?  Not satisfied eh?  Well, you can make your own!

If you have never recorded a macro, seriously, it’s SOOOO easy.  You should try it. 

First you will need to enable the Developer ribbon by going to File|Options|Customize the Ribbon.

Once you’ve done that, here’s how you get started:

image

Going Into Macro Recording Mode
(Note That I Named the Macro Based On What I Am Going to Do Next)

Now I do something to the pivot.  In this case, I switch my Pivot to one of the “Medium Green” Styles:

image

Changing My Pivot to a Different Style
(While the Macro Recorder Watches My Every Move)

Now I can stop recording:

image

Stop Recording the Macro

image

Inspecting Your Freshly-Recorded Macro

Just click the Macros button, select your macro, and then click Edit.

And here it is:

Sub ChangeToGreenStyle()

' ChangeToGreenStyle Macro


    ActiveSheet.PivotTables("PivotTable3").TableStyle2 = "PivotStyleMedium4"
End Sub

There’s really only one line in the macro that DOES anything – the TableStyle2 line.

Note that it’s “tied” to a pivot named “PivotTable3.”  Not all of your pivots will be named that of course.  And it only works on the ACTIVE sheet.  So it won’t work if you try to loop through using the ModifyAllPivots macro.

So, you can modify it to look like the other macros I showed above (GrandTotalsBottomOnly, etc).  Change the macro to be:

Sub ChangeToGreenStyle (sSheet As String, sPivot As String)

   Dim oPivot As PivotTable
   Dim oSheet As Worksheet

   Set oSheet = ActiveWorkbook.Worksheets(sSheet)
   Set oPivot = oSheet.PivotTables(sPivot)


   oSheet.PivotTables(sPivot).TableStyle2 = "PivotStyleMedium4"

End Sub

The declarations and the two Set lines are just copied from the other macros.  The final line replaces the part the recorder hard-coded to the active sheet and “PivotTable3,” again patterned after the other macros.

That macro can now be called from within ModifyAllPivots just like the others were:


      For Each Pivot In Sheet.PivotTables
         AutoPadPivot Sheet.Name, Pivot.Name, 10
         GrandTotalsBottomOnly Sheet.Name, Pivot.Name

         ChangeToGreenStyle Sheet.Name, Pivot.Name

      Next

Off you go :)