New Server!

August 24, 2011

Hi everyone.  We spent the past few days moving this blog to a new host so that I have more toys to play with. 

Ideally you will NOT notice a difference, but if you do, PLEASE let me know.  I’m rob.  At a place called pivotstream.  You know, dot com.

If you are receiving this post via RSS or Email, then you have NO ACTION to take.  If you subscribed since about November 2010, that subscription should carry over just fine.

But some of you have RSS or Email subscriptions from a long time ago, and sadly, those will need to be re-established.  I will notify those folks separately.


Rackspace Webinar from June

August 18, 2011

A few people have asked me if the SharePoint Saturday presentation will be recorded this weekend.  I am 99% sure the answer is no.

But if you’d like to see something similar, the webinar I did with Rackspace back in June covers many of the same points:

http://www.rackspace.com/knowledge_center/pivotstream_webinar

I’ll warn you though that we hit two distinct technical issues in the presentation – one small SharePoint glitch which was not a big deal, and a problem with the webinar software itself that basically wiped out 4-6 slides and frustrated me to no end.

Up until that point though, it was going great Smile


SharePoint Saturday Columbus this Weekend

August 17, 2011

Quick note everyone:  I will be presenting at SharePoint Saturday in Columbus Ohio, THIS Saturday, August 20th.

Presentation titled:  “PowerPivot: SharePoint 2010’s Secret Weapon” at 1:55 PM.

Event Directions Et Cetera:  http://sharepointsaturday.org/columbus/Pages/about.aspx 

And yes, I have some real posts coming.  It’s been VERY busy lately, in a good way.  One of my biggest problems is “paralysis” on what to write about first Smile


Value Proposition of PowerPivot (in CIMA Insight)

August 3, 2011

 
A few days ago I mentioned I was writing a series of posts for CIMA Insight, which is the monthly web magazine of the Chartered Institute of Management Accountants – an audience that knows Excel quite well but probably has yet to discover PowerPivot.

Given that this series is starting from scratch, with an introductory post, it is mostly “old news” for readers here.  But there are perhaps some high level explanatory points that are new.  For instance, this diagram I sketched to explain the traditional tradeoffs between spreadsheets and formal BI: – a decision that results in impractically high costs sooner or later:

Spreadsheets vs Formal BI

PowerPivot, of course, provides a curve that shares the low startup costs of spreadsheets AND the long-term maintainability and robustness of formal BI.  Literally, PowerPivot is the end of that “damned if you do, damned if you don’t” situation.

To read the whole (brief) part one of the series, please click here.

As always, I’m interested in your comments.


Less Columns, More Rows = More Speed!

August 1, 2011

 

“Stay close!  Come together!  Staggered columns!  Staggered columns!

I mean…  less columns!  More rows!”

-General Maximus Speedicus Queryus

 
 

A Long-Held Belief, Quantified

For a long time now I have been advocating that everyone minimize the number of columns in their large tables.  For certain, you want to not import columns unless they are going to be used.

But I also recommend that if you have a lot of numerical columns in a table, it’s often better to replace them all with a SINGLE numerical column, and add a second column for “type.”  Even though that results in many more rows, the theory is that PowerPivot’s VertiPaq engine would  return quicker pivot results against the “tall, narrow” table than it would against the “short, wide” table.

I’ve simply “known” that to be true, in my bones, for a long time.  I was so certain of it, in fact, that I have never bothered to test a “before and after” case until now.

We had an opportunity this weekend to quantify the “before” and “after” impact of making such a change, so I thought this time I’d record the results and share them.

The Original Data

PowerPivot Wide Table Lots of Columns

With that data shape, the [Sales] measure needs to be the sum of a specific column:

PowerPivot Measure Wide Table

And then, a very large pivot reporting against this data ran in 5.7 seconds.

OK, so there’s our baseline.

The Taller, Narrower Table

PowerPivot Tall Table Less Columns

Whereas the original table used a single row to store all 9 numerical values, this new table stores each numerical value in its own row, and tags each row using the Value Type column to identify which numerical value it is.

(Note that in this new shape, zero values can be completely omitted if you’d like.  And we did, which is why there are not 9 * 700k rows in the resulting narrow table – we just dropped rows that were going to be zeroes.)

I then added a very small table that “decodes” the ValueTypes into something I can understand, and related it to the tall table:

PowerPivot Tall Table Lookup Table

Lookup Table – Helps Write More Readable Measures

Each row in this lookup table corresponds to one of the original numerical columns in the Wide table, of course.

I then wrote a simple measure for [Amount]:

PowerPivot Amount Measure Tall Table

Base Measure – Just a Raw Sum of the Value Column,
Serves as a Basis for Other Measures

And then the Sales measure can simply be a filtered version of Amount:

PowerPivot Measure Tall Table

[$ Sales] = The [Amount] Measure Filtered to “Ttl_Dollars” Type Rows

The Results

I have to admit, I was a bit nervous, because I had colleagues watching me, they knew I was making a precise comparison and was going to share the results.  Would this turn out as expected?  Would it be faster?  I experienced a moment of doubt as I was about to click the mouse and time the query.  I mean, we were looking at nearly 7x as many rows in the new set.  I was starting to sandbag my own expectations, telling myself it would likely be the same.  I should not have doubted.

It ran in 2.4 seconds, more than twice as fast as the original 5.7 seconds.

Boom.  It was worth it, big time.  Even more impressive since there is definitely some fixed-cost overhead involved in Excel sending the query, populating the grid with thousands of rows of results, etc.  There may even be a full second of overhead involved, in which case raw query time went from 4.7 to 1.4 seconds.

Why Faster?  Will This ALWAYS Work?

Remember that VertiPaq is a COLUMN storage engine, and it compresses the data VERY efficiently.  (I’ve covered this topic briefly here as well).

With that in mind, here are two quick answers.  The long answers would be too long for this post Smile

1) By removing a lot of the “noise” presented by the 9 numerical columns, this change enabled VertiPaq to do a much better job compressing the columns that remained.  And that compression translates into better query speed as well.

2) No, I do NOT expect that this trick will always speed things up.  It all comes down to “cardinality,” or how many unique values per column.  And I’m pretty sure that the three columns dedicated to “Units” in the original data set had lower cardinality than the SKU column for instance.  If we hadn’t had those Units columns, I suspect this change would NOT have been faster.

As a side note, the file size of tall and narrow was NOT smaller than short and wide.  I’ve mentioned many times that file size is a ROUGH guide to performance but not 100%.  Maybe we’ll discuss this in the comments section, because this post has run too long already.

No, I don’t think this will always work.