Surprising Example of PowerPivot Compression

Tony and Mike

“Best not analyze too deeply on this one, huh?”

A long time ago I promised a guy named Sam that I would dig up some examples of PowerPivot compression.  I then, of course, forgot all about it until today.  Sorry Sam.

But today I was playing with a data set on my desktop machine that was really getting me down.  The data was provided as a (regular, non-PowerPivot) Excel file with 500K rows in it.  And my machine labored to do anything at all with that data – opening it took forever.  Deleting a single row took forever.  And so on.

But this was the same machine I had used to do a PowerPivot demo with a 100M row data set, which ran with no problem!  0.5% of the data was bogging me down in Excel!

So I decided to do a “before and after” comparison.

Hypothesis

Based on my knowledge of PowerPivot compression, I expected that PowerPivot would do a little bit better than Excel on disk, and a lot better in RAM.

Didn’t quite turn out that way :)

Results

       PowerPivot Compression Relative to Excel

 

Data Stored in Excel Sheet

Data Stored in PowerPivot

File Size on Disk

126 MB

10.2 MB

RAM consumed by Excel.exe

1200 MB

140 MB

Wow.  8x better in RAM, 12x better on disk!  (I was SO tempted to use a pie chart.  Just kidding).

Good thing I ran the experiment twice!

First time I did this, the PowerPivot RAM number was 600 MB.  Still 2x better than Excel, but 8x is much better :)

Not quite sure what was going on with that 600 MB number.  I’d had Excel open for awhile and probably hadn’t closed Excel.exe completely after closing the native Excel file.  Probably that was it.

OK, don’t analyze!  Rejoice!

Bottom line:  12x better on disk and 8x better in RAM, no matter what, is tremendous.  Not to mention the performance of working with the data and how much better that is.

And PowerPivot’s data compression ratio only gets better with larger data sets.  500k rows is big, yes, but remember, this same computer handled 100M, which I obviously can’t even test out in Excel for comparison.

Now that the data is loaded into PowerPivot, I don’t even have to use my desktop machine to work with it.  I’m switching over to my laptop.  The laptop is even running 32-bit whereas the desktop is 64.  But hey, the laptop fits on my new treadmill desk :)

And my waistline could use some compression.

4 Responses to Surprising Example of PowerPivot Compression

  1. sam says:

    Rob,
    Thanks for this post.

    I created 2 files – One with a normal pivot on an External data source(SQL Server) and One with with Power pivot on the same data source.

    Size of Data Rows 122095 Columns 4
    Size on Disk
    Normal pivot 889kb
    Power Pivot 1.10 MB

    Size in RAM
    Normal Pivot 61.4 M
    Power Pivot 63.4 M

    I then decided to try out with larger data

    Rows,2441760 Columns 4

    Size on Disk
    Power Pivot – 3.1 MB
    Normal Pivot 12.0 MB !

    Size in RAM
    Power Pivot 68.06
    Normal Pivot 60.06

    • Hi Sam :) Yeah, each data set is different. The data set I posted here compressed really well in PowerPivot and struggled in normal Excel sheets. (You are not using sheets in your examples, just straight pivotcachces, but I do think data set has a lot to do with it).

      Both of us might get a more accurate picture of PowerPivot db size in RAM (as opposed to addin/engine overhead) by looking at the temp folders created for the db’s.

      On my home machine, these are at C:UsersRobAppDataLocalTemp and are the folders that begin with “VertiPaq.” For your build the name might be IMBI or something else.

      The size of that folder is VERY close to the size of RAM consumed by the DB.

      That said, RAM consumed is RAM consumed. Doesn’t matter much whether it’s addin/engine or db, so a direct comparison of RAM consumed by Excel.exe is a fair thing to do. Using the temp folder size just helps us get a better estimate of how db RAM consumption goes up relative to overhead.

      Another thing to note that matters to a lot of folks: in order to be able to add calc columns using an Excel-style experience (read: not SQL Server), if I am using normal pivots, I need to import the data into the Excel sheet. Which duplicates the data (once in the sheet, once in the pivot cache). So that would change your file/RAM size numbers for normal pivots, maybe by quite a bit. As we’ve discussed, though, you do that stuff on the SQL side, so you don’t incur that penalty.

      One last thing: I found out the hard way that it’s important to completely close Excel.exe between RAM tests. Sometimes after you close a file (normal OR PowerPivot), some of its RAM consumption appears to linger.

  2. sam says:

    Hi Rob…What surprised me was with Small data (First Comparison) the Normal pivot on External data did a better job… but as data size increased Power pivot file became much smaller on the disk…I was not able to replicate your results regarding the ram…will do more testing tomorrow..

  3. [...] are some PowerPivot samples to download.  Here, too.  There’s some stuff in CodePlex as well.  And, here is a surprising example of PowerPivot [...]

Leave a Comment or Question