Six Months With PowerPivot, part one

simply_red1

   
“If you don’t know me by now…  you will never never never know me…”

In honor of PowerPivot v1’s public release, I thought I’d offer up my thoughts on my last six months working with the product. 

Remember, I physically left Redmond in August and then officially left Microsoft in February, but have been putting PowerPivot through its paces that whole time, with a wide variety of data sets and business scenarios.

So in some sense, you can’t really get a closer view of PowerPivot than I have, while still maintaining any sense of neutrality about the product.  So here it is, the honest truth…

It passed the Great Football Project Challenge

When I started the Great Football Project back in October, I really did not know what to expect.  Honestly, I wasn’t sure that it was a great idea – was I going to get 5 days in and discover that PowerPivot simply wasn’t suited to that problem?  There was genuine potential for embarrassment.  But I needed something to blog about, and I was anxious to get started, so I just dove in.

After a few weeks, I was still holding my breath a bit.  I was past the basics but hadn’t really pushed the envelope at all.

At some point though I just stopped wondering.  It wasn’t until recently that I looked back and realized that I was building features, like the Rank measure, that were never in the original project at all. 

I can say now with confidence that PowerPivot can absolutely handle the business logic phase of the football project, which the consultant back in 2006 described as one of the most complex cubes he’d ever seen.

I call that a success.  Note how I specified the “business logic phase?”  That brings me to the next topic…

It is NOT a data cleaner/shaper

OK, this is perhaps obvious since the PowerPivot tables cannot be edited, and there are no features/functions that let you reshape rows.  Yes, you can add calc columns and that often does the trick, but there are other cases (like for instance, removing dupe rows) that PowerPivot can’t do at all.

The whole “shape it in normal Excel and then paste it over to PowerPivot window” workaround is ok for one-off work, or for tables that you don’t expect to change very often.  But like many of you, I am striving to gain the benefits of automatic nightly data refresh that the server provides, and the Excel workaround doesn’t translate to that system at all.

So, even more than ever, you need a clean and properly-shaped data source to start with.  So the first half of the work that the consultant did for the football project in 2006, where the text files were imported into SQL and turned into a decent schema, is still very much required.

For a production system, I don’t think this is a bad thing at all.  It forces you to cooperate with IT (or whoever owns your databases) to give you what you need.  And it forces them to listen to you more clearly, as long as they care about “taming the Excel beast” that is always their favorite thing to complain about :)

For a production system, that cooperation is essential for robust results.  And if it’s not a production system, then yeah, the Excel shaping workaround is great.

“No, it can’t do that.  Oh, wait. Nevermind.  Yeah, it CAN do that.”

Having worked on the product from the beginning, I’m more closely familiar with its limitations than most people.  In some ways that’s an advantage of course – I don’t dive into projects only to find out later that it can’t be done.

Oddly though, so far, knowing the limitations has largely just been a hindrance.  Every time my initial answer was “no, not in PowerPivot v1,” I’ve turned out to be wrong.

I have succeeded using PowerPivot to solve every single real-life problem I’ve encountered in my first six months of using it.

Market basket analysis?  Ranking measures?  Standard deviation?  Many to many relationship problems?  Godawful horrible data sources?  Measures that calc according to different formulas at different levels of the pivot?  Iterating over variables that aren’t even in the view?  PowerPivot has defeated them all.  Well, more accurately, I have defeated them all with PowerPivot.  It’s not like I sit back and watch PowerPivot do its thing.  It is not always easy.  Which brings me to the next point:

Challenging and Rewarding

You know those rare occasions where you suddenly find yourself in the fast lane?  When your brain is forced to expand?  When you are truly challenged, in a good way?

I’m talking about a specific kind of challenge, the good kind.  Not the kind like when you take a new job and are overwhelmed by all the new rules and bureaucracy you have to learn.  And not even the kind when you’re learning most new technologies (HTML and XML come to mind).

The best examples of this “positive challenge” vibe from my life are the ones where I’ve been pushed by a mentor.  A couple of teachers come to mind.  Some specific coworkers as well – Zeke Koch, David Gainer, and Amir Netz most prominently.  Being around those guys everyday basically was a wakeup call – “Rob, you’ve been asleep.  Wake up, it’s time to grow, to be excellent.”

PowerPivot, oddly, has felt like that.  My brain has been expanding again, after a period of stagnation.  More specifically, PowerPivot combined with the problems I’ve been tackling has done this.

And it flows over to other areas too.  Example:  years ago when I needed to estimate the incoming query load for the football stats project, I asked around for advice, got none, and then basically just guessed.  It was a very hollow experience.  Non-excellent.

Then recently, I was presented with essentially the same challenge.  But this time, I didn’t guess, I modeled it:  estimated how many reports (and queries per report) each user would exercise at peak, built a spreadsheet, re-taught myself the Poisson and Binomial functions, and voila – a “users per server” estimate I could believe. 

I’m positive that working with PowerPivot is the difference between the “lazy guess” mentality from before and the “it can be done” mentality today.  I love it :)

Carrot, not stick (but sometimes the carrot is too big for one sitting)

I don’t want you to interpret the above as “PowerPivot is a harsh Pai Mei figure, kicking Rob’s ass day in, day out.”

Nope, I worked with PowerPivot for months, and did things I thought were pretty damn cool, without ever stretching my brain the way I have been lately.  If you’re an Excel pro, you will find PowerPivot to be a very welcoming environment.

Excel veterans:  you will never be forced to do anything uncomfortable with PowerPivot.

In fact you will do amazing things in your first week that will actually deliver greater results than standalone Excel, and it will actually seem easier than normal Excel.  Check out the CALCULATE function and you will see what I mean.

But boy, sooner or later, it will TEMPT you to try something bigger.

You’ll be sitting there one day thinking, “Gee, I sure wish I could build an analysis that showed X.”  And then something hits you – “hey wait, I bet THAT’s what that function I looked at the other day does” or “Didn’t I see a blog post last month about something similar?”

Two hours later, you’re still heads-down over your DAX formulas and relationships, feeling like you’re 30 seconds away from a breakthrough that will change your professional life.

You might not even succeed that first day.  You may have to come back tomorrow with a fresh perspective and a clear head.

And you love it.  Every minute of it.

But that’s when you realize that you have left the reservation.  You are not in Kansas anymore.  Time to take off the training wheels.  Pick your analogy.  Make no mistake – the power of DAX in particular can challenge you immensely.  Jon Udell described the examples I’ve covered here on the blog as “magic,” and he’s right – you won’t pick up everything overnight.  You should be prepared for that.

It’s called learning.  And you’ve almost forgotten what that feels like.  It’s what drew you to Excel in the first place, years ago – that fleeting glimpse of capabilities and the results it could deliver, but also frankly because you were hooked, addicted, to mastering it.

That feeling is back :)

12 Responses to Six Months With PowerPivot, part one

  1. Marco Russo says:

    Very good points, I strongly agree here:
    - PowerPivot is not a data cleaner/shaper
    - ETL is still required
    - PowerPivot is good for specific problems, not for generic models

    The last point is my personal interpretation of many of your considerations, and is something I’m realizing after a few months of study and usage.

    • That last point about specific problems is quite interesting. I haven’t thought about that too much. Are you saying that because PowerPivot does not build a terribly clean public OLAP model?

      If so, I absolutely agree. PowerPivot was designed around building reports, not models. The UI and feature set flow that direction, with OLAP models being a side effect.

      But I’m not sure that’s what you mean. And since you’re one of the most advanced PowerPivot pros running around these days, I’m definitely interested in hearing more. I’m actually contemplating using PowerPivot to design re-usable models in my day job at PivotStream, so the interest is not purely academic :)

      • Marco Russo says:

        Well, when I think to generic models, the term of comparison is Analysis Services. First of all, PowerPivot lacks of several features: control over measure formatting, attribute hierarchies, calculated members (PowerPivot only can define measures). Maybe these are included in what you call “not clean OLAP model”. However, I think that there are two key points when I say PowerPivot is not good (at least as SSAS already is) for generic models.
        First of all, the model is inside an Excel worksheet. To share the model in many different “reports”, you end up copying the Excel sheet containing the original model. After that, imagine you have 10 Excel worksheets with the same original model… and you find that you need to correct a DAX measure. You have to correct 10 worksheets at this point. Having everything in the Excel file is good at the beginning, but it quickly becomes an issue if you share the same model in many worksheets.
        The second point is somewhat related to the missing calculated member feature I mentioned before. Having an utility dimension in SSAS is very important, and you don’t have such a tool in PowerPivot. For example, if you want to define a YTD calculation, you have to specify the measure you want to calculate as YTD. In a regular cube it is common to have up to 20 measure (and there could be more) and creating a YTD, YOY and other calculations for each measure increase complexity and lower usability.
        For this reason I think that PowerPivot is good to create calculations and a model for a specific report (or a specific set of reports, i.e. different sheets of the same worksheet). However, sharing the same model is not so easy in this first version of PowerPivot (I know, you can publish it to SharePoint and then connect to it as a data source, but it really wouldn’t solve all the issues I described above).

  2. Hey Rob, that’s a pretty good synopsis. Couldn’t have summed up my PT experience any better.

    Marco Russo Wrote:
    “ETL is still required.”

    I dunno. Recently I was reading a great book which stated: “The changes that need to be made when moving data from the OLTP database to the final data mart structure should be carried out by specialized ETL software, like SQL Server Integration Services, and cannot simply be handled by Analysis Services in the Data Source View.”

    I assume that the above statement is equally applicable to PowerPivot :)

    Interestingly, earlier Gemini builds did include a “Data Cleaning” tab, which was dumped in later builds. The tab can be seen in Donald’s old videos. I’ve often wondered what functionality existed there.

    On a somewhat related side note, in V2 I’d like to see an option to normalize an Excel cross-tab table in the linking process. It’s natural when a human inputs data into a table to do so as a cross-tabulation entry, rather than a normalized entry (which has meaning only to a database, or an app like PowerPivot).

  3. Um, that should be ..”my PP experience…”

  4. Navi says:

    A very cool note describing PowerPivot. I love the IT – Business Users relationships which you discuss related to the data cleaning.

  5. m-b says:

    Interesting read. I like the way you describe the possible experience for ‘Excel veterans’. Unfortunately I reckon it will probably take a while for most Excel veterans to have access to Excel 2010 and PowerPivot at their workplace. Excel 2003 is still pretty common from what I’ve heard. I definitely can’t wait though :-)

  6. Richard Flewitt says:

    I’m very interested in PowerPivot as we are in the early stages of evaluating BI tools. Your Football solution seems to have quite low volumes of data. What do you think (length of a piece of string question coming up) is the maximum pratical number of rows of data PowerPivot can handle. I have > 1 billion per month.

    I know … it depends … but just as a guide.

    • Hi Richard. In my work at Pivotstream I routinely use 300M row datasets. With a very small number of columns (like 4-5) I’ve come close to getting 600M rows to fit, and I suspect they would fit if I didn’t run into a PowerPivot bug that is triggered by my data set (the folks in Redmond are looking at it).

      If you have a billion rows per month, I would suggest finding a way to either segment that data set into 3-4 separate models, or pre-aggregate in some way to get yourself down into the few hundred million row neighborhood.

      Long-running historical models that accumulate data over a period of months will be a problem in that world, of course.

      Feel free to drop me an email with more questions.

      -rob

  7. [...] Adviser to our SQL Integration Services team – we use SQL Server Integration Services (SSIS) to digest data from our clients and partners (typically flat files over FTP), and then land the data in SQL Server.  From there, it’s imported into PowerPivot models.  I don’t know SQL or SSIS all that well myself, but I do know what sorts of data structures make for the best PowerPivot modeling and performance, so I spend a lot of time working closely with that team.  Longtime readers of this blog will know that sort of cooperation sounds familiar. [...]

  8. Danie Swart says:

    I have run into the following limitations:
    For one table containg 81660692 number of records i got the following error:
    A value in column ‘AfterImage’ of table ‘b2cc0294-6148-4915-9886-3e1495fe0e1a’ exceeded the maximum allowable size

    For another Table with 13000370 Number of records I got the following error:
    Memory error: Allocation failure : Not enough storage is available to process this command. .

    Memory error: Allocation failure : Not enough storage is available to process this command. .

    Also there seem to be a limitation on the number of relationships that you can set up from one field on a tbale to other tables.
    I Have set up four relationships for a field(PK) and it does not allow me to add any more relationships on that field

  9. Gerry Poulin says:

    I have a similar error
    column ‘Full Command Arguments’ of table ‘f2fd09cb-6d7c-461b-822c-14082b5733a3′ exceeded the maximum allowable size. The operation has been cancelled.

Leave a Comment or Question