In the Browser, Aesthetics Yield a Greater Return

January 10, 2012

 

A PowerPivot Doc in the Browser is an Application   
A Spreadsheet in Excel Services Is No Longer a Document,
It’s an Application

I thought today was going to be a “handful of mini posts” kind of day but then this post blossomed into a bit more than I thought it would, which is a good thing.

Normal Spreadsheets are Usually Ugly and That’s OK

It’s true:  most Excel pros are not artists.  I certainly am not.  We’re number folks first and foremost, and our jobs haven’t historically placed top-level emphasis on aesthetics either.  So we don’t spend much time on it, typically.

Besides, Excel itself isn’t pretty.  Back in the 80’s or 90’s, even if you made a spreadsheet look fantastic, well, it was still loaded in Excel.  So you got all those lovely “battleship grey” toolbars, the title bar, etc.:

Spreadsheets of Yore Were Also Made Ugly Just by the Excel Frame

Spreadsheets of Yore Were Also Made Ugly
Just by the Excel Frame Itself

I want to be clear:  This is NOT a post that is going to encourage you to run out and start putting lipstick on all of your spreadsheets. 

Instead, I am going to make the case for why the game changes significantly (for the better too) when you switch to using a server (or a cloud hosting service like ours) to share your work.

The “Excel Frame” Has an Enormous, Underappreciated Impact

Hey, you might say that the old-style Excel screenshot above is an unfair example, since Excel 2007 and 2010 replaced menus and toolbars with the Ribbon.  But in a crucial way the Ribbon is NO different really – the point is that either way (ribbon or menu/toolbars) the Excel frame is NOT YOURS.  It belongs to Excel.  And no matter how much work goes into the document itself, the user of that document still thinks you made them a document.

Let that sink in for a minute.  Because the user of your spreadsheet thinks they are “using Excel” and not “using an application built by my favorite Excel pro,” you are receiving a hidden benefit AND a hidden penalty:

  1. There are many things you will never be blamed for as long as the consumer thinks of Excel as the application.  Hey, the overall experience just feels kinda clunky.  “No big deal, that’s just how Excel docs always are.”
  2. But you also don’t get nearly enough credit, psychological impact, or perceived importance that your work deserves.  Seriously, your work drives your organizations.  If everyone thought of you as a programmer (which you are, even if you don’t write macros), you’d be viewed differently.  But people who produce documents are often viewed as “Excel Monkeys.”  Honestly I think Excel pros are, for the most part, underpaid relative to their true importance.

When you switch from mailing spreadsheets around to publishing spreadsheets on SharePoint, well, both of those go out the window.  Well, if you do it right, anyway.

SharePoint Brings Its Own Frames!  Ack!

OK, so you switch over to using SharePoint as your publishing mechanism.  Does that get rid of the Excel frame?  Well not really.  It just gives you a new Excel frame in your browser:

See all of those highlighted elements?  Just another Excel frame, ported to the browser.  Complete with File tab, toolbar, the .XLSX extension blazoned across the top, and even a warning bar.

Not to be left out, SharePoint adds some of its own at times:

So, you gotta get rid of those.  And that means customizing SharePoint.  If you’re a SharePoint pro that’s mostly not too difficult, but even then it likely will take you some time to get it tuned just right (we’ve been making tweaks now for two years).  And if you’re not a SharePoint pro, well, you are going to need one.

(For more info on the details of these SharePoint elements and what we’ve done to modify/suppress them, see this post and this post).

Going Frameless Turns “Document” Into “Application”

In our Hosted PowerPivot offering, we’ve got all of that suppressed, and the only “frame” you see is just the browser and a typical web header.  An example:

PowerPivot Application Built by Pivotstream

The Consumers of This Application Neither Know NOR Care that it Was Built in Excel

The idea for this post struck me yesterday as I was putting together this sample workbook (based on Microsoft’s AdventureWorks data set) that we are going to start including in all of our HostedPowerPivot sites as a tutorial:

Sample Hosted PowerPivot Workbook

Sample Workbook v1 For Hosted PowerPivot

Or try this humorous example (based on real data) that examines UFO sightings – click image to view the application on Mr. Excel’s HostedPowerPivot site.

Live PowerPivot Application on HostedPowerPivot

Click Image to View the Live APPLICATION

Completing the Illusion:  A Few Simple Steps

Once you’ve gone frameless, there are a few simple things you can do to complete the transformation from document to application.  Neglect these and your “frameless” would-be application screams “spreadsheet” to the audience.  Follow them, and even if you’re not terribly artistic, your work will be perceived very differently:

  1. Turn off gridlines and headers.  It’s not hard.  Two checkboxes on the View tab of the ribbon, but do that for EVERY sheet the consumer sees.  Every single one.  Crucial.
  2. Hide or delete all sheets you don’t want them to consume.  Don’t leave extra blank tabs in there named “Sheet3” OK?
  3. Insert some images.  So important!  Your company logo.  Your client’s logo.  Something.  And make sure you use the Insert ribbon to do this!  Simple copy/pasting an image into a worksheet, in my experience, seems to result in that image NOT showing up in the browser.
  4. Line up slicers, charts, and tables.  Takes just a minute or two to make sure the top of your slicers are even with the top of your chart, etc.
  5. Don’t neglect number formatting.  If something is a currency, format it as a currency.
  6. Use conditional formatting.  No need to overdo it, but conditional formatting turns a boring black and white grid (a pivot) into an inviting surface that is actually fun to look at.  Plus, trends, patterns, and outliers jump off the page much more readily.  I’m especially fond of data bars, color scales, icon sets, and when I have the time, sparklines.
  7. Create a Menu sheet (Table of Contents), and use hyperlinks for navigation.  Yes, the sheet tabs are visible.  But why force people to use them?  There are a million reasons why sheet tabs are disproportionately old fashioned and cognitively difficult.  When it comes to navigating around a web application,  nothing comes close to a hyperlink. 

“I’ll Take ‘Hyperlinks Between Sheets’ for the Win”

Let’s focus on that last one.  Did you know that you can hyperlink between sheets in a workbook?  I worked on Excel for years and never really realized this.  Our CEO at Pivotstream pointed that out to me, and it works on the server too.

Hyperlinking Between Sheets in Excel

Hyperlinking Between Sheets in Excel

This lets you create menu sheet like the AdventureWorks sheet above, as well as this one at the beginning of the post:

This Is Actually a PowerPivot Menu Sheet aka Table of Contents

This Is Actually a PowerPivot Menu Sheet (aka Table of Contents)

Cool huh?  Those chart thumbnails are IMAGES.  The hyperlinks above them take you to the full-page interactive sheets that host each of those chart views.

Formatting Macros

I have a number of macros that help me do some of this stuff, and in an upcoming post I will share some of them, once I have time to organize them a bit.

In the meantime, here’s a real simple one whose intent should be obvious:

Sub HideGridAndHeadersOnAllSheets()

    Dim oSheet As Worksheet
   
    For Each oSheet In ActiveWorkbook.Worksheets
        If oSheet.Visible = xlSheetVisible Then
            oSheet.Activate
            ActiveWindow.DisplayGridlines = False
            ActiveWindow.DisplayHeadings = False
        End If
    Next

End Sub

How many of you use macros, by the way?  I’m really curious.  Drop me a comment if you would and just say yes/no.  I will also put up a survey at some point if I get industrious.

Conclusion

I know that not everyone who has embraced PowerPivot for Excel has started using PowerPivot for SharePoint yet.  That’s changing, but it takes time.

For those of you who are starting that transition, I’m very excited for you.  I know it sounds weird.  But you cannot appreciate how much more impactful your work “feels” as a web application until you see it in action.  And this post is aimed at helping you reap that benefit.

For those of you who are yet to start down that path, file this one in the back of your mind for later.


Rackspace Webinar from June

August 18, 2011

A few people have asked me if the SharePoint Saturday presentation will be recorded this weekend.  I am 99% sure the answer is no.

But if you’d like to see something similar, the webinar I did with Rackspace back in June covers many of the same points:

http://www.rackspace.com/knowledge_center/pivotstream_webinar

I’ll warn you though that we hit two distinct technical issues in the presentation – one small SharePoint glitch which was not a big deal, and a problem with the webinar software itself that basically wiped out 4-6 slides and frustrated me to no end.

Up until that point though, it was going great Smile


Couldn’t Resist… Parameterized PowerPivot Report!

July 20, 2011

In the intro to David’s guest post below I mentioned some new applications we are building.  Check out a PowerPivot report where, in the browser, you can type in a SKU number and get a full dashboard for just that product!

Parameterized PowerPivot Report in SharePoint

Click for Larger Version

Much respect to our resident Data Junkie Monkey, aka DJ Monkey, for pulling this together.


HostedPowerPivot.com from Pivotstream & Rackspace

May 25, 2011

Click for HostedPowerPivot.com
No, we're not villains.  But the quote was too perfect to pass up!

“At last we will reveal ourselves to the Jedi Bee-I.  At last we will have shared intelligence.”

A secret long kept, finally revealed

It’s a recurring theme – I see it in my training/consulting practice, in my inbox, in the survey results, and at events: 

“We LOVE PowerPivot.  It’s a perfect fit for our analysis and reporting needs.  But our company has not yet adopted SharePoint, and we don’t have the in-house expertise to stand up and support PowerPivot for SharePoint.    We just want the simple beauty of PowerPivot, we want it now, and it’s frustrating that we can’t have the full system yet.”

I hear you.  That is precisely where we found ourselves when I joined Pivotstream.  The lack of a “turnkey” solution to that problem meant we had to go build it ourselves.  And our core business has been running on our internet-based PowerPivot infrastructure since last summer.

But now, one year and two hosting providers later, we are finally able to share what we’ve built with the community.

A Long, Long Time Ago, In a Conference Room Far, Far Away…

OK, it was February, in San Antonio.  John Casey and I were at Rackspace headquarters for two full days to pitch an idea:  that an all-in-one, customized-to-your-needs, zero-hassle PowerPivot for SharePoint infrastructure would be a very valuable thing to the world at large.

We’d chosen Rackspace based partly on their reputation for support, but primarily because they had the most SharePoint expertise in the hosting business.  SharePoint, after all, is probably the most complicated part of running a PowerPivot server farm.  We were already moving our own server farm over to Rackspace at that point, but now we were pitching them on a partnership.

It’s a dicey proposition, walking into someone else’s offices knowing that you have to start from scratch.  We planned to cover the dynamics of the BI market, past and present, Excel’s place in it, Microsoft’s first-ever total alignment on a strategy, and why imagePowerPivot was going to change the world.  That’s a tall order for anyone to digest or believe in a short two days, no matter how fervently I believed in the message myself.

I underestimated them.  They understood perfectly.  We ended up meeting with 10-12 members of their leadership team over those two days, transitioning from “here’s a cool idea” to “here’s how we can execute.”

Keeping this a secret has been the longest three months of my life.  I am stoked that the waiting is over.

Want the short version?

Being that this is a blog – my blog, specifically – and that I love telling stories, my aim here is to describe how we got here – motivations, steps along the way, etc.

But if you just want to get to the “meat” of this, and/or request more information, go ahead and visit HostedPowerPivot.com:

Click for HostedPowerPivot.com

Ok, back to the story.

Thanking Rackspace

I’m pretty sure we could not have done this HostedPowerPivot thing with anyone else, although I did not fully understand that going in.

We’ve been running our core PowerPivot platform in a Rackspace data center for four months, and the level of support we get from them is night and day different from what we had in our last data center.  They advertise “fanatical support,” and I’m a believer now.

As fantastic as that is, though, the word “support” doesn’t capture what continually impresses me.  I keep coming back to the human element – the real people on the other end who are acting like human beings and not cogs in a machine.  I’m not accustomed to big established companies, especially infrastructure companies, maintaining a nimble, entrepreneurial vibe, but that’s what I’ve found here.

For instance, does this sound like “support” to you?

Me:  “Hey Rackspace, we’ve found some unexpected PowerPivot performance results on this hardware set.  We’re now running some tests on every hardware platform we’ve got.”
Rackspace:  “Are there some other hardware options we can try out for you?  We’ve got access to a bunch of stuff here you know.”
Me:  “YES.  You’d need to install PowerPivot and run a bunch of tests on each machine, do you have time for that?”
Rackspace:  “No problem.  Send us the instructions and we’ll try it on 10 different machine types.”

Rackspace:  “OK, here’s a detailed spreadsheet of our results.  Three test runs for each unique config, reported separately and then averaged.”
Me:  “Did you say spreadsheet?  I think I’m in love.  We’ll correlate that with our other results.”

Me:  “OK based on all results, the best query performance would be achieved on a non-standard config, one with the following properties…  is that machine something that can be built out in your datacenters as THE standard PowerPivot server?”
Rackspace:  “Hmmm…  we’ll look into it and get back to you.”

Rackspace:  “Yes, we have approval to build that out.  Should we order one up for testing purposes?  We’ll have to have some new equipment delivered from the hardware vendor, might take a few days.”
Me:  “Yes please.”

Rackspace:  “Test machine racked and running.  And uh, I think you will be pleased.  It’s blowing the socks off of everything else we tested!”
Me:  “I love it when a plan comes together.  Gentlemen, we have ourselves a PowerPivot server.”

Thanks guys.  Too many of you to name specifically, but you know who you are Smile

Step 1:  Register Domain.  Step 2:  Submerge in PowerPivot for a year

True story:  Jeff Elderton, our CEO at Pivotstream, registered the domain HostedPowerPivot.com before we even decided I was going to sign on.  It’s been in our plans from the beginning.  But before we could credibly do such a thing, we first had to apply the technology ourselves, for our own core business.  We dug into that while PowerPivot was still in beta, as our sole focus.

Along the way we had to solve all of the common problems everyone will hit.  We’ve written software to plug the gaps and provide a professional aesthetic.  We know how to “capacity plan” specifically for PowerPivot.  We’ve even figured out that certain hardware configurations can dramatically outperform the most commonly-used server configs.  We learned a lot more than we expected to.

All of that was expensive and time-consuming of course.  But it was absolutely worth it.  The things we deliver to our customers simply were not possible before PowerPivot.

Today, I’m pretty sure no one in the world runs a PowerPivot infrastructure of the depth and breadth of what we run at Pivotstream.  Our entire core business (subscription analytics for dozens of clients) runs on our PowerPivot infrastructure.  There’s no substitute for just doing something – I learned much more about PowerPivot from the outside, as an adopter, than I did as an insider, working on the team at MS.  That was surprising, although it makes sense in hindsight.

I’m really happy to see it all come full circle.  At his core, Amir Netz describes himself as an inventor.  I like that, I think it fits him quite well.  I’m similar in some ways, but it’s not like I will ever come up with something like the VertiPaq engine, so “Inventor” would be an overly generous description of me.  I like to think of myself as a creator.  I love creating useful things.  I love filling voids.  And this one has had me jazzed for a very long time.

1997:  Alabama 20, Vanderbilt 0

There’s one more story I’d like to tell, and it’s a bit of a cliffhanger because it deserves its own post.  Like so many other things around this blog, it all comes back to football:  there’s a connection between us hooking up with Rackspace, and the 1997 Alabama routine thrashing of Vanderbilt.

Just one of those fun little wrinkles in life.
 

Click here for THRILLING highlights. I wonder who posted these? Hmmm…

PowerPivot Accelerators: The Story (And Announcing: the Private Beta)

February 2, 2011

 Pivotstream's PowerPivot Accelerators

Sunday night update:  we’ve received a lot of exciting interest and have identified about 15 great beta sites so far, representing a broad cross section of the industry – BI and SQL pros, SharePoint pros, and Excel pros who are “growing up” into BI and SharePoint via PowerPivot.  Great international representation, too.  Still looking for a few more participants, so drop us an email – beta@pivotstream.com

Remove hardhat.  Pick up thinking cap.

Short version:  We have some more toys to share.  Skip to the end if you want in on the beta.

Long version:  In August 2009, while still a member of the PowerPivot team at MS, I took off my product design hat and put on my “user” hat.  From Cleveland, far removed from the internals of the product team in Redmond, I started this website, and embarked on The Great Football Project.    I was as curious as you how well the product would work.  Maybe that’s hard to believe, but I promise it is 100% true :)   
 
I think it’s fair to say that applying a platform like Excel, SharePoint, PowerPivot, or SQL tends to foster a very different kind of expertise than that fostered by building or designing it.  You become much more familiar with the gaps, in particular, which is why those product teams listen so carefully to customer feedback (much more so than the Word team needs to, for example).

So I dove in, learning things at every turn.  About six months later, armed with the knowledge that PowerPivot performed VERY well in real scenarios, I dove in even deeper:  I left MS to join Pivotstream, where we started prototyping our PowerPivot infrastructure and models even prior to the product’s final release date.

“Gentlemen, we can rebuild it.  We have the technology…”

We started finding things in PowerPivot that we wished we could change.  Not earthshattering things.  “Last mile” type things – gaps in the feature set that made a big difference in practice.

That’s when my background as a software engineer became useful again.  Imagine working your whole career on MS products, where every day, your job is to identify flaws and opportunities to fix and fulfill.  And working right next to the people who actually reshape the products every day, as if the software were made of clay – stubborn clay, but clay nonetheless.

That breeds a certain optimism, a refusal to simply accept things as they are.  Instead of saying “crap I wish they had put X in the product,” my conditioned response is “hmm…  how hard can it REALLY be to fill that gap ourselves?”

Yes, it almost always turns out to be harder than it seems up front, sometimes MUCH harder.  But without that optimism, we probably wouldn’t have gotten started.  So it serves a purpose right?  Programmers everywhere are shaking their heads in disagreement, to which I reply, “Hey, why are you reading this?  Get back to coding!”  (Kidding.  Sorta.)

The Results:  A Complete System

I’ve already shared one of the projects with you:  the trimmed-down SharePoint pages optimized for the BI portal role.  And then their subsequent beautification.  Those have been in production for us for a long time now.

That’s just the tip of the iceberg though.  There are many other things we have done to improve our efficiency, or our customers’ experience, or both.  We are very proud of the results.  Here’s a glimpse:

  1. Bulk Workbook Modification – when we started out, we would have to manually edit double-digit numbers of workbooks by hand whenever we wanted to make a change.  The same was true if we wanted to add or modify a lot of measures in bulk.  Not only was this inefficient, but also error-prone.  Today, our workbooks are very rarely touched by human hands.  We queue up a change and kick if off.
  2. End to End Data Refresh – our nightly/weekly refresh process is completely automated, from FTP delivery of new data from clients, through SSIS, PowerPivot refresh of “Core” workbooks (triggered only when the underlying SQL sources are ready!), and automatic refresh of the “Thin” workbooks connected to them.  We can schedule Cores to refresh more than once per day.  In fact we can manually trigger a “right now” refresh of Cores in bulk, with basically one click.  In the Thins, we can even increment Date slicers to the latest value :)   Status updates are automatically emailed out to our team.  And the “refresh on open” problem is a thing of the past.
  3. SharePoint Tools – there’s also a reasonable amount of drudge work involved with SharePoint stuff, particularly publishing and linking pages and workbooks.  You may not notice this, depending on how many workbooks you have, but for us, well, we have far too many to be clicking around in the SharePoint config UI all day, so we’ve built tools for that, too.

Even better:  these things will all remain valuable once SQL 11 / Denali ships, and we have things like BISM and Crescent to play with.  In fact I hear rumor that the SQL CTP2 beta release may be just around the corner…  are you pumped?  I’m pumped.

The Private Beta:  Now Taking Applications

All of that stuff is working great for us internally.  And just like with the SharePoint Pages, we’d like to share them with the community and recoup some of the development costs.  Getting these components ready for broader usage IS more work, however, and we’d like to recruit a small group of early beta testers to kick the tires.  The Accelerators won’t release until they are ready.

If you are interested, please send an email to beta@pivotstream.com and specify:

  1. Your Name
  2. Company Name
  3. Current Usage of PowerPivot – personal or organizational, prototype/exploratory or already in production, and whether you are using PowerPivot for SharePoint or not
  4. Which areas you are interested in testing (1-3 above)
  5. Website(s) – your company website and/or blog URL if applicable

We’re going to keep this first round kinda small, maybe no larger than 10-15 participants.  So get your emails in, we’re anxious to start selecting the group :)


PowerPivot Scheduled Refresh Pt 3: Thin Workbooks

February 1, 2011

 
PowerPivot Thin and Core Workbooks

What is a “Thin” Workbook?

At Pivotstream, we came up with the term “thin workbook” to describe an Excel workbook that does not contain a PowerPivot model, but that DOES connect to another workbook that does, which we call a “core” workbook.  In other words, thin workbooks use core workbooks as their data source.  Others have called this “hub and spoke,” and I may even like that phrasing better, but hey, we’ve been using “thin and core” at Pivotstream for nearly a year and it’s hard to change.

So, a thin workbook contains pivots, charts, cube formulas, etc. – it’s a report-only workbook.  And whereas our core workbooks may be 200 MB or more, our thins are often 5 MB or less.

Why create thin workbooks?

Here are a few reasons why you may want to use a thin workbook approach.

1) To avoid duplicating data.  If you want to create multiple workbooks that offer different views of the same data, which we do all the time, you probably don’t want to duplicate the PowerPivot data across all of those workbooks.  That duplication needlessly burns RAM on the server, takes extra time to run scheduled refreshes, and also increases your maintenance work (ex:  if you want to modify a measure, you have to do it multiple places instead of one).  I *highly* recommend you never duplicate PowerPivot models like that, and thin workbooks are a good alternative.

2) To lock down data.  In many ways this is another flavor of #1.  At Pivotstream we often create one workbook that contains a full set of measures, and then another that omits certain measures that are particularly sensitive.  Both can use the same core workbook as a data source, but if you properly lock down your permissions on SharePoint, you can give one set of users a different level of data than another set.

3) To avoid the refresh on open performance problem.  In the last post about refresh, I pointed out that the PowerPivot refresh service on SharePoint kinda “cheats” a little bit – it refreshes the data in the PowerPivot model, but the data in the pivots (the data in the Excel sheets themselves) is untouched.  To work around that, PowerPivot for SharePoint sets the “refresh on open” flag, forcing Excel to refresh all pivots whenever the workbook is opened.

That can be slow.  In many cases in our work at Pivotstream, it is VERY slow.  Since the PowerPivot refresh service does NOT touch the thin workbooks, “refresh on open” never gets set.

Of course, that creates another problem doesn’t it?  The data in the Excel sheets then remains stale, with no automatic way to refresh it.  That’s no good.

But we are busy little beavers at Pivotstream and we don’t give up easy :)   We started working on this problem in March 2010.  More on this later.

How do you create thin workbooks?

The first step is to publish your “Core” workbook to SharePoint.  You CANNOT connect a thin to a core sitting on your desktop – the desktop version of the PowerPivot engine does not support cross-workbook connections like that.

Once you have your Core published, there are two primary ways to create a thin workbook:  Start from Scratch, or Hack like Mad.

Boring:  Start from Scratch

In this method, all you do is go to the PowerPivot Gallery and click the highlighted button:

Creating a Thin PowerPivot Workbook from the Gallery 
…and that gets you started in Excel, in a new workbook, with a blank pivot connected to the Core model.

More Fun:  Hack Like Mad!

For those of you who prefer to color outside the lines, here’s another technique that I often find helpful.  You’ve created a single Core workbook, and it has lots of report sheets in it.  You don’t want to start from scratch and throw all of those report sheets away, right?

So here’s what you do:

  1. Create a copy of the workbook.  Just copy/paste the file in Windows Explorer.  Name one file something like Core.xlsx and the other Thin.xlsx
  2. Delete pivots from the Core workbook.  Two reasons for this.  One, it reduces confusion – when you open the Core, you know it’s the core because you don’t see any report sheets.  But deleting pivots also makes the scheduled refresh process on SharePoint run faster – after PowerPivot refreshes the model on the server, it then needs to refresh the thumbnails in the gallery.  To do that, it asks Excel Services to open the workbook, which triggers “refresh on open.”  Why pay extra time and CPU penalties on reports no one is going to see?
  3. Publish the Core workbook to SharePoint.
  4. Change the connection string in the Thin workbook.  Open it up in Excel, and in Connection Properties change the “$Embedded$” string to be the full URL to the core workbook on SharePoint.  Example:

Changing the Connection String to Point to a Published PowerPivot Workbook

When you click OK, Excel performs a full refresh against the new source.  Be patient.  Once that is done, your thin no longer connects to the PowerPivot data embedded in its own file, it now queries the server copy instead. 

Of course, now you probably want to make the file smaller and get rid of all that unused PowerPivot embedded data.  Remember where that lives?  This post explains in more detail, but here’s the image of the thin workbook renamed from XLSX to ZIP:

Embedded Data - One Reason Why PowerPivot Will Not Work With Excel 2007

Item1.data is the file you want to get rid of.  But if you delete it, you will corrupt the file.  So I just create a 0-byte Item1.data file on my desktop and copy it over the top of the file inside the ZIP.

Rename back to XLSX and you have a true “Thin” workbook with all of the original report sheets intact and functional!

Summary, and a hint at the next post

Thin workbooks are a VERY useful tool in a PowerPivot system.  For Pivotstream, they are absolutely essential.

The only trick, as mentioned above, has been getting those thin workbooks to always be up to date.  I’m happy to say we have that problem solved.

I’m pretty sure others are going to need this sort of thing, too, as well as some other internal tools and components we have built.  If you have interest in beta testing our PowerPivot Accelerators, as we are calling them, watch for the next post :)


Data Refresh Twists/Turns Pt2 – Pivot Refresh

January 12, 2011

 
When You Are With the Right Organization You Don't Need to Wait on Fresh Pivots

“People looked at me differently.  They knew I was with somebody. 

I didn’t have to wait in line at the bakery on Sunday morning for fresh pivots.”

-Henry “DAX” Hill

 

 
That Second Refresh Can Really Bite You

Back before the great distraction known as Donald Farmer hijacked this blog for a few days, I was talking about Scheduled Data Refresh.  Specifically, I left off talking about how refresh is really TWO refreshes – PowerPivot and Excel.  Here’s the illustration again:

powerpivotrefreshvsexcelrefresh

OK, so the PowerPivot refresh service has one primary mission in life, and that is to refresh the PowerPivot model.  Which brings us to…

Surprise #1: PowerPivot refresh service does NOT refresh the Pivots!

That’s right, it leaves the pivots in the workbook alone.  So by default, those pivots (and/or cube formulas) still contain stale data!  I’m 100% serious.

“But wait, Rob, you’re wrong!” you say.  “I’ve tested this feature out, and I have NEVER seen stale data in my pivots when I view them in Excel Services!”

That’s right, you DON’T see stale data do you?  That PowerPivot refresh service is one resourceful beast, and it’s playing a clever little trick…

Surprise #2:  PowerPivot refresh service sets “Refresh on Open”

Have you ever seen this feature in Excel?  It’s buried pretty deep:

Refresh On Open Setting in Excel is Used By PowerPivot Scheduled Refresh 
Refresh on Open Setting in Excel 

Yep, click Connections on the ribbon, select a connection (typically named “PowerPivot Data” in our case), then click Properties.  On the resulting dialog you will see the checkbox “Refresh data when opening the file.”

By default that checkbox is NOT set.  You can try this out to see what I’m talking about:  take a PowerPivot workbook, open in Excel client and verify the checkbox is not set.  Then upload to SharePoint, schedule a refresh.  When complete, download the workbook and look at this setting again.  It will be checked now.

Side Topic:  How Does PowerPivot Refresh DO THAT???

I’m 99% certain that there is no API on the server for doing this.  I don’t think Excel Services “helps” PowerPivot at all.  I’m pretty sure PowerPivot modifies the workbook directly, via the file format.  The Open XML File Format, to be precise…  which happens to be what Office uses – XLSX, DOCX, PPTX – these are all Open XML files.

You can do pretty astounding things with that format, so if you are a developer type, I suggest playing around with it.  I can tell you that OUR developers at Pivotstream had a heart attack when they saw that SDK :)

How This Can Bite You:  Refresh Sometimes Takes Awhile

HOURGLASS So far so good.  PowerPivot refresh does NOT refresh the pivots, but it DOES set the refresh on open flag.  So when you open it in Excel Services, the first thing Excel Services does, before it shows you any workbook content, is refresh the pivots.  So you never see stale data.  Ever.

In many cases, that’s the end of the story, everything is happy.  But in other cases, it can be painful.

Remember the post I did awhile back on the differences between Update and Refresh?  The overall theme of that blog post was basically:  Refresh Can Be Very Slow.

Why is that?  Short version:  Refresh refreshes EVERY pivot in the workbook.  AND refresh does more work than a normal pivot query.

So…  if you’ve got a lot of pivots in your workbook, or a large data model, or complex measures, or perhaps a combination of those, refresh can take awhile.

Other Considerations

The worst part, of course, is that when a user opens a report in the browser, they have to wait awhile before they see any data.  Sometimes that’s just an extra second or so, which is why you may not have noticed.

In other cases though, it can be as much as 1-2 minutes!

That happens for every user.  In fact, if you open a browser and hit a report, wait through the refresh, and then close the browser, even if you immediately return to the report in a new browser window…  you have to wait out the refresh again.

Imagine what that can do to the CPU(s) on the PowerPivot server(s).  Of course, if you planned your hardware around data refresh as the peak load scenario, then you likely have enough CPU, but still…  a lot of users doing this at once will only magnify their wait time.

“Give me the good news!”

First of all, I brought this to the attention of my former colleagues at Microsoft last time I was in Seattle.  They’re working on solutions that don’t involve “Refresh on Open” but it’s not yet known when or how those solutions will be made available.  Kudos to them for their responsiveness on the matter – I will keep you posted.

In the meantime, there are obvious things you can experiment with.  Keeping the number of pivots smaller is one.  Cutting back on cross-filtering in slicers is another.  But fundamentally, if you have a reporting/modeling/analysis scenario that involves heavy lifting, you can’t exactly just remove those needs from your workbook.

At Pivotstream, we found ourselves in precisely that situation.  And I am happy to say that we have beaten the problem, but the answer is complex – deserving of its own series of posts in fact.  Plus, there are some things here that I am not yet ready to talk about publicly, muhaha…  so let’s put that on the back burner for now.

Next up in this series:  “thin” and “core” workbooks, aka, “hub and spoke.”

Click Here for Part Three >>


PowerPivot Scheduled Refresh – Twists & Turns, Pt 1

January 3, 2011

 
Skipper Plans A PowerPivot Refresh Strategy  
“Rico, we’ll need special tactical equipment. We’re going to face extreme peril. The private probably won’t survive.”

OK, with a new year it’s time to get serious on the blog.  (Yeah, nothing says “serious” like cartoon penguins, but really, I *do* mean it).  This is gonna kick off a series of inter-related posts that I’ve had in my head for a long time now.  So buckle up, Rico.

 

 

Let’s say you’ve built a bunch of nifty PowerPivot workbooks.  You’ve published them to SharePoint.  And naturally, now you want to set things up so those workbooks automatically refresh, with the latest data, every night (or perhaps once a week).

As a recap, the simplest way to get started is to go to the PowerPivot Gallery view and click the highlighted button:

PowerPivot Gallery Schedule Refresh Button

Which leads to this page:

PowerPivot Gallery Schedule Refresh Page

Refresh ASAP vs. Refresh Once

Most of the options in that scheduling page are quite straightforward.  The only one that’s ever tripped me up is “Once.”  “Once” does NOT mean “Now.”  Instead, “Once” typically means refresh will occur tonight – either after business hours or after your specified earliest time.

If you want a refresh to occur now, you either need to specify a time (from today) that has already passed, OR don’t use “Once” at all.

Instead, you can schedule a refresh to occur Daily/Monthly/Weekly, and then check the “Also refresh as soon as possible” checkbox.

Personally, I think the lack of a “One time, right now” option is an oversight.  As a workaround, I often found myself scheduling something to run once every 12 months just so I can check the “ASAP” checkbox.

Another Way to Turn “Once” into ASAP

At Pivotstream, we have found another way to schedule ASAP refreshes:  we’ve changed the centralized definition of “business hours” to be so narrow that there are essentially no business hours at all.  (Right now we have biz hours set to be 7-7:15 AM – I haven’t tried setting Start Time = End Time, but a 15 minute window is narrow enough for our purposes).

The downside of essentially disabling Business Hours, of course, is that it impacts everyone.  That may not be a good option for your organization, as a refresh scheduled for “after business hours” will end up running sooner than expected.  It works well for us at Pivotstream however.

For details on how to modify Business Hours, see Vidas’s item in the FAQ:  http://powerpivotfaq.com/Lists/TGPPF/DispForm.aspx?ID=51

Behind the Scenes – the Basics

When a scheduled refresh does occur, a lot more happens than meets the eye.  This rough diagram illustrates the steps behind the scenes:

What Happens During PowerPivot Refresh - The Basics

(Click for Larger Version)

Notes on the diagram

I won’t repeat the descriptions on the diagram above, so spend a couple of minutes going over it ok? :)

With that done, I do want to call out a few points:

  1. The timer service only checks about once per minute – so if you schedule an ASAP refresh but nothing happens immediately, just be patient, it will start shortly.
  2. You can set different schedules per connection – I just discovered this recently, thanks to the watchful eyes of a training/consulting client of mine.  George and company, you rock :)
  3. The whole process will fail if the original file is checked out, or otherwise open for editing, during Step 6.  So it’s extra important that you not leave your published PowerPivot workbooks checked out.  I recommend editing locally on your desktop for this reason (and also because SharePoint is not terribly speedy for open/save of large files…  and you should be saving often.)

Refresh is Actually TWO Refreshes

OK, one last piece of background in this post.  When PowerPivot runs a refresh, it is refreshing the PowerPivot model – which corresponds to the contents of the PowerPivot window in Excel client.

In order for report consumers to see the latest and greatest data when they first navigate to it, however, the pivots in the Excel sheets themselves ALSO need to be refreshed.  So there are TWO refreshes that need to happen – we’ll call the first one “PowerPivot Refresh” and the second one “Excel Refresh.”

PowerPivot Refresh vs Excel Refresh
PowerPivot Refresh and Excel Refresh

OK, that’s enough for part one.  Next time I’ll explain how this two-stage refresh can complicate things in some situations.

Click Here for Part Two >>

In the meantime, if you are interested in further reading on scheduled refresh, you can check out the following articles:

http://msdn.microsoft.com/en-us/library/ff976569.aspx (Mariano’s excellent whitepaper)

http://powerpivotgeek.com/2010/09/08/a-peek-inside-getting-the-most-from-data-refresh/

http://blogs.msdn.com/b/powerpivot/archive/2009/11/20/powerpivot-data-refresh.aspx


PowerPivot Refresh: CPU/RAM Spike at Completion

December 6, 2010

 
I mentioned this in a recent post, but I figure a picture illustrates it better.  This is a screenshot of task manager as a PowerPivot refresh completes.  This one ran for about 30 minutes, so this represents just the tail end of the process:

PowerPivot Data Refresh CPU and RAM Spike 

Note the two highlighted points – one of the 4 CPU’s pegged at 100% for awhile – and while the other 3 did go “quiet” at the very end, they WERE active for part of the time that the fourth CPU was at 100.  So – CPU gets precious near the end of refreshes (after hovering around 50% for much of the process).

Even more notably, RAM usage spiked by nearly 2 GB!  The PowerPivot file in question was 1.45 GB on disk when it was complete.  And that workbook was still in RAM after refresh completed, so the 2 GB spike was pure overhead during the final compression process.

(Side effect:  Even if you have enough RAM to load a workbook, that does not mean you have enough RAM to refresh it.)

Allocate your server RAM and CPU around the refresh process, folks.


Hardware/Capacity Planning: From the Trenches

November 30, 2010

So How Many Servers Do I Need in My PowerPivot SharePoint Farm?

As people increasingly move up from just dabbling with the addin, and decide to start leveraging the publish/schedule/share/secure benefits of the PowerPivot for SharePoint infrastructure, I am getting this question more frequently.

How many servers?  How much RAM?  And less often but just as important…  how many CPU’s?

Time for an old joke:  “Ever hear the one about the statistician who drowned in a river that was, on average, only 3 inches deep?”

That’s the whole joke.  And as time goes on, I only find it funnier.  And funnier.  But first, some basics.

The Three Primary Server Roles

For the vast majority of PowerPivot deployments, you will mostly need to concern yourself with three server types:  Excel Services, PowerPivot Engine, and Data Source DB Servers:

PowerPivot Server Roles Summary For RAM and CPU Planning 
(Click for Larger Version)

A few notes on the diagram up front:

  1. I recommend viewing the larger version – there is information on the diagram (in the notes) that I will not repeat below.
  2. Don’t read too much into the 3/4/2 ratio of Excel Services/PowerPivot/SQL – I included different numbers of servers in each tier intentionally, to illustrate that you can scale each tier out at independently.  But, for instance, I do NOT expect you will need 3 Excel Services boxes for every PowerPivot box (you will likely need less).
  3. Although separated above, server roles CAN be combined onto single servers – for instance, many folks can get by with an “all in one” server where everything above lives on a single box.  Also, even in a multi-server farm, one of the MS-recommended configs is to combine Excel Services and PowerPivot onto a single box, and then deploy as many of those combined boxes as needed.
  4. There are some elements missing from the diagram – for instance, there is a Web Front End (WFE) role, and a SharePoint Configuration Database role, but in my experience, PowerPivot does not put unique strain on those elements (except for config db disk space).  Your mileage may vary of course.

Early Planning Efforts

Back in the Spring, I set down to the very serious task of “how much hardware will we need on our SharePoint servers in order to handle user loads?”  And naturally, I made a very serious spreadsheet to model it out:

PowerPivot SharePoint Hardware Planning Spreadsheet v1

Wow, what a spreadsheet.  I mean, I even used the =POISSON.DIST() & =BINOM.DIST() functions – clearly, such spreadsheet horsepower indicates accurate results!

Turns out, that spreadsheet was nothing more than an amusing theoretical exercise with little bearing on the real world.

There were multiple problems with that spreadsheet:

  1. It relied on outright guesses as to how often/how intensely consumers would utilize the published reports.
  2. I was not yet aware of the CPU-gobbling power of slicers, a power that makes real-world queries many times more CPU-intensive.
  3. Most importantly, it assumed peak usage would occur in the morning, when consumers are most in need of fresh information.

Solution for 1) and 2)

These are in some sense the simplest to address.  Quite simply, put up a pilot solution and observe the usage characteristics.  You will learn a lot about usage patterns as well as what that does to CPU and RAM.

But even better:  you may choose to simply ignore these factors for now and focus on problem 3 instead.  Here is why:

Peak Load is Probably Experienced During Scheduled Refresh

If you’ve looked at the larger version of the diagram above, you have seen a hint as to what I’m about to say here: 

At Pivotstream, our nightly refresh process puts far more strain on our servers than our users do, and is what we now plan our hardware around.

I will explain further, so you can evaluate whether your situation will be similar.  We receive new data on a nightly basis (typically late at night).  That data gets ingested into SQL server, and then our PowerPivot refresh process begins.

Once PowerPivot refresh begins, we have about 6 hours, tops, to get all models and reports refreshed, so that when business opens in the morning, everyone has access to fresh insights.  6 hours sounds like a lot…  until you discover that it isn’t.

A lot happens during scheduled refresh!

Keep in mind that the PowerPivot model (the embedded database living inside your PowerPivot workbooks) must first be refreshed – let’s call this phase “Model Refresh.”  This can put a lot of strain on your SQL servers just in terms of supplying the data, since PowerPivot v1 pulls a fresh copy of all tables (no incremental refresh).

As the data flows into the PowerPivot servers, a LOT of CPU power goes into compressing that data into the storage format.  During the refresh process, RAM usage steadily climbs as well.

As model refresh nears completion, RAM usage spikes upward quite a bit, sometimes by as much as 50-100% the size of the resulting workbook.  CPU usage also spikes during this phase.

Once the model is done refreshing, PowerPivot for SharePoint then triggers a refresh (via Excel Services) of the Pivots and Cube Formulas in the workbook, so that when thumbnail screenshots are “snapshotted” for the report gallery, those reflect the latest data (let’s call this phase “Pivot Refresh”). 

Yes, that process initiates via Excel Services, but that refresh immediately results in a meaty set of queries sent back to the PowerPivot servers for processing.  So, CPU usage spikes again.  And the model in question is held in RAM while that happens, so that RAM can’t be recycled into the pool to be used for model refresh.

Remember, this all happens for a single workbook.  And if you hang a bunch of report-only workbooks off of a single “hub” model workbook (as we do all the time), the amount of time it takes to finish Pivot Refresh can actually exceed the time for Model Refresh.

If you have to cram all of this into a narrow nightly window, chances are that you will need more hardware to pull it off than you ever will need for normal daily usage!

Rough Guidelines for Your Own Situation

This post is running a bit long, so I will try to be succinct in closing:

  1. If a significant percentage of your PowerPivot models and reports will be refreshed nightly, refresh is likely going to be a larger peak strain than normal interactive usage
  2. Even if you only refresh weekly (or less often), if the execution window between “data is ready” and “reports must be ready” is basically still a single night, then refresh is still likely going to drive your peak hardware need.
  3. If you think “Peak Load = Refresh Process” is likely the case for your organization, I suggest ignoring interactive usage projections during your hardware planning process, and instead developing some prototype workbooks (with rich data sets and mutliple report sheets with 100% realistic slicer sets, measure complexity, etc.), and then putting those workbooks through the refresh process to get a baseline.