PowerPivot Scheduled Refresh Pt 3: Thin Workbooks

February 1, 2011

 
PowerPivot Thin and Core Workbooks

What is a “Thin” Workbook?

At Pivotstream, we came up with the term “thin workbook” to describe an Excel workbook that does not contain a PowerPivot model, but that DOES connect to another workbook that does, which we call a “core” workbook.  In other words, thin workbooks use core workbooks as their data source.  Others have called this “hub and spoke,” and I may even like that phrasing better, but hey, we’ve been using “thin and core” at Pivotstream for nearly a year and it’s hard to change.

So, a thin workbook contains pivots, charts, cube formulas, etc. – it’s a report-only workbook.  And whereas our core workbooks may be 200 MB or more, our thins are often 5 MB or less.

Why create thin workbooks?

Here are a few reasons why you may want to use a thin workbook approach.

1) To avoid duplicating data.  If you want to create multiple workbooks that offer different views of the same data, which we do all the time, you probably don’t want to duplicate the PowerPivot data across all of those workbooks.  That duplication needlessly burns RAM on the server, takes extra time to run scheduled refreshes, and also increases your maintenance work (ex:  if you want to modify a measure, you have to do it multiple places instead of one).  I *highly* recommend you never duplicate PowerPivot models like that, and thin workbooks are a good alternative.

2) To lock down data.  In many ways this is another flavor of #1.  At Pivotstream we often create one workbook that contains a full set of measures, and then another that omits certain measures that are particularly sensitive.  Both can use the same core workbook as a data source, but if you properly lock down your permissions on SharePoint, you can give one set of users a different level of data than another set.

3) To avoid the refresh on open performance problem.  In the last post about refresh, I pointed out that the PowerPivot refresh service on SharePoint kinda “cheats” a little bit – it refreshes the data in the PowerPivot model, but the data in the pivots (the data in the Excel sheets themselves) is untouched.  To work around that, PowerPivot for SharePoint sets the “refresh on open” flag, forcing Excel to refresh all pivots whenever the workbook is opened.

That can be slow.  In many cases in our work at Pivotstream, it is VERY slow.  Since the PowerPivot refresh service does NOT touch the thin workbooks, “refresh on open” never gets set.

Of course, that creates another problem doesn’t it?  The data in the Excel sheets then remains stale, with no automatic way to refresh it.  That’s no good.

But we are busy little beavers at Pivotstream and we don’t give up easy :)  We started working on this problem in March 2010.  More on this later.

How do you create thin workbooks?

The first step is to publish your “Core” workbook to SharePoint.  You CANNOT connect a thin to a core sitting on your desktop – the desktop version of the PowerPivot engine does not support cross-workbook connections like that.

Once you have your Core published, there are two primary ways to create a thin workbook:  Start from Scratch, or Hack like Mad.

Boring:  Start from Scratch

In this method, all you do is go to the PowerPivot Gallery and click the highlighted button:

Creating a Thin PowerPivot Workbook from the Gallery 
…and that gets you started in Excel, in a new workbook, with a blank pivot connected to the Core model.

More Fun:  Hack Like Mad!

For those of you who prefer to color outside the lines, here’s another technique that I often find helpful.  You’ve created a single Core workbook, and it has lots of report sheets in it.  You don’t want to start from scratch and throw all of those report sheets away, right?

So here’s what you do:

  1. Create a copy of the workbook.  Just copy/paste the file in Windows Explorer.  Name one file something like Core.xlsx and the other Thin.xlsx
  2. Delete pivots from the Core workbook.  Two reasons for this.  One, it reduces confusion – when you open the Core, you know it’s the core because you don’t see any report sheets.  But deleting pivots also makes the scheduled refresh process on SharePoint run faster – after PowerPivot refreshes the model on the server, it then needs to refresh the thumbnails in the gallery.  To do that, it asks Excel Services to open the workbook, which triggers “refresh on open.”  Why pay extra time and CPU penalties on reports no one is going to see?
  3. Publish the Core workbook to SharePoint.
  4. Change the connection string in the Thin workbook.  Open it up in Excel, and in Connection Properties change the “$Embedded$” string to be the full URL to the core workbook on SharePoint.  Example:

Changing the Connection String to Point to a Published PowerPivot Workbook

When you click OK, Excel performs a full refresh against the new source.  Be patient.  Once that is done, your thin no longer connects to the PowerPivot data embedded in its own file, it now queries the server copy instead. 

Of course, now you probably want to make the file smaller and get rid of all that unused PowerPivot embedded data.  Remember where that lives?  This post explains in more detail, but here’s the image of the thin workbook renamed from XLSX to ZIP:

Embedded Data - One Reason Why PowerPivot Will Not Work With Excel 2007

Item1.data is the file you want to get rid of.  But if you delete it, you will corrupt the file.  So I just create a 0-byte Item1.data file on my desktop and copy it over the top of the file inside the ZIP.

Rename back to XLSX and you have a true “Thin” workbook with all of the original report sheets intact and functional!

Summary, and a hint at the next post

Thin workbooks are a VERY useful tool in a PowerPivot system.  For Pivotstream, they are absolutely essential.

The only trick, as mentioned above, has been getting those thin workbooks to always be up to date.  I’m happy to say we have that problem solved.

I’m pretty sure others are going to need this sort of thing, too, as well as some other internal tools and components we have built.  If you have interest in beta testing our PowerPivot Accelerators, as we are calling them, watch for the next post :)


Data Refresh Twists/Turns Pt2 – Pivot Refresh

January 12, 2011

 
When You Are With the Right Organization You Don't Need to Wait on Fresh Pivots

“People looked at me differently.  They knew I was with somebody. 

I didn’t have to wait in line at the bakery on Sunday morning for fresh pivots.”

-Henry “DAX” Hill

 

 
That Second Refresh Can Really Bite You

Back before the great distraction known as Donald Farmer hijacked this blog for a few days, I was talking about Scheduled Data Refresh.  Specifically, I left off talking about how refresh is really TWO refreshes – PowerPivot and Excel.  Here’s the illustration again:

powerpivotrefreshvsexcelrefresh

OK, so the PowerPivot refresh service has one primary mission in life, and that is to refresh the PowerPivot model.  Which brings us to…

Surprise #1: PowerPivot refresh service does NOT refresh the Pivots!

That’s right, it leaves the pivots in the workbook alone.  So by default, those pivots (and/or cube formulas) still contain stale data!  I’m 100% serious.

“But wait, Rob, you’re wrong!” you say.  “I’ve tested this feature out, and I have NEVER seen stale data in my pivots when I view them in Excel Services!”

That’s right, you DON’T see stale data do you?  That PowerPivot refresh service is one resourceful beast, and it’s playing a clever little trick…

Surprise #2:  PowerPivot refresh service sets “Refresh on Open”

Have you ever seen this feature in Excel?  It’s buried pretty deep:

Refresh On Open Setting in Excel is Used By PowerPivot Scheduled Refresh 
Refresh on Open Setting in Excel 

Yep, click Connections on the ribbon, select a connection (typically named “PowerPivot Data” in our case), then click Properties.  On the resulting dialog you will see the checkbox “Refresh data when opening the file.”

By default that checkbox is NOT set.  You can try this out to see what I’m talking about:  take a PowerPivot workbook, open in Excel client and verify the checkbox is not set.  Then upload to SharePoint, schedule a refresh.  When complete, download the workbook and look at this setting again.  It will be checked now.

Side Topic:  How Does PowerPivot Refresh DO THAT???

I’m 99% certain that there is no API on the server for doing this.  I don’t think Excel Services “helps” PowerPivot at all.  I’m pretty sure PowerPivot modifies the workbook directly, via the file format.  The Open XML File Format, to be precise…  which happens to be what Office uses – XLSX, DOCX, PPTX – these are all Open XML files.

You can do pretty astounding things with that format, so if you are a developer type, I suggest playing around with it.  I can tell you that OUR developers at Pivotstream had a heart attack when they saw that SDK :)

How This Can Bite You:  Refresh Sometimes Takes Awhile

HOURGLASS So far so good.  PowerPivot refresh does NOT refresh the pivots, but it DOES set the refresh on open flag.  So when you open it in Excel Services, the first thing Excel Services does, before it shows you any workbook content, is refresh the pivots.  So you never see stale data.  Ever.

In many cases, that’s the end of the story, everything is happy.  But in other cases, it can be painful.

Remember the post I did awhile back on the differences between Update and Refresh?  The overall theme of that blog post was basically:  Refresh Can Be Very Slow.

Why is that?  Short version:  Refresh refreshes EVERY pivot in the workbook.  AND refresh does more work than a normal pivot query.

So…  if you’ve got a lot of pivots in your workbook, or a large data model, or complex measures, or perhaps a combination of those, refresh can take awhile.

Other Considerations

The worst part, of course, is that when a user opens a report in the browser, they have to wait awhile before they see any data.  Sometimes that’s just an extra second or so, which is why you may not have noticed.

In other cases though, it can be as much as 1-2 minutes!

That happens for every user.  In fact, if you open a browser and hit a report, wait through the refresh, and then close the browser, even if you immediately return to the report in a new browser window…  you have to wait out the refresh again.

Imagine what that can do to the CPU(s) on the PowerPivot server(s).  Of course, if you planned your hardware around data refresh as the peak load scenario, then you likely have enough CPU, but still…  a lot of users doing this at once will only magnify their wait time.

“Give me the good news!”

First of all, I brought this to the attention of my former colleagues at Microsoft last time I was in Seattle.  They’re working on solutions that don’t involve “Refresh on Open” but it’s not yet known when or how those solutions will be made available.  Kudos to them for their responsiveness on the matter – I will keep you posted.

In the meantime, there are obvious things you can experiment with.  Keeping the number of pivots smaller is one.  Cutting back on cross-filtering in slicers is another.  But fundamentally, if you have a reporting/modeling/analysis scenario that involves heavy lifting, you can’t exactly just remove those needs from your workbook.

At Pivotstream, we found ourselves in precisely that situation.  And I am happy to say that we have beaten the problem, but the answer is complex – deserving of its own series of posts in fact.  Plus, there are some things here that I am not yet ready to talk about publicly, muhaha…  so let’s put that on the back burner for now.

Next up in this series:  “thin” and “core” workbooks, aka, “hub and spoke.”

Click Here for Part Three >>


PowerPivot Scheduled Refresh – Twists & Turns, Pt 1

January 3, 2011

Skipper Plans A PowerPivot Refresh Strategy
“Rico, we’ll need special tactical equipment. We’re going to face extreme peril. The private probably won’t survive.”

OK, with a new year it’s time to get serious on the blog.  (Yeah, nothing says “serious” like cartoon penguins, but really, I *do* mean it).  This is gonna kick off a series of inter-related posts that I’ve had in my head for a long time now.  So buckle up, Rico.

 

 

Let’s say you’ve built a bunch of nifty PowerPivot workbooks.  You’ve published them to SharePoint.  And naturally, now you want to set things up so those workbooks automatically refresh, with the latest data, every night (or perhaps once a week).

As a recap, the simplest way to get started is to go to the PowerPivot Gallery view and click the highlighted button:

PowerPivot Gallery Schedule Refresh Button

Which leads to this page:

PowerPivot Gallery Schedule Refresh Page

Refresh ASAP vs. Refresh Once

Most of the options in that scheduling page are quite straightforward.  The only one that’s ever tripped me up is “Once.”  “Once” does NOT mean “Now.”  Instead, “Once” typically means refresh will occur tonight – either after business hours or after your specified earliest time.

If you want a refresh to occur now, you either need to specify a time (from today) that has already passed, OR don’t use “Once” at all.

Instead, you can schedule a refresh to occur Daily/Monthly/Weekly, and then check the “Also refresh as soon as possible” checkbox.

Personally, I think the lack of a “One time, right now” option is an oversight.  As a workaround, I often found myself scheduling something to run once every 12 months just so I can check the “ASAP” checkbox.

Another Way to Turn “Once” into ASAP

At Pivotstream, we have found another way to schedule ASAP refreshes:  we’ve changed the centralized definition of “business hours” to be so narrow that there are essentially no business hours at all.  (Right now we have biz hours set to be 7-7:15 AM – I haven’t tried setting Start Time = End Time, but a 15 minute window is narrow enough for our purposes).

The downside of essentially disabling Business Hours, of course, is that it impacts everyone.  That may not be a good option for your organization, as a refresh scheduled for “after business hours” will end up running sooner than expected.  It works well for us at Pivotstream however.

For details on how to modify Business Hours, see Vidas’s item in the FAQ:  http://powerpivotfaq.com/Lists/TGPPF/DispForm.aspx?ID=51

Behind the Scenes – the Basics

When a scheduled refresh does occur, a lot more happens than meets the eye.  This rough diagram illustrates the steps behind the scenes:

What Happens During PowerPivot Refresh - The Basics

(Click for Larger Version)

Notes on the diagram

I won’t repeat the descriptions on the diagram above, so spend a couple of minutes going over it ok? :)

With that done, I do want to call out a few points:

  1. The timer service only checks about once per minute – so if you schedule an ASAP refresh but nothing happens immediately, just be patient, it will start shortly.
  2. You can set different schedules per connection – I just discovered this recently, thanks to the watchful eyes of a training/consulting client of mine.  George and company, you rock :)
  3. The whole process will fail if the original file is checked out, or otherwise open for editing, during Step 6.  So it’s extra important that you not leave your published PowerPivot workbooks checked out.  I recommend editing locally on your desktop for this reason (and also because SharePoint is not terribly speedy for open/save of large files…  and you should be saving often.)

Refresh is Actually TWO Refreshes

OK, one last piece of background in this post.  When PowerPivot runs a refresh, it is refreshing the PowerPivot model – which corresponds to the contents of the PowerPivot window in Excel client.

In order for report consumers to see the latest and greatest data when they first navigate to it, however, the pivots in the Excel sheets themselves ALSO need to be refreshed.  So there are TWO refreshes that need to happen – we’ll call the first one “PowerPivot Refresh” and the second one “Excel Refresh.”

PowerPivot Refresh vs Excel Refresh
PowerPivot Refresh and Excel Refresh

OK, that’s enough for part one.  Next time I’ll explain how this two-stage refresh can complicate things in some situations.

Click Here for Part Two >>

In the meantime, if you are interested in further reading on scheduled refresh, you can check out the following articles:

http://msdn.microsoft.com/en-us/library/ff976569.aspx (Mariano’s excellent whitepaper)

http://powerpivotgeek.com/2010/09/08/a-peek-inside-getting-the-most-from-data-refresh/

http://blogs.msdn.com/b/powerpivot/archive/2009/11/20/powerpivot-data-refresh.aspx