Using Query Editor to Create a Time Table

December 1, 2011

 

Guest Post by David Churchward with Technical Input from Scott Senkeresty

The first thing I ever do when I create a new PowerPivot workbook is create the date tables that I need.  Until now, I’ve used a preconditioned SQL table to draw this information from.  Now, thanks to Scott Senkeresty, you can now create your own – from PowerPivot!

Scott is a true genius.  He spends a lot of time under the hood of Pivotstream operations working his magic and wizardry and, every now and then, he pops up for air to deliver a masterpiece like this.

 

It’s as Simple as Copy and Paste

If you read my last post SQL UNPIVOT Makes My Data Skinny, you’ll know where query editor is and the sort of tricks that it opens up within PowerPivot.  In this post, there’s some SQL scripts that create time tables and you can simply cut and paste them into the query editor and use them straight away.

Having said this, it’s worth checking out Rob’s post The Ultimate Date Table to see why a date table is important and where you can get one of your own from Azure DataMarket.  However, I think we’ve gone one step better now and given you a script that you can parameterise easily, condition relatively easily and setup extremely easily.  Add to that the fact that it runs in mere seconds!

AND there’s one more bonus, especially for Accountants, you can set this up for YOUR Fiscal Year!

Copy, Paste, Save – Simples!

To get going, simply copy and paste the script from THIS ATTACHMENT into query editor.  When you save it – bam – time table complete.

DECLARE @FirstDate DATE

DECLARE @LastDate DATE

SET @FirstDate =’1 Jan 2008′– Enter first date of calendar as ‘dd mmm yyyy’

SET @LastDate =’31 Dec 2015′– Enter last date of calendar as ‘dd mmm yyyy’

 

IF @FirstDate ISNULLOR @FirstDate =

                BEGIN

                                SET @FirstDate =’1 Jan 2000′

                END

               

IF @LastDate ISNULLOR @LastDate =

                OR @LastDate < @FirstDate

                ORDATEDIFF(DAY, @FirstDate, @LastDate)> 10000

                BEGIN

                                SET @LastDate =DATEADD(DAY, 10000, @FirstDate)

                END

 

                ;WITH CTE_DatesTable

                AS

                (

                  SELECT @FirstDate AS [Date]

         UNIONALL

         SELECTDATEADD(DAY, 1, [Date])

         FROM CTE_DatesTable

         WHEREDATEADD(DAY, 1, [Date])<= @LastDate

                )

 

                SELECT

                                Date,

                                DATEPART(DAYOFYEAR,Date)                             ASDayOfYear,

                                DATEPART(DAY,Date)                                          ASDay,

                                DATEPART(MONTH,Date)                                    ASMonth,

                                DATEPART(YEAR,Date)                                        ASYear,

                                DATENAME(WEEKDAY,Date)                               ASDayOfWeek,

                                LEFT(DATENAME(WEEKDAY,Date),3)                 AS DayOfWeekAbr,

                                DATEPART(WEEK,Date)                                       AS WeekOfYear,

                                DATEPART(QUARTER,Date)                                 ASQuarter,

                                DATENAME(MONTH,Date)                                  ASMonthName,

                                LEFT(DATENAME(MM,Date),3)                           AS MonthNameAbr,

                                DATEADD(DAY,-1,DATEADD(MONTH, 1,DATEADD(DAY,(DATEPART(DAY,

DATE)*-1)+1,Date)))                         AS MonthEndDate

 

                FROM

                CTE_DatesTable

 

                OPTION (MAXRECURSION 10000)

Without doing anything to this script, you’ll get a time table that spans Jan 2008 to Dec 2015.  That could be enough.  In which case, you’re done.  However, I’m sure you want more.

Change Date Parameters to Widen the Timeframe

The great news with this script is that it’s so easy to adjust to cover the timeframe that you need.  You just need to follow these simple couple of steps:

  1. Towards the top of the script you’ll see “SET @FirstDate = “.  You simply need to change the value to the first date in your dataset (or perhaps an earlier date to be sure).
  2. I’m sure you’ve already guessed step 2 – change “SET @LastDate = “ to a value that is beyond the last date in your dataset.

I almost wish I could make it more difficult but Scott’s nailed this so well that it’s almost impossible to get wrong – well almost!

A Quick Health Warning

This script uses a recursive CTE (Common Table Expression).  If you’re not careful, it’s very possible to spin this baby into an infinite cycle.  Without going into why this happens, here’s a couple of points to make sure that you don’t fall into that trap:

  1. The IF – BEGIN – SET – END statements are there to ensure that valid entries are made for @FirstDate and @LastDate.  It’s probably best to leave these in.
  2. At the end of the script, there’s a command OPTION(MAXRECURSION 10000).  This ensures that the script won’t loop more than 10,000 times (ie 10,000 days).  If you set the recursion to zero, this removes all limits and then you could get into an infinite cycle.

Don’t be scared to give it a go.  There’s controls here to stop anything going wrong and what’s the worst that could happen anyway!

The Next Level – Fiscal Elements

In all my time using SQL and PowerPivot, I’m yet to work in an environment where a year runs from 1st January to 31st December.  As a result, we refer to a Fiscal Year to represent this alternate calendar.  I took the liberty of hacking Scott’s script to add these elements in.

DECLARE @FirstDate DATE

DECLARE @LastDate DATE

DECLARE @FiscalStart INT

SET @FirstDate =’1 Jan 2008′– Enter first date of calendar as ‘dd mmm yyyy’

SET @LastDate =’31 Dec 2015′– Enter last date of calendar as ‘dd mmm yyyy’

SET @FiscalStart = 4 – First calendar period of Fical Year – 1=Jan, 2=Feb…., 12=Dec

 

IF @FirstDate ISNULLOR @FirstDate =

                BEGIN

                                SET @FirstDate =’1 Jan 2000′

                END

               

IF @LastDate ISNULLOR @LastDate =

                OR @LastDate < @FirstDate

                ORDATEDIFF(DAY, @FirstDate, @LastDate)> 10000

                BEGIN

                                SET @LastDate =DATEADD(DAY, 10000, @FirstDate)

                END

 

                ;WITH CTE_DatesTable

                AS

                (

                  SELECT @FirstDate AS [Date]

                 UNIONALL

                 SELECTDATEADD(DAY, 1, [Date])

                 FROM CTE_DatesTable

                 WHEREDATEADD(DAY, 1, [Date])<= @LastDate

                )

 

                SELECT

                                Date,

                                DATEPART(DAYOFYEAR,Date)                             ASDayOfYear,

                                DATEPART(DAY,Date)                                          ASDay,

                                DATEPART(MONTH,Date)                                    ASMonth,

                                DATEPART(YEAR,Date)                                        ASYear,

                                DATENAME(WEEKDAY,Date)                               ASDayOfWeek,

                                LEFT(DATENAME(WEEKDAY,Date),3)                 AS DayOfWeekAbr,

                                DATEPART(WEEK,Date)                                       AS WeekOfYear,

                                DATEPART(QUARTER,Date)                                 ASQuarter,

                                DATENAME(MONTH,Date)                                  ASMonthName,

                                LEFT(DATENAME(MM,Date),3)                           AS MonthNameAbr,

                                DATEADD(DAY,-1,DATEADD(MONTH, 1,DATEADD(DAY,(DATEPART(DAY,

DATE)*-1)+1,Date)))                         AS MonthEndDate,

                                DATEADD(DAY,-1,DATEADD(MONTH, 1 + 12 -(DATEPART(MONTH,DATE)

- @FiscalStart + 1 +CASEWHEN @FiscalStart >DATEPART(MONTH,DATE)

THEN 12 ELSE 0 END),DATEADD(DAY,(DATEPART(DAY,DATE)*-1)+1,

Date)))                                                   AS YearEndDate,

                                DATEPART(MONTH,DATE)- @FiscalStart + 1 +

                                                CASEWHEN @FiscalStart >DATEPART(MONTH,DATE)THEN 12

ELSE 0 END                                             AS FiscalPeriod,

                                YEAR(DATEADD(DAY,-1,DATEADD(MONTH, 1 + 12 -(DATEPART(MONTH,

DATE)- @FiscalStart + 1 +CASEWHEN @FiscalStart >

DATEPART(MONTH,DATE)THEN 12 ELSE 0 END),DATEADD(DAY,

(DATEPART(DAY,DATE)*-1)+1,Date))))

AS FiscalYearEnd,

                                CASEDATEPART(MONTH,DATE)- @FiscalStart + 1 +CASEWHEN

@FiscalStart >DATEPART(MONTH,DATE)THEN 12 ELSE 0 END

                                                                WHEN 1 THEN

WHEN 2 THEN

WHEN 3 THEN 1

                                                                WHEN 4 THEN

WHEN 5 THEN

WHEN 6 THEN 2

                                                                WHEN 7 THEN

WHEN 8 THEN

WHEN 9 THEN 3

                                                                WHEN 10 THEN

WHEN 11 THEN

WHEN 12 THEN 4

                                                                END                                         AS FiscalQuarter

 

                FROM

                CTE_DatesTable

 

                OPTION (MAXRECURSION 10000)

Again, you can simply copy and paste the script from THIS ATTACHMENT into query editor.  However, you need to set a parameter to tell the script when your fiscal year starts.  Towards the top of the script, you’ll see a command “SET @FiscalStart = “ and you need to enter a month number from 1 to 12 where 1 is January through to 12 being December.  Therefore, if the first month of your financial year is April, set this value to 4.  Otherwise, the script works the same as the previous example but returns some additional Fiscal Time elements.

And it’s as simple as that!  Thanks Scott – truly brilliant!


SQL UNPIVOT Makes My Data Skinny

November 29, 2011

 

Guest post by David Churchward

If you’re anything like me, you will have eagerly sucked data into PowerPivot, keen to get cracking with solving all of your reporting needs.  If you’re an Accountant (amongst others) doing this, you will no doubt have brought your trial balance dataset in and found endless columns of values representing monthly values.  “How do I get time intelligence to work with this?” is probably one of the first issues you’ve encountered and then, further down the line, when you’ve conjured up complex measures to resolve this problem, the words “why can’t I get this to run quicker” pass your lips.  On this note, everyone should pay close attention to Less Columns, More Rows = More Speed! where Rob proves that the tall and skinny table is much more effective than a short and wide version.

In this post, I’ll explain how SQL can solve this problem by transforming the data on the data extraction process using a SQL function call UNPIVOT and it’s available to everyone – not just developers!

This is PowerPivot using Pivot tables – why on earth do I want to UNPIVOT.  Just like a kid who ties their shoelaces wrong, Mum has to untie in order to do it properly!  The fact is that data transformation has to happen in many situations to leverage the full power that PowerPivot can provide.

What’s this SQL Stuff – Surely that’s for Developers?

I was reading Rob’s recent posts on Why PowerPivot is Better Fed From a Database and it made me realise that there’s a grey area that represents how far an Excel Pro can go in data transformation.  I’m going to leave the “Developer V Excel Pro” debate to Rob (but believe me, he’s right).  As an Accountant who learnt SQL but who is an Excel Pro through and through, my view is that Excel Pros should be able to transform their data to the extent that a SQL view can.

Some of you will be aware, but others won’t, that there is a SQL query editor in PowerPivot.  If I’m upsetting Developers by revealing this, I’m sorry, but, if you’re worried, please take this as an opportunity to provide your Excel Pros with a secure Dev environment!

Accessing The SQL Query Editor

You can access query editor either when you first import data or by editing a table that you’ve already imported.

Accessing Query Editor on First Import

  1. From the PowerPivot window, select From Database > From SQL Serverand set up your connection as you would do normally.
  2. Click Next and select the option Write a query that will specify the data to import

image

Accessing Query Editor From a Table Already Imported

  1. Select the table that you want to affect in the PowerPivot Window and select Design > Table Properties
  2. Change from Table Preview to Query Editor

image

image

And there you have it.  A script that is running to import the data that you’ve specified.  We can now edit this to transform the data.

Before I go ahead, I just want to set a benchmark regarding the data that I’m using.  When I’ve brought this data in, I’ve got 38,183 records using 14 columns.  We’ll see what the UNPIVOT does to this in the end.  Everyone’s expecting a lot more rows but fewer columns, right?

image

The SQL “UNPIVOT” SCRIPT

So, here it is, the script to use in Query Editor mode:

SELECT

– These are the fields that I want in my final dataset

NCODE,

NNAME,

VALUE_CATEGORY, – NEW FIELD 1 – The field that takes my value column names

VALUE – NEW FIELD 2 – The field that takes my values from those columns

FROM

– I specify where these fields come from

(

SELECT

NCODE,

NNAME,

NTURNOVER_C1, – Value Field

NTURNOVER_C2, – Value Field

NTURNOVER_C3, — Value Field

NTURNOVER_C4, – Value Field

NTURNOVER_C5, – Value Field

NTURNOVER_C6, – Value Field

NTURNOVER_C7, – Value Field

NTURNOVER_C8, – Value Field

NTURNOVER_C9, – Value Field

NTURNOVER_C10, – Value Field

NTURNOVER_C11, — Value Field

NTURNOVER_C12 – Value Field

 

FROM dbo.AZZ_V_POWERPIVOTPRO_TB –Substitute this with the name of your source table or view

) PIVOT_TABLE

– Unpivot the table               

UNPIVOT

(VALUE FOR VALUE_CATEGORY in        – These are the names given to the

       — value field (NEW FIELD 1) and value column name (NEW FIELD 2)

(

– Enter value fields as per the select above

NTURNOVER_C1,

NTURNOVER_C2,

NTURNOVER_C3,

NTURNOVER_C4,

NTURNOVER_C5,

NTURNOVER_C6,

NTURNOVER_C7,

NTURNOVER_C8,

NTURNOVER_C9,

NTURNOVER_C10,

NTURNOVER_C11,

NTURNOVER_C12

)

) AS UNPVT

WHERE VALUE <> 0

ORDER BY NCODE, VALUE_CATEGORY

Note – green text within the script are comments to provide some direction as you use this script.  You can copy these into query editor too – PowerPivot will ignore them.

How to Use This Script

  1. Copy and paste the script into your query editor
  2. In the first SELECT, enter the field names that you require in your final dataset.  These should correlate to fields in the table except where I have noted “NEW FIELD 1” and “NEW FIELD 2”.  These “NEW FIELDS” will carry the value elements that we are unpivoting.
  3. In the second SELECT, detail the fields that you have already detailed in the first select together with the column names carrying the values that you will unpivot.  Also change the table or view name in the FROM clause to reference the source data that you are using.
  4. In the UNPIVOT section, detail all of the value fields again that you are unpivoting.
  5. In the ORDER BY clause, select the fields that you wish to use for dataset ordering.

When you save this, your dataset will have been narrowed and you’ll have a new field denoting the data type (or value column reference).

image

You’ll notice that we’re down from 38,183 records to 7,103.  This won’t always happen.  The reality is that we’re able to strip out the zero value cells by using the WHERE VALUE <> 0 clause.  If you’re carrying a number of cells that are zero valued, we won’t see that item as a record in this new dataset.  As a result, we get fewer records in this case.  In any event, your data is now a lot more streamlined and ready for more efficient measures to be written.  You don’t have to include the WHERE clause if you prefer, for whatever reason, to bring all records in.  If that’s the case, just delete that clause.

One Further Step

To make this truly useful, we need to translate VALUE_CATEGORY into something more meaningful.  I would normally expect this to be a date but most trial balances use column names such as “CURRENT1” so that the system isn’t date dependent.  Another table would be required that the VALUE_CATEGORY column can link through to which translates that value, thereby opening up the whole time intelligence functionality.

A Quick Warning

This UNPIVOT approach has given you the power to transform your data into a more streamlined dataset.  However, by using this approach you’ve created a local version of the script.  You won’t be able to reuse this without copying and pasting it into other workbooks (or at least I haven’t found a way).  However, those Developer friends of ours can give you a centralised version that can be controlled and reused more easily.  If you ask these guys nicely, they might create a view or stored proc to do this for you.  I’ve found with my team that salt based snacks, chocolate and the occasional Spag Bol for lunch does the trick and, if it doesn’t, I go ahead and do it anyway!


Why PowerPivot is Better Fed from a Database, Pt2

November 24, 2011

 
image

Think of a Database as a Bodybuilding Supplement
for Your PowerPivot Regimen

In part one I covered the following three benefits of pulling data into PowerPivot from a database, as opposed to using other sources such as Excel itself:

  1. Data-shaping – much easier in a database than in basically any other tool, this is the first benefit you will see.
  2. Auto-refresh – by removing manual data shaping from the system, you can then rely 100% on PowerPivot’s server-side automatic refresh to keep your reports up to date.  The first time in “Excel history” that Excel pros aren’t slaves to their own spreadsheets.
  3. More quality, less errors – the combination of PowerPivot’s “portable formulas” and performing your data shaping in a database greatly reduces the opportunity to make mistakes.

OK, time for benefits 4-7.

Benefit #4:  Complex Calc Columns

Let’s say you want to stamp your customers into quintiles – the top 20% in terms of dollars spent, the bottom 20%, etc.

And you don’t want that to be dynamically re-calculated every time you slice or filter a pivot – you just want it statically calculated according to total overall sales, so that you can then use the quintile as a slicer or row/column axis in your pivot.  You want to be able to offer a pivot report like this:

image

To do that, you need a calculated column in your Customers table:

image

But that’s a reasonably tricky calculated column, as it has to be calculated per-customer, against their matching rows in the Sales table, and then ranked against other all other customers.  If you do that in Excel, you’ve introduced another manual step which foils auto-refresh and in turn introduces more chances for error, etc. 

And doing it in PowerPivot is awkward, at least in v1.  I’m not saying you can’t do it in PowerPivot, because you can, and that DOES auto-refresh.  But it’s just so much easier in a database, if you have someone around who knows databases, and there are more complex examples than this one which get even harder in PowerPivot.

Benefit #5:  Centralized, Re-Useable Logic

This one is both its own distinct benefit as well as a multiplier for other benefits (such as data-shaping and complex calcs).  Anything you do in Excel is manual.  Anything you do in PowerPivot is “tied” to the workbook in which you did it.  But logic that you put in a database can be re-used across many different workbooks.

Which, of course, is very nice.  No need to re-write it every time.  No chance that it accidentally diverges from other models (a source of error).  And if you need to modify that logic in the future, you only have to do it in one place.

At this point it’s worth pointing out that a PowerPivot BI environment DOES share some characteristics with a traditional BI environment.  These db-centric benefits are a hallmark of traditional BI, and these are the things you want to preserve from the traditional approach.

Benefit #6:  “Magic” Sliding Windows and Parameters

This is one of my favorites.  Oftentimes, an organization operates according to a calendar, or even multiple calendars, that are not quite the same as the calendar hanging on your wall.  And that leads you to a place where your measures like “growth versus prior year” can get pretty complex.

I won’t go into great detail here because it would take awhile, but imagine a slicer driven from a table that is purely calculated in the database before input:

image

To get a sense of some similar techniques, which revolve around using “unconnected” slicer tables as inputs, see this post and this post.

Benefit #7:  Compression

It’s a fact:  imported columns compress much better in PowerPivot than calculated columns.  So if you have a choice between doing a calc column in your db (and then importing it) or importing a table and then adding that calc column via a DAX calc column in PowerPivot, DO IT IN THE DATABASE!

Better compression means smaller files, less memory consumption, and often faster response times to slicer clicks.  (Note that the bigger the table, the bigger the difference you will see).

Again, deserving of its own post in the future.


Why PowerPivot is Better Fed From a Database, Pt 1

November 22, 2011

An Excel Pro’s Two Year Journey in Database Land

In a post last week I mentioned that when you use a real database as a source for PowerPivot, a number of unexpected doors open up.  As a longtime Excel pro who has spent the past two years working closely with a database team, I can tell you that it’s been an eye opening experience.

So I have been advocating to Excel/PowerPivot pros for quite some time that they cozy up to their database teams (if they have them).  Of course, this works both ways, and I have also been advocating to Database/BI pros that they embrace and cooperate with Excel/PowerPivot pros.

Awareness, Flexibility, and Sometimes…  Outright Resistance

It’s an interesting ambassadorship.  On one hand, Excel pros are often accustomed to doing everything on their own, with no external help whatsoever, and so it’s mostly a matter of opening their eyes to benefits they’ve never imagined. 

But the other camp often doesn’t expect to learn anything new and relevant about the role of Excel, that unruly little brother of “real” BI.  In conversation, that camp splits into two subgroups rather quickly – one that is willing to consider new ideas and possibilities, and another that regards Excel as the devil.  Interactions with that second group can get ugly.  I also tend to fear for them a bit.

Today, I’m going to focus on the Excel crowd and explain why a db pro can make a huge difference in their lives.  While I have danced around the topic for two years on the blog, this will be my attempt to provide a definitive list of the benefits.

And you die-hard Excel haters out there, don’t worry, I’m coming back around to you soon Winking smile

Benefit #1:  Data Shaping is Easier in Databases.  MUCH Easier.

Hey, Excel is just a collection of individual cells when you come right down to it.  Cells go into formulas, and cells come out.  But turning a wide and short table into a tall skinny table (or vice versa) for instance can consume the better part of an Excel pro’s day in some cases.  That same operation might take a db pro 5 minutes or less.  And even better, the next time might take them no time at all…

Benefit #2:  Auto-Refresh!

Once the db pro has a script in place (or query or view or sproc or whatever it is that they deem best), it can now be run automatically in response to your refresh request from PowerPivot. 

If you are running PowerPivot on your desktop, hey, now it’s just one click (refresh) to pull in the latest, properly-shaped data.  But even better, if you have access to a PowerPivot-enabled SharePoint server, you can put your workbook up there and schedule it to refresh itself!

Even if the manual shaping that you do today only takes you a few minutes each time, it’s still worth it to outsource it to a db.  The SharePoint option means you don’t have to do anything each day to merely update the reports, and the difference between “small” and “nothing” is huge.  Like, “you can go on vacation without repercussions” huge.  Or “you don’t have to come in early each day” huge.

Benefit #3:  Quality

Did you know there are entire conferences devoted to the topic of spreadsheet errors?  Spreadsheet errors are a fact of life in traditional spreadsheets, but they are primarily due to specific problems that a good PowerPivot “ecosystem” eliminates.  One is the lack of convenient named reference – “what did D$14 refer to again?” crosses the inner monologues of Excel pros worldwide millions of times a day.  PowerPivot fixes that – everything is referenced by table/column/measure name.

But raw repetition is the real killer.  If you perform the same spreadsheet task every day for a year, pure statistics tells us you will make mistakes.  And if the task is tedious, you will make even more.

The lack of “portable formulas” in traditional spreadsheets is an underappreciated source of repetition.  Even a single iteration of a reporting task explodes into repetitious subtasks and provides lots of opportunity for error.  (PowerPivot fixes that one too).

So we are left with raw repetition – performing the same task every day – as our primary source of error.  Once you have PowerPivot, data shaping is the biggest source of repetition.  Outsourcing that shaping logic into a database, then, doesn’t just save time.  It prevents mistakes. 

Why?  Because the db logic is written only once, and it doesn’t change no matter how many times you run it.  Even if there were mistakes made in the db logic, you will catch them sooner or later (usually immediately), and once you fix them, they stay fixed.

Come back Thursday for part two, with items four through seven.


The Ultimate Date Table

November 15, 2011

 
image

“Looks like it’s time for me to get myself a date.”

-Ace Ventura, PowerPivot Detective

The Importance of a Date/Calendar Table

I get a lot of questions from people who are struggling with the time intelligence functions in DAX.  And nine times out of ten, the answer is that they don’t have a proper date table.

I know it’s tempting.  You’ve got your sales table, and hey, there’s a Date column in there!  So you use it, and pass that column as a parameter to, say, DATESBETWEEN, or DATEADD.

Sometimes that will give you an error.  And other times, it won’t…  but the results will be funky.

You need a separate Dates table, or perhaps you prefer to call it a Calendar table.  A separate table, whose only purpose is to store dates (and the properties of dates, like DayOfWeek, etc.)  And it contains consecutive dates – no “gaps.”  Even if your business is never open on weekends, you need unbroken ranges of dates.

Oh, and then you need to relate it to your Sales table.  (Or whatever fact/measure tables you have).

Much More Than a Single Column

A single-column table that contains merely dates is enough to make the time intelligence DAX functions operate smoothly.  But you will almost certainly want other fields too.  Like Year.  MonthName.  DayOfWeek.  The list goes on.

Maybe something like this:

image

And yes, you can cobble this together on your own in Excel.  Tedious work though.

Would You Like One for Free?  Try DateStream from Boyan Penev!

Imagine just being able to open up PowerPivot and always having three nice date tables awaiting import:

image

That’s what Boyan Penev has put together for you.  Three great calendar tables that you can download directly into PowerPivot, for free.

He published them to Azure DataMarket, a service from Microsoft where data providers can actually sell you their data sets – things like weather, demographics, etc.

Boyan did this for free though – I suspect half as a service to the community, and half as a project to learn how to provide a service on DataMarket.

It’s pretty damn cool, and really, the story should end there.  If you’ve used DataMarket before, then it DOES end there.  Go get the date tables and try them out. 

But if this is your first exposure to DataMarket, it takes a few minutes to get it set up.  It’s not bad as long as you don’t make the mistakes I did.

How To Get It – Short Version

Hey, it’s on Azure DataMarket.  The URL is in the next section below, or you can just go to Azure DataMarket and search on “DateStream.”

DataMarket is going to be a wonderful service someday, but right now it has a few warts, so there is a Long Version too.

How To Get It – Long Version with Occasional Snarky Commentary

Step 1:  Go to the DateStream page on DataMarket.

image

Step 2:  Get confused.  OK, now is where things get choppy, because frankly, the DataMarket site itself has a terrible user interface.  I sent a full page of feedback to the DataMarket team about a month ago and as far as I can tell, they ignored it.  (Which is pure karma – I used to be one of the people at MS who ignored 90% of the feedback coming in, and now I get to be the one who is ignored).

I don’t want this to be a tutorial on how to navigate their website, or even how NOT to design a website.  So let’s just hit the highlights and try to get to Boyan’s date tables as soon as we can.

Step 3:  Get an account.  OK, this isn’t bad.  Another MS site that requires a Live ID.  Most of us have three of those by now.

Step 4a:  Scan the DateStream page looking for the “Download to PowerPivot” button or link.

Yeah that’s right.  There is no such link – you can get to one by navigating a few levels deeper but I’m going to skip that.  Don’t despair though, good things await you!

Step 4:  Find the URL of the DateStream Service

This is NOT the same as the URL of the DateStream page.  But it IS displayed on the page.  Here’s the URL you need:

image

And here again in text:  https://api.datamarket.azure.com/BoyanPenev/DateStream/

OK, copy that.  You will need it.

Step 5:  Launch PowerPivot, Go Into the PowerPivot Window

And click this button:

 

image

If you don’t have that button, you need a newer version of PowerPivot.  Go get that from PowerPivot.com and resume the next step.

Step 6:  Fill in the Dataset URL From Step 5

image

Step 7:  Account Key

See that last text box in the picture above?  The one with the long code in it that I’ve partly blurred?  That’s my account key.  I highly recommend clicking that Find button.  It’s actually pretty damn useful.

Be careful – the DataMarket site has TWO long nasty codes like that for you.  One of them is the one you want, and the Find button takes you to that one:

image

THIS is Your Account Key

Do NOT, under any circumstances, do what I did, and confuse Account Key with Customer ID:

 

image

This is NOT Your Account Key.
Do NOT Be Tempted to Use This!

Step 8:  Click Next, and Pick Your Table or Tables

image

NOW we are on familiar ground.

Last Note:  Parameterization?

One thing I have not yet figured out is how to limit the date range I import.  The table starts in the year 1900, which goes back a bit far for my needs, and makes the dataset take a long time to download.

You’ll notice that when importing from DataMarket, the Preview and Filter UI lacks the filter dropdown buttons:

image

No Filter Dropdowns, Just Checkboxes

But the DateStream homepage DOES indicated that parameterization is possible:

image

So if you’ve got that figured out, drop me a note Smile


PowerPivot Refresh thru VBA at last!

September 2, 2011

 
Cool Tshirt.  Click for the original.

OK folks I have not tried this out yet myself but it deserves to be spread far and wide:

Tom Gleeson appears to have found a live specimen of Bigfoot itself:  the ability to refresh PowerPivot data from VBA.

Full post: 

http://blog.gobansaor.com/2011/09/01/automating-powerpivot-refresh-operation-from-vba-the-code/


Date and Date/Time – Sneaky Data Types!

February 21, 2011

 

                                                    image image

These Two Columns Both Contain Dates, Right? Sure They Do!

Relationships that should work, but aren’t?

Here’s a quick tip, one that I think we’ll all need sooner or later. 

When I got back from last week’s amazing consulting/training trip, the team let me know that we had a problem.  An existing PowerPivot model, one that had been working just fine for a long time, had stopped working when we refreshed the model with the latest data:

image

All the slicers were now indicating that there wasn’t any data.  And those empty rectangles on the right?  Those are charts.  They just weren’t showing any data anymore.

What’s the problem?

We pretty quickly determined that it had something to do with the date relationships.  When we cleared the date slicer, data came back.  So it was just when we were filtering by Date that data disappeared.

imageIn cases like that, the first thing I like to check is data types.  Somehow, did one of the columns get changed from Date to Text?  Nope, not this time.  They both are definitely still Date, as evidenced by this screenshot from the PowerPivot window ribbon, here at right.

 

But Data Type Isn’t Enough!

Notice the Format option there, though?  Try setting both related columns to a format that displays time as well as date:

image

Now look at the two columns:

                                         image   image  

Like I said, sneaky little Date/Time data type.  When we got new data this time, one of the source systems in the loop decided to include the time of day, whereas before, it had been omitting it.

So yeah, the two columns don’t match up anymore even though by default they LOOKED like they matched.  Sneaky.

So…  if you have a relationship on two Date columns, and the relationship does not seem to be working (all of your measures are returning blanks), this is a likely culprit.

Two ways to fix this

The quickest fix is to create a “clean” calc column that strips the time:

   =DATE(YEAR([CalendarDate]),MONTH([CalendarDate]),DAY([CalendarDate]))

Then you use the calc column for the relationships rather than the original.

But the better, more reliable way to clean your date columns is in the underlying SQL (as long as that’s an option).  If your SQL sources always trim your Date columns down to pure Dates, and truncate Time, then you never have to write calc columns for this purpose again – solve it one place, and save yourself a lot of ongoing work.

Furthermore – columns imported from SQL end up being a lot better compressed by PowerPivot than calc columns, AND this often results in faster pivot performance as well.

If you do the trimming via Views, you can keep the time component in your SQL tables for later, in case you want to do “time of day” analyses.  But your default views that you import should always be protected against this.

And remember, if you aren’t a SQL pro yourself, having a good relationship with the folks who run your databases is a very positive thing.  At Pivotstream for example, I never touch SQL.  But my colleagues who maintain SQL are very helpful, and our cooperation lets us do things that otherwise would be impossible.  One of my favorite themes – cooperation between PowerPivot pros and SQL pros – and I will be hammering said theme every chance I get Smile


PowerPivot Accelerators: The Story (And Announcing: the Private Beta)

February 2, 2011

 Pivotstream's PowerPivot Accelerators

Sunday night update:  we’ve received a lot of exciting interest and have identified about 15 great beta sites so far, representing a broad cross section of the industry – BI and SQL pros, SharePoint pros, and Excel pros who are “growing up” into BI and SharePoint via PowerPivot.  Great international representation, too.  Still looking for a few more participants, so drop us an email – beta@pivotstream.com

Remove hardhat.  Pick up thinking cap.

Short version:  We have some more toys to share.  Skip to the end if you want in on the beta.

Long version:  In August 2009, while still a member of the PowerPivot team at MS, I took off my product design hat and put on my “user” hat.  From Cleveland, far removed from the internals of the product team in Redmond, I started this website, and embarked on The Great Football Project.    I was as curious as you how well the product would work.  Maybe that’s hard to believe, but I promise it is 100% true :)   
 
I think it’s fair to say that applying a platform like Excel, SharePoint, PowerPivot, or SQL tends to foster a very different kind of expertise than that fostered by building or designing it.  You become much more familiar with the gaps, in particular, which is why those product teams listen so carefully to customer feedback (much more so than the Word team needs to, for example).

So I dove in, learning things at every turn.  About six months later, armed with the knowledge that PowerPivot performed VERY well in real scenarios, I dove in even deeper:  I left MS to join Pivotstream, where we started prototyping our PowerPivot infrastructure and models even prior to the product’s final release date.

“Gentlemen, we can rebuild it.  We have the technology…”

We started finding things in PowerPivot that we wished we could change.  Not earthshattering things.  “Last mile” type things – gaps in the feature set that made a big difference in practice.

That’s when my background as a software engineer became useful again.  Imagine working your whole career on MS products, where every day, your job is to identify flaws and opportunities to fix and fulfill.  And working right next to the people who actually reshape the products every day, as if the software were made of clay – stubborn clay, but clay nonetheless.

That breeds a certain optimism, a refusal to simply accept things as they are.  Instead of saying “crap I wish they had put X in the product,” my conditioned response is “hmm…  how hard can it REALLY be to fill that gap ourselves?”

Yes, it almost always turns out to be harder than it seems up front, sometimes MUCH harder.  But without that optimism, we probably wouldn’t have gotten started.  So it serves a purpose right?  Programmers everywhere are shaking their heads in disagreement, to which I reply, “Hey, why are you reading this?  Get back to coding!”  (Kidding.  Sorta.)

The Results:  A Complete System

I’ve already shared one of the projects with you:  the trimmed-down SharePoint pages optimized for the BI portal role.  And then their subsequent beautification.  Those have been in production for us for a long time now.

That’s just the tip of the iceberg though.  There are many other things we have done to improve our efficiency, or our customers’ experience, or both.  We are very proud of the results.  Here’s a glimpse:

  1. Bulk Workbook Modification – when we started out, we would have to manually edit double-digit numbers of workbooks by hand whenever we wanted to make a change.  The same was true if we wanted to add or modify a lot of measures in bulk.  Not only was this inefficient, but also error-prone.  Today, our workbooks are very rarely touched by human hands.  We queue up a change and kick if off.
  2. End to End Data Refresh – our nightly/weekly refresh process is completely automated, from FTP delivery of new data from clients, through SSIS, PowerPivot refresh of “Core” workbooks (triggered only when the underlying SQL sources are ready!), and automatic refresh of the “Thin” workbooks connected to them.  We can schedule Cores to refresh more than once per day.  In fact we can manually trigger a “right now” refresh of Cores in bulk, with basically one click.  In the Thins, we can even increment Date slicers to the latest value :)   Status updates are automatically emailed out to our team.  And the “refresh on open” problem is a thing of the past.
  3. SharePoint Tools – there’s also a reasonable amount of drudge work involved with SharePoint stuff, particularly publishing and linking pages and workbooks.  You may not notice this, depending on how many workbooks you have, but for us, well, we have far too many to be clicking around in the SharePoint config UI all day, so we’ve built tools for that, too.

Even better:  these things will all remain valuable once SQL 11 / Denali ships, and we have things like BISM and Crescent to play with.  In fact I hear rumor that the SQL CTP2 beta release may be just around the corner…  are you pumped?  I’m pumped.

The Private Beta:  Now Taking Applications

All of that stuff is working great for us internally.  And just like with the SharePoint Pages, we’d like to share them with the community and recoup some of the development costs.  Getting these components ready for broader usage IS more work, however, and we’d like to recruit a small group of early beta testers to kick the tires.  The Accelerators won’t release until they are ready.

If you are interested, please send an email to beta@pivotstream.com and specify:

  1. Your Name
  2. Company Name
  3. Current Usage of PowerPivot – personal or organizational, prototype/exploratory or already in production, and whether you are using PowerPivot for SharePoint or not
  4. Which areas you are interested in testing (1-3 above)
  5. Website(s) – your company website and/or blog URL if applicable

We’re going to keep this first round kinda small, maybe no larger than 10-15 participants.  So get your emails in, we’re anxious to start selecting the group :)


PowerPivot Scheduled Refresh Pt 3: Thin Workbooks

February 1, 2011

 
PowerPivot Thin and Core Workbooks

What is a “Thin” Workbook?

At Pivotstream, we came up with the term “thin workbook” to describe an Excel workbook that does not contain a PowerPivot model, but that DOES connect to another workbook that does, which we call a “core” workbook.  In other words, thin workbooks use core workbooks as their data source.  Others have called this “hub and spoke,” and I may even like that phrasing better, but hey, we’ve been using “thin and core” at Pivotstream for nearly a year and it’s hard to change.

So, a thin workbook contains pivots, charts, cube formulas, etc. – it’s a report-only workbook.  And whereas our core workbooks may be 200 MB or more, our thins are often 5 MB or less.

Why create thin workbooks?

Here are a few reasons why you may want to use a thin workbook approach.

1) To avoid duplicating data.  If you want to create multiple workbooks that offer different views of the same data, which we do all the time, you probably don’t want to duplicate the PowerPivot data across all of those workbooks.  That duplication needlessly burns RAM on the server, takes extra time to run scheduled refreshes, and also increases your maintenance work (ex:  if you want to modify a measure, you have to do it multiple places instead of one).  I *highly* recommend you never duplicate PowerPivot models like that, and thin workbooks are a good alternative.

2) To lock down data.  In many ways this is another flavor of #1.  At Pivotstream we often create one workbook that contains a full set of measures, and then another that omits certain measures that are particularly sensitive.  Both can use the same core workbook as a data source, but if you properly lock down your permissions on SharePoint, you can give one set of users a different level of data than another set.

3) To avoid the refresh on open performance problem.  In the last post about refresh, I pointed out that the PowerPivot refresh service on SharePoint kinda “cheats” a little bit – it refreshes the data in the PowerPivot model, but the data in the pivots (the data in the Excel sheets themselves) is untouched.  To work around that, PowerPivot for SharePoint sets the “refresh on open” flag, forcing Excel to refresh all pivots whenever the workbook is opened.

That can be slow.  In many cases in our work at Pivotstream, it is VERY slow.  Since the PowerPivot refresh service does NOT touch the thin workbooks, “refresh on open” never gets set.

Of course, that creates another problem doesn’t it?  The data in the Excel sheets then remains stale, with no automatic way to refresh it.  That’s no good.

But we are busy little beavers at Pivotstream and we don’t give up easy :)   We started working on this problem in March 2010.  More on this later.

How do you create thin workbooks?

The first step is to publish your “Core” workbook to SharePoint.  You CANNOT connect a thin to a core sitting on your desktop – the desktop version of the PowerPivot engine does not support cross-workbook connections like that.

Once you have your Core published, there are two primary ways to create a thin workbook:  Start from Scratch, or Hack like Mad.

Boring:  Start from Scratch

In this method, all you do is go to the PowerPivot Gallery and click the highlighted button:

Creating a Thin PowerPivot Workbook from the Gallery 
…and that gets you started in Excel, in a new workbook, with a blank pivot connected to the Core model.

More Fun:  Hack Like Mad!

For those of you who prefer to color outside the lines, here’s another technique that I often find helpful.  You’ve created a single Core workbook, and it has lots of report sheets in it.  You don’t want to start from scratch and throw all of those report sheets away, right?

So here’s what you do:

  1. Create a copy of the workbook.  Just copy/paste the file in Windows Explorer.  Name one file something like Core.xlsx and the other Thin.xlsx
  2. Delete pivots from the Core workbook.  Two reasons for this.  One, it reduces confusion – when you open the Core, you know it’s the core because you don’t see any report sheets.  But deleting pivots also makes the scheduled refresh process on SharePoint run faster – after PowerPivot refreshes the model on the server, it then needs to refresh the thumbnails in the gallery.  To do that, it asks Excel Services to open the workbook, which triggers “refresh on open.”  Why pay extra time and CPU penalties on reports no one is going to see?
  3. Publish the Core workbook to SharePoint.
  4. Change the connection string in the Thin workbook.  Open it up in Excel, and in Connection Properties change the “$Embedded$” string to be the full URL to the core workbook on SharePoint.  Example:

Changing the Connection String to Point to a Published PowerPivot Workbook

When you click OK, Excel performs a full refresh against the new source.  Be patient.  Once that is done, your thin no longer connects to the PowerPivot data embedded in its own file, it now queries the server copy instead. 

Of course, now you probably want to make the file smaller and get rid of all that unused PowerPivot embedded data.  Remember where that lives?  This post explains in more detail, but here’s the image of the thin workbook renamed from XLSX to ZIP:

Embedded Data - One Reason Why PowerPivot Will Not Work With Excel 2007

Item1.data is the file you want to get rid of.  But if you delete it, you will corrupt the file.  So I just create a 0-byte Item1.data file on my desktop and copy it over the top of the file inside the ZIP.

Rename back to XLSX and you have a true “Thin” workbook with all of the original report sheets intact and functional!

Summary, and a hint at the next post

Thin workbooks are a VERY useful tool in a PowerPivot system.  For Pivotstream, they are absolutely essential.

The only trick, as mentioned above, has been getting those thin workbooks to always be up to date.  I’m happy to say we have that problem solved.

I’m pretty sure others are going to need this sort of thing, too, as well as some other internal tools and components we have built.  If you have interest in beta testing our PowerPivot Accelerators, as we are calling them, watch for the next post :)


Data Refresh Twists/Turns Pt2 – Pivot Refresh

January 12, 2011

 
When You Are With the Right Organization You Don't Need to Wait on Fresh Pivots

“People looked at me differently.  They knew I was with somebody. 

I didn’t have to wait in line at the bakery on Sunday morning for fresh pivots.”

-Henry “DAX” Hill

 

 
That Second Refresh Can Really Bite You

Back before the great distraction known as Donald Farmer hijacked this blog for a few days, I was talking about Scheduled Data Refresh.  Specifically, I left off talking about how refresh is really TWO refreshes – PowerPivot and Excel.  Here’s the illustration again:

powerpivotrefreshvsexcelrefresh

OK, so the PowerPivot refresh service has one primary mission in life, and that is to refresh the PowerPivot model.  Which brings us to…

Surprise #1: PowerPivot refresh service does NOT refresh the Pivots!

That’s right, it leaves the pivots in the workbook alone.  So by default, those pivots (and/or cube formulas) still contain stale data!  I’m 100% serious.

“But wait, Rob, you’re wrong!” you say.  “I’ve tested this feature out, and I have NEVER seen stale data in my pivots when I view them in Excel Services!”

That’s right, you DON’T see stale data do you?  That PowerPivot refresh service is one resourceful beast, and it’s playing a clever little trick…

Surprise #2:  PowerPivot refresh service sets “Refresh on Open”

Have you ever seen this feature in Excel?  It’s buried pretty deep:

Refresh On Open Setting in Excel is Used By PowerPivot Scheduled Refresh 
Refresh on Open Setting in Excel 

Yep, click Connections on the ribbon, select a connection (typically named “PowerPivot Data” in our case), then click Properties.  On the resulting dialog you will see the checkbox “Refresh data when opening the file.”

By default that checkbox is NOT set.  You can try this out to see what I’m talking about:  take a PowerPivot workbook, open in Excel client and verify the checkbox is not set.  Then upload to SharePoint, schedule a refresh.  When complete, download the workbook and look at this setting again.  It will be checked now.

Side Topic:  How Does PowerPivot Refresh DO THAT???

I’m 99% certain that there is no API on the server for doing this.  I don’t think Excel Services “helps” PowerPivot at all.  I’m pretty sure PowerPivot modifies the workbook directly, via the file format.  The Open XML File Format, to be precise…  which happens to be what Office uses – XLSX, DOCX, PPTX – these are all Open XML files.

You can do pretty astounding things with that format, so if you are a developer type, I suggest playing around with it.  I can tell you that OUR developers at Pivotstream had a heart attack when they saw that SDK :)

How This Can Bite You:  Refresh Sometimes Takes Awhile

HOURGLASS So far so good.  PowerPivot refresh does NOT refresh the pivots, but it DOES set the refresh on open flag.  So when you open it in Excel Services, the first thing Excel Services does, before it shows you any workbook content, is refresh the pivots.  So you never see stale data.  Ever.

In many cases, that’s the end of the story, everything is happy.  But in other cases, it can be painful.

Remember the post I did awhile back on the differences between Update and Refresh?  The overall theme of that blog post was basically:  Refresh Can Be Very Slow.

Why is that?  Short version:  Refresh refreshes EVERY pivot in the workbook.  AND refresh does more work than a normal pivot query.

So…  if you’ve got a lot of pivots in your workbook, or a large data model, or complex measures, or perhaps a combination of those, refresh can take awhile.

Other Considerations

The worst part, of course, is that when a user opens a report in the browser, they have to wait awhile before they see any data.  Sometimes that’s just an extra second or so, which is why you may not have noticed.

In other cases though, it can be as much as 1-2 minutes!

That happens for every user.  In fact, if you open a browser and hit a report, wait through the refresh, and then close the browser, even if you immediately return to the report in a new browser window…  you have to wait out the refresh again.

Imagine what that can do to the CPU(s) on the PowerPivot server(s).  Of course, if you planned your hardware around data refresh as the peak load scenario, then you likely have enough CPU, but still…  a lot of users doing this at once will only magnify their wait time.

“Give me the good news!”

First of all, I brought this to the attention of my former colleagues at Microsoft last time I was in Seattle.  They’re working on solutions that don’t involve “Refresh on Open” but it’s not yet known when or how those solutions will be made available.  Kudos to them for their responsiveness on the matter – I will keep you posted.

In the meantime, there are obvious things you can experiment with.  Keeping the number of pivots smaller is one.  Cutting back on cross-filtering in slicers is another.  But fundamentally, if you have a reporting/modeling/analysis scenario that involves heavy lifting, you can’t exactly just remove those needs from your workbook.

At Pivotstream, we found ourselves in precisely that situation.  And I am happy to say that we have beaten the problem, but the answer is complex – deserving of its own series of posts in fact.  Plus, there are some things here that I am not yet ready to talk about publicly, muhaha…  so let’s put that on the back burner for now.

Next up in this series:  “thin” and “core” workbooks, aka, “hub and spoke.”

Click Here for Part Three >>