PowerPivot field list and Excel field list at same time!

June 26, 2010

-Posted by Rob Collie

OK, ignore my last post, that was just silly.

Yesterday one of my fine colleagues at Pivtostream mentioned that he was seeing both field lists simultaneously.  Then followed an exchange where I said “BS,” he said “bet me some cash, oh PowerPivot boy” and, well, he was right.

One should be careful never to wager against one’s own boss.

Anyway, in beta releases of the addin, it was IMPOSSIBLE to bring up the Excel field list without disabling the addin.  Apparently, the team decided at the last minute that allowing the user to bring up the Excel field list could compensate for the lack of certain features.

It’s a brilliant move on their part.  So here it is, both field lists, living together in harmony:

Built in Field List and PowerPivot Field List

You don’t have to disable anything.  Just go to the Options tab on the ribbon when you have a pivot selected, and click the field list button there:

Excel Field List Button

Implications Galore!

OK, so…  you can do the defer layout update trick which is excellent for long-running queries, without turning off the addin, that’s huge.

But wait, there’s more :)

Can Move Measures to Rows or Up and Down Relative to Other FieldsYou can also adjust the layout of your measures.  Try dragging the little Sigma “Values” capsule around the Excel field list (shown here at right) and you will see what I mean – move it to Row Labels for instance.

This also comes in handy when you have other fields on Column Labels and want to change whether Measures are nested inside of those fields or vice versa.

But the last trick that this side by side field list stuff enables is truly my favorite…

 

 

Batch Creation of Measures Without Refresh!

Follow along, this is fantastic.

Step one:  turn on defer layout update:

Defer Layout Update

Two, go create a measure:

Create New Measure Without Refresh

Three, ignore this error you get when you enter the formula and click OK:

Ignore This Formula Is Invalid Error

Four, observe that the measure was created anyway:

Measure Was Created Anyway With No Pivot Refresh

This changes my life in rather significant ways.


Quick tip: disable addin for rapid layout

June 23, 2010

-Posted by Rob Collie

UPDATE:  It turns out that all of this is unnecessary, there is no longer a need to disable the addin to get these benefits.  (I was still stuck on what was possible in the Betas).  See the latest post on this topic for details.

If you’re like me and you work with large data sets and/or complex measures all the time, you may be annoyed by the time it takes to layout 6-10 fields on a pivot – a handful of measures, a field or two on rows, a couple of slicers… even with the performance of PowerPivot, you can be burning a reasonable amount of time in the click-and-wait game.

I have a trick for you.

First, disable the PowerPivot addin.  Yep, you heard right.  On the File tab, go to Options:

Excel Options

Then select Addins.  At the bottom of that screen, open the dropdown for Manage, and pick COM Addins from the list:

Manage Addins

And then in the dialog that comes up, uncheck the PowerPivot addin:

Disable PowerPivot Addin

Now go to the Options tab on the PivotTable ribbon and enable the Field List:

Field List Enable Button

OK, now you see the native, built-in Excel field list.  If you are familiar with normal Excel pivottables, this will appear foreign to you, because this is the “OLAP” field list – designed strictly for use with OLAP db’s like Analysis Services, which PowerPivot happens to be under the hood.

Anyway, at the bottom of that field list, you will see the “Defer Layout Update” checkbox.  Check that, then you can make a bunch of changes to the layout of the pivot WITHOUT them refreshing the pivot.  Click Update and all of your changes will run in a single batch while you go get a fresh cup of coffee:

Fast Pivot Layout With Defer Layout Update in PowerPivot

Cool huh?  A few things to note:

  1. You can’t create measures without the addin enabled.
  2. You also don’t get the fancy slicer auto-layout.
  3. The field checkboxes are organized very differently, split into measures and non-measures AND by table – you will see what I mean
  4. There are a few “dummy” measures visible in this field list that are hidden in the addin

OK…  I should also mention that we might have had this feature already built-in to the PowerPivot v1 field list if it weren’t for me :)   It kept coming up during development – “should we build that defer layout feature in?”  And I kept saying “no, not yet.”

Why would I do such a thing, you ask?  Two good reasons.  One, we were running WAY low on time and resources.  By NOT doing that feature, the team was able to do several other important things that today we’d be complaining about instead. 

Two, I knew that if we added that feature, as we worked on PowerPivot internally, we would all turn it on whenever we hit something that was slow.  As a result, we’d not feel the pain of performance issues as acutely as we should, and not as many things would be identified and fixed.  So, like Cortez burning his ships, I wanted to deprive us internally of this crutch.

Of course, now that I no longer work at MS, and I find myself in a job where I crunch hundreds of millions of rows daily, irony has come home to roost :)   Just kidding really – I still think it was a good v1 decision.  I’m positive that engine query performance is much better for it, and it will be easy for the PowerPivot team to add this feature in v2.


Using Excel CUBE Functions with PowerPivot

June 21, 2010

By Dick Moffat
Personal Logic Associates Inc.

Busy, busy with exciting Access Services applications the last few weeks and my PowerPivot writings have fallen behind…. Sorry.

But today I am going to give you a quick and dirty example of what I think is one of the key features of PowerPivot that will give it a much broader initial and on-going impact for experienced power spreadsheet developers.

This is where existing spreadsheets can get the value-add of having PowerPivot data available to them in a way that is not only understandable for the traditional spreadsheet junkie (as opposed to the typical BI one) but that also that will add major value while integrating into existing models.

Excel CUBE Functions

A few weeks ago my friend Dany Hoter wrote a piece here about the use of the CUBE functions in Excel with PowerPivot data.  Dany’s article was a great intro to the capability of this unsung capability of Excel available since Excel XP.  I am going to go deeper into this issue here.

The Cube functions require access to an Analysis Services On-Line Analytical Process cube (OLAP) which has to be provided to the user from SQL Server’s Analysis Services application.

While there are companies around the world that are capable of taking advantage of this capability, this is by no means a large percentage of overall Excel users and from my experience it is generally an unknown and untried feature.

In Excel 2007 (and now 2010) the CUBE functions became native to Excel (as opposed to available through an Add-In) and were integrated with the new “Connection” object within the program.  Once again though, if one wanted to use an OLAP Cube in a Connection it required availability of an Analysis Services Cube of data.

But in Excel 2010, thanks to PowerPivot, users can now create their own “Cubes” inside PowerPivot and they automatically present themselves as an available Connection inside the Excel parent file.  This is a big thing and I hope to show you another reason why.

A Cube Automatically?

I’m sure I don’t have to explain here how one creates a PowerPivot data source consisting of multiple Relational data sources.  But one fact that may not be obvious is that the data set that is created by these PowerPivot objects, and by their relationships, is in fact a “Virtual” Cube in itself.  In the purest sense it is a ROLAP Cube – created at run-time from Relational data.

This default “Virtual” Cube is named “PowerPivot Data” and is exposed when you click the Connections Button on the Data Tab:

This is inherently an OLAP Cube conceptually and so is an acceptable source for Excel’s native CUBE Functions.

So what does this “Cube” Do For me?

OLAP “Cubes” were designed for two major reasons:

  1. To allow analysis of data in a hierarchical fashion.  In the business world that usually means Financial or Sales or other data organized by the hierarchy of business units (Region/Country/Zone/State/County/City) or by dates (Year/Quarter/Month/Week/Day) or by Product lines (Product Category/Product/SKU), where a number or numbers (i.e. Account balances or sales quantities or values) are stored at the lowest  level of detail (i.e. by Day or by SKU) and the total value or quantity can then be “Rolled Up” along any Hierarchy or Dimension and across dimensions.  So for example you want to see total sales for March 2010 of a particular Product and its SKUS in a particular Country….  Then you want to be able to “Drill up” or Drill Down” on any value in any dimension down to the lowest level or up to the highest easily and automatically.  This is where a Pivot Table attached to one of these Cubes is a natural presentation and analysis mechanism.
  2. To take disparate data sources and aggregate them automatically using the hierarchies in 1 above.  In the original implementations of OLAP Cubes the hardware available had limited RAM and slow processors and so many OLAP Cubes had to be created over-night and written to disk in the classic De-Normalized format that allowed for relatively quick queries to be made against data that otherwise would simply not be possible in a Normalized format.  But this is the 21st Century with incredibly cheap and sizable RAM memory on every PC and with processors so fast that it’s hard to believe.  This is the work that PowerPivot is designed for – so it can reconstitute your OLAP “Virtual” Cube at run-time inside your Excel file itself (and with compression to boot).

So if you bring in your sales data (like in the example Bike data available with PowerPivot samples) and set it up in a PowerPivot “Connection” you can refer to that info from your spreadsheet layer using not just Pivot Tables, but also using the Cube functions.  But the Cube functions give the traditional spreadsheet maker (and his/her boss) the flexibility to present this data in the classic free-form spreadsheet style as Income statements with analysis or to group them on the sheet in any way they want.

The Example

The data being used in this example is available in the “AW_CompanySales.accdb” database supplied as an example for use with PowerPivot. Keep in mind that there are many “Memo” field-types in this example database and it would be worth your while to change most of those fields to a Text format and add a few Indexes to improve the performance of the database when importing into PowerPivot.

This is an example of what the key tables look like when imported into PowerPivot :

When you create Relationships between these tables within PowerPivot, the result is a Virtual Cube with the ability to “Roll-Up” the core values (“Facts”) along any Dimension of the data, which actually are fields in the data sources of the table being used for Hierarchical relationships.  So Product Categories in the “ProductCategory” table have children in the “ProductSubCategory” table, which have children in the “Product” table.  This is the fundamental design of a “Snow-Flake” OLAP design (read about it!) and it should be fairly intuitive to anyone who knows the data sources and how they are related conceptually.

In my example, however, I have pulled together this same data into a single large De-Normalized PowerPivot table by creating an Access Query.   The result is exactly as PowerPivot does virtually within its “PowerPivot Data” Connection and hopefully will help me explain this functionality better.

This is how that PowerPivot window looks (in an unfortunately stiched together image I’m afraid):

Do you notice how there are multiple instances of Country and State across records (?)  This is in effect a result set from an Access query or SQL View or SQL Statement and is truly de-normalized.  There will be certain scenarios where this might work to your advantage over bringing in individual tables and joining them inside PowerPivot, but that’s an issue for another day. There is little or no impact on file size for the two ways of importing the source data however.  It is meant to help us see our data easier in this example.

Keep in mind that there are a million daily records in this data set and is a recipient of the exceptional compression algorithms of PowerPivot.  It is also worth noting that the single table de-normalized version is 1/10th the size of the file with multiple tables (3 megabytes compared to 30 megabytes).

The “Measures”

Once you have the data in place within PowerPivot you have to take the next step and create the Measures you will be working with.  Measures are the core of OLAP Cubes and are also the core of PowerPivot.  Measures are the aggregations that you will be analyzing based on the data in columns you wish to SUM or COUNT or AVERAGE (or whatever of the standard Excel Mathematical and Statistical functions you want to use).

The most basic example of a Measure is ours called “[Sum of SalesAmount]’, where it simply sums the [SalesAmount] column at every conceivable level of the Virtual OLAP Cube inside Excel,  created by PowerPivot.  In this case [SalesAmount] is simply a basic data column provided directly from the data source, but there is no reason why you can’t use DAX to create new Calculated Fields of calculated data and then create “Measures” based on those fields.

To create the aggregations you want in your spreadsheet you need to have created the “Measure” inside a Pivot Table in the Excel environment either within the “drag and drop interface or using the menus.  I created our Measure using the Pivot Table interface like this:

I started to create a Pivot Table from inside my PowerPivot environment (or in the Excel environment from the PowerPivot Tab), and then I dropped the [SalesAmount] field from my Field List into the Values section below.  By default the Pivot Table assigned this as a “SUM” function and it created the Measure [Sum of SalesAmount] automatically.  It is now available to the Workbook, not only to any Pivot Tables you might create, but also to any CUBE functions that you might want to use.

This does NOT mean you have to have a Pivot Table in your Workbook, BUT you do have to have started the process in order to create the Measure or Measures you want to refer to in your CUBE functions.   You can create the Measures from within the PowerPivot Pivot Table menu as well,  but using the Excel Pivot Table GUI this way strikes me as more intuitive and equally capable.  We can then rely on this Measure to aggregate the total “Sales Amount” at every level across any Dimension in our Virtual OLAP Cube.

The Spreadsheet:

So here’s the spreadsheet I want to feed from PowerPivot:

The CUBEVALUE() Function

In order to drive right to the heart of this functionality I am going to use just one of Excel’s Cube functions, CUBEVALUE in this model (but will show one more function later as well in an enhanced version).

So we are looking for the Sum of the Sales Amount ([SumofSalesAmount]) of Bikes ([Category]) for Fiscal Years 2006 ([FiscalYear]) for Australia ([Country]).  So this is the pseudo-code for this formula:

Sum the Sales Amount where Category = “Bikes” and Fiscal Year = 2006 and Country = “Australia”

Notice that the values are in the PowerPivot tables at the lowest level by Product, by City, by Date of the “Fact” Sales Amount.  Each value is shown as a member of a Week and Fiscal and a Calendar Month and a Fiscal and a Calendar Year.

So when we ask for values from the Fiscal Year, the Category and the Country fields we are inherently asking for aggregated totals ate the intersection of those three fields.  That is what you do when you look for values in an OLAP Cube.

This is the CUBEVALUE() function to get this value for “Bikes” in Australia for Fiscal 2006 :

=CUBEVALUE("PowerPivot Data","[Sum of SalesAmount]","[Country].[Australia]","[Category].[Bikes]","[FiscalYear].[2006]")

Which returns a total of $91,490,280.05.

Therefore to get the equivalent value for “Accessories” in Australia for 2005 the formula would be:

=CUBEVALUE("PowerPivot Data","[Sum of SalesAmount]","[Country].[Australia]","[Category].[Accessories]","[FiscalYear].[2005]")

And the result is $902,316.21.

If you have ever used a SUMIFS Function in Excel (or SUMIF or even IF functions) then this kind of formula is pretty straightforward IMHO.

The “Inherent” Connection

Notice that this function is a DIRECT reference to the “PowerPivot Data” Connection rather than referencing another cell that has a reference to the Connection?  If you were connecting to an external data source you would likely want to make a single cell the one that links to the data source and then you would “borrow” that link for all subsequent references to the Cube.  Otherwise you would have a HUGE performance hit.

But if the Connection is a “Virtual” Connection in PowerPivot inside the spreadsheet itself, then there is no such penalty.  So you can refer to the “PowerPivot Data” inherent connection in every formula. That is much easier to understand and to audit as well.

Adding the Power of Excel to the Mix

As a loooong-time spreadsheet user I have developed a fondness for the “Naming” capability of Excel.  By applying a “Name” to a cell or cells I can then refer to them everywhere in the Workbook without regards to the actual Sheer and Cell reference of the range.  This makes things more auditable and just generally more readable.

In addition, I believe in the use of cell references for Variables in formulae rather than hard-codes words and values.  To an experienced Spreadsheet developer this is simply “Best Practices”, but I still see many instances of “Hard-Coding” in Workbooks that make me cringe.  If the value will (or may) change it must be a cell reference or a named reference.  Of course ALL references could be named either as individual cells or arrays but I tend to use only single cell Variable names as it is likely easier for others to comprehend than using an multi-cell reference as an array in a formula (but there’s no reason nt to do so if you feel so inclined).

So by changing the View to “Formulas” in Excel these are the contents of the key cells and the first column of our little model:

 

Using Cell References and Names

The reference to the “PowerPivot Data” is located in cell B11 on the spreadsheet and is named in a way that would make it easy to reuse and to understand (“strConnection”).   This is the first expression in the CUBEVALUE() function.

The reference to the Field we want is in cell B12 and refers to the Measure we created [Sum of SalesAmount].  This is the second expression in the CUBEVALUE() function.

Referencing the Dimensions

Now that we have defined the Connection to the PowerPivot data and the definition of the Measure that we will be summing, we have to tell the CUBEVALUE() function the values we want to select for across the various Dimensions we want to analyze.

PowerPivot is capable of intuitively determining the Field (or Dimension) that the value you are looking for is from based on the contents and the data types of the fields in the OLAP cube.  This is way cool … but I am not quite ready to use this capability in my spreadsheets.  Instead I simply use a [Fieldname].[Value] syntax to tell the function what I am looking for.

So the criteria in cell B4 are going to be:

"[Country].[Australia]","[Category].[Bikes]","[FiscalYear].[2006]"

But I am using the contents of cells to drive my selections … so the correct syntax is:

"[Country].["&$B$5&"]","[Category].["&$B7&"]","["&$C$5&"].["&C$6&"]"

The result is exactly the same, but is driven by the contents of cells rather than being “hard-coded”.

There is no doubt that there are advanced CUBE functions that will draw the values in each Dimension automatically into the cells based on the contents of the data and also we can drive all of this using Slicers (as I will demonstrate later) but in many, many business scenarios the business is organized in a stable and consistent way. Month over month managers want to view certain values according to an organization that may or may not match their organization or groupings of the data coming in from outside sources.  So in our little example I am most concerned about Bikes, Accessories, Clothing and Components and I want all others to fall into a catch-all category called “Other”, and this total is calculated in cell B7 by taking the total for Australia’s sales for 2005 and subtracting that from the SUM of the three Categories looked at distinctly above:

=B8-SUM(B4:B6)

The ISERROR function is necessary in case there is no data for the combination of dimension values you chose and returns a zero rather than an error.

So what we have here is a simple spreadsheet model that uses the cache of PowerPivot data by drawing on the CUBEVALUE() function to query the OLAP cube inside the PowerPivot cache and to return the totals in the database at the intersect of all the values in the Dimensions my spreadsheet asked for.

This is serious spreadsheeting …..

Adding Slicers to the Mix

So let’s take our same example and add some of the new Slicer functionality of Excel and PowerPivot.

Rather than setting up a separate spreadsheet for every Country the company does business with I have created a Slicer that is pulling a unique list of all the values from the Country Dimension.  This Slicer is then tied to cell B5 and not only shows the name of the Country but also drives the formulas that use the [Country] Dimension of the PowerPivot OLAP cube.

So cell B5 has this formula in it:

=CUBERANKEDMEMBER("PowerPivot Data",Slicer_Country,1)

Which is referencing the value of the Slicer named “Slicer_Country”.

Cell C6 has this formula:

=CUBERANKEDMEMBER("PowerPivot Data",Slicer_CalendarYear,1)

It is of course referencing the slicer that is driving the Calendar year.  Cell D6 is just the contents of C6 minus 1 (which could also be derived using a CUBE Function as well – but this works for me).

In order to get the totals for all countries one need only click on the Funnel in the upper right of the Slicer.

Conclusion

There is no doubt that even in this little example of a spreadsheet there are a lot of moving parts, but if you understand how this model works you will be well on your way to integrating PowerPivot into many, many of even your existing Excel spreadsheets.  Or at least this will definitely change how you design future ones.   I have spent years emulating this functionality using VBA and DAO and ADO and customized functions for which I have sometimes been handsomely paid.  But this capability of PowerPivot moves the bar way forward and will allow you and me to finally start seriously integrating Excel into the world of serious BI analysis.

While there is a lot of talk about DAX and the Filtering capabilities of PowerPivot I think that I will be applying PowerPivot to many of my spreadsheets in the way detailed here.   With an understanding of the Relational nature of your data and with a strong knowledge of the capabilities (and flexibility) of Excel I believe that many of you will be able to bring value to your use of PowerPivot in Excel 2010 in short order using the CUBE functions against the inherent OLAP cube that is a PowerPivot data cache.

Dick Moffat

London, Ontario

June 19, 2010

dick@plogic.ca


TechEd / BI Conference Slides

June 17, 2010

I’ve received requests for the slide deck that I presented with Dave Wickert at Microsoft TechEd / BI Conference last week.

Well, here ya go, the whole slide deck right here:

PowerPivot Best Practices from TechEd

Yeah, that’s right.  One slide.

You see, it was an “Interactive” session, which is basically “we jam people in a room and they pepper us with questions for 90 minutes.”  The ground rules given to us were “1-2 slides to set the tone, and then after that it’s all audience-directed.”

Going in, I had mixed feelings about that format.  I mean, there’s a real danger that people won’t be prepared to ask questions, and that the conversation won’t get going.  In fact, in many other Interactive sessions, the speakers decided to ignore the ground rules altogether and simply present.  But it seemed like a potentially entertaining approach, and hey, Dave and I were so busy going in that this minimal preparation thing sounded pretty damn good.  So we rolled the dice.

Well, it turned out EXTREMELY well.  We started 15 minutes early, ran over the end time, and were answering fantastic, high quality questions non stop.  It was awesome.

The bad news is, I don’t have a good way to share that discussion after the fact.  We basically would need to do it again, and sadly, they did not record the session.

So for now, all I can really do is offer the teaser above.


Use slicer values in a calculation with PowerPivot DAX

June 17, 2010

By Kasper de Jonge, crosspost from PowerPivotblog.nl

A while ago i had a question on my ask a question page, Sasha wanted to do the following:

How to define the date I want to see as “open items”.

E.g. Open Items by 25.05.2010
Item 1: Posting Date: 20.05.2010 Clearingdate: 28.05.2010 Value 100 EUR
Item 2: Posting Date: 22.05.2010 Clearingdate: 27.05.2010 Value 200 EUR
Item 3: Posting Date: 23.05.2010 Clearingdate: 24.05.2010 Value 300 EUR

Result will be 300 EUR (Item 1 + 2) Postingdate 25.05.2010 OR Clearingdate = 00.00.0000).

But how to select the 25.05.2010 for this calculation. I have not a date like 25.05.2010 in my Pivottable?

We have two challenges here:

  1. Create a between like measure using dax
  2. Get values from a slicer to be used inside this calculation, the data from this slicer mustn’t effect the data inside the pivottable

I have created the following simplified scenario to recreate sasha’s question. I started with a dataset:

I want to get the sum of amount where my slicer is between values a and b. The first thing i want to solve is how to get a value from a slicer inside my calculation.

To do this i created a new dataset with the values i wanted to use in my slicer, since i might want to use a value that is not inside my powerpivot data. I loaded this into PowerPivot:

This made sure of two things: i have the data i want inside a slicer and because i didn’t create a relationship between this data and the fact table nothing will happen when i select data from the slicer.

Because the data from the slicer isn’t connected to my main fact table doesn’t mean we cannot get data from it. The slicer will make sure the table i just loaded will be “sliced” to the value we want.

When we create the pivottable we seen the following:

Because we didn’t create a relationship PowerPivot keeps reminding us a relationship might be needed, because this is pretty annoying we can turn this off by clicking on the Detection button:

So we are good to go, to get the value of the slicer we can now do sum(aantal[aantal]) in our measure. A measure that uses the value from the slicer would be:

Slicer measure=if(COUNTROWS(aantal) = 1, sum(aantal[aantal]), BLANK())

This will check if we have only one value selected, and if that is the case return the sum of the column. This results in the single value because the table has one row (it is sliced to one row). Otherwise return BLANK(). This will look like:

We now have the value of the slicer in our pivottable and we can use it to create a measure that will give us the sum of amount where the value of our slicer is between a and b.

What we are going to do use sumx to sum Table1[amount] over a filtered table, this table will filter the values where the value of slicer aantal between column a and b. This will look like:

=IF(COUNTROWS(aantal) = 1 ,
			SUMX(FILTER(Table1,
				Table1[a] <= sum(aantal[aantal])
				&& sum(aantal[aantal]) <= Table1[b]
			)
			,Table1[amount])
	, BLANK())

step by step:

  • Check if we have selected only one value in our slicer, if more return blank
  • Do a sum of Table1[amount] over a filtered table, more on sumx at this blog post of Rob Collie.
  • Use filter to return a dataset of specific values from a table, in this case return all rows where value of column a <= value of the slicer and value of the slicer <= value of column b

This will look like this in our Pivottable, as we can see we only have the rows where our slicer measure is between a and b:

When we remove a and b from the pivottable we see the total sum:

This again shows you the amazing power of DAX, a lot is possible. But a word of caution is at his place, SUMX and FILTER are two of the most CPU consuming functions in PowerPivot, it creates a new dynamic in memory table for every cell in the pivot where FILTER is used. I tried a similar function at a Pivottable based on Contoso and it took me a lot of CPU. Check out this blog post of Rob Collie on the use of filter. SUMX and FILTER are amazing giving you all kinds of possibility’s but you need to think where to use what as you can read in Rob’s blog post.


Welcome Query designers!

June 14, 2010

 
-Guest Post by the one and only Dany Hoter !

We Excel geeks aren’t spoiled with fancy query builders. We have had MSQuery for the last what? 15 years?

Suffice to say it is a tool from the class of Access 1.0.

For those of you not familiar with it’s looks here it is:

msquery1

Not only it looks very old but on the technical side it only supports SQL features which existed somewhere in the late 80’s or early 90’s.

Enters PowerPivot

If you followed the posts on this blog and others about PowerPivot you heard a lot about DAX calculations and creating relationships between tables and about using the tables in PP to create nice looking presentations.

What you haven’t heard so much about are the query designers. Maybe it is because this feature doesn’t look so special for someone coming for the world of SQL Server Reporting Services (SSRS)  and SQL Server Integration Services or just plain Visual Studio development. For us the lowly Excel users the possibility to use a decent query designer for creating complex SQL queries in a drag and drop  fashion is a very significant improvement.

Not only we received a query designer but we actually have 2 , one for SQL Server and one for Analysis Services. (Notice the specific mentioning of SQL Server and not just any relational database, not even Access (Sigh…)

Using the SQL server Query designer

So you want to bring data from some tables in SQL Server. You connect and choose the option to write a query that will specify the data to import and not the option to select tables. This is what you see:

image

Is it the time to remember your (extinct (?)) SQL skills or maybe paste a SQL statement you generated in SQL enterprise management?

This is what I thought too and I actually did paste SQL syntax generated by a foreign tool (Business objects but please do not tell anyone about it).

Only later I noticed the button on the bottom right saying “Design” and gave it a try with very low expectations.

The main protagonist appears:

msquery2

(The red frames are added by the special effects department)

Table Relationships (AKA joins)

After selecting a few tables we get this:

image The relationships were discovered automatically by the designer based on constraints in the database.

If the auto detected relationships are not enough or needed to be changed you can unselect the auto detect option and click Edit fields or on the icon for add relationships. For the example in the case of the Adventure works database there are three date fields in the Fact table related to the DimDate table. The auto detected relationships for the DimDate table look like this:

image

This settings will result in 0 rows retrieved because it requires that all three dates will be the same which is very unlikely.  You can delete 2 out of 3 relations and chose the Date dimension that is best. You can also edit the SQL syntax and join the DimDate a second time. I couldn’t find a way to do that in the graphical designer. Notice that once you edit the graphically designed query, you can’t go back because the wizard will discard your query and you’ll have to start over.

Filters

image

Notice that the option for parameters is not implemented in this release.

A nice touch is the option for creating multiple choices by clicking on the + sign in the edit control. Unfortunately there is no list of values and you have to know the value you want to filter on.

At any point you can run the query and see the results.

Aggregate query

image

You can generate aggregate queries by clicking the Group and Aggregate button and changing selecting some fields for the aggregation. Any other field in the query will automatically be part of the group by.

Entering the designer again

From the PowerPivot menu you can use table properties to enter the designer again.

Using the analysis services designer

Before we see the AS or MDX query designer we should ask ourselves why do we need to bring data from a cube into PowerPivot.

Aren’t we making a multidimensional object into a flat table just to make it again a multi dimensional object ?

At this point I’ll give just two reasons why this exercise can be useful.

  • Bringing data from two or more separate cube on potentially more than one server.
  • Combining data in a cube from data not (yet?) in the cube from a SQL table or from a table in Excel. The last option I found very powerful and have used it in production already(A good topic for another post)

Using the designer

image

Looks familiar ah? Same principle here, first try to your MDX skills and only then after some frustration notice the design button and give it a try. I’ll risk being obnoxious here:

PowerPivot team – first use the graphic designer and only from there allow text editing

I hope the message is clear enough :)

image

From here you are basically in Drag&Drop heaven. You can drag measures, fields , whole hierarchies , a whole dimension etc.

Filtering experience is very friendly indeed:

image

You have lists of values and hierarchy navigation while choosing values.

You can even create new calculated values but I’m afraid that one can get lost with all the different options of creating calculations in the server cube,  as part of the query , in the table once in PowerPivot , in the pivot table :)

One important word of caution about calculations. The calculated members from the cube will be transformed to regular fields with fixed values once brought into the PP environment. Any dynamic nature they have will be lost.

Another word of caution is about security.

Unless you publish the PowerPivot with data open for everyone and force the user to refresh the query using her own credentials, you risk breaching cube security. If you query using the credentials of a user who can see all the cube and give other users the right to open the Excel workbook, they are going to see all data until they attempt to refresh. This is not different from sharing a static report or a regular Excel pivot that does not refresh on open.

Summary

These two designers are a reason enough to start using PowerPivot , don’t you agree?


PowerPivot observations from TechEd/BI Conf, Pt 1

June 11, 2010

 
Kasper and Rob in The Big Easy

 

“Ouch”

-Rob’s feet, knees, brain, and liver

Hi folks.  On my way back from the MS BI Conference (and TechEd) in New Orleans.  Had a great time, almost too much to report.  Here’s a sample:

  1. Met Kasper and Denny for the first time!
  2. Met a bunch of other PowerPivot community members like Vidas Matelis, Marco Russo, Stacia Misner, and Andrew Brust…  a bunch of SQL celebrities, like BrentO, SQLRockstar, and BuckWoody…  and a fascinating individual named Jimmy who Kasper and I hope to introduce to all of you soon.
  3. Presented a session on Best Practices for PowerPivot (with Dave Wickert) where they literally turned away as many people as they let in – we were all seated, room packed, people standing in the back, and doors closed 15 minutes before scheduled start…  so we just went ahead and started 15 minutes early.  Never had an experience like that, ever.  Even Denny Lee was locked out.
  4. Had several private meetings with members of the PowerPivot team, learned a bunch of things (some of which I can share, some I can’t, but everything bodes well for us)
  5. Talked with dozens of attendees, got their thoughts on PowerPivot, answered questions on PowerPivot, and staffed Microsoft’s PowerPivot booth on three different days.

So, from all of that, what did I learn?  Here ya go, as compressed as I can get it:

IT attitudes toward Excel are shifting dramatically

Excel Rebellion Circa 1977

“The more you tighten your grip against Excel, the more data will slip through your fingers.”

6-7 years ago when I attended BI conferences, Excel was widely regarded as the enemy, and there were even sessions titled things like “how to get your users out of Excel.”  The prevailing sentiment was that a responsible BI practitioner had a duty to replace Excel with dedicated BI tools like Cognos or Business Objects.

There has been nothing short of a seismic shift since then.  I’d like to think that has something to do with our efforts in Excel 2007 to legitimize Excel as a BI tool, but honestly, I think it’s just that the realization has sunk in…  no one is ever getting rid of Excel.  Furthermore I think it’s more than just capitulation – at this point most people realize that getting rid of Excel would be a bad idea even if it were possible.

Some favorite quotes from IT/BI pros:

“Fact is I get most of my best analytics ideas from my Excel users’ workbooks”
”We’ve been decriminalizing the use of ad hoc Excel”
”Data just wants to be free, data will find a way”
”Rob I am really impressed with your jumpshot, you could play near the highest levels in Holland were you in shape.”

(OK that last one was from Kasper – I felt obligated to include that since he kinda beat me in four out of five games and I need to save face.)

I very much think it’s time to revisit a few of my favorites posts on this topic, so if you have started reading recently, I encourage you to check out:

Microsoft Unveils New Programming Language XL#
Putting the “Intelligence” in “Business Intelligence,” Part 1
Putting the “Intelligence” in “Business Intelligence,” Pt 2
Putting the “Intelligence” in “Business Intelligence,” Pt 3

PowerPivot is arriving at precisely the right moment.  BI and IT pros are embracing it EXTREMELY eagerly.  Much more than we dared hope back when I worked on the engineering team in Redmond, and more than they realize even now I believe.  They are prepared to cooperate with their Excel users in order to bring about a more efficient data culture.  Which brings me to the next point…

PowerPivot does NOT need viral adoption!

novirus_sticker-p217678350895040539qjcl_400 Are you listening, PowerPivot team?  I’m going to keep standing up on every soapbox and hilltop I can find, screaming as loud as I can, that PowerPivot does NOT require grass roots adoption in order to be broadly adopted at a record pace!  That perceived need for bottom-up adoption lingers within the team from the early days, and it will be a shame, both for Microsoft and for the rest of us, if they continue to think that way.

Why?  For one, they will build the wrong feature set going forward if viral adoption is still a goal.  Originally for example, data cleaning features were intended to be included in v1.  The thinking here was that the Excel users could not remotely rely on IT to help them, or to provide them the right kinds of data, or even the right kinds of access.  That simply is NOT turning out to be true.

And if the team spends a ton of time building features that aren’t needed, it will subtract from the quality of the other features we get, and/or replace other features altogether.

Second, the marketing message is noisy today.  This dual “top-down through IT and bottom up through Excel user empowerment” message is not being well absorbed by the MS field.  By far, the Excel component is easier for the MS field to understand and repeat.  So it gets a lot more air time than the top-down message.  And as a result, an IT team that otherwise would have been enthusiastically receptive to the first message only get exposed to the “Excel gone wild” message and recoil from it.

This truly is a case of subtraction by addition and should be avoided :)

PowerPivot was the Buzz of the BI Conference

Even Wears the Right Color I expected PowerPivot to grab an outsized share of attention at the conference, but multiple people pointed out to me that it basically took over.  People from other companies stopped by at the PowerPivot booth to find out was going on, because everyone coming to their booth was asking things like “how does this integrate with PowerPivot?”

And the Microsoft messaging mirrored that.  PowerPivot dominated the BI keynote on Tuesday – we were all kinda surprised to basically only see PowerPivot demos for 90 minutes (seriously, the other MS BI teams had to be a little irritated by that.  You can watch the keynote here and see what I mean, and maybe see us smiling ear to ear in the first row).  It dominated the overall session count.  It showed up in sessions that weren’t about PowerPivot.

Kasper looked at me at one point and said “I really like the names of our websites.”  To which I replied, “I really like what we’ve been studying and practicing for the last nine months.”  OK for me it’s been several years.  Take THAT Kasper! :)

Kasper’s stuck on a plane for the next 24 hours and can’t fight back.  Muhaha.  OK, final observation for part one…

PowerPivot is evolving into a “Gateway Drug” in SQL11

That’s right, PowerPivot leads to stronger tools.  If you watched the keynote demos above, you saw something that’s pretty exciting once it sets in.  Remember that the Analysis Services product has existed for over ten years as a toolset that people like me weren’t able to use.  You could learn the query language MDX, but when I discovered that even a simple IF() statement required a PhD, I decided to do other things with my time.

PowerPivot, by contrast, is the kind of canvas on which I can paint.  Keep in mind, however, that PowerPivot is really TWO things:  1) It’s a toolset where someone like me can build models, applications, and reports   and 2) It’s the Vertipaq engine, the thing that makes monstrous compression and performance possible.

In theory, I guess, the PowerPivot toolset – the Excel addin, the DAX language, the integration with SharePoint – was possible without the Vertipaq engine.  Probably would have been too slow and unwieldy to gain much traction, but possible

The real question though was what Microsoft was going to do about the traditional Analysis Services product.  The existing AS product did NOT get equipped with Vertipaq in the 2008 R2 release, so in many ways the BI pros have been quite jealous of us :)   It was obvious that Vertipaq was going to find its way into AS proper in a future release, but it was not clear what the resulting product would look like.

OK, here ya go:  It’s gonna look like PowerPivot.  With an even larger data capacity.  And lots of features that we don’t currently have.  With no SharePoint dependency and no requirement that you embed the models in Excel files. 

And ok, to make it look industrial strength, they’re gonna put it in Visual Studio.  But that’s a minor detail to me, more cosmetic than anything.  I’m told it will support DAX, and generally use all the same sorts of Excel-user-friendly UI gestures that we are used to, as if the Excel addin were dropped into the Visual Studio frame.  No requirement that we change to MDX and other concepts in order to “upsize” into the more powerful future version.

Best news of the conference.


Calling a Stored procedure in PowerPivot

June 9, 2010

By Kasper de Jonge, crosspost from PowerPivotblog.nl

While building a new PowerPivot workbook i wanted to use Time Intelligent functions. One of the golden rules of PowerPivot time intelligent functions is to create a seperate related time table. I have created a stored procedure which creates a new table in PowerPivot. The important thing to do is when you want to call a stored procedure withing PowerPivot is set the “SET NOCOUNT OFF” at the beginning of your sproc. This will make sure your stored procedure will return only one dataset.

You can use this code to create the range:

/*
Get a range of dates in the range @fromdate until @todate
*/
create procedure getDateRange
@fromdate datetime, @todate datetime
as
begin
SET NOCOUNT OFF
/*Declare @todate datetime, @fromdate datetime
select @fromdate = '2005-01-01'
select @todate = '2008-12-31'*/

;With DateSequence( [Date] ) as
(
                Select @fromdate as [Date]
                               union all
                Select dateadd(day, 1, [Date])
                               from DateSequence
                               where Date < @todate
)

select
                               YEAR([date]) as Year,
                               Month([date]) as Month,
                               DAY([DATE]) as Day,
                               [date] as Date,
                               CASE Month([date])
                                                WHEN 1 THEN 'Jan'
                                               WHEN 2 THEN 'Feb'
                                               WHEN 3 THEN 'Mar'
                                               WHEN 4 THEN 'Apr'
                                               WHEN 5 THEN 'May'
                                               WHEN 6 THEN 'Jun'
                                               WHEN 7 THEN 'Jul'
                                               WHEN 8 THEN 'Aug'
                                               WHEN 9 THEN 'Sep'
                                               WHEN 10 THEN 'Okt'
                                               WHEN 11 THEN 'Nov'
                                               WHEN 12 THEN 'Dec'
                               END as [MonthShort],
                               DATENAME(MONTH,[Date]) as [MonthName],
                               max(DATEPART(DAYOFYEAR,[Date])) as DayInYear,
                               DATEPART( wk, [date])  as Weeknumber
from DateSequence
group by YEAR([date]),Month([date]),DATENAME(MONTH,[Date]), [date]
option (MaxRecursion 10000)

end
GO

Importing this

will give me:

Great tip got from Dave Wickert :)


Get insight to SharePoint PowerPivot usage without Central admin rights using custom SSRS reports

June 7, 2010

Posted by: Kasper de Jonge

As a salute to the session: “Building Custom Extensions to the PowerPivot Management Dashboard” of Dave Wicker today at MS Bi conference a blog post on customizing  the dashboard.

I just love the PowerPivot management dashboard, I think it’s a great dashboard and shows you all kind of information you need to maintain your PowerPivot environment. Unfortunately the management dashboard is placed inside the Central admin.You need access to the central admin  to be able to visit the PowerPivot management page, this means you also have access to all the other SharePoint admin pages.

I can understand your sharepoint admins aren’t too happy when your BI department is running around in your central admin.

To solve this have I have created a report based on the data inside the PowerPivot Management Dashboard data, that you can use without having the credentials. For more information check Dave Wickerts blog post: Customizing the PowerPivot Management Dashboard (too bad i had finished my report when this article was released )

The report will show you the usage of reports by users, nr of queries and load in the last 7 days and in the last 30 days. This way you can quickly identify which workbooks is being used a lot and you need to take a look at as BI department.

The data is placed inside a PowerPivot file which is stored at Central Administration site (Go to Site Actions -> View All Site Content -> under Document Libraries click PowerPivot Management folder). The file will be placed inside the directory with the name of a guid which you see there. Inside this folder you find “PowerPivot Management Data.xlsx” which is the PowerPivot file which contains the PowerPivot usage data.

We are going to create a user in the AD and give this user access to the central admin and use this to run our report with:

One thing to keep in mind, you need to use a reporting services that is integrated in the same SharePoint, i was unable to connect to the data using a separate reporting services. I guess this has something to do with claims / kerberos.

Ok on to the report. I created a report in BIDS. First I created a shared datasource connection using a Microsoft SQL Server Analysis Services connection type pointing to: “data source=http://sp2010rc:5566/PowerPivot Management/466f3002-543f-4c06-93bf-4d922641a73f/PowerPivot Management Data.xlsx”.  So we can reuse this connection for multiple reports we can build in the future. Change this link to the location of your PowerPivot Management Data.xlsx file.

Next i created the report, in this report i wanted to show the usage from the last 7 and 30 days of my PowerPivot Files. To get the data I altered a SSRS generated dataset in MDX to always return the last 7 days.

To determine the measure I wanted i used the “Workbook Activity.xlsx” provided in the management dashboard. The three measure used there to determine workbook activity are:

  • UserDCnt : Number of distinct users making connections to workbooks
  • QuerySum : Total querys fired in requests at workbooks
  • SizeMBMax : Total size of the load put against the server at each request, as i understand this is the size of the powerpivot data.
  • For dimension members i selected:

    • [Date].[Month] : Month a request was done in
    • [Date].[Year] : Year a request was done in
    • [Date].[Date] : The date a request was done in
    • [Documents].[AuthorLoginName] : Author of the document requested
    • [Documents].[FileUrlRelativeToSite] : Filename of the document requested
    • [Users].[FullName] : user that requested the filename

    This results in the following MDX statement:

    SELECT NON EMPTY { [Measures].[UserDCnt], [Measures].[QuerySum], [Measures].[SizeMBMax] } ON COLUMNS,
    	  NON EMPTY {
    				(	[Date].[Month].[Month].ALLMEMBERS *
    					[Date].[Year].[Year].ALLMEMBERS *
    					{ ParallelPeriod([Date].[Date].[Date], 7, StrToMember("[Date].[Date].[" + Format(now(), "yyyy-MM-dd") + "]"))
    						: StrToMember("[Date].[Date].[" + Format(now(), "yyyy-MM-dd") + "]") }*
    					[Documents].[AuthorLoginName].[AuthorLoginName].ALLMEMBERS *
    					[Documents].[FileUrlRelativeToSite].[FileUrlRelativeToSite].ALLMEMBERS *
    					[Users].[FullName].[FullName].ALLMEMBERS
    				)
    				}
    	DIMENSION PROPERTIES MEMBER_CAPTION, MEMBER_UNIQUE_NAME ON ROWS FROM [Sandbox]

    Take notice on the Date part:

    { ParallelPeriod([Date].[Date].[Date], 7, StrToMember(“[Date].[Date].[" + Format(now(), "yyyy-MM-dd") + "]“))
    : StrToMember(“[Date].[Date].[" + Format(now(), "yyyy-MM-dd") + "]“) }
    What happens here is that i only want to select the measures that in  the date range from 7 days ago to today, we can do this because the date dimension contains all dates and thus can do a strtomember. You can change this number to whatever you want, or maybe even make it a parameter in ssrs.

    Putting this in a report i show the powerpivot file together with its author, total nr of users, nr of queries and the size of the workbook, then i used a sparkline to show the nr of users for each day, and in a databar i show the numer of querys * the size of the workbook to show the load it has on the server.

    This results in the following report:

    Too bad that I didn’t have more data in my management dashboard so the trend doesn’t really come out as it should, but it should give you an idea what you can do. When we actually will have more data we can fine tune this report.

    I have published this report to SharePoint and made sure the report is executed as the user we just created and gave access to the central admin:

    The report will now have access to the powerpivot management data while the report user doesn’t.

    Feel free to download the the SSRS 2008 R2 project here. Tell me what you like or dislike from it or what you changed.


    PowerPivot time intelligent functions: why use ALL() and how to work around it

    June 3, 2010

    Posted by: Kasper de Jonge

    Last week I got a most excellent question from Sasha at my question page. Using my timeintelligent function screencast Sasha created a workbook using  YTD  where he used one table with facts and dates in it. The problem he had was when created his time intelligent function he wasn’t able to use data from other columns than the date columns. In this blog post i´ll try to describe how the time intelligent functions work, what pitfalls are and how to solve them.

    To be able to use a time intelligent function in DAX you use the Calculate function to group measures by a filter. With a time intelligent function you want to filter your values over a period of time (like YTD or previous month).  Most of the time you want the use these functions inside a pivottable where you use dates on the x or y-axis, the values inside the pivottable would show values per the current period context. This would logically result in the following DAX formula:

    Dates YTD = CALCULATE(sum(Tablix1[nroforders]),DATESYTD(Tablix1[Date]))

    This writes out to: you want the sum of Tablix1[nroforders] from the first Tablix1[Date] value of the year to the Tablix1[Date] belonging to the Tablix1[nroforders] in pivottable context. The YTD of the Tablix1[nroforders] from march 2009 would mean we need to take the sum of Tablix1[nroforders] from all rows from the start of year to march 2009, in the image below you can see a sample of the values that will be summed:

    But when we add the formula to the measures and use it in a pivottable we see something strange:

    As you can see the nroforders and the YTD formula result in the same values … this is not what we expected. The reason is because the time intelligent function requires an additional parameter, in a blog post at PowerPivotpro.com the PowerPivot product team gave the following answer to my question to why the result is not as expected:

    When using Time intelligence functions like LASTDATE or DATESMTD, it is necessary to add another argument which is ALL (TimeTable). Without this only the selected dates are considered, and you can’t find the last month unless you are in the last month.

    As you can see in the screenshot this is indeed what happens, the function only uses only the current date context in the sum.

    Ok so we need to use ALL to get the results we want, this will result in the following syntax:

    DatesYTD w All =CALCULATE(sum(Tablix1[nroforders]),DATESYTD(Tablix1[Date]),all(Tablix1))

    This is indeed the result we expected, but having to use the ALL() function has a huge downside.  The ALL() function according to BOL:

    Returns all the rows in a table, or all the values in a column, ignoring any filters that might have been applied. This function is useful for clearing filters and creating calculations on all the rows in a table.

    This means when you use a filter or slicer this is being ignored by the time intelligent function that uses ALL(). As you can see below the results of the YTD are the same as unsliced:

    There are two methods we can use to work around this problem:

    1. When you know what slicers and filters you want to work with you can use an ALL() alternative: ALLEXCEPT(). With ALLEXCEPT you can pass through “a list of columns for which context filters must be preserved”.
      In our case we would like to be able to slice on country. This would look like:
      DatesYTD w AllExcept =CALCULATE(sum(Tablix1[nroforders]),DATESYTD(Tablix1[Date]),allexcept(Tablix1,Tablix1[country]))
      I would say use this option if you already know what you want to slice on and don’t have much time to solve it properly.
    2. The proper and most flexible is option number 2. To be able to slice / filter on all the columns you can think of you should create a separate time table. This isn’t very user friendly and your end users will have a hard time grasping this. A few options i can think of to create this time table:
      1. Import the fact table with distinct on date columns (i hope your table isn’t too big ..)
      2. use excel to copy the date rows, remove duplicates and create a linked table (new dates won’t be added)
      3. Supply your users with a default time table in SharePoint/SSRS, they can import this by using the data feed option, use your DWH datetime table as source.

    you should create a relationship on the datetime column between the fact table and the the imported time table (make sure your datetime columns have identical granularity, like year, month, day, otherwise the join wont find results). With this relationship in place you now are able to use ALL over the datetime table. When you use ALL() over the separate time table it no longer ignores filters / slicers over your  fact table.
    The function would look like:
    DatesYTD w All TT =CALCULATE(sum(Tablix1[nroforders]),DATESYTD(DateTable[Date]),all(DateTable))
    You now are able to filter or slice all the columns from the fact table you want. In the RTM version of PowerPivot the ALL() in the DAX function is no longer required, the separate time table unfortunately still is.

    The final workbook now looks like:

    My conclusion is that Time intelligent functions still are a very powerful feature but i am really disappointed on its user-friendliness, while these workarounds are easy for IT/BI personnel to grasp and implement, end users will have a harder time implementing this. Maybe MS can implement a “add date time table” button in PowerPivot to automatically create a date time table to our PowerPivot tables to make it a little easier for end users to implement time intelligent functions.