Customized SharePoint Report Portal

September 23, 2010

 
Believe It

 


“I can’t believe it’s not SharePoint, er, Excel, er, wait… it IS Excel and SharePoint…  so confused…”

-Me

 

 

 
A couple weeks back I posted screenshots of our custom SharePoint master pages.  The functionality was great, but the aesthetics were still a little bland.

We’ve had some plastic surgery done since then.  Here’s the new home page:

Custom PowerPivot SharePoint Report Portal
Redesigned Customer Report Portal

We kept all of the functionality from before (no Ribbon or Site Actions menu unless you are an admin, etc.) but there’s a new component now as well:  the treeview.  There are five reports under each node of that tree, so our customers can quickly find any of the 100+ reports they are looking for.

(No, we don’t use the built-in Report Gallery – it really doesn’t handle more than 2-3 reports in practice).

And here is one of the redesigned report pages:

Custom PowerPivot SharePoint Report Page 
Redesigned Customer Report Page

As I said before, if you need advice or help doing something similar, drop us a note, we’ll see if we can work something out so that you don’t have to start from scratch:  info@pivotstream.com


Contest Update

September 22, 2010

A few things I forgot to clarify yesterday:

  1. Cut a hole in a box

Wait, that’s a different list.  Here is the real list:

  1. Contest will remain open until Sunday October 3rd, 11:59 PM US Pacific time.
  2. Entries should be emailed to me:  rob@pivotstream.com
  3. Have fun!

Contest: Many Charts, Shared Slicers, Fast Queries

September 20, 2010

 
Excel Pros Searching for a Worthy Challenge

“Shooting space garbage is no test of an Excel Warrior’s mettle!

I think it is appropriate to start this post with the following disclaimer: 

You do NOT have to be able to win this contest in order to be a monster at PowerPivot.

Think of this as brain candy.  Something that opens your eyes to some of the high-end power available when you merge PowerPivot calcs with native Excel calcs.  A 300-level technique.

***CONTEST UPDATE***  Check here for 3 small rules clarifications.

The Goal:  One Slicer Sheet Impacting Many Report Sheets

This is a pretty common desire.  If the report consumer wants to see ALL of the data broken out by the same set of filters, why make them repeat those filter selections on every single sheet of a workbook?

Isn’t it better, in those cases, to give them a single set of “master” slicers on a single sheet like this?

Slicer Home Page 
Master Slicers Sheet
(Sensitive Data Redacted as Always)

There are 14 slicers on that sheet.  The user can select the department, class, etc. that they care about up front.

Once they have made their selections, in this particular workbook they then have dashboard sheets that look like this:

Dashboard Sheet
Multi-Chart Dashboard Sheet Fed from PowerPivot 

We have several sheets like that in this one workbook, with 20-30 charts on each.  All charts react to the slicers on the Master Slicers sheet.

But you’ll also notice that 4 slicers appear on this sheet, too.  Those are duplicates of 4 of the slicers on the master sheet.  They are the same exact fields as the corresponding 4 on the master sheet, so they are always in sync – set them on one sheet and all other sheets reflect the same selections.

That is merely a convenience – we figured that most slicer selections would be made once, but that users would appreciate being able to bounce around the calendar a bit with the time slicers, without having to switch sheets.

There are also about 20 sheets in the workbook that look like this:

Full Chart Sheet
1 of 20 Single-Chart Sheets Fed from PowerPivot

Again, with the four time slicers repeated for convenience.

The Snag:  Slicer Cross-Filtering Performance

80 unique charts and 14 unique slicers.  What could go wrong?

Well, as I documented in the post on slicer cross-filtering, this is roughly the equivalent of having 80 * 14 = 1,120 pivots all updating at once in response to a single click.  (Maybe it’s not quite that bad, but it DOES get VERY complicated).

And that is very slow, even if the performance of any single pivot is super-fast.

But our report that we built at Pivotstream is fast.  It does NOT bog down on slicer cross filtering performance.  So, how did we do it?

The Contest:  Find Efficient Techniques!

After we built this workbook at Pivotstream I realized that our technique could be refined quite a bit, made simpler.  And while we are already doing that, I realized:

THIS IS AN EXCELLENT CONTEST FOR EXCEL PROS!

I still have two MSDN Subscriptions to hand out.  These basically are unlimited, free MS software for your own personal use, so they are a pretty damn good prize.

The top two submissions will win those.  Runners-up will win PowerPivot architecture posters from Denny Lee, assuming he has not run out :)

Rules Schmules!
 

  1. Use lots of charts, and have them all respond to a shared set of slicers
  2. Entries must include your PowerPivot workbook, plus an explanation of what you did
  3. Use any data source you want, but please use non-sensitive data and enough rows that we can spot performance differences
  4. Use at least 6 slicers, with cross-filtering enabled
  5. You do NOT have to use PivotCharts, but you can
  6. Macros can be used to design the report if you find that helpful, but should NOT be part of the update process when I click a slicer.  (Authoring time is ok.  Run time is not.)
  7. Entries will be judged on, in roughly descending order:
    1. Performance – I will compare query speed with slicer cross-filtering turned on vs. turned off.  Smaller differences are better for this contest.
    2. Polish – as I slice the report, do the charts all still look nice?  Ex:  Blank space in charts = not good.
    3. Ease – how hard is it to execute your technique?  We will turn winning entries into blog posts, and this will become a new reference technique here at PowerPivotPro.
    4. Originality – this never hurts but is not crucial.  Fun is good.  Unexpected benefits and features are even better.

Have fun :)


Interview with JamieMac of Predixion Software

September 15, 2010

 
headshot

Me:  “Can you PLEASE stop answering the question before I even ask it?  It’s kinda freaking me out.”

Jamie:  “I knew you were going to say that.”

It is a good time to be an Excel pro.  The world is waking up to the fact that Excel pros pretty much run things.  Some of the most visionary work being done in the software industry is targeting Excel pros as a result – PowerPivot is of course the #1 example.

True story:  near the end of the PowerPivot project, a faction within the team splintered off to form a new company.  Their goal was to build a brand-new class of tool for Excel users – one that doesn’t just crunch numbers, but that can predict them. 

That new company is Predixion, and Jamie MacLennan is their CTO (but I like to think of him as “Leader of the PowerPivot Rogue Faction.”)  When PowerPivot was getting underway, Jamie was the manager of the entire programming team.

Brief Aside:  A Peek Inside Jamie’s Mind

My favorite Jamie story:  every year, Jamie would get up in front of the entire Analysis Services team and deliver an hour-long presentation on the employee annual review system at Microsoft.  He would explain in great detail EXACTLY how everything worked behind the scenes.

In short, he basically told all of the secrets.  Not official secrets, because the review model was never officially secret.  But in practice, only the managers knew how it worked.  And by pulling back the curtain on all of that, he helped all of the individual employees greatly, while making the jobs of every manager MUCH harder.

No one at Microsoft EVER did that, but Jamie did it every year.  And no one could stop him.  No one could even openly acknowledge that they wanted him to stop.  Since I was a former manager myself, I knew the kind of empowering effect his presentation was having.

On to the interview!

Kasper has already posted on some of the things that Predixion can do.  Continuing on that vibe, I sat down “virtually” with Jamie to ask him some questions that I thought you might find relevant.

ROB:  OK – Predixion Software.  If I’m the kind of person who uses PowerPivot (which I am), why should I be aware of what you guys are doing?

JAMIE:  I realize I have to be careful not to answer this as why I should be excited about the PowerPivot integration, since as we discussed at length, we come from significantly different backgrounds in our way of looking at data.

So from the PowerPivot guy point of view, there are some significant things that we’re doing and planning on doing with PowerPivot that are really cool. First of all, we’re a predictive analytics vendor that really cares and really believes in the PowerPivot vision – over half of our development team, myself included, helped design and develop the product anyway – so we design and develop a predictive analytics solution that makes sense in a PowerPivot world.

Predixion Insight consumes data from PowerPivot for the purpose of predictive analytics. This means that all of the data consolidation and business modeling power that PowerPivot provides to the Excel Power User is now available to feed an equally powerful predictive analytics engine. Furthermore, PowerPivot users can use Predixion Insight to write back the predictive insights they gain into PowerPivot. This means that for the first time you can perform your BI/PowerPivot magic, sprinkle in some predictive fairy dust to enhance your data, and seamlessly continue to be a true Excel Superman by directly integrating those predictive results back into your PowerPivot application. This is what I show in our extended demo when I add predicted results to a PowerPivot table, use a PivotChart report to analyze those results, and then use Insight Now to determine the best analytical attribute, and even more apply Insight Analytics to reformat that attribute to make it useful in reporting scenarios. This is the true back-and-forth mashup of predictive and traditional analytics that we’ve been dreaming of for ages.

 

image

Some of the Predictive Tools in the Predixion Addin

On top of all that, Predixion’s roadmap includes deeper and more substantial integration with PowerPivot than we have even today. There are a lot of exciting things on the horizon for Predixion, and they all include PowerPivot.

ROB:  In a sense, what Excel and BI pros do every day, is “mine data.”  So how should we think of actual Data Mining tools as relates to our existing toolset – calculations, aggregations, pivots, queries, etc.?  And is there a difference between Data Mining and Predictive Analytics?

JAMIE:  This is one of those problems where the terminology has been consistently misused and morphed over time so that the label “Data Mining” doesn’t mean much of anything anymore – similar to the labels “Democrat” and “Republican”. The term “Data Mining” originally meant what we now call “Predictive Analytics”, but now has been co-opted to include anything involving looking at data including ad-hoc query, to search, to OLAP, even triggered eventing, or Amazon’s Mechanical Turk. Interestingly enough, most of the public opposition to “data mining” focuses around these interpretations around the term, and not the original intended meanings.

In hindsight, I think the term “Data Mining” really is more applicable to “what Excel and BI pros do every day” because you are semantically digging through data to find meaningful or useful tidbits. Predictive Analytics is a much better term to describe what we do because it eliminates the connotation of manual labor to accomplish information discovery. I honestly believe what we do goes beyond “predictive,” so I like to call it “predictive and behavioral analytics.” Some disagree since “behavioral” has a human-psych connotation, whereas I apply it to the behavior of the data – so maybe “predictive and descriptive analytics” is a better term. In any case it’s too long and “predictive analytics” seems to be the terminology that wins.

image 
More Goodies 

ROB:  I *love* that all I have to install is an addin.  I just need is my username/pwd, and the Predixion addin uses YOUR servers to do all the heavy lifting.  When I run a model using the addin, how much server horsepower am I tapping into?

JAMIE:  How much server horsepower do you want?

The beta cloud that you ran against had an array of about 10-12 4-core, 4 GB machines that we can grow or shrink dynamically based on usage patterns. The initial cloud at launch is around the same size. We’ve found that that machine configuration gives us the best bang for the buck for these types of tasks.

ROB:  Speaking of that, my 10 Million row data set uploaded fast, in about a minute.  Are you guys taking advantage of PowerPivot compression to make upload faster, too?

JAMIE:  No, even though our architect co-invented and implemented the PowerPivot compression mechanism, we aren’t using that method for our upload/download scheme. We are using different compression, as well as 128-bit encryption, of course, and it doesn’t hurt that you have a big fat internet pipe on your Pivotstream data center :)

ROB:  In my experience, the type of data set is important – some data sets don’t yield any new insights, whereas others reveal some pretty astounding stuff.  Can you describe the sorts of data sets we should focus on with the Predixion tools?

JAMIE:  That is an interesting question – the typical goal of the report writer or OLAP jockey is to reduce the large datasets into key numbers through massive aggregation, thus producing a view onto the data providing high level information.

However what happens with this approach is all of the marvelous gory details of the data are lost. Predictive Analytics thrives on those bits of details, so you need to come up with data sets that are a little different than what you would use for slicing and dicing.

Typically the problem is with the granularity of the data. For example, if you were looking at records around medical prescription data that was summarized by zip code, the best you could determine is patterns at the zip code level. You’re not likely to find actionable information an individual doctor could apply for treatment. Whereas for aggregate type of analysis, details are the devil, for predictive analytics, the devil’s in the details. I don’t want to make a blanket statement that you always want the lowest level of granularity, but you have to make sure you have the appropriate granularity to solve your problem.

One example that comes to mind is a cancer study where a variety of tumor measurements were taken for both benign and malignant tumors. For each aspect, such as diameter, several measurements were taken. Since the final analysis was at the tumor level, the individual measurements of the tumor weren’t interesting – there’s no particular reason to believe that the first diameter measurement has any different importance than the second – so aggregate values of the measurement were used. In this particular case, the researcher took the average of all measurements, the deviation of the measurements, that is, a measurement of the variety of results, and the most extreme measurement.

So, in summary, I would say that detailed data is much more important for predictive analytics than for traditional analytics, but the answer really lies in understanding what you’re analyzing and the answers you’re trying to get. Predictive analytics is a slightly different discipline, but in reality, if you understand your data, it’s really just a small mental hop away from where you’ve been working all the time.

ROB:  What’s it like at Predixion Software on an average day?  Does the vibe feel like Bungie Studios, part two?

JAMIE:  Yes! Definitely! We even have a life-size replica of a rules engine in our lobby – no, wait – that’s the “No solicitors” sign…. Actually a typical Predixion Software day is a lot of fun. I’ve worked with and have known most of the development team for many years so verbal and mental bandwidth is very high – we can spin on ideas extremely quickly without having to get bogged down in procedures because we all know where each other is coming from and we deeply understand each other’s capabilities. We moved out of a shared office space into a private space a little over a month ago so ad-hoc communication is simply shouting between offices – it’s actually very reminiscent of the early Analysis Services days when discussions were loud and aggressive and decisions were made quickly and acted upon immediately. No hurt feelings – just common goals and shared responsibilities driving the company.

That, and punctuating the day with coffee, pinball, and nearby parks and restaurants makes Predixion the best place I’ve worked so far.

[PDT[4].png]

The Official Programming Pose of Predixion Software

ROB:  I’ve heard you have some pretty interesting early adopters.  What can you tell me about who they are and how they are using Predixion?

JAMIE:  Well, I can’t say who they are, but there has been wide industry representation in our early adopters. Along with over 50 consulting firms, we’ve had telco, retail, law enforcement, manufacturing, finance, and healthcare customers all participate in our beta.

ROB:  What is the most useful, surprising, etc. thing you’ve ever found with Predixion?

JAMIE:  Once I performed a segmentation analysis on a large corporation’s internal employee satisfaction data and a name popped up in the results, and not in a happy way. Usually these types of analysis are particularly generalized and individual manager names would be random to the overall trends and anonymous, so it was pretty surprising to see– there had to be a lot of employees indicating this individual for their name to show up like this. To validate these results I performed some classification analysis and this name showed up again as a top-level indicator for employee satisfaction. (As a side note, this kind of data analysis is what I would refer to more as “behavioral” rather than “predictive” since I wasn’t trying to predict anything, rather just seeing the pictures that can be painted through data analysis).

It turned out through examining surrounding circumstances that there were compensation issues that were impacting employee satisfaction in the region under the manager that weren’t represented in the data – when these factors were added to the data, the manager’s name fell out and the true satisfaction issues could be addressed. In the end by using predictive (and behavioral) analytics the corporation was able to find and address a key driver in employee dissatisfaction.

JAMIE:  That’s all the questions you have, isn’t it Rob?

ROB:  Showoff.


AVERAGEX – The 5-point palm, exploding fxn technique revisited

September 10, 2010

The 5-Point palm technique is a beauty to watch, but o so lethal!

By Kasper de Jonge, original post at PowerPivotblog.nl

I recently had a problem that needed The 5-point palm, exploding fxn technique as described by Rob in his blog post.  I have used SUMX and COUNTX with success a few times before but this time I had a hard time getting my head around the AVERAGEX solution to my current problem.

While talking with Rob about it I realized I was thinking way to difficult. In this blog post I will try to make more sense of the X functions and give you a little help in building you own X function.

First let’s set a scenario. I want to compare the salesamount of each year to the average salesamount over the years. Sound easy huh ..

We start by creating a pivottable based on Contoso with CalendarYear on Y and the sum of salesamount as measure

Next we create a measure to get the average using =AVERAGE(FactSales[SalesAmount])

hmm what happens here?  The description of the function Average is: “Returns the average (arithmetic mean) of all the numbers in a column.”  So what is the function will do is use the lowest grain of data and determine the average of these values, in our case this means he uses all the sales amount values from the FactSales table that are in the current context (year) to calculate the average.

So how do we calculate the average over each year? In Excel this is easy:

And in PowerPivot this is essentially the same, we can use the AVERAGEX function to calculate average over a specific table. The description from MSDN is:

Calculates the average (arithmetic mean) of a set of expressions evaluated over a table.

This means we can change the scope of the average, the normal average automatically will take the lowest grain, but now we can supply our own table it has t o calculate the average. We want the function to do the same as the Excel calculation above. Our function should do average over:

2007 € 4.561.940.955 2008 € 4.111.233.535 2009 € 3.740.483.119

And this is exactly how the AVERAGEX should be used:

=AVERAGEX(all(DimDate[CalendarYear]), FactSales[Sum of SalesAmount])

The all(DimDate[CalendarYear]) would return a table of all years available (it even includes years that don’t have sales but this won’t alter the calculation). For each year in this table it returns the sum of salesamount, we need to use a measure here because it automatically calculates the sum no matter what context were are in. Then it uses this values of these sums to calculate an Average over.

The parameters are just like you say it, we want to calculate the average for each CalendarYear of Sum of SalesAmount.

This measure results in the following values:

The key here is the use of the first parameter supplied to the AverageX function, this is the table it will iterate over. Some other samples:
I want to calculate the Average sales amount of the “Contoso Bangkok No.1 Store”, we create a table by using the FILTER function where filter all storenames by one single store.

=AVERAGEX(FILTER(ALL(DimStore[StoreName]),
	DimStore[StoreName] = "Contoso Bangkok No.1 Store"),
	FactSales[Sum of SalesAmount])

I want to calculate the Average sales amount of the “Contoso Bangkok No.1 Store” and “Contoso Seoul Store”:

=AVERAGEX(FILTER(ALL(DimStore[StoreName]),
	DimStore[StoreName] = "Contoso Bangkok No.1 Store"
	|| DimStore[StoreName] = "Contoso Seoul Store"),
	FactSales[Sum of SalesAmount])

I want to calculate the average salesamount of all years leading up to the current year. First it checks if the current year has values, 2011 doesn’t have values but is available in the dimdate table. As table we want to return all years from the dimdate table before the current year. The current year is selected by values(DimDate[CalendarYear]) because we have set CalendarYear on the y-ax.  We need to check if values returns more than one result because when the subtotal is calculated values will return all years.

=IF(FactSales[Sum of SalesAmount] > 0
	,AVERAGEX(
		FILTER(ALL(DimDate[CalendarYear]),
					DimDate[CalendarYear]
					<= IF(countrows(VALUES(DimDate[CalendarYear])) > 1
						,Blank()
						,VALUES(DimDate[CalendarYear]))
				)
		, FactSales[Sum of SalesAmount])
	, Blank())

The AverageX, CountX and SumX functions are very powerful but difficult question, try to visualize what you want to do or use Excel to get the result you want and work from there.


1,300 Measures – Nightmare or Utopia?

September 6, 2010

  
multi-straw firehose

Which problem do you prefer, when thirsty -
Too Much Bottleneck (left) or Too Little Structure (right)?

Answer:  Neither is preferred of course!

I’m part of an email distribution list that involves a bunch of BI pros, as well as a number of the PowerPivot engineers back in Redmond.  Last week on that list, I asked a question that I knew would draw some fire:

   “How many measures can I safely put on one table in PowerPivot?”

There were 2 flavors of response:  technical on one hand, and “um, is that really a good idea?” on the other.

Technical answer:  no practical limit to # of measures

The official technical answer was good news:  the number of measures defined on a table in PowerPivot should not have any real impact on performance, unless you are using all of them in a single query of course.  Measures that aren’t part of the current report won’t drag you down, in other words.

(I am told this is in contrast to the traditional Analysis Services product, which scanned all measures at once.  That prior approach is a good thing when you are using a large percentage of your measures in a single query, but a bad thing when you are using only a small percentage.)

So I immediately dug in and built a model that has nearly 1,300 measures on a single table :)

Yes, they are real measures.

Yes, the addin seems stable with that many measures in the field list, and performance does not seem to have suffered, at least not in my initial testing.

And yes, it was tedious.  More on this in a later post.

Modeling Hygiene Answer:  Why the hell would you do that Rob?

The natural reaction from the BI/DB pros was to suspect I was doing something wrong.

Now, I know myself pretty well, and the prospect of me Doing Something Wrong can best be described as Entirely Plausible. 

So I asked a few followup questions.  And it turns out, no, there isn’t any pure philosophical flaw in what I am doing, but they were still skeptical.

Diet, Caffeine Free, Cherry, Coke Classic, Zero…  Please

diet_coke_plus_nextnature

Here’s a joke that I play with my colleagues when they are leaving the room and it NEVER gets old:

“Hey, would you mind getting me a can of Diet Caffeine Free Cherry Coke Classic Zero?”

This joke is a roundabout explanation for how we end up with 1,300 measures.

Thing is, we only have 5-6 core numerical data columns in this model.  But all the permutations are what get us.  For example, we have Sales of course.  But we also have Sales in Prior Period, and Sales from Same Period Year Ago.  Don’t forget Sales Growth.  Ooh, and Sales Growth Percentage.  And, you guessed it, Sales Growth Percentage Year Ago.  You get the idea :)

coke-zero-vanilla Traditional Analysis Services has a feature named Calculated Members that helps address this problem.  But until that feature comes to PowerPivot, well, Coke is just gonna have to make all 2^11 flavors for me.

Bottleneck vs. Discipline

One member of the list in particular took an interest in what I was doing.  His name is Thomas, and he knows 100 times more about traditional Analysis Services than I do.

His chief concern was that I was using the wrong table structure in my model, like perhaps one big single table.  As it happens, I am not.  I am using something known as a Star Schema, which is something I have shown a lot of on the blog but never really described with any clarity.

I think Thomas has an excellent point.  It is easy to start out with the wrong table structures, especially if you are coming from an Excel background.  So I think it’s time someone who understands the Excel mindset AND the need for proper table structure, um, explain that.

Dick has volunteered to do a post (or series of posts) on that topic, so I will leave that to him.

My point back to Thomas was essentially this:  the need for smart measures today is almost certainly 10x what is provided, or more.  When my consulting budget ran out on the original Football Project, and my Analysis Services pro was no longer sitting in my office all day…  well, I was not getting any more new measures.

Doesn’t mean I didn’t want new measures, or need them.  I just wasn’t going to get them.  The person with the questions (me) lacked the tools to build them himself.  That is changing now.

Conclusion

I think that hundreds of measures in a model, or at least the potential to have hundreds, is a VERY good thing.  It is a sign that the bottleneck is going away.

Now, there are consequences of course.  A 1,300 measure field list isn’t a lot of fun to navigate (although the Search box in the PowerPivot addin is a savior!), and there will of course be the danger of “I just recreated measure X without realizing I already had a measure that did the same thing.”

But those are TOOLS issues.  Microsoft can build better field lists for us, with better ways to organize and view things, etc.  They will have to.

Don’t limit me arbitrarily.  Sooner or later, all 1,300 measures in this new model WILL be used.  Not all in the same report of course.  But they will. 

We could have built these measures on an as-needed basis rather than all at once, but guess what?  They would be less consistent, more prone to error, etc. – building them all at once was actually MORE disciplined :)

So, BI pros, my advice to you is to embrace the fact that the number of measures is about to grow, a LOT, in the average model.  And that is a Good Thing.

But as Thomas points out, correctly, in his latest blog post, the Excel folks can learn a TON from the BI community in terms of discipline and technique.  So let’s keep the conversation going :)