Excel team blog – great PowerPivot content

October 31, 2009

One of my longtime friends, Joe Chirilov, has a great series of posts running on the official Excel Team blog.  Think of it as the 5-minute walkthru of everything, whereas my Football project is going to take its sweet time on every last detail.

So if you’re getting impatient with me and want to see everything now, definitely check it out:

Introducing PowerPivot

Using PowerPivot With Excel 2010

PowerPivot:  The Business User’s Perspective

Note that the introduction post is written by another PowerPivot team member, Ashvini Sharma.

Joe and Ash truly are fantastic people, and if you ever get a chance to meet either of them in person, don’t pass it up.  Knowing each of them has enriched my life.  And that’s not the sort of thing I say about just anyone.


PowerPivot Data Cleaning: Three Options (with a football angle of course)

October 30, 2009

In the previous PowerPivot football project post, recall that I’d encountered some dirty data.  Even in the world of sports stats, data quality is a huge problem:

Where's 007?

Some players have names, others are just numbers.  And we can’t edit/delete these records in-place, for the PowerPivot compression reason I outlined last time.  I mentioned three options to address this problem, so what are they?

Option #1a – Pre-filter the data during import (graphically)

In my previous post, I used the point-and-click method to grab my database tables out of SQL Server.  I performed no filtering whatsoever, even though there IS a really simple, Excel-style UI for filtering the data included in the import wizard:

PowerPivot Excel-Style Query

In many cases, that will work for you – you can deselect certain values and set simple rules like “equals” and “does not equal.”

In my case, though, I don’t think that works.  I just have too many bad values to uncheck them individually.  And there isn’t a rule for “is text” or “is not numeric.”  (Something we should add next time I’d think).

So, moving on to:

Option #1b:  Pre-filter the data during import (by writing SQL)

That same import wizard also contains a tool that lets you write (or paste) you own SQL statements:

SQL Query

Which I’m pretty sure I could figure out how to do if I researched “filtering non-numeric values using SQL” on the web.

But for now, let’s stick with the “I’m just an Excel user” vibe. So…

Option #2:  IT Gives me Clean Data to Begin With

Does that sound lazy of me?  :)  Not really.  Did you see the post earlier today on Enterprise BI Mashups?  One of the quotes in there is about how IT focuses on data sources and data quality, freeing business users to build better reports and analyses.  Specialization.  Division of labor.

And while PowerPivot does not require it, it’s still very much a best practice.

Option #3:  Clean the Data in Excel with a Linked Table

Let’s go with the most Excel-style solution.  PowerPivot has Clipboard support!

So…  I can select the entire DimPlayers table (ctrl-A), copy it (ctrl-C), switch over to the Excel window of the same workbook, and paste it.  Then it’s just a matter of sort/filter, and delete to get a table that only contains names:

PowerPivot Cleaned Table in Excel

OK, so how to get it back into PowerPivot?  Do you see the “Create Linked Table” button in the Excel ribbon above?  That’s a feature of our addin that copies the selected Table from Excel into PowerPivot, and then links the PowerPivot table back to the Excel table.

(Note that in order to do this, you have to format the Excel data as a real Table.  I use Ctrl-L)

Here’s the cleaned table back in PowerPivot:

PowerPivot Linked Table

See the little link icon on the sheet tab?  That means I can edit the data back in Excel, flip back over here, and the edits automatically flow through (actually, we replace the entire table, but the effect is the same).

Maintaining the Table in Excel

There are some lists that are just meant to be maintained by a human being.  For those, this linked table feature is perfect – open the spreadsheet every now and then, update the list, PowerPivot refreshes, and then you re-publish.

This local-editing-thru-Excel also works well in my case, since I am not getting any new data from Stats (although…  please please please, Stats, give me a free license – we will make beautiful music together).

The one case where this won’t work very well is if the Players table data changes all the time on the backend.  I don’t want to manually update my workbook all the time.  Remember, scheduled automatic report refresh is a big PowerPivot feature.  In those cases, options 1 and 2 are the ways to go.

Next Football Post: Formulas and Relationships! >>


Article on Enterprise BI Mashups

October 30, 2009

Given the extremely mashup-friendly nature of PowerPivot, I found this article fascinating:

http://searchsoa.techtarget.com/news/article/0,289142,sid26_gci1372670,00.html# 

Especially liked this part (my emphasis):

"Mashups won’t succeed in a BI context unless it’s fun, unless its highly interactive and the user says ‘wow that’s totally easy to use let me use that so I can build my reports,’" said Kobielus. "If it’s approached where IT is forcing self-service on reluctant users, I think it will fail. It’s got to be user enthusiasm that drives this."

"The key is to break up the work a bit," said Robert Eve, EVP of marketing at data virtualization vendor Composite Software. "We work on the data plumbing side. You have your data-oriented people – your data architects. Then you have people more focused on the business consumer and the application usage to work on the visualization side."

It sounds like someone has been reading the PowerPivot playbook :)

Actually, I prefer to think that these are very insightful people independently reaching the same conclusions we did – it has to be light, easy, and fun.  And no matter how good the tools are, the users still need good, clean, accurate business data, so the role of IT becomes more clearly-defined.  Division of labor and specialization always makes for greater efficiency.


Honk if you hate site previews!

October 30, 2009

You know those annoying popups whenever you hover over any link or image on the site?  That serve no purpose whatsoever, other than to clutter your screen and make your status bar flicker all the time?  Argh.

I just now figured out how to turn them off.  Good riddance.


PowerPivot ranked #1 favorite feature from SharePoint conf

October 30, 2009

File this in the “I couldn’t agree more but am maybe a touch biased” department:

http://blogs.technet.com/sharepointexperts/archive/2009/10/26/sharepoint-conference-2009-favorite-technologies-unveiled.aspx


Better image sizes

October 29, 2009

I apologize for the poor size of the images in my posts thus far.  I’m still figuring my way around this blogging thing.

Changed the theme so that images could be larger inline, and figured out how to upload larger images so that when you click to get a bigger image, it truly is bigger.


Football: loading the data

October 29, 2009

No more messing around, let’s dive in. 

Rather than boiling the ocean up front, let’s set a modest goal for getting started:  I want to end up with a report that shows passing yards by player.  (PowerPivot lends itself quite well to this approach – no need to build an all-encompassing model up-front like with traditional BI tools.  This is more like…  Excel.)

First, I launch Excel 2010 (with the PowerPivot addin installed), and using the PowerPivot ribbon tab, I open the PowerPivot window:

01 PowerPivot Blank Wkbk

(I’m not yet set up to do videos, but will be soon.  For now, let’s make do with pics).

Then, since my football tables live in a SQL db, I use the “From Database” button, type in my server and db name, and end up here:

02 PowerPivot Select Tables

As I showed yesterday, there are 40+ tables in there.  Some of which I probably won’t ever need.  And I certainly don’t need them all in order to deliver my first report.

But it’s just so much easier to grab all of the tables now.  That way I can browse them all, look at the data, and decide which tables I need for the report in a hands-on manner.

So I just click the little “select all” checkbox at the top-left.  Great feature…  that I argued against for awhile.  I was wrong.  I use it all the time.  Amir and Olivier, allow me to apologize publicly :)

Hit Finish, and I wind up with this:

03 PowerPivot Import Done

A few things to note:

  1. I’ve maximized the PowerPivot window.  Don’t worry, I can switch back and forth to Excel as much as I want, without closing this window.
  2. There is one “sheet” per tab per table from the database.  Gives Excel users a comfortable way to browse the data.
  3. I am actually seeing the data, just like in Excel.  No abstract schema browsers here.  We get real data, all of it, to work with.
  4. I have expanded the sheet navigation “overflow” popup.  Given that PowerPivot will be working with a lot more “sheets,” on average, than Excel, it needed a revamped navigation concept.  (We stole this from OneNote – thanks folks!)

All good.  But there’s a problem here, too, that you may have noticed.

I have dirty players!

No, I’m not talking about Hines Ward.  (Leave the guy alone, ok, he just “plays hard.”)  I mean that my players table contains some bad data – players for which Stats Inc. did not record player names, and instead recorded numbers.

What do I do about this?  The PowerPivot window itself is read-only – I can add formulas, which I’ll show later, but I cannot edit or delete rows. 

Why Read-Only?

The reason for this is the remarkable compression employed by PowerPivot – some db’s shrink to 5% original size.  And that same compression is used, in-memory, to deliver faster query results at report and analysis time. 

Jamming individual edits into that compressed structure, while maintaining lightning-fast query speed – well, let’s not go there.  Outstanding compression and query perf.  It’s a good tradeoff, especially given the other methods available to me.

OK, so how *can* I clean the Players table?

I have three options, which I will cover in the next post.

Next Football Post


PowerPivot’s impact on BI pros?

October 29, 2009

A few of us are having a really interesting discussion on Twitter right now that I thought could use a different forum.

“PowerPivot – bad for BI developers and consultants because it removes some need for their services, or good because it opens more doors?”

Here are some of the posts, I mean, tweets:

  • Vidasmatelis_normal VidasM: Preparing key points to talk at #sqlpass FoB lunch.Topic #PowerPivot impact on DW projects.More/less work for us?I welcome any suggestions.
  • Rd_headshot_cropped2_normal andrewbrust: @VidasM I think #PowerPivot will give rise to projects that would otherwise never have happened.
  • Rd_headshot_cropped2_normal andrewbrust: @VidasM And if a published #PowerPivot solution becomes popular, then it’s a candiadte for a more conventional implementation
  •  powerpivotpro: @VidasM I think more work for SSAS dev. In my exp, iterating on reqs =90% of project.Now u’ll do many short "upsize" conversions from PPwkbk
  •  powerpivotpro: @andrewbrust Agree with all your pts. #powerpivot good 4 BI pro. If BI were less crucial, 1 could squeeze other. But crucial feeds crucial:)
  • 110483183_4_yu59_normal Kjonge: @VidasM my idea is analysts creating adhoc and reports for single users with #powerpivot but enterprise reports run on SSAS and SSRS
  • Vidasmatelis_normal VidasM: @powerpivotpro , @andrewbrust – But – I can count projects where #PowerPivot would have been "Good enough". So I can quantify my loss.
  • Vidasmatelis_normal VidasM: @powerpivotpro , @andrewbrust – But I just have to guess how much I’ll gain (maybe). That is why I worry. And again-I just love #PowerPivot
  • Vidasmatelis_normal VidasM: I also see risk – companies building "Data dumps" instead of proper data warehouse, as it is faster and #PowerPivot can be used for reports
  • 110483183_4_yu59_normal Kjonge: @VidasM i still see some #powerpivot issues like security that will prevent it from being used as real enterprise wide reporting
  • Would love this conversation to continue, both the four of us above, as well as others.  Please post your thoughts as comments!


    SharePoint Conf 2009: Observations #2 and #3

    October 29, 2009

    Continuing my series of 5 Things the ‘Pro Wants You to Know (About the Show)

    #2: If it isn’t SharePoint, it isn’t Microsoft BI

    Another one of those things that’s been creeping up on us with no formal announcement (until the conference, and even then only in a few places).  Every MS team that builds BI tools has, over the past couple of years, decided to embrace SharePoint.  Excel Services was first, then Reporting Services offered it as a mode, PerformancePoint jumped in, then PowerPivot… 

    I might have some of the timeline mixed up, but the point is that this was a series of organic, independent decisions.  There wasn’t some edict coming down from on high.  One by one, each team decided it made sense to embrace SharePoint.

    And because of this bottom-up nature, there was no one central team/person/marketer to trumpet the shift.  We’re starting to say it, but in my opinion, we need to be a bit louder about it.

    So, let me say it loud and clear:  SharePoint is now the Microsoft BI platform.  I seriously doubt you will see much, if anything, come out of Redmond that’s BI-focused and not integrated with SharePoint.

    If you’re a BI pro, start learning about SharePoint.  If you’re a SharePoint pro, start learning about BI.

    #3:  A lot of SharePoint admins already “get” BI

    Donald Farmer had a conflict, so I offered to take his Tuesday morning 9-10:30 shift at the PowerPivot booth.  10:30 rolled around, and I was having so much fun that I stayed the rest of the day.  At one point I had to ask someone to bring me lunch, which I ate standing up.

    Knowing that this was a SharePoint conference and not a SQL/BI conference, when people first showed up at the booth, I made a point of asking about their background, their exposure to BI, etc.

    And while I only met a couple of real BI pros during the day, I was very pleasantly surprised at the overall level of BI awareness.  A few high-level impressions:

    1. Reporting – everyone knows about the need for published, interactive data, and is already doing at least one project on SharePoint.
    2. The notion of “central” BI vs. “business unit” BI – I’d say about half the folks dropping by already were at least peripherally aware of this breakdown and tension, and the other half were quick to understand it.
    3. Cubes – again, about half were familiar with what cubes are, and the purpose they serve, and the other half picked it up awfully fast.
    4. General BI Awareness – near-100% awareness of what BI means and what the products offerings are.  A little confusion about “when do I use this vs. that” but it wouldn’t be a Microsoft conference if I didn’t field that question multiple times an hour :)

    It’s a recurring theme:  customers are ahead of where I expected them to be.


    Football data source tables

    October 29, 2009

    OK, last post for the night.  Turns out there were over 40 tables of data that went into the MSN football cube.  These are the tables I will be importing into PowerPivot.

    It was pretty difficult finding a way to list them all out in one place, with their column names, so everyone can see.  I eventually settled on a db diagram from SQL Server Mgmt Studio.

    Filter/Grouping Tables

    First, here are the tables that contribute to all of those Filter/Group fields from my last post:

    powerpivot football dim tables

    Note how they all start with “Dim” in their names?  For the non-cube folks out there, that is because Filter/Grouping fields are often called “Dimensions” in cubespeak.  PowerPivot does not use that lingo, which is a good thing, but I’m just telling you these things in case you encounter them elsewhere.

    Normally, if you are presented with this number of tables and you have to do something with them in Excel, you block off your calendar for the rest of the week.  It will be much, much faster with PowerPivot, as we will see.

    Numerical/Play-by-Play Tables

    And here are the tables that actually contain the data of what happened on individual plays.  There will be more rows in these tables than in the tables above.

    powerpivot football fact tables

    Just as the tables above are prefixed with “Dim,” these are prefixed with “Fact,” because that’s what they are called when you are building a traditional cube.

    So, all I have to do is turn these tables into the fields from my previous post, and then into the reports from my first football post.  Simple!

    But it wouldn’t be as much fun if I had picked a simpler example.  Remember, I intentionally chose an extremely realistic and complex project, because we are really gonna kick the tires :)

    Next Football Post:  Loading the Data >>


    I have my work cut out for me…

    October 28, 2009

    PowerPivot is going to be put to the test perhaps even more than I had thought.  And um, it looks like I will be, too.

    I spent a good chunk of today taking inventory of what the MSN football cube could do, as well as what the underlying data tables look like.

    Oh my goodness.  It looks daunting from the aerial view.  But I’m optimistic that it won’t be so bad when I start working on the individual pieces, akin to how this beautiful painting is really just a bunch of dots when you get close to it:

    Georges Seurat - a PowerPivot artist before his time

    (I’m going out of my way to use the word “akin” here, I’ll tell you why later – I promised someone I’d do it.  But I lost 15 minutes in the process of forcing the analogy, reading the wikipedia page on Seurat – fascinating stuff, his philosophy).

    Anyway, the MSN cube contained well over 150 fields that were exposed to the user (and many more behind the scenes).  I’ll stick to the 150 here.

    Numerical Measure Fields

    Interceptions Team First Downs
    Times Sacked Team Pass Attempts
    2PT Conversions (Pass) Team Pass Yards
    2PT Conversions (Rec) Team Points
    2PT Conversions (Run) Team Rush Attempt
    Catch % Team Rush Yards
    Completions Team Total Yards
    Completion % Team FG Att*
    Fantasy Pts Team FG Att Allowed*
    First Down % Team FG Att Allowed per game*
    Fumbles Team FG Att per game*
    Interceptions per Game Team FG Made*
    Pass Yards (Air) Team FG Allowed*
    Pass Yards (YAC) Team FG Allowed per game*
    Pass Yards (Air %) Team FG per game*
    Pass Yards (YAC %) Team INT thrown*
    Completions per Game Team INT allowed*
    Times Thrown To Team INT allowed per game*
    Rec Yards (Air %) Team INT per game*
    Rec Yards (YAC %) Team Move*
    Rec TD Team Move Allowed*
    Rec TD per game Team Move Net*
    Rec Yards Team Pass Attempt*
    Rec Yards (Air) Team Pass Attempt allowed*
    Rec Yards (YAC) Team Pass Plays*
    Rec Yards per game Team Pass Plays per game*
    Receptions Team Pass Yards Allowed*
    Rec per game Team Pass Yards Allowed per gm*
    Rush First Down % Team Pass Yards per game*
    Rush Attempts Team Points*
    Rush Att per game Team Points Allowed*
    Rush TD Team Points Allowed per game*
    Rush Yards Team Points per game*
    Rush Yards per game Team Rush Plays*
    Rush TD per game Team Rush Plays per game*
    Yards per Pass Att (Sack Adjusted) Yards per Rush
    Sack Yards Lost Team Yds per Pass Att net*

    And that isn’t even all of them.  Remember, we hired a consultant to build this for us, and when the consultant was gone, we had to be able to build any report we wanted – it was going to be very expensive to re-hire the consultant for incremental fields, akin (ahem) to death by a thousand cuts.  (Gee, if only we had possessed a tool that allowed us to add our own, without learning about cube design…)

    The * fields are ones that I remember being particularly tricky, both for the consultant to build and for me to use properly, so I’ll be keeping my eye on those.  There are some other tricksy hobbits in there as well, like Catch %, Rec Yards (Air), and Rush First Down %.

    Filtering and Grouping Fields

    And these are the fields we could filter and group by (known as dimensions/attributes in cubespeak) – the fields you’d place on rows, columns, page filters, or slicers in a pivot table.

    Conference Pass Distance in Air
    Division Half
    Team City Minute
    Team Nickname Play Time
    Down Quarter
    First Down (Yes/No) Second
    Day/Night Play Type
    Roof Type Player Age
    Field Type College
    Year Height
    Season Half Last Name
    Season Segment Nick Name
    Week Player Full Name
    Stadium Position
    Humidity (!) Weight Group
    Home Team Weight in Pounds
    Offensive Team Years Pro
    Defensive Team Scoring Event
    Home Team Win Loss Temperature
    Home Team Offense Field Position
    Leading vs. Trailing Yards to Go
    Offense Win Lose  

    Again, that isn’t quite a complete list.  I have simplified a bit.

    Now imagine being able to mix and match any of these fields with any of the numerical fields above.  Boundless possibilities.  Which is why cubes are so cool to have.

    Next Football Post:  Examining the Source Tables >>


    SharePoint Conf 2009: Five Observations

    October 28, 2009

    Allow me to let you in on a little secret:  when Microsoft employees attend Microsoft conferences, we are often learning just as much as everyone else.  The company is just too big, and we’re all just too busy, to really keep tabs on everything that’s going on, even with our immediate partner teams.

    Last week’s SharePoint conference in Vegas was no exception, and was in fact the most eye-opening yet for me.  The moments where I sat/stood there thinking “Really?  We have that now?” really stood out this time.

    So…  5 Things the ‘Pro Wants You to Know (About the Show)

    #1 – SharePoint is suddenly HUGE

    Where did this come from?  It wasn’t long ago that Microsoft teams were skeptical of betting on SharePoint – “it’s still new, are customers really adopting it,” etc.  Even while working on a team that was betting heavily on SharePoint, I must have fallen asleep, because WOW!

    I’ve heard attendance numbers ranging from 6,000 to 9,000, and I’d believe anything in that range.  Wall to wall people (think:  techie Mardi Gras).  Quarter-mile hikes between sessions.  Sold out – even I had to show up without a reserved spot, and pull a major Houdini to get in just to staff the PowerPivot booth.  Row after row of booths in the exhibit hall, some of which had no immediately obvious connection to SharePoint.  That told me something:  SharePoint is now a platform and a center of gravity in the business world.  This felt a lot like the PDC I attended in 1998, when Windows was still the focus of basically all commercial software.

    Oh, and there’s even a real-life SharePoint fairy now!

    SharePoint Fairy

    (I didn’t meet her – SharePoint fairies are intimidating! – but I hear she’s the marketing director for one of the vendors at the show.  Different fairy outfit each day, and walking around in those shoes can’t be comfy.  My feet and legs were hurting each night, and I wore Pumas most days.  Plus she had the guts to carry this off for a full week, smiling and outgoing all the time.  SharePoint Fairy, we salute you!)

    Anyway, if you’re like me and have been snoozing on this one, here’s some advice:  investigate SharePoint.  Think about what it would mean to translate your apps/solutions/etc.  The customers are ahead of us this time.

    Coming up:  Items Two thru Five