Use datamining with PowerPivot in Excel

governor-arnold-schwarzenegger

“For me life is continuously being hungry. The meaning of life is not simply to exist, to survive, but to move ahead, to go up, to achieve, to conquer.”

-California Governor, Kasper de Jonge

UPDATE:  The technology in this post has now been taken over by Predixion Software.  To learn more, see more recent posts here and here.

Much as Tommy Chong reminds me of Donald Farmer, every time I watch one of Kasper’s videos, his voice inspires a very imposing picture.  I can’t help but think of Arnold, or Dolph Lundgren.

The man is a basketball fiend, too, so I imagine him being 7 feet tall, like a cross between Arnold, Dolph, and Yao Ming.  But then I find out things like he uses a Tolkien-inspired name on Skype, and, like me, played a Rogue in World of Warcraft for a few years.

And that’s when I realize:  oh yeah, he’s MY kind of people.

But the quote above is suitable for Kasper.  He never stops pushing the envelope.  No one asks me more questions than Kasper, or is exploring the boundaries of PowerPivot more enthusiastically.

So it’s appropriate, then, that I found that picture on a site in the Netherlands.  It’s also appropriate that he be the next guest poster on this site.  Check out Kasper’s Blog if you haven’t been following him.  And now, on to a killer technique by Kasper:

Data Mining with PowerPivot

I’m currently talking to a client who is very charmed by the possibilities of PowerPivot to analyze data, one thing led to another and we came to data mining .. I got thinking wouldn’t it be GREAT if we could use the Excel data mining add-in on PowerPivot data (with DAX at our disposal). Rob and I agree that this would be a great addition to the powers of PowerPivot and we got very excited. After some sparring with Rob I found a way! Rob asked me to do a guest blog post at PowerPivotPro with the outcome of my research.

In this screencast i will show you how you can use data mining with PowerPivot data:

Think about the possibilities we have, make some great calculations with DAX to make a calculated column or measures and then unleash the power of the data mining add ins !

A few prerequisites: since the data mining add-ins are 32-bits you have to install the 32 bits office 2010 with 32 bits PowerPivot. I hope they launch a 64 bits Excel 2010 version soon!
You have to have a SSAS (2005/2008 /2008R2)  instance installed on your network, not the in memory PowerPivot instance but a regular version, to use the data mining models in SSAS.

I used this video to guide me in the demo’s Introducing the Table Analysis Tool for Excel 2007.

And you can download the Excel data mining add-ins here (more videos in there to see the possibilities).

UPDATE: The SQL Server Data Mining team is working to extend the power and ease of use of SQL Server Data Mining to the Cloud. You can use this add-in to run datamining against a service in the cloud, you don’t need a SSAS instance installed ! Check it out on this page: http://www.sqlserverdatamining.com/cloud/

4 Responses to Use datamining with PowerPivot in Excel

  1. This is very interesting. I’ve had the DM add-in installed since Excel 2007. Currently, the DM tab sits next to the PowerPivot tab in the Excel 2010 32-bit beta (I have several add-ins that aren’t compatible with 64-bit Excel). I’ve always thought that DM support in PowerPivot’s AS engine would be nice, but the scenario Kasper describes would not have occurred to me because you still need SSAS to do anything related to data mining. However, Kasper is in a much better position than I to know how common this scenario would be in real life situations. I’m also interested to know if a 64-bit version of the DM add-in
    will be available anytime soon (if at all).

    A good book (the only book, other than the 2005 edition?) on the subject of SSAS data mining is “Data Mining with Microsoft SQL Server 2008″ by Jamie MacLennan, ZhaoHui Tang and Bogdan Crivat.

    Kasper, keep ‘em coming!

  2. Thanks for presenting on this topic. I am a consultant with statistics teaching experience, and I have presented on the topic of using Excel for Microsoft Data Mining. My web link has references to some of my past presentations. I believe we need more presentations on data mining, since some people are confused about the mechanics, and even more are confused about creatively determining a data analysis strategy.

    In this post, I will make specific comments to the excellent video (thanks for producing it)

    1:08 I like the use of DAX to create a calculated column. In this case, the goal was to make a binary variable to apply logistic regression. However, another approach would be to put the number of bikes into ranges and apply a decision tree to understand the different groups (instead of choosing a nominal cutoff value of 9). Any bike represents a sale to Adventure Works.

    2:44 I noticed on the right the panel was labeled “Gemini Task Pane” — that name might be leftover from the demo days, but perhaps it is intentional, I do not know. I would expect it to be named “PowerPivot Task Pane”.

    5:38 You commented on a married woman who made a lot of money: “Don’t want to be rude, but the tool says it is not OK”. Statistics and data mining do not imply value judgments, such as “not OK”. People, however, apply interpretations based on their past experiences. I believe it is a good thing that a married woman can make more money than her peers in this data sample. That exception provides an organization (Adventure Works) with insight that income is not likely to be an issue for this specific customer. Using the exception analysis can provide salespeople with insights into who is either above or below their peer group on individual variables. It is accurate, therefore, and value-neutral to call values in the bright yellow as “exceptional”.

    8:45 The graph demonstrates that the forecasting algorithm bases its decisions more heavily on the more recent data points. How far to go back depends on several factors: 1) what is the purpose of the forecast, and 2) what variability or stability patterns show in the past data. When I was a graduate student at GA Tech, time series analysis was an entire class because there are subtleties which even the SQL Server tools allows an Excel analyst to surface (through parameter setting). On this topic, the more people can learn about it, the better their results and interpretation can be.

    9:12 You mentioned about the difficulty in typically using data mining. Mechanically, some have yet to feel comfortable with the ribbon (or have not used it yet), and I would encourage that group to jump into Office 2007 or 2010. I have believed that the leverage of the Microsoft tools pairs accessible data mining (particularly through Excel) with a world-class database. There are other free tools on the market which perform data mining. However, many people would not know the names of these alternative choices. On the other hand, Microsoft has earned its brand recognition of the Excel product, which is a platform for showcasing newer technologies like PowerPivot, DAX, and SQL Server Data Mining. The true leverage will come in corporate environments who have to have large volumes of data analyzed by a number of skilled analysts, a task which all the open source data mining solutions struggle with.

    I encourage your readers to see my free data mining resource at http://www.marktab.net. Also, I have produced several YouTube videos illustrating data mining at:
    http://www.youtube.com/marktabnet

  3. Michiel Rozema says:

    This is great stuff. I discussed PowerPivot with a data mining consultant a couple of months ago, and our conclusion was that PowerPivot would be a great tool to prepare data for data mining. We now have the missing link.
    It actually makes a lot of sense to use PowerPivot even when there’s a SSAS instance around, and this is because PowerPivot enables you to easily include external data. For instance, I might take daily sales data from my corporate datawarehouse, combine this with external weather data, and do a forecast on sales based on the weather forecast (if the weather matters, of course). In fact, PowerPivot enables you to look for influencers outside of your own data, all within Excel.
    The only restriction is, of course, the 1M-row limitation of Excel.

  4. Wow says:

    Kasper de Jonge = schwarseneger

Leave a Comment or Question