Predictive Analytics -- Data Mining

After some frenetic development at my new position, I've had the opportunity to try out some new technology (read that as 'new to me', rather than new in general): SQL Server 2008 Data Mining. I'll be sharing quite a bit more about it in upcoming posts, with practical examples where possible.

One thing that's surprised me so far is how small the community around it seems to be. It isn't very well publicized, even within SQL circles. Most of the documentation out there comes from members of the SQL Server development team at Microsoft who built the server implementation.

After some hands-on time, I've found plenty to love and quite a bit of frustration. Ultimately, though, I believe most developers and information workers will find a great deal of value in this little-hyped tool.

Here's some trivia that gives you a picture of what it takes to learn to use it effectively:

1) Data Mining is tightly integrated into SSAS (SQL Server Analysis Services), with everything that goes along with that, including Windows Authentication only.

2) Data Mining queries (predictions) are written in DMX, not to be confused with the MDX queries used against cubes, although a single statement can blend a DMX query with an MDX subselect.

3) Data Mining started out as a set of OLE DB extensions: the OLE DB for Data Mining specification, which introduced the Data Mining eXtensions (aka DMX).

4) Mining models and mining structures are stored under the covers as cube-like metadata. Don't believe me? Look at the tooling and note how attributes are defined; there are a lot of similarities.

5) The Data Mining Add-ins for Office are something that should really be exploited. Essentially, they make an on-the-fly cube from a table in Excel and then let you create live trending forecasts, clustering, and market basket associations. Sign me up!

6) The tooling is inconsistent, depending on the type of data you're working with. If you predict nested details, you lose some tooling functionality; if you predict only master-level fields, you lose other functionality.

7) Although the technology has been around since at least SQL Server 2000, adoption looks low, or else people aren't sharing their lessons learned with the wider community. I've gathered the blog feeds I've found so far here: http://www.google.com/reader/bundle/user%2F09775912853343203303%2Fbundle%2FSqlServerDataMining

8) This book is invaluable: it contains lessons learned and documentation that can't be found even in SQL Server Books Online. Data Mining with Microsoft SQL Server 2008, by Jamie MacLennan, ZhaoHui Tang, and Bogdan Crivat.
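To make item 2 concrete, here is a minimal sketch of what a DMX prediction looks like. It is written in Python and only builds the query text. It does not connect to Analysis Services; actually running the query would go through a client library such as ADOMD.NET, which is not shown. The model name "TM Decision Tree" and the columns "Bike Buyer", "Age", and "Gender" are hypothetical placeholders, and the helper functions are illustrative rather than part of any library. Predict and PredictProbability are standard DMX functions, and NATURAL PREDICTION JOIN against a singleton SELECT is DMX's syntax for scoring a single ad-hoc case.

```python
from typing import Mapping, Union

# A DMX input value: numbers are emitted bare, strings as quoted literals.
Value = Union[int, float, str]


def dmx_literal(value: Value) -> str:
    """Render a Python value as a DMX literal."""
    if isinstance(value, bool):
        # bool is a subclass of int; rejected to keep this sketch simple.
        raise TypeError("boolean inputs are not handled in this sketch")
    if isinstance(value, (int, float)):
        return repr(value)
    # Assumption: embedded single quotes are escaped by doubling, SQL-style.
    return "'" + value.replace("'", "''") + "'"


def bracket(name: str) -> str:
    """Wrap a model or column name in square brackets."""
    if "]" in name:
        # Escaping rules for ']' are not handled in this sketch.
        raise ValueError(f"unsupported identifier: {name!r}")
    return f"[{name}]"


def singleton_prediction_query(model: str, target: str,
                               inputs: Mapping[str, Value]) -> str:
    """Build a DMX singleton query that predicts `target` for one input case.

    NATURAL PREDICTION JOIN matches the aliased input columns to the
    model's columns by name, so the keys of `inputs` must match them.
    """
    columns = ", ".join(f"{dmx_literal(v)} AS {bracket(k)}"
                        for k, v in inputs.items())
    target_col = bracket(target)
    return (
        f"SELECT Predict({target_col}), PredictProbability({target_col})\n"
        f"FROM {bracket(model)}\n"
        "NATURAL PREDICTION JOIN\n"
        f"(SELECT {columns}) AS t"
    )


if __name__ == "__main__":
    # Hypothetical model and column names, for illustration only.
    print(singleton_prediction_query(
        "TM Decision Tree", "Bike Buyer", {"Age": 35, "Gender": "M"}))
    # Prints:
    # SELECT Predict([Bike Buyer]), PredictProbability([Bike Buyer])
    # FROM [TM Decision Tree]
    # NATURAL PREDICTION JOIN
    # (SELECT 35 AS [Age], 'M' AS [Gender]) AS t
```

Because NATURAL PREDICTION JOIN has no ON clause, the aliases in the inner SELECT are what tie each input value to a model column. That is why the sketch takes its inputs as a name-to-value mapping.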

 

Posted on 8/11/2010 7:55:00 PM by Jason Nadal

Categories: dmx | DataMining
