Google has taken a step into open machine learning with Google Predict. Rather than release a toolkit, they've set it up as a web service, in typical Google fashion. This is great. People from all walks of life need more ways to interact with machine learning. Currently, the entry bar is pretty high. Even with open source tools like Weka, current interfaces are intimidating at best and require knowledge of the field. The Prediction API strips all that away: label some rows of data and let 'er rip.
The downside of this simplified process is that Google Predict works as a black box classifier (and maybe regressor?). It "Automatically selects from several available machine learning techniques", and it supports numerical values and unstructured text as input. There are no parameters to set and you can't get a confidence score out.
In all likelihood the service uses the Seti infrastructure to do the heavy lifting, with at least a little bit of feature extraction thrown in to handle the text input.
It'll be interesting to see if anyone can suss out what is going on under the hood. I signed up for the access waiting list. When I get in, I'll post some comparison results between Google Predict and a variety of other open source tools here.
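A comparison like that only needs predicted labels, which is all a black box hands back anyway. Here is a minimal Python sketch of the kind of harness I have in mind. The names `accuracy`, `majority_baseline`, and `compare` are my own hypothetical helpers, and nothing here calls the actual Prediction API: each classifier is assumed to be wrapped as a plain function from a row to a label.

```python
from collections import Counter
from typing import Callable, Dict, Sequence

# Assumption: every classifier, black box or not, is wrapped as a
# function that takes one row of features and returns a label.
Row = Sequence[object]
Classifier = Callable[[Row], str]


def accuracy(predict: Classifier, rows: Sequence[Row], labels: Sequence[str]) -> float:
    """Fraction of held-out rows whose predicted label matches the true label."""
    if not rows or len(rows) != len(labels):
        raise ValueError("rows and labels must be non-empty and the same length")
    correct = sum(1 for row, label in zip(rows, labels) if predict(row) == label)
    return correct / len(rows)


def majority_baseline(train_labels: Sequence[str]) -> Classifier:
    """Trivial baseline: always predict the most frequent training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda row: most_common


def compare(classifiers: Dict[str, Classifier],
            rows: Sequence[Row], labels: Sequence[str]) -> Dict[str, float]:
    """Score every classifier on the same held-out rows."""
    return {name: accuracy(predict, rows, labels)
            for name, predict in classifiers.items()}


if __name__ == "__main__":
    # Toy, made-up data purely to show the call pattern.
    test_rows = [("cheap pills",), ("lunch?",), ("win money",), ("free offer",)]
    test_labels = ["spam", "ham", "spam", "spam"]
    baseline = majority_baseline(["spam", "spam", "ham"])
    print(compare({"majority": baseline}, test_rows, test_labels))
```

Since no confidence score comes out of the service, plain accuracy on held-out rows is about the only metric that treats every tool the same way. The majority baseline is there so that any score has something to beat.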
Thanks to Slashdot via Machine Learning (Theory) for the heads up on this one.