Wish 3: Easier Paths To Advanced Analytics
Developing algorithms and predictive models is work that has to be carried out by hard-to-find, expensive data scientists. Or is it? Scarcity of talent is one reason big-data, analytics and business intelligence vendors are developing machine-learning approaches. Proven in applications including optical character recognition, spam filtering and computer security threat detection, machine learning uses learning algorithms that are trained on the data itself. If you show the algorithm thousands or tens of thousands of labeled examples of scanned text characters, unsolicited email messages, or virus bots and malware, it can reliably recognize new examples of the same kind.
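To make the "trained by the data" idea concrete, here is a minimal sketch of a spam filter that learns from labeled examples. It assumes a simple naive Bayes word-count model, which is only one of many learning algorithms and is not attributed to any vendor named in this article. The class name, the sample messages and the "spam"/"ham" labels are all invented for illustration, and a real filter would train on thousands of messages rather than six.

```python
import math
from collections import Counter

LABELS = ("spam", "ham")  # "ham" is an illustrative label for legitimate mail


def tokenize(text):
    """Split a message into lowercase words."""
    return text.lower().split()


class NaiveBayesSpamFilter:
    """Toy word-count classifier; hypothetical, for illustration only."""

    def __init__(self):
        self.word_counts = {label: Counter() for label in LABELS}
        self.doc_counts = Counter()

    def train(self, text, label):
        # Each labeled example simply updates the counts for its label.
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def predict(self, text):
        vocab = set()
        for label in LABELS:
            vocab.update(self.word_counts[label])
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in LABELS:
            total_words = sum(self.word_counts[label].values())
            # Smoothed prior, so an unseen label never produces log(0).
            score = math.log((self.doc_counts[label] + 1) / (total_docs + len(LABELS)))
            for word in tokenize(text):
                count = self.word_counts[label][word]
                # Add-one smoothing handles words never seen for this label.
                score += math.log((count + 1) / (total_words + len(vocab) + 1))
            scores[label] = score
        return max(scores, key=scores.get)


# Invented training data: (message, label) pairs.
examples = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("cheap pills buy now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch with the project team", "ham"),
    ("please review the quarterly report", "ham"),
]

spam_filter = NaiveBayesSpamFilter()
for message, label in examples:
    spam_filter.train(message, label)

print(spam_filter.predict("claim your free prize"))
print(spam_filter.predict("review the project report"))
```

Nobody writes a rule such as "flag messages containing the word prize" here. The model's behavior comes entirely from the counts it accumulates, which is why more examples generally mean better results.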
The same approach can be applied to spotting customers who are ready to churn or jet engines that are about to fail. With machine learning, trained models can also continue to learn from new data. Amazon.com and Netflix, for example, use algorithms to spot patterns in customer transactions so they can recommend other books or movies. When a new book or movie comes out, these companies can start recommending it as soon as their algorithms discern the preference pattern in the data.
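The recommendation idea in the paragraph above can be sketched with a simple co-purchase counter. This is an assumption-laden toy, not a description of how Amazon.com or Netflix actually build recommendations. The class, the customer names and the item identifiers are hypothetical. The sketch shows how a model that updates with every transaction can begin recommending a new title once a few purchases reveal a pattern.

```python
from collections import defaultdict


class CoPurchaseRecommender:
    """Toy item-to-item recommender; hypothetical, for illustration only."""

    def __init__(self):
        # co_counts[a][b] = number of customers who bought both a and b
        self.co_counts = defaultdict(lambda: defaultdict(int))
        self.baskets = defaultdict(set)

    def record_purchase(self, customer, item):
        """Update the model incrementally with one new transaction."""
        if item in self.baskets[customer]:
            return  # ignore repeat purchases of the same item
        for other in self.baskets[customer]:
            self.co_counts[item][other] += 1
            self.co_counts[other][item] += 1
        self.baskets[customer].add(item)

    def recommend(self, customer, top_n=3):
        """Rank items the customer lacks by co-purchase counts."""
        owned = self.baskets.get(customer, set())
        scores = defaultdict(int)
        for item in owned:
            for other, count in self.co_counts[item].items():
                if other not in owned:
                    scores[other] += count
        # Highest count first; ties broken alphabetically for stable output.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [item for item, _ in ranked[:top_n]]


# Invented purchase history.
recommender = CoPurchaseRecommender()
for customer, item in [
    ("ann", "book_a"), ("ann", "book_b"),
    ("bob", "book_a"), ("bob", "book_b"), ("bob", "book_c"),
    ("cat", "book_a"),
]:
    recommender.record_purchase(customer, item)

before = recommender.recommend("cat")
print(before)  # ['book_b', 'book_c']

# A new title appears; two customers with similar tastes buy it.
recommender.record_purchase("ann", "new_book")
recommender.record_purchase("bob", "new_book")

after = recommender.recommend("cat")
print(after)  # ['book_b', 'new_book', 'book_c']
```

No retraining step is needed when the new title arrives. Each recorded purchase adjusts the counts, so the recommendation list shifts as soon as the data supports it.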
Apache Mahout is the leading route to deploying machine-learning-based clustering, classification and collaborative filtering algorithms on Hadoop, but these techniques are also supported by the R statistical programming language. Commercial vendors supporting or embedding machine-learning techniques include Alpine Data Labs, Birst, Causata, Lionsolver, Revolution Analytics and a growing list of others.