Commentary
6/25/2009
07:04 AM
Seth Grimes

Summer Reading: IR, Sentiment Analysis, and Visualization

Summer's slower pace allows time to work through material set aside for calmer days. My reading list includes works on Information Retrieval, Sentiment Analysis, and Visualization. The items on my list are technical and accessible, of potential interest to anyone who works with analytics. You might also find them worth at least a quick look.

Summer's slower pace allows time to work through material set aside for calmer days. What's on your reading list? Mine includes a variety of papers and also longer works on Information Retrieval, Sentiment Analysis, and Visualization. The items on my list are technical and accessible (which is not the same as easy), of potential interest to anyone who works with analytics. I've paged through them and plan to take a deeper dive. TechWeb readers might also find them worth at least a quick look.

First up, an Introduction to Information Retrieval by Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Information retrieval (IR) is, of course, a fancy, academic name for what most of us know as search. Why is search/IR important? We all know the answer: "[The] explosion of published information would be moot if the information could not be found, annotated and analyzed so that each user can quickly find information that is both relevant and comprehensive for their needs." I'm looking to this book to fill gaps in, and greatly expand, my understanding of IR.

Manning and Schütze are academics and authors of a 1999 work, Foundations of Statistical Natural Language Processing, while Raghavan heads Yahoo! Research, teaches, and was CTO at search-engine vendor Verity (acquired in 2005 by Autonomy). Their IR book is designed for use as a textbook; it covers basic and advanced topics and Web search. It is available free on-line in PDF and HTML forms, or you can buy a copy.

Next, Opinion Mining and Sentiment Analysis by Bo Pang and Lillian Lee is a monograph, at 135 pages essentially an extended but very accessible academic paper. It cites 332 references, mostly to technical literature, but it also presents the business case for sentiment analysis and firmly roots discussions in real-world examples, from movie reviews to quotations from literary sources such as novelist Charlotte Brontë. You may find the opening chapters, "The Demand for Information on Opinions and Sentiment" and "Applications," helpful, even if you don't read further into the monograph, which goes deep into the technology.

You can similarly read Pang and Lee's text in electronic form, via links to a free PDF and to a Now Publishers e-book that adds hyperlinked references and other formatting.

Number 3 on my book list is Now You See It: Simple Visualization Techniques for Quantitative Analysis by Stephen Few. The premise of Steve's new book is, "Although some quantitative data analysis can only be done using sophisticated statistical techniques, most of the questions that organizations typically ask about their data can be answered using simple visualization techniques." He aims in Now You See It to teach those "simple, fundamental, practical techniques that anyone can use." The book is well produced and full of examples that, based on an initial look, very ably illustrate analytical concepts.

This short reading list reflects my own analytics and visualization interests. There's lots out there to read, time permitting. If you've encountered or recently read any particularly helpful BI, data warehousing, or analytics materials, please do let me know.
