Open Source Developers Build on Amazon Web Services - InformationWeek

Programmers are using the Lucene search engine library, a Web search crawler called Nutch, and Hadoop, an implementation of Google's MapReduce algorithm.

Clustered grid computing is increasingly available to the masses. Thanks to Doug Cutting, a Yahoo employee and prominent open source developer, aspiring programmers have access to three open source tools: the Lucene search engine library; Nutch, a Web search crawler; and Hadoop, an implementation of Google's MapReduce algorithm for processing large data sets.
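For readers unfamiliar with MapReduce, the idea can be sketched in a few lines of plain Python. This is only an illustration of the programming model, run in memory on one machine; it does not use Hadoop, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical names chosen for this example. A real framework would spread the map and reduce calls across a cluster.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, the step a framework performs between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single total."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of strings."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    docs = ["open source search", "open source crawler"]
    print(word_count(docs))
```

Because each map call looks at a single document and each reduce call looks at a single key, the work can in principle be divided among many machines, which is what makes the model a fit for large data sets.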

U.K.-based software developer Tom White calls Nutch "Google in a jar."

Companies such as Krugle, Powerset, Wikipedia, and Zimbra have reached into the Google jar and are putting this open source code to use.

Used in conjunction with Amazon Web Services, the trio of Hadoop, Lucene, and Nutch promises anyone the fuel to fire up their own Google, with Amazon S3 for storage and Amazon EC2 for processing. (PhDs, Google's ad sales system, a well-stocked kitchen, and the Google T-Rex are not included.)

Powerset, a natural language search startup, is pressing ahead with its own search engine based on Cutting's open source code. It will rely on Amazon's EC2 service for processing power.

Would-be search barons still have some work to do connecting the open source software and Amazon's hardware, but the open source community is rapidly doing just that: Developers like Cutting and White have been working to implement the Hadoop file system on S3, Amazon's storage service.

"I wanted to run some large natural language processing jobs on Hadoop but couldn't since I only had a handful of machines at my disposal," White said in an e-mail.

With the debut of Amazon's EC2 service last August, metered processing and storage gave developers the tools for ad hoc grid computation.

"The vision is that I write a MapReduce job, put my data on S3, run a simple script to run the job on a cluster of EC2 machines, and out pops the result on S3 for me to pick up," White said.

That vision is not far off.
