Big Data. Big Decisions
InformationWeek
Special Coverage Series


Hadoop Alternative: Open Source Quantcast Touts Speed

Quantcast File System uses less disk space than Hadoop and shines on key speed metrics, developers say.

San Francisco-based online-traffic analysis vendor Quantcast announced Thursday that it has released to the open-source community a big-data management engine designed to outperform the industry-dominating Hadoop Distributed File System.

Quantcast touts its Quantcast File System 1.0 as using half the disk space of the Hadoop Distributed File System (HDFS) while outperforming the more widely known big-data system in batch processing of data and in the speed of input/output of data between servers and back-end storage.

Quantcast File System (QFS) is designed as an alternative to Hadoop, which dominates the big-data market to such an extent that it will become the de facto industry standard for big-data management and generate as much as $2.2 billion per year by 2018, according to a July report from MarketAnalysis.

QFS, on the other hand, is an internal app Quantcast uses to collect and analyze more than 500 billion data records per month, processing in excess of 20 petabytes per day, according to the company.

QFS is an enhanced version of the Kosmos Distributed File System (KFS), an open-source file system modeled on Google's internally designed Google File System, the data-management software that drives Google's search engine and other products.

KFS' main advantage, according to Google, is that it improves the performance of backend storage hardware for compute- and data-intensive applications such as search engines and data-mining projects.

KFS was designed to use two separate backend components: One to manage reads, writes, and searches of huge piles of data broken into chunks, and another to supply metadata defining the data's meaning and source.
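The split between a metadata component and a chunk-storage component can be illustrated with a toy model. The sketch below is only an illustration of that division of labor, not KFS or QFS code: the class names `MetaServer` and `ChunkServer`, the round-robin placement, and the tiny chunk size are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ChunkServer:
    """Hypothetical storage node: holds raw chunk bytes keyed by chunk id."""
    name: str
    chunks: dict = field(default_factory=dict)

    def write(self, chunk_id, data):
        self.chunks[chunk_id] = data

    def read(self, chunk_id):
        return self.chunks[chunk_id]


@dataclass
class MetaServer:
    """Hypothetical metadata node: knows which chunks form a file and where they live."""
    files: dict = field(default_factory=dict)  # path -> [(chunk_id, server name), ...]
    next_id: int = 0

    def allocate(self, path, server_name):
        chunk_id = self.next_id
        self.next_id += 1
        self.files.setdefault(path, []).append((chunk_id, server_name))
        return chunk_id


def write_file(meta, servers, path, data, chunk_size):
    """Break data into chunks and spread them round-robin (an assumed policy)."""
    names = sorted(servers)
    for offset in range(0, len(data), chunk_size):
        name = names[(offset // chunk_size) % len(names)]
        chunk_id = meta.allocate(path, name)
        servers[name].write(chunk_id, data[offset:offset + chunk_size])


def read_file(meta, servers, path):
    """Ask the metadata node for the chunk list, then fetch each chunk in order."""
    return b"".join(servers[name].read(cid) for cid, name in meta.files[path])


meta = MetaServer()
servers = {"cs1": ChunkServer("cs1"), "cs2": ChunkServer("cs2")}
write_file(meta, servers, "/logs/day1", b"abcdefghij", chunk_size=4)
restored = read_file(meta, servers, "/logs/day1")
```

In this toy model the metadata node never touches file contents; it only records which chunk lives where, which is the separation the design describes.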

Quantcast began using KFS internally for its own data management, as an alternative to Hadoop, after the app was open sourced in 2007. At the time, however, KFS was "fundamentally experimental and insufficiently stable for production usage," according to Quantcast.

Quantcast set out to fix that. The company, which now uses QFS as the primary data-management app for its production applications, chose the software for load-balancing abilities that are more flexible and timely than Hadoop's, according to Schubert Zhang, a VP at Hanborq, a Hadoop performance-optimization provider based in China.

According to Quantcast, QFS outperforms Hadoop because its client software is written in C++ rather than the slower Java, and its core services are compiled C++ rather than the mix of C and Java used in Hadoop. QFS also encodes data using the same Reed-Solomon error-correction coding used to protect data on DVDs, laying data out in nine stripes, each of which is written to a different physical disk and could be written to entirely separate storage racks in the cluster.

Hadoop, by contrast, simply makes three copies of each data set and stashes them in different corners of the cluster so it can get to them using high-speed cluster interchanges rather than network connections.
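The "half the disk space" claim follows from simple arithmetic if the nine stripes are assumed to split into six stripes of data and three of parity. That split is an assumption made here for illustration (the article gives only the total of nine), and the function names below are hypothetical. A rough sketch of the comparison:

```python
def replication_overhead(copies):
    """Raw bytes stored per logical byte when every block is kept `copies` times."""
    return float(copies)


def erasure_overhead(data_stripes, parity_stripes):
    """Raw bytes stored per logical byte with data plus parity stripes."""
    return (data_stripes + parity_stripes) / data_stripes


def raw_petabytes(logical_petabytes, overhead):
    """Disk capacity needed to hold a given amount of logical data."""
    return logical_petabytes * overhead


# Three full copies, as the article describes for Hadoop.
hdfs_overhead = replication_overhead(3)

# Assumed 6 data + 3 parity layout for the nine QFS stripes.
qfs_overhead = erasure_overhead(6, 3)

# Fraction of Hadoop's disk footprint that the striped layout would need.
ratio = qfs_overhead / hdfs_overhead

# Example: raw capacity needed for 10 PB of logical data under each scheme.
hdfs_raw = raw_petabytes(10, hdfs_overhead)
qfs_raw = raw_petabytes(10, qfs_overhead)
```

Under these assumptions, triple replication needs 3 bytes of disk per byte of data and the striped layout needs 1.5, which is where a factor of one half would come from.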

Having to go through a server's PCI bus and a comparatively slow network could make QFS slower than Hadoop. But because every read and write is parallelized across six or nine different drives, the performance of QFS rises quickly if the cluster uses 10 Gigabit Ethernet or InfiniBand rather than the more-standard gigabit Ethernet, according to Quantcast.
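A back-of-the-envelope model shows why the network link matters so much once reads are spread across several drives. The sketch below treats a client's read rate as the lesser of the combined drive bandwidth and the link bandwidth. The 100 MB/s per-drive figure and the function names are assumptions chosen for illustration, not Quantcast measurements, and the model ignores protocol overhead.

```python
def link_mb_per_s(gigabits_per_s):
    """Theoretical link capacity in megabytes per second (8 bits per byte)."""
    return gigabits_per_s * 1000 / 8


def striped_read_mb_per_s(stripes, drive_mb_per_s, link_gbps):
    """Read rate is capped by whichever is slower: the drives combined or the link."""
    return min(stripes * drive_mb_per_s, link_mb_per_s(link_gbps))


# Assumed figures: six drives read in parallel, each delivering 100 MB/s.
on_gigabit = striped_read_mb_per_s(6, 100, 1)       # link-bound
on_ten_gigabit = striped_read_mb_per_s(6, 100, 10)  # drive-bound
```

With these assumed numbers, six drives can supply 600 MB/s, but a gigabit link passes at most 125 MB/s; on a 10-gigabit link the drives become the limit instead, so the benefit of parallel reads is fully realized.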

To keep its performance high, QFS also includes automatic file replication; fixed-footprint management of memory; data-storage location based on space and workload rather than static tables; and direct I/O from disk. A separate module is designed to integrate data and queries across both QFS and Hadoop to make the two compatible as well, the company said.

"In our Big Data future, file systems such as QFS will underpin cost-effective critical infrastructure for commerce and government," Quantcast CEO Konrad Feldman said in a statement. "Quantcast makes use of open source software and by making our own contribution with QFS we're hopeful that others will benefit as we have."

Quantcast, a startup that launched in 2006 using Hadoop to process its data, started using KFS in 2008 and began almost immediately to enhance and expand the software to suit its own needs.

Other big-data management vendors have also put out their own alternatives to Hadoop, most notably DataStax and MapR, both of which use proprietary enhancements in combination with open-source software including Hadoop.

Binaries and source code for QFS, as well as deployment and administrator's guides, are available at no cost.


