The "Spark as Hadoop replacement" myth
I'm seeing a lot of misleading and inaccurate stories about Apache Spark as an alternative to Hadoop. Yes, Spark can run in a standalone clustered deployment, but in conversations with Databricks, the company is very clear that it expects Spark to run mostly on top of Hadoop (or in the cloud). The standalone approach might work when data volumes are small, but I don't think Databricks wants to solve the high-scale storage problem. It has its hands full ensuring solid-performing analysis engines for machine learning, SQL, R, graph, and streaming analysis.
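To make the distinction concrete: the deployment mode is chosen when a job is submitted, via the `--master` flag. A rough sketch (host names and the job file are placeholders):

```shell
# Standalone mode: Spark's own built-in cluster manager, no Hadoop involved.
# "spark-master" is a placeholder host name for the standalone master node.
spark-submit --master spark://spark-master:7077 my_job.py

# On top of Hadoop: YARN schedules the job, and data typically lives in HDFS.
spark-submit --master yarn --deploy-mode cluster my_job.py
```

The point is that "standalone" only replaces the cluster manager, not the storage layer, which is why Databricks still points people toward Hadoop (or cloud storage) for data at scale.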