10 Big Data Books To Boost Your Career
People with big data and data science skills are some of the most sought-after professionals because demand is outstripping supply. Here are 10 books that can help you learn about this emerging field and the tools you will need to conquer it.
![](https://eu-images.contentstack.com/v3/assets/blt69509c9116440be8/blt42a541a88c264b9b/64cb380795778a52c8286b49/Kindle_MichaelJay_iStock_000016969899_Medium.png?width=700&auto=webp&quality=80&disable=upscale)
Data science and analytics are some of the top in-demand job categories in the technology industry today. Indeed, demand is higher than supply for these specialists, and many data science master's degree programs have sprouted in the past few years. The online learning curriculum has expanded significantly, too, with offerings from the big massive open online course (MOOC) providers such as Coursera and Udemy, as well as from vendors of the technologies that enable big data, such as MapR and Confluent, among many others.
But aside from formal education, either online or offline, there are other ways to learn about this emerging field, and to gain some of the skills you need if this is the next step for you on your career journey. If you're an executive leading a team of data scientists, you might need better grounding to learn about the technology the group's members use to do their jobs.
InformationWeek has put together a collection of essential reading for data scientists, business analysts, executives, and others who are interested in this rapidly growing field.
Our collection features 10 books to help you understand everything from the ramifications of widespread algorithms and models for our future society, to how to use some of the most popular languages and tools to generate insights from data.
What are the essential skills for data scientists to possess? What are some of the key recipes for R users to leverage in their work? How can you use data to tell stories that compel your audience to action? How can you work with big data technologies such as Apache Hadoop and Apache Spark?
What are the cultural and economic ramifications of a future world where so many decisions are based on a black box of algorithms? Take a look at this list to find out. Are there any that you will add to your reading list? Did we miss any? Let us know in the comments below.
Python is one of the top languages suggested for data scientists to learn, and it's a skill that commands more money during salary negotiations. For any data scientist, aspiring data scientist, or developer looking to add this language to their skill set, Python Machine Learning could be essential reading. The book promises to help readers leverage Python's open-source libraries for deep learning, data wrangling, and data visualization. It offers help with learning strategies and best practices for improving and optimizing machine learning systems and algorithms.
Author: Sebastian Raschka
Price: $22.39 on Kindle, $40.47 in paperback
This book provides the reader with an overview of data analytics, so it could be a good book for a beginner wanting to learn more about the field or for managers who need a primer on the technologies and an understanding of how they all work. The book offers mini case studies at the beginning of each chapter and offers an overview of data mining techniques and platforms. It also provides a tutorial for the R statistical analysis platform.
Author: Anil Maheshwari
Price: $9.99
Written by the chief data scientist for MailChimp.com, this book concentrates heavily on using Microsoft Excel to gain insights from data, so don't expect to learn about R or Hadoop or Apache Spark here. But do expect to learn how to get the most out of data sets that can be handled by Excel.
Author: John W. Foreman
Price: $23.99 Kindle, $27.99 paperback
This book is based on an MBA course at New York University, which is taught by one of the authors. It introduces the principles of data science and leads readers through the "data-analytics thinking" needed to get business value from collected data. Data mining techniques and using data for competitive advantage are among the topics covered.
Authors: Foster Provost and Tom Fawcett
Price: $21.49 Kindle, $37.99 paperback
Want to learn about Hadoop? Here's the book you need. Published last year, this is the fourth edition of this guide. This edition uses Hadoop 2 exclusively and adds new chapters on YARN and Hadoop-related projects such as Parquet, Flume, Crunch, and Spark. The book also covers the fundamental components of Hadoop: MapReduce, HDFS, and YARN. There are instructions on how to set up and maintain a Hadoop cluster running those three components. Other key technologies discussed include Pig, Hive, HBase, and ZooKeeper.
Author: Tom White
Price: $24.99 Kindle, $32.62 paperback
This guide will give you the recipes needed to perform data analysis with R quickly. It includes more than 200 recipes for this open-source language, which has been a top choice of statisticians. Reviewers of the book who were new to R describe it as a practical guide and reference tool that has saved them lots of time.
Author: Paul Teetor
Price: $20.49 Kindle, $31.50 paperback
What good is it finding key insights with data if you can't explain them in a way that is meaningful to your audience? The ability to put the information into context can be a valuable skill. This book is designed to help with tips on how to direct your audience's attention to the most important data points, how to create the right visualizations to communicate your data, and how to use storytelling to get your message across to the audience.
Author: Cole Nussbaumer Knaflic
Price: $20.79 Kindle, $22.44 paperback
Hadoop has become synonymous with big data, but Spark is the newer and hotter technology that is making big data projects faster. Every big data book collection should include a book about Spark, and this one is written by Spark's developers. It covers topics such as distributed datasets, in-memory caching, the interactive shell, built-in libraries such as Spark SQL and MLlib, and connections to data sources such as HDFS, Hive, JSON, and S3.
Authors: Holden Karau, Andy Konwinski, Patrick Wendell, and Matei Zaharia
Price: $21.49 Kindle, $34.26 paperback
This is not a how-to book or a guide. Instead, this book looks at whether algorithms, by taking humans out of the calculations, can make the world more equitable, since everyone is judged by the same rules. But the author points out that the opposite is actually true. Written by a former Wall Street quant, the book takes a look at the cultural and economic effects of our algorithmic future. The author argues that the models used today are opaque, unregulated, and incontestable, even when they are wrong, and she maintains that they can reinforce discrimination. For instance, if a poor student can't get a loan because his zip code shows he is too risky, then he's cut off from the kind of education that could pull him out of poverty.
Author: Cathy O'Neil
Price: $13.99 Kindle, $18.50 hardcover
This free e-book was written by the US Chief Data Scientist at the White House Office of Science and Technology Policy. It explains the skills, perspectives, tools, and processes that he believes position data science teams for success. Author DJ Patil brings his experience as an architect of LinkedIn's data science team to this effort, describing the four essential qualities of data scientists and what it means for an organization to be "data driven."
Author: DJ Patil
Price: Free