Big data’s big in more than one way.
First, there’s an ungodly amount of it.
Second, figuring out how to get insights out of all that data is a lucrative career field. There’s a reason why the business intelligence software market is expected to grow at a rate of 21.1% over the next four years.
Glassdoor rated data scientist the number one job for 2017. The six-figure median base salary is impressive enough, but the growth potential for people who can make use of that big data is even more impressive. It’s no wonder so many people are interested in big data certifications.
If you want to get certified to work with big data, there are numerous options out there. These five big data certifications can help you find your way around the massive amounts of data currently in existence.
What are they? They’re Cloudera’s certifications showing that you can use its platform to turn raw data into useful information. Cloudera offers four:
Cloudera Certified Professional (CCP) Data Engineer: Certifies that you can “develop reliable, autonomous, scalable data pipelines that result in optimized data sets for a variety of workloads.” In other words, CCP Data Engineer demonstrates that you can wrangle data into a clean, useful shape that can be used by different people, for different purposes.
Cloudera Certified Associate (CCA) Spark and Hadoop Developer: The Spark and Hadoop Developer certification shows you can “ingest, transform, and process data using Apache Spark and core Cloudera Enterprise tools.” This means you can do everything from importing and exporting data between MySQL and HDFS, to changing a data set’s format, to querying your data to generate reports.
CCA Data Analyst: The Data Analyst certification shows you can prepare, structure, and analyze data in Cloudera’s CDH environment. You’ll be able to do things like import data from MySQL into Hadoop, create and alter tables, and make reports with select and join queries.
CCA Administrator: Certifies that you can install and set up Cloudera Manager and CDH, “perform basic and advanced configuration needed to effectively administer a Hadoop cluster,” and manage a company’s Hadoop cluster on an everyday basis.
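As a rough illustration of the "create and alter tables" and "select and join" work the CCA Data Analyst description mentions, the sketch below runs that kind of SQL from Python against an in-memory SQLite database. SQLite is only a stand-in here, not part of CDH, and the `customers` and `orders` tables are invented for the example; on a real cluster, similar SQL would typically go through a tool such as Hive or Impala.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a cluster's SQL engine (assumption).
conn = sqlite3.connect(":memory:")

# Create two hypothetical tables, then alter one of them.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT)")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.execute("ALTER TABLE customers ADD COLUMN segment TEXT")

conn.executemany(
    "INSERT INTO customers (id, region, segment) VALUES (?, ?, ?)",
    [(1, "East", "retail"), (2, "West", "retail"), (3, "East", "wholesale")],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 20.0), (1, 30.0), (2, 15.0), (3, 100.0)],
)

# A small report: order count and revenue per region, built with a join.
report = conn.execute(
    "SELECT c.region, COUNT(*) AS order_count, SUM(o.amount) AS revenue "
    "FROM orders o JOIN customers c ON c.id = o.customer_id "
    "GROUP BY c.region ORDER BY c.region"
).fetchall()

for region, order_count, revenue in report:
    print(region, order_count, revenue)
```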
Cost:
CCP Data Engineer: $400 per test
CCA Spark and Hadoop: $295 per test
CCA Data Analyst: $295 per test
CCA Administrator: $295 per test
Prerequisites: Officially, there are no prerequisites for any of the Cloudera certifications. However, Cloudera suggests you have certain knowledge beforehand for each of the certs:
CCP Data Engineer: “In-depth experience developing data engineering solutions and a high-level mastery” of data ingest, data transformation, data storage, and data analysis.
CCA Spark and Hadoop Developer: Cloudera suggests their training course is good preparation.
Tests required: Only one test is required for each certification. In all tests, you will have a remote proctor who keeps an eye on you via webcam. Each test includes between 5 and 12 questions that present different customer problems or business scenarios to address. The CCP Data Engineer test takes four hours to complete, and the rest take two hours.
Where offered (location-based or online): All tests are offered online; a webcam is required.
What is it? The Data Management and Analytics track is just one of several tracks Microsoft offers as part of its Microsoft Certified Solutions Expert program, and it’s the one to focus on if you’re in big data.
Cost: $165 per test, but there are nine tests to take
Prerequisites: To get the MCSE in Data Management and Analytics, you’ll first need an MCSA in SQL Server 2012/2014, or an MCSA in SQL 2016 Database Administration, BI Development, Machine Learning, or Database Development.
Tests required: You’ll have to pick, and pass, one test from a list of 12 available exams, ranging from “Designing Database Solutions for Microsoft SQL Server,” to “Implementing a Data Warehouse using SQL.”
Where offered (location-based or online): Exams are offered through Pearson VUE, a testing company that offers exams online and at physical locations.
What is it? Two certifications, actually: the Mongo Database Administrator Associate, and the MongoDB Developer Associate. MongoDB is one of the most popular NoSQL technologies, and both certifications prepare you to work with NoSQL databases.
The Mongo Database Administrator (DBA) certifies that you can use Mongo’s popular open source database management technology to get value out of loosely-structured data. The DB Developer certification shows you can craft applications with Mongo. What differentiates Mongo is that it’s a document-based database, rather than a relational database. In relational databases, your data is organized into tables of rows and columns. In document-based databases, your data is stored in documents. This difference is one of many things that make Mongo good for location intelligence, social media data, and (of course) text and HTML.
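To make the relational-versus-document distinction concrete, here is a minimal sketch using only Python's standard library. It is not MongoDB code: an in-memory SQLite database stands in for a relational store and a plain dictionary stands in for a document, and the customer, table, and field names are made up for illustration.

```python
import json
import sqlite3

# Relational shape: one customer spread across two tables linked by an id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE checkins (customer_id INTEGER, city TEXT, lat REAL, lon REAL)"
)
conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.executemany(
    "INSERT INTO checkins VALUES (?, ?, ?, ?)",
    [(1, "Austin", 30.27, -97.74), (1, "Boston", 42.36, -71.06)],
)

# Reassembling the customer's check-ins takes a join across the two tables.
rows = conn.execute(
    "SELECT c.name, k.city FROM customers c "
    "JOIN checkins k ON k.customer_id = c.id ORDER BY k.city"
).fetchall()

# Document shape: the same customer as one self-contained, nested record.
customer_doc = {
    "name": "Ada",
    "checkins": [
        {"city": "Austin", "location": {"lat": 30.27, "lon": -97.74}},
        {"city": "Boston", "location": {"lat": 42.36, "lon": -71.06}},
    ],
}

relational_cities = [city for _, city in rows]
document_cities = [checkin["city"] for checkin in customer_doc["checkins"]]

print(relational_cities)
print(document_cities)
print(json.dumps(customer_doc, indent=2))
```

Both shapes hold the same facts. The difference is that the document keeps nested data, such as a list of locations, together in one record, while the relational version splits it across tables and rebuilds it at query time.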
Cost: $150 per exam
Prerequisites: None required, but the folks at Mongo suggest you take their training course. The good news is that exam registration comes with a free study guide, so if you can’t invest your time in a training course you can still study on your own.
Tests required: One 90-minute, multiple-choice test for each of the two certifications
Where offered (location-based or online): Both in person and online.
What is it? Software giant Oracle’s certification that you’re skilled with their latest BI software.
Cost: $245 per exam
Prerequisites: None required, but Oracle suggests you take their 11g Bootcamp course.
Tests required: Just one, the melodically named Exam Number 1Z0-591. You have two hours to answer 75 questions, and you’ll need to get 63% to pass. It’s multiple choice, though, so there’s the option for eeny-meeny-miny-moe logic if you get stuck.
Where offered (location-based or online): In person, at Pearson VUE test centers.
What is it? Software megavendor SAS’s certification that you can work with their popular business intelligence software. Prep courses are available in both classroom and “blended” (some classroom, some online) formats.
Cost: $180 per exam. The SAS Big Data course will set you back $9,000 for the classroom/in-person version (exam included), and $4,275 for remote training and both exams.
Prerequisites: SAS requires you have “at least six months of programming experience in SAS or another programming language.”
Tests required: You’ll need to pass both of the following to get certified:
Where offered (location-based or online): Both in person and online.
Do you need a big data certification?
Get ready to feel like Harry Truman asking for a one-handed economist, because I’m about to complicate things.
Not all data professionals agree on the benefit of big data certifications. “A data science certificate is a good start, but not enough,” says Gregory Piatetsky-Shapiro, editor of the data science website KDnuggets. Beyond the certification, Piatetsky-Shapiro says, “you need to show skills and understanding,” whether by winning data science competitions at a site like Kaggle, or by just doing “some interesting analysis by yourself.”
Piatetsky-Shapiro is not alone in preferring practical know-how to official certification. Speaking of SAS big data certifications, veteran SAS developer Patricia Flickner says she’s more interested in whether “you can code and think on your feet.” Even if you have “a stack of certificates a mile high,” she says, you won’t be considered without knowing how to code and mine the right data from the right places.
Bo Peng of Datascope, a data science consultancy, says big data certifications aren’t a guarantee you’ll be hired. “I read in our last hiring round probably 200 resumes in all, with all sorts of different backgrounds and degrees, and I found no real correlation between the quality of the candidate, and the kind of certification they had.”
What mattered more, Peng says, was how candidates performed on the data science challenges they were given during the interview.
Randy Zwitch, principal data scientist at Comcast, corroborates Peng’s take. “The most important part of data science that we hire for is proven ability to solve data problems… we’re looking for people who understand math, are comfortable reading textbooks/technical papers, and understand how to work with varied data sources.” Certification, on the other hand, would only help a candidate “already working as a data analyst,” he says.
If there’s any correlation between certification and a candidate’s chances, Peng notes, it’s more likely to be negative. “A lot of the certifications come from proprietary software, where once you’re certified, you’re locked into a piece of mega software that costs consultant and client a lot of money.”
Licenses for business intelligence software and big data software are expensive, and that expense can seem unnecessary when there are multiple free, open source options, like the programming languages Python and R. This is even more the case when you consider how R and Python’s popularity is outpacing, and replacing, bigger data science players like SAS. Data science is a constantly shifting field, but open source languages are currently on top.
However, some people argue that big data certifications are a good idea. They point to the scarcity of data scientists, citing the predicted shortage of 1.7 million employees with the required data skills. Big data certifications, they argue, are a useful way to signal that you can help plug the data science holes many organizations may have.
While a certification may signal some data science knowledge, most data scientists would agree that a certification is far less useful than academic training or hands-on experience. Erwan Rouzel of Credit Agricole Consumer Finance, for instance, argues that “being [a] real data scientist can’t be possible through just a certification, as it requires at least one or two years of studying advanced mathematics and statistics.”
A certification won’t provide the same in-depth knowledge as a college-level program.
What’s your take on big data certifications?
Do you have a big data certification? If so, do you feel it’s been a help or a hindrance? I’d love to know whether your career in data has been helped by certification.