What’s a citizen data scientist?
A person who can do (some) data scientist-level work, without a data scientist’s training.
In other words? A citizen data scientist is every business’s friendly neighborhood unicorn.
If you want a formal definition, Gartner defines the citizen data scientist (CDS) as “a person who creates or generates models that use advanced diagnostic analytics or predictive and prescriptive capabilities, but whose primary job function is outside the field of statistics and analytics.”
In many cases, that “outside the field of statistics and analytics” means the CDS is a business analyst who has learned to build those high-level models out of a mixture of initiative (if I can find out variable x, I can increase our revenue) and need (our data scientists are as overtaxed as a teacher without a planning period).
As data scientists are increasingly taxed with requests to make a business more data-driven, citizen data scientists can help their businesses in two key ways:
- They can lighten the load on data scientists by using the right business intelligence software to do the simpler data science tasks.
- They can bring an outsider’s, business-side perspective to data science.
Fortunately, becoming a citizen data scientist doesn’t require a degree, or even a full year of training. It does require work, but the benefits make the work worthwhile. If you’re interested in becoming a citizen data scientist, here are four steps that can start you on that road.
1. Ask for access to more, and new, sources of data.
If you’re tired of dealing with the same old data from the same old reports, you’ve got the citizen data scientist itch, and it may be time to ask your supervisor for access to data that isn’t included in your normal reports and information.
When you open up data access to people who aren’t data scientists, you can see the benefits of citizen data science. By giving data access to an unusual group of citizen data scientists, IBM turned the 2016 Wimbledon tournament into a library of information. The computer giant empowered tennis professionals to use its data analysis program, Watson Analytics. The result was unprecedented insight into players’ performances. Watson Analytics was able to use data points as small as where the ball landed to determine whether a player’s style had changed.
Expanding access to people without data science degrees was also surprisingly easy. It was easier, in fact, to train professional athletes to use data science software than it was to train data scientists to understand the intricacies of professional-level tennis. Better yet, it meant a group of people with expert-level knowledge was able to contribute to the otherwise inaccessible field of data science.
2. Learn how to use business intelligence software with advanced analytics features and smart data discovery.
Once you’ve got the new sources of data for new insights, you’ll need to know how to use the tools that make high-level data science a possibility for someone without a data science or statistics PhD.
What sort of features should you look for in software that can enable you as a citizen data scientist?
- Advanced self-service data preparation
- Behavioral analytics
- Graph analytics
- Location analytics
- Web analytics
- Smart data discovery
Advanced self-service data preparation has already helped Sears transform its business intelligence analysts into citizen data scientists. Sears invested in Platfora’s big data discovery software solution, granting access to 400 of its analysts. As a result, the analysts were able to use customer segmentation (normally an advanced data science task) to improve product recommendations for customers on the Sears website.
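To make "customer segmentation" a little more concrete, here is a minimal sketch in Python. It is not how Sears or Platfora did it; the customer records, the median cut-offs, and the segment names are all made up for illustration. It simply splits customers into four groups by comparing each one's order count and spend to the median.

```python
from statistics import median

# Hypothetical purchase history: (customer_id, number_of_orders, total_spend)
customers = [
    ("c1", 12, 940.0),
    ("c2", 1, 35.0),
    ("c3", 7, 120.0),
    ("c4", 2, 610.0),
    ("c5", 9, 480.0),
]

def segment_customers(rows):
    """Label each customer by comparing order count and spend to the median."""
    order_cut = median(r[1] for r in rows)
    spend_cut = median(r[2] for r in rows)
    segments = {}
    for customer_id, orders, spend in rows:
        frequent = orders >= order_cut
        big_spender = spend >= spend_cut
        if frequent and big_spender:
            segments[customer_id] = "loyal high-value"
        elif frequent:
            segments[customer_id] = "frequent bargain shopper"
        elif big_spender:
            segments[customer_id] = "occasional big spender"
        else:
            segments[customer_id] = "low engagement"
    return segments

segments = segment_customers(customers)
for customer_id, label in segments.items():
    print(customer_id, label)
```

Real segmentation tools typically use more variables and clustering algorithms rather than two hand-picked cut-offs, but the idea is the same: group customers who behave alike so that each group can get recommendations that fit it.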
Business intelligence vendor Alteryx offers an easy-to-use visual tool to do complex data blending. Rather than having to create a new data set to incorporate different types of data (say, an Excel file and an Oracle file), you can use Alteryx’s drag-and-drop function to reduce that lengthy data science task into a few clicks of a mouse.
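For a sense of what those few clicks replace, here is a rough sketch of blending two sources by hand in Python. It is not Alteryx's method or API. As assumptions for the example, a CSV string stands in for the Excel file, an in-memory SQLite table stands in for the Oracle database, and the column names and figures are invented.

```python
import csv
import io
import sqlite3

# Stand-in for a spreadsheet export (hypothetical columns).
spreadsheet_csv = """store_id,region
1,East
2,West
3,South
"""

# Stand-in for a database table (in-memory SQLite instead of Oracle).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (store_id INTEGER, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, 1200.0), (2, 830.5), (2, 410.0)],
)

def blend(csv_text, connection):
    """Join spreadsheet rows to summed database revenue on store_id."""
    revenue_by_store = dict(
        connection.execute(
            "SELECT store_id, SUM(revenue) FROM sales GROUP BY store_id"
        )
    )
    blended = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        store_id = int(row["store_id"])
        blended.append(
            {
                "store_id": store_id,
                "region": row["region"],
                # Stores with no sales rows get 0.0 rather than being dropped.
                "revenue": revenue_by_store.get(store_id, 0.0),
            }
        )
    return blended

blended = blend(spreadsheet_csv, conn)
for record in blended:
    print(record)
```

Even in this toy version you have to decide on the join key, convert types, and choose what happens to stores with no matching sales. A visual blending tool exists to take those decisions out of code.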
Like Platfora and Alteryx, Paxata’s software makes advanced data analytics a reality. I spoke with Farnaz Erfan of Paxata, who described how one of their customers, a consumer packaged goods company, brought PhD-level activities to analysts.
Paxata created “a complete self-service paradigm for the analysts,” which didn’t require the help of data scientists. The company used the self-service solution to improve inventory, supply and marketing. For instance, using Paxata “has reduced the time it takes business analysts to prepare transit time data from five hours a month to less than one hour.” Another source of savings has been the ability to “detect coupon fraud by identifying and matching offending email addresses.”
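The source doesn't say how the matching was done, so treat the following as one plausible sketch rather than Paxata's approach. It assumes that "matching" means spotting the same person behind slightly different spellings of an email address. The redemption records, the normalization rules, and the one-coupon limit are all hypothetical.

```python
from collections import defaultdict

# Hypothetical coupon redemptions: (redemption_id, email as typed by the customer)
redemptions = [
    (101, "Jane.Doe@example.com"),
    (102, "jane.doe+promo1@example.com"),
    (103, "sam@example.org"),
    (104, " JANE.DOE+promo2@Example.com "),
]

def normalize_email(email):
    """Lowercase, trim, and drop any '+tag' suffix from the local part."""
    local, _, domain = email.strip().lower().partition("@")
    local = local.split("+", 1)[0]
    return f"{local}@{domain}"

def find_repeat_redeemers(rows, max_allowed=1):
    """Group redemptions by normalized email and return the groups over the limit."""
    by_email = defaultdict(list)
    for redemption_id, email in rows:
        by_email[normalize_email(email)].append(redemption_id)
    return {e: ids for e, ids in by_email.items() if len(ids) > max_allowed}

suspects = find_repeat_redeemers(redemptions)
print(suspects)
```

Here three redemptions that look different at a glance collapse to one address, which is the kind of pattern an analyst would then review by hand before calling it fraud.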
While learning to use advanced analytics offers a lot of benefits, there will also be a learning curve. That said, it’s not too overwhelming: according to Gartner’s estimate, it should only take one to two weeks to get up to speed. Most vendors offer training, tutorials, and community forums with answers to common questions.
3. Make sure governance is set up.
Mo’ access, mo’ (data governance) problems. Or that could be the case, unless you make governance a priority. With more citizen data scientists accessing more data sets, there are more opportunities for data to fall into the wrong hands.
“Data governance is absolutely key,” explains Werner Krebs, CEO of data science consultancy firm Acculation. “You have to train your employees to understand that data is valuable, and help provide them tools and frameworks to help them collect it,” he continues. Fortunately, there are multiple frameworks for organizing that data, from Total Quality Management to ISO 9001 to “the various six sigma frameworks.”
Gartner puts similar emphasis on data governance: “Proper governance is crucial, as is guidance on how to understand data, its relationships and appropriate uses.”
There’s a lot of value in letting more people access more data, but those people need to understand how to access it, and how to keep it secure (don’t go reading sensitive documents in an area with unsecured public Wi-Fi, for instance).
4. Make sure your organization has “guardians” overseeing how you use your data.
A new role like citizen data scientist requires new rules, and roles, to manage it. The benefits you can get from advanced data preparation are definitely worth business-wide rethinking and reorganization. That said, you don’t want to get rid of the old data management roles.
One rule of thumb for data management in the age of the citizen data scientist is summed up by the old Girl Scout song: make new friends, but keep the old. In other words, keep roles like data steward and database administrator, but also add new roles, like Gartner’s idea of the guardian, to make sure citizen data scientists can use what they need responsibly.
Gartner defines “guardians” as people who “ensure data can be industrialized, safe and scalable.” In other words, they’re people who oversee data security, and also see to it that successful instances of citizen data science can be adopted by the entire business. They also bridge the gap between traditional data management roles (Gartner calls them “operators”) and the citizen data scientists using data in new ways (“innovators”).
How have you used citizen data science?
Do you play the role of a citizen data scientist? Have you used advanced data analytics to help your business save money or make money? If so, let me know in the comments below!