As both data and data management challenges seem to grow exponentially, hiring qualified people for data science roles is more critical than ever. In a recent report, IBM estimated that the number of job postings in the broad data and analytics category will grow by 15 percent to 2.7 million in 2020. And data scientist/advanced analyst positions—a 2 percent slice of the broader category—will approach 62,000 by 2020, a 28 percent increase over 2015. To fill this need, many universities and colleges have jumped on the data analytics train and offer programs that teach the methods of data science, though many of these initiatives are still in their infancy. The resulting talent gap leaves many data-driven companies facing stiff competition for the select few graduates who do have the right skills.
This reality has forced our data and analytics company to analyze the key qualities that make a good data scientist—and the results may surprise you.
Over the past year and a half, we have hired six researchers to work on our data development team, which is responsible for maintaining an extensive suite of data products in Canada and the U.S. The members of this team fall into the broad category of data scientists, though “data alchemist” may be a more accurate title because building the company’s data products requires a combination of science and art. And we know we will only rarely find someone with experience in, and knowledge of, a long list of techniques and methods.

Instead, we focus on finding people with one or two key technical skills who can demonstrate an ability to troubleshoot. It is amazing how important, yet undervalued, the ability to troubleshoot is in the marketing analytics and geodemographic industry. Given our company’s rapid growth and commitment to developing new products while enhancing old ones, it is critical that team members be able to think on their feet, juggle many projects and get to the heart of technical issues quickly. These are the essential qualities of a good troubleshooter.
I find comparing data scientists to blacksmiths helpful in illustrating why the ability to troubleshoot is one of the key attributes of an effective data scientist candidate. During their heyday in the pre-industrial economy, blacksmiths were critical workers, skilled craftspeople who constantly had to adapt their techniques to work with metals whose quality varied widely from one piece to the next. In today’s post-industrial economy, data science is increasingly becoming an underpinning of nearly every organization as C-suite executives demand authoritative data to make informed business decisions. Like blacksmiths, data scientists must use a wide variety of techniques and technologies to crunch and analyze data. And just like blacksmiths, data scientists must cope with the fact that the fundamental input of their craft, data, is highly variable. No two datasets are the same, which means data scientists must constantly adapt their techniques to deliver a quality analysis. That kind of alchemy requires the capacity to troubleshoot.
Ultimately, troubleshooting comes down to three critical attributes: attention to detail, critical thinking and creativity. Our new hires quickly find themselves drawing on all three. For example, one new staff member working with a legacy algorithm used to build one of our more complex economic datasets discovered that it was not behaving as expected when applied to the latest data. As a result, he had to interrogate both the incoming data and how the algorithm handled it. Only by digging into the details and thinking critically about the problem was he able to adapt the legacy algorithm to handle the latest data, improve its processing time and produce a quality output.

Another group of new hires had been working with several senior researchers to adapt a set of methodologies used for a Canadian data product to develop a similar product for the U.S. While the overall framework for the Canadian data applied to the U.S. data, the devil is always in the details. Adapting the existing methodology to the unique attributes of the new data pushed the new hires and senior researchers alike to think creatively about how to implement certain types of modelling frameworks. After a few days of effort, they were able to reconceptualize components of the modelling framework, significantly improving the reliability of the core models underpinning the data product.
Because we aren’t looking for data scientists with a laundry list of specific skills, our data development team brings together a wide range of expertise. You might expect the team to be full of statisticians, but in fact the group consists of geographers, mathematicians, economists, engineers, sociologists and, yes, a few statisticians. Their ability to troubleshoot and their diverse backgrounds are the real strengths of the team. As a group, they can draw on different techniques and theories to solve problems. Another benefit is that, because each new hire has a unique set of skills, they can cross-train one another, multiplying the number of tools in everyone’s toolbox.
Based on this experience, I’ve come to believe that the ability to troubleshoot is a critical skill for anyone doing data science. And there are tangential benefits as well. New hires who can troubleshoot become productive members of the team faster than others because they can learn new techniques and adapt existing skills quickly. And the pool of potential candidates suddenly opens up when an organization no longer fixates on one technique, software package or scripting language when looking for new talent. Troubleshooting matters more than knowing R or how to train a random forest: it transcends any specific approach, and unlike particular languages or software, it will never become obsolete.
—————————
Sean Howard is senior vice president of research and development at Environics Analytics, a marketing and analytical services company.