Choosing the Right Data Scientists

Matching software skills to your technical culture is essential


Every organization wants to hire the best talent. When it comes to data science and data mining professionals, many college and university programs specialize in producing these individuals. As the graduation date draws near for the current class, you may be eager to bring some of that new talent on board. But before interviewing begins, you’ll need to understand not just your technical needs but also your technical culture. All too often, a clash of technical cultures can undermine a high-performing team.

If you’re a senior manager or director like me, you’ve probably been working for more than 20 years. We completed our formal education at a time when computers were just being introduced into the workplace. I remember vividly staying up late writing SAS and SPSS code as a student and then submitting jobs to the university’s mainframe computers for processing; getting results meant walking across campus to the computer centre to pick up my output. Statistical software at the time was all commercial and off-the-shelf, but those who were so inclined could write their own routines in COBOL, Pascal, BASIC or any number of other programming languages.

Fast forward to 2017. When interviewing recent graduates for data science roles, we look for people who are critical thinkers, analytical, good communicators and, of course, have a sound understanding of statistics and data techniques. We ask about their knowledge of statistical methods and assess their approach to solving complex problems. At some point, we inquire about their expertise with different kinds of software. Invariably, the applicant says they learned R for statistics and Python for implementing customized analytical routines, and then, with a far-off look, supposes that they may have heard about SAS or SPSS in one of their classes.


Today, open-source software is increasing in popularity, and it dominates the skillsets of graduates. Universities have embraced it because it is free, there are thousands of libraries of algorithms to draw on and an extensive network of user forums offers valuable opportunities for discussion and learning. The software is continually improved and new releases are published at least once a year. Just as important, there is an active community of developers enhancing the software and adding new features all the time. Even an old SAS guy like me can see there are compelling reasons to adopt open-source software.

Data science is a discipline that can be pursued by pointing and clicking through the menus and options offered by the software’s developers, or it can be fully programmed by writing code, importing libraries of algorithms and inventing new processes and methods, all to improve the accuracy and efficiency of statistical models. Whether you work through menus or write code at the command line, the full range of data science and data mining algorithms, from the simplest to the most complex, is accessible either way; this includes analytical functions like cross-tabs, regression, clustering and decision trees. Deciding which software package to use often comes down to which one you were first exposed to and had some training on.

Perhaps you have already faced this challenge in your organization, trying to balance the software preferences of old-school staff and new up-and-comers. There are bound to be disagreements about which software and methods are better, faster and easier to implement. Management may not understand the flexibility of the new software and methods, and the incoming employees may not fully appreciate the years of experience the organization has developed with commercial software. Both groups need to understand that commercial and open-source software each have advantages and disadvantages.


But ultimately you will have to decide whether your organization wants to continue paying to license software from another company, which bears the costs of maintaining the software, customer support and training materials. While open-source software is free, it won’t come with a dedicated team to support your program.

And then there’s training. Is your organization prepared to invest in training new hires? How long will it take an individual to learn new software and possibly a new statistical language? Are they capable of “programming” or are they reliant on selecting from menus? Hiring new people always involves some investment, but the learning curve for data scientists can be especially steep if they need to migrate from one approach to another.

As you interview freshly minted data scientists, realize that your hiring decision must weigh the candidates’ skills, your organization’s culture and your vision for the organization. Whether you choose to hire old school, new school or hybrid, the only sure thing is that the skills required of a data scientist will continue to evolve. And that means a candidate’s flexibility and willingness to adapt may be even more important than their technical skillset.


Danny Heuman is Chief Analytics Officer at Environics Analytics, a marketing and analytical services company.