How to Deliver on Your Data Strategy

By Salvatore Salamone

RTInsights

--

Sponsored by Anaconda and Intel

A discussion about the challenges data scientists face, why they’re looking to Python for help, and the need for enterprise-class features and support.

Businesses constantly strive to transform operations and differentiate their offerings to stay ahead of the competition. At the heart of most of these efforts is the need to rapidly develop and deploy many new data-centric applications. Unfortunately, projects are often slowed or delayed because data scientists are swamped with an ever-growing workload, or because of inefficiencies in the handoff between development and production.

Increasingly, Python, by virtue of its ease of use and powerful automation capabilities, is being used to speed the creation and deployment of data-driven applications. However, a limiting factor is that such open-source tools may not meet the performance, security, and replicability demands of production business environments.

To sort through these and other issues, RTInsights sat down with Stanley Seibert, Director of Community Innovation at Anaconda, and Heidi Pan, Director of Data Analytics Software at Intel. We discussed the challenges data scientists face, why they’re looking to Python for help, and the need for enterprise-class features and support.

Getting the Most Out of Data Scientists

RTInsights: With data scientists in such high demand, how can companies help them work to their fullest potential?

Seibert: Early data scientists were jacks of all trades, doing many different tasks. They would be involved in data prep, data quality checks, modeling, and then figuring out how to deploy the models. They did a bit of everything, which was part of the reason they were in such high demand. It was hard to find people with skills in so many different areas. But the field has matured. We’re starting to see specializations emerge, and we’re seeing more organizations build out teams of people who each bring a specialized skill set.

You can have someone who focuses just on data prep and data quality, someone who focuses on the model in question and how to validate it, and someone else who focuses on getting models into production, looking at what’s required to go from research prototype to something you can deploy at scale. Each of these areas is becoming its own subspecialty. Getting more out of your data scientists might mean narrowing the definition of what a data scientist is, and then augmenting them with people in other roles.

Pan: Adding to that thought, data scientists are the most expensive part of the pipeline, so we really do want them to be productive. We need to give them easy access to the tools and the vocabulary they already use, and we need to enable them to understand, explore, and analyze data quickly. One of the big issues is that data scientists spend a lot of time sifting through the data to figure out which subset of it to use.

We want data scientists to iterate quickly over large amounts of data. The role that Intel is playing is to bring modern compute power to data science. There is a lot of parallel hardware and a lot of large-memory machines available, and in the grand scheme of things they’re relatively cheap compared to data scientists. How do you throw machines at their problem and make their workflows much faster? At the end of the day, you get the insights, and you get the models, much more quickly.
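As a rough illustration of what throwing machines at the problem can look like in practice, the sketch below swaps pandas for Modin, a library that parallelizes familiar dataframe operations across available cores. Modin and the dataset file name are assumptions made here for illustration; neither is named in the interview.

# Minimal sketch: parallelizing a typical exploratory workflow by using
# Modin as a drop-in replacement for pandas (an illustrative choice only,
# not the specific tooling discussed in the interview).
import modin.pandas as pd  # same API as "import pandas as pd"

# Hypothetical dataset used purely for illustration.
df = pd.read_csv("transactions.csv")  # the read itself runs in parallel

# Familiar exploratory steps stay unchanged but execute across all cores.
summary = df.groupby("customer_id")["amount"].agg(["count", "sum", "mean"])
print(summary.head())

The point of the drop-in approach is that the data scientist keeps the same vocabulary and workflow while the hardware does more of the work.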

Continued on RTInsights.com
