I recently read a Harvard Business Review (HBR) article, “You Need an Algorithm, not a Data Scientist”. Other articles present similar arguments. I disagree. Data Scientists and automation (data products, algorithms, production code, whatever) are complementary functions. What you actually need is a Data Scientist and then an algorithm.
Data Science supports automation
Good Data Science supports automation. It tells you:
- what you didn’t already know about the data (profiles, errors, nuances, structure)
- what an appropriate algorithm should be, given what you now know about the data
- how your data should be prepared for that algorithm (removing correlations, scaling variables, deriving new variables)
- what the measurable expectations of that algorithm should be when it is automated in production
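To make the list above concrete, here is a minimal sketch of the profiling and preparation work it describes, using only the Python standard library. The column names and values are made up for illustration, and the helper functions (`profile`, `zscore`, `pearson`) are hypothetical names, not part of any particular tool.

```python
from math import sqrt
from statistics import mean, pstdev


def profile(column):
    """Summarise a numeric column, counting missing values (None) separately."""
    present = [v for v in column if v is not None]
    return {
        "count": len(column),
        "missing": len(column) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": mean(present),
    }


def zscore(values):
    """Scale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))


# Hypothetical customer data: spend in pounds and the same spend in pence.
spend_gbp = [10.0, 20.0, 30.0, 40.0]
spend_pence = [1000.0, 2000.0, 3000.0, 4000.0]
visits = [3, None, 5, 4]

visit_profile = profile(visits)               # reveals a missing value
scaled_spend = zscore(spend_gbp)              # prepared for a scale-sensitive algorithm
redundancy = pearson(spend_gbp, spend_pence)  # close to 1.0, so one column is redundant
```

Even on toy data like this, the profile surfaces a missing value and the correlation check shows that two columns carry the same information. These are exactly the kinds of findings that should shape the choice of algorithm before anything is automated.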
Data Science and Automation are Complementary
The author, who works for an analytics vendor, makes the following points. I respond to each in turn:
- Companies are increasingly trying to analyse more of their data to find value, and are hiring people (data scientists) to do this work. This people-centric approach does not scale.
  - The point of Data Science is to be a service. It can quickly run agile experiments that quantify and investigate business hypotheses about data, and so help inform the rollout of products. Data Science therefore informs investment decisions in software development, software purchase, software tuning, etc. It was never meant to scale up to replace automation.
- Some patterns are too subtle for humans to notice. The author gives the example of a slowly changing customer profile, which would go unnoticed in a manual examination of the data. Algorithms, however, can monitor this data continuously and at scale, and so are better.
  - This is partially true. Algorithms can certainly work day and night, processing refreshed and streaming data faster than any human could hope to. However, if the system being analysed is not well understood, appropriate analyses cannot be chosen and tuned before ‘switching on the fire hose’. This understanding, modelling, analysing and tuning is the job of the Data Scientist, working in collaboration with the domain expert. The Data Scientist does this in part by using statistical and machine learning algorithms.
- Modern tools “require very little or no human intervention, zero integration time, and almost no need for service to re-tune the predictive model as dynamics change”.
  - The vast majority of time on a data project is spent understanding and cleaning the data. Be very sceptical of claims that automation software can simply be ‘turned on’ without the necessary understanding of the data and the problem domain. Data is just too varied.
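The slowly changing customer profile is a good illustration of how the two functions fit together. Below is a minimal sketch of an automated drift monitor, assuming a single numeric metric (a hypothetical weekly basket value). The class name `DriftMonitor` and the values chosen for `baseline`, `tolerance` and `window` are illustrative assumptions. In practice those three numbers are what the Data Scientist and domain expert would have to establish from the data before the monitor could be switched on.

```python
from collections import deque
from statistics import mean


class DriftMonitor:
    """Flag when the rolling mean of a metric leaves a band around a baseline.

    baseline, tolerance and window are assumed to come from offline analysis
    by a data scientist and domain expert; they are not learned here.
    """

    def __init__(self, baseline, tolerance, window):
        self.baseline = baseline
        self.tolerance = tolerance
        self.recent = deque(maxlen=window)

    def observe(self, value):
        """Add one observation; return True if the rolling mean has drifted."""
        self.recent.append(value)
        if len(self.recent) < self.recent.maxlen:
            return False  # not enough data yet to judge
        return abs(mean(self.recent) - self.baseline) > self.tolerance


# Hypothetical weekly basket values that creep upwards by 1 each week.
monitor = DriftMonitor(baseline=50.0, tolerance=5.0, window=4)
weekly_basket = [50.0 + week for week in range(12)]
alerts = [week for week, value in enumerate(weekly_basket) if monitor.observe(value)]
```

No single week-on-week change here looks remarkable, which is why a manual check would miss it, yet the monitor starts alerting once the rolling mean moves outside the agreed band. The automation does the watching, but with a poorly chosen baseline or tolerance it would either alert constantly or never alert at all.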
The HBR article poses an interesting challenge. Are completely automated algorithms the future? Get in touch and let me know your thoughts.
You can read more about how to do agile Data Science that transfers from the ‘lab’ to the ‘production factory’ in my book, Guerrilla Analytics: A Practical Approach to Working with Data, and get the latest news at http://guerrilla-analytics.net.
You Need an Algorithm, not a Data Scientist, Harvard Business Review
Why You Don’t Need a Data Scientist, Ubiq
To Work with Data, You Need a Lab and a Factory, Redman, T.C. and Sweeney, B., Harvard Business Review, 2013