Can you be a data scientist without knowing how to code? Nope. But if it is that simple, why did I write this article?
It turns out the answer is a little more nuanced than that. In my view, a data scientist is someone with a scientist-level understanding of how data can be used. Scientist-level skills are not something you can take a shortcut on.
Not all scientists match stereotypes
There are several different “flavors” of scientist in the real world. While there are of course university researchers, field biologists, and pharmaceutical developers, there are also practitioners like the chemists at Procter & Gamble working on what your Old Spice body wash smells like.
Data science is similar in that respect. Just as the perception that all scientists wear white lab coats is wrong, the perception that all data scientists work at internet companies like Facebook and Google is equally wrong. Listen to some of the guests on the DataFramed podcast and you’ll see they come from backgrounds including consulting, non-profits, medicine, and retail.
Where am I going with this?
Lots of professionals need to use science without being scientists
The science of brewing beer is based on chemistry that is simple at a high level but incredibly nuanced in the details. Large brewers and their suppliers employ scores of chemists to ensure consistent quality and to tinker with new formulas.
However, one does not need to be a chemist to brew good beer at home. Knowledge of the chemical principles, along with standardized ingredients and processes, can produce remarkably good beer. (I used to be a good homebrewer despite struggling in Chemistry 101 & 102.)
I see data science in the same way. I do not refer to myself as a data scientist because I do not think my skills are Heisenberg-level. However, I do consider myself a strong data analytics practitioner (to be clear, I was a strong SAS programmer a couple of years ago and am currently a Python hobbyist).
New tools and programs such as Microsoft Azure, KNIME, and Alteryx make it possible to perform complex analyses without writing a line of code. However, practitioners still need much of the underlying math background, and the math is far harder to learn well than the coding.
Alteryx is to data science as WordPress is to web development
There is an example of something similar in the tech world. Twenty years ago, creating a website that was anything more than one page of static content required coding skills.
In 2018, amateurs can create excellent sites with little to no web development background using WordPress (well, maybe some amateurs can…you all can see www.theanalyticsdude.com is mediocre at best). However, in order to call yourself a web developer you still need a solid background in HTML, CSS, PHP, etc.
Data science without coding: is dropping this barrier to entry awesome, awful, or somewhere in between?
Whether this is a good or bad thing will largely depend on who you ask. Many current and “credentialed” data scientists will probably see it as somewhat negative. A higher supply of labor typically means lower salaries or lower salary growth.
However, that might not be the largest negative effect. Statistical models, including modern machine learning ones, typically involve some mathematical assumptions. If these assumptions are not valid, the entire model is useless. This is bad when it means lost time, but is even worse when it creates a model that produces results that seem useful but in fact are not accurate.
When statistical models break, it is not always obvious. Sure, there are times when code won’t compile or results turn out obviously wrong. However, there are also plenty of times when the results seem like they could be correct but have no mathematical basis whatsoever.
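Here is a small, made-up illustration of that kind of quiet failure. It is only a sketch: the data is invented (y is simply x squared), and the variable names are my own. A straight-line fit assumes the relationship is linear, which is false here, yet the usual headline number still looks great.

```python
# Hypothetical illustration: fit a straight line to data that is actually curved.
xs = list(range(11))            # 0, 1, ..., 10
ys = [x ** 2 for x in xs]       # the true relationship is quadratic, not linear

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Ordinary least squares for a single predictor.
sxx = sum((x - mean_x) ** 2 for x in xs)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

predictions = [intercept + slope * x for x in xs]
residuals = [y - p for y, p in zip(ys, predictions)]

# R^2: the share of variance "explained" by the line.
ss_res = sum(r ** 2 for r in residuals)
ss_tot = sum((y - mean_y) ** 2 for y in ys)
r_squared = 1 - ss_res / ss_tot

print(f"fit: y = {intercept:.1f} + {slope:.1f} * x, R^2 = {r_squared:.3f}")
print("residuals:", [round(r, 1) for r in residuals])

# Extrapolating just past the data shows how far off the line is.
x_new = 15
print("predicted:", intercept + slope * x_new, "actual:", x_new ** 2)
```

The R-squared comes out at roughly 0.93, which would look like a success on a dashboard. The residuals tell the real story: they are positive at both ends and negative in the middle, a clear sign the linearity assumption does not hold, and the prediction at x = 15 misses badly. Nothing crashed and nothing looked obviously wrong.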
I’ll return to my homebrewing analogy. Thankfully, there really is not a whole lot that can go wrong with home-brewed beer that can actually hurt someone. If bad bacteria are present, the beer will taste and smell awful and thus no one will drink any. Unfortunately, moonshine (homemade liquor) can be dangerous if made improperly. While stories of people going blind from drinking moonshine are largely old wives’ tales, there are still bad health effects from poorly produced ‘shine.
Lowering the barrier to entry for people who have a proclivity towards data but are either scared of learning to code or feel they don’t have the time (that is an excuse, and not a particularly good one) is a net positive. Not only will there be more “data aware” workers out there, but most of their output should be better. For instance, a financial model workflow in Alteryx is light years more straightforward than one in Excel, and easier to troubleshoot.
You cannot be a data scientist without knowing how to code. That will likely never change. However, you can do lots of good things with data, including building some sophisticated models, provided you understand the math.