Data science is a field known for how much effort individuals put into their self-development. After all, it is a field that is either new, or a hodgepodge of several other fields (depending on who you ask and how you ask), so it makes sense that individuals would need to learn a lot on their own. Also, it is a constantly changing field, so needing to stay current is paramount.
Unfortunately, I think many data scientists ignore the most important skill to develop if they want to make an impact. That skill is presenting their findings in a way that drives others to take action. To understand why, you need to understand the general workflow of analysis.
Note: This post will include quotes or comments that people working in the industry made on posts I made about the subject. The post can be found here: Data scientists waste professional development time
In general, the data science workflow has four steps:
Framing the problem
Framing the problem may well be the hardest part. It is often the most important part, because only by doing it properly do you ensure that results come from the work. If you frame your problem well, both positive and negative results from your tests or study will have business value. If you do not frame it properly, the best model in the world may not provide any value whatsoever.
In my opinion, a lack of understanding between analysts and the business is a big reason why framing issues happen. Tri Ngo writes, “Do not be afraid to assume that business partners or requesters have no idea what they are asking for. I think this is where consultative skills and interpersonal come in handy, how to handle the dialogue exchanges tactfully, how to address pros and cons of the problem, and to find/provide alternative solutions without the need to build complicated models.”
Wait what? I’m not supposed to build complicated models? Maybe not. The simplest solution is usually the best one.
Obtaining & scrubbing & exploring the data
This part of the process is usually tedious, sometimes fun, and always time consuming.
Part of “framing the problem” should include verifying whether or not the needed data actually exists. Obtaining should be thought of as the actual process of getting that data. Scrubbing that data is the hard work that needs to be put in before you get to have fun with it. It is like paddling out before you surf; you need to work to get through the whitewater and breaking waves to reach the lineup, where you are ready to catch a clean wave. If you don’t put in the work to get your data right, your models will either not work or produce faulty results. It is the analytics equivalent of getting caught inside while trying to paddle out…like this guy below.
Data exploration is often an iterative process with obtaining and scrubbing data. That is one of the ways to identify what needs to be scrubbed, what is missing, and what you may be able to do with the data. Many times initial hypotheses will not survive data exploration; this saves tons of modeling time searching for a solution that does not exist.
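As a rough illustration of that exploration loop, here is a minimal sketch in Python that counts missing and non-numeric values per column. Everything in it is an assumption made up for the example: the sample data, the column names, and the helper profile_columns are hypothetical, and a real project would likely lean on a dataframe library instead.

```python
import csv
import io

# Hypothetical sample data, invented purely for illustration.
SAMPLE_CSV = """customer_id,region,monthly_spend
1,west,120.50
2,,87.00
3,east,
4,east,not available
"""


def profile_columns(rows):
    """Count missing and non-numeric values for every column."""
    report = {}
    for row in rows:
        for column, value in row.items():
            stats = report.setdefault(column, {"missing": 0, "non_numeric": 0})
            text = (value or "").strip()
            if not text:
                # Empty cell: nothing to parse, just record it as missing.
                stats["missing"] += 1
                continue
            try:
                float(text)
            except ValueError:
                # Present but not parseable as a number.
                stats["non_numeric"] += 1
    return report


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
report = profile_columns(rows)
for column, stats in report.items():
    print(column, stats)
```

The non-numeric count only matters for columns you expect to be numeric, such as the made-up monthly_spend here. A quick pass like this is often enough to show what needs scrubbing before any modeling starts.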
Analysis & modeling
Performing analysis and building models and solutions is the fun part. If you don’t enjoy modeling, just quit now and find something else to do. I’m totally serious: you will hate the job and, as a result, be mediocre at best, not find fulfillment, and not make the money you want.
Because analysis and modeling is the fun part, and possibly because most people from the outside think that is the most important part of the job, lots of self-development effort is focused on this step. It makes sense; you do more of what you enjoy doing, especially if you think it is doing you good.
Unfortunately, newer and better models typically get you only incremental improvement over existing ones. It is rather rare that a new technique can solve a problem that was previously unsolvable. What delivers outsized value more often is taking existing analytical work and better framing the problem or solution.
I am not suggesting that data scientists should cease working on improving their modeling skills. Far from it; they need to keep learning new things. I am only saying that anecdotal evidence (and my own tendencies) suggests that most of them over-index on it.
Note: this applies to data scientists working on business problems and may not apply to some things such as ML/AI research. If you’re looking into new ways to use algorithms to detect cancer, you’re probably better off just figuring out new ways to keep refining that.
Developing and “selling” the solution
Once you’ve “solved” the problem, or discovered more about the root causes, it is time to develop a way to fix the problem. That part can be tough, but is usually rewarding. Sometimes it leads to a need to totally redesign a system from the ground up…the engineer’s dream!
The sad part about data science is that in many organizations it does not provide nearly the impact that it can. Data scientists and analysts come up with fantastic ideas only to have the business leaders not take action to implement them. There can be scores of reasons for this, but often it is because the analyst does not make a good enough case to “sell” the solution.
I was crestfallen when one of my biggest analytical achievements failed to gain support among executives. I had spent months working on my own original idea, come up with a novel method of solving the problem, developed a case that was mathematically airtight, and identified a fix that would save millions of dollars per year. It was rejected the first and second times I pitched it because executives “felt” it was wrong.
Of course it was partially their fault that they did not understand, but I couldn’t control that. What I could control, and didn’t, was how well I laid out my “sales pitch” on how the solution was correct and how much it would save the company.
There are lots of ways to improve communication around complicated ideas. My personal favorite is what I learned as a management consultant. I wrote a course about it here (Present Like A Management Consultant), but honestly my purpose in this article is not to try to sell more courses. In fact, it is now free if you sign up for our mailing list (it will email you a coupon code).
The management consultant presentation style built around the pyramid principle is the best way to discuss complicated recommendations. If you follow the method I outlined, it will dramatically increase the number of projects and solutions you get approved.
If you have even the slightest feeling I am correct, why not take the course? It is ~2 hours and now free. Also, if you do not believe me, listen to Jonathan Nolis’ interview on the DataCamp podcast, where he makes exactly the same point.