Should you learn R or Python to get started in data science?
In my extensive study of the sheer mass of articles and LinkedIn posts about R vs Python I have concluded that people spend far too much time thinking about where they should start.
114,000,000 results on Google for Python, 828,000,000 for R. And on Bing…haha, Bing, that’s hilarious.
Yeah, I know, that is not actually an analysis. One, or two, data points without a reference point do not mean anything. That is the sort of crap that politicians use to make their points that may not withstand any analytical scrutiny whatsoever.
But you know that I am right.
The reason you are spending too much time on the decision is that the decision frankly is not that important. The reason why revolves around three inescapable facts.
- They are both powerful, effective, and relatively user friendly
- Both R and Python can do 90+% of the tasks that you will need before branching out into something different
- Many modern analysis suites will let you use both at the same time, and if you take this to a serious and professional level you will probably wind up using both anyway
And your bonus 4th “fact”: If you are spending a ton of time researching this it probably means you don’t know enough to appreciate the finer differences of what you will or will not need.
But first, before we get into the framework, let me lay out two baseline assumptions (trust me, both are correct):
- If you have someone who knows either language in depth and can answer your questions in person, choose that language. The amount of time you can save by asking questions is worth any trade-off in whether the language is optimal for you.
- In-depth knowledge of one is better than mediocre knowledge of both. If you know Python well, but suddenly need to use R for something, you can always google, “How do I do ____ in R.” Trust me, you’ll find what you are looking for, and if you actually know Python it will be easy enough to understand. You would not believe how fast you can get in-depth answers on forums like Stack Overflow.
Here is how to choose between Python and R:
Why are you seeking to learn an analytics language?
For work: If you are looking to move to an analytics role in your current company or industry, learn what they are using at your company. You’ll be able to ask questions of your analysts and test out what you can do with real world (aka messy) data.
If you are looking for a job, check to see which is more popular in job postings in that industry. If you can’t draw any conclusions from that, flip a coin and then do something.
For school: If you need to learn it for a class, pick what everyone else is doing (see point above about asking questions). If you have freedom to do what you want, e.g., grad student research, check in your department what they think is best. This relates back to the question point, but also to the fact that as a grad student doing research there really might be something optimal for you, e.g., R for statistical processing. If none of that applies, flip a coin and then do something.
For a hobby: Which sounds more interesting to you: exploratory data analysis and statistical modeling (R) or web scraping and machine learning (Python)? If they’re both equal, or you don’t really understand what I just wrote, flip a coin and then do something.
In case you did not realize what I was doing there, the most important part of the decision-making framework is doing something. The decision is far less important than taking action to follow through on the decision. You can’t get into paralysis by analysis before you’ve actually become an analyst. That is a terrible precedent.
Like any skill, you need to work at it. Gary Vaynerchuk always says, “you can’t read about doing pushups, you actually have to do them.”
If you still want a little more info, or want to see my scowling face telling you this, you can check out these videos on the subject:
You can also look at SAS if you are looking for a job in a bigger company. SAS is a great platform and from what I’ve been told they are opening it up to using R and Python in addition to the native SAS code.
What, that is your whole framework?
You may be thinking, that’s way too simple, there is way more nuance than that.
My retort: you are overthinking it, just go and do something. You are wasting time that you could have been using to install RStudio or Anaconda.
How to learn Python or R for data analysis
First, the best way to learn something is to need to use it. If there is a problem you need to solve, learn what you need to do in order to solve the problem.
Second, I am a fan of learning from online material. If you are starting from the beginning, and have time to learn the in-depth nuances, Coursera and edX offer top-level university classes in data analysis. Some of them are created by the people who are literally inventing some of the methods or real world uses. Most of them are free to take but seek money for a certificate of completion.
A faster, but less in-depth, version would be to take some of the classes on a site like Udemy. In good classes the quality and depth of instruction for $10-15 is exceptional. These classes are from real world practitioners with extensive track records.
Python – Frank Kane’s class is well done. He spent years building recommendation engines and other analytical products at Amazon and IMDb, so his knowledge of web-based analytics is top notch. I’ve negotiated this $10 rate for Analytics Dude readers (I don’t get a dime, it’s just a discount for you guys).
R – Kirill Eremenko is a former data scientist at Deloitte, so his experience spans numerous industries and he has delivered results. I recommend his podcast Super Data Science. I am trying to negotiate the same deal for you that I got from Frank.
Is that all I need to get started?
Not really, I skipped a step for ease of storytelling. You also need to learn SQL (pronounced either Ess-qyoo-ell or sequel…I prefer sequel).
SQL is the language basically every relational database on the planet speaks. SQL queries are how you get data out of a database in order to analyze it.
There is a benefit to learning SQL first. In addition to being foundational, it is also easy to write a query. In the military we often referred to the three stages of skill development as “crawl-walk-run.” SQL queries are the crawl stage. Check out my overview of SQL and resources to learn it for free here: How to Learn SQL the Easy Way and Why You Should.
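To give a feel for how small a first query can be, here is a minimal sketch using Python's built-in sqlite3 module. The "orders" table, its columns, and the three rows are made up purely for illustration; a real database at your company will have its own tables and names.

```python
import sqlite3

# Hypothetical example data: an in-memory database with a made-up "orders" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 20.0), ("bob", 35.5), ("alice", 14.5)],
)

# The SQL query itself: total spend per customer, largest total first.
query = """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY total DESC
"""
rows = conn.execute(query).fetchall()
conn.close()

# Each row comes back as a (customer, total) tuple.
for customer, total in rows:
    print(customer, total)
```

The part worth studying is the SELECT statement in the middle. The Python around it only sets up the toy data and prints the result, and the same query idea carries over if you later run it from R or directly in a database tool.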
SQL stays important no matter how good a data scientist you become. I’ve heard many experienced data scientists say they wish they had paid more attention to SQL when they were starting.
So is that all?
Of course not, but it is a start. Analytics, like most subjects, is a bit like fractal geometry (if you don’t know what that is, I present Benoit Mandelbrot). You can go deeper and deeper, to a near infinite level of detail, but at all levels there is a familiar feel, theme, or foundation. Get the basics down and then you can think about going as deep as you want.
Let me know what I missed. I’d love to hear why this article is crap, or why you think it is brilliant (spoiler alert, it isn’t).
Any questions I did not answer?