Learning R from Python — 5 lessons I wish I knew a week ago

(or, why I wish I’d read the docs)

Andreas Varotsis
Towards Data Science


After fighting it for far too long, last week I finally decided it was time to stop working entirely in Python and figure out this R nonsense I keep hearing about. There are endless articles about which is best, so I won’t re-tread that here (suffice to say you can’t do everything in either), but there are a few niggles that I wish I’d learnt before I started. What I did instead was dive in head first and write some very, very messy code exploring how burglary and robbery shifted under COVID lockdown in London.

With that in mind, here are 5 things I wish I knew before I started coding. Hopefully, you’ll find them useful and won’t get as frustrated with some of these as I did.

photo by Kelly Sikkema on Unsplash

1. Python is broad, R is “focused” — so use the core tools and don’t worry about it

Python is all things to all men: it’s a sprawling ecosystem of a language that can do everything from training your neural network to hosting your website to powering your robot.

R is not that. While it can build your website with the right shoving, it’s a tool for research and analysis first…and that means a lot of the options you face when exploring Python aren’t present in R. That’s a blessing. You’ll be using the same tools as everybody else, and they just work.

How many endless posts have you seen asking what the best Python IDE is? Whether you should be using Conda or Pip? Anaconda Core or Conda Forge? There are a million questions, and rarely a “right” answer, because it depends on what you’re doing.

In R, there is none of that: you’ll download R, which works out of the box (with one exception I’ll touch on later), and you’ll be running RStudio. Everything will work-ish, and it’ll be *fabulous*. When you do install a library (there’s a graphical interface and everything), you’ll install it globally. 90% of the time, you’ll be fine.

R Notebooks make combining code and beautiful visualisations easy (image by author)

The core toolset in R is actively maintained, free, built for research, and great. Starting there is just fine.

2. You can Google anything in Python — you might need to go search a bit wider for R. Go read the docs.

If you’re anything like me, you do a lot of Googling for functions you know should exist but can’t quite remember the code for. “Remove just first duplicate row”, “remove white space from column names”…you know, stuff you probably should just figure out, but it’s easier to grab some quick code from the first Stack Overflow post. In Python, there will be 100 posts that fit your exact need. There might not be 100 for R, and sometimes you might not find any at all.

It’s hard to measure this (after all, R has nearly twice as many questions on Stack Overflow as Pandas), but I found quickly getting informal answers far harder than I expected. This is also true for tutorials: there are umpteen thousand Medium posts walking you through training Random Forests in Python, and far fewer in R (though that’s partly down to how hard it is to search for a single letter).

I suspect this reflects how people learn: R is taught by academics and statisticians, while Pandas is picked up by semi-programmer hacker-wannabe types like me (no offence to either group intended). The latter live on Google, the former have libraries, classrooms and textbooks.

The good news is those textbooks are often free, online, and well managed and curated. In fact, there are even whole classes with their content available, and some function as interactive tutorials. So take an hour, run through a tutorial, and consider reading an actual book before diving into Stack Overflow questions. You might not regret it. I’ve left a list of awesome resources below.

3. Oh yes, Tidyverse

I know, I know…I said you wouldn’t need to worry about toolsets, but I told a tiny, teeny lie: Tidyverse is the exception that proves the rule.

Tidyverse is a curated, well-maintained ecosystem of libraries for data science and data manipulation in R. The fact is, if you’re learning R anywhere, you’re probably already learning Tidyverse: the best-known R tools like ggplot2, dplyr and tibble, which make data manipulation and gorgeous visualisation a real breeze, are all part of it. If you’re coming from Python, you might be wondering whether or not it’s all worth learning. Don’t worry about it: Tidyverse is great, it’s essential, and you’re probably learning it anyway.

4. Notebooks are still awesome, but they’re a little different

Remember how I told you you’d love RStudio? If you work in Jupyter Notebooks, that’s even more true, because R Markdown is amazing, and with a bit of tweaking, RStudio is a fantastic competitor to JupyterLab.

Once you’ve asked RStudio to open a notebook and set the preview to display in-window, you’ll probably find an interface that’s awfully familiar. You can write notes in Markdown, add pictures and equations, execute per cell…give or take, anything you’d do in Jupyter translates well. You can even fit a convenient documentation viewer in there (which I recommend).

There are a few differences that take some getting used to. R Markdown supports a myriad of output formats, and some will “knit” your document together at the end. Play around with the options, but the default html_notebook will probably be fine. Executed code also gets removed more easily than I’d like, so depending on your needs, you may want to explore the caching options.

In some ways, it’s also so much better. R Markdown makes it so easy to hide specific bits of a cell, add a table of contents, and produce pretty HTML or PDF documents. There is even some nice functionality for dashboards and websites.

Once again, R rewards a bit of patience on this stuff. Read the R Markdown documentation, learn about the fancier options, and enjoy your glorious notebook. Add a table of contents and some references. It’s lovely.

5. If you just dive in, that’s fine too

Despite all the niggles I ran into, R and Pandas are very similar from a functionality and structure perspective. Hitting Ctrl+Shift+Enter versus Ctrl+Enter is never going to stop confusing me, and when I find the person who decided “%>%” should be a fundamental code piece I may just have to throw something at them (WHAT IS IT EVEN MEANT TO BE). But if you’re comfortable in Python, all the fundamental workflow and principles you know will translate preeetty well. When they don’t, go read the docs.
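For the record, “%>%” is the pipe operator from the magrittr package, which the Tidyverse makes available: “x %>% f(y)” is just another way of writing “f(x, y)”, so a chain of steps reads top to bottom instead of inside out. Below is a rough Python sketch of the same idea. The pipe helper and the toy numbers are purely illustrative (they are not part of R, pandas, or my analysis), and the dplyr verbs named in the comments are only loose analogies.

```python
from functools import reduce


def pipe(value, *steps):
    """Feed `value` through each function in `steps`, left to right.

    Hypothetical helper that mimics how `x %>% f %>% g` evaluates as g(f(x)).
    """
    return reduce(lambda acc, step: step(acc), steps, value)


# Toy numbers, invented purely for illustration.
counts = [120, 95, 0, 80, 0, 60]

result = pipe(
    counts,
    lambda xs: [x for x in xs if x > 0],  # loosely like filter(count > 0)
    sorted,                               # loosely like arrange(count)
    lambda xs: sum(xs) / len(xs),         # loosely like summarise(mean(count))
)

print(result)  # 88.75
```

Without the helper, the same calculation has to be written inside out, with the first step buried in the middle of the expression. Pandas users may recognise the pattern from method chaining, or from DataFrame.pipe.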

Hopefully, at least some of this was useful! I certainly won’t be swapping to R for day-to-day data-wrangling, but it’s been weirdly fun learning a whole new toolset, and it no longer feels quite so alien. May your notebooks knit quicker than mine did!

I’ve compiled some of the resources I discovered and relied on below. If you find them as useful as I did, please check out the authors, buy their books, or send them a nice email. They’re all invaluable.


quantitative crime science @ MPS | Coordinator @ Police Rewired | My (personal) thoughts on crime, data, and economics | https://andreasthinks.me/