
During the past couple of years, I’ve trained thousands of students on R Programming and interacted with a lot of people trying to learn R as their first programming language.
As an open source language, R has so many features that one can get lost when building a study plan around it. Where do you start? Learning objects? Learning models? Tackling data frames? The entanglement between these concepts makes everything a bit more complex.
For beginners, R has an advantage. It’s, arguably, a bit simpler than Python, as most people pick up the language to do one of three things:
- Data Analytics
- Data Science
- Data Visualization
With Python, you can pick up the language to do other stuff (more tied to software engineering) such as back-end development or some automation project – that makes the language a bit more complex than R.
This post should help you plan your learning journey around R. I’ve designed this learning flow after iterating a lot on my R for Absolute Beginners Course and incorporating a ton of feedback from my students (kudos to them!).
These are the 6 big topics that I recommend you study, sequentially:
- R Basic Objects
- R Data Frame
- Functions
- Libraries
- Modelling
- Plotting
Let’s jump into them in more detail!
R Basic Objects
R objects are the foundation for understanding the R language as a whole. The basic ones are:
- Vectors
- Lists
- Arrays
- Matrices
As you study them, you will come across two important features that define how you can interact with them, namely:
- uni-type vs. multi-type objects
- uni-dimensional vs. multidimensional objects.
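Here is a minimal sketch of the four basic objects and the two features above. The variable names and values are made up for illustration.

```r
# Vector: one dimension, a single type
ages <- c(25, 31, 47)

# Mixing types in a vector coerces every element to one common type
mixed <- c(1, "a", TRUE)
class(mixed)          # "character"

# List: one dimension, but each element can have its own type
person <- list(name = "Ana", age = 31, likes_r = TRUE)

# Matrix: two dimensions, a single type
m <- matrix(1:6, nrow = 2, ncol = 3)
dim(m)                # 2 3

# Array: any number of dimensions, a single type
a <- array(1:24, dim = c(2, 3, 4))
dim(a)                # 2 3 4
```

Vectors, matrices and arrays are uni-type; lists are multi-type. Vectors and lists are uni-dimensional; matrices and arrays are multidimensional.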
Why not jump immediately into the main R object, the Data Frame? Because there are important operations that are only possible once you master the other objects. Two examples:
- Vectors can be used with the %in% operator to match multiple values in a filter.
- Lists are the only basic object in R that lets you nest other objects.
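Both examples can be sketched in a few lines. The data below is invented for illustration.

```r
# %in% tests each element on the left against a set of values
countries <- c("PT", "ES", "FR", "DE", "IT")
keep <- c("PT", "FR")
countries %in% keep              # TRUE FALSE TRUE FALSE FALSE
countries[countries %in% keep]   # "PT" "FR"

# A list can hold other objects, including other lists
course <- list(
  title = "R basics",
  topics = c("vectors", "lists"),
  instructor = list(name = "Ana", years_teaching = 3)
)
course$instructor$name           # "Ana"
```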
The fundamental programming logic of R is centered around these basic objects. You should study them first before tackling other stuff to increase the likelihood of becoming a good R developer. You can check more about objects here or by jumping into the first sections of my R Programming Course.
Nailing the Data Frame Object
If you are working with Data Analysis or Data Science, manipulating data frames will be the most important skill to have on your tool belt.
If you are used to working with other two-dimensional formats, such as SQL tables, this object will be a paradigm shift. To master data frames, you will need to dive deep into:
- Indexing rows and columns in R;
- Sorting objects;
- Aggregating by a specific key;
- Filtering;
These operations are ultra common when building our data for analysis or modelling. You will only be able to speed up your code development if you really understand how to work with data frames back and forth.
W3 Schools contains a nice guide about them!
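As a rough sketch, the four operations listed above look like this in base R. The sales data frame is made up for illustration, and packages such as dplyr offer alternative syntax for the same steps.

```r
# A small, made-up data frame
sales <- data.frame(
  store = c("A", "B", "A", "C", "B"),
  units = c(10, 4, 7, 12, 9),
  stringsAsFactors = FALSE
)

# Indexing: rows first, columns second
sales[2, "units"]            # row 2, column "units": 4
sales$units                  # the whole column as a vector

# Sorting by a column, largest first
sorted <- sales[order(sales$units, decreasing = TRUE), ]

# Aggregating by a key: total units per store
totals <- aggregate(units ~ store, data = sales, FUN = sum)

# Filtering rows with a logical condition
big <- sales[sales$units > 8, ]
```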
Functions
Functions are what make your code reusable and clean. They are the backbone of proper R scripts and without them, we wouldn’t be able to call several methods on different objects.
Did you know that you interact with functions as soon as you start R? For instance, when you create a vector using c(), you are interacting with a function called c that combines the objects you feed into the arguments. Don’t believe me? Just call help(c) on your R console!
Without functions, everyone would be writing dull and repetitive code that would be impossible to maintain and debug.
The first time you stumble upon building your own function, you may get a bit confused because most of us are used to scripting (particularly if you don’t come from a software engineering background). Learning how to write functions will improve your coding skills, making you ready to tackle other programming languages and coding paradigms.
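To make the jump from scripting to functions concrete, here is a small sketch. The rescale01 function is a hypothetical example and not part of base R.

```r
# c is an ordinary function object
is.function(c)               # TRUE

# Scripted version: the logic is written inline for one vector
x <- c(2, 4, 6)
(x - min(x)) / (max(x) - min(x))

# Function version: write the logic once, reuse it anywhere
rescale01 <- function(v, na.rm = TRUE) {
  rng <- range(v, na.rm = na.rm)
  (v - rng[1]) / (rng[2] - rng[1])
}

rescale01(x)                   # 0.0 0.5 1.0
rescale01(c(10, 20, NA, 30))   # 0.0 0.5  NA 1.0
```

If the rescaling logic ever needs to change, it now changes in one place instead of in every script that copied it.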
You can learn more about them on my R Programming Course or by checking some best practices on this blog post.
Step into Libraries
You only become a real R developer when you work with other people’s code. How do you do that? Using libraries!
Libraries (or packages) are the main advantage of using R (when compared with non-open-source languages). Learning how to install, load and debug packages’ code will give you access to literally millions of lines of code developed by the community.
What libraries can you start looking into? Here are some recommendations:
- The built-in [rpart](https://cran.r-project.org/web/packages/rpart/rpart.pdf) library, which trains decision trees.
- [dplyr](https://dplyr.tidyverse.org/), a really cool data wrangling library.
- [ggplot2](https://ggplot2.tidyverse.org/), the most famous plotting library in R. You should leave this one for a bit later in your learning.
You can also check this blog post, when you see fit, for more recommendations of libraries you can explore.
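The install, load and debug steps can be sketched as follows. This assumes a standard R installation, where rpart ships as a recommended package. The dplyr check is only illustrative and works whether or not dplyr is installed.

```r
# Install once per machine (commented out so the script does not
# reinstall on every run):
# install.packages("dplyr")

# Load a package for the current session
library(rpart)

# Call a single function without attaching its package
stats::median(c(3, 1, 2))    # 2

# Check whether a package is available before relying on it
if (requireNamespace("dplyr", quietly = TRUE)) {
  message("dplyr is installed")
} else {
  message("dplyr is missing; run install.packages('dplyr')")
}

# Printing a function shows its source, a first step when debugging
print(rpart::rpart.control)
```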
Modelling
The main mistake people make when they jump into R is to go right into modelling.
If you start here without understanding the basic objects and functions you will probably have a frustrating experience. Why?
First, you won’t be able to manipulate the output of your models very well, because different models require different input objects and may even output different formats.
Secondly, you will have a hard time understanding how the arguments of the modelling functions work. You don’t want to be tied to modelling only in base R. You want to be able to train your own advanced models using caret, h2o or other stand-alone libraries such as ranger. All these libraries have their own quirks and features. They all need different types of arguments, objects and specifics.
In the end, every model is a function in itself with its own set of arguments and parameters. And, there are three important things you must know to work with them seamlessly:
- How to manipulate functions.
- What type of objects the arguments expect.
- How you can improve your training process by using external libraries with faster or more accurate models.
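A short sketch of the three points above, using the built-in mtcars data with lm from base R and rpart. It assumes rpart is available, as it is in a standard R installation. The formula and the maxdepth value are arbitrary choices for illustration.

```r
library(rpart)

# Linear regression from base R (the stats package)
fit_lm <- lm(mpg ~ wt + hp, data = mtcars)
class(fit_lm)                # "lm"
coef(fit_lm)                 # named vector: intercept, wt, hp

# Decision tree from rpart, with a model-specific control argument
fit_tree <- rpart(mpg ~ wt + hp, data = mtcars,
                  control = rpart.control(maxdepth = 3))
class(fit_tree)              # "rpart"

# Both outputs are lists underneath, but with different elements
names(fit_lm)
names(fit_tree)

# predict() is generic: it dispatches on the class of the model
head(predict(fit_lm, newdata = mtcars))
head(predict(fit_tree, newdata = mtcars))
```

Each model is a function call whose output you can only inspect comfortably once lists and functions feel familiar.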
When you’re ready to tackle modelling, feel free to jump into my R Data Science Bootcamp, where you’ll learn theory and practice of building Machine Learning models in R.
Plotting
The last element of this list is Plotting. With ggplot2, plotly or altair, you have plenty of choice when it comes to visualization libraries. All of them are suited to building really interesting plots that can tell a story about your data.
Becoming an expert in Data Visualization is no easy task. The libraries I’ve mentioned have, literally, hundreds of parameters and settings that one can tweak to improve a plot. I recommend you start by building a baseline of the following plots:
You can do one of these plots in each library that I’ve detailed above.
Understanding their major differences and intricacies will give you more flexibility when building your own storytelling around data. Another important detail: I recommend you skip base R plotting, as it’s pretty limited when compared with any of the packages mentioned above.
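As a starting point, here is a minimal sketch of a baseline plot in ggplot2. The scatter plot and the mtcars columns are my own choice for illustration. The code assumes ggplot2 is installed and skips the plot otherwise.

```r
# Assumes ggplot2 is installed: install.packages("ggplot2")
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  # Data, aesthetic mapping and geometry are added layer by layer
  p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
    geom_point() +
    labs(title = "Fuel efficiency vs. weight",
         x = "Weight (1000 lbs)", y = "Miles per gallon")

  print(p)
}
```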
That’s it! I hope you’ve enjoyed this post and you can plan your learning journey a bit better.
Following this learning flow has helped me train thousands of people around the world who want to learn R, and I’ve seen major breakthroughs in their coding ability when people follow this journey.
Of course, this doesn’t mean that you should move on to the next concept only after you have completely mastered the previous ones. First, get a good grasp of the basics and strengthen the fundamentals. After doing a couple of practical exercises and building some code, you’re ready to move on to the next component.
The important part is that you feel comfortable with each skill set before moving on to the next one.
If you would like to drop by my R courses, feel free to join here (R Programming for Absolute Beginners) or here (Data Science Bootcamp). My courses loosely follow this structure and I would love to have you around!
