
Introduction to R

Is R the right language for me?

Photo by Chris Liverani on Unsplash

From time to time, I enjoy browsing available jobs in my area. Not only does it help me find new opportunities, but it also gives me a perspective on which skills are up-and-coming. Knowing which languages to study early on can be critical to getting a job. This was one reason I created a Kubernetes cluster on Raspberry Pis for my Individual Studies course in college. It is also why I am continuing my Kubernetes work, now moving to virtual machines. If you want to learn more about that project, I have another series on Kubernetes you can check out here.

While digging through an assortment of jobs, I began noticing that for some back-end developer jobs, employers were looking for at least a little familiarity with R. Since a little familiarity is all they ask for, I thought it would make for nice practice. This will not be a deep dive, but let’s scratch the surface of R: what it is, who it is made for, why you might choose it, and a couple of basic functions.

What is R?

R is a language and environment built for statistical computing and graphics. However, R can do more than computing. There is also CRAN (the Comprehensive R Archive Network), a network of FTP and web servers that hosts R itself along with its documentation and contributed packages.

Beyond that, R can also be used for machine learning. It can work with databases, connect to other languages (for example, to Java through the rJava package), and even has its own Markdown framework, R Markdown.

The most important aspect is that R is a simple language to learn, which is one reason it is popular in business settings. Users of any skill level can learn how to create a visual representation of data.

R is also multi-platform. It can operate on Unix, Windows, or macOS.

Who is R for?

R serves many different types of users. Data miners are one such group, and statisticians are another. But R can also be used by business users. While Power BI has become more popular, R has both an easier learning curve and more capabilities. With its support for many different kinds of graphs, it suits anyone who needs graphs and visual representations of data.

Why R?

As previously mentioned, R has an easier learning curve than many other data analysis tools, while still offering a wide range of data science capabilities for business. Its abilities are also not limited to analysis: it can be used for machine learning, and CRAN provides a large collection of add-on packages.

But enough talk about R. Time to get to know a little bit about the syntax.

Basic Graphic Representation of Data

To install the necessary components to run R, I first had to install a single package on my Ubuntu 16.04 server via the command line. This ensured I would be able to run R from the terminal within VS Code. In the terminal, I installed the r-base-core package:

sudo apt-get install r-base-core

This was the only line I needed to get R working on my Ubuntu server. Now to VS Code.

First, I installed the "R", "R LSP Client", and "r-vscode-tools" extensions.

Once the extensions were installed, I created a file titled "rTutorial.R". Be sure the R in the extension is capitalized. We can now begin coding. Note that after completing each segment of code, I ran the file with the following command in the terminal:

Rscript rTutorial.R

This executes the R file and displays the results. I will not repeat this line every time we run the code, as it will be the same.

As stated before, we are not going to dive very deep into R today. Just a brief overview of some of the basic capabilities. To do our calculations, first, we will need to declare a variable:

testScores <- c(92,87,58,74,90,86,86,94,82)

Viewing the values is much like Python and only requires a call to print:

print(testScores)
Test scores output.
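
As a small sketch of what the vector offers beyond printing, the snippet below uses only base R functions. The extra variable names (scoreCount, firstScore, curvedScores, sortedScores) are my own and are not part of the original walkthrough:

```r
# Same vector as in the tutorial
testScores <- c(92, 87, 58, 74, 90, 86, 86, 94, 82)

# Number of elements in the vector
scoreCount <- length(testScores)

# R indexes from 1, so this is the first score
firstScore <- testScores[1]

# Arithmetic applies to every element at once
curvedScores <- testScores + 5

# sort() returns a sorted copy and leaves the original untouched
sortedScores <- sort(testScores)

print(scoreCount)
print(firstScore)
print(curvedScores)
print(sortedScores)
```

Because operations are vectorized, no loop is needed to add five points to every score.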

The first basic calculation will be the mean of the data set. Variable names can contain periods, which we will use to describe our next variable:

mean.testScores <- mean(testScores)
print(mean.testScores)
Mean of test scores.

The median is very similar in syntax. We can simply assign the result of the function to a variable, then display the output:

median.testScores <- median(testScores)
print(median.testScores)
Median of test scores.
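
To compare the two measures side by side, here is a minimal sketch using the same data. The gap variable is a name I introduced for illustration, and summary() is a base R function that was not used in the original walkthrough:

```r
testScores <- c(92, 87, 58, 74, 90, 86, 86, 94, 82)

mean.testScores <- mean(testScores)
median.testScores <- median(testScores)

# A mean below the median hints that low values are pulling the average down
gap <- median.testScores - mean.testScores
print(gap)

# summary() reports the minimum, quartiles, median, mean, and maximum together
print(summary(testScores))
```

For this dataset the median sits above the mean, which is a first hint that an unusually low score is present.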

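The mode is a little more difficult. R has no built-in function for the statistical mode (the built-in mode() function reports how an object is stored, such as "numeric", rather than its most frequent value). A mode function should also work for string values, not only for integer/decimal values. Because of this, we will need to create a function that finds the most frequent unique value: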

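# Return the most frequent value in x (the first one found if there is a tie)
findMode <- function(x) {
     uniqv <- unique(x)                            # distinct values in x
     uniqv[which.max(tabulate(match(x, uniqv)))]   # value with the highest count
}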

Now we can return to the usual pattern of calling the function and printing the result:

mode.testScores <- findMode(testScores)
print(mode.testScores)
Mode of test scores.
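
One limitation of a which.max-based approach is that it reports only the first value when several are tied. The sketch below shows a hypothetical variant, findAllModes (my own name, not from the original), that returns every tied value. It also shows what the built-in mode() actually returns:

```r
testScores <- c(92, 87, 58, 74, 90, 86, 86, 94, 82)

# The built-in mode() describes the storage type, not the most frequent value
storageMode <- mode(testScores)
print(storageMode)

# Hypothetical variant that returns every value tied for the highest count
findAllModes <- function(x) {
     uniqv <- unique(x)
     counts <- tabulate(match(x, uniqv))
     uniqv[counts == max(counts)]
}

print(findAllModes(testScores))

# Works for strings too; here two colors are tied
colorModes <- findAllModes(c("red", "blue", "red", "blue", "green"))
print(colorModes)
```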

Another useful feature of R is the ability to easily locate outliers. To do so, build a boxplot of the data and take the values that fall beyond its whiskers, which boxplot() returns in the out element. Of course, a variable can hold those values:

outVals <- boxplot(testScores)$out
print(outVals)
The outlier of test scores.
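
To make the rule behind the boxplot explicit, here is a sketch that computes the default fences by hand: 1.5 times the distance between the hinges, measured outward from each hinge. The variable names are my own. fivenum() and boxplot.stats() are base R functions that do not draw anything, which is convenient when running from the command line:

```r
testScores <- c(92, 87, 58, 74, 90, 86, 86, 94, 82)

# fivenum() returns minimum, lower hinge, median, upper hinge, maximum
fiveNumbers <- fivenum(testScores)
lowerHinge <- fiveNumbers[2]
upperHinge <- fiveNumbers[4]

# Fences sit 1.5 times the hinge spread beyond each hinge
spread <- upperHinge - lowerHinge
lowerFence <- lowerHinge - 1.5 * spread
upperFence <- upperHinge + 1.5 * spread

# Anything outside the fences counts as an outlier
manualOutVals <- testScores[testScores < lowerFence | testScores > upperFence]
print(manualOutVals)

# boxplot.stats() applies the same rule without drawing a plot
print(boxplot.stats(testScores)$out)
```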

You may also choose to simply view the dataset without any outliers:

testScores[ !(testScores %in% outVals) ]
Dataset without outliers.
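
As a quick illustration of why this matters, the sketch below stores the filtered vector in a variable (cleanedScores, a name I chose) and compares the mean and median before and after. It uses boxplot.stats() rather than boxplot() only to avoid drawing a plot:

```r
testScores <- c(92, 87, 58, 74, 90, 86, 86, 94, 82)

# boxplot.stats() finds the same outliers as boxplot() without plotting
outVals <- boxplot.stats(testScores)$out
cleanedScores <- testScores[!(testScores %in% outVals)]

# Compare the center of the data with and without the outlier
print(mean(testScores))
print(mean(cleanedScores))
print(median(testScores))
print(median(cleanedScores))
```

Dropping the single low score shifts the mean noticeably more than it shifts the median, which is the usual argument for reporting the median when outliers are present.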

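The next thing that can be done quickly is working with the normal distribution. Although usually more math-intensive, R handles it with a function. Note that rnorm() does not transform the test scores: given a vector, it uses only the vector’s length, so the call below returns nine random draws from a standard normal distribution (mean 0, standard deviation 1), and the values change on every run: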

norm <- rnorm(testScores)
print(norm)
Normal distribution.
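
If the goal is a normal distribution that actually reflects the test scores, one option is to pass the data’s own mean and standard deviation to the normal distribution functions. The following is a sketch under that assumption, not the original author’s method. The seed value 42 and the cutoff of 90 are arbitrary choices for illustration:

```r
testScores <- c(92, 87, 58, 74, 90, 86, 86, 94, 82)

scoreMean <- mean(testScores)
scoreSd <- sd(testScores)

# Fix the random seed so the draws are repeatable between runs
set.seed(42)

# Nine simulated scores from a normal distribution shaped like the data
simulatedScores <- rnorm(length(testScores), mean = scoreMean, sd = scoreSd)
print(simulatedScores)

# Density of that normal curve at each observed score
print(dnorm(testScores, mean = scoreMean, sd = scoreSd))

# Probability that a score falls at or below 90 under the same curve
probAtMost90 <- pnorm(90, mean = scoreMean, sd = scoreSd)
print(probAtMost90)
```

Here rnorm() simulates values, dnorm() evaluates the bell curve, and pnorm() gives cumulative probabilities.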

One more very basic use of R is to find the standard deviation of the dataset. Once again, no manual math is required, because R has a function for this:

standDev <- sd(testScores)
print(standDev)
Standard deviation.
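
For readers curious about the math being skipped, this sketch reproduces the result by hand. sd() computes the sample standard deviation, which divides by n - 1 rather than n. The intermediate variable names are my own:

```r
testScores <- c(92, 87, 58, 74, 90, 86, 86, 94, 82)

n <- length(testScores)

# Distance of each score from the mean
deviations <- testScores - mean(testScores)

# Sample variance divides the sum of squared deviations by n - 1, not n
manualVariance <- sum(deviations^2) / (n - 1)
manualSd <- sqrt(manualVariance)

# Both lines should print the same value
print(manualSd)
print(sd(testScores))
```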

An important piece of functionality that we have not covered is graphing. This is because we are currently using the command line to execute the code, as all the R code is in the rTutorial.R file. Rscript does not open a plot window (by default it writes plots to a file named Rplots.pdf instead), so we did not show this code. However, I can get you started. For a basic barplot, simply call the function with the dataset:

barplot(testScores)
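
When running with Rscript, one way to control where the chart ends up is to open a file-based graphics device before plotting. This is a minimal sketch using the base pdf() device. The file name testScores.pdf is a placeholder of my choosing:

```r
testScores <- c(92, 87, 58, 74, 90, 86, 86, 94, 82)

# Open a PDF graphics device; the file name is a placeholder
pdf("testScores.pdf")

# Everything plotted now is drawn into the PDF
barplot(testScores)

# Close the device so the file is written to disk
invisible(dev.off())
```

After the script finishes, the chart can be opened from the working directory with any PDF viewer.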

There are of course many other types of graphs, charts, histograms, and so on. One handy choice is RStudio, a program with a graphical user interface. It not only lets you view your variables at all times, but also displays any graphs you create. For example, I opened my installed version of RStudio, which I have on my Windows desktop, and declared the dataset first:

RStudio dataset.

And now we can use the barplot function:

Barplot of test scores.

There is a wide range of options for graphs. You can manipulate the colors, change the orientation, add labels, and so on. For now, we have had a good first dive into R and learned something new.
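
As a starting point for that kind of customization, here is a sketch that sets a color, a horizontal orientation, and a few labels through standard barplot() arguments. The labels S1 through S9, the color, and the titles are all hypothetical values picked for illustration:

```r
testScores <- c(92, 87, 58, 74, 90, 86, 86, 94, 82)

# Hypothetical labels, one per score: "S1", "S2", ...
studentLabels <- paste("S", seq_along(testScores), sep = "")

# barplot() returns the midpoint position of each bar
barMidpoints <- barplot(
     testScores,
     names.arg = studentLabels,  # label for each bar
     col = "steelblue",          # bar color
     horiz = TRUE,               # draw bars horizontally
     main = "Test Scores",       # chart title
     xlab = "Score"              # label for the axis showing the score values
)
```

In RStudio the chart appears in the Plots pane. When run through Rscript, it is written to Rplots.pdf unless another graphics device is open.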

Conclusion

Although we did not spend too much time in R, we learned a lot about what it can do. We learned that it was built primarily for statistical computing and graphics, and saw an example of each. We also learned that although R can be used for machine learning and data mining, it can also be a powerful tool for business users who need a clear representation of data.

I hope you enjoyed learning a little more about R as much as I have. Until next time, cheers!





