How R simplifies data wrangling operations

I’m a self-taught data scientist who spent almost two years making a career change and landing a job in the Data Science domain. I started my journey with Python, and I’m glad I did.
Python has many advantages that attract aspiring data scientists. A rich selection of third-party libraries that expedite and simplify data wrangling tasks is just one of them.
I have always been a big fan of the Python data science ecosystem. A while ago, I also started using R packages, and I now feel that R beats Python at data wrangling.
In this article, we will go over several examples that demonstrate how R packages handle typical data wrangling tasks simply and seamlessly. The packages we will use are data.table and stringr.
We start by loading the packages. If you are using them for the first time, you need to install them first. It is worth mentioning that I use RStudio as my IDE.
# install
install.packages("data.table")
install.packages("stringr")
# import
library(data.table)
library(stringr)
What "beats" means here depends on your expectations. As someone who had been using Python libraries exclusively, I found that switching to R let me perform the same tasks more concisely.
Without further ado, let’s start with the examples. We will use the Melbourne housing dataset available on Kaggle. The fread function reads a CSV file and returns a data table.
melb <- fread(file_path)
The file path depends on where the dataset is stored on your computer.
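If you want to try the syntax before downloading the dataset, one option is to pass a few CSV lines to fread through its text argument. This is a minimal sketch: the rows below are made up for illustration and only borrow column names from the housing data, they are not the real Kaggle values.

```r
library(data.table)

# Hypothetical sample rows standing in for the Kaggle CSV
csv_lines <- "Suburb,Address,Price,Type,Regionname,SellerG,Date
Albion,1 Main St,500000,h,Western Metropolitan,Jellis,3/12/2016
Albion,2 High St,,u,Western Metropolitan,Nelson,10/09/2017
Abbotsford,4 Park Rd,900000,u,Northern Metropolitan,Biggin,1/01/2017"

# fread parses the text the same way it would parse a file on disk;
# the empty Price field in the second row becomes NA
melb_sample <- fread(text = csv_lines)

class(melb_sample)  # "data.table" "data.frame"
str(melb_sample)
```

Swapping text = csv_lines for a file path gives the call used above.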

For each example, I will first define a task and then provide the solution.
Task: Find the address, price, and date of the houses in Albion. Then, sort the results by price in ascending order and by date in descending order.
melb[Suburb == "Albion",
.(Suburb, Address, Price, Date)][order(Price, -Date)]

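We first apply the filter and then specify the columns to select. Sorting the results is just as straightforward: we pass the column names to the order function in a second pair of square brackets, and the minus sign indicates descending order. Note that if Date was read as text, descending order on it is alphabetical rather than chronological, so converting it to a date first gives more meaningful results.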
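Here is a minimal sketch of the same task on made-up rows, assuming the raw dates are day/month/year text. It converts Date with base R's as.Date and then sorts with setorder, data.table's by-reference sorting function, which also takes a leading minus sign for descending order. The toy table and its values are hypothetical.

```r
library(data.table)

# Hypothetical toy rows; Date is day/month/year text, as assumed for the raw CSV
houses <- data.table(
  Suburb  = c("Albion", "Albion", "Albion", "Abbotsford"),
  Address = c("1 Main St", "2 High St", "3 Low St", "4 Park Rd"),
  Price   = c(500000, 500000, 450000, 900000),
  Date    = c("3/12/2016", "10/09/2017", "25/02/2016", "1/01/2017")
)

# As text, "3/12/2016" sorts after "10/09/2017" even though it is the
# earlier date, so convert the column to a real Date before sorting on it
houses[, Date := as.Date(Date, format = "%d/%m/%Y")]

# Filter and select as in the article, then sort in place:
# Price ascending, ties broken by Date descending ("-" marks descending)
albion <- houses[Suburb == "Albion", .(Suburb, Address, Price, Date)]
setorder(albion, Price, -Date)

albion$Address  # "3 Low St" "2 High St" "1 Main St"
```

The two 500,000 sales tie on price, so the later one (September 2017) comes first.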
Task: Change the name of the following columns as indicated:
- Regionname to Region
- SellerG to Seller
setnames(melb, c("Regionname", "SellerG"), c("Region", "Seller"))
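The first argument we pass to the setnames function is the data table itself. The second and third arguments are character vectors that hold the current and new names, respectively. setnames modifies the table by reference, so there is no need to assign the result back to melb.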
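A minimal sketch on a hypothetical two-column table shows that setnames changes the column names in place, without reassigning the table.

```r
library(data.table)

# Hypothetical table using the original dataset's column names
dt <- data.table(Regionname = c("Northern Metropolitan", "Western Metropolitan"),
                 SellerG    = c("Jellis", "Nelson"))

# Old names first, new names second; dt is modified by reference
setnames(dt, c("Regionname", "SellerG"), c("Region", "Seller"))

names(dt)  # "Region" "Seller"
```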
Task: Remove the rows with missing values in the price column.
Handling missing values is a frequent operation in data wrangling. With data.table, we can drop the rows with a missing price by filtering on the base R is.na function.
melb <- melb[!is.na(Price)]
We use the output of is.na as the filtering condition. The "!" negates it, so we only keep the rows in which Price is not missing (NA).
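data.table also provides an na.omit method whose cols argument limits the missing-value check to chosen columns. The sketch below, on hypothetical rows, shows that it keeps the same rows as the filter above; without cols, na.omit would drop rows with an NA in any column.

```r
library(data.table)

# Hypothetical rows with one missing price
dt <- data.table(Address = c("1 Main St", "2 High St", "3 Low St"),
                 Price   = c(500000, NA, 450000))

# The article's approach: keep rows where Price is not NA
kept_filter <- dt[!is.na(Price)]

# data.table's na.omit method, checking only the Price column
kept_omit <- na.omit(dt, cols = "Price")

kept_filter$Address  # "1 Main St" "3 Low St"
```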
Task: Calculate the average house price in each region for the houses with type "u".
melb[Type == "u", .(avg_price = mean(Price)), by = "Region"]

The first part filters the rows, as we have done before. The second part is where we operate on columns: we take the average of the Price column and give it a new name, avg_price. The last part groups the rows by region.
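The data.table syntax follows a fixed order of operations, DT[i, j, by]: select rows with i, compute on columns with j, and group with by, all separated by commas within square brackets. This consistent structure makes complex operations easier to learn and perform.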
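To see the three parts side by side, here is a small sketch on hypothetical rows. The j part computes two summaries at once: the average price and .N, data.table's built-in count of rows in each group.

```r
library(data.table)

# Hypothetical rows standing in for the housing data
dt <- data.table(
  Type   = c("u", "u", "h", "u", "u"),
  Region = c("West", "West", "West", "North", "North"),
  Price  = c(400000, 600000, 900000, 300000, 500000)
)

# i: filter rows | j: compute on columns | by: group
res <- dt[Type == "u",
          .(avg_price = mean(Price), n_sales = .N),
          by = "Region"]

res
```

Groups appear in the order they are first encountered, so West (average 500,000 over 2 sales) comes before North (average 400,000 over 2 sales). The "h" row is removed by i before any grouping happens.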
Task: In the previous example, we calculated the average price in each region. Suppose we want to compare house prices at a broader level, for instance across the western, northern, and eastern parts.
We can create a new column called region_group by extracting the first word of the Region column. Then, we can easily calculate the average house price for each group.
melb[, region_group := str_split_fixed(Region, " ", 2)[,1]][, .(avg_price = mean(Price)), by = "region_group"]

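We have used the str_split_fixed function from the stringr package. With n = 2, it splits each region name at the first space into at most two pieces and returns them as a character matrix, so the "[,1]" expression selects the first column, that is, the first word. The := operator assigns this extracted part to a new column called "region_group" by reference. The second pair of square brackets then calculates the average price in each group.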
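The sketch below runs str_split_fixed on a few region-style strings to show the matrix it returns. It also shows stringr's word function, which extracts the first word directly and is an alternative to the split-then-index approach, not the method used above.

```r
library(stringr)

# Region-style strings, used here for illustration
regions <- c("Northern Metropolitan", "South-Eastern Metropolitan", "Eastern Victoria")

# A character matrix with n = 2 columns: the text before the first space,
# then everything after it
parts <- str_split_fixed(regions, " ", 2)
parts[, 1]  # "Northern" "South-Eastern" "Eastern"

# word() splits on a single space by default and returns the requested word
word(regions, 1)
```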
Conclusion
We have gone through several examples that demonstrate how typical data wrangling tasks can be done with R packages. Of course, R supports much more complex operations as well.
What I like most about these R packages is how they combine multiple small operations into a single, concise, and consistent expression. We can achieve a lot in just one line of code.
The goal of this article is not to declare R superior to Python with regard to data science libraries. I just wanted to explain why I like R better than Python for data wrangling tasks.
Thank you for reading. Please let me know if you have any feedback.