
Python and R are the two key players in the data science ecosystem. I have worked on projects written in both languages, and both handle data cleaning, manipulation, and analysis tasks efficiently.
However, we sometimes want to translate a code base from R to Python, or vice versa. The motivation could be the skillset within the company. If you have many Python developers and few R developers, you may want to maintain the projects in Python.
In some cases, specific project requirements may encourage you to translate a code base. For instance, while Python is renowned for its versatility, encompassing areas like web development, software development, and machine learning, R excels in statistical analysis and visualizations. Therefore, if the project involves heavy statistical analysis or the creation of intricate graphical representations, it may be beneficial to translate the Python code to R.
The translation process usually requires people skilled in both programming languages, and they can be difficult to find. The good news is that we can use ChatGPT, which has a decent level of skill in almost all programming languages.
In this article, we will prompt ChatGPT to translate code written in R using the "data.table" package to Python using the "pandas" library.
We will give it only the code, without explaining what the code does or is supposed to do. Let’s start with the initial prompt, which defines ChatGPT's role and sets our expectations:
Initial prompt: You are a software engineer with several years of experience in Python and R. Your task is to convert R code to Python. The R code does data cleaning and manipulation using the data.table package. You will convert it to Python code using the Pandas library.
The first example is a line of code that creates a new column in a data table.
Convert the following R code that uses `data.table` package
to Python using the `pandas` library:
melb[, Price_per_area := Price / Landsize]
Here is ChatGPT's answer:

It not only gives the code but also explains what the code does. This is quite useful for someone who lacks expertise in the source or target language.
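For reference, here is a minimal sketch of what the pandas equivalent of this line can look like. It is not ChatGPT's verbatim output, and the small `melb` DataFrame is a made-up stand-in for the real data set:

```python
import pandas as pd

# Hypothetical stand-in for the housing data; the values are invented.
melb = pd.DataFrame({
    "Price": [1_200_000, 800_000, 500_000],
    "Landsize": [600, 400, 250],
})

# Equivalent of the R line: melb[, Price_per_area := Price / Landsize]
# Assigning to a new column name adds the column to the existing DataFrame.
melb["Price_per_area"] = melb["Price"] / melb["Landsize"]
```

Like `:=` in data.table, the assignment modifies `melb` directly instead of creating a copy.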
The second example creates a subset of the data table by filtering on the price and type columns.
Convert the following R code that uses `data.table` package
to Python using the `pandas` library:
subset <- melb[Price > 1000000 & Type == "h"]
ChatGPT:

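A minimal sketch of how this filter can be written in pandas is given here. It is not ChatGPT's verbatim output, and the sample rows are invented for illustration:

```python
import pandas as pd

# Hypothetical stand-in for the housing data; the values are invented.
melb = pd.DataFrame({
    "Price": [1_500_000, 900_000, 2_000_000, 1_100_000],
    "Type": ["h", "h", "u", "h"],
})

# Equivalent of the R line: subset <- melb[Price > 1000000 & Type == "h"]
# Each condition needs its own parentheses because & binds tighter
# than the comparison operators in Python.
subset = melb[(melb["Price"] > 1_000_000) & (melb["Type"] == "h")]
```

Unlike data.table, pandas keeps the original row labels in the result, so a `reset_index(drop=True)` may be needed if a fresh 0-based index matters.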
Let’s do a slightly more complicated example. The line of code in the following prompt calculates the average house price in different regions for houses with type "u".
Convert the following R code that uses `data.table` package
to Python using the `pandas` library:
melb[Type == "u", .(avg_price = mean(Price)), by="Regionname"]
ChatGPT:

The code in the screenshot above:
avg_price = melb[melb['Type'] == 'u'].groupby('Regionname')['Price'].mean().reset_index().rename(columns={'Price': 'avg_price'})
It successfully completes the task, but I would prefer named aggregations, which I find shorter and more readable. Let’s ask ChatGPT to use them as well:
Convert the following R code that uses `data.table` package
to Python using the `pandas` library. Use named aggregations.
melb[Type == "u", .(avg_price = mean(Price)), by="Regionname"]
ChatGPT:

The code in the screenshot above:
avg_price = melb[melb['Type'] == 'u'].groupby('Regionname').agg(avg_price=('Price', 'mean')).reset_index()
This is better as it does not require renaming the columns afterwards. We could also use the as_index parameter of the groupby function and set its value to False so that we don’t have to reset the index after the calculation.
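The as_index variant mentioned above can be sketched as follows. The sample data is made up, and this is my own rewrite rather than something ChatGPT produced:

```python
import pandas as pd

# Hypothetical stand-in for the housing data; the values are invented.
melb = pd.DataFrame({
    "Type": ["u", "u", "u", "h"],
    "Regionname": ["Northern", "Northern", "Southern", "Northern"],
    "Price": [400_000, 600_000, 700_000, 1_000_000],
})

# as_index=False keeps Regionname as a regular column,
# so no reset_index() call is needed after the aggregation.
avg_price = (
    melb[melb["Type"] == "u"]
    .groupby("Regionname", as_index=False)
    .agg(avg_price=("Price", "mean"))
)
```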
The R code in the next prompt calculates, for houses of type "u", the average price and the number of houses in each region, then sorts the results by average price in descending order.
Convert the following R code that uses `data.table` package
to Python using the `pandas` library. Use named aggregations.
melb[ Type == "u", .(avg_price = mean(Price), number_of_houses=.N), by="Regionname" ][order(-avg_price)]
ChatGPT:

As with the previous examples, the step-by-step explanation is quite useful for understanding and debugging the code.
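One way to express this R code in pandas with named aggregations is sketched below. It is not ChatGPT's verbatim answer, and the sample data is invented:

```python
import pandas as pd

# Hypothetical stand-in for the housing data; the values are invented.
melb = pd.DataFrame({
    "Type": ["u", "u", "u", "h"],
    "Regionname": ["Northern", "Northern", "Southern", "Northern"],
    "Price": [400_000, 600_000, 700_000, 1_000_000],
})

result = (
    melb[melb["Type"] == "u"]            # Type == "u"
    .groupby("Regionname")               # by="Regionname"
    .agg(
        avg_price=("Price", "mean"),     # mean(Price)
        # "size" counts rows per group, which is what .N does in data.table.
        number_of_houses=("Price", "size"),
    )
    .reset_index()
    .sort_values("avg_price", ascending=False)  # order(-avg_price)
)
```

I chose "size" over "count" here because "count" skips missing values, whereas `.N` counts every row in the group.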
Let’s do a final example on changing column names. The R code in the following prompt renames the type and distance columns as "HouseType" and "DistanceCBD", respectively.
Convert the following R code that uses `data.table` package
to Python using the `pandas` library.
setnames(melb, c("Type", "Distance"), c("HouseType", "DistanceCBD"))
ChatGPT:

The code in the screenshot above:
melb.rename(columns={'Type': 'HouseType', 'Distance': 'DistanceCBD'}, inplace=True)
I liked the note about the inplace parameter, which can easily be overlooked.
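To see why the inplace parameter matters, here is a small sketch with a made-up one-row DataFrame. Without `inplace=True`, `rename` returns a new DataFrame and leaves the original untouched:

```python
import pandas as pd

# Hypothetical one-row stand-in for the housing data.
melb = pd.DataFrame({"Type": ["h"], "Distance": [2.5]})

# rename returns a new DataFrame by default; melb keeps its old column names.
renamed = melb.rename(columns={"Type": "HouseType", "Distance": "DistanceCBD"})

print(list(melb.columns))     # ['Type', 'Distance']
print(list(renamed.columns))  # ['HouseType', 'DistanceCBD']
```

Assigning the result back (`melb = melb.rename(...)`) has the same effect as `inplace=True` and is arguably the closer match to how `setnames` is used in R, since data.table modifies the table by reference.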
Final words
The examples in this article demonstrate how successful ChatGPT is at translating code from one programming language to another. We did a line-by-line translation, which does not prove that ChatGPT can translate an entire code base.
However, using ChatGPT for this task when working with a large code base can likely save you hours. You should still take a second look and test the output to make sure it works as expected.
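As a sketch of what testing the output can look like, the snippet below checks that the two translations of the average-price example from earlier in the article return the same result. The sample data is invented, and `pd.testing.assert_frame_equal` is just one of several ways to compare DataFrames:

```python
import pandas as pd

# Hypothetical stand-in for the housing data; the values are invented.
melb = pd.DataFrame({
    "Type": ["u", "u", "u", "h"],
    "Regionname": ["Northern", "Northern", "Southern", "Northern"],
    "Price": [400_000, 600_000, 700_000, 1_000_000],
})

# First translation: aggregate, then rename the column.
first = (
    melb[melb["Type"] == "u"]
    .groupby("Regionname")["Price"]
    .mean()
    .reset_index()
    .rename(columns={"Price": "avg_price"})
)

# Second translation: named aggregation.
second = (
    melb[melb["Type"] == "u"]
    .groupby("Regionname")
    .agg(avg_price=("Price", "mean"))
    .reset_index()
)

# Raises an AssertionError with a readable diff if the frames differ.
pd.testing.assert_frame_equal(first, second)
```

For a real migration, the stronger check is to export the R result (for example to CSV), load it in Python, and compare it against the pandas output in the same way.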
Thank you for reading. Please let me know if you have any feedback.