
One of the beauties of the data Analytics field is its openness to multiple tools. The field has already shown that knowledge of one single tool is a handicap. In order to survive and grow in this field, you ought to be:
Jack of all trades and master of some. – unknown
Excel is one of the prominent tools when dealing with data in the majority of businesses. I myself had the experience of working with Excel for data analysis and found few shortcomings working with them. The problems faced by me were not my own but the majority of also users face them.
Here, I pitch my reasons to switch from Excel to open-source programming languages or rather leverage their power:
- Excel columns do not demarcate between inputs and outputs.
- The user-defined formulas are not intuitive as instead of variable names cell locations are labeled.
- Opening/handling large-sized files are slow and the system often crashes (the business cannot often upgrade computers).
- Merging different Excel files into a single file is cumbersome and similarly dealing with different sheets within a single file.
- Excel is behind the paywall.
As already mentioned, open-source programming is one of the options and here I will be using R to automate simple Excel file handling.
The topics covered are:
- Machine-readable file naming
- Reading Excel files
- Merging Excel files
- Splitting Excel files
- Create multiple Excel files
Objective
The objective is two read three Excel files with global data related to life expectancy, GDP expenditure, natural fuel usage, and water availability. Merge the files into a single Excel file with three sheets and save them in the working folder. Combine the three sheets into one using Outer Join
and save the data for each country into their corresponding continent folder.

So, let’s get started.
Machine-readable file naming
Proper naming of files is important as it comes in handy when dealing with Automation. The best way to name a file should be such that it is machine-readable. Try implementing the following rules when naming the files:
- Avoid accented characters, spaces, and avoid case-sensitive text.
- Use delimiters or separators such as "-" or "_".
- Use machine-readable date formats such as YYYY-MM-DD.
- Add version control at the end for easy navigation (e.g. -01,-02,…,-10).
An example of naming a file: 2021–07-15_Excel_Automation-01.xlsx
Reading Excel files
For reading the files, the xlsx
package is used. There are two functions: read.xlsx()
and read.xlsx2()
to read files in xlsx
package. read.xlsx2()
is fast and efficient when dealing with large datasets. Here the below code is for reading Excel workbooks with one sheet.
# reading individual files
fuel <- read.xlsx2("global-fuel-vs-gdp.xlsx",sheetIndex = 1)
life_exp <- read.xlsx2("global-life-expectancy.xlsx",sheetIndex = 1)
water <- read.xlsx2("global-water-share.xlsx",sheetIndex = 1)
Since we are here for automation, so let us look into that. To approach the problem of reading multiple files, use list.files()
function to create a list containing names of all the files of interest. Store the Excel data from individual sheets into data frames. Use a proper naming convention for the defined data frame objects as it will help in automation. The assigning of data frames to individual sheets is achieved by two For loops and assign()
function.
# set the working directory
setwd("C:/Users/amalasi/Documents/R/Blog post/Blog17-Excel1")
Merging Excel files
Once the Excel data has been loaded, the data is ready for processing. Here two possible ways of data merging are discussed. The first path is to create a single workbook containing multiple sheets corresponding to all the workbooks’ sheets. The second path is to merge all the sheets in the workbook created in the first path into a single sheet.
Creating a single workbook with multiple sheets corresponding to multiple sheets of all the workbooks
By using the write.xlsx()
function inside a for loop can help add multiple workbooks with multiple sheets into a single workbook as different sheets. Care should be taken when using the write.xlsx()
function, for the first time make sure to set the append
parameter value to its default, which is FALSE. This creates the workbook and to add individual sheets to this workbook, just set the parameter append=TRUE
.
Creating a single sheet by combining multiple sheets of the workbook
The trick here is to remember to name all the temporary data frames that are created during the automation process to have a set pattern (remember convention defined in machine-readable file naming). This helps tremendously at least in this part.
The objective of this exercise is to merge the sheets using an outer join. To do so, create a list of all the relevant data frames that will be combined to create a single data frame or sheet. Using ls()
and mget()
functions, the list of temporary data frames are created and assigned to the object named _locallist. By defining the list object, the data frames can be easily manipulated inside a For loop. This chunk of code will create an object named _globalfinal with all the data merged into a single sheet.
Splitting Excel files
When dealing with a bulky workbook with multiple sheets, all the individual sheets can be extracted and assigned to individual data frames using the below code. This is achieved by using the getSheets()
function.
Create multiple files and folders
The last objective was to create country-specific files and save them in the folders corresponding to their respective continents. Here are two steps that are required to full fill the objective.
The first step is to create the folders named after the continents which are achieved by using the dir.create()
function.
The second step is to split the sheet into multiple data frames with information for individual countries and save them. Start with indexing the individual countries as this helps in creating For loop. Next, using filter()
, extract the country information and store it in a temporary object. Then create a workbook, define a sheet and write the data to the sheets. The final step is to save the files in the destination folders.


The link to the complete code is here.
Conclusion
The article clearly demonstrates spending few hours scripting the code in R and can save a tremendous amount of time over the long run in merging and splitting multiple Excel files.
The above code is independent of the number of sheets in a workbook and can handle sheets with varying data columns.
The key learning was: defining machine-readable file naming conventions and reading, writing, merging, and splitting multiple Excel sheets and workbooks.
The next steps will be to use pivot tables, Vlookup, and other Excel functions using R.
Interested in more automation-related blogs from me then do check out this blog:
Other blogs on R and plotting in R:
3 lesser-known pipe operators in Tidyverse
All you need to know to build your first Shiny app
7 shortcuts worth knowing for efficient workflow in R
Data visualization using ggplot2: 5 features worth knowing
Master data visualization with ggplot2: pie charts, spider plots, and bar plots
Data source
- https://ourworldindata.org/grapher/life-expectancy
- https://ourworldindata.org/water-sanitation-2020-update
- https://ourworldindata.org/energy-poverty-air-pollution
Link to the Github repository.
_You can connect with me on LinkedIn and Twitter to follow my Data Science and data visualization journey._