
Business Automation with Python (1) – A very easy tutorial for file systems

How to process documents in batch: Python OS 101

Photo by Testalize on Unsplash

During my career, I have seen many people working hard day and night. But does it have to be this way? I am always looking for creative automation methods to improve work efficiency, so I decided to write a series of articles showing how Python can speed up your work and leave you more time with your family. I hope you enjoy it!

Use Case

Charles just found a job as an Accounts Payable coordinator. He receives tons of invoices from vendors every day, and different vendors follow different naming conventions. Charles needs to add the invoice receiving date to every file name. What is a quick and easy way to do it?

Here we will use the OS module in Python to help Charles.

The os module in Python provides functions for interacting with the operating system, which includes working with files and folders: checking whether a path exists, listing the contents of a folder, and renaming files.

Image by Author

The procedures to fulfill Charles’ requests are:

  1. Locate the file folder
  2. Get all the file names
  3. Rename the file names

I will use the code below as a template and add the procedures one by one.

import os
def filename_modify():
    pass
filename_modify()

1. Locate the file folder

Think about the function filename_modify we just created: one parameter it obviously needs is the location of the file folder. Therefore, we add ‘target_dir’ as the first parameter. The changes are the target_dir parameter and the line that sets it.

import os
def filename_modify(target_dir):
    pass
target_dir = r'C:\test'
filename_modify(target_dir)

However, we also need to check whether this is a valid path. Here we use ‘os.path.exists’.

import os
def filename_modify(target_dir):
    # Stop early with a message if the path does not exist
    if not os.path.exists(target_dir):
        print('Error - File/folder does not exist, please double check the path')
target_dir = r'C:\test'
filename_modify(target_dir)
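
One detail worth knowing: os.path.exists is also true when the path points to a file rather than a folder. Here is a minimal sketch, assuming we only want to accept folders. The helper name is_valid_folder is hypothetical and not part of the function we are building.

```python
import os
import tempfile

def is_valid_folder(path):
    """Return True only if path exists and is a directory (hypothetical helper)."""
    return os.path.isdir(path)

# Use a throwaway folder so the example runs on any machine.
with tempfile.TemporaryDirectory() as tmp:
    file_path = os.path.join(tmp, "invoice.txt")
    with open(file_path, "w") as f:
        f.write("demo")

    print(os.path.exists(file_path))   # True: the path exists...
    print(is_valid_folder(file_path))  # False: ...but it is a file, not a folder
    print(is_valid_folder(tmp))        # True
```

Whether to use os.path.exists or os.path.isdir is a design choice; the rest of this article keeps os.path.exists.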

2. Get all the file names

The function we use here is ‘os.listdir’.

import os
def filename_modify(target_dir):
    if not os.path.exists(target_dir):
        print('Error - File/folder does not exist, please double check the path')
    else:
        # List every entry (files and folders) in target_dir
        print(os.listdir(target_dir))

target_dir = r'C:\test'
filename_modify(target_dir)

The code above prints a list containing the names of all the files and folders in target_dir.

['company A', 'company B', 'company C.txt', 'company D.pdf']
All the file/folders in target_dir

Since os.listdir returns a list, we can use a for loop to handle the entries one by one.

import os
def filename_modify(target_dir):
    if not os.path.exists(target_dir):
        print('Error - File/folder does not exist, please double check the path')
    else:
        for file in os.listdir(target_dir):
            # Split each entry into its name and its extension
            file_name = os.path.splitext(file)[0]
            file_ext = os.path.splitext(file)[1]
target_dir = r'C:\test'
filename_modify(target_dir)

‘os.path.splitext’ splits a file name into the name and the file extension. For example, os.path.splitext('company C.txt') returns the tuple ('company C', '.txt'). That is why we use [0] and [1] to extract the file name and the file extension.
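
To see how os.path.splitext behaves on different kinds of names, here is a small sketch. The sample names are made up for illustration; only the last dot counts as the start of the extension, and a leading dot is kept as part of the name.

```python
import os

samples = ["company C.txt", "company A", "report.v2.pdf", ".gitignore"]

for name in samples:
    # splitext returns a 2-tuple, so it can be unpacked directly
    root, ext = os.path.splitext(name)
    print(repr(root), repr(ext))

# 'company C' '.txt'
# 'company A' ''
# 'report.v2' '.pdf'
# '.gitignore' ''
```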

However, Charles’ request is to change the file names, not the folder names, so we need a few more lines to filter out folders. We already have the file extension in file_ext, so our rule is simple: an entry counts as a file if its name contains an extension.

import os
def filename_modify(target_dir):
    if not os.path.exists(target_dir):
        print('Error - File/folder does not exist, please double check the path')
    else:
        for file in os.listdir(target_dir):
            file_name = os.path.splitext(file)[0]
            file_ext = os.path.splitext(file)[1]
            if file_ext != '':
                pass  # a file: the renaming logic goes here
            else:
                continue  # no extension: treat it as a folder and skip it
target_dir = r'C:\test'
filename_modify(target_dir)
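
The extension rule is a heuristic: a file without an extension is skipped, and a folder whose name contains a dot is treated as a file. As an alternative, here is a sketch that asks the operating system directly with os.path.isfile. The helper name list_files and the sample names are hypothetical.

```python
import os
import tempfile

def list_files(target_dir):
    """Return the sorted names of regular files (not folders) in target_dir."""
    return sorted(
        name for name in os.listdir(target_dir)
        if os.path.isfile(os.path.join(target_dir, name))
    )

with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "company A"))
    os.mkdir(os.path.join(tmp, "backup.2020"))    # a folder with a dot in its name
    for name in ["company C.txt", "README"]:      # README has no extension
        open(os.path.join(tmp, name), "w").close()

    print(list_files(tmp))  # ['README', 'company C.txt']
```

Note that os.path.isfile needs the full path, not just the name, which is why the name is joined with target_dir first.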

3. Change file names

The function to rename a file is

os.rename(old_path, new_path)

Both old_path and new_path need to contain the full path, that is, the folder and the file name. Here we use os.path.join to build the file’s location.

old_path = os.path.join(target_dir, file)
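
The full path matters because os.rename resolves a bare file name against the current working directory, not against target_dir. Here is a minimal sketch in a temporary folder; the file names are made up for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # os.path.join inserts the correct separator for the current platform
    old_path = os.path.join(tmp, "company C.txt")
    new_path = os.path.join(tmp, "20201203_company C.txt")

    open(old_path, "w").close()      # create an empty file to rename
    os.rename(old_path, new_path)

    print(os.listdir(tmp))           # ['20201203_company C.txt']
```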

For the new path, remember that Charles wants to add a date to the file name, so we need a date library.

from datetime import date
today = date.today().strftime('%Y%m%d')

It returns today’s date as a string, for example ‘20201203’.
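
In the format string, %Y is the four-digit year, %m the zero-padded month, and %d the zero-padded day. The sketch below uses a fixed date instead of date.today() so that the output is reproducible; the second format is only shown for comparison.

```python
from datetime import date

fixed = date(2020, 12, 3)           # a fixed date so the output never changes

print(fixed.strftime('%Y%m%d'))     # 20201203
print(fixed.strftime('%Y-%m-%d'))   # 2020-12-03
print(fixed.isoformat())            # 2020-12-03
```

A year-first format has a practical benefit for file names: sorting by name also sorts by date.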

Now we have a new version of the code:

import os
from datetime import date
today = date.today().strftime('%Y%m%d')
def filename_modify(target_dir, add_str=today):
    if not os.path.exists(target_dir):
        print('Error - File/folder does not exist, please double check the path')
    else:
        for file in os.listdir(target_dir):
            file_name = os.path.splitext(file)[0]
            file_ext = os.path.splitext(file)[1]
            if file_ext != '':
                old_path = os.path.join(target_dir, file)
                # Put the date in front of the original name
                newfile = add_str + "_" + file_name + file_ext
                new_path = os.path.join(target_dir, newfile)
                os.rename(old_path, new_path)
            else:
                continue
target_dir = r'C:\test'
filename_modify(target_dir, add_str=today)

See, we have successfully added today’s date into the file names!

Image by Author

What if we want to add the date at the end instead of the beginning of the file names? We need to add one more parameter for the position of add_str.

import os
from datetime import date
today = date.today().strftime('%Y%m%d')
def filename_modify(target_dir, add_str=today, position="end"):
    if not os.path.exists(target_dir):
        print('Error - File/folder does not exist, please double check the path')
    else:
        for file in os.listdir(target_dir):
            file_name = os.path.splitext(file)[0]
            file_ext = os.path.splitext(file)[1]
            if file_ext != '':
                old_path = os.path.join(target_dir, file)
                if position == "start":
                    newfile = add_str + "_" + file_name + file_ext
                elif position == "end":
                    newfile = file_name + "_" + add_str + file_ext
                else:
                    newfile = file  # unknown position: keep the name unchanged
                new_path = os.path.join(target_dir, newfile)
                os.rename(old_path, new_path)
            else:
                continue
target_dir = r'C:\test'
filename_modify(target_dir, add_str=today, position='end')
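
Because the naming rule sits next to os.rename, it is hard to try out without touching real files. One option, sketched here on the assumption that we are free to restructure the function, is to move the name-building step into a small helper that only works on strings. The name build_new_name is hypothetical.

```python
import os

def build_new_name(file, add_str, position="end"):
    """Return the new file name without touching the file system."""
    file_name, file_ext = os.path.splitext(file)
    if position == "start":
        return add_str + "_" + file_name + file_ext
    if position == "end":
        return file_name + "_" + add_str + file_ext
    return file  # unknown position: leave the name unchanged

print(build_new_name("company C.txt", "20201203", "start"))   # 20201203_company C.txt
print(build_new_name("company C.txt", "20201203", "end"))     # company C_20201203.txt
print(build_new_name("company C.txt", "20201203", "middle"))  # company C.txt
```

With a helper like this, the names can be printed and checked first, and os.rename is only called once they look right.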

And the file names are changed to:

Image by Author

We could also add a ‘replace’ option to the function, which swaps one piece of text in the file name for another.

import os
from datetime import date
today = date.today().strftime('%Y%m%d')
def filename_modify(target_dir, add_str=today, position="end", old_str=None, new_str=None):
    if not os.path.exists(target_dir):
        print('Error - File/folder does not exist, please double check the path')
    else:
        for file in os.listdir(target_dir):
            file_name = os.path.splitext(file)[0]
            file_ext = os.path.splitext(file)[1]
            if file_ext != '':
                old_path = os.path.join(target_dir, file)
                if position == "start":
                    newfile = add_str + "_" + file_name + file_ext
                elif position == "end":
                    newfile = file_name + "_" + add_str + file_ext
                elif position == "replace":
                    # old_str and new_str must both be given for this option
                    newfile = file.replace(old_str, new_str)
                else:
                    newfile = file
                new_path = os.path.join(target_dir, newfile)
                os.rename(old_path, new_path)
            else:
                continue
target_dir = r'C:\test'
filename_modify(target_dir, add_str=today, position='replace', old_str="company", new_str="Company")
Image by Author

We will need to add one more parameter for the file extension.

Here, we introduce another library, re, for regular expressions. If you are interested in how regular expressions work, please see my other article here:

A Very Easy Tutorial to Learn Python Regular Expression

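import re
# Find every piece of text in file_name that matches the pattern old_str
old_str_list = re.findall(old_str, file_name)
for i in old_str_list:
    # Replace the matched text itself, not the pattern
    file_name = file_name.replace(i, new_str)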
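
The same kind of replacement can also be done in a single call with re.sub, which treats old_str as a regular expression pattern. This is a minimal sketch with made-up values for file_name, old_str, and new_str.

```python
import re

file_name = "company C company D"
old_str = r"company"
new_str = "Company"

# Replace every match of the pattern in one call
print(re.sub(old_str, new_str, file_name))           # Company C Company D

# A pattern can match more than a fixed string, e.g. any run of digits
print(re.sub(r"\d+", "20201203", "invoice_001"))     # invoice_20201203

# To treat the search text literally (the dot is special), escape it first
print(re.sub(re.escape("v1.0"), "v2", "spec v1.0"))  # spec v2
```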

Now we have handled all the requests that Charles asked for!

Conclusion

The full code for Charles’ task is listed below, with comments added to help you understand it.
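
The following is a sketch assembled from the steps in this article, so its details may differ from the original listing. Two things are assumptions added here: the 'replace' option only runs when both old_str and new_str are given, and a file is only renamed when its name actually changes. The demo at the end uses a temporary folder and a fixed date string instead of a real invoice folder.

```python
import os
import tempfile
from datetime import date

today = date.today().strftime('%Y%m%d')

def filename_modify(target_dir, add_str=today, position="end",
                    old_str=None, new_str=None):
    """Rename every file in target_dir; entries without an extension are skipped."""
    # 1. Locate the file folder and make sure it exists
    if not os.path.exists(target_dir):
        print('Error - File/folder does not exist, please double check the path')
        return
    # 2. Get all the file names
    for file in os.listdir(target_dir):
        file_name, file_ext = os.path.splitext(file)
        if file_ext == '':
            continue  # no extension: treated as a folder
        # 3. Build the new name according to position
        if position == "start":
            newfile = add_str + "_" + file_name + file_ext
        elif position == "end":
            newfile = file_name + "_" + add_str + file_ext
        elif position == "replace" and old_str is not None and new_str is not None:
            newfile = file.replace(old_str, new_str)
        else:
            newfile = file
        # Assumption: skip the rename when nothing changed
        if newfile != file:
            os.rename(os.path.join(target_dir, file),
                      os.path.join(target_dir, newfile))

# Demo in a throwaway folder with made-up invoice names
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "company A"))
    for name in ["company C.txt", "company D.pdf"]:
        open(os.path.join(tmp, name), "w").close()

    filename_modify(tmp, add_str="20201203", position="start")
    print(sorted(os.listdir(tmp)))
    # ['20201203_company C.txt', '20201203_company D.pdf', 'company A']
```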

Here is a summary of the commands used today:

Image by Author

I like Machine Learning as I believe it can truly help our daily life. Please connect with me through LinkedIn:

https://www.linkedin.com/in/violamao/

