
Stripping Strings in Python

Exploring Python String Methods


In computer science, the string data type is defined as a sequence of characters. Strings typically contain characters, words, sentences, and/or numerical information. In Python, string objects have access to several methods that enable operations such as text stripping, sanitization, searching, and much more. A good understanding of these methods is fundamental to any data scientist’s natural language processing toolkit. In this post, we will discuss how to use the strip methods available to Python string objects to remove unwanted characters and text.

Let’s get started!

Suppose we want to remove unwanted characters, such as whitespace or even corrupted text, from the beginning or end of a string. Let’s define an example string with unwanted whitespace. We will take a quote from the creator of the Python programming language, Guido van Rossum:

string1 = '     Python is an experiment in how much freedom programmers need. \n'

We can use the ‘strip()’ method to remove the unwanted whitespace and the newline character, ‘\n’. Let’s print the string before and after applying ‘strip()’:

print(string1)
print(string1.strip())
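Because spaces and ‘\n’ are invisible when printed, it can help to print the ‘repr()’ of each string instead. The sketch below uses a made-up variable, ‘messy’, to show that ‘strip()’ with no argument removes every kind of leading and trailing whitespace, including tabs and the Windows-style ‘\r\n’ line ending. It also shows that ‘strip()’ returns a new string rather than modifying the original, since Python strings are immutable:

```python
# Hypothetical example string with mixed leading and trailing whitespace.
messy = " \t  Python is an experiment in how much freedom programmers need. \r\n"

# With no argument, strip() removes all leading and trailing whitespace.
clean = messy.strip()

print(repr(messy))  # the original is unchanged: strings are immutable
print(repr(clean))
```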

If we only want to strip unwanted characters at the beginning of the string, we can use ‘lstrip()’. Let’s take a look at another quote from Guido:

string2 = "    Too much freedom and nobody can read another's code; too little and expressiveness is endangered. \n\n\n"

Let’s use ‘lstrip()’ to remove unwanted whitespace on the left:

print(string2)
print(string2.lstrip())
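Printing the ‘repr()’ makes it easier to see that ‘lstrip()’ leaves the right-hand side alone: the trailing space and the three newlines are still there. A short sketch using the same ‘string2’:

```python
string2 = "    Too much freedom and nobody can read another's code; too little and expressiveness is endangered. \n\n\n"

left_stripped = string2.lstrip()

# The leading spaces are gone...
print(left_stripped.startswith("Too"))  # True
# ...but the trailing space and newlines are untouched.
print(repr(left_stripped[-5:]))
```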

We can also remove the newlines (and any other trailing whitespace) on the right using ‘rstrip()’:

print(string2)
print(string2.lstrip())
print(string2.rstrip())
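Note that ‘rstrip()’ with no argument also removes the space before the newlines. If we only want to drop the newlines and keep any other trailing whitespace, we can pass ‘\n’ as the argument. A minimal sketch comparing the two:

```python
string2 = "    Too much freedom and nobody can read another's code; too little and expressiveness is endangered. \n\n\n"

# No argument: all trailing whitespace (the space and the newlines) is removed.
all_ws = string2.rstrip()

# Explicit argument: only trailing '\n' characters are removed,
# so the space in front of them survives.
newlines_only = string2.rstrip("\n")

print(repr(all_ws[-12:]))
print(repr(newlines_only[-12:]))
```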

In the last line of output, the trailing space and the three newlines have been removed. By default, these methods strip whitespace, but we can also pass them an argument specifying which characters to remove instead. Consider the following string containing the unwanted ‘#’ and ‘&’ characters:

string3 = "#####Too much freedom and nobody can read another's code; too little and expressiveness is endangered.&&&&"

If we want to remove the ‘#’ characters on the left of the string, we can pass ‘#’ to ‘lstrip()’:

print(string3)
print(string3.lstrip('#'))
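‘lstrip('#')’ removes as many leading ‘#’ characters as it finds and stops at the first character that is not in its argument. The sketch below uses a hypothetical Markdown-style heading to show that a space between the hash marks stops the stripping, and that adding the space to the argument removes both:

```python
heading = "# #Title"

# Stripping stops at the space, which is not in the argument.
print(repr(heading.lstrip("#")))   # ' #Title'

# Now both '#' and ' ' are in the set of characters to strip.
print(repr(heading.lstrip("# ")))  # 'Title'
```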

We can also remove the ‘&’ characters on the right using ‘rstrip()’:

print(string3)
print(string3.lstrip('#'))
print(string3.rstrip('&'))
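One thing to watch for with ‘rstrip('&')’: stripping stops at the first character not in the argument, so a trailing newline (common when reading lines from a file) shields the ‘&’ characters in front of it. A sketch with a made-up line:

```python
line = "expressiveness is endangered.&&&&\n"

# '\n' is the last character and is not in the argument, so nothing is removed.
print(repr(line.rstrip("&")))

# Including '\n' in the argument strips the newline and the ampersands.
print(repr(line.rstrip("&\n")))
```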

We can strip both characters from both ends at once by passing ‘#&’ to the ‘strip()’ method:

print(string3)
print(string3.lstrip('#'))
print(string3.rstrip('&'))
print(string3.strip('#&'))
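The argument to ‘strip()’ is treated as a set of characters, not as a prefix or suffix, so the order of the characters does not matter and any combination of them is removed. The first example below is the one used in the Python documentation. If you instead want to remove an exact substring once, Python 3.9 added ‘str.removeprefix()’ and ‘str.removesuffix()’; this sketch assumes Python 3.9 or later:

```python
# Characters in the argument are treated as a set.
print("www.example.com".strip("cmowz."))  # example

string3 = "#####Too much freedom and nobody can read another's code; too little and expressiveness is endangered.&&&&"
print(string3.strip("&#") == string3.strip("#&"))  # True

# Python 3.9+: remove an exact prefix once, if present.
print("www.example.com".removeprefix("www."))  # example.com
print(string3.removeprefix("#")[:5])           # ####T
```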

It is worth noting that the strip methods do not touch any text in the middle of the string. Consider the following string:

string4 = "&&&&&&&Too much freedom and nobody can read another's code; &&&&&&& too little and expressiveness is endangered.&&&&&&&"

If we apply the ‘strip()’ method with ‘&’ as the argument, it only removes the ampersands on the left and right:

print(string4)
print(string4.strip('&'))
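We can confirm this by counting the ampersands that survive. The sketch below also checks that ‘strip()’ returns a new string and leaves ‘string4’ itself unchanged:

```python
string4 = "&&&&&&&Too much freedom and nobody can read another's code; &&&&&&& too little and expressiveness is endangered.&&&&&&&"

stripped = string4.strip("&")

print(string4.count("&"))   # 21: seven on the left, seven in the middle, seven on the right
print(stripped.count("&"))  # 7: only the middle run is left
print(stripped.startswith("Too"), stripped.endswith("endangered."))  # True True
print(string4.startswith("&"))  # True: the original string is unchanged
```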

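We see that the unwanted ‘&’ characters remain in the middle of the string. If we want to remove unwanted characters found in the middle of text, we can use the ‘replace()’ method, which replaces every occurrence of a substring. Here we replace ‘&’ with the empty string: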

print(string4)
print(string4.replace('&', ''))
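One side effect of ‘replace()’ here is that the spaces on either side of the middle run of ampersands are both kept, leaving a double space after ‘code;’. The sketch below tidies that up by splitting on whitespace and re-joining with single spaces. It also shows ‘str.translate()’, with a table built by ‘str.maketrans()’, as one way to delete several different characters in a single pass (the ‘messy’ string is a made-up example):

```python
string4 = "&&&&&&&Too much freedom and nobody can read another's code; &&&&&&& too little and expressiveness is endangered.&&&&&&&"

# replace() removes every '&', but leaves two spaces after "code;".
removed = string4.replace("&", "")
print(repr(removed))

# split() with no argument splits on runs of whitespace; join() rebuilds with single spaces.
tidy = " ".join(removed.split())
print(tidy)

# The third argument to maketrans() lists characters to delete; translate() applies the table.
table = str.maketrans("", "", "#&")
messy = "#Too# much& freedom&&"
print(messy.translate(table))  # Too much freedom
```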

I’ll stop here but I encourage you to play around with the code yourself.

CONCLUSIONS

To summarize, in this post we discussed how to remove unwanted text and characters from strings in Python. We showed how to use ‘lstrip()’ and ‘rstrip()’ to remove unwanted characters on the left and right of strings, respectively. We also showed how to remove multiple unwanted characters from both ends at once using ‘strip()’. Finally, we showed how to use the ‘replace()’ method to remove unwanted text found in the middle of strings. I hope you found this post useful and interesting. The code in this post is available on GitHub. Thank you for reading!

