Regular expressions are sequences of characters that define patterns, which can be used for tasks such as pattern matching and text searching. In this post, we will discuss how to use the ‘search()’ method in Python’s regular expressions module, ‘re’.
Let’s get started!
Consider the following sentence:
sentence1 = 'Python is great'
We can use the ‘search()’ method from the ‘re’ module to search for patterns in this text. The syntax for searching for patterns at the beginning and end of a text is as follows:
import re
result = re.search("^begin.*end$", text)
The ‘^’ character anchors a pattern to the start of a string, and the ‘$’ character anchors a pattern to the end of a string. The ‘.*’ in between matches any characters that separate the two. Let’s check if our sentence starts with ‘Python’ and ends with ‘great’:
result1 = re.search("^Python.*great$", sentence1)
print(result1)

We see that the result is a ‘re.Match’ object, with match='Python is great'. If we look for a pattern that is not in the sentence, ‘search()’ returns ‘None’:
result2 = re.search("^Python.*bad$", sentence1)
print(result2)
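A match object carries more than its printed form. As a small sketch using only standard ‘re’ calls, ‘group()’ returns the matched text and ‘span()’ returns the start and end indices of the match. The variable names below are illustrative and not part of the original code.

```python
import re

sentence1 = 'Python is great'

# A successful search returns a re.Match object.
match = re.search("^Python.*great$", sentence1)
matched_text = match.group()   # the text that matched the pattern
matched_span = match.span()    # (start, end) indices of the match

# A failed search returns None, which is falsy in an if statement.
no_match = re.search("^Python.*bad$", sentence1)

print(matched_text)
print(matched_span)
print(no_match)
```

Because ‘None’ is falsy and a match object is truthy, the result can be used directly in a conditional.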

Now let’s consider the following list of sentences:
text_list = ['Python is great', 'C plus plus is nice', 'Python is a fantastic language, it is great', 'Java is a nice Programming language' ]
We can loop over our list of text, apply the ‘search()’ method to look for the pattern and print the results:
for text in text_list:
    print(re.search("^Python.*great$", text))

The re.Match objects and ‘None’ values are not as readable as we’d like, so we can incorporate some conditional statements to provide more useful output:
for text in text_list:
    result = re.search("^Python.*great$", text)
    if result:
        print('"', text, '"', 'Begins with Python and ends with great')
    else:
        print('"', text, '"', 'Does not begin with Python and end with great')

We can make this even more useful by generalizing this logic within a function. The function will take two strings corresponding to the begin and end text we are searching for:
def match_begin_end(begin, end):
We can use f-strings to build the pattern and to format our output. We will check the pattern using the ‘search()’ method:
re.search(f"^{begin}.*{end}$", text)
When we find a match we print:
f'Begins with {begin} and ends with {end}'
and when we don’t find a match we will print:
f'Does not begin with {begin} and end with {end}'
The full function will be as follows:
def match_begin_end(begin, end):
    # Check every sentence in the global text_list against the pattern.
    for text in text_list:
        result = re.search(f"^{begin}.*{end}$", text)
        if result:
            print('"', text, '"', f'Begins with {begin} and ends with {end}')
        else:
            print('"', text, '"', f'Does not begin with {begin} and end with {end}')
If we call our function with ‘Python’ and ‘great’ we get the same results:
match_begin_end('Python', 'great')

We can also call our function with ‘C plus plus’ and ‘nice’:
match_begin_end('C plus plus', 'nice')
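One caveat with inserting strings straight into a pattern: characters such as ‘+’ have a special meaning in regular expressions, so a search string like ‘C++’ would be read as regex syntax rather than literal plus signs. A minimal sketch of one way to guard against this uses the standard ‘re.escape()’ function. The helper name ‘begins_and_ends’ and the sample sentences are hypothetical, and the helper returns a boolean instead of printing.

```python
import re

def begins_and_ends(text, begin, end):
    """Return True if text starts with begin and ends with end (hypothetical helper)."""
    # re.escape makes metacharacters such as '+' match literally.
    pattern = f"^{re.escape(begin)}.*{re.escape(end)}$"
    return re.search(pattern, text) is not None

escaped_hit = begins_and_ends('C++ is nice', 'C++', 'nice')
escaped_miss = begins_and_ends('C is nice', 'C++', 'nice')

print(escaped_hit)
print(escaped_miss)
```

Escaping is only appropriate when the inputs are meant as plain text. If the caller should be able to pass real regex syntax, the strings would need to be left unescaped.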

Notice our search is case sensitive:
match_begin_end('c plus plus', 'nice')

We can fix this by using the ‘lower()’ string method in our for-loop, so that both the text and the search strings are compared in lowercase:
def match_begin_end(begin, end):
    for text in text_list:
        # Lowercase both the text and the search strings before matching.
        text_lower = text.lower()
        begin_lower = begin.lower()
        end_lower = end.lower()
        result = re.search(f"^{begin_lower}.*{end_lower}$", text_lower)
        if result:
            print('"', text, '"', f'Begins with {begin_lower} and ends with {end_lower}')
        else:
            print('"', text, '"', f'Does not begin with {begin_lower} and end with {end_lower}')
match_begin_end('C plus plus', 'nice')
match_begin_end('c plus plus', 'nice')
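As an alternative to lowercasing everything by hand, the ‘re’ module provides the ‘re.IGNORECASE’ flag. The sketch below is one possible variant, not the approach used in the function above. The helper name ‘begins_and_ends_ignore_case’ is hypothetical, and it returns a boolean rather than printing.

```python
import re

text_list = ['Python is great', 'C plus plus is nice',
             'Python is a fantastic language, it is great',
             'Java is a nice Programming language']

def begins_and_ends_ignore_case(text, begin, end):
    """Case-insensitive begin/end check (hypothetical helper)."""
    # The flag makes the whole pattern ignore letter case.
    result = re.search(f"^{begin}.*{end}$", text, flags=re.IGNORECASE)
    return result is not None

results = [begins_and_ends_ignore_case(text, 'c plus plus', 'NICE')
           for text in text_list]
print(results)
```

With the flag, the original text and search strings are left untouched, which keeps the printed output in its original casing.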

We can also define separate functions. To check only the beginning of a text, we use just the ‘^’ character:
def match_begin(begin):
    for text in text_list:
        text_lower = text.lower()
        begin_lower = begin.lower()
        # '^' anchors the pattern to the start of the text.
        result = re.search(f"^{begin_lower}", text_lower)
        if result:
            print('"', text, '"', f'Begins with {begin_lower}')
        else:
            print('"', text, '"', f'Does not begin with {begin_lower}')
match_begin('Python')
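For start-of-string checks, the ‘re’ module also offers ‘re.match()’, which only tries the pattern at the beginning of the string and therefore does not need ‘^’. The following short comparison is a sketch for illustration. The variable names are assumptions and not part of the original code.

```python
import re

text = 'Python is great'

# re.search needs '^' to pin the pattern to the start of the string.
with_anchor = re.search("^Python", text)

# re.match only tries the pattern at the start, so no '^' is needed.
with_match = re.match("Python", text)

# re.match does not find 'great', because it is not at the start.
not_at_start = re.match("great", text)

print(with_anchor.group(), with_match.group(), not_at_start)
```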

And to check only the end of a text, we use the ‘$’ character:
def match_end(end):
    for text in text_list:
        text_lower = text.lower()
        end_lower = end.lower()
        # '$' anchors the pattern to the end of the text.
        result = re.search(f"{end_lower}$", text_lower)
        if result:
            print('"', text, '"', f'Ends with {end_lower}')
        else:
            print('"', text, '"', f'Does not end with {end_lower}')
match_end('great')
Another important character is ‘|’, which matches either of two patterns. We can define our function as follows:
def match_either(str_one, str_two):
    for text in text_list:
        text_lower = text.lower()
        str_one_lower = str_one.lower()
        str_two_lower = str_two.lower()
        # '|' matches if either alternative appears anywhere in the text.
        result = re.search(f"{str_one_lower}|{str_two_lower}", text_lower)
        if result:
            print(f"'{str_one_lower}' or '{str_two_lower}' found in", text)
        else:
            print(f"Neither '{str_one_lower}' nor '{str_two_lower}' found in", text)
match_either('c plus plus', 'nice')
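One thing to watch when combining ‘|’ with the anchors from earlier: alternation has the lowest precedence, so ‘^’ or ‘$’ applies only to the alternative it sits next to. The sketch below illustrates this with a non-capturing group, ‘(?:...)’. The sample patterns and variable names are chosen for illustration only.

```python
import re

text = 'Java is a nice Programming language'

# Without grouping, '^' applies only to the first alternative:
# this reads as (^python) OR ('nice' anywhere in the text).
ungrouped = re.search("^python|nice", text)

# A non-capturing group (?:...) makes '^' apply to both alternatives.
grouped = re.search("^(?:python|nice)", text)

print(ungrouped.group())
print(grouped)
```

The ungrouped pattern finds ‘nice’ in the middle of the sentence, while the grouped pattern finds nothing because the sentence starts with neither word.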

I’ll stop here but I encourage you to play around with the code yourself.
CONCLUSIONS
To summarize, in this post we discussed how to perform some basic tasks using regular expressions in Python. First, we defined a function that searched for patterns at the start and end of a text. Next, we defined separate functions that searched for start and end patterns individually. Finally, we defined a function that checked whether either of two input strings is found in a text. If you are interested in learning about the basics of Python programming, data manipulation with Pandas, and machine learning in Python, check out _Python for Data Science and Machine Learning: Python Programming, Pandas and Scikit-learn Tutorials for Beginners._ I hope you found this post useful and interesting. The code in this post is available on GitHub. Thank you for reading!