Programs don’t have to crash to alert you to issues within them: other signs can serve as serious warnings that a problem is on the horizon. For example, if you smelt gas or smoke somewhere in your house, it could indicate that you’ve got a gas leak or that something is burning. Both scenarios require investigation before they become a major issue (e.g. your house explodes).
A code smell can be thought of as smelling gas or smoke in your home. Your code wouldn’t stop executing because it’s present, but it’s worth investigating before it blows out of proportion. It serves as an indicative warning that your code needs some attention.
"’Smells are certain structures in the code that indicate violation of fundamental design principles and negatively impact design quality’. Code smells are usually not bugs; they are not technically incorrect and do not prevent the program from functioning. Instead, they indicate weaknesses in design that may slow down development or increase the risk of bugs or failures in the future. Bad code smells can be an indicator of factors that contribute to technical debt."
- [Source: Wikipedia]
The mere presence of a code smell does not equate to a bug, but it is a cause for concern worth investigating. Most programmers would agree that it takes much less effort and time to prevent a bug than to fix one after we encounter it, and getting rid of code smells is one way to do this.
To reduce the number of code smells, it’s important to know what they look like. In this article, we will cover seven of them in no particular order.
1 Using print statements to debug
Print statements are likely to be one of the first built-ins you learn in your programming journey (e.g. most people’s first program is print("Hello World")). There’s nothing inherently wrong with print statements, except that developers often get too attached to them.
How do you know if you’re too attached to print statements? If you’re using them to debug your code.
Print statements are easy to write, so they do an extremely good job of deceiving people into thinking they’re the best way to debug code. However, debugging with print usually means running your program several times before you display the information needed to fix the bug, and it takes even longer once you go back and delete all the print calls.
There are two solutions that are better than using print debugging: 1) using a debugger to run your program one line at a time and 2) using log files to record large amounts of information from your program that can be compared with previous runs.
I prefer using log files, which can easily be done in Python with the built-in logging module.
import logging

logging.basicConfig(
    filename="log_age.txt",
    level=logging.DEBUG,
    format="%(asctime)s - %(levelname)s - %(message)s",
)
logging.debug("This is a log message.")
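To give a rough idea of how this might replace print debugging inside a function, here is a minimal sketch. The logger name and the years_until_retirement function are hypothetical and exist only for illustration; the logging calls themselves are standard library.

```python
import logging

# Send DEBUG-level messages to the console (stderr) for this sketch.
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s - %(levelname)s - %(message)s",
)

# Hypothetical logger name, used only for illustration.
logger = logging.getLogger("age_checker")


def years_until_retirement(age: int, retirement_age: int = 65) -> int:
    """Return how many years remain until retirement (never negative)."""
    # These calls take the place of temporary print statements.
    logger.debug("Received age=%s, retirement_age=%s", age, retirement_age)
    remaining = max(retirement_age - age, 0)
    logger.debug("Computed remaining=%s", remaining)
    return remaining


print(years_until_retirement(30))  # 35
```

Unlike print calls, these messages don’t need to be deleted afterwards: raising the level to logging.WARNING silences them without touching the function.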
2 Duplicate code
The most common code smell you’ll find in programs is likely to be duplicated code. It’s so easy to recognize duplicate code: all you have to do is consider where you could have simply copied and pasted code in different parts of a program. Thus, duplicate code may be defined as code that’s repeated in more than one location.
print("What would you like for breakfast?")
breakfast = input()
print(f"One {breakfast} coming up")
print("What would you like for lunch?")
lunch = input()
print(f"One {lunch} coming up")
print("What would you like for dinner?")
dinner = input()
print(f"One {dinner} coming up")
On the surface, duplicate code looks harmless. Where it becomes a thorn in the backside is when updates or changes to the code must be made. Changing one copy of the duplicate code means the same change must be made to every other copy, and forgetting to do so can result in costly, hard-to-detect bugs in your program.
The solution to this problem is pretty straightforward: deduplicate the code. We can easily make code appear once in our programs by leveraging the power of functions or loops.
def ask_meal(meal_of_the_day: str) -> str:
    print(f"What would you like to eat for {meal_of_the_day}?")
    meal = input()
    return f"One {meal} coming up"

meals_of_the_day = ["breakfast", "lunch", "dinner"]
for meal in meals_of_the_day:
    # Print the returned confirmation so the behaviour matches the original code
    print(ask_meal(meal))
Some people take deduplication to an extreme and seek to deduplicate their code any time they’ve copied and pasted. Though some programmers argue for it, at times it can be overkill. You can probably get away with copying and pasting code once or twice, but if it occurs three times, create a function or loop to fix it.
3 Magic numbers
Sometimes we have to use numbers in our code. Some of the numbers that we use in our source code can cause extreme confusion to other developers, and to yourself if (or when) you have to revisit the code in the future. These numbers are called magic numbers.
"The term magic number or magic constant refers to the anti-pattern of using numbers directly in source code."
- [Source: Wikipedia]
Magic numbers are considered a code smell because they don’t give any indication as to why they’re present: they obscure the developer’s intention in choosing that specific number. As a result, your code is less readable, harder for you and other developers to update or change in the future, and much more prone to subtle errors like typos.
Consider the following scenario:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    0.3,
    0.7,
    25,
    True,
    None
)
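In the code above we’ve imported the train_test_split function from scikit-learn and called it with bare values that seem to have no clear meaning. (The call is illustrative only: the real function accepts everything after the arrays as keyword arguments, so these positional values would be treated as additional arrays to split.)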
One solution to make this code more readable is to add informative comments that tell us why we’ve chosen that specific number.
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X,     # features array
    y,     # labels
    0.3,   # test size
    0.7,   # train size
    25,    # random state
    True,  # shuffle
    None,  # stratify
)
A much more informative way to solve this code smell is to use a constant. Constants are data values that remain the same each time a program is executed. In Python (I’m not sure about other languages), we typically write constants in uppercase letters to inform others, and remind ourselves, that their values shouldn’t change after their initial assignment.
You’ll often see constants defined in configurations or as a global variable at the beginning of a script.
from sklearn.model_selection import train_test_split

TEST_SIZE = 0.3
TRAIN_SIZE = 0.7
RANDOM_STATE = 25
SHUFFLE = True
STRATIFY = None

X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=TEST_SIZE,
    train_size=TRAIN_SIZE,
    random_state=RANDOM_STATE,
    shuffle=SHUFFLE,
    stratify=STRATIFY,
)
How much more readable is that?
Note: Each constant is passed by its parameter name. This is what train_test_split expects for everything after the arrays, and it also protects against arguments being supplied in the wrong order.
It’s important you use separate constants instead of using one to solve two problems. The reason for this is that they can be changed independently in the future which usually results in much less hassle.
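As a small illustration of keeping constants separate, here is a sketch that doesn’t depend on scikit-learn. The constant names and the days_to_seconds function are made up for this example.

```python
# Both of these happen to be 60, but they mean different things,
# so each one gets its own constant.
SECONDS_PER_MINUTE = 60
MINUTES_PER_HOUR = 60
HOURS_PER_DAY = 24


def days_to_seconds(days: float) -> float:
    """Convert a number of days into seconds."""
    return days * HOURS_PER_DAY * MINUTES_PER_HOUR * SECONDS_PER_MINUTE


print(days_to_seconds(2))  # 172800
```

Had a single constant called SIXTY been used for both, the name would explain nothing, and changing one meaning would silently change the other.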
4 Leaving commented out code in place
Comments in your code are definitely seen as good practice when they’re informative. Sometimes we even comment out code temporarily, possibly while debugging, to see how the remaining code runs without the line we’ve removed. There’s nothing inherently wrong with that either.
Where it becomes a problem is when programmers get lazy. One such example of this laziness is commenting out code but leaving the commented out code in place.
Commented-out code left in place is a code smell because it’s ambiguous. To other programmers it is a complete mystery: they have no idea what conditions should bring it back into the program.
walk()
# run()
sprint()
stop()
Why is run() commented out? When would it be okay to uncomment run()? If it’s not needed then remove the code.
5 Dead code
All dead code in your program must be dealt with to save computation and memory.
"dead code is a section in the source code of a program which is executed but whose result is never used in any other computation."
- [Source: Wikipedia]
Having dead code in your programs is extremely misleading. Other programmers reading your code may not catch on right away and assume it’s a working part of the code when, in reality, it does nothing but waste space.
# Code source: https://twitter.com/python_engineer/status/1510165975253069824?s=20&t=VsOWz55ZILPXCz6NMgJtEg
class TodoItem:
    def __init__(self, state=None):
        self.state = state if state else -1

    def __str__(self):
        if self.state == -1:
            return "UNDEFINED"
        elif self.state == 0:
            return "UNSET"
        else:
            return "SET"
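At first glance, this code looks okay, but there’s a bug in it: the state can never be set to 0 through the constructor, because the expression state if state else -1 treats 0 as falsy and replaces it with -1. Thus, setting the state to 0 will return UNDEFINED instead of UNSET, and the elif self.state == 0 branch is effectively dead. Checking explicitly for None fixes this: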
class TodoItem:
    def __init__(self, state=None):
        self.state = state if state is not None else -1

    def __str__(self):
        if self.state == -1:
            return "UNDEFINED"
        elif self.state == 0:
            return "UNSET"
        else:
            return "SET"
Note: See this video by Python Engineer to get the full explanation.
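To see the difference for yourself, the two versions can be run side by side. This is only a sketch: the class names BuggyTodoItem and FixedTodoItem are my own labels for the two snippets above, and the subclassing is just a shortcut to reuse __str__.

```python
class BuggyTodoItem:
    def __init__(self, state=None):
        # 0 is falsy, so a state of 0 is silently replaced by -1
        self.state = state if state else -1

    def __str__(self):
        if self.state == -1:
            return "UNDEFINED"
        elif self.state == 0:
            return "UNSET"  # unreachable through the constructor
        else:
            return "SET"


class FixedTodoItem(BuggyTodoItem):
    def __init__(self, state=None):
        # Only a missing state (None) falls back to -1
        self.state = state if state is not None else -1


print(BuggyTodoItem(0))  # UNDEFINED
print(FixedTodoItem(0))  # UNSET
```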
6 Storing variables with numeric suffixes
I’ve been caught by this code smell a few times, and to this day I’m not fully free from it. Sometimes, we may need to keep track of several instances of the same type of data. In such circumstances, it’s incredibly tempting to reuse a name and add a numeric suffix to it so each value is stored under a different variable name.
person_1 = "John"
person_2 = "Doe"
person_3 = "Michael"
This is a code smell because a suffix doesn’t describe what each variable contains or how the variables differ from one another. It also gives no indication of how many such variables exist in your program, and you don’t want to search through 1000+ lines of code to make sure there are no other numbers.
A better solution would be:
people = ["John", "Doe", "Michael"]
Don’t take this as an instruction to change all variables that end in a number: some variables deserve to end in a number, especially if the number is part of the distinct name of the data you’re storing.
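As a quick sketch of what the list version buys you (the labels variable is just an illustrative name), the collection knows its own size, and positions are still available when you need them:

```python
people = ["John", "Doe", "Michael"]

# The list knows how many people it holds; no hunting for person_4
print(len(people))  # 3

# Positions can still be recovered when they matter
labels = [
    f"Person {position}: {person}"
    for position, person in enumerate(people, start=1)
]
print(labels[0])  # Person 1: John
```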
7 Unnecessary classes (Python specific)
Programming languages like Java use classes to organize code in a program. Python instead uses modules. Thus, trying to use classes in Python as you would in Java (to organize code) isn’t going to be effective.
Code in Python isn’t required to exist in a class, and sometimes, using classes can be overkill.
Take this class for example:
class Human:
    def __init__(self, name: str):
        self.name = name

    def introduction(self):
        return f"Hi, my name is {self.name}"

person = Human("Kurtis")
print(person.introduction())
"""
Hi, my name is Kurtis
"""
A major reason why this does not need to be a class is that, apart from __init__, it only has one method. As a rule of thumb, a class does not need to be a class in Python if it only contains one method or only static methods. It’s better to write a function instead.
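For the Human example above, the function version might look something like this (a sketch that keeps the same name, introduction, for the function):

```python
def introduction(name: str) -> str:
    """Return a short self-introduction for the given name."""
    return f"Hi, my name is {name}"


print(introduction("Kurtis"))  # Hi, my name is Kurtis
```

There is no object to create and no state to carry around, which is all the original class was really doing.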
To learn more about this concept, check out Jack Diederich’s PyCon 2012 talk on why we should "Stop Writing Classes."
Thanks for reading.