Comprehensive Guide to Python Logging in the Real World, Part 1

Basics, Lifecycle, Hierarchy, Gotchas & Best Practices

Srikanth Shenoy
Towards Data Science


The Python logging module is the mechanism of choice for real-life logging. It is deceptively simple to use, yet when logging fails it is hard to understand why unless you thoroughly understand its lifecycle and nuances. In this first part of the series, you will build a logical understanding of the inner workings of the logging module (hierarchy, lifecycle and the works), along with tiny code snippets to bolster that understanding. Gotchas that tend to make your life miserable are laid out upfront. The second part covers preferred usage patterns with Python logging, the various methods of logging configuration and, most of all, using logging effectively for exception handling, leading up to the final part on best practices for Python exception handling in a typical application stack. Without further ado, we start with a short answer to the philosophical question of why you need logging when there is the handy print(), and then jump right into the middle of all the logging action.

Why Logging

Imagine a multi-user app, accessed by hundreds or maybe thousands of concurrent users, that sends custom-built messages to stdout using the print() function. Here are the potential issues with print() and how Python logging solves them.

  1. Control over log message persistence: In a traditional system, nobody watches the output that print() sends to stdout/stderr. The messages can vanish into thin air, with no mechanism to retrieve them later for analytics and troubleshooting. Python logging transparently sends messages to different persistent destinations (a file, for instance) without changing the API the application code invokes. The developer still calls logger.info(), logger.warning() etc., but the message ends up at a different destination. [This objection to print() is less relevant now that modern app stacks redirect stdout: Docker containers send it to a JSON or plain-text log file and AWS to CloudWatch Logs, for example.]
  2. Fine-grained control: print() outputs plain-vanilla text with no concept of severity. Python logging has a built-in concept of five levels of log message severity, with the option to customize. This control lets you filter log files for messages at or above a certain severity and disregard messages of lower severity.
  3. Separation of functional and non-functional aspects: With print() there is no way to limit the output of a very verbose module to only WARNING and ERROR messages without if-else blocks ingrained in the code. With the logging module, this can be controlled declaratively, outside the application altogether. In general, non-functional aspects are orthogonal to functional concerns and should be handled outside the functional software. This aspect of elegant architectures is a core design goal of the logging module.
  4. Separation of what and how: Python logging cleanly separates what to log from how to log it. With print() alone, it would be impossible for a third-party library to log to destinations it cannot foresee, which are generally decided by the application using the library at the time the application is wired together.
  5. Environment-specific logging: A development team might want different log levels in different environments. For instance, you may want to see DEBUG and INFO messages in the development environment, while it might be better to turn them off in production. The declarative logging configuration facility in Python logging lets you achieve this without altering the code.
  6. Capturing user context in logs: We generally want to know the time of occurrence of a log message. Sometimes it helps debugging to know which module emitted a certain message. In a multi-user scenario, having the thread id, session id etc. in a log message can be a huge time saver when isolating the path of execution to the actual error among millions of log lines. It is akin to the proverbial ability to pick a needle out of a haystack.
  7. Rotating and archiving log files: A humongous log file is hard to write to and read from. The logging module makes it easy to control file sizes, roll files over at a specific size or time of day, and archive older logs. These bells and whistles are very helpful for troubleshooting and regulatory purposes, to name a few.
  8. Business intelligence: A well-thought-out logging strategy with all of the features above helps generate statistics about software usage, detect anomalies, and investigate security incidents.
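Several of the points above (severity filtering, persistence, context capture) can be seen together in a minimal sketch. The logger name myapp.payments, the format string and the in-memory buffer standing in for a log file are all illustrative, not prescribed:

```python
import io
import logging

buf = io.StringIO()                       # stand-in for a persistent log file
handler = logging.StreamHandler(buf)
handler.setFormatter(logging.Formatter(
    "%(asctime)s %(levelname)s [%(name)s] (thread %(thread)d) %(message)s"
))

logger = logging.getLogger("myapp.payments")   # hypothetical module logger
logger.setLevel(logging.INFO)
logger.propagate = False                       # keep the demo self-contained
logger.addHandler(handler)

logger.debug("too verbose, filtered out")      # below INFO, ignored
logger.warning("payment retry #3")             # captured with timestamp and context
print(buf.getvalue())
```

Every captured line carries the timestamp, severity, logger name and thread id declared in the format string, without any of that plumbing appearing at the call site.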

Alright. Enough about the whys. Let's move on to the whats and hows.

Getting your feet wet with root logger

Getting started with logging is easy. Just use one of the module-level functions from logging: debug(), info(), warning(), error() or critical(). The following code prints the message This is a warning to the console.

import logging

logging.warning("This is a warning") # prints message - This is a warning

However, this is not how you should use it. Never invoke the logging functions directly on the logging module. Always get a Logger object first and then use it for logging, as shown in the code snippet below.

logger = logging.getLogger() # gets the root logger
logger.error("This is an error") # prints message - This is an error

# gets the logger with name mylogger
named_logger = logging.getLogger("mylogger")
named_logger.critical("Very critical") # prints message - Very critical

Two flavors of getLogger() are seen in the code snippet above.

  1. The one without any argument returns the root logger, the primordial logger that exists even without explicit configuration and is available as soon as the logging module is loaded.
  2. The second variation creates a logger named mylogger on demand. If a logger with the name mylogger already exists, a reference to that pre-existing Logger is returned.
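The caching behavior is easy to confirm in a quick sketch; the logger name mylogger follows the snippet above:

```python
import logging

a = logging.getLogger("mylogger")
b = logging.getLogger("mylogger")    # same name, same object back
root = logging.getLogger()           # no argument: the root logger

print(a is b)       # True - the module caches loggers by name
print(root.name)    # "root"
```

Because the module holds these references globally, any module in the process that asks for "mylogger" gets the very same configured instance.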

Although Logger is a class and can be directly instantiated, it is strongly advised against doing so for two reasons:

  1. We want the logging module to control the lifecycle of Loggers (such as using logging handlers of last resort etc. — more on this can be found later in this post)
  2. We want python runtime to reuse Loggers by holding onto their references in its global dictionary

Best practice: Applications and modules should not log directly via the root logger. A specific logger should be configured for each application and module.

Logging levels

Five logging levels are defined based on log message severity: DEBUG, INFO, WARNING, ERROR and CRITICAL, corresponding to the five Logger methods debug(), info(), warning(), error() and critical(), with numerical values of 10 through 50 respectively, so that DEBUG < INFO < WARNING and so on. The log level indicates the importance (or severity) of the message.

The log level is assigned to a logger with Logger.setLevel() method as follows:

# gets the logger with name mylogger
named_logger = logging.getLogger("mylogger")
named_logger.setLevel(logging.WARNING)

named_logger.warning("I am warning you") # prints message - I am warning you

named_logger.info("FYI") # this is not printed

A level, when assigned to a Logger, acts as a severity threshold. Any attempt to log a message of lower severity is ignored. In the above code snippet, the logger level is set to WARNING. Hence the logger prints messages with a severity of at least WARNING (or higher). Messages of lower severity, viz. DEBUG and INFO, are ignored, as if those calls were never made.

If a level is not set on a Logger then, surprise, surprise! its default logging level is NOTSET, with a numerical value of 0. NOTSET does not mean nothing is logged; it means the logger defers to its ancestors in the hierarchy to determine its effective level, as we will see shortly.
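A quick check of a freshly created logger confirms this default (the logger name here is made up):

```python
import logging

# a hypothetical logger that has never been configured
fresh = logging.getLogger("a.freshly.created.logger")

print(fresh.level)                     # 0
print(fresh.level == logging.NOTSET)   # True
```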

Pop quiz: In the first two code snippets of this post we never set an explicit log level, yet the warning log messages were printed. You might think it is because the default log level is WARNING. However, that contradicts my earlier statement that the default level is NOTSET. Here is the deal: it is true that WARNING is the default log level for the root logger, but not for the others. The full answer to this catch-22 has to wait until you reach the section on the logger hierarchy a tad further down this post, along with the concept of effective log level. Hang in there!

Custom Levels

Python logging, being the extensible module that it is, lets you define your own levels. They are rarely necessary, as the existing levels were chosen on the basis of prior usage. However, if you feel the need for custom levels, the following is a quick way to add one. This code is best placed in the application bootstrap logic, before any logging starts.

import logging

FATAL_LEVEL_NUM = 60
logging.addLevelName(FATAL_LEVEL_NUM, "FATAL")

def fatal(self, message, *args, **kwargs):
    if self.isEnabledFor(FATAL_LEVEL_NUM):
        self._log(FATAL_LEVEL_NUM, message, args, **kwargs)

logging.Logger.fatal = fatal

# Now you can use fatal
named_logger = logging.getLogger("mylogger")
named_logger.setLevel(logging.WARNING)
named_logger.fatal("A fatal error indeed")

Best practice: If you are developing Python libraries, especially third-party libraries for use by applications you don't know in advance, avoid defining and using custom log levels within the library. The semantic meaning of a custom level is open to interpretation, as is the numeric value assigned to it. Custom levels can cause chaos across libraries and friction between the library and the application using it.

Four pillars of python logging

The logging module has four main logical components: Logger, Handler, Filter and Formatter.

  1. Logger provides the functions for the application code to initiate logging. For every log message provided, Logger creates a LogRecord and sends it to associated Handler(s). A Logger is associated with a logging level. A Logger is a mandatory part of logging.
  2. Every Handler corresponds to a log destination. Multiple handlers can be attached to a Logger, corresponding to multiple destinations (file, console etc.). A Handler is also associated with a logging level, in addition to the level on the Logger, for finer control over whether each destination receives or ignores a log message. At least one Handler is mandatory for a Logger.
  3. A Filter is an optional facility to apply fine grained logic to determine whether the log record is to be processed or dropped. One can think of logging levels as coarse filters themselves deciding whether the Logger/Handler accepts a log message. Just like levels, Filters can be associated with a Logger and its Handlers.
  4. Formatter specifies the layout of the log messages in the output written out to destinations. Naturally it is a construct associated only with Handler.
Figure 1. UML diagram showing important components of logging module (Image by author)

Figure 1 above captures all the four important components in a UML diagram. If you don’t understand UML, it’s okay. You can still intuitively follow the arrows and get a sense of relationships.

  1. Both Logger and Handler extend a common parent class. And due to this inheritance, both Logger and Handler hold onto a collection of Filters (zero or more).
  2. A Logger holds on to a collection of Handlers (zero or more)
  3. A logging level can be set on both Logger and Handler
  4. Handler has exactly one Formatter to format the log message
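The four pillars and their relationships can be wired together in a small, self-contained sketch. All names here (demo.pillars, DropDebug) are illustrative, and an in-memory stream stands in for a real destination:

```python
import io
import logging

class DropDebug(logging.Filter):
    """Illustrative Filter: reject DEBUG records even when the level allows them."""
    def filter(self, record):
        return record.levelno > logging.DEBUG

stream = io.StringIO()                      # in-memory destination for the demo
handler = logging.StreamHandler(stream)     # Handler: where records go
handler.setFormatter(                       # Formatter: layout of each record
    logging.Formatter("%(levelname)s:%(name)s:%(message)s"))

logger = logging.getLogger("demo.pillars")  # Logger: application-facing entry point
logger.setLevel(logging.DEBUG)              # level check happens on the Logger
logger.propagate = False                    # keep the demo self-contained
logger.addFilter(DropDebug())               # Filter: fine-grained accept/reject
logger.addHandler(handler)

logger.debug("dropped by the filter")
logger.info("kept")
print(stream.getvalue().strip())            # INFO:demo.pillars:kept
```

Even though the logger's level admits DEBUG records, the attached filter rejects them, illustrating the coarse (level) versus fine (filter) distinction.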

Pop Quiz: Why is there an option to set the logging level on both Logger and Handler? Wouldn't the option to set it on the Logger suffice? This question is answered with a best practice in the simplified logging lifecycle section below.

Simplified logging lifecycle

Now that we understand the important structural components of the logging module, we can track their interactions at a fixed level within the hierarchy. A full discussion of the hierarchy-aware interaction follows in the next section; think of this section as an appetizer to the full-course meal. Figure 2 shows the process flow when the application attempts to log a message at the WARNING level. The roles of Logger, Handler, Filter and Formatter in the flow are self-explanatory.

Figure 2. A simplified logging process without considering the hierarchy (Image by author)

Best Practice: Although there is an option to set the logging level on both a logger and its handlers, it is better to set the level on the logger and forego setting it on the handler. Setting a level on a handler is most beneficial when a logger has multiple handlers attached and we want each handler (destination) to log only a subset of the messages coming through the logger. For instance, if the logger level is set to INFO and there are two attached handlers, a StreamHandler logging to the console (sys.stderr) and a FileHandler, and we want only error messages in the file, then we leave the level on the StreamHandler as is, but set the level on the FileHandler to ERROR.
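This best practice can be sketched as follows. To keep the example self-contained and runnable, two in-memory StreamHandlers stand in for the console StreamHandler and the FileHandler; only the second one gets an explicit ERROR level:

```python
import io
import logging

console_buf, file_buf = io.StringIO(), io.StringIO()  # stand-ins for stderr and a file

logger = logging.getLogger("demo.twohandlers")
logger.setLevel(logging.INFO)          # level set once, on the logger
logger.propagate = False               # keep the demo self-contained

console = logging.StreamHandler(console_buf)   # inherits the logger's threshold
file_like = logging.StreamHandler(file_buf)
file_like.setLevel(logging.ERROR)      # this destination only sees ERROR and above

logger.addHandler(console)
logger.addHandler(file_like)

logger.info("routine message")
logger.error("something broke")

print(console_buf.getvalue())  # both messages
print(file_buf.getvalue())     # only the error
```

One logger call fans out to both destinations, and the handler-level threshold trims what each destination actually records.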

Logging Hierarchy

Python loggers form a hierarchy, with the root logger at the top. Any logger created with a name, say myapp, is a child of the root logger. A logger named myapp.mymodule is a child of the myapp logger.

Logger has an important attribute called propagate. By default its value is True.

  1. When True, child loggers pass log messages up to the handlers of their ancestor loggers, usually propagating all the way up to the root logger, where the majority of handlers are configured.
  2. When propagate is set to False on a logger, the upward journey stops there: records are passed up the chain only as far as the first logger whose propagate is False (or the root logger, whichever comes first).

Best Practice: Retain the default propagate = True for the majority of loggers, and configure handlers only on certain top-level loggers, on loggers with propagate = False, and on the root logger. Create child loggers on demand, one per module, with no explicit handlers.

app_logger = logging.getLogger("myapp")
module_logger = logging.getLogger("myapp.mymodule")
class_logger = logging.getLogger("myapp.mymodule.myclass")
print(app_logger.parent) # <RootLogger root (WARNING)>
print(module_logger.parent) # <Logger myapp (WARNING)>
print(class_logger.parent) # <Logger myapp.mymodule (WARNING)>
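Propagation can be observed directly. In this illustrative sketch (logger names are made up), a handler is attached only to the parent, yet the child's messages reach it until propagation is switched off:

```python
import io
import logging

buf = io.StringIO()
parent = logging.getLogger("demoapp")
parent.addHandler(logging.StreamHandler(buf))   # handler only on the parent
parent.setLevel(logging.INFO)
parent.propagate = False   # keep the demo from leaking up to the root logger

child = logging.getLogger("demoapp.worker")     # no handler of its own
child.info("reaches the parent's handler")      # propagates up to demoapp

child.propagate = False
child.info("goes nowhere now")                  # no handlers on the child itself

print(buf.getvalue())   # only the first message made it to the buffer
```

The child also inherits its effective level (INFO) from the parent here, a preview of the effective log level discussed later.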

Watch out for those missing parents in dot notation hierarchy

As mentioned earlier, a logger named, say, myapp.mymodule is a child of the myapp logger. However, there is a catch: this assumes the myapp logger exists. If the myapp logger was never created, then the logger myapp.mymodule is created as a child of the root logger! (The hierarchy is fixed up later if myapp is eventually created.)

class_logger = logging.getLogger("myapp.mymodule.myclass")
print(class_logger.parent) # <RootLogger root (WARNING)> !!!

Hierarchy-aware logging lifecycle

Armed with the knowledge of the simplified logging lifecycle and the logging hierarchy, it won't be hard to understand the full lifecycle using this diagram from the Python logging advanced tutorial.

Figure 3. Hierarchy-aware full logging lifecycle (Image from Python Logging Advanced Tutorial)
  1. As before, the logging lifecycle begins with a logger receiving a request to log a message at an appropriate level.
  2. The logger determines whether it may proceed given the specified logging level threshold, and then checks the configured filters for a green signal. So far so good. However, from this point on, because propagate is True by default, the flow differs slightly.
  3. If there are explicitly configured handlers assigned to the logger at the current hierarchy, they are invoked.
  4. In addition to the processing in step 3, the logger delegates the log message to its parent (if present), until a logger with propagate set to False or the root logger is reached.
  5. Steps 3 and 4 repeat up the chain until one of the exit conditions is met. Note that the level and filters of ancestor loggers are not consulted again during propagation; the record is passed directly to their handlers.
  6. In each of steps 3 and 4, when the log message is handed to a Handler, the handler performs checks similar to step 2 (its own level and filters). However, there is no hierarchical processing on the handler side of the world; whichever handler is asked to do the logging job will do it without any fuss.

Best Practice: From the hierarchical process flow above, it is clear that a handler should be explicitly tied to only one logger. Multiple loggers should not share the same handler reference (although it is technically possible). Sharing a handler between loggers defeats the very purpose of the logical separation for which multiple loggers were designed in the first place. Separate handlers are best configured explicitly only on loggers with propagate=False or on the root logger. Apart from being good design, this also prevents the confusion arising from annoying duplicate log lines at a destination.
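The duplicate-logging gotcha is easy to reproduce. In this illustrative sketch, one handler is (wrongly) shared between a parent and a child logger, so every child message is emitted twice:

```python
import io
import logging

buf = io.StringIO()
shared = logging.StreamHandler(buf)

parent = logging.getLogger("dupdemo")
parent.setLevel(logging.INFO)
parent.propagate = False          # keep the demo self-contained
parent.addHandler(shared)

child = logging.getLogger("dupdemo.sub")
child.addHandler(shared)          # anti-pattern: sharing the parent's handler

child.info("hello")               # emitted by the child's handler, then again
                                  # when the record propagates to the parent
print(buf.getvalue())             # "hello" appears twice
```

Removing the child's addHandler() call (and relying on propagation alone) yields a single line, which is exactly what the best practice prescribes.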

Important implications of logging hierarchy

Missing handlers and the handler of last resort are two boundary cases that follow from the hierarchy-aware logging flow. It is better to make this boundary-case behavior explicit to avoid confusion.

The logging level is NOTSET when a logger is created with the getLogger() function. We generally invoke setLevel() explicitly on some loggers in the hierarchy, but not on all of them. Two scenarios arise.

No handlers found for logger XYZ

If propagate = False is set on some ancestor logger, log messages propagate up the chain of parents only until that ancestor is reached. If that ancestor logger XYZ has an appropriate logging level set but no explicitly configured handler(s), the record has nowhere to go. In Python 3.2 and later the handler of last resort (described next) steps in; if that handler has been disabled, the message “No handlers could be found for logger XYZ” is printed instead.

Handler of last resort

If the logging level is not set explicitly and propagate = True, then the log messages are propagated to parent loggers. The chain of parent loggers is traversed until an ancestor with a level other than NOTSET is found, or the root logger is reached. The root logger has a default level of WARNING. The logging module also configures a handler of last resort by default, stored in the module-level attribute logging.lastResort. Since there is no other handler at the root, this lastResort handler is used. It is a convenience handler that prevents messages from being silently dropped and writes messages of WARNING level and above to sys.stderr. This is why we see only warning (and higher) messages displayed in the console (Jupyter notebook or terminal) when logging is invoked with almost zero configuration. If for some reason you want the old behavior of flagging the missing handler (as in pre-3.2 Python), you can set logging.lastResort to None as shown in the snippet below.

import logging

logging.lastResort = None  # lastResort is a module-level attribute, not a logger attribute
rootLogger = logging.getLogger()

# this shows a message - No handlers could be found for logger "root"
rootLogger.warning("Sample warning")

Effective Logging Level

Another important outcome of the hierarchy is the effective log level. As you already know, the level is NOTSET when a logger is created with the getLogger() function. If a child logger does not have a level set, the logging module walks up the chain of its parents until it finds a logger with a level set; that level is the effective log level of the child logger. In other words, the effective log level is the level at which a logger actually operates: either the level set explicitly on it, or that of the nearest ancestor with an explicitly set level, as shown in the code below.

import logging

rootLogger = logging.getLogger() # logs at WARNING level by default

# logging level not set explicitly in app_logger. Falls back to root level
app_logger = logging.getLogger("myapp")

# module_logger has the logging level set explicitly to INFO
module_logger = logging.getLogger("myapp.mymodule")
module_logger.setLevel(logging.INFO)

# logging level not set explicitly in class_logger
# falls back to parent's level = INFO
class_logger = logging.getLogger("myapp.mymodule.myclass")

print(rootLogger.getEffectiveLevel()) # 30 (WARNING)
print(app_logger.getEffectiveLevel()) # 30 (WARNING)

print(module_logger.getEffectiveLevel()) # 20 (INFO)
print(class_logger.getEffectiveLevel()) # 20 (INFO)

Conclusion and going forward

That’s it for this installment, folks. I hope you enjoyed this logical, intuition-first introduction to logging, rather than a facts-first approach. We have covered a lot of ground in this part: from humble hello-world beginnings to a comprehensive understanding of the logging lifecycle, sniffing out the landmines along the way, rounded off with related best practices. The best parts are yet to come, and they will come soon. The next part will cover usage patterns, Python logging configuration, and logging effectively for exception handling. We will conclude the series with in-depth coverage of best practices for Python exception handling combined with effective use of logging in a real-world application stack. Please keep an eye out for them.


Passionately curious 💡 Co-founder CoachBuddy.AI ➕ Masters in AI & ML 💜 Previously Director of Engineering @ Manipal.Net 🐾 linkedin.com/in/srikanthshe