Improve Code Monitoring with Effective Logs

Have you ever had a bug in prod, and you opened your logging system only to figure out that the logs are not indicative enough to easily trace what happened and why? Well, I can promise you it never happened to me 😉
But if it had, I’d be sure to learn the lesson and make my logs much more informative and indicative for the future.
Today I will share with you the conclusions I have gathered regarding how to make our logs as useful as possible for future Debugging. And may your features never have bugs in prod!
Use Indicative Entity IDs
This may sound a bit trivial but is sometimes missed. It can be tedious to open your Logs system and have to cross-check different hints you can catch, only so you can find the entity you are looking for, so you can start debugging the real problem.
In a microservices architecture, getting the relevant entity ID for your log is not always possible, but if it is – I warmly recommend using it.
Log the Beginning and the End of Main-Flows and Sub-Flows
Let’s say we have a few main flows, and each main flow is composed of a few sub-flows or actions.
When an error occurs, we want to know where it happened, and what happened before/after the error. Adding a log at the beginning and the end of each main & sub-flow will make the logs more readable and clear for us. Not only will we have a broad context on the sequence of events, but we will also be able to know if a few events were expected to happen and eventually did not take place – maybe due to an exception that was thrown.
This doesn’t mean that we want to log the beginning and end of everything that happens in the code (for example each function). We will discuss the need not to over-log later on in this article.
Log Relevant Changes of Main Variables
Has the status of an account/application changed? Maybe from pending to approved, or closed? Or maybe the value of a decision-making variable has changed and affects future flows in the code?
If, for example, an entity starts a flow in one state and ends this flow in a different one, it is useful to document this change, and potentially add to your log the reason for it. It is also possible that during the flow, the state will change a few times due to different reasons, and the logs will tell us the full story – what happened, when, and why.
Log the Reasons for Errors
Logs should be easy to skim – we want to quickly identify what was the issue and where it occurred, and to do so relatively quickly. When we classify each message according to its severity level, we can filter out the levels that are irrelevant for us at the moment. This enables us to log very specific messages on the one hand and avoid reading too much data during a debugging process, on the other.
To get yourself familiar with the different levels, and see usage examples, I invite you to read my appendix article – Levels of Logs.
Keep a Consistent Format
When scanning the logs to find the relevant piece of information you need – it is easier to do so when you know where your expected answer lies. Also, when all logs of a specific flow or sub-flow are consistent, you can quickly distinguish between the logs of your flow and those that may also appear in your search results but were produced in other, unrelated, flows. This will enable you to quickly filter them away from your sight.
Let’s see these tips in practice, for the following codebase:
loans/
__init__.py
main_flows/
__init__.py
car_loan.py
student_loan.py
sub_flows/
__init__.py
auto_approval_eligibility_check.py
car_loan_criteria_check.py
technical_requirements_check.py
student_loan_criteria_check.py
Each file in _subflows has a main class with several functions. A _mainflow is composed of several sub-flows. For example:
- student_loan uses _minimal_eligibilitycheck & _student_loan_criteriacheck
- car_loan uses _minimal_eligibilitycheck & _auto_approval_eligibilitycheck & _car_loan_criteriacheck
Use Indicative Entity ID + Log the Beginning And the End of Main-Flows and Sub-flows
logger.info(f'Applicant - {applicant_id} started StudentLoanApplication')
logger.info(f'Applicant - {applicant_id} started TechnicalRequirementsCheck')
logger.info(f'Applicant - {applicant_id} finished TechnicalRequirementsCheck')
logger.info(f'Applicant - {applicant_id} started StudentLoanCriteriaCheck')
logger.info(f'Applicant - {applicant_id} finished StudentLoanCriteriaCheck')
logger.info(f'Applicant - {applicant_id} finished StudentLoanApplication')
Log Relevant Changes of Main Variables
logger.info(f'Applicant - {applicant_id} started AutoApprovalEligibilityCheck')
logger.info(f'Applicant - {applicant_id} is qualified for student loan, application status changed to - {ApplicationStatus.APPROVED}')
logger.info(f'Applicant: {applicant_id} finished AutoApprovalEligibilityCheck')
Log Reasons for Expected Errors + Use The Relevant Log Level
logger.info(f'Applicant - {applicant_id} started MinimalEligibilityCheck')
logger.debug(f'Applicant - {applicant_id} started passed_technical_requirements check')
logger.error(f'Applicant - {applicant_id} application status is set to {ApplicationStatus.ERROR}. '
'Student ID not found in the University's database')
logger.info(f'Applicant - {applicant_id} finished MinimalEligibilityCheck')
What NOT to Do
Don’t Write Unindicative Logs
An indicative log will provide us info that we can start working with – so in addition to mentioning what happened, we will also know either the entity ID, the part of the code in which the change occurred, the reason for the change, or something else that we will be able to use.
Here’s an example of what we should NOT do –
logger.info('Application rejected')
A much better log would look like this –
logger.info(f'Applicant - {applicant_id} financial balance in the past {MONTHS_TO_CHECK} months does not match the criteria for a student loan.
Application status updated to {APPLICATION.STATUS.REJECTED}')
Don’t Pollute The Logs
There is a fine line between logs that are indicative enough, to ones that are too verbose, and we should be careful not to cross it.
Examples for logs that are too descriptive would include:
- providing the whole data structure (dictionary, list, etc.) with the data that was used to make a specific decision in the code
- writing a log message that is too descriptive – the same message could have been written in a much shorter sentence, and sometimes too verbose = more cumbersome and less clear
Eventually, the logs should be clear, concise, and easy to skim. Sometimes, we "settle" for log messages with imperfect grammar, in order to keep the message short and quicker to read. Logs that are too verbose for skimming are not comfortable enough to work with.
Great logs will enable us to quickly pick up the issue and know what to debug and how.
Here’s an example of what we should NOT do –
logger.info(f'Applicant - {applicant_id} not qualified for student loan - average balance in the past {MONTHS_TO_CHECK} is too low. Daily bank balance as provided in bank statements is {daily_bank_balance}')
The example above is also problematic because it exposes sensitive information about the customer. But it is mainly provided here as an example of exposing a full list / dictionary that was used to calculate a single value.
A much better log would look like this –
logger.info(f'Applicant - {applicant_id} not qualified for student loan - average balance in the past {MONTHS_TO_CHECK} is too low. Average bank balance is {average_bank_balance}, minimal acceptable average balance is {min_accepatble_average_balance}')
Don’t Over-Log
If we have to get it wrong in one of the directions, I’d prefer too many logs upon too few logs. After all, trying to debug an error and lacking essential data is a pain. Yet, when adding a new log – we want to keep in mind that the log should be meaningful and useful. If the log is not providing added value, then it is mostly noise and we better do without it.
Logs should be comfortable and useful to read
