Machine Learning for Lead Management

Published in Towards Data Science, Jan 20, 2020

Leads are the lifeblood of many businesses. Leads represent a starting point for reaching out to potential customers. In simple terms, a “lead” represents a record about a potential customer that typically includes some contact information like an email address and phone number, and possibly additional attributes about the customer (e.g., product preferences and demographic data). A significant amount of time, money and effort is spent by marketing and sales departments on lead management, a concept that we will take to encompass the three key phases of lead generation, qualification and monetization.

In this article, we will look at how machine learning can create tangible value for businesses by providing the basis for an intelligent, dynamic and highly scalable approach to lead management. We will use a case study to make the discussion less abstract and easier to follow.

Three Key Phases of Lead Management

Before diving into the case study, it is worth having at least a high-level understanding of the three key phases of lead management that we will consider:

Lead generation is concerned with producing an initial long-list of leads, often by deploying marketing campaigns that make potential customers aware of the products the business is selling and let them indicate their interest. In an online setting, for example, websites may ask for your email address so that they can sign you up to their mailing lists and newsletters, and send you special product offers and discounts from time to time. Websites can also harvest behavioral tracking data (e.g., which content you visited, how often and for how long) and carry out periodic surveys to develop richer lead profiles.
Lead qualification is about assessing and prioritizing the leads along various criteria. Possible criteria may include ease of acquisition or conversion (the likelihood of a customer saying “yes” to the product), ease of delivery (whether the business has the necessary resources and capabilities to deliver the product to the customer as required), profit potential (the size of the profit margin that the business can achieve as a result of a highly differentiated offering and the customer’s high willingness-to-pay), and strategic fit (whether selling to this particular customer segment is in line with the company’s strategy).
Lead monetization looks at possible ways of extracting revenue from qualified leads. Opportunities for monetization will typically be determined by the scope of the business (especially its position in the supply chain and level of vertical integration) and the potential scalability. A business with high vertical integration can monetize leads by selling directly to potential customers; this can be especially lucrative if the profit margins on the products are high. A less vertically integrated business that is upstream in the supply chain can monetize qualified leads by selling them to downstream businesses that do have the means to deliver the product. The potential scalability will be constrained by supply-side factors (e.g., production capacity and size of the sales force) and demand-side factors (e.g., market size, growth rates, availability of complementary and substitute products).

Case Study

Now we will look at a case study to see how machine learning can be applied to lead management in practice. Our case revolves around a fictitious recruitment agency, Valiant Recruiters Ltd., which helps client companies find the perfect candidates for technical vacancies.

Valiant essentially operates as an online business; it maintains a comprehensive online portal of current vacancies and attracts potential candidates by running ad campaigns across several online channels, including professional networking sites and other job boards. Potential candidates can register on Valiant’s portal by uploading a CV or directly contacting a Valiant recruiter; in each case, a corresponding lead profile is created in Valiant’s candidate database. Valiant specializes in data science positions and typically earns handsome commissions for converting leads by successfully placing candidates at client companies. If a candidate cannot be converted for whatever reason (e.g., no suitable vacancies, candidate’s preference is outside Valiant’s areas of expertise, recruiters working at full capacity), Valiant retains the option of selling the lead profile to selected partnering recruitment agencies (generalists and niche players) that may be better-placed to convert the lead.

Lead management thus sits at the heart of Valiant’s business model. As the head of a firm that specializes in data science recruitment, Valiant’s CEO is keenly aware of the potential of using machine learning to improve and scale the firm’s approach to lead management and ultimately boost the bottom line. Over the past six months, the CEO has led an initiative to identify and implement high-impact use cases for machine learning across all three phases of lead management (generation, qualification and monetization).

Generating Better Leads

Valiant has launched its lead management initiative at a time when the hype around data science is at an all-time high. The job of a data scientist is still a fairly new phenomenon, but industry experts have already hailed it as the “sexiest job of the 21st century” (e.g., see the Harvard Business Review article of that title). Companies are scrambling to hire data scientists, and the demand for Valiant’s recruitment services is red hot. Meanwhile, because many data science roles are loosely defined yet highly lucrative, just generating leads is not a problem for Valiant; generating good leads is. Any marketing campaign that Valiant runs seems to yield an avalanche of applications from candidates with a wide range of skills and experience. However, statistics compiled by Valiant show that conversion rates tend to be quite low across campaigns. It would appear that significant marketing budgets and recruiting resources are being wasted on bad leads.

Against this backdrop, Valiant’s CEO has identified the optimization of marketing campaigns as a high-impact use case for machine learning. In particular, a resident data scientist at the firm has been tasked with building an intelligent optimizer to improve the performance of marketing campaigns. Fig. 1 shows a simplified, conceptual view of the optimizer’s role in lead generation.

Fig. 1: Logic for Optimizing Marketing Campaigns (Source: Own Illustration)

To understand the logic behind Fig. 1, suppose that Valiant were to run a marketing campaign that generates a number of leads. These leads would be assessed in real-time by the campaign optimizer in terms of whether their expected return on investment (ROI) is above a certain threshold; in practice, performance indicators other than the ROI could also be used. Now, if the expected ROI of a newly generated lead is above the required threshold, the optimizer would give positive feedback to the marketing campaign management tool, and negative feedback otherwise. Positive feedback tells the tool to reinforce the current campaign strategy, while negative feedback is about correcting the campaign strategy so that leads sourced in the future are able to clear the expected ROI threshold as needed.

In simple terms, the ROI in this context may be the commission earned for successfully placing a candidate divided by the cost of doing so (including the time spent by a recruitment consultant, legal fees of drawing up contracts and other administrative overhead). However, we cannot compute the actual ROI of a lead at the time the lead is generated, since we do not know whether the lead will actually end up converting. Thus, the expected ROI discounts the ROI by the probability of a lead actually converting — and this is where machine learning can make a big impact. By using historical data of closed leads and their outcomes, Valiant’s data scientist can build a model to predict the conversion probabilities of new leads. The threshold value that the expected ROI needs to clear may be tied to the strategic objectives and key results (OKRs) that Valiant’s CEO sets for the firm on a regular basis.
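The expected-ROI check can be sketched in a few lines of Python. This is a minimal illustration, assuming the conversion probability is supplied by a separately trained model and that the commission and placement cost of each lead are known; the function names and figures are hypothetical.

```python
def expected_roi(p_convert: float, commission: float, cost: float) -> float:
    """ROI of a lead (commission / cost) discounted by its predicted
    probability of converting."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    if not 0.0 <= p_convert <= 1.0:
        raise ValueError("p_convert must lie in [0, 1]")
    return p_convert * commission / cost


def campaign_feedback(p_convert: float, commission: float, cost: float,
                      threshold: float) -> str:
    """'positive' (reinforce the campaign) if the expected ROI is above the
    threshold, 'negative' (correct the campaign) otherwise."""
    if expected_roi(p_convert, commission, cost) > threshold:
        return "positive"
    return "negative"


# Hypothetical lead: 25% conversion probability, 20,000 commission,
# 4,000 cost of placement (recruiter time, legal fees, admin overhead).
print(expected_roi(0.25, 20_000, 4_000))            # 1.25
print(campaign_feedback(0.25, 20_000, 4_000, 2.0))  # negative
print(campaign_feedback(0.25, 20_000, 4_000, 1.0))  # positive
```

In a real deployment, the threshold would come from the OKR-linked target rather than being hard-coded per call.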

Furthermore, machine learning is also key to operationalizing the feedback loop by determining how — and to what extent — the marketing campaign should be corrected or reinforced; this requires the learning logic to be embedded within the campaign management tool so that it can meaningfully react to the optimizer’s feedback and improve the marketing campaign over time (e.g., by better allocating ad spend across channels, and optimizing the ad messaging so that it is more persuasive to the target audience).
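The article does not prescribe a learning algorithm for the feedback loop. One common option for splitting ad spend across channels under uncertainty is Thompson sampling, sketched below on the assumption that each optimizer verdict is a binary reward (1 for positive feedback) attributable to the channel that produced the lead. The class, channel names and simulated feedback rates are all hypothetical.

```python
import random


class ChannelAllocator:
    """Beta-Bernoulli Thompson sampling over ad channels (hypothetical class).

    Each channel keeps a Beta(alpha, beta) posterior over the rate at which
    its leads receive positive feedback from the campaign optimizer.
    """

    def __init__(self, channels, seed=None):
        self.rng = random.Random(seed)
        # Beta(1, 1) is a uniform prior: no channel is favored at the start.
        self.posterior = {c: [1.0, 1.0] for c in channels}

    def choose(self):
        """Fund the channel whose sampled feedback rate is highest."""
        samples = {c: self.rng.betavariate(a, b)
                   for c, (a, b) in self.posterior.items()}
        return max(samples, key=samples.get)

    def update(self, channel, positive):
        """Reinforce on positive feedback, correct on negative feedback."""
        if positive:
            self.posterior[channel][0] += 1
        else:
            self.posterior[channel][1] += 1


# Simulation with made-up "true" positive-feedback rates per channel.
true_rates = {"networking_site": 0.30, "job_board_a": 0.10, "job_board_b": 0.05}
alloc = ChannelAllocator(true_rates, seed=42)
env = random.Random(7)
picks = {c: 0 for c in true_rates}
for _ in range(2000):
    channel = alloc.choose()
    picks[channel] += 1
    alloc.update(channel, env.random() < true_rates[channel])
print(picks)  # expect most picks on "networking_site"
```

Channels that keep producing leads with positive feedback are sampled, and hence funded, more often, while the posterior still leaves weaker channels some room for exploration. Ad messaging, budget caps and delayed feedback are ignored here.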

Qualifying Leads Accurately

Once the freshly generated leads land in Valiant’s candidate database, they need to be qualified as soon as possible so that the recruitment consultants know how to proceed. Following are some fundamental types of questions that the lead qualification process at Valiant could consider for each lead:

Ease of conversion: How likely is this lead to convert, i.e., will we be able to place this candidate at one of our clients, or otherwise sell the lead to a partnering recruitment agency? How quickly will this lead convert? Is the candidate looking for a new role immediately, or in six months’ time?
Ease of delivery: How difficult (or effort-intensive) will it be for Valiant to convert this lead? Does the firm have the right consultants to guide the candidate? Will the candidate be too selective?
Profit potential: How profitable will this lead be for Valiant? Upon placing a lead, the firm typically receives a commission that amounts to about 20% of the candidate’s starting salary at the client company. So, ceteris paribus, the higher the starting salary that the candidate can command in the market the better. Alternatively, how profitable would it be to sell the lead to a partnering agency?
Strategic fit: Is this lead profile suitable for a data science role? Or is it more suitable for another role (e.g., software engineering, business analysis, product management)?
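One way to hold the answers to these four questions is a small per-lead record. In the sketch below, the field names and the 0-to-1 scales are assumptions; only the 20% commission rate comes from the text above.

```python
from dataclasses import dataclass

COMMISSION_RATE = 0.20  # about 20% of the candidate's starting salary


@dataclass
class LeadScores:
    """Hypothetical record of a lead's qualification results."""
    lead_id: str
    p_convert: float         # ease of conversion: predicted probability
    ease_of_delivery: float  # 0 (hard) to 1 (easy), assumed scale
    expected_salary: float   # predicted starting salary (profit potential)
    strategic_fit: float     # 0 (poor fit) to 1 (clear data science role)

    def expected_commission(self) -> float:
        """Commission if placed, discounted by the conversion probability."""
        return self.p_convert * COMMISSION_RATE * self.expected_salary


lead = LeadScores("L-001", p_convert=0.5, ease_of_delivery=0.8,
                  expected_salary=100_000, strategic_fit=0.9)
print(lead.expected_commission())  # 10000.0
```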

While qualifying each lead manually has its advantages (it may be cheaper and more effective when the volume of leads is low), it also has some major limitations for a rapidly growing, high-volume business like Valiant’s: it is impossible to scrutinize each lead profile properly by hand, and dependence on human involvement makes it difficult to scale quickly and efficiently. Also, the above questions around ease of conversion, delivery, etc., often cannot be answered with a high degree of certainty at the outset. Given the uncertainty inherent in the lead qualification problem, an automated solution should ideally account for this uncertainty as well. Valiant’s CEO has a strong hunch that a solution based on machine learning might just fit the bill.

The solution approach that the CEO has in mind essentially amounts to predictive lead qualification. Suitable target variables for a machine learning model may be the probability of conversion and time to conversion (as proxies for ease of conversion), the number and length of interactions between the candidate and the recruitment consultant as well as the complexity of the issues discussed (proxies for ease of delivery), the profit achieved by placing candidates in the past (proxy for profit potential), and the nature of the feedback received from the client-side hiring manager on the suitability of past candidates to the advertised roles (proxy for strategic fit). Possible predictors for these outcomes may be derived from the lead profile data and any other behavioral tracking data that Valiant has access to. Clearly, some of the outcome and predictor variables may be more readily available than others, so data collection and preparation will be key to the successful implementation of the machine learning model.
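As a baseline for predicting one of these targets, conversion, the snippet below fits a logistic regression by batch gradient descent. Logistic regression is a swapped-in choice rather than something the article specifies, and the two features (years of experience and a relevant-degree flag) and the eight training rows are invented for illustration; in practice a library such as scikit-learn would do the fitting.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    """Predicted conversion probability; w[0] is the bias term."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic-regression weights by batch gradient descent on log-loss."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi  # d(log-loss)/d(logit)
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


# Hypothetical closed leads: [years of experience, relevant degree (0/1)];
# label 1 means the candidate was eventually placed.
X = [[1, 0], [2, 0], [3, 1], [5, 1], [6, 1], [1, 1], [4, 0], [7, 1]]
y = [0, 0, 1, 1, 1, 0, 0, 1]

w = fit_logistic(X, y)
print(f"{predict_proba(w, [6, 1]):.2f}")  # experienced, relevant degree
print(f"{predict_proba(w, [1, 0]):.2f}")  # junior, no relevant degree
```

The same pattern applies to the other proxies (time to conversion, interaction counts, past profit), though those targets would call for regression or survival models rather than a binary classifier.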

Now, suppose that Valiant’s data scientist builds a model to predict the conversion outcomes of leads (i.e., whether the firm will be able to successfully place the candidate or not). Fig. 2 gives a simplified visual representation of the predictive performance that such a model might achieve. From the snapshot in the diagram, we see that the model has classified a total of 25 leads (shown as dots). Leads to the right of the dashed line have been classified as potential hits, while leads to the left of it have been classified as potential misses; essentially, the model has predicted the probability of each lead converting, and leads with conversion probabilities greater than 50% have been classified as hits. Moreover, all 25 leads have by now been closed, and those that have actually converted (the true hits) are colored green, while those that have not (the true misses) are colored gray.

Fig. 2: Predictive Performance of a Lead Qualification Model (Source: Own Illustration)

To get a better feel for the predictive performance of the model shown in Fig. 2, Valiant’s data scientist can compute a few commonly used metrics:

Accuracy: This reflects the proportion of leads that were correctly classified as hits or misses, i.e., (6 + 14) / 25 = 20 / 25 = 80%.
Precision: This shows how many of the predicted hits later turned out to be actual hits, i.e., 6 / (6 + 2) = 6 / 8 = 75%.
Recall: This measures how many of the actual hits the model correctly predicted as likely hits, i.e., 6 / (6 + 3) = 6 / 9 ≈ 67%.
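The three figures can be reproduced from the confusion-matrix counts implied by Fig. 2: 6 true hits, 2 false hits, 3 missed hits and 14 true misses. The helper name below is our own.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Counts read off Fig. 2.
m = classification_metrics(tp=6, fp=2, fn=3, tn=14)
print({k: round(v, 2) for k, v in m.items()})
# {'accuracy': 0.8, 'precision': 0.75, 'recall': 0.67}
```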

Thus, we see that the model achieved high accuracy overall, albeit while being a little conservative in its predictions (higher precision than recall). There is often a trade-off between a model’s precision and recall that needs to be considered: a stricter model (one that requires a higher predicted probability before calling a lead a hit) tends to achieve higher precision but lower recall, while a more relaxed model does the opposite. In the context of lead qualification at Valiant, both precision and recall can be important. Once the lead qualification model has been rolled out, the recruitment consultants are told to first work on the leads that are predicted to be hits (perhaps even in descending order of the underlying conversion probabilities) before moving on to the rest. As such, high model precision ensures that the consultants make the best use of their time by prioritizing leads that will, more often than not, actually end up converting. However, high recall is also important, since ignoring good leads that the model has wrongly classified as misses can result in missed opportunities; this can be particularly detrimental to Valiant’s financial performance during parts of the year when business is slow and there are few qualified leads for the recruitment consultants to work on.
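The trade-off can be seen by sweeping the classification threshold over a set of scored leads. The ten probabilities and outcomes below are made up purely to illustrate the effect, and the function name is hypothetical.

```python
def precision_recall_at(threshold, probs, outcomes):
    """Precision and recall when leads with p > threshold are called hits."""
    tp = sum(1 for p, o in zip(probs, outcomes) if p > threshold and o)
    fp = sum(1 for p, o in zip(probs, outcomes) if p > threshold and not o)
    fn = sum(1 for p, o in zip(probs, outcomes) if p <= threshold and o)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical predicted conversion probabilities and outcomes (1 = placed).
probs = [0.9, 0.8, 0.7, 0.6, 0.55, 0.45, 0.4, 0.3, 0.2, 0.1]
outcomes = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]

for t in (0.3, 0.5, 0.75):
    prec, rec = precision_recall_at(t, probs, outcomes)
    print(f"threshold {t:.2f}: precision {prec:.2f}, recall {rec:.2f}")
# threshold 0.30: precision 0.57, recall 0.80
# threshold 0.50: precision 0.60, recall 0.60
# threshold 0.75: precision 1.00, recall 0.40
```

Raising the threshold from 0.30 to 0.75 pushes precision up and recall down, matching the stricter-versus-more-relaxed behavior described above.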

Finally, it is worth noting that the attractiveness of lead profiles along any of the above qualification criteria (probability of conversion, ease of delivery, etc.) can change over time. With every phone call and meeting, the recruitment consultant hopefully gains a better understanding of the candidate’s preferences, skills and job prospects. Requirements on the client side may also change in a dynamic business environment. Even Valiant’s own strategic focus may evolve from placing just data scientists and data engineers to also placing business intelligence engineers and other adjacent roles. The lead qualification process may score a given lead differently today than it would a month from now. A key implication of this is that all active leads should be rescored every so often to reflect changes in the lead profiles and the context in which they are assessed. The rescoring may simply happen on a pre-determined, regular basis (e.g., weekly), or be triggered by the occurrence of a particular event (e.g., a phone call with the candidate, acquisition of a new client, changes to Valiant’s OKRs).
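A rescoring policy that combines both kinds of trigger might look like the sketch below. The weekly interval mirrors the example in the text; the event names and the function are hypothetical.

```python
from datetime import datetime, timedelta

RESCORE_INTERVAL = timedelta(weeks=1)  # regular cadence, e.g. weekly
# Hypothetical names for events that should trigger an immediate rescore.
TRIGGER_EVENTS = {"candidate_call", "new_client", "okr_change"}


def needs_rescore(last_scored: datetime, now: datetime, events_since: set) -> bool:
    """True if the regular interval has elapsed or a triggering event has
    occurred since the lead was last scored."""
    overdue = now - last_scored >= RESCORE_INTERVAL
    triggered = bool(TRIGGER_EVENTS & events_since)
    return overdue or triggered


last = datetime(2020, 1, 1)
print(needs_rescore(last, datetime(2020, 1, 3), set()))               # False
print(needs_rescore(last, datetime(2020, 1, 3), {"candidate_call"}))  # True
print(needs_rescore(last, datetime(2020, 1, 9), set()))               # True
```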

Monetizing Leads Effectively

Having qualified the leads, Valiant can monetize leads by either converting them (successfully placing candidates) or selling the leads to partner agencies. Machine learning can drive monetization in at least five key ways: planning actions, pricing leads, packaging leads, pitching leads to clients and passing leads on to partner agencies — we can call these the “5 Ps of lead monetization”. Let us look at each of these Ps in turn.

Planning actions:

Predictive models built for qualifying leads can be useful for planning how to act on a given lead. For example, if a lead ranks high across all qualification criteria, then it might make sense to give it a high priority; this could mean handling it before other leads, and allocating it to one of Valiant’s more experienced recruitment consultants to increase the chance of closing the lead successfully. By contrast, if the job candidate is not suitable for a data science role and/or is not looking for a new job in the near future, then the lead might rank low on qualification criteria such as ease of conversion, ease of delivery, and strategic fit; in this case, Valiant may be better off selling the lead to a partner agency if possible.
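A planning rule of this kind can be written as a small function. The thresholds (0.7 and 0.3), the 0-to-1 score scale and the action labels are assumptions chosen for illustration; in practice they would be tuned to Valiant’s capacity and objectives.

```python
def plan_action(p_convert: float, ease_of_delivery: float,
                strategic_fit: float, high: float = 0.7, low: float = 0.3) -> str:
    """Map qualification scores (assumed to lie in 0-1) to a next step.

    The thresholds and action labels are hypothetical.
    """
    scores = (p_convert, ease_of_delivery, strategic_fit)
    if all(s >= high for s in scores):
        # Strong on every criterion: handle first, with an experienced consultant.
        return "high priority: assign to a senior consultant"
    if all(s < low for s in scores):
        # Weak on every criterion: likely better monetized via a partner.
        return "sell to a partner agency"
    return "standard queue"


print(plan_action(0.9, 0.8, 0.85))  # high priority: assign to a senior consultant
print(plan_action(0.1, 0.2, 0.1))   # sell to a partner agency
print(plan_action(0.6, 0.4, 0.9))   # standard queue
```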

Pricing leads:

Being able to assign a monetary value to a lead becomes especially important if Valiant wishes to sell the lead. In theory, the maximum sales price that Valiant can try to extract from a given lead is whatever the highest bidder is willing to pay. If the sales price is higher than the cost of generating the lead, then Valiant stands to make a profit. At the outset, however, Valiant may need to quote a suitable price without necessarily knowing the willingness-to-pay of potential lead buyers — and this is where the predictive models from the lead qualification phase can come in handy again. In particular, the lead price can be weighted by the predicted probabilities (or scores) across the qualification criteria to yield a certain level of profit on average.

To see how this might work, suppose for the sake of simplicity that we only care about a lead’s ease of conversion. We refer to a lead as “high-quality” if it is likely to be easy to convert (where the predicted conversion probability is above some threshold value), and “low-quality” otherwise. Realistically, we might expect the lead generation cost and sales price of high-quality leads to be higher than those of low-quality leads. In mathematical terms, let C(High) and C(Low) be the lead generation costs of high-quality and low-quality leads, respectively, and let R(High) and R(Low) be the realistic sales prices or revenues that Valiant can obtain from high-quality and low-quality leads, respectively. If a fraction p of the generated leads is predicted to be high-quality, then the expected profit per generated lead would be p(R(High) - C(High)) + (1 - p)(R(Low) - C(Low)), and the total expected profit is this amount multiplied by the number of leads. Crucially, Valiant can toggle the levers in the formula (p, R, and C) to determine the conditions necessary to achieve the desired level of profitability, and price the leads accordingly.
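The expected-profit formula above translates directly into code, and solving it for R(High) shows what price Valiant would need to charge for high-quality leads to hit a target profit per lead. The function names and numbers are hypothetical; multiply the per-lead result by the number of generated leads for a total.

```python
def expected_profit(p, r_high, c_high, r_low, c_low):
    """Expected profit per generated lead when a fraction p is high-quality:
    p(R(High) - C(High)) + (1 - p)(R(Low) - C(Low))."""
    return p * (r_high - c_high) + (1 - p) * (r_low - c_low)


def required_high_price(target, p, c_high, r_low, c_low):
    """Solve the expected-profit formula for the R(High) that yields `target`."""
    if not 0 < p <= 1:
        raise ValueError("p must lie in (0, 1]")
    return c_high + (target - (1 - p) * (r_low - c_low)) / p


# Hypothetical figures: a quarter of the generated leads are high-quality.
print(expected_profit(0.25, r_high=500, c_high=200, r_low=80, c_low=50))  # 97.5
print(required_high_price(120, 0.25, c_high=200, r_low=80, c_low=50))     # 590.0
```

Plugging the solved price back in, expected_profit(0.25, 590, 200, 80, 50) returns the 120 target, which is a quick sanity check on the algebra.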

Packaging leads:

In addition to selling leads individually, Valiant can also sell multiple leads in packages to willing partner agencies. In such cases, the willingness-to-pay of the buying agency may be a function of the quality of the lead package as a whole. Presumably, a package that consists of mostly high-quality leads would command a high price, and lead buyers may have even more specific requirements in terms of the different qualification criteria. Having machine learning models to predict the quality of leads across multiple qualification criteria would make Valiant well-positioned to package the leads to meet the bespoke requirements of a given buyer. For instance, Valiant can put together a package of leads that has a certain expected ease of conversion and delivery, profit potential, and so on.
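A simple way to assemble such a package is sketched below: rank leads by predicted conversion probability, take the top ones, and check the buyer’s average requirements. This greedy heuristic and the field names are assumptions; with many criteria and larger pools, an optimization formulation (e.g., integer programming) would be more appropriate.

```python
from statistics import mean


def build_package(leads, size, min_avg_convert, min_avg_profit):
    """Take the `size` leads with the highest predicted conversion probability
    and return them if the package meets the buyer's average requirements,
    otherwise None (greedy heuristic, not an optimal search)."""
    ranked = sorted(leads, key=lambda lead: lead["p_convert"], reverse=True)
    package = ranked[:size]
    if len(package) < size:
        return None
    meets_convert = mean(lead["p_convert"] for lead in package) >= min_avg_convert
    meets_profit = mean(lead["profit"] for lead in package) >= min_avg_profit
    return package if meets_convert and meets_profit else None


# Hypothetical scored leads (profit = predicted profit potential).
leads = [
    {"id": "A", "p_convert": 0.9, "profit": 15_000},
    {"id": "B", "p_convert": 0.7, "profit": 12_000},
    {"id": "C", "p_convert": 0.4, "profit": 20_000},
    {"id": "D", "p_convert": 0.2, "profit": 8_000},
]
pkg = build_package(leads, size=2, min_avg_convert=0.75, min_avg_profit=10_000)
print([lead["id"] for lead in pkg])  # ['A', 'B']
print(build_package(leads, size=3, min_avg_convert=0.75, min_avg_profit=10_000))  # None
```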

It is worth noting that, while packaging leads in this manner may seem novel in the context of talent recruitment, the practice is actually quite well-developed in other sectors like finance and insurance, where lead packages are structured using complex mathematical models to fit certain “risk profiles” (e.g., “AAA”, “BB”) — Valiant can potentially borrow some concepts from these sectors to structure its own lead packages.

Pitching leads to clients and passing on leads to partner agencies:

Machine learning can be used to help Valiant pitch leads to clients and pass them on to partner agencies in similar ways. In both cases, Valiant is faced with business-to-business (B2B) deal-making scenarios. A client will typically contract out the staffing of a particular data science role to Valiant with certain conditions (e.g., skills and experience level of the hire, salary range, starting date). Valiant can earn a commission if it pitches a candidate that is ultimately hired, regardless of the amount of effort involved on Valiant’s part in sourcing, vetting and guiding the candidate through the recruitment process. Being able to predict the quality of a talent pool in a particular region and timeframe can thus clearly help Valiant decide whether it is worth entering into a contract with a client or not. Meanwhile, selling leads to partner agencies means reallocating some of Valiant’s finite resources (recruiters, budget, time) from client work to lead sales — this might only make strategic sense if the expected value of selling leads in a particular context is higher than engaging in client work. Again, knowing what to expect from the talent pool can help Valiant decide how to allocate its resources in an optimal manner.

Fig. 3 shows how a predictive model can guide strategic B2B deal-making. The horizontal axis describes the quality of generated leads (e.g., in a particular city and time of year) in terms of the fraction (p) of leads with a high predicted quality. The vertical axis shows the expected profit for a given deal at different values of p. Fig. 3 shows a comparison of two example deals that Valiant may have to choose between. Notice that, although Deal 1 is less attractive than Deal 2 for low values of p, Deal 1 becomes more attractive once the fraction of high-quality leads exceeds the value p*. All else being equal, making a rational, strategic decision would seem possible for Valiant if it is able to derive the values of p* and p, which is precisely what a predictive lead qualification model can be used for.

Fig. 3: Strategic B2B Deal-Making (Source: Own Illustration)
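If we assume, purely for illustration, that each deal’s expected profit is linear in p (the article does not say what shape the curves in Fig. 3 have), the crossover point p* has a closed form. The deal parameters and function names below are hypothetical.

```python
def deal_profit(p: float, base: float, slope: float) -> float:
    """Expected profit of a deal, assumed linear in p."""
    return base + slope * p


def crossover(base1, slope1, base2, slope2) -> float:
    """p* at which the two deals have equal expected profit."""
    if slope1 == slope2:
        raise ValueError("parallel profit lines have no crossover")
    return (base2 - base1) / (slope1 - slope2)


def better_deal(p, deal1, deal2) -> str:
    """Pick the deal with the higher expected profit at the predicted p."""
    return "Deal 1" if deal_profit(p, *deal1) > deal_profit(p, *deal2) else "Deal 2"


deal1 = (10.0, 100.0)  # hypothetical: weak at low p, strong at high p
deal2 = (40.0, 20.0)   # hypothetical: steadier, less sensitive to p
print(crossover(*deal1, *deal2))       # 0.375
print(better_deal(0.2, deal1, deal2))  # Deal 2
print(better_deal(0.5, deal1, deal2))  # Deal 1
```

Here the predictive lead qualification model would supply the estimate of p for the relevant city and time of year, and comparing it against p* drives the choice.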

The Wrap

Machine learning can help businesses generate better leads, qualify them with greater accuracy, and ultimately monetize them more effectively. Given the differing objectives of each phase, the problems that machine learning helps solve, as well as the performance and limitations of the solutions, differ across the three phases.

It is worth noting that, although Valiant is a fictitious company conjured up to help explain the value of machine learning by example, the problems that Valiant faces around lead management, and the opportunities to solve these problems using machine learning, are firmly rooted in reality. In fact, the case study of Valiant represents a somewhat stylized amalgamation of several similar data science projects that I — and likely others — have been involved with in the past. Besides recruiting, diverse industries ranging from finance and insurance to tourism and automotive retailing can all benefit greatly from more intelligent and scalable approaches to lead management.

Finally, while it is increasingly possible to buy software-as-a-service (SaaS) solutions for lead management, it is arguably still important for companies to drive their own lead management agenda. The agenda should, among other things, clarify the motivation for lead management (the “why”), set out success criteria for the SaaS system (e.g., minimum levels of accuracy, precision, etc.), and link them back to strategic KPIs (e.g., conversion rates, profit, etc.) that are going to be top-of-mind for the executives and investors of the company.