Redefining Audience Targeting in the Digital Age with Predictive Analytics

How PCA (Principal Components Analysis) can be used to increase ROI (Return on Investment) by 200%+

Yaakov Bressler

--

SUMMARY

With the advent of digital media, potential consumers can be targeted and analyzed in every step of their engagement with a product or service. Methods of identifying potential audiences vary from firm to firm but generally involve a heavy research approach analyzing past performance and comparative campaigns.

If you’re only interested in the analytical tools + results, skip down to section 4.

Terms:

  • PCA: Principal Components Analysis. A statistical transformational procedure which realigns data into more meaningful dimensions.
  • KPI: Key Performance Indicator. A measurable variable which indicates performance, usually in comparative terms.
  • ROI: Return on Investment.

The article is broken down as follows:

  1. Audience Targeting in History
    a. in the Television Age
    b. in the Digital Age
  2. Audience Targeting Today
    a. Good Practices
    b. Poor Practices
    c. Bigger is NOT Better
  3. Layout of Experimental Approach
  4. Redefining Analytic Benchmarks
    a. Linear Regression + Multi-Correlation
    b. Principal Component Analysis (PCA)
    c. Formulaic Alignment with Linear Algebra
  5. Results
    a. Implications
    b. Next Steps
  6. Conclusion + Personal Remarks

1. Audience Targeting in History

How audience targeting was done, historically.

Television Age:

During the golden age of television (1950–1980), audiences had limited channels available and thus consumed their media with relative uniformity. Advertisers could reach massive, uniform audiences with undifferentiated ads.

The advent of multichannel television and cable networks in the 1990s changed this. Audiences began to migrate to specific channels and programs. The growth in cable programming during this period was tremendous, with the number of available channel choices doubling every 5 years.

The percentage of homes with cable TV went from 19.9% in 1980 to 56.4% in 1990. By 2000, more than 67.2% of homes had cable TV.¹

Such a large choice of availability meant consumers could choose content specific to their interests. In turn, advertisers had to adapt and target differentiated groups rather than the population as a whole. Target audiences, defined by aggregated consumer information around an interest or preference, entered mainstream advertising.

Digital Age

The arrival of high-speed internet, smartphones, and online streaming further decentralized audiences. By 2018, an estimated 95% of Americans owned a cell phone and 77% owned a smartphone.²

Rapid technological advancement at the turn of the 21st century, especially high-speed mobile internet, stimulated additional media channels and further fragmented the mainstream audiences of old.


With this boom in media sources, consumption behaviors correspondingly grew more dispersed, concentrating around increasingly specific common preferences via self-selection. This new digital environment with its countless platforms provides a unique opportunity for increased precision in targeting.

2. TARGET AUDIENCES TODAY

Given the volume of content in today’s digital age, only the most efficient and well-tailored content will reach its audience. Despite this, some companies choose to operate with an emphasis on volume, resulting in lost profit as well as an annoying customer experience.

In today’s superabundant media environment, every ad should be created and targeted with specificity.

Good Practices:

To elucidate, below is a photo of Mirinda Carfrae, three-time Ironman world champion. Assume she enters a marketplace and a seller has an opportunity to offer her a product based on her features. A good practice would be to notice her most prominent features: her muscle composition, perfect running form, and wind-swept hair tell you that she is a world-class athlete. A bad practice would be to summarize her demographic profile: that she’s a 37-year-old married woman.

Mirinda Carfrae’s perfect running form tells you that she’s a world-class athlete.

In the good-practices scenario, she will likely be offered elite-grade sunscreen, running sneakers, and cycling technology. In the bad-practices scenario, she will likely be offered personal care products and the latest fashion trends. It should be obvious which scenario will outperform the other.

Stop: If you are not convinced that the good practices scenario will outperform the other, take a moment and run the following experiment:

Good Practices: Find a few female friends above the age of 30 who run marathons. Tell them you have a really good ultra-endurance deodorant, for marathon running. Cost is $20. Ask them if they want you to send them the link to it.

Bad Practices: Now, find a few female friends above the age of 30 who do not run marathons. Tell them you have a really good deodorant, it has a refreshing scent. Cost is $20. Ask them if they want you to send them the link to it.

→ Count and compare number of yeses and nos.

Plethoric Distribution — Why some ads are terribly annoying:

Rather than refine their targeting, some companies default to distributing their ads in abundance, a behavior I term plethoric distribution. The principle behind this practice is the calculation ROI = $Z − Y × $X, where $Z revenue is earned after an ad is delivered Y times at $X per ad instance (cost per ad delivery, CPAD). By increasing their Y, these companies reason that their campaigns will yield a profit.

Ads in abundance are annoying yet these annoying ads are in abundance.

Evidently, annoying business attitudes are in abundance too.
Evidently, annoying business attitudes are in abundance too.
Evidently, annoying business attitudes are in abundance too.
Evidently, annoying business attitudes are in abundance too.
Evidently, annoying business attitudes are in abundance too.
Evidently, annoying business attitudes are in abundance too.

There we go. See? It works.
(No, it doesn’t. And it’s annoying)

This same principle stimulates fruit-bearing trees to spread extreme amounts of pollen during the spring, with fertilization as their aim. Humans, however, differ in that their mobility and intelligence enable them to behave in ways that increase the success of fertilization, both in reproduction and in idea propagation.

Pollen being spread via plethoric distribution. For a cooler explanation, check out this YouTube video.

Plethoric distribution is the act of overwhelming an environment with vehicles to reach a target, as exemplified by tree pollen during reproductive seasons.

Show me the math!

The mathematics further demonstrate this point. Below is a visualization of the relationship between ROI and Y in ROI = $Z − Y × $X. A smaller Y, the result of good targeting, dramatically increases ROI.

It’s clear that campaigns with smaller and more engaged audiences outperform their comparatives. ROI is boosted and the internet becomes less annoying.
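The arithmetic can be sketched in a few lines; the dollar figures below are invented purely for illustration, not taken from any real campaign:

```python
def roi(z, y, x):
    """ROI = revenue earned minus total ad spend (Y deliveries at $X each)."""
    return z - y * x

# Plethoric campaign: huge delivery count to brute-force the same revenue.
plethoric = roi(z=5000.0, y=1_000_000, x=0.004)   # $4,000 spend

# Well-targeted campaign: the same revenue from far fewer deliveries.
targeted = roi(z=5000.0, y=100_000, x=0.004)      # $400 spend

print(plethoric, targeted)
```

With revenue held constant, shrinking Y by a factor of ten moves the ROI from $1,000 to $4,600, which is the whole case for targeting over volume.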

Side note: if, like me, you dislike seeing ads while browsing, feel free to install AdBlock in your browser.

Targeting with Machine Learning

In contrast to the plethoric approach, many companies target their ads at heavily summarized groups, based almost exclusively on past performance such as sales. This approach, however, is limited by the success of past choices and leaves little room for discovering new opportunities (epsilon-greedy, ε = 0.01, in orange). Alternatively, a more exploratory approach (epsilon-greedy, ε = 0.1, in green) results in increased long-term performance, at a higher initial cost.
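A minimal epsilon-greedy sketch: with probability ε it explores a random ad variant, otherwise it exploits the variant with the best estimated reward. The click-through estimates below are hypothetical:

```python
import random

def epsilon_greedy(estimates, epsilon):
    """Pick an ad variant index: explore with probability epsilon,
    otherwise exploit the variant with the highest estimated reward."""
    if random.random() < epsilon:
        return random.randrange(len(estimates))                       # explore
    return max(range(len(estimates)), key=estimates.__getitem__)      # exploit

# Hypothetical running click-through estimates for three ad variants.
estimates = [0.021, 0.034, 0.019]
choice = epsilon_greedy(estimates, epsilon=0.1)
print(choice)
```

A small ε behaves like the past-performance-only strategy; a larger ε spends more budget probing alternatives, which costs more up front but can surface better-performing audiences over time.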

Don’t get fooled by quick returns. (Visualization from this article by Mohit Mayank)

Alright, that’s enough background info. On towards the data science content (pun intended).


4. Redefining Analytic Benchmarks

The obvious starting point for this project, outside of experimental layout, was to redefine analytical benchmarks (i.e., stepwise KPIs or significant variables).

Linear Regression + Multi-Correlation:

The set of ads I analyzed ran on Facebook and Instagram. Ad reports consisted of about 20 non-redundant columns. Upon closer inspection, these could be reduced to 15 independent variables, 8 of which were quantitative.

My first step was to generate a series of regression and correlation matrices (shown below) to get a feel for the relationships contained within.
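That step can be sketched with pandas; the data below is synthetic, since the original ad report isn't public, and the column names are assumptions based on the variables discussed later:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in for the ad report (column names are assumptions).
n = 200
reach = rng.uniform(1_000, 50_000, n)
clicks = reach * rng.uniform(0.005, 0.03, n)
spend = clicks * rng.uniform(0.2, 0.8, n)
df = pd.DataFrame({
    "Reach": reach,
    "Clicks": clicks,
    "Spend": spend,
    "CTR": clicks / reach,
    "CPC": spend / clicks,
})

# Pearson correlation matrix: the numbers behind a correlation heat map.
corr = df.corr(method="pearson")
print(corr.round(2))
```

From the same DataFrame, a scatter/KDE matrix like the center panel can be produced with seaborn's pairplot, and the heat map with its heatmap function.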

Left: linear regression matrix | Center: scatter plot, KDE, and KDE-2D matrix | Right: heat map of a Pearson’s correlation matrix.

A quick glance at the linear regression (left, upper quadrant) indicates that many of the plots have neat relationships. The scatter plot (center, upper quadrant) and two-variable KDE (center, lower quadrant) show several messy inverse polynomial relationships. Additionally, the single-variable KDE (center, diagonal) shows a bifurcated distribution, suggesting underlying traits which separate the data into identifiable groups. The Pearson correlation (right) supports these observed relationships.

The relationship between CPC (Cost Per Click) and CTR (Click Through Rate) was particularly interesting. When fitted to a 4th-order polynomial (shown below, left), it’s clear that trials with higher CTRs have similar CPCs. Additionally, the weight of CTR is misleading in that increased CTR delivers more costly results until its upper percentiles. Cross-comparing these two variables on a KDE shows that their distributions are similarly skewed; thus covariance is minimized and the two are indeed correlated. Of note, these variables share similar variance skews with Frequency (ad deliveries per period), so more variables are related than meet the eye.

Left: 4th order polynomial fitting to CTR vs. CPC | Right: KDE fit of CPC, CPR, and Frequency
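The polynomial fitting itself is a one-liner in NumPy; the CTR/CPC values below are synthetic stand-ins, not the campaign's data:

```python
import numpy as np

# Synthetic stand-in for the CTR and CPC columns.
rng = np.random.default_rng(1)
ctr = rng.uniform(0.005, 0.05, 300)
cpc = 0.6 - 8.0 * ctr + 120.0 * ctr**2 + rng.normal(0, 0.02, 300)

# Fit a 4th-order polynomial, as in the left-hand figure.
coeffs = np.polyfit(ctr, cpc, deg=4)
fit = np.poly1d(coeffs)

# Predicted CPC at a given CTR:
print(fit(0.02))
```

Plotting `fit` over a dense grid of CTR values against the raw scatter reproduces the kind of curve shown above.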

When plotted on two-dimensional scatterplots, the data becomes more straightforward:

Reach is inversely related to CPC, especially as Reach grows. The size of the markers in this graph reflects Clicks, a reasonable outcome given that larger groups will exhibit more observable behavior. We also know from the previous analysis that a high CTR comes with a low CPC, especially under the $0.50 mark.

CPR has a less apparent relationship with Leads, perhaps because there were fewer observations. However, the campaigns in the top-left corner have the best outcome, given their high results and low cost.

As the relationships within the data became more observable, and more complex, it became increasingly clear that a more powerful statistical tool would be needed to properly compute best performance.

Forming a Hypothesis: Formulaic Alignment with Algebraic Factor Analysis

Factor analysis is an algebraic approach to simplifying causative variables. Given the observed relationships discussed earlier, I had a strong conviction about which variables were most determinant.

  • Reach + Impressions: lower values are desired. (5 customers per 1000 impressions is not as valuable as 5 per 50.)
  • Leads: greater values are desired. (The campaign was trying to achieve leads.)
  • Cost per Click (CPC): lower values are desired.

Thus, my formula was as follows: Performance = Leads / ( Reach * CPC)
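A minimal sketch of scoring trials with this formula; the numbers are invented for illustration, and the column names are assumptions:

```python
import pandas as pd

# Hypothetical trials: reward Leads, punish broad Reach and costly clicks.
df = pd.DataFrame({
    "Leads": [12, 3, 20],
    "Reach": [4_000, 9_000, 2_500],
    "CPC":   [0.40, 0.75, 0.30],
})

# Performance = Leads / (Reach * CPC)
df["Performance"] = df["Leads"] / (df["Reach"] * df["CPC"])
print(df.sort_values("Performance", ascending=False))
```

Note how the trial with the most leads from the smallest, cheapest audience scores highest, which is exactly the behavior the formula is meant to reward.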

Applying this formula to my dataset across three dimensions yielded the following:

Applying PCA to the dataset would be a good test for this formula, in addition to reliably determining best performance.

Principal Component Analysis (PCA)

Principal Component Analysis (PCA) is a programmatic statistical tool which employs linear algebra to identify how much each column contributes to the variance of a dataset.

The output PCs (Principal Components) are abstract representations of the dataset’s variables. Their values derive from eigenvectors: cosines of orthogonal (perpendicular) rotations of the original variables.

For a fuller explanation, visit Matt Brems’ article A One-Stop Shop for Principal Component Analysis.

The direct output of PCA is the set of eigenvalues of the dataset’s covariance matrix, summarized by dimensions, or Principal Components (PCs). Computing the explained variance of each PC yields the added value of each added dimension. Essentially, PCA can reduce a dataset to its simplest dimensions.

A visual explanation of dimensional reduction through PCA. Credit to “amoeba” from StackOverflow, full article here.

Explained variance ratio and its cumulative sum for each Principal Component (PC) of my dataset is displayed below. The first PC accounts for ~60% of the variance within the dataset. Thus:

  • At 1D, our dataset is 60% represented.
  • At 2D, our dataset is 80% represented.
  • At 3D, our dataset is 91% represented.
  • At 4D, our dataset is 97.5% represented.

Thus, the data relevant to our analysis most likely sits in 3 dimensions, with PCA features PC_0, PC_1, and PC_2. There remains a strong possibility we’d need to utilize a 4th dimension with PC_3.
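The explained-variance computation can be sketched from scratch with NumPy; a synthetic matrix stands in for the real dataset, and libraries such as scikit-learn expose the same numbers as `explained_variance_ratio_`:

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic stand-in: 8 quantitative ad-report columns, 100 trials.
X = rng.normal(size=(100, 8)) @ rng.normal(size=(8, 8))

# PCA by hand: center the data, take the covariance matrix, eigendecompose.
Xc = X - X.mean(axis=0)
cov = np.cov(Xc, rowvar=False)
eigvals = np.linalg.eigvalsh(cov)[::-1]   # largest eigenvalue first

# Explained variance ratio and its cumulative sum, one entry per PC.
ratio = eigvals / eigvals.sum()
print(np.cumsum(ratio).round(3))
```

Reading off the cumulative sum at 1, 2, 3, and 4 components is exactly how the 60% / 80% / 91% / 97.5% figures above are obtained.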

Of note, the labels on graphs throughout are PC_N where N + 1 is the dimension of the factor.

When plotted, these PCA factors reveal interesting information about the dimensions of our dataset.

In 1D, the winners as determined by the hypothetical formula appear somewhat symmetrically distributed along the PC_0 axis, at two intervals equidistant from the origin. In 2D representations, PC_1, PC_2, and PC_3 are more difficult to unpack. I purposefully left out the 1D visualization, which is discussed further below.

2D representation of our data set.

When Leads are included in the PCA (trials which achieved no leads are thus filtered out of the study), the data remains difficult to comprehend. Strong clustering is exhibited along PC_1 and PC_2, but not along PC_0.

At this point, I suspected that PC_0 was a noisy variable. I decided to progress to 4D and include PC_3. Mapping PC_0 to marker size, where size = abs(PC_0), I produced the following visualization:

Nailed it! Our winning cluster is located in the quadrant described by PC_1+, PC_2-, PC_3-. Our winners can be defined by their distance from this proximal point. The hypothesis is proven correct!

Revisiting PC_0, it seems odd that this value doesn’t translate to a hierarchical ranking. When examined more closely, it appears that the selection rule below is a clear predictor of performance, except for the lowest scores:

import numpy as np

# n_winners represents the number of desired winning trials
n = n_winners // 2
# Sort a copy of PC_0 (ndarray.sort() sorts in place and returns None)
pc0_sorted = np.sort(PC_0)
# Winners sit at both extremes of PC_0, so take the n lowest and n highest
# values; np.append returns a new array, so capture the result
winners = np.append(pc0_sorted[:n], pc0_sorted[-n:])
# Print your outcome
print('Your winning PC_0 values are: {}'.format(winners))

A possible reason for this puzzling relationship is the interplay between punishing and rewarding factors tied to mixed-result variables, such as Spend. Of note, Spend is positively related to Link Clicks but also positively related to CPR. Because PCA doesn’t differentiate between these groups in the context of other noisy variables without pre-processing, Spend is treated as a mixed indicator and returns confusing results. It’s possible that the two trials at the low end of PC_0 are the result of similarly mixed variables.

So it turns out that factor analysis backed by PCA is the way to go for this campaign.

To summarize, the formula Performance = Leads / ( Reach * CPC) turns out to be a statistically reliable measure of performance in this campaign. PCA determines that this dataset sits within 4 dimensions and can be stratified by position along PC_1 (positive) and PC_2 and PC_3 (negative). As with the hypothetical formula, a pattern emerges across positive and negative values.

5. RESULTS

Utilizing this equation to determine which trials moved forward drastically increased results. Progress was non-linear, especially given the multidimensionality, but the final outcome delivered several new populations with low CPR and high Leads, a 260% increase in value compared to utilizing a single variable as a KPI.

Results were increased by 260%

That’s a shocking difference. Here’s a visualization of the compared outcomes, in 3D.

The magnitude of utilizing such techniques is tremendous.

Implications:
PCA combined with Factor Analysis can increase efficiency in marketing campaigns by enormous margins. Given the breadth of complexity in marketing campaigns, a PCA-backed formula can help marketers make decisions based on seemingly contradictory data.

Next Steps:

As digital media continues to differentiate, I expect graceful and relevant advertising to become the expected norm. As such ads develop, they’ll become increasingly valuable both for users and businesses.

I estimate that the next stage for “smart advertisement technologies” is the development of sophisticated decision-making algorithms backed by machine learning systems. In the near future, marketing teams will be supported by smart technologies which will allow them to be creative and think outside the box without shooting in the dark.

The diversity of individual preferences and personalities bring flavor to our existence. I look forward to a world where individuality (or, rather, the choice of it) becomes as useful and as private as people wish, without compromising their ability to choose it.

6. Conclusion + Personal Remarks

I really enjoyed working on a project which included so many of my interests and passions.

Marketing should be useful to consumers — and not annoying. Everyone benefits (long term) when we follow this basic rule.

Data Science is an exciting and oftentimes terribly frustrating endeavor. However, in the end, the “aha!” moment is worth it!


CITATIONS:

  1. Census Bureau. Section 31, Statistical abstract of the United States. Government Printing Office, 1999. (LINK)
  2. Pew Research Center. Demographics of Mobile Device Ownership and Adoption in the United States. 5 Feb. 2018, www.pewinternet.org/fact-sheet/mobile/. Accessed 3 Dec. 2018.
