Using Causal Machine Learning to Narrow Campaign Target Audiences
As an aspiring data scientist, I learned in academia to treat accuracy as the hallmark of a successful project. Industry, on the other hand, is concerned with making and saving money in the short and long term. This article is a lesson in ROI (return on investment), the holy grail of business operations.
Most advertising campaigns target segments of consumers rather than individuals: paid search, display ads, paid social and so on. Direct-to-consumer (D2C) campaigns, by contrast, target individual consumers directly through direct mail, email, SMS or even push notifications. Businesses in the banking and fintech space can run massive D2C campaigns because practically every customer has their app. These days, however, these businesses are looking for ways to make their advertising spend more efficient.
With that in mind, let's look at a credit card issuer, Flex, that offers a free first year, that is, no annual fee. From the second year on, the cardholder pays the full annual fee. Over the past 3 years, Flex has had a low annual retention rate: only 30% of cardholders renew their card after the first year. To keep growing its customer base, Flex decides to experiment with upgrade offers for selected customers. The problem is that this strategy can get expensive if we're not careful.
As data scientists, our task is to find the smallest group of customers to target with these offers, out of a list of 5 million customers who are ready to upgrade.
For many years, data scientists have been busy building response models to predict the likelihood that a consumer will respond to a direct campaign. For new businesses, this can work, but as brands mature, their questions evolve.
Response models do not address questions such as:
- How much more likely is a customer to respond because they were included in the campaign?
- How can we prioritize customers who are at risk of attrition? Who are they?
- Are there customers who may respond negatively to promotional messages? Who are they?
- How can we shrink the target audience of a campaign without losing incremental revenue?
Enter uplift modeling. It is a machine learning technique that predicts the incremental effect of a treatment (here, an offer) on an individual's purchase behavior, not just the likelihood of the behavior. This way, you can target the customers who are most likely to be influenced by your campaign and avoid wasting resources on those who are not, which increases both campaign ROI and customer satisfaction.
You may have seen this classification of users before. Sure Things have a strong attachment to your brand or product and would buy anyway. Lost Causes don't need your product. An advertising campaign is unlikely to influence either of these two groups. Sleeping Dogs would have bought if they hadn't been bothered by the campaign. Persuadables represent the greatest opportunity: they buy only if they are targeted. They drive the campaign's ROI.
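Each segment corresponds to a pair of potential outcomes: whether a customer would buy (here, renew) if targeted, and whether they would buy if left alone. The hypothetical helper below simply encodes that table. In practice we never observe both outcomes for the same customer, which is exactly why the effect has to be estimated from data.

```python
def uplift_segment(buys_if_targeted: bool, buys_if_not_targeted: bool) -> str:
    """Map a (hypothetical) pair of potential outcomes to its uplift segment."""
    if buys_if_targeted and buys_if_not_targeted:
        return "Sure Thing"      # buys either way: the offer is wasted
    if not buys_if_targeted and not buys_if_not_targeted:
        return "Lost Cause"      # never buys: the offer is wasted
    if buys_if_not_targeted and not buys_if_targeted:
        return "Sleeping Dog"    # the campaign itself causes the loss
    return "Persuadable"         # buys only because of the campaign


# Print the full 2 x 2 table of potential outcomes.
for targeted in (True, False):
    for untargeted in (True, False):
        print(f"targeted={targeted!s:5} untargeted={untargeted!s:5} -> "
              f"{uplift_segment(targeted, untargeted)}")
```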
In this task, we must first identify the Persuadables and then find the most suitable offer for each of them.
We have a dataset of 5 million customers with a 10-month tenure, which means they have 2 months left before renewal. This is mock user data that you can create yourself with this Python code.
First we need some exploratory data analysis (EDA); I used ydata-profiling (formerly pandas-profiling) to create an interactive report.
We have 20 user variables, both qualitative (such as age and income level) and quantitative (such as transactions and spending across categories). Some of them are highly correlated with each other.
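ydata-profiling reports these correlations for you. As a rough stand-alone illustration of the same check, the sketch below flags pairs of numeric columns whose Pearson correlation exceeds a threshold. The column names, values and the 0.8 threshold are hypothetical, not taken from the article's dataset.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def highly_correlated_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for every column pair with |r| >= threshold."""
    pairs = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= threshold:
            pairs.append((a, b, r))
    return pairs


# Hypothetical toy columns (one value per customer).
columns = {
    "monthly_spend": [120, 340, 90, 560, 230, 410],
    "transactions": [10, 28, 8, 45, 19, 33],
    "age": [34, 51, 29, 42, 38, 25],
}
for a, b, r in highly_correlated_pairs(columns):
    print(f"{a} ~ {b}: r = {r:.3f}")
```

Flagging such pairs early is useful because strongly correlated features can make logistic-regression coefficients unstable and split SHAP attributions between features.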
Flex has already run a pilot campaign on 50K users with the message below.
We are pleased to inform you that your credit card is eligible for renewal with a special offer. For a limited time, you can upgrade your credit card with a reduced annual fee of just $49, saving you up to 50% off the regular fee. This offer is exclusive to our loyal customers, like you, who have been using our credit card for more than a year.
There were 3 offers, defined by the share of the annual fee that customers pay in the second year: 30%, 50% or 70%. The treated segments had a 55% retention rate, which is 25 percentage points (55 minus 30) higher than that of the control group, which paid the full annual fee. This difference is called the average treatment effect (ATE).
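In a randomized pilot, the ATE can be estimated as the difference between the treated and control retention rates. The toy outcome lists below are made up purely to reproduce the 55% and 30% figures; they are not the pilot data.

```python
def average_treatment_effect(treated_outcomes, control_outcomes):
    """Difference in mean renewal rates (1 = renewed, 0 = did not renew)."""
    treated_rate = sum(treated_outcomes) / len(treated_outcomes)
    control_rate = sum(control_outcomes) / len(control_outcomes)
    return treated_rate - control_rate


# Toy outcomes: 55 of 100 treated customers renew vs. 30 of 100 control customers.
treated = [1] * 55 + [0] * 45
control = [1] * 30 + [0] * 70
ate = average_treatment_effect(treated, control)
print(f"ATE = {ate:.2f} ({ate * 100:.0f} percentage points)")
```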
We have the campaign results, and this data can be used to optimize the next campaign. To do so, we need to estimate the conditional average treatment effect (CATE) for every user, which is a fancy name for a user-level effect: the expected lift for a customer with a given set of features.
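For reference, here are both quantities in standard potential-outcomes notation (the notation is not from the original article), writing $Y_i(1)$ and $Y_i(0)$ for whether customer $i$ would renew with and without an offer, and $X_i$ for the customer's features:

$$
\mathrm{ATE} = \mathbb{E}\big[Y_i(1) - Y_i(0)\big],
\qquad
\tau(x) = \mathrm{CATE}(x) = \mathbb{E}\big[Y_i(1) - Y_i(0) \mid X_i = x\big].
$$

Averaging the CATE over all customers gives back the ATE, $\mathrm{ATE} = \mathbb{E}[\tau(X_i)]$. With three offers there is one $\tau_k(x)$ per offer $k \in \{30\%, 50\%, 70\%\}$.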
Note – A pilot campaign is a small-scale test of an advertising or marketing strategy before it is launched on a larger scale. It allows marketers to evaluate the effectiveness, feasibility and costs of a strategy and to identify and solve any problems or challenges. A pilot campaign can help optimize the marketing plan, increase the return on investment and reduce the risks of failure.
Propensity score matching (PSM) aims to match clients with similar probabilities of receiving treatment based on their observed characteristics. PSM can help reduce bias caused by confounding variables in observational studies where random assignment to treatment is not possible. It involves estimating a propensity score for each customer, which is the conditional probability of treatment given the covariates, and then matching treated and untreated customers with similar scores.
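Once propensity scores are available, the matching step can be sketched as greedy 1:1 nearest-neighbour matching without replacement, with a caliper that discards treated customers who have no close control. This is one common variant, not necessarily the one used for the pilot; the function name, the 0.05 caliper and the toy scores are hypothetical.

```python
def match_on_propensity(treated, control, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching on the propensity score.

    treated, control: dicts mapping customer id -> propensity score.
    Returns (treated_id, control_id) pairs whose scores differ by at most
    `caliper`; each control customer is used at most once.
    """
    available = dict(control)  # control customers not matched yet
    pairs = []
    # Common heuristic: match high-score treated customers first, since
    # comparable controls are scarcest at high propensity.
    for t_id, t_score in sorted(treated.items(), key=lambda kv: -kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]
    return pairs


treated = {"t1": 0.80, "t2": 0.40, "t3": 0.95}
control = {"c1": 0.78, "c2": 0.42, "c3": 0.10, "c4": 0.60}
print(match_on_propensity(treated, control))  # t3 has no control within the caliper
```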
Since the pilot campaign had 3 different treatments, I use PSM to build a matched control group for each treatment group. For example, one set of control customers (who paid the full annual fee) is matched to the customers who received the Annual fee x 30% treatment, and likewise for the Annual fee x 50% and Annual fee x 70% groups. This reduces confounding from the observed characteristics in the experimental setup, allowing us to isolate the true lift for each treatment group.
Propensity scores are typically calculated with simple logistic regression models. I also recommend packages like PsmPy, which do this well and also handle class imbalance for you.
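To make "a simple logistic regression" concrete, the sketch below fits one by batch gradient descent in plain Python. In practice you would use a library (for example scikit-learn's `LogisticRegression`, or the propensity step of a PSM package); the learning rate, epoch count and the single toy feature are arbitrary choices for illustration.

```python
import math


def propensity_score(x, w, b):
    """Logistic model: P(T = 1 | x) = sigmoid(w . x + b)."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def fit_propensity_model(X, t, lr=0.1, epochs=2000):
    """Fit weights and bias by minimising the average log-loss with gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, ti in zip(X, t):
            err = propensity_score(x, w, b) - ti  # derivative of log-loss w.r.t. the linear score
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical toy data: one standardized feature (e.g. spend), 1 = received an offer.
X = [[-2.0], [-1.5], [-1.0], [-0.5], [0.0], [0.2], [0.5], [1.0], [1.5], [2.0]]
t = [0, 0, 0, 1, 0, 1, 1, 1, 0, 1]
w, b = fit_propensity_model(X, t)
scores = [propensity_score(x, w, b) for x in X]
print([round(s, 2) for s in scores])
```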
After propensity score matching, we have 3 pairs of datasets:
(Control₃₀, Treatment₃₀)
(Control₅₀, Treatment₅₀)
(Control₇₀, Treatment₇₀)
I used these pairs to build 3 models, one per treatment group, with the X-learner algorithm from the CausalML library. SHAP values can then be used to examine which features are associated with uplift.
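The X-learner works in four stages. The sketch below spells them out in plain Python with a deliberately trivial base learner (a per-category mean over one hypothetical categorical feature) and a constant propensity of 0.5, as in a balanced randomized split. It illustrates the recipe only; it is not CausalML's implementation, which accepts arbitrary regressors or classifiers as base learners.

```python
from collections import defaultdict


def fit_group_means(xs, ys):
    """Toy base learner: predicts the mean of y within each category of x."""
    sums, counts = defaultdict(float), defaultdict(int)
    for x, y in zip(xs, ys):
        sums[x] += y
        counts[x] += 1
    return {x: sums[x] / counts[x] for x in sums}


def x_learner(xs, treated, ys, propensity=0.5):
    """CATE per category via the four X-learner stages.

    Assumes every category appears in both the treated and the control group.
    """
    x1 = [x for x, t in zip(xs, treated) if t]
    y1 = [y for y, t in zip(ys, treated) if t]
    x0 = [x for x, t in zip(xs, treated) if not t]
    y0 = [y for y, t in zip(ys, treated) if not t]
    # Stage 1: outcome models for the control group (mu0) and treated group (mu1).
    mu0, mu1 = fit_group_means(x0, y0), fit_group_means(x1, y1)
    # Stage 2: impute individual effects using the *other* group's model.
    d1 = [y - mu0[x] for x, y in zip(x1, y1)]  # treated: observed minus predicted untreated
    d0 = [mu1[x] - y for x, y in zip(x0, y0)]  # control: predicted treated minus observed
    # Stage 3: model the imputed effects as a function of the features.
    tau1, tau0 = fit_group_means(x1, d1), fit_group_means(x0, d0)
    # Stage 4: blend the two effect models, weighting by the propensity g(x).
    g = propensity
    return {x: g * tau0[x] + (1 - g) * tau1[x] for x in tau0}


# Hypothetical toy data: one categorical feature ("spend tier"), 0/1 renewal outcome.
xs = ["low", "low", "low", "low", "high", "high", "high", "high"]
treated = [1, 1, 0, 0, 1, 1, 0, 0]
ys = [1, 0, 0, 0, 1, 1, 1, 1]
cate = x_learner(xs, treated, ys)
print(cate)
```

With a saturated base learner like this one, every stage collapses to the per-category difference in means. The X-learner pays off with smoother base learners (for example gradient-boosted trees), particularly when one group is much smaller than the other.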
We construct 3 Qini curves, which show the cumulative lift gained by adding customers to the target group in order from the highest to the lowest CATE. A Qini curve is similar to the ROC curve in traditional machine learning. The bottom line shows the lift from random assignment to treatment or control. The metric we look at is the area under the Qini curve relative to that random line, known as the Qini score: the higher, the better.
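A minimal sketch of how a Qini curve and an unnormalized Qini score can be computed by hand, using the usual definition Q(k) = R_T(k) - R_C(k) * N_T(k) / N_C(k) over the top k customers ranked by predicted uplift (R = responders, N = customers, T/C = treated/control). Libraries such as CausalML ship their own versions with normalization options, so treat this as an illustration of the idea rather than a reimplementation of any particular library function; the toy data is made up.

```python
def qini_curve(uplift_scores, treated, outcomes):
    """Cumulative Qini values after targeting the top-k customers by predicted uplift."""
    order = sorted(range(len(uplift_scores)), key=lambda i: -uplift_scores[i])
    n_t = n_c = r_t = r_c = 0
    curve = [0.0]
    for i in order:
        if treated[i]:
            n_t += 1
            r_t += outcomes[i]
        else:
            n_c += 1
            r_c += outcomes[i]
        # Scale control responders to the size of the treated group seen so far.
        curve.append(r_t - (r_c * n_t / n_c if n_c else 0.0))
    return curve


def qini_score(curve):
    """Area between the Qini curve and the straight random-targeting line (trapezoid rule)."""
    n = len(curve) - 1
    random_line = [curve[-1] * k / n for k in range(n + 1)]
    gap = [q - r for q, r in zip(curve, random_line)]
    return sum((gap[k] + gap[k + 1]) / 2 for k in range(n))


scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]  # predicted CATE
treated = [1, 0, 1, 0, 1, 0]             # 1 = received the offer in the pilot
outcomes = [1, 0, 1, 0, 0, 1]            # 1 = renewed
curve = qini_curve(scores, treated, outcomes)
print(curve, qini_score(curve))
```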
As expected, the Annual fee x 30% treatment (the largest discount) has the highest Qini score. The models are now ready to be applied to new data.
Now we turn to the 5 million users who are ready to upgrade. We can offer them Annual fee x 30%, Annual fee x 50% or Annual fee x 70%, or offer them nothing (the full annual fee). With the three X-learners, I predict three CATEs per user, one for each treatment. The treatment with the maximum CATE becomes the best treatment. If all treatments have similar CATEs (within ±10% of each other), we select the Annual fee x 70% treatment (naturally, we want higher revenue). If the maximum CATE is negative, we do not treat this user (they are Sleeping Dogs).
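The assignment rule above can be written as a small function. The function name is hypothetical, and reading "within ±10% of each other" as "every CATE lies within 10% (relative) of the largest one" is an assumption; the article does not give code for this step.

```python
def best_treatment(cates, preferred="Annual fee x 70%", tolerance=0.10):
    """Pick an offer for one customer from per-treatment CATE predictions.

    cates: dict mapping treatment name -> predicted CATE.
    """
    best = max(cates, key=cates.get)
    best_cate = cates[best]
    if best_cate < 0:
        return "Full annual fee"  # Sleeping Dog: every offer is expected to hurt renewal
    # Assumed reading of "similar": all CATEs within +/- tolerance (relative) of the best.
    if all(abs(best_cate - c) <= tolerance * abs(best_cate) for c in cates.values()):
        return preferred          # similar effects: keep the offer with the highest fee
    return best


print(best_treatment({"Annual fee x 30%": 0.12, "Annual fee x 50%": 0.05, "Annual fee x 70%": 0.02}))
```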
Here are our best assignments. About half a million clients are not recommended for treatment.
In this type of representation (see below), we divide clients into deciles based on CATE: decile 1 has the highest CATE and decile 10 the lowest. If we gave all customers the same type of treatment, the lift in the bottom deciles would drop below 0 sooner. Therefore, we will stick with the best treatment for each customer in our next campaign.
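The decile view can be reproduced in a few lines: rank customers by predicted CATE, cut them into ten equal-sized groups and average within each. A sketch on made-up scores:

```python
def decile_lift(cates):
    """Mean predicted CATE per decile; decile 1 holds the highest CATEs."""
    ranked = sorted(cates, reverse=True)
    n = len(ranked)
    means = []
    for d in range(10):
        chunk = ranked[d * n // 10:(d + 1) * n // 10]
        means.append(sum(chunk) / len(chunk) if chunk else float("nan"))
    return means


# Made-up CATE predictions for 100 customers, ranging from -0.20 to 0.79.
cates = [i / 100 for i in range(-20, 80)]
for decile, lift in enumerate(decile_lift(cates), start=1):
    print(f"decile {decile:2d}: mean CATE = {lift:+.3f}")
```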
The Qini curve tells us that we can expect quite a lot of lift from running this campaign. However, there is no clear cutoff or inflection point in the curve that separates the Persuadables.
The average lift for the upcoming campaign is expected to be 0.052. Deciles with a higher-than-average lift are candidate targets. But to keep this campaign economical, we'll take only the top 20% and call them Persuadables. Deciles with negative lift are Sleeping Dogs, and the rest are Sure Things or Lost Causes.
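The segmentation rule above can be sketched as a labelling function over decile lifts ordered from highest to lowest. The lift values below are made up for illustration, and the function name is hypothetical.

```python
def label_deciles(decile_lifts, top_share=0.2):
    """Label deciles (ordered from highest to lowest lift) with uplift segments.

    The top `top_share` of deciles with positive lift are Persuadables, deciles
    with negative lift are Sleeping Dogs, and the rest are Sure Things or Lost
    Causes (lift alone cannot tell those two apart, since both have ~0 uplift).
    """
    n_top = round(top_share * len(decile_lifts))
    labels = []
    for rank, lift in enumerate(decile_lifts):
        if rank < n_top and lift > 0:
            labels.append("Persuadables")
        elif lift < 0:
            labels.append("Sleeping Dogs")
        else:
            labels.append("Sure Things / Lost Causes")
    return labels


lifts = [0.15, 0.11, 0.08, 0.06, 0.05, 0.04, 0.02, 0.01, -0.01, -0.03]
for decile, label in enumerate(label_deciles(lifts), start=1):
    print(decile, label)
```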
It's easier to see the Persuadables in this updated best-treatment plot. In this case, they are the top 5 deciles.
Raw lift means little to business teams, so let's translate it into additional ROI and revenue. For decile d, the additional ROI is:
Revenue is the total amount of renewal fees collected per decile. The campaign cost is the discounted part of the renewal fee, which Flex absorbs itself. We see that offering discounts is profitable only for the top 7 deciles, that is, the top 70% of customers.
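The article's exact ROI formula is not reproduced here. As a hedged illustration of the revenue and cost terms described above, the sketch below uses a hypothetical model: without an offer, the baseline share of a decile renews at the full fee; with an offer, baseline plus uplift renew but pay only part of the fee, and Flex absorbs the rest as the campaign cost. The $100 full fee and the per-decile inputs are made up.

```python
def decile_economics(n_customers, base_rate, uplift, pay_share, full_fee):
    """Revenue, discount cost and incremental revenue of treating one decile.

    Hypothetical model, not the article's formula: without the offer,
    n * base_rate customers renew at the full fee; with it,
    n * (base_rate + uplift) renew but pay only pay_share * full_fee.
    """
    renewals = n_customers * (base_rate + uplift)
    revenue = renewals * pay_share * full_fee
    cost = renewals * (1 - pay_share) * full_fee  # discount absorbed by Flex
    baseline_revenue = n_customers * base_rate * full_fee
    return revenue, cost, revenue - baseline_revenue


# Made-up inputs: 500,000 customers per decile (5 million / 10), the 30% baseline
# renewal rate, the Annual fee x 70% offer and a hypothetical $100 full fee.
for label, uplift in [("high-CATE decile", 0.40), ("low-CATE decile", 0.02)]:
    revenue, cost, incremental = decile_economics(500_000, 0.30, uplift, 0.70, 100.0)
    print(f"{label}: revenue={revenue:,.0f} cost={cost:,.0f} incremental={incremental:,.0f}")
```

Under this model a decile pays off only when (base_rate + uplift) × pay_share exceeds base_rate, i.e. when the uplift is large enough to compensate for discounting customers who would have renewed anyway. That is consistent with only the top deciles being profitable.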
The top 20%, that is, the Persuadables, are expected to bring in about 80% of the total revenue from these 5 million user upgrades. This is often observed in business and is known as the Pareto principle. Similar bar charts can be created for CLV (customer lifetime value) to study the long-term ROI of a campaign.
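The Pareto check itself is simple arithmetic: sort the per-decile revenues and divide the sum of the top 20% by the total. The decile revenues below are made-up numbers chosen to show an 80/20 split, not the campaign's figures.

```python
def top_share(values, fraction=0.2):
    """Share of the total contributed by the top `fraction` of items (largest first)."""
    ranked = sorted(values, reverse=True)
    k = max(1, round(fraction * len(ranked)))
    return sum(ranked[:k]) / sum(ranked)


# Hypothetical revenue per decile (arbitrary units).
decile_revenue = [40, 40, 5, 4, 3, 3, 2, 1, 1, 1]
print(f"top 20% of deciles -> {top_share(decile_revenue):.0%} of revenue")
```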
So, let's answer the questions. Whom do we target? The Persuadables, about 1 million users. How do we personalize their offer? We give each of them the treatment with the highest conditional average treatment effect.
In this way, uplift modeling identifies the customers who will bring the highest ROI to the campaign and targets them accordingly, which maximizes the campaign's return on investment and minimizes waste.