Udacity Data Science Nanodegree Capstone — Starbucks
This blog is part of Udacity’s Data Science Nanodegree and aims to shed light on the Capstone project. The GitHub repo for the findings below is here.
Table of contents:
- Introduction
- Dataset Exploration
- Understanding offer types and events
- Data Cleaning
- EDA
- Modeling
- Final Thoughts
1. Introduction:
Once every few days, Starbucks sends out an offer to users of the mobile app. An offer can be merely an advertisement for a drink or an actual offer such as a discount or BOGO (buy one get one free). Some users might not receive any offers during certain weeks. Not all users receive the same offer, and that is the challenge to solve with this data set.
The task is to combine transaction, demographic, and offer data to determine which demographic groups respond best to which offer type.
Every offer has a validity period before the offer expires. As an example, a BOGO offer might be valid for only 5 days. You’ll see in the data set that informational offers have a validity period even though these ads are merely providing information about a product; for example, if an informational offer has 7 days of validity, you can assume the customer is feeling the influence of the offer for 7 days after receiving the advertisement.
We are given transactional data showing user purchases made on the app including the timestamp of purchase and the amount of money spent on a purchase. This transactional data also has a record for each offer that a user receives as well as a record for when a user actually views the offer. There are also records for when a user completes an offer.
1.1 Example:
To give an example, a user could receive a “spend 10 dollars, get 2 dollars off” discount offer on Monday. The offer is valid for 10 days from receipt. If the customer accumulates at least 10 dollars in purchases during the validity period, the customer completes the offer.
However, there are a few things to watch out for in this data set. Customers do not opt into the offers that they receive; in other words, a user can receive an offer, never actually view the offer, and still complete the offer. For example, a user might receive the “buy 10 dollars get 2 dollars off offer”, but the user never opens the offer during the 10 day validity period. The customer spends 15 dollars during those ten days. There will be an offer completion record in the data set; however, the customer was not influenced by the offer because the customer never viewed the offer.
1.2 ML Problem Statement:
Create a model that learns from historical data and helps the business predict whether or not a particular user will be influenced by an offer and end up completing it.
2. Dataset Exploration:
We have 3 datasets. Below is an overview of each:
2.1 Portfolio:
This dataset provides metadata about the offers.
2.2 Profile:
This dataset provides metadata about the user profiles.
2.3 Transcript data:
This is the base transactional data, with events and timestamps for each profile.
2.4 Combining the 3 datasets:
After some basic data cleanup, we combine the 3 datasets and it looks like this:
We will be using the combined data for the next steps.
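For reference, here is a minimal sketch of how the three files can be merged. The file paths and the keys inside the `value` dicts follow the standard capstone dataset layout; treat them as assumptions:

```python
import pandas as pd

# Load the three raw files (paths assume the standard capstone repo layout).
portfolio = pd.read_json('data/portfolio.json', orient='records', lines=True)
profile = pd.read_json('data/profile.json', orient='records', lines=True)
transcript = pd.read_json('data/transcript.json', orient='records', lines=True)

# transcript['value'] holds dicts such as {'offer id': ...} or {'amount': ...};
# flatten the bits we need (key spelling varies by event type).
transcript['offer_id'] = transcript['value'].apply(
    lambda v: v.get('offer id', v.get('offer_id')))
transcript['amount'] = transcript['value'].apply(lambda v: v.get('amount'))

# Left-join offer metadata and user demographics onto the event log.
combined = (transcript
            .merge(portfolio, left_on='offer_id', right_on='id', how='left')
            .merge(profile, left_on='person', right_on='id', how='left',
                   suffixes=('_offer', '_user')))
```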
3. Understanding offer types and events:
We have 4 event types: offer received, offer viewed, offer completed, and transactions. However, BOGO/discount offers have at most 3 events:
- offer received, offer viewed, offer completed.
Informational offers have at most 2 events:
- offer received, offer viewed.
It is interesting to note that transactions don’t have an offer associated with them. Looking at the sample for a few customers, we understand that the BOGO/discount offers have the following happy path:
Here’s how the above path flows:
- The user `received` the 1st BOGO offer at time = 0.
- They `viewed` the offer at time = 66.
- Given that the offer `difficulty` was 5 and the `transaction` was worth $21, the offer is marked as `completed` at the same timestamp as the transaction.
- Needless to say, this is a happy path and won’t always be the case. We can see a similar pattern for `discount` offers.
Informational offers have a similar path but no event called offer completed. We will have to engineer one.
4. Data cleaning:
Below are the definitions and considerations pertaining to the data cleanup pipeline.
4.1. Event space:
- We will define an event space for a user as all the events that occur between 2 `received` offers. The event space will always begin when a user `receives` an offer. So as an example, this is a potential event space (a sketch of how these spaces can be carved out of the transcript follows this list):
`Offer Received -> Offer Viewed -> Transaction -> Offer Completed`
- We looked at the happy path for all 3 offer types earlier. However, that won’t always be the case for each event space. There could be a wide variety of possible combinations, and that’s the entire challenge. Some of the other possibilities:
`Offer Received -> Transaction -> Offer Completed -> Offer Viewed`
`Offer Received -> Transaction -> Offer Viewed`
and so on.
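A minimal sketch of how event spaces could be carved out of one user’s transcript (assuming the rows are already sorted by `time`):

```python
# Split one user's events into event spaces: each space starts at an
# 'offer received' event and runs until the next one.
def split_event_spaces(user_events):
    """user_events: one user's transcript rows, sorted by time."""
    spaces, current = [], []
    for _, row in user_events.iterrows():
        if row['event'] == 'offer received' and current:
            spaces.append(current)  # close the previous event space
            current = []
        current.append(row)
    if current:
        spaces.append(current)
    return spaces
```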
4.2 Completed Informational offers:
- As seen earlier, BOGO and discount offers can be `completed`, whereas users cannot complete informational offers. We will create proxy logic to address this.
Proposed logic:
- If a user receives an informational offer and later views it and transacts, in that order and within the validity period, the offer will be assumed to be “completed.” The rationale is that the user was “influenced” by the offer. The happy path would look similar to the BOGO one above.
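A minimal sketch of this proxy check, assuming `time` is recorded in hours and `duration` in days (and that each event space starts with the received event):

```python
# Proxy completion for informational offers: the offer must be viewed and
# then followed by a transaction, both inside the validity window.
def informational_completed(space, duration_days):
    viewed_t = None
    deadline = None
    for row in space:
        if row['event'] == 'offer received':
            deadline = row['time'] + duration_days * 24  # days -> hours
        elif row['event'] == 'offer viewed':
            viewed_t = row['time']
        elif row['event'] == 'transaction':
            # Rows are in time order, so a hit here means the view came first.
            if viewed_t is not None and row['time'] <= deadline:
                return True
    return False
```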
4.3 Defining Target variable:
- The goal defined earlier is 2-fold:
- Will the user be influenced by the offer?
- Will the user complete the offer?
- We will have to design our target variable to capture both these aspects.
- Influence: We will define the scope of influence as `transactions` occurring after an offer is `viewed`. The rationale is that there are offers `completed` even when the user doesn’t `view` the offer. We will consider such events as “not influenced” by the offer.
- Completed: We have an `offer completed` event for BOGO/discount offers. For informational offers, if the offer was viewed and a transaction was registered within the validity period, it is considered complete.
- Target Variable: A successful event space, our first class, will have a `transaction` influenced by the offer and an `offer completed` event as well. Below is the path an event space will have to follow to be successful:
`Offer Received -> Offer Viewed -> Transaction -> Offer Completed`
- All other event spaces will be considered unsuccessful, our second class.
- The target variable will be a boolean called `viewed_before_completed`. A sketch of how it could be derived per event space follows.
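A minimal sketch of how the label could be derived for a BOGO/discount event space (the full pipeline would also check for an influenced transaction in between; informational offers use the proxy check above instead):

```python
def viewed_before_completed(space):
    """True only if the first view happens before the offer completion."""
    viewed_t = completed_t = None
    for row in space:
        if row['event'] == 'offer viewed' and viewed_t is None:
            viewed_t = row['time']
        elif row['event'] == 'offer completed':
            completed_t = row['time']
    return (viewed_t is not None and completed_t is not None
            and viewed_t <= completed_t)
```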
4.4 Cleaning pipeline process flow:
- The goal for data cleaning: create a pipeline that boils each event space down to a single row and captures the relevant pieces of information.
- We will create 3 functions:
- `transcript_treatment`: This will be the “master” function that loops through each user’s event spaces and cleans the data. Depending on the offer received in the event space, it calls one of the 2 other functions.
- `bogo_discount_treatment`: This function cleans the BOGO/discount offers.
- `informational_treatment`: This function cleans the informational offers.
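A skeletal sketch of how the three functions might fit together, reusing the helper sketches from earlier (the real treatment functions capture many more columns than shown here):

```python
import pandas as pd

def bogo_discount_treatment(space):
    """Collapse a BOGO/discount event space into one summary row (sketch)."""
    return {'person': space[0]['person'],
            'viewed_before_completed': viewed_before_completed(space)}

def informational_treatment(space):
    """Collapse an informational event space, using the proxy completion."""
    return {'person': space[0]['person'],
            'viewed_before_completed': informational_completed(
                space, space[0]['duration'])}

def transcript_treatment(combined):
    """Master loop: emit one cleaned row per event space, per user."""
    rows = []
    for _, user_events in combined.groupby('person'):
        for space in split_event_spaces(user_events.sort_values('time')):
            if space[0].get('offer_type') == 'informational':
                rows.append(informational_treatment(space))
            else:
                rows.append(bogo_discount_treatment(space))
    return pd.DataFrame(rows)
```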
Below is the process flow at a high level:
Using this pipeline to clean our dataset gives us a single row for each event space and also our target variable, i.e., `viewed_before_completed`.
5. EDA:
We also performed a bunch of additional EDA, which I won’t cover in detail here, but here are a few key takeaways:
- We have slightly imbalanced classes, as less than 50% of offers seem to be completed. However, this shouldn’t be a major problem.
- BOGO/discount offers seem to drive a lot more spend than informational offers.
- We have a lot more males in the dataset than females and others.
- A lot more men than women seem to fall into the relatively low-income groups.
6. Modeling:
6.1 Preprocessing:
- Before any modeling, we create stratified training and test sets, as we have some imbalance in the class distribution.
- We then use StandardScaler to scale the numerical columns and OneHotEncoder to encode the categorical columns.
- We have a full pipeline using ColumnTransformer that takes care of these operations. We will use the same pipeline to transform the test set as well; a sketch follows.
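A sketch of these steps; `clean_df` and the column lists are illustrative stand-ins for the real cleaned table:

```python
from sklearn.model_selection import train_test_split
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler, OneHotEncoder

# Illustrative feature lists; the real set comes from the cleaned table.
num_cols = ['age', 'income', 'difficulty', 'reward', 'duration']
cat_cols = ['gender', 'offer_type']

X = clean_df[num_cols + cat_cols]
y = clean_df['viewed_before_completed']

# Stratify on the target so both splits keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

preprocess = ColumnTransformer([
    ('num', StandardScaler(), num_cols),
    ('cat', OneHotEncoder(handle_unknown='ignore'), cat_cols),
])

X_train_t = preprocess.fit_transform(X_train)
X_test_t = preprocess.transform(X_test)  # same pipeline, no refitting
```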
6.2 Performance metric:
- I am making an assumption that we care more about identifying users who would engage in an “effective order”. Another way to think about this is that we want to identify all the users who eventually engaged in an effective order. To this end, we wouldn’t mind a few false positives as long as we identify all the effective orders. So, we care more about recall than precision.
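To make the trade-off concrete, a toy example with made-up labels:

```python
from sklearn.metrics import precision_score, recall_score

# Recall = TP / (TP + FN): of all truly effective orders, how many we caught.
# Precision = TP / (TP + FP): of everything we flagged, how much was right.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 0]  # one false positive, but no missed positives
print(recall_score(y_true, y_pred))     # 1.00 -> every effective order caught
print(precision_score(y_true, y_pred))  # 0.75 -> at the cost of one FP
```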
6.3 CV across multiple models:
- We ran 3-fold CV across multiple models. Below is the test recall for each of those:
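The scoring loop looked roughly like this. The candidate list here is assumed from the models shortlisted in 6.4 plus a simple baseline, and it presumes xgboost is installed:

```python
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from xgboost import XGBClassifier

models = {
    'logreg': LogisticRegression(max_iter=1000),
    'random_forest': RandomForestClassifier(random_state=42),
    'gradient_boost': GradientBoostingClassifier(random_state=42),
    'xgboost': XGBClassifier(eval_metric='logloss'),
}
for name, model in models.items():
    scores = cross_val_score(model, X_train_t, y_train, cv=3,
                             scoring='recall')
    print(f'{name}: mean CV recall = {scores.mean():.3f}')
```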
6.4 GridSearchCV:
- We shortlisted XGBoost, Gradient Boost, and Random Forest as our models of choice and performed GridSearchCV to tune them further.
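As an illustration, a minimal search for one of the three; the grid values here are hypothetical:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    'n_estimators': [100, 300],
    'max_depth': [None, 5, 10],
}
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid,
                      cv=3, scoring='recall')  # tune for recall, as above
search.fit(X_train_t, y_train)
print(search.best_params_, search.best_score_)
```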
6.5 Ensemble and prediction:
- Once the models were tuned, we used sklearn’s VotingClassifier to combine the 3. After fitting on the entire training set and predicting on the test set, we got a test recall of 82%.
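A sketch of the ensembling step; `best_xgb`, `best_gb`, and `best_rf` stand in for the tuned estimators from the grid searches, and soft voting is an assumption:

```python
from sklearn.ensemble import VotingClassifier
from sklearn.metrics import recall_score

ensemble = VotingClassifier(
    estimators=[('xgb', best_xgb), ('gb', best_gb), ('rf', best_rf)],
    voting='soft')  # average predicted probabilities across the 3 models
ensemble.fit(X_train_t, y_train)
y_pred = ensemble.predict(X_test_t)
print(f'test recall: {recall_score(y_test, y_pred):.2f}')
```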
7. Final thoughts:
- The overall problem was far more challenging and exploratory in nature than I had anticipated, and it took me a lot longer than I had thought. There were loads of nuggets of learning that I was able to pick up along the way.
- One of the biggest takeaways, and one relevant to all data science problems, was to take a step back now and again to think about the bigger picture, the business use case, and the end goal. Clearly defining the goal for each sub-task was also critical.
7.2 Potential Improvements:
- One of the sticking points was that users receive an offer and complete it without ever viewing it. They get a reward for the completed offer, but it might not do the business any good, since that reward is effectively cash wasted by the company. The business could consider adding an “Activate” option to each offer: the user would have to activate the offer to attain the rewards associated with it. Such a system would incentivize the user to interact with the Starbucks app actively, and it would also impact the business’s OPEX. It could be one of the drivers towards higher interaction with the app, and the data collected from increased interactions would help drive initiatives for increased personalization, increasing user delight and NPS.
- As for the model, as stated earlier, enriching the data with additional user demographics and offer data could be one way to improve it.
Thank you for reading a rather long blog! Hope it made sense!