Modeling Adstock
In this section, users will learn how to pass pre-defined adstock weights into the MMM fitting process.
In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import karpiu
from karpiu.models import MMM
from karpiu.utils import insert_events, extend_ts_features
pd.set_option("display.float_format", lambda x: "%.3f" % x)
print(karpiu.__version__)
0.0.1
In [2]:
%load_ext autoreload
%autoreload 2
Data Input
Recall that the quickstart demonstrates training a model without adstock. This demo adds one extra argument, adstock_df,
which the user prepares in advance.
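To make the role of the adstock weights concrete, here is a minimal sketch of how a row of weights can spread one day's spend over the following days. It assumes that the k-th weight is the share of spend that takes effect k days later; this is a conceptual illustration, not karpiu's internal implementation, and `apply_adstock` is a hypothetical helper.

```python
import numpy as np


def apply_adstock(spend: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Spread each day's spend over later days according to `weights`.

    Assumption: weights[k] is the share of a day's spend felt k days later.
    """
    # The full convolution has len(spend) + len(weights) - 1 entries;
    # keep the first len(spend) so the result aligns with the input dates.
    return np.convolve(spend, weights)[: len(spend)]


# A single burst of spend on day 0 and a three-day carryover pattern.
spend = np.array([100.0, 0.0, 0.0, 0.0, 0.0])
weights = np.array([0.5, 0.3, 0.2])

adstocked = apply_adstock(spend, weights)
print(adstocked)  # the day-0 spend is split as 50, 30, 20 over days 0 to 2
```

Because the toy weights sum to one, the total spend is preserved and only its timing changes.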
In [3]:
RAW_DATA_FILE = "resource/full/df.csv"
SCALABILITY_FILE = "resource/full/scalability_df.csv"
ADSTOCK_FILE = "resource/full/adstock_df.csv"
paid_channels = ["promo", "radio", "search", "social", "tv"]
The core input data contains the spend and the response (sales).
In [4]:
df = pd.read_csv(RAW_DATA_FILE, parse_dates=["date"])
scalability_df = pd.read_csv(SCALABILITY_FILE)
adstock_df = pd.read_csv(ADSTOCK_FILE, index_col="regressor")
adstock_df.head(5)
Out[4]:
regressor | d_0 | d_1 | d_2 | d_3 | d_4 | d_5 | d_6 | d_7 | d_8 | d_9 | ... | d_18 | d_19 | d_20 | d_21 | d_22 | d_23 | d_24 | d_25 | d_26 | d_27
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
promo | 0.088 | 0.150 | 0.254 | 0.254 | 0.127 | 0.064 | 0.032 | 0.016 | 0.008 | 0.004 | ... | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
radio | 0.003 | 0.005 | 0.010 | 0.017 | 0.031 | 0.056 | 0.100 | 0.180 | 0.180 | 0.126 | ... | 0.005 | 0.004 | 0.002 | 0.002 | 0.001 | 0.001 | 0.001 | 0.000 | 0.000 | 0.000
search | 0.129 | 0.226 | 0.226 | 0.147 | 0.095 | 0.062 | 0.040 | 0.026 | 0.017 | 0.011 | ... | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
social | 0.033 | 0.050 | 0.075 | 0.112 | 0.168 | 0.168 | 0.118 | 0.083 | 0.058 | 0.040 | ... | 0.002 | 0.001 | 0.001 | 0.001 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
tv | 0.003 | 0.004 | 0.006 | 0.009 | 0.014 | 0.021 | 0.032 | 0.048 | 0.072 | 0.108 | ... | 0.029 | 0.025 | 0.021 | 0.018 | 0.015 | 0.013 | 0.011 | 0.009 | 0.008 | 0.007
5 rows × 28 columns
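The table above has one row per channel, indexed by `regressor`, and one column per lag from `d_0` to `d_27`. As a rough sketch of how a user might prepare a table with the same layout, the example below builds normalized geometric-decay weights for two channels. The decay rates and the helper `geometric_weights` are hypothetical, and the weights in the demo file follow a delayed-peak shape rather than this simple decay, so treat this only as a layout illustration.

```python
import numpy as np
import pandas as pd

N_LAGS = 28  # matches the d_0 ... d_27 columns in the demo file


def geometric_weights(decay: float, n_lags: int = N_LAGS) -> np.ndarray:
    """Geometric-decay weights, normalized so that each row sums to one."""
    raw = decay ** np.arange(n_lags)
    return raw / raw.sum()


# Hypothetical decay rates: a slower decay for tv than for search.
decay_by_channel = {"search": 0.65, "tv": 0.90}

toy_adstock_df = pd.DataFrame(
    [geometric_weights(decay) for decay in decay_by_channel.values()],
    index=pd.Index(list(decay_by_channel), name="regressor"),
    columns=[f"d_{i}" for i in range(N_LAGS)],
)
print(toy_adstock_df.iloc[:, :5])
```

Saving this frame with `toy_adstock_df.to_csv(...)` would produce a file that can be read back with `index_col="regressor"`, as in the cell above.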
In [5]:
event_cols = [
    "new-years-day",
    "martin-luther-king-jr-day",
    "washingtons-birthday",
    "memorial-day",
    "independence-day",
    "labor-day",
    "columbus-day",
    "veterans-day",
    "thanksgiving",
    "christmas-day",
    "independence-day-observed",
    "juneteenth-national-independence-day-observed",
    "juneteenth-national-independence-day",
    "christmas-day-observed",
    "new-years-day-observed",
]
Fitting an MMM with adstock_df
Once the adstock_df is prepared, pass it to the model through the adstock_df argument;
the rest of the steps are similar to regular MMM fitting.
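Before passing a hand-made table to the model, it can help to run a few sanity checks. The sketch below uses a hypothetical helper, `check_adstock_df`, to confirm that every spend channel has a row and that no weight is negative. These are checks one might reasonably run, not requirements documented by karpiu.

```python
import pandas as pd


def check_adstock_df(adstock_df: pd.DataFrame, spend_cols: list) -> None:
    """Raise ValueError if the adstock table looks inconsistent."""
    # Every spend channel should have a row of weights.
    missing = [col for col in spend_cols if col not in adstock_df.index]
    if missing:
        raise ValueError(f"channels without adstock weights: {missing}")
    # Negative carryover weights are almost certainly a data error.
    if (adstock_df.loc[spend_cols] < 0).any().any():
        raise ValueError("adstock weights should be non-negative")


# Toy table with two channels and two lags (values are illustrative only).
example_adstock_df = pd.DataFrame(
    {"d_0": [0.6, 0.7], "d_1": [0.4, 0.3]},
    index=pd.Index(["search", "tv"], name="regressor"),
)

check_adstock_df(example_adstock_df, ["search", "tv"])  # passes silently
```

With the data in this notebook, the equivalent call would be `check_adstock_df(adstock_df, paid_channels)`.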
In [6]:
mmm = MMM(
    kpi_col="sales",
    date_col="date",
    spend_cols=paid_channels,
    scalability_df=scalability_df,
    event_cols=event_cols,
    seed=2022,
    adstock_df=adstock_df,
    seasonality=[7, 365.25],
    fs_orders=[2, 3],
    events_sigma_prior=0.3,
)
best_params = {
    "damped_factor": 0.949,
    "level_sm_input": 0.00245,
}
mmm.set_hyper_params(best_params)
mmm.fit(df, num_sample=1000, num_warmup=1000, chains=1)
2023-12-10 14:37:31 - karpiu-mmm - INFO - Initialize model
2023-12-10 14:37:31 - karpiu-mmm - INFO - Set hyper-parameters.
2023-12-10 14:37:31 - karpiu-mmm - INFO - Best params damped_factor set as 0.94900
2023-12-10 14:37:31 - karpiu-mmm - INFO - Best params level_sm_input set as 0.00245
2023-12-10 14:37:31 - karpiu-mmm - INFO - Fit final model.
2023-12-10 14:37:31 - karpiu-mmm - INFO - Deriving saturation constants...
2023-12-10 14:37:31 - karpiu-mmm - INFO - Derived saturation constants.
2023-12-10 14:37:31 - karpiu-mmm - INFO - Build a default regression scheme
2023-12-10 14:37:31 - orbit - INFO - Sampling (PyStan) with chains: 1, cores: 8, temperature: 1.000, warmups (per chain): 1000 and samples(per chain): 1000.
2023-12-10 14:40:08 - karpiu-mmm - INFO - Spend channels regression coefficients sum (0.5374089) is within common range (0, 0.8].
In [7]:
import pickle
with open("./resource/full/model.pkl", "wb") as f:
    pickle.dump(mmm, f, protocol=pickle.HIGHEST_PROTOCOL)
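A saved model can be restored later with `pickle.load`. The sketch below shows the round trip using a plain dictionary as a stand-in for the fitted `mmm` object and a temporary directory instead of `./resource/full/`, so that it runs without karpiu or the demo files; both substitutions are assumptions made for illustration.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for the fitted model; in the notebook this would be `mmm`.
fitted = {"damped_factor": 0.949, "level_sm_input": 0.00245}

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "model.pkl"

    # Save, mirroring the cell above.
    with open(model_path, "wb") as f:
        pickle.dump(fitted, f, protocol=pickle.HIGHEST_PROTOCOL)

    # Load the object back in "rb" mode.
    with open(model_path, "rb") as f:
        restored = pickle.load(f)

print(restored == fitted)
```

When unpickling a real model, the classes it was built from must be importable in the loading environment, and pickle files should only be loaded from trusted sources.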