Quickstart¶
In this section, users will learn
- the data input required to create a basic MMM
- the fit and predict process of MMM
- basic usage of the MMM
import pandas as pd
import matplotlib.pyplot as plt
import karpiu
from karpiu.models import MMM
pd.set_option("display.float_format", lambda x: "%.3f" % x)
print(karpiu.__version__)
0.0.1
%load_ext autoreload
%autoreload 2
Data Input¶
RAW_DATA_FILE = "resource/seasonal/df.csv"
SCALABILITY_FILE = "resource/seasonal/scalability_df.csv"
paid_channels = ["promo", "radio", "search", "social", "tv"]
This is the core spend and response(the sales) input data.
df = pd.read_csv(RAW_DATA_FILE, parse_dates=["date"])
df.head(5)
date | sales | promo | radio | search | social | tv | new-years-day | martin-luther-king-jr-day | washingtons-birthday | ... | labor-day | columbus-day | veterans-day | thanksgiving | christmas-day | independence-day-observed | juneteenth-national-independence-day-observed | juneteenth-national-independence-day | christmas-day-observed | new-years-day-observed | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2019-01-01 | 631.000 | 1070.000 | 7319.000 | 0.000 | 2530.000 | 8755.000 | 1 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 2019-01-02 | 397.000 | 1926.000 | 5729.000 | 4189.000 | 1635.000 | 5621.000 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 2019-01-03 | 530.000 | 2224.000 | 0.000 | 4820.000 | 0.000 | 13586.000 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | 2019-01-04 | 766.000 | 2405.000 | 3163.000 | 0.000 | 0.000 | 10953.000 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | 2019-01-05 | 1168.000 | 2122.000 | 8359.000 | 2937.000 | 0.000 | 0.000 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
5 rows × 22 columns
Note that the one-hot-encoded dummies are already appended in the dataframe. To use them, users need to supply the list of the events to the model.
event_cols = [
"new-years-day",
"martin-luther-king-jr-day",
"washingtons-birthday",
"memorial-day",
"independence-day",
"labor-day",
"columbus-day",
"veterans-day",
"thanksgiving",
"christmas-day",
"independence-day-observed",
"juneteenth-national-independence-day-observed",
"juneteenth-national-independence-day",
"christmas-day-observed",
"new-years-day-observed",
]
The scalability_df
is used to determine a normalizer of a channel with respect to its spend volume.
scalability_df = pd.read_csv(SCALABILITY_FILE)
scalability_df.head(5)
regressor | scalability | |
---|---|---|
0 | promo | 3.000 |
1 | radio | 1.250 |
2 | search | 0.800 |
3 | social | 1.300 |
4 | tv | 1.500 |
Fitting a Basic MMM¶
mmm = MMM(
kpi_col="sales",
date_col="date",
spend_cols=paid_channels,
scalability_df=scalability_df,
event_cols=event_cols,
seed=2022,
seasonality=[7, 365.25],
fs_orders=[2, 3],
events_sigma_prior=0.3,
)
2023-12-10 14:40:50 - karpiu-mmm - INFO - Initialize model
For now, users can skip the hyper-parameters fitting section and directly set some prepared hyper-parameters.
best_params = {
"damped_factor": 0.949,
"level_sm_input": 0.00245,
}
mmm.set_hyper_params(best_params)
2023-12-10 14:40:50 - karpiu-mmm - INFO - Set hyper-parameters. 2023-12-10 14:40:50 - karpiu-mmm - INFO - Best params damped_factor set as 0.94900 2023-12-10 14:40:50 - karpiu-mmm - INFO - Best params level_sm_input set as 0.00245
Fit the model with supplied dataframe.
mmm.fit(df, num_warmup=1000, num_sample=1000)
2023-12-10 14:40:50 - karpiu-mmm - INFO - Fit final model. 2023-12-10 14:40:50 - karpiu-mmm - INFO - Deriving saturation constants... 2023-12-10 14:40:50 - karpiu-mmm - INFO - Derived saturation constants. 2023-12-10 14:40:50 - karpiu-mmm - INFO - Build a default regression scheme 2023-12-10 14:40:50 - orbit - INFO - Sampling (PyStan) with chains: 1, cores: 8, temperature: 1.000, warmups (per chain): 1000 and samples(per chain): 1000.
chain 1 | | 00:00 Status
2023-12-10 14:43:50 - karpiu-mmm - INFO - Spend channels regression coefficients sum (0.44208885) is within common range (0, 0.8].
Extracting Insights from the Model¶
Attribution¶
from karpiu.explainability import AttributorGamma
ATTR_START = "2019-03-01"
ATTR_END = "2019-03-31"
attributor = AttributorGamma(model=mmm, start=ATTR_START, end=ATTR_END)
activities_attr_df, spend_attr_df, spend_df, cost_df = attributor.make_attribution()
2023-12-10 14:43:50 - karpiu-planning - INFO - Full calculation start=2019-03-01 and end=2019-03-31 2023-12-10 14:43:50 - karpiu-planning - INFO - Attribution start=2019-03-01 and end=2019-03-31
from karpiu.plots import (
plot_attribution_with_time,
plot_attribution_waterfall,
ColorConstants,
)
ax = plot_attribution_with_time(
model=mmm,
attr_df=activities_attr_df,
figsize=(10, 5.5),
colors=ColorConstants.RAINBOW_SIX,
show=False,
dt_col="date",
include_organic=False,
)
ax.set_title("Attribution by Activities Date", fontdict={"fontsize": 12})
ax.set_xlabel("Date", fontdict={"fontsize": 12})
ax.set_ylabel("Sales", fontdict={"fontsize": 12})
fig = ax.figure
fig
Users can also compare sales composition aggregated by channels.
ax = plot_attribution_waterfall(
model=mmm,
attr_df=spend_attr_df,
figsize=(8, 4.5),
include_organic=False,
colors=ColorConstants.RAINBOW_SIX,
show=False,
)
ax.set_title("Sales Decomposition", fontdict={"fontsize": 12})
fig = ax.figure
fig
Cost and Efficiency Analysis¶
from karpiu.planning.cost_curves import CostCurves
cc = CostCurves(
model=mmm,
curve_type="individual",
n_steps=50,
spend_start=ATTR_START,
spend_end=ATTR_END,
)
cc.generate_cost_curves()
2023-12-10 14:43:51 - karpiu-planning - INFO - Minimum spend threshold is hit in some channel(s). Update with value 0.001. 2023-12-10 14:43:51 - karpiu-planning - INFO - Derived channels multipliers based on input spend.
0%| | 0/5 [00:00<?, ?it/s]
To prevent overflow, users can provide a scaler in plotting the cost curves.
cc.plot(spend_scaler=10, outcome_scaler=10, include_organic=False);
Model Regression Coefficents¶
mmm.get_regression_summary()
regressor | sign | coef_p50 | coef_p05 | coef_p95 | Pr(coef >= 0) | Pr(coef < 0) | loc_prior | scale_prior | |
---|---|---|---|---|---|---|---|---|---|
0 | promo | Positive | 0.046 | 0.040 | 0.052 | 1.000 | 0.000 | 0.000 | 0.100 |
1 | radio | Positive | 0.061 | 0.056 | 0.065 | 1.000 | 0.000 | 0.000 | 0.100 |
2 | search | Positive | 0.174 | 0.167 | 0.182 | 1.000 | 0.000 | 0.000 | 0.100 |
3 | social | Positive | 0.094 | 0.089 | 0.100 | 1.000 | 0.000 | 0.000 | 0.100 |
4 | tv | Positive | 0.067 | 0.062 | 0.072 | 1.000 | 0.000 | 0.000 | 0.100 |
5 | s7_fs_cos1 | Regular | 0.181 | 0.178 | 0.185 | 1.000 | 0.000 | 0.000 | 0.300 |
6 | s7_fs_cos2 | Regular | 0.102 | 0.098 | 0.105 | 1.000 | 0.000 | 0.000 | 0.300 |
7 | s7_fs_sin1 | Regular | -0.626 | -0.630 | -0.622 | 0.000 | 1.000 | 0.000 | 0.300 |
8 | s7_fs_sin2 | Regular | 0.006 | 0.003 | 0.009 | 0.995 | 0.005 | 0.000 | 0.300 |
9 | s365.25_fs_cos1 | Regular | -0.100 | -0.107 | -0.093 | 0.000 | 1.000 | 0.000 | 0.300 |
10 | s365.25_fs_cos2 | Regular | -0.004 | -0.008 | -0.000 | 0.043 | 0.957 | 0.000 | 0.300 |
11 | s365.25_fs_cos3 | Regular | -0.040 | -0.042 | -0.037 | 0.000 | 1.000 | 0.000 | 0.300 |
12 | s365.25_fs_sin1 | Regular | 0.013 | 0.007 | 0.020 | 0.999 | 0.001 | 0.000 | 0.300 |
13 | s365.25_fs_sin2 | Regular | 0.022 | 0.018 | 0.026 | 1.000 | 0.000 | 0.000 | 0.300 |
14 | s365.25_fs_sin3 | Regular | -0.137 | -0.140 | -0.134 | 0.000 | 1.000 | 0.000 | 0.300 |
15 | christmas-day | Regular | 0.127 | 0.079 | 0.177 | 1.000 | 0.000 | 0.000 | 0.300 |
16 | christmas-day-observed | Regular | -0.440 | -0.520 | -0.362 | 0.000 | 1.000 | 0.000 | 0.300 |
17 | columbus-day | Regular | -0.220 | -0.269 | -0.173 | 0.000 | 1.000 | 0.000 | 0.300 |
18 | independence-day | Regular | 0.028 | -0.017 | 0.075 | 0.851 | 0.149 | 0.000 | 0.300 |
19 | independence-day-observed | Regular | -0.016 | -0.087 | 0.048 | 0.351 | 0.649 | 0.000 | 0.300 |
20 | juneteenth-national-independence-day | Regular | -0.038 | -0.121 | 0.043 | 0.225 | 0.775 | 0.000 | 0.300 |
21 | juneteenth-national-independence-day-observed | Regular | -0.057 | -0.138 | 0.029 | 0.137 | 0.863 | 0.000 | 0.300 |
22 | labor-day | Regular | -0.225 | -0.270 | -0.178 | 0.000 | 1.000 | 0.000 | 0.300 |
23 | martin-luther-king-jr-day | Regular | -0.027 | -0.076 | 0.024 | 0.186 | 0.814 | 0.000 | 0.300 |
24 | memorial-day | Regular | -0.045 | -0.092 | 0.003 | 0.059 | 0.941 | 0.000 | 0.300 |
25 | new-years-day | Regular | 0.322 | 0.302 | 0.341 | 1.000 | 0.000 | 0.000 | 0.300 |
26 | new-years-day-observed | Regular | -0.005 | -0.502 | 0.492 | 0.493 | 0.507 | 0.000 | 0.300 |
27 | thanksgiving | Regular | -0.201 | -0.250 | -0.153 | 0.000 | 1.000 | 0.000 | 0.300 |
28 | veterans-day | Regular | -0.265 | -0.313 | -0.218 | 0.000 | 1.000 | 0.000 | 0.300 |
29 | washingtons-birthday | Regular | 0.164 | 0.118 | 0.212 | 1.000 | 0.000 | 0.000 | 0.300 |
Forecast Future Outcome¶
Prediction interval is not supported yet but will be available in future version. Here a demo of using insert_event
and extend_ts_features
are also shown to generate future dataframe for long-term forecast checking.
from karpiu.utils import insert_events, extend_ts_features
from orbit.diagnostics.plot import plot_predicted_data, plot_predicted_components
future_df = extend_ts_features(df, 365, date_col="date")
future_df, event_cols = insert_events(future_df, date_col="date", country="US")
pred_df = mmm.predict(future_df)
plot_predicted_data(
training_actual_df=df[-365:],
predicted_df=pred_df[-(720):],
date_col="date",
actual_col="sales",
);
pred_df = mmm.predict(future_df, decompose=True)
plot_predicted_components(
predicted_df=pred_df[-720:],
date_col="date",
plot_components=[
"trend",
"paid",
"events",
"s-7 seasonality",
"s-365.25 seasonality",
],
);
Dump Model for Future Usage¶
import pickle
with open("./resource/seasonal/model.pkl", "wb") as f:
pickle.dump(mmm, f, protocol=pickle.HIGHEST_PROTOCOL)