Let's now explore the options for iterating on the model during its production lifecycle.
The first step is to monitor the serving performance by evaluating the predictions against the true outcomes once they become available:
! forml model eval forml-solution-avazuctr \
--lower '2014-10-21 03:00:00'
0.38597474427877504
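To make the evaluation step more tangible, here is a minimal sketch of how such a score could be computed by hand. It assumes the reported metric is the binary log loss (a common choice for click-through-rate problems; the text above does not name the metric explicitly), and the `log_loss` helper as well as the toy outcomes and predictions are purely illustrative.

```python
import math


def log_loss(y_true, y_pred, eps=1e-15):
    """Mean binary cross-entropy between 0/1 outcomes and predicted probabilities."""
    total = 0.0
    for outcome, prob in zip(y_true, y_pred):
        # Clip the probability so that log(0) can never occur.
        p = min(max(prob, eps), 1 - eps)
        total -= outcome * math.log(p) + (1 - outcome) * math.log(1 - p)
    return total / len(y_true)


# Made-up example values (not taken from the Avazu dataset):
outcomes = [0, 0, 1, 0]  # true clicks obtained later
predictions = [0.13, 0.20, 0.60, 0.05]  # probabilities served earlier
print(log_loss(outcomes, predictions))
```

A lower value means the served probabilities were closer to the observed outcomes, so tracking this number over consecutive periods is one way to spot drift.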
As time goes on, the model will start showing signs of drift. Refreshing it with recent data should be part of the regular process. With ForML, this means producing a new generation of the model:
! forml model train forml-solution-avazuctr \
--lower '2014-10-21 03:00:00' \
--upper '2014-10-21 05:00:00'
! forml model list forml-solution-avazuctr 0.1
1 2
Wait for the serving gateway to pick up the newer model (given its default latest strategy) and try a new request:
! curl -H 'Content-Type: application/json' -d '[{ \
"hour": "2014-10-21 05:00:00", \
"banner_pos": "0", \
"site_id": "887a4754", "site_domain": "e3d9ca35", \
"site_category": "50e219e0", \
"app_id": "ecad2386", "app_domain": "7801e8d9", \
"app_category": "07d7df22", \
"device_id": "0e79d423", "device_ip": "9f423918", \
"device_model": "fc10a0d3", \
"device_type": "0", "device_conn_type": "0", \
"C1": "1002", "C14": "22701", "C15": "320", "C16": "50", \
"C17": "2624", "C18": "0", "C19": "35", "C20": "-1", "C21": "221" \
}]' http://127.0.0.1:8000/forml-solution-avazuctr
[{"c0":0.1343483272}]
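The same request can also be issued from Python. The following is a rough sketch using only the standard library; the `build_request` helper is a hypothetical name, and the URL and payload simply mirror the curl call above. The actual network call is left commented out because it needs the gateway to be running.

```python
import json
import urllib.request

URL = "http://127.0.0.1:8000/forml-solution-avazuctr"


def build_request(records, url=URL):
    """Wrap a list of feature records into a JSON POST request."""
    body = json.dumps(records).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


record = {
    "hour": "2014-10-21 05:00:00",
    "banner_pos": "0",
    "site_id": "887a4754", "site_domain": "e3d9ca35",
    "site_category": "50e219e0",
    "app_id": "ecad2386", "app_domain": "7801e8d9",
    "app_category": "07d7df22",
    "device_id": "0e79d423", "device_ip": "9f423918",
    "device_model": "fc10a0d3",
    "device_type": "0", "device_conn_type": "0",
    "C1": "1002", "C14": "22701", "C15": "320", "C16": "50",
    "C17": "2624", "C18": "0", "C19": "35", "C20": "-1", "C21": "221",
}
request = build_request([record])

# With the serving gateway running, send the request like this:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```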
The performance monitoring for the next period would then go like this:
! forml model eval forml-solution-avazuctr \
--lower '2014-10-21 05:00:00'
0.3717689632757772
There can be a number of reasons why merely refreshing the model does not bring the required improvement, and a true gain is only possible through a conceptually new version of the (logical) model (i.e. its code). This involves one or more new development iterations and eventually a new release of the model.
We will demonstrate this process by attempting to simplify the model by removing some of the less useful columns.
Let's tap into our pipeline just after the TargetEncoder to be able to analyze that data:
from forml import project
from forml.pipeline import payload, wrap
from avazuctr import pipeline
with wrap.importer():
    from category_encoders import TargetEncoder
PROJECT = project.open(path=".", package="avazuctr")
trainset = PROJECT.components.source.bind(
    TargetEncoder(cols=pipeline.CATEGORICAL_COLUMNS)
).launcher.train()
Now we can simply calculate the pairwise feature correlations and keep only those above 0.9:
import pandas
corr = trainset.features.corr()
corr[corr > 0.90].dropna(thresh=2).dropna(thresh=2, axis=1)
Feature | C1 | site_id | site_domain | device_type | C14 | C15 | C16 | C17 | C21 |
---|---|---|---|---|---|---|---|---|---|
C1 | 1.000000 | NaN | NaN | 0.932983 | NaN | NaN | NaN | NaN | NaN |
site_id | NaN | 1.00000 | 0.97686 | NaN | NaN | NaN | NaN | NaN | NaN |
site_domain | NaN | 0.97686 | 1.00000 | NaN | NaN | NaN | NaN | NaN | NaN |
device_type | 0.932983 | NaN | NaN | 1.000000 | NaN | NaN | NaN | NaN | NaN |
C14 | NaN | NaN | NaN | NaN | 1.000000 | NaN | NaN | 0.985713 | 0.913383 |
C15 | NaN | NaN | NaN | NaN | NaN | 1.0000 | 0.9425 | NaN | NaN |
C16 | NaN | NaN | NaN | NaN | NaN | 0.9425 | 1.0000 | NaN | NaN |
C17 | NaN | NaN | NaN | NaN | 0.985713 | NaN | NaN | 1.000000 | 0.925165 |
C21 | NaN | NaN | NaN | NaN | 0.913383 | NaN | NaN | 0.925165 | 1.000000 |
We see strong correlations between the following features:
- device_type and C1
- site_domain and site_id
- C14, C17, and C21
- C15 and C16
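The grouping above can also be derived mechanically. The sketch below is one possible way to do it: it takes the above-threshold pairs copied from the correlation table and merges them into connected groups with a small union-find. The `correlated_groups` helper is a hypothetical name introduced only for this illustration.

```python
# Pairs with correlation above 0.9, copied from the table above.
PAIRS = [
    ("C1", "device_type", 0.932983),
    ("site_id", "site_domain", 0.97686),
    ("C14", "C17", 0.985713),
    ("C14", "C21", 0.913383),
    ("C15", "C16", 0.9425),
    ("C17", "C21", 0.925165),
]


def correlated_groups(pairs):
    """Merge correlated pairs into groups of mutually linked features."""
    parent = {}

    def find(name):
        # Locate the representative of the group containing `name`.
        parent.setdefault(name, name)
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    for left, right, _ in pairs:
        parent[find(left)] = find(right)

    groups = {}
    for name in list(parent):
        groups.setdefault(find(name), set()).add(name)
    return sorted(sorted(group) for group in groups.values())


for group in correlated_groups(PAIRS):
    print(group)
```

Keeping a single representative from each group is then enough to drop the redundant columns.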
Let's update our avazuctr/source.py and avazuctr/pipeline.py to keep only one feature from each of these sets:
- In avazuctr/source.py, update the FEATURES sequence to remove the device_type, site_domain, C15, C17, and C21 features.
! git add avazuctr/source.py
- In avazuctr/pipeline.py, update the CATEGORICAL_COLUMNS sequence to remove the device_type, site_domain, C15, C17, and C21 features.
! git add avazuctr/pipeline.py
Now we can evaluate the impact of this change:
! forml project eval
running eval
0.38607750713914
This even comes with a slightly improved loss!
Increment the project version: edit pyproject.toml and update the version, setting it to 0.2:
version = "0.2"
! git add pyproject.toml
Commit and tag the code:
! git commit -m 'Released 0.2'
! git tag 0.2
[main 330b371] Released 0.2
 4 files changed, 7 insertions(+), 9 deletions(-)
 create mode 100644 application.py
Kick off the packaging and model publishing:
! forml project release
running bdist_4ml
Collecting category-encoders==2.6.0
  Using cached category_encoders-2.6.0-py2.py3-none-any.whl (81 kB)
Collecting forml==0.93
  Using cached forml-0.93-py3-none-any.whl (283 kB)
Collecting imbalanced-learn==0.10.1
  Using cached imbalanced_learn-0.10.1-py3-none-any.whl (226 kB)
Collecting openschema==0.7
  Using cached openschema-0.7-py3-none-any.whl (14 kB)
Collecting pandas==2.0.1
  Using cached pandas-2.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (12.3 MB)
Collecting scikit-learn==1.2.2
  Using cached scikit_learn-1.2.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (9.6 MB)
Collecting numpy>=1.14.0 (from category-encoders==2.6.0)
  Using cached numpy-1.24.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (17.3 MB)
Collecting scipy>=1.0.0 (from category-encoders==2.6.0)
  Using cached scipy-1.10.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (34.4 MB)
Collecting statsmodels>=0.9.0 (from category-encoders==2.6.0)
  Using cached statsmodels-0.14.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (10.1 MB)
Collecting patsy>=0.5.1 (from category-encoders==2.6.0)
  Using cached patsy-0.5.3-py2.py3-none-any.whl (233 kB)
Collecting click (from forml==0.93)
  Using cached click-8.1.3-py3-none-any.whl (96 kB)
Collecting cloudpickle (from forml==0.93)
  Using cached cloudpickle-2.2.1-py3-none-any.whl (25 kB)
Collecting jinja2 (from forml==0.93)
  Using cached Jinja2-3.1.2-py3-none-any.whl (133 kB)
Collecting packaging>=20.0 (from forml==0.93)
  Using cached packaging-23.1-py3-none-any.whl (48 kB)
Collecting pip (from forml==0.93)
  Using cached pip-23.1.2-py3-none-any.whl (2.1 MB)
Collecting setuptools (from forml==0.93)
  Using cached setuptools-67.8.0-py3-none-any.whl (1.1 MB)
Collecting toml (from forml==0.93)
  Using cached toml-0.10.2-py2.py3-none-any.whl (16 kB)
Collecting tomli (from forml==0.93)
  Using cached tomli-2.0.1-py3-none-any.whl (12 kB)
Collecting joblib>=1.1.1 (from imbalanced-learn==0.10.1)
  Using cached joblib-1.2.0-py3-none-any.whl (297 kB)
Collecting threadpoolctl>=2.0.0 (from imbalanced-learn==0.10.1)
  Using cached threadpoolctl-3.1.0-py3-none-any.whl (14 kB)
Collecting python-dateutil>=2.8.2 (from pandas==2.0.1)
  Using cached python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
Collecting pytz>=2020.1 (from pandas==2.0.1)
  Using cached pytz-2023.3-py2.py3-none-any.whl (502 kB)
Collecting tzdata>=2022.1 (from pandas==2.0.1)
  Using cached tzdata-2023.3-py2.py3-none-any.whl (341 kB)
Collecting six (from patsy>=0.5.1->category-encoders==2.6.0)
  Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Collecting MarkupSafe>=2.0 (from jinja2->forml==0.93)
  Using cached MarkupSafe-2.1.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (25 kB)
Installing collected packages: pytz, tzdata, tomli, toml, threadpoolctl, six, setuptools, pip, packaging, numpy, MarkupSafe, joblib, cloudpickle, click, scipy, python-dateutil, patsy, jinja2, scikit-learn, pandas, statsmodels, imbalanced-learn, forml, openschema, category-encoders
Successfully installed MarkupSafe-2.1.2 category-encoders-2.6.0 click-8.1.3 cloudpickle-2.2.1 forml-0.93 imbalanced-learn-0.10.1 jinja2-3.1.2 joblib-1.2.0 numpy-1.24.3 openschema-0.7 packaging-23.1 pandas-2.0.1 patsy-0.5.3 pip-23.1.2 python-dateutil-2.8.2 pytz-2023.3 scikit-learn-1.2.2 scipy-1.10.1 setuptools-67.8.0 six-1.16.0 statsmodels-0.14.0 threadpoolctl-3.1.0 toml-0.10.2 tomli-2.0.1 tzdata-2023.3
running upload
We should now see the new release in the registry:
! forml model list forml-solution-avazuctr
0.1 0.2
Let's train the first generation of this new release:
! forml model train forml-solution-avazuctr \
--upper '2014-10-21 03:00:00'
! forml model list forml-solution-avazuctr 0.2
1
! curl -H 'Content-Type: application/json' -d '[{ \
"hour": "2014-10-21 03:00:00", \
"banner_pos": "0", \
"site_id": "887a4754", "site_domain": "e3d9ca35", \
"site_category": "50e219e0", \
"app_id": "ecad2386", "app_domain": "7801e8d9", \
"app_category": "07d7df22", \
"device_id": "0e79d423", "device_ip": "9f423918", \
"device_model": "fc10a0d3", \
"device_type": "0", "device_conn_type": "0", \
"C1": "1002", "C14": "22701", "C15": "320", "C16": "50", \
"C17": "2624", "C18": "0", "C19": "35", "C20": "-1", "C21": "221" \
}]' http://127.0.0.1:8000/forml-solution-avazuctr
[{"c0":0.2057745536}]
! forml model eval forml-solution-avazuctr \
--lower '2014-10-21 03:00:00'
0.3866724370765236
Since we now have multiple model instances in our registry, we might want to change the selection strategy from the default latest to, for example, A/B testing. This takes only a slight tweak of the application descriptor.
Let's change the strategy so that the models get selected according to the following plan (target is the weight this model should end up being selected with):
Release | Generation | Target |
---|---|---|
0.1 | 1 | 3 |
0.1 | 2 | 5 |
0.2 | 1 | 2 |
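To see how the targets in the plan above translate into selection shares, here is a small standalone sketch. It only illustrates the arithmetic of weighted selection and is not ForML's actual selection algorithm; the `shares` and `simulate` helpers are hypothetical names.

```python
import collections
import random

# (release, generation) -> target weight, copied from the plan above.
PLAN = {("0.1", 1): 3, ("0.1", 2): 5, ("0.2", 1): 2}


def shares(plan):
    """Normalize the target weights into expected selection fractions."""
    total = sum(plan.values())
    return {variant: weight / total for variant, weight in plan.items()}


def simulate(plan, requests, seed=0):
    """Draw a variant for each request with probability proportional to its weight."""
    rng = random.Random(seed)
    variants = list(plan)
    weights = [plan[variant] for variant in variants]
    return collections.Counter(rng.choices(variants, weights=weights, k=requests))


print(shares(PLAN))
print(simulate(PLAN, 1000))
```

With weights of 3, 5, and 2, the expected shares are 30%, 50%, and 20%; the simulated counts should land close to those proportions for a large enough number of requests.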
Now, add the generic application descriptor code to application.py, using the ABTest selector strategy:
from forml import application
selector = (
    application.ABTest.compare(
        project="forml-solution-avazuctr",
        release="0.1",
        generation=1,
        target=3,  # variant A 30%
    )
    .over(generation=2, target=5)  # variant B 50%
    .against(release="0.2", generation=1, target=2)  # variant C 20%
)
application.setup(
    application.Generic("forml-solution-avazuctr", selector)
)
! git add application.py
! forml application put application.py
Restart the serving gateway!
Make a few requests and watch the x-forml-instance header showing the model:
! curl -s -v -H 'Content-Type: application/json' -d '[{ \
"hour": "2014-10-21 03:00:00", \
"banner_pos": "0", \
"site_id": "887a4754", "site_domain": "e3d9ca35", \
"site_category": "50e219e0", \
"app_id": "ecad2386", "app_domain": "7801e8d9", \
"app_category": "07d7df22", \
"device_id": "0e79d423", "device_ip": "9f423918", \
"device_model": "fc10a0d3", \
"device_type": "0", "device_conn_type": "0", \
"C1": "1002", "C14": "22701", "C15": "320", "C16": "50", \
"C17": "2624", "C18": "0", "C19": "35", "C20": "-1", "C21": "221" \
}]' http://127.0.0.1:8000/forml-solution-avazuctr \
2> >(grep x-forml-instance)
< x-forml-instance: dispatch-registry-forml-solution-avazuctr-0.1-2
[{"c0":0.1364936214}]
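When making many requests, it can help to tally which model instance served each one. The sketch below assumes that the x-forml-instance value always ends with `-<release>-<generation>`, as in the single example above; that format is inferred from this one output rather than from a documented contract, and the `parse_instance` and `tally` helpers are hypothetical.

```python
import collections


def parse_instance(header_value):
    """Split the trailing release and generation off an x-forml-instance value."""
    # Assumption: the value ends with "-<release>-<generation>".
    _, release, generation = header_value.rsplit("-", 2)
    return release, int(generation)


def tally(header_values):
    """Count how many responses each (release, generation) pair served."""
    return collections.Counter(parse_instance(value) for value in header_values)


# Header values collected from repeated requests (here just the one shown above):
observed = ["dispatch-registry-forml-solution-avazuctr-0.1-2"]
print(tally(observed))
```

Comparing such a tally against the target weights gives a quick sanity check that the A/B plan is being applied.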