Reproducible, Portable, and Distributable ML Solutions in Python

Agenda

Blocks and their chapters:

  1. Introduction
    • Motivation, Context, Definitions
    • Domain Model Properties
    • ML Lifecycle Patterns
    • Tooling Comparison
  2. ForML Tutorial
    • Data Abstraction
    • Task Dependency Management
    • Evaluation
    • Project Management
  3. Avazu CTR Solution
    • Setup & Exploration
    • Formal Base Model
    • Pipeline Enhancements
    • Release & Deployment
    • Production Lifecycle Iterations

Setup

  1. Clone the workshop repository:
$ git clone git@github.com:formlio/mlprague23.git
$ cd mlprague23
  2. Install Docker Engine along with the Docker Compose plugin (it should already be part of any recent Docker Engine version).
  3. Spin up the workspace container from within the mlprague23 project root directory (this needs to bind ports 8888, 8000, and 4040 on your machine, so make sure they are free):
$ docker compose up -d
  4. Open the workspace notebook interface at http://127.0.0.1:8888/lab in your browser.
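
If `docker compose up -d` fails to start, a common cause is that one of the ports listed above is already taken by another process. The following is a minimal sketch of a pre-flight check, not part of the workshop repository: the names `WORKSHOP_PORTS`, `port_is_free`, and `busy_ports` are hypothetical helpers introduced here for illustration, and the check assumes the container binds the ports on the local loopback interface.

```python
import socket

# Ports the workspace container needs, as listed in the setup steps.
WORKSHOP_PORTS = (8888, 8000, 4040)


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if nothing accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 only when a listener accepted the connection,
        # which means the port is already in use.
        return sock.connect_ex((host, port)) != 0


def busy_ports(ports=WORKSHOP_PORTS, host: str = "127.0.0.1") -> list:
    """Return the subset of ports that are already occupied."""
    return [port for port in ports if not port_is_free(port, host)]


if __name__ == "__main__":
    occupied = busy_ports()
    if occupied:
        print("Ports already in use:", ", ".join(map(str, occupied)))
    else:
        print("All workshop ports are free.")
```

Run it before starting the container. Once the container is up, the same script should report the three ports as in use, which is a quick way to confirm that the workspace is listening.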

Opening Remarks

  • To demonstrate all the core principles, we are going to use the open-source tool ForML.
    • It's a development framework and MLOps platform for the lifecycle management of data science projects.
    • Give it a Star on GitHub!
  • For practical reasons, we chose to use a couple of traditional tools (Pandas, Scikit-learn); they are by no means essential to any of the demonstrated principles.
  • Participants are encouraged to follow along hands-on by working through the practical exercises.
  • We are using the JupyterLab environment, which works great for interactive tutorials (and not only for them), even though it doesn't shine in terms of reproducibility (see the famous talk by Joel Grus at JupyterCon 2018).
  • Alternatively, feel free to follow the content in the form of slides (though they sometimes overflow with bulky output content).