A machine learning model is only useful when it is used in production to solve business problems. However, both the business problem and the machine learning model are constantly evolving. That's why we need to maintain the model so that its performance stays aligned with the business KPIs. This is where the concept of MLOps comes from.
MLOps, or Machine Learning Operations, is a collection of techniques and tools for machine learning in production. MLOps covers everything from automation and versioning to deployment and monitoring. This article will focus on monitoring, and on how we can use a Python package to monitor a model in production. Let's get into it.
When we talk about monitoring in MLOps, it can refer to many things, as monitoring is one of the core principles of MLOps. For example:
– Tracking changes in data distribution over time
– Tracking features used in development versus production
– Monitoring model degradation
– Monitoring model performance
– Monitoring system health
There are many elements to monitor in MLOps, but in this article we will focus on monitoring model performance. Model performance, in our case, refers to the model's ability to make reliable predictions on unseen data, measured by specific metrics such as accuracy, precision, recall, etc.
Why do we need to monitor model performance? To maintain the reliability of the model's predictions in solving the business problem. Before production, we often calculate the model's performance and its impact on the KPI; for example, we might set a baseline of 70% accuracy, meaning the model still meets the business needs as long as it stays above that threshold, while anything below it is unacceptable. Performance monitoring is what lets us make sure the model always meets the business requirements.
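To make the idea concrete, here is a minimal sketch of such a baseline check; the labels and predictions below are toy placeholders, and in practice they would come from production data:

from sklearn.metrics import accuracy_score

BASELINE_ACCURACY = 0.70  # the business requirement from the example above

# Toy labels and predictions; in practice these come from production data
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

accuracy = accuracy_score(y_true, y_pred)  # 0.8 in this toy example
if accuracy < BASELINE_ACCURACY:
    # The MLOps pipeline could raise an alert or trigger retraining here
    print(f"Accuracy {accuracy:.2f} is below the {BASELINE_ACCURACY:.0%} baseline")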
Using Python, we will learn how to monitor the model. Let's start by installing the package. There are many choices for model monitoring, but for this example we'll use an open-source monitoring package called Evidently.
First, we need to install the Evidently package with the following code.
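pip install evidently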
After installing the package, we will download a sample dataset, the insurance claim data from Kaggle. We also clean the data before using it further.
import pandas as pd

df = pd.read_csv("insurance_claims.csv")

# Sort the data based on the incident date
df = df.sort_values(by="incident_date").reset_index(drop=True)

# Variable selection
df = df[
    [
        "incident_date",
        "months_as_customer",
        "age",
        "policy_deductable",
        "policy_annual_premium",
        "umbrella_limit",
        "insured_sex",
        "insured_relationship",
        "capital-gains",
        "capital-loss",
        "incident_type",
        "collision_type",
        "total_claim_amount",
        "injury_claim",
        "property_claim",
        "vehicle_claim",
        "incident_severity",
        "fraud_reported",
    ]
]

# Data cleaning and one-hot encoding
df = pd.get_dummies(
    df,
    columns=[
        "insured_sex",
        "insured_relationship",
        "incident_type",
        "collision_type",
        "incident_severity",
    ],
    drop_first=True,
)
df["fraud_reported"] = df["fraud_reported"].apply(lambda x: 1 if x == "Y" else 0)
df = df.rename(columns={"incident_date": "timestamp", "fraud_reported": "target"})

# Cast all numeric columns to float
for i in df.select_dtypes("number").columns:
    df[i] = df[i].apply(float)

# Split into reference data (data) and current data (val)
data = df[df["timestamp"] < "2015-02-20"].copy()
val = df[df["timestamp"] >= "2015-02-20"].copy()
In the code above, we select a few columns for model-training purposes, convert them into a numeric representation, and split the data into a reference dataset (data) and a current dataset (val).
We need the reference or baseline data to monitor model performance in the MLOps pipeline. It is usually data held out from training (for example, test data). We also need the current data, i.e., data not seen by the model (incoming input data).
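If the default column detection is not enough, Evidently also lets us declare the column roles explicitly via its ColumnMapping object. The following is a minimal sketch for our dataset; the report calls below simply pass column_mapping=None and rely on the defaults instead:

from evidently import ColumnMapping

# Optional: tell Evidently explicitly which columns play which role
column_mapping = ColumnMapping(
    target="target",        # the label column we created above
    datetime="timestamp",   # the renamed incident_date column
)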
We will use Evidently to monitor both the data and the model performance. Since data drift will affect model performance, it should also be monitored.
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

data_drift_report = Report(metrics=[
    DataDriftPreset(),
])
data_drift_report.run(current_data=val, reference_data=data, column_mapping=None)
data_drift_report.show(mode="inline")
Evidently automatically displays a report of what happened to the dataset. The information includes the dataset drift and the column drift. In the example above, we do not have dataset drift as a whole, but two columns have drifted.
The report shows that the "property_claim" and "timestamp" columns are detected as drifting. This information can be used in the MLOps pipeline to trigger model retraining, or it may mean we need to investigate the data further.
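For use inside a pipeline rather than a notebook, we can also save the report as a standalone HTML file; the file name here is just an example:

data_drift_report.save_html("data_drift_report.html")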
If needed, we can also get the report data as a dictionary (log) object.
data_drift_report.as_dict()
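A minimal sketch of how this dictionary could drive an automated decision; the exact keys below (metrics, result, dataset_drift) follow Evidently's report structure but may vary between versions, so treat them as an assumption to verify:

report_dict = data_drift_report.as_dict()

# The first metric in DataDriftPreset summarizes dataset-level drift;
# the key names are assumptions that may differ across Evidently versions
dataset_drift = report_dict["metrics"][0]["result"]["dataset_drift"]
if dataset_drift:
    print("Dataset drift detected; consider retraining the model")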
Next, let's train a classifier model on the data and use Evidently to monitor the model's performance.
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier()
rf.fit(data.drop(["target", "timestamp"], axis=1), data["target"])
Evidently requires both the target and prediction columns to be present in the reference and current datasets. Let's add the model predictions to both datasets and use Evidently to monitor the performance.
data["prediction"] = rf.predict(data.drop(["target", "timestamp"], axis=1))
val["prediction"] = rf.predict(val.drop(["target", "timestamp"], axis=1))
Note that for real-world cases it is better if the reference data used to monitor model performance is not the training data. Let's set up the model performance monitoring with the following code.
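A minimal sketch using Evidently's ClassificationPreset, mirroring the data drift report above; it picks up the "target" and "prediction" columns we just created:

from evidently.report import Report
from evidently.metric_preset import ClassificationPreset

classification_report = Report(metrics=[
    ClassificationPreset(),
])
classification_report.run(current_data=val, reference_data=data, column_mapping=None)
classification_report.show(mode="inline")

The resulting report includes classification quality metrics such as accuracy, precision, recall, and F1 for the reference and current data, which we can compare against the business baseline discussed earlier.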