From 77ed02d1fe93001f3649b878ddbe28697e77d5d9 Mon Sep 17 00:00:00 2001 From: GitLab Bot Date: Tue, 17 Oct 2023 00:11:47 +0000 Subject: Add latest changes from gitlab-org/gitlab@master --- doc/development/data_science/index.md | 7 ++ .../data_science/model_registry/index.md | 98 ++++++++++++++++++++++ 2 files changed, 105 insertions(+) create mode 100644 doc/development/data_science/index.md create mode 100644 doc/development/data_science/model_registry/index.md (limited to 'doc/development/data_science') diff --git a/doc/development/data_science/index.md b/doc/development/data_science/index.md new file mode 100644 index 00000000000..832d3999645 --- /dev/null +++ b/doc/development/data_science/index.md @@ -0,0 +1,7 @@ +--- +stage: Data Science +group: ModelOps +info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/product/ux/technical-writing/#assignments +--- + +- [Model Registry](model_registry/index.md) diff --git a/doc/development/data_science/model_registry/index.md b/doc/development/data_science/model_registry/index.md new file mode 100644 index 00000000000..6ebf6430727 --- /dev/null +++ b/doc/development/data_science/model_registry/index.md @@ -0,0 +1,98 @@ +--- +stage: Data Science +group: ModelOps +info: To determine the technical writer assigned to the Stage/Group associated with this page, see https://about.gitlab.com/handbook/product/ux/technical-writing/#assignments +--- + +# Model Registry + +Model registry is the component in the MLOps lifecycle responsible for managing +model versions. Beyond tracking just artifacts, it is responsible to track the +metadata associated to each model, like: + +- Performance +- Parameters +- Data lineage + +## Data topology + +All entities belong to a project, and only users with access to the project can +interact with the entities. + +### `Ml::Model` + +- Holds general information about a model, like name and description. +- Each model as a default `Ml::Experiment` with the same name where candidates are logged to. +- Has many `Ml::ModelVersion`. + +#### `Ml::ModelVersion` + +- Is a version of the model. +- Links to a `Packages::Package` with the same project, name, and version. +- Version must use semantic versioning. + +#### `Ml::Experiment` + +- Collection of comparable `Ml::Candidates`. + +#### `Ml::Candidate` + +- A candidate to a model version. +- Can have many parameters (`Ml::CandidateParams`), which are usually configuration variables passed to the training code. +- Can have many performance indicators (`Ml::CandidateMetrics`). +- Can have many user defined metadata (`Ml::CandidateMetadata`). + +## MLflow compatibility layer + +To make it easier for Data Scientists with GitLab Model registry, we provided a +compatibility layer to [MLflow client](https://mlflow.org/docs/latest/python_api/mlflow.client.html). +We do not provide an MLflow instance with GitLab. Instead, GitLab itself acts as +an instance of MLflow. This method stores data on the GitLab database, which +improves user reliability and functionality. See the user documentation about +[the compatibility layer](../../../user/project/ml/experiment_tracking/mlflow_client.md). + +The compatibility layer is implemented by replicating the [MLflow rest API](https://mlflow.org/docs/latest/rest-api.html) +in [`lib/api/ml/mlflow`](https://gitlab.com/gitlab-org/gitlab/-/tree/master/lib/api/ml/mlflow). + +Some terms on MLflow are named differently in GitLab: + +- An MLflow `Run` is a GitLab `Candidate`. +- An MLflow `Registered model` is a GitLab `Model`. + +### Setting up for testing + +To test the an script with MLflow with GitLab as the backend: + +1. Install MLflow: + + ```shell + mkdir mlflow-compatibility + cd mlflow-compatibility + pip install mlflow jupyterlab + ``` + +1. In the directory, create a Python file named `mlflow_test.py` with the following code: + + ```python3 + import mlflow + import os + from mlflow.tracking import MlflowClient + + os.environ["MLFLOW_TRACKING_TOKEN"]='' + os.environ["MLFLOW_TRACKING_URI"]='/api/v4/projects//ml/mlflow' + + client = MlflowClient() + client.create_experiment("My first experiment") + ``` + +1. Run the script: + + ```shell + python mlflow_test.py + ``` + +1. Go to the project `/-/ml/experiments`. An experiment should have been created. + +You can edit the script to call the client methods we are trying to implement. See +[GitLab Model experiments example](https://gitlab.com/gitlab-org/incubation-engineering/mlops/model_experiment_example) +for a more complete example. -- cgit v1.2.3