Reproducibility and provenance in AI workflows

Hi.
How do you manage reproducibility and provenance in AI workflows? Are there tools that track code, data, and model lineage together, ideally in an open-science-friendly way?

Hi Irina,

To track code and model lineage, including which datasets were used to train a specific AI model, we recommend MLflow. A cloud-hosted version is available through AI4EOSC.

To implement MLflow tracking in your source code, please refer to our documentation[1]. The process is straightforward: simply import the MLflow library, set an experiment name, and add the appropriate logging commands for your artifacts.

The documentation provides detailed examples and step-by-step instructions to help you get started with tracking your AI/ML experiments effectively.
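As a rough illustration of those steps (import MLflow, set an experiment name, add logging calls), here is a minimal sketch. It is not taken from the AI4EOSC documentation: the helper names `file_sha256` and `log_training_run` and the tag keys `dataset.path` and `dataset.sha256` are hypothetical, chosen only for this example. It also assumes the tracking server address and credentials are supplied through MLflow's standard environment variables (`MLFLOW_TRACKING_URI`, `MLFLOW_TRACKING_USERNAME`, `MLFLOW_TRACKING_PASSWORD`); please take the actual values from the documentation [1].

```python
import hashlib


def file_sha256(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def log_training_run(experiment, data_path, params, metrics, model_path=None):
    """Record one training run and link it to the dataset it used.

    Hypothetical helper: the tag names below are an illustrative
    convention, not something MLflow or AI4EOSC prescribes.
    """
    # Imported here so the hashing helper above works without MLflow.
    import mlflow

    # The tracking server is taken from MLFLOW_TRACKING_URI if it is set.
    mlflow.set_experiment(experiment)
    with mlflow.start_run() as run:
        # Data provenance: where the dataset lives and a hash of its content.
        mlflow.set_tags({
            "dataset.path": str(data_path),
            "dataset.sha256": file_sha256(data_path),
        })
        mlflow.log_params(params)    # e.g. {"lr": 0.001, "epochs": 10}
        mlflow.log_metrics(metrics)  # e.g. {"val_accuracy": 0.93}
        if model_path is not None:
            # Upload the trained model file as a run artifact.
            mlflow.log_artifact(str(model_path))
        return run.info.run_id
```

A call such as `log_training_run("my-experiment", "data/train.csv", {"lr": 0.001}, {"val_accuracy": 0.93}, "model.pkl")` would then create one run that carries the dataset hash alongside the parameters, metrics, and model file, so a model can later be traced back to the exact data it was trained on. The experiment name, paths, and values in that call are placeholders.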

[1] https://docs.ai4os.eu/en/latest/howtos/develop/mlflow.html

Hope it helps!

In addition, this is how MLflow fits in the overall provenance picture of the project: