Hi.
How do you manage reproducibility and provenance in AI workflows? Are there tools that track code, data, and model lineage together, ideally in an open-science-friendly way?
Hi Irina,
To track code and model lineage (including which datasets were used to train a specific AI model), we recommend MLflow, specifically the cloud version available through AI4EOSC.
To implement MLflow tracking in your source code, please refer to our documentation[1]. The process is straightforward: simply import the MLflow library, set an experiment name, and add the appropriate logging commands for your artifacts.
The documentation provides detailed examples and step-by-step instructions to help you get started with tracking your AI/ML experiments effectively.
[1] https://docs.ai4os.eu/en/latest/howtos/develop/mlflow.html
Hope it helps!
In addition, this is how MLflow fits in the overall provenance picture of the project: