AI/ML workflows with AIStor and Metaflow

AIStor follows the ethos of starting small, even on your laptop, and scaling up to a full-blown production cluster. We want the initial entry point to be as painless and simple as possible, so that the application can quickly gain traction and be used more often to test new use cases.

We’re happy to see other apps in the ecosystem follow this same model. In this case it's Metaflow. Metaflow is a human-friendly Python library that makes it straightforward to develop, deploy, and operate various kinds of data-intensive applications, in particular those involving data science, ML, and AI. Metaflow was originally developed at Netflix to boost the productivity of data scientists who work on a wide variety of projects, from classical statistics to state-of-the-art deep learning.

AIStor and Metaflow can be deployed anywhere, on any type of infrastructure. In fact, more often than not we see folks deploying AIStor and Metaflow alongside each other as part of the deployment pipeline.

Integrate with AIStor

Integrating Metaflow with AIStor is pretty simple.

Download and launch MinIO AIStor on your Kubernetes cluster. We’ll launch Metaflow in the same cluster as well.

Follow the instructions below to set up a bucket in AIStor.

<div>

  <script async src="https://js.storylane.io/js/v2/storylane.js"></script>

  <div class="sl-embed" style="position:relative;padding-bottom:calc(79.17% + 25px);width:100%;height:0;transform:scale(1)">

<iframe loading="lazy" class="sl-demo" src="https://app.storylane.io/demo/cesgrcyf9wnq?embed=inline" name="sl-embed" allow="fullscreen" allowfullscreen style="position:absolute;top:0;left:0;width:100%!important;height:100%!important;border:1px solid rgba(63,95,172,0.35);box-shadow: 0px 0px 18px rgba(26, 19, 72, 0.15);border-radius:10px;box-sizing:border-box;"></iframe>

  </div>

</div>

Next, let's set up Metaflow. There are a few environment variables we need to set to connect to AIStor:

  • `METAFLOW_DEFAULT_DATASTORE` - set this to `s3`
  • `METAFLOW_DATASTORE_SYSROOT_S3` - set this to `s3://<bucket_name>`
  • `METAFLOW_S3_ENDPOINT_URL` - set this to the AIStor endpoint, `http://<minio>:9000/`
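
As a minimal sketch, the three settings above could be assembled and exported from Python before Metaflow starts. The helper name `metaflow_s3_env`, the bucket name `metaflow-data`, and the endpoint host `minio.example.internal` are hypothetical placeholders, not values from this post; substitute your own bucket and AIStor endpoint. The sketch also assumes S3 credentials (for example `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`) are supplied separately in your environment.

```python
import os


def metaflow_s3_env(bucket_name, endpoint_url):
    """Build the three Metaflow settings listed above as a dict.

    bucket_name and endpoint_url are supplied by the caller; nothing here
    contacts AIStor or validates that the bucket exists.
    """
    return {
        "METAFLOW_DEFAULT_DATASTORE": "s3",
        "METAFLOW_DATASTORE_SYSROOT_S3": f"s3://{bucket_name}",
        "METAFLOW_S3_ENDPOINT_URL": endpoint_url,
    }


# Hypothetical example values: replace with your bucket and AIStor endpoint.
settings = metaflow_s3_env("metaflow-data", "http://minio.example.internal:9000/")

# Export the settings into the current process environment so that any
# Metaflow code started from this process can pick them up.
os.environ.update(settings)

# Print in KEY=VALUE form, which can also be pasted into a shell or env file.
for name, value in settings.items():
    print(f"{name}={value}")
```

It is probably safest to export these values before Metaflow is imported or launched, on the assumption that configuration is read at startup.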

Once these variables are set, launch Metaflow.

It's as simple as that.

Grow based on need 

Metaflow, just like AIStor, is designed to grow as you need. Most data scientists and AI/ML engineers want to start with test data in their local environment, without the complexities of infrastructure, so that the focus stays on the product and not on scaling. Once the product is proven, it is time to scale and to deal with the nuances that come with it.

If you have any questions about AIStor, be sure to reach out to us at hello@min.io.