Creating an ML Scenario in SAP Data Intelligence Cloud to Read and Model Data in MinIO

Enterprise customers use MinIO to build data lakehouses to store a wide variety of structured and unstructured data, and work with it using ML and analytics. Data flows into MinIO from across the enterprise, and the S3 API allows applications, such as analytics and AI/ML, to work with it. I previously blogged about building data pipelines with SAP Data

Read more...

Object Detection Made Simple with MinIO and YOLO

Tl;dr: In this post, we will create a custom image dataset and then train a You-Only-Look-Once (YOLO) model for the ubiquitous task of object detection. We will then implement a system using MinIO Bucket Notifications that can automatically perform inference on a new image. Introduction: Computer vision remains an extremely compelling application of artificial intelligence. Whether it’s recognizing

Read more...

A Developer’s Introduction to Apache Iceberg using MinIO

Introduction: Open Table Formats (OTFs) are a phenomenon in the data analytics world that has been gaining momentum recently. The promise of OTFs is as a solution that leverages distributed computing and distributed object stores to provide capabilities that exceed what is possible with a Data Warehouse. The open aspect of these formats gives organizations options when it comes to

Read more...

MLflow Model Registry and MinIO

Introduction: MLflow Model Registry allows you to manage models that are destined for a production environment. This post picks up where my last post on MLflow Tracking left off. In my Tracking post, I showed how to log parameters, metrics, artifacts, and models. If you have not read it, then give it a read when you get a chance. In

Read more...

Anomaly Detection from Log Files: The Performance at Scale Use Case

Driving competitive advantage by employing the best technologies separates great operators from good operators. Discovering the hidden gems in your corporate data and then presenting key actionable insights to your clients will help create an indispensable service for them, and isn’t this what every executive wishes to create? Cloud-based data storage (led by the likes of Amazon S3,

Read more...

MLflow Tracking and MinIO

Introduction: It’s challenging to keep track of machine learning experiments. Let’s say you have a collection of raw files in a MinIO bucket to be used to train and test a model. There will always be multiple ways to preprocess the data, engineer features, and design the model. Given all these options, you will want to run many

Read more...

AI/ML Best Practices During a Gold Rush

Introduction: The California Gold Rush started in 1848 and lasted until 1855. It is estimated that approximately 300,000 people migrated to California from other parts of the United States and abroad. Economic estimates suggest that, on average, only half made a modest profit. The other half either lost money or broke even. Very few gold seekers made a significant

Read more...

Parallel ML Experimentation leveraging MinIO & lakeFS

Introduction: This post was written in collaboration with Iddo Avneri from lakeFS. Managing the growing complexity of ML models and the ever-increasing volume of data has become a daunting challenge for ML practitioners. Efficient data management and data version control are now critical aspects of successful ML workflows. In this blog post, we delve into the power of parallel ML

Read more...

Setting up a Development Machine with MLFlow and MinIO

About MLflow: MLflow is an open-source platform designed to manage the complete machine learning lifecycle. Databricks created it as an internal project to address challenges faced in their own machine learning development and deployment processes. MLflow was later released as an open-source project in June 2018. As a tool for managing the complete lifecycle, MLflow contains the following components. * MLflow

Read more...

Enhance Large Language Models Leveraging RAG and MinIO on cnvrg.io

This post was written in collaboration with Harinder Mashiana from cnvrg.io. Large language models (LLMs) have revolutionized the world of technology, offering powerful capabilities for text analysis, language translation, and chatbot interactions. The revolution will heavily impact businesses: according to OpenAI, approximately 80% of the U.S. workforce could have at least 10% of their work tasks affected by

Read more...

Object Management for AI/ML

Introduction: In a few previous posts on AI/ML, I mentioned that one of the benefits of MinIO is that you have tools for Versioning, Lifecycle Management, Object Locking, Object Retention and Legal Holds. These capabilities have a variety of uses. You may need a simple way to keep track of training experiments. You could also use these features to

Read more...

The Architect’s Guide to Storage for AI

This post first appeared in The New Stack. Developers gravitate to technologies that are software defined, open source, cloud native and simple. That essentially defines object storage. Introduction: Choosing the best storage for all phases of a machine learning (ML) project is critical. Research engineers need to create multiple versions of datasets and experiment with different model architectures. When a

Read more...