MinIO Batch Framework Adds Support for Expiry

You can now perform S3 Delete operations using the MinIO Batch Framework to remove large numbers of objects with a single API request. The MinIO Batch Framework lets you quickly and easily perform repetitive or bulk actions, such as Batch Replication and Batch Key-Rotate, across your MinIO deployment. The MinIO Batch Framework handles all the manual work, including managing retries and reporting

The Blog Year in Review: Top 10 for 2023

With only a few days left in 2023 (who else can’t believe it?), we have been taking some time to look back on what an amazing year it’s been. There have been so many highlights. Whether it’s the many awards, the conferences, or meeting so many of you, we are eternally grateful! The biggest part of MinIO

Distributed Training and Experiment Tracking with Ray Train, MLflow, and MinIO

Over the past few months, I have written about a number of different technologies (Ray Data, Ray Train, and MLflow). I thought it would make sense to pull them all together and deliver an easy-to-understand recipe for distributed data preprocessing and distributed training, using a production-ready MLOps tool for tracking and model serving. This post integrates the code I presented

Recent Launch of Amazon S3 Express One Zone Validates That Object Storage is Primary Storage for AI

We have made the case for several years that in modern data stacks, object storage is primary storage. This is even more true in the age of AI, where enterprises focus almost exclusively on object storage. The modern data stack relies on disaggregated compute and storage alongside cloud-native microservices running in containers on Kubernetes. As more enterprises shift to this

Data Science and AI with a SQL Server 2022 Data Lakehouse

Microsoft SQL Server 2022 is one of the most commonly implemented enterprise relational databases. Many of the world's most successful companies, regardless of vertical, have significant SQL Server deployments. Thousands of companies have relied on SQL Server for decades. Microsoft has made great strides over the past decade in embracing open-source and standards-compliant technologies. The result is that

Scaling up MinIO Internal Connectivity

A MinIO cluster operates as a uniform cluster. This means that any server must be able to seamlessly handle any request. As a consequence, servers need to coordinate among themselves. So far, this has been handled with traditional HTTP RPC requests, and this approach has served us well. Whenever server A wanted to call server B, an HTTP request would

Airgapped MinIO Deployments

In this post, we’ll talk about what an airgapped network is, what to consider when deploying MinIO in such an environment, and how to then replicate and scale that deployment with other airgapped sites.

Distributed Data Processing with Ray Data and MinIO

Distributed data processing is a key component of an efficient end-to-end distributed machine learning training pipeline. This is true if you are building a basic neural network for statistical predictions, where distributed training could mean each experiment runs in 10 minutes instead of an hour. It is also true if you are training or fine-tuning a Large Language Model (LLM) where

AI/ML Reproducibility with lakeFS and MinIO

This post was written in collaboration with Amit Kesarwani from lakeFS. The reality of running multiple machine learning experiments is that managing them can become unpredictable and complicated, especially in a team environment. What often happens is that during the research process, teams constantly change configuration and data between experiments. For example, they try several training sets and several hyperparameter