Building a Scalable, Data Sovereign National ID System

Some of the smartest minds in philanthropy are backing the concept of a simple yet powerful national ID system. The Bill and Melinda Gates Foundation, the Tata Trusts, the Omidyar Network and the Pratiksha Trust have all gotten involved with this movement because of its foundational capabilities for enabling a wide range of social programmes. They have put their resources

Read more...

Percona Streaming Backup

What is streaming mode? Essentially, it allows you to back up with Percona XtraBackup without touching disk. When used alongside MinIO Jumbo, it is designed to upload and retrieve large objects from the MinIO cluster.

Read more...

Creating an ML Scenario in SAP Data Intelligence Cloud to Read and Model Data in MinIO

Enterprise customers use MinIO to build data lakehouses that store a wide variety of structured and unstructured data, and work with it using ML and analytics. Data flows into MinIO from across the enterprise, and the S3 API allows applications, such as analytics and AI/ML, to work with it. I previously blogged about building data pipelines with SAP Data

Read more...

The Disruptive Nature of Data Lakehouses

In 1997, Clayton Christensen, in his book The Innovator’s Dilemma, identified a pattern of innovation that tracked the capabilities, cost, and adoption by market segment between an incumbent and a new entrant. He labeled this pattern “Disruptive Innovation.” Not every successful product is disruptive, even if it causes well-established businesses to lose market share or even fail

Read more...

A Developer’s Introduction to Apache Iceberg using MinIO

Open Table Formats (OTFs) are a phenomenon in the data analytics world that has been gaining momentum recently. OTFs promise a solution that leverages distributed computing and distributed object stores to provide capabilities that exceed what is possible with a Data Warehouse. The open aspect of these formats gives organizations options when it comes to

Read more...

MLflow Model Registry and MinIO

MLflow Model Registry allows you to manage models that are destined for a production environment. This post picks up where my last post on MLflow Tracking left off. In my Tracking post, I showed how to log parameters, metrics, artifacts, and models. If you have not read it, then give it a read when you get a chance. In

Read more...

MLflow Tracking and MinIO

It’s challenging to keep track of machine learning experiments. Let’s say you have a collection of raw files in a MinIO bucket to be used to train and test a model. There will always be multiple ways to preprocess the data, engineer features, and design the model. Given all these options, you will want to run many

Read more...

Parallel ML Experimentation leveraging MinIO & lakeFS

This post was written in collaboration with Iddo Avneri from lakeFS. Managing the growing complexity of ML models and the ever-increasing volume of data has become a daunting challenge for ML practitioners. Efficient data management and data version control are now critical aspects of successful ML workflows. In this blog post, we delve into the power of parallel ML

Read more...

Get Started with MinIO on Red Hat OpenShift for a PoC

When we announced the availability of MinIO on Red Hat OpenShift, we didn’t anticipate that demand would be so great that we would someday write a series of blog posts about this powerful combination. This combination is being rapidly adopted due to the ubiquitous nature of on-prem cloud and the need of large organizations wanting to bring their data

Read more...