The Disruptive Nature of Data Lakehouses

The Disruptive Nature of Data Lakehouses

Introduction In 1997, Clayton Christensen, in his book The Innovator’s Dilemma, identified a pattern of innovation that tracked the capabilities, cost, and adoption by market segment between an incumbent and a new entrant. He labeled this pattern “Disruptive Innovation.” Not every successful product is disruptive - even if it causes well-established businesses to lose market share or even fail

Read more...

A Developer’s Introduction to Apache Iceberg using MinIO

A Developer’s Introduction to Apache Iceberg using MinIO

Introduction Open Table Formats (OTFs) are a phenomenon in the data analytics world that has been gaining momentum recently. The promise of OTFs is as a solution that leverages distributed computing and distributed object stores to provide capabilities that exceed what is possible with a Data Warehouse.  The open aspect of these formats gives organizations options when it comes to

Read more...

MLflow Model Registry and MinIO

MLflow Model Registry and MinIO

Introduction MLflow Model Registry allows you to manage models that are destined for a production environment. This post picks up where my last post on MLflow Tracking left off. In my Tracking post I showed how to log parameters, metrics, artifacts, and models. If you have not read it, then give it a read when you get a chance. In

Read more...

MLflow Tracking and MinIO

MLflow Tracking and MinIO

Introduction It’s challenging to keep track of machine learning experiments. Let’s say you have a collection of raw files in a MinIO bucket to be used to train and test a model. There will always be multiple ways to preprocess the data, engineer features, and design the model. Given all these options, you will want to run many

Read more...

Parallel ML Experimentation leveraging MinIO & lakeFS

Parallel ML Experimentation leveraging MinIO & lakeFS

Introduction This post was written in collaboration with Iddo Avneri from lakeFS. Managing the growing complexity of ML models and the ever-increasing volume of data has become a daunting challenge for ML practitioners. Efficient data management and data version control are now critical aspects of successful ML workflows. In this blog post, we delve into the power of parallel ML

Read more...

Get Started with MinIO on Red Hat OpenShift for a PoC

Get Started with MinIO on Red Hat OpenShift for a PoC

When we announced the availability of MinIO on Red Hat OpenShift, we didn’t anticipate that demand would be so great that we would someday write a series of blog posts about this powerful combination. This combination is being rapidly adopted due to the ubiquitous nature of on-prem cloud and the need of large organizations wanting to bring their data

Read more...

Setting up a Development Machine with MLFlow and MinIO

Setting up a Development Machine with MLFlow and MinIO

About MLflow MLflow is an open-source platform designed to manage the complete machine learning lifecycle. Databricks created it as an internal project to address challenges faced in their own machine learning development and deployment processes. MLflow was later released as an open-source project in June 2018. As a tool for managing the complete lifecycle, MLflow contains the following components. * MLflow

Read more...

Using InfluxDB with MinIO

Using InfluxDB with MinIO

InfluxDB is built on the same ethos as MinIO. It is a single Go binary, cloud agnostic, lightweight, but is also feature packed with things like replication and encryption, and it provides integrations with various applications.

Read more...