MinIO Blog (Page 6)

Building on the Lessons from Kafka: How AutoMQ and MinIO Solve Cost and Elasticity Challenges

Brenna Buuck on Apache Kafka | 30 July 2024

AutoMQ enhances Kafka's architecture by using MinIO's object storage, cutting costs, and boosting elasticity while keeping Kafka API compatibility. This combo offers scalable, secure, and efficient data streaming, ideal for diverse cloud environments.

The Rise of Iceberg: Transforming Data Architectures

Brenna Buuck on Apache Iceberg | 29 July 2024

Iceberg is shifting the market's focus to scalable, cloud-native storage. This shift is leading to the commoditization of query engines, offering users more flexibility, better pricing, and innovation.

Build a Distributed Embedding Subsystem with MinIO, Langchain, and Ray Data

Keith Pijanowski on AI/ML | 29 July 2024

An embedding subsystem is one of four subsystems needed to implement Retrieval Augmented Generation. It turns your custom corpus into a database of vectors that can be searched for semantic meaning. The other subsystems are the data pipeline for creating your custom corpus, the retriever for querying the vector database to add more context to a user query, and finally,

The Catalog’s “IT” Moment and What it Means For Object Storage and AI

Brenna Buuck on Modern Data Lakes | 25 July 2024

Catalogs are revolutionizing modern datalakes, with industry giants like Databricks and Snowflake adopting Apache Iceberg’s catalog REST API. A commitment to open standards enhances performance, fosters innovation, and transforms data management for AI and ML.

A Closer Look: MinIO Observability

AJ on Observability | 24 July 2024

Observability is all about gathering information (traces, logs, metrics) with the goal of improving performance, reliability, and availability.

Bringing ARM into the AI Data Infrastructure Fold at MinIO Using SVE

Frank Wessels on AI/ML | 22 July 2024

One of the reasons that MinIO is so performant is that we do the granular work that others will not or cannot. From SIMD acceleration to the AVX-512 optimizations, we have done the hard stuff. Recent developments for the ARM CPU architecture, in particular Scalable Vector Extensions (SVE), presented us with the opportunity to deliver significant performance and efficiency gains

The Architect's Guide to the New Private Cloud

Ugur Tigli on Modern Data Lakes | 22 July 2024

This post initially appeared on The New Stack. For a few years there, the term “private cloud” had a negative connotation. But as we know, technology is more of a wheel than an arrow, and right on cue, the private cloud is getting a ton of attention, all of it positive. The statistics are clear: Forrester’s 2023 Infrastructure

Enhancing Modern Datalakes with a Robust Semantic Layer

Brenna Buuck on Modern Data Lakes | 17 July 2024

The semantic layer in modern datalakes provides context and structure to raw data, crucial for key data initiatives like AI model training, data management and data governance. A unified strategy and robust infrastructure are essential for effective implementation of the semantic layer.

The App Store of OpenShift: MinIO in OperatorHub

AJ, Cesar Celis Hernandez on Red Hat OpenShift | 17 July 2024

Simply put, OperatorHub is to OpenShift what the App Store is to Apple. With a web console interface, an Operator can be pulled from its off-cluster source, installed and subscribed on the cluster, and made ready for engineering teams to manage the product on a self-service basis across deployment environments.

Architecting a Modern Data Lake

Raghav Karnam on Architect's Guide | 16 July 2024

The Modern Datalake is one-half data warehouse and one-half data lake and uses object storage for everything. The use of object storage to build a data warehouse is made possible by Open Table Formats (OTFs) like Apache Iceberg, Apache Hudi, and Delta Lake, which are specifications that, once implemented, make it seamless for object storage to be used as the

Data-Centric AI with Snorkel and MinIO

Keith Pijanowski on AI/ML | 10 July 2024

With all the talk in the industry today regarding large language models with their encoders, decoders, multi-headed attention layers, and billions (soon trillions) of parameters, it is tempting to believe that good AI is the result of model design only. Unfortunately, this is not the case. Good AI requires more than a well-designed model. It also requires properly constructed training

MinIO hits it out of the Boundary

AJ on Object Storage | 10 July 2024

Boundary helps record SSH sessions to meet compliance requirements and improve security. These sessions are then stored on MinIO for fast retrieval during audits, for example after a data breach.

The Significance of Databricks' Acquisition of Tabular: A Triumph for Open Frameworks in Data

Brenna Buuck on Apache Iceberg | 3 July 2024

Databricks' acquisition of Tabular, founded by the creators of Apache Iceberg, underscores the importance of open frameworks in modern data lake design. Open frameworks ensure interoperability, flexibility, and simplicity, benefiting those leveraging data for AI.

The Architect's Guide to Machine Learning Operations (MLOps)

Keith Pijanowski on AI/ML | 28 June 2024

MLOps, short for Machine Learning Operations, is a set of practices and tools aimed at addressing the specific needs of engineers building models and moving them into production. Some organizations start off with a few homegrown tools that version datasets after each experiment and checkpoint models after every epoch of training. On the other hand, many organizations have chosen to

Migrate to AI-Ready infrastructure: Hitachi Content Platform to MinIO

Brenna Buuck on AI/ML | 26 June 2024

Migrate from Hitachi Content Platform (HCP) to MinIO using the HCP-to-MinIO tool. Migration is a no-brainer given how MinIO offers modern, scalable, high-performance storage optimized for AI.

Earn your RAG-ing rights with MinIO

Dileeshvar Radhakrishnan, AJ on AI/ML | 26 June 2024

In this blog, we will demonstrate how to use MinIO to build a Retrieval Augmented Generation (RAG)-based chat application using commodity hardware.

The Real Reasons Why AI is Built on Object Storage

Sidharth Rajaram @sidharrrrrth on AI/ML | 24 June 2024

tl;dr: In this post, we will explore four technical reasons why AI workloads rely on a high-performance object store. 1. No Limits on Unstructured Data: In the current paradigm of machine learning, performance and ability scale with compute, which is really a proxy for dataset size and model size (Scaling Laws for Neural Language Models, Kaplan et al.). Over

The Architect’s Guide to the GenAI Tech Stack - Ten Tools

Keith Pijanowski on AI/ML | 24 June 2024

This post first appeared on The New Stack on June 3rd, 2024. I previously wrote about the modern data lake reference architecture, addressing the challenges in every enterprise — more data, aging Hadoop tooling (specifically HDFS) and greater demands for RESTful APIs (S3) and performance — but I want to fill in some gaps. The modern data lake, sometimes referred to as

WARP speed your AI data storage Infrastructure

AJ AJ on AI/ML | 19 June 2024

Do you know the secret to some of the best AI models out there? It's the amount of data they were trained on. For AI/ML models, fast, accessible data is king. Let me emphasize: it's not just data, but fast, accessible data.

Dell ECS Data Movement to MinIO

AJ on Cloud Repatriation | 5 June 2024

Dell ECS's “Data Movement”, also called copy-to-cloud, is a feature introduced in ECS 3.8.0.1 that allows you to copy objects from Dell ECS to MinIO, which is rather popular with customers and prospects who are modernizing their storage stack to support their AI data infrastructure requirements.