AI/ML - MinIO Blog (Page 3)

The Real Reasons Why AI is Built on Object Storage

Sidharth Rajaram @sidharrrrrth on AI/ML | 24 June 2024

tl;dr: In this post, we will explore four technical reasons why AI workloads rely on high-performance object storage. 1. No Limits on Unstructured Data In the current paradigm of machine learning, performance and ability scale with compute, which is really a proxy for dataset size and model size (Scaling Laws for Neural Language Models, Kaplan et al.). Over

The Architect’s Guide to the GenAI Tech Stack - Ten Tools

Keith Pijanowski on AI/ML | 24 June 2024

This post first appeared on The New Stack on June 3rd, 2024. I previously wrote about the modern data lake reference architecture, addressing the challenges in every enterprise — more data, aging Hadoop tooling (specifically HDFS) and greater demands for RESTful APIs (S3) and performance — but I want to fill in some gaps. The modern data lake, sometimes referred to as

WARP speed your AI data storage Infrastructure

AJ on AI/ML | 19 June 2024

Do you know the secret to some of the best AI models out there? It's the amount of data they had access to for training. For AI/ML models, fast, accessible data is king. Let me emphasize: it's not just data, but fast, accessible data.

Dell ECS Data Movement to MinIO

AJ on Cloud Repatriation | 5 June 2024

Dell ECS's “Data Movement”, also called copy-to-cloud, is a feature introduced in ECS 3.8.0.1 that allows you to copy objects from Dell ECS to MinIO, which is rather popular with customers and prospects who are modernizing their storage stack to support their AI data infrastructure requirements.

The Future of Hybrid Cloud Pipelines: Integrating MinIO, Tailscale, and GitHub Actions

David Cannan on DevOps | 24 May 2024

Streamline your data processing capabilities, ensuring high-quality data management and secure operations. This integration not only enhances workflow automation but also leverages the advanced functionalities of MinIO and Tailscale, providing a powerful solution for modern data processing needs.

MinIO Audit Logs in ElasticSearch in Kubernetes

AJ on AI/ML | 22 May 2024

Whether you are on-prem or in the cloud, you want to ensure that, in the cloud operating model, processes are set up in a homogeneous way. This tutorial gives you a full overview of how to surface MinIO audit logs in Elasticsearch so they are searchable.

Model Training and MLOps using MLRun and MinIO

Keith Pijanowski on AI/ML | 20 May 2024

In my previous post on MLRun, we set up a development machine with all the tools needed to experiment with MLRun. Specifically, we used a docker-compose file to create containers for the MLRun UI, the MLRun API Service, Nuclio, MinIO, and a Jupyter service. Once our containers started, we ran a simple smoke test to ensure everything was working correctly.

Essentials for AI Infrastructure—the AI in Business Podcast with AB Periasamy and Matthew DeMello

Sasha Wodtke on AI/ML | 16 May 2024

MinIO’s co-founder and CEO AB Periasamy was recently featured on the AI in Business Podcast where he had a rich conversation with Matthew DeMello—Senior Editor at Emerj—about AI infrastructure and object storage for enterprises. In this blog post, we take you through an abridged version of what was discussed. Let’s get into it. AB and Matthew

Setting Up A Development Machine with MLRun and MinIO

Keith Pijanowski on AI/ML | 10 May 2024

MLOps is to machine learning what DevOps is to traditional software development. Both are a set of practices and principles aimed at improving collaboration between engineering teams (the Dev or ML) and IT operations (Ops) teams. The goal is to streamline the development lifecycle, from planning and development to deployment and operations, using automation. One of the primary benefits of

Deploy MinIO and Trino with Kubernetes

AJ on AI/ML | 8 May 2024

In this tutorial, we'll deploy a cohesive system that allows distributed SQL querying across large datasets stored in MinIO, with Trino leveraging metadata from Hive Metastore and table schemas from Redis.

Manually Rebalance your MinIO Modern Datalake

AJ on AI/ML | 1 May 2024

When a MinIO Modern Datalake deployment is extended by adding a new server pool, it does not rebalance objects by default. Let's dive deep and learn how to rebalance smoothly without affecting cluster operations.

Stateful KES for AI/ML Workloads

AJ on AI/ML | 30 April 2024

Implementing KES within Kubernetes in a stateful configuration ensures that encryption keys persist through pod lifecycle events and restarts. This setup offers resilience, especially in environments where relying on an external KMS is not an option or not preferred.

Migrating from Hadoop without Rip and Replace

Brenna Buuck on Apache Hadoop | 29 April 2024

Discover how to seamlessly migrate from HDFS to modern object storage without ripping out all of your current systems. Learn valuable strategies to retain essential tools and modernize your infrastructure for AI/ML.

Optimizing AI Data Processing with MinIO Weaviate and Langchain in Retrieval Augmented Generation (RAG) Pipelines

David Cannan on AI/ML | 29 April 2024

Delve into AI’s next frontier with MinIO S3 Object-Store and SDK, enhancing a Weaviate Retrieval Augmented Generation (RAG) Pipeline for robust data management. Discover how to elevate efficiency in AI systems using LangChain, unlocking new dimensions in scalable AI solutions.

Improve RAG Performance with Open-Parse Intelligent Chunking

Keith Pijanowski on AI/ML | 24 April 2024

If you are implementing a generative AI solution using Large Language Models (LLMs), you should consider a strategy that uses Retrieval-Augmented Generation (RAG) to build contextually aware prompts for your LLM. An important process that occurs in the preproduction pipeline of a RAG-enabled LLM is the chunking of document text so that only the most relevant sections of a document

Navigating the Waters: Building Production-Grade RAG Applications with Data Lakes

Sam Cooper on AI/ML | 11 April 2024

In mid-2024, creating an AI demo that impresses and excites can be easy. Take a strong developer, some clever prompt experimentation, and a few API calls to a powerful foundation model and you can often build a bespoke AI bot in an afternoon. Add in a library like langchain or llamaindex to augment your LLM with a bit of custom

Building Next-Gen Data Solutions: SingleStore, MinIO, and the Modern Datalake Stack

Brenna Buuck on Modern Data Lakes | 9 April 2024

Explore the integration of SingleStore, a high-performance cloud-native database, with MinIO in the Modern Datalake Stack. This tutorial provides hands-on experience in data storage, processing, and querying, fostering experimentation and innovation in data management, analytics, and AI workloads.

Building and Deploying a MinIO-Powered LangChain Agent API with LangServe

David Cannan on AI/ML | 9 April 2024

Explore the exciting possibilities of leveraging MinIO and LangChain to create a robust and efficient agent capable of handling complex data processing tasks.

Towards Exascale AI Data Infrastructure

Rakshith Venkatesh on Cloud Native | 2 April 2024

It's been just over a week for me here at MinIO. The big takeaway from immersing myself in whiteboarding sessions, architecture reviews and customer calls is that the simplicity of the product is both its distinguishing feature and one of its most defining value drivers. This is particularly true at scale. The explosive growth in computing power due

The Full Stack AI Engineer: A Modern-Day Polymath

Keith Pijanowski on AI/ML | 2 April 2024

Anyone who has worked in a team environment knows that every successful team has one go-to person—that special individual who can help you regardless of the nature of your problem. On a traditional software development team, this individual is an expert programmer and is also an expert in one other technology, which could be a database technology like Snowflake