AI/ML - MinIO Blog (Page 4)

Architect’s Guide to a Reference Architecture for an AI/ML Datalake

Keith Pijanowski on Architect's Guide | 26 March 2024

An abbreviated version of this post appeared on The New Stack on March 19th, 2024. In enterprise artificial intelligence, there are two main types of models: discriminative and generative. Discriminative models are used to classify or predict data, while generative models are used to create new data. Even though Generative AI has dominated the news of late, organizations are still

Powering AI/ML Innovation: Building Feature Stores with MinIO’s High-Performance Object Storage

David Cannan on AI/ML | 12 March 2024

MinIO’s high-performance object storage is key for AI innovation, offering scalability and integration for feature stores. Its capabilities enable seamless ML workflows, enhancing data management for AI development and deployment, impacting sectors like e-commerce and healthcare.

MinIO Cache: A Distributed DRAM Cache for Ultra-Performance

Keith Pijanowski on AI/ML | 12 March 2024

As the computing world has evolved and the price of DRAM has plummeted, we find that server configurations often come with 500GB or more of DRAM. When you are dealing with larger deployments, even those with ultra-dense NVMe drives, the number of servers multiplied by the DRAM on those servers can quickly add up – often to several TBs. That DRAM

Dynamic ETL Pipeline: Hydrate AI with Web Data for MinIO and Weaviate using Unstructured-IO

David Cannan on AI/ML | 27 February 2024

Unstructured-IO, MinIO, & Weaviate redefine ETL, turning unstructured web data into actionable insights. This collaboration enhances data management, offering a robust solution for dynamic data transformation and analysis, marking a leap in how we process and leverage web-generated content.

Developing Langchain Agents with the MinIO SDK for LLM Tool-Use

David Cannan on AI/ML | 20 February 2024

Explore Langchain’s LLM Tool-Use and leverage Langgraph for monitoring MinIO’s S3 Object Store. This guide walks you through developing custom conversational AI agents and creating powerful OpenAI LLM chains for efficient data management and enhanced application functionality.

Powering AI/ML workflows with GitOps Automation

David Cannan on AI/ML | 13 February 2024

Explore the fusion of GitOps, MinIO, Weaviate, and Python in AI development for unparalleled automation and innovation. This combination offers a solid foundation for creating scalable, efficient, and automated AI solutions, propelling projects from concept to reality with ease.

Automated Data Prep for ML with MinIO's SDK

Brenna Buuck on AI/ML | 8 February 2024

This tutorial guides you through constructing robust data pipelines on the edge, ensuring flexibility and scalability. Learn to create, populate, and transform datasets seamlessly while prioritizing data privacy. Master the art of automation with MinIO's Python SDK.

Backing Up Weaviate with MinIO S3 Buckets

David Cannan on AI/ML | 6 February 2024

Explore integrating MinIO with Weaviate using Docker Compose for AI-enhanced data management. Learn to back up Weaviate to MinIO S3 buckets, ensuring data integrity and scalability with practical Docker and Python examples. Streamline your AI-driven search and analysis with this robust setup.

SQL Server 2022 Machine Learning Services Unlock the Value of Your Data

Matt Sarrel @msarrel on Integrations | 6 February 2024

Learn how to run Python stored procedures on SQL Server 2022.

MinIO and Apache Tika: A Pattern for Text Extraction

Sidharth Rajaram @sidharrrrrth on AI/ML | 2 February 2024

TL;DR: In this post, we will use MinIO Bucket Notifications and Apache Tika for document text extraction, which is at the heart of critical downstream tasks like Large Language Model (LLM) training and Retrieval Augmented Generation (RAG). The Premise: Let’s say that I want to construct a dataset of text that I can then use to fine-tune an

Hungry GPUs Need Fast Object Storage

Keith Pijanowski on AI/ML | 31 January 2024

A chain is only as strong as its weakest link - and your AI/ML infrastructure is only as fast as your slowest component. If you train machine learning models with GPUs, then your weak link may be your storage solution. The result is what I call the “Starving GPU Problem.” The Starving GPU Problem occurs when your network or your

Why Your Enterprise AI Strategy Is Likely to Fail in 2024: Model Down vs. Data Up

Jonathan Symonds on AI/ML | 30 January 2024

I suspect some folks will accuse me of clickbait titling. Others will say, that’s not really a reach - most folks will fail in their initial AI attempts but it doesn’t matter and the learnings are worth it. On some level both are right - but I think WHY enterprises will fail is worth exploration and may allow

Innovating S3 Bucket Retrieval: Langchain Community S3 Loaders with OpenAI API

David Cannan on AI/ML | 30 January 2024

Explore the synergy of MinIO, Langchain, and OpenAI in enhancing data storage and processing. This article illustrates MinIO’s integration for efficient document summarization using Langchain and OpenAI’s GPT, revolutionizing AI and ML data handling.

Data Before Models: The Unsung Heroes Who Unlock Real AI Results

Brenna Buuck on AI/ML | 29 January 2024

Explore the essential role of Data Engineers in unleashing the true power of AI! Data Engineers lay a critical foundation for ML success by cleaning and structuring raw data. Learn why their expertise in data infrastructure, feature engineering, and pipeline optimization is indispensable.

The Strengths, Weaknesses and Dangers of LLMs

Sidharth Rajaram @sidharrrrrth, Keith Pijanowski on AI/ML | 25 January 2024

Much has been said lately about the wonders of Large Language Models (LLMs). Most of these accolades are deserved. Ask ChatGPT to describe the General Theory of Relativity and you will get a very good (and accurate) answer. However, at the end of the day ChatGPT is still a computer program (as are all other LLMs) that is blindly executing

The Future of AI is Open-Source

Brenna Buuck on Open Source | 15 January 2024

Explore the future of AI in an open-source landscape, challenging Big Tech's masked efforts. Learn how embracing extreme open innovation fosters collaboration, drives market growth, and sets the stage for an open-source AI data stack.

LanceDB: Your Trusted Steed in the Joust Against Data Complexity

Brenna Buuck on Vector Database | 29 December 2023

Joust against data complexity with LanceDB, a lightning-fast vector database optimized for AI/ML on the open-source Lance format. Teaming up with MinIO, it scales seamlessly, offering high-performance, cloud-native storage. Dive into the tutorial for a swift deployment.

Distributed Training and Experiment Tracking with Ray Train, MLflow, and MinIO

Keith Pijanowski on AI/ML | 28 December 2023

Over the past few months, I have written about a number of different technologies (Ray Data, Ray Train, and MLflow). I thought it would make sense to pull them all together and deliver an easy-to-understand recipe for distributed data preprocessing and distributed training using a production-ready MLOps tool for tracking and model serving. This post integrates the code I presented

Distributed Training with Ray Train and MinIO

Keith Pijanowski on AI/ML | 20 December 2023

Most machine learning projects start off as a single-threaded proof of concept where each task is completed before the next task can begin. The single-threaded ML pipeline depicted below is an example. However, at some point, you will outgrow the pipeline shown above. This may be caused by datasets that no longer fit into the memory of a single process.

The Forest Amidst the Trees - The Takeaway from our AI Year

Jonathan Symonds on AI/ML | 20 December 2023

The calendar year 2023 will be a meaningful one, perhaps one of the most meaningful ones, when the history of AI is written. It was, in essence, the big bang. It started in late 2022 with OpenAI’s ChatGPT but it was the response that was so breathtaking. Within months we had Meta’s LLaMA 2, Google’s Bard chatbot