Keith Pijanowski - MinIO Blog

Agentic AI with Model Context Protocol and AIStor

Keith Pijanowski on AI Agents | 27 May 2025

The Model Context Protocol (MCP) from Anthropic represents a unique approach to Agentic AI tooling as compared to many of its competitors. Rather than building a framework (software that calls your code) or a library (software that your code can call), MCP focuses on the protocol needed for different parts of an agent to communicate with each other. This has

NVIDIA GTC 2025 Wrap-up: 18 New Products to Watch

Keith Pijanowski | 19 May 2025

This post first appeared on The New Stack on April 18th, 2025. Get a comprehensive summary of the major compute, networking, storage and partnership announcements from NVIDIA’s biggest event of the year. If you follow the tech news, you have read a lot about NVIDIA and its graphics processing units (GPUs). However, it would be incorrect to conclude that

MLflow Model Registry and MinIO

Keith Pijanowski on AI/ML | 14 March 2025

MLflow Model Registry allows you to manage models that are destined for a production environment. This post picks up where my last post on MLflow Tracking left off. In my Tracking post I showed how to log parameters, metrics, artifacts, and models. If you have not read it, then give it a read when you get a chance. In this

Deploying Models to Kubernetes with AIStor, MLflow and KServe

Keith Pijanowski on AI/ML | 28 February 2025

In several previous posts on MLOps tooling, I showed how many popular MLOps tools track metrics associated with model training experiments. I also showed how they use MinIO to store the unstructured data that is a part of the model training pipeline. However, a good MLOps tool should do more than manage your experiments, datasets, and models. It should be

The Architect’s Guide to Understanding Agentic AI

Keith Pijanowski on AI/ML | 3 February 2025

This post first appeared on The New Stack on January 16th, 2025. Often, while assessing the legitimacy of a new technology receiving a lot of hype, studying existing core capabilities and history is helpful. If the new technology in question is not based on existing or imminent capabilities, we can label it as “hype” and move on. Another litmus test

Mitigating Geopolitical Concerns with a Sovereign Private Cloud

Jelte Eshuis, Keith Pijanowski on Private Cloud | 31 January 2025

2025 has inherited a slew of geopolitical concerns that started years ago: U.S. foreign policy, U.S.-China relations, China’s geopolitical maneuvers, conflicts in the Middle East, the Russia-Ukraine war, and cybersecurity threats. Additionally, new leadership in the United States adds to the uncertainty created by these concerns. And, as if all this were not enough, the

Model Checkpointing using Amazon’s S3 Connector for PyTorch and MinIO

Keith Pijanowski on AI/ML | 17 January 2025

In November of 2023, Amazon announced the S3 Connector for PyTorch. The Amazon S3 Connector for PyTorch provides implementations of PyTorch's dataset primitives (Datasets and DataLoaders) that are purpose-built for S3 object storage. It supports map-style datasets for random data access patterns and iterable-style datasets for streaming sequential data access patterns. The S3 Connector for PyTorch also includes

The Innovations from AWS re:Invent

Keith Pijanowski on AI/ML | 31 December 2024

Earlier this month, Amazon held their re:Invent conference in Las Vegas, Nevada, from December 1st to 5th - a 5-day event. If you have never been to a re:Invent conference, then the word that describes it best is “huge” - not just in terms of the number of attendees (60,000) but also the breadth of topics covered.

Iterable-Style Datasets using Amazon’s S3 Connector for PyTorch and MinIO

Keith Pijanowski on AI/ML | 23 December 2024

In November of 2023 Amazon announced the S3 Connector for PyTorch. The Amazon S3 Connector for PyTorch provides implementations of PyTorch's dataset primitives (Datasets and DataLoaders) that are purpose-built for S3 object storage. It supports map-style datasets for random data access patterns and iterable-style datasets for streaming sequential data access patterns. In a previous post, I introduced the

GPU Trends and What It Means to Your AI Infrastructure

Keith Pijanowski on AI/ML | 27 November 2024

Almost a year ago (11 months ago, to be exact), I wrote about the “Starving GPU Problem”: Nvidia’s Graphics Processing Units (GPUs) can be so powerful that your network and your storage solution may not be able to keep up, preventing your expensive GPUs from being fully utilized. Well, in those short 11 months, a

Revolutionizing Mobile Testing with Big Data and AI

Keith Pijanowski on Case Study | 21 November 2024

A mobile application is a company's brand available on demand. It is a window into any service or product an organization offers. At Kobiton, they understand this—it is their mission to improve mobile applications through testing. Kobiton is a mobile testing platform that allows customers to perform manual and automated testing on real mobile devices from anywhere

AIHub: A Private Cloud Repository for Models and Datasets

Keith Pijanowski on AIStor | 13 November 2024

One of the newest features of AIStor is a private cloud version of the highly popular, open-source project, Hugging Face. This post details how AIStor’s AIHub effectively creates an API-compatible, private cloud version of Hugging Face that is fully under the enterprise's control. Before we get started, it makes sense to introduce Hugging Face. Hugging

Map-Style Datasets using Amazon’s S3 Connector for PyTorch and MinIO

Keith Pijanowski on AI/ML | 31 October 2024

Before diving into Amazon’s S3 Connector for PyTorch, it is worthwhile to introduce the problem it is intended to solve. Many AI models need to be trained on data that cannot fit into memory. Furthermore, many really interesting models being built for computer vision and generative AI use data that cannot even fit on the disk drive that comes

An Easier Path to Scalable AI: Intel Tiber Developer Cloud + MinIO Object Store

Keith Pijanowski on AI/ML | 24 October 2024

One of the biggest challenges facing organizations today for AI and data management is access to reliable infrastructure and compute resources. The Intel Tiber Developer Cloud is purpose-built for engineers who need an environment for proof-of-concepts, experimentation, model training, and service deployments. Unlike other clouds, which can be unapproachable and complex, the Intel Tiber Developer Cloud is simple and easy

Replication, Data Consolidation, and Data Migration

Keith Pijanowski on Case Study | 24 October 2024

Parsec Labs is a company of engineers. Most have designed storage systems, been responsible for backups and replication, or worked in networking building switches. Founded in 2013, their Unified Data Mobility and Protection Appliance provides the most straightforward tools for migrating, replicating, and backing up data at scale. A Common Request As a one-time pre-sales engineer, Mark Clark, CEO of

Microblink: Repatriating Compute and Storage with MinIO

Keith Pijanowski on Case Study | 29 August 2024

Microblink is an AI company specializing in image detection. They got their start in the identity space with products like BlinkID, BlinkID Verify, and BlinkCard. Most recently, their image detection capabilities have led to products that can process other types of images. For example, product detection can be performed on receipts, whereby product descriptions on a receipt are used to

Open Source or Closed? The AI Dilemma

Keith Pijanowski on AI/ML | 26 August 2024

This post first appeared on The New Stack on July 29th, 2024. Artificial Intelligence is in the middle of a perfect storm in the software industry, and now Mark Zuckerberg is calling for open-sourced AI. Three powerful perspectives are colliding on how to control AI: 1. All AI should be open-source for sharing and transparency. 2. Keep AI closed-source and

Build a Distributed Embedding Subsystem with MinIO, Langchain, and Ray Data

Keith Pijanowski on AI/ML | 29 July 2024

An embedding subsystem is one of four subsystems needed to implement Retrieval Augmented Generation. It turns your custom corpus into a database of vectors that can be searched for semantic meaning. The other subsystems are the data pipeline for creating your custom corpus, the retriever for querying the vector database to add more context to a user query, and finally,

Data-Centric AI with Snorkel and MinIO

Keith Pijanowski on AI/ML | 10 July 2024

With all the talk in the industry today regarding large language models with their encoders, decoders, multi-headed attention layers, and billions (soon trillions) of parameters, it is tempting to believe that good AI is the result of model design only. Unfortunately, this is not the case. Good AI requires more than a well-designed model. It also requires properly constructed training

The Architect's Guide to Machine Learning Operations (MLOps)

Keith Pijanowski on AI/ML | 28 June 2024

MLOps, short for Machine Learning Operations, is a set of practices and tools aimed at addressing the specific needs of engineers building models and moving them into production. Some organizations start off with a few homegrown tools that version datasets after each experiment and checkpoint models after every epoch of training. On the other hand, many organizations have chosen to
