The Strengths, Weaknesses and Dangers of LLMs

The Strengths, Weaknesses and Dangers of LLMs

Much has been said lately about the wonders of Large Language Models (LLMs). Most of these accolades are deserved. Ask ChatGPT to describe the General Theory of Relativity and you will get a very good (and accurate) answer. However, at the end of the day ChatGPT is still a computer program (as are all other LLMs) that is blindly executing

Read more...

Distributed Training and Experiment Tracking with Ray Train, MLflow, and MinIO

Distributed Training and Experiment Tracking with Ray Train, MLflow, and MinIO

Over the past few months, I have written about a number of different technologies (Ray Data, Ray Train, and MLflow). I thought it would make sense to pull them all together and deliver an easy-to-understand recipe for distributed data preprocessing and distributed training using a production-ready MLOPs tool for tracking and model serving. This post integrates the code I presented

Read more...

Data Science and AI with a SQL Server 2022 Data Lakehouse

Data Science and AI with a SQL Server 2022 Data Lakehouse

Microsoft SQL Server 2022 is one of the most commonly implemented enterprise relational databases. Many of the world's most successful companies, regardless of vertical, have significant SQL Server deployments. Thousands of companies have relied on SQL Server for decades. Microsoft has made great strides over the past decade in embracing open-source and standards-compliant technologies. The result is that

Read more...

Distributed Data Processing with Ray Data and MinIO

Distributed Data Processing with Ray Data and MinIO

Introduction Distributed data processing is a key component of an efficient end-to-end distributed machine-learning training pipeline. This is true if you are building a basic neural network for statistical predictions where distributed training could mean each experiment runs in 10 minutes vs. an hour. It is also true if you are training or fine-tuning a Large Language Model (LLM) where

Read more...

AI/ML Reproducibility with lakeFS and MinIO

AI/ML Reproducibility with lakeFS and MinIO

This post was written in collaboration with Amit Kesarwani from lakeFS. The reality of running multiple machine learning experiments is that managing them can become unpredictable and complicated - especially in a team environment. What often happens is that during the research process, teams constantly change configuration and data between experiments. For example, try several training sets and several hyperparameter

Read more...

Generative AI for the Enterprise

Generative AI for the Enterprise

Introduction Generative AI represents the latest technique an enterprise can employ to unlock the data trapped within its boundaries. The easiest way to conceptualize what is possible with Generative AI is to imagine a customized Large Language Model - similar to the one powering ChatGPT - running inside your firewall. Now, this custom LLM is not the same as the

Read more...

An Unintended Consequence of the AI/ML Revolution - Power Shifts in the Enterprise

An Unintended Consequence of the AI/ML Revolution - Power Shifts in the Enterprise

A lot of ink has been spilled on the significance of the AI/ML technology wave (here are our posts). What doesn’t get attention, but probably should, is how AI/ML is remaking the technology power structure inside the enterprise. As companies reorganize around a data-centric orientation, they are also reorganizing who makes and executes the technology architecture. While

Read more...

Creating an ML Scenario in SAP Data Intelligence Cloud to Read and Model Data in MinIO

Creating an ML Scenario in SAP Data Intelligence Cloud to Read and Model Data in MinIO

Enterprise customers use MinIO to build data lakehouses to store a wide variety of structured and unstructured data, and work with it using ML and analytics. Data flows into MinIO from across the enterprise and the S3 API allows applications, such as analytics and AI/ML to work with it.   I previously blogged about building data pipelines with SAP Data

Read more...

Object Detection Made Simple with MinIO and YOLO

Object Detection Made Simple with MinIO and YOLO

Tl;dr: In this post, we will create a custom image dataset and then train a You-Only-Look-Once (YOLO) model for the ubiquitous task of object detection. We will then implement a system using MinIO Bucket Notifications that can automatically perform inference on a new image. Introduction: Computer vision remains an extremely compelling application of artificial intelligence. Whether it’s recognizing

Read more...

A Developer’s Introduction to Apache Iceberg using MinIO

A Developer’s Introduction to Apache Iceberg using MinIO

Introduction Open Table Formats (OTFs) are a phenomenon in the data analytics world that has been gaining momentum recently. The promise of OTFs is as a solution that leverages distributed computing and distributed object stores to provide capabilities that exceed what is possible with a Data Warehouse.  The open aspect of these formats gives organizations options when it comes to

Read more...