You can now perform S3 Delete operations using the MinIO Batch Framework to remove large numbers of objects with a single API request. The MinIO Batch Framework lets you quickly and easily perform repetitive or bulk actions like Batch Replication and Batch Key-Rotate across your MinIO deployment. The MinIO Batch Framework handles all the manual work, including managing retries and reporting
Read more
Joust against data complexity with LanceDB, a lightning-fast vector database optimized for AI/ML on the open-source Lance format. Teaming up with MinIO, it scales seamlessly, offering high-performance, cloud-native storage. Dive into the tutorial for a swift deployment.
Read more
With only a few days left in 2023 (who else can’t believe it?), we have been taking some time to look back on what an amazing year it’s been. There have been so many highlights. Whether it’s been the many awards, conferences, or meeting so many of you, we are eternally grateful!
The biggest part of MinIO
Read more
Over the past few months, I have written about a number of different technologies (Ray Data, Ray Train, and MLflow). I thought it would make sense to pull them all together and deliver an easy-to-understand recipe for distributed data preprocessing and distributed training using a production-ready MLOps tool for tracking and model serving. This post integrates the code I presented
Read more
We have made the case for several years that in modern data stacks object storage is primary storage. This is even more true in the age of AI where enterprises focus almost exclusively on object storage. The modern data stack relies on disaggregated compute and storage alongside cloud-native microservices running in containers on Kubernetes. As more enterprises shift to this
Read more
Most machine learning projects start off as a single-threaded proof of concept where each task is completed before the next task can begin. The single-threaded ML pipeline depicted below is an example.
However, at some point, you will outgrow the pipeline shown above. This may be caused by datasets that no longer fit into the memory of a single process.
Read more
The calendar year 2023 will be a meaningful one, perhaps one of the most meaningful ones, when the history of AI is written. It was, in essence, the big bang.
It started in late 2022 with OpenAI’s ChatGPT but it was the response that was so breathtaking. Within months we had Meta’s LLaMA 2, Google’s Bard chatbot
Read more
Rising interest in super-fast analytical databases like ClickHouse Cloud and MotherDuck highlights the benefits of decoupling storage and compute. This architecture, exemplified in AI applications, enhances scalability, speed, and cost efficiency, and is driving a shift towards object storage.
Read more
Microsoft SQL Server 2022 is one of the most commonly implemented enterprise relational databases. Many of the world's most successful companies, regardless of vertical, have significant SQL Server deployments. Thousands of companies have relied on SQL Server for decades.
Microsoft has made great strides over the past decade in embracing open-source and standards-compliant technologies. The result is that
Read more
A MinIO cluster operates as a uniform cluster. This means that any request must be seamlessly handled by any server. As a consequence, servers need to coordinate between themselves. This has so far been handled with traditional HTTP RPC requests - and this has served us well.
Whenever server A would like to call server B an HTTP request would
Read more
In this post, we’ll talk about what an airgapped network is, what to consider when deploying MinIO in such an environment, and how to replicate and scale it afterward with other airgapped sites.
Read more
Amid the fervor to adopt AI is a critical and often overlooked truth - the success of any AI initiative is intrinsically tied to the quality, reliability and performance of the underlying data infrastructure. If you don't have the proper foundation, you are limited in what you can build and therefore what you can achieve.
Your data infrastructure
Read more
The combination of StarRocks and MinIO offers a cloud-native, flexible, and efficient data architecture for modern enterprises, enabling independent scaling and optimized resource utilization. Read the full tutorial for insights into cloud-native analytics with StarRocks and MinIO.
Read more
Explore the integration of Dockerized MinIO with localhost Flask apps. This guide addresses Docker networking challenges, ensuring seamless MinIO and Flask communication for a development environment that closely mirrors production. Dive into practical solutions for robust workflows.
Read more
In today’s post, we’ll go deeper into some of the considerations for long-term MinIO management that you need to take into account so that, when Day 2 does roll around 48 hours later, you have all your ducks in a row.
Read more
There is an interesting report out from McKinsey on the impending impact of AI on an enterprise’s cloud investments.
There was a quote early on in the piece where McKinsey states: “While the possible impact varies by sector, adopting cloud represents an opportunity for the average company to increase profitability by 20 to 30 percent.”
To many, this would
Read more
Introduction
Distributed data processing is a key component of an efficient end-to-end distributed machine-learning training pipeline. This is true if you are building a basic neural network for statistical predictions where distributed training could mean each experiment runs in 10 minutes vs. an hour. It is also true if you are training or fine-tuning a Large Language Model (LLM) where
Read more
Discover how Databricks and Apache Iceberg's strides in open table formats influence data portability in the modern data stack. Learn how the shift to a private cloud operating model aligns with this evolution, fostering an adaptable, interoperable data ecosystem.
Read more
This post was written in collaboration with Amit Kesarwani from lakeFS.
The reality of running multiple machine learning experiments is that managing them can become unpredictable and complicated - especially in a team environment. What often happens is that during the research process, teams constantly change configuration and data between experiments. For example, try several training sets and several hyperparameter
Read more
As we were writing the blogs on Event Notifications and Object Lambda, we found ourselves asking why there are two different features doing almost the same thing. Or are they? What is the difference between the Greek Lambda and the Lightning Bolt?
Read more