Explore the integration of SingleStore, a high-performance cloud-native database, with MinIO in the Modern Datalake Stack. This tutorial provides hands-on experience in data storage, processing, and querying, fostering experimentation and innovation in data management, analytics, and AI workloads.
Read more
An abbreviated version of this post appeared on The New Stack on March 26th, 2024.
Businesses aiming to maximize their data assets are adopting scalable, flexible, and unified data storage and analytics approaches. This trend is driven by enterprise architects tasked with crafting infrastructures that align with evolving business demands. A Modern Datalake architecture addresses this need by integrating the
Read more
An abbreviated version of this post appeared on The New Stack on March 19th, 2024.
In enterprise artificial intelligence, there are two main types of models: discriminative and generative. Discriminative models are used to classify or predict data, while generative models are used to create new data. Even though Generative AI has dominated the news of late, organizations are still
Read more
Discover the latest trend in databases: Disaggregation 2.0. Tomasz Tunguz's insightful post on LinkedIn explores how databases are evolving into high-speed query engines, shedding traditional storage constraints. Embrace flexible, performance-driven architectures.
Read more
Unlock the power of modern datalakes with Hudi, MinIO, and HMS. Seamlessly integrate these technologies for enhanced data governance. Set up your own cloud-native datalake and explore it with Spark.
Read more
Explore modern data architecture with Iceberg, Tabular, and MinIO. Learn to seamlessly integrate structured and unstructured data, optimize AI/ML workloads, and build a high-performance, cloud-native data lake.
Read more
TL;DR:
In this post, we will use MinIO Bucket Notifications and Apache Tika for document text extraction, which is at the heart of critical downstream tasks like Large Language Model (LLM) training and Retrieval Augmented Generation (RAG).
The Premise
Let’s say that I want to construct a dataset of text that I can then use to fine-tune an
Read more
Amid the fervor to adopt AI is a critical and often overlooked truth: the success of any AI initiative is intrinsically tied to the quality, reliability and performance of the underlying data infrastructure. If you don't have the proper foundation, you are limited in what you can build and therefore what you can achieve.
Your data infrastructure
Read more
The combination of StarRocks and MinIO offers a cloud-native, flexible, and efficient data architecture for modern enterprises, enabling independent scaling and optimized resource utilization. Read the full tutorial for insights into cloud-native analytics with StarRocks and MinIO.
Read more
Discover how Databricks and Apache Iceberg's strides in open table formats influence data portability in the modern data stack. Learn how the shift to a private cloud operating model aligns with this evolution, fostering an adaptable, interoperable data ecosystem.
Read more
Unleash data collaboration and quality with Nessie! Learn to manage branches, commits, and merges effortlessly. This guide walks you through deploying Dremio, MinIO, and Nessie, transforming your data engineering with collaborative precision. Dive in to revolutionize your workflows!
Read more
Unlock the secrets of migrating modern datalakes to the private cloud. Embrace S3 compatibility, data control, and the ever-evolving landscape for cost-effective data management. Don't miss the journey to enhanced flexibility, efficiency, and the future-proofing of your data ecosystem.
Read more
Build a streaming Change Data Capture (CDC) pipeline with Redpanda and MinIO into Snowflake. This solution simplifies data migration and analytics, with Redpanda offering scalability, MinIO as efficient storage, and Snowflake as a cloud-native analytics engine.
Read more
Confluent, Intel and MinIO conducted benchmarking and certification testing for MinIO Tiered Object Storage for Kafka storage. This blog post describes the observations and results of testing MinIO object storage as a backend for the tiered storage feature of Confluent Platform 7.1.0 on servers equipped with third-generation Intel Xeon Scalable processors. The scope of these tests was
Read more
Enterprises rely on data to make decisions. Effective decision-making hinges on the accuracy, timeliness, availability, and security of data. Data consistency, an important factor that cannot be ignored when purchasing storage, involves ensuring that all relevant parties can immediately access the results of a database transaction once it has been finalized, either through commitment or rollback. This guarantees that everyone
Read more
Some of the smartest minds in philanthropy are backing the concept of a simple yet powerful national ID system. The Bill and Melinda Gates Foundation, the Tata Trusts, the Omidyar Network and the Pratiksha Trust have all gotten involved with this movement because of its foundational capabilities for enabling a wide range of social programmes. They have put their resources
Read more
Enterprise customers use MinIO to build data lakehouses to store a wide variety of structured and unstructured data, and work with it using ML and analytics. Data flows into MinIO from across the enterprise, and the S3 API allows applications, such as analytics and AI/ML, to work with it.
I previously blogged about building data pipelines with SAP Data
Read more
With MinIO, enterprises are not forced to make a choice. They can literally use FTP and SFTP to move that data into an S3-like data store. It is the principle of AND not OR.
Read more
Tap into unlimited amounts of valuable enterprise data with SAP Cloud and MinIO.
Read more
Build data pipelines with S3 to MinIO and MinIO to MinIO batch replication.
Read more