Spark Structured Streaming With Kafka and MinIO

Kafka and Spark Structured Streaming are used together to build data lakes and lakehouses fed by streaming data and to provide real-time business insights.
In this blog post, we will build a Notebook that uses MinIO as object storage for Spark jobs to manage Iceberg tables.
Apache Spark and MinIO are powerful tools for data lakes and analytics. Learn how to run them in Kubernetes.
The Apache Iceberg data lake storage format enables ACID transactions on tables saved to MinIO. ACID transactions enable multiple users and services to concurrently and reliably add and remove records atomically. At the same time, queries are isolated to maintain read consistency against tables that are in the process of being altered. You can put MinIO and Iceberg, in conjunction
Learn how to build a multi-cloud data lake with the Delta open storage format and MinIO object storage.
Migrate data from HDFS to MinIO and enjoy the benefits of cloud-native architecture.
With another Strata in the rearview mirror, it is time to reflect on what we saw and heard during the week. Strata is clearly a data science show at this point, but data science is a broad topic. Our perspective, as a provider of high-performance object storage, is framed accordingly and we focus on the data stack more than we
Apache Spark is a framework for distributed computing. It provides one of the best mechanisms for distributing data across multiple machines in a cluster and performing computations on it. Spark achieves this by constructing data structures called RDDs (Resilient Distributed Datasets). RDDs allow data to be broken into disparate chunks and processed independently of one another. The individual chunks can
High-performance object storage is one of the hotter topics in the enterprise today. On the one hand, object storage has become an indispensable part of the enterprise storage strategy (public or private cloud), carrying the vast majority of the enterprise burden when measured in TBs or PBs. On the other hand, object storage has traditionally served a
When early object storage APIs were developed, they focused on the efficient storage and retrieval of objects. With Amazon’s success with S3, its robust S3 API quickly became the de facto standard for object storage in the cloud. MinIO, recognizing this, invested heavily in creating the most compliant implementation of the S3 API outside of Amazon.
In this post we’ll learn more about object storage, specifically MinIO, and then see how to connect MinIO with tools like Apache Spark and Presto for analytics workloads.
In the first part of this two-post series, we’ll take a look at how object storage is different from other storage approaches and why it makes sense to leverage object storage like MinIO for data lakes.
One of the major requirements for success with an IoT strategy is the ability to store and analyze device and sensor data. As IoT brings thousands of devices online every day, the data being generated by all these devices combined is reaching staggering levels. Storing the IoT data in a scalable yet cost-effective manner, while being able to analyze it easily
This is a guest blog from our friends at Guardant Health [http://www.guardanthealth.com/]. Guardant Health is the world leader in comprehensive liquid biopsy. Oncologists order our blood test to help determine if their advanced cancer patients are eligible for certain drugs that target specific genomic alterations in tumour DNA. Each test produces huge amounts of genomic data that