Benchmarking MinIO vs. AWS S3 for Apache Spark

Apache Spark is a framework for distributed computing. It provides one of the best mechanisms for distributing data across multiple machines in a cluster and performing computations on it. Spark achieves this by constructing data structures called RDDs (Resilient Distributed Datasets). RDDs allow data to be broken into separate chunks that are processed independently of one another. The individual chunks can …

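The core idea of splitting data into chunks that are processed independently and then combined can be illustrated without Spark. The following is a minimal pure-Python sketch, not Spark's API: the helpers `partition`, `process_chunk` and `distributed_sum_of_squares` are hypothetical, and a local thread pool stands in for the machines of a cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, num_partitions):
    """Split data into num_partitions contiguous chunks of near-equal size (hypothetical helper)."""
    size, rem = divmod(len(data), num_partitions)
    chunks, start = [], 0
    for i in range(num_partitions):
        # The first `rem` chunks get one extra element so every element is used.
        end = start + size + (1 if i < rem else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def process_chunk(chunk):
    """Work on one chunk without looking at any other chunk."""
    return sum(x * x for x in chunk)


def distributed_sum_of_squares(data, num_partitions=4):
    """Process each chunk independently (threads stand in for cluster nodes), then combine."""
    chunks = partition(data, num_partitions)
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_results = list(pool.map(process_chunk, chunks))
    return sum(partial_results)


if __name__ == "__main__":
    print(partition(list(range(10)), 3))
    print(distributed_sum_of_squares(list(range(10)), 3))
```

Because `process_chunk` depends only on its own chunk, the chunks could run in any order or on any worker; only the final combining step needs to see all partial results.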

S3 Benchmark: MinIO on NVMe

Well-written software is fast software. When MinIO was conceived, it was designed from scratch to be simple, to scale (because simple things scale better) and to be fast. Simplicity and scale have their own subjective and objective measures, but fast is generally a numbers game. When you take well-written, fast software and pair it with fast hardware, the …

S3 Benchmark: MinIO on HDDs

High-performance object storage is one of the hotter topics in the enterprise today. On the one hand, object storage has become an indispensable part of the enterprise storage strategy (public or private cloud), carrying the vast, vast majority of the enterprise burden when measured in TBs or PBs. On the other hand, object storage has traditionally served a …

Open Source = Bombproof

Software isn't usually described as bombproof, particularly the type of software that is responsible for large analytics jobs or machine learning workloads. The words “finicky”, “complex” or, in the case of good marketing, “professional grade” (meaning you need years of study and multiple certifications) are more common. Bombproof software, however, is one of the many benefits associated with …

Introducing Spark-Select for MinIO Data Lakes

When early object storage APIs were developed, they focused on the efficient storage and retrieval of objects. Amazon’s success with S3 quickly made its robust S3 API the de facto standard for object storage in the cloud. MinIO, recognizing this, invested heavily in creating the most compliant implementation of the S3 API outside of Amazon.

MinIO updates from KubeCon

There has been a lot happening on the MinIO server development front. We recently added support for:
* Disk Caching [https://docs.minio.io/docs/minio-disk-cache-guide]
* Large/Petascale buckets [https://docs.minio.io/docs/minio-large-bucket-support-quickstart-guide]
* Storage Classes [https://github.com/minio/minio/tree/master/docs/erasure/storage-class]

While features like MinIO bucket federation [https://github.com/minio/minio/pull/5501], revamped MinIO …

MinIO, the ZFS of cloud storage

ZFS is best known for abstracting away the boundaries of physical storage devices by pooling them together. ZFS completely removed the need to manually manage physical storage or worry about the capacity of each individual device. ZFS is also a pioneer in its ability to detect data corruption and recover from it if redundant data is available. However, as we already discussed [https://blog.minio.io/

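Detecting corruption and recovering from redundant data rests on a general idea: keep a checksum for each block, verify it on every read, and rewrite a bad copy from a good one. The sketch below illustrates that idea with an in-memory two-way mirror. `MirroredStore` is a hypothetical class for illustration only; it is not how ZFS (or MinIO) is actually implemented.

```python
import hashlib


def checksum(block: bytes) -> str:
    """SHA-256 digest used to detect silent corruption of a block."""
    return hashlib.sha256(block).hexdigest()


class MirroredStore:
    """Hypothetical two-way mirror with a per-block checksum (illustrative only)."""

    def __init__(self):
        self.copies = [{}, {}]  # two simulated "devices", each mapping key -> bytes
        self.sums = {}          # key -> expected SHA-256 digest

    def put(self, key, block: bytes):
        # Write the block to every device and remember its checksum.
        for device in self.copies:
            device[key] = block
        self.sums[key] = checksum(block)

    def get(self, key) -> bytes:
        expected = self.sums[key]
        # Find the first copy whose checksum still matches.
        good = next((d[key] for d in self.copies if checksum(d[key]) == expected), None)
        if good is None:
            raise OSError(f"all copies of {key!r} are corrupt")
        # Self-heal: overwrite any copy that failed verification.
        for device in self.copies:
            if checksum(device[key]) != expected:
                device[key] = good
        return good


if __name__ == "__main__":
    store = MirroredStore()
    store.put("a", b"hello")
    store.copies[0]["a"] = b"hellx"  # simulate bit rot on the first device
    print(store.get("a"), store.copies[0]["a"])
```

Recovery is only possible while at least one copy still matches its checksum; if every copy is damaged, the sketch reports an error instead of returning bad data.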