Benchmarking MinIO vs. AWS S3 for Apache Spark

Apache Spark is a framework for distributed computing. It provides a mechanism for distributing data across multiple machines in a cluster and performing computations on it. Spark achieves this by constructing data structures called RDDs (Resilient Distributed Datasets). RDDs allow data to be broken into separate chunks that are processed independently of one another. The individual chunks can

Read more...

Containerized data analytics at scale, with MinIO and Pachyderm

Containers running on orchestration platforms such as Kubernetes, Docker Swarm, and DC/OS offer powerful, versatile ways to deploy applications. Containers let you deploy isolated application instances, and you can launch multiple such instances to scale up your load-serving capacity. You don’t even need to worry about individual server capacities and scheduling, thanks to orchestration tools, which provide

Read more...