Streamlining Data Streaming: A Guide to WarpStream and MinIO


While Apache Kafka is something of an industry standard for streaming data, other options are emerging in the ecosystem. Given the importance of streaming in the modern data lakehouse, we thought we would take a look at one of the new cool kids on the block: WarpStream. Note that WarpStream is still “under development” in many ways. It is really cool, very simple and exceptionally cost-effective, but taking it to production for mission-critical workloads should be a carefully considered decision.

Before we get to WarpStream, let's take a minute to explore the challenges that Kafka presents. There's no denying Kafka's foundational role in modern streaming, having revolutionized real-time data processing and reshaped our perspective on data architecture. However, it's important to acknowledge that Kafka, groundbreaking as it was when it was developed over a decade ago, has revealed real complexities in implementation, management, and cost.

WarpStream is a Kafka protocol-compatible data streaming platform that is more cost-effective and simpler to manage than Kafka. It runs on top of object storage and ships as a single, easy-to-handle Go binary. Unlike with Kafka, there's no need to deal with local file storage, broker balancing, or ZooKeeper operations.

Creating a streaming infrastructure with WarpStream and MinIO makes perfect sense because of the scale, performance and simplicity offered by MinIO. Whether you're dealing with massive datasets or intricate data pipelines, this combination of tools simplifies the management of your streaming architecture.

This tutorial will walk you through getting started with WarpStream and MinIO.

Introducing WarpStream: A Kafka-Compatible Alternative

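WarpStream can be up to ten times cheaper than running Kafka in the cloud, especially for large-scale deployments where networking costs are a significant part of the total bill. Kafka replicates every write to brokers in other availability zones, and cloud providers charge for traffic between zones.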
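To make the networking argument concrete, here is a back-of-the-envelope sketch in shell. It rests on a textbook layout that is an assumption, not something WarpStream or this post specifies: brokers in three availability zones, replication factor 3, and producers plus one consumer group spread evenly across the same zones, with consumers reading from partition leaders. The function name kafka_cross_az_gb is hypothetical.

```shell
#!/usr/bin/env bash
# Rough estimate of Kafka inter-zone traffic for a given volume produced.
# Illustrative assumptions (not measured): 3 AZs, replication factor 3,
# producers and a single consumer group spread evenly across the 3 AZs,
# and consumers fetching from partition leaders (no follower fetching).
kafka_cross_az_gb() {
  local produced_gb="$1"
  awk -v gb="$produced_gb" 'BEGIN {
    producer    = gb * 2 / 3  # 2 of 3 producers write to a leader in another AZ
    replication = gb * 2      # each leader ships every byte to 2 followers in other AZs
    consumer    = gb * 2 / 3  # 2 of 3 consumers read from a leader in another AZ
    printf "%.0f\n", producer + replication + consumer
  }'
}
```

Under these assumptions, kafka_cross_az_gb 300 prints 1000: roughly 3.3 GB of inter-zone traffic for every GB produced. Multiplying the result by your provider's per-GB inter-zone rate gives a rough cost figure.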

Whereas Kafka has brokers, WarpStream has agents. Agents are stateless Go binaries that are easy to scale to the size of your data workload. WarpStream only discovers agents within the same availability zone, further reducing networking costs. Built on top of S3-compatible storage, WarpStream leverages all the performance and scale that object storage provides. This positions WarpStream as a cloud-native alternative to Kafka, delivering scalability and cost-efficiency without the need to grapple with the complexities of the JVM.

However, in return for scale, simplicity and cost savings, there is an increase in latency. WarpStream currently has a P99 latency from producer to consumer of about one second, whereas the latency of a well-tuned Kafka cluster can approach low double-digit milliseconds. There are a few ways to mitigate this increased latency. First, choose modern, high-performance S3-compatible storage (hint, hint, nudge, nudge). Second, decrease batchTimeout when configuring agents. The default for this option is 250 ms, but it can be reduced to as low as 50 ms. Decreasing it lowers latency but raises object storage costs, because agents flush batches to object storage more often and therefore issue more write requests. Finally, the free tier of WarpStream is configured for higher latency and lower cost by default. When you’re ready for a production-level tier, the folks at WarpStream are more than happy to show you how to optimize your deployment.
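The batchTimeout trade-off can be sized with simple arithmetic. The sketch below assumes, as a simplification, that under steady traffic an agent writes exactly one object each time the batch timeout expires; real agents may also flush on batch size, so treat the numbers as an order-of-magnitude guide. The function name objects_per_day is hypothetical.

```shell
#!/usr/bin/env bash
# Approximate number of objects one agent writes per day.
# Simplifying assumption: one object is flushed per batch timeout interval.
objects_per_day() {
  local batch_timeout_ms="$1"
  local ms_per_day=$((24 * 60 * 60 * 1000))  # 86,400,000 ms in a day
  echo $((ms_per_day / batch_timeout_ms))
}
```

With the 250 ms default, objects_per_day 250 prints 345600; at 50 ms, objects_per_day 50 prints 1728000. That is five times as many write requests per agent, which is where the extra object storage cost comes from.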

Configuring MinIO for Streaming Data

Start a rootless Docker container for a Single-Node Single-Drive MinIO Server by running the following command.

mkdir -p ${HOME}/minio/data

docker run \
   -p 9000:9000 \
   -p 9090:9090 \
   --user $(id -u):$(id -g) \
   --name minio1 \
   -v ${HOME}/minio/data:/data \
   quay.io/minio/minio server /data --console-address ":9090"
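Because the command above runs MinIO in the foreground, it can be handy to confirm from a second terminal that the server is up before pointing WarpStream at it. This is a minimal sketch that polls MinIO's liveness endpoint, /minio/health/live, with curl; it assumes curl is installed and that the S3 API is reachable on port 9000. The function name wait_for_minio is hypothetical.

```shell
#!/usr/bin/env bash
# Poll MinIO's liveness endpoint until it returns HTTP 200 or attempts run out.
# Usage: wait_for_minio [base_url] [attempts]
wait_for_minio() {
  local base_url="${1:-http://localhost:9000}"
  local attempts="${2:-30}"
  local i
  for ((i = 1; i <= attempts; i++)); do
    # -f fails on HTTP errors, -s silences progress; success means MinIO is live
    if curl -fs --max-time 2 -o /dev/null "${base_url}/minio/health/live"; then
      echo "MinIO is live at ${base_url}"
      return 0
    fi
    # wait one second between attempts, but not after the last one
    (( i < attempts )) && sleep 1
  done
  echo "MinIO did not become live at ${base_url}" >&2
  return 1
}
```

Calling wait_for_minio with no arguments checks http://localhost:9000 for up to about 30 seconds and returns a non-zero status if MinIO never answers, so it can gate later steps in a script.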

You can interact with MinIO either through the MinIO Console or with mc. The steps below use the MinIO Console.

If you’re using the MinIO Console, create an access key in the Access Keys navigation panel. You cannot use your MinIO username and password for WarpStream. It is a best practice to provide each application with its own credentials to improve security and facilitate usage tracking and troubleshooting. 

Even when testing, avoid manually deleting files from MinIO once you've connected it with WarpStream. If you do, you'll need to create a new MinIO bucket before continuing.

Getting Started with WarpStream

As a warning, WarpStream is still in developer preview. This means that the instructions below might either change over time or be more appropriate for development or testing than a production installation. 

Begin by installing the WarpStream Agent using Docker, the installation script (recommended only for testing purposes), or by downloading the binary directly. 

Now run the WarpStream demo agent with this command.

AWS_ACCESS_KEY_ID="<your-access-key>" \
AWS_SECRET_ACCESS_KEY="<your-secret-key>" \
warpstream demo -bucketURL "s3://<your-bucket>?region=us-east-1&s3ForcePathStyle=true&endpoint="
  • AWS_ACCESS_KEY_ID: Your MinIO access key, not your MinIO username.
  • AWS_SECRET_ACCESS_KEY: Your MinIO secret key, not your MinIO password.
  • bucketURL:
    • <your-bucket>: Replace with the name of your bucket.
    • region: A required query parameter in the URL. Use us-east-1.
    • s3ForcePathStyle: Set to true so requests use path-style addressing (endpoint/bucket), which MinIO supports by default.
    • endpoint: The example endpoint will work for MinIO Server in a Docker container accessed through port 9000. Replace with your endpoint. Use the port number for the MinIO S3 API (9000 in the Docker command above), not the Console (9090).
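Since the bucketURL packs several settings into one string, a small helper can reduce typos. This sketch simply assembles the URL in the format shown above, with region=us-east-1 and s3ForcePathStyle=true fixed; the function name warpstream_bucket_url is hypothetical and not part of the WarpStream CLI.

```shell
#!/usr/bin/env bash
# Build the -bucketURL value for a MinIO bucket.
# region and s3ForcePathStyle are fixed as described above;
# the caller supplies the bucket name and the MinIO S3 API endpoint.
warpstream_bucket_url() {
  local bucket="$1"
  local endpoint="$2"
  printf 's3://%s?region=us-east-1&s3ForcePathStyle=true&endpoint=%s\n' \
    "$bucket" "$endpoint"
}
```

For example, assuming a hypothetical bucket named warpstream-demo and an agent running on the same host as the MinIO container (so the S3 API is reachable at http://localhost:9000), the demo could be started with: AWS_ACCESS_KEY_ID="<your-access-key>" AWS_SECRET_ACCESS_KEY="<your-secret-key>" warpstream demo -bucketURL "$(warpstream_bucket_url warpstream-demo http://localhost:9000)"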

Once you’ve run the agent, you will be able to launch the WarpStream developer console. The link is printed in the terminal where you ran the agent.

The WarpStream developer console allows you to view the agents you have running and observe metrics. 

The warpstream demo command sets up a demo account with a playground that expires after 12 hours and an in-memory producer that periodically creates small JSON documents.  You can view your MinIO bucket as you work through the demo and observe the files that WarpStream creates. 

Deploying to Production

When you are ready to deploy to production, WarpStream suggests that you contact them first to discuss your architecture (and their business model). That being said, here are some tips and tricks to take the next step. First, WarpStream provides Helm charts for deploying to Kubernetes. Second, as you might expect, the configuration for running an agent changes when deploying to production. We’ve already discussed some ways to compensate for the higher latency of WarpStream, but there are other ways to increase performance in production. Finally, to deploy MinIO in production, please refer to the Deploy MinIO: Multi-Node Multi-Drive guide.

Unleash the Potential of Streamlined Data Streaming

While Kafka has undoubtedly revolutionized real-time data processing, we respectfully recognize the challenges that a Kafka-centric architecture presents, especially in terms of complexity and cost. In our opinion, WarpStream is a truly promising alternative to Kafka that overcomes many of Kafka's challenges. It's cost-effective and simple, with tight integration with MinIO. If you're a regular reader of our blog, then you already know we love applications that are simple and built for object storage. While WarpStream introduces higher latency than a well-tuned Kafka cluster, it offers an efficient and modern data streaming solution that is worth considering when building real-time analytics and ML solutions.

If you have any questions or need further assistance, don't hesitate to contact us via email or Slack. We're here to support your data journey.
