Five Strata NY Takeaways: Thoughts on Data's Marquee Event

With another Strata in the rearview mirror, it is time to reflect on what we saw and heard during the week. Strata is clearly a data science show at this point, but data science is a broad topic. Our perspective, as a provider of high-performance object storage, is framed accordingly, and we focus on the data stack more than we do on individual algorithms.

Here are five takeaways from data’s marquee event:

Cloudera may claim Hadoop isn’t dead, but their product roadmap suggests otherwise. Philosophical positions have given way to practical ones - and Cloudera is dumping the baggage associated with the co-location of storage and compute, along with other legacy implementations like YARN. While the move is admirable, it may be too late. The viability of using object storage as a replacement for Hadoop was a part of every conversation with a large enterprise. MapR isn’t helping things - those clients are running for the door as quickly as their remaining employees.
People are still getting their arms around the possibilities associated with ultra-high performance object storage. When data teams see our NVMe numbers for the first time, they are blown away. The ability to push ~40GB/s on reads and writes is a legitimate game changer - it brings Spark, Presto, TensorFlow and H2O.ai to the object storage world. This has never been done before - not because it wasn’t wanted, but because it wasn’t possible with legacy, appliance-oriented object storage.
Kubernetes has won. Everyone is talking about introducing, integrating or operationalizing this transformative approach. If you are not - it's because you can’t, meaning you are obsolete. While Kubernetes gets all the press, the truth of the matter is that the rapid evolution of the microservices stack is just as important, if not more so. There are four major players in the Kubernetes space when it comes to object storage: Amazon, Google, Microsoft and MinIO. Just ask VMware.
Presto is on the rise. While Spark is still the leading data processing framework, more and more attendees are talking and asking about Presto given its speed on SQL queries. This reflects the fact that SQL remains the lingua franca of data science and is resurgent in popularity. This bodes well for other MPP-oriented approaches like Vertica, Greenplum, Teradata and Splunk that take advantage of SQL.
Open Source is a big plus in the enterprise. This has nothing to do with cost (total or otherwise) and everything to do with scale and resiliency. Strong open source software projects have exceptional reach, and that means they are hardened by deployment - from scale to security. We were humbled to have taken home the prize for the Most Significant Open Source Project, given that there were so many entries. It speaks to our scale, our community and the growing recognition that the best software is truly open - not software under thinly veiled proprietary licenses.

These are exciting times for us as a company, and the reception we received at Strata really accentuated that. The number of people and companies that engaged us was up significantly from San Francisco, and the number of companies already running MinIO was, by our unscientific approximation, at least 50% higher.
If you haven’t become part of the movement - do so now. You can download the code, join our Slack channel or just reach us at hello@min.io. We are in it to win it when it comes to high-performance, private cloud object storage, and that means making you successful, so don’t be shy about engaging us as you build your private cloud.