The power of scale is well-documented in the world of business.
Cloud providers - Amazon in particular - have amassed extraordinary scale in a very short period of time. The cloud providers are now using this scale to rearchitect how enterprises interact with their data. They are remaking the enterprise data landscape with two primary levers: price and performance.
Let’s look at each in turn.
On the pricing front, cloud providers are creating a massive incentive to store data on object storage - not file or block. Their strategy is published for all to see. Here is the standard price for Amazon S3 pulled from their website:
Compare that with Amazon’s EFS pricing. Here, their well-regarded file system is priced at 13X the rate for S3 object storage.
A similar premium applies to EBS.
Here we see a 4.6X premium to run block vs. object storage. Again, these are significant costs and will add up quickly over time. For a 100 TB instance (small in the S3 world, but big in the block world) the difference would be $112K a year assuming no other charges. You would need a good reason to do that - even if you were a large enterprise.
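To make the arithmetic behind that annual figure explicit, here is a minimal sketch. The 4.6X premium and the 100 TB capacity come from the text above. The per-GB monthly object rate is not restated here, so the value in the code is a placeholder back-solved from the 4.6X and $112K figures, not a published price. It also assumes decimal terabytes (1 TB = 1,000 GB) and storage charges only.

```python
def annual_premium(capacity_gb: float,
                   object_price_gb_month: float,
                   premium_ratio: float) -> float:
    """Extra yearly cost of keeping capacity_gb on the pricier tier.

    premium_ratio is the pricier tier's per-GB rate divided by the
    object storage rate (for example, 4.6 for block vs. object).
    """
    pricier_rate = object_price_gb_month * premium_ratio
    return capacity_gb * (pricier_rate - object_price_gb_month) * 12


# Assumption: decimal terabytes, 1 TB = 1,000 GB.
CAPACITY_GB = 100 * 1000

# Assumption: placeholder rate in USD per GB-month, back-solved from the
# article's own 4.6X and $112K figures. It is not a quoted price.
ASSUMED_OBJECT_PRICE = 0.026

# From the text: block storage carries a 4.6X premium over object storage.
BLOCK_PREMIUM = 4.6

extra = annual_premium(CAPACITY_GB, ASSUMED_OBJECT_PRICE, BLOCK_PREMIUM)
print(f"Extra cost of block over object: ${extra:,.0f} per year")
```

Under those assumptions the result lands at roughly $112K a year. Swapping in the 13X file premium, or your own negotiated rates, shows how quickly the gap widens as capacity grows.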
The goal for Amazon is to migrate customers to their core capability (S3) where they have market power, product and technology leadership (the S3 API) and scale. Tremendous scale.
This pattern is replicated at Google and at Azure.
At Google, File is 8X more expensive than Object, and Block is 10X more expensive than Object.
Azure repeats the trend, with File 6X more expensive than Object (there is not a good apples-to-apples Block comparison).
Price, however, wouldn’t be enough to incentivize enterprises to move. The performance has to be there too - and it is.
We will use Amazon again, given their leadership role. Amazon’s S3 service is plenty fast on Presto benchmarks and you can see that performance in this paper. The net of it is that for Spark, Presto and other analytical frameworks, the global service that is Amazon can bring extraordinary scale to the problem - resulting in superb performance.
While we are more performant on Presto, we don’t want to take away from the fact that S3, Blob, and GCP Object can all be tuned or configured to dramatically narrow, even eliminate, the performance gap traditionally associated with File and Block. With performance off the table, price becomes the next consideration - and as we have seen, it is not even close.
There is actually a third element beyond price and performance, and that is a suite of modern features. Cloud-native object storage (not to be confused with the appliance vendors) is new, has a modern API, and supports modern applications, microservices, and platforms like Docker and Kubernetes. Cloud-native object storage has HTTP RESTful API support, S3 Select, and end-to-end data integrity and encryption.
File and Block are legacy approaches. They employ a POSIX API that, if modernized, effectively loses most of its appeal (compatibility). In fact, any modernization of POSIX would most likely end up looking like S3, which took a reduced set of atomic, immutable POSIX file APIs, before adding the missing features mentioned above.
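To picture what a "reduced set of atomic, immutable" operations means in practice, here is a toy in-memory sketch. It is an illustration only: the class and method names are hypothetical, and it says nothing about how S3 or any real object store is implemented. The point is the shape of the interface: whole objects are written, read, listed, and deleted by key, with no seek, append, or in-place partial write as in POSIX.

```python
from typing import Dict, List


class ToyObjectStore:
    """Hypothetical in-memory model of an object-style interface."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        # Whole-object write: the key points at the new bytes in one step.
        # There is no way to modify part of an existing object.
        self._objects[key] = bytes(data)

    def get(self, key: str) -> bytes:
        # Whole-object read; raises KeyError if the key does not exist.
        return self._objects[key]

    def delete(self, key: str) -> None:
        # Deleting a missing key is a no-op in this toy model.
        self._objects.pop(key, None)

    def list(self, prefix: str = "") -> List[str]:
        # Flat namespace: "directories" are just shared key prefixes.
        return sorted(k for k in self._objects if k.startswith(prefix))


store = ToyObjectStore()
store.put("logs/2020/app.log", b"version 1")
store.put("logs/2020/app.log", b"version 2")  # replaces the whole object
store.put("images/cat.png", b"\x89PNG")

print(store.get("logs/2020/app.log"))
print(store.list("logs/"))
```

An update here is always a full replacement of the object, which is what makes the operations easy to reason about and to distribute, at the cost of the in-place edits that POSIX applications expect.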
These two levers, price and performance, are what the cloud is using to remake the data storage landscape. The modern APIs and integration elements complete the picture. This is how the cloud providers are slowly but surely squeezing the life out of the traditional file and block providers.
MapR’s extinction is just the beginning. Cloudera is likely next. There will be others in the coming years as the lifeblood of file and block - data - ends up in object storage buckets.
Don’t hesitate to let us know if you agree or disagree - you can find us on Twitter @minio and on LinkedIn. You can also reach us at firstname.lastname@example.org.