A few months back, TDWI's James Powell sat down with MinIO co-founder and CEO AB Periasamy to talk about trends and challenges in the data space. The original interview (from TDWI's Upside) can be found here, but we include it below for posterity.
Upside: What technology or methodology must be part of an enterprise's data strategy if it wants to be competitive today? Why?
Anand Babu Periasamy: Mastering machine learning and AI has to be at the top of the list. Having said that, AI and ML mastery is a journey and will take time, not just from a skill acquisition perspective but from a business execution perspective.
What enterprises need to do today is to develop the foundational data fabric to support that ambition. Any at-scale ML/AI ambition will require object storage. Object storage is the de facto storage of the cloud and is the de facto storage for production-grade ML/AI. You can do sandbox work with block and file, but in production settings the entire ecosystem revolves around object.
It is important to note that when I say object, I don't mean old-school appliance vendors offering data backup and recovery. I mean high-performance, cloud-native, S3-compatible object storage that can be run in containers managed by Kubernetes and lends itself to the microservices architecture that defines the modern DevOps environment.
This is the most important technology for competitiveness. It pays dividends today (superior economics, Hadoop-like speed) and will insure the enterprise for at least a decade. Further, pursuing modern object storage provides a critical bridge for the hybrid cloud reality. Now on-premises environments can look and perform like the public cloud, which gives the enterprise exceptional optionality in its future technology strategy.
What one emerging technology are you most excited about and think has the greatest potential? What's so special about this technology?
We believe deeply in Kubernetes. It is more than a technology -- it is a different approach to the build/package/deploy framework and is expressly designed for an environment of continuous change. It abstracts the physical infrastructure from the application stack in a way that facilitates the collaboration between development, operations, and IT. This is why entire companies (VMware, for example) are turning the ship to embrace the technology.
What is the single biggest challenge enterprises face today? How do most enterprises respond (and is it working)?
The single biggest challenge is the amount of data they need to manage. Every problem, every single one a CEO talks about, is related to data: how you store it, how you extract value from it, how long you keep it, how you secure it, how you democratize access to it. Everything revolves around data.
Enterprises are doing a mediocre job at this task. Every CIO/CTO/CEO/CFO survey basically says the same thing: "We generally know what we want but we can't seem to execute against it consistently and at scale."
At the heart of the problem is tribalism. DevOps does not have a flattering view of IT. IT doesn't think DevOps understands their mandate or their security requirements. The line-of-business folks continue to ask "Why can't we just…." without understanding the difficulty and implications. Those same business folks also hoard data for political reasons. Data science is a new tribe and they have their own tools, biases, and agendas.
The result is shadow IT. Shadow data science. Massive duplication and inefficiency. Enterprises spend too much time trying to herd the cats and not enough time taking several steps back and asking "What should my architecture look like to deliver x…?"
There are a handful of companies doing that today and they will serve as the model going forward. They are building an architecture for the next decade, not Band-Aiding the architecture they have. Those companies realize, at their core, that no matter what the sign outside says -- be it bank, manufacturing, studio, or communications -- they are a data company. The successful companies think data first. Everyone says customer first, but think about it: every customer interaction is a data event.
Google, as you might imagine, has a good model. They have product managers for data. These product managers take on the unique strategic and tactical decision making that comes with building new data products and creating new data architectures. They are incented to drive access and consumption of their data products. This drives collaboration with other data product managers and pulls in IT and DevOps to solve problems together.
Is there a new technology in data and analytics that is creating more challenges than most people realize? How should enterprises adjust their approach to it?
Kubernetes, as noted earlier, is sweeping through the enterprise. This is a double-edged sword. I talked about the benefits a moment ago, but the other side is that many existing technologies and roles will become obsolete as the enterprise adopts this new paradigm. The traditional model of IT simply isn't compatible with the Kubernetes architecture. By traditional, we mean the purchase of a data warehouse application coupled with the purchase of a SAN or NAS appliance.
Kubernetes is disrupting the appliance vendors. Their appliances can't be containerized and orchestrated through Kubernetes.
This is why software vendors such as Presto, Spark, Splunk, Teradata, Vertica, and others are working so hard to be container-ready by leaving the state to object storage so they can become stateless. As a result, object storage is rapidly replacing SAN and NAS. You see that in Teradata's NOS and Splunk's SmartStore.
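To make the "leave the state to object storage" idea concrete, here is a minimal sketch rather than any vendor's actual implementation. It assumes a hypothetical in-memory class, `InMemoryObjectStore`, standing in for an S3-compatible bucket; the method names `put_object` and `get_object` and the key `jobs/processed-count` are illustrative choices, not a real client API.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryObjectStore:
    """Hypothetical stand-in for an S3-compatible bucket (not a real client)."""

    objects: dict[str, bytes] = field(default_factory=dict)

    def put_object(self, key: str, body: bytes) -> None:
        self.objects[key] = body

    def get_object(self, key: str, default: bytes = b"") -> bytes:
        return self.objects.get(key, default)


def handle_event(store: InMemoryObjectStore, counter_key: str) -> int:
    """Stateless handler: all state is read from and written back to the store."""
    current = int(store.get_object(counter_key, b"0").decode())
    updated = current + 1
    store.put_object(counter_key, str(updated).encode())
    return updated


store = InMemoryObjectStore()
# Each call could run in a different, freshly started container,
# because nothing is kept in the process between calls.
first = handle_event(store, "jobs/processed-count")
second = handle_event(store, "jobs/processed-count")
print(first, second)
```

Because the handler holds no state of its own, an orchestrator can stop, restart, or reschedule it freely. A production version would also need to deal with concurrent writers, which this sketch ignores.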
Enlightened enterprises are confronting those difficult decisions. A move to the cloud effectively strands those legacy solutions and changes the nature of IT, swinging influence to the DevOps folks. Most organizations will keep some if not all of their data on premises. Modern private cloud implementations using Kubernetes will result in the retirement of these appliances and the evolution of the teams managing them.
It is a difficult transition. Most IT folks just want to order another blade of what they have -- it makes their job easier. It does not, however, benefit the enterprise long term. The architecture that needs to be deployed is software defined, often open source, microservices friendly, S3 compatible, and scalable. Those are not terms that are associated with appliance vendors.
What initiative is your organization spending the most time/resources on today? What internal projects is your enterprise focused on so that you benefit from your own data or business analytics?
MinIO has more than 12,000 organizations running its software and most of them have multiple instances. This is a tremendous source of information and MinIO looks at GitHub, Slack, and Remix to drive its product management function. While GitHub (22K stars) and Slack (nearly 8K users) are well known, Remix is a homegrown analytics platform.
MinIO started with Mixpanel, but the scale of our deployments made it infeasible, so we built our own. Remix allows us to understand organization type, configuration type, hardware type, usage, frequency of update, etc. Integrating Remix with GitHub and Slack enables us to prioritize features and bugs in real time. This is important as MinIO releases a new version weekly.
Additionally, these tools allow us to determine what features to remove. As a company that prioritizes simplicity, what we remove gets as much attention as what we add. By constantly analyzing the data, we can determine unused features and take them out or improve them.
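As a rough illustration of the kind of analysis described here, the sketch below flags features that never appear in usage data. It is not how Remix works, which the interview does not describe. The event shape, the feature names, and the function `unused_features` are all hypothetical.

```python
from collections import Counter


def unused_features(all_features, events):
    """Return the features that never appear in the usage events, sorted by name."""
    usage = Counter(event["feature"] for event in events)
    return sorted(name for name in all_features if usage[name] == 0)


# Hypothetical feature catalog and telemetry events, for illustration only.
ALL_FEATURES = {"feature_a", "feature_b", "feature_c"}
EVENTS = [
    {"deployment": "org-1", "feature": "feature_a"},
    {"deployment": "org-2", "feature": "feature_a"},
    {"deployment": "org-2", "feature": "feature_b"},
]

candidates = unused_features(ALL_FEATURES, EVENTS)
print(candidates)
```

A feature on this list is a candidate for removal or improvement. In practice the decision would also weigh how recent the data is and how many deployments report usage at all.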
Where do you see analytics and data management headed in 2020 and beyond? What's just over the horizon that we haven't heard much about yet?
In 2020, analytics and AI/ML will become cloud native and shift towards high-performance object storage. This will have the effect of making NVMe SSDs the dominant storage medium over the next 12 months. The financial services industry has already done the math on the total cost of ownership and moved the majority of its workloads there. Other industries have taken note and will begin to shift much of their spending (cloud and on premises) in this direction.
The performance and reliability outweigh the rapidly shrinking cost differential. This will, in turn, drive more 100GbE networks and will create an arms race on the performance front -- just as more AI/ML programs start to scale. Expect to see large-scale NVMe deployments at petabyte scale in 2020.
We'll still have tape in 2020, but hard disks are going to start to look a lot more like tape in the next year.
Describe your product/solution and the problem it solves for enterprises.
MinIO is a high-performance, distributed object store designed to deliver massive scale in private cloud deployments. The S3-compatible, 100 percent open source solution is the fastest-growing object store in the industry and is deployed by more than half of the Fortune 500.
MinIO was purpose-built to serve only objects and is the fastest object store available, topping 183 GB/s on a small NVMe cluster. This speed means enterprises can run Spark, Presto, TensorFlow, and H2O.ai directly on the object store, making it the primary storage solution while supporting traditional use cases such as data backup and recovery.