A lot of ink has been spilled on the significance of the AI/ML technology wave (here are our posts). What doesn’t get attention, but probably should, is how AI/ML is remaking the technology power structure inside the enterprise. As companies reorganize around a data-centric orientation, they are also reorganizing who makes and executes the technology architecture. The shift is subtle but transformative: data teams, not IT departments, are becoming the primary custodians of data infrastructure, and the change is likely permanent.
The Rise of Big Data Systems
The roots of this change can be traced back to the big data movement, most notably the adoption of the Hadoop ecosystem. While we belittle Hadoop today, it was revolutionary in its time and ushered in the era of big data. Unlike traditional databases managed by IT, Hadoop enabled data professionals to handle vast data sets, unlocking new potential in analytics and insights. This represented a shift in expertise from IT to those with specialized skills in data processing and analytics. Control over what got bought, and from which vendors, also shifted to these new teams. Ironically, even today these big data systems, though petabytes in size, are often considered “off book” by traditional IT leadership.
The AI/ML Explosion
When OpenAI’s ChatGPT debuted in November 2022, the world changed. The true power of what was possible seemed to flow into everyone’s consciousness, and the accessibility of ChatGPT made AI a board-level imperative. Data wasn't just about retrospective analytics anymore; it was about prospective insights, predictions, and even automated insight. With this, the role of data scientists and ML engineers became crucial, and giving them the infrastructure and tools they needed became a priority.
The Changing Role of IT
As data teams increasingly became central to an organization's strategic decisions, the role of IT began to evolve. Traditionally, IT's main focus was on maintaining infrastructure, ensuring uptime, and managing storage and access. But with the rise of cloud platforms and as-a-service offerings, many of these tasks got outsourced or automated. IT's role started to transition more towards ensuring integration, security, and governance. These are mission-critical to be sure, but the enterprise has moved IT past its command-and-control role to something closer to an orchestrator. Speaking of orchestrators, the cloud operating model delivered containerization and orchestration to the developer community. This changed how code was written, shipped, maintained and updated. No longer was it an upgrade a year - it was an upgrade a week, sometimes even a day. IT was not and is not built for that world, and as such it has focused on creating a secure, governable environment that is an enabler for the development teams.
The Case for Data Teams Designing and Managing the Infrastructure
Look at any research: AI/ML is the top technology priority for the enterprise. That doesn’t mean that every enterprise knows what that means, but when the CEO is mandating something - you have a tendency to listen. The AI/ML teams (many of whom worked on the Hadoop infrastructure) are going to drive the bus when it comes to the data infrastructure. This isn’t just about selecting the frameworks used, or even coding the models or choosing a foundational model to adapt - they are literally going to design the AI-centric datalake infrastructure. To succeed in AI, you need to dramatically consolidate access to data. It can be distributed, but it needs to be accessible.
Data teams possess specialized knowledge not only in data analysis but in optimizing data storage, retrieval, and processing for analytics and ML applications. Given their narrower focus, data teams can often respond more quickly to new technological advances, ensuring that the organization stays at the cutting edge. Data teams are also better equipped to understand the business implications of data and directly manage that infrastructure to ensure better alignment with the enterprise’s goals.
The Outline of an AI Centric Datalake
The AI Centric datalake will emphasize different elements of the modern datalake.
It will be bigger, demand performance at scale and will be multi-engine. Additionally, it will be disaggregated between the data layer and compute layer. The first wave of disaggregation, post HDFS by the way, separated the compute layer from the drive layer - this takes it further.
The AI datalake will be scale out. Scale up falls apart quickly once you start to get into PBs and EBs - which is exactly where we are heading with these.
The AI datalake will be software defined and cloud-native (containerization, orchestration, APIs, automation, etc.) - that means it will be based on object storage. We are already seeing this behavior with customers today.
The AI datalake will be price/performance optimized for scale (therefore NVMe). The case for all-flash storage all the time is here, and given the value of GPU cycles, keeping GPUs fed and fully utilized matters more than ever. Throughput AND IOPS matter.
The AI datalake will use commodity HW. Appliances need not apply. This is a simple lesson from the hyperscalers - dumb HW and smart SW. In volume.
The AI datalake will push the speed of the networking layer in such a way that 100 GbE becomes table stakes.
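To put rough numbers on why feeding GPUs pushes both storage throughput and network speed, here is a minimal back-of-the-envelope sketch in Python. Every figure in it is an assumption chosen purely for illustration (64 GPUs, 2 GB/s of training data per GPU, a 1 PB dataset), not a measurement or a sizing recommendation, and it ignores protocol overhead and real-world link efficiency. The function names are hypothetical.

```python
import math

PETABYTE = 10**15  # decimal petabyte, in bytes


def link_gb_per_sec(link_gigabits_per_sec: float) -> float:
    """Convert a line rate in Gb/s to GB/s (8 bits per byte, overhead ignored)."""
    return link_gigabits_per_sec / 8


def required_read_gb_per_sec(num_gpus: int, gb_per_sec_per_gpu: float) -> float:
    """Aggregate read throughput needed so that no GPU waits on data."""
    return num_gpus * gb_per_sec_per_gpu


def links_needed(required_gb_per_sec: float, link_gigabits_per_sec: float) -> int:
    """Number of fully used links of a given speed needed to carry the demand."""
    return math.ceil(required_gb_per_sec / link_gb_per_sec(link_gigabits_per_sec))


def transfer_hours(dataset_bytes: int, link_gigabits_per_sec: float) -> float:
    """Hours to move a dataset over a single link running at full line rate."""
    bytes_per_sec = link_gb_per_sec(link_gigabits_per_sec) * 1e9
    return dataset_bytes / bytes_per_sec / 3600


if __name__ == "__main__":
    # Assumed workload: 64 GPUs, each consuming 2 GB/s of training data.
    demand = required_read_gb_per_sec(num_gpus=64, gb_per_sec_per_gpu=2.0)
    print(f"aggregate demand: {demand:.0f} GB/s")
    for speed in (10, 100):  # link speeds in GbE
        print(
            f"{speed} GbE: {links_needed(demand, speed)} links, "
            f"{transfer_hours(PETABYTE, speed):.1f} h to move 1 PB on one link"
        )
```

Under these assumed numbers, the cluster needs 128 GB/s of aggregate reads. That is 11 saturated 100 GbE links versus 103 at 10 GbE, and moving a single petabyte over one link takes about 22 hours at 100 GbE versus roughly nine days at 10 GbE.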
A Collaborative Paradigm
It's essential to understand that this isn't a zero-sum game. The evolution doesn't render IT departments obsolete. Instead, it calls for a more collaborative approach where IT and data teams work hand in hand. While data teams might be at the forefront of leveraging the data, IT plays a critical role in ensuring that the infrastructure is secure, compliant, and integrated with other enterprise systems.
We are really excited about this next chapter. The AI/ML revolution is not just about new technologies; it's about rethinking organizational structures and roles in the face of rapidly changing technological landscapes. As data becomes increasingly central to enterprise success, it's only logical that those who best understand its implications take the lead in managing its infrastructure. If you want to engage us and learn how our customers and community are adapting to this world, ping us on firstname.lastname@example.org. MinIO doesn’t have a traditional sales team so you can be sure you are talking to someone with a deep understanding of both AI and of MinIO.