Hiring for AI Success: Why Your First Hire Should Be a Data Engineer

AI initiatives are a top priority for many organizations looking to extract value from their data. However, before hiring a highly skilled AI/ML engineer, an organization needs the critical foundation in place: data that is well managed and optimized. For AI success, hiring a data engineer first is crucial, especially one experienced in object storage and open table formats. Here’s why.

Learning from Experience: Getting the Job Title Right

When Pete Hnath, our Head of Technical and Sales Training, began his AI journey here at MinIO, he initially posted a job for a "Data Scientist." His goal was to find someone to dive into machine learning and build AI models right out of the gate. But something didn’t quite fit. As Pete explained:

“I first posted the position as 'Data Scientist' and got a bunch of resumes too high in the stack. The candidates were well-versed in algorithms and advanced analytics but didn’t have the hands-on experience with cloud infrastructure we needed.”

Realizing that building AI models requires a strong data foundation, Pete changed the role to "Cloud Engineer." This adjustment attracted candidates who were proficient in cloud infrastructure but focused on the operational layer: managing virtual machines, networks, and cloud-native software.

“They were too low in the stack,” he continued. “These candidates excelled in cloud native software but weren’t equipped to handle the nuances of data architecture.”

Finally, he redefined the position as "Data Engineer." This hit the sweet spot.

“Data Engineer seems to be the Goldilocks title for what I need—someone who can manage, store, and optimize data for AI/ML workloads and has the architectural awareness to help select and deploy scalable, performant, cloud-native infrastructure.”

Why Data Engineers Are Essential for AI Initiatives

AI/ML models are only as good as the data they rely on. If that data is poorly managed, messy, or not optimized for efficient processing, even the best AI models will fall short. Similarly, if the data architecture is not optimized for the unique and demanding storage, network and compute requirements of AI/ML, the application will struggle to perform. A data engineer with the right skills ensures that your AI/ML efforts are built on a solid foundation.

Here’s what to look for in a Data Engineer for AI success:

  1. Experience with Object Storage: Object storage has become the backbone of modern data lakes and lakehouses. It provides the flexibility and scalability required to handle massive amounts of unstructured and semi-structured data, which is critical for AI workloads. A data engineer should be well-versed in platforms like MinIO or Amazon S3 and able to manage data on them at scale.
  2. Proficiency in Data Lakehouses: AI workloads demand high-performance access to vast amounts of data, and the data lakehouse architecture provides the best of both worlds: the scalability of data lakes and the reliability of data warehouses. A data engineer should be proficient in managing and tuning data lakehouse environments so that data is well-organized, easily retrievable, and ready for AI/ML use cases.
  3. Data Pipeline Development: A great data engineer can design and implement robust data pipelines that clean, transform, and aggregate data efficiently. This ensures that by the time an AI/ML engineer gets to work, the data is well-prepared for modeling.
  4. Cloud-Native Skills: A candidate who sits too low in the stack isn't ideal, but a strong data engineer still needs enough cloud-native experience to deploy and operate software for scalable storage and compute. Their primary focus should remain on data architecture and the mechanics of storing, organizing, and accessing data.
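
As a minimal sketch of the clean, transform, and aggregate stages described in item 3, the following Python uses only the standard library. The record fields (`sensor`, `reading`), the sample rows, and the function names are hypothetical and chosen purely for illustration; a production pipeline would typically read from and write to object storage rather than in-memory lists.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw records, as they might arrive from an upstream source.
RAW_ROWS = [
    {"sensor": "a", "reading": "1.5"},
    {"sensor": "a", "reading": "2.5"},
    {"sensor": "b", "reading": ""},        # missing value
    {"sensor": "b", "reading": "4.0"},
    {"sensor": None, "reading": "9.9"},    # missing sensor id
]


def clean(rows):
    """Drop rows with a missing sensor id or an empty reading."""
    return [r for r in rows if r.get("sensor") and r.get("reading")]


def transform(rows):
    """Cast the reading from text to float so it can be aggregated."""
    return [{"sensor": r["sensor"], "reading": float(r["reading"])} for r in rows]


def aggregate(rows):
    """Compute the mean reading per sensor."""
    grouped = defaultdict(list)
    for r in rows:
        grouped[r["sensor"]].append(r["reading"])
    return {sensor: mean(values) for sensor, values in grouped.items()}


def run_pipeline(rows):
    """Chain the three stages: clean, then transform, then aggregate."""
    return aggregate(transform(clean(rows)))


if __name__ == "__main__":
    print(run_pipeline(RAW_ROWS))
```

Keeping each stage as a small, separate function makes it possible to test and replace stages independently, which is the property that matters most once an AI/ML engineer starts depending on the output.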

Building an AI-Ready Data Lakehouse

For modern AI initiatives, the data lakehouse architecture is increasingly the go-to solution. It combines the flexibility and scalability of data lakes with the performance and ACID guarantees of traditional data warehouses. To make the most of this architecture, your first hire should be someone who knows how to:

  • Manage large-scale object storage systems.
  • Organize data in open table formats, such as Apache Iceberg, Delta Lake, or Apache Hudi, so that queries read only the data they need.
  • Ensure data pipelines deliver AI-ready datasets.
  • Partner with the architecture team to select and deploy the right infrastructure.
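
One reason open table formats help with efficiency is that they keep metadata about each data file, so a query engine can skip files that cannot contain matching rows. The toy Python model below illustrates that idea only. The `DataFile` class, the object keys, and the `prune` function are hypothetical and do not reflect the actual metadata layout or API of any specific table format.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DataFile:
    key: str        # hypothetical object key in a bucket
    min_day: date   # earliest event date stored in the file
    max_day: date   # latest event date stored in the file


# Hypothetical manifest: one entry per data file, with per-file date bounds.
MANIFEST = [
    DataFile("events/part-0001.parquet", date(2024, 1, 1), date(2024, 1, 31)),
    DataFile("events/part-0002.parquet", date(2024, 2, 1), date(2024, 2, 29)),
    DataFile("events/part-0003.parquet", date(2024, 3, 1), date(2024, 3, 31)),
]


def prune(manifest, start, end):
    """Return keys of files whose date range overlaps [start, end]."""
    return [f.key for f in manifest if f.min_day <= end and f.max_day >= start]


if __name__ == "__main__":
    # Only the February and March files can hold rows in this window.
    print(prune(MANIFEST, date(2024, 2, 10), date(2024, 3, 5)))
```

In this sketch, a query over mid-February to early March touches two of the three files; the January file is never read from object storage.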

Hiring a data engineer before an AI/ML engineer ensures your data infrastructure can handle the demands of advanced analytics and machine learning. With a strong data foundation, you avoid common pitfalls, such as models built on messy or poorly managed data, and you give your AI initiatives a sound footing from day one.

Hire the Right Expertise, in the Right Order

The key takeaway for organizations embarking on AI initiatives is to prioritize hiring a data engineer with the right experience. Look for candidates who understand object storage, open table formats, cloud-native software, and data infrastructure. Once your data foundation is solid, your AI/ML engineer can focus on building and fine-tuning models without being bogged down by data inefficiencies.