MinIO AIStor: Pioneering Arm-Powered AI Data Infrastructure with NVIDIA BlueField-3 DPUs

The Arm architecture is revolutionizing the hyperscale cloud, propelled by its Total Cost of Ownership (TCO) advantages—lower power consumption and reduced cooling requirements—that enable sustainable, high-performance computing at scale. Industry leaders like AWS, Azure, and GCP are embracing Arm to drive their latest compute instances for AI training, harnessing its efficiency to meet the demands of data-intensive workloads. These same compelling factors—cost savings, energy efficiency, and streamlined infrastructure—are now poised to drive enterprise customers to adopt Arm on-premises, building private AI data infrastructure that rivals the cloud’s capabilities. In this shifting landscape, MinIO stands as a pioneer, having engineered Arm-native object storage from day one. Our innovations position MinIO as the cornerstone of enterprise-grade, Arm-powered AI data infrastructure.

AIStor Arm Readiness: A Foundational Commitment

Our commitment to the Arm architecture has been foundational to AIStor from the start, reflecting our belief in its transformative potential. We recognized early on that Arm’s energy efficiency and computational density would redefine modern data infrastructure, and we built AIStor from the ground up to harness these strengths. That foresight positions us to capitalize on Arm’s growing prominence in data centers and AI workloads, where power efficiency and scalability unlock new possibilities for enterprise success.

Initially, AIStor optimized performance using Arm’s Neon instruction set to accelerate essential functions such as erasure coding and bit-rot detection. Recognizing the constraint of Neon’s fixed 128-bit vector width, we transitioned to Arm’s Scalable Vector Extension (SVE) to unlock significantly greater efficiency. SVE’s length-agnostic SIMD architecture roughly doubles throughput for Reed-Solomon erasure coding relative to Neon, while using just one-quarter of the available cores and half the memory bandwidth. Further, our HighwayHash implementation for bit-rot detection scales linearly with core count, approaching maximum memory bandwidth utilization at around 50–52 cores, particularly for larger data block sizes. These enhancements, rigorously tested on advanced Arm hardware, underscore AIStor’s ability to fully exploit Arm’s architectural potential, delivering the efficiency and performance that demanding AI-driven workloads require.

JBOF and BlueField-3: A Promising Frontier

JBOF, or “Just a Bunch of Flash,” is an intelligent all-flash storage system that replaces the conventional CPU-memory-NIC trio with a Data Processing Unit (DPU): a specialized processor that combines three roles in one as networking card, data accelerator, and storage processor. The NVIDIA BlueField-3 (BF3) DPU provides 16 Arm cores, 400 Gb/s Ethernet or InfiniBand networking, and hardware accelerators for tasks such as encryption, compression, and erasure coding.

At roughly 100 MB, the compact AIStor binary is a testament to our minimalist design philosophy: maximum capability with minimal overhead. This compactness makes MinIO an ideal candidate for native deployment on BF3 DPUs, where resource constraints demand lightweight yet powerful software. Our prior testing on NVIDIA BlueField DPUs validated this fit, demonstrating MinIO’s ability to run efficiently on Arm-based networking hardware while offloading storage tasks from host CPUs.

Deployed natively on the BF3 DPU, AIStor gives enterprises a platform that integrates seamlessly with NVIDIA’s Spectrum-X networking architecture, delivering the low-latency, high-bandwidth performance that AI environments require and ensuring reliable data transfers that optimize GPU cluster efficiency. In a BF3-powered JBOF configuration, AIStor is poised to leverage GPUDirect Storage (GDS) capabilities (currently in development, with General Availability forthcoming) to transfer data over RDMA on Ethernet fabrics. Once realized, this enhancement will improve CPU efficiency on both the GPU server and the JBOF storage server, a critical advantage given the modest general-purpose compute available on the JBOF system. Together, Spectrum-X compatibility and forthcoming GDS integration equip enterprises with a scalable, efficient, high-performance foundation for the evolving demands of AI innovation.

Conclusion

AIStor is currently being tested on a Supermicro JBOF platform; official support for this configuration is slated for General Availability, further underscoring our commitment to cutting-edge solutions.

As AI workloads grow in scale and complexity, the need for storage that keeps pace with GPU and DPU innovations has never been greater. MinIO’s Arm readiness, refined through years of optimization from Neon to SVE, establishes us as a leader in this domain. Our compatibility with emerging JBOF architectures and NVIDIA BlueField-3 DPUs amplifies this advantage, providing enterprises with a pathway to exascale storage that is both efficient and future-ready. With a proven track record and a compact binary size tailored for DPU-native deployments, MinIO delivers not just performance, but a strategic edge, empowering organizations to fully harness the potential of their AI infrastructure.