The Bank of the East - Replacing Hadoop with MinIO and Dremio
Our client, a global financial institution headquartered in Japan, recently completed an ambitious project to replace Hadoop with MinIO and Dremio. You can watch them present it in this talk from Subsurface, but we thought we would write it up as well.
Like most banks, the firm had built out a large Hadoop footprint to power its analytics and risk management workloads. And like every other bank, it watched that legacy architecture start to crack under mounting data volumes and more intense query loads. Furthermore, frequent system outages impacted the bank’s ability to meet stringent SLAs and regulatory requirements.
To resolve these challenges, the bank embarked on a sweeping modernization initiative centered on Dremio's high-performance SQL query engine, which is optimized for cloud data lakes. To complement Dremio's compute capabilities, the bank selected MinIO's AIStor for object storage. Engineered specifically for analytics and AI workloads, MinIO delivered the scalable and resilient data foundation needed to sustain the bank's surging analytics demands.
For organizations aiming to enable the next generation of data-driven insights, the bank's success underscores how a strategic deployment of MinIO can overcome analytics limitations and unlock new potential. Let’s dive into some lessons that can be taken from this success story.
Lesson 1: Recognize When Legacy Systems Are Holding You Back
A key milestone in this financial institution’s modernization journey was acknowledging the growing limitations of its Hadoop-based architecture. Designed for batch processing, Hadoop struggled to deliver the real-time performance and scalability required for modern workloads. Frequent stability issues and outages plagued operations. The complexity of managing Hadoop also hindered innovation and made it hard to pursue new initiatives.
Most importantly, the bank realized Hadoop could not cost-effectively scale to meet future data growth. Legacy systems like Hadoop were not built to leverage the cloud or to handle newer data formats like JSON and Avro. Tying its future to Hadoop meant restricting business agility and analytics capabilities.
Best Practice
Organizations should periodically assess if existing data infrastructure still meets current and projected requirements. Workload performance, scalability ceilings, operational overhead, and innovation readiness are key evaluation criteria. Although initially sufficient, legacy systems often struggle to support modern analytics demands. Recognizing these limitations early is essential to transform data architecture before it hinders competitiveness. Regular reviews ensure systems evolve and align with organizational needs.
Lesson 2: Adopt a Cloud-Native Approach for Flexibility and Scalability
To modernize its data architecture, the bank embraced cloud-native technologies, with containerized services orchestrated by Kubernetes. This microservices-based approach provided greater automation, simplifying infrastructure management. Kubernetes' dynamic resource allocation and auto-scaling also enabled seamless scalability to handle spiky workloads.
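To make the auto-scaling point concrete, here is a minimal sketch of the scaling rule that the Kubernetes documentation gives for the Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric / target metric), with no change when the ratio is within a tolerance (0.1 by default). We are assuming an HPA-style setup purely for illustration; the talk does not say which autoscaling mechanism the bank used, and the function name and sample numbers below are hypothetical.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric, tolerance=0.1):
    """Replica count suggested by the documented HPA scaling rule.

    Illustrative only: the real autoscaler also applies min/max bounds,
    stabilization windows, and pod-readiness handling that are omitted here.
    """
    ratio = current_metric / target_metric
    # Inside the tolerance band the replica count is left unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# Hypothetical pool: 4 pods averaging 90% CPU against a 60% target.
print(desired_replicas(4, 90, 60))  # 6
```

A spike that pushes average utilization to 1.5 times the target grows the pool by the same factor, and the pool shrinks again once utilization falls well below the target.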
Crucially, Kubernetes supports portable deployment across on-prem and multi-cloud environments. By integrating cloud-native technologies like Dremio and MinIO, the bank gained a consistent interface to query data anywhere. This hybrid and multi-cloud capable analytics platform ensured future flexibility and mitigated vendor lock-in risks.
Best Practice
Cloud-native methodologies facilitate scalability while abstracting underlying infrastructure complexities. Portability across environments future-proofs analytics investments. By encapsulating functionality into modular microservices orchestrated by Kubernetes, organizations reduce complexity and improve operational efficiency. Prioritizing cloud-native capabilities and standards compliance unlocks data agility across pipelines and workloads.
Lesson 3: Select Purpose-Built Storage for Analytics Workloads
General-purpose storage systems fail to unlock the full performance potential of modern analytics engines like Dremio. By selecting MinIO's purpose-built object store, the bank ensured storage would never be the bottleneck. MinIO's erasure coding policies, bitrot protection, and parallelized architecture are engineered specifically to tackle the intense loads of ad hoc SQL queries. These optimizations prevented storage from hindering demanding analytical workloads.
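As a rough illustration of what an erasure coding policy trades off, the sketch below works through the generic arithmetic: with N drives in an erasure set and M parity shards per object, the usable fraction of raw capacity is (N - M) / N, and reads can survive the loss of up to M drives. The function name, drive counts, and drive sizes are hypothetical and are not the bank's configuration; MinIO's actual defaults and quorum rules are described in its documentation.

```python
def erasure_profile(drives_per_set, parity_shards, drive_tb):
    """Usable capacity and read tolerance for one erasure set (simplified model)."""
    # Assumption: parity is at least 1 and at most half the drives in the set.
    if not 0 < parity_shards <= drives_per_set // 2:
        raise ValueError("parity must be between 1 and half the drives in the set")
    data_shards = drives_per_set - parity_shards
    raw_tb = drives_per_set * drive_tb
    return {
        "data_shards": data_shards,
        "raw_tb": raw_tb,
        "usable_tb": raw_tb * data_shards / drives_per_set,
        "storage_efficiency": data_shards / drives_per_set,
        "drives_lost_with_reads_ok": parity_shards,
    }


# Hypothetical set: 16 drives of 8 TB each with 4 parity shards.
profile = erasure_profile(drives_per_set=16, parity_shards=4, drive_tb=8)
print(profile["usable_tb"])  # 96.0
```

Raising parity buys more failure tolerance at the cost of usable capacity, which is why the policy is worth choosing deliberately for each workload.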
By combining MinIO's fast object storage with Dremio's accelerated query engine, the bank established a best-in-class analytics stack. Together, these technologies minimized query latency by reducing unnecessary data movement and I/O. Rather than settle for generic storage, the bank chose a system explicitly designed for intensive analytics. This strategic pairing of Dremio and MinIO enabled unparalleled performance at petabyte scale.
Best Practice
When evaluating storage, carefully analyze the architectural attributes optimized for target workloads. For demanding analytics use cases, MinIO's purpose-built design delivers blazing performance.
Lesson 4: Maximize Performance Through Strategic Data Locality
To optimize query performance, the bank strategically deployed Dremio and MinIO together on the same physical servers. This topology maximized data locality between the query engine and the storage layer. With both technologies colocated on each node, Dremio could leverage MinIO's parallel architecture to achieve high throughput across local SSDs. Rather than move data across the network, computations occurred alongside the stored data.
MinIO's ability to saturate available disk bandwidth enabled fast parallel execution of queries and data scans. At the same time, Dremio's columnar optimization and caching capabilities minimized disk I/O for the most common queries. By maximizing data locality between MinIO and Dremio, the bank minimized network traffic and latency. This, along with the platform's native acceleration features, translated to blazing overall performance.
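A back-of-the-envelope model shows why locality matters. In the sketch below, a full parallel scan is limited by the slowest link between the data and the query engine: local disk bandwidth when compute and storage share a node, or the smaller of disk and network bandwidth when they do not. Every number here is a hypothetical placeholder rather than a measurement from the bank's cluster, and the model deliberately ignores caching, columnar pruning, and contention.

```python
def scan_seconds(dataset_gb, nodes, disk_gbps_per_node, network_gbps_per_node=None):
    """Estimate full-scan time in seconds; bandwidths are in GB/s per node.

    Pass network_gbps_per_node only for the non-colocated case, where each
    node's reads are capped by the slower of its disks and its network link.
    """
    per_node = disk_gbps_per_node
    if network_gbps_per_node is not None:
        per_node = min(disk_gbps_per_node, network_gbps_per_node)
    return dataset_gb / (nodes * per_node)


# Hypothetical cluster: 8 nodes, 10 GB/s of local SSD bandwidth per node,
# and a 25 Gb/s network link per node (about 3.125 GB/s).
colocated = scan_seconds(10_000, nodes=8, disk_gbps_per_node=10)
remote = scan_seconds(10_000, nodes=8, disk_gbps_per_node=10, network_gbps_per_node=3.125)
print(colocated, remote)  # 125.0 400.0
```

Under these assumed figures the network, not the drives, sets the ceiling once storage moves off the compute node, which is the effect the colocated topology is meant to avoid.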
Best Practice
Architect MinIO deployments to keep storage physically close to compute engines, maximizing data locality. This allows fast parallel query execution and caching to unlock the full performance potential.
The bank's analytics modernization journey highlights key lessons for organizations aiming to enable the next generation of data-driven insights. By adopting MinIO's cloud-native object storage, the bank overcame the constraints of legacy infrastructure to boost performance, ensure resilience, and unlock new innovations. To learn more about this bank and others, drop us a note at sales@min.io and we will show you how AIStor made it happen for the client.