Revolutionizing Life Sciences with AI and a Hybrid-Cloud Data Lakehouse
In the rapidly evolving TechBio industry, one innovative life sciences company puts technology at the forefront of its approach to drug discovery and experimentation. It operates one of the fastest privately owned supercomputers in the pharmaceutical sector, enabling unprecedented computational scale in biological research. Powered by a bespoke, AI-driven scientific operating system, the company aims to transform and accelerate drug discovery and development by integrating automation, machine learning, and large-scale biological data. It seeks to decode biology using computational tools and high-throughput experimentation to discover new treatments faster and more efficiently than traditional methods.
To achieve these goals, the organization needed a high-performance AI data lakehouse that could economically scale to dozens of petabytes, with seamless integration across its hybrid cloud infrastructure. That’s where MinIO AIStor came in.
Analyzing the Environment
This life sciences company uses automated laboratories to generate massive biological datasets that capture how human cells respond to various compounds and genetic modifications—a process known as phenomics. Its data science and research teams use AI models to analyze this data and identify patterns that may reveal potential drug candidates, disease mechanisms, or therapeutic targets. The company’s unique advantage lies in its internally developed, full-stack experimentation platform and its highly automated laboratory robotics workflows, which enable up to 2.2 million experiments each week. These generate high-integrity biological cell plate images and chemical data, managed in real time through custom-built data pipelines that clean, annotate, and prepare data for downstream analysis. Altogether, the company stores and manages over 20 petabytes of data.
The Challenge: Complexity, Scalability, and Cost
The organization’s previous network-attached storage (NAS) system struggled to meet growing scalability and performance needs. To compensate, the team relied heavily on public cloud infrastructure, which led to high operational and egress costs. The NAS system also lacked the robust feature set and operational simplicity needed for an environment running millions of weekly experiments. With a supercomputing cluster and lab systems generating terabytes of microscopy images daily, data had to move continuously between on-premises environments and the cloud, creating inefficiencies and cost burdens. The company required a hybrid cloud data lakehouse capable of supporting seamless movement between three critical environments: lab clusters, high-performance computing (HPC) systems, and public cloud infrastructure. Further, historical datasets spanning years of research needed to remain readily accessible across on-premises and multi-cloud environments for analysis and model retraining. The team also required vendor independence and the flexibility to choose hardware aligned with its evolving needs.
The Solution
After a rigorous evaluation of storage alternatives, the organization selected MinIO AIStor as its next-generation AI data lakehouse platform to address its scalability, performance, and cost challenges.
Petabyte-Scale Performance
By adopting AIStor, the company scaled efficiently to tens of petabytes of data while reducing both latency and cloud egress costs. These gains significantly improved machine learning pipeline efficiency and accelerated HPC and AI/ML workloads across hybrid environments.
Seamless Hybrid Cloud Storage
AIStor’s S3 compatibility enables standardized data access across on-premises and cloud environments, allowing researchers to train foundation models wherever GPU resources are available. This flexibility maximizes computational efficiency, enabling more experiments with more accurate results and ultimately improving scientific and business outcomes.
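To make the idea of standardized access concrete, here is a minimal Python sketch of how the same read path could target either an on-premises S3-compatible store or a public cloud bucket. It is an illustration, not the company's actual pipeline: the use of boto3 as the client library is an assumption, and the endpoint URL, target names, bucket, and key are all hypothetical. Credentials are assumed to come from the environment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StorageTarget:
    """One S3-compatible storage location (all values here are hypothetical)."""
    name: str
    endpoint_url: Optional[str]  # None means "use the SDK's default AWS endpoint"


# Hypothetical targets: an on-premises S3-compatible endpoint and a public cloud bucket.
TARGETS = {
    "onprem": StorageTarget("onprem", "https://aistor.example.internal:9000"),
    "cloud": StorageTarget("cloud", None),
}


def client_kwargs(target_name: str) -> dict:
    """Build the keyword arguments for boto3.client() for the chosen target."""
    target = TARGETS[target_name]
    kwargs = {"service_name": "s3"}
    if target.endpoint_url is not None:
        # The endpoint is the only setting that differs between environments.
        kwargs["endpoint_url"] = target.endpoint_url
    return kwargs


def make_client(target_name: str):
    """Create an S3 client for the target (boto3 is assumed to be installed)."""
    import boto3  # imported lazily so the helpers above work without it
    return boto3.client(**client_kwargs(target_name))


def fetch_object(client, bucket: str, key: str) -> bytes:
    """Read one object; the call is identical for every S3-compatible target."""
    return client.get_object(Bucket=bucket, Key=key)["Body"].read()
```

A training job could then call `fetch_object(make_client("onprem"), "plates", "run-001/well-A1.tiff")` next to an on-premises GPU cluster and switch the target name to `"cloud"` when running in the public cloud, leaving the rest of the code unchanged.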
Software-Defined Flexibility
The team valued AIStor’s vendor-agnostic, hardware-independent architecture, which allows them to continuously integrate new compute and storage technologies without platform lock-in. This preserves their ability to scale innovation and extract maximum value from their proprietary research platform. Today, the organization relies on AIStor to store and manage continuous streams of microscopy images, sequencing data, and experimental results. This data is replicated and moved between HPC clusters and hybrid cloud environments to support ongoing machine learning, disease identification, and drug discovery.
How AIStor Helps Organizations Monetize Their Data Using AI
AIStor enables organizations to store, manage, and access petabyte-scale enterprise data critical to research, analysis, and discovery. With enterprise-grade performance and flexibility, AIStor eliminates the need for custom-built storage architectures, allowing research teams to focus on innovation rather than infrastructure.
