In November of 2023, Amazon announced the S3 Connector for PyTorch. The Amazon S3 Connector for PyTorch provides implementations of PyTorch's dataset primitives (Datasets and DataLoaders) that are purpose-built for S3 object storage. It supports map-style datasets for random data access patterns and iterable-style datasets for streaming sequential data access patterns.
The S3 Connector for PyTorch also includes
Read more
Earlier this month, Amazon held their re:Invent conference in Las Vegas, Nevada, from December 1st to 5th - a 5-day event. If you have never been to a re:Invent conference, then the word that describes it best is “huge” - not just in terms of the number of attendees (60,000) but also the breadth of topics covered.
Read more
In November of 2023, Amazon announced the S3 Connector for PyTorch. The Amazon S3 Connector for PyTorch provides implementations of PyTorch's dataset primitives (Datasets and DataLoaders) that are purpose-built for S3 object storage. It supports map-style datasets for random data access patterns and iterable-style datasets for streaming sequential data access patterns.
In a previous post, I introduced the
Read more
How does Exness handle massive data volumes and demanding AI/ML workloads? By moving to an on-prem infrastructure powered by MinIO. From scaling their data lake to managing traffic peaks of 200 Gbps, MinIO supports their AI workflows, disaster recovery, and more.
Read more
Your DevOps Engineer’s customer should be your AI/ML Engineering Team. The DevOps Engineer is there to ease the friction points in infrastructure so AI/ML folks can focus on the task at hand. Any issues that come with the infrastructure should be the responsibility of the DevOps Engineer.
Read more
Almost a year ago (11 months ago, to be exact), I wrote about the “Starving GPU Problem”: Nvidia’s Graphics Processing Units (GPUs) can be so powerful that your network and your storage solution may not be able to keep up, preventing your expensive GPUs from being fully utilized. Well, in those short 11 months, a
Read more
As AI workloads drive cloud costs through the roof, many companies are rethinking their approach. Moving select AI tasks back on-prem offers a path to predictable costs, improved performance, and stronger data control.
Read more
Interoperability is the key to building a flexible, future-ready AI data stack. As proprietary systems lock down innovation and drive up costs, open tools like S3-compatible storage and multi-format table systems offer the freedom to scale and adapt.
Read more
MinIO recently surveyed 656 IT leaders as part of a primary research initiative with User Evidence. The results were very interesting and underscore the massive sea change we are seeing in the enterprise, both around the movement to object storage and the interest in using object storage as the primary building block for an organization’s AI initiatives. We will
Read more
Before diving into Amazon’s S3 Connector for PyTorch, it is worthwhile to introduce the problem it is intended to solve. Many AI models need to be trained on data that cannot fit into memory. Furthermore, many really interesting models being built for computer vision and generative AI use data that cannot even fit on the disk drive that comes
Read more
One of the biggest challenges facing organizations today for AI and data management is access to reliable infrastructure and compute resources. The Intel Tiber Developer Cloud is purpose-built for engineers who need an environment for proof-of-concepts, experimentation, model training, and service deployments. Unlike other clouds, which can be unapproachable and complex, the Intel Tiber Developer Cloud is simple and easy
Read more
AIStor is a foundational component for creating and executing complex data workflows. At the core of this event-driven functionality are MinIO bucket notifications using Kafka.
Read more
In this post, we’ll show you how to visualize cluster metrics in a web browser and set up alerting, so that you are notified when, for example, a drive needs to be replaced or runs out of space.
Read more
To ensure AI success, start by hiring a data engineer, not an AI/ML expert. Learn from our experience and find out why a strong data foundation—focused on object storage, data lakehouses, and optimized pipelines—is critical for scalable, efficient AI/ML workloads.
Read more
You’ve surely version controlled code in the past. But have you version controlled your data? Did you ever want to collaborate on large sets of data with various teams without committing a large chunk?
Read more
This post first appeared on The New Stack on July 29th, 2024.
Artificial Intelligence is in the middle of a perfect storm in the software industry, and now Mark Zuckerberg is calling for open-sourced AI.
Three powerful perspectives are colliding on how to control AI:
1. All AI should be open-source for sharing and transparency.
2. Keep AI closed-source and
Read more
In this post we explain how to use Splunk's advanced log analytics to help understand the performance of AIStor and the data under management.
Read more
The modern enterprise defines itself by its data. This requires a data infrastructure for AI/ML as well as a data infrastructure that is the foundation for a Modern Datalake capable of supporting business intelligence, data analytics, and data science. This is true whether an enterprise is behind, getting started, or using AI for advanced insights. For the foreseeable future, this
Read more
The team at Insight Partners just released their State of Enterprise Tech report for 2024. There is a lot to consume in the 60+ slides, but we cherry-picked the things that should be interesting to our audience, and frankly, there is a lot of interesting stuff.
I will leave the survey methodology stuff for you to consume, but
Read more