PostgreSQL Meets Object Storage: Access External Data in MinIO
One of the most exciting developments in data is the rise of lakehouse functionality across all major database vendors. Snowflake and SQL Server adopted it long ago, and now PostgreSQL is embracing this paradigm shift with pg_lakehouse, making it easier than ever to leverage modern datalakes for analytics, AI and beyond. It’s perhaps no coincidence that, as more traditional databases let you query data in object storage, AWS has elected to deprecate Amazon S3 Select. Many more entrants to the field can now offer this functionality, and more, to customers.
While greenfielding offers the thrill of customizing technology stacks to specific use cases, a complete rip-and-replace strategy is seldom feasible or sensible. Instead, the way forward lies in leveraging existing database technologies for compute while investing in world-class object storage. In this modern era, it’s data and storage that hold the true value, as query engines, though important, have become commoditized and interchangeable. pg_lakehouse makes this strategy possible for the many enterprises currently using PostgreSQL, allowing them to build for the future with a modern datalake without sacrificing their existing investments.
pg_lakehouse is an open-source extension developed by ParadeDB. This extension leverages PostgreSQL's existing foreign data wrapper capabilities, enhanced by integration with Apache DataFusion, to provide high-performance analytics over diverse data sources.
From SQL to Object Storage: The New Frontier
PostgreSQL has long supported foreign tables and extensions, allowing it to interact with external data sources. The new pg_lakehouse extension continues this tradition by enabling PostgreSQL to query data stored in object storage systems like MinIO. This isn't a mere add-on but an extension of PostgreSQL's existing capabilities, allowing users to treat external object stores as native tables within their database.
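To make the foreign-table idea concrete, here is a minimal sketch that generates the kind of DDL involved. The statement shapes (CREATE FOREIGN DATA WRAPPER, CREATE SERVER, CREATE FOREIGN TABLE with OPTIONS) are standard PostgreSQL syntax. Everything specific to the extension is an assumption made for illustration: the handler and validator names, the option keys (bucket, endpoint, path, extension), and the server, table, bucket and endpoint names. Check the ParadeDB documentation for the names your version of pg_lakehouse actually expects.

```python
from __future__ import annotations


def quote_literal(value: str) -> str:
    """Quote a string as a SQL literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def options_clause(options: dict[str, str]) -> str:
    """Render an OPTIONS (...) clause in standard PostgreSQL FDW syntax."""
    rendered = ", ".join(f"{key} {quote_literal(val)}" for key, val in options.items())
    return f"OPTIONS ({rendered})"


def build_foreign_table_ddl(
    bucket: str,
    endpoint: str,
    object_path: str,
    columns: list[tuple[str, str]],
) -> list[str]:
    """Return DDL statements that expose one CSV object as a foreign table.

    Column names and types are inserted as-is, so pass trusted values only.
    """
    column_list = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    # ASSUMPTION: option keys below are hypothetical; verify against ParadeDB docs.
    server_options = options_clause({"bucket": bucket, "endpoint": endpoint})
    table_options = options_clause(
        {"path": f"s3://{bucket}/{object_path}", "extension": "csv"}
    )
    return [
        "CREATE EXTENSION IF NOT EXISTS pg_lakehouse;",
        # ASSUMPTION: handler and validator names are placeholders.
        "CREATE FOREIGN DATA WRAPPER s3_wrapper "
        "HANDLER s3_fdw_handler VALIDATOR s3_fdw_validator;",
        f"CREATE SERVER minio_server FOREIGN DATA WRAPPER s3_wrapper {server_options};",
        f"CREATE FOREIGN TABLE trips ({column_list}) SERVER minio_server {table_options};",
    ]


if __name__ == "__main__":
    # Hypothetical bucket, endpoint and object key for a local MinIO deployment.
    for statement in build_foreign_table_ddl(
        bucket="analytics",
        endpoint="http://minio.local:9000",
        object_path="trips/2024.csv",
        columns=[("trip_id", "bigint"), ("fare", "numeric")],
    ):
        print(statement)
```

Once statements like these have been run, the foreign table can be queried with ordinary SELECT statements, which is what lets existing SQL workflows carry over unchanged.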
Paired with MinIO Enterprise Object Store, PostgreSQL lets users store vast amounts of data while integrating it with their existing SQL workflows. Data engineers can rejoice: PostgreSQL has become a query engine for object storage.
Why This Matters
In the modern data landscape, the ability to store and analyze data efficiently is paramount. On their own, traditional databases have limitations in scalability and flexibility, particularly when dealing with large datasets or diverse data formats.
The modern datalake architecture—combining the best of data lakes and data warehouses—addresses these challenges. By disaggregating compute and storage, this architecture allows enterprises to scale resources independently, optimizing both performance and cost. Additionally, modern datalakes support a wide range of AI/ML workloads, ensuring that data is always accessible, resilient, and secure, even in large, geographically distributed deployments.
PostgreSQL and MinIO Enterprise Object Store
Integrating PostgreSQL with MinIO’s Enterprise Object Store (EOS) provides a powerful foundation for building a modern data lake, offering features that ensure your data is scalable, secure, and highly performant.
- Query Across Data Sources with MinIO: Use the pg_lakehouse extension to directly query data stored in MinIO. CSV format is currently supported with S3-compatible object storage like MinIO. PostgreSQL can treat these files as native tables, enabling you to perform complex analytics without the need for data movement. ParadeDB has indicated that support for Iceberg will be available shortly, further expanding the versatility of your data lake.
- Enterprise-Grade Scalability: MinIO’s architecture is designed for massive scale, making it possible to manage exabytes of data effortlessly. MinIO uses a distributed, server-pool-based architecture that allows for horizontal scaling, meaning you can add more pools to increase capacity and performance without disruption. This design is ideal for handling the large-scale data needs of modern enterprises, ensuring that your infrastructure can grow alongside your data requirements.
- Advanced Security: Security is paramount in modern data architectures. MinIO EOS offers robust security features, including MinIO Enterprise KMS (Key Management System) for server-side encryption. EOS KMS ensures that your data is encrypted both at rest and in transit, maintaining the highest levels of data protection.
- High-Performance: MinIO's Enterprise Cache feature significantly enhances data access speeds by storing frequently accessed data closer to the application. This is especially beneficial to PostgreSQL queries, where reduced latency can lead to faster query execution, particularly for large datasets stored in the data lake. MinIO is the world's fastest object store: with more than 325 GiB/s on GET operations and 165 GiB/s on PUT operations using NVMe SSDs, it is the natural choice of object store to support PostgreSQL as a query engine.
- Streamlined Management with MinIO Console: The MinIO Enterprise Console provides an intuitive web-based interface for managing all your object storage in one place, including monitoring, user management, and policy enforcement. This ease of management is crucial when building a modern datalake as it allows administrators to efficiently oversee the storage layer from a single interface.
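To put the throughput figures quoted in the High-Performance item into perspective, the short sketch below converts them into a lower bound on transfer time for a dataset of a given size. It is back-of-the-envelope arithmetic only: it assumes the quoted rates are aggregate figures that a deployment sustains for the whole transfer, and it ignores network limits, file decoding and query execution. The 1 TiB dataset size is an arbitrary example.

```python
GIB = 1024 ** 3   # bytes in one gibibyte
TIB = 1024 * GIB  # bytes in one tebibyte

# Aggregate throughput figures quoted in this article (GiB per second).
GET_GIB_PER_S = 325
PUT_GIB_PER_S = 165


def transfer_seconds(size_bytes: int, throughput_gib_per_s: float) -> float:
    """Lower bound on transfer time: size divided by sustained throughput."""
    return size_bytes / (throughput_gib_per_s * GIB)


if __name__ == "__main__":
    dataset = 1 * TIB  # arbitrary example size
    print(f"read 1 TiB:  {transfer_seconds(dataset, GET_GIB_PER_S):.1f} s")
    print(f"write 1 TiB: {transfer_seconds(dataset, PUT_GIB_PER_S):.1f} s")
```

At those rates, reading 1 TiB takes roughly 3.2 seconds and writing it roughly 6.2 seconds, so in practice query time is more likely to be bounded by the compute side than by the object store.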
By leveraging these features of MinIO’s Enterprise Object Store, combined with PostgreSQL’s powerful capabilities, you will soon be able to build a secure and highly scalable modern datalake that meets the demands of today’s data-intensive environments. This setup not only enhances your analytics capabilities but also provides a robust foundation for future-proofing your data strategy, ensuring that your infrastructure can adapt to the evolving landscape of data management.
Getting Started with pg_lakehouse
The installation process is straightforward, with detailed setup instructions available in the official ParadeDB documentation. As an open-source project licensed under AGPL-3.0, pg_lakehouse encourages community contributions and ensures that the extension remains free and accessible, making it a valuable tool for organizations looking to modernize their data infrastructure with PostgreSQL and MinIO.
Continue to Build
The integration of lakehouse functionality into PostgreSQL via pg_lakehouse, combined with MinIO's robust object storage, offers a powerful solution for modern data needs. This move is not just about adding features but reflects a broader trend in the industry—one where data lakes and data warehouses converge to provide the best of both worlds. As more databases adopt similar functionality, the future of data analytics looks bright and more integrated than ever.
Whether you're a developer, data engineer, or machine learning engineer, now is the time to explore the possibilities of lakehouse architectures. With PostgreSQL and MinIO, you're not just keeping up with the times—you're leading the charge.