Document management is a core requirement for regulated institutions of all kinds: finance, telecom, healthcare, government and others. These institutions need to manage and retain an ever-growing number of documents, and regulatory guidelines often require these documents to be stored for a very long time (7-10 years).
Take, for example, KYC (Know Your Customer) documents. Anyone starting a relationship with a financial institution needs to provide KYC documents. The institution then stores these documents for as long as the relationship exists, and for several years after it ends.
Traditional document management systems are simply no longer able to provide the scale, cost efficiency and reliability that such use cases require. This has triggered the move to modern, cloud native solutions.
In this document we go through various challenges with traditional document management solutions and then look at the next generation solutions that some of our customers have developed.
Traditional document management systems are monolithic, closed systems. This means IT teams cannot take advantage of newer paradigms such as separation of concerns (e.g. microservices), API access, and disaggregated storage and compute. Adding features or scaling such systems simply means paying for new licenses.
Additionally, these systems are built as black boxes with little to no extensibility, so IT teams have to work around them to achieve their intended workflow.
Limited System Integration Capabilities
Traditional document management systems are inherently closed, with little to no API support for plugging into external systems or plugging in other tools. This creates vendor lock-in: the more documents a system manages, the harder it becomes to replace that system.
This is at odds with the modern approach of using native API integrations with other systems to build powerful features by composing tools and technologies.
Typical document management systems use databases combined with file systems. Both databases and file systems are well known to have difficulty handling large data volumes. The scalability challenges in the underlying database and file system invariably show up as the number of documents grows.
Documents are essentially unstructured blobs of varying formats and sizes, which makes them a natural fit for object storage platforms like MinIO. MinIO scales seamlessly and integrates through APIs with major modern data platforms to enable search, audit logging and other important features.
Here we propose a highly scalable approach to build a modern, cloud-native document store.
Let’s take a closer look at the architecture, its components and their interaction:
- Frontend: Modern JS-based frameworks like React, Angular, Vue, or Svelte provide a great starting point for building the frontend of enterprise applications like this. Because all of these frameworks are API-driven, it is easy to integrate them with special-purpose backend services such as identity and access management, databases and object storage.
- Text/Metadata Search: A key requirement for a document management system is the ability to search the whole document catalog for specific documents. This search may be based on document metadata (like owner-name, owner-id, document-type etc.), or on the contents of the document itself. Either way, document metadata can be sent to text search tools like MeiliSearch or Elastic.
Essentially, MinIO can be configured to send event notifications (with object metadata) to the text search platform. The search platform then holds all of the document metadata along with a link to the actual document in MinIO. The frontend can send the user's query to the text search platform and fetch the relevant document.
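As a rough illustration of the notification flow described above, the sketch below turns an S3-style bucket notification into documents ready for a search index. It assumes the payload follows the S3 event layout (`Records[].s3.bucket.name`, `Records[].s3.object.key`) and that user-defined metadata arrives under `userMetadata` with `X-Amz-Meta-` prefixed names. The function name and the output field names are hypothetical, and the call that pushes the result to the search platform is left out.

```python
import hashlib
from urllib.parse import unquote_plus

# Prefix assumed for user-defined metadata names in the event payload.
META_PREFIX = "x-amz-meta-"


def search_documents_from_event(event: dict) -> list:
    """Convert one bucket notification payload into search index documents."""
    docs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        obj = record["s3"]["object"]
        # Object keys are URL-encoded in S3-style events.
        key = unquote_plus(obj["key"])
        user_meta = obj.get("userMetadata") or {}
        # Keep only user-defined metadata and strip the prefix,
        # e.g. "X-Amz-Meta-Owner-Id" becomes "owner-id".
        metadata = {
            name[len(META_PREFIX):].lower(): value
            for name, value in user_meta.items()
            if name.lower().startswith(META_PREFIX)
        }
        docs.append({
            # A hash gives a stable ID that is safe for any search engine.
            "id": hashlib.sha256(f"{bucket}/{key}".encode("utf-8")).hexdigest(),
            "bucket": bucket,
            "key": key,
            "link": f"s3://{bucket}/{key}",
            "event": record.get("eventName"),
            "size": obj.get("size"),
            "metadata": metadata,
        })
    return docs
```

Deriving the ID from the bucket and key means a re-delivered notification overwrites the same search document instead of creating a duplicate.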
Apart from simple text search, the object data and metadata from MinIO can be fed to machine learning pipelines, as explained in this blog post. This makes it possible to analyse not only the metadata but also the documents themselves, surfacing business insights for teams to leverage.
- Webhook: Customer documents are secure and private objects, so compliance and audit teams need to ensure that a detailed audit log is maintained for every document.
MinIO can be integrated with various target systems to send out audit logs. The MinIO webhook integration works with any platform that supports webhooks, which removes the dependency on message-queue systems.
Additionally, MinIO ensures that events missed while the remote webhook target is offline are delivered later, once the target comes back online. Events pending delivery are stored securely on MinIO, so the audit log does not miss any entries.
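To make the webhook side concrete, here is a minimal sketch of a receiver that accepts audit entries over HTTP POST and appends them to a local JSON Lines file. The entry field names (`time`, `requestID`, `remotehost`, and the nested `api` object) are assumptions about the audit payload, and the file path, port and function names are hypothetical. A production receiver would also need authentication and TLS.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

AUDIT_LOG_PATH = "audit.jsonl"  # hypothetical local sink for audit entries


def summarize_audit_entry(raw_body: bytes) -> dict:
    """Extract the fields an auditor typically needs from one audit entry."""
    entry = json.loads(raw_body)
    api = entry.get("api", {})  # assumed layout: API call details nested here
    return {
        "time": entry.get("time"),
        "request_id": entry.get("requestID"),
        "remote_host": entry.get("remotehost"),
        "operation": api.get("name"),
        "bucket": api.get("bucket"),
        "object": api.get("object"),
        "status_code": api.get("statusCode"),
    }


class AuditWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            summary = summarize_audit_entry(self.rfile.read(length))
        except (ValueError, AttributeError):
            # Body was not valid JSON or not a JSON object.
            self.send_response(400)
            self.end_headers()
            return
        with open(AUDIT_LOG_PATH, "a", encoding="utf-8") as sink:
            sink.write(json.dumps(summary) + "\n")
        # Reply 200 only after the write, so the sender never sees success
        # for an entry that was not persisted.
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Run the receiver until interrupted."""
    HTTPServer(("", port), AuditWebhookHandler).serve_forever()
```

Acknowledging only after the entry is written fits the delivery behaviour described above: if the receiver fails before responding, the sender can treat the delivery as failed and retry it later.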
- Object Storage: This is the backbone of the whole system. It provides scalable, persistent storage for documents (and their multiple versions), related metadata, user access policies and other information. MinIO object storage has several unique advantages over generic or legacy solutions:
ILM-based Tiering & Archival: The lifecycle management (ILM) feature automatically moves data from one storage tier to another (e.g. from hot or warm to archival) and can expire objects once they are no longer required. This lets IT teams keep only frequently accessed documents in the fast tier, while infrequently accessed documents move to an archival tier that can use cost-effective hardware and offer almost limitless storage.
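As an illustration, a tiering-plus-expiry policy can be expressed as an S3-style lifecycle configuration. The sketch below builds such a document with the Python standard library. It assumes the archival tier is referenced by name in the `StorageClass` element, and the prefix, tier name and day counts in the usage example are hypothetical values, not recommendations.

```python
import xml.etree.ElementTree as ET


def lifecycle_config(prefix: str, transition_days: int, tier: str,
                     expire_days: int) -> str:
    """Build an S3-style lifecycle rule: tier objects first, expire them later."""
    if expire_days <= transition_days:
        raise ValueError("expiry must come after the transition to the archival tier")
    root = ET.Element("LifecycleConfiguration")
    rule = ET.SubElement(root, "Rule")
    ET.SubElement(rule, "ID").text = f"archive-{prefix.strip('/')}"
    ET.SubElement(rule, "Status").text = "Enabled"
    # Apply the rule only to objects under the given prefix.
    ET.SubElement(ET.SubElement(rule, "Filter"), "Prefix").text = prefix
    # Move objects to the archival tier after transition_days.
    transition = ET.SubElement(rule, "Transition")
    ET.SubElement(transition, "Days").text = str(transition_days)
    ET.SubElement(transition, "StorageClass").text = tier
    # Delete objects once the retention period has passed.
    expiration = ET.SubElement(rule, "Expiration")
    ET.SubElement(expiration, "Days").text = str(expire_days)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical policy: archive after 90 days, expire after roughly 10 years.
    print(lifecycle_config("kyc/", 90, "ARCHIVE-TIER", 3650))
```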
Active-Active replication: High availability and protection from data loss are critical for document management systems. This is where MinIO Active-Active replication comes in: users can configure a MinIO cluster to replicate objects to a remote MinIO cluster. IT teams can therefore use a two-data-center (DC) approach to keep data safe from a complete DC failure.
Cohasset-certified object locking & object retention: MinIO object retention and locking capabilities are certified by Cohasset. This means IT teams can be confident that MinIO software is compliant with the requirements of the appropriate regulator.
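For a sense of how a retention rule translates into an upload, the sketch below computes the S3 object lock headers for a document that must be kept for a number of years after the customer relationship ends. The "relationship end plus N years" policy and the function names are assumptions made for illustration. The two header names are the standard S3 object lock headers, which only take effect on buckets created with object locking enabled.

```python
from datetime import datetime, timezone


def add_years(moment: datetime, years: int) -> datetime:
    """Add whole calendar years without ever shortening the period."""
    try:
        return moment.replace(year=moment.year + years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year: use 1 March.
        return moment.replace(year=moment.year + years, month=3, day=1)


def retention_headers(relationship_end: datetime, retain_years: int,
                      mode: str = "COMPLIANCE") -> dict:
    """Return the object lock headers to send with the document upload."""
    if mode not in ("COMPLIANCE", "GOVERNANCE"):
        raise ValueError("mode must be COMPLIANCE or GOVERNANCE")
    if relationship_end.tzinfo is None:
        raise ValueError("relationship_end must be timezone-aware")
    retain_until = add_years(relationship_end, retain_years).astimezone(timezone.utc)
    return {
        "x-amz-object-lock-mode": mode,
        "x-amz-object-lock-retain-until-date":
            retain_until.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```

Rounding 29 February forward rather than back is a deliberate choice here: a retention period that is a day too long is harmless, while one that is a day too short may not satisfy the regulator.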
Document versions: MinIO object versioning allows seamless storage of multiple versions of a document. This maps well to real-life situations where customers need to provide a new document because of issues with an older version.
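As a small sketch of how an application might work with document versions, the functions below order a version listing newest first and find the version that preceded the current one. They assume each listing entry is a dictionary with `Key`, `VersionId` and `LastModified` fields, in the style of an S3 version listing. The function names are hypothetical.

```python
from datetime import datetime, timezone


def version_history(listing: list, key: str) -> list:
    """Return every version of one document, newest first."""
    versions = [v for v in listing if v["Key"] == key]
    return sorted(versions, key=lambda v: v["LastModified"], reverse=True)


def previous_version_id(listing: list, key: str):
    """Return the ID of the version before the current one, if any."""
    history = version_history(listing, key)
    return history[1]["VersionId"] if len(history) > 1 else None
```

A reviewer could use `previous_version_id` to pull up the document a customer originally submitted and compare it with the corrected upload.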
Document management is not immune to advancements in cloud technologies, microservices and API-driven development. That said, compliance, and regulatory compliance in particular, does cause innovation to lag. The trend we are seeing today, however, is that major financial institutions are moving toward a model where object storage serves as the foundational element in KYC architectures.
Modern object storage is scalable, resilient and performant, and, in the case of MinIO, certified for object locking and retention.
We think this modern architecture is the optimal approach for regulated institutions in finance, telecom, healthcare and government.
This is why all ten of the largest banks in the US and eight of the ten largest banks in Europe run MinIO.
To learn more, reach out to us at email@example.com and we can go into more detail than we can share publicly.