Accelerating MongoDB Backup with MinIO Jumbo

MongoDB is one of the most widely used, if not the most widely used, distributed document databases. It is known for coupling a flexible document data model with powerful features such as support for ad-hoc queries, secondary indexing and real-time aggregations. Enterprises rely on MongoDB and its scale-out architecture to handle huge volumes of unstructured data and to build business and web applications that evolve quickly and scale transparently. The database is offered in several different editions, and today we’re going to focus on MongoDB Community Server.

No database should be put in production without a reliable, ironclad backup and recovery strategy in place. With MongoDB playing a critical role in the enterprise, it’s essential to back up the business data that it holds for business continuity and disaster recovery. However, backing up MongoDB databases is a time-consuming and resource-intensive process. This blog post explains how to back up MongoDB replica sets; according to the MongoDB documentation, sharded clusters require a lot more effort and coordination.

Mongodump is a small, simple MongoDB backup utility that creates BSON files from a database. BSON, short for Binary JSON, is a binary-encoded serialization of JSON-like documents. JSON is human-readable, whereas BSON is not. BSON encodes type and length information so machines can quickly parse the data within. BSON enjoys the support of multiple programming languages such as C, C++, C#, Java, JavaScript, PHP, Python, Ruby, and Swift. BSON is lightweight, which makes it practical to store very large amounts of data in BSON format, and BSON files are efficient when transmitted across a network and written to storage. It may be helpful to think of BSON the same way you think of other binary file formats. The workflow is to run mongodump and send the BSON output to MinIO in order to create a remote snapshot of your MongoDB replica set.
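
To make the point about type and length information concrete, here is a small shell sketch that hand-builds the BSON bytes for the one-field document {"a": 1} and reads back its length prefix. The function names bson_sample and bson_doc_length are made up for this illustration; mongodump does all of this for you.

```shell
#!/usr/bin/env bash
# Hand-build the BSON encoding of {"a": 1} (an int32 field) using printf octal escapes.
bson_sample() {
  # int32 total length (12, little-endian) | type 0x10 (int32) | key "a" + NUL
  # | int32 value 1 (little-endian) | NUL document terminator
  printf '\014\000\000\000\020a\000\001\000\000\000\000'
}

# Read the little-endian int32 length prefix of a BSON document from stdin.
bson_doc_length() {
  # od prints the first four bytes as unsigned decimals, e.g. "12 0 0 0"
  set -- $(od -An -tu1 -N4)
  echo $(( $1 + $2 * 256 + $3 * 65536 + $4 * 16777216 ))
}

bson_sample | bson_doc_length   # prints 12, the length the document declares
bson_sample | wc -c             # 12 bytes were actually written
```

Because the length comes first, a parser knows how many bytes to read before it has looked at a single field, which is what makes BSON quick to scan.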

The free mongodump utility is a convenient MongoDB backup solution, but it produces a single data stream from a single server. The consequence is that mongodump works great up to a point, but falls down when it needs to provide performance at scale. Slow backup performance decreases the effectiveness of an enterprise’s BC/DR strategy and prevents cloning a database for analytics and further development. When enterprises reach that point, they typically upgrade to the hosted or enterprise version or write their own scripts to back up using the underlying file system. But what if there were a way to parallelize the output of mongodump to achieve greater speed and scale?

Have no Fear, MinIO Jumbo is Here!

We developed MinIO Jumbo to upload and download very large streams, such as database backups, to/from MinIO. Jumbo accepts content via a STDIN pipe or reads it from storage and uploads it in parallel to MinIO Server. In the case of mongodump, Jumbo receives the piped stream from the source, fills concurrent buffers and uploads in parallel to the server. The default buffer size is 256 MiB, but you can configure the buffer size to meet your performance requirements. The contents of each buffer become an object that is uploaded to MinIO as a multipart upload.
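
The general idea of cutting one stream into fixed-size pieces and sending several pieces at once can be sketched with standard shell tools. This is only an illustration of the concept, not Jumbo’s implementation: it stages parts on disk rather than in memory buffers, a local directory stands in for the object store, and parallel_put is a hypothetical name.

```shell
#!/usr/bin/env bash
# Illustration only: split a stream into fixed-size parts, then "upload" parts in parallel.
# Usage: <producer> | parallel_put PART_SIZE_BYTES WORKERS DEST_DIR
parallel_put() {
  part_size=$1; workers=$2; dest=$3
  staging=$(mktemp -d)
  mkdir -p "$dest"
  # Cut stdin into parts of at most $part_size bytes: part.aa, part.ab, ...
  split -b "$part_size" - "$staging/part."
  # Copy up to $workers parts at the same time (cp stands in for a network upload).
  ls "$staging" | xargs -P "$workers" -I{} cp "$staging/{}" "$dest/{}"
  rm -rf "$staging"
}

bucket=$(mktemp -d)                       # stand-in for a bucket
printf 'abcdefghij' | parallel_put 4 2 "$bucket"
ls "$bucket"                              # part.aa part.ab part.ac
cat "$bucket"/part.*                      # abcdefghij
```

A single stream normally uploads only as fast as one connection allows; once it is cut into parts, each part can travel independently, which is where the speedup comes from.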

Combining MinIO Jumbo with a MinIO deployment creates a flexible, durable high-performance home for backups without limitation. While capable of supporting the most demanding workloads, MinIO is also exceptionally well suited as storage for backups because it is:

  • High-Performance: MinIO is capable of PUT throughput of 1.32 Tbps and GET throughput of 2.6 Tbps in a single 32-node NVMe cluster. This means that backup and restore operations run faster, decreasing the potential business impact of downtime.
  • Optimized across a range of object sizes: MinIO can handle any object size with aplomb. Because MinIO writes metadata atomically along with object data, it does not require a separate metadata database. This dramatically decreases response time for small file PUTs and GETs. Jumbo parallelizes large object uploads to use the network as efficiently as possible.
  • Inline and strictly consistent: All I/O is committed synchronously with inline erasure coding, bitrot hashing and encryption, making MinIO atomic and immediately consistent. The S3 API is resilient to disruption or restart during an upload or download, so a backup can’t disappear. Finally, there is no caching or staging of data, meaning that all backup operations are guaranteed to complete.
  • Built for commodity hardware: Commercial off-the-shelf hardware means that you’ll realize huge savings over purpose-built appliances. In today’s economic climate, you’ll get more bang for the buck as backup data grows into petabytes.

MinIO Jumbo and mongodump Tutorial

We will use mongodump and MinIO Jumbo to back up a local MongoDB database to a MinIO deployment.

Prerequisites

To follow this tutorial, you will need:

  1. MinIO Server running on bare metal or Kubernetes. If you’re not already running it, please install MinIO on bare metal or Kubernetes.
  2. MinIO Client (mc) to access MinIO Server. Here’s how to install mc locally.
  3. MongoDB Community so we can run MongoDB.
  4. MongoDB Shell to manage and work with MongoDB.

Install mongodump

Open this link to download MongoDB’s database tools. If you are using the Linux command line, click Copy Link to the right of the Download button. Then use a command line tool such as wget or curl to download the install package. Next, expand the file and copy the executables to a directory in your path.

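mkdir mongodbtools
cd mongodbtools
# Paste the link you copied from the download page in place of <Copied-Download-Link>
wget <Copied-Download-Link>
# Unpack the archive; the tools are extracted into a versioned directory
tar -zxvf mongodb-database-tools-*-100.7.0.tgz
# Copy only the executables, which live in the bin subdirectory, to a directory in your PATH
sudo cp mongodb-database-tools-*-100.7.0/bin/* /usr/local/bin/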

At this point, you could simply enter mongodump to make a backup and save it locally.
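
Before relying on that, it can help to confirm the tools are actually reachable on your PATH. Here is a minimal sketch; missing_tools is a hypothetical helper name, and the list of tools to check is up to you.

```shell
#!/usr/bin/env bash
# Print the name of every requested command that cannot be found on the PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Prints nothing when both tools are installed; otherwise lists what is missing.
missing_tools mongodump mongorestore
```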

Create a MongoDB Database

If you already have a MongoDB database, then you can skip this step. If, like me, you didn’t have MongoDB installed but you want to try out MinIO Jumbo, then please follow the steps in this section to create a MongoDB Database and populate it. These steps are based on Getting Started — MongoDB Manual.

$ mongosh
...
test> 
test> use examples
Switched to db examples
examples> db.movies.insertMany([ { title: 'Titanic', year: 1997, genres: ['Drama', 'Romance'], rated: 'PG-13', languages: ['English', 'French', 'German', 'Swedish', 'Italian', 'Russian'], released: ISODate("1997-12-19T00:00:00.000Z"), awards: { wins: 127, nominations: 63, text: 'Won 11 Oscars. Another 116 wins & 63 nominations.' }, cast: ['Leonardo DiCaprio', 'Kate Winslet', 'Billy Zane', 'Kathy Bates'], directors: ['James Cameron'] }, { title: 'The Dark Knight', year: 2008, genres: ['Action', 'Crime', 'Drama'], rated: 'PG-13', languages: ['English', 'Mandarin'], released: ISODate("2008-07-18T00:00:00.000Z"), awards: { wins: 144, nominations: 106, text: 'Won 2 Oscars. Another 142 wins & 106 nominations.' }, cast: ['Christian Bale', 'Heath Ledger', 'Aaron Eckhart', 'Michael Caine'], directors: ['Christopher Nolan'] }, { title: 'Spirited Away', year: 2001, genres: ['Animation', 'Adventure', 'Family'], rated: 'PG', languages: ['Japanese'], released: ISODate("2003-03-28T00:00:00.000Z"), awards: { wins: 52, nominations: 22, text: 'Won 1 Oscar. Another 51 wins & 22 nominations.' }, cast: ['Rumi Hiiragi', 'Miyu Irino', 'Mari Natsuki', 'Takashi Naitè'], directors: ['Hayao Miyazaki'] }, { title: 'Casablanca', genres: ['Drama', 'Romance', 'War'], rated: 'PG', cast: ['Humphrey Bogart', 'Ingrid Bergman', 'Paul Henreid', 'Claude Rains'], languages: ['English', 'French', 'German', 'Italian'], released: ISODate("1943-01-23T00:00:00.000Z"), directors: ['Michael Curtiz'], awards: { wins: 9, nominations: 6, text: 'Won 3 Oscars. Another 6 wins & 6 nominations.' }, lastupdated: '2015-09-04 00:22:54.600000000', year: 1942 }])

Enter this command to verify that the documents have been added to the collection:

db.movies.find( { } )

Now that we have data in the movies collection, we can back it up. Remember, if you already have a populated collection, you can back that up instead.

Install MinIO Jumbo

Email us at hello@min.io for the Jumbo binary.

Set variables for authentication

export JUMBO_ACCESS_KEY=<Your-MinIO-Access-Key> JUMBO_SECRET_KEY=<Your-MinIO-Secret-Key>
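
If you script your backups, a small guard can catch a missing credential before the upload starts. The sketch below assumes only the two variable names shown above; check_jumbo_env is a hypothetical helper and is not part of Jumbo.

```shell
#!/usr/bin/env bash
# Return non-zero and name the first variable that is unset or empty.
check_jumbo_env() {
  for name in JUMBO_ACCESS_KEY JUMBO_SECRET_KEY; do
    eval "value=\${$name:-}"      # look up the variable whose name is in $name
    if [ -z "$value" ]; then
      echo "missing $name" >&2
      return 1
    fi
  done
}

check_jumbo_env && echo "Jumbo credentials are set"
```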

We’re now ready to use Jumbo.

Back Up with mongodump and Jumbo

We’re going to back up with mongodump and then use Jumbo to quickly and efficiently store that backup in MinIO. Mongodump will back up to STDOUT and this output will be piped into Jumbo. Because we are using STDOUT, mongodump is limited to backing up a single collection from a single database at a time.

If you enter the following, you’ll see the contents of your collection fly by on your monitor as they’re written to STDOUT. This tells you that the MongoDB part of the equation is in order.

mongodump --db=examples --collection=movies --out="-"

Feel free to substitute your database and collection names.

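Create a bucket on your MinIO deployment to store the backups. Replace <Your-MinIO-Alias> with the mc alias you configured for your deployment:

mc mb <Your-MinIO-Alias>/backup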

Next, we’ll pipe the output from mongodump to Jumbo and write it to the MinIO bucket specified. Please note that the Jumbo version is likely to change over time.

mongodump --db=examples --collection=movies --out="-" | ./jumbo_0.1-rc2_linux_amd64 put http://<Your-MinIO-Address>:9000/backup/mongo-backup-1
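
For repeated backups, you may want to wrap that pipeline so each run gets its own object name and a mongodump failure is not silently ignored. The following bash sketch reuses the put command and URL layout shown above; backup_collection, MINIO_ENDPOINT and JUMBO_BIN are hypothetical names chosen for this example and are not Jumbo settings.

```shell
#!/usr/bin/env bash
# Path to the Jumbo binary (assumption: adjust to the version you received).
JUMBO_BIN=${JUMBO_BIN:-./jumbo_0.1-rc2_linux_amd64}

# Back up one collection to <MINIO_ENDPOINT>/backup/<db>-<collection>-<UTC timestamp>.
backup_collection() {
  db=$1; coll=$2
  stamp=$(date -u +%Y%m%dT%H%M%SZ)
  url="${MINIO_ENDPOINT}/backup/${db}-${coll}-${stamp}"
  # pipefail makes the pipeline fail when mongodump fails, not only when the upload does.
  # The subshell keeps the option from leaking into the calling shell.
  ( set -o pipefail
    mongodump --db="$db" --collection="$coll" --out="-" | "$JUMBO_BIN" put "$url" )
}

# Usage (assumes mongodump and the Jumbo binary are available):
#   MINIO_ENDPOINT=http://<Your-MinIO-Address>:9000 backup_collection examples movies
```

Without pipefail, the shell reports only the exit status of the last command in a pipeline, so a crashed mongodump followed by a successful upload of a truncated stream would look like a good backup.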

MinIO Jumbo Accelerates Backups

If you’re struggling to meet backup objectives, then you need MinIO Jumbo to parallelize and speed up writing backups to object storage. Jumbo allows you to use all available bandwidth to write the backup files to MinIO. Given that MinIO is the fastest object storage you can buy, and that Jumbo maxes out the network, this means that the only limitation to how fast you can back up is the source system itself.  

If you have any questions about Jumbo or want to download the binary, please ask us via email at hello@min.io.