Object storage in practice: Creating a reliable data store

In my previous post we covered the why and what of object storage: why a new storage paradigm is needed at all, and what it does to ease the problems of modern unstructured data. We saw how object storage lets you access objects at the application layer with simple API calls over HTTP(S), doing away with the overhead of handling files the traditional way. Let us now take a real-life use case and see how object storage can help keep your application’s unstructured data easy to retrieve, light on resources, and reliable.

Use Case

WordPress is one of the most used content management platforms. According to Wikipedia, over 60 million people have used WordPress to host their websites or blogs, and around 24% of the top 10 million websites on the Internet are based on WordPress. Naturally, WordPress websites are home to millions of images and videos: unstructured data in its crudest form.

Currently, if you run your own WordPress installation out of the box, all images, videos, and other uploads are stored in a folder on your server’s file system. This is probably fine if you have a few images and expect few visitors. But as the site grows and you add new posts (each generally including several videos and/or images), the server’s file system fills up, sometimes even slowing down the whole system.

If we could separate file storage from the web server, just as we generally separate the web server from the database server, it would free you from worries about server failure, data backup, and other overheads of managing data.

WordPress storage use case

So, the use case here is to override the default WordPress file upload process so that, instead of the usual save to the file system, putObject() is called to store the uploaded files in an object store. Later, when files need to be retrieved, getObject() can fetch them. This way you abstract away the storage details and focus on keeping your web server running.

Architecture

The use case above takes an overly simplified view of how files can be put to and fetched from the object server. In reality, things are a bit more complex. Applications need file metadata to make sense of a file. Metadata is generally a small chunk of structured data, i.e. a predefined set of fields such as author, upload timestamp, file type, file ID and so on. The structured nature of metadata makes it a good candidate for storage in the database.

So, files sit in the object store and metadata goes to the database. Here is how it generally works: when the application needs to upload a file, it creates the metadata and stores it in the database, along with putting the file in the object store. Later, when the file is needed, the application queries the database for the metadata and, based on that information, gets the file.
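The upload-then-lookup flow can be sketched in a few lines of JavaScript. This is a minimal illustration under stated assumptions, not WordPress code: two in-memory Maps stand in for the object store and the database, and the names objectStore, metadataDb, uploadFile, fetchFile, and the key scheme are all made up for the example.

```javascript
// In-memory stand-ins (assumptions for illustration only):
// a real setup would use an object storage server and a SQL database.
const objectStore = new Map(); // object key -> file contents
const metadataDb = new Map();  // file id -> metadata row
let nextId = 1;

// Upload: put the bytes in the object store, then record structured metadata.
function uploadFile(name, contents, author) {
  const id = nextId++;
  const objectKey = `${id}-${name}`; // hypothetical key scheme
  objectStore.set(objectKey, contents);
  metadataDb.set(id, {
    id,
    author,
    fileName: name,
    objectKey,
    uploadedAt: new Date().toISOString(),
  });
  return id;
}

// Retrieval: query the metadata first, then use it to locate the object.
function fetchFile(id) {
  const meta = metadataDb.get(id);
  if (!meta) return null; // unknown file id
  return { meta, contents: objectStore.get(meta.objectKey) };
}

const id = uploadFile('cat.jpg', Buffer.from('fake image bytes'), 'alice');
const result = fetchFile(id);
console.log(result.meta.fileName, result.contents.length);
```

The point of the split is that the application only ever asks the database "which object belongs to this file?"; where the bytes physically live is hidden behind the object key.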

WordPress database tables updated when a file is uploaded. Image credit: https://codex.wordpress.org/Database_Description

WordPress is no exception: it uses similar metadata-based file handling even when files are stored in the server’s file system. Each file uploaded to WordPress is treated as a post and has an ID assigned to it. In addition, several fields such as author, date edited, and title are updated when a file is uploaded. To be specific, the tables wp_posts and wp_postmeta are updated.

Going ahead with our plan to use object storage instead of file system storage in our WordPress installation, it makes sense to keep the metadata aspect of the files unchanged. We will just override the part where files are physically stored and retrieved from the local disk.

The implementation

Armed with this analysis, let us see how to create a WordPress plugin that overrides the file upload process. The plugin should ensure that files are uploaded to the object storage server, while metadata creation and storage remain unchanged. For the uninitiated, WordPress offers a great deal of flexibility via plugins; you can easily extend or modify a feature with your plugin code. Here is a detailed tutorial on how to create one.

To start with, you’ll need to call the WordPress function add_action(). It lets you trigger a PHP function (from your plugin) when a specific event happens. WordPress provides several events that plugin developers can use as hooks to trigger specific functions. I have used the admin_init hook for now. As you get a handle on the functionality the plugin requires and the hooks related to it, you can add other hooks.

add_action( 'admin_init', 'wp_minio' );

As you’d have guessed, wp_minio() is the function that will be triggered. Let’s see what it should look like. First of all, we’ll use the minio-js library to call the Minio fPutObject() API. To do that, wp_minio() can run a .js file with Node.js, passing the file path and the filename of the file being uploaded as parameters.

$execution_cmd = 'node fput-object.js ' . escapeshellarg( $file_location ) . ' ' . escapeshellarg( $file_name );

The fput-object.js file should handle the parameters passed to it and call the fPutObject() API with the required arguments. Read more about the fPutObject() API here. This will upload the file to your object storage server. What follows is the routine process of creating file metadata, saving it to the WordPress database, and creating file thumbnails. None of this is new; WordPress already does it. You can refer to the source code to see how it’s done.

The next part is to make sure the files on the object server are accessible during the normal functioning of WordPress. To do this, we can build the file URL by concatenating the object server endpoint, the bucket name, and the filename (provided the object permissions allow at least authenticated users to read it). You can get the bucket name using the listBuckets() API and the filename from the WordPress database. Just concatenate these with the endpoint to create the URL and let WordPress access the files seamlessly.
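The concatenation itself might be sketched as below. This assumes path-style URLs of the form endpoint/bucket/filename and that the object is readable by whoever requests it; the function name buildObjectUrl, the example endpoint, and the bucket name are hypothetical.

```javascript
// Build a URL of the form <endpoint>/<bucket>/<object name>.
function buildObjectUrl(endpoint, bucket, fileName) {
  const base = endpoint.replace(/\/+$/, ''); // drop trailing slashes
  // Encode each path segment so spaces and other special characters survive,
  // while keeping "/" separators inside names such as "2016/05/a.jpg".
  const objectPath = fileName.split('/').map(encodeURIComponent).join('/');
  return `${base}/${encodeURIComponent(bucket)}/${objectPath}`;
}

console.log(
  buildObjectUrl(
    'https://storage.example.com:9000/', // example endpoint (assumption)
    'wordpress-uploads',                 // example bucket (assumption)
    '2016/05/my photo.jpg'               // filename as stored in the database
  )
);
```

Encoding each segment separately, rather than the whole path at once, keeps date-based folder prefixes intact while still escaping characters that would otherwise break the URL.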

Conclusion

One of the key concerns of application developers is where to put all the files uploaded to their application. Until now this was generally a folder on the server’s file system. With open source object storage at their disposal, uploading and retrieving files is as easy as making simple API calls. In this post we saw how to implement object storage with metadata stored in tandem, creating a robust file storage mechanism that is easy to scale and manage in the long run.

We would love to know whether you already use object storage in your applications, or plan to, and how. Do let us know in the comments section!