Object storage: What is it all about?
Traditionally web applications used file systems and databases to store user data in the back-end. It was simple, structured data goes to the database and anything else goes to the file system. This was easy to manage, since rarely an application generated unstructured data — most of the applications took user input in forms and saved the data to database. However, times are changing now, with the advent of social media, cloud storage, data analytics platforms and other computing paradigms, more and more unstructured data is now being pushed on to the Internet.
The background
IDC conducted a study in 2014 that predicted the unstructured data created and copied all over the world to reach 44 zettabytes, i.e. 44 trillion gigabytes annually by 2020. This means a 10x increase from the 2013 figure of 4.4 zettabytes. If you are thinking this is a little too much, think about this — unstructured data already accounts for 90% of all of the digital data in 2015!
So, as with other computing paradigms, storage systems needed to evolve to be able to cater to this new wave of unstructured data that has hit the Internet. But before we move any further, let me define unstructured data for you. Data that can’t be organized for storage inside a relational database is generally termed as unstructured data. You can have textual or non textual unstructured data. Text documents, emails, presentations, etc. are examples of textual unstructured data. Examples of non textual unstructured data include videos, images, audio files etc.
Why object storage?
We now know that there is a lot of unstructured data being generated and it needs to be handled in an easy to access yet secure and reliable way. We already have a storage mechanism that people have been using since the start of modern computing, the file system. So, why do we need a whole new storage paradigm? The answer lies in the details. Let us close in a little and understand the requirements.
- When we talk about unstructured data and its scale, it is important to understand that the underlying system used to store data should scale very well. But scaling file systems is difficult. Not only you need to manage the (sometimes) unnecessary metadata and hierarchy that file systems impose on you, there is also other stuff like backup management to be taken care of.
- It is not enough to just collect unstructured data. You also need to apply some level of organization to be able to make sense of the data. Techniques like text analytics, auto categorization, auto tagging etc are crucial to get business sense from all the unstructured data that you collect. File systems with a fixed layout make this difficult to achieve.
- File systems aren’t made for HTTP(S) but humans. Sharing and managing files in a file system is difficult to handle programmatically (think of the cunning C/C++ file handling tricks that most of us struggled to understand well). Handling file streams and the possible boundary cases is error prone and takes lot of time and effort.
To bypass all this, something new was needed. Imagined from the scratch, keeping the new requirements in focus. This led to object storage.
Object storage
Unlike files in file systems, objects are stored in a flat structure. There is just a pool of objects — no folders, directories or hierarchies. You simply ask for a given object by presenting its object ID. Objects may be local or on a cloud server thousands of miles away, but since they are in a flat address space, they are retrieved exactly the same way.
Another important aspect is metadata handling. Object storage provides great deal of flexibility while storing object metadata. This means metadata is not limited to what storage system thinks is important (think of fixed metadata in file systems). You can manually add any type or amount of metadata. For example, you can assign metadata such as the type of application the object is associated with; the importance of an application; the level of data protection you want to assign to an object; whether you want this object replicated to another site or sites; when to move this object to a different tier of storage or to a different geography; when to delete this object. And so on, the possibilities are limitless.
It is very important for files to be accessible via HTTP(S). Only if the file is easily accessible, it can be subjected to analytics or other techniques. Object storage handles this well. Almost all the platforms offering object storage have REST APIs available to help you access the files via HTTP(S). Not only the APIs are helpful in accessing data, they also help you authenticate, get the file properties and manage permissions — everything you’d need to do manually in a file system.
Conclusion
Now that the majority of data on the Internet is unstructured and pundits are predicting a double digit growth in this trend, it is important to take this challenge head on. Not only unstructured data be stored in an easy to access manner, it is important to be able to make business sense from the unstructured data collected over time.
Object storage holds a promise to help you achieve all this and more. With HTTP(S) access, flexible metadata, and flat storage model, it has everything that you need to handle the unstructured data wave.
An opportunity presents itself as a challenge at first. Only if you have a solution to the challenge, it will become your opportunity. Object storage paradigm is your solution to the unstructured data challenge. Go convert this challenge into your opportunity!