High Volume Binary Storage (Buhler, Erl, Khattak)

How can a variety of unstructured data be stored in a scalable manner such that it can be randomly accessed based on a unique identifier?

Problem

Storing very large amounts of unstructured data in traditional database technologies not only incurs a performance penalty but also suffers from scalability issues as the amount of data increases.

Solution

Unstructured data is stored using a simple cluster-based storage technique in which data units are accessed via keys.
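
The pattern does not prescribe how data units are spread across a cluster. As a minimal sketch, assuming simple hash partitioning, each key can be hashed to pick the node that holds its value. The `Cluster` class, its methods and the in-memory dictionaries standing in for nodes are all hypothetical.

```python
import hashlib
from typing import Dict, List


class Cluster:
    """Hypothetical cluster; each node is modelled as an in-memory dict."""

    def __init__(self, node_count: int) -> None:
        self.nodes: List[Dict[str, bytes]] = [{} for _ in range(node_count)]

    def _node_for(self, key: str) -> Dict[str, bytes]:
        # Assumption: hash the key and take it modulo the node count.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return self.nodes[index]

    def put(self, key: str, value: bytes) -> None:
        self._node_for(key)[key] = value

    def get(self, key: str) -> bytes:
        return self._node_for(key)[key]


cluster = Cluster(node_count=4)
for i in range(100):
    cluster.put(f"file-{i}", bytes([i]))
```

Modulo placement remaps most keys whenever the node count changes, so real systems often use consistent hashing or a similar scheme instead.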

Application

A NoSQL-based Big Data storage technology is used that treats each data unit as binary data and provides access to it via a unique key such that each data unit can be retrieved, replaced or deleted individually.

A key-value NoSQL database is introduced within the Big Data platform. Such a database generally provides API-based access for inserting, selecting and deleting data but does not support partial updates, because the database has no knowledge of the internal structure of the data it stores. It is well suited to storing large amounts of data in raw form because every data unit is stored as a binary object. A key-value NoSQL database can also be used where the use case involves high-speed read and write operations.
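
The access model described above can be sketched with a small in-memory stand-in. The `BlobStore` class and its `put`, `get` and `delete` methods are hypothetical names chosen for illustration, not the API of any particular product.

```python
from typing import Dict, Optional


class BlobStore:
    """Hypothetical in-memory stand-in for a key-value NoSQL database."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # Insert or replace the whole value; the store never looks inside it.
        self._data[key] = bytes(value)

    def get(self, key: str) -> Optional[bytes]:
        # Select by key only; there is no query on the value's contents.
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        # Remove the data unit and report whether the key existed.
        return self._data.pop(key, None) is not None


store = BlobStore()
store.put("doc:1", b"\x00\x01raw bytes")

# No partial update: the client reads the whole value, changes it and writes it back.
current = store.get("doc:1")
store.put("doc:1", current + b"\x02")
```

Because the store treats values as opaque bytes, even a one-byte change requires replacing the entire data unit.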

Apart from a generic disk-based key-value NoSQL database, a memory-based storage device, such as a memory grid that provides key-value storage, can also be used to gain the same functionality with the added benefit of low-latency data access.

Note that applying the High Volume Binary Storage pattern delegates the responsibility for interpreting the data (serialization/deserialization) to the client that reads it. Hence, a client can successfully read the data only if it knows the nature of the data being stored. Also, because access is possible only via the key, a logical key naming convention may need to be implemented for quick retrieval of the required data units.
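
A minimal sketch of both points, assuming JSON as the serialization format and a `<category>/<date>/<id>` key layout. Both choices, along with the `make_key`, `serialize` and `deserialize` helpers, are illustrative assumptions, and a plain dictionary stands in for the database.

```python
import json
from typing import Dict

store: Dict[str, bytes] = {}  # stand-in for the key-value database


def make_key(category: str, date: str, item_id: str) -> str:
    # Hypothetical naming convention: <category>/<date>/<id>
    return f"{category}/{date}/{item_id}"


def serialize(record: dict) -> bytes:
    # The writing client turns the record into bytes before storing it.
    return json.dumps(record, sort_keys=True).encode("utf-8")


def deserialize(blob: bytes) -> dict:
    # The reading client must know the bytes are UTF-8 encoded JSON.
    return json.loads(blob.decode("utf-8"))


key = make_key("sensor-readings", "2024-01-15", "device-42")
store[key] = serialize({"temperature": 21.5, "unit": "C"})

# The database only sees bytes; interpretation happens in the client.
reading = deserialize(store[key])
```

A predictable key layout lets a client construct the key for a data unit directly, since the database offers no way to search by content.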

High Volume Binary Storage: A contemporary database solution is implemented that supports scaling out and stores data as a binary large object (BLOB) that can be accessed based on an identifier.

  1. A user tries to import a very large binary file into a key-value NoSQL database.
  2. The operation succeeds and the database assigns a key to the stored file.
  3. The user later requests the data from the database using the same key.
  4. The previously stored, very large binary file is returned to the user in its original format.
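
The four steps above can be sketched as follows, assuming an in-memory stand-in for the database and a 1 MiB random payload in place of a very large file. The `BlobStore` class, its `store` and `fetch` methods and the use of a UUID as the assigned key are hypothetical.

```python
import os
import uuid
from typing import Dict


class BlobStore:
    """Hypothetical stand-in for a key-value NoSQL database."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def store(self, blob: bytes) -> str:
        # Assumption: the database generates the key, here a random UUID.
        key = uuid.uuid4().hex
        self._data[key] = blob
        return key

    def fetch(self, key: str) -> bytes:
        return self._data[key]


db = BlobStore()
original = os.urandom(1024 * 1024)  # 1 MiB stand-in for a very large file

key = db.store(original)          # steps 1 and 2: import the file, receive a key
returned = db.fetch(key)          # step 3: request the data with the same key
unchanged = returned == original  # step 4: the bytes come back unmodified
```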