File-systems Vs Databases

By Jaspreet on January 25th, 2009 under Data Protection, Technology & Innovation

This topic has been on my plate for some time now. It’s interesting to see how databases have come a long way and have clearly overshadowed file-systems for storing structured or unstructured information.

Technically, both of them support the basic features necessary for data access. For example, both of them -

  • Manage data to ensure its integrity and quality
  • Allow shared access by a community of users
  • Use a well-defined schema for data access
  • Support a query language

But file-systems seriously lack some of the critical features necessary for managing data. Let’s take a look at some of these features.

Transaction support
Atomic transactions guarantee that an operation either succeeds completely or fails completely, with nothing left half-done. This is especially needed when there is concurrent access to the same data-set. It is one of the basic features provided by practically all databases.

But most file-systems don’t have this feature. Only the lesser-known ones - Transactional NTFS (TxF), Sun ZFS, Veritas VxFS - offer some form of it. Most of the popular open-source file-systems (including ext3, XFS and ReiserFS) use journaling to protect their own metadata, but expose no transaction interface to applications.
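To make the all-or-nothing behaviour concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `accounts` table, the `transfer` helper and the names in it are made up for illustration; the point is only that the credit is undone when the debit in the same transaction fails.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return True on success."""
    try:
        # `with conn` commits if the block succeeds and rolls back if it raises.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # This debit violates the CHECK constraint when funds are short,
            # so the credit above must be undone as well.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

# Hypothetical schema, used only for this example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # fails and is rolled back
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

Doing the same with two plain files would mean writing your own journal and recovery logic, which is exactly the work the database does for you.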

Fast Indexing
Databases allow indexing based on any attribute or data-property (i.e. SQL columns). This enables fast retrieval of data based on the indexed attribute. This functionality is not offered by most file-systems, i.e. you can’t quickly access “all files created after 2PM today”.

Desktop search tools like Google Desktop or Mac OS X Spotlight offer this functionality. But to do so, they have to scan and index the complete file-system and store the information in an internal database of their own.
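The scan-and-index approach can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular search tool works: the `files` table and both helper functions are invented for the example, and it indexes modification time because creation time is not available portably.

```python
import os
import sqlite3
import tempfile

def build_index(root, conn):
    """Walk `root` once and record each file's path and modification time."""
    conn.execute("CREATE TABLE IF NOT EXISTS files (path TEXT PRIMARY KEY, mtime REAL)")
    # The index on mtime is what makes the time-based lookup fast.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_files_mtime ON files (mtime)")
    with conn:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                conn.execute(
                    "INSERT OR REPLACE INTO files VALUES (?, ?)",
                    (path, os.stat(path).st_mtime),
                )

def modified_after(conn, timestamp):
    """Return the names of indexed files modified after `timestamp`."""
    rows = conn.execute(
        "SELECT path FROM files WHERE mtime > ? ORDER BY path", (timestamp,)
    )
    return [os.path.basename(path) for (path,) in rows]

# Build a throwaway directory with known modification times.
with tempfile.TemporaryDirectory() as root:
    for name, mtime in [("old.txt", 1000), ("new.txt", 2000), ("newer.txt", 3000)]:
        path = os.path.join(root, name)
        open(path, "w").close()
        os.utime(path, (mtime, mtime))
    conn = sqlite3.connect(":memory:")
    build_index(root, conn)
    recent = modified_after(conn, 1500)

print(recent)
```

Note that the index is only as fresh as the last scan, which is why such tools have to keep re-crawling or watching the file-system.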

Snapshots
A snapshot is a point-in-time copy/view of the data. Snapshots are needed by backup applications, which require consistent point-in-time copies of data.

The transactional and journaling capabilities enable most databases to offer snapshots without stopping access to the data. Most file-systems, however, don’t provide this feature (ZFS and VxFS being the notable exceptions). Backup software has to depend on either the running application or the underlying storage for snapshots.
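As a small illustration, SQLite's online backup facility (exposed in Python 3.7+ as `Connection.backup`) produces a consistent copy while the source database stays open for use. It stands in here for the snapshot mechanisms of larger databases, which work differently internally; the `events` table is made up for the example.

```python
import sqlite3

# A "live" database that keeps receiving writes.
live = sqlite3.connect(":memory:")
live.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, note TEXT)")
live.executemany("INSERT INTO events (note) VALUES (?)", [("a",), ("b",)])
live.commit()

# Take a consistent point-in-time copy without closing the live database.
snapshot = sqlite3.connect(":memory:")
live.backup(snapshot)

# Writes made after the copy do not show up in it.
live.execute("INSERT INTO events (note) VALUES ('c')")
live.commit()

live_count = live.execute("SELECT COUNT(*) FROM events").fetchone()[0]
snap_count = snapshot.execute("SELECT COUNT(*) FROM events").fetchone()[0]
print(live_count, snap_count)
```

A backup tool can then read from the copy at its own pace while the application carries on writing.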

Clustering
Advanced databases like Oracle (and now MySQL) also offer clustering capabilities. The “g” in “Oracle 11g” actually stands for “grid”, i.e. its clustering capability. MySQL offers shared-nothing clusters using synchronous replication. This helps the databases scale and support larger and more fault-tolerant production environments.

Most file-systems still don’t support this option :( The notable exceptions are Veritas CFS and the open-source GFS.

Replication
Replication is a commodity feature in databases and forms the basis for disaster-recovery plans. File-systems still have to evolve to handle it.

Relational View of Data
File-systems store files and other objects only as a stream of bytes, and have little or no information about the data stored in the files. They also provide only a single way of organizing the files, namely via directories and file names. The associated attributes are limited in number too, e.g. type, size, author, creation time etc. This does not help in managing related data, as disparate items do not have any relationships defined.

Databases, on the other hand, offer easy means to relate stored data. They also offer a flexible query language (SQL) to retrieve the data. For example, it is possible to query a database for “contacts of all persons who live in Acapulco and sent emails yesterday”, but practically impossible to do so with a file-system alone.
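The Acapulco query can be written as a single join. The sketch below uses Python's sqlite3 module with a made-up `persons` and `emails` schema and sample rows; “yesterday” is passed in as a fixed date so the result is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and sample data, for illustration only.
conn.executescript("""
    CREATE TABLE persons (id INTEGER PRIMARY KEY, name TEXT, city TEXT, phone TEXT);
    CREATE TABLE emails  (id INTEGER PRIMARY KEY,
                          sender_id INTEGER REFERENCES persons(id),
                          sent_on TEXT);
    INSERT INTO persons VALUES (1, 'Ana',  'Acapulco', '555-0101'),
                               (2, 'Luis', 'Acapulco', '555-0102'),
                               (3, 'Mia',  'Pune',     '555-0103');
    INSERT INTO emails VALUES (1, 1, '2009-01-24'), (2, 1, '2009-01-24'),
                              (3, 2, '2009-01-20'), (4, 3, '2009-01-24');
""")

def contacts_who_mailed(conn, city, day):
    """Contacts of everyone living in `city` who sent an email on `day`."""
    rows = conn.execute(
        """SELECT DISTINCT p.name, p.phone
           FROM persons AS p
           JOIN emails AS e ON e.sender_id = p.id
           WHERE p.city = ? AND e.sent_on = ?
           ORDER BY p.name""",
        (city, day),
    )
    return rows.fetchall()

result = contacts_who_mailed(conn, "Acapulco", "2009-01-24")
print(result)
```

Luis lives in Acapulco but did not send mail that day, and Mia sent mail but lives elsewhere, so only Ana is returned. The relationship between people and emails is declared once in the schema rather than re-implemented by every application.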

File-systems need to evolve and provide capabilities to relate different data-sets. This will help the application writers to make use of native file-system capabilities to relate data. A good effort in this direction was Microsoft WinFS.

Conclusion

The only disadvantage of using databases as the primary storage option seems to be the additional cost involved. But I see no reason why file-systems won’t borrow features from databases in the future.

Disclosure

Druvaa inSync uses a proprietary file-system to store and index the backed up data. The meta-data for the file-system is stored in an embedded PostgreSQL database. The database-driven model was chosen to store additional identifiers with each block - size, hash and time. This helps the file-system to -

  1. Divide files into variable-sized blocks
  2. Data deduplication - Store a single copy of duplicate blocks
  3. Temporal File-system - Store time information with each block. This enables faster time-based restores.
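To give a feel for how these three ideas fit together, here is a toy sketch in Python. It is not the inSync implementation: SQLite stands in for the embedded PostgreSQL database, the rolling hash is deliberately simplistic, and the `blocks` / `file_blocks` tables and all function names are assumptions made for this example.

```python
import hashlib
import sqlite3

def split_blocks(data, mask=0x3F, min_size=16):
    """Yield variable-sized blocks, cutting where a toy rolling hash hits zero."""
    start, h = 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + byte) & 0xFFFFFFFF
        if i + 1 - start >= min_size and (h & mask) == 0:
            yield data[start:i + 1]
            start, h = i + 1, 0
    if start < len(data):
        yield data[start:]

def store_file(conn, name, data, stored_at):
    """Store one version of a file; duplicate blocks are kept only once."""
    with conn:
        for seq, block in enumerate(split_blocks(data)):
            digest = hashlib.sha256(block).hexdigest()
            conn.execute(
                "INSERT OR IGNORE INTO blocks VALUES (?, ?, ?)",
                (digest, len(block), block),
            )
            conn.execute(
                "INSERT INTO file_blocks VALUES (?, ?, ?, ?)",
                (name, stored_at, seq, digest),
            )

def restore_file(conn, name, stored_at):
    """Rebuild the version of `name` that was stored at time `stored_at`."""
    rows = conn.execute(
        "SELECT b.data FROM file_blocks AS f JOIN blocks AS b ON b.hash = f.hash "
        "WHERE f.name = ? AND f.stored_at = ? ORDER BY f.seq",
        (name, stored_at),
    )
    return b"".join(data for (data,) in rows)

def unique_blocks(conn):
    return conn.execute("SELECT COUNT(*) FROM blocks").fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE blocks (hash TEXT PRIMARY KEY, size INTEGER, data BLOB);
    CREATE TABLE file_blocks (name TEXT, stored_at INTEGER, seq INTEGER, hash TEXT);
""")

v1 = bytes(range(256)) * 8
v2 = v1 + b"appended tail"

store_file(conn, "report.bin", v1, stored_at=1)
unique_first = unique_blocks(conn)
store_file(conn, "report.bin", v1, stored_at=2)   # unchanged second backup
unique_second = unique_blocks(conn)
store_file(conn, "report.bin", v2, stored_at=3)   # modified third backup

print(unique_first, unique_second, unique_blocks(conn))
```

Backing up the unchanged file a second time adds only references, not blocks, and `restore_file(conn, "report.bin", 1)` rebuilds the first version by looking up its blocks by time.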

Comments

  • 1. Jacob  |  January 29th, 2009 at 2:12 pm

    Nice Information.

    You mean, you are running a full user-space file system just for backups ?

  • 2. Jaspreet  |  February 2nd, 2009 at 3:39 am

    Jacob,

    Yes, we are doing that for better “restores” :)

    Jaspreet
