Green-ness of Data De-duplication

The Storage Hunger

Sales of disk-based storage systems had already crossed 2,500 petabytes in 2008, up 58.1% year over year (one petabyte = 1 million GB). These figures do not include the direct-attached storage that comes pre-loaded with PCs or servers.[1]

This is understandable, as 1 TB (1000 GB) NAS/SAN storage devices are now a commodity. The top three vendors in this space are HP, IBM and EMC, with market shares of approximately 29%, 20% and 14% respectively.[2]

The overall consumption doubles when this storage is backed up :)

Energy Consumption

On average, a datacenter consumes 100 watts per square foot, and the best solid-state storage consumes about 5 watts for 1MB IOPs.[3]

This puts the total cost of maintaining (cooling + power) a 1 TB disk array at about USD 2,500 annually (at 16 cents per kWh, and 20 GB average daily usage).

This makes the annual energy cost of newly bought storage about USD 5 billion!

And backing up this 5-billion-dollar inventory surely adds a couple of billion more.
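
The arithmetic behind the storage energy figure can be sketched in a few lines. This is only a back-of-the-envelope check using the numbers quoted in this post; the constant and function names are made up for illustration. A straight multiplication comes out at roughly USD 6.25 billion, the same order of magnitude as the USD 5 billion quoted here; the post does not spell out which rounding or utilization assumption produces the lower number.

```python
# Figures quoted in the post; the names are illustrative only.
SHIPPED_PETABYTES = 2500       # disk-based storage sold in 2008
TB_PER_PB = 1000               # decimal units, as in "1 PB = 1 million GB"
COST_PER_TB_YEAR_USD = 2500    # cooling + power for a 1 TB disk array


def annual_energy_cost_usd(petabytes, cost_per_tb_year=COST_PER_TB_YEAR_USD):
    """Annual cooling + power cost in USD for the given capacity."""
    return petabytes * TB_PER_PB * cost_per_tb_year


if __name__ == "__main__":
    total = annual_energy_cost_usd(SHIPPED_PETABYTES)
    print(f"Estimated annual energy cost: USD {total / 1e9:.2f} billion")
```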

Data De-duplication

Data de-duplication technology stores a single copy of duplicate data. There are two important aspects of any data de-duplication solution/product:

  1. Scope of duplicate discovery - File-level / Sub-File level / Block level
  2. Point of duplicate discovery - Source / Target
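
To make the "scope" aspect concrete, here is a minimal sketch comparing file-level and block-level duplicate discovery. It assumes fixed-size 4 KB blocks and SHA-256 fingerprints, which are illustrative choices and not a description of how any particular vendor does it; sub-file schemes that use variable-size chunks are not shown.

```python
import hashlib


def file_level_stored_bytes(files):
    """Bytes kept when whole files with identical content are stored once."""
    unique = {}
    for data in files:
        unique.setdefault(hashlib.sha256(data).digest(), len(data))
    return sum(unique.values())


def block_level_stored_bytes(files, block_size=4096):
    """Bytes kept when identical fixed-size blocks are stored once."""
    unique = {}
    for data in files:
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            unique.setdefault(hashlib.sha256(block).digest(), len(block))
    return sum(unique.values())


if __name__ == "__main__":
    base = b"A" * 4096 + b"B" * 4096
    edited = b"A" * 4096 + b"C" * 4096   # shares its first block with `base`
    files = [base, base, edited]
    print("raw bytes:        ", sum(len(f) for f in files))
    print("file-level dedup: ", file_level_stored_bytes(files))
    print("block-level dedup:", block_level_stored_bytes(files))
```

File-level discovery only catches the exact copy of `base`, while block-level discovery also catches the block that `edited` has in common with it.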

Most storage vendors that use data de-duplication provide block-level duplicate removal at the target (i.e. when the data reaches the storage). But it's not very difficult to imagine that source-level removal of sub-file or block-level duplicates would be much better, for two reasons:

  1. Sending less (de-duplicated) data saves time and bandwidth (apart from storage)
  2. Duplicate discovery would be much better, as you have access to the structured data
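
The bandwidth argument can be sketched as a toy exchange: the source fingerprints its blocks, asks the target which fingerprints it lacks, and ships only those blocks. Everything here (the `Target` class, `backup_from_source`, the fixed 4 KB block size) is hypothetical and kept in memory for illustration; it is not Druvaa's or any vendor's actual protocol, and it ignores the small cost of sending the fingerprints themselves.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size


def split_blocks(data, block_size=BLOCK_SIZE):
    """Cut data into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


class Target:
    """Hypothetical in-memory stand-in for the backup store."""

    def __init__(self):
        self.blocks = {}  # digest -> block bytes

    def missing(self, digests):
        """Return the digests the store does not hold yet."""
        return {d for d in digests if d not in self.blocks}

    def put(self, digest, block):
        self.blocks[digest] = block


def backup_from_source(data, target):
    """Send only the blocks the target lacks; return payload bytes sent."""
    blocks = {hashlib.sha256(b).digest(): b for b in split_blocks(data)}
    sent = 0
    for digest in target.missing(blocks):
        target.put(digest, blocks[digest])
        sent += len(blocks[digest])
    return sent


if __name__ == "__main__":
    target = Target()
    first = b"A" * 4096 + b"B" * 4096
    second = b"A" * 4096 + b"C" * 4096
    print(backup_from_source(first, target))   # both blocks are new
    print(backup_from_source(second, target))  # only the "C" block is new
```

With target-side de-duplication the second backup would still push all 8192 bytes over the wire before the duplicate block is discarded; here only the changed block travels.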

Considering Microsoft’s report on de-duplication assessment [4]:

  1. 20-30% data duplicates are easily visible even in unstructured data sources like ERP databases
  2. 40-80% data duplicates can be seen in file-servers and mail servers.
  3. 60-90% data duplicates can be seen between different PCs. (Just my observation and opinion)

On average, even a conservative 30% duplicate removal can save $1.6B on storage energy and $2B on bandwidth and backup costs.
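
As a rough check on the storage energy saving, the sketch below assumes the energy bill scales linearly with the amount of data stored, which is a simplification. Applied to the USD 5 billion figure from earlier in this post, a 30% reduction gives about USD 1.5 billion, close to the $1.6B quoted. The $2B bandwidth and backup figure cannot be derived from the numbers in this post, so it is not computed here. The function name is illustrative.

```python
def savings_usd(annual_cost_usd, duplicate_fraction):
    """Cost avoided if `duplicate_fraction` of the stored data is removed.

    Assumes cost is proportional to the amount of data stored.
    """
    if not 0.0 <= duplicate_fraction <= 1.0:
        raise ValueError("duplicate_fraction must be between 0 and 1")
    return annual_cost_usd * duplicate_fraction


if __name__ == "__main__":
    saved = savings_usd(5e9, 0.30)
    print(f"Storage energy saved: USD {saved / 1e9:.1f} billion")
```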


De-duplication and Druvaa

We see Druvaa inSync as a product/platform that provides de-duplicated (at source) backup for PCs, PDAs and servers. The current version is available only for PCs, and we can easily see up to 90% savings in time and cost (bandwidth and storage) for enterprises.

I just don’t see a reason why all storage and backup vendors wouldn’t do it. EMC and NetApp have already announced de-duplication as an additionally licensable technology on their arrays (target-based).[5] No major vendor except EMC has announced agent/source-based de-dup, though.[6]

Surely, Druvaa has a good lead and is cashing in on it :)

September 20th, 2008

