Green-ness of Data De-duplication
By Jaspreet on September 20th, 2008 under Data Protection, Technology & Innovation
The Storage Hunger
Sales of disk-based storage systems have already crossed 2,500 petabytes in 2008, up 58.1% year over year (one petabyte = 1 million GB). These figures do not include the direct-attached storage that comes pre-loaded with PCs or servers.[1]
This is understandable, as 1 TB (1,000 GB) NAS/SAN devices are now a commodity. The top three vendors in this space are HP, IBM and EMC, with market shares of approximately 29%, 20% and 14% respectively.[2]
The overall consumption doubles when this storage is backed up.
Energy Consumption
On average, a datacenter consumes 100 watts per square foot, and the best solid-state storage consumes about 5 watts for 1MB IOPs.[3]
This puts the total cost of maintaining (cooling + power) a 1 TB disk array at about USD 2,500 per year (assuming 16 cents per kWh and 20 GB average daily usage).
This makes the annual energy cost of newly bought storage about USD 5 billion!
And backing up this 5-billion-dollar inventory surely adds a couple of billion more.
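To make the arithmetic behind that estimate explicit, here is a minimal sketch that multiplies the figures quoted above. The function and constant names are illustrative only, and it assumes 1 PB = 1,000 TB. The straight product comes to about USD 6.25 billion, the same order of magnitude as the USD 5 billion quoted; the post does not say which rounding or utilization assumption gives the lower number.

```python
# Back-of-the-envelope check using only the figures quoted in this post.
PETABYTES_SOLD_2008 = 2500          # disk-based storage sold in 2008
TB_PER_PB = 1000                    # assumption: decimal units
COST_PER_TB_PER_YEAR_USD = 2500     # cooling + power for a 1 TB array


def annual_energy_cost_usd(petabytes: int,
                           cost_per_tb: int = COST_PER_TB_PER_YEAR_USD) -> int:
    """Yearly power and cooling cost for the given capacity."""
    return petabytes * TB_PER_PB * cost_per_tb


total = annual_energy_cost_usd(PETABYTES_SOLD_2008)
print(f"{total / 1e9:.2f} billion USD per year")
```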
Data De-duplication
Data de-duplication technology saves a single copy of duplicate data. There are two important aspects of any data de-duplication solution or product:
- Scope of duplicate discovery - File-level / Sub-File level / Block level
- Point of duplicate discovery - Source / Target
Most storage vendors that use data de-duplication provide block-level duplicate removal at the target (i.e. when the data reaches the storage). But it's not very difficult to imagine that source-level removal of sub-file or block-level duplicates would be much better, for two reasons:
- Sending less (de-duplicated) data saves time and bandwidth (apart from storage)
- Duplicate discovery would be much better, as you have access to the structured data
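The idea of block-level de-duplication at the source can be sketched in a few lines. This is only an illustration under stated assumptions, not how any particular product works: it uses fixed-size 4 KB blocks and SHA-256 fingerprints, and `BlockStore`, `backup_from_source` and `restore` are hypothetical names. The source hashes each block and sends it only if the target has not already stored that fingerprint.

```python
import hashlib

BLOCK_SIZE = 4096  # assumption: fixed-size blocks, chosen for illustration


class BlockStore:
    """Stand-in for the backup target: one stored copy per unique block."""

    def __init__(self):
        self.blocks = {}

    def has(self, digest: str) -> bool:
        return digest in self.blocks

    def put(self, digest: str, block: bytes) -> None:
        self.blocks[digest] = block


def backup_from_source(data: bytes, store: BlockStore,
                       block_size: int = BLOCK_SIZE):
    """Send only blocks the store has not seen.

    Returns the list of fingerprints needed to rebuild the data
    and the number of bytes that actually had to be sent.
    """
    recipe = []
    bytes_sent = 0
    for start in range(0, len(data), block_size):
        block = data[start:start + block_size]
        digest = hashlib.sha256(block).hexdigest()
        if not store.has(digest):
            store.put(digest, block)   # only new blocks cross the wire
            bytes_sent += len(block)
        recipe.append(digest)
    return recipe, bytes_sent


def restore(recipe, store: BlockStore) -> bytes:
    """Rebuild the original data from its list of fingerprints."""
    return b"".join(store.blocks[digest] for digest in recipe)
```

With three identical blocks followed by one different block, only two blocks are sent on the first backup, and a second backup of the same data sends nothing. A target-side design would run the same lookup only after all the data had already been transferred, which is why it saves storage but not bandwidth.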
Considering Microsoft's report on de-duplication assessment [4]:
- 20-30% data duplicates are easily visible even in structured data sources like ERP databases
- 40-80% data duplicates can be seen in file-servers and mail servers.
- 60-90% data duplicates can be seen between different PCs. (Just my observation and opinion)
On average, a conservative 30% duplicate removal can save $1.6B on storage energy and $2B on bandwidth and backup costs.
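As a rough check on the storage figure, the sketch below applies the 30% duplicate rate to the USD 5 billion annual energy estimate from earlier in the post. The function name is illustrative. The result, USD 1.5 billion, is close to the $1.6B quoted; the bandwidth and backup figure depends on inputs the post does not list, so it is not reproduced here.

```python
def energy_savings_usd(annual_energy_cost_usd: int,
                       duplicate_percent: int) -> float:
    """Energy cost avoided if the given share of stored data is duplicate."""
    return annual_energy_cost_usd * duplicate_percent / 100


# Figures from this post: USD 5 billion energy cost, 30% duplicates.
saving = energy_savings_usd(5_000_000_000, 30)
print(f"{saving / 1e9:.1f} billion USD per year")
```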
De-duplication and Druvaa
We see Druvaa inSync as a product/platform to provide de-duplicated (at source) backup for PCs, PDAs and servers. The current version is available only for PCs, and we can easily see up to 90% savings in time and cost (bandwidth and storage) for enterprises.
I just don't see a reason why all storage and backup vendors wouldn't do it. EMC and NetApp have already announced de-duplication as an additionally licensable technology on their arrays (target based).[5] No major vendor except EMC has announced agent/source-based de-dup, though.[6]
Surely, Druvaa has a good lead and is cashing in on it.