<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	>
<channel>
	<title>Comments on: Data De-duplication</title>
	<atom:link href="http://blog.druvaa.com/2008/06/15/data-de-duplication/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.druvaa.com/2008/06/15/data-de-duplication/</link>
	<description>Continuous Data Availability</description>
	<pubDate>Fri, 12 Mar 2010 16:17:58 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.7.1</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Davish Bhardwaj</title>
		<link>http://blog.druvaa.com/2008/06/15/data-de-duplication/comment-page-1/#comment-783</link>
		<dc:creator>Davish Bhardwaj</dc:creator>
		<pubDate>Wed, 20 May 2009 03:58:31 +0000</pubDate>
		<guid isPermaLink="false">http://blog.druvaa.com/?p=22#comment-783</guid>
		<description>Your blog is very interesting.
I began my research on data deduplication after reading your blog.
But now that I'm done with my research on dedupe, I want to know where to showcase it.
I have developed an algorithm for variable-size block deduplication which I think is the best in the storage industry in terms of deduplication ratio, with the added benefit of fast processing.
Can anybody here help me out?

Thanks in advance.</description>
		<content:encoded><![CDATA[<p>Your blog is very interesting.<br />
I began my research on data deduplication after reading your blog.<br />
But now that I&#8217;m done with my research on dedupe, I want to know where to showcase it.<br />
I have developed an algorithm for variable-size block deduplication which I think is the best in the storage industry in terms of deduplication ratio, with the added benefit of fast processing.<br />
Can anybody here help me out?</p>
<p>Thanks in advance.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Types Of Erp</title>
		<link>http://blog.druvaa.com/2008/06/15/data-de-duplication/comment-page-1/#comment-60</link>
		<dc:creator>Types Of Erp</dc:creator>
		<pubDate>Sun, 27 Jul 2008 15:46:56 +0000</pubDate>
		<guid isPermaLink="false">http://blog.druvaa.com/?p=22#comment-60</guid>
		<description>&lt;strong&gt;Types Of Erp...&lt;/strong&gt;

Your blog makes very interesting reading. I'm sure others will think so too. I look forward to reading their comments...</description>
		<content:encoded><![CDATA[<p><strong>Types Of Erp&#8230;</strong></p>
<p>Your blog makes very interesting reading. I&#8217;m sure others will think so too. I look forward to reading their comments&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Men&#8217;s Game &#187; Blog Archive &#187; 10x Faster Enterprise PC Backup</title>
		<link>http://blog.druvaa.com/2008/06/15/data-de-duplication/comment-page-1/#comment-56</link>
		<dc:creator>Men&#8217;s Game &#187; Blog Archive &#187; 10x Faster Enterprise PC Backup</dc:creator>
		<pubDate>Sat, 21 Jun 2008 22:37:42 +0000</pubDate>
		<guid isPermaLink="false">http://blog.druvaa.com/?p=22#comment-56</guid>
		<description>[...] Background of De-duplication (and attempts made by others). [...]</description>
		<content:encoded><![CDATA[<p>[...] Background of De-duplication (and attempts made by others). [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: 10x Faster Enterprise PC Backup at VentureWoods - India's leading venture capital community</title>
		<link>http://blog.druvaa.com/2008/06/15/data-de-duplication/comment-page-1/#comment-55</link>
		<dc:creator>10x Faster Enterprise PC Backup at VentureWoods - India's leading venture capital community</dc:creator>
		<pubDate>Fri, 20 Jun 2008 14:04:20 +0000</pubDate>
		<guid isPermaLink="false">http://blog.druvaa.com/?p=22#comment-55</guid>
		<description>[...] Background of De-duplication (and attempts made by others). [...]</description>
		<content:encoded><![CDATA[<p>[...] Background of De-duplication (and attempts made by others). [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jaspreet</title>
		<link>http://blog.druvaa.com/2008/06/15/data-de-duplication/comment-page-1/#comment-54</link>
		<dc:creator>Jaspreet</dc:creator>
		<pubDate>Fri, 20 Jun 2008 11:21:06 +0000</pubDate>
		<guid isPermaLink="false">http://blog.druvaa.com/?p=22#comment-54</guid>
		<description>Ankur,

Thanks for the information on Avamar. Yup, I also came to know about that some time back.

The approach followed by rsync, Avamar, and PureDisk is to narrow down the delta change from the previous backup of the same data.

We are creating fingerprints which can be matched with any "similar" data from any source. The important steps are to:

1. choose block boundaries so that each block's hash can be checked against data from other sources.
2. compute the hash and check it against the server in real time.

This makes sense for laptop backups, where the duplicates are between machines. The pitch is about saving time, not storage space.</description>
		<content:encoded><![CDATA[<p>Ankur,</p>
<p>Thanks for the information on Avamar. Yup, I also came to know about that some time back.</p>
<p>The approach followed by rsync, Avamar, and PureDisk is to narrow down the delta change from the previous backup of the same data.</p>
<p>We are creating fingerprints which can be matched with any &#8220;similar&#8221; data from any source. The important steps are to:</p>
<p>1. choose block boundaries so that each block&#8217;s hash can be checked against data from other sources.<br />
2. compute the hash and check it against the server in real time.</p>
<p>This makes sense for laptop backups, where the duplicates are between machines. The pitch is about saving time, not storage space.</p>
]]></content:encoded>
	</item>
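	<!--
	A minimal sketch of the two steps listed in the comment above: fingerprint
	each block, then ask the server which fingerprints it has not seen. It
	assumes the data has already been split into blocks. ChunkIndex,
	fingerprint, and backup are hypothetical names, SHA-256 is an assumed hash
	choice, and the in-memory dictionary only stands in for a real server; none
	of this is taken from the product discussed in the post.

```python
import hashlib


class ChunkIndex:
    """Hypothetical in-memory stand-in for a server-side fingerprint index."""

    def __init__(self):
        self._store = {}

    def missing(self, fingerprints):
        # Return the fingerprints the index has not seen yet.
        return [fp for fp in fingerprints if fp not in self._store]

    def put(self, fp, chunk):
        self._store[fp] = chunk


def fingerprint(chunk):
    # Assumed hash choice: SHA-256 over the raw block bytes.
    return hashlib.sha256(chunk).hexdigest()


def backup(chunks, index):
    """Upload only blocks the index lacks; return the number of bytes sent."""
    by_fp = {fingerprint(c): c for c in chunks}
    sent = 0
    for fp in index.missing(list(by_fp)):
        index.put(fp, by_fp[fp])
        sent += len(by_fp[fp])
    return sent


if __name__ == "__main__":
    index = ChunkIndex()
    laptop_a = [b"os-image", b"report-a"]
    laptop_b = [b"os-image", b"report-b"]
    print(backup(laptop_a, index))  # prints 16: both blocks are new
    print(backup(laptop_b, index))  # prints 8: only report-b is new
```

	Because the index is keyed by content rather than by machine or file, the
	second laptop skips the block it shares with the first, which is the
	cross-machine saving the comment describes.
	-->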
	<item>
		<title>By: Ankur P</title>
		<link>http://blog.druvaa.com/2008/06/15/data-de-duplication/comment-page-1/#comment-53</link>
		<dc:creator>Ankur P</dc:creator>
		<pubDate>Fri, 20 Jun 2008 11:09:19 +0000</pubDate>
		<guid isPermaLink="false">http://blog.druvaa.com/?p=22#comment-53</guid>
		<description>Jaspreet,

Your claims about Avamar's technology are not correct. Their technique is not block-based, nor is their checksum computation "expensive". They use a technique they call "sticky byte factoring", which uses simple rolling checksums to generate variable-sized chunks of data and to find differences. See their patent application for details: http://is.gd/C01 You can see that their technique is well suited to remote-office situations. PureDisk used to do fixed-size chunking, but I have heard that they have also started moving towards an algorithm very much like Avamar's.

Also, when you say "linear polynomial based hash", I guess you mean Rabin fingerprinting. Rsync also uses similar fingerprinting to reduce latencies.

- Ankur.</description>
		<content:encoded><![CDATA[<p>Jaspreet,</p>
<p>Your claims about Avamar&#8217;s technology are not correct. Their technique is not block-based, nor is their checksum computation &#8220;expensive&#8221;. They use a technique they call &#8220;sticky byte factoring&#8221;, which uses simple rolling checksums to generate variable-sized chunks of data and to find differences. See their patent application for details: <a href="http://is.gd/C01" rel="nofollow">http://is.gd/C01</a> You can see that their technique is well suited to remote-office situations. PureDisk used to do fixed-size chunking, but I have heard that they have also started moving towards an algorithm very much like Avamar&#8217;s.</p>
<p>Also, when you say &#8220;linear polynomial based hash&#8221;, I guess you mean Rabin fingerprinting. Rsync also uses similar fingerprinting to reduce latencies.</p>
<p>- Ankur.</p>
]]></content:encoded>
	</item>
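	<!--
	A rough sketch of the general idea in the comment above: a rolling checksum
	over a sliding window decides where variable-sized chunks end. This is a
	plain additive rolling sum chosen for brevity, not Avamar's sticky byte
	factoring and not Rabin fingerprinting. The function name chunk and the
	parameters window, mask, and min_size are hypothetical.

```python
def chunk(data, window=16, mask=0x3F, min_size=32):
    """Split data wherever the rolling sum of the last `window` bytes has
    all the bits in `mask` set and the current chunk is at least `min_size`
    bytes long."""
    chunks = []
    start = 0
    rolling = 0
    for i, byte in enumerate(data):
        rolling += byte                  # newest byte enters the window
        if i >= window:
            rolling -= data[i - window]  # oldest byte leaves the window
        size = i - start + 1
        if size >= min_size and (rolling & mask) == mask:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])      # trailing partial chunk
    return chunks


if __name__ == "__main__":
    original = b"aaa\xffbbb\xffcc"
    shifted = b"X" + original
    # With window=1 the checksum is just the current byte, so 0xFF marks a cut.
    print(chunk(original, window=1, mask=0xFF, min_size=1))
    print(chunk(shifted, window=1, mask=0xFF, min_size=1))
```

	In the toy run, inserting one byte at the front changes only the first
	chunk; the later chunks come out identical, so their hashes still match.
	Fixed-size chunking would shift every boundary instead. With a nonzero
	min_size the boundaries tend to realign after an edit rather than being
	guaranteed to.
	-->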
</channel>
</rss>
