<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Data De-duplication</title>
	<atom:link href="http://blog.druvaa.com/2008/06/15/data-de-duplication/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.druvaa.com/2008/06/15/data-de-duplication/</link>
	<description>Continuous Data Availability</description>
	<pubDate>Sun, 23 Nov 2008 13:20:41 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5.1</generator>
		<item>
		<title>By: Types Of Erp</title>
		<link>http://blog.druvaa.com/2008/06/15/data-de-duplication/#comment-60</link>
		<dc:creator>Types Of Erp</dc:creator>
		<pubDate>Sun, 27 Jul 2008 15:46:56 +0000</pubDate>
		<guid isPermaLink="false">http://blog.druvaa.com/?p=22#comment-60</guid>
		<description>&lt;strong&gt;Types Of Erp...&lt;/strong&gt;

Your blog makes very interesting reading. I'm sure others will think so too. I look forward to reading their comments....</description>
		<content:encoded><![CDATA[<p><strong>Types Of Erp&#8230;</strong></p>
<p>Your blog makes very interesting reading. I&#8217;m sure others will think so too. I look forward to reading their comments&#8230;.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Men&#8217;s Game &#187; Blog Archive &#187; 10x Faster Enterprise PC Backup</title>
		<link>http://blog.druvaa.com/2008/06/15/data-de-duplication/#comment-56</link>
		<dc:creator>Men&#8217;s Game &#187; Blog Archive &#187; 10x Faster Enterprise PC Backup</dc:creator>
		<pubDate>Sat, 21 Jun 2008 22:37:42 +0000</pubDate>
		<guid isPermaLink="false">http://blog.druvaa.com/?p=22#comment-56</guid>
		<description>[...] Background of De-duplication (and attempts made by others). [...]</description>
		<content:encoded><![CDATA[<p>[...] Background of De-duplication (and attempts made by others). [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: 10x Faster Enterprise PC Backup at VentureWoods - India's leading venture capital community</title>
		<link>http://blog.druvaa.com/2008/06/15/data-de-duplication/#comment-55</link>
		<dc:creator>10x Faster Enterprise PC Backup at VentureWoods - India's leading venture capital community</dc:creator>
		<pubDate>Fri, 20 Jun 2008 14:04:20 +0000</pubDate>
		<guid isPermaLink="false">http://blog.druvaa.com/?p=22#comment-55</guid>
		<description>[...] Background of De-duplication (and attempts made by others). [...]</description>
		<content:encoded><![CDATA[<p>[...] Background of De-duplication (and attempts made by others). [...]</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jaspreet</title>
		<link>http://blog.druvaa.com/2008/06/15/data-de-duplication/#comment-54</link>
		<dc:creator>Jaspreet</dc:creator>
		<pubDate>Fri, 20 Jun 2008 11:21:06 +0000</pubDate>
		<guid isPermaLink="false">http://blog.druvaa.com/?p=22#comment-54</guid>
		<description>Ankur,

Thanks for the information on Avamar. Yup, I also came to know about that some time back.

The approach followed by rsync, Avamar and PureDisk is to narrow down the delta change from the previous backup of the same data.

We are creating fingerprints which can be matched with any "similar" data from any source. The important steps are:

1. Choose block boundaries so that block hashes can be compared against data from other sources.
2. Compute the hashes and check them against the server in real time.

This makes sense for laptop backups, where duplicates occur across machines. The pitch is saving time, not storage space.</description>
		<content:encoded><![CDATA[<p>Ankur,</p>
<p>Thanks for the information on Avamar. Yup, I also came to know about that some time back.</p>
<p>The approach followed by rsync, Avamar and PureDisk is to narrow down the delta change from the previous backup of the same data.</p>
<p>We are creating fingerprints which can be matched with any &#8220;similar&#8221; data from any source. The important steps are:</p>
<p>1. Choose block boundaries so that block hashes can be compared against data from other sources.<br />
2. Compute the hashes and check them against the server in real time.</p>
<p>This makes sense for laptop backups, where duplicates occur across machines. The pitch is saving time, not storage space.</p>
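<p>The second step above (fingerprint each block and check it against the server) can be sketched roughly as follows. This is only an illustration under stated assumptions: SHA-256 as the fingerprint, an in-memory dictionary standing in for the server, and the hypothetical names ChunkStore, backup and restore. It is not the product's actual implementation.</p>

```python
import hashlib


class ChunkStore:
    """Hypothetical stand-in for the backup server: fingerprint -> chunk bytes."""

    def __init__(self):
        self.chunks = {}

    def has(self, fingerprint):
        return fingerprint in self.chunks

    def put(self, fingerprint, chunk):
        self.chunks[fingerprint] = chunk


def backup(chunks, store):
    """Upload only chunks the store has not seen; return (recipe, bytes_sent)."""
    recipe = []
    bytes_sent = 0
    for chunk in chunks:
        # Assumption: SHA-256 is used as the chunk fingerprint.
        fingerprint = hashlib.sha256(chunk).hexdigest()
        if not store.has(fingerprint):
            store.put(fingerprint, chunk)
            bytes_sent += len(chunk)
        # The recipe lists fingerprints in order, enough to rebuild the data.
        recipe.append(fingerprint)
    return recipe, bytes_sent


def restore(recipe, store):
    """Rebuild the original bytes from an ordered list of fingerprints."""
    return b"".join(store.chunks[fingerprint] for fingerprint in recipe)


store = ChunkStore()
laptop_a = [b"os-file", b"shared-doc", b"alice-notes"]
laptop_b = [b"os-file", b"shared-doc", b"bob-notes"]
recipe_a, sent_a = backup(laptop_a, store)
recipe_b, sent_b = backup(laptop_b, store)
```

<p>In this toy run the second laptop uploads only its one unique chunk, which is where the time saving for laptop backups would come from.</p>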
]]></content:encoded>
	</item>
	<item>
		<title>By: Ankur P</title>
		<link>http://blog.druvaa.com/2008/06/15/data-de-duplication/#comment-53</link>
		<dc:creator>Ankur P</dc:creator>
		<pubDate>Fri, 20 Jun 2008 11:09:19 +0000</pubDate>
		<guid isPermaLink="false">http://blog.druvaa.com/?p=22#comment-53</guid>
		<description>Jaspreet,

Your claims about Avamar technology are not correct. Their technique is neither block-based, nor is their checksum computation "expensive". They use a technique they call "sticky byte factoring", which uses simple rolling checksums to generate variable-sized chunks of data and to find differences. See their patent application for details: http://is.gd/C01 You can see that their technique is well suited to remote-office situations. PureDisk used to do fixed-size chunking, but I have heard that they have also started moving towards an algorithm very much like Avamar's.

Also, when you say "linear polynomial based hash", I guess you mean Rabin fingerprinting. Rsync also uses similar fingerprinting to reduce latencies.

- Ankur.</description>
		<content:encoded><![CDATA[<p>Jaspreet,</p>
<p>Your claims about Avamar technology are not correct. Their technique is neither block-based, nor is their checksum computation &#8220;expensive&#8221;. They use a technique they call &#8220;sticky byte factoring&#8221;, which uses simple rolling checksums to generate variable-sized chunks of data and to find differences. See their patent application for details: <a href="http://is.gd/C01" rel="nofollow" onclick="javascript:pageTracker._trackPageview ('/outbound/is.gd');">http://is.gd/C01</a> You can see that their technique is well suited to remote-office situations. PureDisk used to do fixed-size chunking, but I have heard that they have also started moving towards an algorithm very much like Avamar&#8217;s.</p>
<p>Also, when you say &#8220;linear polynomial based hash&#8221;, I guess you mean Rabin fingerprinting. Rsync also uses similar fingerprinting to reduce latencies.</p>
<p>- Ankur.</p>
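<p>For readers unfamiliar with the idea, here is a minimal sketch of how a rolling checksum can produce variable-sized chunks. It is a generic polynomial rolling hash, not Avamar's patented method and not Rabin fingerprinting proper; the constants WINDOW, DIVISOR, BASE and MOD and the function names are assumptions chosen for illustration.</p>

```python
WINDOW = 16        # bytes covered by the rolling checksum (assumed value)
DIVISOR = 64       # a cut is expected about once every DIVISOR bytes (assumed value)
BASE = 257
MOD = 1000000007
TOP = pow(BASE, WINDOW - 1, MOD)   # weight of the byte that leaves the window


def chunk_boundaries(data):
    """Return end offsets of chunks, chosen from the content itself."""
    cuts = []
    h = 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            # Slide the window: remove the oldest byte's contribution.
            h = (h - data[i - WINDOW] * TOP) % MOD
        h = (h * BASE + byte) % MOD
        # Cut wherever the checksum of the last WINDOW bytes hits a fixed pattern.
        if i + 1 >= WINDOW and h % DIVISOR == DIVISOR - 1:
            cuts.append(i + 1)
    # Close the final chunk if the data did not end exactly on a cut.
    if data and (not cuts or cuts[-1] != len(data)):
        cuts.append(len(data))
    return cuts


def split_chunks(data):
    """Split data into variable-sized chunks at the content-defined boundaries."""
    chunks = []
    start = 0
    for end in chunk_boundaries(data):
        chunks.append(data[start:end])
        start = end
    return chunks
```

<p>Because each cut depends only on the last WINDOW bytes, inserting a byte near the start of a file shifts the later boundaries along with the data, so most chunks keep the same content. Fixed-size chunking would change every chunk after the insertion point.</p>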
]]></content:encoded>
	</item>
</channel>
</rss>
