I read an interesting article in the McKinsey Quarterly which argued that storage demand is growing much faster than storage prices are falling.
For backups, neither storage cost nor available bandwidth is keeping up with the growing storage demand.
Take the iPhone as an example. Its storage has grown from 4GB to 16GB, but the media and bandwidth available for backup haven’t changed much.
The problem is particularly challenging for remote and on-the-go backups. It’s an interesting fact (I read it somewhere) that almost 200 million enterprise users are working remotely at any given time, and there is a very good chance that they haven’t backed up their data.
This is where Druvaa inSync comes in. Its SendUnique technology ensures that duplicate data on enterprise devices is backed up just once, giving a clear 90% saving in the bandwidth and storage used for backup.
Currently we ship the product only for notebooks, but we soon plan to cover every device connected to the enterprise network, from PDAs to servers.
The Gartner report (here) says storage data de-duplication and virtualization are the two main technologies driving innovation in storage management software this year. This makes sense, considering that corporate data is growing at a whopping 60% annual rate (a Microsoft report says so here).
Server Backup
Data is very rarely common between production servers of different types. It’s not difficult to imagine that an Exchange email server won’t have the same content as an Oracle database server. But data is largely duplicated within file servers, Exchange servers and, say, a bunch of ERP servers (development and test). This duplication creates potential bottlenecks for the bandwidth and storage used for backup.
Existing players have offered two solutions to this problem -
Traditional single-instancing at the backup server to filter out common content, e.g. Microsoft Single Instance Service (in the Data center edition). This saves just the storage cost, by an amount that depends on the level at which commonalities are filtered - file / block / byte. A big player in this space is Data Domain. These solutions don’t have a client component; they just save storage space.
New innovative solutions like Avamar (now with EMC) and PureDisk (now with Veritas), which try to filter content at the backup server level before the data goes to the (remote) store. This makes these solutions much better suited for remote-office backups. They save bandwidth and storage.
But there are unsolved problems with both these approaches as well (which also explains the poor response to these products in the market) -
Most of the time, simple block checksum matching fails to find common data, because the common data may not fall on block boundaries. E.g. if you insert a single byte into a file, all the blocks after it shift, every block checksum changes, and the block checksum approach fails.
Checksum calculation is very costly and makes backups CPU intensive.
These approaches target storage cost, not time/bandwidth, which is more critical.
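To make the block-boundary problem concrete, here is a small Python sketch. It is not taken from any of the products above; the 8-byte block size, the SHA-1 checksum and the helper names are assumptions chosen only to keep the demo short.

```python
import hashlib

BLOCK_SIZE = 8  # tiny block size, assumed only to keep the demo readable


def block_checksums(data: bytes, block_size: int = BLOCK_SIZE) -> list:
    """Split data into fixed-size blocks and return one checksum per block."""
    return [
        hashlib.sha1(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def shared_blocks(a: bytes, b: bytes) -> int:
    """Count blocks of b whose checksum also appears among the blocks of a."""
    known = set(block_checksums(a))
    return sum(1 for checksum in block_checksums(b) if checksum in known)


original = bytes(range(64))    # 8 blocks of 8 distinct bytes
appended = original + b"\xff"  # change at the end: the old blocks still line up
inserted = b"\xff" + original  # one byte at the front: every block shifts

print(shared_blocks(original, appended))  # 8
print(shared_blocks(original, inserted))  # 0
```

Appending a byte leaves all eight old blocks recognisable, while inserting the same byte at the front leaves none, even though 64 of the 65 bytes are unchanged.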
PC Backups
The problem is much more complex at the PC level, as duplicate data is distributed among users and is as high as 90% in some cases. Emails, documents and similar file formats create a large pool of duplicate data between users.
Also, since 50% of PC backup data is large email files, this problem is particularly difficult to solve using the simple file-based de-duplication techniques used by servers.
Druvaa inSync v2.0 uses an on-wire (distributed) de-duplication technique which senses duplicate data before the backup starts and hence skips it from the backup. This is transparent to the user; all the user notices is a 10 times boost in backup speed with over 90% reduction in bandwidth and storage usage.
How it works
This technology creates and maintains a global “Single Instance” file system at the backup server. Each time a user wants to back up a file, the inSync client prepares a file fingerprint (using a linear polynomial based hash) and compares it with the server. After the server sends a response, the backup happens only for the “unique” data within the file.
The (patent pending) advanced file-fingerprinting makes it computationally very easy to filter common content such as the same paragraphs in different documents, the same CCed email, media-rich corporate presentations etc. This cuts backup time by 10 times and reduces bandwidth and storage utilization by 90%.
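The fingerprint-compare-send flow can be sketched roughly as below. This is only an illustration, not the actual (patent pending) inSync algorithm: it assumes content-defined chunking driven by a polynomial rolling hash, and the constants (WINDOW, BASE, MOD, MASK), the BackupServer class and the backup/restore helpers are all hypothetical names made up for the example.

```python
import hashlib
import random

WINDOW = 16          # bytes covered by the rolling hash (assumed)
BASE = 257           # polynomial base (assumed)
MOD = (1 << 31) - 1  # modulus (assumed)
MASK = 0x3F          # cut where the low 6 bits of the hash are zero (assumed)


def chunks(data: bytes) -> list:
    """Cut data wherever the hash of the last WINDOW bytes matches MASK."""
    out, start, h = [], 0, 0
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD  # drop the oldest byte
        h = (h * BASE + byte) % MOD                 # add the newest byte
        if i >= WINDOW - 1 and (h & MASK) == 0:     # content-defined boundary
            out.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        out.append(data[start:])
    return out


class BackupServer:
    """Stand-in for the global single-instance store: fingerprint -> chunk."""

    def __init__(self):
        self.store = {}

    def missing(self, fingerprints):
        return [fp for fp in fingerprints if fp not in self.store]

    def upload(self, chunk_by_fp):
        self.store.update(chunk_by_fp)


def backup(server, data: bytes):
    """Send only chunks the server lacks; return (recipe, chunks_sent)."""
    pieces = chunks(data)
    recipe = [hashlib.sha256(p).hexdigest() for p in pieces]
    by_fp = dict(zip(recipe, pieces))
    needed = set(server.missing(recipe))  # the server's response
    server.upload({fp: by_fp[fp] for fp in needed})
    return recipe, len(needed)


def restore(server, recipe) -> bytes:
    return b"".join(server.store[fp] for fp in recipe)


rng = random.Random(7)
doc = bytes(rng.randrange(256) for _ in range(4096))
server = BackupServer()
print(backup(server, doc)[1])           # first backup: every distinct chunk is sent
print(backup(server, doc)[1])           # same file again: 0 chunks sent
print(backup(server, b"!" + doc)[1])    # one byte inserted: at most 2 new chunks
```

Because the cut points depend only on the bytes inside the hash window, an inserted byte disturbs only the chunks next to it and the rest of the file still matches what the server already holds. A real implementation would also enforce minimum and maximum chunk sizes, which this sketch leaves out.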
Other Interesting Features
Another good use of the Global Single Instance File System is Continuous Data Protection. After starting a restore, the user can see how their files changed over time, which gives the option to restore point-in-time data from any point in the past. The marketing name for the feature is “Eternity. Never lose a file. Ever.” A long name, but it serves its meaning.
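A point-in-time restore boils down to keeping every backed-up version with its timestamp and picking the latest one at or before the requested time. The sketch below shows that lookup; the VersionedStore class, its method names and the integer timestamps are hypothetical, and a real single-instance store would keep chunk references rather than whole file contents.

```python
import bisect


class VersionedStore:
    """Keeps every backed-up version of a file, ordered by timestamp."""

    def __init__(self):
        self._times = {}     # path -> sorted list of backup timestamps
        self._versions = {}  # path -> contents, parallel to self._times

    def backup(self, path, timestamp, content):
        times = self._times.setdefault(path, [])
        versions = self._versions.setdefault(path, [])
        pos = bisect.bisect_right(times, timestamp)
        times.insert(pos, timestamp)
        versions.insert(pos, content)

    def restore(self, path, as_of):
        """Return the newest version taken at or before as_of, else None."""
        times = self._times.get(path, [])
        pos = bisect.bisect_right(times, as_of)
        if pos == 0:
            return None  # nothing was backed up that early
        return self._versions[path][pos - 1]


store = VersionedStore()
store.backup("report.doc", 100, b"draft")
store.backup("report.doc", 200, b"final")
print(store.restore("report.doc", 150))  # b'draft'
print(store.restore("report.doc", 250))  # b'final'
```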
Business Opportunities
The same technology/product can be stripped down to back up PDAs and scaled up to back up servers. A good use case would be reducing the backup time for a bunch of related remote servers.
Druvaa announces the availability of Druvaa inSync 2.0 Beta. The idea behind v2.0 is fast and bandwidth/storage-efficient backup. The much-awaited release brings four very interesting and unique features -
1. SendUnique - Enterprise-wide on-wire data de-duplication. Almost 80% of PC data (emails/docs) within an enterprise is common between users. SendUnique technology fingerprints the user’s backup set so that only one copy of data (emails/docs) common between different users is sent to the backup server. This speeds up backup by almost 10 times and cuts bandwidth usage by 90%.
2. Eternity - Never lose a file. Ever. Timeline-based, from-the-past restore. Enables ultimate protection against data loss or virus attacks.
3. NetworkSense - Automatic network sensing and prioritization. Allocates a user defined percentage of bandwidth for backups.
4. TrueSecure - Client-triggered secure backups. 256 bit network (SSL) and 256 bit (AES) storage encryption.
You can sign up here for beta evaluation and updates.
The following presentation describes the 2.0 feature set -
“We are paid to Backup not Restore …”, this was the company slogan at my last job.
And the team really followed it well, seriously. Look at the existing backup solutions: they have tons of options for backup, and most of these options are derived from prehistoric tape-based backups. In fact, most of the solutions confuse backup with archival. Take a look at the following backup options -
Backup Rule Engines
Complicated Scheduling and archival times
Naming a backup or scheme or snapshots
But there are hardly any options for restore, and the ones that exist hardly even work.
I guess restores can be made much better, with options like -
Self-serve, web based restore.
Give an option to restore just a part of a backup.
Search for a file in Restore.
Restore based on Timeline. (Choose a date and restore based on that date.)
Support compression for faster restores (just like backups).
With Druvaa inSync we tried to address most of these issues. Now, with the upcoming 2.0, we plan to add file searching and timeline-based restores.
Should the results differ for a larger set of users? It looks difficult, but we will soon try to publish benchmarks for the (upcoming) version 2.0 with a larger set of users.
For first-timers - Druvaa inSync is a fast enterprise laptop sync solution mainly targeted at LAN/WAN-based backups for the mobile workforce and cross-office backup consolidation.
Druvaa inSync 2.0 is being targeted as a complete laptop data sync protection solution.
Feature Set
a) Server Side
Performance improvements - Improved sync algorithm (based on xdelta-3).
Continuous Data Protection - Date/time based “from the past” restore.
Network Profiles - admin configures network profiles with -
IP addresses
Network Type LAN/WAN/Auto (to optimize on backup packet sizes)
On-wire compression levels
Pluggable Storage Architecture - to make way for storage engines like encrypted, compressed, single instance, Amazon S3
Single Instance Store - powerful block level de-duplication/single instancing.
Storage Profiles - admin configures storage profiles with storage engine and capacity
User Profiles - binds network and storage profiles with user profiles/groups. A totally flexible way of choosing which user syncs using which network and what storage type.
Remote Data Deletion/Encryption - Theft protection
Live Reports - Statistics, graphs and email alerts.
b) Client Side
Bandwidth QoS - Choose backup bandwidth cap by percentage not value.
Search backed-up files - allow users to search files in backup by names
Auto restore/heal - automatically “heal” the laptop by restoring all files to default location
Live Console - View live logs and backup activity
Feedback Suggestion
Druvaa inSync team would like to hear your feedback/suggestions -
Recovery Point Objective (RPO) and Recovery Time Objective (RTO) are two of the most important parameters of a disaster recovery or data protection plan. These objectives guide enterprises in choosing an optimal data backup (or rather restore) plan.
“Recovery Point Objective (RPO) describes the amount of data lost measured in time. Example: If the last available good copy of data upon an outage was from 18 hours ago, then the RPO would be 18 hours.”
In other words, it is the answer to the question - “Up to what point in time could the data be recovered?”
“The Recovery Time Objective (RTO) is the duration of time and a service level within which a business process must be restored after a disaster in order to avoid unacceptable consequences associated with a break in continuity.
…
It should be noted that the RTO attaches to the business process and not the resources required to support the process.”
In other words, it’s the answer to the question - “How much time did you take to recover after notification of the business process disruption?”
The RTO/RPO and the results of the Business Impact Analysis (BIA) in its entirety provide the basis for identifying and analyzing viable strategies for inclusion in the business continuity plan. Viable strategy options would include any which would enable resumption of a business process in a time frame at or near the RTO/RPO. This would include alternate or manual workaround procedures and would not necessarily require computer systems to meet the objectives.
There is always a gap between the actuals (RTA/RPA) and the objectives, introduced by the various manual and automated steps needed to bring the business application up. These actuals can only be exposed by disaster and business disruption rehearsals.
Some Examples -
Traditional Backups
With traditional tape backups, if each scheduled backup takes 2 hours and backups run at 0600 hours and 1800 hours, then a primary site failure at 1400 hours leaves you with the option of restoring from the 0600 hours backup. That means an RPA of 8 hours (1400 minus 0600) and, assuming the restore takes as long as the backup, an RTA of 2 hours.
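The same arithmetic as a small Python sketch. The function name and the calendar date are made up for the example, and two simplifications are assumed: a backup captures the data as of its start time, and a restore takes as long as a backup.

```python
from datetime import datetime, timedelta


def recovery_actuals(backup_starts, backup_duration, failure):
    """Return (RPA, RTA) for a failure, given the scheduled backup start times."""
    # Only backups that finished before the failure can be restored from.
    completed = [s for s in backup_starts if s + backup_duration <= failure]
    if not completed:
        raise ValueError("no completed backup before the failure")
    last_good = max(completed)
    rpa = failure - last_good  # data written since the last good backup is lost
    rta = backup_duration      # assumption: restore takes as long as backup
    return rpa, rta


day = datetime(2008, 1, 1)  # arbitrary date, only the times of day matter
starts = [day + timedelta(hours=6), day + timedelta(hours=18)]
rpa, rta = recovery_actuals(starts, timedelta(hours=2), day + timedelta(hours=14))
print(rpa, rta)  # 8:00:00 2:00:00
```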
Continuous Replication
Replication provides better RPO guarantees, as the target system contains a mirrored image of the source. The RPA values depend on how fast the changes are applied and on whether the replication is synchronous or asynchronous. The RTO depends on how soon the data on the target/replicated site can be made available to the application.
Druvaa Replicator
Druvaa Replicator is a Continuous Data Protection and Replication (CDP-R) product which near-synchronously and non-disruptively replicates changes on the production server to the target site and provides point-in-time snapshots for instant data access.
The partial synchronous replication ensures that the data is written to a local or remote cache (caching server) before the application can write locally. This ensures RPO guarantees of up to 5 sec. CDP technology (still beta) enables up to 1024 snapshots (beta) at the target storage, which helps the admin access the current or any past point-in-time consistent image of the data instantly, ensuring an RTO of under 2 sec.
A nice article by Prasanna from Rediff.com. The interview was conducted in early January 2008, when IAN announced its funding, just before the Proto.in 2008 event.
It was a great event with a major focus on the Indian entrepreneur ecosystem and innovation. Druvaa inSync got a good response, and the event helped us understand the target audience better.