Archive for July, 2008
One of the major goals for the inSync 2.1 release (due this week) is improved performance. With this release, users should see speed improvements of almost 30%, especially when syncing smaller files.
While working on inSync 2.1, team Druvaa rediscovered some tips and tricks for improving performance:
Code Profilers
Profilers can give you very quick insight into bottlenecks. It’s better to start from profiler output than from a hypothesis; start working out a hypothesis only after the profiler points to a bad function. We used gprof2dot, which plots a nice graph from prof or gprof output. An example is shown below.
The graph shows the top-down hierarchy of functions, the percentage of time each function consumes, the number of calls, etc. The percentage of time consumed by a function puts the optimization exercise in the right perspective: you don’t want to optimize a function that contributes just 1% of the total processing time. The general idea is to concentrate on functions that consume substantial time and aren’t supposed to. Once a few functions like these are optimized, you can go for another round of profiling.
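The post profiles native code with prof/gprof and gprof2dot. As a rough analogue (an assumption, not inSync’s actual tooling), here is a minimal Python sketch that uses the standard cProfile and pstats modules to compute each function’s share of total time, the number the graph puts in perspective. The functions checksum_block, log_progress and sync_blocks are hypothetical stand-ins for a sync loop. gprof2dot can also read pstats dumps via its `-f pstats` option, as noted in the comments.

```python
import cProfile
import pstats


def checksum_block(data: bytes) -> int:
    """Hypothetical hot spot: a pure-Python rolling checksum over one block."""
    total = 0
    for b in data:
        total = (total * 31 + b) & 0xFFFFFFFF
    return total


def log_progress(i: int) -> str:
    """Hypothetical cheap helper; profiling should show it is not worth optimizing."""
    return f"synced block {i}"


def sync_blocks(blocks: list[bytes]) -> None:
    for i, block in enumerate(blocks):
        checksum_block(block)
        log_progress(i)


def cumulative_shares(profiler: cProfile.Profile) -> dict[str, float]:
    """Fraction of total profiled time spent inside each function, callees included."""
    stats = pstats.Stats(profiler).stats
    # Each value is (primitive calls, total calls, own time, cumulative time, callers).
    total = sum(own for (_, _, own, _, _) in stats.values())
    shares: dict[str, float] = {}
    for (_, _, name), (_, _, _, cumulative, _) in stats.items():
        share = cumulative / total if total else 0.0
        shares[name] = max(shares.get(name, 0.0), share)
    return shares


if __name__ == "__main__":
    blocks = [bytes(range(256)) * 8 for _ in range(200)]  # 200 blocks of 2 KiB
    profiler = cProfile.Profile()
    profiler.runcall(sync_blocks, blocks)
    ranked = sorted(cumulative_shares(profiler).items(), key=lambda kv: -kv[1])
    for name, share in ranked[:4]:
        print(f"{share:6.1%}  {name}")
    # For a gprof2dot-style graph, dump the stats and render them, e.g.:
    #   profiler.dump_stats("sync.pstats")
    #   gprof2dot -f pstats sync.pstats | dot -Tpng -o sync.png
```

In a run like this, checksum_block should dominate while log_progress barely registers, which is exactly the signal for where to spend optimization effort.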
Network Utilization
It’s not sufficient to just reduce network bandwidth usage. It’s equally important to fully utilize your share of the network bandwidth.
Especially for non-interactive applications, throughput matters much more than latency. In a system that uses a single-threaded client to issue RPC calls, throughput is governed by latency: if one RPC call takes a long time, throughput is low even though there is no bottleneck per se. Looking at it another way, the network sits idle while the server is processing the call. A multi-threaded client keeps several calls in flight, which improves both network utilization and throughput. Sometimes the cause of poor network performance lies outside your code. For example, the default TCP window size performs poorly on high-latency, high-bandwidth networks, because a connection can have at most one window of unacknowledged data in flight per round trip. Increasing the TCP window size improves performance on such networks, and so does using multiple TCP connections.
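A minimal sketch of both points, assuming a simulated RPC whose cost is pure round-trip latency (fake_rpc and RTT_SECONDS are hypothetical): issuing calls from a thread pool overlaps the waits, and two helper functions do the bandwidth-delay arithmetic behind the TCP window advice.

```python
import time
from concurrent.futures import ThreadPoolExecutor

RTT_SECONDS = 0.02  # assumed round-trip latency of one simulated RPC call


def fake_rpc(request_id: int) -> int:
    """Hypothetical blocking RPC: the caller waits a full round trip for each reply."""
    time.sleep(RTT_SECONDS)
    return request_id * 2


def run_serial(requests):
    # Single-threaded client: the link is idle while each call waits on the server.
    return [fake_rpc(r) for r in requests]


def run_threaded(requests, workers=8):
    # Multi-threaded client: up to `workers` calls are in flight at once.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fake_rpc, requests))  # map() preserves request order


def timed(fn, *args):
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def bdp_bytes(bandwidth_bits_per_s: int, rtt_ms: int) -> int:
    """Bandwidth-delay product: bytes that must be in flight to keep the link busy."""
    return bandwidth_bits_per_s * rtt_ms // (8 * 1000)


def window_limited_bits_per_s(window_bytes: int, rtt_ms: int) -> int:
    """Ceiling for one TCP connection: at most one window of data per round trip."""
    # A larger window is requested per socket with
    # sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, n); the OS may adjust or cap n.
    return window_bytes * 8 * 1000 // rtt_ms


if __name__ == "__main__":
    requests = list(range(32))
    _, serial_s = timed(run_serial, requests)
    _, threaded_s = timed(run_threaded, requests)
    print(f"serial: {serial_s:.2f}s, 8 threads: {threaded_s:.2f}s")
    print(bdp_bytes(100_000_000, 100))                # → 1250000 bytes (100 Mbit/s, 100 ms RTT)
    print(window_limited_bits_per_s(64 * 1024, 100))  # → 5242880 bit/s with a 64 KiB window
```

With 8 workers, the 32 simulated calls should finish in roughly an eighth of the serial time; exact figures depend on the machine. Multiple TCP connections help for the same reason the window does: each connection contributes its own window of data in flight.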
Caching
Caching frequently used data reduces database queries and disk reads. Database queries and disk reads may not consume many CPU cycles, but they add to latency in a big way.
Multi-threading can work around latency, but it comes with its own overheads in terms of code complexity and resource consumption. Simple caching avoids frequent trips to the database or disk. Databases and operating systems maintain their own caches, but the overhead of querying a database or issuing a system call is best avoided.
Beware of stale caches and serialization issues.
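A minimal read-through cache sketch, assuming SQLite as the backing store and a hypothetical file_meta table. A time-to-live bounds how stale an entry can get, and invalidate() handles the case where you know the underlying row changed. The class is deliberately not thread-safe; concurrent access is one place where serialization issues can show up.

```python
import sqlite3
import time


class TTLCache:
    """Tiny read-through cache. Entries expire after `ttl` seconds, which bounds
    how stale a cached value can get. Not thread-safe as written: guard get()
    with a threading.Lock if several threads share one instance."""

    def __init__(self, loader, ttl, clock=time.monotonic):
        self._loader = loader  # the expensive lookup (database query, disk read, ...)
        self._ttl = ttl
        self._clock = clock
        self._entries = {}     # key -> (value, expires_at)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]                 # cache hit: no trip to the database
        value = self._loader(key)           # miss or expired: reload from the source
        self._entries[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key):
        """Drop an entry after its source row changes, so the next get() reloads it."""
        self._entries.pop(key, None)


def make_size_loader(conn):
    """Hypothetical loader over a hypothetical file_meta table; counts real queries."""
    stats = {"queries": 0}

    def load_size(path):
        stats["queries"] += 1
        row = conn.execute("SELECT size FROM file_meta WHERE path = ?", (path,)).fetchone()
        return row[0] if row else None

    return load_size, stats


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE file_meta (path TEXT PRIMARY KEY, size INTEGER)")
    conn.execute("INSERT INTO file_meta VALUES ('report.doc', 1024)")
    load_size, stats = make_size_loader(conn)
    cache = TTLCache(load_size, ttl=30.0)
    for _ in range(1000):
        cache.get("report.doc")
    print(stats["queries"])  # → 1
```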
Delayed Writes
Synchronous writes are slow. Some writes, for example activity logs, can be delayed indefinitely. Other writes that need persistence guarantees can be synced in batches rather than individually.
This holds true for both databases and file systems. It’s cheaper to do multiple inserts in one SQLite transaction than to create one transaction per insert. On the file system side, you are better off writing a few MB to a file followed by a single fsync than issuing many writes of a few KB with an fsync after each.
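A sketch of both batching patterns using Python’s sqlite3 module and os.fsync. The activity_log table, the BatchedLogWriter class and its 4 MiB default batch size are assumptions for illustration. Note the trade-off: records still sitting in the buffer are lost if the process crashes, which is acceptable for activity logs but not for data that needs persistence guarantees.

```python
import os
import sqlite3
import tempfile
import time


def open_log_db(path):
    """Opens (or creates) a database with a hypothetical activity_log table."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS activity_log (ts INTEGER, message TEXT)")
    conn.commit()
    return conn


def insert_per_row(conn, rows):
    """Slow pattern: every insert is its own transaction, so every row pays for a commit."""
    for row in rows:
        conn.execute("INSERT INTO activity_log (ts, message) VALUES (?, ?)", row)
        conn.commit()


def insert_batched(conn, rows):
    """Faster pattern: all inserts share one transaction that is committed once."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.executemany("INSERT INTO activity_log (ts, message) VALUES (?, ?)", rows)


class BatchedLogWriter:
    """Buffers small records in memory and pays for one write + fsync per batch."""

    def __init__(self, path, batch_bytes=4 * 1024 * 1024):
        self._file = open(path, "ab")
        self._buffer = bytearray()
        self._batch_bytes = batch_bytes

    def append(self, record: bytes):
        self._buffer += record
        if len(self._buffer) >= self._batch_bytes:
            self.flush()

    def flush(self):
        if self._buffer:
            self._file.write(self._buffer)
            self._file.flush()             # hand Python's buffered data to the OS
            os.fsync(self._file.fileno())  # one durability barrier for the whole batch
            self._buffer.clear()

    def close(self):
        self.flush()
        self._file.close()


if __name__ == "__main__":
    rows = [(i, f"synced file {i}") for i in range(200)]
    with tempfile.TemporaryDirectory() as tmp:
        for insert in (insert_per_row, insert_batched):
            conn = open_log_db(os.path.join(tmp, insert.__name__ + ".db"))
            start = time.perf_counter()
            insert(conn, rows)
            print(f"{insert.__name__}: {time.perf_counter() - start:.3f}s")
            conn.close()
```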
Batch requests
A batch of 10 queries sent to a database works faster than 10 queries issued one after the other. Encoding the 10 queries as a PL/SQL function works even better. This is primarily due to socket communication overhead, specifically the latency of each round trip.
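A sketch of the batching idea with Python’s sqlite3 module, assuming a hypothetical files table. SQLite runs in-process, so there is no socket here; instead, Connection.set_trace_callback counts the statements executed, which for a client-server database roughly corresponds to the number of round trips. The PL/SQL variant is Oracle-specific and not shown.

```python
import sqlite3


def open_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO files (id, name) VALUES (?, ?)",
                     [(i, f"file{i}.dat") for i in range(1, 101)])
    conn.commit()
    return conn


def fetch_one_by_one(conn, ids):
    # One statement per id: against a networked database, one round trip each.
    names = {}
    for file_id in ids:
        row = conn.execute("SELECT id, name FROM files WHERE id = ?", (file_id,)).fetchone()
        if row:
            names[row[0]] = row[1]
    return names


def fetch_batched(conn, ids):
    # One statement with an IN list: a single round trip for the whole batch.
    ids = list(ids)
    if not ids:
        return {}
    placeholders = ",".join("?" for _ in ids)
    cursor = conn.execute(f"SELECT id, name FROM files WHERE id IN ({placeholders})", ids)
    return dict(cursor.fetchall())


def count_statements(conn, fn, *args):
    """Runs fn(conn, *args) and counts the SQL statements it executed."""
    statements = []
    conn.set_trace_callback(statements.append)
    try:
        result = fn(conn, *args)
    finally:
        conn.set_trace_callback(None)
    return result, len(statements)


if __name__ == "__main__":
    conn = open_demo_db()
    wanted = list(range(1, 11))
    _, singles = count_statements(conn, fetch_one_by_one, wanted)
    _, batched = count_statements(conn, fetch_batched, wanted)
    print(singles, batched)  # → 10 1
```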
For inSync 2.1, we found that the lowest-hanging fruit was in the database and file system interactions, and we made sure to pluck all of it.
July 28th, 2008
A very nice article by Calley Nye (TechCrunch), July 9, 2008:
Druvaa, an enterprise data backup solutions provider, recently launched its new product, Druvaa inSync. InSync is an enterprise PC backup system that boasts 10x faster data backup while reducing bandwidth and storage utilization by 90%.
InSync is based on a new technology from Druvaa called SendUnique, which creates a fingerprint for each file to prevent duplication. When 80% of enterprise files and communication are common between users, de-duplication is an important consideration. SendUnique uses a Single Instance File System on the backup server and, during the backup, compares file fingerprints against those of the files already on the server to determine whether the data is unique. This works especially well for remote notebook users who have essentially the same data on multiple computers. There is also a timeline-based system that tracks changes to existing files, so point-in-time data can be restored from any specific time.
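The article does not describe SendUnique’s internals, so the following is only a generic file-level single-instancing sketch, assuming SHA-256 content hashes as the fingerprints; SingleInstanceStore and its methods are hypothetical. It shows why identical files on two users’ notebooks are stored and transferred only once. The version timeline is not modeled.

```python
import hashlib


class SingleInstanceStore:
    """Toy single-instance store: each distinct file content is kept once, keyed by its hash."""

    def __init__(self):
        self.blobs = {}           # fingerprint -> content (stored once)
        self.catalog = {}         # (user, path) -> fingerprint
        self.bytes_received = 0   # data that actually had to be transferred

    @staticmethod
    def fingerprint(content: bytes) -> str:
        return hashlib.sha256(content).hexdigest()

    def backup(self, user: str, path: str, content: bytes) -> bool:
        """Records a file for a user; returns True only if its content was new."""
        fp = self.fingerprint(content)
        # A real client would send the fingerprint first and upload the content
        # only if the server reports it as unknown; here the lookup is local.
        is_new = fp not in self.blobs
        if is_new:
            self.blobs[fp] = content
            self.bytes_received += len(content)
        self.catalog[(user, path)] = fp
        return is_new

    def restore(self, user: str, path: str) -> bytes:
        return self.blobs[self.catalog[(user, path)]]
```

A second user backing up the same document adds only a catalog entry; the content itself is neither stored nor transferred again.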
Some other new solutions have attempted to fix these problems, but in very different ways. Microsoft Single Instance Service and Data-Domain both attempt to filter out similar data through traditional single-instancing at the backup-server level. Newer solutions like Avamar (EMC) and PureDisk (Veritas, a division of Symantec) filter content at the backup-server level before sending data to a remote store. All of these solutions save bandwidth and storage, but the calculations weigh down the CPU, and simple block checksum matching often fails to recognize common data.
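One common reason simple block checksum matching misses common data: with fixed-size blocks at fixed offsets, inserting a single byte near the start shifts every later block boundary, so no block hashes match even though nearly all of the content is shared. This sketch (the block size and helper names are assumptions) shows the effect; content-defined chunking with a rolling hash is the usual remedy and is not shown here.

```python
import hashlib
import random

BLOCK = 4096  # assumed fixed block size


def block_digests(data: bytes, block: int = BLOCK) -> list[str]:
    """Hashes of the fixed-offset blocks of `data` (the last block may be short)."""
    return [hashlib.sha256(data[i:i + block]).hexdigest()
            for i in range(0, len(data), block)]


def matching_blocks(old: bytes, new: bytes, block: int = BLOCK) -> int:
    """How many fixed-offset blocks of `new` already exist somewhere in `old`."""
    known = set(block_digests(old, block))
    return sum(1 for d in block_digests(new, block) if d in known)


if __name__ == "__main__":
    original = random.Random(42).randbytes(16 * BLOCK)
    in_place_edit = original[:100] + bytes([original[100] ^ 0xFF]) + original[101:]
    inserted = b"x" + original
    print(matching_blocks(original, in_place_edit))  # → 15: only the edited block differs
    print(matching_blocks(original, inserted))       # → 0: every block boundary shifted
```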
Druvaa is a new startup based in India that received $250,000 in a seed round from the Indian Angel Network in January 2008. It was recently awarded the Indian Entrepreneurial Challenge 2008 Award, chosen from 8 finalists out of 140 applications.
July 23rd, 2008
I read an interesting article in the McKinsey Quarterly arguing that storage demand is growing much faster than storage prices are falling.
For backups, neither storage cost nor available bandwidth is keeping up with the growing demand for storage.
Take the iPhone as an example. Its storage has increased from 4 GB to 16 GB, but the media and bandwidth available for backup haven’t changed much.
The problem is particularly challenging for remote and on-the-go backups. It’s an interesting fact (I read it somewhere) that almost 200 million enterprise users are working remotely at any given time, and there’s a good chance they haven’t backed up their data.
This is where Druvaa inSync comes in. SendUnique technology ensures that duplicate data on enterprise devices is backed up just once, giving a clear 90% advantage in the bandwidth and storage used for backup.
Currently we ship the product only for notebooks, but we plan to soon cover every device connected to the enterprise network, from PDAs to servers.
Any takers?
July 19th, 2008
July 6, 2008
We are delighted to announce that Druvaa software has won the Indian Entrepreneurial Challenge 2008 award.
From the 140 business plans submitted, eight teams were shortlisted based on the potential scale of their business, the strength of the team, and sustainable differentiation in the business model.
These shortlisted applicants were invited to the final round, where they presented their plans and ideas in detail to an eminent jury of the country’s leading entrepreneurs and corporate heads, including Pramod Bhasin (CEO & President, Genpact), Raman Roy (Chairman, Quatrro BPO Solutions), Saurabh Srivastava (President, TiE Delhi), Sanjeev Bikhchandani (co-founder and CEO of Naukri.com), Mahesh Murthy (Partner, Seedfund) and Alok Mittal (Managing Director, Canaan Partners India).
The competition was organized by TiE (The Indus Entrepreneurs) and Canaan Ventures and co-sponsored by CNBC-TV18, Business Today, Microsoft, ISB and NASSCOM. The complete event, from shortlisting and mentoring to the finals, will be covered by Business Today and CNBC (Network 18).
More details, with photographs and video coverage, will be posted very soon.
July 7th, 2008