When it comes to enterprise computer technology, everything seems to be happening in the cloud these days. Offsite storage was one of the first iterations of cloud computing and is continuing to gain traction as a smart way for companies to safely retain data.

You're probably familiar with the typical terminology surrounding online backup services, but if you pay by the byte, de-duplicating -- or de-duping, for short -- is a word you want to know.

De-duplicating means removing redundant information from data before storage. In other words, if you're going to transfer your cumulative company data to offsite storage every night, there's no point in re-sending files that haven't changed since the last transmission. Not only does de-duplicating save data transit time, it also saves disk space (and possibly money) by not keeping 20 copies of the same customer database file.
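To make the idea concrete, here is a minimal sketch of file-level de-duplication in Python. It is not how any particular vendor's product works; the `DedupStore` class and the file names are made up for illustration, and the sketch assumes that two files with the same SHA-256 digest are identical.

```python
import hashlib


class DedupStore:
    """Toy backup target (hypothetical): each unique file body is kept once."""

    def __init__(self):
        self.blobs = {}      # SHA-256 hex digest -> file contents
        self.backups = []    # one {filename: digest} manifest per backup run
        self.bytes_sent = 0  # bytes that actually had to be transferred

    def backup(self, files):
        """Record one backup run; only unseen content is stored and counted."""
        manifest = {}
        for name, data in files.items():
            digest = hashlib.sha256(data).hexdigest()
            if digest not in self.blobs:
                # New content: transfer it and keep a single copy.
                self.blobs[digest] = data
                self.bytes_sent += len(data)
            # Changed or not, the manifest remembers which content the file had.
            manifest[name] = digest
        self.backups.append(manifest)

    def restore(self, index):
        """Rebuild the full file set of backup run number `index`."""
        return {name: self.blobs[d] for name, d in self.backups[index].items()}


store = DedupStore()
night1 = {"customers.db": b"A" * 1000, "notes.txt": b"hello"}
night2 = {"customers.db": b"A" * 1000, "notes.txt": b"hello, world"}
store.backup(night1)
store.backup(night2)
print(store.bytes_sent)  # 1017, versus 2017 if both nights were sent in full
```

The unchanged customer database is sent and stored once, yet either night can still be restored in full from its manifest.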

Indeed, analysts at The 451 Group predicted de-duplicating would "change the nature of data protection... [because] it allows organizations to better meet the challenges caused by data growth, the increasing compliance and legal discovery burden, zero tolerance for data loss and application downtime, and the need to locate and recover the right data as quickly as possible."

When I interviewed Melissa Morales, the director of marketing communications at Diligent Technologies, for the now-defunct ITManagersJournal.com, she told me that it's difficult to estimate how much time and money a business can save by de-duping its data, but there are some rules of thumb. Morales said, "Customers need to identify the amount of data that gets backed up, the procedures that they take with their backups (i.e., incrementals during the week and fulls on weekends, or something different), what is their retention period for the data (i.e., do they need to keep it six weeks, six days, or six months?)."
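Those three inputs (backup size, schedule, and retention period) are enough for a back-of-the-envelope estimate. The sketch below is only an illustration: the function names, the 1,000 GB full backup, and the 2 percent daily change rate are invented numbers, not figures from Morales or Diligent, and the "unique data" estimate assumes every day's changes are entirely new.

```python
def retained_raw_gb(full_gb, daily_change_rate, retention_weeks,
                    incrementals_per_week=6):
    """Data held without de-duplication: one full plus the week's
    incrementals, kept for every week of the retention period."""
    incremental_gb = full_gb * daily_change_rate
    per_week_gb = full_gb + incrementals_per_week * incremental_gb
    return per_week_gb * retention_weeks


def retained_unique_gb(full_gb, daily_change_rate, retention_weeks):
    """Rough estimate of distinct data: one copy of the full backup
    plus each day's changes (assumes changes never repeat)."""
    return full_gb + full_gb * daily_change_rate * 7 * retention_weeks


# Hypothetical shop: 1,000 GB full on weekends, 2% daily change, six weeks kept.
raw = retained_raw_gb(full_gb=1000, daily_change_rate=0.02, retention_weeks=6)
unique = retained_unique_gb(full_gb=1000, daily_change_rate=0.02, retention_weeks=6)
print(f"raw: {raw:.0f} GB, unique: {unique:.0f} GB, ratio: {raw / unique:.1f}x")
```

Changing the retention period or the backup schedule changes the ratio considerably, which is why the questions above matter more than any single headline number.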

Companies considering de-duplicating should talk with several vendors before making a final selection. There are a few key questions to ask:

  • Will your product scale to my system?
  • How easy is disaster recovery?
  • Will this work with my existing virtualization products?
  • Does it support other features like migrating data to tape?
  • How will your product impact my system performance?
  • How do you ensure that only the redundant data is stripped?
  • How is data processed? Inline processing de-duplicates data as it arrives, so follow-up maintenance like indexing and preparation for the next backup can be scheduled at a time when it won't interfere with workflow or contend for system resources. Post-processing de-duplicates after the backup completes and is used by many businesses with an ample maintenance window or smaller workloads.
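The difference between the two processing styles in the last question can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not any vendor's implementation: the function names are hypothetical, data arrives as fixed blocks, and "peak" simply counts the bytes sitting on the backup target at its fullest.

```python
import hashlib


def _digest(block):
    return hashlib.sha256(block).hexdigest()


def inline_backup(blocks):
    """De-duplicate while ingesting: a repeated block is never written."""
    store, peak = {}, 0
    for block in blocks:
        store.setdefault(_digest(block), block)  # write only unseen blocks
        peak = max(peak, sum(len(b) for b in store.values()))
    return store, peak


def post_process_backup(blocks):
    """Write everything first, then de-duplicate in a later pass."""
    landing = list(blocks)                     # raw backup lands in full
    peak = sum(len(b) for b in landing)        # space needed before cleanup
    store = {_digest(b): b for b in landing}   # later maintenance pass
    return store, peak


# Four incoming blocks, three of them identical.
blocks = [b"x" * 100, b"y" * 100, b"x" * 100, b"x" * 100]
inline_store, inline_peak = inline_backup(blocks)
post_store, post_peak = post_process_backup(blocks)
print(inline_peak, post_peak)  # 200 400
```

Both approaches end up with the same stored data. In this model the trade-off is that inline processing does the hashing during the backup itself, while post-processing needs room for the full, un-de-duplicated backup until the later pass runs.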

Does your data storage solution have a de-duping component? Share your experiences in the comments.