Understanding Data Storage and De-Duplication

Lisa Hoover

When it comes to enterprise computer technology, everything seems to be happening in the cloud these days. Offsite storage was one of the first iterations of cloud computing and continues to gain traction as a smart way for companies to safely retain data.

You're probably familiar with the typical terminology surrounding online backup services, but if you pay by the byte, de-duplicating -- or de-duping, for short -- is a word you want to know.

De-duplicating means removing redundant information from data before storage. In other words, if you're going to transfer your cumulative company data to offsite storage every night, there's no point in re-sending files that haven't changed since the last transmission. Not only does de-duplicating save data transit time, it also saves disk space (and possibly money) by not keeping 20 copies of the same customer database file.
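To make the idea concrete, here's a minimal Python sketch of content-based de-duplication: each piece of data is hashed, and its bytes are only transferred and stored if that hash hasn't been seen before. The chunking, the dict standing in for the backup target, and the function name are illustrative assumptions, not any vendor's actual implementation.

```python
import hashlib

def dedupe_store(chunks, store):
    """Store each chunk under its SHA-256 digest, skipping chunks whose
    content has already been seen. `store` is a plain dict standing in
    for the offsite backup target."""
    recipe = []  # digests needed to reassemble this backup later
    for chunk in chunks:
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:   # new content: transfer and keep it
            store[digest] = chunk
        recipe.append(digest)     # a duplicate costs only a reference
    return recipe

store = {}
dedupe_store([b"customer db", b"logo.png", b"report"], store)
dedupe_store([b"customer db", b"logo.png", b"report v2"], store)
print(len(store))  # 4 unique chunks kept, not 6
```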

Indeed, analysts at The 451 Group predicted de-duplicating would "change the nature of data protection... [because] it allows organizations to better meet the challenges caused by data growth, the increasing compliance and legal discovery burden, zero tolerance for data loss and application downtime, and the need to locate and recover the right data as quickly as possible."

When I interviewed Melissa Morales, director of marketing communications at Diligent Technologies, for the now-defunct ITManagersJournal.com, she told me that it's difficult to estimate how much time and money a business can save by de-duping its data, but there are some rules of thumb. Morales says, "Customers need to identify the amount of data that gets backed up, the procedures that they take with their backups (i.e., incrementals during the week and fulls on weekends, or something different), what is their retention period for the data (i.e., do they need to keep it six weeks, six days, or six months?)."
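Those three variables translate directly into a back-of-the-envelope estimate. The numbers below are hypothetical, and the 10:1 de-dupe ratio is an assumption for illustration only; real ratios vary widely with data type and backup policy.

```python
# Hypothetical policy: 1 TB backed up in a weekly full, nightly
# incrementals touching about 5% of it, and six weeks of retention.
full_tb, incr_tb, weeks_retained = 1.0, 0.05, 6

raw_tb = weeks_retained * (full_tb + 6 * incr_tb)  # one full + 6 incrementals per week
deduped_tb = raw_tb / 10                           # assumed 10:1 de-dupe ratio
print(f"raw: {raw_tb:.1f} TB  de-duped: {deduped_tb:.2f} TB")
# raw: 7.8 TB  de-duped: 0.78 TB
```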

Companies considering de-duplication should talk with several vendors before making a final selection. Here are some key questions to ask:

  • Will your product scale to my system?
  • How easy is disaster recovery?
  • Will this work with my existing virtualization products?
  • Does it support other features like migrating data to tape?
  • How will your product impact my system performance?
  • How do you ensure that only the redundant data is stripped?
  • How is data processed? Inline processing de-duplicates data as it arrives during the backup itself, which allows follow-up maintenance like indexing and preparation for the next backup to be scheduled at a time when it won't interfere with workflow or contend for system resources. Post-processing occurs after the backup completes and is used by many businesses with an ample maintenance window or smaller workloads. (A toy sketch contrasting the two approaches follows this list.)
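Here is that toy Python contrast between the two approaches. It sketches only the scheduling difference; the hashing, the dict store, and the list standing in for landing space are illustrative assumptions, not any product's architecture.

```python
import hashlib

def inline_backup(chunks, store):
    """Inline: hash and de-dupe each chunk during the backup itself,
    so only new content ever lands on disk."""
    for chunk in chunks:
        store.setdefault(hashlib.sha256(chunk).hexdigest(), chunk)

def post_process_backup(chunks, landing_area, store):
    """Post-process: write the raw backup first, then de-dupe later in a
    maintenance window. Ingest is fast, but the landing area must
    temporarily hold the full, un-deduped copy."""
    landing_area.extend(chunks)        # backup window: raw write
    while landing_area:                # maintenance window: de-dupe
        chunk = landing_area.pop()
        store.setdefault(hashlib.sha256(chunk).hexdigest(), chunk)
```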

Does your data storage solution have a de-duping component? Share your experiences in the comments.

markanderson_1

Data storage essentially means that files and documents are recorded digitally and saved in a storage system for future use. Storage systems may rely on electromagnetic, optical or other media to preserve and restore the data if needed. Data storage makes it easy to back up files for safekeeping and quick recovery in the event of an unexpected computing crash or cyberattack.

Data storage can occur on physical hard drives, disk drives, USB drives or virtually in the cloud. The important thing is that your files are backed up and easily available should your systems ever crash beyond repair. Some of the most important factors to consider are reliability, the robustness of the security features, and the cost to implement and maintain the infrastructure. Browsing through different data storage solutions and applications can help you arrive at the choice that best fits your business's needs.

De-duplication (“de-duping”) is the process of comparing electronic records based on their content and characteristics and removing duplicate records from the data set, so that only one instance of an electronic record is produced when there are two or more identical copies. De-duplicating a data set is a smart way to reduce volume and increase the efficiency of review.
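As a rough illustration of that record-level comparison, here's a short Python sketch that keeps the first instance of each record and drops the rest. Hashing normalized text is one common way to compare records by content; the whitespace-and-case normalization used here is an illustrative choice, not a standard rule.

```python
import hashlib

def dedupe_records(records):
    """Return only the first instance of each record, comparing records
    by (normalized) content rather than by filename or metadata."""
    seen, unique = set(), []
    for rec in records:
        normalized = " ".join(rec.lower().split())  # collapse case/whitespace
        key = hashlib.sha256(normalized.encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

docs = ["Invoice  #42", "invoice #42", "Invoice #43"]
print(dedupe_records(docs))  # ['Invoice  #42', 'Invoice #43']
```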
