Most customers were just starting to get their arms around all the different deduplication approaches available in disk appliances and VTLs when backup software vendors and even non-storage-related vendors began announcing deduplication capabilities.

We all know the appliance and VTL vendors offering dedupe, including COPAN Systems, Data Domain, EMC, Exagrid, FalconStor, HP, IBM (Diligent), NEC, NetApp, Quantum, Sepaton, Sun StorageTek, and others.

And there are existing backup software products with deduplication, including EMC Avamar, Symantec NetBackup PureDisk, and many online backup offerings such as Asigra's. Now add CommVault Simpana 8.0 and IBM Tivoli Storage Manager (TSM) V6 to the list.

But just because the deduplication is performed in software doesn't automatically make it source deduplication. With source deduplication, the deduplication is performed on the client (the server or desktop/laptop that you want to back up) before the data is transmitted over the LAN. Even though it's performed in software, IBM TSM and CommVault Simpana provide target deduplication: the deduplication is not performed on the client but on the media server, which stores the data in deduplicated form on whatever disk target you have. So the source vs. target distinction is not about what performs the deduplication but about where it is performed. You can expect other backup software vendors to add deduplication capabilities in the future.
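To make the distinction concrete, here's a minimal, hypothetical sketch of hash-based chunk deduplication in Python. The `deduplicate` and `rehydrate` functions, the fixed chunk size, and the in-memory chunk store are illustrative assumptions, not any vendor's implementation; the point is that the same logic is "source" dedupe if it runs on the backup client before transmission and "target" dedupe if it runs on the media server in front of the disk target.

```python
import hashlib

CHUNK_SIZE = 4096  # fixed-size chunks for simplicity; real products often use variable-size chunking


def chunk(data: bytes):
    """Split a byte stream into fixed-size chunks."""
    for i in range(0, len(data), CHUNK_SIZE):
        yield data[i:i + CHUNK_SIZE]


def deduplicate(data: bytes, store: dict) -> list:
    """Store each unique chunk once, keyed by its SHA-256 fingerprint.

    Returns a recipe (list of fingerprints) from which the original
    stream can be reassembled. Run on the backup client before
    transmission, this is source deduplication; run on the media
    server against the disk target, it is target deduplication.
    The logic is identical -- only the location differs.
    """
    recipe = []
    for c in chunk(data):
        fp = hashlib.sha256(c).hexdigest()
        if fp not in store:     # new chunk: keep the data
            store[fp] = c
        recipe.append(fp)       # duplicate chunk: keep only the reference
    return recipe


def rehydrate(recipe: list, store: dict) -> bytes:
    """Reassemble the original stream from the chunk store."""
    return b"".join(store[fp] for fp in recipe)


if __name__ == "__main__":
    store = {}
    backup = b"A" * 8192 + b"B" * 4096 + b"A" * 8192   # repetitive data dedupes well
    recipe = deduplicate(backup, store)
    assert rehydrate(recipe, store) == backup
    print(f"{len(backup)} bytes in, {sum(len(c) for c in store.values())} bytes stored")
```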

Any software vendor that manages content will build in dedupe; Ocarina is a good example. So will software vendors that manage storage capacity, particularly those whose offerings include a built-in volume manager and filesystem. VMware is a good example here: the company introduced deduplication capabilities in vSphere (v4 of Virtual Infrastructure).

Plus, vendors such as NetApp offer deduplication in their production storage systems. NetApp customers have seen good dedupe ratios in virtual environments (server and desktop) and on file shares. EMC has introduced file-level deduplication capabilities in its Celerra offering. Neither vendor charges for dedupe. You can bet more storage vendors will add file- and eventually block-level deduplication functionality.

And there are completely new entrants like Riverbed. Surprised? It makes sense when you realize that WAN optimization vendors like Riverbed have been deduplicating data all along – that's partly how they're able to reduce bandwidth requirements for workloads like remote backup and replication. They simply "rehydrate" the data before it's written to disk. Now imagine a WAN optimization appliance acting as a gateway to a NAS system, deduplicating data inline and storing it in deduplicated form rather than rehydrating it.
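To illustrate the idea (this is a toy model, not Riverbed's actual protocol), here's a short Python sketch of deduplication across a WAN link. The `WanOptimizerPair` class, its chunk caches, and the wire format are invented for illustration: known chunks cross the link as fingerprints only, new chunks as raw bytes, and the receiving end rehydrates before writing.

```python
import hashlib

CHUNK_SIZE = 4096


def chunks(data: bytes):
    for i in range(0, len(data), CHUNK_SIZE):
        yield data[i:i + CHUNK_SIZE]


class WanOptimizerPair:
    """Toy model of deduplicating a backup stream across a WAN link.

    Both ends keep a synchronized cache of chunks already seen. For a
    known chunk the sender transmits only its 32-byte fingerprint; for
    a new chunk it sends the raw bytes. The receiving appliance then
    rehydrates the stream before handing it to the disk target --
    unless, as imagined above, it stores the data deduplicated instead.
    """

    def __init__(self):
        self.sender_cache = {}    # fingerprint -> chunk, on the sending appliance
        self.receiver_cache = {}  # same content, on the receiving appliance

    def send(self, data: bytes):
        wire = []                 # what actually crosses the WAN
        for c in chunks(data):
            fp = hashlib.sha256(c).digest()
            if fp in self.sender_cache:
                wire.append(("ref", fp))     # chunk already known remotely
            else:
                wire.append(("data", c))     # first time: ship the bytes
                self.sender_cache[fp] = c
        return wire

    def receive(self, wire) -> bytes:
        out = []
        for kind, payload in wire:
            if kind == "data":
                self.receiver_cache[hashlib.sha256(payload).digest()] = payload
                out.append(payload)
            else:
                out.append(self.receiver_cache[payload])  # rehydrate from cache
        return b"".join(out)


if __name__ == "__main__":
    link = WanOptimizerPair()
    first = b"X" * 8192 + b"Y" * 4096
    second = b"X" * 8192 + b"Z" * 4096            # mostly a repeat of the first backup
    link.receive(link.send(first))
    wire = link.send(second)
    sent = sum(len(payload) for _, payload in wire)
    print(f"second pass: {len(second)} bytes of data, {sent} bytes on the wire")
    assert link.receive(wire) == second
```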

IT professionals need to know that deduplication will be available everywhere: in software and in hardware, and in production systems, not just backup and archiving. Each approach will have pros and cons, and it's likely that you'll end up leveraging several of them in your environment.

I also expect that since deduplication exists everywhere and is quickly becoming a standard feature of software and disk systems, vendors soon won't be able to charge a significant premium for it, if any premium at all.

I'm in the process of finishing up a report on the state of deduplication and I'd welcome any comments on the subject. Are you still struggling to decide between approaches? If you've already deployed dedupe, is it living up to the hype? Are you using dedupe to reduce bandwidth requirements between data centers and remote offices?

By Stephanie Balaouras
