Re: [ADSM-L] Data Deduplication
2007-08-27 16:17:02
>As others have noted, different vendors dedup at different levels of
>granularity.
I think I'd put it slightly differently. I'd say that they each
approach it differently. Those different approaches may have advantages
and disadvantages with different data types.
>When I spoke to Diligent at the Gartner conference over
>a year ago, they were very tight-lipped about their actual
>algorithm.
The patent was filed. It's not that secret. ;) They are quite
different in their approach, and it's a little difficult to grok. But
based on what I know about their approach, the scenario that started the
discussion may indeed be a limitation. (Or all the vendors may have
this limitation; I have some questions out to them.)
>The[y] would, however, state that they were able to dedup
>parts of two files that had similar data, but were not
>identical. I.e., if data was inserted at the beginning of the file,
>some parts of the end of the file could still be deduped. Neat trick
>if it's true.
Any de-dupe vendor is able to claim that. If it wasn't true, they
wouldn't see the de-dupe rates they're seeing. They can also identify
blocks that are common between a file in the file system and the same
file emailed via Exchange.
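
To make the "insert at the beginning, still dedupe the end" point concrete,
here is a minimal sketch of one generic way to get that behavior:
content-defined chunking, where chunk boundaries are chosen by a rolling hash
of the data itself rather than by fixed offsets. This is not Diligent's or any
other vendor's algorithm; the window size, mask, hash constants, and function
names are all assumptions picked for illustration.

```python
import hashlib
import random

WINDOW = 16          # bytes covered by the rolling hash (illustrative value)
MASK = (1 << 6) - 1  # cut when the low 6 bits are zero: roughly 64-byte chunks
BASE = 257
MOD = (1 << 31) - 1
TOP = pow(BASE, WINDOW - 1, MOD)  # weight of the oldest byte in the window


def content_chunks(data):
    """Split data wherever a hash of the last WINDOW bytes hits a pattern."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * TOP) % MOD  # drop the oldest byte
        h = (h * BASE + byte) % MOD                 # take in the newest byte
        if i + 1 >= WINDOW and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])                 # trailing partial chunk
    return chunks


def fixed_chunks(data, size=64):
    """Split data at fixed offsets, for comparison."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def fingerprints(chunks):
    """Set of SHA-256 digests, one per distinct chunk."""
    return {hashlib.sha256(c).hexdigest() for c in chunks}


rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(4096))
modified = b"new hdr" + original  # 7 bytes inserted at the front

shared_content = fingerprints(content_chunks(original)) & fingerprints(content_chunks(modified))
shared_fixed = fingerprints(fixed_chunks(original)) & fingerprints(fixed_chunks(modified))

print("content-defined chunks shared:", len(shared_content))
print("fixed-size blocks shared:", len(shared_fixed))
```

Because each boundary depends only on the preceding WINDOW bytes, the
boundaries resynchronize right after the inserted data, so at most the first
chunk of the original fails to match. Fixed-offset blocks are all shifted by
the insert, which is why that approach loses its matches in this scenario.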
>Other vendors dedup at the file or block (or chunk) level.
If a vendor doesn't do subfile de-dupe, then they're not a de-dupe
vendor; they're a CAS vendor. File-level de-dupe is CAS (e.g. Centera,
Archivas), and the de-dupe is not really pitched as the main feature.
It's about using the signature as a way to provide immutability of data
stored in the CAS array.
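
As a rough illustration of the file-level case, the sketch below uses a
whole-object SHA-256 signature as the storage address. It is a toy in-memory
model, not the Centera or Archivas API; the ContentStore class and its method
names are hypothetical. It shows why file-level de-dupe falls out as a side
effect (identical files map to the same address) and why the signature doubles
as an integrity check.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: an object's SHA-256 is its address."""

    def __init__(self):
        self._objects = {}

    def put(self, data):
        address = hashlib.sha256(data).hexdigest()
        self._objects.setdefault(address, data)  # identical content is stored once
        return address

    def get(self, address):
        data = self._objects[address]
        # Re-hash on read: any change to the stored bytes breaks the match.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("stored object no longer matches its address")
        return data

    def __len__(self):
        return len(self._objects)


store = ContentStore()
a = store.put(b"quarterly report, final")
b = store.put(b"quarterly report, final")   # same file saved a second time
c = store.put(b"quarterly report, final!")  # one byte added

print(a == b)      # identical files share one address
print(a == c)      # a one-byte change produces a different address
print(len(store))  # number of distinct objects actually held
```

Note that the one-byte change stores a complete second object; nothing inside
the two files is shared, which is the difference from subfile de-dupe.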
>I've not been able to gather much more detail about the specific
>dedup algorithms, but hope to get some more info this fall, as I take a
>closer look at these products. If anyone has more details, I'd love
>to hear them.
I wrote this article that may help: http://tinyurl.com/3588fb . I also
blog about de-dupe quite a bit at www.backupcentral.com.