ADSM-L

Re: How do you verify the Completion and Accuracy of Backups and Restores?

2006-11-08 11:45:20
Subject: Re: How do you verify the Completion and Accuracy of Backups and Restores?
From: Mark Stapleton <mark.s AT EVOLVINGSOL DOT COM>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Wed, 8 Nov 2006 10:44:22 -0600
From: ADSM: Dist Stor Manager [mailto:ADSM-L AT VM.MARIST DOT EDU] On Behalf Of
Wesley Smith
>       My problem is that they (that sister agency) do not seem to have
>a reliable way of verifying that all backups have been properly
>completed. They don't even seem to have a way to know that all files
>(that need to be backed up) are being backed up.  I've seen the reports
>that get generated during the backup process and I am definitely
>unimpressed.  Backups start and backups complete.  There doesn't seem to
>be anything that says how many rows are copied or how large the files
>are or anything else that could be used for verifying the accuracy of
>the backups.  They tell my folks that we should trust Tivoli is doing
>the job correctly.  Trust is the problem....

Let's start there. The dsmsched.log file contains a record of all
scheduled backups and their outcomes. In it you should find a record of
which files were backed up and the size of each file, and the timestamps
give an idea of how long each file took to back up. (This assumes that
the QUIET option is not set in the client option file or in the client
option set designated for that TSM client.) If you're using the
specialized TSM agents for databases or mail apps, the scheduled backup
logs contain fairly granular information about individual file backups.
What more do you need?
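Since processing the logs was described as difficult, here is a minimal Python sketch of pulling the end-of-session counters out of dsmsched.log. The line layout it expects (a timestamp followed by "Total number of objects ...:" and a count) is an assumption on my part, and the sample lines are made up for illustration; check both against a real log from your client level before relying on it. The names summarize and backup_looks_clean are hypothetical.

```python
import re

# ASSUMPTION: summary lines look like
#   "MM/DD/YYYY HH:MM:SS Total number of objects failed:      2"
# Verify this against your own dsmsched.log before use.
SUMMARY_RE = re.compile(
    r"^\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}\s+"
    r"Total number of objects (?P<what>[a-z ]+):\s+(?P<count>[\d,]+)\s*$"
)


def summarize(lines):
    """Collect the 'Total number of objects ...' counters from log lines.

    If a counter appears more than once, the later value overwrites
    the earlier one.
    """
    totals = {}
    for line in lines:
        match = SUMMARY_RE.match(line)
        if match:
            # Counts may carry thousands separators, e.g. "48,213".
            totals[match.group("what")] = int(match.group("count").replace(",", ""))
    return totals


def backup_looks_clean(totals):
    """True only if a 'backed up' counter exists and 'failed' is exactly 0."""
    return "backed up" in totals and totals.get("failed") == 0


# Made-up sample lines, for illustration only.
sample = [
    "11/08/2006 02:14:07 Total number of objects inspected:   48,213",
    "11/08/2006 02:14:07 Total number of objects backed up:      312",
    "11/08/2006 02:14:07 Total number of objects failed:           2",
]

if __name__ == "__main__":
    totals = summarize(sample)
    print(totals)
    print("clean" if backup_looks_clean(totals) else "needs attention")
```

A missing counter is treated the same as a failure here, on the theory that a log with no summary at all is exactly the kind of thing that should be caught at backup time.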

>       We have needed to have restores done on just a few databases in
>the past and the restores were not complete and up to date.  In each
>case we were able to rebuild the data using logs maintained within the
>applications but that should not have been necessary.  Each recovery was
>done at a point after a backup and before additional processing had been
>done within the apps so they should have been complete.  In each case,
>the folks who run Tivoli for us were able to track down and show that
>problems had occurred during the processing of the backups.  They did
>this through circumstantial evidence and in each case once again said
>that they have no way of verifying that the backups are actually good.
>I hear a lot about the difficulty of trying to write a program to
>process the Tivoli log files.
>
>       I think I'm at wit's end with these folks and the product.  I
>know that the people are competent and I suspect that the product (like
>other things available from IBM) really is weak on the reporting and
>verification issue. 

While TSM itself does lack some reporting functionalities (particularly
when it comes to client backups and restores), I have to say this:

On every properly maintained and monitored TSM system I have touched in
the 12 years I've administered and engineered this product, I have
*never* lost a single byte of information. Period. If you cannot do a
restore because of "lost" data, something is happening during backups
that is not being caught at the time of the backups.

>I'm hoping that someone out there in the Big Wide
>World has already solved this problem with an in-house or third-party
>solution.  Sorry for being so long winded.  Any ideas...?

I think what is needed here is greater familiarity with TSM and its
proper administration. Proper verification of good backups is best done
by regular DR practice of planned bare-metal restores of chosen
machines. If you can take data backed up by TSM and restore a given
machine in a DR environment, and the machine comes back properly, you
know the job is being done right. If it doesn't, *then* you dig into
*why*.
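
One way to put a number on "comes back properly" after a test restore is to compare checksums of the restored files against a reference copy. The Python sketch below is an illustration only, not a TSM feature: the function names are hypothetical, and it assumes you have a reference tree that has not changed since the backup was taken (a snapshot or a quiesced copy), because a live source will differ for perfectly legitimate reasons.

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of one file, read in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    return {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in root.rglob("*")
        if path.is_file()
    }


def compare_trees(reference_root, restored_root):
    """Return (missing, changed, extra) relative paths, each sorted.

    missing: in the reference but not in the restore
    changed: in both, but with different contents
    extra:   in the restore but not in the reference
    """
    ref = tree_digests(reference_root)
    res = tree_digests(restored_root)
    missing = sorted(set(ref) - set(res))
    changed = sorted(name for name in ref.keys() & res.keys() if ref[name] != res[name])
    extra = sorted(set(res) - set(ref))
    return missing, changed, extra
```

Calling compare_trees on the reference directory and the restored directory and getting three empty lists back means every file is present and byte-identical; anything else tells you where to start digging.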

BTW, there are responses to this thread advocating ServerGraph and
Bocada for reporting and monitoring. Be aware that those applications do
a fine job of monitoring server operations. (Well, ServerGraph does,
anyway.) Their reporting, however, is not granular enough to indicate
whether a given file is being backed up properly.

--
Mark Stapleton (mark.s AT evolvingsol DOT com)
Senior TSM consultant