Subject: Re: [Networker] Testing recoveries
From: Preston de Guise <enterprise.backup AT GMAIL DOT COM>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Thu, 28 Aug 2008 08:19:47 +1000

Hi Patti,

On 28/08/2008, at 7:58 AM, Clark, Patti wrote:

> I've been rummaging through the threads on this list and there have
> been many that have talked about D&R and actually performing
> recoveries. I've performed various recoveries as requested without
> much issue.
>
> I have a new requirement to document and perform "practice" recoveries
> to demonstrate the viability of backups. This is not specifically a
> D&R exercise so much as an assurance of good backups; D&R is a
> separate set of documentation and exercises. So, with Linux and
> Windows clients, weekly fulls and daily incrementals, what is a good
> frequency and set of recoveries to go by? Biannually, quarterly,
> monthly? Alternate between Linux and Windows clients?

My first and best suggestion in relation to testing is: keep it random.

Where tests quite often fail to be of use is where they're scheduled in such a way that you're always testing the same thing. For instance, always running tests after a full backup is convenient, because there are fewer savesets to work with, but with daily incrementals and weekly fulls it leaves 6 out of 7 possible recovery scenarios untested.

There's also the type of test - is it going to be a simple file-based recovery, an entire filesystem, or an entire machine? Based on what you've said above I'll assume that full machine testing is a separate activity; that does, however, lead to a secondary question - are databases going to be included in the testing?

Normally I'd recommend the following:

(a) Keep a test register
(b) Aim to test each type of operating system at least once a month
(c) Don't test the same client(s) each time
(d) Test both standard recovery and directed recovery where possible
(e) Draw hostnames out of a hat (or, if you like to be more technical about it, use a random hostname generator that correctly seeds itself and then queries nsradmin - see the sketch after this list) to ensure randomness
(f) If a test fails, it obviously becomes a high priority issue - you need to determine why it failed (e.g., just a media problem? something deeper/more serious?)
(g) Test both originals and clones - you never know when you'll need to recover from one or the other
(h) If using mixed media - e.g., a combination of disk backup and tape - plan some tests that recover across both formats
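
To make (e) a little more concrete, here's a minimal sketch of the random selection idea in Python. It assumes nsradmin is in the PATH and can reach the server; the server name is a placeholder, and the parsing of the client resources is an assumption you'd want to check against your own nsradmin output:

    #!/usr/bin/env python
    # Sketch: pick a random NetWorker client to test by querying nsradmin.
    import random
    import subprocess

    NSR_SERVER = "backupserver.example.com"  # placeholder - use your server

    def list_clients(server):
        # nsradmin reads commands from standard input when run
        # non-interactively, so we pipe a single query to it.
        query = "print type: NSR client\n"
        output = subprocess.run(
            ["nsradmin", "-s", server],
            input=query, capture_output=True, text=True, check=True
        ).stdout
        # Each client resource prints a "name: host;" attribute;
        # strip the trailing semicolon and keep the hostname.
        clients = []
        for line in output.splitlines():
            line = line.strip().rstrip(";")
            if line.startswith("name:"):
                clients.append(line.split(":", 1)[1].strip())
        return clients

    if __name__ == "__main__":
        random.seed()  # seed from the OS so each draw is independent
        print("Next test candidate: %s" % random.choice(list_clients(NSR_SERVER)))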

One of the good things about NetWorker is that you can automate the test framework - e.g., write a script that accepts a hostname and a directory or set of files to recover, then goes off and recovers it (e.g., to an alternate location) and reports the output from the recovery. Obviously there's a bit of work in this, and you have to treat any "unknown" output as an error condition so you don't miss unanticipated failures, but it can help make the test process a little less annoying.
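
Here's a minimal sketch of that sort of wrapper, again in Python. It assumes the script runs on the client being tested and uses recover's -a (non-interactive), -d (alternate destination) and -iY (overwrite) options; the "expected output" prefixes are assumptions you'd tune against real recover output, not a definitive list:

    #!/usr/bin/env python
    # Sketch: recover a path to an alternate location and treat any
    # unexpected output as a failure so unanticipated errors aren't missed.
    import subprocess
    import sys

    # Lines we expect from a clean recovery; anything else is flagged.
    # These prefixes are assumptions - adjust them for your environment.
    EXPECTED = ("Recovering ", "Received ", "./")

    def test_recover(server, path, dest="/tmp/recovery-test"):
        result = subprocess.run(
            ["recover", "-s", server, "-d", dest, "-iY", "-a", path],
            capture_output=True, text=True
        )
        problems = [line for line in result.stdout.splitlines()
                    if line and not line.startswith(EXPECTED)]
        if result.returncode != 0 or problems:
            print("TEST FAILED for %s" % path)
            for line in problems:
                print("  unexpected: %s" % line)
            if result.stderr:
                print(result.stderr)
            return False
        print("TEST PASSED for %s" % path)
        return True

    if __name__ == "__main__":
        # usage: recover_test.py <server> <path-to-recover>
        ok = test_recover(sys.argv[1], sys.argv[2])
        sys.exit(0 if ok else 1)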

I have a sample acceptance test plan document on my website - http://www.enterprisesystemsbackup.com - under the "Resources" section. While it's designed primarily for use in testing a new implementation, it can be a starting point for coming up with a test register and formally documenting what sorts of tests are to be performed.

Cheers,

Preston.


--
Preston de Guise


"Enterprise Systems Backup and Recovery: A Corporate Insurance Policy", due out September 17 2008:

http://www.crcpress.com/shopping_cart/products/product_detail.asp?sku=AU6396&isbn=9781420076394&parent_id=&pc=

http://www.enterprisesystemsbackup.com

