[Networker] disaster recovery test

2005-08-30 15:37:12
Subject: [Networker] disaster recovery test
From: Patricia Neal <neal AT CSEAINC DOT ORG>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Tue, 30 Aug 2005 15:28:07 -0400
Last month we simulated a disaster and went to Sungard in Philly for a test.
Well, it would have been a disaster for sure, because we couldn't even get the
software to start after a fresh installation on a Windows 2000 server. We're
running NetWorker 7.1. We back up Linux ES and AS operating systems overlaid
with Oracle. After disabling 3 of the NICs we could get NetWorker to start, but
it was very sluggish. I was able to run mmrecov and then recover the client
indexes. Once we started the savegroup recover, the whole thing hung and used
up all of the memory on the client box. It was not a pleasant experience. We've
tested at our home site using our equipment, but once you're offsite with
different brands of everything, it's a lot different. We're going back in Nov.
because we have some extra hours left. Legato was trying to blame it on the
network. We didn't get a chance to test ftp'ing files back and forth like
Legato wanted us to because we weren't set up for FTP.
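Since FTP wasn't available, one rough way to get a raw network number between
two boxes is a small TCP transfer test. The following is only a sketch, not
what Legato asked for and not a NetWorker tool: the function names, the chunk
size, and the port in the usage note are all assumptions chosen for
illustration, and the rate is measured on the sending side only.

```python
import socket
import time

CHUNK = 64 * 1024  # bytes per send/recv call (arbitrary choice)


def receive_all(listener):
    """Accept one connection and count bytes until the sender closes it."""
    conn, _addr = listener.accept()
    total = 0
    with conn:
        while True:
            data = conn.recv(CHUNK)
            if not data:  # empty read means the sender closed the connection
                break
            total += len(data)
    return total


def send_bytes(host, port, n_bytes):
    """Send n_bytes of zeros to host:port and return the send rate in MB/s."""
    payload = bytes(CHUNK)
    sent = 0
    start = time.perf_counter()
    with socket.create_connection((host, port)) as conn:
        while sent < n_bytes:
            piece = payload[: n_bytes - sent]  # last piece may be shorter
            conn.sendall(piece)
            sent += len(piece)
    elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
    return sent / elapsed / 1e6
```

On the receiving box you would bind a listening socket to a free port (5001,
say), call listener.listen(1), and pass it to receive_all(); on the other box
you would call send_bytes() with that host name and port. A few hundred MB is
enough to see whether the link itself is slow.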
During DR:
1. Using Windows 2000 SP4, fresh OS install.
2. Fresh NetWorker installation.
3. NetWorker services would not come up, or would take a very long time to come
up.
4. NetWorker Administrator would not come up on a fresh install with nothing in
the media database or the indexes.
5. Same configuration and problems (1)-(4) on a 2nd computer at the DR site.
6. The hosts file was populated to fix a probable name resolution problem.
7. Media database finally recovered, but again, services would not come up. It
was already noon by then.
8. It was found that the server used had 4 NICs, probably 3 hooked up to the
network; disabled 3 NICs and added an entry in the hosts file.
9. jbconfig had problems configuring the jukebox (it was tried 3 times). (A 2nd
CSE took over.)
   That was fixed and an inventory was done.
10. Recovering the index was a problem because the server had its name in
all caps. Renamed and rebooted. Index recovery succeeded.
    A 3GB recovery of the 1st client took 40 mins.
11. To recover the 2nd client, we had to use the 2nd bootstrap and undo the
previous media database. It was 6 pm.
12. Recovered the 2nd media database.
13. There were connectivity problems from the 2nd client, a Linux box. NetWorker
recover would not connect. Ping would not work.
14. Verified hosts files, and finally connected using -s hostname.
15. A recovery was started, and other users involved in the recovery explained
that some of the directories being recovered were not needed,
    as it was trying to recover > 200GB, as opposed to 80GB.
16. Finally, after finding out saveset IDs and running preliminary tests, 2
parallel saveset recoveries were done
    to recover 100GB (80GB and 20GB) worth of data. This was started at around
9pm.
17. At 6:20 am the next day, only 10GB of the 20GB had been recovered in 9
hours. (That's an average of only about 1GB per hour.)
18. The recovery client, with 4 processors, was using 100% of CPU.
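
Several of the steps above came down to name resolution (hosts file entries,
the server name in all caps, connecting with -s hostname). A quick check that
could be run on each box before starting NetWorker might look like the sketch
below. It is only an illustration using Python's standard socket module; the
function names and the result fields are made up for this example, and it says
nothing about how NetWorker itself resolves names.

```python
import socket


def short_name(name):
    """Return the first label of a host name, lowercased."""
    return name.split(".")[0].lower()


def check_host(name):
    """Resolve name forward, then reverse, and report what came back."""
    result = {
        "name": name,
        "address": None,        # forward lookup result
        "reverse": None,        # name returned for that address
        "consistent": False,    # reverse name matches, ignoring case
        "case_differs": False,  # same name, but spelled with different case
    }
    try:
        result["address"] = socket.gethostbyname(name)
        result["reverse"] = socket.gethostbyaddr(result["address"])[0]
    except OSError:
        # Lookup failed (socket.gaierror and socket.herror are OSError subclasses).
        return result
    result["consistent"] = short_name(result["reverse"]) == short_name(name)
    result["case_differs"] = (
        result["consistent"]
        and result["reverse"].split(".")[0] != name.split(".")[0]
    )
    return result


if __name__ == "__main__":
    # Add the backup server and client names here; this box is just a start.
    for host in [socket.gethostname()]:
        print(check_host(host))
```

A host that comes back with no address, a mismatched reverse name, or a name
that differs only in case is worth fixing in the hosts file before going any
further.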

Patricia Neal 
Production Support Specialist
C.S.E.A.

To sign off this list, send email to listserv AT listserv.temple DOT edu and 
type "signoff networker" in the
body of the email. Please write to networker-request AT listserv.temple DOT edu 
if you have any problems
with this list. You can access the archives at 
http://listserv.temple.edu/archives/networker.html or
via RSS at http://listserv.temple.edu/cgi-bin/wa?RSS&L=NETWORKER
