[Networker] savepnpc weirdness (7.5.1) - multiple runs per backup

2009-11-09 16:30:17
From: Len Philpot <Len.Philpot AT CLECO DOT COM>
To: NETWORKER AT LISTSERV.TEMPLE DOT EDU
Date: Mon, 9 Nov 2009 15:27:51 -0600
Anyone seen a behavior from savepnpc where it runs multiple times for a 
given backup? Note I'm talking about savepnpc, not savepc, which is 
designed (AFAIK) to run once per saveset.

We have a couple of (Solaris) database clients that use savepnpc scripts 
to (call other scripts and) shut down and restart their DBs before and 
after backups, on both 7.2 and 7.5.1. This works great on 7.2 and has for 
several years. On 7.5.1, Networker apparently runs the shutdown and 
restart scripts numerous times in quick succession before finally seeing 
that something is backing up, and only then waits to run the final restart 
when the backup completes. Of course, these backups are no good, since the 
DBs were bouncing like a ball during them.

Here are a few key events, for example, from a group that ran at 2 am...

From the /nsr/logs/savepnpc.log on the client:

11/09/09 02:00:00 preclntsave: Starting up the precmds.
11/09/09 02:00:43 preclntsave: All command(s) ran successfully.
11/09/09 02:01:44 pstclntsave: Client is not active in the worklist.
11/09/09 02:01:44 pstclntsave: All savesets on the worklist are done.
11/09/09 02:01:44 pstclntsave: Starting up the pstcmds.
11/09/09 02:02:28 pstclntsave: All command(s) ran successfully.
11/09/09 02:02:28 pstclntsave: Exited.

The above repeats six more times, bouncing the DB each time, until 
finally...

11/09/09 02:39:37 preclntsave: Starting up the precmds.
11/09/09 02:40:42 preclntsave: All command(s) ran successfully.
11/09/09 02:41:42 pstclntsave: Client is still active in the worklist.
11/09/09 02:41:42 pstclntsave: Worklist not complete. Some saves are still running.
11/09/09 02:42:42 pstclntsave: Client is still active in the worklist.

The above repeats seven more times during the backup, presumably a normal 
polling process from Networker, until...

11/09/09 02:49:42 pstclntsave: Savegroup is not running.
11/09/09 02:49:42 pstclntsave: All savesets on the worklist are done.
11/09/09 02:49:42 pstclntsave: Starting up the pstcmds.
11/09/09 02:50:26 pstclntsave: All command(s) ran successfully.
11/09/09 02:50:26 pstclntsave: Exited.
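
For anyone wanting to check their own client for the same pattern, here is a rough sketch of a script that groups savepnpc.log lines into pre/post cycles and flags the ones where the pstcmds ran without the client ever being seen as active. It is not an EMC tool. The function names and the sample text are made up for illustration, and it assumes the log lines look exactly like the excerpts above, with mm/dd/yy timestamps.

```python
import re
from datetime import datetime

# Assumed line shape, taken from the excerpts above:
#   "11/09/09 02:00:00 preclntsave: Starting up the precmds."
LINE_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (preclntsave|pstclntsave): (.*)$"
)

def summarize_cycles(lines):
    """Group log lines into cycles, one per 'Starting up the precmds'."""
    cycles = []
    current = None
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip blank lines and wrapped continuations
        stamp = datetime.strptime(m.group(1), "%m/%d/%y %H:%M:%S")
        prog, msg = m.group(2), m.group(3)
        if prog == "preclntsave" and msg.startswith("Starting up the precmds"):
            current = {"pre_start": stamp, "post_start": None, "saw_active": False}
            cycles.append(current)
        elif current is None:
            continue  # pstclntsave output before any precmd start
        elif prog == "pstclntsave" and msg.startswith("Client is still active"):
            current["saw_active"] = True
        elif prog == "pstclntsave" and msg.startswith("Starting up the pstcmds"):
            current["post_start"] = stamp
    return cycles

def premature_cycles(cycles):
    """Cycles whose pstcmds ran although the client was never seen active."""
    return [c for c in cycles if c["post_start"] is not None and not c["saw_active"]]

# Hypothetical sample built from the first and last cycles quoted above.
SAMPLE_LOG = """\
11/09/09 02:00:00 preclntsave: Starting up the precmds.
11/09/09 02:00:43 preclntsave: All command(s) ran successfully.
11/09/09 02:01:44 pstclntsave: Client is not active in the worklist.
11/09/09 02:01:44 pstclntsave: Starting up the pstcmds.
11/09/09 02:39:37 preclntsave: Starting up the precmds.
11/09/09 02:41:42 pstclntsave: Client is still active in the worklist.
11/09/09 02:49:42 pstclntsave: Savegroup is not running.
11/09/09 02:49:42 pstclntsave: Starting up the pstcmds.
"""

if __name__ == "__main__":
    cycles = summarize_cycles(SAMPLE_LOG.splitlines())
    bad = premature_cycles(cycles)
    print(f"{len(bad)} of {len(cycles)} cycles ran pstcmds without seeing the client active")
```

To run it against a real client, replace SAMPLE_LOG.splitlines() with the lines read from /nsr/logs/savepnpc.log. Each flagged cycle is one extra bounce of the DB.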


And, from /nsr/logs/daemon.log on the server:

11/09/09 02:00:50  nsrd client1:/dir1 saving to pool 'diskpool' (diskvol.001)
11/09/09 02:04:49  nsrd 913 MB are saved to pool 'diskpool' (diskvol.001) of client1:/dir1
11/09/09 02:04:49  savegrp job (1792176) host: client1 savepoint: /dir1 had WARNING indication(s) at completion.
11/09/09 02:05:37  nsrd client1:/dir2 saving to pool 'diskpool' (diskvol.001)
11/09/09 02:11:45  nsrd 1670 MB are saved to pool 'diskpool' (diskvol.001) of client1:/dir2

...etc. until the group completes.

We originally wondered if savepnpc was running the pstcmd too quickly, 
before Networker had built the worklist, and so concluding the backup was 
already done. But the daemon.log shows that the client was actually 
backing up within a minute of the group starting, so... Maybe the worklist 
is built concurrently with whatever savepnpc is triggering, but it just 
(hopefully!) waits until the precmd is done before actually starting the 
backup. In the case above, backups started 7 seconds after the first 
precmd completed (02:00:43 to 02:00:50), with the first pstcmd starting at 
02:01:44, about a minute later.
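
The timing in that first cycle can be worked out from the two logs quoted above. This is only a small sketch of the arithmetic. The variable names are made up, and it assumes the client and server clocks agree.

```python
from datetime import datetime

FMT = "%H:%M:%S"

def seconds_between(start, end):
    """Whole seconds from start to end, both HH:MM:SS on the same day."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())

precmd_done = "02:00:43"     # savepnpc.log: first precmds finished
save_started = "02:00:50"    # daemon.log: client1:/dir1 saving to pool
pstcmd_started = "02:01:44"  # savepnpc.log: first pstcmds started
save_finished = "02:04:49"   # daemon.log: 913 MB saved for client1:/dir1

gap_precmd_to_save = seconds_between(precmd_done, save_started)
gap_save_to_pstcmd = seconds_between(save_started, pstcmd_started)
save_left_running = seconds_between(pstcmd_started, save_finished)

if __name__ == "__main__":
    print("precmd done to save start:", gap_precmd_to_save, "s")
    print("save start to first pstcmd:", gap_save_to_pstcmd, "s")
    print("save still running after pstcmd start:", save_left_running, "s")
```

So the first pstcmd fired while the /dir1 save was already under way and still had about three minutes to go, which is why a slow worklist build alone does not seem to explain it.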

Any ideas??

Thanks.
