BackupPC-users

Re: [BackupPC-users] backup keeps failing in exactly the same place

2008-04-21 08:32:06
Subject: Re: [BackupPC-users] backup keeps failing in exactly the same place
From: Jonathan Dill <jonathan AT nerds DOT net>
To: Benjamin Staffin <benley AT zoiks DOT net>
Date: Mon, 21 Apr 2008 08:31:51 -0400
On Apr 20, 2008, at 5:39 PM, Benjamin Staffin wrote:

> I've run into a frustrating problem that affects only one of my backup
> clients.  This particular client is configured as though it is two
> hosts to backuppc, such that part of its contents go to a compressed
> pool and the rest goes to an uncompressed pool.  Both of the backups
> have started failing for reasons that are compeltely eluding me.
> Here's a slightly-trimmed backup log from one of the failures: (kindly
> ignore the fact that I'm backing up lame anime videos here...)
>
> [begin log output]
>
> full backup started for directory /; updating partial #141
> Running: /usr/bin/ssh -c blowfish -q -x -l root <hostname>
> /usr/bin/rsync --server --sender --numeric-ids --perms --owner --group
> -D --links --hard-links --times --block-size=2048 --recursive
> --checksum-seed=32761 --ignore-times . /
> Xfer PIDs are now 28641
> Got remote protocol 29
> Negotiated protocol version 28
> Checksum caching enabled (checksumSeed = 32761)
> Sent include: /export
> Sent include: /export/movies
> Sent include: /export/Music
> Sent include: /export/roms
> Sent exclude: /*
> Sent exclude: /export/*
> Xfer PIDs are now 28641,28642
>  create d 755       0/0        1024 .
>  create d 755       0/0         512 export
>  create d 755    1000/0          16 export/movies
> <snip>
>  pool     744   101/101   183818240 export/movies/Anime/Aishiteruze
> Baby/[Lunar] Aishiteru ze Baby - 16 [D32F0AE9].avi
>  pool     744   101/101   183558144 export/movies/Anime/Aishiteruze
> Baby/[Lunar] Aishiteru ze Baby - 17 [DE935E39].avi
>  pool     744   101/101   183734272 export/movies/Anime/Aishiteruze
> Baby/[Lunar] Aishiteru ze Baby - 18 [9713D4BC].avi
>  pool     744   101/101   183767040 export/movies/Anime/Aishiteruze
> Baby/[Lunar] Aishiteru ze Baby - 19 [67A452A4].avi
>  pool     744   101/101   183699456 export/movies/Anime/Aishiteruze
> Baby/[Lunar] Aishiteru ze Baby - 20 [3A7810A3].avi
> Read EOF: Connection reset by peer
> Tried again: got 0 bytes
> finish: removing in-process file export/movies/Anime/Aishiteruze
> Baby/[Lunar] Aishiteru ze Baby - 21 [F48F3C55].avi
> Child is aborting
> Can't write 33812 bytes to socket
> Done: 12 files, 918624514 bytes
> Got fatal error during xfer (aborted by signal=PIPE)
> Backup aborted by user signal
>
> [end log output]
>
> I've tried cranking XferLogLevel up to crazy numbers, but the
> resulting output doesn't include anything that appears to be of
> interest here.  My initial thought was that there must be something
> wrong with the files I'm trying to back up - maybe there's some crazy
> IO error happening - but that doesn't seem to be the case.  Manually
> rsyncing the same data over the same network link works fine, and if I
> move the "Aishiteruze Baby" directory out of the way, the backup
> transfers some other files (more lame anime vids, naturally) and fails
> after a similar number of files and bytes transferred.
>
> The other backuppc-configured client that points at the host shows
> similar failures in its XferLog:
>
> full backup started for directory /; updating partial #157
> Running: /usr/bin/ssh -c blowfish -q -x -l root <same host as above>
> /usr/bin/rsync --server --sender --numeric-ids --perms --owner --group
> -D --links --hard-links --times --block-size=2048 --recursive
> --checksum-seed=32761 --ignore-times . /
> Xfer PIDs are now 6963
> Got remote protocol 29
> Negotiated protocol version 28
> Checksum caching enabled (checksumSeed = 32761)
> Sent exclude: /afs
> Sent exclude: /backup
> Sent exclude: /mnt
> <snip>
> Xfer PIDs are now 6963,7001
>  create d 755       0/0        1024 .
>  same     600       0/0        3899 .bash_history
>  create d 755       0/0         512 .ssh
>  same     644       0/0         394 .ssh/authorized_keys
> <snip>
>  create d1777       0/0           8 export/VirtualMachines
>  create d 755 1000/1000          13 export/VirtualMachines/Migrainepad
>  pool     600 1000/1000   239730688
> export/VirtualMachines/Migrainepad/Ubuntu-migrainepad-s001.vmdk
>  pool     600 1000/1000   101777408
> export/VirtualMachines/Migrainepad/Ubuntu-migrainepad-s002.vmdk
> Can't write 33792 bytes to socket
> exiting after signal ALRM

Are the VMware virtual machines that use those virtual disks in use?   
Maybe could be issue with file locking, or the files are changing  
while BackupPC is trying to back them up.  You might want to try  
backing up the VM from "inside" i.e. set it up as just another client  
if the VM is running most of the time.

Also, you may want to check syslog on the client (not sure of exact  
file for Open Solaris).  Maybe rsync is hitting some per process  
limit, or both separate BackupPC are running simultaneously and  
causing out of memory or some other kind of problem.  If the network  
is mostly secure, you might try rsyncd without ssh, or try using a  
different encryption algorithm, maybe there is something weird with  
Open Solaris implementation of ssh, but it should be possible to find  
out if that is the case by testing ssh and rsync with the same options  
on the command line.

Lastly, try running rsync through ssh from the server to the client on  
the command line and see what happens, the shell has to be "clean"  
without a lot of crap output or rsync gets confused, for example if  
you do cutesy Escape in the shell to change the terminal window title  
or text color, although I think if that is causing a problem it should  
happen immediately rather than well into the backup.  In the shell  
rc / profile, you may want to put those kind of things into "if then"  
type clause and set it up so it skips those items when you are running  
rsync rather than actually logging in to an interactive session.

Anyway, much of this is speculation not having tested Open Solaris,  
but I'd still say look at the OS log files and something like "sar"  
maybe there are some clues at the OS level outside of BackupPC.

>
>
> Note that this one doesn't print quite the same set of messages - it
> doesn't print the "Done: ## files, ###### bytes" line.
>
> I'm at a loss for figuring out how to track this down.  I've tried
> running BackupPC_dump by hand with variously high levels of verbosity,
> but haven't been able to find anything.  What should I do next?
>
> The backup client in question is an OpenSolaris SXCE B82 box with
> rsync version 2.6.9 and ssh version "Sun_SSH_1.2" as distributed by
> Sun.  Yes, this problem most likely relates to some craziness caused
> by my experimental OS, and I acknowledge that it's probably not even
> backuppc's fault.  I'd still love some help tracking it down :-)
>
> Many thanks,
>
> - Benjamin Staffin
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by the 2008 JavaOne(SM) Conference
> Don't miss this year's exciting event. There's still time to save  
> $100.
> Use priority code J8TL2D2.
> http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
> _______________________________________________
> BackupPC-users mailing list
> BackupPC-users AT lists.sourceforge DOT net
> List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
> Wiki:    http://backuppc.wiki.sourceforge.net
> Project: http://backuppc.sourceforge.net/


-------------------------------------------------------------------------
This SF.net email is sponsored by the 2008 JavaOne(SM) Conference 
Don't miss this year's exciting event. There's still time to save $100. 
Use priority code J8TL2D2. 
http://ad.doubleclick.net/clk;198757673;13503038;p?http://java.sun.com/javaone
_______________________________________________
BackupPC-users mailing list
BackupPC-users AT lists.sourceforge DOT net
List:    https://lists.sourceforge.net/lists/listinfo/backuppc-users
Wiki:    http://backuppc.wiki.sourceforge.net
Project: http://backuppc.sourceforge.net/

<Prev in Thread] Current Thread [Next in Thread>