Yeah, I originally thought this might be a network problem myself.
However, I've checked the network settings on the Sun systems and the Cisco
switches in between. I'm even forcing 100FDX on the switch and system just
to be safe (auto-negotiation never works, regardless of what the vendors
say).

It seems this is an MPX thing. I did some further testing, backing up
systems without multiplexing enabled, and the problem goes away. The rmt/1
device stops sitting at 100% busy with 0 kw/s, and client full backups drop
back down into the 15-minute range...
On 3/23/05 10:56 AM, "Jorgensen, Bill" <Bill_Jorgensen AT csgsystems DOT com>
wrote:
> Jeff:
>
> A few things to consider (assuming a Sun server as the NBU master):
>
> 1.) Are you aware of anything that has changed on your NBU server?
> 2.) Are you aware of anything that has changed with your network?
> (Providing you are doing Ethernet-based backups. If not, what about the
> SAN?)
> 3.) Are you aware of any changes to the policies?
>
> If no to the above try the following:
>
> 1.) Find out what Veritas recommends for your environment for these two
> variables:
> NUMBER_DATA_BUFFERS
> SIZE_DATA_BUFFERS
> These are found in /usr/openv/netbackup/db/config. Support may not hand
> out recommended values if you open a ticket with the solution center
> (Professional Services). Ask around if they do not.
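> For reference, setting these tunables is just a matter of dropping a
> number into each file; bptm reads them at the start of the next backup.
> A rough sketch follows -- the values 16 and 65536 are purely illustrative,
> not a recommendation, so get the real numbers from Veritas for your
> environment. (CONFIG_DIR defaults to a temp dir here so the sketch is
> runnable anywhere; on a real master it would be
> /usr/openv/netbackup/db/config.)

```shell
# Illustrative values only -- confirm the right numbers with Veritas first.
# On a real NBU master, CONFIG_DIR would be /usr/openv/netbackup/db/config;
# a temp dir is used here so this sketch can run harmlessly anywhere.
CONFIG_DIR="${CONFIG_DIR:-$(mktemp -d)}"

# Number of shared-memory buffers per tape drive (a power of 2 is typical).
echo 16 > "$CONFIG_DIR/NUMBER_DATA_BUFFERS"

# Size of each buffer in bytes; must suit the tape drive's block size.
echo 65536 > "$CONFIG_DIR/SIZE_DATA_BUFFERS"

# Show what was written.
cat "$CONFIG_DIR/NUMBER_DATA_BUFFERS" "$CONFIG_DIR/SIZE_DATA_BUFFERS"
```

> Each file holds a single bare integer; bptm logs the values it actually
> used, which is an easy way to confirm the change took effect.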
>
> 2.) Check the network driver settings for a few things, depending on
> the network type you are using: 100Mb switched, 10Mb switched, etc.
>
> root[prod-backup:/]# ndd -get /dev/qfe adv_autoneg_cap
> 1
> root[prod-backup:/]# ndd -get /dev/qfe adv_100hdx_cap
> 1
> root[prod-backup:/]# ndd -get /dev/qfe adv_100fdx_cap
> 1
> What the output above shows is that the qfe driver is advertising 100
> half-duplex, 100 full-duplex, and autonegotiation. Once you know how the
> network driver is configured, go to your network guys and ask how the
> port on the switch is configured (unless you are the network guy). If
> the port is NOT set to 100-full or autonegotiate, have them set it
> accordingly.
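> If you do decide to force 100-full on the host side as well, the classic
> Solaris recipe looks roughly like this. This is a sketch, not gospel:
> qfe and the instance number are examples (substitute your actual driver
> and interface), and these ndd settings do not survive a reboot -- they
> need to go in an rc script or /etc/system to persist.

```shell
# Run as root. Select which qfe instance subsequent -set calls apply to
# (example: instance 1 == qfe1; use your real interface number).
ndd -set /dev/qfe instance 1

# Advertise only 100Mb full duplex...
ndd -set /dev/qfe adv_100fdx_cap 1
ndd -set /dev/qfe adv_100hdx_cap 0
ndd -set /dev/qfe adv_10fdx_cap 0
ndd -set /dev/qfe adv_10hdx_cap 0

# ...and disable autonegotiation last, forcing the link to 100FDX.
ndd -set /dev/qfe adv_autoneg_cap 0
```

> Make sure the switch port is forced to 100-full too; forcing one side
> while the other autonegotiates is the classic duplex-mismatch recipe.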
>
> 3.) Reseat the RJ-45 connectors for the physical connections.
>
> These are some things that have bit us in the past.
>
> Good luck,
>
> Bill
>
> --------------------------------------------------------
> Bill Jorgensen
> CSG Systems, Inc.
> (w) 303.200.3282
> (p) 303.947.9733
> --------------------------------------------------------
> UNIX... Spoken with hushed and
> reverent tones.
> --------------------------------------------------------
>
> -----Original Message-----
> From: veritas-bu-admin AT mailman.eng.auburn DOT edu
> [mailto:veritas-bu-admin AT mailman.eng.auburn DOT edu] On Behalf Of Jeff
> McCombs
> Sent: Wednesday, March 23, 2005 6:51 AM
> To: veritas-bu AT mailman.eng.auburn DOT edu
> Subject: [Veritas-bu] Backups slow to a crawl
>
> Gurus,
>
> NB 5.0 MP4, single combination media/master server, Solaris 9. Overland
> Neo 2000, 26-slot, 2-drive DLT.
>
> I'm noticing that for some reason or another, all of my client backups
> have slowed to a _crawl_. A _cumulative_ (!) backup of local disk on a Sun
> V100 is taking somewhere on the order of 2 hours at this point, and with
> over 40 systems, I'm blowing past my window consistently.
>
> I'm not quite sure what's going on here, but as I sit and watch the
> output from 'iostat', I'm noticing that rmt/1 (the 2nd drive in the Neo)
> is fluctuating between 100% busy with kw/s close to zero, and 1-15% busy
> with kw/s up in the 1000's.
>
> rmt/0 seems to be fine; kw/s sits consistently up in the 1.8-2K range,
> while busy is anywhere from 2% - 30% on average. My other disks aren't
> working hard, CPU isn't loaded, and I've got plenty of memory.
>
> The policy I'm using allows for multiple data streams, no limits on jobs,
> and most schedules allow an MPX of 2. I'm backing up ALL_LOCAL_DRIVES on
> all clients, and I'm not using any NEW_STREAM directives. I'm not seeing
> any errors on the media either.
>
> Can anyone shed some light on what might be happening here? Am I looking
> at a drive that might be having some problems, or am I barking up the
> wrong tree and it's something else entirely?
>
> A small sample of iostat output covering the affected devices is below
> (extra disks removed from the output):
> root@backup(pts/1):~# iostat -nx 1 100
> extended device statistics
> r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
> 0.0 4.1 0.0 252.2 0.0 0.0 0.0 5.9 0 2 rmt/0
> 0.0 4.6 0.0 278.4 0.0 0.1 0.0 27.3 0 12 rmt/1
>
> 0.0 4.1 0.0 252.3 0.0 0.0 0.0 5.9 0 2 rmt/0
> 0.0 4.6 0.0 278.4 0.0 0.1 0.0 27.3 0 12 rmt/1
>
> 0.0 33.0 0.0 2076.4 0.0 0.2 0.0 5.8 0 19 rmt/0
> 0.0 2.0 0.0 125.8 0.0 1.0 0.0 490.0 0 98 rmt/1
>
> 0.0 38.0 0.0 2394.0 0.0 0.2 0.0 5.4 0 21 rmt/0
> 0.0 8.0 0.0 504.0 0.0 1.0 0.0 124.9 0 100 rmt/1
>
> 0.0 27.0 0.0 1701.1 0.0 0.2 0.0 6.5 0 17 rmt/0
> 0.0 2.0 0.0 126.0 0.0 1.0 0.0 499.9 0 100 rmt/1
>
> 0.0 33.0 0.0 2078.9 0.0 0.2 0.0 5.3 0 18 rmt/0
> 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0 100 rmt/1
>
> 0.0 16.0 0.0 1008.0 0.0 0.1 0.0 6.2 0 10 rmt/0
> 0.0 13.0 0.0 819.0 0.0 0.6 0.0 48.4 0 63 rmt/1
>
> 0.0 40.0 0.0 2520.1 0.0 0.2 0.0 5.9 0 24 rmt/0
> 0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0 0 100 rmt/1
>
> 0.0 33.0 0.0 2078.9 0.0 0.2 0.0 5.3 0 18 rmt/0
> 0.0 10.0 0.0 630.0 0.0 1.0 0.0 99.9 0 100 rmt/1
>
--
Jeff McCombs | NIC, Inc.
Systems Administrator | http://www.nicusa.com
jeffm AT nicusa DOT com | NASDAQ: EGOV
Phone: (703) 909-3277 | "NIC - the People Behind eGovernment"
--
"My favorite thing about the internet is that you get to go into
the private world of real creeps without having to smell them."
- Penn Jillette