[ADSM-L] DLLA performance on TSM 5.5.5.2

2013-01-30 22:34:21
Subject: [ADSM-L] DLLA performance on TSM 5.5.5.2
From: Frank Fegert <fra.nospam.nk AT GMX DOT DE>
To: ADSM-L AT VM.MARIST DOT EDU
Date: Wed, 30 Jan 2013 22:45:59 +0100

Hello all,

After some logical inconsistencies showed up in the DB of one of our TSM
servers (v5.5.5.2, 96GB, 84% usage), IBM support advised us to run a
DLLA procedure. To estimate the required downtime, we ran some initial
tests with a recent DB backup on a test machine (AIX 6.1.8.1, 2
dedicated Power6+ CPUs, SVC-backed 15k disks). The numbers were:
  dumpdb:  ~15 min
  loaddb:  ~4 h 45 min
  auditdb: ~9 h 30 min
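
As a rough sketch of the downtime estimate, the three step durations
above can be added up and broken down by share. The durations are the
approximate values quoted above; the variable names are only
illustrative.

```python
from datetime import timedelta

# Approximate step durations from the first test run (values quoted above).
steps = {
    "dumpdb": timedelta(minutes=15),
    "loaddb": timedelta(hours=4, minutes=45),
    "auditdb": timedelta(hours=9, minutes=30),
}

# Summing timedeltas needs an explicit timedelta start value.
total = sum(steps.values(), timedelta())

for name, duration in steps.items():
    share = duration / total  # timedelta / timedelta yields a float
    print(f"{name:8s} {str(duration):>8s}  {share:6.1%}")
print(f"{'total':8s} {str(total):>8s}")
```

With these figures the full run comes to about 14 h 30 min, of which
auditdb accounts for roughly two thirds.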

In an attempt to speed things up and to rule out any I/O-related
issues, we altered the setup a little. The dumpdb is now done to a
dedicated LUN of a TMS RamSan-630, and the target database and log
volumes for the loaddb are created on ramdisk devices. The numbers are
now (so far):
  dumpdb:  ~5 min
  loaddb:  ~4 h 55 min
  auditdb: TBD

The CPUs aren't maxed out, and even adding a 3rd CPU didn't noticeably
speed up the loaddb step. Disk I/O during the loaddb step averages
about 4 MB/sec (max. 8 MB/sec) at around 10-20 IOPS. "psswitch"
and "syscalls" are rather high though, >100k and >300k respectively.
Looking at the dsmserv process's activity over a sample period of
about 60 sec., we see the following distribution of function calls
(sorry, no analysis of delta_t between function calls is available as
of now):
  53475 pthread_getspecific
  27141 pthread_self
  26585 pthread_mutex_unlock
  26585 pthread_mutex_lock
   7218 thread_waitlock_
   6335 thread_unlock_
   1526 pthread_cond_broadcast
   1184 strlen
  ... everything else is below 300 counts during the 60 sec. sample
      period.
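
To put the listing above into perspective, here is a small sketch that
converts the counts into per-second rates and shares. It only covers
the eight functions listed (everything below 300 counts is not
included), and the function names are copied verbatim from the sample.

```python
# Call counts from the 60-second sample of the dsmserv process.
SAMPLE_SECONDS = 60
counts = {
    "pthread_getspecific": 53475,
    "pthread_self": 27141,
    "pthread_mutex_unlock": 26585,
    "pthread_mutex_lock": 26585,
    "thread_waitlock_": 7218,
    "thread_unlock_": 6335,
    "pthread_cond_broadcast": 1526,
    "strlen": 1184,
}

listed_total = sum(counts.values())
# Everything with "thread" in its name, i.e. all listed calls except strlen.
thread_total = sum(n for name, n in counts.items() if "thread" in name)

for name, n in counts.items():
    rate = n / SAMPLE_SECONDS
    print(f"{n:6d}  {rate:7.1f}/s  {n / listed_total:6.1%}  {name}")

print(f"listed calls: {listed_total} (~{listed_total / SAMPLE_SECONDS:.0f}/s)")
print(f"thread-related share of listed calls: {thread_total / listed_total:.1%}")
```

Of the listed calls, all but strlen are thread-related, and
pthread_mutex_lock and pthread_mutex_unlock appear with identical
counts.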

Most of the work with respect to system and library function calls
seems to be done in the IPC/ITC area. Leaving at their defaults or
explicitly setting the pthread-related environment variables:
  AIXTHREAD_SCOPE=S
  AIXTHREAD_MNRATIO=1:1
  AIXTHREAD_AFFINITY=strict

had no significant effect on the loaddb runtime.
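
For reference, a minimal sketch of how such variables can be applied to
a single process only, without changing the login environment. The
helper name run_with_thread_env is hypothetical, and the probe command
is just a stand-in; the actual dsmserv invocation is not shown here and
would have to be substituted.

```python
import os
import subprocess
import sys

# The three settings quoted above.
AIX_THREAD_ENV = {
    "AIXTHREAD_SCOPE": "S",
    "AIXTHREAD_MNRATIO": "1:1",
    "AIXTHREAD_AFFINITY": "strict",
}


def run_with_thread_env(command):
    """Run `command` with the thread settings added to a copy of the
    current environment (hypothetical helper, for illustration only)."""
    env = dict(os.environ)
    env.update(AIX_THREAD_ENV)
    return subprocess.run(
        command, env=env, capture_output=True, text=True, check=True
    )


# Stand-in command: a child Python process that echoes one variable,
# showing that the setting reaches the child.
probe = [sys.executable, "-c",
         "import os; print(os.environ['AIXTHREAD_SCOPE'])"]
result = run_with_thread_env(probe)
print(result.stdout.strip())
```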

Does anyone have an idea how to further cut down the DLLA runtime?
Or is loaddb's internal data crunching really such a time-consuming
process?

Thanks & best regards,

    Frank


== loaddb final lines ==
ANR4039I LOADDB: Loaded 545249486 database entries (cumulative).
ANR4031I LOADDB: Copied 20576621 database pages.
ANR4033I LOADDB: Copied 5753 bit vectors.
ANR4035I LOADDB: Encountered 0 bad database records.
ANR4074I LOADDB: Encountered 0 bad database entries.
ANR4036I LOADDB: Copied 545278043 database entries.
ANR4037I LOADDB: 51942 Megabytes   copied.
ANR4004I LOADDB: Database load process completed.
ANR4405I LOADDB: Loaded an inconsistent dump image - a database audit (AUDITDB) IS REQUIRED with FIX=YES.
ANR2106I : Quiescing database update activity.
ANR2107I : Database update activity is now quiesced.
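
As a rough cross-check, the totals in these final lines can be turned
into average rates. This sketch assumes the lines belong to the
~4 h 55 min ramdisk run, which is an assumption and not something the
log itself states; the regular expression simply picks the first number
in each message.

```python
import re

# Three of the final loaddb messages, copied from the output above.
log = """\
ANR4039I LOADDB: Loaded 545249486 database entries (cumulative).
ANR4031I LOADDB: Copied 20576621 database pages.
ANR4037I LOADDB: 51942 Megabytes   copied.
"""

# Map message code -> first number in the message text.
values = {
    code: int(number)
    for code, number in re.findall(
        r"^(ANR\d{4}I) LOADDB: \D*(\d+)", log, re.MULTILINE
    )
}

# Assumed runtime: ~4 h 55 min, expressed in seconds.
elapsed_s = (4 * 60 + 55) * 60

mb_per_s = values["ANR4037I"] / elapsed_s
entries_per_s = values["ANR4039I"] / elapsed_s
pages_per_s = values["ANR4031I"] / elapsed_s

print(f"{mb_per_s:.2f} MB/s, {entries_per_s:.0f} entries/s, "
      f"{pages_per_s:.0f} pages/s")
```

Under that assumption the load averages just under 3 MB/sec, which is
in the same range as the ~4 MB/sec of disk I/O observed during the
loaddb step.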
