HPSS Operations Guide: Problems

Updated September 23, 1997


This page describes some known problems with HPSS.


Runaway mover

This problem occurs occasionally, and we are working on a fix.

Symptoms:

  1. Backupd on ashur reports err=28 (NO SPACE)
  2. A process named hpss_mvr_... on ashur is using almost all of the CPU.

    For example, top might show:

    ashur            load averages:  0.23,  0.12,  0.24   Tue Sep 23 10:49:04 1997
    Cpu states:       2.8% user,   7.1% system,   0.4% wait,  89.7% idle
    Real memory:     421.6M free  275.8M procs  326.6M files 1024.0M total
    Virtual memory:  790.0M free  234.0M used                1024.0M total
    
       PID USER     PRI NICE   SIZE     RES STAT      TIME   CPU% COMMAND
      7206 root      60   0    388K    468K run     213:43  75.4% hpss_mvr_tcp
       516 root     127  21     16K     20K run   15040:31   0.2% Kernel (wait)
      7206 root      60   0    388K    468K sleep   213:43   5.4% pwsync
         0 root      16  21     20K     24K sleep    68:55   0.9% Kernel (swapper) 
       ...
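If you would rather not pick out the looping mover by eye in top, a filter like the one below could list candidate PIDs. This is a minimal sketch and not part of HPSS. The function name find_runaway_mover and the 50% default threshold are assumptions, and it assumes a POSIX-style ps that accepts -o pid=,pcpu=,comm= (check the ps flags on ashur before relying on it).

```shell
# Hypothetical helper, not part of HPSS: print the PID of every process whose
# command name starts with hpss_mvr and whose %CPU is at or above a threshold.
# Input on stdin: one "PID PCPU COMMAND" line per process, no header.
# Usage: ps -eo pid=,pcpu=,comm= | find_runaway_mover [threshold]
find_runaway_mover() {
    threshold=${1:-50}   # assumed cutoff in percent; tune for your system
    awk -v t="$threshold" '
        # $1 = PID, $2 = %CPU, $3 = command name (bare name or full path)
        $3 ~ /(^|\/)hpss_mvr/ && ($2 + 0) >= (t + 0) { print $1 }
    '
}
```

Note that ps usually reports %CPU averaged over the lifetime of the process, so a mover that only recently started looping may show a lower figure than top does. Lower the threshold if nothing is reported.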
    

Fix

If you see this, please try to contact me (Jim Fox). If you can't reach me, do the following:

  1. Log in to ashur as root. In the commands below, pid stands for the process ID of the looping hpss_mvr job.

  2. Run this script:

    # /usr/lpp/hpss/local/bin/get_snapshot pid > /usr/local/hpss/mvr.log

    (This may take a few minutes. Ignore the messages it prints.)

  3. Kill the process:

    # kill -9 pid

That should fix the problem, although it will take a while (many minutes) for the ENOSPC (err=28) errors to go away.
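The three steps above could be wrapped in a small shell function so that the snapshot is never skipped before the kill. This is a hypothetical sketch, not a supported HPSS tool. The function name recover_runaway_mover and the GET_SNAPSHOT override are assumptions, and the check that the PID really belongs to an hpss_mvr process is an added safety measure that this guide does not call for. It still has to be run on ashur as root.

```shell
# Hypothetical wrapper, not part of HPSS: automates steps 2 and 3 above.
# Run on ashur as root. GET_SNAPSHOT defaults to the script path used in
# this guide and can be overridden in the environment.
GET_SNAPSHOT=${GET_SNAPSHOT:-/usr/lpp/hpss/local/bin/get_snapshot}

recover_runaway_mover() {
    pid=$1
    log=${2:-/usr/local/hpss/mvr.log}

    # Refuse anything that is not a plain numeric PID.
    case $pid in
        ''|*[!0-9]*)
            echo "usage: recover_runaway_mover PID [LOG]" >&2
            return 2 ;;
    esac

    # Added safety check (an assumption, not in the guide): only act on a
    # process whose command name starts with hpss_mvr.
    comm=$(ps -p "$pid" -o comm= 2>/dev/null) || comm=
    case ${comm##*/} in
        hpss_mvr*) ;;
        *)
            echo "PID $pid is not an hpss_mvr process: '$comm'" >&2
            return 1 ;;
    esac

    # Step 2: capture a snapshot for later diagnosis. This may take a few
    # minutes; its messages can be ignored. If it exits non-zero we warn
    # and still proceed to the kill.
    if ! "$GET_SNAPSHOT" "$pid" > "$log"; then
        echo "warning: snapshot failed, killing $pid anyway" >&2
    fi

    # Step 3: kill the looping mover.
    kill -9 "$pid"
}
```

For example, for the hpss_mvr_tcp process in the sample top output, this would be run as recover_runaway_mover 7206. It does not wait for the ENOSPC errors to clear; that still takes many minutes after the kill.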


The new, improved archive project (HPSS),
brought to you by Jim Fox, Doug Luft, and Ken Lowe