I have a .ksh script that makes a system call as follows.

recoveryProgram auto > /dev/null
MYRESULT=$?

if [ $MYRESULT != "0" ] ; then
  # If 'recovery' failed
  releaseBrokenStation $SYSTEM
  emailFailure
fi

Sometimes this system call (recoveryProgram auto) hangs. I would like to have a timeout, let's say about 60 minutes, and if the system call doesn't finish by then, I would like to kill it (and all other child processes it may have created). That way, my script can move on to the next step. However, I also need to know whether the system call hung.

Please note that this needs to run on Linux (Ubuntu) as well as on HPUX 11.27. Can someone please tell me what would be the best way to do this?


Usually, you'd like to spawn the process in the background and wait for it to complete. Unfortunately, there is no timeout parameter to the wait call.
I can think of two immediate solutions. First, you can write a small C program that will execute the recovery script and timeout after a specified period.
If that solution is not applicable then you could simply do something like:

recoveryProgram auto > /dev/null &
PID=$!
SEC=0
LIMIT=60
while [ ${SEC} -lt ${LIMIT} ]; do
   # check for recoveryProgram running (ps -p exits non-zero once the PID is gone)
   if ps -p ${PID} > /dev/null 2>&1; then
      sleep 1
      SEC=$((SEC + 1))
   else
      break
   fi
done

# If timeout (process still running) kill it
if ps -p ${PID} > /dev/null 2>&1; then
   kill ${PID}
fi

N.B. The above is a general outline. The loop polls once per second, so LIMIT=60 waits about one minute; set LIMIT=3600 for a 60-minute timeout.

My ksh skills aren't that good, but here is what I have so far. I cannot figure out why the timeout doesn't work right.

printLog "waiting for recovery to finish..."
      recoveryProgram auto > /dev/null
      PID=$!
      $timer = 0;
      while [ $timer -lt 1 ] ; do
        if [ `ps -p ${PID}` ]; then
          sleep 1 
          timer=timer+1
        else
          break
        fi
      done

      # If timeout (process still running) kill it
      if [ `ps -p ${PID}` ]; then
        printLog "Time out occurred, killing recovery process"
        kill ${PID}
      fi

      MYRESULT=$?
    fi

    if [ $MYRESULT != "0" ] ; then
       printLog "Failed to recover"
    else
       printLog "Going to step 2"
    fi

What part of it is not working?
I can see that you will only ever wait 1 second before exiting the while loop: while [ $timer -lt 1 ] only holds when timer is less than 1, and you increment it immediately.
There is also an extra fi in your post but I'm assuming that is just a copy paste error.
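
For what it's worth, here is a minimal sketch of how the polling attempt might look once those points are fixed. Besides the loop bound, it assumes three further changes: the command is started with a trailing & (without it, $! does not refer to the command), the counter is assigned as timer=0 and incremented with $((...)), and the exit status is collected with wait rather than read from $? after kill. The function name run_with_poll_timeout and the return value 124 for "timed out" are made up for this example. Only the direct child is signalled, so grandchildren would need separate handling.

```shell
#!/bin/sh
# Sketch only: run_with_poll_timeout is a hypothetical helper name.
# Usage: run_with_poll_timeout LIMIT_SECONDS command [args...]
# Returns the command's own exit status, or 124 if it had to be killed.
# (124 is an arbitrary marker chosen for this sketch.)
run_with_poll_timeout() {
    limit=$1
    shift
    "$@" &                          # background it so $! is the command's PID
    pid=$!
    timer=0
    while [ "$timer" -lt "$limit" ]; do
        # ps -p exits non-zero once the process is gone
        if ps -p "$pid" > /dev/null 2>&1; then
            sleep 1
            timer=$((timer + 1))
        else
            break
        fi
    done

    # If timeout (process still running) kill it
    if ps -p "$pid" > /dev/null 2>&1; then
        kill "$pid" 2>/dev/null
        wait "$pid" 2>/dev/null     # reap the killed child
        return 124
    fi

    wait "$pid"                     # finished in time: return its real status
}
```

In the original script this could be called as run_with_poll_timeout 3600 recoveryProgram auto > /dev/null, followed by MYRESULT=$?, where 124 would mean the recovery hung and any other non-zero value would mean it failed on its own.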

Putting an upper bound on a program's run time is a PITA in most shells. But it can be done. Here's a bash script that demonstrates the method with only (I think) 6 extra lines of code. I imagine a clever programmer will figure out a way to make it a function with two args: timeout and command. The command must not block the TERM signal.

#! /bin/bash

if [ $# -ne 2 ]; then
  echo "Usage: `basename $0` cmd_time alarm_time"
  echo "where cmd_time is the 'time' the command will take to run, and"
  echo "      alarm_time is how long to wait for the 'command' to finish."
  echo
  echo "This script uses sleep() to test a fairly clean method of interrupting"
  echo "a command that should have a upper bound on its run time."
  echo "Set cmd_time<alarm_time to simulate the command finishing before the alarm."
  echo "Set alarm_time<cmd_time to simulate the alarm clock going off first."
  echo
  echo "This works nicely for bash. I don't know how ksh (or others) will react."
  exit 1
fi

# bash-ism to turn off job monitoring: silences the child's death throes.
set +m

# Plug our ears
trap "" ALRM TERM

# Start the command
(trap - ALRM TERM; sleep $1)&
CMDPID=$!

# Set the alarm clock
(trap - ALRM TERM; sleep $2; kill -TERM $CMDPID)&
ALARMPID=$!

# Wait for the command to die of either natural causes or procicide
wait $CMDPID

# Smash the alarm clock
kill -ALRM $ALARMPID >/dev/null 2>&1
kill -TERM $ALARMPID >/dev/null 2>&1

# Listen again
trap - ALRM TERM

echo "Finished"

BTW, wait's return status contains the command's exit code (<128) if it finished normally or a value greater than 127 if it was killed by the alarm clock.
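
As a rough sketch of the "function with two args" idea, the same alarm-clock method could be wrapped as below. This is written against plain POSIX sh rather than verified on ksh or HP-UX, the name run_with_alarm is invented for the example, and it leaves out the trap and set +m lines from the script above to keep it short. It signals only the command itself, not any children the command may have started.

```shell
#!/bin/sh
# Sketch only: run_with_alarm is a hypothetical name.
# Usage: run_with_alarm ALARM_SECONDS command [args...]
run_with_alarm() {
    alarm_time=$1
    shift

    # Start the command
    "$@" &
    cmdpid=$!

    # Set the alarm clock (output discarded so it never holds the caller's pipes)
    ( sleep "$alarm_time"; kill -TERM "$cmdpid" 2>/dev/null ) > /dev/null 2>&1 &
    alarmpid=$!

    # Wait for the command to finish on its own or be killed by the alarm
    wait "$cmdpid"
    status=$?

    # Smash the alarm clock; errors are ignored if it has already gone off
    kill -TERM "$alarmpid" 2>/dev/null
    wait "$alarmpid" 2>/dev/null

    # A status above 127 means the command was killed by a signal,
    # which this sketch assumes was the alarm.
    return "$status"
}
```

A caller would then test the result, for example run_with_alarm 3600 recoveryProgram auto > /dev/null; MYRESULT=$?, and treat a value greater than 127 as "hung and killed" and any other non-zero value as an ordinary failure.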

UNIX and GNU/Linux have their differences as do sh, bash and ksh, but there is a measure of commonality among them.
