[nycbug-talk] Runaway cron server

Aidan Cully aidan at panix.com
Tue Nov 3 22:12:13 EST 2009


On Sat, Oct 31, 2009 at 09:01:50PM, Matt Juszczak said:
> Hi all, Happy Halloween...
> 
> I've been having some issues with a runaway cron server.  We've got crons 
> set up, and I'm using a locking system to make sure no cron runs 
> overlapping another cron (though this problem was occurring prior to the 
> locking system being put in place).  After a day or two, our server load 
> spikes, the crons stop working, and top shows:
> 
>   6702 root          1  97    0 16292K  3312K RUN    3  13:50  4.79% cron
> 65338 root          1  96    0 16328K  3324K RUN    3 138:23  4.59% cron
> 69837 root          1  96    0 16328K  3324K RUN    3 116:05  4.59% cron
> 90642 root          1  96    0 16328K  3324K CPU2   2  37:39  4.59% cron
> 65729 root          1  96    0 16328K  3324K RUN    3 136:01  4.49% cron
> 79591 root          1  96    0 16328K  3324K RUN    0  80:51  4.49% cron
> 85363 root          1  96    0 16328K  3324K RUN    0  64:42  4.49% cron
> 90625 root          1  96    0 16328K  3324K CPU0   0  51:58  4.49% cron
> 82872 root          1  96    0 16328K  3324K RUN    3  50:16  4.49% cron
> 83551 root          1  96    0 16292K  3312K RUN    3  49:13  4.49% cron
> 80016 root          1  96    0 16328K  3324K RUN    1  79:37  4.39% cron
> 85758 root          1  96    0 16292K  3312K RUN    0  63:36  4.39% cron
> 90284 root          1  96    0 16328K  3324K RUN    2  52:45  4.39% cron
> 61636 root          1  96    0 16328K  3324K RUN    2 171:26  4.30% cron
> 
> And even more info:
> 
> s505# ps auxw | grep cron | wc
>       105    1464   10026
> 
> If I try to truss or ktrace one of the processes, it returns no output. 
> This behavior is reliable and occurs every single time.  I'll restart the 
> cron server, and things will run fine for a little while, but will then 
> get to this point again.
> 
> Any ideas?  I'm really stuck.

I don't have admin access to a BSD box at the moment, so some of
what follows may not make sense...  But since no one else has
responded, there are a few things I'd look at.  I'm assuming that
the behavior is due to a bug in userspace, probably in the cron
program itself.

Can you recompile cron with debugging symbols and attach gdb to
one of the runaway processes?  Do the crons' start times tell you
anything?  (It would help to see the output of ps auxw | grep cron
without piping it into wc.)  The problem may be related to a specific
job that you're trying to execute, and you could narrow that down
(or prove the hypothesis false) by working out when the runaway crons
actually started running...  Are new crons still being forked off, or
are existing ones dying?  You may get something from ktrace -g on the
cron daemon's process group.  If you kill -11 a cron, does it produce
a core dump that you can feed to gdb?
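
For the start-time check, something along these lines might help.  It's
only a rough sketch: the function name cron_start_histogram is made up,
and it assumes the input comes from ps -axo pid,ppid,lstart,command, so
that the start time takes up fields 3 through 7 and the command begins
at field 8.  It counts cron processes per start minute, which should
make it easy to line the runaways up against your crontab entries.

```shell
# Hypothetical helper: reads `ps -axo pid,ppid,lstart,command` output on
# stdin and prints how many cron processes started in each minute, in the
# order the minutes first appear in the input.
cron_start_histogram() {
    awk 'NR > 1 && $8 ~ /(^|\/)cron(:|$)/ {
        split($6, t, ":")                   # $6 is HH:MM:SS from lstart
        key = $4 " " $5 " " t[1] ":" t[2]   # month, day, HH:MM
        if (!(key in count)) order[++n] = key
        count[key]++
    }
    END { for (i = 1; i <= n; i++) print count[order[i]], order[i] }'
}
```

You'd run it as ps -axo pid,ppid,lstart,command | cron_start_histogram.
If most of the stuck processes share one start minute, that points at
whichever job is scheduled for that time.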

HTH
Aidan


