[nycbug-talk] Runaway cron server

Matt Juszczak matt at atopia.net
Sat Oct 31 21:01:50 EDT 2009


Hi all, Happy Halloween...

I've been having some issues with a runaway cron server.  We've got cron 
jobs set up, and I'm using a locking system to make sure no job runs 
while another is still running (though this problem was occurring prior to 
the locking system being put in place).  After a day or two, our server 
load spikes, the cron jobs stop working, and top shows:

  6702 root          1  97    0 16292K  3312K RUN    3  13:50  4.79% cron
65338 root          1  96    0 16328K  3324K RUN    3 138:23  4.59% cron
69837 root          1  96    0 16328K  3324K RUN    3 116:05  4.59% cron
90642 root          1  96    0 16328K  3324K CPU2   2  37:39  4.59% cron
65729 root          1  96    0 16328K  3324K RUN    3 136:01  4.49% cron
79591 root          1  96    0 16328K  3324K RUN    0  80:51  4.49% cron
85363 root          1  96    0 16328K  3324K RUN    0  64:42  4.49% cron
90625 root          1  96    0 16328K  3324K CPU0   0  51:58  4.49% cron
82872 root          1  96    0 16328K  3324K RUN    3  50:16  4.49% cron
83551 root          1  96    0 16292K  3312K RUN    3  49:13  4.49% cron
80016 root          1  96    0 16328K  3324K RUN    1  79:37  4.39% cron
85758 root          1  96    0 16292K  3312K RUN    0  63:36  4.39% cron
90284 root          1  96    0 16328K  3324K RUN    2  52:45  4.39% cron
61636 root          1  96    0 16328K  3324K RUN    2 171:26  4.30% cron
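
For illustration only, here is a minimal sketch of the kind of locking 
wrapper meant above.  It is not the actual system in use, which this post 
does not show.  It assumes a mkdir-based lock; the lock path and the 
run_locked name are hypothetical.

```shell
#!/bin/sh
# Hypothetical lock location; the post does not say where its locks live.
LOCKDIR="${LOCKDIR:-${TMPDIR:-/tmp}/cronjob.lock}"

# run_locked CMD [ARGS...]
# Runs CMD only if the lock is free; otherwise skips it and returns 75.
run_locked() {
    # mkdir either creates the directory or fails if it already exists,
    # so only one caller at a time can take the lock.
    if ! mkdir "$LOCKDIR" 2>/dev/null; then
        echo "lock held, skipping: $*" >&2
        return 75
    fi
    status=0
    "$@" || status=$?
    rmdir "$LOCKDIR"
    return "$status"
}
```

A crontab entry would then call something like "run_locked /path/to/job" 
from a small wrapper script.  Note that in this sketch a job that gets 
killed leaves the lock directory behind, so later runs are skipped until 
it is removed by hand.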

And even more info:

s505# ps auxw | grep cron | wc
      105    1464   10026
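
The count above also includes the grep itself and any process that merely 
has "cron" in its arguments.  A stricter count could be sketched as 
follows; count_exact is a hypothetical helper, and it assumes one command 
name per line on stdin.

```shell
#!/bin/sh
# count_exact NAME
# Reads one command name per line on stdin and prints how many lines
# have NAME as their first field (0 if none match).
count_exact() {
    awk -v name="$1" '$1 == name { n++ } END { print n + 0 }'
}
```

It could be fed with something like "ps axo comm= | count_exact cron", 
assuming ps prints bare command names for that format.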

If I try to truss or ktrace one of the processes, I get no output at all. 
This happens reliably, every single time.  I'll restart the cron server, 
and things will run fine for a little while, but then it ends up in this 
state again.

Any ideas?  I'm really stuck.

-Matt


