[nycbug-talk] overloaded webserver: nfs wait issue?
N.J. Thomas
njt
Thu Dec 1 14:39:44 EST 2005
We have a website with moderately high traffic, load balanced among 3
webservers.
During peak traffic times, however (when the volume is higher than
normal), the load average shoots up to over 100, and the site slows to
a crawl.
We set up a script to take snapshots of top every 20 seconds. Here is
what it looks like when everything is normal:
127
last pid: 12003; load averages: 0.93, 1.36, 1.35 up 41+04:22:14 14:00:23
243 processes: 12 running, 230 sleeping, 1 zombie
Mem: 222M Active, 74M Inact, 186M Wired, 16M Cache, 111M Buf, 503M Free
Swap: 2048M Total, 16M Used, 2032M Free
  PID USERNAME PRI NICE  SIZE    RES STATE   TIME   WCPU    CPU COMMAND
  136 root      32    0 1208K   420K RUN    33.1H  7.28%  7.28% amd
11918 nobody    -1    0  149M 12292K nfsrcv  0:01  3.00%  1.95% httpd
11879 nobody     2    0  149M 12292K sbwait  0:01  2.10%  1.37% httpd
11896 nobody     2    0  148M 11704K RUN     0:00  1.80%  1.17% httpd
11962 nobody     2    0  147M 10072K RUN     0:00  4.33%  1.12% httpd
11892 nobody    -1    0  145M  8804K nfsrcv  0:00  1.35%  0.88% httpd
11935 nobody     2    0  149M 12284K sbwait  0:00  1.73%  0.78% httpd
11925 nobody     2    0  149M 12288K sbwait  0:00  1.08%  0.68% httpd
11894 nobody     2    0  149M 12404K sbwait  0:00  0.98%  0.63% httpd
11937 nobody     2    0  149M 12456K RUN     0:00  1.61%  0.63% httpd
11954 nobody     2    0  149M 12288K sbwait  0:00  1.88%  0.49% httpd
  191 root       2    0  144M  6632K select 13:23  0.34%  0.34% httpd
11930 nobody     2    0  145M  8852K sbwait  0:00  0.62%  0.34% httpd
11872 nobody     2    0  149M 12288K sbwait  0:00  0.45%  0.29% httpd
11911 nobody     2    0  148M 11604K accept  0:00  0.45%  0.29% httpd
11893 nobody     2    0  149M 12392K sbwait  0:00  0.38%  0.24% httpd
11876 nobody     2    0  149M 12264K sbwait  0:00  0.38%  0.24% httpd
11934 nobody     2    0  149M 12292K accept  0:00  0.41%  0.20% httpd
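For anyone who wants to collect the same kind of data, a periodic snapshot loop could be sketched roughly as follows in Python. This is only an illustration, not the script used to produce the output in this post: the function names, the log format, and the idea of passing the top invocation in as a command list are all assumptions. The batch-mode flags for top differ between platforms, so check top(1) before filling in the command.

```python
import subprocess
import time
from datetime import datetime


def take_snapshot(cmd):
    """Run cmd once and return whatever it printed on stdout."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return result.stdout


def log_snapshots(cmd, logfile, interval=20, count=3):
    """Append `count` timestamped snapshots of cmd to logfile,
    sleeping `interval` seconds between them (hypothetical helper)."""
    with open(logfile, "a") as fh:
        for i in range(count):
            fh.write("=== %s ===\n" % datetime.now().isoformat())
            fh.write(take_snapshot(cmd))
            fh.flush()  # keep the log usable even if the box is struggling
            if i < count - 1:
                time.sleep(interval)


# Example call (the command list is an assumption; use a non-interactive
# top invocation that is valid on your system):
#   log_snapshots(["top", "-b"], "/var/tmp/top-snapshots.log", interval=20, count=180)
```

Writing the timestamp ahead of each snapshot makes it easier to line the log up with Apache's access and error logs afterwards.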
When the load shoots up, the number of HTTP clients hits Apache's
MaxClients setting. Here is what top shows:
last pid: 12407; load averages: 87.84, 51.91, 27.52 up 41+04:40:51 14:19:00
268 processes: 2 running, 266 sleeping
Mem: 715M Active, 68M Inact, 187M Wired, 29M Cache, 111M Buf, 2100K Free
Swap: 2048M Total, 272M Used, 1776M Free, 13% Inuse
  PID USERNAME PRI NICE  SIZE    RES STATE   TIME   WCPU    CPU COMMAND
  136 root      64    0 1208K   376K RUN    33.1H  2.69%  2.69% amd
11965 nobody    -1    0  149M  6892K nfsrcv  0:05  0.24%  0.24% httpd
11913 nobody    -1    0  149M  8300K nfsrcv  0:05  0.20%  0.20% httpd
11878 nobody    -1    0  149M  8572K nfsrcv  0:09  0.15%  0.15% httpd
11948 nobody    -1    0  149M  8852K nfsrcv  0:07  0.15%  0.15% httpd
11982 nobody    -1    0  149M  6764K nfsrcv  0:04  0.15%  0.15% httpd
11912 nobody    -1    0  149M  4912K nfsrcv  0:06  0.10%  0.10% httpd
12060 nobody    -1    0  149M  7356K nfsrcv  0:05  0.10%  0.10% httpd
11999 nobody    -1    0  149M  8352K nfsrcv  0:04  0.10%  0.10% httpd
12122 nobody    -1    0  149M  8296K nfsrcv  0:04  0.10%  0.10% httpd
12028 nobody    -1    0  149M  8664K nfsrcv  0:04  0.10%  0.10% httpd
12267 nobody    -1    0  149M  8452K nfsrcv  0:03  0.10%  0.10% httpd
12270 nobody    -1    0  150M  7156K nfsrcv  0:02  0.10%  0.10% httpd
11983 nobody    -1    0  149M  8256K nfsrcv  0:09  0.05%  0.05% httpd
11977 nobody    -1    0  149M  5488K nfsrcv  0:06  0.05%  0.05% httpd
11952 nobody    -1    0  149M  6704K nfsrcv  0:06  0.05%  0.05% httpd
11895 nobody    -1    0  148M  4404K nfsrcv  0:06  0.05%  0.05% httpd
11885 nobody    -1    0  149M  8348K nfsrcv  0:06  0.05%  0.05% httpd
The state of all the httpd processes is "nfsrcv". Does this mean the
bottleneck is at the NFS server that hosts the htdocs (and PHP scripts),
or just that the server is low on memory?
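To compare snapshots without eyeballing them, the STATE column can be tallied per snapshot. The sketch below is one possible way to do that in Python. It assumes the 11-column single-CPU layout shown in the output above (PID through COMMAND); the function name and the sample text are illustrative only, and other top versions may add or reorder columns.

```python
from collections import Counter


def tally_states(snapshot, command="httpd"):
    """Count the STATE values of processes whose COMMAND matches.

    Assumes rows laid out as:
    PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMAND
    """
    counts = Counter()
    in_table = False
    for line in snapshot.splitlines():
        fields = line.split()
        if fields[:2] == ["PID", "USERNAME"]:
            in_table = True  # process rows start after this header
            continue
        if in_table and len(fields) == 11 and fields[-1] == command:
            counts[fields[6]] += 1  # STATE is the seventh column
    return counts


# A few rows copied from the overloaded snapshot above.
SAMPLE = """\
PID USERNAME PRI NICE SIZE RES STATE TIME WCPU CPU COMMAND
136 root 64 0 1208K 376K RUN 33.1H 2.69% 2.69% amd
11965 nobody -1 0 149M 6892K nfsrcv 0:05 0.24% 0.24% httpd
11913 nobody -1 0 149M 8300K nfsrcv 0:05 0.20% 0.20% httpd
"""

print(dict(tally_states(SAMPLE)))  # prints {'nfsrcv': 2}
```

Pass one snapshot at a time; if several snapshots are concatenated, the counts accumulate across all of them.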
Thomas
--
N.J. Thomas
njt at ayvali.org
Etiamsi occiderit me, in ipso sperabo