[nycbug-talk] network strangeness (resource starvation?)

Jim Brown jpb
Sun Jul 31 10:17:55 EDT 2005


* Charles Sprickman <spork at bway.net> [2005-07-30 22:45]:
> Hey all,
> 
> I've pursued this on other lists for a few years, and it's getting under 
> my skin more and more.  Maybe someone here can give me some pointers...
> 
> Have a host (FBSD 4.9) that does lots of dns work - both queries from 
> outside and other hosts inside doing a ton of lookups.
> 
> We have run up against a few hurdles and cleared them.   First it was 
> ipfilter running out of state entries.  Upped the size in "ip_state.h" in 
> the ipfilter includes, and that helped.  Eventually we hit another wall, 
> so we relaxed the ipfilter rules to make them work for inbound/outbound 
> without generating state entries.  Since then, no problems reported by 
> "ipfstat -s" that would indicate we're running out of resources there.
> 
> One of the ongoing symptoms is that ssh sessions to the box will start 
> *dropping* characters when udp traffic is really high.  Even after we 
> solved the problem of outgrowing the state table, the problem still 
> remains.
> 
> We've bumped a number of things, nmbclusters is way up there, and netstat 
> -m shows that we're not hitting a peak there.  However looking at full 
> "netstat -s" stats after the box only being up for less than 12 hours 
> shows this:
> 
>         8297 dropped due to no socket
>         0 broadcast/multicast datagrams dropped due to no socket
>         31 dropped due to full socket buffers
> 
> So that's a hint.  I can look for whatever obscure sysctl variable to set 
> the listen queue deeper.  Not sure about the "no socket"...
> 
> Lately the newest wrinkle is that the box will just go unresponsive. 
> Pingable, but nothing on serial console, no ssh.
> 
> So can things getting starved in udp-land cause other networking stuff to 
> choke?  Any pointers where else to look?
> 
> Thanks,
> 
> Charles


Hi Charles,

What does 'lots of dns work' mean?  Can you give some I/O stats?
What DNS software are you running?  What is the hw platform?
How many zones?  Avg size of a zone?  
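
For rate numbers rather than totals since boot, one rough approach is to
snapshot the UDP counters twice and subtract. The sketch below is only an
illustration: the function names (parse_counters, deltas, read_udp_stats)
are made up for this example, and it assumes the "netstat -s -p udp" output
uses the "<count> <description>" line layout shown in your excerpt.

```python
import re
import subprocess

# Matches lines such as "        31 dropped due to full socket buffers".
_COUNTER_LINE = re.compile(r"^\s*(\d+)\s+(\S.*?)\s*$")


def parse_counters(text):
    """Return {description: count} for every '<count> <description>' line."""
    counters = {}
    for line in text.splitlines():
        match = _COUNTER_LINE.match(line)
        if match:
            counters[match.group(2)] = int(match.group(1))
    return counters


def deltas(before, after):
    """Return how much each counter in 'after' grew relative to 'before'."""
    return {name: after[name] - before.get(name, 0) for name in after}


def read_udp_stats():
    """Run 'netstat -s -p udp' and return its raw text (assumes netstat is on PATH)."""
    result = subprocess.run(
        ["netstat", "-s", "-p", "udp"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling read_udp_stats() once, waiting a minute during a busy period, calling
it again, and feeding both parsed results to deltas() would give drops per
minute, which is easier to compare against query load than lifetime totals.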

Some general thoughts:

 - eliminate all other non-essential services
 - renice named
 - try redesigning your DNS service to include multiple servers
   and load balance between them
 - BIND (if that's what you are using) is a notorious memory hog.
   However, it's still my favorite DNS server.
   Increase memory to the limit.

I know it's not what you want to hear, but when I made the
switch from FreeBSD 4.10 to 5.4 I was *impressed* with the performance
under heavy load.  I tested using:

  X+KDE with multiple konsole sessions:

    - multiple FTPs
    - two different stress sessions, one memory, one disk
    - a loop of 'make buildworld'
    - a loop of 'make buildkernel'
    - a loop of 'ls -alR /'
    - ssh session to remote host

The desktop was still usable, and I didn't lose any ssh characters on remote sessions.

Test machine: IBM T41 with 512 MB of RAM.

Best Regards,
Jim B.


    



