Bug 8184 - ctdb 50.samba event script monitoring is expensive under heavy load
Status: RESOLVED FIXED
Alias: None
Product: CTDB 2.5.x or older
Classification: Unclassified
Component: ctdb
Version: unspecified
Hardware: All All
Importance: P5 minor
Target Milestone: ---
Assignee: Michael Adam
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-05-29 22:35 UTC by David Disseldorp
Modified: 2012-01-16 14:59 UTC

Attachments
1.0.112 based patch (1.41 KB, patch)
2011-05-29 22:56 UTC, David Disseldorp

Description David Disseldorp 2011-05-29 22:35:33 UTC
ctdbd monitors the status of the samba daemon by checking for sockets bound to
samba ports (445 & 139 by default) in the LISTEN state. It does this by parsing
the output of netstat -a -t -n.

config/functions:
ctdb_check_tcp_ports() {

    for p ; do
        if ! netstat -a -t -n | grep -q "0\.0\.0\.0:$p .*LISTEN" ; then
            if ! netstat -a -t -n | grep -q ":::$p .*LISTEN" ; then
                echo "ERROR: $service_name tcp port $p is not responding"
                return 1
            fi
        fi
    done
}

Under an intensive connect-disconnect workload, the number of sockets in the
TIME_WAIT state can easily reach several thousand. As a result, each netstat
run is CPU intensive and slow, and the check as a whole takes a long time to
complete (~5s x 4 runs = ~20s).
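
To gauge how many sockets netstat has to walk on an affected node, the socket
states can be tallied from a single run. This is a minimal sketch, assuming
Linux net-tools style output in which the state is the sixth column; the
helper name count_tcp_states is hypothetical and not part of ctdb.

```shell
# Hypothetical helper: tally TCP socket states from "netstat -a -t -n"
# output read on stdin. Assumes the state is the 6th whitespace-separated
# column, as in net-tools netstat; header lines are skipped because their
# first field does not start with "tcp".
count_tcp_states() {
    awk '$1 ~ /^tcp/ { n[$6]++ } END { for (s in n) print s, n[s] }' | sort
}

# Usage on a live node (not run here):
#   netstat -a -t -n | count_tcp_states
```

A TIME_WAIT count in the thousands next to a handful of LISTEN entries would
match the situation described above.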

There are a few options to make this check more efficient. Removing unnecessary
re-runs and using --listening rather than -a to request only sockets in the
LISTEN state would be a step in the right direction.
Comment 1 David Disseldorp 2011-05-29 22:56:09 UTC
Created attachment 6498
1.0.112 based patch
Comment 2 David Disseldorp 2011-07-11 12:39:32 UTC
This issue has been addressed in the ctdb 1.2 branch. If no further merges are required then this bug can be closed.

commit f9f28ff32c3d110b2609a277aa6f71211e3eb7b6
Author: Martin Schwenke <martin@meltin.net>
Date:   Tue Jul 5 11:32:06 2011 +1000

    Eventscript functions: optimise ctdb_check_tcp_ports() and add debug.
    
    ctdb_check_tcp_ports() runs "netstat -a -t -n" in a loop for each
    port.  There are 2 problems with this:
    
    * Netstat is run on each loop iteration when it need only be run once.
    
    * The -a option is used to list all connections but the function only
      cares about the listening ports.  There may be many thousands of
      non-listening ports to grep through.
    
    This changes ctdb_check_tcp_ports() to run netstat with the -l option
    instead of the -a option.  It also only runs netstat once before the
    main loop.
    
    When a port is found to not be listening the output of the netstat
    command is now dumped to help with debugging.
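
The approach described in the commit message can be sketched roughly as
follows. This is an illustration only, not the committed code: the function
names port_is_listening and check_tcp_ports_once are hypothetical, and the
grep pattern simply combines the two patterns from the original function.

```shell
# Sketch only, not the code from the commit. Helper names are hypothetical.

# Return 0 if the netstat output in $1 shows a listener on port $2, bound to
# either the IPv4 (0.0.0.0) or the IPv6 (::) wildcard address.
port_is_listening() {
    printf '%s\n' "$1" | grep -E -q "(0\.0\.0\.0|::):$2 .*LISTEN"
}

check_tcp_ports_once() {
    # Run netstat a single time, asking for listening TCP sockets only (-l).
    _listeners=$(netstat -l -t -n 2>&1)

    for _p in "$@" ; do
        if ! port_is_listening "$_listeners" "$_p" ; then
            echo "ERROR: $service_name tcp port $_p is not responding"
            # Dump the captured netstat output to help with debugging.
            printf '%s\n' "$_listeners"
            return 1
        fi
    done
}
```

Called as "check_tcp_ports_once 445 139", this invokes netstat once regardless
of the number of ports, and the output it searches no longer grows with the
number of TIME_WAIT sockets.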
Comment 3 David Disseldorp 2012-01-16 14:59:15 UTC
Closing as per comment#2.