8184 – ctdb 50.samba event script monitoring is expensive under heavy load

Status:	RESOLVED FIXED

Alias:	None

Product:	CTDB 2.5.x or older
Classification:	Unclassified
Component:	ctdb
Version:	unspecified
Hardware:	All All

Importance:	P5 minor
Target Milestone:	---
Assignee:	Michael Adam
QA Contact:	Samba QA Contact

URL:
Keywords:

Depends on:
Blocks:

Reported:	2011-05-29 22:35 UTC by David Disseldorp
Modified:	2012-01-16 14:59 UTC
CC List:	0 users

See Also:

Attachments
1.0.112 based patch (1.41 KB, patch) 2011-05-29 22:56 UTC, David Disseldorp	no flags	Details

Description David Disseldorp 2011-05-29 22:35:33 UTC

ctdbd monitors the status of the samba daemon by checking for sockets bound to
samba ports (445 & 139 by default) in the LISTEN state. It does this by parsing
the output of netstat -a -t -n.

config/functions:
ctdb_check_tcp_ports() {

    for p ; do
        if ! netstat -a -t -n | grep -q "0\.0\.0\.0:$p .*LISTEN" ; then
            if ! netstat -a -t -n | grep -q ":::$p .*LISTEN" ; then
                echo "ERROR: $service_name tcp port $p is not responding"
                return 1
            fi
        fi
    done
}

Under an intensive connect-disconnect workload, the number of sockets in the
TIME_WAIT state can easily reach several thousand. As a result, each netstat
run takes a long time and is CPU intensive: with the two default ports and up
to two netstat invocations per port, one check costs ~5s x 4 runs = ~20s.
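
To see how much of the netstat output is made up of non-listening sockets, the
states can be tallied. The helper below is only an illustrative sketch:
count_tcp_states is a hypothetical name, not part of ctdb, and it assumes the
usual netstat layout in which the socket state is the last column of each
tcp/tcp6 line.

```shell
# Hypothetical helper (not part of ctdb): summarise TCP socket states
# from "netstat -a -t -n" style output read on stdin.
count_tcp_states() {
    # Only lines whose first field starts with "tcp" are sockets;
    # the state (LISTEN, TIME_WAIT, ...) is assumed to be the last field.
    awk '$1 ~ /^tcp/ { count[$NF]++ }
         END { for (s in count) print s, count[s] }' | sort
}

# Usage on a live node:
#   netstat -a -t -n | count_tcp_states
```

A large TIME_WAIT count relative to LISTEN would be consistent with the
slowdown described above.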

There are a few options to make this check more efficient. Removing unnecessary
re-runs and using --listening rather than -a to request only sockets in the
LISTEN state would be a step in the right direction.

Comment 1 David Disseldorp 2011-05-29 22:56:09 UTC

Created attachment 6498
1.0.112 based patch

Comment 2 David Disseldorp 2011-07-11 12:39:32 UTC

This issue has been addressed in the ctdb 1.2 branch. If no further merges are required then this bug can be closed.

commit f9f28ff32c3d110b2609a277aa6f71211e3eb7b6
Author: Martin Schwenke <martin@meltin.net>
Date:   Tue Jul 5 11:32:06 2011 +1000

    Eventscript functions: optimise ctdb_check_tcp_ports() and add debug.
    
    ctdb_check_tcp_ports() runs "netstat -a -t -n" in a loop for each
    port.  There are 2 problems with this:
    
    * Netstat is run on each loop iteration when it need only be run once.
    
    * The -a option is used to list all connections but the function only
      cares about the listening ports.  There may be many thousands of
      non-listening ports to grep through.
    
    This changes ctdb_check_tcp_ports() to run netstat with the -l option
    instead of the -a option.  It also only runs netstat once before the
    main loop.
    
    When a port is found to not be listening the output of the netstat
    command is now dumped to help with debugging.
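
For illustration, the approach described in the commit message could look
roughly like the sketch below. This is not the code from the commit: the
variable names _listening and _p and the combined grep pattern are assumptions
made here, and only the three described changes (netstat -l, a single run
before the loop, and dumping the output on failure) are taken from the text
above.

```shell
# Sketch only: not the code from the commit above.
ctdb_check_tcp_ports() {
    # Run netstat once, before the loop; -l lists only listening sockets.
    _listening=$(netstat -l -t -n 2>/dev/null)

    for _p ; do
        # Match the IPv4 wildcard (0.0.0.0:PORT) or the IPv6 wildcard (:::PORT).
        if ! printf '%s\n' "$_listening" |
                grep -Eq "(0\.0\.0\.0|::):${_p} .*LISTEN" ; then
            echo "ERROR: $service_name tcp port $_p is not responding"
            # Dump what netstat reported, to help with debugging.
            printf '%s\n' "$_listening"
            return 1
        fi
    done
    return 0
}
```

Called as "ctdb_check_tcp_ports 445 139", this runs netstat once per check
instead of up to twice per port.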

Comment 3 David Disseldorp 2012-01-16 14:59:15 UTC

Closing as per comment#2.