Bug 11611 - Sockets hang after frequent operations.
Summary: Sockets hang after frequent operations.
Status: RESOLVED FIXED
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: AD: LDB/DSDB/SAMDB
Version: 4.1.6
Hardware: x64 Linux
Importance: P5 major
Target Milestone: ---
Assignee: Andrew Bartlett
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-11-19 15:04 UTC by Will
Modified: 2016-07-29 03:02 UTC
CC List: 0 users

See Also:


Description Will 2015-11-19 15:04:41 UTC
I run a Python script that randomly creates, deletes, and moves objects in a single-DC domain to test performance and some internal processes.
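
The script itself is not attached to this report. As a rough illustration only, a randomized create/delete/move workload of the kind described could be planned as below. Everything here is hypothetical: the function name plan_operations, the container names, and the object naming scheme are assumptions, and the sketch only generates the operation sequence. Sending each operation to the DC over LDAP is left to whatever client library is in use.

```python
import random


def plan_operations(count, seed=0, containers=("OU=A", "OU=B")):
    """Yield (op, name, container) tuples for a random workload.

    Hypothetical sketch, not the reporter's script. It tracks which
    objects currently exist so that deletes and moves only ever
    target live objects.
    """
    rng = random.Random(seed)   # seeded so a failing run can be replayed
    live = {}                   # object name -> container it currently sits in
    serial = 0
    for _ in range(count):
        # Only creation is possible while no objects exist.
        choices = ["create"] if not live else ["create", "delete", "move"]
        op = rng.choice(choices)
        if op == "create":
            serial += 1
            name = f"CN=obj{serial}"
            container = rng.choice(containers)
            live[name] = container
            yield ("create", name, container)
        elif op == "delete":
            name = rng.choice(sorted(live))
            yield ("delete", name, live.pop(name))
        else:
            name = rng.choice(sorted(live))
            # Prefer a container other than the current one.
            others = [c for c in containers if c != live[name]] or list(containers)
            target = rng.choice(others)
            live[name] = target
            yield ("move", name, target)
```

Seeding the generator means the exact sequence that precedes a hang can be regenerated and replayed against another server for comparison.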

After roughly 2-3 minutes, Samba receives a request (I think) but does not respond to it.

The socket is put in TIME_WAIT.

Once Samba hangs, all other LDAP requests from any client (regardless of vendor) go unprocessed.  I've tested a variety of sysctl and smb.conf options governing socket handling; none of them changes the behavior.
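
For anyone trying to reproduce this, one way to watch the socket states on the server is to tally the entries in /proc/net/tcp for the LDAP port. The sketch below is an illustration, not part of the original test. It assumes a Linux host and the standard LDAP port 389, and the function name count_tcp_states is made up for this example. The state codes are the kernel's TCP state numbers as shown in /proc/net/tcp.

```python
# Kernel TCP state codes as they appear in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text, port=389):
    """Count sockets per TCP state whose local port matches `port`.

    `proc_net_tcp_text` is the full text of /proc/net/tcp (or
    /proc/net/tcp6, which uses the same column layout).
    """
    counts = {}
    for line in proc_net_tcp_text.splitlines()[1:]:   # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        # local_address looks like "0100007F:0185"; the port is hex.
        local_port = int(fields[1].rsplit(":", 1)[1], 16)
        if local_port != port:
            continue
        state = TCP_STATES.get(fields[3], fields[3])
        counts[state] = counts.get(state, 0) + 1
    return counts
```

Reading /proc/net/tcp on the DC every few seconds and passing its contents to this function should show whether TIME_WAIT entries pile up before the hang or appear only at the moment it occurs.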

Samba is installed from each distro's stable repo.

Tests fail in <3 minutes against:
- Debian 8, Digital Ocean (t1.micro)
- Debian 8, Amazon EC2 (1 cpu, 1gb ram)
- Ubuntu 14 LTS, Amazon EC2 (t1.micro)

Tests pass for >= 24 hours against:
- Server 2008, KVM (2 cpu, 2gb ram)
- Server 2012, KVM (2 cpu, 2gb ram)


The only discernible error message I've been able to glean from the logs is below.  The behavior is almost identical across all runs, aside from the length of time it takes Samba to hang.
 

ldb_request SUB dn=DC=sh,DC=com filter=(&(objectclass=organizationalunit)(description=ERIS*))
dreplsrv_notify_schedule(5) scheduled for: Thu Nov 19 14:43:25 2015 UTC
dreplsrv_notify_schedule(5) scheduled for: Thu Nov 19 14:43:30 2015 UTC
dreplsrv_notify_schedule(5) scheduled for: Thu Nov 19 14:43:35 2015 UTC
dreplsrv_notify_schedule(5) scheduled for: Thu Nov 19 14:43:40 2015 UTC
/usr/sbin/smbd: Could not find child 17287 -- ignoring
dreplsrv_notify_schedule(5) scheduled for: Thu Nov 19 14:43:45 2015 UTC
dreplsrv_notify_schedule(5) scheduled for: Thu Nov 19 14:43:50 2015 UTC
dreplsrv_notify_schedule(5) scheduled for: Thu Nov 19 14:43:55 2015 UTC
ldb: ldb error (ldb_wait: Operations error (1)) occurred searching for modules, bailing out
ldb: Unable to load modules for /var/lib/samba/private/sam.ldb: ldb_wait: Operations error (1)
Terminating connection - 'backend Init failed'
imessaging: cleaning up /var/lib/samba/private/smbd.tmp/msg/msg.16563.80
single_terminate: reason[backend Init failed]
ldb: ldb error (ldb_wait: Operations error (1)) occurred searching for modules, bailing out
ldb: Unable to load modules for /var/lib/samba/private/sam.ldb: ldb_wait: Operations error (1)
Terminating connection - 'backend Init failed'
imessaging: cleaning up /var/lib/samba/private/smbd.tmp/msg/msg.16563.80
single_terminate: reason[backend Init failed]
dreplsrv_notify_schedule(5) scheduled for: Thu Nov 19 14:44:00 2015 UTC
Comment 1 Will 2015-11-20 02:07:11 UTC
Just some more information:

I've also tested on Debian 7 and the Amazon Directory Services (Simple AD).

Because LDAP requests are no longer serviced once the hang occurs, as noted previously, and because Simple AD has no interface to control services, my Simple AD service is essentially bricked.
Comment 2 Andrew Bartlett 2016-07-29 03:02:37 UTC
Can you please supply your script, and re-try with Samba 4.5.0rc1?

We have made some important improvements in our handling of large numbers of users and user group memberships (in particular), and it should be much, much better.

I have been able to add users, and add users to groups, for 70,000 iterations without Samba crashing with the new code.

As such I'm going to mark this as fixed in Samba 4.5.0rc1, but feel free to get back to us with your torture device if you can still cause trouble. :-)