I run a Python script to randomly create, delete, and move objects in a single-DC domain to test performance and some internal processes. After ~2-3 minutes, Samba receives (I think) but does not respond to a request. The socket is put in TIME_WAIT. Once it hangs, all other LDAP requests from any client (regardless of vendor) fail to be processed.

I've tested a variety of sysctl and smb.conf options governing socket handling, none of which show any change in behavior. Samba is installed from the distro stable repo.

Tests fail in <3 minutes against:
- Debian 8, Digital Ocean (t1.micro)
- Debian 8, Amazon EC2 (1 CPU, 1 GB RAM)
- Ubuntu 14 LTS, Amazon EC2 (t1.micro)

Tests pass for >= 24 hours against:
- Server 2008, KVM (2 CPU, 2 GB RAM)
- Server 2012, KVM (2 CPU, 2 GB RAM)

The only discernible error message I've been able to glean from the logs is below. The behavior is almost identical across all runs aside from the length of time it takes Samba to hang.

ldb_request SUB dn=DC=sh,DC=com filter=(&(objectclass=organizationalunit)(description=ERIS*))
dreplsrv_notify_schedule(5) scheduled for: Thu Nov 19 14:43:25 2015 UTC
dreplsrv_notify_schedule(5) scheduled for: Thu Nov 19 14:43:30 2015 UTC
dreplsrv_notify_schedule(5) scheduled for: Thu Nov 19 14:43:35 2015 UTC
dreplsrv_notify_schedule(5) scheduled for: Thu Nov 19 14:43:40 2015 UTC
/usr/sbin/smbd: Could not find child 17287 -- ignoring
dreplsrv_notify_schedule(5) scheduled for: Thu Nov 19 14:43:45 2015 UTC
dreplsrv_notify_schedule(5) scheduled for: Thu Nov 19 14:43:50 2015 UTC
dreplsrv_notify_schedule(5) scheduled for: Thu Nov 19 14:43:55 2015 UTC
ldb: ldb error (ldb_wait: Operations error (1)) occurred searching for modules, bailing out
ldb: Unable to load modules for /var/lib/samba/private/sam.ldb: ldb_wait: Operations error (1)
Terminating connection - 'backend Init failed'
imessaging: cleaning up /var/lib/samba/private/smbd.tmp/msg/msg.16563.80
single_terminate: reason[backend Init failed]
ldb: ldb error (ldb_wait: Operations error (1)) occurred searching for modules, bailing out
ldb: Unable to load modules for /var/lib/samba/private/sam.ldb: ldb_wait: Operations error (1)
Terminating connection - 'backend Init failed'
imessaging: cleaning up /var/lib/samba/private/smbd.tmp/msg/msg.16563.80
single_terminate: reason[backend Init failed]
dreplsrv_notify_schedule(5) scheduled for: Thu Nov 19 14:44:00 2015 UTC
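
For anyone trying to reproduce this, the shape of the workload is roughly as sketched below. This is not the actual test script: it is a minimal illustration of a random create/delete/move loop, and every name in it (`InMemoryDirectory`, `churn`, the OU names, the `CN=test<N>` naming) is hypothetical. The in-memory class stands in for a real LDAP connection so the loop can run on its own; only the base DN `DC=sh,DC=com` is taken from the log above.

```python
import random


class InMemoryDirectory:
    """Hypothetical stand-in for an LDAP connection; tracks DNs in a set."""

    def __init__(self, ous):
        self.ous = list(ous)   # containers that objects can live in
        self.entries = set()   # DNs of the objects that currently exist

    def create(self, dn):
        self.entries.add(dn)

    def delete(self, dn):
        self.entries.remove(dn)

    def move(self, dn, new_parent):
        # Keep the RDN (e.g. "CN=test7") and re-parent it under new_parent.
        rdn = dn.split(",", 1)[0]
        new_dn = f"{rdn},{new_parent}"
        self.entries.remove(dn)
        self.entries.add(new_dn)
        return new_dn


def churn(directory, iterations, seed=None):
    """Apply `iterations` random create/delete/move operations."""
    rng = random.Random(seed)
    counts = {"create": 0, "delete": 0, "move": 0}
    serial = 0  # never reused, so every RDN is unique
    for _ in range(iterations):
        # With nothing to delete or move, the only valid action is create.
        if directory.entries:
            action = rng.choice(["create", "delete", "move"])
        else:
            action = "create"
        if action == "create":
            serial += 1
            directory.create(f"CN=test{serial},{rng.choice(directory.ous)}")
        else:
            # Sort first so a fixed seed gives a repeatable choice.
            dn = rng.choice(sorted(directory.entries))
            if action == "delete":
                directory.delete(dn)
            else:
                directory.move(dn, rng.choice(directory.ous))
        counts[action] += 1
    return counts
```

Calling `churn(InMemoryDirectory(["OU=A,DC=sh,DC=com", "OU=B,DC=sh,DC=com"]), 1000, seed=1)` runs 1000 operations against the stand-in. To exercise a real server, the three methods would be replaced with the add, delete, and rename calls of whichever LDAP client library is in use, ideally with a per-request timeout so a hang shows up as an error rather than a stalled script.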
Just some more information: I've also tested on Debian 7 and the Amazon Directory Services (Simple AD). Because LDAP requests are no longer serviced once the hang occurs, as noted previously, and because Simple AD has no interface to control services, my Simple AD service is essentially bricked.
Can you please supply your script, and re-try with Samba 4.5.0rc1? We have made some important improvements in our handling of large numbers of users and user group memberships (in particular), and it should be much, much better. With the new code I have been able to add users, and add users to groups, for 70,000 iterations without Samba crashing. As such I'm going to mark this as fixed in Samba 4.5.0rc1, but feel free to get back to us with your torture device if you can still cause trouble. :-)