Bug 13164 - 4.7.3 NT_STATUS_TOO_MANY_OPENED_FILES
Status: NEW
Product: Samba 4.1 and newer
Classification: Unclassified
Component: DCE-RPCs and pipes
Version: 4.7.3
Hardware: x64 Linux
Priority: P5 major
Assigned To: Andrew Bartlett
QA Contact: Samba QA Contact
Reported: 2017-11-23 13:18 UTC by Anderson de Godoy
Modified: 2017-12-01 11:28 UTC

Description Anderson de Godoy 2017-11-23 13:18:26 UTC
Hi,

I recently updated my domain controller to version 4.7.3; before, it ran Samba 4.6.3. I have two more domain controllers still on 4.6.3 that run fine, but since I upgraded the master to 4.7.3 it stops working every day when clients begin logging in to the domain, producing the error "NT_STATUS_TOO_MANY_OPENED_FILES". Restarting the Samba service clears the problem, though sometimes several restarts are needed.

I initially raised the limits in limits.d to see if that helped, but nothing changed and the error persists. So I looked at "/proc/$pid/limits" for every Samba process. The "smbd" process shows:

  Max open files            16464                30000               files

(the soft value is the default according to the man page; the hard value is my test). The "samba" process shows:

  Max open files            1024                 4096                 files

These values are the same on the other servers, yet the problem only occurs on 4.7.3. No changes were made to smb.conf on the servers after the upgrade. At least 1500 users are logging in when the errors start. Load and memory are fine; the servers have more free resources than they need.

After the errors start, the other servers lose communication with the master:

Nov 23 08:39:25 AD02 samba[2027]: [2017/11/23 08:39:25.804846,  0] ../source4/librpc/rpc/dcerpc_util.c:737(dcerpc_pipe_auth_recv)
Nov 23 08:39:25 AD02 samba[2027]:   Failed to bind to uuid e3514235-4b06-11d1-ab04-00c04fc2dcd2 for ncacn_ip_tcp:192.168.xxx.x[49152,seal,krb5,target_hostname=d91cf202-a314-44df-9d3b-1e67401683d6._msdcs.xxx.xx,target_principal=GC/ad01.xxx.xx/xxx.xx,abstract_syntax=e3514235-4b06-11d1-ab04-00c04fc2dcd2/0x00000004,localaddress=192.168.xxx.x] NT_STATUS_UNSUCCESSFUL


Is there some configuration I can apply to stop these errors? Is there some debugging I can do to help?
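For reference, the per-process check described above can be scripted. This is only a sketch using /proc (Linux-only); "fd_usage" is a helper name invented for this example:

```shell
# Report the soft FD limit and the current number of open FDs for a process,
# reading /proc directly so no extra tools are needed.
fd_usage() {
    pid=$1
    # Field 4 of the "Max open files" row is the soft limit.
    limit=$(awk '/Max open files/ {print $4}' "/proc/$pid/limits")
    # Each entry in /proc/<pid>/fd is one open file descriptor.
    open=$(ls "/proc/$pid/fd" 2>/dev/null | wc -l)
    echo "pid=$pid soft_limit=$limit open_fds=$open"
}

# Check every samba and smbd process (pgrep may find none on a workstation).
for pid in $(pgrep -x samba; pgrep -x smbd); do
    fd_usage "$pid"
done
```

Watching open_fds climb toward soft_limit over time would distinguish an FD leak from simple over-use under load.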


Nov 23 08:50:07 AD01 samba[2306]: [2017/11/23 08:50:07.967804,  0] ../source4/smbd/process_single.c:57(single_accept_connection)
Nov 23 08:50:07 AD01 samba[2306]:   single_accept_connection: accept: NT_STATUS_TOO_MANY_OPENED_FILES
Nov 23 08:50:08 AD01 samba[2306]: [2017/11/23 08:50:08.972954,  0] ../source4/smbd/process_standard.c:200(setup_standard_child_pipe)
Nov 23 08:50:08 AD01 samba[2306]:   Failed to create parent-child pipe to handle SIGCHLD to track new process for socket
Nov 23 08:50:08 AD01 samba[2306]: [2017/11/23 08:50:08.973878,  0] ../source4/smbd/process_standard.c:200(setup_standard_child_pipe)
Nov 23 08:50:08 AD01 samba[2306]:   Failed to create parent-child pipe to handle SIGCHLD to track new process for socket
Nov 23 08:50:08 AD01 samba[2306]: [2017/11/23 08:50:08.973941,  0] ../source4/smbd/process_single.c:57(single_accept_connection)
Nov 23 08:50:08 AD01 samba[2306]:   single_accept_connection: accept: NT_STATUS_TOO_MANY_OPENED_FILES
Comment 1 Andrew Bartlett 2017-11-24 19:22:12 UTC
Samba 4.7 allowed the LDAP server to follow the 'standard' process model and so become one-process-per-child.

You say you had 1500 users logged in, do they all have LDAP connections open?

It can all be forced back to one process with -M single, but this would be overkill I think.  Currently there isn't an smb.conf option to restrict just LDAP to the 4.6 behaviour. 

If you can, just let Samba have unlimited FDs, it uses one per child to track when the child goes away. 
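For deployments started via systemd, the per-process limit that shows up in /proc/$pid/limits can be raised with a unit drop-in. A minimal sketch follows; the unit name samba-ad-dc.service is an assumption and varies by distribution:

```ini
# /etc/systemd/system/samba-ad-dc.service.d/limits.conf
[Service]
LimitNOFILE=16384
```

After creating the drop-in, run systemctl daemon-reload and restart the service so the new limit takes effect.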

Is the number of files in use proportional to the current number of clients, or the maximum number of clients?  That is, could we have an FD leak here, or are we just over-using a constrained resource?
Comment 2 Anderson de Godoy 2017-11-27 11:38:17 UTC
(In reply to Andrew Bartlett from comment #1)
*You say you had 1500 users logged in, do they all have LDAP connections open?
Since we have 3 domain controllers, the connections are balanced between them; at the moment the errors occur there are at least 700 "established" connections for LDAP alone.
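A quick way to take that measurement is with ss; this sketch assumes the standard LDAP/LDAPS ports:

```shell
# Count established LDAP (389) and LDAPS (636) connections to this DC.
# -H suppresses the header, -t selects TCP, -n skips name resolution.
ldap_conns=$(ss -Htn state established '( sport = :389 or sport = :636 )' | wc -l)
echo "established LDAP connections: $ldap_conns"
```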

*If you can, just let Samba have unlimited FDs, it uses one per child to track when the child goes away. 
I set an unlimited FD limit on the samba process.


*Is the number of files in use proportional to the current number of clients, or the maximum number of clients?  That is, could we have an FD leak here, or are we just over-using a constrained resource?
These servers handle only netlogon; a separate server acts as the file server, so they serve no files apart from the policies and the netlogon ".bat" scripts.
Comment 3 Andrew Bartlett 2017-11-30 20:25:00 UTC
(In reply to Anderson de Godoy from comment #2)
If you are still seeing the issue, one approach would be to revert f4ce77857bb677ea612ad26d700960f913ff7bd8

This will allow you to force the LDAP server (only) back into a single process, as Samba 4.6 did, and bridge you to Samba 4.8, where a prefork option will be available that uses fewer processes and so fewer file descriptors.
Comment 4 Anderson de Godoy 2017-12-01 11:28:22 UTC
(In reply to Andrew Bartlett from comment #3)
At the moment the problem is not a big deal: I put a script in cron to monitor and restart Samba when the problem occurs. Sometimes it happens, sometimes it doesn't.

Another strange thing I see is the memory use of the smbd process. If I do not limit the maximum number of processes, it consumes memory until the server freezes. The server has 32GB and normally peaks at 24GB of use, but without a limit in smb.conf, smbd uses all the memory plus swap and the server freezes. For now, with "max smbd processes = 700", it is fine. This also started after the upgrade. I will wait for the next release to see if these problems persist.
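For reference, a minimal sketch of the kind of cron watchdog described above. The unit name samba-ad-dc and the use of journalctl as the log source are assumptions that vary by setup; the error string is the one from the logs in this report:

```shell
#!/bin/sh
# Restart Samba when NT_STATUS_TOO_MANY_OPENED_FILES appears in recent logs.

needs_restart() {
    # $1: log text to scan for the error from this report
    printf '%s\n' "$1" | grep -q 'NT_STATUS_TOO_MANY_OPENED_FILES'
}

recent=$(journalctl -u samba-ad-dc --since '-5 min' 2>/dev/null)
if needs_restart "$recent"; then
    systemctl restart samba-ad-dc
fi
```

Run it from cron every few minutes; it is a stopgap only, since it does nothing about the underlying FD exhaustion.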