There seems to be an FD leak in smbd when serving a Samba share: after a certain number of files have been accessed, further file access fails with the error below.

    ../source3/smbd/smb2_server.c:2988(smbd_smb2_request_done_ex)
    smbd_smb2_request_done_ex: idx[1] status[NT_STATUS_TOO_MANY_OPENED_FILES] body[8] dyn[yes:1] at ../source3/smbd/smb2_server.c:3145

This occurs even if we increase smbd's max open files to 65536. The issue is not seen in Samba 4.5.3, but is seen with 4.5.11 and 4.5.15.

The issue seems to be caused by the fix for CVE-2017-2619. On further investigation, the modification below in source3/smbd/smb2_query_directory.c, made as part of that fix, may be causing the FD leak:

https://github.com/samba-team/samba/commit/47b6b6f8f58efbabd7e4610f51db61dca2bc157c#diff-30edf5566a0d9e2abf214c7f778830df

    Line 328: dptr_CloseDir(fsp);

Do we need to close the FD using fd_close() instead of dptr_CloseDir()?
Simple steps to reproduce it:

• Map a Samba share on a Windows machine. Suppose the share is mapped to drive Z:.
• Create a small batch file as below:

    :loop
    dir z:
    goto loop

• Run the batch file, then analyze smbd with any crash dump utility appropriate for the platform, or wait until smbd hits the NT_STATUS_TOO_MANY_OPENED_FILES error.
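Rather than waiting for the error, the growth in smbd's open descriptors can be watched directly. The following is a minimal sketch, assuming a Linux-style /proc filesystem (it will not work on platforms without /proc/<pid>/fd, such as HP-UX). The function name count_open_fds() is made up for this illustration and is not part of Samba.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: count the descriptors a process has open by
 * listing /proc/<pid>/fd (assumption: Linux-style procfs).
 * pid <= 0 means "this process". Returns -1 if the directory cannot
 * be read (no procfs, no such process, or insufficient privileges).
 * When inspecting the calling process, the count includes the
 * descriptor opendir() itself uses for the listing.
 */
long count_open_fds(pid_t pid)
{
    char path[64];
    DIR *dir;
    struct dirent *ent;
    long count = 0;
    int n;

    if (pid <= 0) {
        pid = getpid();
    }
    n = snprintf(path, sizeof(path), "/proc/%ld/fd", (long)pid);
    assert(n > 0 && (size_t)n < sizeof(path));

    dir = opendir(path);
    if (dir == NULL) {
        return -1;
    }
    while ((ent = readdir(dir)) != NULL) {
        /* Skip "." and ".."; every other entry is one open fd. */
        if (ent->d_name[0] != '.') {
            count++;
        }
    }
    closedir(dir);
    return count;
}
```

Calling this periodically with the PID of the smbd process serving the share (typically as root) while the batch loop runs should show a steadily rising count if the leak is present.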
With the default wide links setting, tracing with strace/tusc shows that the leaked fds come from open() calls made with the O_NOFOLLOW flag set. A gdb stack trace of the allocation of a leaked entry appears to be:

    #3 0x60000000fe924020:0 in smb_vfs_call_open () at ../source3/smbd/vfs.c:1643
    #4 0x60000000fe8de3d0:0 in non_widelink_open () at ../source3/smbd/open.c:581
    #5 0x60000000fe8deb40:0 in fd_open () at ../source3/smbd/open.c:684
    #6 0x60000000fea10430:0 in smbd_smb2_query_directory_send () at ../source3/smbd/smb2_query_directory.c:339
    #7 0x60000000fea0f630:0 in smbd_smb2_request_process_query_directory () at ../source3/smbd/smb2_query_directory.c:124
    #8 0x60000000fe9be270:0 in smbd_smb2_request_dispatch () at ../source3/smbd/smb2_server.c:2644

The code is:

    if (in_flags & SMB2_CONTINUE_FLAG_REOPEN) {
        int flags;

        dptr_CloseDir(fsp);

        /*
         * dptr_CloseDir() will close and invalidate the fsp's file
         * descriptor, we have to reopen it.
         */

        flags = O_RDONLY;
    #ifdef O_DIRECTORY
        flags |= O_DIRECTORY;
    #endif
        status = fd_open(conn, fsp, flags, 0);
        if (tevent_req_nterror(req, status)) {
            return tevent_req_post(req, ev);
        }
    }

Contrary to what the comment says, dptr_CloseDir() does not appear to close fsp->fh->fd before it is replaced by fd_open(). The code of fd_close() tends to confirm this, since it does both: it calls dptr_CloseDir() and then closes the fd. Note that fd_close() also checks fsp->fh->ref_count, in case the fd is shared. Shouldn't we have an fd_reopen() in the open.c API to manage the operation expected here?
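The suspected pattern can be shown outside Samba. The following is a minimal sketch, assuming only POSIX open()/close(): struct handle stands in for the fd slot in fsp->fh->fd, and reopen_leaky(), reopen_closing() and leaked_after() are hypothetical names for this illustration, not Samba functions. It does not model ref_count or the VFS layer.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical stand-in for the descriptor slot (fsp->fh->fd). */
struct handle {
    int fd;
};

/* Count descriptors open in this process by probing each slot.
 * The scan is capped at 65536, which is enough for this demo. */
static int count_own_fds(void)
{
    long max = sysconf(_SC_OPEN_MAX);
    int n = 0;
    long fd;

    if (max < 0 || max > 65536) {
        max = 65536;
    }
    for (fd = 0; fd < max; fd++) {
        if (fcntl((int)fd, F_GETFD) != -1) {
            n++;
        }
    }
    return n;
}

/* Leaky reopen: overwrites h->fd without closing the old descriptor. */
int reopen_leaky(struct handle *h, const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        return -1;
    }
    h->fd = fd;
    return 0;
}

/* Closing reopen: releases the old descriptor once the new one exists. */
int reopen_closing(struct handle *h, const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd == -1) {
        return -1;
    }
    if (h->fd != -1) {
        close(h->fd);
    }
    h->fd = fd;
    return 0;
}

/* Run `rounds` reopens and return how many descriptors were left behind. */
int leaked_after(int (*reopen)(struct handle *, const char *), int rounds)
{
    struct handle h;
    int before = count_own_fds();
    int i;

    h.fd = open(".", O_RDONLY);
    assert(h.fd != -1); /* "." is expected to be openable */
    for (i = 0; i < rounds; i++) {
        if (reopen(&h, ".") != 0) {
            break;
        }
    }
    close(h.fd);
    return count_own_fds() - before;
}
```

With the leaky variant, every reopen strands one descriptor, which matches the symptom of one leaked fd per directory listing with the REOPEN flag. The closing variant opens the new descriptor first so that a failed open() leaves the handle usable.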
Is this reproducible on any release later than 4.5.x? Knowing that will really help determine whether this is a bug that needs addressing only in 4.5.x or is a more general problem.
At least in master, dptr_CloseDir(fsp) looks like:

    void dptr_CloseDir(files_struct *fsp)
    {
        if (fsp->dptr) {
            /*
             * The destructor for the struct smb_Dir
             * (fsp->dptr->dir_hnd) now handles
             * all resource deallocation.
             */
            dptr_close_internal(fsp->dptr);
            fsp->dptr = NULL;
        }
    }

dptr_close_internal() looks like:

    static void dptr_close_internal(struct dptr_struct *dptr)
    {
        struct smbd_server_connection *sconn = dptr->conn->sconn;

        DEBUG(4,("closing dptr key %d\n",dptr->dnum));

        if (sconn == NULL) {
            goto done;
        }

        if (sconn->using_smb2) {
            goto done;
        }

        DLIST_REMOVE(sconn->searches.dirptrs, dptr);

        /*
         * Free the dnum in the bitmap. Remember the dnum value is always
         * biased by one with respect to the bitmap.
         */

        if (!bitmap_query(sconn->searches.dptr_bmap, dptr->dnum - 1)) {
            DEBUG(0,("dptr_close_internal : Error - closing dnum = %d and bitmap not set !\n",
                dptr->dnum ));
        }

        bitmap_clear(sconn->searches.dptr_bmap, dptr->dnum - 1);

    done:
        TALLOC_FREE(dptr->dir_hnd);
        TALLOC_FREE(dptr);
    }

The TALLOC_FREE(dptr->dir_hnd) triggers the destructor on dptr->dir_hnd, which looks like:

    static int smb_Dir_destructor(struct smb_Dir *dirp)
    {
        if (dirp->dir != NULL) {
            SMB_VFS_CLOSEDIR(dirp->conn,dirp->dir);
            if (dirp->fsp != NULL) {
                /*
                 * The SMB_VFS_CLOSEDIR above
                 * closes the underlying fd inside
                 * dirp->fsp.
                 */
                dirp->fsp->fh->fd = -1;
                if (dirp->fsp->dptr != NULL) {
                    SMB_ASSERT(dirp->fsp->dptr->dir_hnd == dirp);
                    dirp->fsp->dptr->dir_hnd = NULL;
                }
                dirp->fsp = NULL;
            }
        }
        if (dirp->conn->sconn && !dirp->conn->sconn->using_smb2) {
            dirp->conn->sconn->searches.dirhandles_open--;
        }
        return 0;
    }

SMB_VFS_CLOSEDIR(dirp->conn,dirp->dir) should be what is closing the file descriptor.
Quick question - are you reproducing this on HPUX only ? If so, does HPUX support the fdopendir() call ?
Created attachment 13961: Proposed patch.

Ralph, I think this is the correct fix (and I think this is a problem for all underlying operating systems that don't have an fdopendir() libc library call). HPE folks, can you test this and get back to me?

Jeremy.
Indeed, HP-UX does not support fdopendir(), and SMB_VFS_OPENDIR() fails with ENOSYS. dirp->fsp is then not set.
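The difference between the two directory-open paths can be seen with plain POSIX calls. The following is a minimal sketch, assuming a platform that does provide fdopendir() so that both cases can be compared side by side. The function names are hypothetical and the code does not go through Samba's VFS layer.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <unistd.h>

/* Returns 1 if fd refers to an open descriptor, 0 otherwise. */
static int fd_is_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1;
}

/*
 * fdopendir() path: the DIR stream takes ownership of fd, so
 * closedir() also closes fd. Returns 1 if fd is still open after
 * closedir(), 0 if not, -1 on setup failure.
 */
int fd_survives_fdopendir_path(const char *path)
{
    int fd;
    DIR *d;

    assert(path != NULL);
    fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd == -1) {
        return -1;
    }
    d = fdopendir(fd);
    if (d == NULL) {
        close(fd);
        return -1;
    }
    closedir(d);
    return fd_is_open(fd);
}

/*
 * opendir() fallback path: the DIR stream uses its own private
 * descriptor, so closedir() leaves the caller's fd untouched and
 * the caller must close it explicitly. Same return convention.
 */
int fd_survives_opendir_path(const char *path)
{
    int fd;
    int still_open;
    DIR *d;

    assert(path != NULL);
    fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd == -1) {
        return -1;
    }
    d = opendir(path);
    if (d == NULL) {
        close(fd);
        return -1;
    }
    closedir(d);
    still_open = fd_is_open(fd);
    if (still_open) {
        close(fd); /* without this close, the descriptor leaks */
    }
    return still_open;
}
```

In the second case nothing ties the directory stream to the original descriptor, so code that assumes closedir() released it will overwrite a still-open fd on the next open, which is consistent with the leak being visible on systems without fdopendir().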
I have ported and tested the patch. It fixes the FD leak. Thanks.
Ralph, once you've +1'ed this I'll get it into master and create back-ports for all supported versions. It looks like the correct fix.
Created attachment 13979: git-am fix for 4.8.0rc, 4.7.next, 4.6.next. Cherry-picked from master.
Reassigning to Karolin for inclusion in 4.8, 4.7 and 4.6.
Karolin, I don't see this one in 4.8.0, so I think it got dropped. Can you add it to the relevant releases? Thanks, Jeremy.
(In reply to Jeremy Allison from comment #12) Pushed to autobuild-v4-[8,7,6]-test.
(In reply to Karolin Seeger from comment #13) Pushed to v4-8-test and v4-7-test, re-trying autobuild-v4-6-test.
(In reply to Karolin Seeger from comment #14) Pushed to all branches. Closing out bug report. Thanks!