Created attachment 10794 [details]
patch to take care of the POLLNVAL condition

We found that one of the smbd processes was stuck in an infinite poll loop, causing 100% CPU utilization. Here is the stack when it was stuck. We are running a 3.6.12+ stack, so line numbers may not match the unmodified 3.6.12 sources exactly.

Dumping PID 71815
[Switching to Thread 803708c80 (LWP 103546)]
0x000000080219e23c in ?? () from /lib/libthr.so.3

Thread 7 (Thread 8037021c0 (LWP 102491)):
#0  0x0000000802d465ec in poll () from /lib/libc.so.7
#1  0x00000008021967fe in poll () from /lib/libthr.so.3
#2  0x00000008038028f8 in winbind_read_sock (buffer=0x7fffffffb370, count=3496) at ../nsswitch/wb_common.c:457
#3  0x00000008038029e7 in winbindd_read_reply (response=<optimized out>) at ../nsswitch/wb_common.c:511
#4  winbindd_get_response (response=0x7fffffffa4f0) at ../nsswitch/wb_common.c:617
#5  0x0000000803803273 in winbindd_request_response (req_type=47, request=0x7fffffffc120, response=0x7fffffffb370) at ../nsswitch/wb_common.c:650
#6  0x0000000803802ff8 in winbind_open_pipe_sock (need_priv=<optimized out>, recursing=<optimized out>) at ../nsswitch/wb_common.c:346
#7  winbind_write_sock (buffer=0x7fffffffd240, count=2096, recursing=0, need_priv=0) at ../nsswitch/wb_common.c:378
#8  0x0000000803803159 in winbindd_send_request (req_type=1, need_priv=0, request=0x7fffffffd240) at ../nsswitch/wb_common.c:576
#9  0x0000000803803263 in winbindd_request_response (req_type=1, request=0x7fffffffd240, response=0x80390a140) at ../nsswitch/wb_common.c:647
#10 0x00000008038063d7 in _nss_winbind_getpwnam_r (name=0x7fffffffdde0 "domain\\caylaoj", result=0x802ece0c0, buffer=0x803760400 "caylaoj", buflen=1024, errnop=0x7fffffffdcec) at ../nsswitch/winbind_nss_linux.c:648
#11 0x0000000802ce888f in __nss_compat_getpwnam_r () from /lib/libc.so.7
#12 0x0000000802d747e3 in nsdispatch () from /lib/libc.so.7
#13 0x0000000802d3dc74 in getpwnam_r () from /lib/libc.so.7
#14 0x0000000802d3d8a1 in ?? () from /lib/libc.so.7
#15 0x000000000079951a in getpwnam_alloc_cached (mem_ctx=0x804b96050, name=0x7fffffffdde0 "domain\\caylaoj") at lib/username.c:45
#16 0x0000000000799631 in Get_Pwnam_internals (user2=<optimized out>, user=<optimized out>, mem_ctx=<optimized out>) at lib/username.c:117
#17 Get_Pwnam_alloc (mem_ctx=0x804b96050, user=0x80378e5d0 "DOMAIN\\caylaoj") at lib/username.c:173
#18 0x00000000004ba8aa in add_session_user (sconn=0x803711350, user=0x80378e5d0 "DOMAIN\\caylaoj") at smbd/password.c:365
#19 0x00000000005254cb in make_connection_snum (sconn=0x803711350, conn=0x804d07850, snum=41, vuser=0x803782990, password=..., pdev=0xa13336 "???", pstatus=0x7fffffffe340) at smbd/service.c:814
#20 0x000000000052673d in make_connection_smb2 (sconn=0x803711350, tcon=0x804b9ac50, vuser=0x803782990, password=..., pdev=0xa13336 "???", pstatus=0x7fffffffe340) at smbd/service.c:1194
#21 0x00000000005357f4 in smbd_smb2_tree_connect (out_tree_id=<optimized out>, out_maximal_access=<optimized out>, out_capabilities=<optimized out>, out_share_flags=<optimized out>, out_share_type=<optimized out>, in_path=<optimized out>, req=<optimized out>) at smbd/smb2_tcon.c:231
#22 smbd_smb2_request_process_tcon (req=0x804b9b110) at smbd/smb2_tcon.c:92
#23 0x0000000000531283 in smbd_smb2_request_dispatch (req=0x804b9b110) at smbd/smb2_server.c:1408
#24 0x000000000053201e in smbd_smb2_request_incoming (subreq=0x8043a7c90) at smbd/smb2_server.c:2661
#25 0x000000000052fc1c in smbd_smb2_request_read_done (subreq=0x8052670d0) at smbd/smb2_server.c:2504
#26 0x00000000005c8c21 in tstream_readv_pdu_queue_done (subreq=0x805266dd0) at ../lib/tsocket/tsocket_helpers.c:423
#27 0x00000000005c8fd3 in tstream_readv_pdu_readv_done (subreq=0x804f41b10) at ../lib/tsocket/tsocket_helpers.c:316
#28 0x00000000005c8052 in tstream_readv_done (subreq=0x805267cd0) at ../lib/tsocket/tsocket.c:604
#29 0x00000000007b3e70 in tevent_common_loop_immediate (ev=0x80370e110) at ../lib/tevent/tevent_immediate.c:139
#30 0x00000000007b2135 in run_events_poll (ev=0x80370e110, pollrtn=0, pfds=0x0, num_pfds=0) at lib/events.c:197
#31 0x000000000052261d in smbd_server_connection_loop_once (conn=<optimized out>) at smbd/process.c:999
#32 smbd_process (sconn=0x803711350) at smbd/process.c:3174
#33 0x0000000000a02326 in smbd_accept_connection (ev=<optimized out>, fde=<optimized out>, flags=<optimized out>, private_data=<optimized out>) at smbd/server.c:671
#34 0x00000000007b2451 in run_events_poll (ev=0x80370e110, pollrtn=<optimized out>, pfds=0x8037582d0, num_pfds=7) at lib/events.c:286
#35 0x00000000007b28bf in s3_event_loop_once (ev=0x80370e110, location=<optimized out>) at lib/events.c:349
#36 0x00000000007b2c71 in _tevent_loop_once (ev=0x80370e110, location=0xc0d109 "smbd/server.c:977") at ../lib/tevent/tevent.c:494
#37 0x0000000000a0404e in smbd_parent_loop (parent=<optimized out>) at smbd/server.c:977
#38 main (argc=<optimized out>, argv=<optimized out>) at smbd/server.c:1471

We are able to reproduce the issue in house using gdb: closing the fd of the unix domain socket to winbindd makes the process go into an infinite loop. In this case, I could see that the revents flags are set to POLLNVAL (0x020), which is never checked in the current code.

0x000000080219e23c in ?? () from /lib/libthr.so.3
(gdb) bt
#0  0x000000080219e23c in ?? () from /lib/libthr.so.3
#1  0x000000080219c335 in ?? () from /lib/libthr.so.3
#2  0x0000000802405bb4 in tpm_worker_main (param=<optimized out>) at /home/p4bui
#3  0x00000008021944b1 in ?? () from /lib/libthr.so.3
#4  0x0000000000000000 in ?? ()
(gdb) b winbind_read_sock
Breakpoint 1 at 0x802a875e0: winbind_read_sock. (2 locations)
(gdb) b wb_common.c:458
Breakpoint 2 at 0x802a87658: wb_common.c:458. (2 locations)
(gdb) c
Continuing.
[Switching to Thread 8037021c0 (LWP 101120)]

Breakpoint 2, winbind_read_sock (buffer=0x7ffffffd9dd0, count=3496) at ../nsswit
458     ../nsswitch/wb_common.c: No such file or directory.
(gdb) p *pfd
Structure has no component named operator*.
(gdb) p pfd
$1 = {fd = 32, events = 17, revents = 32}
(gdb) p /x pfd
$2 = {fd = 0x20, events = 0x11, revents = 0x20}   ==> That's POLLNVAL

Adding a POLLNVAL flag check allows the socket to be closed completely instead of looping indefinitely. But we are not sure why or what closed the winbindd fd. The winbind_read_sock() code looks no different in master/4.2, so I feel we might hit the same issue there. Attaching the patch which has fixed the issue. Bug 10836 looks similar to me, except that issue is seen in winbindd rather than smbd.
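For reference, a minimal sketch of the kind of check described above (an illustration against the 3.6-era loop, not the literal attachment; winbind_close_sock() is assumed to be the same helper the other error paths in wb_common.c use to drop the cached winbindd connection):

    ret = poll(&pfd, 1, 5000);
    ...
    if ((ret == 1) && (pfd.revents & POLLNVAL)) {
            /* The fd behind pfd.fd is no longer open, so poll()
               keeps returning immediately and the loop would spin
               forever. Drop the cached connection and fail the
               request instead. */
            winbind_close_sock();
            return -1;
    }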
So just to be clear, the *outgoing* file descriptor being used by smbd to talk to winbindd had been closed somewhere within smbd without the nsswitch/wb_common.c code being involved ?

So the fd created from:

    fd = winbind_open_pipe_sock(false, false);

and used inside the loop:

    /* Read data from socket */
    while(nread < count) {
            struct pollfd pfd;
            int ret;

            /* Catch pipe close on other end by checking if
               a read() call would not block by calling poll(). */

            pfd.fd = fd;
            pfd.events = POLLIN|POLLHUP;

            /* Wait for 5 seconds for a reply. May need to
               parameterise this... */

            ret = poll(&pfd, 1, 5000);

was closed, but pfd.fd was still a valid value ?
> So just to be clear, the *outgoing* file descriptor being used by smbd to talk to winbindd had been closed somewhere within smbd without the nsswitch/wb_common.c code being involved ?

Yes, I think so.

> was closed, but pfd.fd was still a valid value ?

The reason we ended up in an infinite loop is that poll() returned 1 with POLLNVAL set in revents.
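To spell out why that spins (a reconstruction of the 3.6-era loop body from the upstream sources; exact lines may differ): after a successful poll() the only condition tested is against POLLIN|POLLHUP|POLLERR, so a POLLNVAL-only wakeup matches nothing and the loop re-polls the dead fd, which again returns 1 immediately:

    ret = poll(&pfd, 1, 5000);  /* returns 1 at once, revents = POLLNVAL */

    if ((ret == 1) && (pfd.revents & (POLLIN|POLLHUP|POLLERR))) {
            /* Never entered when only POLLNVAL is set, so read()
               is never attempted, nread never advances, and the
               while (nread < count) loop busy-spins on poll(). */
            ...
    }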
Yes, pfd.fd still had a valid value. This is what we saw when we ran the "truss" command:

# truss -p 71815
poll({39/POLLIN|POLLHUP},1,5000) = 1 (0x1)
poll({39/POLLIN|POLLHUP},1,5000) = 1 (0x1)
poll({39/POLLIN|POLLHUP},1,5000) = 1 (0x1)
poll({39/POLLIN|POLLHUP},1,5000) = 1 (0x1)
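The behaviour is easy to demonstrate outside Samba. A self-contained, hypothetical demo (not from the attachment) that polls a closed fd with the same event mask wb_common.c uses:

    #include <poll.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
            int fds[2];
            struct pollfd pfd;
            int i;

            if (pipe(fds) != 0) {
                    return 1;
            }
            /* Simulate some other code closing the fd behind our back. */
            close(fds[0]);

            pfd.fd = fds[0];                /* still a plausible-looking number */
            pfd.events = POLLIN|POLLHUP;    /* same mask as wb_common.c */

            for (i = 0; i < 4; i++) {
                    int ret = poll(&pfd, 1, 5000);
                    /* poll() returns 1 immediately with revents = 0x20
                       (POLLNVAL), never blocking for the timeout, so a
                       caller that only checks POLLIN|POLLHUP|POLLERR
                       loops at 100% CPU. */
                    printf("poll=%d revents=0x%x\n", ret,
                           (unsigned)pfd.revents);
            }
            return 0;
    }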
Yeah, the problem is that the patch is papering over the cracks: something is closing that fd without the existing code being aware of it. I don't currently see how that can happen, of course :-).
So I'm guessing this happened on a customer site, but you can only reproduce it by attaching with gdb and forcing the fd closed - correct ? In which case we don't know where the real bug is...
(In reply to Jeremy Allison from comment #5)

That's correct, Jeremy. We do not know how the fd got closed. But shouldn't we be taking care of POLLNVAL anyway, given that poll() can return this flag in revents irrespective of whether it is set in pfd.events? From the man page of poll:

...
POLLERR    An exceptional condition has occurred on the device or
           socket. This flag is always checked, even if not present
           in the events bitmask.

POLLHUP    The device or socket has been disconnected. This flag is
           always checked, even if not present in the events bitmask.
           Note that POLLHUP and POLLOUT should never be present in
           the revents bitmask at the same time.

POLLNVAL   The file descriptor is not open. This flag is always
           checked, even if not present in the events bitmask.
...

All three of these flags can appear in revents even when they are not set in events. I do agree that we might be papering over the cracks, but that is no different from how POLLHUP and POLLERR are handled today: in the existing winbind_read_sock() we do not root-cause anything when revents has POLLHUP or POLLERR. Instead we go ahead with the read and close the socket when it fails.

As we are all aware, the impact of this issue is pretty high: the session becomes almost unusable or dead, and the spinning process takes most of the CPU time, possibly starving the other smbds.
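For comparison, this is roughly the existing handling being referred to (a paraphrase of the 3.6-era read path, not an exact quote): on POLLHUP/POLLERR the read is simply attempted, and a failed or zero-byte read closes the socket, which is the same treatment the patch extends to POLLNVAL:

    if ((ret == 1) && (pfd.revents & (POLLIN|POLLHUP|POLLERR))) {
            /* Attempt the read; on POLLHUP/POLLERR it returns
               0 or -1 and we give up on the connection. */
            int result = read(fd, (char *)buffer + nread,
                              count - nread);

            if ((result == -1) || (result == 0)) {
                    winbind_close_sock();
                    return -1;
            }
            nread += result;
    }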
But POLLNVAL is different from the others. The others can happen due to "normal" error conditions (network failure etc.), so we *must* cope with these. POLLNVAL means "your code has a bug" :-(. I want to fix my bugs, not hide them.
(In reply to Jeremy Allison from comment #7)

Sure, Jeremy. I understand the concern. I will investigate further and let you know if I can figure out the root cause.
For your production system I'd certainly recommend adding your patch - it will mask the bug for customers and the outcome of the bug is very severe - however I really need to know the root cause.
I think this can only ever happen if there are threads running that close the fd: either winbind_read_sock() being called from two threads, or a thread with another bug that just closes a random fd...