Bug 1211 - smbd on IRIX blocks in accept() if new client drops connection
Status: CLOSED FIXED
Product: Samba 3.0
Classification: Unclassified
Component: File Services
Version: 3.0.2a
Hardware: SGI IRIX
Importance: P2 minor
Target Milestone: none
Assigned To: Gerald (Jerry) Carter
Reported: 2004-03-23 16:57 UTC by Richard Garnish
Modified: 2005-08-24 10:25 UTC (History)

Attachments
Patch to fix blocking accept() call on IRIX (953 bytes, patch)
2004-03-23 16:58 UTC, Richard Garnish

Description Richard Garnish 2004-03-23 16:57:35 UTC
Since we upgraded our PDC from 2.2.x to 3.0.x we have been having problems where
Samba seems to go to sleep for up to 3 minutes at a time.  Since this is the
PDC, this has been causing us major headaches, as it prevents any server on the
domain from authenticating users.

After a bit of process tracing and a lot of educated guesswork, I tracked the
sticking point down to the accept() call in the main loop of
smbd/server.c:open_sockets_smbd().  The select() returns, indicating that a
client is waiting to connect, but by the time it gets to accept() the client has
gone away.  From my dim and distant memory of Linux socket behaviour, accept()
returns -1 immediately in this case.  On IRIX, the call blocks until interrupted
by a signal (usually SIGCLD.)

The solution is to set the socket as non-blocking, so that accept() will return
in these cases.  That worked well, until I found that IRIX also exhibits
slightly dubious behaviour when spawning child descriptors from accept() - they
inherit the (non)blocking flag from the parent socket, causing untold hell when
the clients actually try transferring data (I know Linux sockets do not do this,
and I am not at all convinced it is correct behaviour.)  Solution: set the child
socket back as blocking, thus restoring normal behaviour from then on.

Patch against 3.0.2a to follow - as you can see, it is pretty trivial.  I have
not wrapped the changes in any OS-dependent defines, since as I see it:

 - the listener socket should always be set as non-blocking, just in case;
 - there is no harm in ensuring the child socket is set to block (if it's
already that way, we've wasted a couple of syscalls in the socket setup, but
otherwise it's a no-op.)
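
For illustration, the approach described above can be sketched roughly as follows.  This is not the attached patch, only a minimal sketch assuming a POSIX fcntl() interface; the helper names set_nonblocking() and accept_blocking_child() are hypothetical and do not come from the Samba source.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: turn O_NONBLOCK on or off for a descriptor.
 * Returns 0 on success, -1 on failure (errno is set by fcntl). */
static int set_nonblocking(int fd, int on)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    if (on)
        flags |= O_NONBLOCK;
    else
        flags &= ~O_NONBLOCK;
    return fcntl(fd, F_SETFL, flags);
}

/* Hypothetical wrapper: accept one client from a non-blocking listener.
 * If the client has already gone away, accept() fails straight away
 * (typically with EAGAIN or EWOULDBLOCK) instead of blocking, and the
 * caller can simply return to select().  On success the new descriptor
 * is forced back to blocking mode, in case it inherited O_NONBLOCK
 * from the listener. */
static int accept_blocking_child(int listen_fd)
{
    int fd = accept(listen_fd, NULL, NULL);
    if (fd == -1)
        return -1;
    if (set_nonblocking(fd, 0) == -1) {
        int saved = errno;   /* keep the fcntl error across close() */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

In this sketch the listener would be switched once with set_nonblocking(listen_fd, 1) after listen(), and the main loop would call accept_blocking_child() each time select() reports the listener as readable, treating a -1 return as "nothing to do, go round again".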

Looking back at source for earlier 2.2.x versions, I can see that nothing
significant has changed here, so it must actually be something on our network
causing this new problem, rather than the Samba upgrade itself - i.e. the
problem has actually existed all along, but hasn't caused anyone a problem yet.
My guess would be that the culprit is the NetApp filer which was the initial
reason for upgrading the PDC to Samba 3 (something to do with Unicode support
which the NetApp CIFS implementation requires.)
Comment 1 Richard Garnish 2004-03-23 16:58:26 UTC
Created attachment 453
Patch to fix blocking accept() call on IRIX
Comment 2 Tim Potter 2004-03-25 16:25:14 UTC
added me as cc
Comment 3 Jeremy Allison 2004-03-26 15:03:19 UTC
Applied - good patch! Thanks,
Jeremy.
Comment 4 Gerald (Jerry) Carter 2005-08-24 10:25:14 UTC
Sorry for the spam, cleaning up the database to prevent unnecessary reopens of bugs.