Bug 4044 - SMB sessions closed on windows write / linux read sequence
Status: NEW
Product: Samba 3.0
Classification: Unclassified
Component: File Services
Version: 3.0.23a
Hardware: x86 Linux
Importance: P3 normal
Target Milestone: none
Assigned To: Samba Bugzilla Account
QA Contact: Samba QA Contact
Depends on:
Blocks:
Reported: 2006-08-23 09:31 UTC by Matt Godbolt
Modified: 2006-08-31 03:17 UTC (History)

Attachments
pcap from my windows machine at the point of the failure (266.22 KB, application/octet-stream)
2006-08-23 09:37 UTC, Matt Godbolt
GZip of loglevel 10 during fail (26.45 KB, application/octet-stream)
2006-08-31 03:16 UTC, Matt Godbolt

Description Matt Godbolt 2006-08-23 09:31:03 UTC
I have a network server running Samba 3.0.23a on Debian Linux.  For the last few weeks (possibly since an upgrade from around 3.0.10, though I can't be sure) I have had an intermittent issue with SMB connections being closed, for which I've finally found a reproducible case (and a workaround).

I develop for both Windows and Linux on my Windows machine, and tend to edit files over a Samba mount of my unix home directory, then run make in an ssh shell on the Linux box.  This has previously worked, but now I get a 100% reproducible issue.  Steps to reproduce:
* Mount unix home folder (as X:\ in this example)
* Initiate a long file operation from X:\ (in my exact case I stream MP3 audio from a separate samba share - it's then obvious when the connection dies as the music stops!)
* Have apache export the ~user/public_html folder (or similar)
* Edit a file in X:\public_html
* Immediately visit http://linux-box/~user/file

What I see: the web site appears exactly as you'd expect - Linux can read the files fine.  However, the music will stop playing.

In a similar setup I've also edited files from Windows and then cat/less'd them on the Linux machine and had the same problem, so it's not purely web-based.  Likewise I develop embedded linux boxes (which have NFS roots stored on the server) and when I edit their contents (via the samba mount) then run make or similar on the embedded client I get dropout on my Windows music.

This has taken me ages to work out - it really does seem to be the subsequent Linux-side read of the file that causes Samba to drop the connection.  I can positively hammer the server with Samba reads/writes and unrelated Linux-side reads without issue, but reading the same file on the Linux side shortly after writing it via Samba just drops the connection.
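
The write-then-read sequence described above could be scripted roughly as follows. This is only a minimal sketch: the function names `write_then_read` and `fetch_over_http` are made up for illustration, and it assumes it is run on the Windows client with the share mounted as X:\ and Apache serving ~user/public_html, as in the steps above.

```python
import urllib.request
from pathlib import Path
from typing import Callable


def write_then_read(smb_path: Path, payload: bytes,
                    linux_side_read: Callable[[], bytes]) -> bytes:
    """Write payload through the SMB mount, then immediately trigger
    a read of the same file on the Linux side and return what was read."""
    smb_path.write_bytes(payload)   # the "windows write" half of the sequence
    return linux_side_read()        # the "linux read" half, e.g. via Apache


def fetch_over_http(url: str) -> bytes:
    """Make the Linux box read the file by requesting it from its web server."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()
```

With the paths from the steps above (assumed, adjust to the real file name), a call might look like `write_then_read(Path(r"X:\public_html\file"), b"edited\n", lambda: fetch_over_http("http://linux-box/~user/file"))`, run while a long transfer from another share is in progress.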

The log file isn't that helpful: I can see normal file reads followed immediately by windows attempting to reconnect and authenticate.  No obvious 'error' on the existing connection:

[2006/08/23 14:27:54, 3] smbd/reply.c:reply_write_and_X(3134)
  writeX fnum=9262 num=16384 wrote=16384
[2006/08/23 14:27:54, 3] smbd/process.c:process_smb(1110)
  Transaction 3842 of length 16452
[2006/08/23 14:27:54, 3] smbd/process.c:switch_message(914)
  switch message SMBwriteX (pid 14361) conn 0x83b5a10
[2006/08/23 14:27:54, 4] smbd/uid.c:change_to_user(176)
  change_to_user: Skipping user change - already user
[2006/08/23 14:27:54, 3] smbd/reply.c:reply_write_and_X(3134)
  writeX fnum=9262 num=16384 wrote=16384

[rough guess at where the connection dies]

[2006/08/23 14:27:57, 3] smbd/process.c:process_smb(1110)
  Transaction 1 of length 137
[2006/08/23 14:27:57, 3] smbd/process.c:switch_message(914)
  switch message SMBnegprot (pid 14880) conn 0x0
[2006/08/23 14:27:57, 3] smbd/sec_ctx.c:set_sec_ctx(241)
  setting sec ctx (0, 0) - sec_ctx_stack_ndx = 0
[2006/08/23 14:27:57, 3] smbd/negprot.c:reply_negprot(487)
  Requested protocol [PC NETWORK PROGRAM 1.0]
[2006/08/23 14:27:57, 3] smbd/negprot.c:reply_negprot(487)
...[cut lots of login and authentication stuff]...
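
In a longer log, the reconnect point can be located by looking for the first SMBnegprot after the writes. The following is a minimal sketch that assumes the two-line entry format shown in the excerpt above (a bracketed timestamp header followed by an indented message); the function name `find_reconnects` is made up for illustration.

```python
import re
from datetime import datetime

# Header line, e.g. "[2006/08/23 14:27:57, 3] smbd/process.c:switch_message(914)"
HEADER = re.compile(r"^\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}),")
# Message line, e.g. "  switch message SMBnegprot (pid 14880) conn 0x0"
SWITCH = re.compile(r"switch message (\w+) \(pid (\d+)\)")


def find_reconnects(lines):
    """Return a list of (timestamp, pid) for every SMBnegprot request,
    i.e. every point where a client starts a fresh session."""
    reconnects = []
    current_time = None
    for line in lines:
        header = HEADER.match(line)
        if header:
            current_time = datetime.strptime(header.group(1), "%Y/%m/%d %H:%M:%S")
            continue
        switch = SWITCH.search(line)
        if switch and switch.group(1) == "SMBnegprot":
            reconnects.append((current_time, int(switch.group(2))))
    return reconnects
```

Run over the excerpt above, this would report a single negotiation at 14:27:57 from pid 14880, a different smbd process from the pid 14361 that was handling the writes.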

A packet capture in Ethereal suggests that Samba is just closing the connection at almost exactly the same time the Linux machine is reading from the file (argh, I can't get Ethereal to dump out the packets; I will attempt to attach them later!)

After *lots* of false starts, I discovered that turning oplocks off 'fixes' this problem.  I'm assuming I'll end up with corrupted files if I'm not careful in this setup, but at least I don't get any problems with read-after-writes.
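
The report does not say exactly which setting was changed. As a sketch only, disabling oplocks for a share in smb.conf might look like the fragment below; `oplocks` and `level2 oplocks` are standard share-level smb.conf parameters, but the `[homes]` section name is an assumption based on the home-directory mount described above.

```
[homes]
    # assumption: the affected share is the home-directory share
    oplocks = no
    level2 oplocks = no
```

With oplocks off, Windows clients stop caching file data locally for that share, which trades some performance for not depending on oplock breaks.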

If it's any help I'm running a kernel of 'Linux moon 2.6.17.6 #1 Thu Jul 20 17:30:04 BST 2006 i686 GNU/Linux'
Comment 1 Matt Godbolt 2006-08-23 09:37:06 UTC
Created attachment 2101
pcap from my windows machine at the point of the failure

Initially you'll see me streaming music and repeatedly modifying the file 'buildmonitor.php'.
At record 396 you'll see me fetch buildmonitor.php as a web page (which causes the linux box to read the file I've been editing remotely).
At record 404 you'll see the connection from Samba being dropped (by the server).
Records 400ish to 480ish you'll see the flurry of activity as various other files are fetched from the web server.
From about 500 you'll see my windows machine reconnecting to the music share (\\moon\music) and logging in.  Yes I know my auth details are in there, but I'm (foolishly) trusting you lot :P
Comment 2 Buck Huppmann 2006-08-30 17:00:22 UTC
i'd be curious to see what smbd says before the client reconnects. i think,
unfortunately, the relevant locking DEBUG()'s are at level 10, so if you can
possibly share a log captured with

log level = 3 locking:10

i'd be appreciative. (i am not a samba developer; just pathologically
curious about how the kernel oplock stuff works, as well as a self-interested
Debian user who doesn't want to get bitten by the same thing)
Comment 3 Matt Godbolt 2006-08-31 03:16:59 UTC
Created attachment 2119
GZip of loglevel 10 during fail

This is a log with "log level = 3 locking:10" enabled.  At around 9:10:04-07ish I reloaded the webpage which caused the existing music track '07) You Learn.mp3' to fail reading.  After the reconnect my music player continued from the next track '08) Head Over Feet.Mp3'.  The file I had previously saved via samba and then loaded (as a result of a web page fetch) was 'buildmonitor.php'.
Comment 4 Matt Godbolt 2006-08-31 03:17:43 UTC
(In reply to comment #2)
> 
> log level = 3 locking:10

Just done so and posted the attachment - thanks for your interest! :)