When I open a folder with digicam images on an SMB shared by the Linux smbd server with OS/2 PMView (a picture viewer) the first time, PMView creates thumbnails of the images in the .ICON extended attribute. However this is very likely to fail (hang). PMView is a multitreaded application. It starts with the craetion of the images while still scanning the folder for already created thumbnails. In fact this creates heavy SMB trafic. It fails more often when the server is slow, e.g. either because of log level 10 or other system load. I think turing on the read_bmpx feature let this happen less often but this requires another very recent fix to avoid data corruption. The tcpdump shows that there is always one TRANS2_QUERY_PATH_INFORMATION request without a reply. In a more detailed analysis it turned out that the TRANS2_QUERY_PATH_INFORMATION Request is sent before the previous TRANS2_SET_PATH_INFORMATION is complete. On the server side the logfile shows something like [2005/07/15 22:47:43, 0] smbd/trans2.c:reply_trans2(5009) reply_trans2: Invalid secondary trans2 packet at the same time. (A longer excerpt from the log is available under the above url.) I think the server cannot handle the multiplexed commands form the same application correctly. Maybe the transs2 decoding cannot demultiplex based on the MID. Unfortunately I cannot track this down in the source code because I never understood how the handling of secondary trans2 packages is implemented. Marcel
What I need from you is an ethereal capture trace showing this happening. I believe the problem - I even have some ideas on how we can fix this but I must tell you it'll probably be in the 3.0.21 or later timeframe. But please send me the trace as I need to be able to see this. Thanks, Jeremy.
Jeremy, could this be the "receive_next_smb must die" problem? This might be one of the last stumbling blocks towards full async-ness, although it will probably not be easy. Volker
Yes that's exactly what it is. I'm putting this off until after the RPC auth rewrite. Jeremy.
Hello all, I've written a multithreaded OS/2 test applet, to check for this condition. When running, after about 3 seconds, one thread is hanging and the debug0 log contains: smbd version 3.0.21pre1-SVN-build-10977 started. Copyright Andrew Tridgell and the Samba Team 1992-2005 [2005/10/21 04:56:41, 0] smbd/trans2.c:reply_trans2(4864) reply_trans2: Invalid secondary trans2 packet ----------------- The app is doing the following: - thread1 is writing xattr data to file0 ... file100, one file after each other, then starts over with file0 - thread2 is doing findfirst/findnext to query the filenames in the same directory (then closes the searchhandle and starts over again) I'll send the ethereal capture "os2_samba3_multithreaded_invalid-secondary-trans2-packet.cap" Best wishes. Guenter Kukkukk
Created attachment 1531 [details] ethereal capture, done when multithreaded os/2 app was running
We know what this problem is, it's just a little tricky to fix right now. Now we have Volker's asynchronous open code it's possible, but probably we'll target this for 3.0.22, not 3.0.21. Jeremy.
With Revision 14719 I've put a patch into trunk that should fix this problem. Maybe you might want to try? Volker
Hi Volker, thanks a lot working on this stuff! :-) Lots of side-effects could arise... All my comments are related to: Current samba_3 version: 14703 Current samba-trunk: 14726 The new samba-trunk version is _much_ more thread friendly. Til now, I haven't noticed any related glitches... BUT - _new_ worms have crept into the basic stuff. When comparing current samba_3 to samba-trunk, (at least) the setting of EAs is corrupted now. Assuming a samba file system being able to store about 64KB of EAs (reiserfs, XFS), the latest trunk version is limiting the max. EA size to about 8570 bytes - due to internal errors. I had a short look at that behaviour... In smbd/trans2.c line 5334 if ((smb_base(inbuf) + doff + dcnt >= inbuf + bufsize) || (smb_base(inbuf) + doff + dcnt < smb_base(inbuf))) goto bad_param; the "goto bad_param" link is taken due to if ((smb_base(inbuf) + doff + dcnt >= inbuf + bufsize) Oh well, looking at former similar occurences, I thought, that the ">=" operator should only read ">". (but that doesn't help) But my very 1st look doesn't cure the problem. :-( But I guess, it's related... Sorry, I had only very limited time atm. Best wishes. Guenter Kukkukk btw if ((smb_base(inbuf) + doff + dcnt > inbuf + bufsize + 4 ) !!! brought back the former working behaviour (not talking about bugs)
Hi Volker, was looking again at the new "8565 EA size limit" in current samba-trunk. The 1st "set EA" wire frame always succeeds, but then the "Secondary request" (when needed - and has a special size) _always_ fails. To track that down a bit more, I wrote an OS/2 test applet, which allows byte size granularity on writing a single EA. (and small sniffs and debuglogs!) When viewing from the OS/2 side (depending on EA name length and internal stuff - about 10 or more bytes are send): - writing 8565 bytes EAs succeeds - writing 8566 bytes (or more) _always_ fail To catch some info from the 1st packet, I added a DEBUG() entry into smbd/trans2.c function reply_trans2(): ... if (state->total_data) { DEBUG(0,("GKK: reply_trans2: dsoff: %u dscnt: %u size: %u\n", dsoff, dscnt, size)); /* Can't use talloc here, the core routines do realloc on the * params and data. */ state->data = SMB_MALLOC(state->total_data); if (state->data == NULL) { DEBUG(0,("reply_trans2: data malloc fail for %u " "bytes !\n", state->total_data)); TALLOC_FREE(state); END_PROFILE(SMBtrans2); return(ERROR_DOS(ERRDOS,ERRnomem)); } if ((dsoff+dscnt < dsoff) || (dsoff+dscnt < dscnt)) goto bad_param; if ((smb_base(inbuf)+dsoff+dscnt > inbuf + size) || (smb_base(inbuf)+dsoff+dscnt < smb_base(inbuf))) goto bad_param; memcpy(state->data,smb_base(inbuf)+dsoff,dscnt); } ..... I added some DEBUG statements to smbd/trans2.c function reply_transs2() to catch the "Secondary request" misbehaviour: ... if (dcnt) { if (ddisp+dcnt > state->total_data) goto bad_param; if ((ddisp+dcnt < ddisp) || (ddisp+dcnt < dcnt)) goto bad_param; if (ddisp > state->total_data) goto bad_param; DEBUG(0,("GK0: reply_transs2: doff: %u dcnt: %u doff+dcnt: %u\n", doff, dcnt, doff + dcnt)); DEBUGADD(0,("bufsize: %u\n", bufsize)); DEBUGADD(0,("smb_base(inbuf)+doff+dcnt: 0x%08x " ">= inbuf+bufsize: 0x%08x\n\n", smb_base(inbuf)+doff+dcnt, inbuf+bufsize)); if ((smb_base(inbuf) + doff + dcnt >= inbuf + bufsize) || (smb_base(inbuf) + doff + dcnt < smb_base(inbuf))) { DEBUGADD(0,("GK1: error taken!\n\n")); goto bad_param; } if (state->data + ddisp < state->data) goto bad_param; memcpy(state->data+ddisp, smb_base(inbuf)+doff, dcnt); } .... When writing 8565 EA bytes from OS/2: [2006/03/28 02:18:16, 0] smbd/trans2.c:reply_trans2(5198) GKK: reply_trans2: dsoff: 76 dscnt: 4280 size: 4360 [2006/03/28 02:18:16, 0] smbd/trans2.c:reply_transs2(5339) GK0: reply_transs2: doff: 56 dcnt: 4295 doff+dcnt: 4351 bufsize: 4356 smb_base(inbuf)+doff+dcnt: 0x4068010b >= inbuf+bufsize: 0x4068010c BUT - when writing 8566 EA bytes (one byte more): [2006/03/28 02:21:40, 0] smbd/trans2.c:reply_trans2(5198) GKK: reply_trans2: dsoff: 76 dscnt: 4280 size: 4360 [2006/03/28 02:21:40, 0] smbd/trans2.c:reply_transs2(5339) GK0: reply_transs2: doff: 56 dcnt: 4296 doff+dcnt: 4352 bufsize: 4356 smb_base(inbuf)+doff+dcnt: 0x4068010c >= inbuf+bufsize: 0x4068010c GK1: error taken! [2006/03/28 02:21:40, 0] smbd/trans2.c:reply_transs2(5384) reply_transs2: invalid trans parameters ... The ">=" might need to be replaced with ">" - but that doesn't help! Now writing 8567 EA bytes (two bytes more.. beyond success): [2006/03/28 02:25:26, 0] smbd/trans2.c:reply_trans2(5198) GKK: reply_trans2: dsoff: 76 dscnt: 4280 size: 4360 [2006/03/28 02:25:26, 0] smbd/trans2.c:reply_transs2(5339) GK0: reply_transs2: doff: 56 dcnt: 4297 doff+dcnt: 4353 bufsize: 4356 smb_base(inbuf)+doff+dcnt: 0x4068010d >= inbuf+bufsize: 0x4068010c GK1: error taken! [2006/03/28 02:25:26, 0] smbd/trans2.c:reply_transs2(5384) reply_transs2: invalid trans parameters .... Any more EA bytes always fail at this "secondary packet". I'll send two (very short) sniffs taken, when my "oneea.exe" applet was running writing 8565 (OK) and 8566 (FAIL) bytes of EAs. Volker - don't give up on this ongoing approach to get rid of "receive_next_smb"! I'll also write windows applets, to check for the "multi threaded" behaviour of samba3... (Lots of stuff is affected by your trunk changes..) Best wishes - Guenter Kukkukk btw - the initially mentioned OS/2 program PMView is showing "Cannot create thumbnails"
Created attachment 1825 [details] Writing 8565 EA bytes from OS/2 - OK
Created attachment 1826 [details] Wrinting 8566 bytes EAs - failure
This is fixed in trunk now. Needs merging to 3_0, I'd like Jeremy to take a close look. Volker