Bug 2882 - OS/2 Client hangs at heavy SMB trafic with nested TRANS2_SET_PATH_INFORMATION and TRANS2_QUERY_PATH_INFORMATION requests
Summary: OS/2 Client hangs at heavy SMB trafic with nested TRANS2_SET_PATH_INFORMATION...
Status: RESOLVED FIXED
Alias: None
Product: Samba 3.0
Classification: Unclassified
Component: File Services (show other bugs)
Version: 3.0.20
Hardware: x86 Linux
: P3 normal
Target Milestone: none
Assignee: Jeremy Allison
QA Contact: Samba QA Contact
URL: http://www.maazl.de/up/samba-bug.log
Keywords:
Depends on:
Blocks:
 
Reported: 2005-07-16 06:03 UTC by Marcel Müller
Modified: 2006-04-09 10:34 UTC (History)
2 users (show)

See Also:


Attachments
ethereal capture, done when multithreaded os/2 app was running (172.20 KB, application/octet-stream)
2005-10-20 20:49 UTC, Guenter Kukkukk
no flags Details
Writing 8565 EA bytes from OS/2 - OK (10.09 KB, application/octet-stream)
2006-03-27 18:54 UTC, Guenter Kukkukk
no flags Details
Wrinting 8566 bytes EAs - failure (10.33 KB, application/octet-stream)
2006-03-27 18:56 UTC, Guenter Kukkukk
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Marcel Müller 2005-07-16 06:03:31 UTC
When I open a folder with digicam images on an SMB shared by the Linux smbd
server with OS/2 PMView (a picture viewer) the first time, PMView creates
thumbnails of the images in the .ICON extended attribute.

However this is very likely to fail (hang).

PMView is a multitreaded application. It starts with the craetion of the images
while still scanning the folder for already created thumbnails. In fact this
creates heavy SMB trafic.

It fails more often when the server is slow, e.g. either because of log level 10
or other system load. I think turing on the read_bmpx feature let this happen
less often but this requires another very recent fix to avoid data corruption.

The tcpdump shows that there is always one TRANS2_QUERY_PATH_INFORMATION request
without a reply. In a more detailed analysis it turned out that the
TRANS2_QUERY_PATH_INFORMATION Request is sent before the previous
TRANS2_SET_PATH_INFORMATION is complete. On the server side the logfile shows
something like
[2005/07/15 22:47:43, 0] smbd/trans2.c:reply_trans2(5009)
  reply_trans2: Invalid secondary trans2 packet
at the same time. (A longer excerpt from the log is available under the above url.)

I think the server cannot handle the multiplexed commands form the same
application correctly. Maybe the transs2 decoding cannot demultiplex based on
the MID.

Unfortunately I cannot track this down in the source code because I never
understood how the handling of secondary trans2 packages is implemented.


Marcel
Comment 1 Jeremy Allison 2005-07-18 08:56:19 UTC
What I need from you is an ethereal capture trace showing this
happening. I believe the problem - I even have some ideas on how
we can fix this but I must tell you it'll probably be in the 
3.0.21 or later timeframe. But please send me the trace as I
need to be able to see this.
Thanks,
Jeremy.
Comment 2 Volker Lendecke 2005-08-15 12:22:50 UTC
Jeremy, could this be the "receive_next_smb must die" problem? This might be one
of the last stumbling blocks towards full async-ness, although it will probably
not be easy.

Volker
Comment 3 Jeremy Allison 2005-08-15 12:46:35 UTC
Yes that's exactly what it is. I'm putting this off until after the RPC auth
rewrite.
Jeremy.
Comment 4 Guenter Kukkukk 2005-10-20 20:45:43 UTC
Hello all, 
I've written a multithreaded OS/2 test applet, to check for this condition. 
When running, after about 3 seconds, one thread is hanging and the debug0 
log contains: 
    smbd version 3.0.21pre1-SVN-build-10977 started. 
  Copyright Andrew Tridgell and the Samba Team 1992-2005 
[2005/10/21 04:56:41, 0] smbd/trans2.c:reply_trans2(4864) 
  reply_trans2: Invalid secondary trans2 packet 
----------------- 
The app is doing the following: 
  - thread1 is writing xattr data to file0 ... file100, one file after each 
other, then starts over with file0 
  - thread2 is doing findfirst/findnext to query the filenames in the same 
directory (then closes the searchhandle and starts over again) 
 
I'll send the ethereal capture 
"os2_samba3_multithreaded_invalid-secondary-trans2-packet.cap" 
 
Best wishes. 
Guenter Kukkukk 
 
Comment 5 Guenter Kukkukk 2005-10-20 20:49:20 UTC
Created attachment 1531 [details]
ethereal capture, done when multithreaded os/2 app was running
Comment 6 Jeremy Allison 2005-10-20 22:24:52 UTC
We know what this problem is, it's just a little tricky to fix right now. Now we
have Volker's asynchronous open code it's possible, but probably we'll target
this for 3.0.22, not 3.0.21.
Jeremy.
Comment 7 Volker Lendecke 2006-03-25 09:59:29 UTC
With Revision 14719 I've put a patch into trunk that should fix this problem. Maybe you might want to try?

Volker
Comment 8 Guenter Kukkukk 2006-03-26 21:22:00 UTC
Hi Volker,

thanks a lot working on this stuff! :-)
Lots of side-effects could arise...

All my comments are related to:
Current samba_3 version: 14703
Current samba-trunk: 14726

The new samba-trunk version is _much_ more thread friendly.
Til now, I haven't noticed any related glitches...

BUT - _new_ worms have crept into the basic stuff.

When comparing current samba_3 to samba-trunk, (at least) the
setting of EAs is corrupted now.

Assuming a samba file system being able to store about 64KB
of EAs (reiserfs, XFS), the latest trunk version is limiting
the max. EA size to about 8570 bytes - due to internal errors.

I had a short look at that behaviour...
In smbd/trans2.c line 5334

                if ((smb_base(inbuf) + doff + dcnt >= inbuf + bufsize) ||
                    (smb_base(inbuf) + doff + dcnt < smb_base(inbuf)))
                        goto bad_param;
 
the "goto bad_param" link is taken due to 
             if ((smb_base(inbuf) + doff + dcnt >= inbuf + bufsize)

Oh well, looking at former similar occurences, I thought, that the
">=" operator should only read ">". (but that doesn't help)

But my very 1st look doesn't cure the problem. :-(

But I guess, it's related...
Sorry, I had only very limited time atm.

Best wishes.

Guenter Kukkukk

btw
if ((smb_base(inbuf) + doff + dcnt > inbuf + bufsize + 4 ) !!!
brought back the former working behaviour (not talking about bugs)
Comment 9 Guenter Kukkukk 2006-03-27 18:46:53 UTC
Hi Volker,

was looking again at the new "8565 EA size limit" in
current samba-trunk.

The 1st "set EA" wire frame always succeeds, but then the
"Secondary request" (when needed - and has a special size)
_always_ fails.

To track that down a bit more, I wrote an OS/2 test applet,
which allows byte size granularity on writing a single EA.
(and small sniffs and debuglogs!)

When viewing from the OS/2 side (depending on EA name length
and internal stuff - about 10 or more bytes are send):
 - writing 8565 bytes EAs succeeds
 - writing 8566 bytes (or more) _always_ fail

To catch some info from the 1st packet, I added a DEBUG() entry
into smbd/trans2.c function reply_trans2(): 
...
	if (state->total_data) {
		DEBUG(0,("GKK: reply_trans2: dsoff: %u dscnt: %u size: %u\n",
		 dsoff, dscnt, size));
				
		/* Can't use talloc here, the core routines do realloc on the
		 * params and data. */
		state->data = SMB_MALLOC(state->total_data);
		if (state->data == NULL) {
			DEBUG(0,("reply_trans2: data malloc fail for %u "
				 "bytes !\n", state->total_data));
			TALLOC_FREE(state);
			END_PROFILE(SMBtrans2);
			return(ERROR_DOS(ERRDOS,ERRnomem));
		}
		if ((dsoff+dscnt < dsoff) || (dsoff+dscnt < dscnt))
			goto bad_param;
		if ((smb_base(inbuf)+dsoff+dscnt > inbuf + size) ||
		    (smb_base(inbuf)+dsoff+dscnt < smb_base(inbuf)))
			goto bad_param;

		memcpy(state->data,smb_base(inbuf)+dsoff,dscnt);
	}
.....

I added some DEBUG statements to smbd/trans2.c function reply_transs2()
to catch the "Secondary request" misbehaviour:
...
 
        if (dcnt) {
 
                if (ddisp+dcnt > state->total_data)
 
                        goto bad_param;
 
                if ((ddisp+dcnt < ddisp) || (ddisp+dcnt < dcnt))
 
                        goto bad_param;
 
                if (ddisp > state->total_data)
 
                        goto bad_param;
 
                DEBUG(0,("GK0: reply_transs2: doff: %u dcnt: %u doff+dcnt: %u\n",
 
                         doff, dcnt, doff + dcnt));
 
                DEBUGADD(0,("bufsize: %u\n", bufsize));
 
                DEBUGADD(0,("smb_base(inbuf)+doff+dcnt: 0x%08x "
 
                            ">= inbuf+bufsize: 0x%08x\n\n",
 
                            smb_base(inbuf)+doff+dcnt, inbuf+bufsize));
 
                if ((smb_base(inbuf) + doff + dcnt >= inbuf + bufsize) ||
 
                    (smb_base(inbuf) + doff + dcnt < smb_base(inbuf))) {
 
                        DEBUGADD(0,("GK1: error taken!\n\n"));
 
                        goto bad_param;
 
                }
 
                if (state->data + ddisp < state->data)
 
                        goto bad_param;
 
                memcpy(state->data+ddisp, smb_base(inbuf)+doff,
 
                       dcnt);      
        }
....

When writing 8565 EA bytes from OS/2:

[2006/03/28 02:18:16, 0] smbd/trans2.c:reply_trans2(5198)
  GKK: reply_trans2: dsoff: 76 dscnt: 4280 size: 4360
[2006/03/28 02:18:16, 0] smbd/trans2.c:reply_transs2(5339)
  GK0: reply_transs2: doff: 56 dcnt: 4295 doff+dcnt: 4351
  bufsize: 4356
  smb_base(inbuf)+doff+dcnt: 0x4068010b >= inbuf+bufsize: 0x4068010c

BUT - when writing 8566 EA bytes (one byte more):

[2006/03/28 02:21:40, 0] smbd/trans2.c:reply_trans2(5198)
  GKK: reply_trans2: dsoff: 76 dscnt: 4280 size: 4360
[2006/03/28 02:21:40, 0] smbd/trans2.c:reply_transs2(5339)
  GK0: reply_transs2: doff: 56 dcnt: 4296 doff+dcnt: 4352
  bufsize: 4356
  smb_base(inbuf)+doff+dcnt: 0x4068010c >= inbuf+bufsize: 0x4068010c

  GK1: error taken!

[2006/03/28 02:21:40, 0] smbd/trans2.c:reply_transs2(5384)
  reply_transs2: invalid trans parameters
...
The ">=" might need to be replaced with ">" - but that doesn't
help!

Now writing 8567 EA bytes (two bytes more.. beyond success):
[2006/03/28 02:25:26, 0] smbd/trans2.c:reply_trans2(5198)
  GKK: reply_trans2: dsoff: 76 dscnt: 4280 size: 4360
[2006/03/28 02:25:26, 0] smbd/trans2.c:reply_transs2(5339)
  GK0: reply_transs2: doff: 56 dcnt: 4297 doff+dcnt: 4353
  bufsize: 4356
  smb_base(inbuf)+doff+dcnt: 0x4068010d >= inbuf+bufsize: 0x4068010c

  GK1: error taken!

[2006/03/28 02:25:26, 0] smbd/trans2.c:reply_transs2(5384)
  reply_transs2: invalid trans parameters
....

Any more EA bytes always fail at this "secondary packet".

I'll send two (very short) sniffs taken, when my "oneea.exe"
applet was running writing 8565 (OK) and 8566 (FAIL) bytes of EAs.

Volker - don't give up on this ongoing approach to get rid
of "receive_next_smb"! I'll also write windows applets, to check
for the "multi threaded" behaviour of samba3...
(Lots of stuff is affected by your trunk changes..)

Best wishes - Guenter Kukkukk

btw - the initially mentioned OS/2 program PMView is showing
"Cannot create thumbnails"
Comment 10 Guenter Kukkukk 2006-03-27 18:54:54 UTC
Created attachment 1825 [details]
Writing 8565 EA bytes from OS/2 - OK
Comment 11 Guenter Kukkukk 2006-03-27 18:56:27 UTC
Created attachment 1826 [details]
Wrinting 8566 bytes EAs - failure
Comment 12 Volker Lendecke 2006-04-09 10:34:31 UTC
This is fixed in trunk now. Needs merging to 3_0, I'd like Jeremy to take a close look.

Volker