Bug 3349 - After upgrading from 3.0.20b to 3.0.21 clients hang
Status: RESOLVED FIXED
Alias: None
Product: Samba 3.0
Classification: Unclassified
Component: File Services
Version: 3.0.21
Hardware: x86 Linux
Importance: P3 critical
Target Milestone: none
Assignee: Gerald (Jerry) Carter (dead mail address)
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-12-22 04:28 UTC by Rudolf Kollien
Modified: 2005-12-28 09:24 UTC

See Also:


Attachments
loglevel 5 log w2k3 (69.32 KB, application/octet-stream)
2005-12-23 05:08 UTC, Marc Groot Koerkamp
loglevel 5 log its_lt_01 (54.13 KB, application/octet-stream)
2005-12-23 05:11 UTC, Marc Groot Koerkamp
Loglevel 10 (437.59 KB, application/octet-stream)
2005-12-23 10:18 UTC, Marc Groot Koerkamp
become_root pair (712 bytes, patch)
2005-12-23 15:37 UTC, Volker Lendecke
More become_root/unbecome_root pairs necessary (2.33 KB, patch)
2005-12-23 15:48 UTC, Volker Lendecke

Description Rudolf Kollien 2005-12-22 04:28:28 UTC
After upgrading from 3.0.20b to 3.0.21, clients (win98/win2k) now hang when they simultaneously access the same share and files. There was no change in the configure options between building 20b and 21, and there was no change in smb.conf between the updates. The share has a very simple config:

[reu]
  comment = XXXXXXXX
  path = /u/samba/pc/reu
  public = no
  writeable = yes
  printable = no
  create mode = 660
  directory mode = 770
  valid users = @reu
  force group = reu

The first client (win98/win2k) to connect can work as expected. Every additional client hangs when the application tries to open files in the same share too. Accessing the directory listing with the windows explorer is still possible, so general access to the share seems ok. But accessing files with locks may be the problem. As this is a production server, I'm unfortunately unable to do any deeper debugging. I downgraded to 20b and all is ok now.

My configure options for building the samba executables:

./configure --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc/samba3 --with-privatedir=/etc/samba3 \
--libdir=/etc/samba3 --with-libdir=/etc/samba3 \
--with-logfilebase=/var/log/samba3 --with-lockdir=/var/lock \
--with-piddir=/var/run --with-automount --with-msdfs --with-vfs --enable-cups \
--with-acl-support --with-quotas --with-libsmbclient

Running on SuSE Linux 9.0
Kernel 2.4.21-303-smp
gcc 3.3.1
Comment 1 Volker Lendecke 2005-12-22 04:59:25 UTC
That's a bit too little information. To fix this, we do need logfiles. Is it possible to set up a separate machine with 3.0.21 and provide debug level 10 logfiles from the failing case?

Thanks,

Volker
Comment 2 Rudolf Kollien 2005-12-22 05:46:59 UTC
I'm sorry, but I can't do that. The applications which cause the problem (and are the only applications we are still running) are too complex to install (and cannot be licensed for other client machines). The amount of data needed is too big to transfer, and we no longer have a second machine to test with.

Maybe I can reactivate 3.0.21 for about an hour, test with two clients, and get logfiles. But beware: from previous debug sessions with samba I know that the domain logon by itself generates log entries in the high-MB range. As our mail system limits file sizes to 3MB, we may have to find another way to submit the logs.
Comment 3 Volker Lendecke 2005-12-22 06:20:19 UTC
If you put 'include = /etc/samba/smb.conf.%I' into your main smb.conf, then you can increase the debug level for individual clients if you create /etc/samba/smb.conf.192.168.1.1 for example with the content

debug level = 10
log file = /var/log/samba/log.%m
debug hires timestamp = yes

you get individual logfiles per client and don't overload the var directory with everything else at debug level 10
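
A minimal sketch of creating such a per-client file, written in Python purely for illustration. The three settings are the ones listed above; the helper name write_client_debug_conf and its arguments are hypothetical, and the directory has to match whatever path the include line in the main smb.conf uses.

```python
from pathlib import Path

# Settings copied from the comment above. Everything else here is an
# illustrative assumption, not part of Samba itself.
CLIENT_DEBUG_SETTINGS = (
    "debug level = 10\n"
    "log file = /var/log/samba/log.%m\n"
    "debug hires timestamp = yes\n"
)


def write_client_debug_conf(conf_dir, client_ip):
    """Create smb.conf.<client_ip> in conf_dir.

    With 'include = <conf_dir>/smb.conf.%I' in the main smb.conf, only the
    client connecting from client_ip picks up these settings.
    """
    path = Path(conf_dir) / f"smb.conf.{client_ip}"
    path.write_text(CLIENT_DEBUG_SETTINGS)
    return path
```

Calling write_client_debug_conf("/etc/samba", "192.168.1.1") would produce the file named in the example above. Writing there normally requires root.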
Comment 4 Rudolf Kollien 2005-12-22 06:53:12 UTC
Every client currently has its own logfile. The problem is that with debuglevel=5 the domain logon by itself produces a logfile > 6MB. Nothing but the system startup/logon is debugged at this time. Running the app in question, which seems to open/read more than 200 files at startup, produces a logfile of ~15 to ~20 MB BEFORE the error occurs. If I'm not able to reduce the files read at startup, you will get a logfile of >40 MB per client.
Comment 5 Volker Lendecke 2005-12-22 07:17:00 UTC
Yes, and where's the problem? :-) I've had to walk through logfiles approaching the Gigabyte limit. The only problem there is that searching for strings needs a powerful CPU..... :-)

It can't be disk space, and bandwidth is cheap these days as well. BTW, bzip2 can *really* shrink samba log files.

Volker
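
For anyone who wants to script the compression step, here is a small sketch using Python's standard bz2 module at its highest compression level, which corresponds to bzip2 -9. The helper name compress_log is hypothetical. Unlike the bzip2 command line tool, this sketch leaves the original file in place.

```python
import bz2
import shutil


def compress_log(log_path):
    """Write <log_path>.bz2 at compression level 9 and return the new path.

    The uncompressed log is not deleted.
    """
    out_path = str(log_path) + ".bz2"
    with open(log_path, "rb") as src, bz2.open(out_path, "wb", compresslevel=9) as dst:
        # Stream the file in chunks so large logs do not have to fit in memory.
        shutil.copyfileobj(src, dst)
    return out_path
```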
Comment 6 Marc Groot Koerkamp 2005-12-22 10:00:24 UTC
Just wanted to say I had the same problem with previous 3.0.21 releases (winXP SP2 clients with all updates and domain member of the samba server). Switching back to 3.0.20b solved the problem. I assume it has something to do with the new oplock system in 3.0.21.
Comment 7 Gerald (Jerry) Carter (dead mail address) 2005-12-22 10:13:00 UTC
Marc, did you report this?  This is the first I'm hearing of it.  
We fixed one such issue prior to 3.0.21.

Can anyone get a backtrace ?  or an strace?
Comment 8 Rudolf Kollien 2005-12-22 10:49:22 UTC
Got some logfiles with debuglevel 10.
Scenario:

2 x Windows98SE
samba 3.0.21, from source tarball on SuSE 9.0, kernel 2.4.21-303-smp
An application named "RA-Micro", mostly written in Visual Basic, accessing files placed on a shared samba network drive:

[ra_micro]
  comment = RA-Micro Verzeichnis
  path = /u/samba/pc/ra_micro
  public = no
  writeable = yes
  printable = no
  create mode = 660
  directory mode = 770
  valid users = @inkasso
  force group = inkasso

Remarks: This is not the only app which causes the problem, but it is the only one I can handle (user, password, etc.). Other apps sharing files read/write on a network drive are also affected.

Testdrive:

First I started the PC named PC055. The system logged on to the samba domain controller and ran all desired apps including RA-Micro. Then, after successful startup of PC055, I started PC058. PC058 logged in to the samba domain controller like PC055 before. Some apps started up, until RA-Micro had to be launched. Then nothing more happened. You can launch the windows explorer and work (slowly) with the open apps, but RA-Micro doesn't start up. In the task manager you can see that a process is running, but nothing appears on the screen. Then I closed RA-Micro on PC055 (the first one to start up), and a few seconds later RA-Micro on PC058 started up. This is what you can see in the logfiles.

For security reasons: the passwords were changed before the debugging and reset to the originals after finishing :-)

I agree, it might be a bug related to the (op)locks.

It doesn't matter if the OS is Win98 or Win2k. Win2k isn't able to run RA-Micro properly on samba. Every file access produces the complaint "File not found. Abort - Retry". With "retry" you can access the file(s). I didn't trace this down, as that would require too much user interaction with production data.
Comment 9 Rudolf Kollien 2005-12-22 11:00:54 UTC
Hmm, I tried to submit the logs as an attachment, but they are too big. To whom should I send them?
Comment 10 Volker Lendecke 2005-12-22 11:22:38 UTC
Easiest would be if you put them on your webserver and sent a URL. If that does not work, send them to vl at sernet dot de, I'll forward them appropriately. But please first bzip2 -9 them.

Thanks,

Volker
Comment 11 Rudolf Kollien 2005-12-22 11:32:39 UTC
I bzipped them, but each log is still a few bytes over 1MB. I sent them to you directly. I preferred this just to be sure not to be too public with some internal data.
Comment 12 Marc Groot Koerkamp 2005-12-23 05:08:21 UTC
Created attachment 1631
loglevel 5 log w2k3

Logfile containing the log of opening a word document test.doc which was already open on another computer (its_lt_01).
This log also contains opening test6.txt, which could be saved by both computers while it was open on both computers! (with notepad)
Comment 13 Marc Groot Koerkamp 2005-12-23 05:11:01 UTC
Created attachment 1632
loglevel 5 log its_lt_01

Logfile containing the log of an opened word document test.doc which is opened at a later stage by another computer (its-2k3).
This log also contains opening test6.txt, which could be saved by both computers while it was open on both computers! (with notepad)
Comment 14 Marc Groot Koerkamp 2005-12-23 05:14:33 UTC
Okay, I tested the released 3.0.21 version again and created the attached log files.
The difference from my previous problem is that with notepad I can open .txt files on more than 1 computer at the same time. There is no timeout as happens with word documents. However, I can save the specific .txt on both computers. That means that somehow there is no write lock.
Comment 15 Volker Lendecke 2005-12-23 06:41:09 UTC
This looks wrong:

[2005/12/15 19:25:09, 0] smbd/oplock.c:request_oplock_break(1052)
  request_oplock_break: failed when sending a oplock break message to pid 15390 on port 0 for dev = 1605, inode = 26657445, file_id = 1
  Error was Invalid argument

This is not 3.0.21 code.... smbd/oplock.c 3.0.21 has only 725 lines, the line 1052 you refer to is valid for 3.0.20 code.

Can you reproduce this with a running 3.0.21 smbd?

And, debug level 5 is not enough, please provide debug level 10 logs.

Thanks,

Volker
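
The check described above (a log header pointing at a line number that does not exist in the installed source) can be automated. The following is a rough sketch in Python. It assumes log headers of the form shown in the quoted message; the function name lines_beyond is hypothetical. The 725 in the usage note is the line count of smbd/oplock.c in 3.0.21 mentioned above.

```python
import re

# Matches smbd log headers such as
# "[2005/12/15 19:25:09, 0] smbd/oplock.c:request_oplock_break(1052)"
HEADER = re.compile(
    r"^\[(?P<stamp>[^,\]]+),\s*(?P<level>\d+)\]\s+"
    r"(?P<source>[^:\s]+):(?P<func>\w+)\((?P<line>\d+)\)"
)


def lines_beyond(log_text, source_file, max_line):
    """Return (timestamp, function, line) for each header in log_text that
    refers to a line past max_line in source_file. Such entries cannot have
    been written by a build whose copy of that file is max_line lines long."""
    hits = []
    for raw in log_text.splitlines():
        m = HEADER.match(raw.strip())
        if m and m.group("source") == source_file and int(m.group("line")) > max_line:
            hits.append((m.group("stamp"), m.group("func"), int(m.group("line"))))
    return hits
```

For example, lines_beyond(log_text, "smbd/oplock.c", 725) lists the entries that must have come from an older smbd, together with their timestamps, which helps to find where the 3.0.21 part of a mixed logfile begins.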
Comment 16 Marc Groot Koerkamp 2005-12-23 06:54:58 UTC
Volker, the beginning of log its_lt_01 is from 3.0.20b. The log with 3.0.21 probably starts around 2005/12/23 13:05.

Tonight I will try to make a loglevel 10 log.
Comment 17 Volker Lendecke 2005-12-23 07:06:12 UTC
If you do, please stop smbd, make sure you have 3.0.21 installed, delete all
logfiles, start smbd. Then do (and please describe) the steps you do to
reproduce the problem.

BTW, please also set 'max log size = 0' during your tests, your logfiles seem
truncated. And bzip2 -9 them :-)

Thanks,

Volker
Comment 18 Marc Groot Koerkamp 2005-12-23 10:18:24 UTC
Created attachment 1633
Loglevel 10

Okay, here are the loglevel 10 log files.
What I did is the following:
* removed all old logfiles
* started samba 3.0.21
* synchronised the its_lt_01 laptop in order to access the samba share (/data/temp) where the test files are located
* Opened test.doc on its_lt_01 (XP SP2, domain member of ITS domain on samba 3.0.21)
* Opened test.doc on its-2k3 (win2k3 server which is a domain member of the ITS domain on samba 3.0.21)
* Opened test6.txt on its_lt_01
* Opened test6.txt on its-2k3
* Changed test6.txt on its-2k3 and saved (shouldn't be possible)
* Closed test6.txt and reopened it on its_lt_01 and checked the change made on its-2k3 (it actually changed which is weird because its_lt_01 opened it first)
* Waited till word timed out on its-2k3
* shutdown samba

Both its_lt_01 and its-2k3 were logged in to the ITS domain while samba 3.0.20b was still running. I didn't let them log in again cleanly on 3.0.21 because profiles and synchronisation generates a lot of noise in the logfiles.
They simply reconnected when samba 3.0.21 was started.
Comment 19 Volker Lendecke 2005-12-23 15:37:54 UTC
Created attachment 1634
become_root pair

Could you try the attached patch?

Thanks,

Volker
Comment 20 Volker Lendecke 2005-12-23 15:48:46 UTC
Created attachment 1635
More become_root/unbecome_root pairs necessary

There are more places where this kind of patch is necessary. Please try this new one.

Thanks,

Volker
Comment 21 Jeremy Allison 2005-12-24 13:52:03 UTC
Volker, can you please apply these fixes to the HEAD and 3.0 SVN trees.
I'd like to see everything in place.
Thanks,
Jeremy.
Comment 22 Marc Groot Koerkamp 2005-12-25 03:22:25 UTC
Volker, on 27 December I will try the patch and report back whether it solves the issue.
Comment 23 Daniel Beschorner (dead mail address) 2005-12-26 10:32:21 UTC
We have many "Trying to delay for oplocks twice" in the logs and concurrent accesses on files fail for all but the first one with "network error" on the client.
Maybe related to this one, I'll try the patch.
Comment 24 Marc Groot Koerkamp 2005-12-27 02:03:29 UTC
Volker, the patch fixed the issue I had with word documents. The .txt documents can still be opened on multiple computers without being locked. I don't know what samba's behaviour was before 3.0.21, and it's not really a problem for me.
Thanks for fixing the issue.
Comment 25 Thomas Bork 2005-12-28 08:04:48 UTC
Please add the patch to

http://usX.samba.org/samba/patches/

der tom
Comment 26 Gerald (Jerry) Carter (dead mail address) 2005-12-28 09:24:22 UTC
patch applied to all branches now.  Will be in 3.0.21a