Bug 1650 - Locking database problem Samba 3.0.6
Summary: Locking database problem Samba 3.0.6
Status: RESOLVED LATER
Alias: None
Product: Samba 3.0
Classification: Unclassified
Component: File Services (show other bugs)
Version: 3.0.6
Hardware: All Linux
: P3 critical
Target Milestone: none
Assignee: Gerald (Jerry) Carter (dead mail address)
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2004-08-23 06:28 UTC by Denis Kirin
Modified: 2005-05-11 05:07 UTC (History)
1 user (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Denis Kirin 2004-08-23 06:28:00 UTC
Good day.

I get Samba3.0.6 from samba.org site (src.rpm package), before i used Samba3.0.4 

After upgrade to 3.0.6 I have trouble with LOCKING database (I Think so...)
There is a listing of smbstatus on my host.
Look at the pid 15913... My Linux system (via ps -A) has no such pid but 
smbstatus shows that process 15913 exist and has open files... (see open files 
because top view of smbstatus has no pid 15913... )
The process 15913 was died... but smbstatus show it and user can not get full 
acces to his files.

Short listing of SMSTATUS:

Samba version 3.0.6
PID     Username      Group         Machine                        
-------------------------------------------------------------------
15916   tany          domusers      bu2          (10.0.0.28)
16961   alex          domusers      zal2         (10.0.0.17)
15904   buh3          domusers      zal1         (10.0.0.12)
16903   buh2          domusers      zimba        (10.0.0.57)
10823   elena         domusers      elena        (10.0.0.27)
15890   vit           domusers      sales        (10.0.0.15)
16458   serg          domusers      sergey       (10.0.0.26)
15972   acc1          domusers      wwworker     (10.0.0.56)
15907   dina          domusers      rc           (10.0.0.25)
15909   buh4          domusers      buhbuh       (10.0.0.11)
10823   root          root          elena        (10.0.0.27)

Service      pid     machine       Connected at
-------------------------------------------------------
docs         15904   zal1          Mon Aug 23 09:50:00 2004
alex         16961   zal2          Mon Aug 23 16:29:32 2004
docs         16458   sergey        Mon Aug 23 13:10:33 2004
tany         15916   bu2           Mon Aug 23 08:43:59 2004
base         15909   buhbuh        Mon Aug 23 08:37:10 2004
tany         15916   bu2           Mon Aug 23 16:38:03 2004
----/////////---------------------
15909  DENY_NONE  0x2019f     RDWR       NONE             
/data/base/160404/SC5847.DBF   Mon Aug 23 15:30:01 2004
15904  DENY_NONE  0x2019f     RDWR       NONE             
/data/base/160404/SC5847.DBF   Mon Aug 23 09:24:27 2004
15907  DENY_NONE  0x2019f     RDWR       NONE             
/data/base/160404/SC5847.DBF   Mon Aug 23 08:52:06 2004
15916  DENY_NONE  0x2019f     RDWR       NONE             
/data/base/160404/SC5847.DBF   Mon Aug 23 08:45:10 2004
15909  DENY_NONE  0x2019f     RDWR       NONE             /data/base/160404/CL.
DBF   Mon Aug 23 15:30:08 2004
15913  DENY_NONE  0x2019f     RDWR       NONE             /data/base/160404/CL.
DBF   Mon Aug 23 10:31:43 2004
15904  DENY_NONE  0x2019f     RDWR       NONE             /data/base/160404/CL.
DBF   Mon Aug 23 09:24:40 2004
15907  DENY_NONE  0x2019f     RDWR       NONE             /data/base/160404/CL.
DBF   Mon Aug 23 08:52:17 2004
15916  DENY_NONE  0x2019f     RDWR       NONE             /data/base/160404/CL.
DBF   Mon Aug 23 08:45:21 2004
15909  DENY_NONE  0x2019f     RDWR       NONE             
/data/base/160404/DT6704.DBF   Mon Aug 23 15:30:06 2004
15904  DENY_NONE  0x2019f     RDWR       NONE             
/data/base/160404/DT6704.DBF   Mon Aug 23 09:24:37 2004
15907  DENY_NONE  0x2019f     RDWR       NONE             
/data/base/160404/DT6704.DBF   Mon Aug 23 08:52:14 2004
15916  DENY_NONE  0x2019f     RDWR       NONE             
/data/base/160404/DT6704.DBF   Mon Aug 23 08:45:18 2004
15909  DENY_NONE  0x2019f     RDWR       NONE             
/data/base/160404/SC6065.DBF   Mon Aug 23 15:30:02 2004
15904  DENY_NONE  0x2019f     RDWR       NONE             
/data/base/160404/SC6065.DBF   Mon Aug 23 09:24:29 2004
15907  DENY_NONE  0x2019f     RDWR       NONE             
/data/base/160404/SC6065.DBF   Mon Aug 23 08:52:07 2004
15916  DENY_NONE  0x2019f     RDWR       NONE             
/data/base/160404/SC6065.DBF   Mon Aug 23 08:45:11 2004
16903  DENY_NONE  0x2019f     RDWR       NONE             
/data/base/160404/SC5771.CDX   Mon Aug 23 16:38:31 2004
15909  DENY_NONE  0x2019f     RDWR       NONE             
/data/base/160404/SC5771.CDX   Mon Aug 23 15:30:00 2004
15904  DENY_NONE  0x2019f     RDWR       NONE             
/data/base/160404/SC5771.CDX   Mon Aug 23 09:24:26 2004
15907  DENY_NONE  0x2019f     RDWR       NONE             
/data/base/160404/SC5771.CDX   Mon Aug 23 08:52:05 2004
15916  DENY_NONE  0x2019f     RDWR       NONE             
/data/base/160404/SC5771.CDX   Mon Aug 23 08:45:09 2004
15909  DENY_NONE  0x2019f     RDWR       NONE             
/data/base/160404/DT25407.CDX   Mon Aug 23 15:30:07 2004
15913  DENY_NONE  0x2019f     RDWR       NONE             
/data/base/160404/DT25407.CDX   Mon Aug 23 10:31:40 2004
15904  DENY_NONE  0x2019f     RDWR       NONE             
/data/base/160404/DT25407.CDX   Mon Aug 23 09:24:38 2004
15907  DENY_NONE  0x2019f     RDWR       NONE             
/data/base/160404/DT25407.CDX   Mon Aug 23 08:52:15 2004
15916  DENY_NONE  0x2019f     RDWR       NONE             
/data/base/160404/DT25407.CDX   Mon Aug 23 08:45:18 2004
15909  DENY_NONE  0x2019f     RDWR       NONE             
/data/base/160404/DT2998.DBF   Mon Aug 23 15:30:06 2004


History of process 15913
 
[2004/08/23 11:10:51, 1] smbd/service.c:close_cnum(837)
  zimba (10.0.0.57) closed connection to service HpLj1100
[2004/08/23 13:09:17, 0] lib/smbldap.c:smbldap_connect_system(796)
  failed to bind to server with dn= cn=Manager,dc=nego,dc=ru Error: Can't contact 
LDAP server
  	(unknown)
[2004/08/23 13:09:17, 0] lib/smbldap.c:smbldap_search_suffix(1101)
  smbldap_search_suffix: Problem during the LDAP search: (unknown) (Timed out)
[2004/08/23 13:09:33, 0] lib/smbldap.c:smbldap_search_suffix(1101)
  smbldap_search_suffix: Problem during the LDAP search: (unknown) (Timed out)
[2004/08/23 13:09:33, 0] rpc_server/srv_netlog_nt.c:get_md4pw(244)
  get_md4pw: Workstation ZIMBA$: no account in domain
[2004/08/23 13:09:48, 0] lib/smbldap.c:smbldap_search_suffix(1101)
  smbldap_search_suffix: Problem during the LDAP search: (unknown) (Timed out)
[2004/08/23 13:09:48, 0] rpc_server/srv_netlog_nt.c:get_md4pw(244)
  get_md4pw: Workstation ZIMBA$: no account in domain
[2004/08/23 14:55:42, 1] smbd/service.c:make_connection_snum(648)
  zimba (10.0.0.57) connect to service base initially as user buh2 (uid=1011, 
gid=513) (pid 15913)
[2004/08/23 14:56:39, 1] smbd/service.c:close_cnum(837)
  zimba (10.0.0.57) closed connection to service base
[2004/08/23 15:17:24, 1] smbd/service.c:make_connection_snum(648)
  zimba (10.0.0.57) connect to service base initially as user buh2 (uid=1011, 
gid=513) (pid 15913)
[2004/08/23 15:18:16, 1] smbd/service.c:close_cnum(837)
  zimba (10.0.0.57) closed connection to service base
[2004/08/23 15:22:00, 0] lib/smbldap.c:smbldap_connect_system(796)
  failed to bind to server with dn= cn=Manager,dc=nego,dc=ru Error: Can't contact 
LDAP server
  	(unknown)
[2004/08/23 15:22:00, 0] lib/smbldap.c:smbldap_search_suffix(1101)
  smbldap_search_suffix: Problem during the LDAP search: (unknown) (Timed out)
[2004/08/23 15:22:00, 1] smbd/service.c:make_connection_snum(648)
  zimba (10.0.0.57) connect to service HpLj1100 initially as user buh2 (uid=1011, 
gid=513) (pid 15913)
[2004/08/23 15:23:23, 1] smbd/service.c:close_cnum(837)
  zimba (10.0.0.57) closed connection to service HpLj1100
[2004/08/23 16:06:56, 0] lib/fault.c:fault_report(36)
  ===============================================================
[2004/08/23 16:06:56, 0] lib/fault.c:fault_report(37)
  INTERNAL ERROR: Signal 11 in pid 15913 (3.0.6)
  Please read the appendix Bugs of the Samba HOWTO collection
[2004/08/23 16:06:56, 0] lib/fault.c:fault_report(39)
  ===============================================================
[2004/08/23 16:06:56, 0] lib/util.c:smb_panic2(1385)
  PANIC: internal error
[2004/08/23 16:06:56, 0] lib/util.c:smb_panic2(1393)
  BACKTRACE: 38 stack frames:
   #0 smbd(smb_panic2+0x111) [0x81d4ea1]
   #1 smbd(smb_panic+0x1a) [0x81d4d8a]
   #2 smbd [0x81c2c88]
   #3 /lib/tls/libc.so.6 [0x192b88]
   #4 /usr/lib/libldap.so.2(ldap_int_open_connection+0x253) [0xf84163]
   #5 /usr/lib/libldap.so.2(ldap_new_connection+0x89) [0xf96019]
   #6 /usr/lib/libldap.so.2(ldap_open_defconn+0x41) [0xf83ae1]
   #7 /usr/lib/libldap.so.2(ldap_send_initial_request+0x8f) [0xf95b8f]
   #8 /usr/lib/libldap.so.2(ldap_sasl_bind+0x177) [0xf8bb07]
   #9 /usr/lib/libldap.so.2(ldap_sasl_bind_s+0x9b) [0xf8bddb]
   #10 /usr/lib/libldap.so.2(ldap_simple_bind_s+0x6c) [0xf8c60c]
   #11 smbd [0x82445bd]
   #12 smbd [0x824487a]
   #13 smbd [0x8244bcb]
   #14 smbd(smbldap_search+0x12a) [0x8244d8a]
   #15 smbd(smbldap_search_suffix+0x44) [0x8245494]
   #16 smbd [0x81ac81c]
   #17 smbd [0x81afd74]
   #18 smbd [0x81afe1c]
   #19 smbd [0x81a6d85]
   #20 smbd(pdb_getsampwsid+0x35) [0x81a8ac5]
   #21 smbd(make_server_info_guest+0x89) [0x82124c9]
   #22 smbd [0x8210382]
   #23 smbd [0x820afaf]
   #24 smbd [0x82140c9]
   #25 smbd [0x8105261]
   #26 smbd(ntlmssp_update+0x18b) [0x81047fb]
   #27 smbd(auth_ntlmssp_update+0x3d) [0x82143ed]
   #28 smbd [0x80ad841]
   #29 smbd [0x80adb14]
   #30 smbd(reply_sesssetup_and_X+0xe6b) [0x80aeb0b]
   #31 smbd [0x80d1a66]
   #32 smbd [0x80d1cf0]
   #33 smbd(process_smb+0x8c) [0x80d1efc]
   #34 smbd(smbd_process+0x168) [0x80d2c18]
   #35 smbd(main+0x4dd) [0x824796d]
   #36 /lib/tls/libc.so.6(__libc_start_main+0xf0) [0x1808a0]
   #37 smbd(ldap_msgfree+0x89) [0x80787b1]

This problem with samba3.0.6... with 3.0.4 no such problem.
 (I am Rusian and excuse my English) 
Best regards.
Denis Kirin
Comment 1 yuval yeret 2005-02-15 08:50:37 UTC
I see this bug with Samba 3.0.7 as well, here is my smbstatus output:

Samba version 3.0.7
PID     Username      Group         Machine
-------------------------------------------------------------------
<snipped several entries, none obviously 31972> 

<snipped irrelevant tables>

Pid    DenyMode   Access      R/W        Oplock           Name
--------------------------------------------------------------
31972  DENY_NONE  0x3         RDWR       NONE             /mnt/dir2/foobar0.tgz
  Tue Feb 15 14:00:37 2005
31972  DENY_NONE  0x3         RDWR       NONE             /mnt//dir2/foobar1.tgz
  Tue Feb 15 13:44:14 2005
31972  DENY_NONE  0x3         RDWR       NONE             /mnt//dir2/foobar2.tgz
  Tue Feb 15 13:29:34 2005
31972  DENY_NONE  0x3         RDWR       NONE             /mnt//dir2/foobar3.tgz
  Tue Feb 15 13:58:07 2005
<snipped several other locks by this same pid>

I also see the filenames when I run strings on the locking.tdb

Comment 2 Gerald (Jerry) Carter (dead mail address) 2005-02-15 09:24:24 UTC
Can this be reproduced against 3.0.11 ?  If so, please reopen 
the bug and update the version
Comment 3 yuval yeret 2005-02-15 09:36:15 UTC
I think I see this problem with printer queus as well (in printers/printername.tdb)
I see the queue from a windows client full of print jobs which are not active
anymore. The TDB shows those entries as well. 
Comment 4 Denis Kirin 2005-02-24 01:57:48 UTC
Bug exist...
Comment 5 Denis Kirin 2005-02-27 02:39:06 UTC
(In reply to comment #4)
> Bug exist...
> 
Bug exist in Samba 3.0.11.
Smbstatus shows Pid of SMBD process whitch was crasded and killed.
To repair problem I need to restart samba because I can not kill unexisten 
proccess.

Exuse my english. 

Comment 6 Jonas Helgi Palsson (dead mail address) 2005-04-04 14:39:31 UTC
Just adding to this bugreport, as I'm seeing similar problems.

I've got 2 Samba servers serving as (1) PDC and print server and (2) file server.
The hardware in question is IBM HS20 bladeservers (2x2.8GHz Xeon, 4G ram, SAN
storage). The OS is SLES 9 fully patched, the samba packages comes from SerNet.de.
The clients are Windows 2000 PC, 3300 of them.


The servers have frequent lockups, mainly at peak hours. The PDC more than the
fileserver. As they are production servers one has to press "the reset button"
asap to get the servers alive again. But once the PDC went down off hours and
one could see that several nmdb prosesses were taking all the resources leaving
the server with 100+ in load. 
"#killall -9 nmbd" returned the server to normal state.

No logs during or after the lockup are available, as nothing is writen in the
logs after lockup. The lockup comes abruptly, a top session grinds to a halt
showing load of ~1.75. 

Logs right before the halting are filled with:
rumba smbd[19007]: [2005/04/04 11:33:57, 0] tdb/tdbutil.c:tdb_log(725) 
rumba smbd[19007]:   tdb(/var/lib/samba/locking.tdb): tdb_lock failed on list 78
ltype=1 (Resource deadlock avoided) 
smbd[19007]: [2005/04/04 11:33:57, 0] smbd/oplock.c:oplock_break(708) 
smbd[19007]:   oplock_break: unable to lock share entry for file
directory/batchfile.bat 
smbd[19007]: [2005/04/04 11:33:57, 0] smbd/oplock.c:process_local_message(451) 
smbd[19007]:   process_local_message: oplock break failed. 
smbd[19007]: [2005/04/04 11:33:57, 0] tdb/tdbutil.c:tdb_log(725) 
smbd[19007]:   tdb(/var/lib/samba/locking.tdb): tdb_lock failed on list 78
ltype=1 (Resource deadlock avoided) 
smbd[19007]: [2005/04/04 11:33:57, 0] tdb/tdbutil.c:tdb_log(725) 
smbd[19007]:   tdb(/var/lib/samba/locking.tdb): tdb_lock failed on list 78
ltype=0 (Resource deadlock avoided) 
smbd[19007]: [2005/04/04 11:33:57, 0] tdb/tdbutil.c:tdb_log(725) 
smbd[19007]:   tdb(/var/lib/samba/locking.tdb): tdb_unlock: count is 0 

for different smbd prosesses.

This behavior has been experienced with 3.0.11 and 3.0.13 versions from build by
sernet.de

Time for a downgrade :-)

Comment 7 Jeremy Allison 2005-04-04 15:10:50 UTC
Ok, if this is seen when nmbd is spinning, it's probably not a smbd issue but
one of resource starvation when nmbd is undergoing problems. Can you attach to
the spinnign nmbd with gdb and get us a backtrace ?
Thanks,
Jeremy.
Comment 8 Gerald (Jerry) Carter (dead mail address) 2005-05-11 05:07:36 UTC
no feedback from reporter in over a month.  closing.