Bug 1667 - 3.0.6 UNSTABLE -- had to revert to 3.0.5 -- on Samba PDC
Status: CLOSED FIXED
Alias: None
Product: Samba 3.0
Classification: Unclassified
Component: File Services
Version: 3.0.6
Hardware: x86 Linux
Importance: P3 major
Target Milestone: none
Assignee: Gerald (Jerry) Carter (dead mail address)
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2004-08-25 15:55 UTC by M. D. Parker (dead mail address)
Modified: 2005-08-24 10:18 UTC

Attachments
Patch (524 bytes, patch)
2004-09-13 00:59 UTC, Jeremy Allison

Description M. D. Parker (dead mail address) 2004-08-25 15:55:56 UTC
So far, I have had an unfavorable experience in our production environment with 
our Samba 3.0.6 PDC.  Other 3.0.6 systems show no problems.  OS=RH8.  No 
problems after I reverted to 3.0.5.

Unfortunately, the logs are gone, so I can only describe some things I 
noticed.

We use the /etc/samba/smbpasswd file in the NEW format.  The logs contained 
many, many lines reporting problems locking the file due to an interrupted 
system call.  We inhibit password changes, so nothing should be changing the 
password file.  For one client in particular, there were notices that the 
smbpasswd file was in OLD format (which no entry is).

At times I get a hang where, although the daemons are up, nothing is 
responding (users get the usual "your domain controller is not available" 
messages).  Interestingly, some people were apparently still able to work.  I 
have seen this hang 3 times in a 24-hour period, and since this is a production 
system I was forced to revert.

I really wish I could provide you all with logs and other useful information, 
but I wanted to give you all a heads-up and see whether you might know 
something about this.
Comment 1 Gerald (Jerry) Carter (dead mail address) 2004-09-02 08:12:12 UTC
Without some logs or more information, there's not much we can do here.
None of the developers have been able to reproduce this.  Can 
you provide some more information about your environment?
Comment 2 Gerald (Jerry) Carter (dead mail address) 2004-09-02 09:27:26 UTC
Comments sent via email----- 

I think that the problem occurs in 3.0.5 too, but it is much 
more SEVERE in 3.0.6.  This is an intermittent problem, but 
I sure get a LOT of flak from the user community.  The platform 
is RH8 Linux.

Hope this helps.

Here is some information that I have extracted:

>> On Samba 3.0.5 (and Samba 3.0.6) PDC I typically see messages in the
>> logs of my Samba PDC of several terminals (I separate logs by the 
>> terminal).  They all seem to happen at about the same time and they 
>> cause the terminals in question to get messages indicating that the 
>> PDC is not present.  After a few minutes it seems to stop and things 
>> resume as normal.  I have not had this problem prior to 3.0.5.  I use 
>> the smbpasswd and all entries are in the "NEW" format.
>>
>> Is there something that I can do to fix this?  Or is this a bug in the 
>> current version(s)?  I had to back off from Samba 3.0.6 because the 
>> problem was far more disabling to my user community.
>>
>>
>> .
>> .
>> .
>> [2004/08/31 08:28:01, 3] smbd/uid.c:push_conn_ctx(351)
>>   push_conn_ctx(0) : conn_ctx_stack_ndx = 0
>> [2004/08/31 08:28:01, 3] smbd/sec_ctx.c:set_sec_ctx(288)
>>   setting sec ctx (0, 0) - sec_ctx_stack_ndx = 1
>> [2004/08/31 08:28:06, 0] lib/util_file.c:do_file_lock(67)
>>   do_file_lock: failed to lock file.
>> [2004/08/31 08:28:06, 0] passdb/pdb_smbpasswd.c:startsmbfilepwent(204)
>>   startsmbfilepwent_internal: unable to lock file /etc/samba/smbpasswd. Error was Interrupted system call
>> [2004/08/31 08:28:06, 0] passdb/pdb_smbpasswd.c:smbpasswd_getsampwnam(1305)
>>   Unable to open passdb database.
>> [2004/08/31 08:28:06, 3] smbd/sec_ctx.c:pop_sec_ctx(386)
>>   pop_sec_ctx (0, 0) - sec_ctx_stack_ndx = 0
>> [2004/08/31 08:28:06, 3] smbd/sec_ctx.c:push_sec_ctx(256)
>>   push_sec_ctx(0, 0) : sec_ctx_stack_ndx = 1
>> .
>> .
>> .
>> Anybody got a workaround on this? Help would be appreciated of course.
>>
>> I don't see what interrupted system call is being talked about either.
>>
>> Full listing of global setting of  smb.conf file: (via testparm -v)
>>
>>  Global parameters
>> [global]
>>         dos charset = CP850
>>         unix charset = UTF-8
>>         display charset = LOCALE
>>         workgroup = DOMAIN
>>         realm =
>>         netbios name = DOMAIN1
>>         netbios aliases = DOMAIN2
>>         netbios scope =
>>         server string = Server - File - (%h - %v)
>>         interfaces = eth3, lo
>>         bind interfaces only = Yes
>>         security = USER
>>         auth methods =
>>         encrypt passwords = Yes
>>         update encrypted = No
>>         client schannel = Auto
>>         server schannel = Auto
>>         allow trusted domains = Yes
>>         hosts equiv =
>>         min passwd length = 5
>>         map to guest = Bad User
>>         null passwords = No
>>         obey pam restrictions = No
>>         password server = *
>>         smb passwd file = /etc/samba/smbpasswd
>>         private dir = /etc/samba
>>         passdb backend = smbpasswd
>>         algorithmic rid base = 1000
>>         root directory =
>>         guest account = samba
>>         pam password change = No
>>         passwd program = /bin/false
>>         passwd chat = ""
>>         passwd chat debug = No
>>         passwd chat timeout = 2
>>         username map = /etc/samba/smbusers
>>         password level = 0
>>         username level = 0
>>         unix password sync = Yes
>>         restrict anonymous = 0
>>         lanman auth = Yes
>>         ntlm auth = Yes
>>         client NTLMv2 auth = No
>>         client lanman auth = Yes
>>         client plaintext auth = Yes
>>         preload modules =
>>         log level = 3
>>         syslog = 0
>>         syslog only = No
>>         log file = /SYSTEMS/log/samba/log.%m
>>         max log size = 200000
>>         timestamp logs = Yes
>>         debug hires timestamp = No
>>         debug pid = No
>>         debug uid = No
>>         smb ports = 445 139
>>         protocol = NT1
>>         large readwrite = Yes
>>         max protocol = NT1
>>         min protocol = CORE
>>         read bmpx = No
>>         read raw = Yes
>>         write raw = Yes
>>         disable netbios = No
>>         acl compatibility =
>>         nt pipe support = Yes
>>         nt status support = Yes
>>         announce version = 4.9
>>         announce as = NT
>>         max mux = 50
>>         max xmit = 16644
>>         name resolve order = lmhosts wins host bcast
>>         max ttl = 259200
>>         max wins ttl = 518400
>>         min wins ttl = 21600
>>         time server = Yes
>>         unix extensions = Yes
>>         use spnego = Yes
>>         client signing = auto
>>         server signing = No
>>         client use spnego = Yes
>>         change notify timeout = 60
>>         deadtime = 15
>>         getwd cache = Yes
>>         keepalive = 300
>>         kernel change notify = Yes
>>         lpq cache time = 10
>>         max smbd processes = 0
>>         paranoid server security = Yes
>>         max disk size = 0
>>         max open files = 10000
>>         socket options = TCP_NODELAY SO_RCVBUF=8192 SO_SNDBUF=8192
>>         use mmap = Yes
>>         hostname lookups = No
>>         name cache timeout = 660
>>         load printers = Yes
>>         printcap name = cups
>>         disable spoolss = No
>>         enumports command =
>>         addprinter command =
>>         deleteprinter command =
>>         show add printer wizard = No
>>         os2 driver map =
>>         mangling method = hash2
>>         mangle prefix = 1
>>         stat cache = Yes
>>         machine password timeout = 604800
>>         add user script =
>>         delete user script =
>>         add group script =
>>         delete group script =
>>         add user to group script =
>>         delete user from group script =
>>         set primary group script =
>>         add machine script = /usr/sbin/useradd -d /dev/null -g 730 -s /bin/false -c Machine-Trust-Account -M %u
>>         shutdown script =
>>         abort shutdown script =
>>         logon script = scripts\logon.cmd
>>         logon path = \\DOMAIN1\Profiles\%u
>>         logon drive = H:
>>         logon home = \\DOMAIN1\%u
>>         domain logons = Yes
>>         os level = 255
>>         lm announce = Auto
>>         lm interval = 60
>>         preferred master = Yes
>>         local master = Yes
>>         domain master = Yes
>>         browse list = Yes
>>         enhanced browsing = Yes
>>         dns proxy = No
>>         wins proxy = No
>>         wins server = 10.1.1.29
>>         wins support = No
>>         wins hook =
>>         wins partners =
>>         kernel oplocks = Yes
>>         lock spin count = 3
>>         lock spin time = 10
>>         oplock break wait time = 0
>>         ldap suffix =
>>         ldap machine suffix =
>>         ldap user suffix =
>>         ldap group suffix =
>>         ldap idmap suffix =
>>         ldap filter = (uid=%u)
>>         ldap admin dn =
>>         ldap ssl =
>>         ldap passwd sync = no
>>         ldap delete dn = No
>>         ldap replication sleep = 1000
>>         add share command =
>>         change share command =
>>         delete share command =
>>         config file =
>>         preload =
>>         lock directory = /var/lib/samba
>>         pid directory = /var/run
>>         utmp directory =
>>         wtmp directory =
>>         utmp = No
>>         default service =
>>         message command =
>>         dfree command =
>>         get quota command =
>>         set quota command =
>>         remote announce = 10.255.255.255
>>         remote browse sync =
>>         socket address = 0.0.0.0
>>         homedir map =
>>         afs username map =
>>         time offset = 0
>>         NIS homedir = No
>>         panic action =
>>         host msdfs = Yes
>>         enable rid algorithm = Yes
>>         idmap backend =
>>         idmap uid =
>>         idmap gid =
>>         template primary group = nobody
>>         template homedir = /home/%D/%U
>>         template shell = /bin/false
>>         winbind separator = \
>>         winbind cache time = 300
>>         winbind enable local accounts = Yes
>>         winbind enum users = Yes
>>         winbind enum groups = Yes
>>         winbind use default domain = No
>>         winbind trusted domains only = No
>>         winbind nested groups = No
>>         comment =
>>         path =
>>         username =
>>         invalid users =
>>         valid users =
>>         admin users =
>>         read list =
>>         write list =
>>         printer admin =
>>         force user =
>>         force group =
>>         read only = Yes
>>         create mask = 0744
>>         force create mode = 00
>>         security mask = 0777
>>         force security mode = 00
>>         directory mask = 0755
>>         force directory mode = 00
>>         directory security mask = 0777
>>         force directory security mode = 00
>>         inherit permissions = No
>>         inherit acls = No
>>         guest only = No
>>         guest ok = No
>>         only user = No
>>         hosts allow = 127., 141.248.
>>         hosts deny =
>>         ea support = No
>>         nt acl support = Yes
>>         profile acls = No
>>         map acl inherit = No
>>         afs share = No
>>         block size = 1024
>>         max connections = 0
>>         min print space = 0
>>         strict allocate = No
>>         strict sync = No
>>         sync always = No
>>         use sendfile = No
>>         write cache size = 0
>>         max reported print jobs = 0
>>         max print jobs = 1000
>>         printable = No
>>         printing = cups
>>         cups options =
>>         print command =
>>         lpq command =
>>         lprm command =
>>         lppause command =
>>         lpresume command =
>>         queuepause command =
>>         queueresume command =
>>         printer name =
>>         use client driver = No
>>         default devmode = No
>>         default case = lower
>>         case sensitive = No
>>         preserve case = Yes
>>         short preserve case = Yes
>>         mangle case = No
>>         mangling char = ~
>>         hide dot files = Yes
>>         hide special files = No
>>         hide unreadable = No
>>         hide unwriteable files = No
>>         delete veto files = No
>>         veto files =
>>         hide files =
>>         veto oplock files =
>>         map system = No
>>         map hidden = No
>>         map archive = Yes
>>         mangled names = Yes
>>         mangled map =
>>         store dos attributes = No
>>         browseable = Yes
>>         blocking locks = Yes
>>         csc policy = manual
>>         fake oplocks = No
>>         locking = Yes
>>         oplocks = No
>>         level2 oplocks = No
>>         oplock contention limit = 2
>>         posix locking = Yes
>>         strict locking = No
>>         share modes = Yes
>>         copy =
>>         include =
>>         exec =
>>         preexec close = No
>>         postexec =
>>         root preexec =
>>         root preexec close = No
>>         root postexec =
>>         available = Yes
>>         volume =
>>         fstype = NTFS
>>         set directory = No
>>         wide links = Yes
>>         follow symlinks = No
>>         dont descend =
>>         magic script =
>>         magic output =
>>         delete readonly = No
>>         dos filemode = No
>>         dos filetimes = No
>>         dos filetime resolution = No
>>         fake directory create times = No
>>         vfs objects =
>>         msdfs root = No
>>         msdfs proxy =
>>         root preexec =
Comment 3 M. D. Parker (dead mail address) 2004-09-08 15:35:04 UTC
NEW information --

Upon analysis of the various log material and the code, it seems that the 
Interrupted system call is legit.  It simply indicates a timeout for the 
lock request on the /etc/samba/smbpasswd file.  

OK, in checking the system further, we find one system more or less constantly 
holding open or using the passwd file for authentication for some reason, thus 
causing the other clients to become basically locked out.

This client has the following log entries:
[2004/09/07 18:32:52, 2] smbd/server.c:exit_server(568)
  Closing connections
[2004/09/07 18:45:32, 2] smbd/sesssetup.c:setup_new_vc_session(602)
  setup_new_vc_session: New VC == 0, if NT4.x compatible we would close all old resources.
[2004/09/07 18:45:32, 2] smbd/sesssetup.c:setup_new_vc_session(602)
  setup_new_vc_session: New VC == 0, if NT4.x compatible we would close all old resources.
[2004/09/07 18:45:32, 2] lib/access.c:check_access(324)
  Allowed connection from  (141.248.152.163)
[2004/09/07 18:45:32, 2] lib/access.c:check_access(324)
  Allowed connection from  (141.248.152.163)
[2004/09/07 18:45:32, 0] passdb/pdb_smbpasswd.c:mod_smbfilepwd_entry(899)
  mod_smbfilepwd_entry:  Using old smbpasswd format.  This is no longer supported.!
[2004/09/07 18:45:32, 0] passdb/pdb_smbpasswd.c:mod_smbfilepwd_entry(900)
  mod_smbfilepwd_entry:  No changes made, failing.!
[2004/09/07 18:45:32, 0] passdb/pdb_smbpasswd.c:smbpasswd_update_sam_account(1436)
  smbpasswd_update_sam_account: mod_smbfilepwd_entry failed!
[2004/09/07 18:45:32, 0] passdb/pdb_smbpasswd.c:mod_smbfilepwd_entry(899)
  mod_smbfilepwd_entry:  Using old smbpasswd format.  This is no longer supported.!
[2004/09/07 18:45:32, 0] passdb/pdb_smbpasswd.c:mod_smbfilepwd_entry(900)
  mod_smbfilepwd_entry:  No changes made, failing.!
[2004/09/07 18:45:32, 0] passdb/pdb_smbpasswd.c:smbpasswd_update_sam_account(1436)
  smbpasswd_update_sam_account: mod_smbfilepwd_entry failed!
[2004/09/07 18:47:40, 2] smbd/process.c:timeout_processing(1138)
  Closing idle connection
[2004/09/07 18:47:40, 2] smbd/server.c:exit_server(568)
  Closing connections

====================

NOTE: Both the machine and client smbpasswd entries are of the "NEW" type:

client$:753:XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX:BDCCBA58CAE2136EE5F68206ADC748EA:[W         ]:LCT-40F2BF93:

username:962:A8AD058EA120084B1486235A2333E4D2:11D4C351D6C291170C2D6192EB8E99FC:[U          ]:LCT-41360DED:

Furthermore, the entries in the /etc/passwd and /etc/shadow files track 
correctly to the information in the samba passwd file.

What problem are we seeing?


Comment 4 Jeremy Allison 2004-09-13 00:59:17 UTC
Created attachment 648
Patch

Ok, sorry for this. There was a return code path
that left the password file locked when detecting
an error. Here is the fix (attached as a patch).
Jeremy.
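
The kind of bug described above, an error return that skips the unlock, can be illustrated with a short sketch. This is not the attached patch and not Samba's code: update_entry, set_lock and entry_is_valid are hypothetical names, and the validity test is only a stand-in for whatever check detects the error. It shows one common way to make sure every exit taken after the lock is acquired also releases it.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: set (F_WRLCK) or clear (F_UNLCK) a whole-file lock. */
static int set_lock(int fd, short type)
{
    struct flock fl;

    memset(&fl, 0, sizeof(fl));
    fl.l_type = type;
    fl.l_whence = SEEK_SET;        /* l_start = 0, l_len = 0: the whole file */
    return fcntl(fd, F_SETLK, &fl);
}

/* Hypothetical stand-in for the check that detects a bad entry. */
static int entry_is_valid(const char *entry)
{
    return strchr(entry, '[') != NULL;
}

/*
 * Update an entry under a write lock. Every exit after the lock is taken
 * goes through the "out" label, so the error path cannot leave the file
 * locked. Returns 0 on success, -1 on error.
 */
int update_entry(int fd, const char *entry)
{
    int ret = -1;

    if (set_lock(fd, F_WRLCK) == -1)
        return -1;                 /* lock not taken, nothing to release */

    if (!entry_is_valid(entry)) {
        /* The buggy shape would be a bare "return -1;" here. */
        goto out;
    }

    /* ... the entry would be rewritten here ... */
    ret = 0;

out:
    set_lock(fd, F_UNLCK);         /* always drop the lock before returning */
    return ret;
}
```

With a bare return on the error path, a long-lived process that keeps the file descriptor open would keep the lock too, and every other process waiting on that lock would eventually time out.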
Comment 6 Gerald (Jerry) Carter (dead mail address) 2005-08-24 10:18:12 UTC
Sorry for the spam; cleaning up the database to prevent unnecessary reopens of bugs.