Bug 4973 - winbindd: Receiving SMB: Server stopped responding
Status: RESOLVED FIXED
Alias: None
Product: Samba 3.0
Classification: Unclassified
Component: winbind
Version: 3.0.26a
Hardware: x64 Linux
Importance: P3 normal
Target Milestone: none
Assignee: Gerald (Jerry) Carter (dead mail address)
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-09-13 23:52 UTC by Konstantin Zemlyak
Modified: 2008-03-12 06:46 UTC

Attachments
/var/log/messages with "SMB stopped responding" (869 bytes, text/plain)
2007-11-29 22:10 UTC, Konstantin Zemlyak
smbd log with debug level 10 (763.10 KB, text/plain)
2007-11-29 22:11 UTC, Konstantin Zemlyak
winbindd log with debug level 10 (97.88 KB, text/plain)
2007-11-29 22:11 UTC, Konstantin Zemlyak

Description Konstantin Zemlyak 2007-09-13 23:52:50 UTC
I'm running a small Samba PDC (tdbsam backend, nothing fancy). To enable NTLM auth in Squid (which runs on the same box), I had to run winbind. Since then I've been getting lots of "Receiving SMB: Server stopped responding" messages in the logs, a few every few minutes. The real problem is that smbd children sometimes hang for a few seconds, stalling Windows clients.
Comment 1 Michael 2007-11-27 01:52:38 UTC
Hi,

Same problem here.
I needed to enable NTLM auth for a FreeRADIUS daemon.
Since then, the same problem occurs:
lots of "Receiving SMB: Server stopped responding" in the winbindd log, and occasional hangs of a few seconds (stalling Windows clients).
Samba versions: 3.0.25 through 3.0.27a.
Comment 2 Gerald (Jerry) Carter (dead mail address) 2007-11-27 07:54:55 UTC
Do you have a reproducible test case?  Some action or command that 
can be used to track down the smbd stalls?  Can you attach a level 10
debug log (from all winbindd processes and a hung smbd) with 
timestamps illustrating the problem?
Comment 3 Konstantin Zemlyak 2007-11-29 22:10:11 UTC
Created attachment 3012
/var/log/messages with "SMB stopped responding"
Comment 4 Konstantin Zemlyak 2007-11-29 22:11:14 UTC
Created attachment 3013
smbd log with debug level 10
Comment 5 Konstantin Zemlyak 2007-11-29 22:11:43 UTC
Created attachment 3014
winbindd log with debug level 10
Comment 6 Konstantin Zemlyak 2007-11-29 22:15:10 UTC
(In reply to comment #2)
> Do you have a reproducible test case?  Some action or command that 
> can be used to track down the smbd stalls?  Can you attach a level 10
> debug log (from all winbindd processes and a hung smbd) with 
> timestamps illustrating the problem?

I've been trying to find the cause for a few days, trying various debug levels, running interactively in the foreground, etc., but was unable to find a reliable way to get smbd to stall. The "SMB stopped responding" messages, on the other hand, appear nearly all the time.
Comment 7 Volker Lendecke 2007-11-30 00:48:48 UTC
[2007/11/30 09:00:42.406797, 10, pid=5657] lib/system_smbd.c:sys_getgrouplist(125)
  sys_getgrouplist: user [godai$]
[2007/11/30 09:00:52.405096, 5, pid=5660] lib/util_sock.c:print_socket_options(206)
  socket option SO_KEEPALIVE = 1

That sys_getgrouplist takes too long on smbd. Can you find out what "id godai$" does? strace that for example? What does your nsswitch.conf look like?

Volker
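
For anyone else scanning a level 10 log for stalls like the one quoted above, here is a minimal Python sketch that flags long pauses between consecutive log headers. It is not part of Samba; the names `parse_header` and `find_gaps` and the 5-second threshold are assumptions made for illustration, and it only looks at header timestamps, not at which pid wrote each line.

```python
import re
from datetime import datetime

# Timestamp at the start of a Samba log header line, with or without
# fractional seconds. The leading '[' is optional so that a header which
# lost it in copy and paste still parses.
HEADER = re.compile(r"^\[?(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})(\.\d{1,6})?")


def parse_header(line):
    """Return the datetime of a log header line, or None for body lines."""
    match = HEADER.match(line)
    if not match:
        return None
    base, frac = match.groups()
    if frac:
        return datetime.strptime(base + frac, "%Y/%m/%d %H:%M:%S.%f")
    return datetime.strptime(base, "%Y/%m/%d %H:%M:%S")


def find_gaps(lines, threshold=5.0):
    """Return (seconds, header_before, header_after) for each long pause."""
    gaps = []
    prev_time = prev_line = None
    for line in lines:
        stamp = parse_header(line)
        if stamp is None:
            continue  # message body, not a header
        if prev_time is not None:
            delta = (stamp - prev_time).total_seconds()
            if delta > threshold:
                gaps.append((delta, prev_line, line.rstrip()))
        prev_time, prev_line = stamp, line.rstrip()
    return gaps


# The two entries quoted in this comment.
SAMPLE = [
    "[2007/11/30 09:00:42.406797, 10, pid=5657] lib/system_smbd.c:sys_getgrouplist(125)",
    "  sys_getgrouplist: user [godai$]",
    "[2007/11/30 09:00:52.405096, 5, pid=5660] lib/util_sock.c:print_socket_options(206)",
    "  socket option SO_KEEPALIVE = 1",
]

for seconds, before, after in find_gaps(SAMPLE):
    print(f"{seconds:.3f}s pause after: {before}")
```

To run it against a real log, pass an open file instead of `SAMPLE`, for example `find_gaps(open("log.smbd"))`, where the file name is a placeholder for wherever the log lives.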
Comment 8 Konstantin Zemlyak 2007-11-30 01:06:46 UTC
(In reply to comment #7)
> That sys_getgrouplist takes too long on smbd. Can you find out what "id godai$"
> does? strace that for example? What does your nsswitch.conf look like?

# LANG=C id godai$
uid=520(godai$) gid=1515(computers) groups=1515(computers)

relevant entries in /etc/nsswitch.conf:
passwd:     files
shadow:     files
group:      files

Just plain files with MD5-hashed shadow passwords, no LDAP/SQL/NIS.
The system in question is Fedora 7 with SELinux disabled.
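
If it helps, the group lookup can also be timed directly rather than judged from how fast `id` feels. The following is a minimal Python sketch, assuming Python 3 is available on the box; the helper name `time_group_lookup` is made up for illustration. `os.getgrouplist()` calls the C library's getgrouplist(3), so it goes through nsswitch.conf, but it runs outside smbd and is only an approximation of what smbd does.

```python
import os
import pwd
import time


def time_group_lookup(username):
    """Return (seconds, gids) for resolving a user's group list through NSS."""
    entry = pwd.getpwnam(username)  # raises KeyError if the user is unknown
    start = time.monotonic()
    gids = os.getgrouplist(username, entry.pw_gid)
    return time.monotonic() - start, gids
```

Calling `time_group_lookup("godai$")` on the affected server and printing the first element of the result gives the lookup time in seconds; a value anywhere near ten seconds would point at NSS rather than at smbd itself.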
Comment 9 Volker Lendecke 2007-11-30 02:11:15 UTC
Well, I'm stuck at this point. Sorry.

Volker
Comment 10 Christian Schwamborn 2007-12-20 16:20:15 UTC
I have the same problem, and I figured out that there is a connection to the 'winbind cache time' parameter in smb.conf. If I set it to 60 seconds, the 'SMB stopped responding' message comes up every 60 seconds, several times in a row about 10 seconds apart.
I'm using 3.0.28 in a PDC configuration with an LDAP backend on Debian etch.
Comment 11 Mark Lassiter 2008-01-16 15:33:42 UTC
I too am having a problem with 3.0.25b packaged with CentOS 5.1 on i386.
I receive the same messages at the same interval.

I do not have any clients connected to it at this time, and this is a clean installation. I can reproduce it easily enough:

1. Install Samba 3.0.25b (I did as part of my CentOS 5.1 install)
2. Disabled SELinux
3. Updated smb.conf.  Relevant portions below, let me know if I should attach.
4. Configured nsswitch.conf:

 passwd:     files winbind
 shadow:     files
 group:      files winbind

5. Started Samba
6. Joined to domain: net rpc join -S <hostname> -U root
7. Started winbind
8. tailed winbindd.log and noted errors:

  winbindd version 3.0.25b-1.el5_1.4 started.
  Copyright Andrew Tridgell and the Samba Team 1992-2007
[2008/01/16 15:59:20, 0] nsswitch/winbindd_cache.c:initialize_winbindd_cache(2221)
  initialize_winbindd_cache: clearing cache and re-creating with version number 1
[2008/01/16 15:59:33, 0] libsmb/clientgen.c:cli_receive_smb(112)
  Receiving SMB: Server stopped responding
[2008/01/16 15:59:46, 0] libsmb/clientgen.c:cli_receive_smb(112)
  Receiving SMB: Server stopped responding
[2008/01/16 15:59:58, 0] libsmb/clientgen.c:cli_receive_smb(112)
  Receiving SMB: Server stopped responding
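
To put numbers on "the same interval", the spacing between the timeout messages can be computed from the log itself. This is a minimal Python sketch, not a Samba tool; the function name `timeout_intervals` is an assumption for illustration, and the sample is the excerpt from step 8.

```python
import re
from datetime import datetime

# Header timestamp down to whole seconds; any fractional part is ignored.
HEADER = re.compile(r"^\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})")
MESSAGE = "Receiving SMB: Server stopped responding"


def timeout_intervals(lines):
    """Return the seconds between consecutive occurrences of MESSAGE."""
    stamps = []
    current = None  # timestamp of the most recent header line
    for line in lines:
        match = HEADER.match(line)
        if match:
            current = datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S")
        elif MESSAGE in line and current is not None:
            stamps.append(current)
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


SAMPLE = """\
[2008/01/16 15:59:20, 0] nsswitch/winbindd_cache.c:initialize_winbindd_cache(2221)
  initialize_winbindd_cache: clearing cache and re-creating with version number 1
[2008/01/16 15:59:33, 0] libsmb/clientgen.c:cli_receive_smb(112)
  Receiving SMB: Server stopped responding
[2008/01/16 15:59:46, 0] libsmb/clientgen.c:cli_receive_smb(112)
  Receiving SMB: Server stopped responding
[2008/01/16 15:59:58, 0] libsmb/clientgen.c:cli_receive_smb(112)
  Receiving SMB: Server stopped responding
""".splitlines()

print(timeout_intervals(SAMPLE))  # prints [13.0, 12.0]
```

On this excerpt the messages are 12 to 13 seconds apart, which is in the same range as the roughly 10-second spacing reported in comment 10.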

Key smb.conf entries:

	; PDC
	workgroup = LCG
	server string = LCG File Server
	local master = yes
	domain master = yes 
	preferred master = yes
	domain logons = yes
	wins support = yes
	; winbind 
	# separate domain and username with '\', like DOMAIN\username
	winbind separator = \\
	# use uids from 10000 to 30000 for domain users
	idmap uid = 10000-30000
	# use gids from 10000 to 30000 for domain groups
	idmap gid = 10000-30000
	# allow enumeration of winbind users and groups
	winbind enum users = yes
	winbind enum groups = yes
	template shell = /bin/bash
	; security	
	security = user
	encrypt passwords = yes
	pam password change = yes
	passdb backend = tdbsam
	unix password sync = yes
	guest account = guest	

Please let me know if I can provide any other info.
Comment 12 Michael 2008-02-13 03:19:31 UTC
Hi there,

I've spent some time tracking down the problem.
Summary:
1.) The problem only shows up when Samba runs in PDC mode with the winbindd process running as well.
2.) If you set "disable netbios = yes" in smb.conf, the error is "solved". But in some cases you need "disable netbios = no", so I think this is just a workaround.

Is there a bug in the combination of Samba as a PDC, a running winbindd, and "disable netbios = no"?


Greets,
Michael
Comment 13 Orion Poplawski 2008-02-28 12:01:46 UTC
I'm starting to see this too.  I need to run winbindd because I need to trust another domain.  When I first started running winbindd, things seemed mostly okay, and I didn't see these timeouts.  But I couldn't see any of the local (CO-RA) domain users with "wbinfo -u", just those of the trusted domain.  From reading online I gathered that this might be because the local Samba server had never been added to the local domain.  So I created a computer account for it and had it join the domain as the PDC.  After that, I started seeing these hangs in winbindd.  Note that the server of the trusted domain has not yet been added to that domain as a PDC.
Comment 14 Michael 2008-03-09 08:39:57 UTC
Hi there,

the mentioned problem seems to be solved by using the latest release (3.0.28a).

Greets,
Michael
Comment 15 Andrew 2008-03-10 21:36:59 UTC
Can anyone else please confirm this?
Was it just a straight upgrade, or was it a fresh install?

Cheers
Comment 16 Michael 2008-03-11 17:49:41 UTC
Hi Andrew,

I tried both: A fresh install and an upgrade on a Samba PDC in a test-environment.
No problems anymore.
Perhaps it has something to do with the winbindd fix mentioned in the release notes of 3.0.28a?

--- snip ---

 Simo Sorce 
    * Don't assume NULL termination when copying the principal name
      in kerberos_get_default_realm_from_ccache().
    * Fix winbindd running on a Samba DC (again).

--- snap ---

Have a nice day.
Michael
Comment 17 Konstantin Zemlyak 2008-03-11 22:10:08 UTC
Confirmed, 3.0.28a seems to fix this bug. I think the fix was in patch http://git.samba.org/?p=samba.git;a=commit;h=9347d34b502bef70cdae8f3e8acd9796dba49581
Comment 18 Gerald (Jerry) Carter (dead mail address) 2008-03-12 06:46:55 UTC
Thanks for the feedback.