Bug 5849 - winbind stops answering nss requests periodically
Alias: None
Product: Samba 3.2
Classification: Unclassified
Component: Winbind
Version: 3.2.3
Hardware: Other Linux
Importance: P3 major
Target Milestone: ---
Assignee: Jeremy Allison
QA Contact: Samba QA Contact
Depends on:
Reported: 2008-10-24 11:36 UTC by Jerome Haltom
Modified: 2009-05-13 17:33 UTC
Description Jerome Haltom 2008-10-24 11:36:22 UTC
Every now and then, maybe every 10 to 20 minutes, winbind stops resolving NSS names. This became a problem for me in Intrepid, though I did see it very infrequently in Hardy. It is now a regular occurrence.

I have no name!@station-1:~$ whoami
whoami: cannot find name for user ID 1786588783

Obviously this causes many apps to cease functioning. If I run wbinfo -u, it begins working again:

I have no name!@station-1:~$ whoami

It looks like running wbinfo -u pokes winbind in some fashion that causes it to answer requests again.

My view of this is that winbind SHOULD have a local cache of uid to sid/name. It should *never* drop a record from this cache unless it is superseded by another valid answer from the domain. There's no good reason for it to do so.
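
The caching behaviour argued for above can be sketched as follows. This is only an illustration of the idea, not winbind's actual cache: the class name `StaleTolerantCache`, the `lookup` callback, and the `ttl` parameter are all hypothetical, and a `None` answer from the backend is treated the same as a failure for simplicity.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class StaleTolerantCache:
    """Hypothetical uid -> name cache that never drops its last good answer."""

    def __init__(self, lookup: Callable[[int], Optional[str]], ttl: float = 300.0):
        self._lookup = lookup  # backend query; stands in for asking the domain
        self._ttl = ttl        # seconds before an entry counts as stale
        self._entries: Dict[int, Tuple[str, float]] = {}

    def get(self, uid: int, now: Optional[float] = None) -> Optional[str]:
        now = time.monotonic() if now is None else now
        cached = self._entries.get(uid)
        if cached is not None and now - cached[1] < self._ttl:
            return cached[0]  # fresh entry: no backend round trip needed
        try:
            name = self._lookup(uid)
        except OSError:
            name = None  # backend unreachable
        if name is not None:
            # Only a valid answer is allowed to replace the cached record.
            self._entries[uid] = (name, now)
            return name
        # Backend failed or had no answer: serve the stale record if one exists.
        return cached[0] if cached is not None else None
```

With this shape, a failed domain query degrades to a possibly outdated answer instead of "cannot find name for user ID".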

I've got debugging on. When running a failing 'whoami', I see the following:

[2008/10/24 11:31:25,  3] winbindd/winbindd_user.c:winbindd_getpwuid(466)
  [30914]: getpwuid 1786588783
[2008/10/24 11:31:31,  5] winbindd/winbindd_dual.c:async_reply_recv(264)
  Could not receive async reply from child pid 30906
[2008/10/24 11:31:31,  5] winbindd/winbindd_async.c:query_user_recv(1089)
  Could not trigger query_user
[2008/10/24 11:31:31,  5] winbindd/winbindd_user.c:getpwsid_queryuser_recv(235)
  Could not query domain ISI SID S-1-5-21-1957994488-1482476501-725345543-1207
[2008/10/24 11:31:31,  5] winbindd/winbindd_dual.c:winbind_child_died(476)
  Already reaped child 30906 died

So, it fails to query the domain. For some reason this causes it to not answer with a cached response. I don't know why!
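
For anyone comparing logs like the one above, here is a rough sketch of pulling the failure messages out of a winbindd log. The header pattern is inferred only from the lines quoted in this report, so treat it as an assumption about the log layout; `HEADER`, `Entry`, and `parse_log` are hypothetical names.

```python
import re
from typing import Iterator, List, NamedTuple

# Header layout assumed from the excerpts in this report:
# [date time,  level] source_file:function(line)
HEADER = re.compile(
    r"^\[(?P<ts>[\d/]+ [\d:]+),\s*(?P<level>\d+)\] "
    r"(?P<file>[^:\s]+):(?P<func>\w+)\((?P<line>\d+)\)\s*$"
)


class Entry(NamedTuple):
    timestamp: str
    level: int
    function: str
    message: str


def parse_log(text: str) -> Iterator[Entry]:
    """Pair each header line with the indented message lines that follow it."""
    header = None
    body: List[str] = []
    for raw in text.splitlines():
        match = HEADER.match(raw)
        if match:
            if header is not None:
                yield Entry(*header, " ".join(body))
            header = (match["ts"], int(match["level"]), match["func"])
            body = []
        elif header is not None and raw.strip():
            body.append(raw.strip())
    if header is not None:
        yield Entry(*header, " ".join(body))


def failures(text: str) -> List[Entry]:
    """Keep only entries whose message starts with 'Could not'."""
    return [e for e in parse_log(text) if e.message.startswith("Could not")]
```

Run against the failing excerpt, `failures()` would keep the three "Could not ..." entries and skip the `getpwuid` request and the reaped-child notice.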

wbinfo -u runs as expected. It says the domain has users. The second run of whoami, after wbinfo -u, looks like this:

[2008/10/24 11:32:02,  6] winbindd/winbindd.c:new_connection(716)
  accepted socket 19
[2008/10/24 11:32:02,  3] winbindd/winbindd_misc.c:winbindd_interface_version(757)
  [30932]: request interface version
[2008/10/24 11:32:02,  3] winbindd/winbindd_misc.c:winbindd_priv_pipe_dir(790)
  [30932]: request location of privileged pipe
[2008/10/24 11:32:02,  3] winbindd/winbindd_user.c:winbindd_getpwuid(466)
  [30932]: getpwuid 1786588783
[2008/10/24 11:32:02,  7] winbindd/winbindd_idmap.c:winbindd_sid2gid_async(363)
  winbindd_sid2gid_async: Resolving S-1-5-21-1957994488-1482476501-725345543-513 to a gid

So, this sucks. Basically makes my desktop unusable. Every few minutes apps begin breaking. I've got wbinfo -u running on a loop in the background to keep it working.
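
The background loop mentioned above could look something like the sketch below. It is a variation on the workaround, not the exact loop used here: instead of calling `wbinfo -u` unconditionally, it only runs it when a uid stops resolving. The function names, the `interval` default, and the injectable parameters are assumptions made for illustration and testing.

```python
import pwd
import subprocess
import time
from typing import Callable, Optional


def uid_resolves(uid: int, getpwuid: Callable = pwd.getpwuid) -> bool:
    """Return True if NSS can map the uid to a passwd entry."""
    try:
        getpwuid(uid)
        return True
    except KeyError:  # pwd.getpwuid raises KeyError when the uid is unknown
        return False


def poke_winbind(run: Callable = subprocess.run) -> None:
    """Run 'wbinfo -u', which in this report made winbind answer again."""
    run(["wbinfo", "-u"], stdout=subprocess.DEVNULL, check=False)


def watchdog(
    uid: int,
    interval: float = 60.0,
    max_checks: Optional[int] = None,
    resolves: Callable[[int], bool] = uid_resolves,
    poke: Callable[[], None] = poke_winbind,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Check the uid every `interval` seconds; poke winbind on each failure.

    Returns the number of pokes. With max_checks=None it loops forever.
    """
    pokes = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        if not resolves(uid):
            poke()
            pokes += 1
        checks += 1
        sleep(interval)
    return pokes
```

Something like `watchdog(os.getuid())` (after `import os`) would run it for the current user. This only masks the symptom; it does nothing about the underlying failure to query the domain.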
Comment 1 Jerome Haltom 2008-10-24 11:37:37 UTC
I think it's major. Makes my desktop unusable.
Comment 2 Jeremy Allison 2008-10-24 22:07:39 UTC
Are you able to build Samba from source? If so, I'd like you to check the latest 3-2-test git tree, as there has been a fix that may address this.
Comment 3 Karolin Seeger 2009-05-13 04:04:10 UTC
Jeremy, can we close out this one?
Comment 4 Jeremy Allison 2009-05-13 15:43:58 UTC
I'd say yes; there has been no response from the submitter to the request to test.
Comment 5 Jerome Haltom 2009-05-13 17:33:24 UTC
Sorry for my tardiness.

It does seem to be fixed in whatever version is in Jaunty. For a minute I thought it was still there, but I think there's some new interaction with Network Manager. If winbind starts before Network Manager brings the interface up, winbind doesn't work. I'll find or open a separate bug for that issue, however.