Bug 8925 - Winbind goes defunct, can't access group information
Product: Samba 3.6
Classification: Unclassified
Component: Winbind
Hardware: x64 Linux
Importance: P5 regression
Target Milestone: ---
Assigned To: Michael Adam
QA Contact: Samba QA Contact
Reported: 2012-05-09 22:19 UTC by Eric Dana
Modified: 2015-02-06 14:51 UTC

Samba log files using default message level (370.00 KB, application/x-tar)
2012-05-09 22:19 UTC, Eric Dana
Debug level 3, winbindd debug level 10 log files (490.00 KB, application/x-tar)
2012-05-10 16:50 UTC, Eric Dana
log level = 10 all:10 (1.59 MB, text/plain)
2015-02-06 14:51 UTC, Piviul

Description Eric Dana 2012-05-09 22:19:31 UTC
Created attachment 7541 [details]
Samba log files using default message level

Upon booting Fedora 16 (kernel 3.3.4), winbind is defunct. Logging in via X times out. When booting into runlevel 3, logging in takes ~20 seconds after the password is typed. sudo su - hangs after the password is entered, and the only way out is Ctrl-C.

This machine is configured as a domain member using domain security.
Comment 1 Eric Dana 2012-05-09 22:21:47 UTC
This machine is being used as a department level server. The domain controller is using Windows 2003 R2.
Comment 2 Guenther Deschner 2012-05-10 07:46:20 UTC
Looks like there is a problem with your caching tdb:

[2012/05/09 17:58:46.521120,  0] winbindd/winbindd_cache.c:4037(cache_traverse_validate_fn)
  cache_traverse_validate_fn: key length too large: (1031) > (1024)

Can you try moving winbind_cache.tdb to another location and retry?
Comment 3 Eric Dana 2012-05-10 16:50:20 UTC
Created attachment 7549 [details]
Debug level 3, winbindd debug level 10 log files

The winbindd cache .tdb file was moved and the system was rebooted. The long keys error is now gone, but still no joy. The winbindd process is still defunct.
Comment 4 Eric Dana 2012-05-10 17:18:14 UTC
I re-ran the test with winbind really set to log level 10 and no other changes. Now it connects properly. Reset the log level back to defaults, rebooted, and the system is working OK. You can mark the bug closed and I will monitor and re-open if needed.
Comment 5 Karolin Seeger 2012-05-19 19:39:57 UTC
Closing out bug report as requested by the reporter.
Comment 6 Michael Letzgus 2012-07-30 20:20:24 UTC
Similar problem here, 3.6.5, FreeBSD:

cache_traverse_validate_fn: key length too large: (1029) > (1024)

In the tdb dump I've found this entry:

key(1029) = "NDR/AD/2/\..."

So there really is a long key.

What to do now?

I have the whole tdb file and a maximum level debug log...
Comment 7 Michael Letzgus 2012-07-30 20:34:48 UTC
After removing the cache.tdb there is no such long entry, even after enumerating/listing all groups (~2000) and users (~30000).

In both cases (database with and without the long key) "smbcontrol winbindd validate-cache" says: Winbindd cache is valid.

How to debug this now...?
Comment 8 Michael Letzgus 2012-07-30 20:45:59 UTC
In winbindd_cache.c there is some code called a "paranoia check".

If the key starts with "UA/", then max_key_len (1024 by default) is raised to 1024 * 1024.

What are UA keys and why are they allowed to be longer than my NDR key?

Maybe we should patch it like this?

if (strncmp("UA/", (const char *)kbuf.dptr, 3) == 0 ||
    strncmp("NDR/", (const char *)kbuf.dptr, 4) == 0) {
        max_key_len = 1024 * 1024;
}
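
To make the effect of that change easier to see, here is a minimal standalone sketch of the length check as it is described in this comment. It is not the Samba source. The names key_has_prefix, max_key_len_for and key_len_ok, and the allow_long_ndr switch, are made up for illustration; only the "UA/" prefix and the 1024 and 1024 * 1024 limits come from the discussion above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Limits as described in this report: 1024 bytes by default,
 * 1024 * 1024 bytes for keys that start with "UA/". */
#define DEFAULT_MAX_KEY_LEN ((size_t)1024)
#define LARGE_MAX_KEY_LEN   ((size_t)1024 * 1024)

/* Hypothetical helper: true if the key is at least as long as the
 * prefix and its leading bytes match the prefix exactly. */
bool key_has_prefix(const unsigned char *key, size_t key_len,
                    const char *prefix)
{
    size_t plen = strlen(prefix);
    return key_len >= plen && memcmp(key, prefix, plen) == 0;
}

/* Hypothetical helper: the maximum length allowed for this key.
 * allow_long_ndr models the change proposed in this comment; it is
 * an assumption for illustration, not upstream behaviour. */
size_t max_key_len_for(const unsigned char *key, size_t key_len,
                       bool allow_long_ndr)
{
    if (key_has_prefix(key, key_len, "UA/")) {
        return LARGE_MAX_KEY_LEN;
    }
    if (allow_long_ndr && key_has_prefix(key, key_len, "NDR/")) {
        return LARGE_MAX_KEY_LEN;
    }
    return DEFAULT_MAX_KEY_LEN;
}

/* A key passes when its length does not exceed the limit for its
 * prefix; a failure corresponds to the "key length too large" log. */
bool key_len_ok(const unsigned char *key, size_t key_len,
                bool allow_long_ndr)
{
    return key_len <= max_key_len_for(key, key_len, allow_long_ndr);
}
```

With a 1029-byte key starting with "NDR/AD/2/", key_len_ok() fails when allow_long_ndr is false and passes when it is true. That is the only difference the proposed patch would make to this check.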
Comment 9 Michael Letzgus 2013-11-21 09:48:51 UTC
Any ideas here?
Log files (3.6.20) still contain a lot of

cache_traverse_validate_fn: key length too large: (4665) > (1024)

Comment 10 Piviul 2015-02-06 14:51:27 UTC
Created attachment 10707 [details]
log level = 10 all:10

Full log from when a user receives an access denied error even when browsing a folder that they have permission to access.