Bug 6646 - Winbind authentication issue on 3.2.13/14 and 3.4.0 (was: [Samba] Crazied NTLM_AUTH on samba 3.4.0)
Status: RESOLVED FIXED
Alias: None
Product: Samba 3.4
Classification: Unclassified
Component: Winbind
Version: 3.4.0
Hardware: x64 Linux
Importance: P3 regression
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
URL: http://www.nanogherkin.com/winbindd_a...
Reported: 2009-08-19 04:20 UTC by (dead mail address)
Modified: 2010-03-02 06:01 UTC

Attachments
Fix from Volker for 3-4-test (1015 bytes, patch)
2009-09-09 07:01 UTC, Guenther Deschner
gd: review+
Same patch for 3.3 (1.05 KB, patch)
2009-09-09 07:33 UTC, Guenther Deschner

Description (dead mail address) 2009-08-19 04:20:27 UTC
I am in a Samba domain (PDC and BDC also running 3.2.14) and I have a
bidirectional trust set up to a remote Samba 3.2.14 domain.

ntlm_auth will fail after a period of time with 
error messsage was: NT code 0x1c010002

There were two instances of the issue, one shortly before 08:30 and the
other shortly before 09:24.

wbinfo authentication will also fail:

wbinfo -a ajc%xxxxxxxx
plaintext password authentication failed
Could not authenticate user ajc with plaintext password
challenge/response password authentication failed
error code was NT code 0x1c010002 (0x1c010002)
error messsage was: NT code 0x1c010002
Could not authenticate user ajc with challenge/response


I can also tell you that it can be immediately (if temporarily) restored
to operation by running "wbinfo -t". I am trying to keep my users happy
by running this every few seconds but obviously this isn't ideal!
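
For anyone needing the same stop-gap, the periodic re-check can be scripted roughly as below. This is only a sketch: the CHECK_CMD and INTERVAL variables and the function names are made up for illustration, the 5 second default just mirrors "every few seconds", and it assumes the check command exits non-zero on failure. It works around the symptom and does not fix anything.

```shell
# Stop-gap only: periodically re-run the trust check that temporarily
# restores authentication. CHECK_CMD and INTERVAL are illustrative knobs,
# not Samba settings.
CHECK_CMD="${CHECK_CMD:-wbinfo -t}"
INTERVAL="${INTERVAL:-5}"

# Run the check once. Assumes the command exits non-zero on failure;
# failures are logged to stderr.
check_once() {
    if $CHECK_CMD >/dev/null 2>&1; then
        return 0
    fi
    echo "trust check failed: $CHECK_CMD" >&2
    return 1
}

# Run the check COUNT times, sleeping INTERVAL seconds between runs.
# COUNT=0 (the default) means loop until interrupted.
keepalive() {
    count="${1:-0}"
    i=0
    while [ "$count" -eq 0 ] || [ "$i" -lt "$count" ]; do
        check_once || true
        i=$((i + 1))
        if [ "$count" -ne 0 ] && [ "$i" -ge "$count" ]; then
            break
        fi
        sleep "$INTERVAL"
    done
}
```

Calling "keepalive 0" loops until interrupted; "keepalive 3" runs three checks and returns.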

smb.conf on the Squid server follows:

[global]
workgroup = IFA_NET
security = DOMAIN
netbios name = WEBPROXY
interfaces = eth2, lo
bind interfaces only = Yes
passdb backend = ldapsam:ldaps://bdc.ifa.net
username map = /etc/samba/smbusers
log level = 10
syslog = 0
log file = /var/log/samba/%m
max log size = 1048576
smb ports = 139 445
name resolve order = wins lmhosts bcast hosts
time server = no
#printcap name = CUPS
show add printer wizard = Yes
enable privileges = yes
ldap suffix = dc=ifa,dc=net
ldap machine suffix = ou=Computers
ldap user suffix = ou=People
ldap group suffix = ou=Groups
ldap idmap suffix = ou=Idmap
ldap admin dn = cn=Manager,dc=ifa,dc=net
ldap ssl = no
ldap timeout = 20
#idmap backend = ldap:ldap://192.168.20.137
idmap uid = 10000-20000
idmap gid = 10000-20000
#winbind nested groups = yes
winbind trusted domains only = no
winbind use default domain = yes
#winbind enum users = yes
#winbind enum groups = yes
allow trusted domains = yes
#winbind separator = +
map acl inherit = Yes
ea support = Yes
#printing = cups
#printer admin = root
wins server = 192.168.20.137
nt acl support = yes
Comment 1 Michael Adam 2009-08-19 17:53:30 UTC
acrow: can you send logs (log.winbindd, log.wb-*, ...) and possibly network traces of such a failing authentication?

Günther: seen anything like this yet?

Cheers - Michael
Comment 2 (dead mail address) 2009-08-20 01:36:31 UTC
The URL field contains a winbindd log. The log.wb- and log.winbindd files did not contain anything that remotely coincided with the failures.
Comment 3 Daniel Sheridan 2009-08-27 05:14:55 UTC
I think that I'm seeing this bug too. Apparently coinciding with the auth failures I have

[2009/08/27 11:10:17,  1] winbindd/winbindd_util.c:303(trustdom_recv)
  Could not receive trustdoms

in log.winbind, and 

[2009/08/27 11:10:17,  0] libsmb/ntlmssp_sign.c:208(ntlmssp_check_packet)
  NTLMSSP NTLM2 packet check failed due to invalid signature!
[2009/08/27 11:10:17,  0] rpc_client/cli_pipe.c:620(cli_pipe_verify_ntlmssp)
  cli_pipe_verify_ntlmssp: failed to unseal packet from host BAILEY. Error was NT_STATUS_ACCESS_DENIED.

in log.wb-ADELARD (bailey is the PDC for the Adelard domain; PDC and members are running samba 3.4.0). 

As for the OP, running wbinfo -t is sufficient to fix the problem for me.
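
To check whether such entries really line up with the authentication failures, the timestamps can be pulled out of the logs with something like the following. A rough sketch: log_hits is a made-up helper name, and it assumes the two-line entry layout shown above (a bracketed header line followed by an indented message line).

```shell
# Print "timestamp: message" for every log message matching PATTERN.
# Usage: log_hits PATTERN FILE...
# Assumes each entry is a "[date time, level] file:line(func)" header
# followed by one or more indented message lines.
log_hits() {
    pattern="$1"
    shift
    awk -v pat="$pattern" '
        /^\[[0-9]+\/[0-9]+\/[0-9]+ / {
            # Header line: keep "date time", dropping "[" and the comma.
            stamp = substr($1, 2) " " $2
            sub(/,$/, "", stamp)
            next
        }
        $0 ~ pat {
            # Message line: strip the leading indentation before printing.
            msg = $0
            sub(/^[ \t]+/, "", msg)
            print stamp ": " msg
        }
    ' "$@"
}
```

For example, "log_hits 'invalid signature' log.wb-ADELARD" lists when the signature check failed, which can then be compared with the times at which wbinfo -a started failing.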
Comment 4 Daniel Sheridan 2009-08-27 05:28:55 UTC
Also, maybe coincidence, but the time from running wbinfo -t to the next failure is 5 minutes -- the winbind cache time.
Comment 5 (dead mail address) 2009-08-27 05:37:46 UTC
(In reply to comment #4)
> Also, maybe coincidence, but the time from running wbinfo -t to the next
> failure is 5 minutes -- the winbind cache time.
> 

My trusted domain (which also uses the squid server on the main domain) doesn't follow this pattern - ntlm will fail every few seconds for them!

I am going to get some packet captures and upload them to my web server.
Comment 6 (dead mail address) 2009-08-27 05:51:17 UTC
A capture has been uploaded to:

http://www.nanogherkin.com/wbcap.tcpdump

host .135 is the proxy server, host .137 is the DC for IFA_NET.

The first few seconds are me browsing from my trusted domain via the proxy server.
Towards the end I run a loop of wbinfo -a against users in both domains (for some reason just authing a user on IFA_NET won't get it upset now, but a few goes on INTEGRALIFE_NET will set it off). I see a lot of this in the cap for DCERPC:

Status: nca_op_rng_error (0x1c010002)

Cheers

Alex
Comment 7 (dead mail address) 2009-08-27 07:36:15 UTC
I forgot to include this message in my original post:

http://www.mail-archive.com/samba@lists.samba.org/msg101686.html

So it looks like it's in 3.4.0 as well, possibly everything from 3.2.x-3.4.x.

Alex
Comment 8 Guenther Deschner 2009-09-03 18:38:34 UTC
Tried hard today to reproduce it in a 3.2.14 PDC/member and a 3.4.0 PDC/member setup, but could not see it yet. I will set up trusted domains tomorrow (it seems to show up only there) and retry with those included. This might well be a blocker for 3.4.1.
Comment 9 Guenther Deschner 2009-09-08 19:41:42 UTC
Ok, have that one reproduced as well (finally).
Comment 10 Guenther Deschner 2009-09-08 20:31:54 UTC
This is what I just saw while running an idle winbind under valgrind; it happened right after an RPC refresh sequence number call:

==3413== Invalid read of size 4                                                    
==3413==    at 0x821141A: validate_smb_crypto (async_smb.c:818)                    
==3413==    by 0x8211AA3: handle_incoming_pdu (async_smb.c:926)                    
==3413==    by 0x8212393: cli_state_handler (async_smb.c:1100)                     
==3413==    by 0x81BCB0E: run_events (events.c:126)                                
==3413==    by 0x81BCDEB: s3_event_loop_once (events.c:185)                        
==3413==    by 0x81BDD1E: _tevent_loop_once (tevent.c:478)                         
==3413==    by 0x82EB7CD: rpc_api_pipe_req (cli_pipe.c:2323)                       
==3413==    by 0x85DE1CC: cli_do_rpc_ndr (ndr.c:62)                                
==3413==    by 0x8310C1B: rpccli_samr_QueryDomainInfo (cli_samr.c:362)             
==3413==    by 0x80E1BB4: sequence_number (winbindd_rpc.c:1005)                    
==3413==    by 0x80E2E8E: sequence_number (winbindd_reconnect.c:239)               
==3413==    by 0x80B8530: refresh_sequence_number (winbindd_cache.c:510)           
==3413==    by 0x80B8CC6: wcache_fetch (winbindd_cache.c:638)                      
==3413==    by 0x80BEB55: trusted_domains (winbindd_cache.c:2260)                  
==3413==    by 0x80D40DF: winbindd_dual_list_trusted_domains (winbindd_misc.c:361) 
==3413==    by 0x80EC142: child_process_request (winbindd_dual.c:453)              
==3413==    by 0x80EFDE6: fork_domain_child (winbindd_dual.c:1456)                 
==3413==    by 0x80EBBC1: schedule_async_request (winbindd_dual.c:314)             
==3413==    by 0x80EB35B: async_request (winbindd_dual.c:145)                      
==3413==    by 0x80B4460: init_child_connection (winbindd_util.c:627)              
==3413==  Address 0x493adcc is 0 bytes after a block of size 52 alloc'd            
==3413==    at 0x4006F3D: malloc (vg_replace_malloc.c:207)                         
==3413==    by 0x406A64A: __talloc (talloc.c:338)                                  
==3413==    by 0x406A8FD: _talloc_named_const (talloc.c:449)                       
==3413==    by 0x406C0A6: _talloc_memdup (talloc.c:1345)                           
==3413==    by 0x82119AF: handle_incoming_pdu (async_smb.c:895)                    
==3413==    by 0x8212393: cli_state_handler (async_smb.c:1100)                     
==3413==    by 0x81BCB0E: run_events (events.c:126)                                
==3413==    by 0x81BCDEB: s3_event_loop_once (events.c:185)                        
==3413==    by 0x81BDD1E: _tevent_loop_once (tevent.c:478)                         
==3413==    by 0x82EB7CD: rpc_api_pipe_req (cli_pipe.c:2323)                       
==3413==    by 0x85DE1CC: cli_do_rpc_ndr (ndr.c:62)                                
==3413==    by 0x8310C1B: rpccli_samr_QueryDomainInfo (cli_samr.c:362)             
==3413==    by 0x80E1BB4: sequence_number (winbindd_rpc.c:1005)                    
==3413==    by 0x80E2E8E: sequence_number (winbindd_reconnect.c:239)               
==3413==    by 0x80B8530: refresh_sequence_number (winbindd_cache.c:510)           
==3413==    by 0x80B8CC6: wcache_fetch (winbindd_cache.c:638)                      
==3413==    by 0x80BEB55: trusted_domains (winbindd_cache.c:2260)                  
==3413==    by 0x80D40DF: winbindd_dual_list_trusted_domains (winbindd_misc.c:361) 
==3413==    by 0x80EC142: child_process_request (winbindd_dual.c:453)              
==3413==    by 0x80EFDE6: fork_domain_child (winbindd_dual.c:1456)                 
==3413==                                                                           
==3413== Invalid read of size 2                                                    
==3413==    at 0x821142D: validate_smb_crypto (async_smb.c:819)                    
==3413==    by 0x8211AA3: handle_incoming_pdu (async_smb.c:926)                    
==3413==    by 0x8212393: cli_state_handler (async_smb.c:1100)                     
==3413==    by 0x81BCB0E: run_events (events.c:126)                                
==3413==    by 0x81BCDEB: s3_event_loop_once (events.c:185)                        
==3413==    by 0x81BDD1E: _tevent_loop_once (tevent.c:478)                         
==3413==    by 0x82EB7CD: rpc_api_pipe_req (cli_pipe.c:2323)                       
==3413==    by 0x85DE1CC: cli_do_rpc_ndr (ndr.c:62)                                
==3413==    by 0x8310C1B: rpccli_samr_QueryDomainInfo (cli_samr.c:362)             
==3413==    by 0x80E1BB4: sequence_number (winbindd_rpc.c:1005)                    
==3413==    by 0x80E2E8E: sequence_number (winbindd_reconnect.c:239)               
==3413==    by 0x80B8530: refresh_sequence_number (winbindd_cache.c:510)           
==3413==    by 0x80B8CC6: wcache_fetch (winbindd_cache.c:638)                      
==3413==    by 0x80BEB55: trusted_domains (winbindd_cache.c:2260)                  
==3413==    by 0x80D40DF: winbindd_dual_list_trusted_domains (winbindd_misc.c:361) 
==3413==    by 0x80EC142: child_process_request (winbindd_dual.c:453)              
==3413==    by 0x80EFDE6: fork_domain_child (winbindd_dual.c:1456)                 
==3413==    by 0x80EBBC1: schedule_async_request (winbindd_dual.c:314)             
==3413==    by 0x80EB35B: async_request (winbindd_dual.c:145)                      
==3413==    by 0x80B4460: init_child_connection (winbindd_util.c:627)              
==3413==  Address 0x493adcc is 0 bytes after a block of size 52 alloc'd            
==3413==    at 0x4006F3D: malloc (vg_replace_malloc.c:207)                         
==3413==    by 0x406A64A: __talloc (talloc.c:338)                                  
==3413==    by 0x406A8FD: _talloc_named_const (talloc.c:449)                       
==3413==    by 0x406C0A6: _talloc_memdup (talloc.c:1345)                           
==3413==    by 0x82119AF: handle_incoming_pdu (async_smb.c:895)                    
==3413==    by 0x8212393: cli_state_handler (async_smb.c:1100)                     
==3413==    by 0x81BCB0E: run_events (events.c:126)                                
==3413==    by 0x81BCDEB: s3_event_loop_once (events.c:185)                        
==3413==    by 0x81BDD1E: _tevent_loop_once (tevent.c:478)                         
==3413==    by 0x82EB7CD: rpc_api_pipe_req (cli_pipe.c:2323)                       
==3413==    by 0x85DE1CC: cli_do_rpc_ndr (ndr.c:62)                                
==3413==    by 0x8310C1B: rpccli_samr_QueryDomainInfo (cli_samr.c:362)             
==3413==    by 0x80E1BB4: sequence_number (winbindd_rpc.c:1005)                    
==3413==    by 0x80E2E8E: sequence_number (winbindd_reconnect.c:239)               
==3413==    by 0x80B8530: refresh_sequence_number (winbindd_cache.c:510)           
==3413==    by 0x80B8CC6: wcache_fetch (winbindd_cache.c:638)                      
==3413==    by 0x80BEB55: trusted_domains (winbindd_cache.c:2260)                  
==3413==    by 0x80D40DF: winbindd_dual_list_trusted_domains (winbindd_misc.c:361) 
==3413==    by 0x80EC142: child_process_request (winbindd_dual.c:453)              
==3413==    by 0x80EFDE6: fork_domain_child (winbindd_dual.c:1456)                 
Got non-SMB PDU                                                                    
handle_incoming_pdu: Aborting with NT_STATUS_INVALID_NETWORK_RESPONSE              
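
Both reports above point at the same place: validate_smb_crypto reading just past the end of the 52-byte block that handle_incoming_pdu allocated via talloc_memdup. For longer valgrind logs, a small filter along these lines can pull out just the faulting frames. This is a sketch only; invalid_access_sites is a made-up helper name and it assumes the Memcheck layout shown above.

```shell
# List the top stack frame of each invalid access in a valgrind log.
# Reads the files given as arguments, or stdin if none are given.
invalid_access_sites() {
    awk '
        / Invalid (read|write) of size / { pending = 1; next }
        pending && $2 == "at" {
            # $4 is the function name, $5 the "(file:line)" location.
            print $4, $5
            pending = 0
        }
    ' "$@"
}
```

Fed the trace above, it should print validate_smb_crypto (async_smb.c:818) and validate_smb_crypto (async_smb.c:819); the malloc frames under "Address ... alloc'd" are skipped because they do not directly follow an "Invalid read" line.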
Comment 11 Jeremy Allison 2009-09-08 23:50:34 UTC
valgrind errors make this a blocker for 3.4.1 IMHO. I'll take a look at this tomorrow (unless you've already fixed it :-). Thanks.
Jeremy.
Comment 12 Guenther Deschner 2009-09-09 07:01:04 UTC
Created attachment 4667
Fix from Volker for 3-4-test
Comment 13 Guenther Deschner 2009-09-09 07:02:37 UTC
With this patch I can no longer reproduce these issues.

Karolin, please pick for 3.4.1
Comment 14 Karolin Seeger 2009-09-09 07:21:56 UTC
Pushed to v3-4-test, will be included in 3.4.1.

Re-assigning to Günther, as patch does not apply to v3-3-test.
Comment 15 Guenther Deschner 2009-09-09 07:33:07 UTC
Created attachment 4668
Same patch for 3.3
Comment 16 Karolin Seeger 2009-09-09 07:35:34 UTC
Thanks, Günther!
Pushed.
Closing out bug report.

Thanks!
Comment 17 (dead mail address) 2009-09-09 07:41:07 UTC
Thanks a lot!

Is there any chance of this getting into a new release of the 3.2.x series? A lot of people (including us) are still running it.

Regards

Alex

Comment 18 Karolin Seeger 2009-09-09 07:46:36 UTC
(In reply to comment #17)
I am sorry, but there will only be security fixes for the 3.2 branch.
I think some vendors will provide 3.2 packages including this patch (and others).
Comment 19 (dead mail address) 2009-09-09 08:03:52 UTC
(In reply to comment #18)
OK, thanks anyway - I'm sure I could probably patch it myself anyway.
Comment 20 Guenther Deschner 2009-09-09 08:15:22 UTC
(In reply to comment #19)
Yeah, just checked: the patch from comment #15 applies cleanly on top of the v3-2-test git branch. As Karolin said, the official policy is that the 3.2 series is out of maintenance and there are no bugfix releases planned for it.

You might want to bug your distributor to ship the fixes that you need.
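
If you do want to carry the patch yourself in the meantime, a cautious way to try it on a local checkout might look like this. It is a sketch under assumptions: apply_if_clean is a made-up helper, and the patch file name in the usage note is whatever you saved the attachment as, not a name from this bug.

```shell
# Dry-run a patch against the current working tree and apply it only
# if the dry run succeeds. Run from the top of the source checkout.
apply_if_clean() {
    patch_file="$1"
    if git apply --check "$patch_file"; then
        git apply "$patch_file"
    else
        echo "patch does not apply cleanly: $patch_file" >&2
        return 1
    fi
}
```

For example, after checking out v3-2-test: "apply_if_clean winbind-fix.patch" (hypothetical file name). The --check pass leaves the tree untouched if the patch does not fit, so a failed attempt needs no cleanup.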
Comment 21 Konsta Saarelma 2010-03-02 06:01:59 UTC
I don't know how to fix this. Can anyone help me with this? I would appreciate it very much.