Bug 4294 - spnego/kerberos authentication fails vs win2k3 AD server
Status: RESOLVED DUPLICATE of bug 4400
Product: Samba 3.0
Classification: Unclassified
Component: libsmbclient
Version: 3.0.23d
Hardware: x86 Linux
Importance: P3 normal
Target Milestone: none
Assigned To: Derrell Lipman
QA Contact: Samba QA Contact
Reported: 2006-12-11 08:28 UTC by Vince Negri
Modified: 2007-02-16 04:22 UTC

Attachments
tcpdump captured with tcpdump -s 0 -w dump.dmp (19.97 KB, application/octet-stream)
2007-02-05 02:12 UTC, Vince Negri

Description Vince Negri 2006-12-11 08:28:54 UTC
A while back I set up a Linux box (SUSE 9.2) to authenticate (using kerberos) against a w2k3
AD domain. A nice side effect of this was that I could use "smbclient -k" and save typing in
my password again.

The other day, I found that "smbclient -k" no longer worked. Basic kerberos login was
still fine (i.e. kinit worked, and PAM kerberos integration was still good).

Investigating further, I went over to a fresh SuSE 10.1 installation and upgraded it to
the latest Samba release (3.0.23d). I then followed the steps in the main HOWTO. 
Still no dice - this is what happens:

xx@xxx:~/xxxxx> smbclient -k -d 4  //asl4/xxxxx
lp_load: refreshing parameters
Initialising global parameters
params.c:pm_process() - Processing configuration file "/etc/samba/smb.conf"
Processing section "[global]"
doing parameter workgroup = ASL-LAN
doing parameter printing = cups
doing parameter printcap name = cups
doing parameter printcap cache time = 750
doing parameter cups options = raw
doing parameter map to guest = Bad User
doing parameter include = /etc/samba/dhcp.conf
params.c:pm_process() - Processing configuration file "/etc/samba/dhcp.conf"
doing parameter wins server = eth0:192.168.102.12 eth0:192.168.202.5
doing parameter logon path = \\%L\profiles\.msprofile
doing parameter logon home = \\%L\%U\.9xprofile
doing parameter logon drive = P:
doing parameter usershare allow guests = Yes
doing parameter client use spnego = yes
doing parameter password server = asl4.asl.lan
doing parameter realm = ASL.LAN
doing parameter security = ADS
pm_process() returned Yes
added interface ip=192.168.102.91 bcast=192.168.102.255 nmask=255.255.255.0
Client started (version 3.0.23d-5.1.39-1084-SUSE-CODE10).
resolve_lmhosts: Attempting lmhosts lookup for name asl4<0x20>
getlmhostsent: lmhost entry: 127.0.0.1 localhost
resolve_wins: Attempting wins lookup for name asl4<0x20>
wins_srv_is_dead: 192.168.102.12 is alive
wins_srv_is_dead: 192.168.102.12 is alive
resolve_wins: using WINS server 192.168.102.12 and tag 'eth0'
nmb packet from 192.168.102.12(137) header: id=18191 opcode=Query(0) response=Yes
    header: flags: bcast=No rec_avail=Yes rec_des=Yes trunc=No auth=Yes
    header: rcode=0 qdcount=0 ancount=1 nscount=0 arcount=0
    answers: nmb_name=ASL4<20> rr_type=32 rr_class=1 ttl=0
    answers   0 char `...f.   hex 6000C0A8660C
Got a positive name query response from 192.168.102.12 ( 192.168.102.12 )
Connecting to 192.168.102.12 at port 445
 session request ok
Doing spnego session setup (blob length=101)
got OID=1 2 840 48018 1 2 2
got OID=1 2 840 113554 1 2 2
got OID=1 2 840 113554 1 2 2 3
got OID=1 3 6 1 4 1 311 2 2 10
got principal=asl4$@ASL.LAN
Doing kerberos session setup
ads_cleanup_expired_creds: Ticket in ccache[FILE:/tmp/krb5cc_1001] expiration Mon, 11 Dec 2006 21:17:50 GMT
read_socket_with_timeout: timeout read. read error = Connection reset by peer.
SPNEGO login failed: NT_STATUS_INVALID_NETWORK_RESPONSE
session setup failed: Read error: Connection reset by peer

In essence, the server "asl4" (which is the w2k3 server) appears to close the connection and kick me off.

However, it has granted me a ticket - as shown by klist:

Ticket cache: FILE:/tmp/krb5cc_1001
Default principal: xx@ASL.LAN

Valid starting     Expires            Service principal
12/11/06 11:19:15  12/11/06 21:17:50  krbtgt/ASL.LAN@ASL.LAN
        renew until 12/12/06 11:19:15
12/11/06 11:19:08  12/11/06 21:17:50  asl4$@ASL.LAN
        renew until 12/12/06 11:19:15


Using smbclient in the traditional way (supplying a username and password) works perfectly.
I assume that some recent win2k3 patch or update has changed things, because I used to
have a working system - but I haven't seen anyone else posting a similar problem.

Attempting to add the machine to the domain with "net ads join" also fails with the same symptoms - the server closes the connection just after "Doing kerberos session setup".

I'm very happy to run further tests, gather more information, etc. - just need a pointer as to
where to look next!
Comment 1 Vince Negri 2006-12-12 05:59:32 UTC
Some more information:

Running smbclient with a higher debug level yields the following:

Got KRB5 session key of length 16
Mandatory SMB signing enabled!
SMB signing enabled!
cli_simple_set_signing: user_session_key
[000] C6 33 40 99 6C 5C 58 95  B5 E9 80 F6 27 17 D1 B0  .3@.l\X. ....'...
cli_simple_set_signing: NULL response_data
simple_packet_signature: sequence number 0
client_sign_outgoing_message: sent SMB signature of
[000] 4C 4C F6 1C 70 FA 84 92                           LL..p...
store_sequence_for_reply: stored seq = 1 mid = 2
write_socket(6,16958)
write_socket(6,16958) wrote 16958
read_socket_with_timeout: timeout read. read error = Connection reset by peer.
receive_smb_raw: length < 0!
client_receive_smb failed


It strikes me as unusual that the call to write_socket is writing 16958 bytes of data. Judging by other log files I found by Googling, this write is usually about 1/10th the size. So we have a very large write on the socket, followed by the windows server closing the connection. Perhaps there is a link?
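
To check a debug log for unusually large writes like the one above, something along these lines could be used. This is only a sketch: the function name and the `limit` threshold are hypothetical, and the threshold is picked by the caller rather than read from the server's negotiated values.

```python
import re

# Matches completed writes in smbclient debug output, e.g.
# "write_socket(6,16958) wrote 16958".
WRITE_RE = re.compile(r"write_socket\((\d+),(\d+)\) wrote (\d+)")


def large_writes(log_text, limit):
    """Return the sizes of all completed writes larger than `limit` bytes.

    `limit` is a hypothetical threshold chosen by the caller; it is not
    taken from the log or from the server.
    """
    sizes = [int(m.group(3)) for m in WRITE_RE.finditer(log_text)]
    return [size for size in sizes if size > limit]


# Sample lines copied from the log excerpt in this comment.
sample = """\
write_socket(6,16958)
write_socket(6,16958) wrote 16958
read_socket_with_timeout: timeout read. read error = Connection reset by peer.
"""

flagged = large_writes(sample, limit=4096)
print(flagged)
```

Only the "wrote" lines are counted, so the preceding "write_socket(6,16958)" line is not reported twice.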
Comment 2 Vince Negri 2006-12-13 09:23:58 UTC
An additional consequence of this situation is that the libsmbclient setting to "fallback to NTLM if kerberos fails" doesn't work, since the failure of the krb authentication causes the connection to fail, and the library code assumes (not unreasonably) that the TCP connection is still up if krb5 authentication hasn't succeeded.
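
A fallback that survives a dropped connection would have to reconnect before trying the next method. The following is a minimal sketch of that idea only; every name in it (`ConnectionLost`, `login_with_fallback`, the fake helpers) is hypothetical and none of it is libsmbclient code.

```python
class ConnectionLost(Exception):
    """Raised when the peer drops the transport during authentication."""


def login_with_fallback(connect, auth_methods):
    """Try each auth method in order, reconnecting if the transport died.

    `connect` returns a fresh connection object. Each entry of
    `auth_methods` is a (name, callable) pair; the callable takes a
    connection and returns True on success, False on a clean rejection,
    or raises ConnectionLost. All of these names are hypothetical.
    """
    conn = connect()
    for name, authenticate in auth_methods:
        try:
            if authenticate(conn):
                return name, conn
        except ConnectionLost:
            # The old socket is unusable; the next method needs a new one.
            conn = connect()
    return None, conn


# Simulated transport: kerberos kills the connection, NTLM then succeeds
# only because it is handed a fresh connection.
connections = []


def fake_connect():
    connections.append(object())
    return connections[-1]


def fake_kerberos(conn):
    raise ConnectionLost("connection reset by peer")


def fake_ntlm(conn):
    return conn is not connections[0]


method, _ = login_with_fallback(
    fake_connect, [("kerberos", fake_kerberos), ("ntlm", fake_ntlm)]
)
print(method)
```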

This has the knock-on effect of breaking the smb:// kio-slave in recent KDEs.
Comment 3 Vince Negri 2007-01-27 04:33:47 UTC
We have found (I think) the underlying issue.

My user account on the AD server is a member of a large number of groups. This makes the token very large, and it gets fragmented (I suspected something like this in my comment #1).
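
For a feel of how group membership drives token size, here is a rough sketch based on the rule of thumb from Microsoft's MaxTokenSize guidance as I recall it (1200 bytes of overhead, 40 bytes per domain-local or foreign universal group, 8 bytes per global or in-domain universal group). Treat the formula as an assumption, the group counts as made-up values, and the result as an estimate rather than the exact size on the wire.

```python
def estimate_token_size(domain_local_groups, global_groups):
    """Estimate a Kerberos token size in bytes from group counts.

    Assumed rule of thumb: 1200 + 40*d + 8*s, where d counts
    domain-local groups (plus universal groups from other domains and
    SID-history entries) and s counts global groups (plus universal
    groups in the account's own domain).
    """
    return 1200 + 40 * domain_local_groups + 8 * global_groups


# Made-up group counts, for illustration only.
print(estimate_token_size(10, 20))
print(estimate_token_size(300, 400))
```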

We worked this out because a similar issue surfaced server-side, which we were able to fix by setting "max xmit = 65535" in smb.conf. However, libsmbclient does not look at this setting.

Would it be possible to either (a) make libsmbclient honour "max xmit" or (b) create a new "client max xmit" option in smb.conf?

Comment 4 Derrell Lipman 2007-01-27 10:39:44 UTC
You are talking about using the smbclient tool, but then you reference libsmbclient.  The smbclient tool does not use libsmbclient.  Please confirm that you are seeing this issue with the smbclient tool and not with an application which links with libsmbclient.  If so, please change the "component" in this report to "Client Tools" (above) and "Reassign bug to default assignee" (below) so that this report is redirected to the correct people.
Comment 5 Vince Negri 2007-01-28 08:56:30 UTC
I am seeing the issue both with smbclient *and* libsmbclient (more precisely, the smb:// protocol in Konqueror which uses libsmbclient) and presume that they are both doing the same thing (fragmenting the outgoing packet.)

Should I split this into two bugs?
Comment 6 Derrell Lipman 2007-01-29 08:04:22 UTC
No, don't bother with a separate bug.  I have a few libsmbclient issues (now including this one) to address and expect to get to them soon (this week, if I'm lucky).  The smbclient tool issue will likely be handled by someone else, so when I'm finished with libsmbclient changes, I'll pass it off rather than closing the bug.
Comment 7 Derrell Lipman 2007-02-03 11:35:23 UTC
It looks like libsmbclient is already using a 128K buffer for reading, so setting "max xmit" to 64k, even if it were used by libsmbclient, would not solve the problem.

I suspect, however, that a different bug I just fixed may be responsible for this problem.  I see that the read_socket_with_timeout is returning a "connection reset by peer" error.  It is possible that this occurred due to libsmbclient improperly sending a netbios keepalive packet which causes the server to shut down the connection.  We know that Vista shuts down the connection upon receiving this packet.  Older versions appear to just ignore it.  I don't know what W2k3 does with it.

Please test latest svn and let me know if anything is different.  Unfortunately, I don't have an environment set up to be able to properly locate this problem. :-(

Derrell
Comment 8 Vince Negri 2007-02-04 03:44:05 UTC
(In reply to comment #7)
> It looks like libsmbclient is already using a 128K buffer for reading, so
> setting "max xmit" to 64k, even if it were used by libsmbclient, would not
> solve the problem.

Do you mean "128K buffer for _writing_"? The problem isn't the read buffer. I suspect it's the write buffer.

> 
> I suspect, however, that a different bug I just fixed may be responsible for
> this problem.  I see that the read_socket_with_timeout is returning a
> "connection reset by peer" error.  It is possible that this occurred due to
> libsmbclient improperly sending a netbios keepalive packet which causes the
> server to shut down the connection. 

I doubt it, because other users here who are members of fewer groups (and thus need to send a smaller token) don't experience the problem. Unless you only send the netbios keepalive packet after big writes?


Comment 9 Vince Negri 2007-02-04 04:56:55 UTC
Looking at the 3.0.24 SVN source code, I notice that the routine that sends the packet that results in the server disconnection is cli_session_setup_blob_send(). This routine, unlike some of the other cli_*_send routines (e.g. cli_list_new()), does not check against the cli->max_xmit value that was previously set up in the session negotiation. In other words, possibly the win2k3 server has already told us "don't send packets bigger than X" and we haven't obeyed the rules in this instance, because normally the packet sent by cli_session_setup_blob_send() is nowhere near the typical maximum xmit.
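
The kind of check described above could look roughly like the sketch below, which splits a blob so that no single packet exceeds the negotiated limit. This is not Samba code and not necessarily how the problem was eventually fixed: `split_blob`, `header_overhead`, and the values 4356 and 100 are assumptions for illustration, and the 16958-byte blob simply borrows the write size from the log as an example figure.

```python
def split_blob(blob, max_xmit, header_overhead):
    """Yield pieces of `blob` that each fit within one packet.

    `max_xmit` stands in for the limit the server announced during
    negotiation; `header_overhead` is a hypothetical allowance for the
    packet headers that accompany each piece.
    """
    room = max_xmit - header_overhead
    if room <= 0:
        raise ValueError("max_xmit leaves no room for payload")
    for offset in range(0, len(blob), room):
        yield blob[offset:offset + room]


# Illustrative values only: a 16958-byte blob and an assumed 4356-byte limit.
blob = bytes(16958)
pieces = list(split_blob(blob, max_xmit=4356, header_overhead=100))
print(len(pieces), [len(p) for p in pieces])
```

Joining the pieces in order gives back the original blob, so the receiver would only need to concatenate them.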

Comment 10 Derrell Lipman 2007-02-04 10:31:04 UTC
Would you please provide a packet capture of the problem with 

  tcpdump -s 0 -w capture.pcap

That should help isolate the source of the problem.
Comment 11 Vince Negri 2007-02-05 02:12:38 UTC
Created attachment 2267
tcpdump captured with tcpdump -s 0 -w dump.dmp

Here you are. This was taken while trying an "smbclient -k" (with a valid kerberos ticket)
Comment 12 Derrell Lipman 2007-02-06 22:05:24 UTC
Jeremy, Jerry: I'm in over my head here.  Does the attached packet capture help to discover this problem?  If you can figure out what the problem is with smbclient, and it's something that needs to be set by the client software, I can then make a similar change in libsmbclient.

Thanks for your help.

Derrell
Comment 13 Vince Negri 2007-02-07 04:01:34 UTC
(In reply to comment #3)
> We have found (I think) the underlying issue.
> 
> My user account on the AD server is a member of a large number of groups. 

Just to confirm - I have checked with other people logging into the same system, and this is now confirmed: 

If the user is a member of a large number of groups on the AD server, kerberos authentication fails for both smbclient -k and KDE's smb:// KIO slave.

If the user is a member of a "normal" number of groups, then both smbclient -k and smb:// work perfectly.

Comment 14 Jeremy Allison 2007-02-07 13:05:16 UTC
What does a normal kinit return on your box ? Can this get a tgt from the AD server ?
Comment 15 Vince Negri 2007-02-07 13:27:14 UTC
(In reply to comment #14)
> What does a normal kinit return on your box ? Can this get a tgt from the AD
> server ?
> 

Yes, kinit succeeds without a problem (see original bug description.) Having run kinit, "smbclient -k" *used to work* for me until (and I have now ascertained that this is the only thing that changed) my AD account gathered more group memberships.

The closest I've got to probing this myself is in my comment #9.



Comment 16 Guenther Deschner 2007-02-16 04:22:35 UTC
The cause of this has been identified; the issue will get addressed in #4400.

*** This bug has been marked as a duplicate of 4400 ***