Bug 11327 - cli_rpc_pipe_open_schannel_with_key: rpc_pipe_bind failed with error NT_STATUS_BUFFER_TOO_SMALL
Summary: cli_rpc_pipe_open_schannel_with_key: rpc_pipe_bind failed with error NT_STATU...
Status: RESOLVED FIXED
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: Winbind (show other bugs)
Version: 4.2.2
Hardware: All All
: P5 normal (vote)
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-06-12 06:02 UTC by Marc Muehlfeld
Modified: 2015-10-28 09:46 UTC (History)
4 users (show)

See Also:


Attachments
log.wb-MUC (level 10 debug log) (1.53 MB, application/x-gzip)
2015-06-12 06:02 UTC, Marc Muehlfeld
no flags Details
network trace, messages and log.wb-DOMAIN (82.25 KB, application/octet-stream)
2015-06-16 06:59 UTC, Marc Muehlfeld
no flags Details
Packet capture, messages, log.wb-DOMAIN (110.43 KB, application/octet-stream)
2015-06-17 06:59 UTC, Marc Muehlfeld
no flags Details
new packet capture (39.33 KB, application/vnd.tcpdump.pcap)
2015-10-21 08:09 UTC, Marc Muehlfeld
no flags Details
Possible patch for master (3.49 KB, text/plain)
2015-10-21 13:24 UTC, Stefan Metzmacher
no flags Details
Patch for v4-3-test (4.05 KB, patch)
2015-10-22 11:55 UTC, Stefan Metzmacher
vl: review+
Details
Patches for v4-2-test (4.05 KB, patch)
2015-10-22 11:55 UTC, Stefan Metzmacher
vl: review+
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Marc Muehlfeld 2015-06-12 06:02:32 UTC
Created attachment 11148 [details]
log.wb-MUC (level 10 debug log)

Since we've upgraded our member servers from 4.1.17 to 4.2.2, the log files on these machines get flooded with the following errors (6000-7000 per day on each server):


[2015/06/12 06:17:31.230322,  3, pid=70786, effective(0, 0), real(0, 0), class=rpc_cli] ../source3/rpc_client/cli_pipe.c:1803(rpc_pipe_bind_step_one_done)
  rpc_pipe_bind: host allel.muc.medizinische-genetik.de bind request returned NT_STATUS_BUFFER_TOO_SMALL
[2015/06/12 06:17:31.230350,  0, pid=70786, effective(0, 0), real(0, 0), class=rpc_cli] ../source3/rpc_client/cli_pipe.c:3065(cli_rpc_pipe_open_schannel_with_key)
  cli_rpc_pipe_open_schannel_with_key: rpc_pipe_bind failed with error NT_STATUS_BUFFER_TOO_SMALL
...
[2015/06/12 06:17:31.230982,  3, pid=70786, effective(0, 0), real(0, 0), class=winbind] ../source3/winbindd/winbindd_cm.c:3015(cm_connect_netlogon)
  Could not open schannel'ed NETLOGON pipe. Error was NT_STATUS_BUFFER_TOO_SMALL
[2015/06/12 06:17:31.231091,  3, pid=70786, effective(0, 0), real(0, 0), class=winbind] ../source3/winbindd/winbindd_pam.c:1322(winbind_samlogon_retry_loop)
  Could not open handle to NETLOGON pipe (error: NT_STATUS_BUFFER_TOO_SMALL, attempts: 0)
[2015/06/12 06:17:31.231131,  3, pid=70786, effective(0, 0), real(0, 0), class=winbind] ../source3/winbindd/winbindd_pam.c:1352(winbind_samlogon_retry_loop)
  The connection to netlogon failed, retrying



Find attached a level 10 debug log
Comment 1 Marc Muehlfeld 2015-06-15 14:13:45 UTC
This bug has a serious side effect: Users can't login any more e. g. via ssh, when this error comes up (and it does often here).

Winbindd is configured in /etc/nsswitch.conf to retrieve the users from AD. SSH enabled users can login to the machine. But when this problem appears, ssh logins are denied and every try is logged in /var/log/messages with:

Jun 15 16:03:31 storage-03 sshd[70478]: error: Received disconnect from 10.1.0.254: 14: No supported authentication methods available [preauth]
Jun 15 16:03:32 storage-03 winbindd[60705]: [2015/06/15 16:03:32.836993,  0, pid=60705] ../source3/rpc_client/cli_pipe.c:3065(cli_rpc_pipe_open_schannel_with_key)
Jun 15 16:03:32 storage-03 winbindd[60705]: cli_rpc_pipe_open_schannel_with_key: rpc_pipe_bind failed with error NT_STATUS_BUFFER_TOO_SMALL



If I login local as root, the username resolving (e. g. id username) works fine.
It seems the password isn't validated anymore. To temporary workaround this issue, I must login local as root and restart winbindd:
# pkill winbindd
# winbindd
Comment 2 Jeremy Allison 2015-06-16 00:58:47 UTC
Can you also get a wireshark trace when this is going on ?
Comment 3 Marc Muehlfeld 2015-06-16 06:59:18 UTC
Created attachment 11159 [details]
network trace, messages and log.wb-DOMAIN

> This bug has a serious side effect: Users can't login any more e. g. via ssh, 
> when this error comes up (and it does often here).

I must correct this: Today I saw this messages comming up, and could login at the same time with an AD user. Also I saw on a different machine with 4.2.2, where the ssh login didn't work suddenly, that login start working again, if I waited a few minutes (without restarting winbindd).



> Can you also get a wireshark trace when this is going on?

Find attached a network trace, /var/log/messages and log.wb-DOMAIN captured while the
  cli_rpc_pipe_open_schannel_with_key: rpc_pipe_bind failed with error NT_STATUS_BUFFER_TOO_SMALL
messages were logged. The log.wb-DOMAIN is this times just a level 1 log. If I turn up the log level and restart/reload winbindd, the NT_STATUS_BUFFER_TOO_SMALL is gone for a while. I hope the previous attached level 10 debug log is fine, even if it doesn't match with this network trace.
Comment 4 Marc Muehlfeld 2015-06-17 06:59:40 UTC
Created attachment 11167 [details]
Packet capture, messages, log.wb-DOMAIN

Re-attaching packet capture, /var/log/messages and log.wb-DOMAIN. This time the log.wb-DOMAIN logfile was captured with level 10 enabled and done at the same time than the other two files.
Comment 5 Marc Muehlfeld 2015-10-21 07:18:09 UTC
Because of this problem in 4.2 I downgraded a while ago to 4.1 again and the error was gone.


Yesterday I updated to 4.3.1 and it starts flooding my logs again:

[2015/10/21 09:10:39.990940,  0, pid=67178] ../source3/rpc_client/cli_pipe.c:3170(cli_rpc_pipe_open_schannel_with_creds)
  cli_rpc_pipe_open_schannel_with_creds: rpc_pipe_bind failed with error NT_STATUS_BUFFER_TOO_SMALL


This occurs more than 1150 times only during the last 5 hours!
Comment 6 Stefan Metzmacher 2015-10-21 07:53:57 UTC
(In reply to Marc Muehlfeld from comment #5)

The attached capture doesn't contain the traffic between winbindd and
the domain controller.

What Kind of DC is exon.muc.medizinische-genetik.de?
Comment 7 Marc Muehlfeld 2015-10-21 08:09:21 UTC
Created attachment 11517 [details]
new packet capture

(In reply to Stefan Metzmacher from comment #6)
> The attached capture doesn't contain the traffic between winbindd and
> the domain controller.

I did a new capture with the following filter, that should have everything included:
# tcpdump -p -s 0 -w trace.pcap host 192.168.29.9 or host 192.168.29.2
The IPs are the ones of the DCs on that AD site (allel + exon)

The capture was done, while the following eight errors entries occure in log.wb-MUC:

[2015/10/21 10:01:40.618009,  1, pid=67178] ../librpc/ndr/ndr.c:578(ndr_pull_error)
  ndr_pull_error(11): Pull bytes 1 (../librpc/ndr/ndr_basic.c:79)
[2015/10/21 10:01:40.618072,  0, pid=67178] ../source3/rpc_client/cli_pipe.c:3170(cli_rpc_pipe_open_schannel_with_creds)
  cli_rpc_pipe_open_schannel_with_creds: rpc_pipe_bind failed with error NT_STATUS_BUFFER_TOO_SMALL
[2015/10/21 10:01:40.695364,  1, pid=67178] ../librpc/ndr/ndr.c:578(ndr_pull_error)
  ndr_pull_error(11): Pull bytes 1 (../librpc/ndr/ndr_basic.c:79)
[2015/10/21 10:01:40.695429,  0, pid=67178] ../source3/rpc_client/cli_pipe.c:3170(cli_rpc_pipe_open_schannel_with_creds)
  cli_rpc_pipe_open_schannel_with_creds: rpc_pipe_bind failed with error NT_STATUS_BUFFER_TOO_SMALL
[2015/10/21 10:01:40.763428,  1, pid=67178] ../librpc/ndr/ndr.c:578(ndr_pull_error)
  ndr_pull_error(11): Pull bytes 1 (../librpc/ndr/ndr_basic.c:79)
[2015/10/21 10:01:40.763484,  0, pid=67178] ../source3/rpc_client/cli_pipe.c:3170(cli_rpc_pipe_open_schannel_with_creds)
  cli_rpc_pipe_open_schannel_with_creds: rpc_pipe_bind failed with error NT_STATUS_BUFFER_TOO_SMALL
[2015/10/21 10:01:40.866210,  1, pid=67178] ../librpc/ndr/ndr.c:578(ndr_pull_error)
  ndr_pull_error(11): Pull bytes 1 (../librpc/ndr/ndr_basic.c:79)
[2015/10/21 10:01:40.866267,  0, pid=67178] ../source3/rpc_client/cli_pipe.c:3170(cli_rpc_pipe_open_schannel_with_creds)
  cli_rpc_pipe_open_schannel_with_creds: rpc_pipe_bind failed with error NT_STATUS_BUFFER_TOO_SMALL





> What Kind of DC is exon.muc.medizinische-genetik.de?

All DCs are Samba 4.1.19. Two (exon + allel) of them are in the same AD site as this domain member server.
Comment 8 Stefan Metzmacher 2015-10-21 09:33:24 UTC
I think I partly understand the problem.

We got the ndr error because we don't have f73ef3028c4f4583c81b611a9714608eae79360c in v4-1.
It means we can't parse a BIND_NAK response.

The reason for the BIND_NAK might appear in the logs
of the DC.

It seems that the member has a valid netlogon_creds_cli.tdb
but the doesn't know about the session key in its
schannel_store.tdb.

Can you provide logs and captures of all related servers
from a fresh winbindd restart. Once winbindd is stopped
check if someone else has netlogon_creds_cli.tdb still open.
and then remove that file. If that works retry the same without
removing the file.
Comment 9 Marc Muehlfeld 2015-10-21 10:21:50 UTC
(In reply to Stefan Metzmacher from comment #8)
> We got the ndr error because we don't have 
> f73ef3028c4f4583c81b611a9714608eae79360c in v4-1.
> It means we can't parse a BIND_NAK response.


Does this mean, if the DC is running v4-2 or later, the problem won't exist? In that case we should close this bug as WONTFIX, because v4-1 is already in security-only mode. For me this won't be a problem, because it is anyway planned to update the DCs to 4.3.1 during the next days.

Can you please confirm or deny?
Comment 10 Stefan Metzmacher 2015-10-21 10:30:03 UTC
(In reply to Marc Muehlfeld from comment #9)

I'd guess so, but I haven't tested it myself.

I think we should fix v4-2 and v4-3 in order to cope with 4.1/4.0 dcs.
Comment 11 Marc Muehlfeld 2015-10-21 10:33:36 UTC
(In reply to Stefan Metzmacher from comment #10)
> I think we should fix v4-2 and v4-3 in order to cope with 4.1/4.0 dcs.

Ok. Then I'll wait with the DC updates and come back with the requested debug stuff asap.
Comment 12 Stefan Metzmacher 2015-10-21 13:24:03 UTC
Created attachment 11518 [details]
Possible patch for master
Comment 13 Marc Muehlfeld 2015-10-22 06:27:04 UTC
Your patch works. The error flood is gone. Thanks. The only message I saw - but just once - was:
[2015/10/22 04:40:50.921599,  1, pid=22529] ../source3/rpc_client/cli_pipe.c:470(cli_pipe_validate_current_pdu)
  ../source3/rpc_client/cli_pipe.c:470: Bind NACK received from host allel.muc.medizinische-genetik.de!
Comment 14 Stefan Metzmacher 2015-10-22 07:07:55 UTC
(In reply to Marc Muehlfeld from comment #13)

Yes, this means server and client got out of sync with the negotiated
session key and the clients needs to recover, which failed before because
of the unexpected NT_STATUS_BUFFER_TOO_SMALL error code.
Comment 15 Stefan Metzmacher 2015-10-22 11:55:07 UTC
Created attachment 11527 [details]
Patch for v4-3-test
Comment 16 Stefan Metzmacher 2015-10-22 11:55:42 UTC
Created attachment 11528 [details]
Patches for v4-2-test
Comment 17 Karolin Seeger 2015-10-26 10:19:00 UTC
Pushed to autobuild-v4-[2|3]-test.
Comment 18 Karolin Seeger 2015-10-28 09:46:43 UTC
(In reply to Karolin Seeger from comment #17)
Pushed to both branches.
Closing out bug report.

Thanks!