Bug 11327 - cli_rpc_pipe_open_schannel_with_key: rpc_pipe_bind failed with error NT_STATUS_BUFFER_TOO_SMALL

Status:	RESOLVED FIXED

Alias:	None

Product:	Samba 4.1 and newer
Classification:	Unclassified
Component:	Winbind
Version:	4.2.2
Hardware:	All All

Importance:	P5 normal
Target Milestone:	---
Assignee:	Karolin Seeger
QA Contact:	Samba QA Contact

URL:
Keywords:

Depends on:
Blocks:

Reported:	2015-06-12 06:02 UTC by Marc Muehlfeld
Modified:	2015-10-28 09:46 UTC
CC List:	4 users

See Also:

Attachments
log.wb-MUC (level 10 debug log) (1.53 MB, application/x-gzip) 2015-06-12 06:02 UTC, Marc Muehlfeld	no flags
network trace, messages and log.wb-DOMAIN (82.25 KB, application/octet-stream) 2015-06-16 06:59 UTC, Marc Muehlfeld	no flags
Packet capture, messages, log.wb-DOMAIN (110.43 KB, application/octet-stream) 2015-06-17 06:59 UTC, Marc Muehlfeld	no flags
new packet capture (39.33 KB, application/vnd.tcpdump.pcap) 2015-10-21 08:09 UTC, Marc Muehlfeld	no flags
Possible patch for master (3.49 KB, text/plain) 2015-10-21 13:24 UTC, Stefan Metzmacher	no flags
Patch for v4-3-test (4.05 KB, patch) 2015-10-22 11:55 UTC, Stefan Metzmacher	vl: review+
Patches for v4-2-test (4.05 KB, patch) 2015-10-22 11:55 UTC, Stefan Metzmacher	vl: review+

Description Marc Muehlfeld 2015-06-12 06:02:32 UTC

Created attachment 11148
log.wb-MUC (level 10 debug log)

Since we upgraded our member servers from 4.1.17 to 4.2.2, the log files on these machines have been flooded with the following errors (6000-7000 per day on each server):


[2015/06/12 06:17:31.230322,  3, pid=70786, effective(0, 0), real(0, 0), class=rpc_cli] ../source3/rpc_client/cli_pipe.c:1803(rpc_pipe_bind_step_one_done)
  rpc_pipe_bind: host allel.muc.medizinische-genetik.de bind request returned NT_STATUS_BUFFER_TOO_SMALL
[2015/06/12 06:17:31.230350,  0, pid=70786, effective(0, 0), real(0, 0), class=rpc_cli] ../source3/rpc_client/cli_pipe.c:3065(cli_rpc_pipe_open_schannel_with_key)
  cli_rpc_pipe_open_schannel_with_key: rpc_pipe_bind failed with error NT_STATUS_BUFFER_TOO_SMALL
...
[2015/06/12 06:17:31.230982,  3, pid=70786, effective(0, 0), real(0, 0), class=winbind] ../source3/winbindd/winbindd_cm.c:3015(cm_connect_netlogon)
  Could not open schannel'ed NETLOGON pipe. Error was NT_STATUS_BUFFER_TOO_SMALL
[2015/06/12 06:17:31.231091,  3, pid=70786, effective(0, 0), real(0, 0), class=winbind] ../source3/winbindd/winbindd_pam.c:1322(winbind_samlogon_retry_loop)
  Could not open handle to NETLOGON pipe (error: NT_STATUS_BUFFER_TOO_SMALL, attempts: 0)
[2015/06/12 06:17:31.231131,  3, pid=70786, effective(0, 0), real(0, 0), class=winbind] ../source3/winbindd/winbindd_pam.c:1352(winbind_samlogon_retry_loop)
  The connection to netlogon failed, retrying



A level 10 debug log is attached.

Comment 1 Marc Muehlfeld 2015-06-15 14:13:45 UTC

This bug has a serious side effect: users can no longer log in, e.g. via ssh, while this error occurs (and it occurs often here).

/etc/nsswitch.conf is configured to retrieve the users from AD through winbindd. SSH-enabled users can normally log in to the machine. But when this problem appears, ssh logins are denied and every attempt is logged in /var/log/messages with:

Jun 15 16:03:31 storage-03 sshd[70478]: error: Received disconnect from 10.1.0.254: 14: No supported authentication methods available [preauth]
Jun 15 16:03:32 storage-03 winbindd[60705]: [2015/06/15 16:03:32.836993,  0, pid=60705] ../source3/rpc_client/cli_pipe.c:3065(cli_rpc_pipe_open_schannel_with_key)
Jun 15 16:03:32 storage-03 winbindd[60705]: cli_rpc_pipe_open_schannel_with_key: rpc_pipe_bind failed with error NT_STATUS_BUFFER_TOO_SMALL



If I log in locally as root, username resolution (e.g. id username) works fine.
It seems the password is no longer validated. As a temporary workaround, I have to log in locally as root and restart winbindd:
# pkill winbindd
# winbindd

Comment 2 Jeremy Allison 2015-06-16 00:58:47 UTC

Can you also get a wireshark trace when this is going on ?

Comment 3 Marc Muehlfeld 2015-06-16 06:59:18 UTC

Created attachment 11159
network trace, messages and log.wb-DOMAIN

> This bug has a serious side effect: Users can't login any more e. g. via ssh, 
> when this error comes up (and it does often here).

I must correct this: today I saw these messages coming up and could log in with an AD user at the same time. I also saw on a different machine running 4.2.2, where ssh login had suddenly stopped working, that logins started working again after I waited a few minutes (without restarting winbindd).



> Can you also get a wireshark trace when this is going on?

Attached are a network trace, /var/log/messages and log.wb-DOMAIN, captured while the
  cli_rpc_pipe_open_schannel_with_key: rpc_pipe_bind failed with error NT_STATUS_BUFFER_TOO_SMALL
messages were being logged. This time log.wb-DOMAIN is only a level 1 log. If I raise the log level and restart/reload winbindd, the NT_STATUS_BUFFER_TOO_SMALL error is gone for a while. I hope the previously attached level 10 debug log is still useful, even though it doesn't match this network trace.

Comment 4 Marc Muehlfeld 2015-06-17 06:59:40 UTC

Created attachment 11167
Packet capture, messages, log.wb-DOMAIN

Re-attaching packet capture, /var/log/messages and log.wb-DOMAIN. This time log.wb-DOMAIN was captured at log level 10 and at the same time as the other two files.

Comment 5 Marc Muehlfeld 2015-10-21 07:18:09 UTC

Because of this problem in 4.2, I downgraded to 4.1 a while ago and the error was gone.


Yesterday I updated to 4.3.1 and it started flooding my logs again:

[2015/10/21 09:10:39.990940,  0, pid=67178] ../source3/rpc_client/cli_pipe.c:3170(cli_rpc_pipe_open_schannel_with_creds)
  cli_rpc_pipe_open_schannel_with_creds: rpc_pipe_bind failed with error NT_STATUS_BUFFER_TOO_SMALL


This occurred more than 1150 times in the last 5 hours alone!
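
For anyone who wants to produce such counts from their own log.wb-* file, here is a minimal sketch. It assumes the two-line entry format seen in the excerpts in this report (a bracketed header line starting with the timestamp, followed by an indented message line). The names `count_errors_per_hour`, `HEADER` and `NEEDLE` are made up for this example and are not part of Samba.

```python
import re
from collections import Counter

# Assumed header format, taken from the excerpts in this report, e.g.
# [2015/10/21 09:10:39.990940,  0, pid=67178] ../source3/rpc_client/cli_pipe.c:3170(...)
HEADER = re.compile(r"^\[(\d{4}/\d{2}/\d{2}) (\d{2}):\d{2}:\d{2}\.\d+,")

# Message text shared by the 4.2 (..._with_key) and 4.3 (..._with_creds) variants.
NEEDLE = "rpc_pipe_bind failed with error NT_STATUS_BUFFER_TOO_SMALL"


def count_errors_per_hour(lines):
    """Return a Counter mapping 'YYYY/MM/DD HH' to the number of matching messages."""
    counts = Counter()
    current_hour = None
    for line in lines:
        match = HEADER.match(line)
        if match:
            # Remember the hour of the most recent header line.
            current_hour = f"{match.group(1)} {match.group(2)}"
        elif NEEDLE in line and current_hour is not None:
            # Message lines carry no timestamp, so attribute them to that header.
            counts[current_hour] += 1
    return counts
```

Passing an open file object (for example `open("log.wb-MUC", errors="replace")`) and printing the sorted items of the result gives one count per hour. Syslog-prefixed lines such as those in /var/log/messages do not match the assumed header format and would need a different pattern.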

Comment 6 Stefan Metzmacher 2015-10-21 07:53:57 UTC

(In reply to Marc Muehlfeld from comment #5)

The attached capture doesn't contain the traffic between winbindd and
the domain controller.

What kind of DC is exon.muc.medizinische-genetik.de?

Comment 7 Marc Muehlfeld 2015-10-21 08:09:21 UTC

Created attachment 11517
new packet capture

(In reply to Stefan Metzmacher from comment #6)
> The attached capture doesn't contain the traffic between winbindd and
> the domain controller.

I made a new capture with the following filter, which should include everything:
# tcpdump -p -s 0 -w trace.pcap host 192.168.29.9 or host 192.168.29.2
The IPs are those of the DCs in that AD site (allel + exon).

The capture was taken while the following eight error entries occurred in log.wb-MUC:

[2015/10/21 10:01:40.618009,  1, pid=67178] ../librpc/ndr/ndr.c:578(ndr_pull_error)
  ndr_pull_error(11): Pull bytes 1 (../librpc/ndr/ndr_basic.c:79)
[2015/10/21 10:01:40.618072,  0, pid=67178] ../source3/rpc_client/cli_pipe.c:3170(cli_rpc_pipe_open_schannel_with_creds)
  cli_rpc_pipe_open_schannel_with_creds: rpc_pipe_bind failed with error NT_STATUS_BUFFER_TOO_SMALL
[2015/10/21 10:01:40.695364,  1, pid=67178] ../librpc/ndr/ndr.c:578(ndr_pull_error)
  ndr_pull_error(11): Pull bytes 1 (../librpc/ndr/ndr_basic.c:79)
[2015/10/21 10:01:40.695429,  0, pid=67178] ../source3/rpc_client/cli_pipe.c:3170(cli_rpc_pipe_open_schannel_with_creds)
  cli_rpc_pipe_open_schannel_with_creds: rpc_pipe_bind failed with error NT_STATUS_BUFFER_TOO_SMALL
[2015/10/21 10:01:40.763428,  1, pid=67178] ../librpc/ndr/ndr.c:578(ndr_pull_error)
  ndr_pull_error(11): Pull bytes 1 (../librpc/ndr/ndr_basic.c:79)
[2015/10/21 10:01:40.763484,  0, pid=67178] ../source3/rpc_client/cli_pipe.c:3170(cli_rpc_pipe_open_schannel_with_creds)
  cli_rpc_pipe_open_schannel_with_creds: rpc_pipe_bind failed with error NT_STATUS_BUFFER_TOO_SMALL
[2015/10/21 10:01:40.866210,  1, pid=67178] ../librpc/ndr/ndr.c:578(ndr_pull_error)
  ndr_pull_error(11): Pull bytes 1 (../librpc/ndr/ndr_basic.c:79)
[2015/10/21 10:01:40.866267,  0, pid=67178] ../source3/rpc_client/cli_pipe.c:3170(cli_rpc_pipe_open_schannel_with_creds)
  cli_rpc_pipe_open_schannel_with_creds: rpc_pipe_bind failed with error NT_STATUS_BUFFER_TOO_SMALL





> What Kind of DC is exon.muc.medizinische-genetik.de?

All DCs are Samba 4.1.19. Two (exon + allel) of them are in the same AD site as this domain member server.

Comment 8 Stefan Metzmacher 2015-10-21 09:33:24 UTC

I think I partly understand the problem.

We got the ndr error because we don't have f73ef3028c4f4583c81b611a9714608eae79360c in v4-1.
It means we can't parse a BIND_NAK response.

The reason for the BIND_NAK might appear in the logs
of the DC.

It seems that the member has a valid netlogon_creds_cli.tdb
but the DC doesn't know about the session key in its
schannel_store.tdb.

Can you provide logs and captures of all related servers
from a fresh winbindd restart? Once winbindd is stopped,
check if someone else still has netlogon_creds_cli.tdb open,
and then remove that file. If that works, retry the same without
removing the file.

Comment 9 Marc Muehlfeld 2015-10-21 10:21:50 UTC

(In reply to Stefan Metzmacher from comment #8)
> We got the ndr error because we don't have 
> f73ef3028c4f4583c81b611a9714608eae79360c in v4-1.
> It means we can't parse a BIND_NAK response.


Does this mean that the problem won't exist if the DC is running v4-2 or later? In that case we should close this bug as WONTFIX, because v4-1 is already in security-only mode. For me this won't be a problem, because we plan to update the DCs to 4.3.1 in the next few days anyway.

Can you please confirm or deny?

Comment 10 Stefan Metzmacher 2015-10-21 10:30:03 UTC

(In reply to Marc Muehlfeld from comment #9)

I'd guess so, but I haven't tested it myself.

I think we should fix v4-2 and v4-3 in order to cope with 4.1/4.0 dcs.

Comment 11 Marc Muehlfeld 2015-10-21 10:33:36 UTC

(In reply to Stefan Metzmacher from comment #10)
> I think we should fix v4-2 and v4-3 in order to cope with 4.1/4.0 dcs.

OK. Then I'll hold off on the DC updates and come back with the requested debug data as soon as possible.

Comment 12 Stefan Metzmacher 2015-10-21 13:24:03 UTC

Created attachment 11518
Possible patch for master

Comment 13 Marc Muehlfeld 2015-10-22 06:27:04 UTC

Your patch works. The error flood is gone. Thanks. The only message I saw - but just once - was:
[2015/10/22 04:40:50.921599,  1, pid=22529] ../source3/rpc_client/cli_pipe.c:470(cli_pipe_validate_current_pdu)
  ../source3/rpc_client/cli_pipe.c:470: Bind NACK received from host allel.muc.medizinische-genetik.de!

Comment 14 Stefan Metzmacher 2015-10-22 07:07:55 UTC

(In reply to Marc Muehlfeld from comment #13)

Yes, this means server and client got out of sync with the negotiated
session key and the client needs to recover, which failed before because
of the unexpected NT_STATUS_BUFFER_TOO_SMALL error code.
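
To illustrate the point, here is a toy model of that failure mode. It is not Samba's code: `open_schannel_pipe`, `RECOVERABLE` and the status strings are hypothetical names chosen for this sketch, and the real recovery path in winbindd is more involved. The assumption is only what is stated above: recovery is triggered by a recognised rejection, and a reply that cannot be parsed surfaces as a different error that triggers nothing.

```python
# Toy model only: none of these names exist in Samba.
RECOVERABLE = {"BIND_NAK"}


def open_schannel_pipe(bind, renegotiate, max_attempts=3):
    """Try to bind; renegotiate the session key only on a recognised rejection.

    bind() returns a status string, renegotiate() re-establishes the key.
    Returns the last status seen.
    """
    status = "NOT_TRIED"
    for _ in range(max_attempts):
        status = bind()
        if status == "OK":
            return status
        if status not in RECOVERABLE:
            # An unrecognised error gives the caller no reason to renegotiate,
            # so the stale session key stays in place and the error repeats.
            return status
        renegotiate()
    return status
```

In this model, a client that can parse the rejection sees `BIND_NAK`, renegotiates once and then binds successfully. A client that cannot parse it sees only `BUFFER_TOO_SMALL`, never renegotiates, and returns the same error on every call, which would match the log flood described in this report.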

Comment 15 Stefan Metzmacher 2015-10-22 11:55:07 UTC

Created attachment 11527
Patch for v4-3-test

Comment 16 Stefan Metzmacher 2015-10-22 11:55:42 UTC

Created attachment 11528
Patches for v4-2-test

Comment 17 Karolin Seeger 2015-10-26 10:19:00 UTC

Pushed to autobuild-v4-[2|3]-test.

Comment 18 Karolin Seeger 2015-10-28 09:46:43 UTC

(In reply to Karolin Seeger from comment #17)
Pushed to both branches.
Closing out bug report.

Thanks!