Bug 11537 - NT_STATUS_HOST_UNREACHABLE logged over and over every 3 seconds
NT_STATUS_HOST_UNREACHABLE logged over and over every 3 seconds
Status: NEW
Product: Samba 4.1 and newer
Classification: Unclassified
Component: AD: LDB/DSDB/SAMDB
4.2.4
All All
: P1 minor
: ---
Assigned To: Andrew Bartlett
Samba QA Contact
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2015-09-30 20:01 UTC by mm13@ezplanet.net
Modified: 2016-06-30 05:40 UTC (History)
1 user (show)

See Also:


Attachments
sernet-samba-suppress-connection-message patch (1.02 KB, patch)
2015-09-30 20:01 UTC, mm13@ezplanet.net
no flags Details
git-am fix for master. (1.13 KB, patch)
2015-11-06 00:45 UTC, Jeremy Allison
no flags Details
patch to suppress network error messages every 3 seconds (1.02 KB, patch)
2015-12-16 23:57 UTC, mm13@ezplanet.net
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description mm13@ezplanet.net 2015-09-30 20:01:20 UTC
Created attachment 11466 [details]
sernet-samba-suppress-connection-message patch

I have a samba 4 server node configured as Domain Controller and I have another samba 4 node configured as secondary DC. When the secondary DC node is taken offline the remaining DC continously logs the following messages:

Sep 29 23:27:55 mydc1 samba[4955]: [2015/09/29 23:27:55.623400,  0] ../source4/librpc/rpc/dcerpc_sock.c:63(continue_socket_connect)
Sep 29 23:27:55 mydc1 samba[4955]:   Failed to connect host xxx.xxx.xxx.xxx on port 135 - NT_STATUS_HOST_UNREACHABLE

and

Sep 30 20:33:31 mydc1 samba[4955]: [2015/09/30 20:33:31.610738,  0] ../source4/librpc/rpc/dcerpc_sock.c:240(continue_ip_open_socket)
Sep 30 20:33:31 mydc1 samba[4955]:   Failed to connect host xxx.xxx.xxx.xxx (df0289d6-aae5-4f26-b980-0ea0673527da._msdcs.mydomain.org) on port 135 - NT_STATUS_HOST_UNREACHABLE.

Since my secondary DC is a standby node usually left powered off, within a few hours the log file /var/log/messages increases to several megabytes sizes requiring continous trimming. When the secondary DC is powered on and booted, communications resume, the ldaps are synchronized automatically, operations resume normally and the messages no longer appear. The messages resume when the secondary node is shutdown and powered off.

To avoid filling up log files with these repeated (and not so useful) alerts it would be preferable to have an option to suppress the messages or to have them suppressed by default and activated if/when necessary.

I found the issue in the following code (lines marked with =====>), in:

source4/librpc/rpc/dcerpc_sock.c

static void continue_socket_connect(struct composite_context *ctx)
{
    struct dcecli_connection *conn;
    struct composite_context *c = talloc_get_type_abort(
        ctx->async.private_data, struct composite_context);
    struct pipe_open_socket_state *s = talloc_get_type_abort(
        c->private_data, struct pipe_open_socket_state);
    int rc;
    int sock_fd;

    /* make it easier to write a function calls */
    conn = s->conn;

    c->status = socket_connect_recv(ctx);
    if (!NT_STATUS_IS_OK(c->status)) {
====>   DEBUG(0, ("Failed to connect host %s on port %d - %s\n",
              s->server->addr, s->server->port,
              nt_errstr(c->status)));
        composite_error(c, c->status);
        return;
    }


[...]
static void continue_ip_open_socket(struct composite_context *ctx)
{
    struct composite_context *c = talloc_get_type_abort(
        ctx->async.private_data, struct composite_context);
    struct pipe_tcp_state *s = talloc_get_type_abort(
        c->private_data, struct pipe_tcp_state);
    struct socket_address *localaddr = NULL;

    /* receive result socket open request */
    c->status = dcerpc_pipe_open_socket_recv(ctx, s, &localaddr);
    if (!NT_STATUS_IS_OK(c->status)) {
        /* something went wrong... */
=====>   DEBUG(0, ("Failed to connect host %s (%s) on port %d - %s.\n",
              s->addresses[s->index - 1], s->target_hostname,
              s->port, nt_errstr(c->status)));


If the DEBUG level is changed from 0 to 1 for example, the messages are suppressed unless debug level 1 is activated, which provide the desirable result.

This represents a simple modification that suppresses the above messages by default. Should it be more desirable to have to suppress the messages only by choice then we should introduce a new switch with the relative handling case in the code (this would be a more time consuming modification).

I hereby provide an attachment with a patch file that changes the DEBUG level from 0 to 1 for these messages only.

The patch is against samba-4.2.4

I would like to see this patch included in future releases to provide a better administration experience.

Thank you in advance.
Mauro M.
Comment 1 mm13@ezplanet.net 2015-10-30 09:43:25 UTC
The issue persists in release 4.2.5 and it continously fills the logs unnecessarily. Requesting to raise importance because of the disruption caused and the need to recompile the package separately with the patch posted below every time we receive an update.
Comment 2 Jeremy Allison 2015-11-06 00:45:41 UTC
Created attachment 11580 [details]
git-am fix for master.

Does this work for you ?
Comment 3 mm13@ezplanet.net 2015-11-07 10:20:30 UTC
Thank you for the patch Jeremy. There is another point where it is required, please see the patch I posted in my previous message.
Comment 4 mm13@ezplanet.net 2015-12-16 23:57:54 UTC
Created attachment 11731 [details]
patch to suppress network error messages every 3 seconds

Suppress un-necessary connection error messages that are oherwise logged every 3 seconds when one of the domain controller is down (configuration normally used for standby DC node)
Comment 5 mm13@ezplanet.net 2015-12-16 23:59:54 UTC
I just received the upgrade to sernet-samba-4.2.7 for CentOS and the network connection error issue persists. I downloaded the SRPMS and recompiled with my patch (attached) and this suppressed the messages. 

Please can you include this patch in the mainstream?