Bug 11537 - NT_STATUS_HOST_UNREACHABLE logged over and over every 3 seconds
Summary: NT_STATUS_HOST_UNREACHABLE logged over and over every 3 seconds
Status: RESOLVED FIXED
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: AD: LDB/DSDB/SAMDB (show other bugs)
Version: 4.2.4
Hardware: All All
: P1 minor (vote)
Target Milestone: 4.16
Assignee: Andrew Bartlett
QA Contact: Samba QA Contact
URL: https://gitlab.com/samba-team/samba/-...
Keywords:
Depends on:
Blocks:
 
Reported: 2015-09-30 20:01 UTC by mm13@ezplanet.net (dead mail address)
Modified: 2022-01-23 19:20 UTC (History)
2 users (show)

See Also:


Attachments
sernet-samba-suppress-connection-message patch (1.02 KB, patch)
2015-09-30 20:01 UTC, mm13@ezplanet.net (dead mail address)
no flags Details
git-am fix for master. (1.13 KB, patch)
2015-11-06 00:45 UTC, Jeremy Allison
no flags Details
patch to suppress network error messages every 3 seconds (1.02 KB, patch)
2015-12-16 23:57 UTC, mm13@ezplanet.net (dead mail address)
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description mm13@ezplanet.net (dead mail address) 2015-09-30 20:01:20 UTC
Created attachment 11466 [details]
sernet-samba-suppress-connection-message patch

I have a samba 4 server node configured as Domain Controller and I have another samba 4 node configured as secondary DC. When the secondary DC node is taken offline the remaining DC continously logs the following messages:

Sep 29 23:27:55 mydc1 samba[4955]: [2015/09/29 23:27:55.623400,  0] ../source4/librpc/rpc/dcerpc_sock.c:63(continue_socket_connect)
Sep 29 23:27:55 mydc1 samba[4955]:   Failed to connect host xxx.xxx.xxx.xxx on port 135 - NT_STATUS_HOST_UNREACHABLE

and

Sep 30 20:33:31 mydc1 samba[4955]: [2015/09/30 20:33:31.610738,  0] ../source4/librpc/rpc/dcerpc_sock.c:240(continue_ip_open_socket)
Sep 30 20:33:31 mydc1 samba[4955]:   Failed to connect host xxx.xxx.xxx.xxx (df0289d6-aae5-4f26-b980-0ea0673527da._msdcs.mydomain.org) on port 135 - NT_STATUS_HOST_UNREACHABLE.

Since my secondary DC is a standby node usually left powered off, within a few hours the log file /var/log/messages increases to several megabytes sizes requiring continous trimming. When the secondary DC is powered on and booted, communications resume, the ldaps are synchronized automatically, operations resume normally and the messages no longer appear. The messages resume when the secondary node is shutdown and powered off.

To avoid filling up log files with these repeated (and not so useful) alerts it would be preferable to have an option to suppress the messages or to have them suppressed by default and activated if/when necessary.

I found the issue in the following code (lines marked with =====>), in:

source4/librpc/rpc/dcerpc_sock.c

static void continue_socket_connect(struct composite_context *ctx)
{
    struct dcecli_connection *conn;
    struct composite_context *c = talloc_get_type_abort(
        ctx->async.private_data, struct composite_context);
    struct pipe_open_socket_state *s = talloc_get_type_abort(
        c->private_data, struct pipe_open_socket_state);
    int rc;
    int sock_fd;

    /* make it easier to write a function calls */
    conn = s->conn;

    c->status = socket_connect_recv(ctx);
    if (!NT_STATUS_IS_OK(c->status)) {
====>   DEBUG(0, ("Failed to connect host %s on port %d - %s\n",
              s->server->addr, s->server->port,
              nt_errstr(c->status)));
        composite_error(c, c->status);
        return;
    }


[...]
static void continue_ip_open_socket(struct composite_context *ctx)
{
    struct composite_context *c = talloc_get_type_abort(
        ctx->async.private_data, struct composite_context);
    struct pipe_tcp_state *s = talloc_get_type_abort(
        c->private_data, struct pipe_tcp_state);
    struct socket_address *localaddr = NULL;

    /* receive result socket open request */
    c->status = dcerpc_pipe_open_socket_recv(ctx, s, &localaddr);
    if (!NT_STATUS_IS_OK(c->status)) {
        /* something went wrong... */
=====>   DEBUG(0, ("Failed to connect host %s (%s) on port %d - %s.\n",
              s->addresses[s->index - 1], s->target_hostname,
              s->port, nt_errstr(c->status)));


If the DEBUG level is changed from 0 to 1 for example, the messages are suppressed unless debug level 1 is activated, which provide the desirable result.

This represents a simple modification that suppresses the above messages by default. Should it be more desirable to have to suppress the messages only by choice then we should introduce a new switch with the relative handling case in the code (this would be a more time consuming modification).

I hereby provide an attachment with a patch file that changes the DEBUG level from 0 to 1 for these messages only.

The patch is against samba-4.2.4

I would like to see this patch included in future releases to provide a better administration experience.

Thank you in advance.
Mauro M.
Comment 1 mm13@ezplanet.net (dead mail address) 2015-10-30 09:43:25 UTC
The issue persists in release 4.2.5 and it continously fills the logs unnecessarily. Requesting to raise importance because of the disruption caused and the need to recompile the package separately with the patch posted below every time we receive an update.
Comment 2 Jeremy Allison 2015-11-06 00:45:41 UTC
Created attachment 11580 [details]
git-am fix for master.

Does this work for you ?
Comment 3 mm13@ezplanet.net (dead mail address) 2015-11-07 10:20:30 UTC
Thank you for the patch Jeremy. There is another point where it is required, please see the patch I posted in my previous message.
Comment 4 mm13@ezplanet.net (dead mail address) 2015-12-16 23:57:54 UTC
Created attachment 11731 [details]
patch to suppress network error messages every 3 seconds

Suppress un-necessary connection error messages that are oherwise logged every 3 seconds when one of the domain controller is down (configuration normally used for standby DC node)
Comment 5 mm13@ezplanet.net (dead mail address) 2015-12-16 23:59:54 UTC
I just received the upgrade to sernet-samba-4.2.7 for CentOS and the network connection error issue persists. I downloaded the SRPMS and recompiled with my patch (attached) and this suppressed the messages. 

Please can you include this patch in the mainstream?
Comment 6 AgustinF 2018-06-13 13:42:59 UTC
Hi Mauro,

i know it's an old issue, but remains open.

So i can tell you that in our case, we've found the fix in the Samba wiki "Demoting a Samba AD DC" : https://wiki.samba.org/index.php/Demoting_a_Samba_AD_DC

When demoting the offline AD DC, those false-possitive messages stops.

Hope this help!
Comment 7 Björn Jacke 2022-01-23 19:20:54 UTC
fixed in master with 809f4fe2c7862f25547cbdcf01160537e43e3f95