Bug 14578 - libsmbclient handles only the first referral from the list
Summary: libsmbclient handles only the first referral from the list
Status: RESOLVED WORKSFORME
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: libsmbclient
Version: 4.13.2
Hardware: All Linux
Importance: P5 normal
Target Milestone: ---
Assignee: Jeremy Allison
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-11-19 17:04 UTC by Szilard Matyas
Modified: 2020-12-03 16:48 UTC
0 users

See Also:


Description Szilard Matyas 2020-11-19 17:04:29 UTC
Hi Gents, 

We use several MS DCs, all of which are also DFS namespace servers that service our DFS targets. We connect to DFS targets from Linux apps using libsmbclient.so. If one of the DCs fails, and a working DC returns a referral list in which the failing DC is the first entry, libsmbclient does not try the second and third referrals in the list. As a result, the whole operation fails until the DC issue is resolved.


In the example below, ZUIX-PWDS-DCS01 is the failing DC:

Packet capture - libsmbclient requests referrals for root target \xyz.com\shares:

SMB2 (Server Message Block Protocol version 2)
    SMB2 Header
        Server Component: SMB2
        Header Length: 64
        Credit Charge: 1
        Channel Sequence: 0
        Reserved: 0000
        Command: Ioctl (11)
        Credits requested: 1
        Flags: 0x00000010, Priority
            .... .... .... .... .... .... .... ...0 = Response: This is a REQUEST
            .... .... .... .... .... .... .... ..0. = Async command: This is a SYNC command
            .... .... .... .... .... .... .... .0.. = Chained: This pdu is NOT a chained command
            .... .... .... .... .... .... .... 0... = Signing: This pdu is NOT signed
            .... .... .... .... .... .... .001 .... = Priority: This pdu contains a PRIORITY
            ...0 .... .... .... .... .... .... .... = DFS operation: This is a normal operation
            ..0. .... .... .... .... .... .... .... = Replay operation: This is NOT a replay operation
        Chain Offset: 0x00000000
        Message ID: Unknown (4)
        Process Id: 0x00000000
        Tree Id: 0x00000001  \\xyz.com\IPC$
            [Tree: \\xyz.com\IPC$]
            [Share Type: Named pipe (0x02)]
            [Connected in Frame: 21]
        Session Id: 0x00004890e0000d01 Acct:srv_linux_trdr Domain:HCT Host:RAPP-SMBA-TST01
            [Account: srv_linux_trdr]
            [Domain: HCT]
            [Host: RAPP-SMBA-TST01]
            [Authenticated in Frame: 18]
        Signature: 00000000000000000000000000000000
    Ioctl Request (0x0b)
        StructureSize: 0x0039
            0000 0000 0011 100. = Fixed Part Length: 28
            .... .... .... ...1 = Dynamic Part: True
        Reserved: 0000
        Function: FSCTL_DFS_GET_REFERRALS (0x00060194)
            0000 0000 0000 0110 .... .... .... .... = Device: DFS (0x0006)
            .... .... .... .... 00.. .... .... .... = Access: FILE_ANY_ACCESS (0x0)
            .... .... .... .... ..00 0001 1001 01.. = Function: 0x065
            .... .... .... .... .... .... .... ..00 = Method: METHOD_BUFFERED (0x0)
        GUID handle
            File Id: ffffffff-ffff-ffff-ffff-ffffffffffff
        Max Ioctl In Size: 0
        Max Ioctl Out Size: 65535
        Flags: 0x00000001
            .... .... .... .... .... .... .... ...1 = Is FSCTL: True
        Reserved: 00000000
        Blob Offset: 0x00000078
        Blob Length: 62
        In Data
            Max Referral Level: 3
            File Name: \xyz.com\shares
        Blob Offset: 0x00000078
        Blob Length: 0
        Out Data: NO DATA
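
The Function field in the dissection above follows the standard Windows I/O control code layout (device in bits 31..16, access in bits 15..14, function in bits 13..2, method in bits 1..0). As a rough illustration of how 0x00060194 breaks down into the values Wireshark prints, here is a small sketch. The struct and function names are made up for this example and are not Samba or Windows identifiers.

```c
#include <stdint.h>

/* Fields packed into a 32-bit I/O control code (CTL_CODE layout).
 * Names are hypothetical, chosen for this illustration only. */
struct ctl_code_fields {
	uint32_t device;   /* bits 31..16 */
	uint32_t access;   /* bits 15..14 */
	uint32_t function; /* bits 13..2  */
	uint32_t method;   /* bits 1..0   */
};

/* Split a control code into its four fields. */
static struct ctl_code_fields ctl_code_decode(uint32_t code)
{
	struct ctl_code_fields f;

	f.device   = (code >> 16) & 0xFFFF;
	f.access   = (code >> 14) & 0x3;
	f.function = (code >> 2)  & 0xFFF;
	f.method   = code & 0x3;
	return f;
}

/* Rebuild the control code from its fields (inverse of the above). */
static uint32_t ctl_code_encode(struct ctl_code_fields f)
{
	return (f.device << 16) | (f.access << 14) |
	       (f.function << 2) | f.method;
}
```

Decoding 0x00060194 this way gives device 0x0006, access 0x0, function 0x065 and method 0x0, matching the four bit-field lines in the capture.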

Packet capture - libsmbclient gets referral list from one of the working DCs with the failing DC being the first in the target set.

SMB2 (Server Message Block Protocol version 2)
    SMB2 Header
        Server Component: SMB2
        Header Length: 64
        Credit Charge: 1
        NT Status: STATUS_SUCCESS (0x00000000)
        Command: Ioctl (11)
        Credits granted: 1
        Flags: 0x00000011, Response, Priority
            .... .... .... .... .... .... .... ...1 = Response: This is a RESPONSE
            .... .... .... .... .... .... .... ..0. = Async command: This is a SYNC command
            .... .... .... .... .... .... .... .0.. = Chained: This pdu is NOT a chained command
            .... .... .... .... .... .... .... 0... = Signing: This pdu is NOT signed
            .... .... .... .... .... .... .001 .... = Priority: This pdu contains a PRIORITY
            ...0 .... .... .... .... .... .... .... = DFS operation: This is a normal operation
            ..0. .... .... .... .... .... .... .... = Replay operation: This is NOT a replay operation
        Chain Offset: 0x00000000
        Message ID: Unknown (4)
        Process Id: 0x00000000
        Tree Id: 0x00000001  \\xyz.com\IPC$
            [Tree: \\xyz.com\IPC$]
            [Share Type: Named pipe (0x02)]
            [Connected in Frame: 21]
        Session Id: 0x00004890e0000d01 Acct:srv_linux_trdr Domain:HCT Host:RAPP-SMBA-TST01
            [Account: srv_linux_trdr]
            [Domain: HCT]
            [Host: RAPP-SMBA-TST01]
            [Authenticated in Frame: 18]
        Signature: 00000000000000000000000000000000
        [Response to: 22]
        [Time from request: 0.000154000 seconds]
    Ioctl Response (0x0b)
        StructureSize: 0x0031
            0000 0000 0011 000. = Fixed Part Length: 24
            .... .... .... ...1 = Dynamic Part: True
        Unknown: 0000
        Function: FSCTL_DFS_GET_REFERRALS (0x00060194)
            0000 0000 0000 0110 .... .... .... .... = Device: DFS (0x0006)
            .... .... .... .... 00.. .... .... .... = Access: FILE_ANY_ACCESS (0x0)
            .... .... .... .... ..00 0001 1001 01.. = Function: 0x065
            .... .... .... .... .... .... .... ..00 = Method: METHOD_BUFFERED (0x0)
        GUID handle
            File Id: ffffffff-ffff-ffff-ffff-ffffffffffff
        Reserved: 00000000
        Reserved: 00000000
        Blob Offset: 0x00000070
        Blob Length: 0
        In Data: NO DATA
        Blob Offset: 0x00000070
        Blob Length: 462
        Out Data
            Path Consumed: 58
            Num Referrals: 3
            Flags: 0x0003, Hold Storage, Fielding
                .... .... .... ..1. = Hold Storage: Referral SERVER HOLDS STORAGE for the file
                .... .... .... ...1 = Fielding: The server in referral is FIELDING CAPABLE
            Padding: 0000
            Referrals
                Referral
                    Version: 3
                    Size: 34
                    Server Type: Root targets returns (1)
                    Flags: 0x0000
                        .... .... .... ..0. = NameListReferral: NOT a domain/DC referral response
                        .... .... .... .0.. = TargetSetBoundary: NOT the first target in the target set
                    TTL: 300
                    Path Offset: 102
                    Alt Path Offset: 162
                    Node Offset: 222
                    Server GUID: 00000000-0000-0000-0000-000000000000
                    Path: \xyz.com\shares
                    Alt Path: \xyz.com\shares
                    Node: \ZUIX-PWDS-DCS01\Shares
                Referral
                    Version: 3
                    Size: 34
                    Server Type: Root targets returns (1)
                    Flags: 0x0000
                        .... .... .... ..0. = NameListReferral: NOT a domain/DC referral response
                        .... .... .... .0.. = TargetSetBoundary: NOT the first target in the target set
                    TTL: 300
                    Path Offset: 68
                    Alt Path Offset: 128
                    Node Offset: 236
                    Server GUID: 00000000-0000-0000-0000-000000000000
                    Path: \xyz.com\shares
                    Alt Path: \xyz.com\shares
                    Node: \RAPP-PWDS-DCS03.xyz.com\Shares
                Referral
                    Version: 3
                    Size: 34
                    Server Type: Root targets returns (1)
                    Flags: 0x0000
                        .... .... .... ..0. = NameListReferral: NOT a domain/DC referral response
                        .... .... .... .0.. = TargetSetBoundary: NOT the first target in the target set
                    TTL: 300
                    Path Offset: 34
                    Alt Path Offset: 94
                    Node Offset: 294
                    Server GUID: 00000000-0000-0000-0000-000000000000
                    Path: \xyz.com\shares
                    Alt Path: \xyz.com\shares
                    Node: \RAPP-PWDS-DCS04.xyz.com\Shares
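
Each Node value in the response above has the form \server\share, optionally followed by a remaining path. The sketch below shows one way such a string could be broken into its parts. It is a simplified stand-in written for this report, not Samba's split_dfs_path(); the struct, the function name and the fixed buffer sizes are all assumptions.

```c
#include <string.h>

/* Hypothetical container for one parsed referral target. */
struct node_target {
	char server[256];
	char share[256];
	char extrapath[256];
};

/* Split "\server\share[\extra...]" into its components.
 * Returns 0 on success, -1 if server or share is missing or too long. */
static int split_node_path(const char *node, struct node_target *out)
{
	const char *p = node;
	size_t len;

	memset(out, 0, sizeof(*out));

	/* Skip leading separators; accept both '\' and '/'. */
	while (*p == '\\' || *p == '/') {
		p++;
	}

	/* Server component. */
	len = strcspn(p, "\\/");
	if (len == 0 || len >= sizeof(out->server)) {
		return -1;
	}
	memcpy(out->server, p, len);
	p += len;
	if (*p == '\0') {
		return -1; /* no share component */
	}
	p++;

	/* Share component. */
	len = strcspn(p, "\\/");
	if (len == 0 || len >= sizeof(out->share)) {
		return -1;
	}
	memcpy(out->share, p, len);
	p += len;

	/* Anything after the share is the extra path (may be empty). */
	if (*p != '\0') {
		p++;
		if (strlen(p) >= sizeof(out->extrapath)) {
			return -1;
		}
		strcpy(out->extrapath, p);
	}
	return 0;
}
```

Applied to the three Node strings in the capture, this would yield three distinct server names that all export a share named Shares.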


I reproduced it with smbclient which I guess uses the same underlying mechanisms that libsmbclient uses. Here is the relevant part of the log:

# smbclient -d 10 -A <password file> //xyz.com/shares -c "cd test/eu/datafiles;get users.hdp"

........................................

output omitted
........................................
gensec_update_send: spnego[0x56533b24c930]: subreq: 0x56533b279dc0
gensec_update_done: spnego[0x56533b24c930]: NT_STATUS_OK tevent_req[0x56533b279dc0/../../auth/gensec/spnego.c:1632]: state[2] error[0 (0x0)]  state[struct gensec_spnego_update_state (0x56533b279f70)] timer[(nil)] finish[../../auth/gensec/spnego.c:2116]
 session setup ok
signed SMB2 message
sitename_fetch: Returning sitename for realm 'xyz.com': "Default-First-Site-Name"
internal_resolve_name: looking up ZUIX-PWDS-DCS01#20 (sitename Default-First-Site-Name)
name ZUIX-PWDS-DCS01#20 found.
remove_duplicate_addrs2: looking for duplicate address/port pairs
Connecting to 1.1.1.1 at port 445
do_connect: Connection to ZUIX-PWDS-DCS01 failed (Error NT_STATUS_IO_TIMEOUT)

Please let me know if there's a configuration option or a workaround to get libsmbclient to use the other targets from the target set.

Thanks, 
Szilard
Comment 1 Jeremy Allison 2020-11-19 20:09:19 UTC
Yes, that's correct. The problematic code is (in master):

source3/libsmb/clidfs.c: cli_check_msdfs_proxy()

1212         status = cli_dfs_get_referral(ctx, cli, fullpath, &refs,
1213                                       &num_refs, &consumed);
1214         res = NT_STATUS_IS_OK(status);
1215 
1216         status = cli_tdis(cli);
1217 
1218         cli_state_restore_tcon(cli, orig_tcon);
1219 
1220         if (!NT_STATUS_IS_OK(status)) {
1221                 return false;
1222         }
1223 
1224         if (!res || !num_refs) {
1225                 return false;
1226         }
1227 
1228         if (!refs[0].dfspath) {
1229                 return false;
1230         }
1231 
1232         if (!split_dfs_path(ctx, refs[0].dfspath, pp_newserver,
1233                             pp_newshare, &newextrapath)) {
1234                 return false;
1235         }

Note that in lines 1228 and 1232 we only look at refs[0]. This function needs updating to return the full list of possible referrals to the caller, and the caller then needs to loop over that list, attempting a connection to each 'newserver/newshare' pair until one succeeds.
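
To make the shape of that change concrete, here is a minimal standalone sketch of the failover loop. It is not a patch against clidfs.c: the types, the connect_first_live() name and the function-pointer connector are all assumptions for illustration, and fake_connect() simply pretends that the failing DC from the report is unreachable.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of one referral entry. */
struct referral_target {
	const char *server;
	const char *share;
};

/* Connector callback: returns 0 on success, non-zero on failure. */
typedef int (*connect_fn)(const char *server, const char *share);

/* Try each referral in list order and return the index of the first
 * one that connects, or -1 if none of them could be reached. */
static int connect_first_live(const struct referral_target *refs,
			      size_t num_refs,
			      connect_fn try_connect)
{
	size_t i;

	for (i = 0; i < num_refs; i++) {
		if (refs[i].server == NULL || refs[i].share == NULL) {
			continue; /* skip malformed entries */
		}
		if (try_connect(refs[i].server, refs[i].share) == 0) {
			return (int)i;
		}
	}
	return -1;
}

/* Stand-in connector for demonstration only: treats the failing DC
 * from the report as down and every other server as reachable. */
static int fake_connect(const char *server, const char *share)
{
	(void)share;
	return strcmp(server, "ZUIX-PWDS-DCS01") == 0 ? -1 : 0;
}
```

With the three targets from the packet capture and fake_connect() as the connector, the loop skips the first entry and settles on the second, rather than failing outright.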

If you are a competent C coder (or can find one to use :-) I'd be happy to review such a patch.

Cheers,

Jeremy.
Comment 2 Jeremy Allison 2020-11-19 20:19:07 UTC
Interestingly enough, the required logic is already present in:

source3/libsmb/clidfs.c:cli_resolve_path()

        status = cli_dfs_get_referral(ctx, cli_ipc, dfs_path, &refs,
                                      &num_refs, &consumed);
        if (!NT_STATUS_IS_OK(status)) {
                return status;
        }

        if (!num_refs || !refs[0].dfspath) {
                return NT_STATUS_NOT_FOUND;
        }

        /*
         * Bug#10123 - DFS referal entries can be provided in a random order,
         * so check the connection cache for each item to avoid unnecessary
         * reconnections.
         */
        dfs_refs = talloc_array(ctx, struct cli_dfs_path_split, num_refs);
        if (dfs_refs == NULL) {
                return NT_STATUS_NO_MEMORY;
        }

        for (count = 0; count < num_refs; count++) {
                if (!split_dfs_path(dfs_refs, refs[count].dfspath,
                                    &dfs_refs[count].server,
                                    &dfs_refs[count].share,
                                    &dfs_refs[count].extrapath)) {
                        TALLOC_FREE(dfs_refs);
                        return NT_STATUS_NOT_FOUND;
                }

                ccli = cli_cm_find(rootcli, dfs_refs[count].server,
                                   dfs_refs[count].share);
                if (ccli != NULL) {
                        extrapath = dfs_refs[count].extrapath;
                        *targetcli = ccli;
                        break;
                }
        }

        /*
         * If no cached connection was found, then connect to the first live
         * referral server in the list.
         */
        for (count = 0; (ccli == NULL) && (count < num_refs); count++) {
                /* Connect to the target server & share */
                status = cli_cm_connect(ctx, rootcli,
                                dfs_refs[count].server,
                                dfs_refs[count].share,
                                creds,
                                NULL, /* dest_ss */
                                0, /* port */
                                0x20,
                                targetcli);
                if (!NT_STATUS_IS_OK(status)) {
                        d_printf("Unable to follow dfs referral [\\%s\\%s]\n",
                                 dfs_refs[count].server,
                                 dfs_refs[count].share);
                        continue;
                } else {
                        extrapath = dfs_refs[count].extrapath;
                        break;
                }
        }

        /* No available referral server for the connection */
        if (*targetcli == NULL) {
                TALLOC_FREE(dfs_refs);
                return status;
        }

So it looks like that logic also needs adding to the root referral path.
Comment 3 Björn Jacke 2020-12-03 00:26:55 UTC
user replied: "You can close the case, it was very informative."
Comment 4 Jeremy Allison 2020-12-03 16:48:24 UTC
Shame, I'd rather actually fix the bug :-).