Hi Gents,

We use several MS DCs, and all of them are also DFS namespace servers that service our DFS targets. We connect to the DFS targets from Linux apps using libsmbclient.so. If one of the DCs fails, and libsmbclient receives the failing DC as the first referral from one of the working DCs, libsmbclient does not try the second and third referrals in the list. As a result the whole operation fails until the DC issue is resolved.

In the example below ZUIX-PWDS-DCS01 fails.

Packet capture - libsmbclient requests referrals for root target \xyz.com\shares:

    SMB2 (Server Message Block Protocol version 2)
        SMB2 Header
            Server Component: SMB2
            Header Length: 64
            Credit Charge: 1
            Channel Sequence: 0
            Reserved: 0000
            Command: Ioctl (11)
            Credits requested: 1
            Flags: 0x00000010, Priority
                .... .... .... .... .... .... .... ...0 = Response: This is a REQUEST
                .... .... .... .... .... .... .... ..0. = Async command: This is a SYNC command
                .... .... .... .... .... .... .... .0.. = Chained: This pdu is NOT a chained command
                .... .... .... .... .... .... .... 0... = Signing: This pdu is NOT signed
                .... .... .... .... .... .... .001 .... = Priority: This pdu contains a PRIORITY
                ...0 .... .... .... .... .... .... .... = DFS operation: This is a normal operation
                ..0. .... .... .... .... .... .... .... = Replay operation: This is NOT a replay operation
            Chain Offset: 0x00000000
            Message ID: Unknown (4)
            Process Id: 0x00000000
            Tree Id: 0x00000001 \\xyz.com\IPC$
                [Tree: \\xyz.com\IPC$]
                [Share Type: Named pipe (0x02)]
                [Connected in Frame: 21]
            Session Id: 0x00004890e0000d01 Acct:srv_linux_trdr Domain:HCT Host:RAPP-SMBA-TST01
                [Account: srv_linux_trdr]
                [Domain: HCT]
                [Host: RAPP-SMBA-TST01]
                [Authenticated in Frame: 18]
            Signature: 00000000000000000000000000000000
        Ioctl Request (0x0b)
            StructureSize: 0x0039
                0000 0000 0011 100. = Fixed Part Length: 28
                .... .... .... ...1 = Dynamic Part: True
            Reserved: 0000
            Function: FSCTL_DFS_GET_REFERRALS (0x00060194)
                0000 0000 0000 0110 .... .... .... .... = Device: DFS (0x0006)
                .... .... .... .... 00.. .... .... .... = Access: FILE_ANY_ACCESS (0x0)
                .... .... .... .... ..00 0001 1001 01.. = Function: 0x065
                .... .... .... .... .... .... .... ..00 = Method: METHOD_BUFFERED (0x0)
            GUID handle File Id: ffffffff-ffff-ffff-ffff-ffffffffffff
            Max Ioctl In Size: 0
            Max Ioctl Out Size: 65535
            Flags: 0x00000001
                .... .... .... .... .... .... .... ...1 = Is FSCTL: True
            Reserved: 00000000
            Blob Offset: 0x00000078
            Blob Length: 62
            In Data
                Max Referral Level: 3
                File Name: \xyz.com\shares
            Blob Offset: 0x00000078
            Blob Length: 0
            Out Data: NO DATA

Packet capture - libsmbclient gets the referral list from one of the working DCs, with the failing DC being the first in the target set:

    SMB2 (Server Message Block Protocol version 2)
        SMB2 Header
            Server Component: SMB2
            Header Length: 64
            Credit Charge: 1
            NT Status: STATUS_SUCCESS (0x00000000)
            Command: Ioctl (11)
            Credits granted: 1
            Flags: 0x00000011, Response, Priority
                .... .... .... .... .... .... .... ...1 = Response: This is a RESPONSE
                .... .... .... .... .... .... .... ..0. = Async command: This is a SYNC command
                .... .... .... .... .... .... .... .0.. = Chained: This pdu is NOT a chained command
                .... .... .... .... .... .... .... 0... = Signing: This pdu is NOT signed
                .... .... .... .... .... .... .001 .... = Priority: This pdu contains a PRIORITY
                ...0 .... .... .... .... .... .... .... = DFS operation: This is a normal operation
                ..0. .... .... .... .... .... .... .... = Replay operation: This is NOT a replay operation
            Chain Offset: 0x00000000
            Message ID: Unknown (4)
            Process Id: 0x00000000
            Tree Id: 0x00000001 \\xyz.com\IPC$
                [Tree: \\xyz.com\IPC$]
                [Share Type: Named pipe (0x02)]
                [Connected in Frame: 21]
            Session Id: 0x00004890e0000d01 Acct:srv_linux_trdr Domain:HCT Host:RAPP-SMBA-TST01
                [Account: srv_linux_trdr]
                [Domain: HCT]
                [Host: RAPP-SMBA-TST01]
                [Authenticated in Frame: 18]
            Signature: 00000000000000000000000000000000
            [Response to: 22]
            [Time from request: 0.000154000 seconds]
        Ioctl Response (0x0b)
            StructureSize: 0x0031
                0000 0000 0011 000. = Fixed Part Length: 24
                .... .... .... ...1 = Dynamic Part: True
            Unknown: 0000
            Function: FSCTL_DFS_GET_REFERRALS (0x00060194)
                0000 0000 0000 0110 .... .... .... .... = Device: DFS (0x0006)
                .... .... .... .... 00.. .... .... .... = Access: FILE_ANY_ACCESS (0x0)
                .... .... .... .... ..00 0001 1001 01.. = Function: 0x065
                .... .... .... .... .... .... .... ..00 = Method: METHOD_BUFFERED (0x0)
            GUID handle File Id: ffffffff-ffff-ffff-ffff-ffffffffffff
            Reserved: 00000000
            Reserved: 00000000
            Blob Offset: 0x00000070
            Blob Length: 0
            In Data: NO DATA
            Blob Offset: 0x00000070
            Blob Length: 462
            Out Data
                Path Consumed: 58
                Num Referrals: 3
                Flags: 0x0003, Hold Storage, Fielding
                    .... .... .... ..1. = Hold Storage: Referral SERVER HOLDS STORAGE for the file
                    .... .... .... ...1 = Fielding: The server in referral is FIELDING CAPABLE
                Padding: 0000
                Referrals
                    Referral
                        Version: 3
                        Size: 34
                        Server Type: Root targets returns (1)
                        Flags: 0x0000
                            .... .... .... ..0. = NameListReferral: NOT a domain/DC referral response
                            .... .... .... .0.. = TargetSetBoundary: NOT the first target in the target set
                        TTL: 300
                        Path Offset: 102
                        Alt Path Offset: 162
                        Node Offset: 222
                        Server GUID: 00000000-0000-0000-0000-000000000000
                        Path: \xyz.com\shares
                        Alt Path: \xyz.com\shares
                        Node: \ZUIX-PWDS-DCS01\Shares
                    Referral
                        Version: 3
                        Size: 34
                        Server Type: Root targets returns (1)
                        Flags: 0x0000
                            .... .... .... ..0. = NameListReferral: NOT a domain/DC referral response
                            .... .... .... .0.. = TargetSetBoundary: NOT the first target in the target set
                        TTL: 300
                        Path Offset: 68
                        Alt Path Offset: 128
                        Node Offset: 236
                        Server GUID: 00000000-0000-0000-0000-000000000000
                        Path: \xyz.com\shares
                        Alt Path: \xyz.com\shares
                        Node: \RAPP-PWDS-DCS03.xyz.com\Shares
                    Referral
                        Version: 3
                        Size: 34
                        Server Type: Root targets returns (1)
                        Flags: 0x0000
                            .... .... .... ..0. = NameListReferral: NOT a domain/DC referral response
                            .... .... .... .0.. = TargetSetBoundary: NOT the first target in the target set
                        TTL: 300
                        Path Offset: 34
                        Alt Path Offset: 94
                        Node Offset: 294
                        Server GUID: 00000000-0000-0000-0000-000000000000
                        Path: \xyz.com\shares
                        Alt Path: \xyz.com\shares
                        Node: \RAPP-PWDS-DCS04.xyz.com\Shares

I reproduced it with smbclient, which I guess uses the same underlying mechanisms as libsmbclient. Here is the relevant part of the log:

    # smbclient -d 10 -A <password file> //xyz.com/shares -c "cd test/eu/datafiles;get users.hdp"
    ........................................
    output omitted
    ........................................
    gensec_update_send: spnego[0x56533b24c930]: subreq: 0x56533b279dc0
    gensec_update_done: spnego[0x56533b24c930]: NT_STATUS_OK tevent_req[0x56533b279dc0/../../auth/gensec/spnego.c:1632]: state[2] error[0 (0x0)] state[struct gensec_spnego_update_state (0x56533b279f70)] timer[(nil)] finish[../../auth/gensec/spnego.c:2116]
    session setup ok
    signed SMB2 message
    sitename_fetch: Returning sitename for realm 'xyz.com': "Default-First-Site-Name"
    internal_resolve_name: looking up ZUIX-PWDS-DCS01#20 (sitename Default-First-Site-Name)
    name ZUIX-PWDS-DCS01#20 found.
    remove_duplicate_addrs2: looking for duplicate address/port pairs
    Connecting to 1.1.1.1 at port 445
    do_connect: Connection to ZUIX-PWDS-DCS01 failed (Error NT_STATUS_IO_TIMEOUT)

Please let me know if there is a configuration option or a workaround to get libsmbclient to use the other targets from the target set.

Thanks,
Szilard
Yes, that's correct. The problematic code is (in master):

source3/libsmb/clidfs.c: cli_check_msdfs_proxy()

    1212         status = cli_dfs_get_referral(ctx, cli, fullpath, &refs,
    1213                                       &num_refs, &consumed);
    1214         res = NT_STATUS_IS_OK(status);
    1215
    1216         status = cli_tdis(cli);
    1217
    1218         cli_state_restore_tcon(cli, orig_tcon);
    1219
    1220         if (!NT_STATUS_IS_OK(status)) {
    1221                 return false;
    1222         }
    1223
    1224         if (!res || !num_refs) {
    1225                 return false;
    1226         }
    1227
    1228         if (!refs[0].dfspath) {
    1229                 return false;
    1230         }
    1231
    1232         if (!split_dfs_path(ctx, refs[0].dfspath, pp_newserver,
    1233                             pp_newshare, &newextrapath)) {
    1234                 return false;
    1235         }

Note that in lines 1228 and 1232 we only look at refs[0]. This function needs updating to return the full list of possible referrals to the caller, and the caller then needs to loop over the connection attempts to each 'newserver/newshare' in the returned list.

If you are a competent C coder (or can find one to use :-)) I'd be happy to review such a patch.

Cheers,
Jeremy.
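As a rough illustration of the change described above (try every referral instead of only refs[0]), here is a minimal standalone C sketch. It is not Samba code and does not use any Samba API: struct referral_target, split_target(), connect_fn, connect_unless_down() and connect_first_live() are all hypothetical names invented for this example, and the "connection" is simulated by a callback that treats one server name as down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for one referral target; not a Samba type. */
struct referral_target {
	char server[256];
	char share[256];
};

/*
 * Split a referral node of the form "\server\share[\extra]" into
 * server and share. Returns false if the path is malformed or a
 * component does not fit into the fixed-size buffers.
 */
bool split_target(const char *node, struct referral_target *out)
{
	const char *server, *share, *sep, *end;
	size_t server_len, share_len;

	assert(node != NULL && out != NULL);

	if (node[0] != '\\') {
		return false;
	}
	server = node + 1;
	sep = strchr(server, '\\');
	if (sep == NULL) {
		return false;
	}
	server_len = (size_t)(sep - server);
	share = sep + 1;
	end = strchr(share, '\\');
	share_len = (end != NULL) ? (size_t)(end - share) : strlen(share);

	if (server_len == 0 || share_len == 0 ||
	    server_len >= sizeof(out->server) ||
	    share_len >= sizeof(out->share)) {
		return false;
	}
	memcpy(out->server, server, server_len);
	out->server[server_len] = '\0';
	memcpy(out->share, share, share_len);
	out->share[share_len] = '\0';
	return true;
}

/* Hypothetical connect callback; ctx is opaque caller data. */
typedef bool (*connect_fn)(const struct referral_target *t, const void *ctx);

/* Simulated connect: every server except the one named in ctx is up. */
bool connect_unless_down(const struct referral_target *t, const void *ctx)
{
	return strcmp(t->server, (const char *)ctx) != 0;
}

/*
 * Walk the referral list in order and return the index of the first
 * target that connects, or -1 if none does. Unparsable entries and
 * failed connections are skipped rather than aborting the operation.
 */
int connect_first_live(const char *const *nodes, size_t num_nodes,
		       connect_fn try_connect, const void *ctx)
{
	size_t i;

	for (i = 0; i < num_nodes; i++) {
		struct referral_target t;

		if (!split_target(nodes[i], &t)) {
			continue;
		}
		if (try_connect(&t, ctx)) {
			return (int)i;
		}
		fprintf(stderr, "Unable to follow dfs referral [\\%s\\%s]\n",
			t.server, t.share);
	}
	return -1;
}
```

With the three nodes from the capture and ZUIX-PWDS-DCS01 marked as down, connect_first_live() skips the first entry and settles on the second, which is the behaviour the original report asks for.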
Interestingly enough, the required logic is already present in:

source3/libsmb/clidfs.c: cli_resolve_path()

        status = cli_dfs_get_referral(ctx, cli_ipc, dfs_path, &refs,
                                      &num_refs, &consumed);
        if (!NT_STATUS_IS_OK(status)) {
                return status;
        }

        if (!num_refs || !refs[0].dfspath) {
                return NT_STATUS_NOT_FOUND;
        }

        /*
         * Bug#10123 - DFS referal entries can be provided in a random order,
         * so check the connection cache for each item to avoid unnecessary
         * reconnections.
         */
        dfs_refs = talloc_array(ctx, struct cli_dfs_path_split, num_refs);
        if (dfs_refs == NULL) {
                return NT_STATUS_NO_MEMORY;
        }

        for (count = 0; count < num_refs; count++) {
                if (!split_dfs_path(dfs_refs, refs[count].dfspath,
                                    &dfs_refs[count].server,
                                    &dfs_refs[count].share,
                                    &dfs_refs[count].extrapath)) {
                        TALLOC_FREE(dfs_refs);
                        return NT_STATUS_NOT_FOUND;
                }

                ccli = cli_cm_find(rootcli, dfs_refs[count].server,
                                   dfs_refs[count].share);
                if (ccli != NULL) {
                        extrapath = dfs_refs[count].extrapath;
                        *targetcli = ccli;
                        break;
                }
        }

        /*
         * If no cached connection was found, then connect to the first live
         * referral server in the list.
         */
        for (count = 0; (ccli == NULL) && (count < num_refs); count++) {
                /* Connect to the target server & share */
                status = cli_cm_connect(ctx, rootcli,
                                        dfs_refs[count].server,
                                        dfs_refs[count].share,
                                        creds,
                                        NULL, /* dest_ss */
                                        0, /* port */
                                        0x20,
                                        targetcli);
                if (!NT_STATUS_IS_OK(status)) {
                        d_printf("Unable to follow dfs referral [\\%s\\%s]\n",
                                 dfs_refs[count].server,
                                 dfs_refs[count].share);
                        continue;
                } else {
                        extrapath = dfs_refs[count].extrapath;
                        break;
                }
        }

        /* No available referral server for the connection */
        if (*targetcli == NULL) {
                TALLOC_FREE(dfs_refs);
                return status;
        }

So it looks like that logic also needs adding to the root referral path.
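The two-pass selection quoted above (first prefer a referral that already has a cached connection, otherwise connect to the first live one) can be modelled in isolation roughly as follows. This is only a sketch under stated assumptions: struct dfs_target, struct conn_cache, cache_has() and pick_target() are hypothetical names, the reachable flag stands in for a real connection attempt, and names are compared with a plain case-sensitive strcmp(), which real server-name matching may not do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical referral entry; 'reachable' simulates a connect result. */
struct dfs_target {
	const char *server;
	const char *share;
	bool reachable;
};

#define CACHE_MAX 8

/* Hypothetical connection cache: targets we are already connected to. */
struct conn_cache {
	struct dfs_target entries[CACHE_MAX];
	size_t count;
};

/* Return true if the cache already holds a connection to this target. */
bool cache_has(const struct conn_cache *cache, const struct dfs_target *t)
{
	size_t i;

	for (i = 0; i < cache->count; i++) {
		if (strcmp(cache->entries[i].server, t->server) == 0 &&
		    strcmp(cache->entries[i].share, t->share) == 0) {
			return true;
		}
	}
	return false;
}

/*
 * Pick a referral target and return its index, or -1 if none is usable.
 * Pass 1: reuse a cached connection, whatever its position in the list.
 * Pass 2: otherwise take the first target that can be connected to and
 *         remember it in the cache (if there is room).
 */
int pick_target(struct conn_cache *cache,
		const struct dfs_target *refs, size_t num_refs)
{
	size_t i;

	assert(cache != NULL && (refs != NULL || num_refs == 0));

	for (i = 0; i < num_refs; i++) {
		if (cache_has(cache, &refs[i])) {
			return (int)i;
		}
	}

	for (i = 0; i < num_refs; i++) {
		if (!refs[i].reachable) {
			continue; /* dead target: try the next referral */
		}
		if (cache->count < CACHE_MAX) {
			cache->entries[cache->count++] = refs[i];
		}
		return (int)i;
	}

	return -1;
}
```

Doing the cache lookup as a separate first pass matters because the server may return the referrals in a different order each time; without it, a reordered list would trigger a fresh connection even though a usable one already exists.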
user replied: "You can close the case, it was very informative."
Shame, I'd rather actually fix the bug :-).