We are using libsmbclient to talk to a Windows share. It works perfectly fine in a regular DFS environment, but in a DFS environment with replication we are running into issues with the way referrals are managed. Here is what we do:

    file = smbc_getFunctionOpen(c)(c, smb_path, O_RDWR|O_TRUNC, 0666);
    ..
    ..
    if((smbc_getFunctionFstat(c)(c, file, pSt)) < 0)
    ..
    ..
    while(//chunk by chunk)
        writtenLen = smbc_getFunctionWrite(c)(c, file, writebuf, len);

Our understanding is that a trans2_request Get_Dfs_referral is made for each of the libsmbclient API calls. AD returns the referrals in a different order on subsequent calls, so the request ends up on a different member server. In the case above, smbc_write therefore fails with EBADF (errno 9).

cli_resolve_path always calls cli_dfs_get_referral and takes the first referral from the returned referral list. Is there any control over the referrals? We basically want every call to stick to one server (for as long as that server is active).
Created attachment 9785 [details] Patch proposal for fixing the issue related to DFS link
The issue mentioned in the description happens because libsmbclient resolves referrals on every request. If SMBC_Open() goes to one referral server and connects to it, SMBC_Read() will fetch the referral list again. This causes confusion, because the context obtained on the previous connection (such as the tid) is not valid on the new server connection. To fix this, the patch now looks for the existing connection and reuses it. The change also includes a check that the server is alive before that connection is used.
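
The failure mode and the idea behind the fix can be illustrated with a small toy model. This is only a sketch and not Samba code: every name in it (toy_ctx, toy_resolve, toy_open_then_write and so on) is hypothetical, the "referral list" is simulated by rotating between two server indices, and the alive check mentioned above is not modelled. It only shows why a handle opened on one member server fails with EBADF when the next call resolves to another one, and why reusing the first connection avoids that.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy model only: all names here are hypothetical, none are Samba APIs. */
#define TOY_N_SERVERS 2

struct toy_ctx {
    int referral_calls;  /* number of simulated referral lookups so far */
    int have_cached;     /* non-zero once a server has been chosen */
    int cached_server;   /* server chosen by the first lookup */
    int reuse_cached;    /* 0: resolve on every call, 1: reuse the connection */
};

struct toy_file {
    int server;          /* member server that issued this handle */
};

/* Simulated referral lookup: the first entry rotates on every request. */
static int toy_get_referral(struct toy_ctx *ctx)
{
    int first = ctx->referral_calls % TOY_N_SERVERS;
    ctx->referral_calls++;
    return first;
}

/* Pick the server for the next request. */
static int toy_resolve(struct toy_ctx *ctx)
{
    if (ctx->reuse_cached && ctx->have_cached) {
        return ctx->cached_server;   /* stick to the existing connection */
    }
    ctx->cached_server = toy_get_referral(ctx);
    ctx->have_cached = 1;
    return ctx->cached_server;
}

static int toy_open(struct toy_ctx *ctx, struct toy_file *f)
{
    assert(f != NULL);
    f->server = toy_resolve(ctx);
    return 0;
}

/* A handle is only valid on the server that issued it. */
static int toy_write(struct toy_ctx *ctx, const struct toy_file *f)
{
    assert(f != NULL);
    if (toy_resolve(ctx) != f->server) {
        errno = EBADF;
        return -1;
    }
    return 0;
}

/* Returns 0 if open followed by write succeeds, otherwise the errno. */
int toy_open_then_write(int reuse_cached)
{
    struct toy_ctx ctx = {0};
    struct toy_file f;

    ctx.reuse_cached = reuse_cached;
    if (toy_open(&ctx, &f) < 0) {
        return errno;
    }
    if (toy_write(&ctx, &f) < 0) {
        return errno;
    }
    return 0;
}
```

In this model, toy_open_then_write(0) resolves twice and the write lands on the other server, while toy_open_then_write(1) keeps both calls on the server chosen at open time.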
metze: do you have some time to have a look at the patch here? This looks a bit tricky...
(In reply to hargagan from comment #1)
In general I like the patch, but there are a few things I'd like to change before pushing:
- fix build failure from the d_printf("Unable to follow dfs referral" statement.
- free allocated dfs_refs memory in exit paths.
- Remove the cached connection check (cli_echo).
  + IMO it shouldn't be done at this layer.
- move cm_find / cm_connect logic closer together.
- fix minor white-space / formatting issues.
Created attachment 10620 [details] new patch addressing points in comment#3 I'll start work on a simple test for this.
Created attachment 10624 [details] simple test, based on examples/libsmbclient/teststat3.c This test successfully reproduces the issue and demonstrates the effectiveness of the fix.
Comment on attachment 10620 [details] new patch addressing points in comment#3 Fix is now upstream - marking obsolete. Maintenance back-ports to follow.
Created attachment 10632 [details] fix for 4-2-test branch, cherry-picked from master
Created attachment 10633 [details] fix for 4-1-test branch, cherry-picked from master
Created attachment 10634 [details] fix for 4-0-test branch, slightly modified from master - no smbXcli_tcon context
Created attachment 10635 [details] fix for 3-6-stable branch, modified from master - just in case it's useful to others maintaining the 3.6 series
Comment on attachment 10632 [details] fix for 4-2-test branch, cherry-picked from master LGTM.
Comment on attachment 10633 [details] fix for 4-1-test branch, cherry-picked from master LGTM.
Comment on attachment 10634 [details] fix for 4-0-test branch, slightly modified from master - no smbXcli_tcon context LGTM.
Re-assigning to Karolin to include in 4.0.next, 4.1.next, 4.2.0.
Pushed to autobuild-v4-[0|1|2]-test.
Pushed to all branches. Closing out bug report. Thanks!