This is somewhat similar to https://bugzilla.samba.org/show_bug.cgi?id=11195 and https://bugzilla.samba.org/show_bug.cgi?id=11624. The patch in 11624 resolved the issue except in instances where we are accessing files within nested DFS namespaces. In these circumstances the same symptoms as in 11195 are seen.

To reproduce, create two DFS namespaces:-

    domain.com\namespace1
    domain.com\namespace2

Create a folder target inside namespace1 which points to namespace2:-

    domain.com\namespace1\Target -> domain.com\namespace2

Then open and close files using domain.com\namespace1\Target

As with 11195, running...

    lsof | grep microsoft

...will show multiple connections labelled "microsoft-ds (ESTABLISHED)" to the target server even after the files have been closed. The connections will only be closed once the parent process is terminated. Accessing multiple files in this way can quickly lead to the file server returning NT_STATUS_TOO_MANY_OPENED_FILES.
Hmmm. Can you get debug logs to show exactly *when* the extra connections are being made? That would help me track down the circumstances causing this.
Or indeed, a wireshark trace might also do the trick. I need to determine why when going to the same server name we're not re-using the existing connections in the DFS connection cache.
Sure, no problem. I tried running:- smbclient //domain/dfs -W DOMAIN.COM -U username -d10 -l/tmp --option=gensec:gse_krb5=no ...and a /tmp/log.smbclient is created but never written. I'll try to get Wireshark running tomorrow.
Created attachment 14912 [details] Debug output from smbclient
OK, I set debug to 10 and tee-d the output. Hope that's OK? I've attached the log file. As you'll see all I was doing was navigating the folder structure and dir-ing. By the end of this test:-

    lsof | grep microsoft-ds | grep smbclient -c

...was showing 30.

I also spotted that in the log there are entries like:-

    dos_clean_name [\DFS\Information for Staff\Information for Staff\Information for Students]
    unix_clean_name [\DFS\Information for Staff\Information for Staff\Information for Students]

...but the actual path is:-

    \DFS\Information for Staff\Information for Students

So it seems to be doubling part of the path up somewhere. Not sure if this is just a display issue but I've also found a problem with renaming files in this type of nested namespace environment where libsmbclient returns that it can't find the file. Could this be related?
At a brief glance it looks like it's not finding the existing cached connections, so it's opening a new connection for every operation. Even with SMB2 we should be finding existing connections tagged with remote_host name, look at the cli_cm_find(referring_cli, server, share) code in cli_cm_open(). Are you able to rebuild the code ? If so, I might send you a patch adding extra debugs that will show what remote names we're caching and examining why the cached connection lookup isn't working.
Sure. Fire it over when you're ready and I'll build and post back more logs. Any thoughts on the incorrect paths in the logs?
Created attachment 14916 [details] Extra debug patch. Can you try building with this and giving me the level 10 output. Should give me more data on the problems. Thanks ! Jeremy.
OK, I applied the patch and built with no issues but I don't see any extra output. I'm doing this against the latest CentOS 7 srpm. I guess that's the issue?
You need to run with debug level 10. The extra calls are DBG_DEBUG (log level 10) values. If you're running with debug level 10 but don't see the new calls that at least is showing the SMB2 calls aren't going through the DFS connection manager (which is strange).
So as above? smbclient //domain/dfs -W DOMAIN.COM -U username -d10 --option=gensec:gse_krb5=no If so then definitely not seeing any of the additional output. I'm just building from the latest source from samba.org to check if that makes a difference.
OK, just in case it's something strange to do with DBG_DEBUG(), change these calls to d_printf() to have the debug data come out in the normal output stream and re-test. If you still don't see them, then there's a problem that SMB2 just isn't going through the DFS connection caching layer.
Created attachment 14917 [details] Debug output from smbclient with patch Rebuilt it from latest and it looks like the additional output is there. See attached.
OK, this looks like the core of the error:

    cli_resolve_path: Calling cli_dfs_get_referral on dfs_path \2012FS\DFS\Information for Staff
    signed SMB2 message
    cli_resolve_path: Calling cli_cm_find on server domain.com share Information for Staff
    cli_cm_find: Looking for connection to server domain.com share Information for Staff
    cli_cm_find: List entry server 2012FS share DFS
    cli_cm_find: List entry server 2012FS share IPC$

The server / share name pair it should be looking for inside the DFS connection caching code is incorrect. It seems to be looking for a share called "Information for Staff". Looks like cli_dfs_get_referral() is working correctly, but then this code:

    for (count = 0; count < num_refs; count++) {
        if (!split_dfs_path(dfs_refs, refs[count].dfspath,
                            &dfs_refs[count].server,
                            &dfs_refs[count].share,
                            &dfs_refs[count].extrapath)) {
            TALLOC_FREE(dfs_refs);
            return NT_STATUS_NOT_FOUND;
        }

        ccli = cli_cm_find(rootcli, dfs_refs[count].server,
                           dfs_refs[count].share);
        if (ccli != NULL) {
            extrapath = dfs_refs[count].extrapath;
            *targetcli = ccli;
            break;
        }
    }

specifically the split_dfs_path() code is messing up. Extra patch to follow to discover what is being passed into this function and what it is doing.
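To make the failed lookup in the log above concrete, here is a minimal standalone sketch of a connection cache keyed on a (server, share) pair. It is not Samba's cli_cm_find(): the struct conn_entry type, the sketch_cm_find() name and the case-insensitive comparison are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <strings.h> /* strcasecmp (POSIX) */

/* Hypothetical cache entry; not a Samba structure. */
struct conn_entry {
    const char *server;
    const char *share;
    struct conn_entry *next;
};

/*
 * Illustrative lookup: walk the list and return the first entry whose
 * server AND share both match. Case-insensitive matching is an
 * assumption of this sketch. Returns NULL when nothing matches.
 */
const struct conn_entry *sketch_cm_find(const struct conn_entry *head,
                                        const char *server,
                                        const char *share)
{
    const struct conn_entry *e;

    assert(server != NULL && share != NULL);

    for (e = head; e != NULL; e = e->next) {
        if (strcasecmp(e->server, server) == 0 &&
            strcasecmp(e->share, share) == 0) {
            return e;
        }
    }
    return NULL;
}
```

With a list holding only the two entries shown in the log (2012FS/DFS and 2012FS/IPC$), a lookup for server domain.com and share "Information for Staff" matches neither. In a cache of this shape, each such miss would lead to a new connection being opened.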
Created attachment 14920 [details] Extra debug - v2. Updated patch to give debug info inside split_dfs_path().
Created attachment 14921 [details] Debug output from smbclient with patch v2 Rebuilt and output attached. I followed the same route through the folder structure as before.
OK, here is the problem:

    cli_resolve_path: Calling cli_dfs_get_referral on dfs_path \2012FS\DFS\Information for Staff

The referral lookup is returning a bogus DFS path of:

    signed SMB2 message
    split_dfs_path: split_dfs_path: |\domain.com\Information for Staff|
    split_dfs_path: server: |domain.com|
    split_dfs_path: share: |Information for Staff|
    split_dfs_path: extrapath: ||

    \domain.com\Information for Staff

Given that - the code is trying to parse it into 'server' \ 'share' and gives the wrong result.

How are you creating these DFS links? Are they on a Samba server?

I will now look into adding debugs into cli_dfs_get_referral() to see what might be doing this. We're making progress (slowly :-).
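The parse shown in that debug output can be reproduced with a small standalone sketch. This is not Samba's split_dfs_path(); the struct dfs_parts type, the sketch_split_dfs_path() name and the fixed buffer sizes are assumptions made for illustration. It only shows how a path of the form \server\share\extra is cut on backslashes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical holder for the three components; not a Samba structure. */
struct dfs_parts {
    char server[256];
    char share[256];
    char extrapath[1024];
};

/*
 * Illustrative only: split "\server\share\extra..." on backslashes.
 * Returns false if the server or share component is missing, or if a
 * component does not fit in its buffer.
 */
bool sketch_split_dfs_path(const char *path, struct dfs_parts *out)
{
    const char *p;
    const char *sep;
    size_t len;

    assert(path != NULL && out != NULL);
    memset(out, 0, sizeof(*out)); /* keeps every field NUL-terminated */

    /* Skip any leading backslashes. */
    p = path;
    while (*p == '\\') {
        p++;
    }

    /* First component: server. A separator must follow it. */
    sep = strchr(p, '\\');
    if (sep == NULL) {
        return false;
    }
    len = (size_t)(sep - p);
    if (len == 0 || len >= sizeof(out->server)) {
        return false;
    }
    memcpy(out->server, p, len);

    /* Second component: share. It may be the last component. */
    p = sep + 1;
    sep = strchr(p, '\\');
    len = (sep != NULL) ? (size_t)(sep - p) : strlen(p);
    if (len == 0 || len >= sizeof(out->share)) {
        return false;
    }
    memcpy(out->share, p, len);

    /* Whatever remains after the share is the extra path. */
    if (sep != NULL) {
        len = strlen(sep + 1);
        if (len >= sizeof(out->extrapath)) {
            return false;
        }
        memcpy(out->extrapath, sep + 1, len);
    }
    return true;
}
```

Given \domain.com\Information for Staff, this yields server domain.com, share "Information for Staff" and an empty extrapath, the same split as in the log. A split of this shape has no way to tell that the second component is a namespace name rather than a share on a file server.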
Created attachment 14922 [details] Extra debug - v3 OK, here is another version that adds dump_data() calls to the SMB2 request to get the DFS referral. This is essentially a poor man's wireshark trace (you could also just upload the wireshark traces :-) but will allow me to examine what the request/response data is for the DFS referral lookup.
OK, just building now. To answer your question about how this is all set up, it's a little out-of-the-ordinary (I hadn't ever come across one like this before) but apparently Microsoft allow it.

There are three separate domain-based DFS namespaces:

    domain.com\DFS
    domain.com\Information for Students
    domain.com\Information for Staff

From here, there are folder targets like so:-

    domain.com\DFS\Information for Staff -> domain.com\Information for Staff
    domain.com\DFS\Information for Students -> domain.com\Information for Students

And then, within the Information for Staff namespace there's one for Information for Students:-

    domain.com\Information for Staff\Information for Students -> domain.com\Information for Students

So you effectively end up with:-

    domain.com\DFS\Information for Staff\Information for Students

...where "Information for Staff" and "Information for Students" are folder targets to different namespaces. These namespaces are nested within one another. Again, this isn't something I've seen before but Microsoft do seem to suggest that namespaces can be nested in this way. I'll post the output back as soon as I have it.
OK, this is starting to look like a strange setup we haven't come across before and the code isn't coping well with - not a generic "DFS is broken with SMB2" bug. I still want to know what the problem is, but this might take a little longer to fix as we'll need to be able to create a regression test case that duplicates the problem so we can test the fix.
Created attachment 14925 [details] Debug output from smbclient with patch v3 Yes, I agree. I had to check the docs to confirm that it was actually possible to configure things this way but I've replicated the setup on my network just to be sure. Latest output attached.
Created attachment 14926 [details] Debug output from smbclient with patch v3 corrected Apologies, one of the namespace names is wrong in the previous attachment. Corrected in this one.