In short: when accessing .zip files across a dfs referral on Windows 7 & Windows 8 clients (we've installed a recent devel version), SOMETIMES you get "Invalid zip file".
We've been wrestling with this buglet for a while and have recently put a lot of work into it; taking wireshark dumps, etc. (we can provide these if requested).
The big problem with this bug is that it's intermittent, and I can't come up with a scenario that's 100% reproducible - it's very sensitive to how the client makes the request as to whether the bug triggers; for instance, we found that running 'procmon' on the client kept the bug from triggering.
HOWEVER, we did find that, when the bug triggers, the client always makes a request that looks like:
"Give me /share/dfsref/path/to/file.zip". The samba server responds "STATUS_OBJECT_PATH_NOT_FOUND", but when the client asks a Windows DFS server for the same file, the Windows server returns "STATUS_PATH_NOT_COVERED" (errors as translated by wireshark) - which apparently means something entirely different.
A workaround we found - strangely - is that adding "follow symlinks = no" to the dfs share causes the Windows client to behave differently, such that the client never asks for the full path to the file.
This behavior was observed with:
RHEL5 running samba3x-3.5.4-0.83.el5_7.2
RHEL6 running samba-3.5.6-86.el6_1.4.x86_64
I checked the changelogs for samba 3.5.7+, and nothing indicates this bug might be fixed. We do however have a test RHEL6 server where we might be able to try newer packages.
Is there any chance you get get me a debug level to when the STATUS_OBJECT_PATH_NOT_FOUND error is returned in place of the STATUS_PATH_NOT_COVERED ?
If you can get me that I stand a chance of tracking this down and fixing it, without it it's just an unreproducible bug :-(.
Created attachment 6959 [details]
msdfs log level 10 debug log showing error
Glad to provide a debug log. This was on the RHEL6 box running samba-3.5.6-86.el6_1.4.x86_64. I set log level to 3, and msdfs to 10. The NT_STATUS_OBJECT_PATH_NOT_FOUND is sent twice, on lines 1674 and 1706.
Note that the full path would be:
\sbdrive.bridgewater.edu\shares\dparsley\StaffHomes\mflores\test.zip; apparently SB8391~7 is a shortened version of StaffHomes? (not sure how that works?)
Oh, that's very interesting. Looks like the client is sending an 8.3 mangled name "SB8391~7" in the DFS path instead of "StaffHomes".
I bet we don't handle the demangling correctly in the DFS case - we do in the non-DFS case.
Can you attach your smb.conf please - "SB8391~7" looks like an old mangled name, not a hash2 style mangled name.
Also, this doesn't look like a debug level 10 log - I don't see any raw packet data in the log. Please send a debug level 10 log. Run smbd -d10 to make sure it really is using debug level 10.
Oh yes, I see above you only set log level to 3, and msfds to 10. I want a full level 10 log from all components please.
Created attachment 6970 [details]
Raw patch for 3.5.next
Ok, can you try this patch. I think it will fix the problem.
The issue is that we're correctly demangling the pathname inside unix_convert() called from dfs_path_lookup(), but then we're not using the correctly returned demangled name in the subsequent path walk. We're stupidly just doing :
create_synthetic_smb_fname(ctx, pdp->reqpath, NULL, NULL, &smb_fname);
which throws away the correctly demangled smb_fname->base_name we just created and replaces it with a copy of the mangled name that was incoming. This fix will work as when we demangle a 8.3 mangled path component it can only be replaced with a valid single component, so the walk over the pathname via '/' characters should still be correct even with the demangled name.
Please test this both with and without "follow symlinks = no" but it should work in both cases.
Thanks for the patch! I rebuilt the rpms on our RHEL6 test box, and we stress tested with a variety of zip files known to break before, and now can't reproduce the error with or without "follow symlinks = no". As before, when "follow symlinks = no" is set, the client behaves differently and doesn't make the same request.
Next week I can try rebuilding and testing on the production RHEL5 server; your patch and explanation look simple enough that it's probably "obviously correct" and safe.
Let me know if you still need anything from me. I'd like to file a bug with Red Hat bugzilla and try to get this into their packages ahead of them shipping Samba 3.6 (assuming your fix will likely find it's way into that tree, not sure if there will be more 3.5.x releases?)
Created attachment 6972 [details]
git-am fix for 3.6.1
Patch I've applied to master. Also applies cleanly to 3.5.next.
Metze please review and re-assign to Karolin for inclusion.
The bug reporter has confirmed this patch fixes the issues.
Comment on attachment 6972 [details]
git-am fix for 3.6.1
Karolin, please pick for the release
Pushed to v3-5-test and v3-6-test.
Closing out bug report.