Bug 8493 - DFS breaks zip file extracting unless "follow symlinks = no" set
Summary: DFS breaks zip file extracting unless "follow symlinks = no" set
Status: RESOLVED FIXED
Alias: None
Product: Samba 3.5
Classification: Unclassified
Component: File services (show other bugs)
Version: 3.5.6
Hardware: All All
: P5 normal
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-09-27 20:45 UTC by David L. Parsley
Modified: 2011-10-03 17:41 UTC (History)
1 user (show)

See Also:


Attachments
msdfs log level 10 debug log showing error (114.32 KB, text/x-log)
2011-09-28 17:47 UTC, David L. Parsley
no flags Details
Raw patch for 3.5.next (543 bytes, patch)
2011-09-29 23:45 UTC, Jeremy Allison
no flags Details
git-am fix for 3.6.1 (1.77 KB, patch)
2011-09-30 23:15 UTC, Jeremy Allison
metze: review+
Details

Note You need to log in before you can comment on or make changes to this bug.
Description David L. Parsley 2011-09-27 20:45:44 UTC
In short: when accessing .zip files across a dfs referral on Windows 7 & Windows 8 clients (we've installed a recent devel version), SOMETIMES you get "Invalid zip file".

We've been wrestling with this buglet for a while and have recently put a lot of work into it; taking wireshark dumps, etc. (we can provide these if requested).

The big problem with this bug is that it's intermittent, and I can't come up with a scenario that's 100% reproducible - it's very sensitive to how the client makes the request as to whether the bug triggers; for instance, we found that running 'procmon' on the client kept the bug from triggering.

HOWEVER, we did find that, when the bug triggers, the client always makes a request that looks like:
"Give me /share/dfsref/path/to/file.zip". The samba server responds "STATUS_OBJECT_PATH_NOT_FOUND", but when the client asks a Windows DFS server for the same file, the Windows server returns "STATUS_PATH_NOT_COVERED" (errors as translated by wireshark) - which apparently means something entirely different.

A workaround we found - strangely - is that adding "follow symlinks = no" to the dfs share causes the Windows client to behave differently, such that the client never asks for the full path to the file.

This behavior was observed with:
RHEL5 running samba3x-3.5.4-0.83.el5_7.2
RHEL6 running samba-3.5.6-86.el6_1.4.x86_64

I checked the changelogs for samba 3.5.7+, and nothing indicates this bug might be fixed. We do however have a test RHEL6 server where we might be able to try newer packages.
Comment 1 Jeremy Allison 2011-09-27 23:09:20 UTC
Is there any chance you get get me a debug level to when the STATUS_OBJECT_PATH_NOT_FOUND error is returned in place of the STATUS_PATH_NOT_COVERED ?

If you can get me that I stand a chance of tracking this down and fixing it, without it it's just an unreproducible bug :-(.

Jeremy.
Comment 2 David L. Parsley 2011-09-28 17:47:42 UTC
Created attachment 6959 [details]
msdfs log level 10 debug log showing error

Glad to provide a debug log. This was on the RHEL6 box running samba-3.5.6-86.el6_1.4.x86_64. I set log level to 3, and msdfs to 10. The NT_STATUS_OBJECT_PATH_NOT_FOUND is sent twice, on lines 1674 and 1706.
Comment 3 David L. Parsley 2011-09-28 17:51:57 UTC
Note that the full path would be:
\sbdrive.bridgewater.edu\shares\dparsley\StaffHomes\mflores\test.zip; apparently SB8391~7 is a shortened version of StaffHomes? (not sure how that works?)
Comment 4 Jeremy Allison 2011-09-29 22:46:12 UTC
Oh, that's very interesting. Looks like the client is sending an 8.3 mangled name "SB8391~7" in the DFS path instead of "StaffHomes".

I bet we don't handle the demangling correctly in the DFS case - we do in the non-DFS case.

Can you attach your smb.conf please - "SB8391~7" looks like an old mangled name, not a hash2 style mangled name.

Also, this doesn't look like a debug level 10 log - I don't see any raw packet data in the log. Please send a debug level 10 log. Run smbd -d10 to make sure it really is using debug level 10.

Jeremy.
Comment 5 Jeremy Allison 2011-09-29 22:47:08 UTC
Oh yes, I see above you only set log level to 3, and msfds to 10. I want a full level 10 log from all components please.
Comment 6 Jeremy Allison 2011-09-29 23:45:18 UTC
Created attachment 6970 [details]
Raw patch for 3.5.next

Ok, can you try this patch. I think it will fix the problem.

The issue is that we're correctly demangling the pathname inside unix_convert() called from dfs_path_lookup(), but then we're not using the correctly returned demangled name in the subsequent path walk. We're stupidly just doing :

create_synthetic_smb_fname(ctx, pdp->reqpath, NULL, NULL, &smb_fname);

which throws away the correctly demangled smb_fname->base_name we just created and replaces it with a copy of the mangled name that was incoming. This fix will work as when we demangle a 8.3 mangled path component it can only be replaced with a valid single component, so the walk over the pathname via '/' characters should still be correct even with the demangled name.

Please test this both with and without "follow symlinks = no" but it should work in both cases.

Jeremy.
Comment 7 David L. Parsley 2011-09-30 19:03:40 UTC
Jeremy,

Thanks for the patch! I rebuilt the rpms on our RHEL6 test box, and we stress tested with a variety of zip files known to break before, and now can't reproduce the error with or without "follow symlinks = no". As before, when "follow symlinks = no" is set, the client behaves differently and doesn't make the same request.

Next week I can try rebuilding and testing on the production RHEL5 server; your patch and explanation look simple enough that it's probably "obviously correct" and safe.

Let me know if you still need anything from me. I'd like to file a bug with Red Hat bugzilla and try to get this into their packages ahead of them shipping Samba 3.6 (assuming your fix will likely find it's way into that tree, not sure if there will be more 3.5.x releases?)

-David
Comment 8 Jeremy Allison 2011-09-30 23:15:23 UTC
Created attachment 6972 [details]
git-am fix for 3.6.1

Patch I've applied to master. Also applies cleanly to 3.5.next.

Metze please review and re-assign to Karolin for inclusion.

The bug reporter has confirmed this patch fixes the issues.

Jeremy.
Comment 9 Stefan Metzmacher 2011-10-01 00:59:48 UTC
Comment on attachment 6972 [details]
git-am fix for 3.6.1

Looks ok
Comment 10 Stefan Metzmacher 2011-10-01 01:00:31 UTC
Karolin, please pick for the release
Comment 11 Karolin Seeger 2011-10-03 17:41:02 UTC
Pushed to v3-5-test and v3-6-test.
Closing out bug report.

Thanks!