Bug 11596 - Creating Symlinks from Mac client fails when using SMB2 with AIO
Summary: Creating Symlinks from Mac client fails when using SMB2 with AIO
Status: RESOLVED DUPLICATE of bug 13047
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: File services (show other bugs)
Version: 4.2.4
Hardware: All All
: P5 normal (vote)
Target Milestone: ---
Assignee: Jeremy Allison
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-11-10 16:33 UTC by John Mulligan (dead mail address)
Modified: 2017-09-22 18:51 UTC (History)
3 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description John Mulligan (dead mail address) 2015-11-10 16:33:04 UTC
Attempting to create a symlink from a Mac OS X 10.10.5 client fails with an Input/output error when the share is enabled for AIO. Samba version is 4.2.4.

$ ln -s foo.txt symlink3.txt
ln: symlink3.txt: Input/output error

The content of the file after the error message appears to be an "mfsymlink" with the correct contents but the client treats them as files rather than symlinks. When the directory is listed (ls -l, for instance) all of the mfsymlinks in the share appear as text files in the share with AIO enabled. When the directory is listed with AIO disabled all of the mfsymlinks that were created look like symlinks regardless of if there was an error on create.


From the server, the resulting files, either created successfully or with an error appear to have the same contents:
# md5sum symlink*txt
cdc02dd0400e70ba5126c5c8ea45e137  symlink.txt
cdc02dd0400e70ba5126c5c8ea45e137  symlink2.txt
cdc02dd0400e70ba5126c5c8ea45e137  symlink3.txt
cdc02dd0400e70ba5126c5c8ea45e137  symlink4.txt
cdc02dd0400e70ba5126c5c8ea45e137  symlink5.txt
cdc02dd0400e70ba5126c5c8ea45e137  symlink6.txt
cdc02dd0400e70ba5126c5c8ea45e137  symlink7.txt

[/srv/symlink7.txt]
00000000  58 53 79 6d 0a 30 30 30  37 0a 34 66 64 38 63 63  |XSym.0007.4fd8cc|
00000010  38 35 63 61 39 65 65 62  64 32 66 61 33 63 35 35  |85ca9eebd2fa3c55|
00000020  30 30 36 39 63 65 32 38  34 36 0a 66 6f 6f 2e 74  |0069ce2846.foo.t|
00000030  78 74 0a 20 20 20 20 20  20 20 20 20 20 20 20 20  |xt.             |
00000040  20 20 20 20 20 20 20 20  20 20 20 20 20 20 20 20  |                |
*
00000420  20 20 20 20 20 20 20 20  20 20 20                 |           |
0000042b


I am attaching smbd logs and tcpdump packet captures for the create error to this bug.  File system is XFS on CentOS 7.
I have pasted below what I think may be the relevant section of the log from the error case. This corresponds to a STATUS_INTERNAL_ERROR appearing in the related tcpdump:


[2015/11/09 16:19:31.824661, 10, pid=6794, effective(401113, 400513), real(401113, 0)] ../source3/smbd/aio.c:962(aio_pwrite_smb2_done)
  pwrite_recv returned 1067, err = no error
[2015/11/09 16:19:31.824721,  3, pid=6794, effective(401113, 400513), real(401113, 0)] ../source3/smbd/aio.c:966(aio_pwrite_smb2_done)
  aio_pwrite_smb2_done: request cancelled (mid[2338])
[2015/11/09 16:19:31.824790, 10, pid=6794, effective(401113, 400513), real(401113, 0)] ../source3/smbd/smb2_server.c:2706(smbd_smb2_request_error_ex)
  smbd_smb2_request_error_ex: idx[9] status[NT_STATUS_INTERNAL_ERROR] || at ../source3/smbd/smb2_write.c:135
[2015/11/09 16:19:31.824871, 10, pid=6794, effective(401113, 400513), real(401113, 0)] ../source3/smbd/smb2_server.c:2597(smbd_smb2_request_done_ex)
  smbd_smb2_request_done_ex: idx[9] status[NT_STATUS_INTERNAL_ERROR] body[8] dyn[yes:1] at ../source3/smbd/smb2_server.c:2754


We have tested with aio read/write size as both 1 and 1024.

A summarized/filtered smb.conf:

[global]
        workgroup = FOREST1
        realm = FOREST1.EXAMPLE.NET
        security = ADS
        syslog = 0
        debug pid = Yes
        min receivefile size = 131072
        load printers = No
        printcap name = /dev/null
        disable spoolss = Yes
        show add printer wizard = No
        template shell = /sbin/nologin
        idmap config qa2k8: read only = no
        idmap config qa2k8: backend = rid
        idmap config qa2k8: range = 8000100-16000100
        idmap config *: read only = yes
        idmap config *: range = 200000-210000
        idmap config forest1: read only = no
        idmap config forest1: backend = rid
        idmap config forest1: range = 400000-8000000
        idmap config * : backend = tdb
        force unknown acl user = Yes

[symtest_aio]
        path = /srv
        read only = No
        inherit permissions = Yes
        inherit acls = Yes
        aio read size = 1
        aio write size = 1
        map acl inherit = Yes
        hide unreadable = Yes
        veto files = /.snapshot/
        store dos attributes = Yes
        delete readonly = Yes
        dos filemode = Yes
        vfs objects = acl_xattr

[symtest_noaio]
        path = /srv
        read only = No
        inherit permissions = Yes
        inherit acls = Yes
        map acl inherit = Yes
        hide unreadable = Yes
        veto files = /.snapshot/
        store dos attributes = Yes
        delete readonly = Yes
        dos filemode = Yes
        vfs objects = acl_xattr


My search uncovered Samba bug 11224, which seems like it could be related, however I'm using Samba 4.2.4 which should already include the fix for that bug.
Please let me know if there is any other information I can provide.
Comment 1 John Mulligan (dead mail address) 2015-11-10 16:37:54 UTC
Tarball was too large to attach, please access it here.

http://asynchrono.us/files/symlink-aio-issue.tar.xz

Let me know if there are any issues getting to it.
Comment 2 Jeremy Allison 2015-11-11 01:21:42 UTC
I'll take a look at the wireshark traces. That error comes from doing a cancel whilst the IO is still outstanding (which is strange).
Comment 3 Jeremy Allison 2015-11-11 01:26:43 UTC
Oh, looks like a chaining issue.

Client is doing SMB2

Create + GetInfo + Write + Close

and it looks like the file id isn't set inside the write when it gets scheduled...
Comment 4 Jeremy Allison 2015-11-11 01:31:18 UTC
Ah - looking at :

https://bugzilla.samba.org/show_bug.cgi?id=11224

looks to me like we're deliberately refusing this request as we want the write to go async and it's not the last in the chain.

Is this allowable behavior from the Mac client w.r.t. the SMB2 spec ? I don't think so (i.e. this looks like a Mac client SMB2 bug to me) but I'll check.
Comment 5 Jeremy Allison 2015-11-11 04:34:50 UTC
OK, pretty sure this is a Mac client bug but I'll know more tomorrow.

There are some mitigation strategies we can do but I'll update tomorrow.

Jeremy.
Comment 6 Jeremy Allison 2015-11-11 18:24:42 UTC
Right - went through the spec very carefully. This is a Mac client bug.

They send (from your packet trace):

4173	SMB2	Create Request File: symlink5.txt;GetInfo Request FILE_INFO/SMB2_FILE_INTERNAL_INFO;Write Request Len:1067 Off:0;Close Request

Now in MS-SMB2.pdf we have:

------------------------------------------------------------------------------------------------
3.3.5.2.7
Handling Compounded Requests
If the NextCommand field in the SMB2 header of the request is not equal to 0, the server MUST
process the received request as a compounded series of requests. The server SHOULD<206> fail
requests in a compound chain which require asynchronous processing.

and note <206> is:

<206> Section 3.3.5.2.7: When an operation in a compound request requires asynchronous
processing, Windows servers fail them with STATUS_INTERNAL_ERROR except for the following two
cases: when a create request in the compound request triggers an oplock break, or when the
operation is last in the compound request.
If a create request in a compound chain is processed asynchronously due to an oplock break,
Windows servers send only one interim response to the client, which includes the compounded
results of all prior operations in the request. If other create requests in the compound chain are
completed before the oplock break, the level of the broken oplock is not updated in those create
responses in the compound response.
If the operation is last in the compound request, a single interim response is sent, which includes
the compounded results of all prior operations in the request. Processing of the final operation in the
request then proceeds asynchronously.
------------------------------------------------------------------------------------------------

When you set "aio write size = 1" in your smb.conf it means that *ALL* SMB2 write calls will
be forced to use asynchronous processing.

So the client should be able to handle this by resending the SMB2 write call in a non-compound
request - and it looks like against a Windows server that write call doesn't go async, so they
never see this problem.

The only immediate fix is to set:

"aio write size = X"

where X is the max length of a possible symlink that the Mac can set. That should solve the
issue in production.

In the meantime I'll raise this with Apple. The only other fix we could do it explicitly
refuse to go async in a write call that isn't the end of a chain, but that might cause
other problems.
Comment 7 Stefan Metzmacher 2016-01-20 07:46:41 UTC
(In reply to Jeremy Allison from comment #6)

aio write size = 1068
should work around this.
Can someone test this?
Comment 8 John Mulligan (dead mail address) 2016-01-26 16:07:23 UTC
(In reply to Stefan Metzmacher from comment #7)

Sorry for the delay. Yes the workaround of setting 'aio write size = 1068' does appear to work for the basic tests we performed.
Comment 9 Jeremy Allison 2016-01-26 19:25:38 UTC
Now if only we could get Apple to stop their client sending illegal requests :-(.
Comment 10 Ralph Böhme 2016-02-24 21:06:06 UTC
Sorry guys, this bugreport slipped my attention. I could have saved you some valuable time, as I ran across this issue last year.

This is a known OS X client issue that I have already reported to Apple (via sidechannels). I haven't filed a proper Apple radar for this one yet.

This issue only affects MF-symlinks, which is just a fallback from a OS X SMB perspective.

A server side fix would be to implement symlinks via reparse points which is the preferred method of OS X SMB client. This is also what Apple recommended in private conversation.
Comment 11 Jeremy Allison 2016-02-24 21:07:59 UTC
Ralph - Apple is violating the protocol spec here. Yes, they're getting away with it *for now* on Windows, but that could change at any time.

The Apple client needs to change here.
Comment 12 Ralph Böhme 2016-02-24 21:22:13 UTC
(In reply to Jeremy Allison from comment #11)
Sure. Maybe Apple fixed this in a release lately, haven't tested since then.

Anyone who is affected may either file a radar with Apple and hope for the best, or help us to implement symlinks via reparse points. :)
Comment 13 John Mulligan (dead mail address) 2017-09-22 18:19:24 UTC
Was catching up on the mailing list and it looks like fix for bug 13047 might resolve this issue. If I'm correct should this bug be marked as fixed or duplicate?
Comment 14 Jeremy Allison 2017-09-22 18:51:58 UTC
Yep, duplicate. I *thought* we already had a bug for this but couldn't find it :-).

*** This bug has been marked as a duplicate of bug 13047 ***