Bug 9519 - Samba returns unexpected error on SMB posix open
Summary: Samba returns unexpected error on SMB posix open
Status: RESOLVED FIXED
Alias: None
Product: Samba 4.0
Classification: Unclassified
Component: File services (show other bugs)
Version: 4.0.0
Hardware: All Linux
: P5 normal (vote)
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-12-23 12:54 UTC by frederik.vogelsang
Modified: 2013-02-18 08:46 UTC (History)
3 users (show)

See Also:


Attachments
Samba log with loglevel=10 (121.62 KB, application/x-bzip)
2012-12-23 12:54 UTC, frederik.vogelsang
no flags Details
Client network capture when error occurs (281.11 KB, application/x-bzip)
2012-12-23 12:55 UTC, frederik.vogelsang
no flags Details
this patch masks the flag ... but I think it is too aggressive a warning and makes it looks like a kernel bug rather than an app bug. (701 bytes, patch)
2013-02-12 02:32 UTC, Steve French
no flags Details
2nd try at kernel patch to warn on illegal app behavior (and ignoring O_EXCL) (993 bytes, patch)
2013-02-12 02:42 UTC, Steve French
no flags Details
git-am fileserver patch. (1.37 KB, patch)
2013-02-12 18:48 UTC, Jeremy Allison
jlayton: review+
Details

Note You need to log in before you can comment on or make changes to this bug.
Description frederik.vogelsang 2012-12-23 12:54:06 UTC
Created attachment 8365 [details]
Samba log with loglevel=10

I am using Samba4 to share the home directories of users. Whenever a user starts LibreOffice, the client system hangs and the system log says:

> CIFS VFS: server 10.0.0.1 of type Samba 4.0.0 returned unexpected error on SMB posix open, disabling posix open support. Check if server update available.

First I thought this could be related to #9196, so I've patched the Samba server with the patches provided in that bug report, but I am still getting these hangs related to POSIX open.

Server:
* Samba 4.0.0, AD Domain Controller, with patch from #9196
* Linux 3.7.1, amd64

Clients:
* Linux 3.7.0, amd64
* User home (cifs) mounted through pam_mount
Comment 1 frederik.vogelsang 2012-12-23 12:55:18 UTC
Created attachment 8366 [details]
Client network capture when error occurs
Comment 2 Jeremy Allison 2012-12-23 20:26:13 UTC
No, bug #9196 is SMB2-only.

I'll try and take a look at these logs over the holidays.

Jeremy.
Comment 3 frederik.vogelsang 2013-02-08 16:40:42 UTC
Any news on this one? I just upgraded to 4.0.3 and this bug is still present:

> CIFS VFS: server 10.0.0.1 of type Samba 4.0.3 returned unexpected error on SMB posix open, disabling posix open support. Check if server update available.
Comment 4 Jeremy Allison 2013-02-08 23:26:11 UTC
Ok, it's a client bug in the Linux kernel cifsfs.

In the network capture look at packet 2222:

In the data section of the posix_open we have:

Offset           Value (converted from little-endian to native byte order)
0000             0x02
0004             0x21

The 0x2 at offset 0 == REQUEST_OPLOCK

However, at offset 4 we have the wire_open_mode. 0x21 maps to:

#define SMB_O_EXCL                       0x20
#define SMB_O_RDONLY                      0x1

But sending O_EXCL on its own without SMB_O_CREAT makes no sense, so the Samba server code says:

        /* First take care of O_CREAT|O_EXCL interactions. */
        switch (wire_open_mode & (SMB_O_CREAT | SMB_O_EXCL)) {
                case (SMB_O_CREAT | SMB_O_EXCL):
                        /* File exists fail. File not exist create. */
                        create_disp = FILE_CREATE;
                        break;
                case SMB_O_CREAT:
                        /* File exists open. File not exist create. */
                        create_disp = FILE_OPEN_IF;
                        break;
                case 0:
                        /* File exists open. File not exist fail. */
                        create_disp = FILE_OPEN;
                        break;
                case SMB_O_EXCL:
                        /* O_EXCL on its own without O_CREAT is undefined. */
                default:
                        DEBUG(5,("smb_posix_open: invalid create mode 0x%x\n",
                                (unsigned int)wire_open_mode ));
                        return NT_STATUS_INVALID_PARAMETER;
        }

Note in the SMB_O_EXCL case we fall through to the error NT_STATUS_INVALID_PARAMETER case, which is exactly what we see in the reply packet 2223.

I'm going to re-assign this bug to Jeff and Steve as they are the client maintainers.

Jeremy.
Comment 5 Jeff Layton 2013-02-09 11:44:09 UTC
Probably, it's time to think about removing that hack that disables POSIX open support when we get back an EINVAL.

In any case, the code that sets these flags just mirrors what the application sends:

        if ((flags & O_ACCMODE) == O_RDONLY)
                posix_flags = SMB_O_RDONLY;
[...]

        if (flags & O_CREAT)
                posix_flags |= SMB_O_CREAT;
        if (flags & O_EXCL)
                posix_flags |= SMB_O_EXCL;

...could it be that the userland app is sending down these nonsensical flag combinations? I'd be hesitant to second-guess what the application is passing.
Comment 6 Jeff Layton 2013-02-09 12:17:03 UTC
Also, on Linux at least, if I pass down a flag combination of O_RDONLY|O_EXCL when open a file on a local fs, the O_EXCL is just ignored:

    open("test", O_RDONLY|O_EXCL)           = 3

Jeremy, perhaps samba is being just a little too strict here? If the goal is to mirror local fs semantics, then it probably ought to just ignore a bare SMB_O_EXCL flag. Maybe that should read:

                case SMB_O_EXCL:
                case 0:
                        /* File exists open. File not exist fail. */
                        create_disp = FILE_OPEN;
                        break;

...with a nice comment explaining why that's being done?

What would be nice though is to also verify that the userland app (libreoffice?) is actually doing this and have them fix it too. I imagine it would cause problems on some platforms, even if Linux is generally tolerant of it.
Comment 7 Jeremy Allison 2013-02-09 16:54:28 UTC
I'm sure the application is just passing down O_EXCL to you. The problem is without O_CREAT that is undefined. From man 2 open.

       O_EXCL Ensure that this call creates the file: if this flag  is  speci‐
              fied  in  conjunction with O_CREAT, and pathname already exists,
              then open() will fail.

              When these two flags are specified, symbolic links are not  fol‐
              lowed: if pathname is a symbolic link, then open() fails regard‐
              less of where the symbolic link points to.

              In general, the behavior of O_EXCL is undefined if  it  is  used
              without  O_CREAT.   There  is  one  exception:  on Linux 2.6 and
              later, O_EXCL can be used without O_CREAT if pathname refers  to
              a  block  device.   If  the block device is in use by the system
              (e.g., mounted), open() fails with the error EBUSY.

You can't break applications, so my advice would be to just silently strip out O_EXCL if passed in without O_CREAT before sending it to the server.

Jeremy.
Comment 8 Jeremy Allison 2013-02-09 16:56:30 UTC
So yeah, I can fix this in the server. But I also recommend you fix this in the client as well. You should never be sending a bare O_EXCL on the wire (IMHO).

Jeremy.
Comment 9 Steve French 2013-02-11 23:29:05 UTC
(In reply to comment #6)
> Also, on Linux at least, if I pass down a flag combination of O_RDONLY|O_EXCL
> when open a file on a local fs, the O_EXCL is just ignored:
> 
>     open("test", O_RDONLY|O_EXCL)           = 3
> 
> Jeremy, perhaps samba is being just a little too strict here? If the goal is to
> mirror local fs semantics, then it probably ought to just ignore a bare
> SMB_O_EXCL flag. Maybe that should read:
> 
>                 case SMB_O_EXCL:
>                 case 0:
>                         /* File exists open. File not exist fail. */
>                         create_disp = FILE_OPEN;
>                         break;
> 
> ...with a nice comment explaining why that's being done?
> 
> What would be nice though is to also verify that the userland app
> (libreoffice?) is actually doing this and have them fix it too. I imagine it
> would cause problems on some platforms, even if Linux is generally tolerant of
> it.

I tried this on libreoffice 3.4.6 to Samba 4.0.1-pre and don't see the hang
Comment 10 Steve French 2013-02-11 23:51:24 UTC
well, I can reproduce it if I write a C program to force an open with O_EXCL | O_RDNOLY.

You get different errors if the file doesn't exist vs. if it does exits, but it looks like it is similar whether the file is local or remote (at least on 3.8-rc6) ... other than we disable posix support incorrectly (when the client pases us the strange flags) and the server returns the error.  We should fix the client to not turn off posix just because an app passed a bad flag combination.
Comment 11 Steve French 2013-02-12 01:28:53 UTC
Since the fix for the Samba 3.3 (and earlier bug) in 64cc2c63694a03393985ffc8b178e72f52dd8a06 shows that we have to check for EINVAL on posix open, and Samba 3.3 and earlier is not that rare...then I think we are probably better off masking the flag if there is a common app that sends it so we don't run into this on Samba 4.

What is the version of libreoffice (was wondering if they fixed it)?

and had anyone tried this on Samba 3.4, 3.5 or 3.6?
Comment 12 Steve French 2013-02-12 01:49:39 UTC
Samba 3.6 returned the same error
Comment 13 Steve French 2013-02-12 02:32:45 UTC
Created attachment 8541 [details]
this patch masks the flag ... but I think it is too aggressive a warning and makes it looks like a kernel bug rather than an app bug.
Comment 14 Steve French 2013-02-12 02:42:59 UTC
Created attachment 8542 [details]
2nd try at kernel patch to warn on illegal app behavior (and ignoring O_EXCL)

I like this patch better.  Although it doesn't include details about the app name, at least it warns once in a way that implies the app needs to be fixed not that there is a kernel bug.
Comment 15 Jeremy Allison 2013-02-12 18:48:53 UTC
Created attachment 8544 [details]
git-am fileserver patch.

Can I get a review of this fileserver patch ?

Once you've +1'ed it I'll get it into master/4.0.next/3.6.next.

Cheers,

Jeremy.
Comment 16 Jeff Layton 2013-02-12 20:00:44 UTC
Comment on attachment 8544 [details]
git-am fileserver patch.

Looks good to me.
Comment 17 frederik.vogelsang 2013-02-12 20:09:04 UTC
(In reply to comment #11)
> What is the version of libreoffice (was wondering if they fixed it)?
Right now I am using LibreOffice 3.6.2 (Ubuntu 12.10) and this version also triggers this error. But only when the $HOME folder (or actually ~/.config/libreoffice/) is CIFS-mounted. Opening documents with LibreOffice from a CIFS share with a local $HOME does not trigger the posix_open error, so it must be caused by an operation within ~/.config/libreoffice/.

There are actually even more applications causing these hangs on SMB posix open, but with LibreOffice this was 100% reproducible for me. It is sometimes hard to find the actual culprit with a CIFS-mounted $HOME - maybe a few Samba Devs should consider this setup for their personal use, as it might uncover more bugs related to Samba and the Linux CIFS client ;)

Anyway, it is nice that you are looking into this issue.
Comment 18 Jeremy Allison 2013-02-13 01:29:27 UTC
Patch applies cleanly to 3.6.next and 4.0.next - re-assigning to Karolin for inclusion.

Jeremy.
Comment 19 Jeff Layton 2013-02-13 11:44:11 UTC
Comment on attachment 8542 [details]
2nd try at kernel patch to warn on illegal app behavior (and ignoring O_EXCL)

It doesn't tell you anything about the application that caused that error to fire though which makes it pretty useless for tracking it down. I'd probably just make it ignore the O_EXCL when O_CREAT isn't set, and leave out the cERROR message.

At the same time, it's probably a good idea to think about deprecating the "broken_posix_open" workaround. Changing the client's behavior on an EINVAL return has always been a bit dodgy. There are plenty of ways to get that sort of error back that don't involve a broken server.
Comment 20 Karolin Seeger 2013-02-13 18:40:04 UTC
Pushed to v3-6-test and autobuild-v4-0-test.
Comment 21 Karolin Seeger 2013-02-18 08:46:59 UTC
Pushed to v4-0-test.
Closing out bug report.

Thanks!