Bug 5446 - NFSv4/ZFS ACLs dont work on NFS mount because POSIX ACL are being used
NFSv4/ZFS ACLs dont work on NFS mount because POSIX ACL are being used
Status: RESOLVED FIXED
Product: Samba 3.2
Classification: Unclassified
Component: VFS Modules
3.2.0
All Solaris
: P3 normal
: ---
Assigned To: Michael Adam
Samba QA Contact
http://bugs.opensolaris.org/view_bug....
:
: 5447 (view as bug list)
Depends on:
Blocks: 5135 5447
  Show dependency treegraph
 
Reported: 2008-05-07 02:28 UTC by Nils Goroll
Modified: 2009-01-05 09:19 UTC (History)
5 users (show)

See Also:


Attachments
Suggested patch / workaround (1.06 KB, patch)
2008-05-07 02:41 UTC, Nils Goroll
no flags Details
Cleaned up patch for vfs_zfsacl.c (3.47 KB, patch)
2008-06-16 13:40 UTC, Nils Goroll
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Nils Goroll 2008-05-07 02:28:08 UTC
The opensolaris bug contains the basic bug description

This issue is caused by the fact that the Solaris NFSv4 client
provides backwards compatibility (to a certain extent) to POSIX draft
ACLs, which breaks NFSv4 ACLs.

Background information:

* Why does ACL inheritance work with Samba on ZFS ?

  When initially analysing this issue, I was puzzled by the fact that
  ACL inheritance does work, even though "all" acl() calls
  fail. Here's an except of a truss:

   acl("acl1/acluser1", GETACLCNT, 0, 0x00000000)  Err#89 ENOSYS
   mkdir("acl1/acluser1/b", 040755)                = 0
   acl("acl1/acluser1/b", GETACLCNT, 0, 0x00000000) Err#89 ENOSYS
   lstat64("acl1/acluser1/b", 0x08047460)          = 0
   acl("acl1/acluser1", GETACLCNT, 0, 0x00000000)  Err#89 ENOSYS
   acl("acl1/acluser1", GETACLCNT, 0, 0x00000000)  Err#89 ENOSYS

  Those system calls fail simply because ZFS does NOT support POSIX
  draft ACLs, but only NFSv4 style ACLs (acl() commands with ACE_).

* Why ACLs fail on NFS

  The Solaris NFS client returns POSIX draft ACLs or NFSv4 style ACLs,
  whatever is being requested (see nfs4_getsecattr in
  uts/common/fs/nfs/nfs4_vnops.c and nfs4_acl.c).

  As a consequence, samba never gets to use NFSv4 acls and only uses
  the POSIX draft ACLs. Because the NFS nfs client acl compatibility
  is incomplete, this leads to insufficient results.

Suggestions to fix this:

a) The Solaris NFS client could be amended by a mount option whether
   or not the POSIX draft acl compatiblity interface should be enabled
   or not.

b) Samba should not use POSIX draft acl() functions if NFSv4 ACLs are
   to be used

A workaround (or is this a solution?) to achieve b) is to change
vfs_zfsacl.c to return an error for all POSIX ACL VFS operations. I
will attach a patch.
Comment 1 Nils Goroll 2008-05-07 02:41:46 UTC
Created attachment 3278 [details]
Suggested patch / workaround
Comment 2 Michael Adam 2008-05-07 04:41:42 UTC
Thanks for reporting this!

The problem is in fact that the separation of the nfs and posix acl layers is incomplete. Your patch provides a workaround by adding dummy implementations the posix acl backend vfs methods to the zfs acl module, but I am not certain it is correct this way, because the dummy function signature does not match the various vfs methods.

But there is current work going on to fix that properly: Jeremy Allison has recently started removing explicit references to posix acl code from directory and inheritance functions. This is ongoing work in the new v3-3-test branch.

Could you check whether the v3-3-branch brings improvement here?
You can retriev snapshot tarballs from here:
http://repo.or.cz/w/Samba.git?a=shortlog;h=refs/heads/v3-3-test

Comment 3 Jura Sasek 2008-05-07 06:47:10 UTC
Here is the "zfsacl" .so-object on Solaris which is handling the ACL using the NFSv4 conversion in module nfs4_acls.c . It is possible because the ACL on ZFS and NFSv4 is handled by the set of functions ACE_SETACL, ACE_GETACL and ACE_GETACLCNT of the acl(2) system-call. Using of this set of the functions implies using of the  different type of the data structure ace_t instead of the original aclent_t which is used by the set ofthe functions SETACL, GETACL and GETACLCNT.

Calling of the acl(2) using the SETACL, GETACL and GETACLCNT set of functions on ZFS or NFSv4 is causing the ENOSYS error on return because such "compatibility bridge" is not supported on vfs level in Solaris. This is because the ace_t is using the access/deny model instead of the POSIX "masking" model used in aclent_t so there are ambiguities in such conversion. Details can be seen here:

http://www.ietf.org/mail-archive/web/nfsv4/current/msg03286.html

Evaluating the problem of handling of ACL on samba shares on ZFS volumes I was also evaluating of the possibility to add this handling into the "solarisacl" module and switch the handling procedure according of the result of the previous operation returning the ENOSYS errno value but such approach seems to me too costly. Advantage of such approach is the possibility of the handle of the files located on samba share combining the ufs/zfs i.e. - you have the patch /samba_share shared by samba which includes the directory path /samba_share/zfs_mount where the zfs volume is mounted. Now you have files/subdirs located in both paths (/samba_share and /samba_share/zfs_mount) which all can be handled as one samba share by this approach.

Because the mentioned case is relatively rare in the fact I have decided there is a problem to judge if the ufs or zfs/nfsv4 should be taken as default where the other filesystem will be penalized by the additional operation. Currently the administrator of the samba server should decide which share is ZFS/NFSv4 or UFS and set appropriate handling either by the "zfsacl" or by the "solarisacl" Samba VFS module where:

"zfsacl" module is dynamically linked so it should be explicitly specified by the samba "share specific" option:

  vfs objects = zfsacl

where the additional sub-options controlling the NFSv4 ACE conversion could be added i.e.:

   nfs4: mode = simple
   nfs4: acedup = merge

otherwise the default "solarisacl" module (linked statically) is used.

Comment 4 Nils Goroll 2008-05-07 10:43:04 UTC
Obnox/Michael,

I have tried the latest 3-3 snapshot, no change:

totaleclipse:/netapp/nfsacl/acl1/acl2# /usr/sfw/sbin/smbd -V
Version 3.3.0pre1-GIT-UNKNOWN-test

totaleclipse:/netapp/nfsacl/acl1/acl2# ls -dV . aaa
drwxr-xr-x+  4 root     root        4096 May  7 17:35 .
     user:acluser3:rwxp---ARW-Co-:fd-----:allow
     user:acluser1:rwxp---ARW-Co-:fd-----:allow
     user:acluser2:rwxp---ARW-Co-:fd-----:allow
            owner@:--------------:-------:deny
            owner@:rwxp---A-W-Co-:fd-----:allow
            group@:-w-p----------:-------:deny
            group@:r-x-----------:fd-----:allow
         everyone@:-w-p---A-W-Co-:-------:deny
         everyone@:r-x-----------:fd-----:allow
drwxrwx---   2 acluser1 other       4096 May  7 17:35 aaa
            owner@:rwxp-DaA--cC-s:-------:allow
            owner@:--------------:-------:deny
            group@:-------A---C--:-------:deny
            group@:rwxp-Da---c--s:-------:allow
            group@:-------A---C--:-------:deny
         everyone@:------a---c--s:-------:allow
         everyone@:rwxp-D-A---C--:-------:deny

a truss on the process serving my test user shows that POSIX draft acl() SETACL is being called:

totaleclipse:/netapp/nfsacl/acl1/acl2# egrep '^(acl|mkdir|rename|chmod)' /tmp/tr_ 
mkdir("acl1/acl2/Neuer Ordner", 0770)           = 0
acl("acl1/acl2/Neuer Ordner", GETACLCNT, 0, 0x00000000) = 4
acl("acl1/acl2/Neuer Ordner", GETACL, 4, 0x085FB088) = 4
acl("acl1/acl2/Neuer Ordner", GETACLCNT, 0, 0x00000000) = 4
acl("acl1/acl2/Neuer Ordner", GETACL, 4, 0x085FB088) = 4
acl("acl1/acl2/Neuer Ordner", SETACL, 4, 0x085FC9D8) = 0  <<<====
acl("acl1/acl2/Neuer Ordner", GETACLCNT, 0, 0x00000000) = 4
acl("acl1/acl2/Neuer Ordner", GETACL, 4, 0x085FB088) = 4
acl("acl1/acl2", ACE_GETACLCNT, 0, 0x00000000)  = 9
acl("acl1/acl2", ACE_GETACL, 9, 0x085FF388)     = 9
rename("acl1/acl2/Neuer Ordner", "acl1/acl2/aaa") = 0

I dont have time to investigate further tonight, but I wanted
to keep you postet with with intermediate result.

BTW: The zfs_vfsacl patch does not work with 3-3-test (triggers
SIGSEGV), have not investigated either.
Comment 5 Nils Goroll 2008-06-16 08:50:56 UTC
Hi All,

I have made up my mind about this issue, please correct me if I'm
wrong. IMHO, there are only a couple of ways out here:

- Modify the Solaris NFS(v4) client code to not support acl(2) with
  POSIX ACLs (iow, add an option to disable the compatibility wrapper)
- Make vfs_solarisacl know that it should not work on NFSv4 mounts
- Add an "official" way for VFS modules to define conflicting VFS
  methods from other modules
- or invalidate conflicting methods in the module loaded last, which
  is what I have suggested here.

I dont see any of the first three on the horizon, so  I suggest
we use the fourth method for the time being. I dont consider this
a workaround, but rather one way of implementing vfs method invalidation.

I'll add a patch with cleaned up code shortly.
Comment 6 Nils Goroll 2008-06-16 13:40:57 UTC
Created attachment 3346 [details]
Cleaned up patch for vfs_zfsacl.c

Have cleaned up function signatures and successfully repeated
validation tests.
Comment 7 Nicolas Dorfsman (mail address dead) 2008-06-17 14:06:12 UTC
Hi,

I'll link bug 5135 to this one.
Anyway,  ACLs on ZFS doesn't work even on ZFS.  It works with very simple ACLs.  I have a customer with sophisticated rules...results are awful (check logs on 5135).   I really think the correct path to fix is to remove direct call to posix_acl.c . Or is there any improvement in this area ?
I may make new tests/logs next week if somebody hav time to help me on those cases.

Nicolas
Comment 8 Nils Goroll 2008-06-17 15:24:41 UTC
Re Nicolas: I Think obnox has been working on these improvements lately,
i have also left a note in bug#5135. Thanks obnox !
Comment 9 Michael Adam 2008-11-16 17:58:44 UTC
*** Bug 5447 has been marked as a duplicate of this bug. ***
Comment 10 Michael Adam 2008-11-26 07:37:35 UTC
Nils: 
I have pushed your cleaned up patch basically as is 
to v3-2-test, v3-3-test and master.
So it will be in one of the next bugfix releases and in 3.3.0

Thanks - Michael
Comment 11 Michael Adam 2009-01-05 09:19:02 UTC
I got confirmation from the reporter. --> marking the bug as fixed.