Created attachment 18248 [details] patch that fixes the behavior Sometimes setting a NTACL via set_nt_acl_conn() fails with NT_STATUS_INVALID_PARAMETER. The strace shows an openat() with EACCES (Permission denied). I can reproduce this with "samba-tool ntacl sysvolreset" or "samba-tool domain provision". When I run the same command again it fails again or sometime succeeds. I didn't reproduce it with "samba-tool ntacl set". The attached patch reverts a commit, which solved this on my test setup. I'll attach strace and log files which show the successful and failing "ntacl sysvolreset".
Created attachment 18249 [details] strace - fail
Created attachment 18250 [details] strace - success
Created attachment 18251 [details] log - fail
Created attachment 18252 [details] log - ok
Björn, can you create a merge request please?
Reproducer: - install Fedore 39 - dnf install samba-dc acl # (samba 4.19) - rm /etc/samba/smb.conf - samba-tool domain provision --interactive - check extra ACLs with getfacl /var/lib/samba/sysvol/ feel free to check sub-directories as well - remove extra ACLs with setfacl -b -R /var/lib/samba/sysvol - check that extra ACLs are removed with getfacl /var/lib/samba/sysvol/ - restore extra ACLs by calling samba-tool ntacl sysvolreset - check that extra ACLs are present again getfacl /var/lib/samba/sysvol/ - add Fedora 40 repository to /etc/yum.repos.d - update Samba to Fedora 40 version dnf update samba-common # (samba 4.20.0rc2) - check that new samba-tool is working if ACLs are present samba-tool ntacl sysvolreset - remove extra ACLs with setfacl -b -R /var/lib/samba/sysvol - calling samba-tool again will fail: samba-tool ntacl sysvolreset set_nt_acl_conn: init_files_struct failed: NT_STATUS_INVALID_PARAMETER ERROR(runtime): uncaught exception - (3221225485, 'An invalid parameter was passed to a service or function.') File "/usr/lib64/python3.12/site-packages/samba/netcmd/__init__.py", line 285, in _run return self.run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib64/python3.12/site-packages/samba/netcmd/ntacl.py", line 449, in run provision.setsysvolacl(samdb, sysvol, File "/usr/lib64/python3.12/site-packages/samba/provision/__init__.py", line 1754, in setsysvolacl _setntacl(os.path.join(root, name)) File "/usr/lib64/python3.12/site-packages/samba/provision/__init__.py", line 1743, in _setntacl return setntacl( ^^^^^^^^^ File "/usr/lib64/python3.12/site-packages/samba/ntacls.py", line 229, in setntacl smbd.set_nt_acl( - a second run will pass and fix the ACLs
See also https://gitlab.com/samba-team/samba/-/commit/697d41420b4f4830396acfbc96bd1f1c1f0531f4 which partially reveted the code already with an explanation: With capabilities preferred over become_root() we failed to achieve the basic goal of storing NT ACLs in xattrs using vfs_acl_xattr. This is due to the fact that apart from CAP_DAC_OVERRIDE it is manadatory to have CAP_SYS_ADMIN for write access to xattrs from security namespace[1]. Despite the option to configure the xattr name within the module we should not anticipate and miss to consider xattrs from security namespace which is far more protected even with our default name "security.NTACL". Theorotically we could make it work by adding another capability on top of existing ones. But given the functions designed around this area we may not be able to come up with a cleaner change which can handle the fallback mechanism to become_root(). Any failure to set the very first capability would put us in become_root() path where further capabilities are mostly not required. Thus reverting to old behaviour to always become_root() until we have a cleaner approach to handle the fallback while modifying multiple capabilities at once. [1] https://www.man7.org/linux/man-pages/man7/xattr.7.html
(In reply to Andreas Schneider from comment #7) Until someone can explain why the openat() as root fails with EACCES there's still some unresolved mystery here to be understood. If you revert anything, we should probably for now revert *all* the commits from the changeset that introduced this regression.
openat only does a permission check based on the file system permissions. Can you maybe print the stat and ACL information and the current process uid and capabilities? There should be an obvious enough explanation.
I guess the problem is that we drop the DAC_OVERRIDE capability unconditionally. As a result we run as uid=0 but without DAC_OVERRIDE, so subsequently and code that relies on being run as root means you can do everything, eg during provisions, may fail with EACCESS as root is not root anymore. We need some kind of logic in the set_effective_capability()/drop_effective_capability() to make sure we don't drop the capability if it was present in the initial capabilites before calling set_effective_capability().
(In reply to Ralph Böhme from comment #10) So the problem is that we call set_effective_capability(DAC_OVERRIDE_CAPABILITY) while we already have CAP_DAC_OVERRIDE in the kernel. The drop_effective_capability(DAC_OVERRIDE_CAPABILITY), really drops it and while we want to restore the state we had before set_effective_capability(DAC_OVERRIDE_CAPABILITY). I guess we need to: - remove PR_SET_KEEPCAPS, - move set_effective_capability() to source3/lib/smbd_shim.c - implement or re-use the stack that's also used by become_root() - maybe push_effective_capabilities()/pop_effective_capabilities() would be a better api and the pop function just resets to the state before the push, that would means we won't drop DAC_OVERRIDE_CAPABILITY if it was already there before the push.
(In reply to Stefan Metzmacher from comment #11) The simplest regression fix would be this: diff --git a/source3/lib/system.c b/source3/lib/system.c index 74d5dae5446..10ea98f64d8 100644 --- a/source3/lib/system.c +++ b/source3/lib/system.c @@ -637,7 +637,7 @@ static bool set_process_capability(enum smbd_capability capability, ****************************************************************************/ #if defined(HAVE_POSIX_CAPABILITIES) && defined(CAP_DAC_OVERRIDE) -static bool have_cap_dac_override = true; +static bool have_cap_dac_override = false; /* TODO: fix bug #15583 */ #else static bool have_cap_dac_override = false; #endif
+1 on this API. push_effective_capabilities()/pop_effective_capabilities() Makes much more sense and matches our current code.
Any updates? Who will go on here so that we can get a fix for this into 4.20.0?
Can we get the regression fix from comment #12 as a MR?
This bug was referenced in samba master: 58ea952fd0c716f94b1b79b8ed1829bb72732ccc 87479544381e103ee2b1def574a5865a3f6a93d9 88eb58af6783ad23d2e2b602ee9fdbbdf556b354 7f19afbd40d3ad3c8d186d0a2a64d07a2a8bd00a 10c7a3e47c62dcb1dfe7e384960d60cafcb9e44e 52ad635b2705bcfc8166bd90b1ad35ebb9cbc986 af7b930e2bfe2275cee14dc2154f2aea8875fa63 33e88911ee7a8974d52021632ca25c1ddfcb6f45 32aa11e9b570ce1c0bec889b699bc4897c9d9843 0dec2ef188a93504da873d927ca2b26f8c491fb8
Created attachment 18277 [details] Patch for 4.20 backported from master
Created attachment 18278 [details] Patch for 4.20 cherry-picked from master
Pushed to autobuild-v4-20-test.
This bug was referenced in samba v4-20-test: bb68b730290c20fe0ecc6e72580fbab6ea674692 dc1616263034a78b6a43f9f89d59a8090f881a1b 4f38859f5d861a5f77c223ad720416b719e8e2f8 d0c295e5344d7858cf75e19184e3842de06f27ab f6d549de47c463905c5d95bc6556e2c7c4a25540 6e0986b2c30e78e0c9ffec62fb0666cd85dad316 52b1d9d7cb8d70fc1137c26c4a38c530116802c4 6ca9461a1dbee5762220f0ae9e0b67c846d4feae f7491b2994157615032e80b5f10df5953ae0543a 5cedf3b5eb02c3050cb2e82d4602d63c565d4a7f
Closing out bug report. Thanks!
This bug was referenced in samba v4-20-stable (Release samba-4.20.0): bb68b730290c20fe0ecc6e72580fbab6ea674692 dc1616263034a78b6a43f9f89d59a8090f881a1b 4f38859f5d861a5f77c223ad720416b719e8e2f8 d0c295e5344d7858cf75e19184e3842de06f27ab f6d549de47c463905c5d95bc6556e2c7c4a25540 6e0986b2c30e78e0c9ffec62fb0666cd85dad316 52b1d9d7cb8d70fc1137c26c4a38c530116802c4 6ca9461a1dbee5762220f0ae9e0b67c846d4feae f7491b2994157615032e80b5f10df5953ae0543a 5cedf3b5eb02c3050cb2e82d4602d63c565d4a7f