Creating a directory is a process that may take a bit longer, see mkdir_internal(). After the SMB_VFS_MKDIRAT() the directory is visible for other clients and they are able to create files in it, which then may end up inheriting the wrong permissions. The rclone smb support (see https://rclone.org/smb/) seems to create multiple connection without using multichannel, so has a high chance to hit the race... The solution would be to do let mkdir_internal() use a hidden tmp name for SMB_VFS_MKDIRAT() and call SMB_VFS_RENAMEAT at the end of the function. O_TMPFILE doesn't work for directories.
Created attachment 18388 [details] Work in progress patches for master
Created attachment 18389 [details] Work in progress patches for 4.17
Comment on attachment 18388 [details] Work in progress patches for master This fails the following tests, see https://gitlab.com/samba-team/devel/samba/-/pipelines/1398636685 smb2.create.mkdir-dup smb2.fileid samba3.blackbox.dfree_quota.SMB3 samba3.vfs.fruit
Ok, I think I've found a strategie for this without relying on renameat2(RENAME_NOREPLACE): We do this: We generate a tmpname and do all of mkdir_internal on it: mkdirat() openat(O_DIRECTORY) inherit permissions... And at the end of we rename it to the client provided name, but without renameat2(RENAME_NOREPLACE) it's racy... The first idea was to use this to avoid races: fstatat(orig_name) => OBJECT_NAME_COLLISION renameat(tmp_name, orig_name) But that turns out to too racy and causes existing tests like smb2.create.mkdir_dup to fail very often, while two smbd's report to their client they have created the directory. The good thing about renameat() is that it can only replace an existing directory if the source is also a directory, otherwise is returns ENOTDIR. The old logic used the fact that mkdirat() returns EEXIST if the path already exists, this basically acts as race protection. So instead of doing an fstatat(orig_path) we use mkdirat(orig_name, mode=0) in order to notice EEXIST. On success of mkdirat(orig_name) we know we created the directory and a renameat(tmp_name, orig_name) will only replace the directory we use created. The race window between mkdirat(orig_name) and renameat(tmp_name, orig_name) is much smaller and in that window the directory only exists with a mode of 0, so it's unlikely that another client can create subdirectories or files in it. The relevant strace will look like this: 2807042 14:17:14.016489 mkdirat(30, ".::TMPNAME:D:2807042%9718758468351611005:mkdir_dup", 0755 <unfinished ...> 2807042 14:17:14.016551 <... mkdirat resumed>) = 0 <0.000057> 2807041 14:17:14.016756 mkdirat(30, ".::TMPNAME:D:2807041%14857462290981720820:mkdir_dup", 0755 <unfinished ...> 2807041 14:17:14.016795 <... mkdirat resumed>) = 0 <0.000036> 2807042 14:17:14.017504 mkdirat(30, "mkdir_dup", 000 <unfinished ...> 2807042 14:17:14.017537 <... mkdirat resumed>) = 0 <0.000028> 2807042 14:17:14.017598 renameat(30, ".::TMPNAME:D:2807042%9718758468351611005:mkdir_dup", 30, "mkdir_dup" <unfinished ...> 2807042 14:17:14.017633 <... renameat resumed>) = 0 <0.000030> 2807041 14:17:14.017785 mkdirat(30, "mkdir_dup", 000 <unfinished ...> 2807041 14:17:14.017794 <... mkdirat resumed>) = -1 EEXIST (File exists) <0.000005> 2807041 14:17:14.017945 unlinkat(30, ".::TMPNAME:D:2807041%14857462290981720820:mkdir_dup", AT_REMOVEDIR <unfinished ...> 2807041 14:17:14.017960 <... unlinkat resumed>) = 0 <0.000011> When we have a kernel and filesystem with renameat2(RENAME_NOREPLACE) support we can close the race even further and the strace will look like this: 2809162 14:23:21.440813 mkdirat(30, ".::TMPNAME:D:2809162%8126942474193098692:mkdir_dup", 0755 <unfinished ...> 2809162 14:23:21.440867 <... mkdirat resumed>) = 0 <0.000050> 2809161 14:23:21.441154 mkdirat(30, ".::TMPNAME:D:2809161%4586615993619424924:mkdir_dup", 0755 <unfinished ...> 2809161 14:23:21.441190 <... mkdirat resumed>) = 0 <0.000033> 2809162 14:23:21.441762 renameat2(30, ".::TMPNAME:D:2809162%8126942474193098692:mkdir_dup", 30, "mkdir_dup", RENAME_NOREPLACE <unfinished ...> 2809162 14:23:21.441787 <... renameat2 resumed>) = 0 <0.000019> 2809161 14:23:21.442150 renameat2(30, ".::TMPNAME:D:2809161%4586615993619424924:mkdir_dup", 30, "mkdir_dup", RENAME_NOREPLACE <unfinished ...> 2809161 14:23:21.442167 <... renameat2 resumed>) = -1 EEXIST (File exists) <0.000011> 2809161 14:23:21.442305 unlinkat(30, ".::TMPNAME:D:2809161%4586615993619424924:mkdir_dup", AT_REMOVEDIR <unfinished ...> 2809161 14:23:21.442323 <... unlinkat resumed>) = 0 <0.000014>
This bug was referenced in samba master: f9d7a930f00b0ac8848d87b3ff07997c477de5f9 87aa3a46a3df36aeccdc5c93364fe1e71166c8fd c815128caa7acb0b1359b6a69e9033ce74d77c41 86952314036e5b6395f113c5968df794f5f95574 20431cc62272f228023297a09eb256263184fa63 30ddbe4611c39af40ce946d6e5693ee07553df5b 75fe450f98c35095007a867de45286bbbd4251b0 23f85e60ec9e806a82d2c33e905be9bad9c26db7 5d077cd442116cc2304f4eba67e140be80c8db7a f8be83a0a3302bc1802a34c03c49a774c56d7194 460e280d3af70c38b04219e414ea3b6439caaed6 7baeeece2dd563d404352e5f3e34661d1cbc2b03 f6550e804ea5604b1d7ec67e629fad8baee87f8b 5b305d1fbb2778f8c8dd2873e79f28a27d8ce581 adc8dea944bd26186814bf3dae9a300957532d93 fe8b4617ddc1663d57ab5b0ffdeedb413e19a506 1bacaae5261b157f0dc4efb68fdc82d4cf387eb9 3790d0d3b989d7edbe34888bb068f54f0cb1282d
Will be fixed in 4.22