Created attachment 16808 [details] smb.conf, network traces, java testcode Hello, after upgrading to Samba 4.11 I have noticed a problem with a processing software we are using. This software creates temporary directories with a log file, which are deleted after processing. Unfortunately, the deletion of the directory no longer works reliably since this version, it remains sporadically directories. For testing I compiled different versions and found out that the problem occurs only from Samba 4.11.0, with Samba 4.10.18 everything still works. Unfortunately this problem does not occur when I set the Samba log level to 10, probably it has something to do with timing. I was able to reproduce the behavior with a small Java program, it is attached. Also the smb.conf and a small network trace which shows the error (Samba 4.11.0) and one without folder deletion issue (Samba 4.10.18) Noticeable in tcpdump are the messages "Error: STATUS_FILE_IS_A_DIRECTORY". These only occur as starting from Samba 4.11.0. Tested with different Samba versions on Debian 10 Accessing system: Windows Server 2012R2 Java test with openjdk version 1.8.0_121 Regards Daniel
Please upgrade to latest 4.13 at least, the current version your are using is out of date and out of support. This is a bug which is fixed, cant directly recall which one, but its fixed. Your using Debian 10, i have amd64/i386 and armhf packages available. There are 4.13.10 and 4.14.7 packages are available for buster. Can you test agains these versions and report back. Import my key. wget -O /etc/apt/trusted.gpg.d/louis-van-belle.gpg-key.asc http://apt.van-belle.nl/louis-van-belle.gpg-key.asc setup the repo. Change arch if needed. Change buster-samba414 to buster-samba413 if you want to test 4.13.x echo "deb [arch=amd64 signed-by=/etc/apt/trusted.gpg.d/louis-van-belle.gpg] http://apt.van-belle.nl/debian/ buster-samba414 main" | sudo tee -a /etc/apt/sources.list.d/van-belle.list Install samba as normal with apt.
Hello, I have tested the behavior with different versions. Our production system runs SLES 15 SP3 with Samba 4.13.4 where the problem also occurs. I just wanted to show with my tests that the problem occurred between 4.10.18 and 4.11.0. So far I have tested the following versions: 3.6.3 ✔️ SLES 4.2.14 ✔️ SLES 4.9.5 ✔️ Debian 8 4.10.0 ✔️ Debian 10 4.11.0 ❌ Debian 10 4.11.1 ❌ Debian 10 4.11.2 ❌ Debian 10 4.12.0 ❌ Debian 10 4.11.14 ❌ Debian 10 4.13.4 ❌ SLES 15 4.13.5 ❌ Debian
(In reply to Daniel Pschierer from comment #2) Are you able to reproduce the issue on a Samba built from source? If so, can you reproduce this on master? If it does reproduce on master, are you willing to try a git bisect between samba 4.10.0rc1 and the samba 4.11 version you used to confirm which commit caused the issue? The build on master or at least Samba 4.15 is important as our fileserver has been under a big re-architecture and the issue might have changed since even Samba 4.13. Thanks!
(In reply to Andrew Bartlett from comment #3) I am able to reproduce the issue on a Samba built from source / master ( 4.16.0pre1-GIT-cf4a868be50 ) I will try to find the commit between Samba 4.10.0rc1 and 4.11.0 that causes the issue.
I have now built different versions. samba-4.11.0rc1 -> good samba-4.11.0rc2 -> good samba-4.11.0rc3 -> good samba-4.11.0rc4 -> bad Then I checked out and tested different commits between samba-4.11.0rc3 and rc4. The error occurs with the commit d887047aa0c2489d1d6251ffcb9ce083e86866e1. "vfs: restore stat fields in vfs_stat_fsp()" d887047aa0c --> bad b4aaa612d33 --> good
Awesome work there, thanks so much! With that info I'm sure one of our Fileserver developers will pick this up soon.
Hi Ralph, d887047aa0c2489d1d6251ffcb9ce083e86866e1 is a fix for durable handles reconnect problem. Seems really weird to me that this affects directory deletion, but there you go :-).
OK, I've spent some time taking a look at this, and the *client* is behaving differently in the "good" and "bad" cases, not the server. Take a look at: tcpdump_good_4.10.18.pcap, starting at packet 76. -------------------------------------------------------------------------- Just before this the client has created "Folder2" (packet 63), and then created file "Folder2\info.log" within it (packet 67) and then written into that file (packet 71) and then closed it (packet 73). So we have a directory "Folder2" containing a file "info.log". Now, in packet 76 the client tries to open "Folder2", but with create_options set to: Create Options: 0x00000060 .... .... .... .... .... .... .... ...0 = Directory: File being created/opened must not be a directory .... .... .... .... .... .... .... ..0. = Write Through: Writes need not flush buffered data before completing .... .... .... .... .... .... .... .0.. = Sequential Only: The file might not only be accessed sequentially .... .... .... .... .... .... .... 0... = Intermediate Buffering: Intermediate buffering is allowed .... .... .... .... .... .... ...0 .... = Sync I/O Alert: Operations NOT necessarily synchronous .... .... .... .... .... .... ..1. .... = Sync I/O Nonalert: All operations SYNCHRONOUS, waits not subject to alert .... .... .... .... .... .... .1.. .... = Non-Directory: File being created/opened must not be a directory So as it's a directory we respond in packet 77 with STATUS_FILE_IS_A_DIRECTORY and fail the open, as you'd expect. The client then sends a compound operation in packet 78 containing an SMB2_CREATE + SMB2_FIND(*) + SMB2_FIND(*) - trying to read the directory. Note the SMB2_CREATE in packet 78 has create options set to: Create Options: 0x00000021 .... .... .... .... .... .... .... ...1 = Directory: File being created/opened must be a directory .... .... .... .... .... .... .... ..0. = Write Through: Writes need not flush buffered data before completing .... .... .... .... .... .... .... .0.. = Sequential Only: The file might not only be accessed sequentially .... .... .... .... .... .... .... 0... = Intermediate Buffering: Intermediate buffering is allowed .... .... .... .... .... .... ...0 .... = Sync I/O Alert: Operations NOT necessarily synchronous .... .... .... .... .... .... ..1. .... = Sync I/O Nonalert: All operations SYNCHRONOUS, waits not subject to alert So the open succeeds and returns the info.log file in packet 79. The client then opens and deletes "info.log" (packets 82-97) and then opens and deletes "Folder2" in packets 88-93. All well and good (that's why it's the good trace :-). -------------------------------------------------------------------------------------------- Now look at tcpdump_bad_4.11.0.pcap. We see the client doing the same behavior. It creates "Folder2" (packet 70), and then creates file "Folder2\info.log" within it (packet 74) and then writes into that file (packet 78) and then closes it (packet 80). So again we have a directory "Folder2" containing a file "info.log". Now, in packet 83 the client tries to open "Folder2", but with create_options set to: Create Options: 0x00000060 .... .... .... .... .... .... .... ...0 = Directory: File being created/opened must not be a directory .... .... .... .... .... .... .... ..0. = Write Through: Writes need not flush buffered data before completing .... .... .... .... .... .... .... .0.. = Sequential Only: The file might not only be accessed sequentially .... .... .... .... .... .... .... 0... = Intermediate Buffering: Intermediate buffering is allowed .... .... .... .... .... .... ...0 .... = Sync I/O Alert: Operations NOT necessarily synchronous .... .... .... .... .... .... ..1. .... = Sync I/O Nonalert: All operations SYNCHRONOUS, waits not subject to alert .... .... .... .... .... .... .1.. .... = Non-Directory: File being created/opened must not be a directory and as expected we return in packet 84 with STATUS_FILE_IS_A_DIRECTORY and fail the open, as you'd expect. So far, exactly as the good trace. This is where it diverges. Now instead of redoing the open with different create options, the client then sends a second SMB2_CREATE for "Folder2" in packet 85, now with create options set to: Create Options: 0x00200040 .... .... .... .... .... .... .... ...0 = Directory: File being created/opened must not be a directory .... .... .... .... .... .... .... ..0. = Write Through: Writes need not flush buffered data before completing .... .... .... .... .... .... .... .0.. = Sequential Only: The file might not only be accessed sequentially .... .... .... .... .... .... .... 0... = Intermediate Buffering: Intermediate buffering is allowed .... .... .... .... .... .... ...0 .... = Sync I/O Alert: Operations NOT necessarily synchronous .... .... .... .... .... .... ..0. .... = Sync I/O Nonalert: Operations NOT necessarily synchronous .... .... .... .... .... .... .1.. .... = Non-Directory: File being created/opened must not be a directory .... .... .... .... .... .... 0... .... = Create Tree Connection: Create Tree Connections is NOT set .... .... .... .... .... ...0 .... .... = Complete If Oplocked: Complete if oplocked is NOT set .... .... .... .... .... ..0. .... .... = No EA Knowledge: The client understands extended attributes .... .... .... .... .... .0.. .... .... = 8.3 Only: The client understands long file names .... .... .... .... .... 0... .... .... = Random Access: The file will not be accessed randomly .... .... .... .... ...0 .... .... .... = Delete On Close: The file should not be deleted when it is closed .... .... .... .... ..0. .... .... .... = Open By FileID: OpenByFileID is NOT set .... .... .... .... .0.. .... .... .... = Backup Intent: This is a normal create .... .... .... .... 0... .... .... .... = No Compression: Compression is allowed for Open/Create .... .... ...0 .... .... .... .... .... = Reserve Opfilter: Reserve Opfilter is NOT set .... .... ..1. .... .... .... .... .... = Open Reparse Point: Open a Reparse Point Note, now it's asking to open a reparse point - NOT a directory. So again we fail the open in packet 86 with STATUS_FILE_IS_A_DIRECTORY. As far as I can see, in no previous replies have we told the client this is a reparse point and not a directory (I'll keep looking). So this looks like a client bug of some kind.
Is there a different version of the OpenJDK java you can try ? This certainly looks like an OpenJDK bug.
Thank you for the analysis. I just tested different Java versions, all behave the same, some folders are not deleted. Tested with the following versions: - openjdk "1.8.0_212" - openjdk "11.0.10" 2021-01-19 (BlueJ IDE) - JDK "17" 2021-09-14 LTS (Oracle) - jre 1.8.0_301 (Oracle)
Can you get me a wireshark trace of the same app run against a Windows server please ? I'd love to see if it issues the "open reparse point" request against a Windows server, and if it does, what error code (or success) is returned.
Created attachment 16852 [details] Windows Share Wireshark recording sure, I have attached a Wireshark recording. The share is on the same host, I hope that does not matter?
Oh, now *this* is an interesting trace (the Windows -> Windows one). The delete pattern for Folder2 starts with packet 308, SMB2_CREATE with: Create Options: 0x00000060 .... .... .... .... .... .... .... ...0 = Directory: File being created/opened must not be a directory .... .... .... .... .... .... .... ..0. = Write Through: Writes need not flush buffered data before completing .... .... .... .... .... .... .... .0.. = Sequential Only: The file might not only be accessed sequentially .... .... .... .... .... .... .... 0... = Intermediate Buffering: Intermediate buffering is allowed .... .... .... .... .... .... ...0 .... = Sync I/O Alert: Operations NOT necessarily synchronous .... .... .... .... .... .... ..1. .... = Sync I/O Nonalert: All operations SYNCHRONOUS, waits not subject to alert .... .... .... .... .... .... .1.. .... = Non-Directory: File being created/opened must not be a directory Returns NT_STATUS_FILE_IS_A_DIRECTORY in packet 310. It then opens Folder2 again in packet 312 with Create Options: 0x00000021 .... .... .... .... .... .... .... ...1 = Directory: File being created/opened must be a directory .... .... .... .... .... .... .... ..0. = Write Through: Writes need not flush buffered data before completing .... .... .... .... .... .... .... .0.. = Sequential Only: The file might not only be accessed sequentially .... .... .... .... .... .... .... 0... = Intermediate Buffering: Intermediate buffering is allowed .... .... .... .... .... .... ...0 .... = Sync I/O Alert: Operations NOT necessarily synchronous .... .... .... .... .... .... ..1. .... = Sync I/O Nonalert: All operations SYNCHRONOUS, waits not subject to alert So the open succeeds in packet 314. Packets 316-318 are listing the Folder2 contents, followed by a close in packet 320. Then we see in packet 324 an SMB2_CREATE for Folder2 for READ_ATTRIBUTES only with: Create Options: 0x00200000 .... .... .... .... .... .... .... ...0 = Directory: File being created/opened must not be a directory .... .... .... .... .... .... .... ..0. = Write Through: Writes need not flush buffered data before completing .... .... .... .... .... .... .... .0.. = Sequential Only: The file might not only be accessed sequentially .... .... .... .... .... .... .... 0... = Intermediate Buffering: Intermediate buffering is allowed .... .... .... .... .... .... ...0 .... = Sync I/O Alert: Operations NOT necessarily synchronous .... .... .... .... .... .... ..0. .... = Sync I/O Nonalert: Operations NOT necessarily synchronous .... .... .... .... .... .... .0.. .... = Non-Directory: File being created/opened must be a directory .... .... .... .... .... .... 0... .... = Create Tree Connection: Create Tree Connections is NOT set .... .... .... .... .... ...0 .... .... = Complete If Oplocked: Complete if oplocked is NOT set .... .... .... .... .... ..0. .... .... = No EA Knowledge: The client understands extended attributes .... .... .... .... .... .0.. .... .... = 8.3 Only: The client understands long file names .... .... .... .... .... 0... .... .... = Random Access: The file will not be accessed randomly .... .... .... .... ...0 .... .... .... = Delete On Close: The file should not be deleted when it is closed .... .... .... .... ..0. .... .... .... = Open By FileID: OpenByFileID is NOT set .... .... .... .... .0.. .... .... .... = Backup Intent: This is a normal create .... .... .... .... 0... .... .... .... = No Compression: Compression is allowed for Open/Create .... .... ...0 .... .... .... .... .... = Reserve Opfilter: Reserve Opfilter is NOT set .... .... ..1. .... .... .... .... .... = Open Reparse Point: Open a Reparse Point Note that this open *SUCCEEDS* !!!! and a valid handle is returned in packet 326, with the returned attributes set: File Attributes: 0x00000010 .... .... .... .... .... .... .... ...0 = Read Only: No .... .... .... .... .... .... .... ..0. = Hidden: No .... .... .... .... .... .... .... .0.. = System: No .... .... .... .... .... .... ...1 .... = Directory: Yes According to all published SMB2 specs, the open in packet 324 SHOULD HAVE FAILED ! It looks like doing an open for READ_ATTRIBUTES only (or maybe the 0x00200000 create options - open reparse point) is succeeding, even though it's not explicitly asking for a directory open.
Hmmm. Or maybe not... Looking at the code in git head, we can also open a directory as a file, in source3/smbd/open.c we will fail the directory open as the normal file code path with NT_STATUS_FILE_IS_A_DIRECTORY internally and then fall back into the open_directory() codepath. So this should also work against Samba..
OK, so the bug really seems to be the addition of the CreateOptions bit that states: Non-Directory: File being created/opened must not be a directory in the second open of Folder2. In both the "Good" Samba trace, and the Windows to Windows trace that bit is *NOT* being set.
Another note, I'm not sure if it's relevant but it occurs sporadically, the problem could also occur with folder0 or 1 or with all or none. With three loops it can be that all folders are deleted successfully, if I increase that to 10 folders always remain.
I wanted to ask about the current status, I'm not sure where we currently stand. Would you say it is a bug in Samba or rather in Java ? Is there anything else I can support?
If it works against Windows but doesn't against Samba, then by definition it's a Samba bug. However, I am at somewhat of a loss as to how to proceed on this. The client is behaving very strangely in setting the CreateOptions non-directory bit in some cases, but I don't know why. Do you have a link to the source code of the java runtime that you are using ? It might be time to dig into this to see if it's a java bug.
I just tested it again with a recent JDK 18 early access build https://download.java.net/java/early_access/jdk18/22/GPL/openjdk-18-ea+22_windows-x64_bin.zip here you can find the source code for the release: https://github.com/openjdk/jdk/releases/tag/jdk-18%2B22 the problem occurs also with this version
Finally ! I think I understand (and have just fixed) this in 4.14.next, 4.15.next. This is a duplicate bug of : https://bugzilla.samba.org/show_bug.cgi?id=14928 The problem is duplicate fileids returned by smbd. In tcpdump_bad4_4.11.0.pcap in packet 71 we have, as a return from SMB2_CREATE of Folder2: Folder2: Opaque File ID: c0c11932fe1c5198010800000000000000000000000000000000000000000000 and then in packet 75 we have, as a return from SMB2_CREATE of Folder2\info.log: Folder2\info.log: Opaque File ID: c0c11932fe1c5198010800000000000000000000000000000000000000000000 *THEY ARE THE SAME FILE ID* !!!! This is caused by insufficient time granularity in the underlying file system, so both Folder2 and Folder2\info.log are being returned as the same fileid (inode number). We used to generate this off the underlying file system inode, but this causes problems with MacOSX clients (curse them, curse them :-). So in 4.11.0 we converted to returning a fileid generated from the file birth time. What we missed is that many underlying filesystems have insufficient granularity in timestamp resolution, meaning that 2 back-to-back SMB2 creates will return files with the *same* birthtime, and so the same returned fileid generated from it. If you can check with latest (as yet unreleased) 4.15.next code (you'll have to pull from git) or 4.14.next or even top of tree master, I'm 100% certain this bug will go away. This one really was the canary in the coalmine that I missed :-).
*** This bug has been marked as a duplicate of bug 14928 ***
Tested it with the current sources from git. It works. The problem is fixed. Thanks a lot!