Created attachment 15947 [details]
Hex-dump showing the zeroing of data as compared to the correct file.

This is a follow-up bug report which was initially reported on the mailing lists:
https://lists.samba.org/archive/samba/2020-April/229372.html

In my home setup I experience data loss/corruption when the io_uring VFS module ("vfs objects = io_uring") is enabled.

Current setup:
* Linux kernel 5.6.7 (also tested with 5.7-rc3)
* Samba 4.12.2 (also tested with 4.12.1)
* liburing 0.6 (also tested with 0.5)
* gcc 9.3.0
* glibc 2.30
* Two Windows 10 clients
* 1 Gbit Ethernet connection

#### smb.conf ####
[global]
log level = 1
workgroup = WORKGROUP
netbios name = SAMBA
server string = Samba Server
server role = standalone server
hosts allow = 192.168.0. 127.
interfaces = lan
max protocol = SMB3_11
# max protocol = SMB2
log file = /var/log/samba/%I.log
max log size = 10240
security = user
passdb backend = tdbsam
wins support = yes
dns proxy = yes

[share_io_uring_ext4]
comment = Samba share with io_uring
path = /mnt/ext4
browseable = yes
read only = yes
guest only = Yes
guest ok = Yes
vfs objects = io_uring

[share_no_io_uring_ext4]
comment = Samba share without io_uring
path = /mnt/ext4
browseable = yes
read only = yes
guest only = Yes
guest ok = Yes
vfs objects =
########

There are 1000 copies of the source-test-file.bin in /mnt/ext4/test2-ro/ named 0.bin, 1.bin ... 1000.bin

* From a Windows File Explorer I browse to each of the two shares \\samba\share_io_uring_ext4\test2-ro and \\samba\share_no_io_uring_ext4\test2-ro
* Copy all files to a local disk. This was done several times on two different Windows 10 clients.
* Verify the file integrity with sha256sum.
* Files from the share with no io_uring were all OK.
* Files from the share with io_uring enabled had small portions of the files zeroed out.
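For locating the zeroed regions programmatically rather than via hex-dump, a small scanner along these lines can be used. A minimal sketch; the 4 KiB minimum-run threshold is an arbitrary choice, not something from the report:

#### zscan.c ####
/* zscan.c - print runs of zero bytes in a file, to locate the
 * corrupted regions.  Sketch only; build with "gcc zscan.c -o zscan". */
#include <stdio.h>

int main(int argc, char **argv)
{
    FILE *f;
    long off = 0, run_start = -1;
    int c;

    if (argc != 2) {
        fprintf(stderr, "usage: %s FILE\n", argv[0]);
        return 1;
    }
    f = fopen(argv[1], "rb");
    if (f == NULL) {
        perror("fopen");
        return 1;
    }
    while ((c = fgetc(f)) != EOF) {
        if (c == 0 && run_start < 0) {
            run_start = off;            /* a zero run begins here */
        } else if (c != 0 && run_start >= 0) {
            if (off - run_start >= 4096) {
                printf("zeros at 0x%lx..0x%lx\n", run_start, off);
            }
            run_start = -1;
        }
        off++;
    }
    if (run_start >= 0 && off - run_start >= 4096) {
        printf("zeros at 0x%lx..0x%lx (EOF)\n", run_start, off);
    }
    fclose(f);
    return 0;
}
########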
The test was redone on different file-systems:
* 1TB USB3 disk with Btrfs
* 240GB SSD SATA disk with Btrfs
* 160GB USB2 disk with ext4

This is "iotop -o" when copying from the no io_uring share:
https://paste.tnonline.net/files/G1impe8EHhUU_no_io_uring_copy.png

###### NO IO_URING ######
# iotop -o
Total DISK READ :    86.32 M/s | Total DISK WRITE :    29.76 K/s
Actual DISK READ:    88.13 M/s | Actual DISK WRITE:    22.32 K/s
  TID  PRIO  USER     DISK READ   DISK WRITE  SWAPIN      IO>    COMMAND
29898  be/4  nasuser  10.06 M/s    0.00 B/s   0.00 %  25.41 %  smbd -D
29894  be/4  nasuser   9.53 M/s    0.00 B/s   0.00 %  25.12 %  smbd -D
29893  be/4  nasuser  10.58 M/s    0.00 B/s   0.00 %  22.86 %  smbd -D
29892  be/4  nasuser  13.40 M/s    0.00 B/s   0.00 %  19.01 %  smbd -D
29891  be/4  nasuser  10.73 M/s    0.00 B/s   0.00 %  18.57 %  smbd -D
29895  be/4  nasuser   9.44 M/s    0.00 B/s   0.00 %  18.36 %  smbd -D
29897  be/4  nasuser  11.11 M/s    0.00 B/s   0.00 %  17.86 %  smbd -D
29896  be/4  nasuser  11.48 M/s    0.00 B/s   0.00 %  17.08 %  smbd -D
29816  be/4  root      0.00 B/s   29.76 K/s   0.00 %   0.00 %  [kworker/u64:8-btrfs-endio]
############

This is "iotop -o" when copying from the with io_uring share:
https://paste.tnonline.net/files/HCv5TZtNDd7h_with_io_uring_copy.png

###### WITH IO_URING ######
# iotop -o
Total DISK READ :    77.47 M/s | Total DISK WRITE :     0.00 B/s
Actual DISK READ:    77.61 M/s | Actual DISK WRITE:     0.00 B/s
  TID  PRIO  USER     DISK READ   DISK WRITE  SWAPIN      IO>    COMMAND
29984  be/4  root      8.36 M/s    0.00 B/s   0.00 %  20.21 %  [io_wqe_worker-1]
29986  be/4  root      9.55 M/s    0.00 B/s   0.00 %  19.44 %  [io_wqe_worker-1]
29983  be/4  root      4.37 M/s    0.00 B/s   0.00 %  18.47 %  [io_wqe_worker-0]
29981  be/4  root      9.26 M/s    0.00 B/s   0.00 %  17.94 %  [io_wqe_worker-0]
29980  be/4  root      5.53 M/s    0.00 B/s   0.00 %  17.85 %  [io_wqe_worker-0]
29982  be/4  root      5.12 M/s    0.00 B/s   0.00 %  17.46 %  [io_wqe_worker-1]
29987  be/4  root     11.75 M/s    0.00 B/s   0.00 %  16.38 %  [io_wqe_worker-0]
29985  be/4  root      6.53 M/s    0.00 B/s   0.00 %  15.81 %  [io_wqe_worker-0]
29820  be/4  nasuser  17.00 M/s    0.00 B/s   0.00 %   4.18 %  smbd -D
############

I noticed that when copying from the slow 160GB USB2 disk, only one [io_wqe_worker-0] was running. Also during that test the file corruption ended up at the end of the files, whereas during the other tests it was at more random locations.

* All files in the test are uploaded to https://mirrors.tnonline.net/?dir=samba
* Source file used: https://mirrors.tnonline.net/samba/test2-ro_no_io_uring.7z
* Resulting copied files with errors from the first test: https://mirrors.tnonline.net/samba/test2-ro_with_io_uring.7z

Samba log files do not contain anything useful that I can see:

#### 192.168.0.10.log ####
[2020/05/02 13:51:30.722686, 1] ../../source3/param/loadparm.c:2516(lp_idmap_range)
  idmap range not specified for domain '*'
[2020/05/02 13:51:31.225355, 1] ../../source3/param/loadparm.c:2516(lp_idmap_range)
  idmap range not specified for domain '*'
[2020/05/02 14:24:02.381203, 1] ../../source3/printing/printer_list.c:234(printer_list_get_last_refresh)
  Failed to fetch record!
[2020/05/02 14:24:02.381314, 1] ../../source3/smbd/server_reload.c:66(delete_and_reload_printers)
  pcap cache not loaded
[2020/05/02 14:24:16.078717, 1] ../../source3/printing/printer_list.c:234(printer_list_get_last_refresh)
  Failed to fetch record!
[2020/05/02 14:24:16.078762, 1] ../../source3/smbd/server_reload.c:66(delete_and_reload_printers)
########

#### log.smbd ####
[2020/05/02 11:20:14.680521, 0] ../../source3/smbd/server.c:1782(main)
  smbd version 4.12.2 started.
  Copyright Andrew Tridgell and the Samba Team 1992-2020
########

#### log.nmbd ####
[2020/05/02 11:20:14.716863, 0] ../../source3/nmbd/nmbd.c:960(main)
  nmbd version 4.12.2 started.
  Copyright Andrew Tridgell and the Samba Team 1992-2020
########

Just to summarise:
* Any share that has "vfs objects = io_uring" enabled can cause data corruption with the Windows clients.
* Shares that do not have the io_uring module all work fine 100% of the time. This I have verified with several TB of data.
* Using "smbclient mget" causes no errors with io_uring.
* Mounting the share in a Linux client with mount -t cifs shows no errors when copying the files.
* Only one [io_wqe_worker-0] is ever used when using Linux clients.

Could this point to where the problem might be? For example some kind of read-ahead or other feature that cifs/smbclient does not use?
The only thing I can think of right now is the kernel version. I'm running Ubuntu 19.04:

5.3.0-51-generic #44-Ubuntu SMP Wed Apr 22 21:09:44 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

and I can't make this setup corrupt any files on a standard ext4 partition. Can you spin up a VM running this kernel and see if it reproduces for you there?
Until someone else reports the same problem, this is a unicorn. It only happens for you on your home system. Do you have access to other systems you can test with?

We need to reproduce this somewhere other than on your single system in order to make any progress.
I just did this again whilst adding debug statements into the io_uring VFS module to make sure I was going through that code. I am. My Windows 10 client is only copying one file at a time though, which can be seen as I log the fd, offset and len for each io_uring request/reply.

As I said, *how* the client copies the files is important. If you can get me a command-line mechanism used from Windows PowerShell that duplicates the problem, this might help greatly. Relying on the UI copy/paste mechanism is the very essence of non-reproducibility. You and I might be doing very different copy mechanisms and we'd never know, because all you're telling me is "I used the UI to copy".

Get me a PowerShell script please, then we can talk.
Just managed to make the Windows 10 client use 2 threads by running 2 PowerShell windows, each doing a command-line "copy *.bin ." command. Still no difference in SHA256 hashes :-(.
I was able to reproduce the error in a VM using the same kernel and Samba versions as on my NAS box. I will do some further testing with older kernels and see if the problem is repeatable with them.

Regarding Wireshark: the capture file will be rather large; is there anything specific I should look for in it?
(In reply to mail from comment #5)

You're still not giving me a command-line mechanism to reproduce this. Using the Windows GUI isn't going to work for reproducible testing (unless it happens every time, which it doesn't for me).

Let me know when you have a reproducible case I can look at (and by that I mean reproducible elsewhere :-).
(In reply to Jeremy Allison from comment #6)

While I agree that it is easier to define reproducibility with a shell script, I feel I must point out that 99.9% of all Windows users use File Explorer for file-related tasks, not PowerShell or the command prompt.

One reason for doing the testing in a VM is that it can be exported and tested in other locations - as you rightly pointed out, we need to verify this elsewhere too.

Lastly I would like to mention that I am just an end-user and not a software developer. I provide information as best I can and to the extent possible in this situation, and I would not spend many hours providing reports as detailed as I can make them unless I believed this was a real issue. Remember, turning off "vfs objects = io_uring" makes everything work 100% of the time - with Windows File Explorer too.

I do not think many use io_uring by default. Ubuntu does not ship Samba 4.12. Fedora does ship Samba 4.12.1, but has omitted io_uring.so from the install packages. The userbase for io_uring is probably rather small in a relative sense.

PowerShell is not something I am comfortable with. Do you have an example here?
(In reply to mail from comment #5)

Regarding the network capture Jeremy has asked for: we don't really mind how large it is - if the data is a simple pattern then a modern compression algorithm will make it tiny again. E.g. tar with the --xz option is a good choice.

I know it is very frustrating to be asked to assist at the edge of your capabilities, but similarly it is very, very frustrating for Jeremy to have a data corruption issue that he simply can't solve! Data corruption is, after major security dramas, the single thing that keeps a Samba developer up at night, that they can't and won't just let sit unresolved. So we do appreciate you continuing to help us narrow this down. The network capture really will help.

Clearly there is something different (and there are so many things it could be) between what you are seeing and what Jeremy can reproduce; we need to work out what it is. Anything at all that can fully automate the issue would be a big step forward, at least to reliably show that you and Jeremy are performing the same steps, against the same Samba, but (sadly) getting different results.
This is what I have done so far:

1) Installed Fedora 32 Workstation in VirtualBox (on Windows 10)
2) Created new test files with the byte 0xFF (binary 11111111) repeated (see the sketch after this comment).
3) Compiled Samba 4.12.2 from source(*) into /usr/local/samba
4) Set up a minimal smb.conf:

#### smb.conf ####
[global]
workgroup = SAMBA
security = user
passdb backend = tdbsam
load printers = no

[io_uring]
comment = TEST share with IO_URING
path = /media/test
browseable = yes
read only = yes
guest only = Yes
guest ok = Yes
vfs objects = io_uring

[no_io_uring]
comment = TEST share without IO_URING
path = /media/test
browseable = yes
read only = yes
guest only = Yes
guest ok = Yes
vfs objects =
########

5) Added a local user with "smbpasswd -a fedora", password: fedora
6) Started Samba by running "/usr/local/samba/sbin/smbd" as root
7) Connected to the Samba share using File Explorer by browsing to \\192.168.0.130\io_uring\ (user: fedora, pass: fedora)
8) In PowerShell issued the following commands:

#####
PS E:\test> cp \\192.168.0.130\io_uring\test-FF\*.bin .
PS E:\test> .\sha256sum.exe 1.bin
6f19f1799f73c875745c43af0b7d6a4fd3078688762a3457ed13a5fa7b3ed79c *1.bin
#####

9) The sha256sum is not correct. At position 0x8000 the file has a zeroed block until 0x10000. Most of the files were like this.
10) Rebooted the VirtualBox machine
11) Tested PowerShell again:

#####
PS E:\test> cp \\192.168.0.130\io_uring\test-FF\*.bin .
PS E:\test> .\sha256sum.exe 1.bin
7899a615e333c749b0204beb73adf1b3304405f592f716872c8fbdbd4be82ed9 *1.bin
PS E:\test> .\sha256sum.exe 15.bin
8702a06044f71e08d7351a7d07ab67522f907ee5e4a9c805c18b6ea1dc8fad3b *15.bin
PS E:\test>
#####

12) This time 1.bin was correct, but 15.bin was not.

I will upload the VirtualBox file to https://mirrors.tnonline.net/?dir=samba in a few minutes. It contains the test files in smb-test.vdi, which is mounted as /media/test

*.a) https://wiki.samba.org/index.php/Build_Samba_from_Source
*.b) liburing-devel needs to be installed before configuring the source. This is not noted in the wiki.
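As referenced in step 2 above, a sketch of how the 0xFF test files can be generated. The 10,000,000-byte size matches the eof value seen later in the logs; the count of 100 files and the 0.bin..99.bin names are arbitrary choices for illustration:

#### mkff.c ####
/* mkff.c - create test files filled with the byte 0xFF. */
#include <stdio.h>
#include <string.h>

#define FILE_SIZE 10000000
#define NUM_FILES 100

int main(void)
{
    unsigned char buf[65536];
    char name[32];
    int i;

    memset(buf, 0xFF, sizeof(buf));

    for (i = 0; i < NUM_FILES; i++) {
        size_t left = FILE_SIZE;
        FILE *f;

        snprintf(name, sizeof(name), "%d.bin", i);
        f = fopen(name, "wb");
        if (f == NULL) {
            perror(name);
            return 1;
        }
        while (left > 0) {
            size_t chunk = left < sizeof(buf) ? left : sizeof(buf);
            if (fwrite(buf, 1, chunk, f) != chunk) {
                perror("fwrite");
                return 1;
            }
            left -= chunk;
        }
        fclose(f);
    }
    return 0;
}
########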
Thank you so much. One other question comes to mind. Can you give us the exact Windows 10 build numbers, versions, etc?
https://mirrors.tnonline.net/?dir=samba/virtualbox is the complete VirtualBox image. I have now tested this image on my work computer, which also produces the corruption.

Windows 10 Enterprise builds:
* 1809, 17763.1158
* 1909, 18363.778
(In reply to mail from comment #11)

The Windows 1909 computer is an AMD Ryzen 2700X 8-core desktop. The Windows 1809 computer is an Intel i7-8850H 6-core laptop.
Created attachment 15955 [details]
Patch for debugging

Can you apply this patch and add the "io_uring:check_ff=yes" option? "log level = 1" would also be good. Then please upload the resulting log file (typically log.smbd).
(In reply to Stefan Metzmacher from comment #13)

Please make sure you have "max log size = 0".
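Combining the two requests above, the debug configuration would look something like this. The share section follows the minimal smb.conf from the VM reproduction earlier in this report; io_uring:check_ff is the parametric option added by the debug patch:

#### smb.conf (debug settings) ####
[global]
log level = 1
max log size = 0

[io_uring]
path = /media/test
read only = yes
guest ok = Yes
vfs objects = io_uring
io_uring:check_ff = yes
########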
Anoop now reproduced this. The underlying issue is that Windows clients don't deal with short read returns over SMB2, at least, and ignore the SMB2_READ.DataLength reply field. I have a simple patch to the smbd server that shows this.
From the mailing list:
-----------------------------------------------------------------------------------
On 06.05.20 at 14:41, Anoop C S wrote:
> On Wed, 2020-05-06 at 12:33 +0200, Stefan Metzmacher wrote:
>> Hi Anoop,
>>
>>> I could reproduce the difference in SHA256 checksum after copying a
>>> directory with 100 copies of the test file (provided by reporter) from
>>> an io_uring VFS module enabled share using Windows explorer
>>> (right-click->copy/paste). Only 5 out of 100 files had correct
>>> checksum after copy operation :-/
>>
>> Great! Can you please try to collect level 1 log files with
>> the patch https://bugzilla.samba.org/attachment.cgi?id=15955
>> applied?
>
> I have attached three log files.
> log.io_uring.smbd -- Copy using Windows explorer
> log.io_uring-mget.smd -- Copy using smbclient
> log.io_uring-powershell.smd -- Copy using `Copy-Item`

Thanks! All of them show short reads like:

> [2020/05/06 17:27:28.130248, 1] ../../source3/modules/vfs_io_uring.c:103(vfs_io_uring_finish_req)
>   vfs_io_uring_finish_req: pread ofs=0 (0x0) len=32768 (0x8000) nread=32768 (0x32768) eof=10000000 (0x989680) blks=4096 blocks=19536 dir/1.bin fnum 1607026405
> [2020/05/06 17:27:28.131049, 1] ../../source3/modules/vfs_io_uring.c:103(vfs_io_uring_finish_req)
>   vfs_io_uring_finish_req: pread ofs=9969664 (0x982000) len=30336 (0x7680) nread=30336 (0x30336) eof=10000000 (0x989680) blks=4096 blocks=19536 dir/1.bin fnum 1607026405
> [2020/05/06 17:27:28.133679, 1] ../../source3/modules/vfs_io_uring.c:103(vfs_io_uring_finish_req)
>   vfs_io_uring_finish_req: pread ofs=61440 (0xf000) len=32768 (0x8000) nread=32768 (0x32768) eof=10000000 (0x989680) blks=4096 blocks=19536 dir/1.bin fnum 1607026405
> [2020/05/06 17:27:28.140184, 0] ../../source3/modules/vfs_io_uring.c:88(vfs_io_uring_finish_req)
>   vfs_io_uring_finish_req: Invalid pread ofs=0 (0x0) len=1048576 (0x100000) nread=32768 (0x32768) eof=10000000 (0x989680) blks=4096 blocks=19536 dir/1.bin fnum 1607026405

It seems the first request is at ofs=0 len=32768 (0x8000) and we get 32768 back. A follow-up request with ofs=0 len=1048576 (0x100000) only gets the first 32768 bytes, which are already in the page cache.
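For anyone wanting to see the behavior outside smbd, here is a standalone sketch of the retry logic a fix needs: keep resubmitting the pread until the requested range is complete or EOF is reached. It assumes liburing >= 0.6 (for io_uring_prep_read) and is the core idea only, not the Samba patch itself. Build with "gcc short_read_retry.c -o short_read_retry -luring":

#### short_read_retry.c ####
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <liburing.h>

int main(int argc, char **argv)
{
    struct io_uring ring;
    size_t want = 1048576;      /* 0x100000, as in the log above */
    size_t got = 0;
    char *buf;
    int fd;

    if (argc != 2) {
        fprintf(stderr, "usage: %s FILE\n", argv[0]);
        return 1;
    }
    fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    buf = malloc(want);
    if (buf == NULL || io_uring_queue_init(8, &ring, 0) < 0) {
        fprintf(stderr, "setup failed\n");
        return 1;
    }
    while (got < want) {
        struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
        struct io_uring_cqe *cqe;

        /* ask for the remainder, starting where the last
         * (possibly short) read stopped */
        io_uring_prep_read(sqe, fd, buf + got, want - got, got);
        io_uring_submit(&ring);
        if (io_uring_wait_cqe(&ring, &cqe) < 0) {
            break;
        }
        if (cqe->res <= 0) {    /* error or EOF: stop */
            io_uring_cqe_seen(&ring, cqe);
            break;
        }
        got += cqe->res;        /* short read: the loop resubmits */
        io_uring_cqe_seen(&ring, cqe);
    }
    printf("read %zu of %zu requested bytes\n", got, want);
    io_uring_queue_exit(&ring);
    free(buf);
    close(fd);
    return 0;
}
########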
Just confirmed: macOS Catalina clients suffer from the same problem of ignoring short reads.
Created attachment 15960 [details]
PROTOTYPE !!! wip fix for master.

Here is a *MASSIVELY* rough prototype fix for the io_uring short read problem. It is still very ugly and contains extra debugs etc. It works for me here if I enable the JRATEST code that forces short reads if read_len > 4096, but I was never able to reproduce the Windows issue (my kernel isn't modern enough).

Can you test it in your Windows environment and see if it fixes the SHA256 checksum problem after copying the test files? If so I'll fix it up to be production-ready (for example, I think I can get rid of the immediate event usage), fix the pwrite case and then cut it into bite-sized reviewable chunks.
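For readers unfamiliar with the JRATEST idea, a fragment showing how such a hook could work: cap the length actually submitted to the kernel so that every read larger than 4096 bytes completes short, exercising the retry path even on kernels that never produce short reads. The name clamp_read_len and the constant are illustrative, not taken from the patch:

#### forced short reads (sketch) ####
#define JRATEST_MAX_READ 4096

static unsigned clamp_read_len(unsigned len)
{
    if (len > JRATEST_MAX_READ) {
        /* the SMB2 layer asked for more; return less so the
         * short-read handling has to kick in */
        return JRATEST_MAX_READ;
    }
    return len;
}
########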
Comment from Anoop on the samba-technical list:

"Your patch fixes the mismatch in SHA256 checksums of 100 .bin files copied using Windows explorer in my environment."
Created attachment 15961 [details] git-am fix for master. Cleaned up version that fixes io_uring short pread. If Anoop confirms I'll add in the short pwrite fixes.
Created attachment 15962 [details]
WIP patch for master

Sorry Jeremy, with your changes the common code is too complex. I hope you're as happy as I am with my version :-) I think it's much clearer. It includes your generic short read patchset with the new sys_valid_io_range() helper.

Regarding pwrite and short writes, we need more research as well as tests for the offset < 0 case.

I also included a HACK/DEBUG patch that adds a few parametric options. The io_uring:nowait_pread=yes option should be able to trigger the problem even with 5.3/5.4 kernels. The io_uring:truncate_pread=4096 option should be able to test the retries reliably. The io_uring:force_async_pread[_retry]=yes options should allow us to play a bit with the IOSQE_ASYNC flag, which may alter the kernel behavior; using this on the retry may avoid more than 1 retry.
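As a reference for what the force_async options would toggle at the liburing level: IOSQE_ASYNC (kernel >= 5.6) tells the kernel to punt the request straight to an io-wq worker instead of attempting an inline non-blocking read first, and that inline attempt appears to be where the short, page-cache-only reads come from. The helper below is hypothetical; IOSQE_ASYNC and io_uring_sqe_set_flags() are real liburing/kernel interfaces:

#### IOSQE_ASYNC sketch ####
#include <stdbool.h>
#include <sys/types.h>
#include <liburing.h>

/* hypothetical helper: prepare a pread SQE, optionally forcing
 * async execution as io_uring:force_async_pread=yes presumably does */
static void prep_pread_sketch(struct io_uring_sqe *sqe, int fd,
                              void *buf, unsigned len, off_t ofs,
                              bool force_async)
{
    io_uring_prep_read(sqe, fd, buf, len, ofs);
    if (force_async) {
        /* skip the inline non-blocking attempt that can complete
         * short with just the cached part of the file */
        io_uring_sqe_set_flags(sqe, IOSQE_ASYNC);
    }
}
########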
Created attachment 15964 [details] WIP patch for master
(In reply to Jeremy Allison from comment #18)

I applied this patch to master (git://git.samba.org/samba.git). In my initial tests, the data corruption seems to be fixed. I'll do some more testing over the weekend. Thank you!

EDIT: Saw the new patches, will apply those in further testing over the weekend.
(In reply to Stefan Metzmacher from comment #22)

It would be good to get clarification from dochelp on the matter, and to get it documented how clients and servers are supposed to handle this.
(In reply to Stefan Metzmacher from comment #21)

I do think your patches are nice. However, my patch has the small advantage that it actually works, as confirmed by the testers :-) :-). So I think marking that patch "obsolete" was a little premature :-).

Having said that, your code is cleaner, so once you've gotten it working I'm happy for that to be the final version. I'm not invested in my patchset; I learned a lot about how io_uring works and that was the big plus for me.

I hope that this now has your full attention, as currently we're shipping 4.12.x with a VFS module that corrupts data, which IMHO is the most important fix we have right now to get finished.
(In reply to Stefan Metzmacher from comment #21)

> Sorry Jeremy, with your changes the common code is too complex.

Also, you have to remember I was learning how your code actually worked whilst trying to fix it. My goal, as always, was to find the "minimum necessary change" in order to fix the problem, so making large rewrites wasn't something I thought was desirable.

I also had the idea of adding some "test" parameters into the module so we could emulate the buggy kernel behavior on non-buggy kernels (and yes, I consider the change in kernel io_uring behavior to be a bug, as it exposes great changes in how the original code behaved to userspace).

As you wrote the original code, doing large cleanup fixes isn't so much of a hurdle for you. Your new code does make it simpler to understand, which is a big plus IMHO. Let's just get it working now :-).
(In reply to Stefan Metzmacher from comment #22)

Ok, here is the issue I see with this WIP patch (copied from the mailing list so we have a record here):

I see the problem. It's the reason I initially introduced an immediate event, and then refactored my code to call _vfs_io_uring_queue_run() in a loop if it returned 'need_retry'.

Metze's current code is recursive, and it's running out of stack. It uses:

vfs_io_uring_pread_send()
 ->vfs_io_uring_pread_submit() <-----------------------------------
  ->vfs_io_uring_request_submit()                                  |
   ->vfs_io_uring_queue_run()                                      |
                                                                   |
But inside vfs_io_uring_queue_run() once a read                    |
completes it calls:                                                |
                                                                   |
vfs_io_uring_queue_run()                                           |
 ->vfs_io_uring_finish_req()                                       |
  ->cur->completion_fn()                                           |
                                                                   |
cur->completion_fn() for pread is set to                           |
vfs_io_uring_pread_completion()                                    |
                                                                   |
vfs_io_uring_pread_completion()                                    |
 ->vfs_io_uring_pread_submit() -------------------------------------

(the ascii art will probably only work in fixed-font text mailers but you should get the idea).

I don't think this pattern will work. You have to exit from vfs_io_uring_queue_run() all the way out, so that you've called io_uring_cq_advance() to move the completion queue forward, before you can submit another request to finish the short IO.

That's what my initial patch did with an immediate event, until I got the idea of wrapping vfs_io_uring_queue_run() inside a retry wrapper function.

I'll take a look to see if I can fix this so the short read re-submission happens *outside* of vfs_io_uring_queue_run(), as I think that's the only way to make this work.
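To make the constraint concrete, here is a compilable skeleton of the pattern described above: a short read only *marks* its request during the completion drain, and resubmission happens after the completion queue has been advanced, so no recursion can occur. All names and the stub bodies are placeholders, not actual Samba code:

#### retry loop skeleton ####
#include <stdbool.h>
#include <stddef.h>

struct req {
    struct req *next;
    bool need_retry;
};

struct ring_state {
    struct req *pending;
};

/* stubs standing in for the real liburing-driven functions */
static void drain_completions(struct ring_state *ring) { (void)ring; }
static void submit_remainder(struct req *r) { (void)r; }

static void queue_run_with_retries(struct ring_state *ring)
{
    bool need_retry;

    do {
        need_retry = false;

        /* 1. drain CQEs; a short read only marks its request */
        drain_completions(ring);

        /* 2. the CQ has been advanced: now it is safe to submit
         * the follow-up reads for any short IO */
        for (struct req *r = ring->pending; r != NULL; r = r->next) {
            if (r->need_retry) {
                r->need_retry = false;
                submit_remainder(r);
                need_retry = true;
            }
        }
    } while (need_retry);
}
########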
Created attachment 15965 [details] WIP patch for master
Created attachment 15969 [details] Possible patch for master (should also apply to 4.12) Can someone please verify this (most likely final) patch? Thanks! A pipeline is running at https://gitlab.com/samba-team/samba/-/merge_requests/1328 But that doesn't test the io_uring module (yet).
Comment on attachment 15969 [details]
Possible patch for master (should also apply to 4.12)

Missing the expanded comments/explanation on the recursion fix. Can you re-add those?
Comment on attachment 15969 [details]
Possible patch for master (should also apply to 4.12)

Missing the recursion comments and explanation. Can you re-add those please?
Created attachment 15971 [details] Possible patch for master (should also apply to 4.12) - comment updated Updated version with restored comments.
Created attachment 15976 [details] git-am fix for 4.12.next. Clean cherry-pick from master.
Pushed to autobuild-v4-12-test.
(In reply to Karolin Seeger from comment #34) Pushed to v4-12-test. Closing out bug report. Thanks!
On 2020-05-17 07:16, samba-bugs@samba.org wrote:
> https://bugzilla.samba.org/show_bug.cgi?id=14361
>
> Karolin Seeger <kseeger@samba.org> changed:
>
>            What    |Removed     |Added
> ----------------------------------------------------------------------------
>  Resolution|---         |FIXED
>      Status|ASSIGNED    |RESOLVED
>
> --- Comment #35 from Karolin Seeger <kseeger@samba.org> ---
> (In reply to Karolin Seeger from comment #34)
> Pushed to v4-12-test.
> Closing out bug report.
>
> Thanks!

The new samba-4.12.3, which has this fix, passes my tests. So thank you very much :)

Have a nice weekend!