Bug 14361 - vfs_io_uring sometimes cause data corruption with Windows clients
Summary: vfs_io_uring sometimes cause data corruption with Windows clients
Status: RESOLVED FIXED
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: VFS Modules
Version: 4.12.2
Hardware: All All
Importance: P5 normal
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-05-02 12:53 UTC by mail
Modified: 2020-05-23 17:14 UTC
CC List: 8 users

See Also:


Attachments
Hex-dump showing the zeroing of data as compared to the correct file. (127.76 KB, image/png)
2020-05-02 12:53 UTC, mail
no flags Details
Patch for debugging (4.37 KB, patch)
2020-05-06 10:27 UTC, Stefan Metzmacher
no flags Details
PROTOTYPE !!! wip fix for master. (8.57 KB, patch)
2020-05-08 04:08 UTC, Jeremy Allison
no flags Details
git-am fix for master. (14.81 KB, patch)
2020-05-08 06:40 UTC, Jeremy Allison
no flags Details
WIP patch for master (52.32 KB, patch)
2020-05-08 12:50 UTC, Stefan Metzmacher
no flags Details
WIP patch for master (52.28 KB, patch)
2020-05-08 13:03 UTC, Stefan Metzmacher
no flags Details
WIP patch for master (60.28 KB, patch)
2020-05-10 08:59 UTC, Stefan Metzmacher
no flags Details
Possible patch for master (should also apply to 4.12) (66.44 KB, patch)
2020-05-11 20:58 UTC, Stefan Metzmacher
jra: review-
Details
Possible patch for master (should also apply to 4.12) - comment updated (70.07 KB, patch)
2020-05-11 23:41 UTC, Jeremy Allison
no flags Details
git-am fix for 4.12.next. (73.31 KB, patch)
2020-05-12 21:42 UTC, Jeremy Allison
metze: review+
Details

Description mail 2020-05-02 12:53:48 UTC
Created attachment 15947 [details]
Hex-dump showing the zeroing of data as compared to the correct file.

This is a follow-up bug report which was initially reported on the mailing lists https://lists.samba.org/archive/samba/2020-April/229372.html

In my home setup I experience data loss/corruption when "vfs objects = io_uring" is enabled.

Current setup:

* Linux-kernel 5.6.7 (also tested with 5.7-rc3)
* Samba 4.12.2 (also tested with 4.12.1)
* liburing 0.6 (also tested with 0.5)
* gcc 9.3.0
* glibc 2.30
* Two Windows 10 clients
* 1 Gbit Ethernet connection

#### smb.conf ####
[global]
    log level = 1
    workgroup = WORKGROUP
    netbios name = SAMBA
    server string = Samba Server
    server role = standalone server
    hosts allow = 192.168.0. 127.
    interfaces = lan
    max protocol = SMB3_11
#   max protocol = SMB2

    log file = /var/log/samba/%I.log
    max log size = 10240

    security = user
    passdb backend = tdbsam
    wins support = yes
    dns proxy = yes

[share_io_uring_ext4]
    comment = Samba share with io_uring
    path = /mnt/ext4
    browseable = yes
    read only = yes
    guest only = Yes
    guest ok = Yes
    vfs objects = io_uring

[share_no_io_uring_ext4]
    comment = Samba share without io_uring
    path = /mnt/ext4
    browseable = yes
    read only = yes
    guest only = Yes
    guest ok = Yes
    vfs objects =
########

There are 1000 copies of the source-test-file.bin in /mnt/ext4/test2-ro/ named 0.bin, 1.bin ... 1000.bin

* From a Windows File Explorer I browse to each of the two shares \\samba\share_io_uring_ext4\test2-ro and \\samba\share_no_io_uring_ext4\test2-ro
* Copy all files to a local disk. This was done several times on two different Windows 10 clients.
* Verify the file integrity with sha256sum.

* Files from the share with no io_uring all were OK.
* Files from the share with io_uring enabled had small portions of the files zeroed out. 

The test was redone on different file-systems:
* 1TB USB3 disk with Btrfs
* 240GB SSD SATA disk with Btrfs
* 160GB USB2 disk with ext4
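As an aside, zero runs like the one shown in the attached hex-dump can be located mechanically. The following is an illustrative helper (not part of the report; the name `zeroed_ranges` is invented), scanning a file in 4096-byte blocks, the block size reported by the debug logs later in this bug:

```python
import sys

BLOCK = 4096  # granularity at which the zeroed regions were observed

def zeroed_ranges(path, block=BLOCK):
    """Return a list of (start, end) byte ranges that are entirely zero."""
    ranges = []
    start = None
    offset = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block)
            if not chunk:
                break
            if chunk.count(0) == len(chunk):
                # inside (or starting) a run of all-zero blocks
                if start is None:
                    start = offset
            else:
                # run of zero blocks just ended
                if start is not None:
                    ranges.append((start, offset))
                    start = None
            offset += len(chunk)
    if start is not None:
        ranges.append((start, offset))
    return ranges

if __name__ == "__main__":
    for path in sys.argv[1:]:
        for lo, hi in zeroed_ranges(path):
            print(f"{path}: zero run 0x{lo:x}-0x{hi:x}")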

This is "iotop -o" when copying from the share without io_uring:
https://paste.tnonline.net/files/G1impe8EHhUU_no_io_uring_copy.png

###### NO IO_URING ######
# iotop -o
Total DISK READ :      86.32 M/s | Total DISK WRITE : 29.76 K/s
Actual DISK READ:      88.13 M/s | Actual DISK WRITE: 22.32 K/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN IO>    COMMAND
29898 be/4 nasuser    10.06 M/s    0.00 B/s  0.00 % 25.41 % smbd -D
29894 be/4 nasuser     9.53 M/s    0.00 B/s  0.00 % 25.12 % smbd -D
29893 be/4 nasuser    10.58 M/s    0.00 B/s  0.00 % 22.86 % smbd -D
29892 be/4 nasuser    13.40 M/s    0.00 B/s  0.00 % 19.01 % smbd -D
29891 be/4 nasuser    10.73 M/s    0.00 B/s  0.00 % 18.57 % smbd -D
29895 be/4 nasuser     9.44 M/s    0.00 B/s  0.00 % 18.36 % smbd -D
29897 be/4 nasuser    11.11 M/s    0.00 B/s  0.00 % 17.86 % smbd -D
29896 be/4 nasuser    11.48 M/s    0.00 B/s  0.00 % 17.08 % smbd -D
29816 be/4 root        0.00 B/s   29.76 K/s  0.00 %  0.00 % [kworker/u64:8-btrfs-endio]
############


This is "iotop -o" when copying from the share with io_uring:
https://paste.tnonline.net/files/HCv5TZtNDd7h_with_io_uring_copy.png

###### WITH IO_URING ######
# iotop -o
Total DISK READ :      77.47 M/s | Total DISK WRITE : 0.00 B/s
Actual DISK READ:      77.61 M/s | Actual DISK WRITE: 0.00 B/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN IO>    COMMAND
29984 be/4 root        8.36 M/s    0.00 B/s  0.00 % 20.21 % [io_wqe_worker-1]
29986 be/4 root        9.55 M/s    0.00 B/s  0.00 % 19.44 % [io_wqe_worker-1]
29983 be/4 root        4.37 M/s    0.00 B/s  0.00 % 18.47 % [io_wqe_worker-0]
29981 be/4 root        9.26 M/s    0.00 B/s  0.00 % 17.94 % [io_wqe_worker-0]
29980 be/4 root        5.53 M/s    0.00 B/s  0.00 % 17.85 % [io_wqe_worker-0]
29982 be/4 root        5.12 M/s    0.00 B/s  0.00 % 17.46 % [io_wqe_worker-1]
29987 be/4 root       11.75 M/s    0.00 B/s  0.00 % 16.38 % [io_wqe_worker-0]
29985 be/4 root        6.53 M/s    0.00 B/s  0.00 % 15.81 % [io_wqe_worker-0]
29820 be/4 nasuser    17.00 M/s    0.00 B/s  0.00 %  4.18 % smbd -D
############ 

I noticed that when copying from the slow 160GB USB2 disk, only one [io_wqe_worker-0] was running. Also during that test, the file corruptions ended up at the end of the files, whereas during the other tests they were at more random locations. 

* All files in the test are uploaded to https://mirrors.tnonline.net/?dir=samba
* Source file used: https://mirrors.tnonline.net/samba/test2-ro_no_io_uring.7z
* Resulting copied files with errors from first test: https://mirrors.tnonline.net/samba/test2-ro_with_io_uring.7z


Samba log files do not contain anything useful that I can see:
#### 192.168.0.10.log ####
[2020/05/02 13:51:30.722686,  1] ../../source3/param/loadparm.c:2516(lp_idmap_range)
  idmap range not specified for domain '*'
[2020/05/02 13:51:31.225355,  1] ../../source3/param/loadparm.c:2516(lp_idmap_range)
  idmap range not specified for domain '*'
[2020/05/02 14:24:02.381203,  1] ../../source3/printing/printer_list.c:234(printer_list_get_last_refresh)
  Failed to fetch record!
[2020/05/02 14:24:02.381314,  1] ../../source3/smbd/server_reload.c:66(delete_and_reload_printers)
  pcap cache not loaded
[2020/05/02 14:24:16.078717,  1] ../../source3/printing/printer_list.c:234(printer_list_get_last_refresh)
  Failed to fetch record!
[2020/05/02 14:24:16.078762,  1] ../../source3/smbd/server_reload.c:66(delete_and_reload_printers)
########
#### log.smbd ####
[2020/05/02 11:20:14.680521,  0] ../../source3/smbd/server.c:1782(main)
  smbd version 4.12.2 started.
  Copyright Andrew Tridgell and the Samba Team 1992-2020
########
#### log.nmbd ####
[2020/05/02 11:20:14.716863,  0] ../../source3/nmbd/nmbd.c:960(main)
  nmbd version 4.12.2 started.
  Copyright Andrew Tridgell and the Samba Team 1992-2020
########

Just to summarise:
* Any share that has "vfs objects = io_uring" enabled can cause data corruption with the Windows clients.
* Shares that do not have the io_uring module all work fine 100% of the time. This I have verified with several TB of data.
* Using "smbclient mget" causes no errors with io_uring. 
* Mounting the share in a Linux client with mount -t cifs shows no errors when copying the files. 
* Only one [io_wqe_worker-0] is ever used with Linux clients. Could this point to where the problem might be? For example some kind of read-ahead or other feature that cifs/smbclient does not use?
Comment 1 Jeremy Allison 2020-05-02 18:38:00 UTC
The only thing I can think of right now is the kernel version.

I'm running Ubuntu 19.04:

5.3.0-51-generic #44-Ubuntu SMP Wed Apr 22 21:09:44 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

and I can't make this setup corrupt any files on a standard ext4 partition. Can you spin up a VM running this kernel and see if it reproduces for you there ?
Comment 2 Jeremy Allison 2020-05-02 18:41:02 UTC
Until someone else reports the same problem, this is a unicorn. It only happens for you on your home system.

Do you have access to other systems you can test with ? We need to reproduce this somewhere else other than on your single system in order to make any progress.
Comment 3 Jeremy Allison 2020-05-02 20:33:54 UTC
I just did this again whilst adding debug statements into the io_uring vfs module to make sure I was going through that code. I am.

My Windows 10 client is only copying one file at a time though, which can be seen as I log the fd, offset, len for each io_uring request/reply.

As I said, *how* the client copies the files is important.

If you can get me a command-line mechanism used from Windows powershell that duplicates the problem this might help greatly. Relying on the UI copy/paste mechanism is the very essence of non-reproducibility. You and I might be doing very different copy mechanisms and we'd never know because all you're telling me is "I used the UI to copy".

Get me a powershell script please, then we can talk.
Comment 4 Jeremy Allison 2020-05-02 20:50:48 UTC
Just managed to make the Windows 10 client use 2 threads by running 2 PowerShell windows each doing a command-line "copy *.bin ." command.

Still no difference in SHA256 hashes :-(.
Comment 5 mail 2020-05-03 08:45:25 UTC
I was able to reproduce the error in a VM using the same kernel and Samba versions as on my NAS box. I will do some further testing with older kernels and see if the problem is repeatable with them.

Regarding Wireshark: the capture file will be rather large; is there anything specific I should look for in it?
Comment 6 Jeremy Allison 2020-05-03 18:26:43 UTC
(In reply to mail from comment #5)

You're still not giving me a command-line reproducible mechanism to reproduce this.

Using the Windows GUI isn't going to work for reproducible testing (unless it happens every time, which it doesn't for me).

Let me know when you have a reproducible case I can look at (and by that I mean reproducible elsewhere :-).
Comment 7 mail 2020-05-04 07:20:25 UTC
(In reply to Jeremy Allison from comment #6)

While I agree that it is easier to demonstrate reproducibility with a shell script, I feel I must point out that 99.9% of all Windows users use File Explorer for file-related tasks, not PowerShell or the command prompt. 

One reason for doing testing in a VM is that it can be exported and tested in other locations - as you rightly pointed out that we need to verify this elsewhere too.

Lastly I would like to mention that I am just an end-user and not a software developer. I provide information as best I can and to the extent possible in this situation, and I would not spend many hours providing reports as detailed as these unless I believed this was a real issue. 

Remember, removing "vfs objects = io_uring" makes everything work 100% of the time. With Windows File Explorer too.

I do not think many use io_uring by default. Ubuntu does not ship Samba 4.12. Fedora does ship Samba 4.12.1, but has omitted io_uring.so from the install packages. The userbase for io_uring is probably rather small in a relative sense.

PowerShell is not something I am comfortable with. Do you have an example here?
Comment 8 Andrew Bartlett 2020-05-04 07:57:38 UTC
(In reply to mail from comment #5)
Regarding the network capture Jeremy has asked for:

We don't really mind how large it is - if the data is a simple pattern then a modern compression algorithm will make it tiny again.  E.g. tar with the --xz option is a good choice.

I know it is very frustrating to be asked to assist at the edge of your capabilities, but similarly it is very, very frustrating for Jeremy to have a data corruption issue that he simply can't solve!

Data corruption is, after major security dramas, the single thing that keeps a Samba developer up at night, that they can't and won't just let sit unresolved.

So we do appreciate you continuing to help us narrow this down.  The network capture really will help.  Clearly there is something (and there are so many things it could be) different between what you are seeing and what Jeremy can reproduce; we need to work out what that is.

Anything at all that can fully automate the issue would be a big step forward, at least to reliably show that you and Jeremy are performing the same steps, against the same Samba, but (sadly) getting different results.
Comment 9 mail 2020-05-04 08:59:24 UTC
This is what I have done so far:

1) Install Fedora 32 Workstation in VirtualBox (on Windows 10)
2) Created new test files with binary 11111111 (0xFF) repeated.
3) Compiled Samba 4.12.2 from source(*) into /usr/local/samba
4) Set up a minimal smb.conf
#### smb.conf ####
[global]
        workgroup = SAMBA
        security = user
        passdb backend = tdbsam
        load printers = no
[io_uring]
    comment = TEST share with IO_URING
    path = /media/test
    browseable = yes
    read only = yes
    guest only = Yes
    guest ok = Yes
    vfs objects = io_uring

[no_io_uring]
    comment = TEST share without IO_URING
    path = /media/test
    browseable = yes
    read only = yes
    guest only = Yes
    guest ok = Yes
    vfs objects =
########
5) Added local user with "smbpasswd -a fedora" password: fedora
6) Started Samba by running as root "/usr/local/samba/sbin/smbd"
7) Connect to the samba share using File Explorer by browsing to \\192.168.0.130\io_uring\ (user:fedora pass:fedora)
8) In PowerShell issued the following commands:
#####
PS E:\test> cp \\192.168.0.130\io_uring\test-FF\*.bin .
PS E:\test> .\sha256sum.exe 1.bin
6f19f1799f73c875745c43af0b7d6a4fd3078688762a3457ed13a5fa7b3ed79c *1.bin
#####
9) The sha256sum is not correct. At position 0x8000 the file has a zeroed block until 0x10000. Most of the files were like this.
10) Rebooted the VirtualBox machine
11) Tested PowerShell again:
#####
PS E:\test> cp \\192.168.0.130\io_uring\test-FF\*.bin .
PS E:\test> .\sha256sum.exe 1.bin
7899a615e333c749b0204beb73adf1b3304405f592f716872c8fbdbd4be82ed9 *1.bin
PS E:\test> .\sha256sum.exe 15.bin
8702a06044f71e08d7351a7d07ab67522f907ee5e4a9c805c18b6ea1dc8fad3b *15.bin
PS E:\test>
#####
12) This time 1.bin was correct, but 15.bin was not.


I will upload the VirtualBox file to https://mirrors.tnonline.net/?dir=samba in a few minutes. It contains the test files in smb-test.vdi, which is mounted as /media/test 




*.a) https://wiki.samba.org/index.php/Build_Samba_from_Source
*.b) liburing-devel needs to be installed before configuring the source. This was not noted in the wiki.
Comment 10 Andrew Bartlett 2020-05-04 09:17:43 UTC
Thank you so much.  

One other question comes to mind.  Can you give us the exact Windows 10 build numbers, versions, etc?
Comment 11 mail 2020-05-04 10:28:36 UTC
https://mirrors.tnonline.net/?dir=samba/virtualbox is the complete VirtualBox image. I have now tested this image on my work computer, which also produces the corruption.

Windows 10 Enterprise
builds:
* 1809, 17763.1158
* 1909, 18363.778
Comment 12 mail 2020-05-04 10:32:46 UTC
(In reply to mail from comment #11)
The Windows 1909 computer is an AMD Ryzen 2700X 8-core desktop.
The Windows 1809 computer is an Intel i7-8850H 6-core laptop.
Comment 13 Stefan Metzmacher 2020-05-06 10:27:09 UTC
Created attachment 15955 [details]
Patch for debugging

Can you apply this patch and add the "io_uring:check_ff=yes" option?
"log level = 1" would also be good.

Then please upload the resulting log file (typically log.smbd).
Comment 14 Stefan Metzmacher 2020-05-06 12:39:59 UTC
(In reply to Stefan Metzmacher from comment #13)

Please make sure you have "max log size = 0"
Comment 15 Jeremy Allison 2020-05-06 18:47:34 UTC
Anoop has now reproduced this. The underlying issue is that Windows clients don't handle short read returns, over SMB2 at least, and ignore the SMB2_READ.DataLength reply field.

I have a simple patch to the smbd server that shows this.
Comment 16 Jeremy Allison 2020-05-06 18:49:09 UTC
From the mailing list:

-----------------------------------------------------------------------------------
On 06.05.20 at 14:41, Anoop C S wrote:
> On Wed, 2020-05-06 at 12:33 +0200, Stefan Metzmacher wrote:
>> Hi Anoop,
>>
>>> I could reproduce the difference in SHA256 checksum after copying a
>>> directory with 100 copies of test file(provided by reporter) from
>>> io_uring VFS module enabled share using Windows explorer(right-
>>> click-
>>>> copy/paste). Only 5 out of 100 files had correct checksum after
>>>> copy
>>> operation :-/
>>
>> Great! Can you please try to collect level 1 log files with
>> the patch https://bugzilla.samba.org/attachment.cgi?id=15955
>> applied?
>
> I have attached three log files.
> log.io_uring.smbd -- Copy using Windows explorer
> log.io_uring-mget.smd -- Copy using smbclient
> log.io_uring-powershell.smd -- Copy using `Copy-Item`

Thanks! All of them show short reads like:

> [2020/05/06 17:27:28.130248,  1] ../../source3/modules/vfs_io_uring.c:103(vfs_io_uring_finish_req)
>   vfs_io_uring_finish_req: pread ofs=0 (0x0) len=32768 (0x8000) nread=32768 (0x32768) eof=10000000 (0x989680) blks=4096 blocks=19536 dir/1.bin
+ fnum 1607026405
> [2020/05/06 17:27:28.131049,  1] ../../source3/modules/vfs_io_uring.c:103(vfs_io_uring_finish_req)
>   vfs_io_uring_finish_req: pread ofs=9969664 (0x982000) len=30336 (0x7680) nread=30336 (0x30336) eof=10000000 (0x989680) blks=4096 blocks=1953
+6 dir/1.bin fnum 1607026405
> [2020/05/06 17:27:28.133679,  1] ../../source3/modules/vfs_io_uring.c:103(vfs_io_uring_finish_req)
>   vfs_io_uring_finish_req: pread ofs=61440 (0xf000) len=32768 (0x8000) nread=32768 (0x32768) eof=10000000 (0x989680) blks=4096 blocks=19536 di
+r/1.bin fnum 1607026405
> [2020/05/06 17:27:28.140184,  0] ../../source3/modules/vfs_io_uring.c:88(vfs_io_uring_finish_req)
>   vfs_io_uring_finish_req: Invalid pread ofs=0 (0x0) len=1048576 (0x100000) nread=32768 (0x32768) eof=10000000 (0x989680) blks=4096 blocks=195
+36 dir/1.bin fnum 1607026405

It seems the first request is at ofs=0 len=32768 (0x8000) and we get
32768 back.
A follow-up request with ofs=0 len=1048576 (0x100000) only gets the
first 32768 bytes, which are already in the page cache.
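The failure mode analysed above — issuing one read and trusting whatever length comes back — is exactly what a short-read retry loop guards against. As a rough userspace analogy (using `os.pread` on a regular file, which in practice rarely short-reads, so this is illustrative of the retry semantics rather than a reproduction of the kernel io_uring behaviour):

```python
import os

def pread_exact(fd, length, offset):
    """Read exactly `length` bytes at `offset`, retrying on short reads.

    A positional read may legally return fewer bytes than requested even
    when more data exists before EOF; the caller has to resubmit the
    remainder, which is conceptually what the Samba fix teaches
    vfs_io_uring to do.
    """
    chunks = []
    remaining = length
    while remaining > 0:
        data = os.pread(fd, remaining, offset)
        if not data:                # genuine EOF: nothing more to read
            break
        chunks.append(data)
        offset += len(data)
        remaining -= len(data)
    return b"".join(chunks)
```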
Comment 17 Jeremy Allison 2020-05-06 20:31:23 UTC
Just confirmed, MacOS catalina clients suffer from the same problem with ignoring short reads.
Comment 18 Jeremy Allison 2020-05-08 04:08:42 UTC
Created attachment 15960 [details]
PROTOTYPE !!! wip fix for master.

Here is a *MASSIVELY* prototype fix
for the io_uring short read problem. It still
is very ugly and contains extra debugs etc.

It works for me here if I enable the JRATEST
code that forces short reads if read_len > 4096,
but I never was able to reproduce the Windows
issue (my kernel isn't modern enough).

Can you test it on your Windows environment
and see if it fixes the SHA256 checksum problem
after copying the test files ?

If so I'll fix it up to be production-ready
(for example I think I can get rid of the
immediate event usage), fix the pwrite case
and then cut it into bite-sized reviewable chunks.
Comment 19 Jeremy Allison 2020-05-08 05:37:28 UTC
Comment from Anoop on the samba-technical list:

Your patch fixes mismatch in SHA256 checksum of 100 bin files copied
using Windows explorer in my environment.
Comment 20 Jeremy Allison 2020-05-08 06:40:35 UTC
Created attachment 15961 [details]
git-am fix for master.

Cleaned up version that fixes io_uring short pread. If Anoop confirms I'll add in the short pwrite fixes.
Comment 21 Stefan Metzmacher 2020-05-08 12:50:01 UTC
Created attachment 15962 [details]
WIP patch for master

Sorry Jeremy, with your changes the common code is too complex.

I hope you're as happy as I am with my version :-)
I think it's much clearer.

It includes your generic short read patchset with the new 
sys_valid_io_range() helper.

Regarding the pwrite and short writes we need more research
as well as tests for the offset < 0 case.

I also included a HACK/DEBUG patch, that
adds a few parametric options.

The io_uring:nowait_pread=yes option should be able to trigger
the problem even with 5.3/5.4 kernels.

The io_uring:truncate_pread=4096 option should be able to test
the retries reliably.

The io_uring:force_async_pread[_retry]=yes options
should allow us to play a bit with the IOSQE_ASYNC flag,
which may alter the kernel behavior and using this
on the retry may avoid more than 1 retry.
Comment 22 Stefan Metzmacher 2020-05-08 13:03:23 UTC
Created attachment 15964 [details]
WIP patch for master
Comment 23 mail 2020-05-08 13:57:57 UTC
(In reply to Jeremy Allison from comment #18)

I applied this patch to master from git://git.samba.org/samba.git 
In my initial tests, the data corruption seems to be fixed. I'll do some more testing over the weekend. 

Thank you!


EDIT: Saw the new patches, will apply those in further testing over the weekend.
Comment 24 Ralph Böhme 2020-05-08 14:40:06 UTC
(In reply to Stefan Metzmacher from comment #22)
It would be good to get clarification from dochelp on the matter and to get it documented how clients and servers are supposed to handle this.
Comment 25 Jeremy Allison 2020-05-08 16:54:08 UTC
(In reply to Stefan Metzmacher from comment #21)

I do think your patches are nice. However, my patch has the small advantage that it actually works as confirmed by the testers :-) :-).

So I think marking that patch "obsolete" was a little premature :-).

Having said that, your code is cleaner so once you've gotten it working I'm happy for that to be the final version.

I'm not invested in my patchset; I learned a lot about how io_uring works, and that was the big plus for me.

I hope that this now has your full attention, as currently we're shipping 4.12.x with a vfs module that corrupts data, which IMHO is the most important fix we have right now to get finished.
Comment 26 Jeremy Allison 2020-05-08 17:07:09 UTC
(In reply to Stefan Metzmacher from comment #21)

> Sorry Jeremy, with your changes the common code is too complex.

Also, you have to remember I was learning how your code actually worked whilst trying to fix it. My goal, as always, was to find the "minimum necessary change" in order to fix the problem, so making large rewrites wasn't something I thought was desirable. I also had the idea of adding some "test" parameters into the module so we could emulate the buggy kernel behavior on non-buggy kernels (and yes, I consider the change in kernel io_uring behavior to be a bug as it exposes great changes in how the original code behaved to userspace).

As you wrote the original code, doing large cleanup fixes isn't so much of a hurdle for you. Your new code does make it simpler to understand, which is a big plus IMHO. Let's just get it working now :-).
Comment 27 Jeremy Allison 2020-05-08 18:55:21 UTC
(In reply to Stefan Metzmacher from comment #22)

Ok, here is the issue I see with this WIP patch (copied from the mailing list so we have a record here):

I see the problem. It's the reason I initially introduced
an immediate event, and then refactored my code to call
_vfs_io_uring_queue_run() in a loop if it returned 'need_retry'.

Metze's current code is recursive, and it's running out
of stack.

It uses:

vfs_io_uring_pread_send()
        ->vfs_io_uring_pread_submit()  <-----------------------------------
                ->vfs_io_uring_request_submit()                           |
                        ->vfs_io_uring_queue_run()                        |
                                                                          |
But inside vfs_io_uring_queue_run() once a read                           |
completes it calls:                                                       |
                                                                          |
vfs_io_uring_queue_run()                                                  |
        ->vfs_io_uring_finish_req()                                       |
                ->cur->completion_fn()                                    |
                                                                          |
cur->completion_fn() for pread is set to vfs_io_uring_pread_completion()  |
                                                                          |
vfs_io_uring_pread_completion()                                           |
        ->vfs_io_uring_pread_submit() -------------------------------------

(the ascii art will probably only work in fixed-font text mailers
but you should get the idea).

I don't think this pattern will work. You have to exit
from vfs_io_uring_queue_run() all the way out so that
you've called io_uring_cq_advance() to move the completion
queue forward before you can submit another request to
finish the short IO.

That's what my initial patch did with an immediate event,
until I got the idea of wrapping vfs_io_uring_queue_run()
inside a retry wrapper function.

I'll take a look to see if I can fix this so the short
read re-submission happens *outside* of vfs_io_uring_queue_run(),
as I think that's the only way to make this work.
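The pattern Jeremy describes — finish draining the completion queue, advance it, and only then resubmit any short reads — can be sketched with a toy model (all names here, `Request` and `queue_run`, are invented for illustration and are not Samba's actual identifiers):

```python
class Request:
    def __init__(self, total):
        self.total = total      # bytes the client asked for
        self.done = 0           # bytes completed so far

def queue_run(submitted, completions, max_chunk):
    """One drain of a toy completion queue.

    Completions that turn out to be short are collected and resubmitted
    only AFTER the drain loop has finished and the queue has been
    advanced, instead of resubmitting from inside the completion
    callback, which is the recursion described above.
    """
    retries = []
    for req in completions:              # drain every pending completion
        req.done += min(max_chunk, req.total - req.done)
        if req.done < req.total:
            retries.append(req)          # short read: remember, don't recurse
    completions.clear()                  # io_uring_cq_advance() analogue
    for req in retries:                  # resubmit outside the drain loop
        submitted.append(req)
    return retries
```

Driving this in a loop completes a 1 MiB request that is only ever satisfied 32 KiB at a time, with flat stack usage regardless of how many retries are needed.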
Comment 28 Stefan Metzmacher 2020-05-10 08:59:25 UTC
Created attachment 15965 [details]
WIP patch for master
Comment 29 Stefan Metzmacher 2020-05-11 20:58:51 UTC
Created attachment 15969 [details]
Possible patch for master (should also apply to 4.12)

Can someone please verify this (most likely final) patch?

Thanks!

A pipeline is running at
https://gitlab.com/samba-team/samba/-/merge_requests/1328

But that doesn't test the io_uring module (yet).
Comment 30 Jeremy Allison 2020-05-11 23:13:20 UTC
Comment on attachment 15969 [details]
Possible patch for master (should also apply to 4.12)

Missing the expanded comments/explanation on the recursion fix. Can you re-add those?
Comment 31 Jeremy Allison 2020-05-11 23:14:04 UTC
Comment on attachment 15969 [details]
Possible patch for master (should also apply to 4.12)

Missing the recursion comments and explanation. Can you re-add those please?
Comment 32 Jeremy Allison 2020-05-11 23:41:53 UTC
Created attachment 15971 [details]
Possible patch for master (should also apply to 4.12) - comment updated

Updated version with restored comments.
Comment 33 Jeremy Allison 2020-05-12 21:42:44 UTC
Created attachment 15976 [details]
git-am fix for 4.12.next.

Clean cherry-pick from master.
Comment 34 Karolin Seeger 2020-05-13 10:33:27 UTC
Pushed to autobuild-v4-12-test.
Comment 35 Karolin Seeger 2020-05-17 05:16:38 UTC
(In reply to Karolin Seeger from comment #34)
Pushed to v4-12-test.
Closing out bug report.

Thanks!
Comment 36 mail 2020-05-23 17:14:04 UTC
On 2020-05-17 07:16, samba-bugs@samba.org wrote:
> https://bugzilla.samba.org/show_bug.cgi?id=14361
>
> Karolin Seeger <kseeger@samba.org> changed:
>
>             What    |Removed                     |Added
> ----------------------------------------------------------------------------
>           Resolution|---                         |FIXED
>               Status|ASSIGNED                    |RESOLVED
>
> --- Comment #35 from Karolin Seeger <kseeger@samba.org> ---
> (In reply to Karolin Seeger from comment #34)
> Pushed to v4-12-test.
> Closing out bug report.
>
> Thanks!
>
The new samba-4.12.3 which has this fix passes my tests. So thank you 
very much :)

Have a nice weekend!