I'm testing Samba as a backup target for Proxmox 7.3 vzdump backups over 10GigE. Without "oplocks = no" (i.e. with default settings), smbd processes go nuts, grow to well over 5 GB in size and then all get killed by the OOM killer.

I'm using the Debian 11 default smb.conf with only a basic share added, like:

[sharename]
   valid users = username
   writeable = yes
   path = /path/to/zfs/dataset

I found the problem goes away when setting "oplocks = no" (via https://www.taste-of-it.de/samba-smb-process-crashes-with-memory-leak/ ). More details are being reported here: https://forum.proxmox.com/threads/smbd-memory-leak.119199/

I cannot believe this is correct behaviour. Unix processes should not receive data from the network without limit until they either burst or get killed externally, no matter how much throughput the network or the storage can deliver.
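For reference, a minimal sketch of the workaround as it could be applied (whether it goes in [global] or in the share section shouldn't matter for this test; the names and paths are the placeholders from above):

    [global]
       # workaround: with oplocks disabled, the runaway smbd memory growth does not occur
       oplocks = no

    [sharename]
       path = /path/to/zfs/dataset
       valid users = username
       writeable = yes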
This smells like a known issue: basically our async SMB read/write processing is subtly broken and lets the client run loose, outside the control of the SMB crediting mechanism that is supposed to throttle clients. We should likely NOT return async interim responses to read and write SMB requests, but that's what we currently do. This needs some decent research and validation, and unfortunately so far no one has put the required resources into this. Would you be able to test a simple patch on top of the sources of your Samba version?
>Would you be able to test a simple patch on top of the sources of your Samba version?

Yes, I can give it a try and would like to help resolve this.
Created attachment 17683 [details]
WIP patch for 4.13

WIP patch that should do the trick. Needs more research to compare against Windows behaviour. There's likely already another bug report that discusses this.
I tested the patch and I see no real difference. I could only test with one remote writer, but I see the VSZ climb to >5 GB and the RSS to >1.5 GB.
>There's likely already another bug report that discusses this.

I guess you meant this? https://lists.samba.org/archive/samba/2021-September/237262.html

It seems there is no entry in bugzilla for this.
(In reply to roland from comment #4) Oh, so now what? :) I'd say as 4.13 is EOL the next sensible step would be updating to 4.17 to check whether the issue is still present in the latest release.
ok. will test and report
The problem also happens with Samba 4.17.3-Debian.
(In reply to roland from comment #8)

That's unfortunate. You can follow the instructions in https://lists.samba.org/archive/samba/2021-September/237295.html and check the `smbcontrol PID pool-usage > pool-usage.txt` output to see whether the memory consumption is really caused by the IO buffers. If not, someone has to take a closer look at the pool-usage output. You can also try the big hammer of disabling async IO as described at the end of the mail linked above. Maybe it's something else altogether.
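Roughly, the steps would be along these lines (the PID is a placeholder for the smbd worker serving your client; the aio settings below are one common way to disable async IO, the linked mail may suggest slightly different knobs):

    # find the smbd process serving the client, then dump its talloc pool report
    smbstatus -p
    smbcontrol <PID> pool-usage > pool-usage.txt

    # "big hammer": disable async IO for the share in smb.conf
    [sharename]
       aio read size = 0
       aio write size = 0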
I have pasted the pool-usage of an smbd process of >1 GB at https://paste.debian.net/1263460/ and I see no pthreadpool_tevent_job_state entries.

>You can also try the big hammer of disabling async IO as
>described at the end of the mail linked above. Maybe it's
>something else altogether.

Mind that oplocks=no seems to resolve the problem. I don't like disabling async IO, as there is ZFS underneath and I have no ZIL with that pool to accelerate sync writes.
Mhh, I don't see anything in that. Maybe I need to issue the command while the process is growing, i.e. while data is in flight, and not afterwards. Will do later on.
(In reply to roland from comment #10) Nothing in the talloc report. Next would be running smbd under valgrind with memcheck.
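Roughly along these lines (log path is just an example; stop the regular service first and adjust for the Debian packaging):

    systemctl stop smbd
    valgrind --tool=memcheck --leak-check=full --log-file=/tmp/smbd-valgrind.log \
        /usr/sbin/smbd --foreground --no-process-group --debuglevel=1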
So, while data is in flight things look different. I could not upload to pastebin because the file was too large, so I uploaded it to https://www.file-upload.net/download-15056180/pool-usage3.txt.html

There are 1949 occurrences of the following structs: aio_extra, aio_req_fsp_link, pthreadpool_tevent_job_state, pwrite_fsync_state, smbd_smb2_request, smbd_smb2_write_state, smb_request, smb_vfs_call_pwrite_state
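For reference, the counts can be reproduced roughly like this from the uploaded file (plain substring matching, so the numbers are approximate):

    for s in aio_extra aio_req_fsp_link pthreadpool_tevent_job_state \
             pwrite_fsync_state smbd_smb2_request smbd_smb2_write_state \
             smb_request smb_vfs_call_pwrite_state; do
        printf '%s: ' "$s"
        grep -c "$s" pool-usage3.txt
    done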
Shouldn't this problem be reproducible with gigabit networking if the storage where the Samba share resides is slow enough? If so, I would like to try reproducing it that way.

To be honest, I think this bug is quite serious, and I'm wondering why there are not more users affected. Samba is such a popular tool and so widely used.
I could reproduce the problem with a virtual machine on Proxmox exporting a Samba share, mounted on the host via a bridge interface, i.e. without a physical NIC in between. The virtual machine's network has been throttled to 1 Gbit and its virtual disk has been throttled to a lower bandwidth of <100 MB/s. I could reproduce the problem with these settings, so this should not be a high-performance-network-only bug.
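For reference, the throttling can be applied with something along these lines (VM ID, bridge, storage and disk names are placeholders; this is just one way to set the limits described above):

    # limit the VM's NIC to roughly 1 Gbit (rate is in MB/s)
    qm set 100 --net0 virtio,bridge=vmbr0,rate=125
    # limit write bandwidth on the VM's disk to well below 100 MB/s
    qm set 100 --scsi0 local-zfs:vm-100-disk-0,mbps_wr=90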
The new pool report confirms pretty much what Ralph said in comment 1. We should not turn smb2_read and smb2_write requests async in the smb2 sense, even if we do process them asynchronously internally. Could you test a patch if we sent it to you?
By the way, Ralph attached the patch in comment 3. This still applies to master and thus 4.17. Can you give that a try?
Thanks. I gave it a try and I can still reproduce the problem.
(In reply to roland from comment #18)
> Thanks. I gave it a try and I can still reproduce the problem.

With that patch in place, does setting "smb2 max credits = 512" make a difference? The default is 8192, which means 8192*64k worth of buffers. If you play with the credits and set them much lower, this should throttle the clients.
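For reference, that's a global smb.conf setting; with the 64k granularity mentioned above, 512 credits correspond to roughly 32 MB of outstanding buffers per connection instead of the default ~512 MB:

    [global]
       smb2 max credits = 512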
Yes, 512 seems to make a difference; it's much harder to push smbd to excessive memory usage. I was able to push it to 1 GB RSS and 5 GB VSZ with this, but no further. I'm testing with a single Linux cifs client with multiple dd writers, writing zeroes or garbage to the Samba share.
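Roughly how I'm generating the load (exact sizes vary; server, share, mount point and credentials are placeholders):

    mount -t cifs //server/sharename /mnt/smbtest -o username=username
    for i in 1 2 3 4; do
        dd if=/dev/zero of=/mnt/smbtest/zero$i.bin bs=1M count=20000 &
    done
    wait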
I can reproduce this on Samba 4.17.12 and 4.19.5 (Debian Stable), by using cp to copy a ~50GB file to a cifs-mounted samba share on a low-power ARM server. The target filesystem is on an encrypted volume of a relatively slow NAS disk, which I imagine leads to some backpressure on writes. Gigabit ethernet. When smbd's memory usage pushes into swap, which is on the same disk, the transfer drags to a crawl. If it's allowed to continue for more than a few minutes, the client's cp process becomes effectively unkillable. The problem is avoided with oplocks = no, as reported here: https://www.omnespro.ch/post/samba-extremer-ram-fussabdruck-bei-grossen-dateien/