Created attachment 16109 [details] patch file based on Samba 4.11.9 If a Windows client sends change notification requests n times for the same folder, smbd-notifyd triggers at least n*n messages to the requesting smbd process when something happens to the folder. We noticed this issue multiple times when the memory usage of smbd-notifyd process was huge, in one case it was 250GiB. Eventually the issue was reproduced by working with the end user who experienced the issue. I understand it is unusual for a client to send change notification requests for the same folder again and again, but a node.js component named "webpack-dev-server" actually does it. I have come up with a potential fix to the issue and verified that it fixes our problem. Please see the attached diff file.
Can you clarify exactly what the client is doing to trigger this ? My understanding is that the client, on the same connection, is repeatedly issuing identical change-notify requests on an open handle ? Is that correct ? It would be good to see a wireshark trace of this so we can create a regression test to reproduce this. Thanks !
Created attachment 16115 [details] pcap file showing many change notification requests on the same folder
(In reply to Jeremy Allison from comment #1) A pcap file was attached to show how a Windows client sends 3000+ change notification requests on the same folder. No change was triggered on that folder as I didn't want to receive 9,000,000 responses. This issue has been reproduced on both Linux and illumos system. I also have a Windows C program which can be used to reproduce the issue reliably, please let me know if it is needed. Thanks!
The Windows C program would be really helpful also - thanks !
Created attachment 16116 [details] Patch to open many notifies on a single connection and directory Attached find a work-in-progress patch that implements important bits of the problematic network trace. smbtorture3 //127.0.0.1/tmp -Uuser%pass notify-bench4 -o 2000 opens 2000 handles and notifies and goes to sleep after that. It is just an initial step towards further analysis, which I don't have time for at this moment. So this is just a snapshot that might help others to take a closer look before I can return to this case.
Created attachment 16117 [details] Fixed torture patch Ooops, wrong patch posted. I had uncommitted changes in my working tree that did not make it to patch 16166. Sorry for the hickup.
Created attachment 16118 [details] Windows C program for reproducing the issue The requested C program was attached. Thanks.
Thanks for providing the Windows C program for reproducing the issue. I think we have same troubles as described above. I can confirm this behaviour on RHEL 9.2 / Samba 4.17.5 ... can be demonstrated with this Demo-Code. Steps to reproduce from a Windows 10 Client: I used Visual Studio 2022 to Compile this Windows-Demo-Tool, I named it WatchDirectorySendChangeNotifications.exe Samba-Share connected Drive W: 1. Starting the Demonstration-Tool "WatchDirectorySendChangeNotifications.exe W:\Allg" 2. Starting the Demonstration-Tool "WatchDirectorySendChangeNotifications.exe W:\Allg" a second time 3. Now the User changes something, using "touch W:\Allg\p130" which only updates the Timestamp of this W:\Allg\p130 Folder => immediately after 1-3 Seconds leads to a huge amount (> 10GB) of memory allocated by smbd-notifyd! => gets killed by oom-killer. Is there any progress on this? I reached out to Redhat Support and filed a support case (#03845049) there to hopefully get this fixed.
Jeremy, could you look into this?
Created attachment 18372 [details] network trace of windows reproducer
Created attachment 18373 [details] Network trace running the changenotify.exe against Windows Server
If I just look at Windows Server, then there is a limit of 511 change notifications you can register. For each additional you try you get NT_STATUS_INSUFFICIENT_RESOURCES.
Set-SmbServerConfiguration -AsynchronousCredits <num> doesn't affect it.
Created attachment 18374 [details] trace of changenotify repo against windows with 512 This is changenotify.exe z:/source3 8 - which results in 512 change notification requests. Windows handles 511 but will stop at 512 and return NT_STATUS_INSUFFICIENT_RESOURCES for each request if the client requests >= 512
The limit of 512 is per connection. If you open a new connection you can get another 511 change notification requests. (I used `net use <hostname>` and `net use <ip>` to create two connections).
I already tried to set "smb2 max credits = 512" in [global] (default is 8192) as a workaround, but this didn't help. But didn't improve the situation. One single client who starts the Demo-Tool "WatchDirectorySendChangeNotifications.exe" and touches a directory afterwards is enough to let smbd-notifyd consume >11GB of ram which leads to a OOM kill on a 12GB Ram Machine. In my opinion this is not only a simple bug but a severe denial-of-service security-issue. Using the demo-tool you can just kill samba as a single user at any time.
Set-SmbServerConfiguration -AsynchronousCredits <num> defines how many change notifcation handles you can register per connection on Windows.
If you register 512 watchers on the same directory and then trigger a notification by `touch some/file/in/watched/dir`, smbd-notifyd starts generating messages. ull talloc report on 'null_context' (total 141578278 bytes in 1568355 blocks) struct pdb_methods contains 544 bytes in 1 blocks (ref 0) 0x5620122b89b0 struct messaging_dgm_context contains 141471086 bytes in 1566926 blocks (ref 0) 0x56201229d980 struct messaging_dgm_out contains 141470654 bytes in 1566922 blocks (ref 0) 0x5620122df8c0 struct tevent_req contains 632 bytes in 7 blocks (ref 0) 0x5620258070e0 struct tevent_timer contains 104 bytes in 1 blocks (ref 0) 0x562025807590 struct tevent_queue_entry contains 80 bytes in 1 blocks (ref 0) 0x5620258074d0 struct messaging_dgm_out_queue_state contains 168 bytes in 3 blocks (ref 0) 0x5620258072c0 int contains 0 bytes in 1 blocks (ref 0) 0x562025807460 uint8_t contains 96 bytes in 1 blocks (ref 0) 0x562025807370 struct tevent_immediate contains 112 bytes in 1 blocks (ref 0) 0x5620258071f0 struct tevent_req contains 632 bytes in 7 blocks (ref 0) 0x562025806b40 struct tevent_timer contains 104 bytes in 1 blocks (ref 0) 0x562025806ff0 struct tevent_queue_entry contains 80 bytes in 1 blocks (ref 0) 0x562025806f30 struct messaging_dgm_out_queue_state contains 168 bytes in 3 blocks (ref 0) 0x562025806d20 int contains 0 bytes in 1 blocks (ref 0) 0x562025806ec0 uint8_t contains 96 bytes in 1 blocks (ref 0) 0x562025806dd0 struct tevent_immediate contains 112 bytes in 1 blocks (ref 0) 0x562025806c50 grep -c 'messaging_dgm_out_queue_state' smbd-notifyd-trigger.log 223845 Note that I called `smbcontrol <notifyd> pool-usage` somewhere in the middle, but there are already 223845 messages queued. We have 512 change notify watchers, so I would expect this amount of messages, however we have **500 times** the amount of messages (probably more). Why does it generate so many messages? What are those message about?
With 1 watcher it creates 1 message Wtih 2 watchers it creates 4 messages With 4 watchers it creates 16 messages ...
I have a first patchset addressing the issue in notifyd. I used the Windows reproducer and with 4096 change notifications. On my VM it takes 2.8 seconds to register them, and 1.1 seconds to deliver 4096 change notifications to the client. The next step is to implement MaxAsynchronousCredits in smbd.
(In reply to Andreas Schneider from comment #20) Thanks, that's great news!
This bug was referenced in samba master: af011b987a4ad0d3753d83cc0b8d97ad64ba874a
Created attachment 18455 [details] patch for 4.21