Created attachment 16109 [details]
patch file based on Samba 4.11.9

If a Windows client sends change notification requests n times for the same folder, smbd-notifyd triggers at least n*n messages to the requesting smbd process when something happens to that folder. We noticed this issue multiple times when the memory usage of the smbd-notifyd process was huge; in one case it was 250GiB. We eventually reproduced the issue by working with the end user who hit it.

I understand it is unusual for a client to send change notification requests for the same folder again and again, but a node.js component named "webpack-dev-server" actually does this.

I have come up with a potential fix and verified that it resolves our problem. Please see the attached diff file.
Can you clarify exactly what the client is doing to trigger this? My understanding is that the client, on the same connection, is repeatedly issuing identical change-notify requests on an open handle. Is that correct? It would be good to see a wireshark trace of this so we can create a regression test to reproduce it. Thanks!
Created attachment 16115 [details]
pcap file showing many change notification requests on the same folder
(In reply to Jeremy Allison from comment #1)

A pcap file is attached showing how a Windows client sends 3000+ change notification requests on the same folder. No change was triggered on that folder, as I didn't want to receive 9,000,000 responses. This issue has been reproduced on both Linux and illumos systems. I also have a Windows C program which can be used to reproduce the issue reliably; please let me know if it is needed. Thanks!
The Windows C program would be really helpful also - thanks !
Created attachment 16116 [details]
Patch to open many notifies on a single connection and directory

Attached is a work-in-progress patch that implements the important bits of the problematic network trace.

smbtorture3 //127.0.0.1/tmp -Uuser%pass notify-bench4 -o 2000

opens 2000 handles with pending notifies and goes to sleep after that. It is just an initial step towards further analysis, which I don't have time for at this moment. So this is just a snapshot that might help others take a closer look before I can return to this case.
Created attachment 16117 [details]
Fixed torture patch

Oops, wrong patch posted. I had uncommitted changes in my working tree that did not make it into attachment 16116. Sorry for the hiccup.
Created attachment 16118 [details]
Windows C program for reproducing the issue

The requested C program is attached. Thanks.
Thanks for providing the Windows C program for reproducing the issue. I think we are hitting the same trouble as described above. I can confirm this behaviour on RHEL 9.2 / Samba 4.17.5; it can be demonstrated with the demo code.

Steps to reproduce from a Windows 10 client (I used Visual Studio 2022 to compile the Windows demo tool, which I named WatchDirectorySendChangeNotifications.exe; the Samba share is connected as drive W:):

1. Start the demonstration tool: "WatchDirectorySendChangeNotifications.exe W:\Allg"
2. Start the demonstration tool a second time: "WatchDirectorySendChangeNotifications.exe W:\Allg"
3. Now the user changes something, e.g. "touch W:\Allg\p130", which only updates the timestamp of the W:\Allg\p130 folder.

=> Within 1-3 seconds this leads to a huge amount (> 10GB) of memory allocated by smbd-notifyd, which then gets killed by the oom-killer.

Is there any progress on this? I reached out to Red Hat support and filed a support case (#03845049) there to hopefully get this fixed.
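For anyone without access to the attachment, the demo tool presumably boils down to something like the following Win32-only sketch (hypothetical; the attached program's exact code may differ). Each FindFirstChangeNotification call leaves a separate change-notify request pending on the server, and running the tool twice doubles the number of watchers on the same directory:

```c
#include <windows.h>
#include <stdio.h>

#define NUM_WATCHES 512 /* illustrative; the real tool may use another count */

int main(int argc, char **argv)
{
    HANDLE h[NUM_WATCHES];
    int i;

    if (argc < 2) {
        fprintf(stderr, "usage: %s <directory>\n", argv[0]);
        return 1;
    }

    /* Register many independent change-notification watches on the same
     * directory. Over SMB, each one is a separate pending CHANGE_NOTIFY
     * request on the server side. */
    for (i = 0; i < NUM_WATCHES; i++) {
        h[i] = FindFirstChangeNotificationA(argv[1], FALSE,
                                            FILE_NOTIFY_CHANGE_LAST_WRITE);
        if (h[i] == INVALID_HANDLE_VALUE) {
            fprintf(stderr, "watch %d failed: %lu\n", i, GetLastError());
            break;
        }
    }

    /* Keep all watches pending; any change to the directory now fans out
     * to every registered watcher at once. */
    Sleep(INFINITE);
    return 0;
}
```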
Jeremy, could you look into this?
Created attachment 18372 [details]
network trace of windows reproducer
Created attachment 18373 [details]
Network trace running the changenotify.exe against Windows Server
Looking just at Windows Server, there is a limit of 511 change notifications you can register. For each additional one you request, you get NT_STATUS_INSUFFICIENT_RESOURCES.
Set-SmbServerConfiguration -AsynchronousCredits <num> doesn't affect it.
Created attachment 18374 [details]
trace of changenotify repro against windows with 512

This is `changenotify.exe z:/source3 8`, which results in 512 change notification requests. Windows handles the first 511 but stops at 512, returning NT_STATUS_INSUFFICIENT_RESOURCES for the 512th and every subsequent request.
The limit of 512 is per connection. If you open a new connection you can get another 511 change notification requests. (I used `net use <hostname>` and `net use <ip>` to create two connections).
I already tried to set "smb2 max credits = 512" in [global] (the default is 8192) as a workaround, but this didn't improve the situation.

A single client that starts the demo tool "WatchDirectorySendChangeNotifications.exe" and then touches a directory is enough to make smbd-notifyd consume more than 11GB of RAM, which leads to an OOM kill on a machine with 12GB of RAM.

In my opinion this is not just a simple bug but a severe denial-of-service security issue: using the demo tool, a single user can kill Samba at any time.
Set-SmbServerConfiguration -AsynchronousCredits <num> defines how many change notification handles you can register per connection on Windows.
If you register 512 watchers on the same directory and then trigger a notification with `touch some/file/in/watched/dir`, smbd-notifyd starts generating messages:

full talloc report on 'null_context' (total 141578278 bytes in 1568355 blocks)
    struct pdb_methods contains 544 bytes in 1 blocks (ref 0) 0x5620122b89b0
    struct messaging_dgm_context contains 141471086 bytes in 1566926 blocks (ref 0) 0x56201229d980
        struct messaging_dgm_out contains 141470654 bytes in 1566922 blocks (ref 0) 0x5620122df8c0
            struct tevent_req contains 632 bytes in 7 blocks (ref 0) 0x5620258070e0
                struct tevent_timer contains 104 bytes in 1 blocks (ref 0) 0x562025807590
                struct tevent_queue_entry contains 80 bytes in 1 blocks (ref 0) 0x5620258074d0
                struct messaging_dgm_out_queue_state contains 168 bytes in 3 blocks (ref 0) 0x5620258072c0
                    int contains 0 bytes in 1 blocks (ref 0) 0x562025807460
                    uint8_t contains 96 bytes in 1 blocks (ref 0) 0x562025807370
                struct tevent_immediate contains 112 bytes in 1 blocks (ref 0) 0x5620258071f0
            struct tevent_req contains 632 bytes in 7 blocks (ref 0) 0x562025806b40
                struct tevent_timer contains 104 bytes in 1 blocks (ref 0) 0x562025806ff0
                struct tevent_queue_entry contains 80 bytes in 1 blocks (ref 0) 0x562025806f30
                struct messaging_dgm_out_queue_state contains 168 bytes in 3 blocks (ref 0) 0x562025806d20
                    int contains 0 bytes in 1 blocks (ref 0) 0x562025806ec0
                    uint8_t contains 96 bytes in 1 blocks (ref 0) 0x562025806dd0
                struct tevent_immediate contains 112 bytes in 1 blocks (ref 0) 0x562025806c50

$ grep -c 'messaging_dgm_out_queue_state' smbd-notifyd-trigger.log
223845

Note that I called `smbcontrol <notifyd> pool-usage` somewhere in the middle, but there are already 223845 messages queued. We have 512 change-notify watchers, so that is roughly the number of messages I would expect; instead we have **500 times** that amount (probably more).

Why does it generate so many messages? What are those messages about?
With 1 watcher it creates 1 message.
With 2 watchers it creates 4 messages.
With 4 watchers it creates 16 messages.
...
I have a first patchset addressing the issue in notifyd. I used the Windows reproducer with 4096 change notifications. On my VM it takes 2.8 seconds to register them and 1.1 seconds to deliver 4096 change notifications to the client.

The next step is to implement MaxAsynchronousCredits in smbd.
(In reply to Andreas Schneider from comment #20) Thanks, that's great news!
This bug was referenced in samba master: af011b987a4ad0d3753d83cc0b8d97ad64ba874a
Created attachment 18455 [details]
patch for 4.21
This bug was referenced in samba master:
52079c8d911a3d5608333a59593249020a0d5b48
2856e74998492d17eaf2dc4efc8259f95065262f
e2969ed00fd85d5b2bf62d2a17643b5f8481b278
6cb46e5cb60ce753cac55a8ba19f5cffb697d545
3884a7c085f7fe0f32a5801e0abc7d63a95a6433
c15b3f9ee7ccd375cf721c63f6947d60235a6e78
a3e74a1e58dff3327b2b6f63ecf044d1cf8f47bb
64168ebe8871e0d15d644966acb54af41f8f74af
b0f287c4dffb9dea301d52342a433b5b94ac4009
adbacb85744d06477254a478853b6864c32c3819
ae2a9cb75ddc9e7cedf1744a434edc583dd2a0a6
2a90b06679ae6415c4d60dc14a17385da85668c6
bb26e104d6466bdc153508d670370c5d54fc032e
cbadfaaf3d90db871a13e6423a68a643b0ff99e9
1260fcb61c83e4e66d6e32be8d587821c983f090