Bug 14430 - smbd-notifyd O(n*n) performance issue with n watches registered for the same folder
Summary: smbd-notifyd O(n*n) performance issue with n watches registered for the same ...
Status: ASSIGNED
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: Other (show other bugs)
Version: 4.11.9
Hardware: All All
: P5 critical (vote)
Target Milestone: ---
Assignee: Andreas Schneider
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2020-07-02 22:22 UTC by YOUZHONG YANG
Modified: 2024-10-02 08:11 UTC (History)
8 users (show)

See Also:


Attachments
patch file based on Samba 4.11.9 (2.20 KB, text/plain)
2020-07-02 22:22 UTC, YOUZHONG YANG
no flags Details
pcap file showing many change notification requests on the same folder (2.72 MB, application/vnd.tcpdump.pcap)
2020-07-07 02:45 UTC, YOUZHONG YANG
no flags Details
Patch to open many notifies on a single connection and directory (5.94 KB, patch)
2020-07-07 12:32 UTC, Volker Lendecke
no flags Details
Fixed torture patch (9.02 KB, patch)
2020-07-07 12:35 UTC, Volker Lendecke
no flags Details
Windows C program for reproducing the issue (3.15 KB, text/plain)
2020-07-07 15:42 UTC, YOUZHONG YANG
no flags Details
network trace of windows reproducer (3.72 MB, application/octet-stream)
2024-07-10 13:03 UTC, Andreas Schneider
no flags Details
Network trace running the changenotify.exe against Windows Server (3.30 MB, application/octet-stream)
2024-07-10 13:48 UTC, Andreas Schneider
no flags Details
trace of changenotify repo against windows with 512 (443.32 KB, application/octet-stream)
2024-07-10 14:21 UTC, Andreas Schneider
no flags Details
patch for 4.21 (14.44 KB, patch)
2024-10-02 08:11 UTC, Andreas Schneider
asn: review? (metze)
Details

Note You need to log in before you can comment on or make changes to this bug.
Description YOUZHONG YANG 2020-07-02 22:22:58 UTC
Created attachment 16109 [details]
patch file based on Samba 4.11.9

If a Windows client sends change notification requests n times for the same folder, smbd-notifyd triggers at least n*n messages to the requesting smbd process when something happens to the folder.

We noticed this issue multiple times when the memory usage of smbd-notifyd process was huge, in one case it was 250GiB. Eventually the issue was reproduced by working with the end user who experienced the issue.

I understand it is unusual for a client to send change notification requests for the same folder again and again, but a node.js component named "webpack-dev-server" actually does it.

I have come up with a potential fix to the issue and verified that it fixes our problem. Please see the attached diff file.
Comment 1 Jeremy Allison 2020-07-07 00:58:18 UTC
Can you clarify exactly what the client is doing to trigger this ?

My understanding is that the client, on the same connection, is repeatedly issuing identical change-notify requests on an open handle ?

Is that correct ? It would be good to see a wireshark trace of this so we can create a regression test to reproduce this.

Thanks !
Comment 2 YOUZHONG YANG 2020-07-07 02:45:42 UTC
Created attachment 16115 [details]
pcap file showing many change notification requests on the same folder
Comment 3 YOUZHONG YANG 2020-07-07 02:50:49 UTC
(In reply to Jeremy Allison from comment #1)
A pcap file was attached to show how a Windows client sends 3000+ change notification requests on the same folder. No change was triggered on that folder as I didn't want to receive 9,000,000 responses.

This issue has been reproduced on both Linux and illumos system.

I also have a Windows C program which can be used to reproduce the issue reliably, please let me know if it is needed.

Thanks!
Comment 4 Jeremy Allison 2020-07-07 04:39:26 UTC
The Windows C program would be really helpful also - thanks !
Comment 5 Volker Lendecke 2020-07-07 12:32:27 UTC
Created attachment 16116 [details]
Patch to open many notifies on a single connection and directory

Attached find a work-in-progress patch that implements important bits of the problematic network trace.

smbtorture3 //127.0.0.1/tmp -Uuser%pass notify-bench4 -o 2000

opens 2000 handles and notifies and goes to sleep after that. It is just an initial step towards further analysis, which I don't have time for at this moment. So this is just a snapshot that might help others to take a closer look before I can return to this case.
Comment 6 Volker Lendecke 2020-07-07 12:35:27 UTC
Created attachment 16117 [details]
Fixed torture patch

Ooops, wrong patch posted. I had uncommitted changes in my working tree that did not make it to patch 16166. Sorry for the hickup.
Comment 7 YOUZHONG YANG 2020-07-07 15:42:27 UTC
Created attachment 16118 [details]
Windows C program for reproducing the issue

The requested C program was attached. Thanks.
Comment 8 Gunnar Haslinger 2024-06-13 09:13:27 UTC
Thanks for providing the Windows C program for reproducing the issue.

I think we have same troubles as described above.

I can confirm this behaviour on RHEL 9.2 / Samba 4.17.5 ... can be demonstrated with this Demo-Code. 

Steps to reproduce from a Windows 10 Client:

I used Visual Studio 2022 to Compile this Windows-Demo-Tool, I named it WatchDirectorySendChangeNotifications.exe

Samba-Share connected Drive W:

1. Starting the Demonstration-Tool "WatchDirectorySendChangeNotifications.exe W:\Allg"

2. Starting the Demonstration-Tool "WatchDirectorySendChangeNotifications.exe W:\Allg" a second time

3. Now the User changes something, using "touch W:\Allg\p130" which only updates the Timestamp of this W:\Allg\p130 Folder

=> immediately after 1-3 Seconds leads to a huge amount (> 10GB) of memory allocated by smbd-notifyd! => gets killed by oom-killer.


Is there any progress on this?

I reached out to Redhat Support and filed a support case (#03845049) there to hopefully get this fixed.
Comment 9 Andreas Schneider 2024-06-13 15:40:45 UTC
Jeremy, could you look into this?
Comment 10 Andreas Schneider 2024-07-10 13:03:10 UTC
Created attachment 18372 [details]
network trace of windows reproducer
Comment 11 Andreas Schneider 2024-07-10 13:48:44 UTC
Created attachment 18373 [details]
Network trace running the changenotify.exe against Windows Server
Comment 12 Andreas Schneider 2024-07-10 14:05:05 UTC
If I just look at Windows Server, then there is a limit of 511 change notifications you can register. For each additional you try you get NT_STATUS_INSUFFICIENT_RESOURCES.
Comment 13 Andreas Schneider 2024-07-10 14:12:22 UTC
Set-SmbServerConfiguration -AsynchronousCredits <num> doesn't affect it.
Comment 14 Andreas Schneider 2024-07-10 14:21:53 UTC
Created attachment 18374 [details]
trace of changenotify repo against windows with 512

This is changenotify.exe z:/source3 8 - which results in 512 change notification requests. Windows handles 511 but will stop at 512 and return NT_STATUS_INSUFFICIENT_RESOURCES for each request if the client requests >= 512
Comment 15 Andreas Schneider 2024-07-10 14:29:59 UTC
The limit of 512 is per connection. If you open a new connection you can get another 511 change notification requests. (I used `net use <hostname>` and `net use <ip>` to create two connections).
Comment 16 Gunnar Haslinger 2024-07-10 14:35:13 UTC
I already tried to set "smb2 max credits = 512" in [global] (default is 8192) as a workaround, but this didn't help. 

But didn't improve the situation.

One single client who starts the Demo-Tool "WatchDirectorySendChangeNotifications.exe" and touches a directory afterwards is enough to let smbd-notifyd consume >11GB of ram which leads to a OOM kill on a 12GB Ram Machine.

In my opinion this is not only a simple bug but a severe denial-of-service security-issue. Using the demo-tool you can just kill samba as a single user at any time.
Comment 17 Andreas Schneider 2024-07-11 07:32:53 UTC
Set-SmbServerConfiguration -AsynchronousCredits <num> defines how many change notifcation handles you can register per connection on Windows.
Comment 18 Andreas Schneider 2024-07-11 10:40:31 UTC
If you register 512 watchers on the same directory and then trigger a notification by `touch some/file/in/watched/dir`, smbd-notifyd starts generating messages.

ull talloc report on 'null_context' (total 141578278 bytes in 1568355 blocks)
    struct pdb_methods             contains    544 bytes in   1 blocks (ref 0) 0x5620122b89b0
    struct messaging_dgm_context   contains 141471086 bytes in 1566926 blocks (ref 0) 0x56201229d980
        struct messaging_dgm_out       contains 141470654 bytes in 1566922 blocks (ref 0) 0x5620122df8c0
            struct tevent_req              contains    632 bytes in   7 blocks (ref 0) 0x5620258070e0
                struct tevent_timer            contains    104 bytes in   1 blocks (ref 0) 0x562025807590
                struct tevent_queue_entry      contains     80 bytes in   1 blocks (ref 0) 0x5620258074d0
                struct messaging_dgm_out_queue_state contains    168 bytes in   3 blocks (ref 0) 0x5620258072c0
                    int                            contains      0 bytes in   1 blocks (ref 0) 0x562025807460
                    uint8_t                        contains     96 bytes in   1 blocks (ref 0) 0x562025807370
                struct tevent_immediate        contains    112 bytes in   1 blocks (ref 0) 0x5620258071f0
            struct tevent_req              contains    632 bytes in   7 blocks (ref 0) 0x562025806b40
                struct tevent_timer            contains    104 bytes in   1 blocks (ref 0) 0x562025806ff0
                struct tevent_queue_entry      contains     80 bytes in   1 blocks (ref 0) 0x562025806f30
                struct messaging_dgm_out_queue_state contains    168 bytes in   3 blocks (ref 0) 0x562025806d20
                    int                            contains      0 bytes in   1 blocks (ref 0) 0x562025806ec0
                    uint8_t                        contains     96 bytes in   1 blocks (ref 0) 0x562025806dd0
                struct tevent_immediate        contains    112 bytes in   1 blocks (ref 0) 0x562025806c50

grep -c 'messaging_dgm_out_queue_state' smbd-notifyd-trigger.log
223845

Note that I called `smbcontrol <notifyd> pool-usage` somewhere in the middle, but there are already 223845 messages queued.

We have 512 change notify watchers, so I would expect this amount of messages, however we have **500 times** the amount of messages (probably more).

Why does it generate so many messages? What are those message about?
Comment 19 Andreas Schneider 2024-07-11 11:22:42 UTC
With 1 watcher it creates 1 message
Wtih 2 watchers it creates 4 messages
With 4 watchers it creates 16 messages
...
Comment 20 Andreas Schneider 2024-07-25 16:18:45 UTC
I have a first patchset addressing the issue in notifyd.

I used the Windows reproducer and with 4096 change notifications. On my VM it takes 2.8 seconds to register them, and 1.1 seconds to deliver 4096 change notifications to the client.

The next step is to implement MaxAsynchronousCredits in smbd.
Comment 21 Gunnar Haslinger 2024-07-25 19:27:46 UTC
(In reply to Andreas Schneider from comment #20)

Thanks, that's great news!
Comment 22 Samba QA Contact 2024-10-01 14:23:03 UTC
This bug was referenced in samba master:

af011b987a4ad0d3753d83cc0b8d97ad64ba874a
Comment 23 Andreas Schneider 2024-10-02 08:11:30 UTC
Created attachment 18455 [details]
patch for 4.21