Bug 15799 - smbd-notifyd leaks memory as 'notify_remove()' gets no chance to be processed when it's done by exit_server_common() -> smbXsrv_session_logoff_all() -> ... -> files_forall() -> ... -> fsp_unbind_smb() -> notify_remove()
Status: NEW
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: File services
Version: 4.19.7
Hardware: All All
Importance: P5 normal
Target Milestone: ---
Assignee: Samba QA Contact
QA Contact: Samba QA Contact
Reported: 2025-02-07 21:16 UTC by YOUZHONG YANG
Modified: 2025-02-12 11:35 UTC
CC List: 3 users

Description YOUZHONG YANG 2025-02-07 21:16:55 UTC
We frequently see very high memory usage in smbd-notifyd processes on our servers, even when Samba is relatively idle. Inspecting an smbd-notifyd process with 'smbcontrol <pid> pool-usage' shows a very large number of inotify_watch_context structs, which appear to be leaked.

I was able to reproduce it with the following steps:

1. Start our application, which sends notification requests for thousands of folders to Samba.
2. Close the TCP connection between the Windows client and the Samba server using TCPView from Sysinternals.
3. smbcontrol $(ps -o pid -C smbd-notifyd --no-headers) pool-usage > /var/tmp/pool-usage
4. grep inotify_watch_context /var/tmp/pool-usage | wc -l
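
Steps 3 and 4 can be wrapped in a small helper so the count can be sampled before and after the connection is closed. This is only a sketch: the function names count_watch_contexts and sample_notifyd are made up for illustration, the default dump path is the one used above, and it assumes a single smbd-notifyd process, as the original commands do.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Samba.

# Print the number of lines mentioning inotify_watch_context in a saved
# pool-usage dump (same information as "grep ... | wc -l" in step 4).
count_watch_contexts() {
    # grep -c exits non-zero when nothing matches but still prints 0;
    # "|| true" keeps the function usable under "set -e".
    grep -c 'inotify_watch_context' "$1" || true
}

# Dump the pool usage of smbd-notifyd (step 3) and print a timestamped count.
# $1 is an optional dump path; it defaults to the path used in step 3.
sample_notifyd() {
    dump=${1:-/var/tmp/pool-usage}
    # $pid is left unquoted on purpose: ps pads the value with spaces.
    pid=$(ps -o pid -C smbd-notifyd --no-headers)
    smbcontrol $pid pool-usage > "$dump"
    printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$(count_watch_contexts "$dump")"
}
```

Sourcing this file and running sample_notifyd once before step 2 and again afterwards gives two counts to compare; if the watches were released when the client connection goes away, the second count would be expected to drop.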

Here is the stack trace when notify_remove() is called while the smbd process is shutting down:

#0  notify_remove (ctx=0x5a7494532990, private_data=0x5a74960f0840,
    path=0x7ffe0cb2b8e0 "/vmgr/yyang-bmain-copy/build/matlab/toolbox/driving/supportpackages/scenariobuilder/scenariobuilder")
    at ../../source3/smbd/notify_msg.c:191
#1  0x00007ba7592407a9 in fsp_unbind_smb (req=0x0, fsp=0x5a74960f0840) at ../../source3/smbd/files.c:1779
#2  0x00007ba759285098 in close_file_smb (req=0x0, fsp=0x5a74960f0840, close_type=SHUTDOWN_CLOSE) at ../../source3/smbd/close.c:1691
#3  0x00007ba7592850da in close_file_free (req=0x0, _fsp=0x7ffe0cb2ba68, close_type=SHUTDOWN_CLOSE) at ../../source3/smbd/close.c:1703
#4  0x00007ba75923f57d in close_file_in_loop (fsp=0x5a74960f0840, close_type=SHUTDOWN_CLOSE) at ../../source3/smbd/files.c:1346
#5  0x00007ba75923f8ec in file_close_user_fn (fsp=0x5a74960f0840, private_data=0x7ffe0cb2bb10) at ../../source3/smbd/files.c:1475
#6  0x00007ba75923fa4e in files_forall (sconn=0x5a74945292f0, fn=0x7ba75923f8a8 <file_close_user_fn>, private_data=0x7ffe0cb2bb10)
    at ../../source3/smbd/files.c:1511
#7  0x00007ba75923f957 in file_close_user (sconn=0x5a74945292f0, vuid=3002850907) at ../../source3/smbd/files.c:1487
#8  0x00007ba7592fac84 in smbXsrv_session_logoff (session=0x5a74944ef610) at ../../source3/smbd/smbXsrv_session.c:1862
#9  0x00007ba7592f90b8 in smbXsrv_session_clear_and_logoff (session=0x5a74944ef610) at ../../source3/smbd/smbXsrv_session.c:1262
#10 0x00007ba7592fb291 in smbXsrv_session_logoff_all_callback (local_rec=0x7ffe0cb2bcc0, private_data=0x7ffe0cb2bdd0)
    at ../../source3/smbd/smbXsrv_session.c:2008
#11 0x00007ba758a7eaa9 in db_rbt_traverse_internal (db=0x5a7494523730, f=0x7ba7592fb1b1 <smbXsrv_session_logoff_all_callback>,
    private_data=0x7ffe0cb2bdd0, count=0x7ffe0cb2bd48, rw=true) at ../../lib/dbwrap/dbwrap_rbt.c:467
#12 0x00007ba758a7ec97 in db_rbt_traverse (db=0x5a7494523730, f=0x7ba7592fb1b1 <smbXsrv_session_logoff_all_callback>, private_data=0x7ffe0cb2bdd0)
    at ../../lib/dbwrap/dbwrap_rbt.c:525
#13 0x00007ba758a7b615 in dbwrap_traverse (db=0x5a7494523730, f=0x7ba7592fb1b1 <smbXsrv_session_logoff_all_callback>,
    private_data=0x7ffe0cb2bdd0, count=0x7ffe0cb2bdcc) at ../../lib/dbwrap/dbwrap.c:381
#14 0x00007ba7592fb0ab in smbXsrv_session_logoff_all (client=0x5a749452a2d0) at ../../source3/smbd/smbXsrv_session.c:1963
#15 0x00007ba759303652 in exit_server_common (how=SERVER_EXIT_NORMAL, reason=0x7ba758e8efea "NT_STATUS_CONNECTION_RESET")
    at ../../source3/smbd/server_exit.c:167
#16 0x00007ba759303a53 in smbd_exit_server_cleanly (explanation=0x7ba758e8efea "NT_STATUS_CONNECTION_RESET")
    at ../../source3/smbd/server_exit.c:247
#17 0x00007ba758dbc32d in exit_server_cleanly (reason=0x7ba758e8efea "NT_STATUS_CONNECTION_RESET") at ../../source3/lib/smbd_shim.c:113
#18 0x00007ba7592bb5a5 in smbd_server_connection_terminate_ex (xconn=0x5a74944f62a0, reason=0x7ba758e8efea "NT_STATUS_CONNECTION_RESET",
    location=0x7ba7593bd468 "../../source3/smbd/smb2_server.c:5128") at ../../source3/smbd/smb2_server.c:1762
#19 0x00007ba7592c7684 in smbd_smb2_connection_handler (ev=0x5a74944f99a0, fde=0x5a7494510210, flags=1, private_data=0x5a74944f62a0)
    at ../../source3/smbd/smb2_server.c:5128
#20 0x00007ba758fe2641 in tevent_common_invoke_fd_handler (fde=0x5a7494510210, flags=1, removed=0x0) at ../../lib/tevent/tevent_fd.c:142
#21 0x00007ba758fed43a in epoll_event_loop (epoll_ev=0x5a749452a240, tvalp=0x7ffe0cb2c000) at ../../lib/tevent/tevent_epoll.c:737
#22 0x00007ba758fedb1b in epoll_event_loop_once (ev=0x5a74944f99a0, location=0x7ba7593b4458 "../../source3/smbd/smb2_process.c:2031")
    at ../../lib/tevent/tevent_epoll.c:938
#23 0x00007ba758fea2c8 in std_event_loop_once (ev=0x5a74944f99a0, location=0x7ba7593b4458 "../../source3/smbd/smb2_process.c:2031")
    at ../../lib/tevent/tevent_standard.c:110
#24 0x00007ba758fe122b in _tevent_loop_once (ev=0x5a74944f99a0, location=0x7ba7593b4458 "../../source3/smbd/smb2_process.c:2031")
    at ../../lib/tevent/tevent.c:824
#25 0x00007ba758fe157c in tevent_common_loop_wait (ev=0x5a74944f99a0, location=0x7ba7593b4458 "../../source3/smbd/smb2_process.c:2031")
    at ../../lib/tevent/tevent.c:950
#26 0x00007ba758fea36a in std_event_loop_wait (ev=0x5a74944f99a0, location=0x7ba7593b4458 "../../source3/smbd/smb2_process.c:2031")
    at ../../lib/tevent/tevent_standard.c:141
#27 0x00007ba758fe161f in _tevent_loop_wait (ev=0x5a74944f99a0, location=0x7ba7593b4458 "../../source3/smbd/smb2_process.c:2031")
    at ../../lib/tevent/tevent.c:969
#28 0x00007ba7592a6572 in smbd_process (ev_ctx=0x5a74944f99a0, msg_ctx=0x5a74944f3280, sock_fd=34, interactive=false)
    at ../../source3/smbd/smb2_process.c:2031
#29 0x00005a749254b202 in smbd_accept_connection (ev=0x5a74944f99a0, fde=0x5a749452a160, flags=1, private_data=0x5a7494528b30)
    at ../../source3/smbd/server.c:1034
#30 0x00007ba758fe2641 in tevent_common_invoke_fd_handler (fde=0x5a749452a160, flags=1, removed=0x0) at ../../lib/tevent/tevent_fd.c:142
#31 0x00007ba758fed43a in epoll_event_loop (epoll_ev=0x5a749450e820, tvalp=0x7ffe0cb2c530) at ../../lib/tevent/tevent_epoll.c:737
#32 0x00007ba758fedb1b in epoll_event_loop_once (ev=0x5a74944f99a0, location=0x5a74925507a8 "../../source3/smbd/server.c:1376")
    at ../../lib/tevent/tevent_epoll.c:938
#33 0x00007ba758fea2c8 in std_event_loop_once (ev=0x5a74944f99a0, location=0x5a74925507a8 "../../source3/smbd/server.c:1376")
    at ../../lib/tevent/tevent_standard.c:110
#34 0x00007ba758fe122b in _tevent_loop_once (ev=0x5a74944f99a0, location=0x5a74925507a8 "../../source3/smbd/server.c:1376")
    at ../../lib/tevent/tevent.c:824
#35 0x00007ba758fe157c in tevent_common_loop_wait (ev=0x5a74944f99a0, location=0x5a74925507a8 "../../source3/smbd/server.c:1376")
    at ../../lib/tevent/tevent.c:950
#36 0x00007ba758fea36a in std_event_loop_wait (ev=0x5a74944f99a0, location=0x5a74925507a8 "../../source3/smbd/server.c:1376")
    at ../../lib/tevent/tevent_standard.c:141
#37 0x00007ba758fe161f in _tevent_loop_wait (ev=0x5a74944f99a0, location=0x5a74925507a8 "../../source3/smbd/server.c:1376")
    at ../../lib/tevent/tevent.c:969
#38 0x00005a749254beff in smbd_parent_loop (ev_ctx=0x5a74944f99a0, parent=0x5a74945081c0) at ../../source3/smbd/server.c:1376
#39 0x00005a749254e167 in main (argc=6, argv=0x7ffe0cb2cb18) at ../../source3/smbd/server.c:2138

I tested the following fix, which works for us, but I am not sure whether there is a better approach:

--- a/source3/smbd/server_exit.c
+++ b/source3/smbd/server_exit.c
@@ -176,6 +176,17 @@ static void exit_server_common(enum server_exit_reason how,
 
 	change_to_root_user();
 
+	/* housekeeping_fn() or keepalive_fn() use the
+	   connection, so handle the event queue here
+	   before the connection is destroyed.
+	*/
+	if (tevent_common_have_events(global_event_context())) {
+		tevent_loop_allow_nesting(global_event_context());
+		(void) tevent_loop_wait(global_event_context());
+	}
+
+	change_to_root_user();
+
 	if (client != NULL) {
 		struct smbXsrv_connection *xconn_next = NULL;