Bug 13887 - SMBD leaves lots of zombie processes around
Summary: SMBD leaves lots of zombie processes around
Status: RESOLVED WONTFIX
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: File services (show other bugs)
Version: 4.9.4
Hardware: All All
: P5 normal (vote)
Target Milestone: ---
Assignee: Samba QA Contact
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-04-04 18:16 UTC by Alex Richardson
Modified: 2020-12-22 09:54 UTC (History)
2 users (show)

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Alex Richardson 2019-04-04 18:16:24 UTC
I am using samba in QEMU (with the -net smb=/path/to/hostdir) to share directories between QEMU and the host.
It seems like samba is spawning a new process every 30 minutes (the first one at startup) and is not reaping it.

QEMU starts smbd with the following flags: 
/usr/local/opt/samba/sbin/smbd -l $TMPDIR/qemu-smb.S21YZZ -s $TMPDIR/qemu-smb.S21YZZ/smb.conf

The config file is as follows (generated from https://github.com/qemu/qemu/blob/f4b37171378b0b1782570060c3411a504375d1bc/net/slirp.c#L871):

[global]
private dir=$TMPDIR/qemu-smb.S21YZZ
interfaces=127.0.0.1
bind interfaces only=yes
pid directory=$TMPDIR/qemu-smb.S21YZZ
lock directory=$TMPDIR/qemu-smb.S21YZZ
state directory=$TMPDIR/qemu-smb.S21YZZ
cache directory=$TMPDIR/qemu-smb.S21YZZ
ncalrpc dir=$TMPDIR/qemu-smb.S21YZZ/ncalrpc
log file=$TMPDIR/qemu-smb.S21YZZ/log.smbd
smb passwd file=$TMPDIR/qemu-smb.S21YZZ/smbpasswd
security = user
map to guest = Bad User
load printers = no
printing = bsd
disable spoolss = yes
usershare max shares = 0
[qemu]
path=/path/to/host/share
read only=no
guest ok=yes
force user=alex


Is there any way I can provide more relevant information? I tried adding signal(SIGCHLD, SIG_IGN) to the code that forks() a new process in QEMU, but that didn't fix the issue. I assume smbd installs a sigchld handler so it might be a bug/race there?
Comment 1 Alex Richardson 2019-04-04 18:31:46 UTC
I changed the config file to output debug logs and I see the following output referring to a PID (36422 is the parent) that ends up as a zombie.

The following line looks like it might be related:   pid_to_procid: messaging_dgm_get_unique failed: No such file or directory

Full log:


[2019/04/04 19:17:03.001092,  4, pid=36422, effective(501, 20), real(501, 20)] ../source3/param/loadparm.c:3914(lp_load_ex)
  pm_process() returned Yes
[2019/04/04 19:17:03.001119,  7, pid=36422, effective(501, 20), real(501, 20)] ../source3/param/loadparm.c:4234(lp_servicenumber)
  lp_servicenumber: couldn't find homes
[2019/04/04 19:17:03.001142,  8, pid=36422, effective(501, 20), real(501, 20)] ../source3/param/loadparm.c:1462(add_a_service)
  add_a_service: Creating snum = 4 for IPC$
[2019/04/04 19:17:03.001156, 10, pid=36422, effective(501, 20), real(501, 20)] ../source3/param/loadparm.c:1512(hash_a_service)
  hash_a_service: hashing index 4 for service name IPC$
[2019/04/04 19:17:03.001175,  3, pid=36422, effective(501, 20), real(501, 20)] ../source3/param/loadparm.c:1621(lp_add_ipc)
  adding IPC service
[2019/04/04 19:17:03.001308,  6, pid=36422, effective(501, 20), real(501, 20)] ../source3/param/loadparm.c:2336(lp_file_list_changed)
  lp_file_list_changed()
  file /var/folders/9v/jqc8jwqj06s625gy8d8cdw8h0000gn/T/qemu-smb.T2O3ZZ/smb.conf -> /var/folders/9v/jqc8jwqj06s625gy8d8cdw8h0000gn/T/qemu-smb.T2O3ZZ/smb.conf  last mod_time: Thu Apr  4 19:16:30 2019
  
[2019/04/04 19:17:03.002051,  2, pid=36422, effective(501, 20), real(501, 20)] ../source3/lib/interface.c:345(add_interface)
  added interface lo0 ip=127.0.0.1 bcast=127.255.255.255 netmask=255.0.0.0
[2019/04/04 19:17:03.002124,  3, pid=36422, effective(501, 20), real(501, 20)] ../source3/smbd/server.c:1810(main)
  loaded services
[2019/04/04 19:17:03.002158,  5, pid=36422, effective(501, 20), real(501, 20)] ../source3/lib/util_names.c:152(init_names)
  Netbios name list:-
  my_netbios_names[0]="C0825"
[2019/04/04 19:17:03.002297,  1, pid=36422, effective(501, 20), real(501, 20)] ../source3/profile/profile_dummy.c:30(set_profile_level)
  INFO: Profiling support unavailable in this build.
[2019/04/04 19:17:03.002373, 10, pid=36422, effective(501, 20), real(501, 20)] ../source3/lib/messages_dgm_ref.c:175(msg_dgm_ref_destructor)
  msg_dgm_ref_destructor: refs=0x0
[2019/04/04 19:17:03.002945, 10, pid=36422, effective(501, 20), real(501, 20)] ../source3/lib/messages_dgm_ref.c:92(messaging_dgm_ref)
  messaging_dgm_ref: messaging_dgm_init returned Undefined error: 0
[2019/04/04 19:17:03.002974, 10, pid=36422, effective(501, 20), real(501, 20)] ../source3/lib/messages_dgm_ref.c:121(messaging_dgm_ref)
  messaging_dgm_ref: unique = 399598752467790900
[2019/04/04 19:17:03.003075,  5, pid=36422, effective(501, 20), real(501, 20), class=passdb] ../source3/passdb/pdb_interface.c:79(smb_register_passdb)
  Attempting to register passdb backend smbpasswd
[2019/04/04 19:17:03.003106,  5, pid=36422, effective(501, 20), real(501, 20), class=passdb] ../source3/passdb/pdb_interface.c:92(smb_register_passdb)
  Successfully added passdb backend 'smbpasswd'
[2019/04/04 19:17:03.003131,  5, pid=36422, effective(501, 20), real(501, 20), class=passdb] ../source3/passdb/pdb_interface.c:79(smb_register_passdb)
  Attempting to register passdb backend tdbsam
[2019/04/04 19:17:03.003156,  5, pid=36422, effective(501, 20), real(501, 20), class=passdb] ../source3/passdb/pdb_interface.c:92(smb_register_passdb)
  Successfully added passdb backend 'tdbsam'
[2019/04/04 19:17:03.003182,  5, pid=36422, effective(501, 20), real(501, 20), class=passdb] ../source3/passdb/pdb_interface.c:155(make_pdb_method_name)
  Attempting to find a passdb backend to match tdbsam (tdbsam)
[2019/04/04 19:17:03.003203,  5, pid=36422, effective(501, 20), real(501, 20), class=passdb] ../source3/passdb/pdb_interface.c:176(make_pdb_method_name)
  Found pdb backend tdbsam
[2019/04/04 19:17:03.003244,  5, pid=36422, effective(501, 20), real(501, 20), class=passdb] ../source3/passdb/pdb_interface.c:187(make_pdb_method_name)
  pdb backend tdbsam has a valid init
[2019/04/04 19:17:03.003411,  5, pid=36422, effective(501, 20), real(501, 20)] ../lib/dbwrap/dbwrap.c:130(dbwrap_lock_order_lock)
  dbwrap_lock_order_lock: check lock order 1 for /var/folders/9v/jqc8jwqj06s625gy8d8cdw8h0000gn/T/qemu-smb.T2O3ZZ/smbXsrv_version_global.tdb
[2019/04/04 19:17:03.003438, 10, pid=36422, effective(501, 20), real(501, 20)] ../lib/dbwrap/dbwrap.c:116(debug_lock_order)
  lock order:  1:/var/folders/9v/jqc8jwqj06s625gy8d8cdw8h0000gn/T/qemu-smb.T2O3ZZ/smbXsrv_version_global.tdb 2:<none> 3:<none>
[2019/04/04 19:17:03.003461, 10, pid=36422, effective(501, 20), real(501, 20)] ../lib/dbwrap/dbwrap_tdb.c:61(db_tdb_log_key)
  db_tdb_log_key: Locking key 736D62587372765F7665
[2019/04/04 19:17:03.003486, 10, pid=36422, effective(501, 20), real(501, 20)] ../lib/dbwrap/dbwrap_tdb.c:145(db_tdb_fetch_locked_internal)
  db_tdb_fetch_locked_internal: Allocated locked data 0x7fbae1528130
[2019/04/04 19:17:03.003629, 10, pid=36422, effective(501, 20), real(501, 20)] ../lib/dbwrap/dbwrap_tdb.c:61(db_tdb_log_key)
  db_tdb_log_key: Unlocking key 736D62587372765F7665
[2019/04/04 19:17:03.003655,  5, pid=36422, effective(501, 20), real(501, 20)] ../lib/dbwrap/dbwrap.c:159(dbwrap_lock_order_unlock)
  dbwrap_lock_order_unlock: release lock order 1 for /var/folders/9v/jqc8jwqj06s625gy8d8cdw8h0000gn/T/qemu-smb.T2O3ZZ/smbXsrv_version_global.tdb
[2019/04/04 19:17:03.003675, 10, pid=36422, effective(501, 20), real(501, 20)] ../source3/smbd/smbXsrv_version.c:250(smbXsrv_version_global_init)
  smbXsrv_version_global_init
[2019/04/04 19:17:03.003695, 10, pid=36422, effective(501, 20), real(501, 20)] ../source3/smbd/smbXsrv_version.c:251(smbXsrv_version_global_init)
[2019/04/04 19:17:03.003708,  1, pid=36422, effective(501, 20), real(501, 20), class=rpc_parse] ../librpc/ndr/ndr.c:422(ndr_print_debug)
       &global_blob: struct smbXsrv_version_globalB
          version                  : SMBXSRV_VERSION_0 (0)
          seqnum                   : 0x00000002 (2)
          info                     : union smbXsrv_version_globalU(case 0)
          info0                    : *
              info0: struct smbXsrv_version_global0
                  db_rec                   : NULL
                  num_nodes                : 0x00000001 (1)
                  nodes: ARRAY(1)
                      nodes: struct smbXsrv_version_node0
                          server_id: struct server_id
                              pid                      : 0x0000000000008e46 (36422)
                              task_id                  : 0x00000000 (0)
                              vnn                      : 0xffffffff (4294967295)
                              unique_id                : 0x058ba8f2bd882c34 (399598752467790900)
                          min_version              : SMBXSRV_VERSION_0 (0)
                          max_version              : SMBXSRV_VERSION_0 (0)
                          current_version          : SMBXSRV_VERSION_0 (0)
[2019/04/04 19:17:03.004770,  3, pid=36422, effective(501, 20), real(501, 20)] ../source3/lib/util_procid.c:54(pid_to_procid)
  pid_to_procid: messaging_dgm_get_unique failed: No such file or directory
[2019/04/04 19:17:03.005387, 10, pid=36423, effective(501, 20), real(501, 20)] ../source3/lib/messages_dgm_ref.c:175(msg_dgm_ref_destructor)
  msg_dgm_ref_destructor: refs=0x0
[2019/04/04 19:17:03.005807, 10, pid=36424, effective(501, 20), real(501, 20)] ../source3/lib/messages_dgm_ref.c:175(msg_dgm_ref_destructor)
  msg_dgm_ref_destructor: refs=0x0
[2019/04/04 19:17:03.006126, 10, pid=36423, effective(501, 20), real(501, 20)] ../source3/lib/messages_dgm_ref.c:92(messaging_dgm_ref)
  messaging_dgm_ref: messaging_dgm_init returned Undefined error: 0
[2019/04/04 19:17:03.006265, 10, pid=36423, effective(501, 20), real(501, 20)] ../source3/lib/messages_dgm_ref.c:121(messaging_dgm_ref)
  messaging_dgm_ref: unique = 4235220531487419619
[2019/04/04 19:17:03.006443,  5, pid=36423, effective(501, 20), real(501, 20), class=passdb] ../source3/passdb/pdb_interface.c:155(make_pdb_method_name)
  Attempting to find a passdb backend to match tdbsam (tdbsam)
[2019/04/04 19:17:03.006485, 10, pid=36424, effective(501, 20), real(501, 20)] ../source3/lib/messages_dgm_ref.c:92(messaging_dgm_ref)
  messaging_dgm_ref: messaging_dgm_init returned Undefined error: 0
[2019/04/04 19:17:03.006560,  5, pid=36423, effective(501, 20), real(501, 20), class=passdb] ../source3/passdb/pdb_interface.c:176(make_pdb_method_name)
  Found pdb backend tdbsam
[2019/04/04 19:17:03.006665, 10, pid=36424, effective(501, 20), real(501, 20)] ../source3/lib/messages_dgm_ref.c:121(messaging_dgm_ref)
  messaging_dgm_ref: unique = 15027546810487544883
[2019/04/04 19:17:03.006713,  5, pid=36423, effective(501, 20), real(501, 20), class=passdb] ../source3/passdb/pdb_interface.c:187(make_pdb_method_name)
  pdb backend tdbsam has a valid init
[2019/04/04 19:17:03.006835,  5, pid=36424, effective(501, 20), real(501, 20), class=passdb] ../source3/passdb/pdb_interface.c:155(make_pdb_method_name)
  Attempting to find a passdb backend to match tdbsam (tdbsam)
[2019/04/04 19:17:03.006897,  5, pid=36423, effective(501, 20), real(501, 20)] ../source3/lib/messages.c:709(messaging_register)
  Registering messaging pointer for type 794 - private_data=0x7fbae1609bd0
[2019/04/04 19:17:03.006949,  5, pid=36424, effective(501, 20), real(501, 20), class=passdb] ../source3/passdb/pdb_interface.c:176(make_pdb_method_name)
  Found pdb backend tdbsam
[2019/04/04 19:17:03.007020,  5, pid=36423, effective(501, 20), real(501, 20)] ../source3/lib/messages.c:709(messaging_register)
  Registering messaging pointer for type 795 - private_data=0x7fbae1609bd0
[2019/04/04 19:17:03.007084,  5, pid=36424, effective(501, 20), real(501, 20), class=passdb] ../source3/passdb/pdb_interface.c:187(make_pdb_method_name)
  pdb backend tdbsam has a valid init
[2019/04/04 19:17:03.007128,  5, pid=36423, effective(501, 20), real(501, 20)] ../source3/lib/messages.c:709(messaging_register)
  Registering messaging pointer for type 796 - private_data=0x7fbae1609bd0
[2019/04/04 19:17:03.007231,  5, pid=36424, effective(501, 20), real(501, 20)] ../source3/lib/messages.c:709(messaging_register)
  Registering messaging pointer for type 13 - private_data=0x7fbae1526cd0
[2019/04/04 19:17:03.007350,  5, pid=36424, effective(501, 20), real(501, 20)] ../source3/lib/messages.c:709(messaging_register)
  Registering messaging pointer for type 788 - private_data=0x7fbae1526cd0
[2019/04/04 19:17:03.007461,  5, pid=36424, effective(501, 20), real(501, 20)] ../source3/lib/messages.c:709(messaging_register)
[2019/04/04 19:17:03.007460, 10, pid=36423, effective(501, 20), real(501, 20)] ../source3/smbd/notifyd/notifyd.c:247(notifyd_send)
  Registering messaging pointer for type 785 - private_data=0x0
  notifyd_send: server_id_db_add failed: File exists
[2019/04/04 19:17:03.007619,  5, pid=36423, effective(501, 20), real(501, 20)] ../source3/lib/messages.c:756(messaging_deregister)
  Deregistering messaging pointer for type 796 - private_data=0x7fbae1609bd0
[2019/04/04 19:17:03.007722, 10, pid=36422, effective(501, 20), real(501, 20)] ../source3/smbd/server.c:593(cleanupd_init)
[2019/04/04 19:17:03.007734,  5, pid=36423, effective(501, 20), real(501, 20)] ../source3/lib/messages.c:756(messaging_deregister)
  Deregistering messaging pointer for type 795 - private_data=0x7fbae1609bd0
  cleanupd_init: Started cleanupd pid=36424
[2019/04/04 19:17:03.007841,  5, pid=36423, effective(501, 20), real(501, 20)] ../source3/lib/messages.c:756(messaging_deregister)
  Deregistering messaging pointer for type 794 - private_data=0x7fbae1609bd0
[2019/04/04 19:17:03.008069,  5, pid=36422, effective(501, 20), real(501, 20)] ../source3/lib/messages.c:709(messaging_register)
  Registering messaging pointer for type 789 - private_data=0x7fbae15285a0
[2019/04/04 19:17:03.008200, 10, pid=36422, effective(501, 20), real(501, 20), class=registry] ../source3/registry/reg_backend_db.c:770(regdb_init)
[2019/04/04 19:17:03.008114, 10, pid=36423, effective(501, 20), real(501, 20)] ../source3/lib/messages_dgm.c:1436(messaging_dgm_send)
  regdb_init: registry db openend. refcount reset (1)
  messaging_dgm_send: Sending message to 36422
Comment 2 David Mulder 2019-07-18 16:28:08 UTC
This bug is still present in 4.10.
Comment 3 Olaf Dietsche 2019-08-01 19:14:20 UTC
I experience the same behaviour.

My environment is a Linux host (Ubuntu 18.04) and a Windows 10 guest running in qemu. Samba package's version is 2:4.7.6+dfsg~ubuntu-0ubuntu2.11. The only difference seems to be the frequency, in my case it is spawning a new process every 15 minutes.

When I run smbd standalone (same parameters, same config file, no qemu), it does not show this behaviour.

So, it seems to be connected to qemu and/or Windows 10. Maybe when started by qemu, smbd doesn't detach itself into the background. The parent process of smbd is still qemu in this case.

One difference in the logs is a call to remove_child_pid():

[2019/08/01 20:38:07.096390, 10, pid=29238, effective(0, 0), real(0, 0)] ../source3/lib/messages_dgm_ref.c:157(msg_dgm_ref_destructor)
  msg_dgm_ref_destructor: refs=(nil)
[2019/08/01 20:38:07.096914, 10, pid=29238, effective(0, 0), real(0, 0)] ../source3/lib/messages_dgm_ref.c:79(messaging_dgm_ref)
  messaging_dgm_ref: messaging_dgm_init returned Success
[2019/08/01 20:38:07.096986, 10, pid=29238, effective(0, 0), real(0, 0)] ../source3/lib/messages_dgm_ref.c:108(messaging_dgm_ref)
  messaging_dgm_ref: unique = 7539815965325289560
[2019/08/01 20:38:07.097171, 10, pid=29238, effective(0, 0), real(0, 0)] ../source3/lib/messages_dgm.c:1520(messaging_dgm_wipe)
  messaging_dgm_cleanup(29103) returned Resource temporarily unavailable
[2019/08/01 20:38:07.097223, 10, pid=29238, effective(0, 0), real(0, 0)] ../source3/lib/messages_dgm.c:1520(messaging_dgm_wipe)
  messaging_dgm_cleanup(29106) returned Resource temporarily unavailable
[2019/08/01 20:38:07.097271, 10, pid=29238, effective(0, 0), real(0, 0)] ../source3/lib/messages_dgm.c:1520(messaging_dgm_wipe)
  messaging_dgm_cleanup(29105) returned Resource temporarily unavailable
[2019/08/01 20:38:07.097320, 10, pid=29238, effective(0, 0), real(0, 0)] ../source3/lib/messages.c:1093(mess_parent_dgm_cleanup)
  messaging_dgm_wipe returned ok
[2019/08/01 20:38:07.097405, 10, pid=29238, effective(0, 0), real(0, 0)] ../source3/lib/messages_dgm_ref.c:157(msg_dgm_ref_destructor)
  msg_dgm_ref_destructor: refs=(nil)
[2019/08/01 20:38:07.099693,  2, pid=29103, effective(0, 0), real(0, 0)] ../source3/smbd/server.c:807(remove_child_pid)
  Could not find child 29238 -- ignoring

I don't see this call, when smbd was started by qemu.
Comment 4 Björn Jacke 2020-12-19 21:45:08 UTC
this looks like a bad integration of Samba into qemu. As you can see Bugzilla is not the right place to get this solved. You or the Quemu people might want to consult a community or commercial support option as listed at https://www.samba.org/samba/support/ to get this solved.
Comment 5 Alex Richardson 2020-12-22 09:54:24 UTC
This report is not "INVALID". I've changed to resolution to the actual "WONTFIX".