Bug 14640 - socket_wrapper 1.3.3 should be backported in order to fix deadlocks in the tfork test
Status: ASSIGNED
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: Test infrastructure
Version: 4.14.0rc2
Hardware: All All
Importance: P5 critical
Target Milestone: 4.14
Assignee: Andreas Schneider
QA Contact: Samba QA Contact
Reported: 2021-02-16 16:32 UTC by Stefan Metzmacher
Modified: 2021-02-17 14:38 UTC
CC List: 2 users

Attachments
Patch for v4-14-test (62.38 KB, patch)
2021-02-16 16:32 UTC, Stefan Metzmacher
no flags
Patch for v4-13-test (62.38 KB, patch)
2021-02-16 16:32 UTC, Stefan Metzmacher
no flags
Patches for v4-12-test (89.70 KB, patch)
2021-02-16 16:33 UTC, Stefan Metzmacher
no flags

Description Stefan Metzmacher 2021-02-16 16:32:18 UTC
Created attachment 16454
Patch for v4-14-test

From time to time we see deadlocks on socket_reset_mutex in combination
with forking.

These problems should be fixed in socket_wrapper 1.3.2.
Comment 1 Stefan Metzmacher 2021-02-16 16:32:44 UTC
Created attachment 16455
Patch for v4-13-test
Comment 2 Stefan Metzmacher 2021-02-16 16:33:13 UTC
Created attachment 16456
Patches for v4-12-test
Comment 3 Stefan Metzmacher 2021-02-17 11:58:34 UTC
We'll need socket_wrapper 1.3.3
Comment 4 Stefan Metzmacher 2021-02-17 14:38:12 UTC
The problem with 1.3.2 is this:

   #7 abort + 0x12b [ip=0x7f14fb670859] [sp=0x7fffd08856f0]
   #8 _swrap_mutex_lock + 0x102 [ip=0x7f14fc207a7d] [sp=0x7fffd0885820]
   #9 swrap_sendmsg_before + 0xd0 [ip=0x7f14fc212f0e] [sp=0x7fffd0885880]
   #10 swrap_write + 0x129 [ip=0x7f14fc214ca6] [sp=0x7fffd0885920]
   #11 write + 0x3b [ip=0x7f14fc214d8c] [sp=0x7fffd0885a50]
   #12 swrap_pcap_dump_packet + 0xc5 [ip=0x7f14fc20ca19] [sp=0x7fffd0885a90]
   #13 swrap_accept + 0x821 [ip=0x7f14fc20d9e2] [sp=0x7fffd0885b00]
   #14 accept + 0x3d [ip=0x7f14fc20db26] [sp=0x7fffd0886050]
   #15 prefork_listen_accept_handler + 0x1c0 [ip=0x7f14fbc4e06f] [sp=0x7fffd0886090]
   #16 tevent_common_invoke_fd_handler + 0x118 [ip=0x7f14fbcc3219] [sp=0x7fffd0886180]
   #17 epoll_event_loop + 0x3a9 [ip=0x7f14fbccf785] [sp=0x7fffd08861d0]
   #18 epoll_event_loop_once + 0x13c [ip=0x7f14fbccfe9f] [sp=0x7fffd0886230]
   #19 std_event_loop_once + 0x6f [ip=0x7f14fbccc0da] [sp=0x7fffd0886280]
   #20 _tevent_loop_once + 0x126 [ip=0x7f14fbcc20cd] [sp=0x7fffd08862c0]

It happens with a stale fd that was closed via __close_nocancel() in nss_host.
socket() is a weak symbol in libc.so.6, so swrap_socket can be injected
into the resolver code in libc.so.6. The socket, however, is closed with
__close_nocancel, which is not a weak symbol in libc.so.6. That means it's
not possible to catch the close of the fd, and it remains stale in the
socket_wrapper table.
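
The bookkeeping problem can be reproduced in miniature. The following is only a toy model, not socket_wrapper code: the `tracked` dict and the functions `wrapped_socket`, `wrapped_close`, `unwrapped_close` and `demo` are hypothetical names made up for this sketch. It assumes a POSIX system, where the kernel hands out the lowest free fd number, so a number that was closed behind the wrapper's back is typically given to the next fd that gets opened.

```python
import os
import socket

# Hypothetical stand-in for a wrapper's per-fd table (fd -> description).
tracked = {}


def wrapped_socket():
    """Create a socket and record its fd, as an interposed socket() would."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    fd = s.detach()  # keep the raw fd; the Python object no longer owns it
    tracked[fd] = "socket"
    return fd


def wrapped_close(fd):
    """Close an fd and drop its table entry, as an interposed close() would."""
    tracked.pop(fd, None)
    os.close(fd)


def unwrapped_close(fd):
    """Close an fd without telling the table (the non-interposable case)."""
    os.close(fd)


def demo():
    fd = wrapped_socket()
    unwrapped_close(fd)        # the table never hears about this close
    r, w = os.pipe()           # new fds; one of them usually reuses the number
    stale = fd in tracked      # the table still believes fd is a socket
    reused = fd in (r, w)      # True when the number was handed out again
    os.close(r)
    os.close(w)
    tracked.pop(fd, None)      # clean up the toy table
    return stale, reused


if __name__ == "__main__":
    print(demo())
```

When `stale` and `reused` are both true, the table treats an unrelated fd (here a pipe end) as if it were a tracked socket. That is the kind of mix-up a stale entry makes possible once the fd number is recycled.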