I'm running samba in the "active directory domain controller" role. I have the following /etc/nsswitch.conf setup related to samba:

# grep winbind /etc/nsswitch.conf |grep -v ^#
group: files winbind
passwd: files winbind

This is a non-PAM system.

After upgrading from samba-4.17.10 to samba-4.18 (tested samba-4.18.6 and samba-4.18.5), I noticed that sendmail (8.17.1.9) started crashing:

sendmail[5498]: segfault at 563b95c2ad84 ip 00007f17e02c686a sp 00007ffcecfff970 error 4 in libc.so.6[7f17e0254000+155000]
Code: cc f9 ff 66 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 85 ff 0f 84 bf 00 00 00 55 48 8d 77 f0 53 48 83 ec 18 48 8b 1d 8e a5 13 00 <48> 8b 47 f8 64 8b 2b a8 02 75 5b 48 8b 15 1c a5 13 00 64 48 83 3a

Before the crash, sendmail complains that fd 0 is not open:

SYSERR(root): fill_fd: disconnect: fd 0 not open: Bad file descriptor
1: fl=0x8001, mode=20666: CHR: dev=0/6, ino=9218, nlink=1, u/gid=0/0, size=0
2: fl=0x8001, mode=20666: CHR: dev=0/6, ino=9218, nlink=1, u/gid=0/0, size=0
3: fl=0x2, mode=140777: SOCK localhost->[[UNIX: /dev/log]]

Reverting to samba-4.17.10 fixes the problem. Removing "winbind" from the "passwd" line also fixes it.
Strace with samba-4.18:

4405 close(0) = 0
4405 openat(AT_FDCWD, "/dev/null", O_RDONLY) = 0
4405 close(0) = 0
4405 openat(AT_FDCWD, "/dev/null", O_WRONLY) = 0
4405 dup2(0, 1) = 1
4405 dup2(0, 2) = 2
4405 close(0) = 0
4405 newfstatat(0, "", 0x7ffd3fab3ce0, AT_EMPTY_PATH) = -1 EBADF (Bad file descriptor)

Strace with samba-4.17:

6047 close(0) = 0
6047 openat(AT_FDCWD, "/dev/null", O_RDONLY) = 0
6047 close(4) = 0
6047 openat(AT_FDCWD, "/dev/null", O_WRONLY) = 4
6047 dup2(4, 1) = 1
6047 dup2(4, 2) = 2
6047 close(4) = 0
6047 newfstatat(0, "", {st_mode=S_IFCHR|0666, st_rdev=makedev(0x1, 0x3), ...}, AT_EMPTY_PATH) = 0
6047 newfstatat(1, "", {st_mode=S_IFCHR|0666, st_rdev=makedev(0x1, 0x3), ...}, AT_EMPTY_PATH) = 0
6047 newfstatat(2, "", {st_mode=S_IFCHR|0666, st_rdev=makedev(0x1, 0x3), ...}, AT_EMPTY_PATH) = 0

Under 4.18, fd 0 gets closed again right after it is opened. Sendmail first closes fd 0 and then reopens it as /dev/null. Because fd 0 is free again, the following write-only open of /dev/null also lands on fd 0, and that descriptor is closed at the end. Under samba-4.17, fd 4 gets closed at that point instead.

I believe the code making these syscalls comes from sendmail/main.c, function "disconnect":

    sm_io_reopen(SmFtStdio, SM_TIME_DEFAULT, SM_PATH_DEVNULL, SM_IO_RDONLY, NULL, smioin)
    (...)
    fd = open(SM_PATH_DEVNULL, O_WRONLY, 0666);
    dup2(fd, STDOUT_FILENO);
    dup2(fd, STDERR_FILENO);
    close(fd);

Here "sm_io_reopen" comes from libsm/fopen.c and does:

    if ((ioflags = sm_flags(flags)) == 0)
    {
        (void) sm_io_close(fp, timeout);
        return NULL;
    }
    (...)
    (*fp2->f_open)(fp2, info, flags, rpool);

sm_io_close is a wrapper around close(2), and f_open is a similar wrapper. I think both come from libsm/stdio.c. I have not yet identified what calls close(4) (or close(0)).

I will try to identify the offending commit over the weekend. There seems to be only a limited number of changes, assuming https://git.samba.org/?p=samba.git;a=history;f=nsswitch;hb=refs/heads/v4-18-stable is the correct place to look.
Replacing /usr/lib64/libnss_winbind.so.2 with the version from samba-4.17.10 also fixes the problem, as expected.
Reverting "nsswitch: leverage TLS if available in favour over global locking" [1] fixed the problem for me. For that revert to apply cleanly, "nsswitch: avoid calling pthread_getspecific() on an uninitialized key" [2] also has to be reverted. After reverting both, there is no more crash and no more "fd 0 not open: Bad file descriptor" warning. Tested on both x86 and x86-64.

[1] https://git.samba.org/?p=samba.git;a=commitdiff;h=642a4452ce5b3333c50e41e54bc6ca779686ecc3
[2] https://git.samba.org/?p=samba.git;a=commitdiff;h=7545e2c77b69fc57e436e3ed298fdb68033ce49f
I also found the following line in the strace output:

writev(2, [{iov_base="free(): invalid next size (fast)", iov_len=32}, {iov_base="\n", iov_len=1}], 2) = 33

This suggests memory corruption. Indeed, running sendmail under valgrind with the original winbind from samba-4.18.6 produces:

==2054== Invalid read of size 1
==2054==    at 0x4ADB638: err_delete_thread_state (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x4B258C3: init_thread_stop.part.0 (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x4B25B70: OPENSSL_thread_stop (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x4B24FBC: OPENSSL_cleanup (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x5034C93: __run_exit_handlers (exit.c:111)
==2054==    by 0x5034DC9: exit (exit.c:141)
==2054==    by 0x119F5B: finis (in /usr/sbin/sendmail)
==2054==    by 0x167AA4: doworklist (in /usr/sbin/sendmail)
==2054==    by 0x17E538: smtp_data (in /usr/sbin/sendmail)
==2054==    by 0x182206: smtp (in /usr/sbin/sendmail)
==2054==    by 0x11742F: main (in /usr/sbin/sendmail)
==2054==  Address 0x559f310 is 32 bytes before a block of size 480 in arena "client"
==2054== 
==2054== Invalid write of size 8
==2054==    at 0x4ADB657: err_delete_thread_state (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x4B258C3: init_thread_stop.part.0 (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x4B25B70: OPENSSL_thread_stop (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x4B24FBC: OPENSSL_cleanup (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x5034C93: __run_exit_handlers (exit.c:111)
==2054==    by 0x5034DC9: exit (exit.c:141)
==2054==    by 0x119F5B: finis (in /usr/sbin/sendmail)
==2054==    by 0x167AA4: doworklist (in /usr/sbin/sendmail)
==2054==    by 0x17E538: smtp_data (in /usr/sbin/sendmail)
==2054==    by 0x182206: smtp (in /usr/sbin/sendmail)
==2054==    by 0x11742F: main (in /usr/sbin/sendmail)
==2054==  Address 0x559f210 is 32 bytes inside a block of size 256 free'd
==2054==    at 0x484308B: free (vg_replace_malloc.c:974)
==2054==    by 0x50C6F86: initgroups (initgroups.c:212)
==2054==    by 0x17437C: include (in /usr/sbin/sendmail)
==2054==    by 0x11DEC2: forward (in /usr/sbin/sendmail)
==2054==    by 0x176586: recipient (in /usr/sbin/sendmail)
==2054==    by 0x164B31: readqf (in /usr/sbin/sendmail)
==2054==    by 0x16784C: doworklist (in /usr/sbin/sendmail)
==2054==    by 0x17E538: smtp_data (in /usr/sbin/sendmail)
==2054==    by 0x182206: smtp (in /usr/sbin/sendmail)
==2054==    by 0x11742F: main (in /usr/sbin/sendmail)
==2054==  Block was alloc'd at
==2054==    at 0x48407C4: malloc (vg_replace_malloc.c:431)
==2054==    by 0x50C6F21: initgroups (initgroups.c:200)
==2054==    by 0x17437C: include (in /usr/sbin/sendmail)
==2054==    by 0x11DEC2: forward (in /usr/sbin/sendmail)
==2054==    by 0x176586: recipient (in /usr/sbin/sendmail)
==2054==    by 0x164B31: readqf (in /usr/sbin/sendmail)
==2054==    by 0x16784C: doworklist (in /usr/sbin/sendmail)
==2054==    by 0x17E538: smtp_data (in /usr/sbin/sendmail)
==2054==    by 0x182206: smtp (in /usr/sbin/sendmail)
==2054==    by 0x11742F: main (in /usr/sbin/sendmail)
==2054== 
==2054== Invalid write of size 8
==2054==    at 0x4ADB66B: err_delete_thread_state (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x4B258C3: init_thread_stop.part.0 (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x4B25B70: OPENSSL_thread_stop (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x4B24FBC: OPENSSL_cleanup (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x5034C93: __run_exit_handlers (exit.c:111)
==2054==    by 0x5034DC9: exit (exit.c:141)
==2054==    by 0x119F5B: finis (in /usr/sbin/sendmail)
==2054==    by 0x167AA4: doworklist (in /usr/sbin/sendmail)
==2054==    by 0x17E538: smtp_data (in /usr/sbin/sendmail)
==2054==    by 0x182206: smtp (in /usr/sbin/sendmail)
==2054==    by 0x11742F: main (in /usr/sbin/sendmail)
==2054==  Address 0x559f290 is 160 bytes inside a block of size 256 free'd
==2054==    at 0x484308B: free (vg_replace_malloc.c:974)
==2054==    by 0x50C6F86: initgroups (initgroups.c:212)
==2054==    by 0x17437C: include (in /usr/sbin/sendmail)
==2054==    by 0x11DEC2: forward (in /usr/sbin/sendmail)
==2054==    by 0x176586: recipient (in /usr/sbin/sendmail)
==2054==    by 0x164B31: readqf (in /usr/sbin/sendmail)
==2054==    by 0x16784C: doworklist (in /usr/sbin/sendmail)
==2054==    by 0x17E538: smtp_data (in /usr/sbin/sendmail)
==2054==    by 0x182206: smtp (in /usr/sbin/sendmail)
==2054==    by 0x11742F: main (in /usr/sbin/sendmail)
==2054==  Block was alloc'd at
==2054==    at 0x48407C4: malloc (vg_replace_malloc.c:431)
==2054==    by 0x50C6F21: initgroups (initgroups.c:200)
==2054==    by 0x17437C: include (in /usr/sbin/sendmail)
==2054==    by 0x11DEC2: forward (in /usr/sbin/sendmail)
==2054==    by 0x176586: recipient (in /usr/sbin/sendmail)
==2054==    by 0x164B31: readqf (in /usr/sbin/sendmail)
==2054==    by 0x16784C: doworklist (in /usr/sbin/sendmail)
==2054==    by 0x17E538: smtp_data (in /usr/sbin/sendmail)
==2054==    by 0x182206: smtp (in /usr/sbin/sendmail)
==2054==    by 0x11742F: main (in /usr/sbin/sendmail)
==2054== 
==2054== Invalid write of size 4
==2054==    at 0x4ADB677: err_delete_thread_state (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x4B258C3: init_thread_stop.part.0 (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x4B25B70: OPENSSL_thread_stop (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x4B24FBC: OPENSSL_cleanup (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x5034C93: __run_exit_handlers (exit.c:111)
==2054==    by 0x5034DC9: exit (exit.c:141)
==2054==    by 0x119F5B: finis (in /usr/sbin/sendmail)
==2054==    by 0x167AA4: doworklist (in /usr/sbin/sendmail)
==2054==    by 0x17E538: smtp_data (in /usr/sbin/sendmail)
==2054==    by 0x182206: smtp (in /usr/sbin/sendmail)
==2054==    by 0x11742F: main (in /usr/sbin/sendmail)
==2054==  Address 0x559f310 is 32 bytes before a block of size 480 in arena "client"
==2054== 
==2054== Invalid write of size 4
==2054==    at 0x4ADB682: err_delete_thread_state (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x4B258C3: init_thread_stop.part.0 (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x4B25B70: OPENSSL_thread_stop (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x4B24FBC: OPENSSL_cleanup (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x5034C93: __run_exit_handlers (exit.c:111)
==2054==    by 0x5034DC9: exit (exit.c:141)
==2054==    by 0x119F5B: finis (in /usr/sbin/sendmail)
==2054==    by 0x167AA4: doworklist (in /usr/sbin/sendmail)
==2054==    by 0x17E538: smtp_data (in /usr/sbin/sendmail)
==2054==    by 0x182206: smtp (in /usr/sbin/sendmail)
==2054==    by 0x11742F: main (in /usr/sbin/sendmail)
==2054==  Address 0x559f150 is 16 bytes before a block of size 69 alloc'd
==2054==    at 0x48407C4: malloc (vg_replace_malloc.c:431)
==2054==    by 0x4003911: malloc (rtld-malloc.h:56)
==2054==    by 0x4003911: _dl_exception_create_format (dl-exception.c:157)
==2054==    by 0x400A4C7: _dl_lookup_symbol_x (dl-lookup.c:793)
==2054==    by 0x5145E5C: do_sym (dl-sym.c:146)
==2054==    by 0x507AF03: dlsym_doit (dlsym.c:40)
==2054==    by 0x4001488: _dl_catch_exception (dl-catch.c:237)
==2054==    by 0x40015AE: _dl_catch_error (dl-catch.c:256)
==2054==    by 0x507A906: _dlerror_run (dlerror.c:138)
==2054==    by 0x507AF9B: dlsym_implementation (dlsym.c:54)
==2054==    by 0x507AF9B: dlsym@@GLIBC_2.34 (dlsym.c:68)
==2054==    by 0x55E1DF6: winbind_open_pipe_sock (in /usr/lib64/libnss_winbind.so.2)
==2054==    by 0x55E1FD3: winbind_write_sock (in /usr/lib64/libnss_winbind.so.2)
==2054==    by 0x55E21C3: winbindd_send_request.part.0 (in /usr/lib64/libnss_winbind.so.2)
==2054== 
==2054== Invalid write of size 4
==2054==    at 0x4ADB69E: err_delete_thread_state (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x4B258C3: init_thread_stop.part.0 (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x4B25B70: OPENSSL_thread_stop (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x4B24FBC: OPENSSL_cleanup (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x5034C93: __run_exit_handlers (exit.c:111)
==2054==    by 0x5034DC9: exit (exit.c:141)
==2054==    by 0x119F5B: finis (in /usr/sbin/sendmail)
==2054==    by 0x167AA4: doworklist (in /usr/sbin/sendmail)
==2054==    by 0x17E538: smtp_data (in /usr/sbin/sendmail)
==2054==    by 0x182206: smtp (in /usr/sbin/sendmail)
==2054==    by 0x11742F: main (in /usr/sbin/sendmail)
==2054==  Address 0x559f3d0 is 160 bytes inside a block of size 472 free'd
==2054==    at 0x484308B: free (vg_replace_malloc.c:974)
==2054==    by 0x5069619: _IO_deallocate_file (libioP.h:863)
==2054==    by 0x5069619: fclose@@GLIBC_2.2.5 (iofclose.c:74)
==2054==    by 0x5130E11: _nss_files_initgroups_dyn (files-initgroups.c:126)
==2054==    by 0x50C6BEF: internal_getgrouplist (initgroups.c:101)
==2054==    by 0x50C6F4A: initgroups (initgroups.c:205)
==2054==    by 0x17437C: include (in /usr/sbin/sendmail)
==2054==    by 0x11DEC2: forward (in /usr/sbin/sendmail)
==2054==    by 0x176586: recipient (in /usr/sbin/sendmail)
==2054==    by 0x164B31: readqf (in /usr/sbin/sendmail)
==2054==    by 0x16784C: doworklist (in /usr/sbin/sendmail)
==2054==    by 0x17E538: smtp_data (in /usr/sbin/sendmail)
==2054==    by 0x182206: smtp (in /usr/sbin/sendmail)
==2054==  Block was alloc'd at
==2054==    at 0x48407C4: malloc (vg_replace_malloc.c:431)
==2054==    by 0x5069F9A: __fopen_internal (iofopen.c:65)
==2054==    by 0x5129BFC: __nss_files_fopen (nss_files_fopen.c:27)
==2054==    by 0x5130BF2: _nss_files_initgroups_dyn (files-initgroups.c:36)
==2054==    by 0x50C6BEF: internal_getgrouplist (initgroups.c:101)
==2054==    by 0x50C6F4A: initgroups (initgroups.c:205)
==2054==    by 0x17437C: include (in /usr/sbin/sendmail)
==2054==    by 0x11DEC2: forward (in /usr/sbin/sendmail)
==2054==    by 0x176586: recipient (in /usr/sbin/sendmail)
==2054==    by 0x164B31: readqf (in /usr/sbin/sendmail)
==2054==    by 0x16784C: doworklist (in /usr/sbin/sendmail)
==2054==    by 0x17E538: smtp_data (in /usr/sbin/sendmail)
==2054== 
==2054== Invalid read of size 8
==2054==    at 0x4ADB6A9: err_delete_thread_state (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x4B258C3: init_thread_stop.part.0 (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x4B25B70: OPENSSL_thread_stop (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x4B24FBC: OPENSSL_cleanup (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x5034C93: __run_exit_handlers (exit.c:111)
==2054==    by 0x5034DC9: exit (exit.c:141)
==2054==    by 0x119F5B: finis (in /usr/sbin/sendmail)
==2054==    by 0x167AA4: doworklist (in /usr/sbin/sendmail)
==2054==    by 0x17E538: smtp_data (in /usr/sbin/sendmail)
==2054==    by 0x182206: smtp (in /usr/sbin/sendmail)
==2054==    by 0x11742F: main (in /usr/sbin/sendmail)
==2054==  Address 0x559f350 is 32 bytes inside a block of size 472 free'd
==2054==    at 0x484308B: free (vg_replace_malloc.c:974)
==2054==    by 0x5069619: _IO_deallocate_file (libioP.h:863)
==2054==    by 0x5069619: fclose@@GLIBC_2.2.5 (iofclose.c:74)
==2054==    by 0x5130E11: _nss_files_initgroups_dyn (files-initgroups.c:126)
==2054==    by 0x50C6BEF: internal_getgrouplist (initgroups.c:101)
==2054==    by 0x50C6F4A: initgroups (initgroups.c:205)
==2054==    by 0x17437C: include (in /usr/sbin/sendmail)
==2054==    by 0x11DEC2: forward (in /usr/sbin/sendmail)
==2054==    by 0x176586: recipient (in /usr/sbin/sendmail)
==2054==    by 0x164B31: readqf (in /usr/sbin/sendmail)
==2054==    by 0x16784C: doworklist (in /usr/sbin/sendmail)
==2054==    by 0x17E538: smtp_data (in /usr/sbin/sendmail)
==2054==    by 0x182206: smtp (in /usr/sbin/sendmail)
==2054==  Block was alloc'd at
==2054==    at 0x48407C4: malloc (vg_replace_malloc.c:431)
==2054==    by 0x5069F9A: __fopen_internal (iofopen.c:65)
==2054==    by 0x5129BFC: __nss_files_fopen (nss_files_fopen.c:27)
==2054==    by 0x5130BF2: _nss_files_initgroups_dyn (files-initgroups.c:36)
==2054==    by 0x50C6BEF: internal_getgrouplist (initgroups.c:101)
==2054==    by 0x50C6F4A: initgroups (initgroups.c:205)
==2054==    by 0x17437C: include (in /usr/sbin/sendmail)
==2054==    by 0x11DEC2: forward (in /usr/sbin/sendmail)
==2054==    by 0x176586: recipient (in /usr/sbin/sendmail)
==2054==    by 0x164B31: readqf (in /usr/sbin/sendmail)
==2054==    by 0x16784C: doworklist (in /usr/sbin/sendmail)
==2054==    by 0x17E538: smtp_data (in /usr/sbin/sendmail)
==2054== 
==2054== Invalid read of size 8
==2054==    at 0x4ADB6B6: err_delete_thread_state (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x4B258C3: init_thread_stop.part.0 (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x4B25B70: OPENSSL_thread_stop (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x4B24FBC: OPENSSL_cleanup (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x5034C93: __run_exit_handlers (exit.c:111)
==2054==    by 0x5034DC9: exit (exit.c:141)
==2054==    by 0x119F5B: finis (in /usr/sbin/sendmail)
==2054==    by 0x167AA4: doworklist (in /usr/sbin/sendmail)
==2054==    by 0x17E538: smtp_data (in /usr/sbin/sendmail)
==2054==    by 0x182206: smtp (in /usr/sbin/sendmail)
==2054==    by 0x11742F: main (in /usr/sbin/sendmail)
==2054==  Address 0x559f410 is 224 bytes inside a block of size 472 free'd
==2054==    at 0x484308B: free (vg_replace_malloc.c:974)
==2054==    by 0x5069619: _IO_deallocate_file (libioP.h:863)
==2054==    by 0x5069619: fclose@@GLIBC_2.2.5 (iofclose.c:74)
==2054==    by 0x5130E11: _nss_files_initgroups_dyn (files-initgroups.c:126)
==2054==    by 0x50C6BEF: internal_getgrouplist (initgroups.c:101)
==2054==    by 0x50C6F4A: initgroups (initgroups.c:205)
==2054==    by 0x17437C: include (in /usr/sbin/sendmail)
==2054==    by 0x11DEC2: forward (in /usr/sbin/sendmail)
==2054==    by 0x176586: recipient (in /usr/sbin/sendmail)
==2054==    by 0x164B31: readqf (in /usr/sbin/sendmail)
==2054==    by 0x16784C: doworklist (in /usr/sbin/sendmail)
==2054==    by 0x17E538: smtp_data (in /usr/sbin/sendmail)
==2054==    by 0x182206: smtp (in /usr/sbin/sendmail)
==2054==  Block was alloc'd at
==2054==    at 0x48407C4: malloc (vg_replace_malloc.c:431)
==2054==    by 0x5069F9A: __fopen_internal (iofopen.c:65)
==2054==    by 0x5129BFC: __nss_files_fopen (nss_files_fopen.c:27)
==2054==    by 0x5130BF2: _nss_files_initgroups_dyn (files-initgroups.c:36)
==2054==    by 0x50C6BEF: internal_getgrouplist (initgroups.c:101)
==2054==    by 0x50C6F4A: initgroups (initgroups.c:205)
==2054==    by 0x17437C: include (in /usr/sbin/sendmail)
==2054==    by 0x11DEC2: forward (in /usr/sbin/sendmail)
==2054==    by 0x176586: recipient (in /usr/sbin/sendmail)
==2054==    by 0x164B31: readqf (in /usr/sbin/sendmail)
==2054==    by 0x16784C: doworklist (in /usr/sbin/sendmail)
==2054==    by 0x17E538: smtp_data (in /usr/sbin/sendmail)
==2054== 
==2054== Invalid write of size 8
==2054==    at 0x4ADB6C6: err_delete_thread_state (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x4B258C3: init_thread_stop.part.0 (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x4B25B70: OPENSSL_thread_stop (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x4B24FBC: OPENSSL_cleanup (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x5034C93: __run_exit_handlers (exit.c:111)
==2054==    by 0x5034DC9: exit (exit.c:141)
==2054==    by 0x119F5B: finis (in /usr/sbin/sendmail)
==2054==    by 0x167AA4: doworklist (in /usr/sbin/sendmail)
==2054==    by 0x17E538: smtp_data (in /usr/sbin/sendmail)
==2054==    by 0x182206: smtp (in /usr/sbin/sendmail)
==2054==    by 0x11742F: main (in /usr/sbin/sendmail)
==2054==  Address 0x559f350 is 32 bytes inside a block of size 472 free'd
==2054==    at 0x484308B: free (vg_replace_malloc.c:974)
==2054==    by 0x5069619: _IO_deallocate_file (libioP.h:863)
==2054==    by 0x5069619: fclose@@GLIBC_2.2.5 (iofclose.c:74)
==2054==    by 0x5130E11: _nss_files_initgroups_dyn (files-initgroups.c:126)
==2054==    by 0x50C6BEF: internal_getgrouplist (initgroups.c:101)
==2054==    by 0x50C6F4A: initgroups (initgroups.c:205)
==2054==    by 0x17437C: include (in /usr/sbin/sendmail)
==2054==    by 0x11DEC2: forward (in /usr/sbin/sendmail)
==2054==    by 0x176586: recipient (in /usr/sbin/sendmail)
==2054==    by 0x164B31: readqf (in /usr/sbin/sendmail)
==2054==    by 0x16784C: doworklist (in /usr/sbin/sendmail)
==2054==    by 0x17E538: smtp_data (in /usr/sbin/sendmail)
==2054==    by 0x182206: smtp (in /usr/sbin/sendmail)
==2054==  Block was alloc'd at
==2054==    at 0x48407C4: malloc (vg_replace_malloc.c:431)
==2054==    by 0x5069F9A: __fopen_internal (iofopen.c:65)
==2054==    by 0x5129BFC: __nss_files_fopen (nss_files_fopen.c:27)
==2054==    by 0x5130BF2: _nss_files_initgroups_dyn (files-initgroups.c:36)
==2054==    by 0x50C6BEF: internal_getgrouplist (initgroups.c:101)
==2054==    by 0x50C6F4A: initgroups (initgroups.c:205)
==2054==    by 0x17437C: include (in /usr/sbin/sendmail)
==2054==    by 0x11DEC2: forward (in /usr/sbin/sendmail)
==2054==    by 0x176586: recipient (in /usr/sbin/sendmail)
==2054==    by 0x164B31: readqf (in /usr/sbin/sendmail)
==2054==    by 0x16784C: doworklist (in /usr/sbin/sendmail)
==2054==    by 0x17E538: smtp_data (in /usr/sbin/sendmail)
==2054== 
==2054== Invalid write of size 8
==2054==    at 0x4ADB6D7: err_delete_thread_state (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x4B258C3: init_thread_stop.part.0 (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x4B25B70: OPENSSL_thread_stop (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x4B24FBC: OPENSSL_cleanup (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x5034C93: __run_exit_handlers (exit.c:111)
==2054==    by 0x5034DC9: exit (exit.c:141)
==2054==    by 0x119F5B: finis (in /usr/sbin/sendmail)
==2054==    by 0x167AA4: doworklist (in /usr/sbin/sendmail)
==2054==    by 0x17E538: smtp_data (in /usr/sbin/sendmail)
==2054==    by 0x182206: smtp (in /usr/sbin/sendmail)
==2054==    by 0x11742F: main (in /usr/sbin/sendmail)
==2054==  Address 0x559f410 is 224 bytes inside a block of size 472 free'd
==2054==    at 0x484308B: free (vg_replace_malloc.c:974)
==2054==    by 0x5069619: _IO_deallocate_file (libioP.h:863)
==2054==    by 0x5069619: fclose@@GLIBC_2.2.5 (iofclose.c:74)
==2054==    by 0x5130E11: _nss_files_initgroups_dyn (files-initgroups.c:126)
==2054==    by 0x50C6BEF: internal_getgrouplist (initgroups.c:101)
==2054==    by 0x50C6F4A: initgroups (initgroups.c:205)
==2054==    by 0x17437C: include (in /usr/sbin/sendmail)
==2054==    by 0x11DEC2: forward (in /usr/sbin/sendmail)
==2054==    by 0x176586: recipient (in /usr/sbin/sendmail)
==2054==    by 0x164B31: readqf (in /usr/sbin/sendmail)
==2054==    by 0x16784C: doworklist (in /usr/sbin/sendmail)
==2054==    by 0x17E538: smtp_data (in /usr/sbin/sendmail)
==2054==    by 0x182206: smtp (in /usr/sbin/sendmail)
==2054==  Block was alloc'd at
==2054==    at 0x48407C4: malloc (vg_replace_malloc.c:431)
==2054==    by 0x5069F9A: __fopen_internal (iofopen.c:65)
==2054==    by 0x5129BFC: __nss_files_fopen (nss_files_fopen.c:27)
==2054==    by 0x5130BF2: _nss_files_initgroups_dyn (files-initgroups.c:36)
==2054==    by 0x50C6BEF: internal_getgrouplist (initgroups.c:101)
==2054==    by 0x50C6F4A: initgroups (initgroups.c:205)
==2054==    by 0x17437C: include (in /usr/sbin/sendmail)
==2054==    by 0x11DEC2: forward (in /usr/sbin/sendmail)
==2054==    by 0x176586: recipient (in /usr/sbin/sendmail)
==2054==    by 0x164B31: readqf (in /usr/sbin/sendmail)
==2054==    by 0x16784C: doworklist (in /usr/sbin/sendmail)
==2054==    by 0x17E538: smtp_data (in /usr/sbin/sendmail)
==2054== 
==2054== Invalid write of size 8
==2054==    at 0x4ADB692: err_delete_thread_state (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x4B258C3: init_thread_stop.part.0 (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x4B25B70: OPENSSL_thread_stop (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x4B24FBC: OPENSSL_cleanup (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x5034C93: __run_exit_handlers (exit.c:111)
==2054==    by 0x5034DC9: exit (exit.c:141)
==2054==    by 0x119F5B: finis (in /usr/sbin/sendmail)
==2054==    by 0x167AA4: doworklist (in /usr/sbin/sendmail)
==2054==    by 0x17E538: smtp_data (in /usr/sbin/sendmail)
==2054==    by 0x182206: smtp (in /usr/sbin/sendmail)
==2054==    by 0x11742F: main (in /usr/sbin/sendmail)
==2054==  Address 0x559f1a0 is 64 bytes inside a block of size 69 alloc'd
==2054==    at 0x48407C4: malloc (vg_replace_malloc.c:431)
==2054==    by 0x4003911: malloc (rtld-malloc.h:56)
==2054==    by 0x4003911: _dl_exception_create_format (dl-exception.c:157)
==2054==    by 0x400A4C7: _dl_lookup_symbol_x (dl-lookup.c:793)
==2054==    by 0x5145E5C: do_sym (dl-sym.c:146)
==2054==    by 0x507AF03: dlsym_doit (dlsym.c:40)
==2054==    by 0x4001488: _dl_catch_exception (dl-catch.c:237)
==2054==    by 0x40015AE: _dl_catch_error (dl-catch.c:256)
==2054==    by 0x507A906: _dlerror_run (dlerror.c:138)
==2054==    by 0x507AF9B: dlsym_implementation (dlsym.c:54)
==2054==    by 0x507AF9B: dlsym@@GLIBC_2.34 (dlsym.c:68)
==2054==    by 0x55E1DF6: winbind_open_pipe_sock (in /usr/lib64/libnss_winbind.so.2)
==2054==    by 0x55E1FD3: winbind_write_sock (in /usr/lib64/libnss_winbind.so.2)
==2054==    by 0x55E21C3: winbindd_send_request.part.0 (in /usr/lib64/libnss_winbind.so.2)
==2054== 
==2054== Invalid write of size 4
==2054==    at 0x4ADB68A: err_delete_thread_state (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x4B258C3: init_thread_stop.part.0 (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x4B25B70: OPENSSL_thread_stop (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x4B24FBC: OPENSSL_cleanup (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x5034C93: __run_exit_handlers (exit.c:111)
==2054==    by 0x5034DC9: exit (exit.c:141)
==2054==    by 0x119F5B: finis (in /usr/sbin/sendmail)
==2054==    by 0x167AA4: doworklist (in /usr/sbin/sendmail)
==2054==    by 0x17E538: smtp_data (in /usr/sbin/sendmail)
==2054==    by 0x182206: smtp (in /usr/sbin/sendmail)
==2054==    by 0x11742F: main (in /usr/sbin/sendmail)
==2054==  Address 0x559f11c is 0 bytes after a block of size 12 alloc'd
==2054==    at 0x48407C4: malloc (vg_replace_malloc.c:431)
==2054==    by 0x55E1599: get_wb_thread_ctx (in /usr/lib64/libnss_winbind.so.2)
==2054==    by 0x55E1CDC: winbindd_request_response (in /usr/lib64/libnss_winbind.so.2)
==2054==    by 0x55E3A5F: _nss_winbind_initgroups_dyn (in /usr/lib64/libnss_winbind.so.2)
==2054==    by 0x50C6BEF: internal_getgrouplist (initgroups.c:101)
==2054==    by 0x50C6F4A: initgroups (initgroups.c:205)
==2054==    by 0x17437C: include (in /usr/sbin/sendmail)
==2054==    by 0x11DEC2: forward (in /usr/sbin/sendmail)
==2054==    by 0x176586: recipient (in /usr/sbin/sendmail)
==2054==    by 0x164B31: readqf (in /usr/sbin/sendmail)
==2054==    by 0x16784C: doworklist (in /usr/sbin/sendmail)
==2054==    by 0x17E538: smtp_data (in /usr/sbin/sendmail)
==2054== 
==2054== Invalid free() / delete / delete[] / realloc()
==2054==    at 0x484308B: free (vg_replace_malloc.c:974)
==2054==    by 0x4ADB6B5: err_delete_thread_state (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x4B258C3: init_thread_stop.part.0 (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x4B25B70: OPENSSL_thread_stop (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x4B24FBC: OPENSSL_cleanup (in /usr/lib64/libcrypto.so.3)
==2054==    by 0x5034C93: __run_exit_handlers (exit.c:111)
==2054==    by 0x5034DC9: exit (exit.c:141)
==2054==    by 0x119F5B: finis (in /usr/sbin/sendmail)
==2054==    by 0x167AA4: doworklist (in /usr/sbin/sendmail)
==2054==    by 0x17E538: smtp_data (in /usr/sbin/sendmail)
==2054==    by 0x182206: smtp (in /usr/sbin/sendmail)
==2054==    by 0x11742F: main (in /usr/sbin/sendmail)
==2054==  Address 0x51ca6a0 is 0 bytes inside data symbol "_IO_2_1_stderr_"
Another observation: adding

@@ -28,6 +28,9 @@
 #include "winbind_client.h"
 #include <assert.h>
 
+#undef HAVE_PTHREAD_H
+#undef HAVE_PTHREAD
+
 #ifdef HAVE_PTHREAD_H
 #include <pthread.h>
 #endif

to wb_common.c also seems to fix the issue. Note, however, that the current code is broken and does not compile in this configuration. The function "get_wb_thread_ctx" is not inside the "#ifdef HAVE_PTHREAD" block. Moving the "#endif" down fixes that.

Another thing I noticed is the inconsistent use of "#ifdef HAVE_PTHREAD_H" vs "#ifdef HAVE_PTHREAD". In particular, "HAVE_PTHREAD_H" is used inside the winbind_destructor function, where I think "HAVE_PTHREAD" should be used instead. Once that is changed, adding only "#undef HAVE_PTHREAD" is enough.
sendmail isn't the only application that crashes because of "nsswitch: leverage TLS if available in favour over global locking". I did a git bisect on samba and ended up with the above commit. I haven't been able to pinpoint exactly why the changes it introduces make zabbix crash. See https://support.zabbix.com/browse/ZBX-22658
Created attachment 18079
Minimalist patch for samba-4.18 to work around the bug

Minimalist patch for samba-4.18 to *work around* the bug by adding "#undef HAVE_PTHREAD". For this to work, it also fixes the two other code issues mentioned in https://bugzilla.samba.org/show_bug.cgi?id=15464#c4. It moves the "#endif" down to also cover the get_wb_thread_ctx function. It also replaces HAVE_PTHREAD_H with HAVE_PTHREAD inside the winbind_destructor function.
Comment on attachment 18079
Minimalist patch for samba-4.18 to work around the bug

This will introduce a thread locking problem. 642a4452ce5b3333c50e41e54bc6ca779686ecc3 needs to be reverted completely in order to work around the problem.
(In reply to Kacper from comment #5)
In both cases openssl is involved, and openssl generates the first invalid writes. So I guess the root of the problem is located there, and it is just triggered by nss_winbind bringing in pthread. The glibc version may also be relevant.
(In reply to Stefan Metzmacher from comment #8)
pthread_key_create() can return 0 as a valid key. And this code in openssl crypto/err/err.c:

static void err_delete_thread_state(void *unused)
{
    ERR_STATE *state = CRYPTO_THREAD_get_local(&err_thread_local);

    if (state == NULL)
        return;

    CRYPTO_THREAD_set_local(&err_thread_local, NULL);
    OSSL_ERR_STATE_free(state);
}

doesn't check whether set_err_thread_local is valid. That means ERR_STATE *state can be non-NULL, coming from somewhere else.
Created attachment 18080
Revert 642a4452ce5b3333c50e41e54bc6ca779686ecc3 and 7545e2c77b69fc57e436e3ed298fdb68033ce49f
(In reply to Stefan Metzmacher from comment #9)
I had a little time to look at the problem today, and I think I have made some progress in debugging.

First, I checked that err_thread_local in the openssl code is normally a non-zero value, like 6 or 7. However, with the NSS library from samba-4.18, it becomes 0, as this is what pthread_key_create in CRYPTO_THREAD_init_local provides.

Next, I discovered that either of the following changes fixes the problem:
- removing this call from wb_thread_ctx_initialize():

    ret = pthread_atfork(NULL, NULL, wb_atfork_child);

- removing pthread_key_delete(wb_global_ctx.key) from wb_atfork_child().

Knowing this is fork-related, I was able to write a simple program that reproduces *the behavior* (not *the bug*):

--- cut here ---
#include <sys/types.h>
#include <grp.h>
#include <stdio.h>
#include <unistd.h>
#include <pthread.h>

int main(void)
{
    int rv;
    pid_t pid;
    pthread_key_t key1a, key1b;
    pthread_key_t key2;
    pthread_key_t key3;

    printf("Starting.\n");
    rv = initgroups("root", 0);
    printf("initgroups: %d\n", rv);

    pthread_key_create(&key1a, NULL);
    pthread_key_create(&key1b, NULL);
    printf("key1a=%u, key1b=%u\n", (unsigned)key1a, (unsigned)key1b);

    fflush(stdout);     /* avoid duplicated output in the child when piped */
    pid = fork();

    pthread_key_create(&key2, NULL);
    pthread_key_create(&key3, NULL);
    printf("Hello after fork (%s), pid=%d, key2=%u, key3=%u\n",
           pid ? "parent" : "child", (int)pid, (unsigned)key2, (unsigned)key3);
    return 0;
}
--- cut here ---

**** With libnss_winbind.so.2 from samba-4.17 I get:
Starting.
initgroups: 0
key1a=0, key1b=1
Hello after fork (parent), pid=5058, key2=2, key3=3
Hello after fork (child), pid=0, key2=2, key3=3

**** With libnss_winbind.so.2 from samba-4.18 I get:
Starting.
initgroups: 0
key1a=1, key1b=2
Hello after fork (parent), pid=5117, key2=3, key3=4
Hello after fork (child), pid=0, key2=0, key3=3

So, with the NSS library from samba-4.17, keys 0+1 are allocated first, then 2+3 in both the parent and the child. With the one from samba-4.18, we get 1+2 first, because key 0 is allocated inside the NSS library. After the fork, the parent gets 3+4 and the child gets 0+3, because key 0 has been released in the child.
Unfortunately I may not have additional time to look at this more today, so sharing what I have learned so far.
(In reply to Krzysztof Olędzki from comment #11)
Alright, here is a potential fix:

--- 1/nsswitch/wb_common.c	2023-09-02 13:38:34.506064173 -0700
+++ 2/nsswitch/wb_common.c	2023-09-06 23:43:49.393985656 -0700
@@ -66,6 +66,12 @@
 	struct winbindd_context *ctx = NULL;
 	int ret;
 
+	if (!wb_global_ctx.initialized) {
+		return;
+	}
+
+	wb_global_ctx.initialized = false;
+
 	ctx = (struct winbindd_context *)pthread_getspecific(wb_global_ctx.key);
 	if (ctx == NULL) {
 		return;

Without this, every time wb_atfork_child() is called, it calls pthread_key_delete() on the original wb_global_ctx.key. It does so even if that key has already been deleted and the same key number has been reused elsewhere.
Created attachment 18081
b15464-testcase.c
Created attachment 18082
Potential fix
(In reply to Krzysztof Olędzki from comment #13)
Buggy:

# ./b15464-testcase
18303: k1=1
18304: Hello after fork, k1=1, k2=0
18305: Hello after fork2, k1=1, k2=0, k3=0
FAIL

Fixed (or samba-4.17):

# ./b15464-testcase
18310: k1=0
18311: Hello after fork, k1=0, k2=1
18312: Hello after fork2, k1=0, k2=1, k3=2
OK
(In reply to Krzysztof Olędzki from comment #15)
Below is the correct output for the "samba-4.18-fixed" case. The output listed above as "fixed" is actually from samba-4.17, i.e. without Thread Local Storage (TLS).

# ./b15464-testcase
18509: k1=1
18510: Hello after fork, k1=1, k2=0
18511: Hello after fork2, k1=1, k2=0, k3=2
OK
(In reply to Krzysztof Olędzki from comment #12)

Great detective work, thanks!

Do you want to provide the fix as git format-patch output? If so, please also see https://wiki.samba.org/index.php/Contribute (skip step 2.2.3 "Fork the Samba repo", just until we get to know you) and jump to https://wiki.samba.org/index.php/Samba_on_GitLab#Other_Samba_developers

Note that there might be an additional bug regarding leaking of winbindd_context structures and their related socket file descriptors. But that's a more complex task and should not hold us back from pushing the fix for the bug that causes invalid writes in unrelated code.
OK, I tried a more complete fix that also tries to avoid fd and memory leaks. It is only compile-tested; see https://gitlab.com/samba-team/samba/-/merge_requests/3259

It would be great if someone could test this in a real setup. It should also fix the problem in this bug, and I'd expect to see output like this:

# ./b15464-testcase
18509: k1=1
18510: Hello after fork, k1=1, k2=2
18511: Hello after fork2, k1=1, k2=2, k3=3
OK
(In reply to Stefan Metzmacher from comment #18)

It seems like a much larger change, so I have not been able to look at the code yet; I just compiled and tested it. Sadly, it triggers an assertion:

b15464-testcase: ../../nsswitch/wb_common.c:99: wb_atfork_child: Assertion `ctx_ptr == NULL' failed.

For now, do you still want me to provide the git patch? I wonder if it makes sense to fix the bug in a simple manner for the 4.18 and 4.19 branches, and then work on a more comprehensive change for 4.20, so we have more time for testing.

Two more comments:

1. For the "fix build without HAVE_PTHREAD" change, should we also rename HAVE_PTHREAD_H to HAVE_PTHREAD inside winbind_destructor in that same patch? Happy to take care of it if you want, BTW.

2. I'm a little bit concerned about "leaking" one atfork handler from a process that accessed libnss_winbind. These handlers seem to be stored in a linked list and require a malloc. There seems to be __unregister_atfork, but I'm not sure when this gets triggered.
(In reply to Krzysztof Olędzki from comment #19)

I pushed a fixed version.

The minimal fix for the problem is this:
https://gitlab.com/samba-team/samba/-/merge_requests/3259/diffs?commit_id=8162c1b0cccc29d6b76567e3e2c41f985ada0cbe

It just avoids calling pthread_key_delete() in wb_atfork_child(). And if wb_atfork_child() was registered, we also know that pthread_key_create() succeeded.
(In reply to Krzysztof Olędzki from comment #19)

A dlclose() removes the atfork handlers...
(In reply to Stefan Metzmacher from comment #20) The patches in https://gitlab.com/samba-team/samba/-/merge_requests/3259 work without issues as far as Zabbix is concerned. Any chance we can get the final fixes in for 4.18.7?
(In reply to Stefan Metzmacher from comment #20)

Thanks!

Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl>

Also, if it matters:
Reported-by: Krzysztof Piotr Oledzki <ole@ans.pl>
(In reply to Kacper from comment #22)

Thanks! I added a regression test based on your reproducer; see:
https://gitlab.com/samba-team/samba/-/merge_requests/3259/diffs?commit_id=30568253df96514d04931a7adbd8a3ab5aaa17ac

I hope someone will review the changes so we can get them into 4.18.7...
(In reply to Stefan Metzmacher from comment #24)

Thanks! If we want a more general regression test (not just the PoC for the bug), then in addition to:

if (k1 == k2 || k2 == k3)

we probably also want to cover k1 == k3:

if (k1 == k2 || k2 == k3 || k1 == k3)
This bug was referenced in samba master: 62af25d44e542548d8cdecb061a6001e0071ee76 4faf806412c4408db25448b1f67c09359ec2f81f 836823e5047d0eb18e66707386ba03b812adfaf8 91b30a7261e6455d3a4f31728c23e4849e3945b9 4af3faace481d23869b64485b791bdd43d8972c5
Created attachment 18103 Patches for v4-19-test
Created attachment 18104 Patches for v4-18-test
Re-assigned to Jule for inclusion in 4.18.next, 4.19.next.
Pushed to autobuild-v4-{19,18}-test.
This bug was referenced in samba v4-19-test: 340b7fd1eec58ccbbfbcf706829b3a8593700cab 61f6f46b26b5207fb411c2d4d4734c3fed0f88a7 9c10f828dfbf44ec09a2ddf9d98bc5248bf5cf22 7d04c32ed7eaacfa7e233a7cc141344041c20fc5 374ba0d2c9a32ade701d7cdd25034692fe055108
This bug was referenced in samba v4-18-test: cb71db6827f2575799d65c8a3560e1748a389889 0ebaac2afe94cf09599970962c66a7cc2761625c 5b9b8b315821c429ecfcb9153aa5308e3c9f5086 3d8e8ed15942374939c95384b5cd03b0162000ad 82d6f8a6ce3918b51a9422101823328084a27ffa
Closing out bug report. Thanks!
This bug was referenced in samba v4-18-stable (Release samba-4.18.7): cb71db6827f2575799d65c8a3560e1748a389889 0ebaac2afe94cf09599970962c66a7cc2761625c 5b9b8b315821c429ecfcb9153aa5308e3c9f5086 3d8e8ed15942374939c95384b5cd03b0162000ad 82d6f8a6ce3918b51a9422101823328084a27ffa
This bug was referenced in samba v4-19-stable (Release samba-4.19.2): 340b7fd1eec58ccbbfbcf706829b3a8593700cab 61f6f46b26b5207fb411c2d4d4734c3fed0f88a7 9c10f828dfbf44ec09a2ddf9d98bc5248bf5cf22 7d04c32ed7eaacfa7e233a7cc141344041c20fc5 374ba0d2c9a32ade701d7cdd25034692fe055108