On a ctdb-samba cluster, when an internal ctdb network interface is brought down, notifyd is sometimes seen to crash like this:

(gdb) bt
#0  0x00007f2c471765f7 in raise () from /lib64/libc.so.6
#1  0x00007f2c47177ce8 in abort () from /lib64/libc.so.6
#2  0x00007f2c48ad6beb in dump_core () at ../source3/lib/dumpcore.c:322
#3  0x00007f2c48ac9fe7 in smb_panic_s3 (why=<optimized out>) at ../source3/lib/util.c:814
#4  0x00007f2c4afbb57f in smb_panic (why=why@entry=0x7f2c4b00254a "internal error") at ../lib/util/fault.c:166
#5  0x00007f2c4afbb796 in fault_report (sig=<optimized out>) at ../lib/util/fault.c:83
#6  sig_fault (sig=<optimized out>) at ../lib/util/fault.c:94
#7  <signal handler called>
#8  dbwrap_traverse_read (db=0x0, f=f@entry=0x7f2c4aabe210 <notifyd_db_del_syswatches>, private_data=private_data@entry=0x0, count=count@entry=0x0) at ../lib/dbwrap/dbwrap.c:361
#9  0x00007f2c4aabbc40 in notifyd_peer_destructor (p=p@entry=0x7f2c4c9a8e60) at ../source3/smbd/notifyd/notifyd.c:1249
#10 0x00007f2c47714e80 in _talloc_free_internal (location=<optimized out>, ptr=<optimized out>) at ../talloc.c:1046
#11 _talloc_free (ptr=0x7f2c4c9a8e60, location=0x7f2c4ac73ac0 "../source3/smbd/notifyd/notifyd.c:1154") at ../talloc.c:1647
#12 0x00007f2c4aabcd08 in notifyd_clean_peers_next (subreq=<optimized out>) at ../source3/smbd/notifyd/notifyd.c:1154
#13 0x00007f2c4750ab4f in tevent_common_loop_timer_delay (ev=ev@entry=0x7f2c4c998df0) at ../tevent_timed.c:341
#14 0x00007f2c48adf3f9 in run_events_poll (ev=0x7f2c4c998df0, pollrtn=0, pfds=0x7f2c4c9a7f50, num_pfds=4) at ../source3/lib/events.c:199
#15 0x00007f2c48adf5f0 in s3_event_loop_once (ev=0x7f2c4c998df0, location=<optimized out>) at ../source3/lib/events.c:326
#16 0x00007f2c4750640d in _tevent_loop_once (ev=ev@entry=0x7f2c4c998df0, location=location@entry=0x7f2c4750c5c5 "../tevent_req.c:256") at ../tevent.c:533
#17 0x00007f2c475076df in tevent_req_poll (req=req@entry=0x7f2c4c9a5440, ev=ev@entry=0x7f2c4c998df0) at ../tevent_req.c:256
#18 0x00007f2c4b653f03 in smbd_notifyd_init (interactive=false, msg=0x7f2c4c998ee0) at ../source3/smbd/server.c:411
#19 main (argc=<optimized out>, argv=<optimized out>) at ../source3/smbd/server.c:1597
It seems valid that p->db can be NULL for a peer in the list walked by notifyd_clean_peers_next(). This has been seen in a ctdb cluster when a node-internal ctdb interface is brought down. Frame #8 shows dbwrap_traverse_read() being called with db=0x0 from notifyd_peer_destructor(), so we seem to need a NULL check there.
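To illustrate the kind of NULL check meant here, the following is a minimal, self-contained sketch. It is not the attached patch and does not use the real Samba types: struct fake_db, struct fake_peer, fake_traverse_read() and fake_peer_destructor() are hypothetical stand-ins for the dbwrap handle, the notifyd peer, dbwrap_traverse_read() and notifyd_peer_destructor(). It only assumes that the traverse dereferences its db argument, as the crash in frame #8 suggests.

```c
#include <stddef.h>

/* Hypothetical stand-in for a dbwrap database handle. */
struct fake_db {
	int num_records;
};

/* Hypothetical stand-in for a notifyd peer; db may legitimately be NULL. */
struct fake_peer {
	struct fake_db *db;
};

/*
 * Stand-in for a read-only traverse. It dereferences db unconditionally,
 * so calling it with db == NULL would crash.
 */
static int fake_traverse_read(struct fake_db *db, int *count)
{
	if (count != NULL) {
		*count = db->num_records;
	}
	return 0;
}

/*
 * Destructor-style cleanup with the guard: a peer without a database
 * has nothing to traverse, so return early instead of passing NULL on.
 * *visited receives the number of records the traverse reported.
 */
static int fake_peer_destructor(struct fake_peer *p, int *visited)
{
	*visited = 0;

	if (p->db == NULL) {
		/* Nothing was ever recorded for this peer. */
		return 0;
	}

	return fake_traverse_read(p->db, visited);
}
```

With this guard, freeing a peer whose db was never set up becomes a no-op, while a peer with a database is still traversed as before.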
Created attachment 12119 [details]
proposed fix for master

I think this is the right fix. Will send to samba-technical.
Created attachment 12128 [details] patch for v4-4-test, cherry-picked from master
Comment on attachment 12128 [details]
patch for v4-4-test, cherry-picked from master

Requesting ACK for getting this into 4.4.
Assigning to Karo for inclusion into 4.4.NEXT.
(In reply to Michael Adam from comment #5) Pushed to autobuild-v4-4-test.
(In reply to Karolin Seeger from comment #6) Pushed to v4-4-test. Closing out bug report. Thanks!