Bug 11930 - notifyd crashes sometimes when a ctdb internal network interface is brought down
Summary: notifyd crashes sometimes when a ctdb internal network interface is brought down
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: File services
Version: 4.4.3
Hardware: All All
Importance: P5 normal
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
Depends on:
Reported: 2016-05-20 11:10 UTC by Michael Adam
Modified: 2016-06-01 07:40 UTC
1 user

See Also:

proposed fix for master (1.16 KB, patch)
2016-05-20 11:12 UTC, Michael Adam
no flags
patch for v4-4-test, cherry-picked from master (1.39 KB, patch)
2016-05-23 11:35 UTC, Michael Adam
obnox: review+
vl: review+

Description Michael Adam 2016-05-20 11:10:38 UTC
On a ctdb-samba cluster, when an internal ctdb network interface is brought down, notifyd sometimes crashes like this:

(gdb) bt
#0  0x00007f2c471765f7 in raise () from /lib64/libc.so.6
#1  0x00007f2c47177ce8 in abort () from /lib64/libc.so.6
#2  0x00007f2c48ad6beb in dump_core () at ../source3/lib/dumpcore.c:322
#3  0x00007f2c48ac9fe7 in smb_panic_s3 (why=<optimized out>) at ../source3/lib/util.c:814
#4  0x00007f2c4afbb57f in smb_panic (why=why@entry=0x7f2c4b00254a "internal error") at ../lib/util/fault.c:166
#5  0x00007f2c4afbb796 in fault_report (sig=<optimized out>) at ../lib/util/fault.c:83
#6  sig_fault (sig=<optimized out>) at ../lib/util/fault.c:94
#7  <signal handler called>
#8  dbwrap_traverse_read (db=0x0, f=f@entry=0x7f2c4aabe210 <notifyd_db_del_syswatches>, private_data=private_data@entry=0x0, count=count@entry=0x0)
    at ../lib/dbwrap/dbwrap.c:361
#9  0x00007f2c4aabbc40 in notifyd_peer_destructor (p=p@entry=0x7f2c4c9a8e60) at ../source3/smbd/notifyd/notifyd.c:1249
#10 0x00007f2c47714e80 in _talloc_free_internal (location=<optimized out>, ptr=<optimized out>) at ../talloc.c:1046
#11 _talloc_free (ptr=0x7f2c4c9a8e60, location=0x7f2c4ac73ac0 "../source3/smbd/notifyd/notifyd.c:1154") at ../talloc.c:1647
#12 0x00007f2c4aabcd08 in notifyd_clean_peers_next (subreq=<optimized out>) at ../source3/smbd/notifyd/notifyd.c:1154
#13 0x00007f2c4750ab4f in tevent_common_loop_timer_delay (ev=ev@entry=0x7f2c4c998df0) at ../tevent_timed.c:341
#14 0x00007f2c48adf3f9 in run_events_poll (ev=0x7f2c4c998df0, pollrtn=0, pfds=0x7f2c4c9a7f50, num_pfds=4) at ../source3/lib/events.c:199
#15 0x00007f2c48adf5f0 in s3_event_loop_once (ev=0x7f2c4c998df0, location=<optimized out>) at ../source3/lib/events.c:326
#16 0x00007f2c4750640d in _tevent_loop_once (ev=ev@entry=0x7f2c4c998df0, location=location@entry=0x7f2c4750c5c5 "../tevent_req.c:256") at ../tevent.c:533
#17 0x00007f2c475076df in tevent_req_poll (req=req@entry=0x7f2c4c9a5440, ev=ev@entry=0x7f2c4c998df0) at ../tevent_req.c:256
#18 0x00007f2c4b653f03 in smbd_notifyd_init (interactive=false, msg=0x7f2c4c998ee0) at ../source3/smbd/server.c:411
#19 main (argc=<optimized out>, argv=<optimized out>) at ../source3/smbd/server.c:1597
Comment 1 Michael Adam 2016-05-20 11:11:46 UTC
It seems legitimate for a peer in the list processed by notifyd_clean_peers_next() to have p->db == NULL. This has been seen in a ctdb cluster when a node-internal ctdb interface is brought down.

So we seem to need a NULL check in notifyd_peer_destructor() (frame #9 above) before p->db is passed to dbwrap_traverse_read().
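The NULL check suggested here can be illustrated with a minimal, self-contained sketch. It does not reproduce Samba's real types or the attached patch: fake_db, fake_peer, fake_traverse_read() and fake_peer_destructor() are hypothetical stand-ins for the dbwrap handle, the notifyd peer, dbwrap_traverse_read() and notifyd_peer_destructor() seen in the backtrace.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a dbwrap database handle (not Samba's type). */
struct fake_db {
	int num_records;
};

/* Hypothetical stand-in for a notifyd peer; db may legitimately be NULL. */
struct fake_peer {
	struct fake_db *db;
};

/* Total number of records visited by traversals, for illustration only. */
static int records_visited;

/* Like dbwrap_traverse_read() in frame #8, this dereferences db
 * unconditionally, so passing NULL would crash the process. */
static int fake_traverse_read(struct fake_db *db)
{
	records_visited += db->num_records;
	return db->num_records;
}

/* Destructor sketch: skip the traversal when the peer has no database
 * instead of handing a NULL pointer to the traverse function. */
static int fake_peer_destructor(struct fake_peer *p)
{
	if (p->db != NULL) {
		fake_traverse_read(p->db);
	}
	return 0;
}
```

Called on a peer whose db is NULL, fake_peer_destructor() returns without touching any database, whereas calling fake_traverse_read(NULL) directly would dereference a NULL pointer, matching the db=0x0 seen in frame #8.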
Comment 2 Michael Adam 2016-05-20 11:12:51 UTC
Created attachment 12119
proposed fix for master

I think this is the right fix. Will send to samba-technical.
Comment 3 Michael Adam 2016-05-23 11:35:02 UTC
Created attachment 12128
patch for v4-4-test, cherry-picked from master
Comment 4 Michael Adam 2016-05-23 11:35:29 UTC
Comment on attachment 12128
patch for v4-4-test, cherry-picked from master

Requesting ACK for getting this into 4.4.
Comment 5 Michael Adam 2016-05-23 12:35:50 UTC
Assigning to Karo for inclusion into 4.4.NEXT.
Comment 6 Karolin Seeger 2016-05-30 09:46:26 UTC
(In reply to Michael Adam from comment #5)
Pushed to autobuild-v4-4-test.
Comment 7 Karolin Seeger 2016-06-01 07:40:54 UTC
(In reply to Karolin Seeger from comment #6)
Pushed to v4-4-test.
Closing out bug report.