11930 – notifyd crashes sometimes when a ctdb internal network interface is brought down

Summary: notifyd crashes sometimes when a ctdb internal network interface is brought down

Status:	RESOLVED FIXED

Alias:	None

Product:	Samba 4.1 and newer
Classification:	Unclassified
Component:	File services
Version:	4.4.3
Hardware:	All All

Importance:	P5 normal
Target Milestone:	---
Assignee:	Karolin Seeger
QA Contact:	Samba QA Contact

URL:
Keywords:

Depends on:
Blocks:

Reported:	2016-05-20 11:10 UTC by Michael Adam
Modified:	2016-06-01 07:40 UTC
CC List:	1 user

See Also:

Attachments
proposed fix for master (1.16 KB, patch) 2016-05-20 11:12 UTC, Michael Adam	no flags
patch for v4-4-test, cherry-picked from master (1.39 KB, patch) 2016-05-23 11:35 UTC, Michael Adam	obnox: review+ vl: review+

Description Michael Adam 2016-05-20 11:10:38 UTC

On a ctdb-samba cluster, when an internal ctdb network interface is brought down, notifyd is sometimes seen to crash like this:

(gdb) bt
#0  0x00007f2c471765f7 in raise () from /lib64/libc.so.6
#1  0x00007f2c47177ce8 in abort () from /lib64/libc.so.6
#2  0x00007f2c48ad6beb in dump_core () at ../source3/lib/dumpcore.c:322
#3  0x00007f2c48ac9fe7 in smb_panic_s3 (why=<optimized out>) at ../source3/lib/util.c:814
#4  0x00007f2c4afbb57f in smb_panic (why=why@entry=0x7f2c4b00254a "internal error") at ../lib/util/fault.c:166
#5  0x00007f2c4afbb796 in fault_report (sig=<optimized out>) at ../lib/util/fault.c:83
#6  sig_fault (sig=<optimized out>) at ../lib/util/fault.c:94
#7  <signal handler called>
#8  dbwrap_traverse_read (db=0x0, f=f@entry=0x7f2c4aabe210 <notifyd_db_del_syswatches>, private_data=private_data@entry=0x0, count=count@entry=0x0)
    at ../lib/dbwrap/dbwrap.c:361
#9  0x00007f2c4aabbc40 in notifyd_peer_destructor (p=p@entry=0x7f2c4c9a8e60) at ../source3/smbd/notifyd/notifyd.c:1249
#10 0x00007f2c47714e80 in _talloc_free_internal (location=<optimized out>, ptr=<optimized out>) at ../talloc.c:1046
#11 _talloc_free (ptr=0x7f2c4c9a8e60, location=0x7f2c4ac73ac0 "../source3/smbd/notifyd/notifyd.c:1154") at ../talloc.c:1647
#12 0x00007f2c4aabcd08 in notifyd_clean_peers_next (subreq=<optimized out>) at ../source3/smbd/notifyd/notifyd.c:1154
#13 0x00007f2c4750ab4f in tevent_common_loop_timer_delay (ev=ev@entry=0x7f2c4c998df0) at ../tevent_timed.c:341
#14 0x00007f2c48adf3f9 in run_events_poll (ev=0x7f2c4c998df0, pollrtn=0, pfds=0x7f2c4c9a7f50, num_pfds=4) at ../source3/lib/events.c:199
#15 0x00007f2c48adf5f0 in s3_event_loop_once (ev=0x7f2c4c998df0, location=<optimized out>) at ../source3/lib/events.c:326
#16 0x00007f2c4750640d in _tevent_loop_once (ev=ev@entry=0x7f2c4c998df0, location=location@entry=0x7f2c4750c5c5 "../tevent_req.c:256") at ../tevent.c:533
#17 0x00007f2c475076df in tevent_req_poll (req=req@entry=0x7f2c4c9a5440, ev=ev@entry=0x7f2c4c998df0) at ../tevent_req.c:256
#18 0x00007f2c4b653f03 in smbd_notifyd_init (interactive=false, msg=0x7f2c4c998ee0) at ../source3/smbd/server.c:411
#19 main (argc=<optimized out>, argv=<optimized out>) at ../source3/smbd/server.c:1597

Comment 1 Michael Adam 2016-05-20 11:11:46 UTC

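It seems to be a valid state for a peer in the list walked by notifyd_clean_peers_next() to have p->db == NULL. This has been seen in a ctdb cluster when a node-internal ctdb interface is brought down: as frames #8 and #9 of the backtrace show, notifyd_peer_destructor() then passes the NULL p->db to dbwrap_traverse_read().

So we seem to need a NULL check there.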

Comment 2 Michael Adam 2016-05-20 11:12:51 UTC

Created attachment 12119
proposed fix for master

I think this is the right fix. Will send it to samba-technical.

Comment 3 Michael Adam 2016-05-23 11:35:02 UTC

Created attachment 12128
patch for v4-4-test, cherry-picked from master

Comment 4 Michael Adam 2016-05-23 11:35:29 UTC

Comment on attachment 12128
patch for v4-4-test, cherry-picked from master

Requesting ACK for getting this into 4.4.

Comment 5 Michael Adam 2016-05-23 12:35:50 UTC

Assigning to Karo for inclusion into 4.4.NEXT.

Comment 6 Karolin Seeger 2016-05-30 09:46:26 UTC

(In reply to Michael Adam from comment #5)
Pushed to autobuild-v4-4-test.

Comment 7 Karolin Seeger 2016-06-01 07:40:54 UTC

(In reply to Karolin Seeger from comment #6)
Pushed to v4-4-test.
Closing out bug report.

Thanks!