A coredump of the winbind server service shows the attached backtrace in gdb. The backtrace indicates that the abort occurred due to a NULL-valued (dcecli_connection *)->transport.private_data in the source4/librpc/rpc/dcerpc_smb.c:pipe_dead function, which was called by source4/librpc/rpc/dcerpc_smb.c:smb_trans_callback as a reaction to NT_STATUS_IO_TIMEOUT.
Created attachment 7447
backtrace of the winbind core

Maybe it's important to note that the following two patches to winbind were not yet included during compilation of the Samba sources:
* http://gitweb.samba.org/samba.git/?p=samba.git;a=commit;h=692c42c42731b017310e07549489c3ab0bca7d12
* http://gitweb.samba.org/samba.git/?p=samba.git;a=commit;h=71587285ccf78547ee4830b03d8a1493412504a5
Created attachment 7448
patch proposal to handle smb == NULL in pipe_dead
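For readers without access to the attachment, the kind of guard being proposed can be sketched roughly as follows. This is not the attached patch and not the Samba source: the struct and function names (sketch_connection, sketch_smb_private, sketch_pipe_dead) and the return-code convention are hypothetical stand-ins, chosen only to show a "dead pipe" handler that tolerates transport.private_data already being NULL when a late timeout callback arrives.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the real Samba structures. */
struct sketch_smb_private {
	int dead;                     /* set once the pipe has been marked dead */
};

struct sketch_connection {
	struct {
		void *private_data;   /* may already be NULL after teardown */
	} transport;
	int dead_notifications;       /* how often the "dead" path actually ran */
};

/*
 * Mark the pipe as dead.
 * Returns 0 if the transport state was present (and is now marked dead),
 * -1 if the connection or its transport state was already gone, in which
 * case nothing is dereferenced.
 */
static int sketch_pipe_dead(struct sketch_connection *c)
{
	struct sketch_smb_private *smb;

	if (c == NULL) {
		return -1;
	}

	smb = c->transport.private_data;
	if (smb == NULL) {
		/* Late callback (e.g. after a timeout): state already torn down. */
		return -1;
	}

	if (smb->dead) {
		/* Already handled once; do not notify twice. */
		return 0;
	}

	smb->dead = 1;
	c->dead_notifications++;
	return 0;
}
```

With this shape, a callback that fires after the transport state has been cleared returns early instead of dereferencing a NULL pointer. Whether silently returning is the right behaviour, as opposed to aborting loudly, is the question raised in the comments that follow.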
Does it still work if you use talloc_get_type_abort()? What I'm getting at is: is this a wild (non)talloc pointer, which talloc_get_type() returns NULL for, or is this really a NULL pointer to begin with?
talloc_get_type_abort() would be correct. There are many fixes in master (alpha19) that might fix this problem.
The backtrace indicates that it was a NULL pointer. I guess that it is supposed to be a talloc pointer, as e.g. smb_read_callback uses talloc_get_type to fetch it. Wouldn't using talloc_get_type_abort result in a segfault as well?

Anyway... @Metze: I'm following the commits and looking out for commit sets to pull. We have been publishing a couple of updates for the really important ones. I'll check again.
Arvid, can we close this one? I really think this is fixed in master...
I'm fine with that. If it happens again, I'll reopen :-)
Thanks