I'm trying to sync local Unix account passwords to the Samba smbpasswd db using PAM. When I run passwd, it segfaults after it finishes and produces a core dump. The odd thing is that it works: the user's local Unix password gets synced to the smbpasswd db, but the process still segfaults. Below are my relevant config files. Is this a bug, or am I doing something wrong? This problem is persistent.
security = user
passdb backend = smbpasswd
# $FreeBSD: src/etc/pam.d/passwd,v 188.8.131.52 2010/02/10 00:26:20 kensmith Exp $
# PAM configuration for the "passwd" service
# passwd(1) does not use the auth, account or session services.
#password requisite pam_passwdqc.so enforce=users
password required pam_unix.so no_warn
password optional /usr/local/lib/pam_smbpass.so
[root@localhost ~]# passwd
Changing local password for root
Retype New Password:
Segmentation fault: 11 (core dumped)
I have the latest port version of Samba, samba34-3.4.8. The core dump does not give much information; here is a snippet from the end of the backtrace. Here is a link to the end of the truss trace of the process.
#636 0x792f6e69622f7273 in ?? ()
#637 0x0064777373617070 in ?? ()
#638 0x247c8d48002454ff in ?? ()
#639 0x01a1c0c748006a10 in ?? ()
#640 0x66fdebf4050f0000 in ?? ()
#641 0x9066669066669066 in ?? ()
#642 0x00007fffffffec18 in ?? ()
#643 0x0000000000000001 in ?? ()
#644 0x00007fffffffec28 in ?? ()
#645 0x0000000000000010 in ?? ()
Cannot access memory at address 0x800000000000
Jul 19 10:11:49 kernel: Jul 19 10:11:49 kernel: pid 58460 (passwd), uid 0: exited on signal 11
Reproduced with FreeBSD 8.0 and Samba master. Under Debian lenny it works fine. Valgrind under FreeBSD does not show anything enlightening. Now to compile the FreeBSD "passwd" utility with debug symbols....
Volker, is this one a blocker for 3.4.10 or can we lower severity here?
Ok, this took a while to find, and it's not easy to solve.
We call talloc_autofree_context() from deep inside pam_smbpass. This calls atexit(). pam_smbpass.so is loaded via dlopen() and later dlclose'd. The code of the atexit handler is unmapped along with the library, but its entry is not removed from the atexit list, so the process crashes at exit time when that stale handler is called.
Linux and OpenSolaris handle this case; FreeBSD says "don't use atexit in .so's" (search the net for "freebsd atexit dlclose").
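For readers following along, the pattern described above can be sketched as follows. This is a simplified model and not talloc's actual code: get_autofree_context(), autofree_handler() and autofree_registrations() are hypothetical names, and a plain malloc() stands in for the talloc context. The first call lazily registers an exit handler, which is harmless in a normal executable but leaves a dangling function pointer in the atexit list if the code lives in a shared object that is dlclose'd before exit.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a lazily created "autofree" context. */
static void *autofree_ctx = NULL;
static int registrations = 0;

/* Runs at process exit and releases the context. */
static void autofree_handler(void)
{
    free(autofree_ctx);
    autofree_ctx = NULL;
}

/*
 * The first call allocates the context and registers the exit handler;
 * later calls return the same pointer.
 *
 * If this function lived in a dlopen()ed library, the atexit list would
 * hold the address of autofree_handler(). After dlclose() that address may
 * no longer be mapped, and a libc that does not drop the entry would jump
 * to it at exit time.
 */
void *get_autofree_context(void)
{
    if (autofree_ctx == NULL) {
        autofree_ctx = malloc(1);
        if (autofree_ctx == NULL) {
            return NULL;
        }
        if (atexit(autofree_handler) != 0) {
            free(autofree_ctx);
            autofree_ctx = NULL;
            return NULL;
        }
        registrations++;
    }
    return autofree_ctx;
}

/* Number of times an exit handler was registered (for inspection only). */
int autofree_registrations(void)
{
    return registrations;
}
```

The caller never sees the atexit() call, which is why a function that looks like a plain getter can make a PAM module unsafe to unload.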
One solution would be to replace all calls to talloc_autofree_context() with NULL. I think we can achieve the same effect (not polluting valgrind output) by calling talloc_enable_null_tracking() at program start and talloc_disable_null_tracking() at the end of the program, or in the _init and _fini routines of pam_smbpass.so.
Hmm. talloc_disable_null_tracking() does not free the null_context children. But calling talloc_enable_null_tracking() at startup should stop the valgrind confusion. The question is then: how can we delete the null context children at dlclose time? Maybe it's better to just leak memory instead of crashing. If we have more than one shlib all depending on the same talloc, this would become pretty tricky very quickly.
Created attachment 5975
Patch for 3.4 and 3.5
This solves the problem. I don't expect this to go in, but it is the minimum necessary change to fix the crash.
Comment on attachment 5975
Patch for 3.4 and 3.5
Jeremy, what do you think?
It might be ok. I need some time to audit all uses of talloc_autofree_context() to ensure that we don't depend on a destructor being fired on process termination.
Give me a few more days to review please.
Jeremy, assigning to you as an additional reminder that this needs someone's review. I've now gone through almost all of the talloc_autofree_context() uses in master, none of which really requires a special destructor.
Probably too late for 3.5.6 unfortunately :-(
I think we need to use a library destructor function,
git grep DESTRUCTOR_ATTRIBUTE shows some examples...
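A minimal sketch of what a library destructor looks like, for illustration only. It assumes GCC or Clang; DESTRUCTOR_ATTRIBUTE is defined here directly as the compiler attribute (the real build defines it through its own configure checks), and mod_alloc(), mod_free_all(), mod_live_count() and mod_fini() are hypothetical names, not Samba or talloc functions. The idea is that cleanup is tied to the shared object being unloaded (or the process exiting) rather than to the process-wide atexit list.

```c
#include <stdlib.h>

/* Assumption: GCC/Clang attribute syntax, defined here only for the sketch. */
#ifndef DESTRUCTOR_ATTRIBUTE
#define DESTRUCTOR_ATTRIBUTE __attribute__((destructor))
#endif

#define MOD_MAX_ALLOCS 16

/* Hypothetical per-library list of allocations to release on unload. */
static void *mod_allocs[MOD_MAX_ALLOCS];
static int mod_live = 0;

/* Allocate memory and remember it so the library can free it later. */
void *mod_alloc(size_t size)
{
    void *p;

    if (mod_live >= MOD_MAX_ALLOCS) {
        return NULL;
    }
    p = malloc(size);
    if (p != NULL) {
        mod_allocs[mod_live++] = p;
    }
    return p;
}

/* Free everything this library still owns. Safe to call more than once. */
void mod_free_all(void)
{
    while (mod_live > 0) {
        free(mod_allocs[--mod_live]);
    }
}

/* Number of allocations currently tracked (for inspection only). */
int mod_live_count(void)
{
    return mod_live;
}

/* Invoked by the runtime when the object is unloaded or the process exits. */
static void mod_fini(void) DESTRUCTOR_ATTRIBUTE;
static void mod_fini(void)
{
    mod_free_all();
}
```

Because the destructor belongs to the shared object itself, the runtime calls it while the library's code is still mapped. The order in which destructors of different libraries run is a separate question that this sketch does not address.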
While pam_smbpass has been removed, the fundamental issue remains until talloc_autofree_context() is no longer used. This continues to be worked on.
Fixed in talloc-2.1.12
(In reply to Stefan Metzmacher from comment #11)
Using a library destructor doesn't seem to be a good idea,
see bug #13366.
Samba doesn't use talloc_autofree_context() anymore since 4.7.
*** This bug has been marked as a duplicate of bug 12932 ***