Bug 7587 - libtalloc bug makes pam_smbpass & passwd seg fault on FreeBSD 7.3
libtalloc bug makes pam_smbpass & passwd seg fault on FreeBSD 7.3
Status: NEW
Product: Samba 4.1 and newer
Classification: Unclassified
Component: Other
unspecified
x64 FreeBSD
: P3 critical
: ---
Assigned To: Jeremy Allison
Samba QA Contact
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2010-07-26 11:41 UTC by Michael
Modified: 2015-12-07 22:31 UTC (History)
2 users (show)

See Also:


Attachments
Patch for 3.4 and 3.5 (953 bytes, patch)
2010-09-22 16:09 UTC, Volker Lendecke
vl: review? (jra)
bjacke: review? (metze)
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Michael 2010-07-26 11:41:33 UTC
I'm trying to sync the local unix account passwords to the samba smbpass db using pam. When i run passwd, after it's done it seg faults and produces a core dump. The odd thing is that it works, the users local unix password gets synced to the smbpass db, but it seg faults. Below are my relevant config files. Is this a bug or am I doing something wrong? This problem is persistent. 

/usr/local/etc/smb.conf

security = user
passdb backend = smbpasswd

cat /etc/pam.d/passwd
#
# $FreeBSD: src/etc/pam.d/passwd,v 1.3.36.1 2010/02/10 00:26:20 kensmith Exp $
#
# PAM configuration for the "passwd" service
#

# passwd(1) does not use the auth, account or session services.

# password
#password       requisite       pam_passwdqc.so         enforce=users
password        required        pam_unix.so             no_warn
try_first_pass nullok
password        optional        /usr/local/lib/pam_smbpass.so
try_first_pass smbconf=/usr/local/etc/smb.conf

[root@localhost ~]# passwd
Changing local password for root
New Password:
Retype New Password:
Segmentation fault: 11 (core dumped)

I have the latest port version of samba, samba34-3.4.8. The core dump does not give much info here is a snippet of the end of the trace. Here is a link to the end of the truss trace of the process.
http://mmcgrew.net/out


#636 0x792f6e69622f7273 in ?? ()
#637 0x0064777373617070 in ?? ()
#638 0x247c8d48002454ff in ?? ()
#639 0x01a1c0c748006a10 in ?? ()
#640 0x66fdebf4050f0000 in ?? ()
#641 0x9066669066669066 in ?? ()
#642 0x00007fffffffec18 in ?? ()
#643 0x0000000000000001 in ?? ()
#644 0x00007fffffffec28 in ?? ()
#645 0x0000000000000010 in ?? ()
Cannot access memory at address 0x800000000000

tail /var/log/messages
Jul 19 10:11:49 kernel: Jul 19 10:11:49 kernel: pid 58460 (passwd),
uid 0: exited on signal 11
Comment 1 Volker Lendecke 2010-08-02 06:51:17 UTC
Reproduced with FreeBSD 8.0 and Samba master. Under debian lenny it works fine. Valgrind under FreeBSD does not give anything enlightening. Now to compile the FreeBSD "passwd" utility with debug symbols....

Volker
Comment 2 Karolin Seeger 2010-09-21 03:41:28 UTC
Volker, is this one a blocker for 3.4.10 or can we lower severity here?
Comment 3 Volker Lendecke 2010-09-22 01:38:08 UTC
Ok, this took a while to find, and it's not easy to solve.

We call talloc_autofree_context() from deep inside pam_smbpass. This calls atexit. pam_smbpass.so is loaded via dlopen and later dlclose'd. The atexit handler goes away, but it is not removed from the atexit list -> crash at exit time.

Linux and OpenSolaris solve this, FreeBSD says "don't use atexit in .so's". (search the net for "freebsd atexit dlclose").

One solution would be to replace all calls to talloc_autofree_context() by NULL. I think we can achieve the same effect (not pollute valgrind output) by doing a talloc_enable_null_tracking() at the program start and a talloc_disable_null_tracking() at the end of the program or in the _init and _fini routines of pam_smbpass.so.

Volker
Comment 4 Volker Lendecke 2010-09-22 01:49:58 UTC
Hmm. talloc_disable_null_tracking does not free the null_context children. But by starting talloc_enable_null_tracking should stop valgrind confusion. The question is then -- how can we delete the null context children at dlclose time? Maybe it's better to just leak memory instead of crashing. If we have more than one shlib all depending on the same talloc, this would become pretty tricky very quickly.

Volker
Comment 5 Volker Lendecke 2010-09-22 16:09:12 UTC
Created attachment 5975 [details]
Patch for 3.4 and 3.5

This solves the problem. I don't expect this to go in, but it is the minimum necessary change to fix the crash.

Comments?

Volker
Comment 6 Volker Lendecke 2010-09-24 19:46:48 UTC
Comment on attachment 5975 [details]
Patch for 3.4 and 3.5

Jeremy, what do you think?

Volker
Comment 7 Jeremy Allison 2010-09-25 13:50:30 UTC
It might be ok. I need some time to audit all uses of talloc_autofree_context() to ensure that we don't depend on a destructor being fired on process termination.

Give me a few more days to review please.

Jeremy.
Comment 8 Volker Lendecke 2010-10-08 05:38:54 UTC
Jeremy, assigning to you as an additional reminder that this needs someone's review. I've now gone through almost all of the talloc_autofree_uses in master, none of which really requires a special destructor.

Probably too late for 3.5.6 unfortunately :-(

Volker
Comment 9 Stefan Metzmacher 2015-11-19 10:01:07 UTC
I think we need to use a library destructor function,
git grep DESTRUCTOR_ATTRIBUTE shows some examples...