10725 – smb connections panic - authentication issue? - stacktrace starts in util_pw.c; plus lots of broken nmb processes

Status:	NEW

Alias:	None

Product:	Samba 3.6
Classification:	Unclassified
Component:	User & Group Accounts
Version:	3.6.24
Hardware:	All Linux

Importance:	P5 major
Target Milestone:	---
Assignee:	Samba Bugzilla Account
QA Contact:	Samba QA Contact

URL:
Keywords:

Depends on:
Blocks:

Reported:	2014-07-18 10:28 UTC by Jochem
Modified:	2014-08-02 13:31 UTC
CC List:	1 user

See Also:

Attachments
excerpt from testparm output (1.66 KB, text/plain) 2014-07-18 10:28 UTC, Jochem
backtrace 1 (12.27 KB, text/plain) 2014-07-18 10:29 UTC, Jochem
backtrace 2 (12.23 KB, text/plain) 2014-07-18 10:30 UTC, Jochem
list of times when connections panicked (9.15 KB, text/plain) 2014-07-21 11:53 UTC, Jochem
small log excerpt with panic stacktrace (4.39 KB, text/plain) 2014-07-21 11:54 UTC, Jochem

Description Jochem 2014-07-18 10:28:17 UTC

Created attachment 10120
excerpt from testparm output

Hi,
in our new Samba 3.6.23 and also 3.6.24 installation, smb connections sometimes panic; the panics can accumulate and eventually stop the whole server from working. The problem is not reproducible and happens about every 2 or 3 days. I'm not absolutely sure, but it seems more likely to happen on user logins (and under higher load, so possibly a timing problem?).
We never had this problem with previous versions.

The backend is plain LDAP; winbind is not used.

compile options:
--enable-socket-wrapper --enable-cups --enable-nss-wrapper --with-ldap --with-acl-support --without-ads --enable-pthreadpool --enable-debug --without-wbclient --without-winbind

Attached are an excerpt from the testparm output and two backtraces, both of which start at util_pw.c:82.

Regards.
Jochem

Comment 1 Jochem 2014-07-18 10:29:13 UTC

Created attachment 10121
backtrace 1

Comment 2 Jochem 2014-07-18 10:30:10 UTC

Created attachment 10122
backtrace 2

Comment 3 Volker Lendecke 2014-07-18 11:47:28 UTC

Stared at the code closely. I don't see how this can happen. Do you have any further hints towards a reproducer? What were the users doing? Is the user with rid 1108 in any way special in LDAP?

Comment 4 Jochem 2014-07-21 11:52:35 UTC

(In reply to comment #3)
> Stared at the code closely. I don't see how this can happen. Do you have any
> further hints towards a reproducer? What were the users doing? Is the user with
> rid 1108 in any way special in LDAP?

Hello Volker,
I cannot find anything special about the users in LDAP (all IDs are unique). But I noticed, as can be seen in the log excerpt "samba-panictimes.txt" which I will attach here, that these panics either happen several times within a short period (one or several minutes, as in "log.b020.old") or recur regularly every one or every two(!) hours (as in "/var/log/samba/log.a174"). I couldn't find a corresponding regular entry in the Windows clients' logs, though.
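
To check whether the panics really recur at one- or two-hour intervals, the gaps between consecutive panics can be tallied. The following is only a rough sketch: it assumes each log entry starts with a header such as "[2014/07/21 11:52:35.123456,  0] lib/util.c:1117(smb_panic)" and that the panic itself is logged on a following line containing "PANIC". The function names panic_times and gap_histogram are made up for this illustration and are not part of Samba.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed log header layout: "[YYYY/MM/DD HH:MM:SS.micro, level] file:line(func)"
HEADER = re.compile(r"^\[(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})")


def panic_times(lines):
    """Return the header timestamp preceding each line that contains 'PANIC'."""
    times = []
    last_seen = None
    for line in lines:
        match = HEADER.match(line)
        if match:
            # Remember the timestamp of the most recent header line.
            last_seen = datetime.strptime(match.group(1), "%Y/%m/%d %H:%M:%S")
        elif "PANIC" in line and last_seen is not None:
            times.append(last_seen)
    return times


def gap_histogram(times, bucket_minutes=30):
    """Count gaps between consecutive panics, rounded to the nearest bucket."""
    ordered = sorted(times)
    gaps = [(b - a).total_seconds() / 60 for a, b in zip(ordered, ordered[1:])]
    return Counter(int(round(g / bucket_minutes)) * bucket_minutes for g in gaps)
```

Calling gap_histogram(panic_times(open("/var/log/samba/log.a174"))) would then show how many gaps fall near 60 or 120 minutes, as opposed to the short bursts seen in other logs.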

I get the "ldapsam_getsampwsid: Unable to locate SID..." message (also in pdbedit calls), but I don't think it is important. I will also attach a typical stacktrace written to a logfile.

One question that comes to mind: could it be a problem with parallel logons of one user on several machines (or in different threads), combined with a timing problem? That would be hard to trace.

Regards.
Jochem

Comment 5 Jochem 2014-07-21 11:53:53 UTC

Created attachment 10131
list of times when connections panicked

Comment 6 Jochem 2014-07-21 11:54:50 UTC

Created attachment 10132
small log excerpt with panic stacktrace

Comment 7 Jochem 2014-08-02 13:30:30 UTC

I also get a lot of broken nmb processes whenever the panics happen (about 600 the last time!).

Also, the newest release notes (for Samba 4.1.11 and 4.0.21) list a fix for bug 10735. I grepped for the function mentioned there, "unstrcpy", in the source3 directory, and it is used only in files named nmbd_* within the nmbd directory. It may be a naive idea, but could there be any connection between that bug and the broken nmb processes (and the connection panics)?
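
For anyone who wants to repeat the search, something like the following sketch lists the source files that use an identifier as a whole word. The helper name files_using and the restriction to .c and .h files are assumptions made for this example; a plain recursive grep does the same job.

```python
import re
from pathlib import Path


def files_using(root, identifier, suffixes=(".c", ".h")):
    """Return relative paths of files under root that use identifier as a whole word."""
    pattern = re.compile(r"\b" + re.escape(identifier) + r"\b")
    root = Path(root)
    hits = []
    for path in sorted(root.rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            # Replace undecodable bytes so odd encodings do not abort the scan.
            text = path.read_text(errors="replace")
            if pattern.search(text):
                hits.append(path.relative_to(root).as_posix())
    return hits
```

Run as files_using("source3", "unstrcpy") from the top of the Samba source tree, it should report the same nmbd/nmbd_* files if the observation above holds.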

Kind regards.
Jochem