Bug 10725 - smb connections panic - authentication issue? - stacktrace starts in util_pw.c; plus lots of broken nmb processes
Status: NEW
Alias: None
Product: Samba 3.6
Classification: Unclassified
Component: User & Group Accounts
Version: 3.6.24
Hardware: All Linux
Importance: P5 major
Target Milestone: ---
Assignee: Samba Bugzilla Account
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2014-07-18 10:28 UTC by Jochem
Modified: 2014-08-02 13:31 UTC
CC List: 1 user

See Also:


Attachments
excerpt from testparm output (1.66 KB, text/plain)
2014-07-18 10:28 UTC, Jochem
backtrace 1 (12.27 KB, text/plain)
2014-07-18 10:29 UTC, Jochem
backtrace 2 (12.23 KB, text/plain)
2014-07-18 10:30 UTC, Jochem
list of times when connections panicked (9.15 KB, text/plain)
2014-07-21 11:53 UTC, Jochem
small log excerpt with panic stacktrace (4.39 KB, text/plain)
2014-07-21 11:54 UTC, Jochem

Description Jochem 2014-07-18 10:28:17 UTC
Created attachment 10120
excerpt from testparm output

Hi,
in our new Samba 3.6.23 installation, and also with 3.6.24, SMB connections sometimes panic; the panics can accumulate until the whole server stops working. The problem is not reproducible and happens about every 2 or 3 days. I'm not absolutely sure, but it seems more likely to happen during user logins (and under higher load; a timing problem?).
We never had this problem with previous versions.

The backend is plain LDAP; winbind is not used.

Configure options:
--enable-socket-wrapper --enable-cups --enable-nss-wrapper --with-ldap --with-acl-support --without-ads --enable-pthreadpool --enable-debug --without-wbclient --without-winbind

Attached are an excerpt from the testparm output and two backtraces, both of which start in util_pw.c:82.

Regards.
Jochem
Comment 1 Jochem 2014-07-18 10:29:13 UTC
Created attachment 10121
backtrace 1
Comment 2 Jochem 2014-07-18 10:30:10 UTC
Created attachment 10122
backtrace 2
Comment 3 Volker Lendecke 2014-07-18 11:47:28 UTC
Stared at the code closely. I don't see how this can happen. Do you have any further hints towards a reproducer? What were the users doing? Is the user with rid 1108 in any way special in LDAP?
Comment 4 Jochem 2014-07-21 11:52:35 UTC
(In reply to comment #3)
> Stared at the code closely. I don't see how this can happen. Do you have any
> further hints towards a reproducer? What were the users doing? Is the user with
> rid 1108 in any way special in LDAP?

Hello Volker,
I cannot find anything special about the users in LDAP (all IDs are unique). However, as can be seen from the log excerpt "samba-panictimes.txt", which I will attach here, the panics either happen several times within a short period (within one or several minutes, as in "log.b020.old") or recur regularly every one or every two(!) hours (as in "/var/log/samba/log.a174"). I couldn't find any corresponding regular entries in the Windows clients' logs.

I also get the "ldapsam_getsampwsid: Unable to locate SID..." message (including in pdbedit calls), but I don't think it is important. I will also attach a typical stack trace as written to a log file.

A question that comes to mind: could this be a problem with parallel logons of one user on several machines (or in different threads), combined with a timing problem? That would be hard to trace.

Regards.
Jochem
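The periodicity observation in comment 4 (bursts of panics within minutes versus panics recurring every one or two hours) can be checked mechanically against the list of panic times. The following is a minimal Python sketch, assuming the panic times have already been extracted as strings in a `YYYY/MM/DD HH:MM:SS` layout (loosely modeled on Samba's log-header timestamps, with any fractional seconds dropped). The helpers `panic_intervals` and `looks_periodic` and the 5-minute tolerance are hypothetical choices for illustration, not part of Samba.

```python
from datetime import datetime

# Assumed timestamp layout, loosely modeled on Samba log headers such as
# "[2014/07/18 10:28:17.123456, 0]"; adjust if the extracted list differs.
TIME_FORMAT = "%Y/%m/%d %H:%M:%S"


def panic_intervals(timestamps):
    """Return the gaps, in minutes, between consecutive panic timestamps.

    `timestamps` is an iterable of strings in TIME_FORMAT (hypothetical input;
    the real attachment format is not shown in this report). Any fractional
    seconds after a '.' are dropped before parsing.
    """
    parsed = sorted(
        datetime.strptime(ts.split(".")[0].strip(), TIME_FORMAT)
        for ts in timestamps
    )
    # Pair each timestamp with its successor and measure the gap.
    return [
        (later - earlier).total_seconds() / 60.0
        for earlier, later in zip(parsed, parsed[1:])
    ]


def looks_periodic(intervals, period_minutes, tolerance_minutes=5.0):
    """True if every gap is within tolerance of a whole multiple of the period.

    A gap that rounds to zero periods (e.g. a burst a minute apart) counts as
    non-periodic.
    """
    if not intervals:
        return False
    for gap in intervals:
        multiple = round(gap / period_minutes)
        if multiple == 0 or abs(gap - multiple * period_minutes) > tolerance_minutes:
            return False
    return True
```

For example, gaps of 60 and 120 minutes count as periodic with `period_minutes=60` (matching the "every one or every two hours" pattern), whereas a burst of panics a minute apart does not. Allowing whole multiples of the period is what lets a skipped hour still count as regular.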
Comment 5 Jochem 2014-07-21 11:53:53 UTC
Created attachment 10131
list of times when connections panicked
Comment 6 Jochem 2014-07-21 11:54:50 UTC
Created attachment 10132
small log excerpt with panic stacktrace
Comment 7 Jochem 2014-08-02 13:30:30 UTC
I also see a lot of broken nmb processes whenever the panics happen (about 600 the last time!).

Also, the latest release notes (for Samba 4.1.11 and 4.0.21) list a fix for bug 10735. I grepped for the function mentioned there, "unstrcpy", in the source3 directory, and it is used only in files named nmbd_* in the nmbd directory. This is admittedly a long shot, but could there be any connection between that bug and the broken nmb processes (and the connection panics)?

Kind regards.
Jochem