Bug 5691 - Panic on Solaris 10 AD member server - samba 3.2.1 and 3.2.2
Status: RESOLVED FIXED
Product: Samba 3.2
Classification: Unclassified
Component: File services
Version: 3.2.2
Hardware: Sparc Solaris
Importance: P3 major
Target Milestone: ---
Assigned To: Samba Bugzilla Account
QA Contact: Samba QA Contact
URL: http://urban.csuohio.edu/~bob/samba_3...
Reported: 2008-08-13 15:34 UTC by Robert M Martel
Modified: 2008-10-13 04:55 UTC (History)
Attachments
samba 3.2.4 full backtrace on Solaris 9 AD member server (24.24 KB, text/plain)
2008-10-07 14:09 UTC, Robert M Martel
patch (807 bytes, patch)
2008-10-07 14:27 UTC, Volker Lendecke
Description Robert M Martel 2008-08-13 15:34:33 UTC
Samba 3.2.1 on Sun Sparc Solaris 10 built with gcc 3.4.3

AD Member server using idmap_rid

Attempts to access resources from client PCs result in "access denied ... the network path was not found" or "The specified network name is no longer available". The smblog file for the client shows an internal error for an smbd process.

Using smbclient to test access to the server worked once; subsequent attempts yielded "Receiving SMB: Server stopped responding
session setup failed: Call timed out: server did not respond after 20000 milliseconds"

I am not seeing this problem on a server running Samba 3.2.1 under Sparc Solaris 9 built with gcc 3.4.6

smb.conf, smblog, and backtrace from gdb available at http://urban.csuohio.edu/~bob/samba_3.2.1
Comment 1 Jeremy Allison 2008-08-13 15:59:19 UTC
It's dying at line 104 in smbd/session.c

#9  0x0007b5b0 in session_claim (vuser=0x65d130) at smbd/session.c:104
        sess_pid = {pid = 0}
        key = {dptr = 0xffbfdff8 "ID/1", dsize = 5}
        data = {dptr = 0x0, dsize = 0}
        i = 1
        sessionid = {uid = 0, gid = 0, username = '\0' <repeats 255 times>, 
  hostname = '\0' <repeats 255 times>, netbios_name = '\0' <repeats 255 times>, 
  remote_machine = '\0' <repeats 255 times>, id_str = '\0' <repeats 255 times>, id_num = 0, pid = {pid = 0}, 
  ip_addr_str = '\0' <repeats 255 times>, connect_start = 0}
        pid = {pid = 22767}
        keystr = "ID/1", '\0' <repeats 251 times>
        hostname = 0x634a24 ""
        ctx = (struct db_context *) 0x64c178
        rec = (struct db_record *) 0x6951e8
        status = {v = 2420}
        addr = '\0' <repeats 45 times>
        __FUNCTION__ = "session_claim"

That line is quite simple :

104                         rec = ctx->fetch_locked(ctx, NULL, key);

Can you reproduce it and then dump out the contents of the ctx struct pointer? I can't see what on that line might cause a panic other than a bad pointer for ctx->fetch_locked.

Jeremy.
Comment 2 Robert M Martel 2008-08-14 09:15:22 UTC
(In reply to comment #1)
> It's dying at line 104 in smbd/session.c
> 
..
> can you reproduce it and then dump out the contents of the ctx struct pointer.
> I can't see what on that line might cause a panic other than a bad pointer for
> ctx->fetch_locked.

Sorry to be so clueless.  I should be able to reproduce the problem, but don't know how to dump out the contents of the pointer - so a pointer on that would be appreciated. 

Thanks, 
Bob


Comment 3 Robert M Martel 2008-09-19 12:10:55 UTC
(In reply to comment #0)
> Samba 3.2.1 on Sun Sparc Solaris 10 built with gcc 3.4.3
> 
> AD Member server using idmap_rid


Finally getting back to this project/issue.  Seeing the same problem with Samba 3.2.2.

 smb.conf, smblog, and backtrace from gdb available at
 http://urban.csuohio.edu/~bob/samba_3.2.2
 

Comment 4 Volker Lendecke 2008-09-23 13:39:24 UTC
Can you try recompiling with Sun Studio?

Volker
Comment 5 Robert M Martel 2008-09-23 14:49:36 UTC
(In reply to comment #4)
> Can you try recompiling with Sun Studio?

Not right away.  Any particular version of Sun Studio I should try?  

I have a machine with Sun Studio 11,REV=2005.10.13 on it currently.  I will have to make sure the other packages (openssl, openldap, etc.) are installed and are the same versions as the ones on my test machine.

-Bob



Comment 6 Robert M Martel 2008-09-26 09:48:16 UTC
(In reply to comment #4)
> Can you try recompiling with Sun Studio?

Built 3.2.2 with Sun Studio 11 on Solaris 10.  

Right now I am double-checking my install because authentication fails when I access the server as an AD user.  I think Samba "sees" who I am from AD because I have seen my real name in the error messages associated with my AD login name.


Comment 7 Robert M Martel 2008-09-29 14:01:28 UTC
Rebuilt kerberos5, sasl, openldap, and Samba all with gcc 3.4.6 under Solaris 10.  Still seeing the same problem with PANIC messages in the samba logs.


With Sun Studio 11 under Solaris 10 on a different machine I was unable to get authentication working even this far - nothing but "NT_STATUS_LOGON_FAILURE" messages.

Comment 8 Volker Lendecke 2008-09-30 05:21:20 UTC
Can you please upload the debug level 10 log of the Sun Studio compile leading to the NT_STATUS_LOGON_FAILURE?

Thanks,

Volker
Comment 9 Robert M Martel 2008-09-30 08:13:21 UTC
Machine with Samba built with Sun Studio 11 joined to AD.  Log from the server attached; it and other log files are at http://urban.csuohio.edu/~bob/samba_3.2.2/studio11/

Tried the following:

# wbinfo -t
checking the trust secret via RPC calls succeeded

# wbinfo -a 1001362%*********
plaintext password authentication succeeded
challenge/response password authentication succeeded


# smbclient -U 1001362 -L austin
Enter 1001362's password: 
session setup failed: NT_STATUS_LOGON_FAILURE

Unable to connect from a Windows client - prompted for password over and over.
Comment 10 Volker Lendecke 2008-10-01 01:30:54 UTC
Ok, now this time username 1001362 is not valid on your system. Can you create a user that does not begin with a digit?

Volker
Comment 11 Robert M Martel 2008-10-01 07:50:37 UTC
(In reply to comment #10)
> Ok, now this time username 1001362 is not valid on your system. Can you create
> a user that does not begin with a digit?

Sorry to say I cannot do that.  My group does not have any control over the Active Directory server (and they don't help us out much, either) so all my user accounts will be in the form of seven digits.

I have access to a test account that does not have any digits in its user name which I can try for testing.

So far I've not had any of these issues on the Solaris 9 test server running Samba 3.2.2.

I wonder why, when built with Sun Studio on Solaris 10, it ends up not liking the account names, whereas Samba fails in a different place when built with gcc.

----------
I tried to access the server using the test account "martel-test" and it looks like I am back to the earlier failure mode - I can't access the server and see a panic message in the log file.

Running smbclient -L techops -Umartel-test from the command line works ONCE, but then fails on later attempts.

If I list a directory with files owned by AD users, I see the AD users listed under user and group:

#techops# ls -l
total 4
-rwxrw-rw-   1 1001362  10002          0 May 23 12:01 may22file.txt
-rw-r--r--   1 1001362  10002          0 May 23 12:04 may22file.txt-2
drwxr-xr-x   2 1001362  10002        512 May 23 12:01 may22folder
-rwxrw-rw-   1 1001362  10002        159 Apr 23 11:57 new text document.txt
-rw-r--r--   1 1001362  domain users       0 Aug 13 09:37 samba_3.2.1_test.txt


Logs from this attempt are at
http://urban.csuohio.edu/~bob/samba_3.2.2/studio11/try2/

Thank you.
Comment 12 Volker Lendecke 2008-10-01 08:38:38 UTC
> -rwxrw-rw-   1 1001362  10002          0 May 23 12:01 may22file.txt
> -rw-r--r--   1 1001362  10002          0 May 23 12:04 may22file.txt-2
> drwxr-xr-x   2 1001362  10002        512 May 23 12:01 may22folder
> -rwxrw-rw-   1 1001362  10002        159 Apr 23 11:57 new text document.txt
> -rw-r--r--   1 1001362  domain users       0 Aug 13 09:37 samba_3.2.1_test.txt

Do you use "winbind use default domain = yes"? The problem here is that it is not clear whether 1001362 is a user name or a numeric uid, and something in the NSS system is confused. You might work around this problem by removing "winbind use default domain = yes".

> Logs from this attempt found are at 
> http://urban.csuohio.edu/~bob/samba_3.2.2/studio11/try2/

This looks like a pretty normal access to me. Anything that did not work here?

The line 

Get_Pwnam_internals did find user [CSUNET\martel-test]!

shows that you do not use "winbind use default domain", so something in your NSS system is confused about 1001362; this should show a user name. The other log showed that the user CSUNET\10011362 (or so, the numeric one) can not be found. Is it possible that your libnss_winbind/libwbclient does not match the winbind you installed?

Volker
Comment 13 Robert M Martel 2008-10-01 15:11:55 UTC
> > Logs from this attempt found are at 
> > http://urban.csuohio.edu/~bob/samba_3.2.2/studio11/try2/
> 
> This looks like a pretty normal access to me. Anything that did not work here?

Yes - I cannot access the server via Samba:  No access and Samba panics

 
> The line 
> 
> Get_Pwnam_internals did find user [CSUNET\martel-test]!
> 
> shows that you do not use "winbind use default domain",...Is it possible that your libnss_winbind/libwbclient does not match the
> winbind you installed?

unlikely, but I will check that.



"1001362" is the user login name...the directory listing above is actually what I expected (and wanted) to see on the AD member server.  The idmap_rid gave AD user "1001362" the unix UID of 10513 - the same UID number used on my Solaris 10 and Solaris 9 test servers.  The Solaris 9 server is not displaying these issues and I can access it from client PCs using AD accounts

A section from the smb.conf - I *am* using "use default domain"

...
        idmap domains = CSUNET
        template homedir = /home/%U
        template shell = /usr/bin/bash
        winbind use default domain = Yes
        idmap config CSUNET:range = 10000-100000000
        idmap config CSUNET:base_rid = 0
        idmap config CSUNET:backend = rid
        idmap config CSUNET:default = yes
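
For context, the idmap_rid backend is documented as a purely arithmetic mapping: ID = RID - BASE_RID + LOW_RANGE_ID. The following is a minimal Python sketch of that formula using the range and base_rid values from the smb.conf fragment above. The function names and the sample RID 1234 are hypothetical and chosen only for illustration; they are not taken from the logs or from Samba's source.

```python
# Sketch of the idmap_rid arithmetic: ID = RID - BASE_RID + LOW_RANGE_ID.
# The three constants come from the smb.conf fragment above; the function
# names and the sample RID are hypothetical, for illustration only.
LOW_RANGE_ID = 10000        # idmap config CSUNET:range (lower bound)
HIGH_RANGE_ID = 100000000   # idmap config CSUNET:range (upper bound)
BASE_RID = 0                # idmap config CSUNET:base_rid


def rid_to_unix_id(rid: int) -> int:
    """Map a Windows RID to a Unix uid/gid inside the configured range."""
    unix_id = rid - BASE_RID + LOW_RANGE_ID
    if not LOW_RANGE_ID <= unix_id <= HIGH_RANGE_ID:
        raise ValueError(f"RID {rid} maps outside the configured range")
    return unix_id


def unix_id_to_rid(unix_id: int) -> int:
    """Inverse mapping: recover the RID from a Unix uid/gid."""
    if not LOW_RANGE_ID <= unix_id <= HIGH_RANGE_ID:
        raise ValueError(f"ID {unix_id} is outside the configured range")
    return unix_id + BASE_RID - LOW_RANGE_ID


if __name__ == "__main__":
    sample_rid = 1234  # hypothetical RID
    print(rid_to_unix_id(sample_rid))                   # 11234
    print(unix_id_to_rid(rid_to_unix_id(sample_rid)))   # 1234
```

Because the mapping involves no stored state, member servers configured with the same range and base_rid should assign the same Unix ID to a given AD account.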


What happened on this latest attempt was when I used the "martel-test" AD account to attempt access to the Samba server on the Solaris 10 box built with Sun Studio it failed - in a manner that looked about the same as the failure mode on the version of samba built with gcc on this platform: Samba panics.  From the log file smblog.137.148.92.196:


[2008/10/01 08:38:33,  3] smbd/password.c:register_existing_vuid(314)
  register_existing_vuid: User name: CSUNET\ur20-02$    Real name: UR20-02$
[2008/10/01 08:38:33,  3] smbd/password.c:register_existing_vuid(326)
  register_existing_vuid: UNIX uid 211082 is UNIX user CSUNET\ur20-02$, and will be vuid 104
[2008/10/01 08:38:33, 10] lib/dbwrap_tdb.c:db_tdb_fetch_locked(100)
  Locking key 49442F3100
[2008/10/01 08:38:33, 10] lib/dbwrap_tdb.c:db_tdb_fetch_locked(129)
  Allocated locked data 0x694170
[2008/10/01 08:38:33,  0] lib/fault.c:fault_report(40)
  ===============================================================
[2008/10/01 08:38:33,  0] lib/fault.c:fault_report(41)
  INTERNAL ERROR: Signal 10 in pid 23624 (3.2.2)
  Please read the Trouble-Shooting section of the Samba3-HOWTO
[2008/10/01 08:38:33,  0] lib/fault.c:fault_report(43)

  From: http://www.samba.org/samba/docs/Samba3-HOWTO.pdf
[2008/10/01 08:38:33,  0] lib/fault.c:fault_report(44)
  ===============================================================
[2008/10/01 08:38:33,  0] lib/util.c:smb_panic(1663)
  PANIC (pid 23624): internal error
[2008/10/01 08:38:33,  0] lib/util.c:log_stack_trace(1817)
  unable to produce a stack trace on this platform
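
A side note on the log above: the key reported by db_tdb_fetch_locked is printed as hex. Decoding it gives a 5-byte value, "ID/1" plus a terminating NUL, which matches the key = {dptr = ... "ID/1", dsize = 5} shown in the backtrace in comment 1. A small Python sketch of the decoding (variable names are illustrative):

```python
# Decode the hex key from the "Locking key 49442F3100" log line.
logged_key = "49442F3100"

raw = bytes.fromhex(logged_key)
print(raw)       # b'ID/1\x00'
print(len(raw))  # 5, the same as dsize in the backtrace

# Strip the trailing NUL to get the printable key string.
text = raw.rstrip(b"\x00").decode("ascii")
print(text)      # ID/1
```

This suggests the log lines and the earlier backtrace refer to the same session-table lookup.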


It looks to me that building Samba with Sun Studio under Solaris 10 added a new problem: not functioning with the all-digit AD user account names, a problem I did not see before. The original issue, not being able to access the shares using an AD account from client machines, persists.

From a client PC logged in as an AD user I can browse to the Samba server, and it will show me the shares available, but as soon as I attempt to access one of those shares I receive an error message on the client and see the panic messages in the samba logs on the server.

Thank you
Bob 




Comment 14 Volker Lendecke 2008-10-01 19:27:34 UTC
Sorry, had missed the panic.

Now I'm completely lost how this line might get a SIGBUS. I need ssh access to a box that shows this behaviour to figure out more.

Sorry,

Volker
Comment 15 Robert M Martel 2008-10-03 09:55:23 UTC
Please contact me via Email so I can set up this access for you.

-Bob

Comment 16 Robert M Martel 2008-10-03 13:22:24 UTC
Today I tried to install Samba 3.2.4 on a Solaris 9 machine - it was working on a test machine, so I decided to try it on a production machine that does not normally run samba.

When I try to access it from a client PC I get "Network name no longer available" messages.  The log file seems to indicate that Samba is stopping in the same place:

[2008/10/03 14:13:42,  3] smbd/password.c:register_existing_vuid(326)
  register_existing_vuid: UNIX uid 101888 is UNIX user CSUNET\1001362, and will be vuid 101
[2008/10/03 14:13:42, 10] lib/dbwrap_tdb.c:db_tdb_fetch_locked(100)
  Locking key 49442F3100
[2008/10/03 14:13:42, 10] lib/dbwrap_tdb.c:db_tdb_fetch_locked(129)
  Allocated locked data 0x6d0730
[2008/10/03 14:13:42,  0] lib/fault.c:fault_report(40)
  ===============================================================
[2008/10/03 14:13:42,  0] lib/fault.c:fault_report(41)
  INTERNAL ERROR: Signal 10 in pid 11819 (3.2.4)
...
  ===============================================================
[2008/10/03 14:13:42,  0] lib/util.c:smb_panic(1663)
  PANIC (pid 11819): internal error

There are some differences between the Solaris 9 box I was using for testing and this "production" box as far as installed software goes (versions of OpenSSL, for example).
Comment 17 Jeremy Allison 2008-10-06 18:29:56 UTC
Add the line :

panic action = /bin/sleep 999999

to the [global] section of your smb.conf and reproduce the panic. That should allow you to attach to the panic'ed process and get a backtrace with symbols.

Jeremy.
Comment 18 Robert M Martel 2008-10-07 14:09:59 UTC
Created attachment 3668
samba 3.2.4 full backtrace on Solaris 9 AD member server
Comment 19 Robert M Martel 2008-10-07 14:10:56 UTC
At the risk of muddying the waters still further, attached is a backtrace from a Solaris 9 machine running Samba 3.2.4 which is an Active Directory member server.  It seems to be failing the same way I've been seeing on my Solaris 10 server.  Seeing "the specified network name is no longer available" on the client PC.  From the log file:

 [2008/10/07 14:57:59,  3] smbd/password.c:register_existing_vuid(326)
  register_existing_vuid: UNIX uid 211082 is UNIX user CSUNET\ur20-02$, and will be vuid 104
[2008/10/07 14:57:59, 10] lib/dbwrap_tdb.c:db_tdb_fetch_locked(100)
  Locking key 49442F3100
[2008/10/07 14:57:59, 10] lib/dbwrap_tdb.c:db_tdb_fetch_locked(129)
  Allocated locked data 0x72cd18
[2008/10/07 14:57:59,  0] lib/fault.c:fault_report(40)
  ===============================================================
[2008/10/07 14:57:59,  0] lib/fault.c:fault_report(41)
  INTERNAL ERROR: Signal 10 in pid 22077 (3.2.4)
  Please read the Trouble-Shooting section of the Samba3-HOWTO
[2008/10/07 14:57:59,  0] lib/fault.c:fault_report(43)
  
  From: http://www.samba.org/samba/docs/Samba3-HOWTO.pdf
[2008/10/07 14:57:59,  0] lib/fault.c:fault_report(44)
  ===============================================================
[2008/10/07 14:57:59,  0] lib/util.c:smb_panic(1663)
  PANIC (pid 22077): internal error


Attached backtrace is from PID 22077.

Comment 20 Volker Lendecke 2008-10-07 14:13:59 UTC
Okay, *that* backtrace is different from the others. This one I can fix. Expect a patch very soon.

Volker
Comment 21 Volker Lendecke 2008-10-07 14:27:10 UTC
Created attachment 3669
patch

Can you try the attached patch?

Thanks,

Volker
Comment 22 Robert M Martel 2008-10-08 15:44:20 UTC
I tried the patch on both my Solaris 10 test server and the Solaris 9 server that were exhibiting problems.  From preliminary checking it looks like the patch corrected the problem I was seeing.

I plan on doing some additional testing tomorrow.
Comment 23 Robert M Martel 2008-10-12 23:10:14 UTC
From what I can see thus far Samba is operating the way I'd expect it to on both my Solaris 9 and Solaris 10 test servers.  Thanks very much for the patch!

-Bob
Comment 24 Volker Lendecke 2008-10-13 04:55:19 UTC
Thanks for testing. Checked in the patch.

Volker