Bug 5933 - STATUS_INVALID_PARAMETER returned for all Session Setup AndX requests until smbd killed
Summary: STATUS_INVALID_PARAMETER returned for all Session Setup AndX requests until s...
Status: RESOLVED FIXED
Alias: None
Product: Samba 3.2
Classification: Unclassified
Component: File services (show other bugs)
Version: 3.2.4
Hardware: x64 Windows XP
: P3 critical
Target Milestone: ---
Assignee: Samba Bugzilla Account
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-12-02 13:14 UTC by Ofer Tal
Modified: 2009-01-02 11:25 UTC (History)
1 user (show)

See Also:


Attachments
log.smbd (level: 10) while issue is happenning (521.74 KB, application/x-gzip-compressed)
2008-12-02 13:26 UTC, Ofer Tal
no flags Details
zipped pcap network trace from web2 while seesions are denied (377.86 KB, application/x-zip-compressed)
2008-12-02 13:29 UTC, Ofer Tal
no flags Details
log with single failing smbd proc at level 10 showing INVALID PARAMETER (19.42 KB, application/x-gzip-compressed)
2008-12-02 16:36 UTC, Ofer Tal
no flags Details
patch (1.83 KB, patch)
2008-12-29 15:22 UTC, Volker Lendecke
no flags Details
corrected patch (1.83 KB, patch)
2008-12-29 16:30 UTC, Volker Lendecke
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Ofer Tal 2008-12-02 13:14:27 UTC
I am using SAMBA 3.2.4-37 (installed from sernet’s samba3 rpms) on a CentOS 5 machine (uname –a: Linux filer1 2.6.18-92.1.18.el5 #1 SMP Wed Nov 12 09:19:49 EST 2008 x86_64 x86_64 x86_64 GNU/Linux) . It is in the back end of a small network of windows servers with a trivial active directory domain (no forests, single site, very few users, windows 2003 DCs, samba is a member server) and I am using it as the remote UNC storage for my websites(IIS is configured to access these files using domain users).

This works well 99.9% of the time, but on occasion, one of my web servers (happens on both server, and in arbitrary times including times with very little traffic) stops communicating with the UNC shares.

Tracing the network with wireshark from the web servers shows something peculiar – the web server sends a “Session Setup AndX Request: NTLMSSP_NEGOTIATE” and the server responds (every time) with a “Session Setup AndX Response Error:  STATUS_INVALID_PARAMETER”
Tracing the same situation when the session is not in this bizarre state shows that the same NTLMSSP_NEGOTIATE request gets answered (as it should) with an NTLMSSP_CHALLENGE and the session gets built correctly.

On the samba server, if I then use smbstatus to identify the PID of the smbd process serving the active session from the server that can’t connect and “kill” it (just a normal kill, no “-9” necessary) the web server immediately is able to setup the session correctly and resume operation (at least for several hours until this happens again)

I have gone over the windows configuration with Microsoft supporters and they have concluded that the windows setup and configuration (iis, AD, smb, networking) is good.

Here are a couple of exerpts form network traces filtered for SMB Command 0x73 that show the problem:

In these traces:
10.0.0.115 is the samba server
10.0.2.1 is the web server affected by the issue when trace is taken
10.0.1.1 is the web server NOT affected by the issue when trace is taken
(before you ask... subnet mask is 255.255.0.0)


Trace 1: web2 when the issue occurs ( all session setup requests denied without a challenge)
===========================================================================
21:48:59.199        0.000000               1301       4.005879                               {NbtSS:157, TCP:156, IPv4:155}      10.0.2.1 10.0.0.115             SMB                SMB:C; Session Setup Andx, NTLM NEGOTIATE MESSAGE
21:48:59.200        0.000157               1302       4.006036                               {NbtSS:157, TCP:156, IPv4:155}      10.0.0.115             10.0.2.1 SMB                SMB:R; Session Setup Andx - NT Status: System - Error, Code = (13) STATUS_INVALID_PARAMETER
21:48:59.360        0.160932               1368       4.166968                               {TCP:156, IPv4:155}           10.0.2.1 10.0.0.115             TCP        TCP: [Bad CheckSum]Flags=...A...., SrcPort=10044, DstPort=Microsoft-DS(445), PayloadLen=0, Seq=3500048810, Ack=2385985999, Win=65365
21:49:00.255        0.894679               1632       5.061647                               {NbtSS:157, TCP:156, IPv4:155}      10.0.2.1 10.0.0.115             SMB                SMB:C; Session Setup Andx, NTLM NEGOTIATE MESSAGE
21:49:00.255        0.000177               1633       5.061824                               {NbtSS:157, TCP:156, IPv4:155}      10.0.0.115             10.0.2.1 SMB                SMB:R; Session Setup Andx - NT Status: System - Error, Code = (13) STATUS_INVALID_PARAMETER
21:49:00.371        0.115845               1693       5.177669                               {TCP:156, IPv4:155}           10.0.2.1 10.0.0.115             TCP        TCP: [Bad CheckSum]Flags=...A...., SrcPort=10044, DstPort=Microsoft-DS(445), PayloadLen=0, Seq=3500049080, Ack=2385986038, Win=65326
21:49:01.284        0.912846               1839       6.090515                               {NbtSS:157, TCP:156, IPv4:155}      10.0.2.1 10.0.0.115             SMB                SMB:C; Session Setup Andx, NTLM NEGOTIATE MESSAGE
21:49:01.284        0.000213               1840       6.090728                               {NbtSS:157, TCP:156, IPv4:155}      10.0.0.115             10.0.2.1 SMB                SMB:R; Session Setup Andx - NT Status: System - Error, Code = (13) STATUS_INVALID_PARAMETER
21:49:01.473        0.188502               1967       6.279230                               {TCP:156, IPv4:155}           10.0.2.1 10.0.0.115             TCP        TCP: [Bad CheckSum]Flags=...A...., SrcPort=10044, DstPort=Microsoft-DS(445), PayloadLen=0, Seq=3500049350, Ack=2385986077, Win=65287


Trace 2: (for comparison) web1 not experiencing the problem at the same time (NTLMSSP_NEGOTIATE requests are responded as expected with a CHALLENGE)
========================================================================
22:14:57.916        0.000000               84           0.017983                               {NbtSS:8, TCP:7, IPv4:6}    10.0.1.1 10.0.0.115             SMB                SMB:C; Session Setup Andx, NTLM NEGOTIATE MESSAGE
22:14:57.917        0.000267               85           0.018250                               {NbtSS:8, TCP:7, IPv4:6}    10.0.0.115             10.0.1.1 SMB                SMB:R; Session Setup Andx, NTLM CHALLENGE MESSAGE - NT Status: System - Error, Code = (22) STATUS_MORE_PROCESSING_REQUIRED
22:14:57.917        0.000168               86           0.018418                               {NbtSS:8, TCP:7, IPv4:6}    10.0.1.1 10.0.0.115             SMB                SMB:C; Session Setup Andx, NTLM AUTHENTICATE MESSAGE, Domain: BAYNET, User: IUSR_BAYNET, Workstation: WEB1
22:14:57.927        0.009865               95           0.028283                               {NbtSS:8, TCP:7, IPv4:6}    10.0.0.115             10.0.1.1 SMB                SMB:R; Session Setup Andx, SpnegoNegTokenResp
22:14:58.349        0.422286               921         0.450569                               {NbtSS:8, TCP:7, IPv4:6}    10.0.1.1 10.0.0.115             SMB                SMB:C; Session Setup Andx, NTLM NEGOTIATE MESSAGE
22:14:58.349        0.000244               923         0.450813                               {NbtSS:8, TCP:7, IPv4:6}    10.0.0.115             10.0.1.1 SMB                SMB:R; Session Setup Andx, NTLM CHALLENGE MESSAGE - NT Status: System - Error, Code = (22) STATUS_MORE_PROCESSING_REQUIRED
22:14:58.349        0.000178               926         0.450991                               {NbtSS:8, TCP:7, IPv4:6}    10.0.1.1 10.0.0.115             SMB                SMB:C; Session Setup Andx, NTLM AUTHENTICATE MESSAGE, Domain: BAYNET, User: IUSR_BAYNET, Workstation: WEB1
22:14:58.359        0.009884               955         0.460875                               {NbtSS:8, TCP:7, IPv4:6}    10.0.0.115             10.0.1.1 SMB                SMB:R; Session Setup Andx, SpnegoNegTokenResp
22:14:58.547        0.187521               1847       0.648396                               {NbtSS:8, TCP:7, IPv4:6}    10.0.1.1 10.0.0.115             SMB                SMB:C; Session Setup Andx, NTLM NEGOTIATE MESSAGE
22:14:58.547        0.000242               1848       0.648638                               {NbtSS:8, TCP:7, IPv4:6}    10.0.0.115             10.0.1.1 SMB                SMB:R; Session Setup Andx, NTLM CHALLENGE MESSAGE - NT Status: System - Error, Code = (22) STATUS_MORE_PROCESSING_REQUIRED
22:14:58.547        0.000215               1855       0.648853                               {NbtSS:8, TCP:7, IPv4:6}    10.0.1.1 10.0.0.115             SMB                SMB:C; Session Setup Andx, NTLM AUTHENTICATE MESSAGE, Domain: BAYNET, User: IUSR_BAYNET, Workstation: WEB1
22:14:58.557        0.009704               1932       0.658557                               {NbtSS:8, TCP:7, IPv4:6}    10.0.0.115             10.0.1.1 SMB                SMB:R; Session Setup Andx, SpnegoNegTokenResp
Comment 1 Ofer Tal 2008-12-02 13:26:06 UTC
Created attachment 3770 [details]
log.smbd (level: 10) while issue is happenning

briefly setting the log level to 10 while the issue is happening on one of my servers yielded a huge log.smbd file (very active server), but no entries that looked particularly bad to my untrained eye.

I have attachet a gzipped copy in case it helps
Comment 2 Ofer Tal 2008-12-02 13:29:09 UTC
Created attachment 3771 [details]
zipped pcap network trace from web2 while seesions are denied

Network trace taken on WEB2 (10.0.2.1) while it's session requests are denied by the samba server (\\filer.baynet.com = 10.0.0.115)
Comment 3 Ofer Tal 2008-12-02 13:42:58 UTC
Just in case it is of interest, here is my smb.conf file (the poor thing is riddled with comments of block after block of configuration comments we have tried to add/remove to overcome this situation over the last 9+ months)

[global]
  workgroup = BAYNET
  ##Added  07-03-08##
  #netbios name = filer
  netbios aliases = filer1 filer2 filer filer.baynet.com 10.0.0.115
  reset on zero vc = yes
  ######################
  server string = CentOS 5 File Server
  timestamp logs = yes
  dos filetimes = yes
  dos filemode = yes
  inherit acls = yes
  name cache timeout = 0
  winbind uid = 101-6000000
  winbind gid = 101-6000000
  local master = no
  preferred master = no
 #  guest account = guest
  map to guest = Never
  unix charset = CP1252
  dos charset = CP850
  allow trusted domains = yes
  security = ADS
  disable netbios = no
  password server = *
  encrypt passwords = yes
  username level = 5
  load printers = no
  printing = none
  printcap name = /dev/null
  disable spoolss = yes
#  utmp directory = /var/run
#  utmp = yes
  name resolve order = host lmhosts wins bcast
  ldap timeout = 5
  restrict anonymous = 0
  password server = nas2.baynet.com pdc1.baynet.com
  use kerberos keytab = true

#  panic action = /usr/bin/backtrace %d > /tmp/segv_samba_%d.out 2>&1
#  root preexec = /usr/local/samba/bin/log_connect.sh '%u' '%m' '%I' '%S'
#  root postexec = /usr/local/samba/bin/log_disconnect.sh '%u' '%m' '%I' '%S'
  wins server =  10.0.0.200 10.0.0.201
  username level = 0
  realm = BAYNET.COM

###ADDED 
   winbind enum users = yes
   winbind enum groups = yes
   client use spnego = yes
   client ntlmv2 auth = yes
   winbind use default domain = yes
   restrict anonymous = 2
   domain master = no
   local master = no
   preferred master = no
   os level = 0
   disable netbios = no
###END

###Added  05-07-08
  bind interfaces only = yes
  interfaces = 10.0.0.115
  debug level = 0
###END

###Added  8/7/2008
  winbind cache time = 600
  winbind refresh tickets = Yes

  # note that "winbind offline logon = Yes" 
  # requires also a change to /etc/security/pam_winbind.conf : "cached_login = yes" needs to be enabled
  winbind offline logon = Yes
###END

  smb ports = 445
  max mux = 32767
  log level = 1 winbind:5
  socket options = TCP_NODELAY SO_RCVBUF=8192 SO_SNDBUF=8192 IPTOS_LOWDELAY SO_KEEPALIVE
  read raw = 1
  write raw = 1
  oplocks = no
  level 2 oplocks = no
  #oplocks = yes
  max xmit = 65535
  dead time = 15
  getwd cache = 1

[MLSData]
path = /mnt/MLSData
read only = no
comment = MLS Data and Images
follow symlinks = yes
printable = no
browseable = yes
map acl inherit = yes
create mask = 0777
security mask = 0777
directory mask = 0777
directory security mask = 0777
force create mode = 0777
force security mode = 0777
force directory mode = 0777
force directory security mode = 0777
oplocks = no
level 2 oplocks = no
#oplocks = yes

guest ok = yes
valid users = \
     @"BAYNET\Domain Users"   \
     @"AllLocalUsers"   \
     @"BAYNET\Domain Admins"   \
     @"BAYNET\SQLServerExec" \
	
write list = \
     @"BAYNET\Domain Users"   \
     @"AllLocalUsers"   \
     @"BAYNET\Domain Admins"   \
     @"BAYNET\SQLServerExec"

[App]
path = /mnt/App
read only = no
comment = Application Data
follow symlinks = yes
printable = no
browseable = yes
map acl inherit = yes
create mask = 0777
security mask = 0777
directory mask = 0777
directory security mask = 0777
force create mode = 0777
force security mode = 0777
force directory mode = 0777
force directory security mode = 0777
guest ok = yes
writeable = yes
oplocks = no
level 2 oplocks = no
#oplocks = yes

valid users = \
     @"BAYNET\Domain Users"   \
     @"AllLocalUsers"   \
     @"BAYNET\Domain Admins"   \
     @"BAYNET\WEB Users"

write list = \
     @"BAYNET\Domain Users"   \
     @"AllLocalUsers"   \
     @"BAYNET\Domain Admins"   \
     @"BAYNET\WEB Users"

Comment 4 Volker Lendecke 2008-12-02 13:48:31 UTC
No NT_STATUS_INVALID_PARAMETER in that logfile. You might want to increase the debug level of a single smbd with

smbcontrol <pid> debug 10
Comment 5 Ofer Tal 2008-12-02 13:53:43 UTC
Thanks,
I wasn't aware of that command, and will try to capture a log with it the next time this issue happens.

I will post it when it is collected
Comment 6 Ofer Tal 2008-12-02 16:36:10 UTC
Created attachment 3772 [details]
log with single failing smbd proc at level 10 showing INVALID PARAMETER

The issue occurred again 10 minutes ago (14:20) I identified the process (pid=19949 FWIW) and used smbcontrol to switch it to log level 10.

the attached file is the tail of that log since 14:20. I checked and it has several errors with "invalid parameter" logged. Hope this will be helpful in understanding this issue.

as before, killing the smbd process caused the windows machine to immediately re-connect. logging for that process was still at level 10 when I killed it, so that might provide additional info in the log file...

Thank you again for your help.
Comment 7 Ofer Tal 2008-12-03 13:52:31 UTC
Volker, I have attached a log file with the failing smbd daemon process only at debug level 10 and verified that it contains multiple NT_STATUS_INVALID_PARAMETER problems logged. 

Is there *any* additional way I can help figure this out? any more information I can collect or detail that can help?

My samba server has been running into this problem on average every 4-6 hours (bad...). 
Before upgrading to version 3.2.4-37 (sernet samba3 rpms) I was using 3.2.4-28 (sernet samba3 rpms as well) which also had the same problem but back then it happened roughly "only" every 12-24 hours. The RPMs that came from the CentOS rpm repository also exhibited the same issue.


Comment 8 Volker Lendecke 2008-12-03 14:54:27 UTC
No, for now that is sufficient. I can understand that you're desparate, but right now I am extremely busy, sorry. Paying customers come first :-) -- Volker
Comment 9 Ofer Tal 2008-12-23 11:40:08 UTC
I found the problem and believe that the code in register_existing_vuid() is incorrect and num_validated_vuids++;  is not needed and effectively make us run out of  vuid.

For every session, we increment num_validated_vuids twice but only decrement it once and this is wrong.

Additional notes:
1. This bug was also confirmed using smbd compiled from 3.2.6 latest sources
2. Applying this fix on the 3.2.6 sources and then running our production samba server with the fix the STATUS_INVALID_PARAMETER issue hasn't been noticed in over 72 hours (vs. 4-6 hours between failures with the vuid leak in place)

Comment 10 Volker Lendecke 2008-12-29 09:27:45 UTC
Thanks for that analysis. Got it reproduce here now, looking at your suggested fix.

Volker
Comment 11 Volker Lendecke 2008-12-29 15:22:48 UTC
Created attachment 3836 [details]
patch

Thanks for that bug report. You were fully correct, incrementing num_validated_vuids is pointless. Testing with BENCH-SESSSETUP from master showed another flaw in allocating vuids though that the attached patch should fix. I haven't pushed that yet, still doing a recompile and running a "make test".

Can you test that patch?

Thanks,

Volker
Comment 12 Volker Lendecke 2008-12-29 16:30:17 UTC
Created attachment 3837 [details]
corrected patch

Corrected patch -- thanks to Stefan Metzmacher :-)
Comment 13 Volker Lendecke 2008-12-30 04:26:29 UTC
For me it's fixed in all branches now. Please re-open if you still have problems.

Thanks,

Volker
Comment 14 Ofer Tal 2008-12-30 10:59:28 UTC
Volker,
My engineer looked at the patch, and it looks very similar to the solution we came up with. However, my engineer thinks the patched code is still not 100% right because the new function increment_next_vuid(&next_vuid) must also be called in register_existing_vuid().
 
Once this will be added, it should fix the problem
 
He thinks smbd will still have vuid wrapping issues with the last patch if this suggestion is not implemented.
To test for this last wrapping issue, please test with at least 64k time session setups and close to make sure vuid wraps up nicely.

Ofer
Comment 15 Volker Lendecke 2008-12-30 13:19:54 UTC
Could you please ask your engineer for a closer explanation of the reasons why we should also increment next_vuid in register_existing_vuid()? According to what I've seen in sesssetup.c, we always call register_initial_vuid() before register_existing_vuid(), so I think to get a 1-step incremented vuid by each complete session setup sequence I think it is sufficient to only increment next_vuid only once. This view is also supported by a quick test: I've run 

smbtorture //127.0.0.1/tmp -Uuser%pw sesssetup_bench -o 100000

from master, and I ended up with a final vuid of 34666. 34666+65536=100202, which is pretty much what I would expect.

Can you please ask your engineer to give me more exact instructions how to reproduce the problem he still sees?

Thanks a lot,

Volker
Comment 16 Ofer Tal 2009-01-02 11:25:54 UTC
After further testing I can also confirm that the patch works in all our test cases.
Although slightly different than the code we used to fix the problem it does solve the issue. re-closing the bug :)