I have 2 IIS web servers acting as SAMBA clients with a back end SAMBA server for the shared file storage. Access to the share is done using a W2k3 domain user called BAYNET\ASPNET.
This works perfectly MOST of the time. recently, we upgraded samba to sernet's samba3-3.2.8-38 packages (server, client, winbind, utils, the works...)
And have been seeing occasional instances where one of our front end servers stops receiving files from the web servers until the smbd process associated with it's connection is killed.
Last time this happened, I took a network trace and switched the smbd process to debug 10 for a few seconds while this was going on.
The network trace just shows NT_STATUS_ACCESS_DENIED returned, and the log.smbd contained information such as:
[2009/04/13 08:24:50, 10, pid=29567] auth/auth_winbind.c:check_winbind_security(85)
check_winbind_security: wbcAuthenticateUserEx failed: WBC_ERR_AUTH_ERROR
[2009/04/13 08:24:50, 5, pid=29567] auth/auth.c:check_ntlm_password(272)
check_ntlm_password: winbind authentication for user [ASPNET] FAILED with error NT_STATUS_ACCESS_DENIED
[2009/04/13 08:24:50, 2, pid=29567] auth/auth.c:check_ntlm_password(318)
check_ntlm_password: Authentication for user [ASPNET] -> [ASPNET] FAILED with error NT_STATUS_ACCESS_DENIED
[2009/04/13 08:24:50, 5, pid=29567] auth/auth_util.c:free_user_info(1985)
attempting to free (and zero) a user_info structure
[2009/04/13 08:24:50, 10, pid=29567] auth/auth_util.c:free_user_info(1989)
structure was created for ASPNET
[2009/04/13 08:24:50, 3, pid=29567] smbd/error.c:error_packet_set(61)
error packet at smbd/sesssetup.c(127) cmd=115 (SMBsesssetupX) NT_STATUS_ACCESS_DENIED
[2009/04/13 08:24:50, 5, pid=29567] lib/util.c:show_msg(642)
[2009/04/13 08:24:50, 5, pid=29567] lib/util.c:show_msg(652)
[2009/04/13 08:24:50, 1, pid=17845] smbd/service.c:close_cnum(1405)
web2 (10.0.2.1) closed connection to service App
[2009/04/13 08:24:50, 1, pid=17845] smbd/service.c:make_connection_snum(1194)
web2 (10.0.2.1) connect to service App initially as user BAYNET\aspnet (uid=102, gid=104) (pid 17845)
smbd with pid=29567 is the one failing connections from WEB1. I included a couple of lines at the end from the smbd pid=17845 serving connections from WEB2 at the same time using the same domain user where the same problem was not happening at the same time.
server memory/cpu resources used at the time are very low as far as I can tell, and service by the second smbd process is uninterrupted, works perfectly.
both front end servers (samba clients) are identically configured, and the problem happens on either machine randomly as far as I can tell.
Looks to me like either smbd or winbind are running out of some resource (not necessarily cpu/ram, but more like entries in some internal session table, etc.)
I will shortly attach:
1. the log exerpt with pid=29567 at debug=10
2. network trace (wireshark pcap taken from the client. hint: filter by "ip.host=10.0.0.115" which is the file server's ip to see communication with the file server)
3. my samba configuration file.
Created attachment 4059 [details]
log.smbd showing access denied with failing process at debug level 10
Can you please upload the winbind log files ?
Created attachment 4061 [details]
network trace while problem is happening
To check communication with the file server, filter by "ip.host == 10.0.0.115" that is the ip address of the samba file server.
I had to filter out some unrelated traced network activity to fit it to the maximum upload size of 1mb... if the full one is needed, please contact me.
Created attachment 4062 [details]
Created attachment 4063 [details]
Just remembered I never mentioned the platform I am using, it is CentOS 5
Linux filer1 2.6.18-128.1.1.el5 #1 SMP Wed Mar 25 18:14:28 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux