Hi,

We have a two-server clustered NAS setup that serves both SMB and NFS in an active/active configuration managed by CTDB (http://ctdb.samba.org/), authenticating users via Active Directory. The CTDB version is ctdb-1.0-64, NFS is v3, and the Samba version is samba-3.2.3-ctdb.50.

When a file is updated by an SMB client (followed by a file close), other SMB clients can see and modify the file. But when an NFS client (same user) updates the same file (followed by a file close), only one SMB server sees the update; clients mounting from the other SMB server see the file corrupted. In addition, files updated using only SMB clients seem to get corrupted after some time.

We suspect NFS and SMB caching. We forced the NFS server to export with the sync option, and likewise forced synchronous behaviour in Samba with the following, but it does not help:

  strict allocate = yes
  strict locking = yes
  strict sync = yes
  sync always = yes

NFS exports:

  /mnt/gpfs/nfsexport *(rw,no_root_squash,sync,fsid=222)

Tried the NFS mount with sync:

  node1:/mnt/gpfs/nfsexport /mnt/nfs nfs rw,tcp,hard,intr,sync,rsize=32768,wsize=32768,vers=3 0 0

We disabled oplocks (oplocks = no) to prevent client-side SMB caching. In addition, we forced this on the SMB client via the registry:

  HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\MRXSmb\Parameters\
  OplocksDisabled REG_DWORD 1

Finally, we did an SMB mount on the server itself and used the Linux smbclient to rule out Windows OS and network issues. The data is consistent on only one of the SMB servers, even though it is consistent in the underlying storage and GPFS file system. Access via NFS is consistent; it is only via SMB that things are weird. It seems the SMB server cache is inconsistent/stale across the clustered environment.
Stracing the smbclient, everything is similar except for the following.

Good one (SMB server where the file is OK):

  select(8, [4 7], [], NULL, {9999, 0}) = 1 (in [4], left {9998, 999000})
  ioctl(4, FIONREAD, [81]) = 0
  recvfrom(4, "\0\0\0M\377SMB.\0\0\0\0\210\1\310\0\0\0\0\0\0\0\0\0\0\0"..., 81, 0, NULL, NULL) = 81
  write(6, "New file from NFS\n", 18) = 18
  select(5, [4], NULL, NULL, {20, 0}) = 1 (in [4], left {20, 0})

Bad one (SMB server from which the file obtained is corrupted):

  select(8, [4 7], [], NULL, {9999, 0}) = 1 (in [4], left {9999, 0})
  ioctl(4, FIONREAD, [144]) = 0
  recvfrom(4, "\0\0\0M\377SMB.\0\0\0\0\210\1\310\0\0\0\0\0\0\0\0\0\0\0"..., 144, 0, NULL, NULL) = 144
  write(6, "\0\0\0M\377SMB.\0\0\0\0\210\1\310\0\0", 18) = 18
  write(4, "\0\0\0)\377SMB\4\0\0\0\0\10\1\310\0\0\0\0\0\0\0\0\0\0\0"..., 45) = 45
  select(5, [4], NULL, NULL, {20, 0}) = 1 (in [4], left {19, 999000})

Actual file content in the underlying FS:

  [root@D1950-01 testuserD]# cat filee7.txt
  New file from NFS

The actual file content is consistent in the underlying file system on both servers; somehow SMB seems to write a chunk of SMB headers instead of the original file contents to the client (when mounting from the bad SMB server). More details from strace are attached.

Suggestions/thoughts/input to resolve this will be greatly appreciated. I do not see any errors reported in log.smb, log.client.smb, log.ctdb, or in a network dump on port 445. Let me know if you need additional details.
Thanks in advance,
-Tim

smb.conf:

[global]
        workgroup = TESTDOMAIN2
        realm = TESTDOMAIN2.LOCAL
        netbios name = CTDB-NAS
        server string = Clustered CIFS
        security = ADS
        auth methods = winbind, sam
        password server = 172.16.2.25
        private dir = /mnt/gpfs/CTDB_AD
        passdb backend = tdbsam
        log level = 3 winbind:5 auth:10 passdb:5
        syslog = 0
        log file = /var/log/samba/log.%m
        max log size = 10000
        large readwrite = No
        deadtime = 15
        use mmap = No
        clustering = Yes
        disable spoolss = Yes
        machine password timeout = 999999999
        local master = No
        dns proxy = No
        ldap admin dn = cn=ldap,cn=Users,dc=testdomain2,dc=local
        ldap idmap suffix = dc=testdomain2,dc=local
        ldap suffix = dc=testdomain2,dc=local
        idmap backend = ad
        idmap uid = 5000-100000000
        idmap gid = 5000-100000000
        template homedir = /home/%D+%U
        template shell = /bin/bash
        winbind separator = +
        winbind enum users = Yes
        winbind enum groups = Yes
        notify:inotify = no
        idmap:cache = no
        nfs4:acedup = merge
        nfs4:chown = yes
        nfs4:mode = special
        gpfs:sharemodes = yes
        fileid:mapping = fsname
        force unknown acl user = Yes
        strict allocate = Yes
        strict sync = Yes
        sync always = Yes
        use sendfile = Yes
        mangled names = No
        blocking locks = No
        oplocks = No
        strict locking = Yes
        wide links = No
        vfs objects = gpfs, fileid

[global-share]
        comment = GPFS File Share
        path = /mnt/gpfs/nfsexport
        read only = No
        inherit permissions = Yes
        inherit acls = Yes
Created attachment 3861 [details]
Traces from Samba Client

Traces from the SMB client, taken from the different clustered SMB servers.
I'm not sure this bug is valid. NFS simply doesn't provide the file coherence needed for clustered Samba; NFSv3 is not a clustered filesystem. An NFS client redirector (which is what smbd is running on top of here) can make caching decisions that mean open files will not be seen in a coherent state. You need a clustered filesystem with lease callbacks in order to do this.

Jeremy.
Hi Jeremy,

We are using GPFS as the underlying clustered file system; NFS and SMB sit on top of GPFS, and the clustered SMB and NFS servers are managed by CTDB. We are seeing file corruption when the same file is updated (open + modify + close) by both SMB and NFS clients, even though we have enforced synchronous behaviour in both the SMB and NFS servers and clients.

I presume clustered Samba and clustered NFS can co-exist on top of GPFS managed by CTDB. Please clarify, and accept my apologies if I misinterpreted something.

Thanks,
-Tim

(In reply to comment #2)
> I'm not sure this bug is valid. NFS simply doesn't provide the file coherence
> needed for clustered Samba, NFSv3 is not a clustered filesystem. An NFS client
> redirector (which is what smbd is running on top of here) can make caching
> decisions that mean open files will not be seen in a coherent state. You need a
> clustered filesystem with lease callbacks in order to do this.
> Jeremy.
Disabling "use sendfile" seems to solve this issue. I am not sure if there is a bug in the Samba code when this option is enabled (especially when the same node serves both NFS and SMB).

Tracing through the SMB server daemon, I found the following interesting. On the SMB server that provides corrupt file contents to the SMB client, the following is seen:

  2445 [pid 26444] fstat(36, {st_mode=S_IFREG|0744, st_size=103, ...}) = 0
  2446 [pid 26444] sendto(32, "\0\0\0\242\377SMB.\0\0\0\0\210\1\310\0\0\0\0\0\0\0\0\0"..., 63, MSG_MORE, NULL, 0) = 63
  2447 [pid 26444] sendfile(32, 36, [0], 103) = -1 EINVAL (Invalid argument)
  2448 [pid 26444] pread(36, "NFS client mounting from 97.6\n\r\n"..., 103, 0) = 103
  2449 [pid 26444] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=1017, ...}) = 0
  2450 [pid 26444] geteuid() = 11005
  2451 [pid 26444] write(26, "[2009/01/08 11:17:07, 3] smbd/r"..., 61) = 61
  2452 [pid 26444] geteuid() = 11005
  2453 [pid 26444] write(26, " send_file_readX fnum=9487 max="..., 46) = 46
  2454 [pid 26444] write(32, "\0\0\0\242\377SMB.\0\0\0\0\210\1\310\0\0\0\0\0\0\0\0\0"..., 166) = 166
  2455 [pid 26444] select(33, [9 30 32], [], NULL, {39, 685075}) = 1 (in [32],

The smbclient trace from the bad SMB server confirms this:

  recvfrom(4, "\0\0\0\242\377SMB.\0\0\0\0\210\1\310\0\0\0\0\0\0\0\0\0"..., 229, 0, NULL, NULL) = 229

When the sendfile() system call fails, the SMB server sends SMB header + file contents to the SMB client, but the client interprets only "size of file" bytes of that stream as file data. Hence the start of the file as seen by the client contains the SMB header, and only (file size - SMB header size) bytes of the actual file contents are written.

On the other server, which had no NFS client mounted, sendfile() succeeds and the actual file contents are sent to the requesting SMB client.
  2636 [pid 19495] fstat(37, {st_mode=S_IFREG|0744, st_size=103, ...}) = 0
  2637 [pid 19495] sendto(32, "\0\0\0\242\377SMB.\0\0\0\0\210\1\310\0\0\0\0\0\0\0\0\0"..., 63, MSG_MORE, NULL, 0) = 63
  2638 [pid 19495] sendfile(32, 37, [0], 103) = 103
  2639 [pid 19495] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=1017, ...}) = 0
  2640 [pid 19495] geteuid() = 11005
  2641 [pid 19495] write(26, "[2009/01/08 11:29:07, 3] smbd/r"..., 61) = 61
  2642 [pid 19495] geteuid() = 11005
  2643 [pid 19495] write(26, " send_file_readX: sendfile fnum"..., 57) = 57
  2644 [pid 19495] select(33, [9 30 32], [], NULL, {49, 19458}) = 1 (in [32],

SMBClient trace:

  recvfrom(4, "\0\0\0\242\377SMB.\0\0\0\0\210\1\310\0\0\0\0\0\0\0\0\0"..., 166, 0, NULL, NULL) = 166

Note that the smbclient talking to the bad server receives 229 bytes instead of 166, since 63 extra bytes are prepended before the file contents. The good reply on the wire is "SMB header (63 bytes) + file contents"; the bad reply is "SMB header (63 bytes) + SMB header (63 bytes) + file contents".

With sendfile() disabled, I have not seen file corruption in the past 5 hours. Please advise if the above hypothesis is valid.

Thanks,
-Tim
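The byte counts above (166 vs. 229) can be sketched as a toy calculation. This is my own illustration, not Samba code; only HEADER_LEN (the 63-byte reply header from sendto) and DATA_LEN (the 103-byte st_size from fstat) are taken from the traces:

```c
/* Toy sketch of the byte accounting in the traces; HEADER_LEN and
 * DATA_LEN come from the strace output, everything else is illustrative. */
#include <stddef.h>

#define HEADER_LEN 63   /* SMB reply header pushed early with sendto(..., MSG_MORE) */
#define DATA_LEN   103  /* st_size of the file being read */

/* Good path: header already on the wire, sendfile() ships only the data,
 * so the client receives one header + data reply. */
size_t reply_len_good(void)
{
    return HEADER_LEN + DATA_LEN;
}

/* Bad path: sendfile() fails with EINVAL after the header was already
 * sent, and the readX-level fallback writes a complete header + data
 * reply again, so the client receives an extra 63-byte header. */
size_t reply_len_bad(void)
{
    return HEADER_LEN + (HEADER_LEN + DATA_LEN);
}
```

This reproduces the arithmetic of the traces: the good client reads 166 bytes, the bad client reads 229, and the surplus is exactly one 63-byte header.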
Created attachment 3865 [details]
Traces from SMBD
Looks like a bad interaction between sendfile in the kernel and the NFS caching code to me. We depend on st_size being correct for sendfile. I don't think this is a Samba bug, as there are many NAS vendors on Linux currently using sendfile in Samba; an interaction between NFS and GPFS looks likely to me.

Sorry for the earlier confusion; I didn't understand your setup and that you were using GPFS as your underlying filesystem.

Jeremy.
Jeremy,

Seems like this problem was resolved in an earlier release (but we see it now when NFS comes into the picture):

http://us3.samba.org/samba/history/samba-3.0.22.html

This is documented as:

  Changes since 3.0.10
  --------------------
  commits
  -------
  o Jeremy Allison
    * Fix the problem we get on Linux where sendfile fails, but
      we've already sent the header using send().

The Samba version installed in my testbed is:

  rpm -qa | grep samba
  samba-client-3.2.3-ctdb.50
  samba-common-3.2.3-ctdb.50
  samba-3.2.3-ctdb.50
  samba-doc-3.2.3-ctdb.50
  samba-winbind-32bit-3.2.3-ctdb.50
  samba-swat-3.2.3-ctdb.50
  samba-debuginfo-3.2.3-ctdb.50

Regards,
-Tim

(In reply to comment #6)
> Looks like a bad interaction between sendfile in the kernel and the NFS caching
> code to me.
Created attachment 3871 [details]
Patch

Volker tracked this one down. Handling of EINVAL had been erroneously added to smbd/reply.c in the readX code instead of in lib/sendfile.c where it should have been. This fixes an identical test case to this bug (where I force sendfile to return EINVAL) for me.

Jeremy.
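For readers following along, the idea behind moving the fallback into lib/sendfile.c can be sketched roughly as below. This is a hypothetical wrapper of my own, not the actual patch: the EINVAL fallback lives inside the sendfile wrapper, so the readX caller sends the header once and then asks the wrapper for the data bytes only; whichever path the wrapper takes, the client sees header + data exactly once.

```c
/* Hypothetical sketch (not the actual Samba patch): keep the EINVAL
 * fallback inside the sendfile wrapper so callers never re-send a
 * header they already pushed to the socket. */
#include <errno.h>
#include <unistd.h>
#include <sys/types.h>

/* Stand-in for the kernel call; returning -1/EINVAL simulates the
 * failure seen in the trace. A real wrapper would call sendfile(2) here. */
static ssize_t try_zero_copy(int out_fd, int in_fd, off_t offset, size_t count)
{
    (void)out_fd; (void)in_fd; (void)offset; (void)count;
    errno = EINVAL;
    return -1;
}

ssize_t sendfile_with_fallback(int out_fd, int in_fd, off_t offset, size_t count)
{
    ssize_t n = try_zero_copy(out_fd, in_fd, offset, count);
    if (n >= 0 || errno != EINVAL)
        return n;

    /* Fallback: copy the same byte range by hand. Crucially, we write
     * only the file data -- no SMB header -- because the caller already
     * sent the header. */
    char buf[8192];
    size_t sent = 0;
    while (sent < count) {
        size_t chunk = count - sent;
        if (chunk > sizeof(buf))
            chunk = sizeof(buf);
        ssize_t r = pread(in_fd, buf, chunk, offset + (off_t)sent);
        if (r <= 0)
            return r < 0 ? -1 : (ssize_t)sent;
        ssize_t w = write(out_fd, buf, (size_t)r);
        if (w < 0)
            return -1;
        sent += (size_t)w;
    }
    return (ssize_t)sent;
}
```

With the fallback at this layer, the readX code path cannot accidentally build a second full reply, which is exactly the double-header corruption seen in the traces.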
Volker/Jeremy,

Thanks for tracking this down. The patch seems to have fixed the issue and I have not seen any corruption in the past 8 hours. This bug may be closed.

Thanks,
-Tim

(In reply to comment #8)