Bug 6019 - File corruption in Clustered SMB/NFS environment managed via CTDB
Alias: None
Product: Samba 3.2
Classification: Unclassified
Component: Clustering
Version: 3.2.3
Hardware: x86 Linux
Importance: P3 critical
Target Milestone: ---
Assignee: Volker Lendecke
QA Contact: Samba QA Contact
Depends on:
Reported: 2009-01-07 13:36 UTC by Tim Clusters
Modified: 2009-01-14 18:06 UTC (History)

See Also:

Traces from Samba Client (10.81 KB, application/octet-stream)
2009-01-07 13:37 UTC, Tim Clusters
Traces from SMBD (40.04 KB, application/octet-stream)
2009-01-08 22:49 UTC, Tim Clusters
Patch (1.66 KB, patch)
2009-01-13 13:17 UTC, Jeremy Allison

Description Tim Clusters 2009-01-07 13:36:26 UTC

We have a two-server clustered NAS setup that acts as both an SMB and an NFS server in an active/active configuration, managed by CTDB (http://ctdb.samba.org/) and authenticating users via Active Directory.

CTDB version is "ctdb-1.0-64", NFS is v3, and Samba version is "samba-3.2.3-ctdb.50".
When a file is updated by an SMB client (followed by a file close), other SMB clients can see and modify the file. But when an NFS client (same user) updates the same file (again followed by a file close), only one SMB server can see the updates. Clients mounting from the other SMB server see the file as corrupted. In addition, files updated using only SMB clients seem to get corrupted after some time.

Our suspicion is that this has to do with NFS and SMB caching. We forced the NFS server to export with the sync option and did the same for SMB with the following settings, but it does not help:
strict allocate = yes
strict locking = yes
strict sync = yes
sync always = yes
NFS exports:
/mnt/gpfs/nfsexport *(rw,no_root_squash,sync,fsid=222)
We also tried the NFS mount with sync:
node1:/mnt/gpfs/nfsexport       /mnt/nfs        nfs     rw,tcp,hard,intr,sync,rsize=32768,wsize=32768,vers=3       0 0

We disabled oplocks (oplocks = no) to prevent client-side SMB caching. In addition, we forced this on the SMB client via the registry:

OplocksDisabled REG_DWORD 1

Finally, we did an SMB mount on the server itself and used the Linux smbclient to rule out Windows OS and network issues. The data is consistent on only one of the SMB servers, even though it is consistent in the underlying storage and GPFS file system. Access via NFS is consistent; it is only via SMB that we see the problem, which is weird. It seems the SMB server cache is stale/inconsistent across the clustered environment.

Stracing smbclient, everything is similar except the following:

Good one (or SMB server where file is ok):
select(8, [4 7], [], NULL, {9999, 0})   = 1 (in [4], left {9998, 999000})
ioctl(4, FIONREAD, [81])                = 0
recvfrom(4, "\0\0\0M\377SMB.\0\0\0\0\210\1\310\0\0\0\0\0\0\0\0\0\0\0"..., 81, 0, NULL, NULL) = 81
write(6, "New file from NFS\n", 18)     = 18
select(5, [4], NULL, NULL, {20, 0})     = 1 (in [4], left {20, 0})


Bad one (the SMB server from which the obtained file is corrupted):
select(8, [4 7], [], NULL, {9999, 0})   = 1 (in [4], left {9999, 0})
ioctl(4, FIONREAD, [144])               = 0
recvfrom(4, "\0\0\0M\377SMB.\0\0\0\0\210\1\310\0\0\0\0\0\0\0\0\0\0\0"..., 144, 0, NULL, NULL) = 144
write(6, "\0\0\0M\377SMB.\0\0\0\0\210\1\310\0\0", 18) = 18
write(4, "\0\0\0)\377SMB\4\0\0\0\0\10\1\310\0\0\0\0\0\0\0\0\0\0\0"..., 45) = 45
select(5, [4], NULL, NULL, {20, 0})     = 1 (in [4], left {19, 999000})


Actual file content in underlying FS:
[root@D1950-01 testuserD]# cat filee7.txt
New file from NFS

The actual file content is consistent in the underlying file system on both servers; somehow, SMB seems to write a chunk of SMB headers instead of the original file contents to the client (when mounting from the bad SMB server). More details of the strace are attached.

Suggestions/thoughts/input to resolve this will be greatly appreciated. I do not see any errors reported in log.smb, log.client.smb, log.ctdb, or in a network dump on port 445. Let me know if you need additional details.

Thanks in Advance,


      workgroup = TESTDOMAIN2
      realm = TESTDOMAIN2.LOCAL
      netbios name = CTDB-NAS
      server string = Clustered CIFS
      security = ADS
      auth methods = winbind, sam
      password server =
      private dir = /mnt/gpfs/CTDB_AD
      passdb backend = tdbsam
      log level = 3 winbind:5 auth:10 passdb:5
      syslog = 0
      log file = /var/log/samba/log.%m
      max log size = 10000
      large readwrite = No
      deadtime = 15
      use mmap = No
      clustering = Yes
      disable spoolss = Yes
      machine password timeout = 999999999
      local master = No
      dns proxy = No
      ldap admin dn = cn=ldap,cn=Users,dc=testdomain2,dc=local
      ldap idmap suffix = dc=testdomain2,dc=local
      ldap suffix = dc=testdomain2,dc=local
      idmap backend = ad
      idmap uid = 5000-100000000
      idmap gid = 5000-100000000
      template homedir = /home/%D+%U
      template shell = /bin/bash
      winbind separator = +
      winbind enum users = Yes
      winbind enum groups = Yes
      notify:inotify = no
      idmap:cache = no
      nfs4:acedup = merge
      nfs4:chown = yes
      nfs4:mode = special
      gpfs:sharemodes = yes
      fileid:mapping = fsname
      force unknown acl user = Yes
      strict allocate = Yes
      strict sync = Yes
      sync always = Yes
      use sendfile = Yes
      mangled names = No
      blocking locks = No
      oplocks = No
      strict locking = Yes
      wide links = No
      vfs objects = gpfs, fileid

      comment = GPFS File Share
      path = /mnt/gpfs/nfsexport
      read only = No
      inherit permissions = Yes
      inherit acls = Yes
Comment 1 Tim Clusters 2009-01-07 13:37:37 UTC
Created attachment 3861 [details]
Traces from Samba Client

Traces from the SMB client against the different clustered SMB servers.
Comment 2 Jeremy Allison 2009-01-07 19:22:02 UTC
I'm not sure this bug is valid. NFS simply doesn't provide the file coherence needed for clustered Samba, NFSv3 is not a clustered filesystem. An NFS client redirector (which is what smbd is running on top of here) can make caching decisions that mean open files will not be seen in a coherent state. You need a clustered filesystem with lease callbacks in order to do this.
Comment 3 Tim Clusters 2009-01-08 11:20:54 UTC
Hi Jeremy,

We are using GPFS as the underlying clustered file system; NFS and SMB sit on top of GPFS. The clustered SMB and NFS servers are managed by CTDB.

We are seeing file corruption when the same file is updated (open + modify + close) by both SMB and NFS clients. We have enforced synchronous behaviour on both the SMB and NFS servers and clients.

I presume clustered Samba and clustered NFS can co-exist on top of GPFS managed by CTDB.

Please clarify and accept my apologies if I misinterpreted something.


(In reply to comment #2)
> I'm not sure this bug is valid. NFS simply doesn't provide the file coherence
> needed for clustered Samba, NFSv3 is not a clustered filesystem. An NFS client
> redirector (which is what smbd is running on top of here) can make caching
> decisions that mean open files will not be seen in a coherent state. You need a
> clustered filesystem with lease callbacks in order to do this.
> Jeremy.

Comment 4 Tim Clusters 2009-01-08 22:47:00 UTC
Disabling "sendfile" seems to solve this issue. I am not sure if there is a bug in the Samba code when this option is enabled (especially when the same node serves both NFS and SMB).

Tracing through the SMB server daemon, I found the following interesting behaviour. On the SMB server that serves corrupt file contents to the SMB client, the following is seen:

[pid 26444] fstat(36, {st_mode=S_IFREG|0744, st_size=103, ...}) = 0
[pid 26444] sendto(32, "\0\0\0\242\377SMB.\0\0\0\0\210\1\310\0\0\0\0\0\0\0\0\0"..., 63, MSG_MORE, NULL, 0) = 63
[pid 26444] sendfile(32, 36, [0], 103)  = -1 EINVAL (Invalid argument)
[pid 26444] pread(36, "NFS client mounting from 97.6\n\r\n"..., 103, 0) = 103
[pid 26444] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=1017, ...}) = 0
[pid 26444] geteuid()                   = 11005
[pid 26444] write(26, "[2009/01/08 11:17:07,  3] smbd/r"..., 61) = 61
[pid 26444] geteuid()                   = 11005
[pid 26444] write(26, "  send_file_readX fnum=9487 max="..., 46) = 46
[pid 26444] write(32, "\0\0\0\242\377SMB.\0\0\0\0\210\1\310\0\0\0\0\0\0\0\0\0"..., 166) = 166
[pid 26444] select(33, [9 30 32], [], NULL, {39, 685075}) = 1 (in [32],

The smbclient trace from the bad SMB server confirms this:
recvfrom(4, "\0\0\0\242\377SMB.\0\0\0\0\210\1\310\0\0\0\0\0\0\0\0\0"..., 229, 0, NULL, NULL) = 229

When the sendfile() system call fails, the SMB server falls back to pread() and re-sends the full response (SMB header + file contents) to the SMB client, even though the 63-byte header has already been sent via sendto(). The client therefore treats the extra header as file data: the start of the written file contains the SMB header, and only (file size - SMB header size) bytes of actual file content follow within the expected length.

On the other server, which did not have any NFS client mounted, sendfile() succeeds and sends the actual file contents to the requesting SMB client.

[pid 19495] fstat(37, {st_mode=S_IFREG|0744, st_size=103, ...}) = 0
[pid 19495] sendto(32, "\0\0\0\242\377SMB.\0\0\0\0\210\1\310\0\0\0\0\0\0\0\0\0"..., 63, MSG_MORE, NULL, 0) = 63
[pid 19495] sendfile(32, 37, [0], 103)  = 103
[pid 19495] stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=1017, ...}) = 0
[pid 19495] geteuid()                   = 11005
[pid 19495] write(26, "[2009/01/08 11:29:07,  3] smbd/r"..., 61) = 61
[pid 19495] geteuid()                   = 11005
[pid 19495] write(26, "  send_file_readX: sendfile fnum"..., 57) = 57
[pid 19495] select(33, [9 30 32], [], NULL, {49, 19458}) = 1 (in [32],

smbclient trace:
recvfrom(4, "\0\0\0\242\377SMB.\0\0\0\0\210\1\310\0\0\0\0\0\0\0\0\0"..., 166, 0, NULL, NULL) = 166

Note that smbclient talking to the bad server receives 229 bytes instead of 166 bytes, since an extra 63 bytes is prepended before the file contents.

The good SMB response structure is "SMB header (63 bytes) followed by file contents". The bad response structure is "SMB header (63 bytes) + SMB header (63 bytes) + file contents": the header goes out twice, once via sendto() and again as part of the fallback write().

With sendfile() disabled, I have not seen file corruption in the past 5 hours. Please advise whether the above hypothesis is valid.

Comment 5 Tim Clusters 2009-01-08 22:49:56 UTC
Created attachment 3865 [details]
Traces from SMBD

Traces from SMBD
Comment 6 Jeremy Allison 2009-01-09 22:04:54 UTC
Looks like a bad interaction between sendfile in the kernel and the NFS caching code to me. We depend on st_size being correct for sendfile. I don't think this is a Samba bug as there are many NAS vendors on Linux currently using sendfile in Samba. An interaction between NFS and GPFS looks likely to me.

Sorry for the earlier confusion, I didn't understand your setup and that you were using GPFS as your underlying filesystem.

Comment 7 Tim Clusters 2009-01-12 19:03:30 UTC

It seems this problem was resolved in an earlier release (but we see it now when NFS comes into the picture).


This is documented as:

Changes since 3.0.10
o   Jeremy Allison 
* Fix the problem we get on Linux where sendfile fails, but we've
      already sent the header using send().

Samba version installed in my testbed is:

rpm -qa | grep samba


(In reply to comment #6)
> Looks like a bad interaction between sendfile in the kernel and the NFS caching
> code to me. 
Comment 8 Jeremy Allison 2009-01-13 13:17:14 UTC
Created attachment 3871 [details]

Volker tracked this one down. Handling of EINVAL had been erroneously added to smbd/reply.c in the readX code instead of in lib/sendfile.c where it should have been.
This fixes an identical test case to this bug (where I force sendfile to return EINVAL) for me.
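The fix described above moves the EINVAL handling into the sendfile wrapper itself. A minimal sketch of that pattern (function and variable names are illustrative, not Samba's actual lib/sendfile.c code): if sendfile() fails with EINVAL, the wrapper falls back to pread() + write() of the file data only, so the caller, which has already sent the SMB header, never re-sends it.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

/* Send `count` bytes of `filefd` (starting at `offset`) to `sockfd`.
 * If sendfile() is not usable on this fd (EINVAL/ENOSYS), fall back to
 * copying the data manually, INSIDE the wrapper, so the caller's
 * already-sent header is never duplicated. */
static ssize_t send_file_with_fallback(int sockfd, int filefd,
                                       off_t offset, size_t count)
{
    off_t off = offset;
    ssize_t n = sendfile(sockfd, filefd, &off, count);
    if (n >= 0)
        return n;
    if (errno != EINVAL && errno != ENOSYS)
        return -1;

    /* Fallback: copy the file data only, never any protocol header. */
    char buf[8192];
    size_t total = 0;
    while (total < count) {
        size_t chunk = count - total;
        if (chunk > sizeof(buf))
            chunk = sizeof(buf);
        ssize_t r = pread(filefd, buf, chunk, offset + (off_t)total);
        if (r <= 0)
            return -1;
        ssize_t w = write(sockfd, buf, (size_t)r);
        if (w != r)
            return -1;
        total += (size_t)w;
    }
    return (ssize_t)total;
}
```

Either path delivers exactly `count` bytes of file data to the socket, which is why the caller in the readX code no longer needs its own EINVAL special case.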
Comment 9 Tim Clusters 2009-01-14 18:06:42 UTC

Thanks for tracking this down. The patch seems to have fixed the issue and I have not seen any corruption in the past 8 hours. This bug may be closed.


(In reply to comment #8)