Created attachment 14457
Hack with parametric option

I'm seeing occasional Durable Handle reconnect failures in a clustered Samba setup that shares out GlusterFS via a FUSE mount.

After some digging, I tracked it down to struct stat_ex.st_ex_blocks not being consistent across cluster nodes.

Samba log message:

  [2018/08/31 06:59:22.686724, 1, pid=31988, effective(0, 0), real(0, 0)] ../source3/smbd/durable.c:509(vfs_default_durable_reconnect_check_stat)
    vfs_default_durable_reconnect (dir/500mfile): stat_ex.st_ex_blocks differs: cookie:142608 != stat:142600, denying durable reconnect

While we could implement the durable handle VFS functions in vfs_glusterfs, this would mean a *lot* of duplicated code that adds a maintenance burden.

*scratches head*

The attached hack works around the issue by adding a parametric option that disables the check.
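The comparison that denies the reconnect can be modelled roughly as follows. This is a simplified sketch, not Samba's code: struct cookie_stat and reconnect_stat_matches() are hypothetical names, only two fields are compared, and the skip_blocks flag merely stands in for the parametric option from the attached hack (whose actual name is not given here).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, trimmed-down stand-in for the stat fields stored in the
 * durable handle cookie; Samba's struct stat_ex has many more members. */
struct cookie_stat {
	uint64_t st_ex_size;
	uint64_t st_ex_blocks;
};

/*
 * Models the reconnect check: each stored field must match a fresh stat
 * of the file. skip_blocks is a hypothetical flag standing in for the
 * parametric option added by the hack. Returns true if the reconnect
 * may proceed.
 */
static bool reconnect_stat_matches(const struct cookie_stat *cookie,
				   const struct cookie_stat *now,
				   bool skip_blocks)
{
	assert(cookie != NULL && now != NULL);

	if (cookie->st_ex_size != now->st_ex_size) {
		fprintf(stderr, "st_ex_size differs, denying durable reconnect\n");
		return false;
	}
	if (!skip_blocks && cookie->st_ex_blocks != now->st_ex_blocks) {
		fprintf(stderr,
			"stat_ex.st_ex_blocks differs: cookie:%llu != stat:%llu, "
			"denying durable reconnect\n",
			(unsigned long long)cookie->st_ex_blocks,
			(unsigned long long)now->st_ex_blocks);
		return false;
	}
	return true;
}
```

With the block counts from the log above (142608 in the cookie, 142600 from the new node's stat), the check fails unless skip_blocks is set.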
Note that this happens when the Durable Handle reconnect goes to a different node, e.g. when a client failover is triggered with "ctdb moveip IP PNN", where IP is the address the client connected to and PNN is the node the address is moved to.
(In reply to Ralph Böhme from comment #0)
> While we could implement the durable handle VFS functions in vfs_glusterfs,
> this would mean a *lot* of duplicated code that adds a maintenance burden.
>
> *scratches head*

No, I think this looks like a bug in Gluster. This information should be consistent across nodes.
(In reply to Michael Adam from comment #2) Yes, this is similar to the timestamp inconsistencies that recently got addressed: https://github.com/gluster/glusterfs/issues/208
(In reply to Ralph Böhme from comment #0)
For the record: if this turns out to be by design, we should be able to paper over the issue by implementing SMB_VFS_RECONNECT in vfs_glusterfs.c like this:

- do a stat on the file
- unpack the passed-in cookie
- set cookie.stat_info.st_ex_blocks to the stat value from the file
- pack the modified cookie
- call SMB_VFS_NEXT_RECONNECT
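The steps above could look roughly like the following standalone sketch. Everything here is hypothetical and only models the idea: struct durable_cookie, cookie_pack()/cookie_unpack() (a plain memcpy standing in for Samba's real cookie marshalling), glusterfs_reconnect_sketch(), and default_reconnect_check(), which plays the role of the next module's block-count check reached via SMB_VFS_NEXT_RECONNECT.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical stand-ins for the cookie contents. */
struct stat_info {
	uint64_t st_ex_size;
	uint64_t st_ex_blocks;
};

struct durable_cookie {
	struct stat_info stat_info;
};

#define COOKIE_BLOB_LEN sizeof(struct durable_cookie)

/* Hypothetical (un)packing: memcpy stands in for real marshalling. */
static void cookie_unpack(const unsigned char *blob, struct durable_cookie *c)
{
	memcpy(c, blob, sizeof(*c));
}

static void cookie_pack(const struct durable_cookie *c, unsigned char *blob)
{
	memcpy(blob, c, sizeof(*c));
}

typedef int (*next_reconnect_fn)(const char *path, const unsigned char *blob);

/* Hypothetical "next" module: denies the reconnect (returns -1) when the
 * cookie's block count differs from a fresh stat, as in the log above. */
static int default_reconnect_check(const char *path, const unsigned char *blob)
{
	struct stat st;
	struct durable_cookie cookie;

	if (stat(path, &st) != 0) {
		return -1;
	}
	cookie_unpack(blob, &cookie);
	if (cookie.stat_info.st_ex_blocks != (uint64_t)st.st_blocks) {
		fprintf(stderr, "st_ex_blocks differs: cookie:%llu != stat:%llu\n",
			(unsigned long long)cookie.stat_info.st_ex_blocks,
			(unsigned long long)st.st_blocks);
		return -1;
	}
	return 0;
}

/* Sketch of the proposed vfs_glusterfs hook, following the five steps. */
static int glusterfs_reconnect_sketch(const char *path, unsigned char *blob,
				      next_reconnect_fn next)
{
	struct stat st;
	struct durable_cookie cookie;

	assert(blob != NULL && next != NULL);

	/* 1. stat the file on this node */
	if (stat(path, &st) != 0) {
		return -1;
	}
	/* 2. unpack the passed-in cookie */
	cookie_unpack(blob, &cookie);
	/* 3. replace the stored block count with this node's value */
	cookie.stat_info.st_ex_blocks = (uint64_t)st.st_blocks;
	/* 4. pack the modified cookie */
	cookie_pack(&cookie, blob);
	/* 5. hand off to the next module */
	return next(path, blob);
}
```

The effect is the same as disabling the st_ex_blocks comparison, but the workaround stays inside vfs_glusterfs instead of needing an option in the default durable handle code.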
No, this has to be a Gluster bug. You can't report a different st_blocks value on two different nodes for the same file.
Gluster RFE: https://github.com/gluster/glusterfs/issues/509
FWIW, the same issue happens with st_blksize.
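To confirm that st_blocks and st_blksize really disagree across nodes, one could snapshot both fields for the same file on each node and compare the results. A minimal sketch; struct block_fields, snapshot_block_fields() and block_fields_agree() are hypothetical helpers, not part of Samba or Gluster:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>

/* The two stat fields that were seen to differ between cluster nodes. */
struct block_fields {
	long long blocks;   /* st_blocks: allocated blocks (512-byte units on Linux) */
	long long blksize;  /* st_blksize: preferred I/O block size */
};

/* Snapshot st_blocks/st_blksize for path; returns 0 on success. */
static int snapshot_block_fields(const char *path, struct block_fields *out)
{
	struct stat st;

	assert(path != NULL && out != NULL);
	if (stat(path, &st) != 0) {
		perror(path);
		return -1;
	}
	out->blocks = (long long)st.st_blocks;
	out->blksize = (long long)st.st_blksize;
	return 0;
}

/* Compare snapshots taken on two nodes; prints each mismatch and
 * returns true only if both fields agree. */
static bool block_fields_agree(const struct block_fields *a,
			       const struct block_fields *b)
{
	bool ok = true;

	if (a->blocks != b->blocks) {
		printf("st_blocks differs: %lld != %lld\n", a->blocks, b->blocks);
		ok = false;
	}
	if (a->blksize != b->blksize) {
		printf("st_blksize differs: %lld != %lld\n", a->blksize, b->blksize);
		ok = false;
	}
	return ok;
}
```

Running snapshot_block_fields() against the same path on each node's FUSE mount and feeding the results to block_fields_agree() would show whether either field diverges.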
Created attachment 15593
Patch for the same problem with st_blksize