Bug 13596 - Durable Handle reconnect fails in cluster on GlusterFS because stat_ex.st_ex_blocks is not consistent
Status: ASSIGNED
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: File services
Version: unspecified
Hardware: All All
Importance: P5 normal
Target Milestone: ---
Assignee: Guenther Deschner
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-08-31 08:22 UTC by Ralph Böhme
Modified: 2019-10-31 14:52 UTC

See Also:


Attachments
Hack with parametric option (975 bytes, patch)
2018-08-31 08:22 UTC, Ralph Böhme
Patch for same problem with st_blksize (909 bytes, patch)
2019-10-31 14:52 UTC, Ralph Böhme

Description Ralph Böhme 2018-08-31 08:22:29 UTC
Created attachment 14457
Hack with parametric option

I'm seeing occasional Durable Handle reconnect failures in a clustered Samba scenario when sharing out GlusterFS via a FUSE mount.

After some digging, I tracked it down to st_ex_blocks in struct stat_ex not being consistent across cluster nodes.

Samba log message:

[2018/08/31 06:59:22.686724,  1, pid=31988, effective(0, 0), real(0, 0)] ../source3/smbd/durable.c:509(vfs_default_durable_reconnect_check_stat)
  vfs_default_durable_reconnect (dir/500mfile): stat_ex.st_ex_blocks differs: cookie:142608 != stat:142600, denying durable reconnect
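
For illustration, the kind of comparison that produces this message can be sketched as below. This is a simplified stand-in, not the code in durable.c: struct mini_stat and reconnect_stat_matches() are hypothetical names, and only three fields are compared.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, reduced stand-in for the stat fields saved in the cookie. */
struct mini_stat {
	uint64_t size;    /* file size in bytes */
	uint64_t blocks;  /* allocated blocks, like st_ex_blocks */
	uint32_t blksize; /* preferred I/O size, like st_ex_blksize */
};

/*
 * Compare the stat saved in the cookie at disconnect time with a fresh
 * stat taken on the node handling the reconnect. Any difference denies
 * the reconnect, which is why a node-dependent block count is fatal.
 */
static bool reconnect_stat_matches(const char *name,
				   const struct mini_stat *cookie,
				   const struct mini_stat *now)
{
	assert(name != NULL && cookie != NULL && now != NULL);

	if (cookie->size != now->size) {
		fprintf(stderr, "%s: size differs: cookie:%" PRIu64
			" != stat:%" PRIu64 ", denying durable reconnect\n",
			name, cookie->size, now->size);
		return false;
	}
	if (cookie->blocks != now->blocks) {
		fprintf(stderr, "%s: blocks differs: cookie:%" PRIu64
			" != stat:%" PRIu64 ", denying durable reconnect\n",
			name, cookie->blocks, now->blocks);
		return false;
	}
	if (cookie->blksize != now->blksize) {
		fprintf(stderr, "%s: blksize differs: cookie:%" PRIu32
			" != stat:%" PRIu32 ", denying durable reconnect\n",
			name, cookie->blksize, now->blksize);
		return false;
	}
	return true;
}
```

With the block counts from the log (142608 saved in the cookie, 142600 reported by the other node), the function returns false even though the file itself has not changed.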

While we could implement the durable handle VFS functions in vfs_glusterfs, this would mean a *lot* of duplicated code that adds a maintenance burden.

*scratches head*

The attached hack can be used to work around the issue: it adds a parametric option that disables the check.
Comment 1 Ralph Böhme 2018-08-31 08:24:19 UTC
Note that this happens when the Durable Handle reconnect goes to a different node, e.g. when a client failover is triggered with ctdb moveip IP, where IP is the address that the client connected to.
Comment 2 Michael Adam 2018-08-31 09:44:23 UTC
(In reply to Ralph Böhme from comment #0)
> Created attachment 14457 [details]
> Hack with parametric option
> 
> I'm seeing occasional Durable Handle reconnect failures in a clustered Samba
> scenario when sharing out GlusterFS via fuse mount.
> 
> After some digging I tracked it down to struct stat.st_ex_blocks not being
> consistent across cluster nodes.
> 
> Samba log message:
> 
> [2018/08/31 06:59:22.686724,  1, pid=31988, effective(0, 0), real(0, 0)]
> ../source3/smbd/durable.c:509(vfs_default_durable_reconnect_check_stat)
>   vfs_default_durable_reconnect (dir/500mfile): stat_ex.st_ex_blocks
> differs: cookie:142608 != stat:142600, denying durable reconnect
> 
> While we could implement the durable handle VFS functions in vfs_glusterfs,
> this would mean a *lot* of duplicated code that adds a maintenance burden.
> 
> *scratches head*

No, I think this looks like a bug in Gluster. This information should be consistent across nodes.


> The attached hack can be used to work around the issue, it adds a parametric
> option that disables the check.
Comment 3 Ralph Böhme 2018-08-31 09:50:55 UTC
(In reply to Michael Adam from comment #2)
Yes, this is similar to the timestamp inconsistencies that recently got addressed:

https://github.com/gluster/glusterfs/issues/208
Comment 4 Ralph Böhme 2018-08-31 12:50:47 UTC
(In reply to Ralph Böhme from comment #0)
For the record: if this turns out to be by design, we should be able to paper over the issue by implementing SMB_VFS_DURABLE_RECONNECT in vfs_glusterfs.c like this:

- do a stat on the file
- unpack the passed in cookie
- set cookie.stat_info.st_ex_blocks to the stat value from the file
- pack the modified cookie
- call SMB_VFS_NEXT_DURABLE_RECONNECT
Comment 5 Jeremy Allison 2018-08-31 21:24:18 UTC
No, this has to be a Gluster bug. You can't report st_blocks as different on two different nodes for the same file.
Comment 6 Ralph Böhme 2018-09-12 15:32:45 UTC
Gluster RFE: https://github.com/gluster/glusterfs/issues/509
Comment 7 Ralph Böhme 2019-10-30 13:31:21 UTC
FWIW, the same issue happens with st_blksize.
Comment 8 Ralph Böhme 2019-10-31 14:52:41 UTC
Created attachment 15593
Patch for same problem with st_blksize