Bug 14217 - vfs_ceph_snapshots can't resolve deleted paths
Status: NEW
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: VFS Modules
Version: 4.11.3
Hardware: All All
Importance: P5 normal
Target Milestone: ---
Assignee: David Disseldorp
QA Contact: Samba QA Contact
Reported: 2019-12-13 12:03 UTC by David Disseldorp
Modified: 2024-09-17 04:22 UTC
Description David Disseldorp 2019-12-13 12:03:45 UTC
Consider the following...

Base share contents (.snap is hidden by CephFS):
  /mnt/cephfs

Snapshot contents:
  /mnt/cephfs/.snap/mysnapshot/dir_created_b4_deleted_after

Traffic:
  enum_snaps("/mnt/cephfs/") -> returns gmt_tok("/mnt/cephfs/.snap/mysnapshot/")

  twrp=gmt_tok("/mnt/cephfs/.snap/mysnapshot/")
  find(twrp, "/mnt/cephfs/") -> returns "dir_created_b4_deleted_after"
  create(twrp, "/mnt/cephfs/dir_created_b4_deleted_after") -> returns NOT_FOUND

This is because of the way vfs_ceph_snapshots resolves snapshot paths. Ceph propagates snapshots from parent directories to all children, and vfs_ceph_snapshots currently relies on this to avoid walking up the directory tree in search of a snapshot that matches a given timewarp token. This logic breaks if any of the timewarped path components corresponds to a directory that has since been deleted. In the example above, vfs_ceph_snapshots looks for the timewarp token via:

  readdir("/mnt/cephfs/dir_created_b4_deleted_after/.snap")

when instead it should be looking under "/mnt/cephfs/.snap".
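
To make the failure mode concrete, here is a rough Python model of the lookup described above. It is only a sketch and not the module's actual C code: plain directories stand in for CephFS (so ".snap" is an ordinary directory here), snapshots are matched by name rather than by timewarp token, and naive_snap_lookup() and build_example() are hypothetical helper names.

```python
import os
import tempfile


def naive_snap_lookup(path, snap_name):
    """Look for snap_name only under <path>/.snap, without walking up."""
    snap_dir = os.path.join(path, ".snap")
    try:
        return snap_name in os.listdir(snap_dir)
    except FileNotFoundError:
        # <path> no longer exists in the live tree, so it has no .snap
        return False


def build_example(root):
    """Recreate the layout from the description under root.

    Assumption: an ordinary ".snap" directory simulates CephFS snapshots.
    The directory exists only inside the snapshot, not in the live tree.
    """
    os.makedirs(
        os.path.join(root, ".snap", "mysnapshot", "dir_created_b4_deleted_after")
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        build_example(root)
        deleted = os.path.join(root, "dir_created_b4_deleted_after")
        print(naive_snap_lookup(deleted, "mysnapshot"))  # False
        print(naive_snap_lookup(root, "mysnapshot"))     # True
```

The first lookup fails because the deleted directory has no .snap of its own in the live tree; the second succeeds because the snapshot is still listed under the share root.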
Comment 1 Shwetha Acharya 2024-09-04 11:11:48 UTC
Hi David,

I was verifying that vfs_ceph_snapshots works with a CephFS kernel mount and with vfs_ceph.

On deleting the snapshots, I saw that the snapshot entries were updated on the Windows client accordingly.

I am trying to figure out exactly how to reproduce this bug. Can it not be reproduced from the Windows Previous Versions tab? If not, how exactly can I reproduce it?

Could you please let me know if I am missing something.
Appreciate any pointers as I am pretty new to this code base.

Regards,
Shwetha
Comment 2 David Disseldorp 2024-09-17 04:18:33 UTC
Hi Shwetha,

Sorry, I don't have a Ceph+Samba environment where I can reproduce this bug, but I would expect it to still be an issue.

(In reply to Shwetha Acharya from comment #1)
> Hi David,
> 
> I was verifying that vfs_ceph_snapshots works with a CephFS kernel mount
> and with vfs_ceph.
> 
> On deleting the snapshots, I saw that the snapshot entries were updated on
> the Windows client accordingly.
> 
> I am trying to figure out exactly how to reproduce this bug. Can it not be
> reproduced from the Windows Previous Versions tab? If not, how exactly can
> I reproduce it?

The initial example is quite descriptive, so it should hopefully be enough as a reference. The problem isn't deleted snapshots, but deleted directories which need to be accessed via a parent directory's .snap path.
IIRC, a fix may need to involve checking each parent directory's .snap path, all the way up to the base share path, when attempting to resolve a timewarp token. Alternatively, it might be better to maintain an in-memory twrp to snap-path map somewhere.
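
For illustration, the parent-walk idea could look roughly like the Python sketch below. This is not a proposed patch: the real module is written in C, resolve_snap_path() is a hypothetical name, and snap_matches is an assumed caller-supplied predicate standing in for whatever comparison maps a timewarp token to a snapshot directory.

```python
import os
import tempfile

SNAP_DIR = ".snap"  # assumption: the default CephFS snapshot directory name


def resolve_snap_path(share_root, rel_path, snap_matches):
    """Return rel_path as seen inside the matching snapshot, or None.

    Checks the .snap directory of rel_path itself first, then of each
    parent in turn, stopping at share_root.
    """
    parts = [p for p in os.path.normpath(rel_path).split(os.sep)
             if p not in ("", ".")]
    if ".." in parts:
        raise ValueError("rel_path must stay inside the share")
    for depth in range(len(parts), -1, -1):
        parent = os.path.join(share_root, *parts[:depth])
        snap_root = os.path.join(parent, SNAP_DIR)
        try:
            names = sorted(os.listdir(snap_root))
        except (FileNotFoundError, NotADirectoryError):
            continue  # this level is gone from the live tree; walk up
        for name in names:
            candidate = os.path.join(snap_root, name)
            if snap_matches(candidate):
                # Re-append the components below the level that matched
                return os.path.join(candidate, *parts[depth:])
    return None


if __name__ == "__main__":
    # Plain directories simulate the layout from the description.
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(
            root, ".snap", "mysnapshot", "dir_created_b4_deleted_after"))
        found = resolve_snap_path(
            root, "dir_created_b4_deleted_after",
            lambda p: os.path.basename(p) == "mysnapshot")
        print(found)
```

With the layout from the description, the lookup under the deleted directory is skipped and the snapshot is found under the share root instead. The cost is one extra readdir per missing path component, which is the walk the current logic tries to avoid.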
Comment 3 David Disseldorp 2024-09-17 04:22:42 UTC
(In reply to David Disseldorp from comment #2)
...
> Alternatively it might be better to maintain an in-memory twrp to
> snap-path map somewhere.

On second thought, an in-memory map might not be an option if a client can enumerate snapshots, drop the connection, reconnect and immediately access a snapshot via a twrp token (without re-enumerating snapshots).