Bug 15440 - Unable to copy and write files from clients to Ceph cluster via SMB Linux gateway with Ceph VFS module
Status: RESOLVED FIXED
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: VFS Modules
Version: 4.17.7
Hardware: x86 Linux
Importance: P5 major
Target Milestone: ---
Assignee: Jule Anger
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2023-08-01 14:23 UTC by Ivan
Modified: 2024-01-31 20:42 UTC
CC List: 6 users

See Also:


Attachments
patch from master (2.26 KB, patch)
2023-12-07 15:58 UTC, Guenther Deschner
anoopcs: review+
anoopcs: ci-passed+

Description Ivan 2023-08-01 14:23:05 UTC
We are currently testing the Ceph VFS module to provide access to our Ceph cluster for Windows and macOS clients, with an Ubuntu SMB server acting as a gateway for these clients. Clients can establish sessions with the Samba server, and they can view data on the Ceph cluster and transfer it to their local devices. However, we are unable to transfer files from the clients to the cluster. We have tried Samba versions 4.14 to 4.17 and Ubuntu versions 20.04 to 23.04 without success; we are currently running Samba 4.17.7 on Ubuntu 23.04 with a 6.2 kernel.

When we try to write a test file (DellInstaller_x64.exe) from a Windows client to a samba_tests/ folder on the cluster via the "vfs" share on the SMB server, we receive the following errors in the server logs:

    [2023/07/14 14:14:54.336643,  5, pid=1332465, effective(27080, 27080), real(27080, 0)] ../../source3/smbd/smb2_trans2.c:3490(smbd_do_qfilepathinfo)
      smbd_do_qfilepathinfo: samba_tests/DellInstaller_x64.exe (fnum 319841074) level=1048 max_data=252
    [2023/07/14 14:14:54.336648, 10, pid=1332465, effective(27080, 27080), real(27080, 0)] ../../source3/smbd/dosmode.c:715(fdos_mode)
      fdos_mode: samba_tests/DellInstaller_x64.exe
    [2023/07/14 14:14:54.336653, 10, pid=1332465, effective(27080, 27080), real(27080, 0), class=vfs] ../../source3/modules/vfs_ceph.c:1309(cephwrap_fgetxattr)
      cephwrap_fgetxattr: [CEPH] fgetxattr(0x5609a735f4b0, 0x5609a74bf220, user.DOSATTRIB, 0x7ffc79383910, 256)
    [2023/07/14 14:14:54.336658,  0, pid=1332465, effective(27080, 27080), real(27080, 0)] ../../source3/smbd/fd_handle.c:115(fsp_get_io_fd)
      fsp_get_io_fd: fsp [samba_tests/DellInstaller_x64.exe] is a path referencing fsp
    [2023/07/14 14:14:54.336678, 10, pid=1332465, effective(27080, 27080), real(27080, 0), class=vfs] ../../source3/modules/vfs_ceph.c:1311(cephwrap_fgetxattr)
      cephwrap_fgetxattr: [CEPH] fgetxattr(...) = -9
    [2023/07/14 14:14:54.336683,  5, pid=1332465, effective(27080, 27080), real(27080, 0)] ../../source3/smbd/dosmode.c:387(fget_ea_dos_attribute)
      fget_ea_dos_attribute: Cannot get attribute from EA on file samba_tests/DellInstaller_x64.exe: Error = Bad file descriptor


This is the "vfs" share section in our /etc/samba/smb.conf:

[vfs]
  comment = Home Directories
  path = /ivan/
  vfs objects = ceph
  ceph: config_file = /etc/ceph/ceph.conf
  ceph: user_id = samba.gw
  read only = no
  oplocks = no
  kernel share modes = no
  inherit acls = Yes
  valid users = ivan


I think this may arise from the following lines in the cephwrap_fgetxattr function assigning "ret":

    static ssize_t cephwrap_fgetxattr(struct vfs_handle_struct *handle, struct files_struct *fsp, const char *name, void *value, size_t size)
    {
            int ret;
            DBG_DEBUG("[CEPH] fgetxattr(%p, %p, %s, %p, %llu)\n", handle, fsp, name, value, llu(size));
            ret = ceph_fgetxattr(handle->data, fsp_get_io_fd(fsp), name, value, size);


It seems that "fsp" is being passed to "fsp_get_io_fd" even though it is a pathref file handle. Could a conditional similar to the one in "cephwrap_flistxattr" be used to catch this, as in the following?

    static ssize_t cephwrap_fgetxattr(struct vfs_handle_struct *handle, struct files_struct *fsp, const char *name, void *value, size_t size)
    {
            int ret;
            DBG_DEBUG("[CEPH] fgetxattr(%p, %p, %s, %p, %llu)\n", handle, fsp, name, value, llu(size));
            if (!fsp->fsp_flags.is_pathref) {
                    /*
                     * We can use an io_fd to get an xattr.
                     */
                    ret = ceph_fgetxattr(handle->data,
                                         fsp_get_io_fd(fsp),
                                         name, value,
                                         size);
            } else {
                    /*
                     * This is no longer a handle based call.
                     */
                    ret = ceph_getxattr(handle->data,
                                        fsp->fsp_name->base_name,
                                        name, value,
                                        size);
            }


However, I'm not an experienced filesystem developer, so I'm unsure whether this would fix the issue or cause further problems. Curiously, uploading with "dd" from a Linux client was successful (though the transfer rates were below 1 MB/s). This may be a case of my config being wrong or our set-up being incorrect, in which case I would be more than happy to receive pointers on how we can make our VFS server work.
Comment 1 Ralph Böhme 2023-08-02 07:57:20 UTC
(In reply to Ivan from comment #0)
Yes, this is basically the correct approach. Cc'ing some folks involved with the ceph module. Hopefully one of em can pick this up?

Besides that, from what I've seen, people seem to have a better experience using a Ceph kernel mount and then just sharing that filesystem without the vfs_ceph module.
Comment 2 Ivan 2023-08-02 08:25:49 UTC
(In reply to Ralph Böhme from comment #1)
We've been using the kernel mount and sharing that via Samba in production for a couple of years now and it has been working very successfully.

However, we are seeing that transfer rates from Windows clients plateau at a little over 1 Gbps (particularly with larger files > 10 GB) on 10 Gbit interfaces, whilst Linux clients can maintain ~400 MB/s so long as the wsize and rsize are increased (we've found that wsize=rsize=8MB is generally optimal). We've not found a clear way to increase the SMB packet size on the Windows client side despite the Samba server advertising a larger maximum size.

Thus the motivation for us to explore the VFS module was to see how Windows behaves and whether that could provide an avenue for > 200 MB/s transfer rates without incorporating Windows servers in our estate.
Comment 3 Jones Syue 2023-10-17 00:56:24 UTC
(In reply to Ivan from comment #2)

https://lists.samba.org/archive/samba/2023-September/246446.html

It looks like the performance issue is resolved :)
Comment 4 Samba QA Contact 2023-11-30 12:33:03 UTC
This bug was referenced in samba master:

83edfcff5ccd8c4c710576b6d5612e0578d168c8
Comment 5 Guenther Deschner 2023-12-07 15:58:03 UTC
Created attachment 18197
patch from master
Comment 6 Anoop C S 2023-12-07 17:42:37 UTC
Comment on attachment 18197
patch from master

Reassigning for inclusion in 4.19 and 4.18.
Comment 7 Jule Anger 2023-12-11 08:42:31 UTC
Pushed to autobuild-v4-{19,18}-test.
Comment 8 Samba QA Contact 2023-12-11 09:46:12 UTC
This bug was referenced in samba v4-19-test:

fcbda8c7525400fe85dde5b8edd1818a9d86f307
Comment 9 Samba QA Contact 2023-12-11 13:22:04 UTC
This bug was referenced in samba v4-18-test:

849c370d92a1fca18450ba7d0064e1adab4a77e4
Comment 10 Jule Anger 2023-12-12 10:07:15 UTC
Closing out bug report.

Thanks!
Comment 11 Samba QA Contact 2024-01-08 14:39:30 UTC
This bug was referenced in samba v4-19-stable (Release samba-4.19.4):

fcbda8c7525400fe85dde5b8edd1818a9d86f307
Comment 12 Samba QA Contact 2024-01-31 20:42:30 UTC
This bug was referenced in samba v4-18-stable (Release samba-4.18.10):

849c370d92a1fca18450ba7d0064e1adab4a77e4