Bug 14978 - Intermittent segfault+coredump when volume_label calls strlen
Summary: Intermittent segfault+coredump when volume_label calls strlen
Status: NEW
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: File services
Version: 4.13.13
Hardware: x64 Linux
Importance: P5 normal
Target Milestone: ---
Assignee: Samba QA Contact
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-02-14 03:55 UTC by Richard Allen
Modified: 2022-04-02 02:17 UTC
CC List: 2 users

See Also:


Attachments
tar of relevant config and log files (55.16 KB, application/x-gzip)
2022-02-14 03:55 UTC, Richard Allen

Description Richard Allen 2022-02-14 03:55:27 UTC
Created attachment 17162
tar of relevant config and log files

This is an upstream report of https://bugs.debian.org/cgi-bin/bugreport.cgi?archive=no&bug=1005721 and also appears to be the same issue as https://bugs.launchpad.net/ubuntu/+source/samba/+bug/1914420.

> What led up to the situation?

I was copying around 50GB of files (JPGs of 3-4MB and MP4s of 400MB to 4.9GB) from an SD card mounted on a Windows 10 (Version 10.0.19044.1526) client, over home mesh WiFi, to an AMD64 NAS running Debian 11 and smbd Version 4.13.13-Debian. The destination was a usershare on a ZFS filesystem.

> What exactly did you do (or not do) that was effective (or ineffective)?

I first tried copying a large directory from the SD card (D:\DCIM\). The Windows GUI client reported an error and asked if I wanted to try again. Retrying fixed the current file, but usually another file failed later and also had to be retried.

I then used Beyond Compare 4's directory comparison function and selected the failed files to be re-copied. More files copied successfully, and this time I saved the error text: "An unexpected network error occurred"

> What was the outcome of this action?

When the Windows GUI client failed to copy a file, it left a zero-byte file on the NAS. When I repeated the copy with Beyond Compare, most of the failed files were copied correctly and only one remained.

It seems that each time the "unexpected network error" occurred, smbd had actually crashed and left a coredump. I tried installing libc6-dbg, but after reproducing the crash I was unable to get the backtrace to resolve calls inside libc.so.6. I reviewed the source code of volume_label() in loadparm.c. The only libc function call there appears to be a call to strlen(), so perhaps the label pointer was NULL or otherwise invalid?

I copied a total of 47.2GB split over 291 files from this SD card. I would guess I had to repeat a file at most 10 times, which gives a failure rate of around 3%.
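
As a quick check of that estimate, the arithmetic can be written out. This is only a sketch: the figures are the ones quoted above (291 files, at most 10 retries), and the variable names are illustrative.

```python
# Figures quoted in the report; variable names are illustrative only.
total_files = 291
max_retries = 10  # upper bound estimated by the reporter, not a measured count

# Upper bound on the per-file failure rate.
failure_rate = max_retries / total_files

print(f"failure rate <= {failure_rate:.1%}")
```

Because 10 retries is a guessed upper bound, the result is a ceiling on the failure rate rather than a measurement.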

> Draw us a picture of your environment

┌──────┐    ┌───────────────────────────────┐
│LAPTOP├────┤WiFi Mesh Point and DHCP Server│
│Win 10│5ghz└──────────────────────┬────────┘
└──────┘wifi                       │5ghz wifi
┌───────────┐               ┌──────┴────────┐
│Debian 11  ├───────────────┤WiFi Mesh Point│
│AMD64 NAS  │    1gig eth   └───────────────┘
│Samba 4.13 │
└───────────┘

I've saved a copy of the coredump. I would prefer not to post it publicly, but I will share it with anyone from samba.org on request. I've hopefully attached all other relevant config and log files. Recent crashes are in var/log/samba/log.rsaxvc-laptop

I believe this has been happening occasionally since I set up Debian 11 on the NAS in late 2021, but only yesterday did it happen often enough in one day to make me wonder why.

I understand 4.13 is out of maintenance upstream. When I have time I'll try to reproduce against samba.org master, or otherwise report back. I'll try to catch it in GDB and increase the log level to 10 when I do so.
Comment 1 Richard Allen 2022-04-02 02:17:58 UTC
I've tried to reproduce this from a few other machines, and it seems that only Windows clients cause it (Windows 10 is the only version I tried).

I'll also note I have two malformed usershare configs.

It took a few minutes of running WinDirStat while copying and deleting files from a ramdisk mount, but I was able to catch this with tcpdump running on the NAS.

I'm not familiar with SMB2, but the capture shows a series of requests and responses. Most of the time the client sends a request and the NAS responds. Right before the connection dies with a TCP reset, however, the client sends "GetInfo Request FILE_INFO/SMB2_FILE_STANDARD_INFO File: srvsvc" and immediately afterwards "GetInfo Request FS_INFO/FileFsFullSizeInformation File:". The NAS responds with a "GetInfo Response" to the first request. The client then sends two more packets (a Bind and a Create Request), but the server responds only with a TCP reset.

I can share the pcap privately if needed.

So it seems to be related to the "GetInfo Request FS_INFO/FileFsFullSizeInformation" request.
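
The reasoning behind that conclusion can be sketched by replaying the packet order described in this comment and listing which client requests were still unanswered when the reset arrived. This is a rough model, not an analysis of the actual capture: the packet list is transcribed from the description, the function name is hypothetical, and responses are assumed to arrive in request order. Real SMB2 pairs requests and responses by MessageId, which a proper analysis of the pcap should use instead.

```python
from collections import deque

# Packet order as described in the comment; transcribed by hand, not parsed
# from the pcap. Responses are ASSUMED to arrive in request order (FIFO);
# real SMB2 matches them by MessageId.
packets = [
    ("client", "GetInfo Request FILE_INFO/SMB2_FILE_STANDARD_INFO"),
    ("client", "GetInfo Request FS_INFO/FileFsFullSizeInformation"),
    ("server", "GetInfo Response"),
    ("client", "Bind"),
    ("client", "Create Request"),
    ("server", "TCP RST"),
]


def unanswered_before_reset(packets):
    """Return client requests still awaiting a response when the reset arrives."""
    pending = deque()
    for sender, description in packets:
        if sender == "client":
            pending.append(description)
        elif description == "TCP RST":
            break
        elif pending:
            pending.popleft()  # assume the oldest pending request was answered
    return list(pending)


for request in unanswered_before_reset(packets):
    print("unanswered:", request)
```

Under the FIFO assumption, the oldest unanswered request is the FS_INFO/FileFsFullSizeInformation query, which is consistent with the suspicion above. The Bind and Create Request are also unanswered, so the model does not rule them out.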