Bug 14969 - Random(?) segfaults since upgrading from 4.9.5
Status: NEW
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: File services
Version: 4.13.14
Hardware: All All
Importance: P5 normal
Target Milestone: ---
Assignee: Samba QA Contact
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2022-02-02 19:46 UTC by наб
Modified: 2022-04-13 16:22 UTC (History)

See Also:


Attachments
All 15 panic actions w/backtraces (65.19 KB, application/mbox)
2022-02-02 19:46 UTC, наб
New 4 crashes (16.42 KB, application/mbox)
2022-02-08 19:01 UTC, наб
panics since Feb 21 (147.94 KB, application/mbox)
2022-04-13 16:22 UTC, наб

Description наб 2022-02-02 19:46:29 UTC
Created attachment 17141
All 15 panic actions w/backtraces

This is a re-submission of Debian bug #1000158 (https://bugs.debian.org/1000158), as suggested by abartlet@. The conffiles are attached to that bug as well (I seem to only be allowed to post one attachment here), but I don't have anything crazy or experimental on, AFAIK.

This doesn't happen often, but it started happening after upgrading from buster (4.9.5+dfsg-5+deb10u2, no significant patches) to bullseye (4.13.13+dfsg-1~deb11u2, likewise). For a while it happened a lot more; then I upgraded to 4.13.14+dfsg-1 and it went back to happening rarely, but I don't know whether that's because the supposed trigger stopped occurring or because that version is hardier.

I have fifteen panic-action triggers: Oct 15, Oct 25, Nov 17, Nov 18, Dec 09, Dec 10, Dec 15, Dec 21, Dec 27, Jan 02, Jan 02, Jan 05, Jan 05, Jan 06, Jan 20.

I've *never* had samba crash on 4.9.5. The work-load hasn't changed.

AFAICT, this isn't tied to any particular explicit client action. The clients are mostly Windows 10, sid, and Total Commander for Android, and all of them work consistently, without interruption.

I've tried a few times to correlate this with any torturous actions some of my clients perform (particularly Sublime indexing tens of thousands of files at 4k+ levels deep), but to no avail.

I'm attaching all the panic actions I have (these are from the bullseye version and from 4.13.14+dfsg-1). I also have the cores for some/most of these and would be happy to either post them or inspect them with a debugger under direction.
Comment 1 Ralph Böhme 2022-02-02 20:16:52 UTC
This looks a bit like bug 14882 which was fixed in 4.15.3.
Comment 2 наб 2022-02-03 00:00:40 UTC
I've cherry-picked the commits referenced in bug 14882 to 4.13.14 (anything much newer and the dependencies end up being too new for bullseye), thanks. Will update if it crashes again :)
Comment 3 наб 2022-02-08 19:01:34 UTC
Created attachment 17154
New 4 crashes

I got a few more, with the same(?) backtraces. The one thing they have in common (esp. the three from Feb 7) is (very) heavy I/O pressure (and, hence, memory pressure, since all filesystems are ZFS): these crashes happened while I was imaging a disk to a "new", and quite bad, mirror that couldn't keep up with the input, with iowait in the few-seconds range.
Comment 4 наб 2022-04-13 16:22:16 UTC
Created attachment 17269
panics since Feb 21

Attaching a few more backtraces. Of note is March 11, with 11 crashes in 2 minutes: this correlates with the start of incoming requests from a third party and the subsequent banning of that third party.
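
For anyone trying to spot clusters like the "11 crashes in 2 minutes" one in an mbox of panic-action mails, here is a minimal sketch. It is not part of Samba or of the reporter's setup. The names mbox_timestamps and find_bursts are hypothetical, and it assumes each panic mail carries a parseable Date header with consistent timezone information.

```python
import mailbox
from datetime import timedelta
from email.utils import parsedate_to_datetime


def mbox_timestamps(path):
    """Collect the Date header of every message in an mbox file.

    Assumption: the Date header approximates the crash time.
    Messages with a missing or unparseable Date are skipped.
    """
    stamps = []
    for msg in mailbox.mbox(path):
        date = msg.get("Date")
        if not date:
            continue
        try:
            stamps.append(parsedate_to_datetime(date))
        except (TypeError, ValueError):
            continue
    return stamps


def find_bursts(timestamps, gap=timedelta(minutes=2), min_count=3):
    """Group events whose neighbours are at most `gap` apart.

    Returns a list of (first, last, count) tuples for every group
    containing at least `min_count` events.
    """
    bursts = []
    run = []
    for ts in sorted(timestamps):
        # A pause longer than `gap` closes the current group.
        if run and ts - run[-1] > gap:
            if len(run) >= min_count:
                bursts.append((run[0], run[-1], len(run)))
            run = []
        run.append(ts)
    if len(run) >= min_count:
        bursts.append((run[0], run[-1], len(run)))
    return bursts
```

Calling find_bursts(mbox_timestamps(path)) on one of the attached mbox files would list any tight clusters, which can then be lined up against I/O load or connection logs. The gap and min_count defaults are arbitrary and worth tuning.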