Created attachment 16347 [details] backtrace Have been working this on the BackupPC users and Fedora mailing lists, but it's looking more like a general problem in Samba utility, 'smbclient.' BackupPC is a backup server package that uses Samba, rsync, and other protocols to backup clients, including Windows PC's. Basically, it runs a command like this for a SMB backup of a PC: /usr/bin/smbclient \\\\new-pelican\\C\$ -U backup -E -d 2 -c tarmode\ full -Tc - Output is piped to BackupPC utilities that unpack and index the backup, and do other housekeeping My backups have consistently failed since my upgrade to F33 (were working without problems for years before, and most recently on F32). Looking closely, it appears 'smbclient' is crashing after passing exactly 47 files. (BackupPC then quits after data stops coming in.) I've run this command at the command-line; it runs for a few seconds then reports a core dump. Gnome shell then prompted me to file a Fedora bugzilla, which I have done. https://bugzilla.redhat.com/show_bug.cgi?id=1900232 "[abrt] samba-client: remove_do_list_queue_head(): smbclient killed by SIGSEGV") Just got a note there is another Fedora bug filed on this exact issue, https://bugzilla.redhat.com/show_bug.cgi?id=1892745
This could well be my bug. I'm right now issuing your command against a W2k12, but it works fine for me. Not finished, but it is beyond dozens of directories and thousands of files now. It would be great if we could more information about the circumstances when this happens and how to reproduce it.
Not much I can add beyond my description and the backtrace, which clearly shows the crash. BackupPC's log file indicates failure happened after exactly 47 files--as noted BackupPC's utilities take stdout from 'smbclient' in tar mode and unpack/index/otherwise-housekeep, so when data stops coming in from 'smbclient,' the backup quits, too. I will repeat this began with the upgrade to Fedora 33 (from F32).
I might add I have reproduced the problem on a separate system, also running Fedora 33/smbclient 4.13.2. Backup client is Windows 10. Problem does NOT occur on my one remaining Fedora 32/smbclient 4.12.10 system.
tkevans - Is there any chance you can run this under valgrind and reproduce the crash ? With that output and the backtrace we might be faster in tracking this down. Thanks !
Better yet: see https://bugzilla.samba.org/show_bug.cgi?id=14517, which seems to address this issue.
Oh, well that's even better :-). I'll mark as a duplicate. It's waiting on Ralph to review before it gets merged into the next 4.13.X release. Volker, if you add me as a review I'll +1 it and re-assign to Karolin (as it seems I reviewed the original patch). *** This bug has been marked as a duplicate of bug 14517 ***
14517 in my mind was just done. I thought this one is new, as 14517 is already a few weeks old. Sorry for not monitoring it. Added added you, Jeremy, as reviewer for that.
No worries, it's impressive you'd already fixed it :-). Anyway, bounced over to Karolin (our release manager) now so - tkevans, it'll be in the next release. Sorry for the trouble.
The good folks at RedHat/Fedora are on this: --- Comment #8 from Alexander Bokovoy <abokovoy@redhat.com> --- I am building Samba for F33 and Rawhide with patches from https://bugzilla.samba.org/show_bug.cgi?id=14517 As I understand, this should fix the problem. Please test the builds once they are ready: F33: https://koji.fedoraproject.org/koji/taskinfo?taskID=56241974 F34: https://koji.fedoraproject.org/koji/taskinfo?taskID=56241908 Non-x86_64 runners are a bit slow, so the builds are still ongoing. Once they are done, I'll submit an update to F33.
Looks like the fix referenced in bugid 14517 is not a fix for this particular problem. That one has percolated through to RedHat/Fedora (https://bugzilla.redhat.com/show_bug.cgi?id=1892745) and their update to smbclient 4.13.2 does not fix it. After applying that fix from RedHat, I still get the same segfault.
I think realistically this is a duplicate of bug #14582 (or vice versa), lets reopen this one (since it was first)
*** Bug 14582 has been marked as a duplicate of this bug. ***
Sorry for closing this as a duplicate. When the reporter flagged the other bug in comment number 5, I thought he had tested the fix there and was confirming it was the fix for his issue also. FYI I think as part of this fix we need to re-enable the tar tests so we can catch any regressions.
Noel's MR for reference: https://gitlab.com/samba-team/samba/-/merge_requests/1704
Created attachment 16350 [details] git-am fix for master. Noel, here's what I think we should go with. Valgrind clean and you've re-enabled all the tar tests (hurrah!). Let me know what you think. Jeremy.
Created attachment 16352 [details] alternative patch version Alternative version of the patch, combines the first version I posted with the your idea of also saving the additional stack state. I am biased of course but I think this is simpler to read and understand, it uses the stack to automagically push/pop the do_list state (like it always did) but adds to the do_list state those elements we need to save (and use later). The queue management remains unchanged (except it no longer operates on a global queue variable but one that is passed in) To my mind the other version spreads the stack state around in 2 places, the new do_list_stack and the existing do_list_state.
https://gitlab.com/samba-team/devel/samba/-/pipelines/223564990 runs an alternative approach that avoids recursing into do_list() in clitar.c but instead uses do_list()'s builtin recursion capabilities. What do you think?
>https://gitlab.com/samba-team/devel/samba/-/pipelines/223564990 runs an >alternative approach that avoids recursing into do_list() in clitar.c but >instead uses do_list()'s builtin recursion capabilities. What do you think? probably Aurelien could look at this (the changes seems sensible to me (on just a glance but I have to say I am not familiar with clitar) whether this is a case of this patch instead or in addition to the previous fix (excluding the changes to get the tests working) I can't say. Jeremy, you and Volker definitely have more idea about this stuff than me so fight it out amongst yourselves :-)) Any version of the fix here is a win as far as I am concerned and more importantly I really hope we have solved the flapping test
Assuming the recursive do_list() works properly, this commit changing clitar.c call to do_list() looks good to me, you can add my RB https://gitlab.com/samba-team/devel/samba/-/commit/f351d72e037f90c6b3c3ee2172835da7bd2cc6c9
This bug was referenced in samba master: 4f5a7f11b7732c3efb511e68f6b9d434d59bb3e8 a250f73366983d2a7397608a611f295f10dbb548 6cb0a00f4ab4bc1a8193d50cc076ec7174a5ece9 be8dca68f89f110ef5947e0c2a7258554772cf9a fec1f8faffd9eb1aae77e7c515e57897be34a255 4bb3bffa4b7a770d36138c45f717a9048ef82cff 6f246658cf003f7e2f393f7b7490d9e8ae84e21c 99ffa4a98287f125e45690e87b32616f4d4254e4 363bfa4e1ca10e64057a6d04d6faff7c788db89d 6c7dc4959fd5de4382aee413b4cc711cc6f281f4 16ffa17ee28edfc3bc70c66abf41b5518aeab8fe 20e0ce508844fec2dd0011423b10484dc7ccfdb7 89e2d68bb4d93dc391af97f35ff1148aec7930b0
Created attachment 16353 [details] git-am fix for 4.13.next. Cherry-pick from master for 4.13.next. Noel, please review ! Thanks.
Comment on attachment 16353 [details] git-am fix for 4.13.next. lgtm!
Re-assiging to Karolin for next 4.13 release
(In reply to Noel Power from comment #23) Pushed to autobuild-v4-13-test.
This bug was referenced in samba v4-13-test: d67c3ea864b26e440f15162e429dec199e7304e8 7fb1333038085529334e8e3109e5eda6b5df14ae 5143b4875328196ed2766ba077055ce50704b5e2 5908aebf364802e7315aad8f116ad431544ac29d 896d93091abe6b667c52e87273f22a91d9175eb0 53a91d6cdc0e726d741ab217522da3f205392090 c19198e873224c07a19dfae14d3871c577768344 a0ab7adfd788bc8cc58579b94c75386d492c2e02 257ce5ed541c0e46bbd565bd8a89d5905287897c 5f1772d94a34922a4fc83ff8a036cbb3ce2dcdd5 2954051aa6db3b38d24801fe451019ccec0b5c77 8cec27328904e47462051878db2de97033ecbd9b 2ea7b5c43e814faef44cf76b5ffad93e4a2f4840
Closing out bug report. Thanks!
Can someone estimate when this fix will appear downstream to users, please? Thank you.
It will be in the next 4.13 release. Not sure exactly when that will be (Karolin has the schedule).
(In reply to tkevans@tkevans.com from comment #27) Tuesday, December 15 2020. Schedules are available here: https://wiki.samba.org/index.php/Samba_Release_Planning
This bug was referenced in samba v4-13-stable (Release samba-4.13.3): d67c3ea864b26e440f15162e429dec199e7304e8 7fb1333038085529334e8e3109e5eda6b5df14ae 5143b4875328196ed2766ba077055ce50704b5e2 5908aebf364802e7315aad8f116ad431544ac29d 896d93091abe6b667c52e87273f22a91d9175eb0 53a91d6cdc0e726d741ab217522da3f205392090 c19198e873224c07a19dfae14d3871c577768344 a0ab7adfd788bc8cc58579b94c75386d492c2e02 257ce5ed541c0e46bbd565bd8a89d5905287897c 5f1772d94a34922a4fc83ff8a036cbb3ce2dcdd5 2954051aa6db3b38d24801fe451019ccec0b5c77 8cec27328904e47462051878db2de97033ecbd9b 2ea7b5c43e814faef44cf76b5ffad93e4a2f4840
Wondering if this fix has changed output from 'smbclient?' While the segfault no longer occurs, BackupPC still reports a failed backup after 'smbclient' completes (after several HOURS). Log shows: tarExtract: Unexpected end of tar archive (tot = 10993, num = 10240, errno = , posn = ) tarExtract: Removing partial file $Recycle.Bin/S-1-5-21-1479735809-3582899389-406748754-1003/$RYTAKUR.eml [ skipped 60 lines ] tarExtract: BackupPC_tarExtact aborting (Unexpected end of tar archive) tarExtract: Done: 1 errors, 47 filesExist, 139478798 sizeExist, 137948090 sizeExistComp, 47 filesTotal, 139478798 sizeTotal, 0 filesNew, 0 sizeNew, 0 sizeNewComp, 10555998 inodeLast BackupPC_tarExtract exited with fail status 256 Xfer PIDs are now Got fatal error during xfer (BackupPC_tarExtract exited with fail status 256) Backup aborted (BackupPC_tarExtract exited with fail status 256) If final output from successful run of 'smbclient' is not what BackupPC is expecting, this may explain the apparent failure. Thanks.
Can you run the smbclient tar command that BackupPC uses to get the tar file it creates ? Then test that a normal tar command can extract it ? If it's good, then you might have to contact the makers of BackupPC to get them to look into the issue. So long as we're creating a valid tar file (which we do test via regression test) then the issue is probably with the invoking software, not smbclient.
Also, the file that is giving a problem is $Recycle.Bin/S-1-5-21-1479735809-3582899389-406748754-1003/$RYTAKUR.eml. Should BackupPC be backing up the contents of $Recycle.Bin ? Might that not change whilst the tar is in process ?
Ah, BackupPC is open source, found here: https://github.com/backuppc/backuppc feel free to put them in touch with us so we can work together to resolve.
Note in their source code: lib/BackupPC/Xfer/Smb.pm 237 # 238 # This section is highly dependent on the version of smbclient. 239 # If you upgrade Samba, make sure that these regexp are still valid. 240 # 241 # MAKSYM 14082016: The next regex will never match on Samba-4.3, as If our output format has changed slightly they'll need to fix this.
Thank you. Jeremy. As my initial report stated, BackupPC's taking stdout from 'smbclient' in tar mode: /usr/bin/smbclient \\\\new-pelican\\C\$ -U backup -E -d 2 -c tarmode\ full -Tc - So, my question was about the final output of a successful smbclient tar job on stdout. Just wondering if this might have changed in this fix; your most recent comment (which came in while I was writing this) appears to confirm that it has. I've reported this via the BackupPC users mailing list.
No, I don't know if it has changed or not, so don't take this as a confirmation that it has. Our regression tests didn't find any difference. However, if BackupPC is now having problems I'm certainly willing to work with them to help track it down.
Bug report posted: https://github.com/backuppc/backuppc/issues/404 No response to direct e-mail to the maintainer (~24 hours since)