Bug 14581 - smbclient crashes with segfault
Summary: smbclient crashes with segfault
Status: RESOLVED FIXED
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: Tools (show other bugs)
Version: 4.13.2
Hardware: All Linux
: P5 major (vote)
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
URL:
Keywords:
: 14582 (view as bug list)
Depends on:
Blocks:
 
Reported: 2020-11-25 13:58 UTC by tkevans@tkevans.com
Modified: 2020-12-21 16:57 UTC (History)
2 users (show)

See Also:


Attachments
backtrace (34.00 KB, text/plain)
2020-11-25 13:58 UTC, tkevans@tkevans.com
no flags Details
git-am fix for master. (27.71 KB, patch)
2020-11-30 21:14 UTC, Jeremy Allison
no flags Details
alternative patch version (28.05 KB, patch)
2020-12-01 09:08 UTC, Noel Power
no flags Details
git-am fix for 4.13.next. (26.19 KB, patch)
2020-12-01 22:34 UTC, Jeremy Allison
npower: review+
Details

Note You need to log in before you can comment on or make changes to this bug.
Description tkevans@tkevans.com 2020-11-25 13:58:07 UTC
Created attachment 16347 [details]
backtrace

Have been working this on the BackupPC users and Fedora mailing lists, but it's looking more like a general problem in Samba utility, 'smbclient.'

BackupPC is a backup server package that uses Samba, rsync, and other protocols to backup clients, including Windows PC's. Basically, it runs a command like this for a SMB backup of a PC:

/usr/bin/smbclient \\\\new-pelican\\C\$ -U backup -E -d 2 -c tarmode\ full -Tc -

Output is piped to BackupPC utilities that unpack and index the backup, and do other housekeeping

My backups have consistently failed since my upgrade to F33 (were working without problems for years before, and most recently on F32).

Looking closely, it appears 'smbclient' is crashing after passing exactly 47 files. (BackupPC then quits after data stops coming in.)

I've run this command at the command-line; it runs for a few seconds then reports a core dump. Gnome shell then prompted me to file a Fedora bugzilla, which I have done.

https://bugzilla.redhat.com/show_bug.cgi?id=1900232 "[abrt] samba-client: remove_do_list_queue_head(): smbclient killed by SIGSEGV")

Just got a note there is another Fedora bug filed on this exact issue,  https://bugzilla.redhat.com/show_bug.cgi?id=1892745
Comment 1 Volker Lendecke 2020-11-25 15:13:51 UTC
This could well be my bug. I'm right now issuing your command against a W2k12, but it works fine for me. Not finished, but it is beyond dozens of directories and thousands of files now. It would be great if we could more information about the circumstances when this happens and how to reproduce it.
Comment 2 tkevans@tkevans.com 2020-11-25 16:00:18 UTC
Not much I can add beyond my description and the backtrace, which clearly shows the crash.

BackupPC's log file indicates failure happened after exactly 47 files--as noted BackupPC's utilities take stdout from 'smbclient' in tar mode and unpack/index/otherwise-housekeep, so when data stops coming in from 'smbclient,' the backup quits, too.

I will repeat this began with the upgrade to Fedora 33 (from F32).
Comment 3 tkevans@tkevans.com 2020-11-25 16:30:06 UTC
I might add I have reproduced the problem on a separate system, also running Fedora 33/smbclient 4.13.2.  Backup client is Windows 10.

Problem does NOT occur on my one remaining Fedora 32/smbclient 4.12.10 system.
Comment 4 Jeremy Allison 2020-11-25 17:23:00 UTC
tkevans - Is there any chance you can run this under valgrind and reproduce the crash ?

With that output and the backtrace we might be faster in tracking this down.

Thanks !
Comment 5 tkevans@tkevans.com 2020-11-25 17:29:01 UTC
Better yet:  see https://bugzilla.samba.org/show_bug.cgi?id=14517, which seems to address this issue.
Comment 6 Jeremy Allison 2020-11-25 17:35:49 UTC
Oh, well that's even better :-).

I'll mark as a duplicate. It's waiting on Ralph to review before it gets merged into the next 4.13.X release.

Volker, if you add me as a review I'll +1 it and re-assign to Karolin (as it seems I reviewed the original patch).

*** This bug has been marked as a duplicate of bug 14517 ***
Comment 7 Volker Lendecke 2020-11-25 17:41:51 UTC
14517 in my mind was just done. I thought this one is new, as 14517 is already a few weeks old. Sorry for not monitoring it. Added added you, Jeremy, as reviewer for that.
Comment 8 Jeremy Allison 2020-11-25 17:45:03 UTC
No worries, it's impressive you'd already fixed it :-). Anyway, bounced over to Karolin (our release manager) now so - tkevans, it'll be in the next release. Sorry for the trouble.
Comment 9 tkevans@tkevans.com 2020-11-25 18:55:52 UTC
The good folks at RedHat/Fedora are on this:

--- Comment #8 from Alexander Bokovoy <abokovoy@redhat.com> ---
I am building Samba for F33 and Rawhide with patches from
https://bugzilla.samba.org/show_bug.cgi?id=14517
As I understand, this should fix the problem.

Please test the builds once they are ready:
F33: https://koji.fedoraproject.org/koji/taskinfo?taskID=56241974
F34: https://koji.fedoraproject.org/koji/taskinfo?taskID=56241908

Non-x86_64 runners are a bit slow, so the builds are still ongoing. Once they
are done, I'll submit an update to F33.
Comment 10 tkevans@tkevans.com 2020-11-27 15:15:44 UTC
Looks like the fix referenced in bugid 14517 is not a fix for this particular problem.  That one has percolated through to RedHat/Fedora (https://bugzilla.redhat.com/show_bug.cgi?id=1892745) and their update to smbclient 4.13.2 does not fix it.

After applying that fix from RedHat, I still get the same segfault.
Comment 11 Noel Power 2020-11-27 16:29:14 UTC
I think realistically this is a duplicate of bug #14582 (or vice versa), lets reopen this one (since it was first)
Comment 12 Noel Power 2020-11-27 16:29:44 UTC
*** Bug 14582 has been marked as a duplicate of this bug. ***
Comment 13 Jeremy Allison 2020-11-29 21:12:49 UTC
Sorry for closing this as a duplicate. When the reporter flagged the other bug in comment number 5, I thought he had tested the fix there and was confirming it was the fix for his issue also.

FYI I think as part of this fix we need to re-enable the tar tests so we can catch any regressions.
Comment 14 Jeremy Allison 2020-11-29 21:17:31 UTC
Noel's MR for reference:

https://gitlab.com/samba-team/samba/-/merge_requests/1704
Comment 15 Jeremy Allison 2020-11-30 21:14:48 UTC
Created attachment 16350 [details]
git-am fix for master.

Noel, here's what I think we should go with. Valgrind clean and you've re-enabled all the tar tests (hurrah!).

Let me know what you think.

Jeremy.
Comment 16 Noel Power 2020-12-01 09:08:13 UTC
Created attachment 16352 [details]
alternative patch version

Alternative version of the patch, combines the first version I posted with the your idea of also saving the additional stack state. I am biased of course but I think this is simpler to read and understand, it uses the stack to automagically push/pop the do_list state (like it always did) but adds to the do_list state those elements we need to save (and use later). The queue management remains unchanged (except it no longer operates on a global queue variable but one that is passed in) To my mind the other version spreads the stack state around in 2 places, the new do_list_stack and the existing do_list_state.
Comment 17 Volker Lendecke 2020-12-01 09:15:09 UTC
https://gitlab.com/samba-team/devel/samba/-/pipelines/223564990 runs an alternative approach that avoids recursing into do_list() in clitar.c but instead uses do_list()'s builtin recursion capabilities. What do you think?
Comment 18 Noel Power 2020-12-01 09:47:13 UTC
>https://gitlab.com/samba-team/devel/samba/-/pipelines/223564990 runs an
>alternative approach that avoids recursing into do_list() in clitar.c but
>instead uses do_list()'s builtin recursion capabilities. What do you think?

probably Aurelien could look at this (the changes seems sensible to me (on just a glance but I have to say I am not familiar with clitar) 

whether this is a case of this patch instead or in addition to the previous fix (excluding the changes to get the tests working) I can't say. Jeremy, you and Volker definitely have more idea about this stuff than me so fight it out amongst yourselves :-)) Any version of the fix here is a win as far as I am concerned and more importantly I really hope we have solved the flapping test
Comment 19 Aurélien Aptel 2020-12-01 11:52:06 UTC
Assuming the recursive do_list() works properly, this commit changing clitar.c call to do_list() looks good to me, you can add my RB

https://gitlab.com/samba-team/devel/samba/-/commit/f351d72e037f90c6b3c3ee2172835da7bd2cc6c9
Comment 20 Samba QA Contact 2020-12-01 20:30:31 UTC
This bug was referenced in samba master:

4f5a7f11b7732c3efb511e68f6b9d434d59bb3e8
a250f73366983d2a7397608a611f295f10dbb548
6cb0a00f4ab4bc1a8193d50cc076ec7174a5ece9
be8dca68f89f110ef5947e0c2a7258554772cf9a
fec1f8faffd9eb1aae77e7c515e57897be34a255
4bb3bffa4b7a770d36138c45f717a9048ef82cff
6f246658cf003f7e2f393f7b7490d9e8ae84e21c
99ffa4a98287f125e45690e87b32616f4d4254e4
363bfa4e1ca10e64057a6d04d6faff7c788db89d
6c7dc4959fd5de4382aee413b4cc711cc6f281f4
16ffa17ee28edfc3bc70c66abf41b5518aeab8fe
20e0ce508844fec2dd0011423b10484dc7ccfdb7
89e2d68bb4d93dc391af97f35ff1148aec7930b0
Comment 21 Jeremy Allison 2020-12-01 22:34:10 UTC
Created attachment 16353 [details]
git-am fix for 4.13.next.

Cherry-pick from master for 4.13.next. Noel, please review ! Thanks.
Comment 22 Noel Power 2020-12-02 09:39:49 UTC
Comment on attachment 16353 [details]
git-am fix for 4.13.next.

lgtm!
Comment 23 Noel Power 2020-12-02 12:07:48 UTC
Re-assiging to Karolin for next 4.13 release
Comment 24 Karolin Seeger 2020-12-02 13:12:10 UTC
(In reply to Noel Power from comment #23)
Pushed to autobuild-v4-13-test.
Comment 25 Samba QA Contact 2020-12-02 14:50:42 UTC
This bug was referenced in samba v4-13-test:

d67c3ea864b26e440f15162e429dec199e7304e8
7fb1333038085529334e8e3109e5eda6b5df14ae
5143b4875328196ed2766ba077055ce50704b5e2
5908aebf364802e7315aad8f116ad431544ac29d
896d93091abe6b667c52e87273f22a91d9175eb0
53a91d6cdc0e726d741ab217522da3f205392090
c19198e873224c07a19dfae14d3871c577768344
a0ab7adfd788bc8cc58579b94c75386d492c2e02
257ce5ed541c0e46bbd565bd8a89d5905287897c
5f1772d94a34922a4fc83ff8a036cbb3ce2dcdd5
2954051aa6db3b38d24801fe451019ccec0b5c77
8cec27328904e47462051878db2de97033ecbd9b
2ea7b5c43e814faef44cf76b5ffad93e4a2f4840
Comment 26 Karolin Seeger 2020-12-03 07:57:37 UTC
Closing out bug report.

Thanks!
Comment 27 tkevans@tkevans.com 2020-12-10 16:09:52 UTC
Can someone estimate when this fix will appear downstream to users, please? Thank you.
Comment 28 Jeremy Allison 2020-12-10 16:32:04 UTC
It will be in the next 4.13 release. Not sure exactly when that will be (Karolin has the schedule).
Comment 29 Karolin Seeger 2020-12-11 08:27:22 UTC
(In reply to tkevans@tkevans.com from comment #27)
Tuesday, December 15 2020.

Schedules are available here:
https://wiki.samba.org/index.php/Samba_Release_Planning
Comment 30 Samba QA Contact 2020-12-16 12:22:42 UTC
This bug was referenced in samba v4-13-stable (Release samba-4.13.3):

d67c3ea864b26e440f15162e429dec199e7304e8
7fb1333038085529334e8e3109e5eda6b5df14ae
5143b4875328196ed2766ba077055ce50704b5e2
5908aebf364802e7315aad8f116ad431544ac29d
896d93091abe6b667c52e87273f22a91d9175eb0
53a91d6cdc0e726d741ab217522da3f205392090
c19198e873224c07a19dfae14d3871c577768344
a0ab7adfd788bc8cc58579b94c75386d492c2e02
257ce5ed541c0e46bbd565bd8a89d5905287897c
5f1772d94a34922a4fc83ff8a036cbb3ce2dcdd5
2954051aa6db3b38d24801fe451019ccec0b5c77
8cec27328904e47462051878db2de97033ecbd9b
2ea7b5c43e814faef44cf76b5ffad93e4a2f4840
Comment 31 tkevans@tkevans.com 2020-12-20 19:34:01 UTC
Wondering if this fix has changed output from 'smbclient?' While the segfault no longer occurs, BackupPC still reports a failed backup after 'smbclient' completes (after several HOURS).  Log shows:

tarExtract: Unexpected end of tar archive (tot = 10993, num = 10240, errno = , posn = )
tarExtract: Removing partial file $Recycle.Bin/S-1-5-21-1479735809-3582899389-406748754-1003/$RYTAKUR.eml
[ skipped 60 lines ]
tarExtract: BackupPC_tarExtact aborting (Unexpected end of tar archive)
tarExtract: Done: 1 errors, 47 filesExist, 139478798 sizeExist, 137948090 sizeExistComp, 47 filesTotal, 139478798 sizeTotal, 0 filesNew, 0 sizeNew, 0 sizeNewComp, 10555998 inodeLast
BackupPC_tarExtract exited with fail status 256
Xfer PIDs are now 
Got fatal error during xfer (BackupPC_tarExtract exited with fail status 256)
Backup aborted (BackupPC_tarExtract exited with fail status 256)

If final output from successful run of 'smbclient' is not what BackupPC is expecting, this may explain the apparent failure.

Thanks.
Comment 32 Jeremy Allison 2020-12-20 19:49:07 UTC
Can you run the smbclient tar command that BackupPC uses to get the tar file it creates ? Then test that a normal tar command can extract it ?

If it's good, then you might have to contact the makers of BackupPC to get them to look into the issue. So long as we're creating a valid tar file (which we do test via regression test) then the issue is probably with the invoking software, not smbclient.
Comment 33 Jeremy Allison 2020-12-20 19:50:54 UTC
Also, the file that is giving a problem is $Recycle.Bin/S-1-5-21-1479735809-3582899389-406748754-1003/$RYTAKUR.eml. Should BackupPC be backing up the contents of $Recycle.Bin ? Might that not change whilst the tar is in process ?
Comment 34 Jeremy Allison 2020-12-20 19:53:33 UTC
Ah, BackupPC is open source, found here:

https://github.com/backuppc/backuppc

feel free to put them in touch with us so we can work together to resolve.
Comment 35 Jeremy Allison 2020-12-20 19:56:01 UTC
Note in their source code:

lib/BackupPC/Xfer/Smb.pm

237         #
238         # This section is highly dependent on the version of smbclient.
239         # If you upgrade Samba, make sure that these regexp are still valid.
240         #
241         # MAKSYM 14082016: The next regex will never match on Samba-4.3, as

If our output format has changed slightly they'll need to fix this.
Comment 36 tkevans@tkevans.com 2020-12-20 20:00:30 UTC
Thank you. Jeremy.  As my initial report stated, BackupPC's taking stdout from 'smbclient' in tar mode:

/usr/bin/smbclient \\\\new-pelican\\C\$ -U backup -E -d 2 -c tarmode\ full -Tc -

So, my question was about the final output of a successful smbclient tar job on stdout. Just wondering if this might have changed in this fix; your most recent comment (which came in while I was writing this) appears to confirm that it has.

I've reported this via the BackupPC users mailing list.
Comment 37 Jeremy Allison 2020-12-21 00:25:12 UTC
No, I don't know if it has changed or not, so don't take this as a confirmation that it has. Our regression tests didn't find any difference. However, if BackupPC is now having problems I'm certainly willing to work with them to help track it down.
Comment 38 tkevans@tkevans.com 2020-12-21 16:57:02 UTC
Bug report posted:

https://github.com/backuppc/backuppc/issues/404

No response to direct e-mail to the maintainer (~24 hours since)