Bug 11413 - Crash when using multiple contexts
Summary: Crash when using multiple contexts
Status: ASSIGNED
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: libsmbclient (show other bugs)
Version: unspecified
Hardware: All All
: P5 major (vote)
Target Milestone: 4.3
Assignee: Jeremy Allison
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-07-26 15:54 UTC by Ross Lagerwall
Modified: 2021-10-04 18:46 UTC (History)
6 users (show)

See Also:


Attachments
tet program (2.35 KB, text/x-csrc)
2015-07-26 15:56 UTC, Ross Lagerwall
no flags Details
Patch for master (12.38 KB, patch)
2015-08-03 11:41 UTC, Stefan Metzmacher
no flags Details
Includes Metze's patch. (15.96 KB, patch)
2015-09-04 21:04 UTC, Jeremy Allison
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Ross Lagerwall 2015-07-26 15:54:56 UTC
I've been integrating smbc_notify support into gvfs and doing this requires using multiple smb contexts.

This fails in a couple of ways which the attached test program demonstrates.
Firstly, simultaneous initialization of contexts fails:
$ gdb testnotify 
(gdb) r
Starting program: /home/ross/src/samba/samba/examples/libsmbclient/testnotify 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/lib/libthread_db.so.1".
talloc: access after free error - first free may be at ../source3/param/loadparm.c:282
Bad talloc magic value - access after free
[New Thread 0x7fffe925d700 (LWP 16997)]
[New Thread 0x7fffe9a5e700 (LWP 16996)]

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7fffe925d700 (LWP 16997)]
0x00007ffff7621528 in raise () from /usr/lib/libc.so.6
(gdb) bt
#0  0x00007ffff7621528 in raise () from /usr/lib/libc.so.6
#1  0x00007ffff762293a in abort () from /usr/lib/libc.so.6
#2  0x00007ffff436ed7c in ?? () from /usr/lib/libtalloc.so.2
#3  0x00007ffff4374cfa in ?? () from /usr/lib/libtalloc.so.2
#4  0x00007ffff436da7b in _talloc_free () from /usr/lib/libtalloc.so.2
#5  0x00007ffff5f06846 in free_global_parameters () at ../source3/param/loadparm.c:446
#6  0x00007ffff5f06d13 in init_globals (lp_ctx=0x7fffdc000aa0, reinit_globals=true) at ../source3/param/loadparm.c:533
#7  0x00007ffff5f14788 in lp_load_ex (pszFname=0x7fffdc000a10 "/home/ross/.smb/smb.conf", global_only=true, save_defaults=false, add_ipc=false, reinit_globals=true, allow_include_registry=true, 
    load_all_shares=false) at ../source3/param/loadparm.c:3655
#8  0x00007ffff5f14c98 in lp_load (pszFname=0x7fffdc000a10 "/home/ross/.smb/smb.conf", global_only=true, save_defaults=false, add_ipc=false, reinit_globals=true) at ../source3/param/loadparm.c:3793
#9  0x00007ffff5f14d0a in lp_load_global (file_name=0x7fffdc000a10 "/home/ross/.smb/smb.conf") at ../source3/param/loadparm.c:3818
#10 0x00007ffff5f14d5d in lp_load_client (file_name=0x7fffdc000a10 "/home/ross/.smb/smb.conf") at ../source3/param/loadparm.c:3846
#11 0x00007ffff7bb7666 in SMBC_module_init (punused=0x0) at ../source3/libsmb/libsmb_context.c:57
#12 0x00007ffff7bb7946 in smbc_new_context () at ../source3/libsmb/libsmb_context.c:142
#13 0x0000000000400bba in thread_fn (unused=0x0) at testnotify.c:58
#14 0x00007ffff7997354 in start_thread () from /usr/lib/libpthread.so.0
#15 0x00007ffff76d5bfd in clone () from /usr/lib/libc.so.6

It works if I put a 5 second sleep between the creation of the two threads.


Secondly, with two threads waiting in smbc_notify, after receiving about 4 events:
1
l2: 3
num_setup=4, max_setup=0, param_total=0, this_param=0, max_param=1000, data_total=0, this_data=0, max_data=0, param_offset=82, param_pad=1, param_disp=0, data_offset=84, data_pad=2, data_disp=0
Freed frame ../libcli/smb/smbXcli_base.c:1710, expected ../libcli/smb/smbXcli_base.c:1710.
PANIC (pid 17017): Frame not freed in order.
1
l3: 1
num_setup=4, max_setup=0, param_total=0, this_param=0, max_param=1000, data_total=0, this_data=0, max_data=0, param_offset=82, param_pad=1, param_disp=0, data_offset=84, data_pad=2, data_disp=0
BACKTRACE: 25 stack frames:
 #0 /home/ross/blead/gg/lib/libsmbconf.so.0(log_stack_trace+0x1f) [0x7ff343346560]
 #1 /home/ross/blead/gg/lib/libsmbconf.so.0(smb_panic_s3+0x6d) [0x7ff3433463b1]
 #2 /home/ross/blead/gg/lib/libsamba-util.so.0(smb_panic+0x28) [0x7ff343c842e4]
 #3 /home/ross/blead/gg/lib/libsamba-util.so.0(+0xef33) [0x7ff343c6ef33]
 #4 /usr/lib/libtalloc.so.2(_talloc_free+0x718) [0x7ff3417d60a8]
 #5 /home/ross/blead/gg/lib/private/libcli-smb-common-samba4.so(+0xf179) [0x7ff342943179]
 #6 /home/ross/blead/gg/lib/private/libtevent.so.0(_tevent_req_notify_callback+0x6a) [0x7ff3435af0ae]
 #7 /home/ross/blead/gg/lib/private/libtevent.so.0(+0x6183) [0x7ff3435af183]
 #8 /home/ross/blead/gg/lib/private/libtevent.so.0(_tevent_req_done+0x25) [0x7ff3435af1ab]
 #9 /home/ross/blead/gg/lib/private/libsmb-transport-samba4.so(+0x2d12) [0x7ff33f15fd12]
 #10 /home/ross/blead/gg/lib/private/libtevent.so.0(_tevent_req_notify_callback+0x6a) [0x7ff3435af0ae]
 #11 /home/ross/blead/gg/lib/private/libtevent.so.0(+0x6183) [0x7ff3435af183]
 #12 /home/ross/blead/gg/lib/private/libtevent.so.0(_tevent_req_done+0x25) [0x7ff3435af1ab]
 #13 /home/ross/blead/gg/lib/private/libsmb-transport-samba4.so(+0x255b) [0x7ff33f15f55b]
 #14 /home/ross/blead/gg/lib/private/libtevent.so.0(+0xcf34) [0x7ff3435b5f34]
 #15 /home/ross/blead/gg/lib/private/libtevent.so.0(+0xd553) [0x7ff3435b6553]
 #16 /home/ross/blead/gg/lib/private/libtevent.so.0(+0xa43e) [0x7ff3435b343e]
 #17 /home/ross/blead/gg/lib/private/libtevent.so.0(_tevent_loop_once+0xf4) [0x7ff3435ad5b7]
 #18 /home/ross/blead/gg/lib/private/libtevent.so.0(tevent_req_poll+0x25) [0x7ff3435af43f]
 #19 /home/ross/blead/gg/lib/libtevent-util.so.0(tevent_req_poll_ntstatus+0x27) [0x7ff3448542c9]
 #20 /home/ross/blead/gg/lib/libsmbclient.so.0(+0x127d2) [0x7ff3450277d2]
 #21 /home/ross/blead/gg/lib/libsmbclient.so.0(+0x12c02) [0x7ff345027c02]
 #22 ./testnotify() [0x400ccb]
 #23 /usr/lib/libpthread.so.0(+0x7354) [0x7ff344dff354]
 #24 /usr/lib/libc.so.6(clone+0x6d) [0x7ff344b3dbfd]
smb_panic(): calling panic action [/bin/sleep 999999999]
1
l3: 3
num_setup=4, max_setup=0, param_total=0, this_param=0, max_param=1000, data_total=0, this_data=0, max_data=0, param_offset=82, param_pad=1, param_disp=0, data_offset=84, data_pad=2, data_disp=0
0
Comment 1 Ross Lagerwall 2015-07-26 15:56:05 UTC
Created attachment 11286 [details]
tet program
Comment 2 Stefan Metzmacher 2015-07-31 07:26:37 UTC
Can you try to use smbc_thread_posix() before anything else?
Comment 3 Ross Lagerwall 2015-08-01 18:01:31 UTC
It doesn't compile when I try to use that function:

$ gcc -Wall -O0 -g -o testnotify testnotify.c `pkg-config --cflags --libs smbclient` -pthread
/tmp/ccnncbJ7.o: In function `main':
/home/ross/src/samba/samba/examples/libsmbclient/testnotify.c:89: undefined reference to `smbc_thread_posix'
collect2: error: ld returned 1 exit status

It's not exported by the shared library:
$ nm -D /usr/lib/libsmbclient.so | grep thread
                 U smb_thread_once

Neither libsmb_thread_posix.c nor libsmb_thread_impl.c are mentioned in any build files in samba:
$ fgrep -r libsmb_thread_posix
Binary file .git/index matches
$ fgrep -r libsmb_thread_impl
Binary file .git/index matches
Comment 4 Stefan Metzmacher 2015-08-03 11:41:57 UTC
Created attachment 11303 [details]
Patch for master

Can you test this patch? We should provide smbc_thread_posix() now.
Comment 5 Ross Lagerwall 2015-08-03 17:05:12 UTC
Well it builds now but gets a segfault immediately:

(gdb) bt full
#0  0x00007ffff7bceddd in smb_get_tls_pthread (pkey=0x0, location=0x7ffff681bfa0 "../lib/util/talloc_stack.c:134") at ../source3/libsmb/libsmb_thread_posix.c:37
No locals.
#1  0x00007ffff6806044 in talloc_stackframe_internal (location=0x7ffff7bcf0d0 "../source3/libsmb/libsmb_context.c:139", poolsize=0) at ../lib/util/talloc_stack.c:133
        tmp = 0x0
        top = 0x0
        ts = 0x0
#2  0x00007ffff68061dd in _talloc_stackframe (location=0x7ffff7bcf0d0 "../source3/libsmb/libsmb_context.c:139") at ../lib/util/talloc_stack.c:179
No locals.
#3  0x00007ffff7bb6db1 in smbc_new_context () at ../source3/libsmb/libsmb_context.c:139
        context = 0x7ffff7ffe130
        frame = 0x7fffe9a5d700
#4  0x0000000000400c7a in thread_fn (unused=0x0) at testnotify.c:58
        path = "smb://192.168.1.48/Public", '\000' <repeats 2022 times>
        smbc_notify = 0x7fffe9a5cf30
        smbc_opendir = 0x7fffe9a5d700
        smb_context = 0x7ffff7ff7528
        dir = 0x7fffe9a5d700
        count = 1000
#5  0x00007ffff7996354 in start_thread () from /usr/lib/libpthread.so.0
No symbol table info available.
#6  0x00007ffff76d4bfd in clone () from /usr/lib/libc.so.6
No symbol table info available.

It looks like global_ts hasn't been initialized in talloc_stack.c
Comment 6 Ross Lagerwall 2015-08-17 17:06:38 UTC
It looks like there're a couple of problems here:
talloc_*() calls SMB_THREAD_GET_TLS(global_ts) and expects it to return NULL if it has not been initialized. However, this ends up calling pthread_getspecific() which segfaults if it hasn't been initialized yet.

smbc_new_context calls SMB_THREAD_ONCE to run SMBC_module_init. This would eventually end up calling SMB_THREAD_ONCE to run talloc_stackframe_init. However, a global lock is used for this (once_mutex) and since the locks are not recursive, it deadlocks.
Comment 7 Jeremy Allison 2015-08-26 17:03:15 UTC
(In reply to Ross Lagerwall from comment #6)

I'll take a look at this. It used to work (I tested it :-) but hasn't been used in anger for a long time. Might need some care & attention.

Jeremy.
Comment 8 Jeremy Allison 2015-08-28 22:00:57 UTC
Arg. Got this code to work, but running valgrind --tool=helgrind or valgrind --tool=drd on libsmbclient is a horror show..

I'm going to have to do a lot of work to properly make this thread-safe :-(.
Comment 9 Jeremy Allison 2015-09-01 20:41:35 UTC
Had a long chat with Volker, something like the BKL is going to be needed for this.
Comment 10 Jeremy Allison 2015-09-04 21:04:43 UTC
Created attachment 11412 [details]
Includes Metze's patch.

This gets us further along, and I'm planning to add a global libsmbclient mutex to get us to a safe mt space with libsmbclient.

Jeremy.
Comment 11 voidcastr 2017-02-27 11:47:02 UTC
As I understand, this is a blocker to solving https://bugzilla.gnome.org/show_bug.cgi?id=776339 , which in turn results in inferior throughput during transfers to/from dynamically-mounted smb shares (affects GIO->GVFS and KIO, and therefore nautilus, pcmanfm, dolphin ...).

Is a thread safe context implementation still being considered/investigated/in progress?
Comment 12 Jeremy Allison 2017-02-27 16:47:44 UTC
I have it on the list of Summer of Code projects for Samba. Hoping this gets picked up by an eager student.
Comment 13 Shan 2017-09-18 14:36:04 UTC
Hi,

Is this bug fixed or do we have work around to use smbcontext in multithreading environment.

I don't see smbc_thread_posix() in any version of samba. What version of samba it was included . I can see this in source code but don't find symbols included.
Comment 14 Jeremy Allison 2017-09-18 18:44:58 UTC
This bug is not fixed, and there is no easy fix for this right now, sorry.
Comment 15 voidcastr 2021-09-05 12:30:54 UTC
Would any of the experienced developers kindly provide guidance to potential new contributors, on how to tackle this particular bug?

What exactly needs to be thread safe - both on an abstract level, and as concrete parts of the existing code? What are the necessary steps to reproduce the current issue/crash - and, in turn, to validate potential patches?

Judging from the proposal regarding a student developer, it appears that new contributors could indeed stand a chance. Yet, it seems like nobody picked this up, to date.

Most of the information in this bug is rather dated by now; did the challenges to overcome change in any way? Would something like the BKL (Big Kernel Lock?) still be required, or did more fine-grained ways of locking emerge in the meantime?

Just to re-iterate on the scope:
AFAIK (which could be wrong), this bug still indirectly affects countless Linux desktop users who work with SMB shares, since it is a blocker to GVFS bug https://gitlab.gnome.org/GNOME/gvfs/-/issues/292. Also, there should be an equivalent bug in KIO (KDE).

I'd gladly provide any help I can here. Unfortunately, I'm inexperienced in C - but I could for sure help with testing stuff.
Comment 16 Jeremy Allison 2021-10-04 18:46:33 UTC
(In reply to voidcastr from comment #15)

A BKL (aka global lock) around use of libsmbclient inside gvfs should allow it to be used in a pthread.