Bug 5847 - SIGABORT in getattr()
Summary: SIGABORT in getattr()
Status: RESOLVED LATER
Alias: None
Product: Samba 3.2
Classification: Unclassified
Component: libsmbclient
Version: unspecified
Hardware: x86 Linux
Importance: P3 major
Target Milestone: ---
Assignee: Derrell Lipman
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2008-10-23 10:37 UTC by Mikhail Kshevetskiy (dead mail account)
Modified: 2009-05-12 12:25 UTC
Description Mikhail Kshevetskiy (dead mail account) 2008-10-23 10:37:38 UTC
Using libsmbclient from the latest git of samba-3.2 with smbnetfs leads to SIGABRT:


Program received signal SIGABRT, Aborted.
[Switching to Thread 0xb596cb90 (LWP 16931)]
0xb7f69424 in __kernel_vsyscall ()
(gdb) bt
#0  0xb7f69424 in __kernel_vsyscall ()
#1  0xb7a55640 in raise () from /lib/i686/cmov/libc.so.6
#2  0xb7a57018 in abort () from /lib/i686/cmov/libc.so.6
#3  0xb7a225ad in talloc_abort_double_free ()
   from /usr/local/samba-3.2/lib/libtalloc.so.1
#4  0xb7a2614a in talloc_free () from /usr/local/samba-3.2/lib/libtalloc.so.1
#5  0xb7bfcded in SMBC_getatr ()
   from /usr/local/samba-3.2/lib/libsmbclient.so.0
#6  0xb7c0123d in SMBC_stat_ctx ()
   from /usr/local/samba-3.2/lib/libsmbclient.so.0
#7  0x08052c09 in samba_getattr (
    path=0x9d3cbb0 "/mowd046a.ww600.siemens.net/PUBLIC/CIO_Division", 
    stbuf=0xb596c224) at function.c:399
#8  0xb7bb2352 in fuse_fs_getattr () from /usr/lib/libfuse.so.2
#9  0xb7bb315a in ?? () from /usr/lib/libfuse.so.2
Comment 1 Derrell Lipman 2008-10-23 10:54:59 UTC
Upon code inspection, I don't see quickly what the problem could be.  If you could compile and link libsmbclient with -g and get me a backtrace with line numbers, that might point me in the right direction.  Since you're using latest git, you can use 'configure.developer --enable-debug' to generate with debugging symbols.

Derrell
Comment 2 Mikhail Kshevetskiy (dead mail account) 2008-10-24 02:24:31 UTC
The same bug, but in another place:

Program received signal SIGABRT, Aborted.
[Switching to Thread 0xb1828b90 (LWP 8682)]
0xb7f3d424 in __kernel_vsyscall ()
(gdb) bt
#0  0xb7f3d424 in __kernel_vsyscall ()
#1  0xb7a36640 in raise () from /lib/i686/cmov/libc.so.6
#2  0xb7a38018 in abort () from /lib/i686/cmov/libc.so.6
#3  0xb7a035c4 in talloc_abort_unknown_value () at lib/talloc/talloc.c:148
#4  0xb7a04ecd in _talloc_realloc (context=0x81ebb18, ptr=0x0, size=8, 
    name=0xb7ec54c6 "lib/charcnv.c:582") at lib/talloc/talloc.c:160
#5  0xb7c063e6 in convert_string_allocate (ctx=0x81ebb18, from=CH_UTF16LE, 
    to=CH_UNIX, src=0x81edb5e, srclen=2, dst=0xb1827a88, 
    converted_size=0xb1827a4c, allow_bad_conv=true) at lib/charcnv.c:582
#6  0xb7c06d4d in convert_string_talloc (ctx=0x81ebb18, from=CH_UTF16LE, 
    to=CH_UNIX, src=0x81edb5e, srclen=2, dst=0xb1827a88, 
    allow_bad_conv=<value optimized out>) at lib/charcnv.c:764
#7  0xb7c06fb2 in pull_ucs2_base_talloc (ctx=0x81ebb18, base_ptr=0x82c3c40, 
    ppdest=0xb1827ca0, src=0x81edb5e, src_len=2, flags=<value optimized out>)
    at lib/charcnv.c:1604
#8  0xb7c070ac in pull_string_talloc_fn (function=0xb7ebbedb "", line=0, 
    ctx=0x81ebb18, base_ptr=0x21ea, smb_flags2=6, ppdest=0xb1827ca0, 
    src=0x81edb5e, src_len=2, flags=0) at lib/charcnv.c:1846
#9  0xb7c587f2 in clistr_pull_talloc_fn (function=0xb7ebbedb "", line=0, 
    ctx=0x81ebb18, cli=0x81d8000, pp_dest=0xb1827ca0, src=0x81edb5e, 
    src_len=2, flags=0) at libsmb/clistr.c:75
#10 0xb7c546af in interpret_long_filename (ctx=0x81ebb18, cli=0x81d8000, 
    level=260, p=0x81edb5e ".", pdata_end=0x81ee604 "", finfo=0xb1827c70,
    p_resume_key=0xb1827c58, p_last_name_raw=0xb1827c4c)
    at libsmb/clilist.c:189
#11 0xb7c54dc0 in cli_list_new (cli=0x81d8000, Mask=0x81d9c38 "\030", 
    attribute=22, fn=0xb7bcec88 <dir_list_fn>, state=0x81d9b28)
    at libsmb/clilist.c:396
#12 0xb7c55323 in cli_list (cli=0x21ea, Mask=0x81d9c38 "\030", attribute=0, 
    fn=0xb7bcec88 <dir_list_fn>, state=0x6) at libsmb/clilist.c:682
#13 0xb7bd01b5 in SMBC_opendir_ctx (context=0x8144570, 
    fname=0x8142834 "smb://witb159a.ww200.siemens.net/FSWITS_00023/Apollo_V1/Integrationfiles") at libsmb/libsmb_dir.c:783
Comment 3 Mikhail Kshevetskiy (dead mail account) 2008-10-24 02:29:55 UTC
Program received signal SIGABRT, Aborted.
[Switching to Thread 0xb59b1b90 (LWP 9161)]
0xb7fae424 in __kernel_vsyscall ()
(gdb) bt
#0  0xb7fae424 in __kernel_vsyscall ()
#1  0xb7aa7640 in raise () from /lib/i686/cmov/libc.so.6
#2  0xb7aa9018 in abort () from /lib/i686/cmov/libc.so.6
#3  0xb7a745ad in talloc_abort_double_free () at lib/talloc/talloc.c:143
#4  0xb7a7814a in talloc_free (ptr=0x9bdaf48) at lib/talloc/talloc.c:158
#5  0xb7c41ded in SMBC_getatr (context=0x9bb1bb8, srv=0x9bd0dd0, 
    path=0x9bf2048 "\\CDC", mode=0xb59b10fe, size=0xb59b1100, 
    create_time_ts=0x0, access_time_ts=0xb59b1110, write_time_ts=0xb59b1118, 
    change_time_ts=0xb59b1108, ino=0xb59b10f0) at libsmb/libsmb_file.c:558
#6  0xb7c4623d in SMBC_stat_ctx (context=0x9bb1bb8, 
    fname=0x9b49814 "smb://mowd046a.ww600.siemens.net/PUBLIC/CDC", 
    st=0xb59b1224) at libsmb/libsmb_stat.c:176
#7  0x0805393f in ?? ()
#8  0x09bb1bb8 in ?? ()
#9  0x09b49814 in ?? ()
#10 0xb59b1224 in ?? ()
#11 0x08055456 in ?? ()
#12 0x09bda660 in ?? ()
#13 0x00000000 in ?? ()
Comment 4 Derrell Lipman 2008-10-24 08:13:40 UTC
I don't see any problem with double freeing of memory in this code, or with anything associated with the stack trace you provided.  Given that you had a similar problem occur elsewhere, I suspect that your application may be corrupting memory.  Running your application under valgrind may help find the problem.

I'll close this bug for now.  If you find additional evidence that there's a problem here, feel free to re-open it and provide any additional info you can.

Derrell
Comment 5 Mikhail Kshevetskiy (dead mail account) 2008-10-24 11:20:12 UTC
The application under discussion is SMBNetFS. It works without serious problems with libsmbclient-3.0.x. It also has no known memory leaks or corruption (checked with valgrind, but see samba bugs 5105, 5585, 3338, 3937, 3273).

It's hard to say what the real problem is. It appears on my office laptop (debian, linux-2.6.26, gcc-4.3.2, windows domain) and does not appear at home (slackware, linux-2.6.26, gcc-4.2.4, network without domain). In both cases I use the latest samba git (branch origin/v3-2-stable).

Mikhail
Comment 6 Derrell Lipman 2008-10-24 11:42:31 UTC
I'd be glad to fix a bug if I could find one...

Let's try to isolate this a bit better.  See if you can make this happen on your office network (or wherever you're currently having the problem) using the teststat utility in examples/libsmbclient instead of using SMBNetFS.  Run the application under valgrind if possible, so when it breaks, we'll be able to see where and why.

I've not used SMBNetFS, but if you can't make the problem occur with teststat then since you can make it happen with SMBNetFS, run that under valgrind and cause this problem to be activated.  Maybe we'll be able to figure out from the valgrind output what/where the problem is.

I'll re-open this ticket so that it doesn't get forgotten, but I think I need to wait for more input from you before I can do anything with it.

Derrell
Comment 7 Mikhail Kshevetskiy (dead mail account) 2008-10-27 10:28:24 UTC
It looks like I found the root of the problem: thread safety. Earlier (libsmbclient-3.0.x) it was possible to work with two or more samba contexts simultaneously without locking. For example, smbnetfs could read a file and scan the network without mutual locking, as long as it did not use the same samba context for both operations. This is currently broken.

Mikhail
Comment 8 Derrell Lipman 2008-10-27 10:56:54 UTC
Ah, ok.  We've had some discussion about this recently.  Although there is very little if anything left in the libsmbclient-specific code that is not reentrant, there are still Samba core areas that are not reentrant.  The fact that it worked in the older version is, unfortunately, just accidental.  We know it'd be nice to have a reentrant libsmbclient (and Samba core) but the code isn't there yet.  I don't know when Jeremy will be working on that.  We may get there in Samba3, or it may require waiting for a Samba4 version of libsmbclient.  In the meantime, I'd recommend implementing your own mutex of some sort to handle this in an application that requires concurrent threads executing inside of Samba.
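
The mutex workaround recommended above could look roughly like the following. This is only a minimal sketch, not code from Samba or SMBNetFS: the names `samba_lock`, `stat_fn`, `locked_stat` and `stub_stat` are hypothetical, and `stub_stat` is just a stand-in so the example builds without libsmbclient installed. It assumes a single process-wide lock is what is needed, since the non-reentrant parts are said to be in Samba core rather than in any one context.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/stat.h>

/* One process-wide lock (hypothetical name). A per-context lock is assumed
 * to be insufficient, because the non-reentrant state is in Samba core. */
static pthread_mutex_t samba_lock = PTHREAD_MUTEX_INITIALIZER;

/* Shape of the call being protected: a URL in, a struct stat out. */
typedef int (*stat_fn)(const char *url, struct stat *st);

/* Hypothetical wrapper: run one stat-style call while holding the lock,
 * so that no two threads are inside the library at the same time. */
int locked_stat(stat_fn fn, const char *url, struct stat *st)
{
    int rc;

    assert(fn != NULL);
    pthread_mutex_lock(&samba_lock);
    rc = fn(url, st);
    pthread_mutex_unlock(&samba_lock);
    return rc;
}

/* Stand-in for the real library call, so the sketch builds without Samba. */
int stub_stat(const char *url, struct stat *st)
{
    (void)url;
    st->st_size = 42;
    return 0;
}
```

In a real application the stub would be replaced by the library's own stat call, and every other entry point into libsmbclient (open, read, opendir and so on) would have to take the same lock. The cost is that all Samba operations are serialized, so a slow network scan would block file reads for its duration.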
Comment 10 luca 2009-05-12 12:25:54 UTC
Hi,

I don't know if this helps, but I've experienced a similar problem, only in one specific case:

- *CRON* running a script which uses smbnetfs to "join" a windows network

If the smbnetfs command is executed from CRON, smbnetfs fails:
-------------------------------------------
OSError: [Errno 103] Software caused connection abort: '/home/duplicity/MPOINT/RETE/ANTEK/ADHOC-NEW/ADVISUAL'

This is how the smbnetfs mount point then appears (cmd: smbnetfs -o ro /my/path/RETE):

drwxr-xr-x 2 duplicity duplicity 4096 2009-05-06 10:14 192.168.0.115
d????????? ? ?         ?            ?                ? RETE
-------------------------------------------

If the smbnetfs command is executed manually from the shell (my cron script then finds that it is already mounted and does not try to execute smbnetfs again), everything works fine.

OS: Ubuntu 9.04
uname -a: Linux 2.6.28-11-generic #42-Ubuntu SMP Fri Apr 17 01:58:03 UTC 2009 x86_64 GNU/Linux
SMBNetFS version: 0.3.11a
FUSE library version: 2.7.4
fusermount version: 2.7.4
using FUSE kernel interface version 7.8