Bug 5953 - smbclient crashes: cli_list_new segmentation fault
smbclient crashes: cli_list_new segmentation fault
Status: RESOLVED FIXED
Product: Samba 3.2
Classification: Unclassified
Component: Client tools
3.2.5
All Linux
: P3 critical
: ---
Assigned To: Samba Bugzilla Account
Samba QA Contact
:
: 5942 (view as bug list)
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2008-12-07 15:07 UTC by Marcus T. Schmitz
Modified: 2008-12-29 08:11 UTC (History)
2 users (show)

See Also:


Attachments
tcpdump trace (3.47 KB, text/plain)
2008-12-07 16:19 UTC, Marcus T. Schmitz
no flags Details
tcpdump -i eth0 -n -s 1500 -w /tmp/trace.cap (3.92 KB, application/cap)
2008-12-08 10:57 UTC, Marcus T. Schmitz
no flags Details
Patch (435 bytes, patch)
2008-12-08 14:19 UTC, Jeremy Allison
no flags Details
trace ls and put after applying patch (33.21 KB, application/cap)
2008-12-09 15:24 UTC, Marcus T. Schmitz
no flags Details
Proposed patch to make sure all files are displayed (381 bytes, patch)
2008-12-13 01:23 UTC, Kai Blin
no flags Details
trace: successful put via 3.0.33 (11.59 KB, application/cap)
2008-12-17 15:12 UTC, Marcus T. Schmitz
no flags Details
Manually revert padding added in 3d3d1b (1.51 KB, patch)
2008-12-18 02:58 UTC, Kai Blin
no flags Details
trace (padding removed) (46.89 KB, application/cap)
2008-12-19 11:29 UTC, Marcus T. Schmitz
no flags Details
trace ~5kB via 3.0.33 (13.39 KB, application/cap)
2008-12-21 03:40 UTC, Marcus T. Schmitz
no flags Details
trace ~5kB via 3.2.6 (12.50 KB, application/cap)
2008-12-21 03:46 UTC, Marcus T. Schmitz
no flags Details
wild guess... (561 bytes, patch)
2008-12-21 04:05 UTC, Volker Lendecke
no flags Details
trace of wild guess patch :) (13.95 KB, application/cap)
2008-12-21 05:27 UTC, Marcus T. Schmitz
no flags Details
patch (6.36 KB, patch)
2008-12-23 00:19 UTC, Volker Lendecke
no flags Details
Complete patch on top of the v3-2-stable branch (7.06 KB, patch)
2008-12-29 08:11 UTC, Kai Blin
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Marcus T. Schmitz 2008-12-07 15:07:42 UTC
Hello together,

I recently came across a segmentation fault problem in smbclient (or more specific the functions interpret_long_filename and cli_list_new of clilist.c).
(Checked only version 3.2.5)

I get the following message when connect to my NAS system and performing an "ls".

cli_list_new: Error: unable to parse name from info level 1
Segmentation fault

Tracking down the problem revealed that the function interpret_long_filename (called from function cli_list_new) can return out of "switch (level) case (1)"
without setting finfo->name which is then accessed within function cli_list_new.

Here is the call scenario:
1. Function "cli_list_new" line 396 calls function "interpret_long_filename"
2. Function "interpret_long_filename" returns (line 91) finfo->name is not set
3. Function "cli_list_new" line 421 access with Null-Pointer, causing the seg. fault. 

This problem is not present for me in 3.0.33 (interpret_long_filename did not leave finfo->name unset).

Regards,
Marcus Schmitz
Comment 1 Volker Lendecke 2008-12-07 15:42:10 UTC
Can you upload a network trace of that error?

Thanks,

Volker
Comment 2 Marcus T. Schmitz 2008-12-07 16:19:56 UTC
Created attachment 3787 [details]
tcpdump trace

This is the network trace:
192.168.113 is the NAS server
192.168.101 is the samba client
Comment 3 Volker Lendecke 2008-12-08 02:35:20 UTC
Sorry, we need the real trace, not the stdout of tcpdump. Please try

tcpdump -i eth0 -n -s 1500 -w /tmp/trace.cap

and attach the file /tmp/trace.cap to this bug report.

Thanks,

Volker
Comment 4 Marcus T. Schmitz 2008-12-08 10:57:13 UTC
Created attachment 3789 [details]
tcpdump -i eth0 -n -s 1500 -w /tmp/trace.cap

new tcpdump trace generated with: tcpdump -i eth0 -n -s 1500 -w /tmp/trace.cap
Comment 5 Jeremy Allison 2008-12-08 14:19:33 UTC
Created attachment 3790 [details]
Patch

Great bug report thanks ! Patch attached should fix it.
Jeremy.
Comment 6 Marcus T. Schmitz 2008-12-09 15:24:54 UTC
Created attachment 3805 [details]
trace ls and put after applying patch

Hello,

the bug fixed the seg fault, however, some problems remain:

1. performing a "ls" does not show all the files/dirs in the directory
e.g.:

 smb: \neu\> ls
 cli_list_new: Error: unable to parse name from info level 1
   .                                   D        0  Sat Jan 31 08:10:00 2082
   ..                                  D        0  Sat Jan 31 08:10:00 2082
   smb.conf                            A        0  Fri Mar  6 08:10:00 2082
  cli_list_new: unable to parse name from info level 1

Here we should also see a directory called "test".


2. Changing into the dir "test" works, however executing a "put" command produces following output:

  smb: \neu\test\> put smb.conf
  Error writing file: ERRHRD - 39
  putting file smb.conf as \neu\test\smb.conf Receiving SMB: Server stopped responding

I am not sure to which extend these problems are related to the mention seg fault. Nevertheless I have attached another trace file. When trying 3.0.33 these problems don't show up.

Have fun
Marcus
Comment 7 Volker Lendecke 2008-12-09 15:36:20 UTC
According to the trace the NAS returns some weird error code responding to the write request from smbclient. What kind of NAS device is this??

Volker
Comment 8 Marcus T. Schmitz 2008-12-09 16:19:37 UTC
It is a PyroGate MM-4110. I think it is also sold under different names by different "manufacturers". 

Maybe it wasn't the best buy ever :( 

In case it is "just" a hardware/firmware related problem I will stick to samba-3.0.33 for the moment, as I did not experience troubles here.

Marcus

Comment 9 Kai Blin 2008-12-10 16:23:48 UTC
*** Bug 5942 has been marked as a duplicate of this bug. ***
Comment 10 Kai Blin 2008-12-13 01:23:45 UTC
Created attachment 3809 [details]
Proposed patch to make sure all files are displayed

Hi,
could you try attached patch on top of current v3-2-test git or on your source plus Jeremy's patch?
Comment 11 Marcus T. Schmitz 2008-12-13 04:08:17 UTC
The patch fixed the visibility problem :) Writing problem remains unchanged.

Marcus

PS: The used device is a RDC2882 based system with NAS-BASIC49 installed (NAS-BASIC48 show the same problems).
Comment 12 Jeremy Allison 2008-12-13 16:13:30 UTC
Two things. 1). I'm pretty sure the +1 was required by an OS/2 server. I'm hoping kukks can comment on this (he was the person with the capture trace that showed this). We may end up having to store a "server arch" flag in the cli struct and use two different codepaths here.
2). NAS-BASIC49 is a custom (Tiawanese I think) implementation of a CIFS server which only recently started supporting user-level security, so it may be a little behind the times. I'm guessing that the way our client write code stresses the server is causing the server to die.
We need to know if the server is dying when we write to it. If so there's not much pressure to fix the directory listing bug as the server is not long for this world anyway.
Jeremy.
Comment 13 Kai Blin 2008-12-14 02:11:06 UTC
(In reply to comment #12)
> Two things. 1). I'm pretty sure the +1 was required by an OS/2 server. I'm
> hoping kukks can comment on this (he was the person with the capture trace that
> showed this). We may end up having to store a "server arch" flag in the cli
> struct and use two different codepaths here.

Sure, I'll wait for Günter's feedback before pushing anything here.

> 2). NAS-BASIC49 is a custom (Tiawanese I think) implementation of a CIFS server
> which only recently started supporting user-level security, so it may be a
> little behind the times. I'm guessing that the way our client write code
> stresses the server is causing the server to die.
> We need to know if the server is dying when we write to it. If so there's not
> much pressure to fix the directory listing bug as the server is not long for
> this world anyway.

Hm, is there a good way to detect this server if the server reply doesn't send it's OS and version? I've seen multiple reports with different NAS devices mentioned.
Comment 14 Jeremy Allison 2008-12-14 03:18:04 UTC
I've been thinking about this, the easiest way to fix this is to change the 'bool is_samba' in cli_struct to a enum arch_type (can't remember the exact name of the type :-) and then set it to RA_OS2 if the string OS/2 is detected in the server type (like we do with the "Samba" name in the cli_connect functions). Then we'll only use the +1 in the case where it's RA_OS2, and just len in every other case.

We don't need to specifically test for the NAS-BASIC server type as it acts the same as Windows. It's OS/2 that's the odd one out here.

Jeremy.
Comment 15 Guenter Kukkukk 2008-12-15 21:57:56 UTC
In February 2008, the main (OS/2 related) segfaulting issues were related to 
  a) cli_list_new() not checking finfo->name == NULL after returning 
    from interpret_long_filename() 
  b) to avoid case a) fix interpret_long_filename() to be "OS/2 compatible" 
 
That days, interpret_long_filename() was using  
    if (p + len + 2 > pdata_end) { 
and had "to be fixed to be OS/2 friendly" to 
    if (p + len + 1 > pdata_end) { 
------------------------- 
 
Kai's new patch 
    if (p + len > pdata_end) { 
does also work with os/2 servers .... (as a side note) 
 
BUT - Marcus already said: 
 "The patch fixed the visibility problem :) Writing problem remains unchanged." 
-------------------------------------------- 
 
So a _write_ (smbclient "put") problem is leftover here! 
 
What semantics/async write additions have been added lately to 3.2.x smbclient  (libs) which are _not_ in 3.0.x ? 
 
Cheers, Günter
Comment 16 Jeremy Allison 2008-12-16 13:09:14 UTC
Ok Kai, if the +1 works with OS2 then check into all branches please.
Jeremy.
Comment 17 Kai Blin 2008-12-16 17:01:30 UTC
(In reply to comment #15)

> BUT - Marcus already said: 
>  "The patch fixed the visibility problem :) Writing problem remains unchanged." 
> -------------------------------------------- 
> 
> So a _write_ (smbclient "put") problem is leftover here! 
> 
> What semantics/async write additions have been added lately to 3.2.x smbclient 
> (libs) which are _not_ in 3.0.x ? 

It'd be intersting to see a Samba 3.0 network trace putting a file onto this share to compare it with the failing put in 3.2.
Comment 18 Guenter Kukkukk 2008-12-16 22:08:58 UTC
Jeremy, Volker,

I'm a bit confused when looking at the sniff "trace.cap" from Marcus
using wireshark:
 1.) There's a failing SMB WriteAndX response, but i don't see _any_ SMB
     WriteAndX request!
 2.) Why is the write data send via NBSS continuation messages and not via tcp?
     Or is that just a different view, cause the sniff was done with "tcpdump"?
     I don't remember having seen this before ...

You surely can explain that a bit further.  :-)
Cheers, Günter
Comment 19 Marcus T. Schmitz 2008-12-17 15:12:12 UTC
Created attachment 3814 [details]
trace: successful put via 3.0.33

Hello together,

here is some more food for the wireshark ;)

The attachment contains a trace taken while putting a file via 3.0.33.

Hope this is of any help.

Marcus
Comment 20 Kai Blin 2008-12-18 02:58:10 UTC
Created attachment 3816 [details]
Manually revert padding added in 3d3d1b

Hm, looking at traces I got in the Launchpad bug report, the only difference I can see is the null padding Jeremy added in 3d3d1b806aef3617abaac46daf230ed32076e2ce

Here's a patch reverting the parts of the change that seem responsible for padding and bcc calculation. Can you check if that helps and get me a trace if it doesn't?
Comment 21 Jeremy Allison 2008-12-18 12:25:34 UTC
This null padding was added to make us match Windows. It also has the effect of allowing the server to use receivefile on smbclient connections (as it makes our write calls identical to those from a Windows client) so I don't think it's a good idea to lose this padding.
Jeremy.
Comment 22 Kai Blin 2008-12-18 23:35:51 UTC
(In reply to comment #21)
> This null padding was added to make us match Windows. It also has the effect of
> allowing the server to use receivefile on smbclient connections (as it makes
> our write calls identical to those from a Windows client) so I don't think it's
> a good idea to lose this padding.

I'm just trying to figure out if the padding causes the problem, as it's the only difference I found in the traces. I'm not planning to remove it, just trying to put a finger on the problem.
Comment 23 Marcus T. Schmitz 2008-12-19 11:29:14 UTC
Created attachment 3818 [details]
trace (padding removed)

The padding doesn't seem to be the problem. After reverting the changes, the put problem remains.

Marcus
Comment 24 Jeremy Allison 2008-12-19 11:43:16 UTC
Yeah that doesn't surprise me, the changes make us closer to a Windows client.

So lets take a step back, the problem seems to be that when we do a put the server dies, correct ? My guess is that this server can't do max-mux packets, as Windows rarely sends more than one outstanding write. I'd try adding an sleep(1) between each packet send to see if it's a speed issue.

Jeremy.
Comment 25 Guenter Kukkukk 2008-12-20 22:16:08 UTC
Marcus,
please catch 2 new network sniffs when "putting/writing" the _same_
file (ASCII file around 5KB size) using
 - smbclient 3.2.x  (failing)
 - smbclient 3.0.x  (working)

Cheers, Günter
Comment 26 Marcus T. Schmitz 2008-12-21 03:40:14 UTC
Created attachment 3820 [details]
trace ~5kB via 3.0.33

This is the trace of 3.0.33 (successful put).
Marcus
Comment 27 Marcus T. Schmitz 2008-12-21 03:46:13 UTC
Created attachment 3821 [details]
trace ~5kB via 3.2.6

This is the trace of 3.2.6 (unsuccessful put). For both traces the same ascii file has been used. 

Jeremy,
I have tried adding the sleep has suggested to slow down the packet sends, nevertheless, the problem persists.

Marcus
Comment 28 Volker Lendecke 2008-12-21 03:53:48 UTC
Wow, that looks weird. Wireshark does not recognize the write request from 3.2.6 as such. Looking deeper....

Volker
Comment 29 Volker Lendecke 2008-12-21 04:05:41 UTC
Created attachment 3822 [details]
wild guess...

Can you try the attached patch, just to minimize the differences in behaviour. This should just avoid the header being sent in a separate tcp frame? Can you get us a new 3.2.6 sniff with that?

Thanks,

Volker
Comment 30 Marcus T. Schmitz 2008-12-21 05:27:15 UTC
Created attachment 3823 [details]
trace of wild guess patch :)

Volker,
Perfect! Your "wild guess" patch has worked :) The put-transfer now completes without error. A trace is attached.

Marcus
Comment 31 Jeremy Allison 2008-12-22 12:19:53 UTC
Oh, that's very strange. All direct writes should do is split the TCP stream a little differently. In no way should a server be sensitive to that :-).

Jeremy.
Comment 32 Volker Lendecke 2008-12-23 00:19:07 UTC
Created attachment 3824 [details]
patch

Attached find a patch that might be a real fix, not just a bad workaround. If you feel like it, you might try it as well.

Jeremy, if that looks ok, I've got it as a 4-step patchset.

Thanks,

Volker
Comment 33 Jeremy Allison 2008-12-23 14:35:51 UTC
Oh, that looks like nice work ! Let's see if it fixes the issue with this server. I'm 100% convinced this is a server error, not a client one, but this is a really elegant workaround that maintains the speed of direct writes - thanks !
Jeremy.
Comment 34 Marcus T. Schmitz 2008-12-23 17:46:51 UTC
I will check and report if the patch has fixed the problem as soon as I am back to "office".

Marcus

PS: I don't think Santa has to bring along a new NAS :)
Comment 35 Kai Blin 2008-12-24 01:43:34 UTC
I've posted your patch to the Ubuntu bugtracker as well and one of the Ububtu folks has created a deb for easy testing. It looks like this patch fixes the issue.
Comment 36 Marcus T. Schmitz 2008-12-27 11:52:17 UTC
The patch also fixed the write problem for my NAS :) Well done!

Nevertheless, to get around the visibility problem, patch 3809 from Kai had to be applied too. 

Marcus
Comment 37 Kai Blin 2008-12-29 08:11:11 UTC
Created attachment 3835 [details]
Complete patch on top of the v3-2-stable branch

The Debian packagers asked for a complete patch fixing this issue.