Hello together, I recently came across a segmentation fault problem in smbclient (or more specific the functions interpret_long_filename and cli_list_new of clilist.c). (Checked only version 3.2.5) I get the following message when connect to my NAS system and performing an "ls". cli_list_new: Error: unable to parse name from info level 1 Segmentation fault Tracking down the problem revealed that the function interpret_long_filename (called from function cli_list_new) can return out of "switch (level) case (1)" without setting finfo->name which is then accessed within function cli_list_new. Here is the call scenario: 1. Function "cli_list_new" line 396 calls function "interpret_long_filename" 2. Function "interpret_long_filename" returns (line 91) finfo->name is not set 3. Function "cli_list_new" line 421 access with Null-Pointer, causing the seg. fault. This problem is not present for me in 3.0.33 (interpret_long_filename did not leave finfo->name unset). Regards, Marcus Schmitz
Can you upload a network trace of that error? Thanks, Volker
Created attachment 3787 [details] tcpdump trace This is the network trace: 192.168.113 is the NAS server 192.168.101 is the samba client
Sorry, we need the real trace, not the stdout of tcpdump. Please try tcpdump -i eth0 -n -s 1500 -w /tmp/trace.cap and attach the file /tmp/trace.cap to this bug report. Thanks, Volker
Created attachment 3789 [details] tcpdump -i eth0 -n -s 1500 -w /tmp/trace.cap new tcpdump trace generated with: tcpdump -i eth0 -n -s 1500 -w /tmp/trace.cap
Created attachment 3790 [details] Patch Great bug report thanks ! Patch attached should fix it. Jeremy.
Created attachment 3805 [details] trace ls and put after applying patch Hello, the bug fixed the seg fault, however, some problems remain: 1. performing a "ls" does not show all the files/dirs in the directory e.g.: smb: \neu\> ls cli_list_new: Error: unable to parse name from info level 1 . D 0 Sat Jan 31 08:10:00 2082 .. D 0 Sat Jan 31 08:10:00 2082 smb.conf A 0 Fri Mar 6 08:10:00 2082 cli_list_new: unable to parse name from info level 1 Here we should also see a directory called "test". 2. Changing into the dir "test" works, however executing a "put" command produces following output: smb: \neu\test\> put smb.conf Error writing file: ERRHRD - 39 putting file smb.conf as \neu\test\smb.conf Receiving SMB: Server stopped responding I am not sure to which extend these problems are related to the mention seg fault. Nevertheless I have attached another trace file. When trying 3.0.33 these problems don't show up. Have fun Marcus
According to the trace the NAS returns some weird error code responding to the write request from smbclient. What kind of NAS device is this?? Volker
It is a PyroGate MM-4110. I think it is also sold under different names by different "manufacturers". Maybe it wasn't the best buy ever :( In case it is "just" a hardware/firmware related problem I will stick to samba-3.0.33 for the moment, as I did not experience troubles here. Marcus
*** Bug 5942 has been marked as a duplicate of this bug. ***
Created attachment 3809 [details] Proposed patch to make sure all files are displayed Hi, could you try attached patch on top of current v3-2-test git or on your source plus Jeremy's patch?
The patch fixed the visibility problem :) Writing problem remains unchanged. Marcus PS: The used device is a RDC2882 based system with NAS-BASIC49 installed (NAS-BASIC48 show the same problems).
Two things. 1). I'm pretty sure the +1 was required by an OS/2 server. I'm hoping kukks can comment on this (he was the person with the capture trace that showed this). We may end up having to store a "server arch" flag in the cli struct and use two different codepaths here. 2). NAS-BASIC49 is a custom (Tiawanese I think) implementation of a CIFS server which only recently started supporting user-level security, so it may be a little behind the times. I'm guessing that the way our client write code stresses the server is causing the server to die. We need to know if the server is dying when we write to it. If so there's not much pressure to fix the directory listing bug as the server is not long for this world anyway. Jeremy.
(In reply to comment #12) > Two things. 1). I'm pretty sure the +1 was required by an OS/2 server. I'm > hoping kukks can comment on this (he was the person with the capture trace that > showed this). We may end up having to store a "server arch" flag in the cli > struct and use two different codepaths here. Sure, I'll wait for Günter's feedback before pushing anything here. > 2). NAS-BASIC49 is a custom (Tiawanese I think) implementation of a CIFS server > which only recently started supporting user-level security, so it may be a > little behind the times. I'm guessing that the way our client write code > stresses the server is causing the server to die. > We need to know if the server is dying when we write to it. If so there's not > much pressure to fix the directory listing bug as the server is not long for > this world anyway. Hm, is there a good way to detect this server if the server reply doesn't send it's OS and version? I've seen multiple reports with different NAS devices mentioned.
I've been thinking about this, the easiest way to fix this is to change the 'bool is_samba' in cli_struct to a enum arch_type (can't remember the exact name of the type :-) and then set it to RA_OS2 if the string OS/2 is detected in the server type (like we do with the "Samba" name in the cli_connect functions). Then we'll only use the +1 in the case where it's RA_OS2, and just len in every other case. We don't need to specifically test for the NAS-BASIC server type as it acts the same as Windows. It's OS/2 that's the odd one out here. Jeremy.
In February 2008, the main (OS/2 related) segfaulting issues were related to a) cli_list_new() not checking finfo->name == NULL after returning from interpret_long_filename() b) to avoid case a) fix interpret_long_filename() to be "OS/2 compatible" That days, interpret_long_filename() was using if (p + len + 2 > pdata_end) { and had "to be fixed to be OS/2 friendly" to if (p + len + 1 > pdata_end) { ------------------------- Kai's new patch if (p + len > pdata_end) { does also work with os/2 servers .... (as a side note) BUT - Marcus already said: "The patch fixed the visibility problem :) Writing problem remains unchanged." -------------------------------------------- So a _write_ (smbclient "put") problem is leftover here! What semantics/async write additions have been added lately to 3.2.x smbclient (libs) which are _not_ in 3.0.x ? Cheers, Günter
Ok Kai, if the +1 works with OS2 then check into all branches please. Jeremy.
(In reply to comment #15) > BUT - Marcus already said: > "The patch fixed the visibility problem :) Writing problem remains unchanged." > -------------------------------------------- > > So a _write_ (smbclient "put") problem is leftover here! > > What semantics/async write additions have been added lately to 3.2.x smbclient > (libs) which are _not_ in 3.0.x ? It'd be intersting to see a Samba 3.0 network trace putting a file onto this share to compare it with the failing put in 3.2.
Jeremy, Volker, I'm a bit confused when looking at the sniff "trace.cap" from Marcus using wireshark: 1.) There's a failing SMB WriteAndX response, but i don't see _any_ SMB WriteAndX request! 2.) Why is the write data send via NBSS continuation messages and not via tcp? Or is that just a different view, cause the sniff was done with "tcpdump"? I don't remember having seen this before ... You surely can explain that a bit further. :-) Cheers, Günter
Created attachment 3814 [details] trace: successful put via 3.0.33 Hello together, here is some more food for the wireshark ;) The attachment contains a trace taken while putting a file via 3.0.33. Hope this is of any help. Marcus
Created attachment 3816 [details] Manually revert padding added in 3d3d1b Hm, looking at traces I got in the Launchpad bug report, the only difference I can see is the null padding Jeremy added in 3d3d1b806aef3617abaac46daf230ed32076e2ce Here's a patch reverting the parts of the change that seem responsible for padding and bcc calculation. Can you check if that helps and get me a trace if it doesn't?
This null padding was added to make us match Windows. It also has the effect of allowing the server to use receivefile on smbclient connections (as it makes our write calls identical to those from a Windows client) so I don't think it's a good idea to lose this padding. Jeremy.
(In reply to comment #21) > This null padding was added to make us match Windows. It also has the effect of > allowing the server to use receivefile on smbclient connections (as it makes > our write calls identical to those from a Windows client) so I don't think it's > a good idea to lose this padding. I'm just trying to figure out if the padding causes the problem, as it's the only difference I found in the traces. I'm not planning to remove it, just trying to put a finger on the problem.
Created attachment 3818 [details] trace (padding removed) The padding doesn't seem to be the problem. After reverting the changes, the put problem remains. Marcus
Yeah that doesn't surprise me, the changes make us closer to a Windows client. So lets take a step back, the problem seems to be that when we do a put the server dies, correct ? My guess is that this server can't do max-mux packets, as Windows rarely sends more than one outstanding write. I'd try adding an sleep(1) between each packet send to see if it's a speed issue. Jeremy.
Marcus, please catch 2 new network sniffs when "putting/writing" the _same_ file (ASCII file around 5KB size) using - smbclient 3.2.x (failing) - smbclient 3.0.x (working) Cheers, Günter
Created attachment 3820 [details] trace ~5kB via 3.0.33 This is the trace of 3.0.33 (successful put). Marcus
Created attachment 3821 [details] trace ~5kB via 3.2.6 This is the trace of 3.2.6 (unsuccessful put). For both traces the same ascii file has been used. Jeremy, I have tried adding the sleep has suggested to slow down the packet sends, nevertheless, the problem persists. Marcus
Wow, that looks weird. Wireshark does not recognize the write request from 3.2.6 as such. Looking deeper.... Volker
Created attachment 3822 [details] wild guess... Can you try the attached patch, just to minimize the differences in behaviour. This should just avoid the header being sent in a separate tcp frame? Can you get us a new 3.2.6 sniff with that? Thanks, Volker
Created attachment 3823 [details] trace of wild guess patch :) Volker, Perfect! Your "wild guess" patch has worked :) The put-transfer now completes without error. A trace is attached. Marcus
Oh, that's very strange. All direct writes should do is split the TCP stream a little differently. In no way should a server be sensitive to that :-). Jeremy.
Created attachment 3824 [details] patch Attached find a patch that might be a real fix, not just a bad workaround. If you feel like it, you might try it as well. Jeremy, if that looks ok, I've got it as a 4-step patchset. Thanks, Volker
Oh, that looks like nice work ! Let's see if it fixes the issue with this server. I'm 100% convinced this is a server error, not a client one, but this is a really elegant workaround that maintains the speed of direct writes - thanks ! Jeremy.
I will check and report if the patch has fixed the problem as soon as I am back to "office". Marcus PS: I don't think Santa has to bring along a new NAS :)
I've posted your patch to the Ubuntu bugtracker as well and one of the Ububtu folks has created a deb for easy testing. It looks like this patch fixes the issue.
The patch also fixed the write problem for my NAS :) Well done! Nevertheless, to get around the visibility problem, patch 3809 from Kai had to be applied too. Marcus
Created attachment 3835 [details] Complete patch on top of the v3-2-stable branch The Debian packagers asked for a complete patch fixing this issue.