Bug 2271 - files missing in directory listing from smbclient 'dir' command with windows xp server
Status: CLOSED FIXED
Alias: None
Product: Samba 3.0
Classification: Unclassified
Component: smbclient
Version: 3.0.10
Hardware: x86 Linux
Importance: P3 critical
Target Milestone: none
Assignee: Jeremy Allison
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-01-20 22:17 UTC by Bill Ryder
Modified: 2005-08-24 10:20 UTC

See Also:


Attachments
Diff -u output for clilist.c patch to fix samba dir listing problem (2.30 KB, text/plain)
2005-02-06 06:12 UTC, Satwik Hebbar
Patch I've committed (1.36 KB, patch)
2005-03-08 17:01 UTC, Jeremy Allison
Additional patch (1.13 KB, patch)
2005-03-09 18:57 UTC, Jeremy Allison

Description Bill Ryder 2005-01-20 22:17:03 UTC
This is the same as bug 939 but I can't reopen it because I don't own it. And I
have tested with smbclient 3.0.10

The XP server is SP1.

The directory has around 6000 files in it.

Different files will be missing from a 'dir' command at random. 

The traces show find_next2 continuation packets with missing data - although
ethereal says the checksum etc was OK. 

I have traces available if necessary.

It's critical because we can't realistically keep re-running dir on the
directory and taking the union of the results to get a complete list of files.
Comment 1 Bill Ryder 2005-01-20 22:30:30 UTC
Note that sometimes at random the listing will work fine with smbclient. Grr.

Also, if I don't use smbclient but use the Linux kernel's (2.6.9) filesystem
instead, I see this in syslog:
 smb_proc_readdir_long: error=-512, breaking
Comment 2 Satwik Hebbar 2005-02-01 05:03:12 UTC
I have seen this behaviour a number of times too, especially when the number
of entries in a directory is > 1000 (not a typical testing scenario), and not
just with Windows XP but with earlier Windows versions too. I believe this is
not due to an incorrect implementation of the CIFS protocol. The directory
listing functionality in Samba depends on the CIFS server to maintain state
information (how many directory entries have been passed to the client in
FIND_FIRST/NEXT responses so far, and where to resume listing from for the
next FIND_NEXT request). However, it is entirely possible that the CIFS server
doesn't handle this properly and skips files at random now and then.

There is an alternative: we maintain the state info (last file received)
ourselves and send it to the CIFS server in the next request so it can start
listing from the next file. I have tried this fix on samba-3.0.9 and it seems
to work. I am figuring out how to get this patch into the Samba Subversion
repository.
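
The client-side resume idea described above can be illustrated with a toy
model. This is only a sketch, not the clilist.c patch: find_page and list_all
are hypothetical names, the "server" is a plain sorted Python list, and a real
CIFS server is not required to return names in sorted order. The point it
shows is that the client, not the server, remembers where the listing stopped.

```python
from bisect import bisect_right


def find_page(entries, resume_after=None, page_size=3):
    """Toy stand-in for a FIND_FIRST/FIND_NEXT response (hypothetical helper).

    entries must be a sorted list of names. resume_after=None starts a new
    listing; otherwise the page begins just after the given name.
    """
    start = 0 if resume_after is None else bisect_right(entries, resume_after)
    return entries[start:start + page_size]


def list_all(entries, page_size=3):
    """Client loop that tracks the last name received and resumes after it."""
    seen = []
    last = None
    while True:
        page = find_page(entries, last, page_size)
        if not page:
            break
        seen.extend(page)
        last = page[-1]  # resume point is held by the client
    return seen


if __name__ == "__main__":
    names = sorted("file%04d.txt" % i for i in range(10))
    print(list_all(names, page_size=3) == names)
```

Because every request names its own starting point, a server that loses track
of its position between requests cannot cause entries to be skipped in this
model.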


Comment 3 Jeremy Allison 2005-02-01 19:17:53 UTC
Please send in the patch as a diff -u.
Thanks, Jeremy.
Comment 4 Satwik Hebbar 2005-02-06 06:12:21 UTC
Created attachment 950
Diff -u output for clilist.c patch to fix samba dir listing problem
Comment 5 Satwik Hebbar 2005-02-06 06:14:12 UTC
Hey Jeremy,

Attaching the diff -u output. You may see the backup operator flag turned on
in the FIND_* request, but it isn't actually necessary. All I've basically
done is read the last name from the listing, extract its info into a
file_info struct, and then use that name as the mask for the next FIND_* call.
I have called the function from Python modules and seen it working. I hope it
isn't broken if called from elsewhere.

Satwik.
Comment 6 David Mathog 2005-02-07 13:19:01 UTC
(In reply to comment #0)
> This is the same as bug 939 but I can't reopen it because I don't own it. And I
> have tested with smbclient 3.0.10

I see similar problems against a Win 2K server, using both 3.0.6 and 3.0.10
smbclient (on Mandrake 10.0). Additionally, this (not surprisingly) affects
a recursive mget.

One point I didn't see made in the original post: this seems to
affect some files more than others.  In our case one file might
have problems half the time while others are seen/transferred
every time in 10 tests.  This may have something to do with timing
or directory structure on the server.  There is nothing special about
the files involved. At least they don't have peculiar names and
they aren't either zero bytes or particularly large.

A fix sooner rather than later would be very helpful, since this problem
is currently requiring a lot of manual intervention in an otherwise 
fully automated data transfer process.
Comment 7 Kevin Dalley 2005-03-01 17:20:04 UTC
I tried Satwik's patch for solving the problems of the missing files.
Files stopped disappearing, but another problem cropped up.

This patch causes some smbclient commands to run *very* slowly, and
grow to an amazing size. I tried removing the backup flag, as Satwik
suggests.  While smbclient still works, the test case is still slow.  

First, my use and a description of the problem.

I am using a patched version of samba-3.0.10, which includes Debian
patches, on Debian Linux on an x86.

I use amanda for backing up file systems, using samba for its Windows
backups. When Satwik's patch is used, smbclient consistently runs
slowly on one of my file systems, and consistently runs at a
reasonable speed on most of my file systems, including some file
systems which previously lost files.

Here's one command which runs slowly with the patch:
smbclient '\\kinka\c$' -d 0 -U amanda -E -c 'recurse;du'

Without the patch, the command completes quickly.

kevin@nereocystis:/tmp$ time smbclient '\\kinka\c$' -d 0 -U amanda -E -c 'recurse;du'
Password:

                39958 blocks of size 65536. 2277 blocks available
Total number of bytes: 2328492578

real    0m14.404s
user    0m0.272s
sys     0m0.190s


With Satwik's patch, after a few minutes, smbclient grows to > 1GB of
memory.  On an earlier attempt I left this command running for a few hours
without it returning. I killed the run shown below after nearly 9 minutes.

Memory usage, from ps:
kevin     9443  8.1 55.6 1051560 287084 pts/4 S+  11:11   0:41 smbclient \\kinka\c$ -d 0 -U amanda -E -c recurse;du

Timing of command, killed after a while
kevin@nereocystis:/tmp$ time smbclient '\\kinka\c$' -d 0 -U amanda -E -c 'recurse;du'
Password:


real    8m40.825s
user    0m16.887s
sys     0m26.555s

The real time includes the time to enter the password, so it is
artificially inflated.

When I add the continue flag to this line

 			SSVAL(param,10,/*8+*/4+2);	/* continue + continue + resume required + close on end */

the tests fail, but the running time is back down.
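
For readers following the flag arithmetic, here is a small sketch that
decodes the value passed to SSVAL into the labels used in the code comments
quoted in this thread. The bit meanings (2 = close on end, 4 = resume
required, 8 = continue, and 16 = backup, from the backup-flag line quoted
elsewhere in this bug) are taken from those comments and should be treated as
assumptions here; describe_find_flags is a hypothetical helper, not a Samba
function.

```python
# Bit labels as they appear in the SSVAL comments quoted in this bug
# (assumed mapping, for illustration only).
FIND_FLAGS = {
    2: "close on end",
    4: "resume required",
    8: "continue",
    16: "backup",
}


def describe_find_flags(value):
    """Return the labels of the bits set in value, lowest bit first."""
    return [name for bit, name in sorted(FIND_FLAGS.items()) if value & bit]


if __name__ == "__main__":
    print(describe_find_flags(4 + 2))      # value with the 8 commented out
    print(describe_find_flags(8 + 4 + 2))  # value with the continue bit added
```

Under this mapping, the only difference between the fast-but-failing value
and the slow-but-complete one is the "continue" bit.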

Here's the test I use.

 for num in 1 2 3 4 5 6 7 8 9; do smbclient '\\puffin\c$' -U 'amanda%password' -E -d1 -Tcan /dev/null  '/Kathy/DATA/2120 WNmod/*' > /tmp/foo/foolog$num.txt 2>&1; done
for num in 2 3 4 5 6 7 8 9; do diff -u /tmp/foo/foolog1.txt /tmp/foo/foolog${num}.txt; done
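
The same comparison can be sketched in Python when it is more useful to know
which names each run dropped than to read pairwise diffs. This is an
illustrative sketch only: missing_per_run is a hypothetical helper, and it
assumes each run's listing has already been reduced to a set of file names
(parsing smbclient output is not shown).

```python
def missing_per_run(runs):
    """Map each run label to the sorted names it lacks versus the union.

    runs maps a run label to the set of file names that run listed.
    Runs that listed every name in the union are left out of the result.
    """
    union = set().union(*runs.values())
    result = {}
    for label, names in runs.items():
        missing = union - names
        if missing:
            result[label] = sorted(missing)
    return result


if __name__ == "__main__":
    # Made-up sample data, not taken from the traces in this bug.
    sample = {
        "run1": {"a.dat", "b.dat", "c.dat"},
        "run2": {"a.dat", "c.dat"},
        "run3": {"a.dat", "b.dat", "c.dat"},
    }
    print(missing_per_run(sample))
```

A file that no run ever lists will not show up in the union, so this only
detects names that are missing intermittently.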
Comment 8 Jeremy Allison 2005-03-08 17:01:42 UTC
Created attachment 1019
Patch I've committed

This turned out to be a problem in the server code as well as the
client. I've committed the attached patch which I've tested under
valgrind (and it doesn't leak memory).
Jeremy.
Comment 9 Jeremy Allison 2005-03-08 17:07:21 UTC
I'm closing this one out - the fix is in SVN. Please test and re-open if you see
any more problems.
Jeremy.
Comment 10 Kevin Dalley 2005-03-09 17:48:19 UTC
Sorry.  I tried this and still have missing files.  The server machine is a
Windows machine.  After it was rebooted, everything worked fine for an hour or
two.  Now, it's back to failing again.

I'm using the SAMBA_3_0 branch.  I hope that this is the correct branch.  Please
reopen the bug.  I don't seem to have permission to do this.
Comment 11 Jeremy Allison 2005-03-09 18:06:09 UTC
I'm reopening as per request; however, the patch that has gone into the
SAMBA_3_0 tree is identical to Satwik's patch except for the flag bits change.
I'll examine that as a possibility.
Jeremy.
Comment 12 Kevin Dalley 2005-03-09 18:27:11 UTC
I'm confused.  The clilist.c which I have shows your patch, but not Satwik's
patch.  This line is missing, for example.
+			SSVAL(param,4,16+4+2);	/* backup + resume required + close on end */

Here's the info on clilist.c:

% svn info clilist.c
Path: clilist.c
Name: clilist.c
URL: svn://svnanon.samba.org/samba/branches/SAMBA_3_0/source/libsmb/clilist.c
Repository UUID: 0c0555d6-39d7-0310-84fc-f1cc0bd64818
Revision: 5713
Node Kind: file
Schedule: normal
Last Changed Author: jra
Last Changed Rev: 5702
Last Changed Date: 2005-03-08 16:06:27 -0800 (Tue, 08 Mar 2005)
Text Last Updated: 2005-03-09 17:22:42 -0800 (Wed, 09 Mar 2005)
Properties Last Updated: 2005-03-09 13:07:14 -0800 (Wed, 09 Mar 2005)
Checksum: d9e377d33341bd3d8a9beeeffb451a36
Comment 13 Jeremy Allison 2005-03-09 18:45:24 UTC
That's correct - I didn't see how that could affect the problem being described
so I didn't add that part. I'll look into it asap.
Thanks,
Jeremy.
Comment 14 Jeremy Allison 2005-03-09 18:57:07 UTC
Created attachment 1023
Additional patch

Ok, I've committed an additional change after analysing the actions of an XP
client against a Samba server. It never uses the "continue" flag, but always
does "new search, continue from this file" instead. That was the missing part
of the original patch.
Jeremy.
Comment 15 Kevin Dalley 2005-03-10 00:30:13 UTC
This looks good.  I'll be running a bunch of smbclients overnight for backup.
I'll give more information in 12-18 hours.

Now for an admission.  I may have screwed up with my test of the earlier patch.
I think that it did not lose files, though it did have problems with memory. In
any case, the newest patch looks much better.

Thanks.
Comment 16 Kevin Dalley 2005-03-10 17:43:21 UTC
Everything looks fine.  This was one of the cleanest amanda runs I have had. 
I'm sure that the patch was responsible for some of the improvement.  Rebooting
most of the Microsoft machines after yesterday's power failure may have helped
as well.
Comment 17 Jeremy Allison 2005-03-10 18:17:44 UTC
Closing out as we have confirmation it works! Thanks,
Jeremy.
Comment 18 Gerald (Jerry) Carter (dead mail address) 2005-08-24 10:20:11 UTC
Sorry for the spam; cleaning up the database to prevent unnecessary reopens of bugs.