The Samba-Bugzilla – Bug 2271
files missing in directory listing from smbclient 'dir' command with windows xp server
Last modified: 2005-08-24 10:20:11 UTC
This is the same as bug 939, but I can't reopen it because I don't own it. I
have tested with smbclient 3.0.10.
The XP server is SP1.
The directory has around 6000 files in it.
Different files will be missing from a 'dir' command at random.
The traces show find_next2 continuation packets with missing data - although
ethereal says the checksum etc was OK.
I have traces available if necessary.
It's critical because we can't really constantly dir the directory to get a good
union of all the files.
Note that sometimes at random the listing will work fine with smbclient. Grr.
Also if I don't use smbclient - but use the linux kernel's (2.6.9) filesystem I
see this in syslog:
smb_proc_readdir_long: error=-512, breaking
I have seen this behaviour a number of times too, especially when the number of
entries in a directory is > 1000 (not a typical testing scenario), and not just
with Windows XP but with earlier Windows versions too. I believe this is not due
to some incorrect implementation of the CIFS protocol. The directory listing
functionality in Samba depends on the CIFS server to maintain state information
(how many directory entries have been passed to the client in FIND_FIRST/NEXT
responses so far, and where to resume listing from for the next FIND_NEXT
request). However, it is entirely possible that the CIFS server doesn't handle
this properly and skips files at random now and then.
There is an alternative: we maintain the state info (last file received)
ourselves and send it to the CIFS server in the next request so it can start
listing from the next file. I have tried this fix on samba-3.0.9 and it seems
to work. I am figuring out how to get this patch into the Samba Subversion
repository.
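
To make the difference between the two approaches concrete, here is a minimal Python simulation. It is not Samba code: both function names, the batch size, and the "server loses one entry per resume" fault model are assumptions made up for illustration. It only shows why resuming from the last received name does not depend on the server remembering where it stopped.

```python
# Hypothetical simulation of the two resume strategies; not Samba code.

def list_with_server_cursor(entries, batch_size, lost_per_resume):
    """Server tracks the position; a faulty server skips entries on resume."""
    seen, pos = [], 0
    while pos < len(entries):
        seen.extend(entries[pos:pos + batch_size])
        # Assumed fault model: the server's cursor jumps past a few entries.
        pos += batch_size + lost_per_resume
    return seen


def list_with_client_resume(entries, batch_size):
    """Client sends the last name it received; server restarts right after it."""
    seen, last_name = [], None
    while True:
        # Server side: the resume point is derived from the name alone.
        start = 0 if last_name is None else entries.index(last_name) + 1
        batch = entries[start:start + batch_size]
        if not batch:
            break
        seen.extend(batch)
        last_name = batch[-1]
    return seen


files = [f"file{i:04d}.dat" for i in range(6000)]
broken = list_with_server_cursor(files, batch_size=100, lost_per_resume=1)
fixed = list_with_client_resume(files, batch_size=100)
print(len(files) - len(broken), "entries missing with server-side cursor")
print(len(files) - len(fixed), "entries missing with client-side resume")
```

The simulation says nothing about the cost on a real server, where locating the resume point by name on every request may be more expensive than following a cursor.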
Please send in the patch as a diff -u.
Created attachment 950
Diff -u output for clilist.c patch to fix samba dir listing problem
Attaching the diff -u output. You may see the backup operator flag turned on in
the FIND_* request, but it isn't actually necessary. All I've basically done is
read the lastname from the listing, extract the lastname info into a
file_info struct, and then use that name as the mask for the next FIND_* call. I
have called the function from Python modules and seen it working. I hope it
isn't broken if called from elsewhere.
(In reply to comment #0)
> This is the same as bug 939 but I can't reopen it because I don't own it. And I
> have tested with smbclient 3.0.10
I see similar problems against a Win 2K server, using both 3.0.6 and 3.0.10
smbclient (on Mandrake 10.0). Additionally, this (not surprisingly) affects
a recursive mget.
One point I didn't see made in the original post: this seems to
affect some files more than others. In our case one file might
have problems half the time while others are seen/transferred
every time in 10 tests. This may have something to do with timing
or directory structure on the server. There is nothing special about
the files involved. At least they don't have peculiar names and
they aren't either zero bytes or particularly large.
A fix sooner rather than later would be very helpful, since this problem
is currently requiring a lot of manual intervention in an otherwise
fully automated data transfer process.
I tried Satwik's patch for solving the problems of the missing files.
Files stopped disappearing, but another problem cropped up.
This patch causes some smbclient commands to run *very* slowly, and
grow to an amazing size. I tried removing the backup flag, as Satwik
suggests. While smbclient still works, the test case is still slow.
First, my use and a description of the problem.
I am using a patched version of samba-3.0.10, which includes Debian
patches, on Debian Linux on an x86.
I use amanda for backing up file systems, using samba for its Windows
backups. When Satwik's patch is used, smbclient consistently runs
slowly on one of my file systems, and consistently runs at a
reasonable speed on most of my file systems, including some file
systems which previously lost files.
Here's one command which runs slowly with the patch:
smbclient '\\kinka\c$' -d 0 -U amanda -E -c 'recurse;du'
Without the patch, the command completes quickly.
kevin@nereocystis:/tmp$ time smbclient '\\kinka\c$' -d 0 -U amanda -E -c
39958 blocks of size 65536. 2277 blocks available
Total number of bytes: 2328492578
With Satwik's patch, after a few minutes, smbclient grows to > 1GB of
memory. One run went on for a few hours without returning; the run
shown below I killed after nearly 9 minutes.
Memory usage, from ps:
kevin 9443 8.1 55.6 1051560 287084 pts/4 S+ 11:11 0:41 smbclient
\\kinka\c$ -d 0 -U amanda -E -c recurse;du
Timing of command, killed after a while
kevin@nereocystis:/tmp$ time smbclient '\\kinka\c$' -d 0 -U amanda -E -c
The real time includes the time to enter the password, so it is
When I add the continue flag to this line
SSVAL(param,10,/*8+*/4+2); /* continue + continue + resume required + close
on end */
the tests fail, but the running time is back down.
Here's the test I use.
for num in 1 2 3 4 5 6 7 8 9; do smbclient '\\puffin\c$' -U 'amanda%password'
-E -d1 -Tcan /dev/null '/Kathy/DATA/2120 WNmod/*' > /tmp/foo/foolog$num.txt
for num in 2 3 4 5 6 7 8 9; do diff -u /tmp/foo/foolog1.txt
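
The same comparison can be done on parsed listings instead of raw logs. The following is a small sketch, assuming each run's output has already been reduced to a list of file names; the function name `missing_per_run` and the sample names are hypothetical.

```python
# Hypothetical helper: report which names each run missed relative to
# the union of all runs.

def missing_per_run(listings):
    """Map run index -> sorted names seen in some run but absent from this one."""
    union = set().union(*listings)
    report = {}
    for index, names in enumerate(listings):
        missing = union - set(names)
        if missing:
            report[index] = sorted(missing)
    return report


runs = [
    ["a.txt", "b.txt", "c.txt"],
    ["a.txt", "c.txt"],            # this run lost b.txt
    ["a.txt", "b.txt", "c.txt"],
]
print(missing_per_run(runs))
```

A file that is dropped in every run will not show up in the union, so this only catches intermittent losses.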
Created attachment 1019
Patch I've committed
This turned out to be a problem in the server code as well as the
client. I've committed the attached patch which I've tested under
valgrind (and it doesn't leak memory).
I'm closing this one out - the fix is in SVN. Please test and re-open if you see
any more problems.
Sorry. I tried this and still have missing files. The server machine is a
Windows machine. After it was rebooted, everything worked fine for an hour or
two. Now, it's back to failing again.
I'm using the SAMBA_3_0 branch. I hope that this is the correct branch. Please
reopen the bug. I don't seem to have permission to do this.
I'm reopening as per request; however, the patch that has gone into the
SAMBA_3_0 tree is identical to Satwik's patch except for the flag bits change.
I'll examine that as a possibility.
I'm confused. The clilist.c which I have shows your patch, but not Satwik's
patch. This line is missing, for example.
+ SSVAL(param,4,16+4+2); /* backup + resume required + close on end */
Here's the info on clilist.c:
% svn info clilist.c
Repository UUID: 0c0555d6-39d7-0310-84fc-f1cc0bd64818
Node Kind: file
Last Changed Author: jra
Last Changed Rev: 5702
Last Changed Date: 2005-03-08 16:06:27 -0800 (Tue, 08 Mar 2005)
Text Last Updated: 2005-03-09 17:22:42 -0800 (Wed, 09 Mar 2005)
Properties Last Updated: 2005-03-09 13:07:14 -0800 (Wed, 09 Mar 2005)
That's correct - I didn't see how that could affect the problem being described
so I didn't add that part. I'll look into it asap.
Created attachment 1023
Ok, I've committed an additional change after analysing the actions of an XP
client against a Samba server. It never uses the "continue" flag, but always
does "new search, continue from this file" instead. That was the missing part
of the original patch.
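
For readers following the flag arithmetic in the SSVAL lines quoted earlier in this thread, here is a small Python sketch that decodes the values. The bit names are taken from the code comments in this thread (2 = close on end, 4 = resume required, 8 = continue, 16 = backup); the exact bit-to-name mapping is my reading of those comments, and the `FindFlags` class and `describe` helper are hypothetical, not part of Samba.

```python
from enum import IntFlag


class FindFlags(IntFlag):
    """Flag bits as named in the thread's code comments (assumed mapping)."""
    CLOSE_ON_END = 2
    RESUME_REQUIRED = 4
    CONTINUE = 8
    BACKUP = 16


def describe(value):
    """Return the names of all known flag bits set in `value`."""
    return [flag.name for flag in FindFlags if value & flag]


print(describe(16 + 4 + 2))  # flags used in Satwik's patch
print(describe(4 + 2))       # same, with the continue bit commented out
print(describe(8 + 4 + 2))   # with the continue bit added back
```

In these terms, the behaviour observed from the XP client corresponds to leaving the CONTINUE bit clear and naming the file to resume from in each request.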
This looks good. I'll be running a bunch of smbclients overnight for backup.
I'll give more information in 12-18 hours.
Now for an admission. I may have screwed up with my test of the earlier patch.
I think that it did not lose files, though it did have problems with memory. In
any case, the newest patch looks much better.
Everything looks fine. This was one of the cleanest amanda runs I have had.
I'm sure that the patch was responsible for some of the improvement. Rebooting
most of the Microsoft machines after yesterday's power failure may have helped
Closing out as we have confirmation it works! Thanks,
Sorry for the spam, cleaning up the database to prevent unnecessary reopens of bugs.