Bug 8701 - Directory listings fail under some situations with large number of files
Status: RESOLVED WONTFIX
Alias: None
Product: Samba 3.2
Classification: Unclassified
Component: File services
Version: 3.2.5
Hardware: x64 Linux
Importance: P5 normal
Target Milestone: ---
Assignee: Volker Lendecke
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2012-01-13 21:10 UTC by Erik
Modified: 2012-01-14 08:36 UTC (History)
0 users

See Also:


Attachments
Server config (9.74 KB, application/octet-stream)
2012-01-13 21:10 UTC, Erik
Debug output from smbclient (9.71 KB, text/plain)
2012-01-13 21:14 UTC, Erik

Description Erik 2012-01-13 21:10:40 UTC
Created attachment 7238
Server config

Server: Debian/Lenny, 2.6.26-2-amd64 
Samba package: 2:3.2.5-4lenny14 

This report was originally filed with Debian (#641598), where it was suggested that I file it here as well.

The folder contains 146 directories and 26461 files at the top level. When listing the folder contents only 4616 items are listed. 
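The server-side counts above can be cross-checked with a short script. This is a minimal sketch, not part of the original report: the function name `count_top_level` is hypothetical, and it only counts entries directly under the given path, without recursing.

```python
import os

def count_top_level(path):
    """Return (directories, files) found directly under path, without recursing."""
    dirs = 0
    files = 0
    with os.scandir(path) as entries:
        for entry in entries:
            # Symlinks are counted as files here rather than followed.
            if entry.is_dir(follow_symlinks=False):
                dirs += 1
            else:
                files += 1
    return dirs, files
```

Run on the server against the shared folder, the two numbers can be compared with the number of items the client actually lists.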

My testing shows this happens with every Linux Samba client implementation I have access to:
  smbclient under Ubuntu/10.04, Debian/Squeeze, Debian/Lenny, Debian/Etch
  GVFS under Ubuntu/10.04

It happens from the server itself using the command line client. It does NOT happen with a Windows XP client.

It also does NOT happen when the files are served by Windows Server 2003 and listed with Linux smbclient or GVFS.

Note that some information has been redacted as it contains names and member numbers of our customers.

What I found is that Linux Samba clients are reporting an error at some point due to a certain file:
  Error: Looping in FIND_NEXT as name xxxxx_gggggggg, Amy.pdf has already been seen?

Using the Samba command line client I see a duplicate entry while listing the files:
...
  xxxxx_gggggggg, Amy.pdf                 396332  Fri Jun 17 11:44:35 2011
  xxxxx_gggggggg, Amy.pdf                 396332  Fri Jun 17 11:44:35 2011
...
The above two lines are the 4499th and 4500th in the list.
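Duplicate entries and their positions can be located in a saved listing with a small script. The following is a minimal sketch that assumes the column layout shown in the excerpt above (name, optional attribute letters, size, then a timestamp such as "Fri Jun 17 11:44:35 2011"). The regular expression and the function name `find_duplicates` are illustrative assumptions, not a documented smbclient format, and a name that itself ends in a space followed by capital letters would be misparsed.

```python
import re
from collections import defaultdict

# Assumed layout: indentation, name, optional attribute letters, size, timestamp.
LINE_RE = re.compile(
    r"^\s+(?P<name>.+?)"
    r"(?:\s+(?P<attrs>[A-Z]+))?"
    r"\s+(?P<size>\d+)"
    r"\s+(?P<stamp>[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d{2}:\d{2}:\d{2} \d{4})\s*$"
)

def find_duplicates(listing_text):
    """Map each name that appears more than once to its 1-based positions."""
    positions = defaultdict(list)
    index = 0
    for line in listing_text.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip prompts, blank lines, and summary lines
        index += 1
        positions[match.group("name")].append(index)
    return {name: where for name, where in positions.items() if len(where) > 1}
```

Positions are counted over the lines that parse as entries, so the "." and ".." entries are included in the numbering if they appear in the saved output.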

The only other copy of the file in that folder is in a subfolder:
# find . -name '*xxxxx*' -print0 |xargs -0 ls -l
-rw-rw-r-- 1 root    DomainUsers 396332 2011-06-17 11:44 ./xxxxx_gggggggg, Amy.pdf
-rw-rw-r-- 1 l.olson DomainUsers 396332 2010-07-02 18:01 ./X St. Co-op PDF Membership Disc 1/xxxxx_gggggggg, Amy.pdf

I verified that there are no binary characters in the file name. 
If I remove the file the problem ceases. 
If I replace it with an empty but identically named file the error happens. 
The error does not happen if I alter the file name with prefixes or suffixes. 
I can prevent the problem from happening by creating an additional file named: "xxxxx_gggggggg, AmyA.pdf". Note the "A" added to the name. Creating "xxxxx_gggggggg\,\ AmyZ.pdf" did nothing.
If I create another folder with 27,000 files (named 1 through 27000) and 200 folders (named d1 through d200) and list its contents, the problem does not happen. If I create an additional file in that folder named "xxxxx_gggggggg, Amy.pdf", the problem still does not happen.
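A synthetic folder like the one described above can be generated as follows. This is a minimal sketch: the function name `make_test_tree` and its parameters are hypothetical, and all files are created empty.

```python
import os

def make_test_tree(root, n_files=27000, n_dirs=200, extra_files=()):
    """Create n_dirs folders (d1..dN) and n_files empty files (1..N) under root."""
    os.makedirs(root, exist_ok=True)
    for i in range(1, n_dirs + 1):
        os.makedirs(os.path.join(root, f"d{i}"), exist_ok=True)
    names = [str(i) for i in range(1, n_files + 1)] + list(extra_files)
    for name in names:
        # Opening in write mode creates an empty file (or truncates an existing one).
        with open(os.path.join(root, name), "w"):
            pass
```

For example, `make_test_tree("/tmp/listing-test", extra_files=["xxxxx_gggggggg, Amy.pdf"])` would build the second variant of the test, with the extra file added; the path here is only a placeholder.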

I created the identical directory structure as the problematic folder with the exception that all the files are empty. The command line client fails in an identical way, on the same file. I continued to test on this directory because it was not production data. 
If I move some directories away the problem ceases. Removing one directory with 65 files in it caused the problem to stop. Removing only the files in that directory did nothing. Removing some other directories does nothing. Whether a directory has this effect does not seem to depend on whether it is before or after the duplicate entries.
As far as I can tell, it has to do with the size of the data being sent rather than the count of files or folders. If I remove the first entry returned by the 'dir' command, the error does not happen on the next listing. If only the 8th item is removed, the error still happens.
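An empty-file copy of a directory structure, like the non-production test directory used above, can be produced with a short script. This is a minimal sketch rather than the exact procedure used for this report: the function name `copy_skeleton` is hypothetical, and it preserves only directory and file names, not ownership, permissions, or timestamps.

```python
import os

def copy_skeleton(src, dst):
    """Recreate the directory tree of src under dst, with every file left empty."""
    for dirpath, _dirnames, filenames in os.walk(src):
        rel = os.path.relpath(dirpath, src)
        target = os.path.normpath(os.path.join(dst, rel))
        os.makedirs(target, exist_ok=True)
        for name in filenames:
            # Create an empty file with the same name; the contents are not copied.
            with open(os.path.join(target, name), "w"):
                pass
```

Because only names are kept, a copy like this reproduces the listing payload the server has to send without exposing file contents.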

I copied the test folder to another Lenny server with the same Samba version but a different smb.conf config. I was *NOT* able to reproduce the problem there.

Other notes:
  Samba shares are on an ext3 file system.
  The server where the problem happens is a production server so there are limits to the kind of testing I can do.

I am willing to send the following information, but not publicly, because the filenames themselves contain names and membership numbers of our customers:
  tcpdump capture of communications with server
  tarball of the directory (with empty files)
  debug log (-d10) from smbclient
  debug log from server (please let me know how you want this collected)
Comment 1 Erik 2012-01-13 21:14:45 UTC
Created attachment 7239
Debug output from smbclient
Comment 2 Volker Lendecke 2012-01-14 08:36:05 UTC
Can you reproduce this problem with Samba 3.6? Samba 3.2 has been out of upstream support for a while. Please re-open this bug if you can reproduce it with either 3.5.12 or 3.6.1.

Thanks,

Volker