Bug 3835 - smbc_opendir on large directories returns wrong values w/ short timeout
Status: RESOLVED FIXED
Alias: None
Product: Samba 3.0
Classification: Unclassified
Component: libsmbclient
Version: 3.0.23
Hardware: x86 Other
Importance: P3 normal
Target Milestone: none
Assignee: Derrell Lipman
QA Contact: Samba QA Contact
Reported: 2006-06-14 04:46 UTC by Henrik
Modified: 2006-09-04 09:17 UTC

Attachments
Small C++ test file using the example code from Samba. (5.70 KB, application/octet-stream)
2006-06-14 04:48 UTC, Henrik
Full backtrace of testprogram (19.40 KB, text/plain)
2006-06-30 10:28 UTC, Henrik
Small backtrace of testprogram (1.18 KB, text/plain)
2006-06-30 10:29 UTC, Henrik

Description Henrik 2006-06-14 04:46:13 UTC
I'm trying to traverse a directory tree using libsmbclient, but I get some strange errors that I've failed to catch. I've attached a .cpp file with some test code.
I know my URLs aren't fully URLEncoded, but that is not the problem here.

If you have boost installed, compile the program with:
g++ -O3 -s traversedirtree.cpp -lsmbclient -lboost_regex -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE

If you do NOT have boost installed, then comment out the first line "#define HAVEBOOST" and compile with:
g++ -O3 -s traversedirtree.cpp -lsmbclient -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE

g++ --version
Compiled with g++ (GCC) 4.0.2 20051125 (Red Hat 4.0.2-8)
uname -a
Linux fedora 2.6.13-1.1532_FC4smp #1 SMP Thu Oct 20 01:51:51 EDT 2005 i686 i686 i386 GNU/Linux
smbclient --version
Version 3.0.23rc1


In the output you can see "[0](URL)" and "[/0]". smbc_opendir is called after "[0](URL)" is printed and returns before "[/0]" is printed. In between, "Could not open directory" should be printed if smbc_opendir returns a value < 0.

Example

First run, files are not in file cache:

[jonas@fedora]$ ./a.out smb://10.168.1.135/C$/Webkit/LayoutTests/dom/xhtml/level3/core
[0](smb://10.168.1.135/C$/Webkit/LayoutTests/dom/xhtml/level3/core)[/0]
2164 files, 2 dirs, 0 bytes.[0](smb://10.168.1.135/C$/Webkit/LayoutTests/dom/xhtml/level3/core/.svn)[/0]
2169 files, 6 dirs, 0 bytes.[0](smb://10.168.1.135/C$/Webkit/LayoutTests/dom/xhtml/level3/core/resources)[/0]
2169 files, 8 dirs, 0 bytes.[0](smb://10.168.1.135/C$/Webkit/LayoutTests/dom/xhtml/level3/core/.svn/prop-base)[/0]
2799 files, 8 dirs, 0 bytes.[0](smb://10.168.1.135/C$/Webkit/LayoutTests/dom/xhtml/level3/core/.svn/props)write_data: write failure in writing to client . Error Bad file descriptor
[/0]
4964 files, 8 dirs, 0 bytes.[0](smb://10.168.1.135/C$/Webkit/LayoutTests/dom/xhtml/level3/core/.svn/text-base)[/0]
7129 files, 8 dirs, 0 bytes.[0](smb://10.168.1.135/C$/Webkit/LayoutTests/dom/xhtml/level3/core/.svn/tmp)[/0]
7130 files, 11 dirs, 0 bytes.[0](smb://10.168.1.135/C$/Webkit/LayoutTests/dom/xhtml/level3/core/.svn/wcprops)[/0]
7130 files, 12 dirs, 0 bytes.[0](smb://10.168.1.135/C$/Webkit/LayoutTests/dom/xhtml/level3/core/resources/.svn)[/0]
7134 files, 16 dirs, 0 bytes.[0](smb://10.168.1.135/C$/Webkit/LayoutTests/dom/xhtml/level3/core/.svn/tmp/prop-base)[/0]
7134 files, 17 dirs, 0 bytes.[0](smb://10.168.1.135/C$/Webkit/LayoutTests/dom/xhtml/level3/core/.svn/tmp/props)[/0]
7134 files, 17 dirs, 0 bytes.[0](smb://10.168.1.135/C$/Webkit/LayoutTests/dom/xhtml/level3/core/.svn/tmp/text-base)[/0]
7134 files, 17 dirs, 0 bytes.[0](smb://10.168.1.135/C$/Webkit/LayoutTests/dom/xhtml/level3/core/.svn/tmp/wcprops)[/0]
7134 files, 17 dirs, 0 bytes.[0](smb://10.168.1.135/C$/Webkit/LayoutTests/dom/xhtml/level3/core/resources/.svn/prop-base)[/0]
7134 files, 17 dirs, 0 bytes.[0](smb://10.168.1.135/C$/Webkit/LayoutTests/dom/xhtml/level3/core/resources/.svn/props)[/0]
7135 files, 17 dirs, 0 bytes.[0](smb://10.168.1.135/C$/Webkit/LayoutTests/dom/xhtml/level3/core/resources/.svn/text-base)[/0]
7136 files, 17 dirs, 0 bytes.[0](smb://10.168.1.135/C$/Webkit/LayoutTests/dom/xhtml/level3/core/resources/.svn/tmp)[/0]
7137 files, 20 dirs, 0 bytes.[0](smb://10.168.1.135/C$/Webkit/LayoutTests/dom/xhtml/level3/core/resources/.svn/wcprops)[/0]
7137 files, 21 dirs, 0 bytes.[0](smb://10.168.1.135/C$/Webkit/LayoutTests/dom/xhtml/level3/core/resources/.svn/tmp/prop-base)[/0]
7137 files, 21 dirs, 0 bytes.[0](smb://10.168.1.135/C$/Webkit/LayoutTests/dom/xhtml/level3/core/resources/.svn/tmp/props)[/0]
7137 files, 21 dirs, 0 bytes.[0](smb://10.168.1.135/C$/Webkit/LayoutTests/dom/xhtml/level3/core/resources/.svn/tmp/text-base)[/0]
7137 files, 21 dirs, 0 bytes.[0](smb://10.168.1.135/C$/Webkit/LayoutTests/dom/xhtml/level3/core/resources/.svn/tmp/wcprops)[/0]
7137 files, 21 dirs, 0 bytes.
[jonas@fedora]$

On line 5 of the first run, there is an error (write_data: write failure in writing to client . Error Bad file descriptor) but smbc_opendir did NOT return a value < 0.
Also notice the file count of 7137.

Second run, files are in file cache:

[jonas@fedora]$ ./a.out smb://10.168.1.135/C$/Webkit/LayoutTests/dom/xhtml/level3/core
[0](smb://10.168.1.135/C$/Webkit/LayoutTests/dom/xhtml/level3/core)[/0]
2164 files, 2 dirs, 0 bytes.[0](smb://10.168.1.135/C$/Webkit/LayoutTests/dom/xhtml/level3/core/.svn)[/0]
2169 files, 6 dirs, 0 bytes.[0](smb://10.168.1.135/C$/Webkit/LayoutTests/dom/xhtml/level3/core/resources)[/0]
2169 files, 8 dirs, 0 bytes.[0](smb://10.168.1.135/C$/Webkit/LayoutTests/dom/xhtml/level3/core/.svn/prop-base)[/0]
4334 files, 8 dirs, 0 bytes.[0](smb://10.168.1.135/C$/Webkit/LayoutTests/dom/xhtml/level3/core/.svn/props)[/0]
6499 files, 8 dirs, 0 bytes.[0](smb://10.168.1.135/C$/Webkit/LayoutTests/dom/xhtml/level3/core/.svn/text-base)[/0]
8664 files, 8 dirs, 0 bytes.[0](smb://10.168.1.135/C$/Webkit/LayoutTests/dom/xhtml/level3/core/.svn/tmp)[/0]
8665 files, 11 dirs, 0 bytes.[0](smb://10.168.1.135/C$/Webkit/LayoutTests/dom/xhtml/level3/core/.svn/wcprops)[/0]
8665 files, 12 dirs, 0 bytes.[0](smb://10.168.1.135/C$/Webkit/LayoutTests/dom/xhtml/level3/core/resources/.svn)[/0]
8669 files, 16 dirs, 0 bytes.[0](smb://10.168.1.135/C$/Webkit/LayoutTests/dom/xhtml/level3/core/.svn/tmp/prop-base)[/0]
8669 files, 17 dirs, 0 bytes.[0](smb://10.168.1.135/C$/Webkit/LayoutTests/dom/xhtml/level3/core/.svn/tmp/props)[/0]
8669 files, 17 dirs, 0 bytes.[0](smb://10.168.1.135/C$/Webkit/LayoutTests/dom/xhtml/level3/core/.svn/tmp/text-base)[/0]
8669 files, 17 dirs, 0 bytes.[0](smb://10.168.1.135/C$/Webkit/LayoutTests/dom/xhtml/level3/core/.svn/tmp/wcprops)[/0]
8669 files, 17 dirs, 0 bytes.[0](smb://10.168.1.135/C$/Webkit/LayoutTests/dom/xhtml/level3/core/resources/.svn/prop-base)[/0]
8669 files, 17 dirs, 0 bytes.[0](smb://10.168.1.135/C$/Webkit/LayoutTests/dom/xhtml/level3/core/resources/.svn/props)[/0]
8670 files, 17 dirs, 0 bytes.[0](smb://10.168.1.135/C$/Webkit/LayoutTests/dom/xhtml/level3/core/resources/.svn/text-base)[/0]
8671 files, 17 dirs, 0 bytes.[0](smb://10.168.1.135/C$/Webkit/LayoutTests/dom/xhtml/level3/core/resources/.svn/tmp)[/0]
8672 files, 20 dirs, 0 bytes.[0](smb://10.168.1.135/C$/Webkit/LayoutTests/dom/xhtml/level3/core/resources/.svn/wcprops)[/0]
8672 files, 21 dirs, 0 bytes.[0](smb://10.168.1.135/C$/Webkit/LayoutTests/dom/xhtml/level3/core/resources/.svn/tmp/prop-base)[/0]
8672 files, 21 dirs, 0 bytes.[0](smb://10.168.1.135/C$/Webkit/LayoutTests/dom/xhtml/level3/core/resources/.svn/tmp/props)[/0]
8672 files, 21 dirs, 0 bytes.[0](smb://10.168.1.135/C$/Webkit/LayoutTests/dom/xhtml/level3/core/resources/.svn/tmp/text-base)[/0]
8672 files, 21 dirs, 0 bytes.[0](smb://10.168.1.135/C$/Webkit/LayoutTests/dom/xhtml/level3/core/resources/.svn/tmp/wcprops)[/0]
8672 files, 21 dirs, 0 bytes.
[jonas@fedora]$

No errors, and we found all 8672 files and 21 dirs, which are the correct numbers.
Obviously something happened on the smbc_opendir during the first run, but there was no error reported.

According to the comments in libsmbclient.h smbc_opendir should return the following:
	"Valid directory handle. < 0 on error with errno set"


We are using a shorter timeout than the standard 20 seconds, as you can see in the code, which apparently affects this problem.
Comment 1 Henrik 2006-06-14 04:48:01 UTC
Created attachment 1960
Small C++ test file using the example code from Samba.

Just a small C++ test program that traverses some dirs.
Comment 2 Derrell Lipman 2006-06-26 21:08:18 UTC
This error appears to be occurring when a keep-alive packet can't be sent.  All other calls to the function which issues that error, write_data(), have other DEBUG messages that would have appeared.  If the error is occurring on a keep-alive packet, it is due to smbc_check_server() being called to ensure that the server is still good.  If the server is not good (the keep-alive could not be sent), the connection is reestablished.

Although I'm not sure why the connection is becoming gone, recovery seems to be occurring properly (which is why smbc_opendir() didn't return an error).  You didn't indicate that any entries were missing from the tree scan.

I believe this is an innocuous debug message.  Please re-open the bug if you have additional information that indicates that something else is occurring.

Derrell

Comment 3 Henrik 2006-06-27 00:34:42 UTC
Hi Derrell
(In reply to comment #2)
> This error appears to be occurring when a keep-alive packet can't be sent.  All
> other calls to the function which issues that error, write_data(), have other
> DEBUG messages that would have appeared.  If the error is occurring on a
> keep-alive packet, it is due to smbc_check_server() being called to ensure that
> the server is still good.  If the server is not good (the keep-alive could not
> be sent), the connection is reestablished.
> 
We get this error very often if we stress the hard drives while doing the opendir call. As you can see, we don't get the error when the files on the client have been enumerated once, since on the second run they are still in the file cache. If we clear the file cache, the problem reoccurs.

> Although I'm not sure why the connection is becoming gone, recovery seems to be
> occurring properly (which is why smbc_opendir() didn't return an error).  You
> didn't indicate that any entries were missing from the tree scan.
We are missing about 1500 files (we get only 7137 instead of 8672). We also get bad file descriptor errors. (Look at the two runs in the initial bug description.)
> 
> I believe this is an innocuous debug message.  Please re-open the bug if you
> have additional information that indicates that something else is occurring.
> 

Well, I can understand that timeouts can occur while opening large directories when you use shorter timeouts. The problem is that I can't catch the timeout error and retry with a longer timeout if I get the error. The error must be catchable or I can never write code to handle it. :-)
Is there a way to make opendir retrieve only a limited number of entries at a time?
> Derrell
> 
Cheers, Henrik
Comment 4 Derrell Lipman 2006-06-28 21:12:25 UTC
I'm trying to figure out how what's happening is happening.  Either write_data() is being passed fd == -1 or client_fd != -1, and neither one of those should ever be true in a simple libsmbclient application.

Henrik, would you please compile your test application and samba with -g -O0 and run it in a debugger?  Set a breakpoint at the DEBUG message in write_data(), at around source/lib/util_sock.c:561.  Run your application, and when it stops at the breakpoint, obtain a complete stack backtrace, including parameter values, and attach it to this bug.

Thanks!
Comment 5 Henrik 2006-06-30 10:28:56 UTC
Created attachment 2007
Full backtrace of testprogram
Comment 6 Henrik 2006-06-30 10:29:42 UTC
Created attachment 2008
Small backtrace of testprogram
Comment 7 Henrik 2006-06-30 10:31:10 UTC
Well it looks like we get fd == -1 after all.
I've attached both a full and a smaller backtrace.
Let me know if you need any more info.

Cheers,
Henrik
Comment 8 Derrell Lipman 2006-07-13 12:20:30 UTC
Henrik, the only place I can find that closes the connection without generating a DEBUG message is at clientgen.c:114.  Please apply the following patch and see if you get the new message displayed during the failure case.

Index: source/libsmb/clientgen.c
===================================================================
--- source/libsmb/clientgen.c	(revision 17018)
+++ source/libsmb/clientgen.c	(working copy)
@@ -109,6 +109,7 @@
 	/* If the server is not responding, note that now */
 
 	if (!ret) {
+                DEBUG(0, ("Receiving SMB: Server stopped responding\n"));
 		cli->smb_rw_error = smb_read_error;
 		close(cli->fd);
 		cli->fd = -1;
Comment 9 Derrell Lipman 2006-07-14 13:38:45 UTC
Checking a little bit further...  smb_read_error is defined as 0, so cli_is_error() does not recognize that an error has occurred.  Please try this patch to see if it fixes the problem.  I need to check with others to ensure that this doesn't break something else, but it'd be nice to know that the problem area has been found.

Index: libsmb/clientgen.c
===================================================================
--- libsmb/clientgen.c	(revision 17018)
+++ libsmb/clientgen.c	(working copy)
@@ -79,7 +79,6 @@
 
 BOOL cli_receive_smb(struct cli_state *cli)
 {
-	extern int smb_read_error;
 	BOOL ret;
 
 	/* fd == -1 causes segfaults -- Tom (tom@ninja.nl) */
@@ -109,7 +108,8 @@
 	/* If the server is not responding, note that now */
 
 	if (!ret) {
-		cli->smb_rw_error = smb_read_error;
+                DEBUG(0, ("Receiving SMB: Server stopped responding\n"));
+		cli->smb_rw_error = READ_TIMEOUT;
 		close(cli->fd);
 		cli->fd = -1;
 		return ret;
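
To illustrate why a zero-valued error code goes unnoticed, here is a small self-contained C++ simulation. The names RwError, Connection, is_error and mark_receive_failed are hypothetical and only model the behaviour described in this comment; they are not the actual Samba definitions.

```cpp
#include <cassert>

// Hypothetical error codes modelled on the discussion; not Samba's own.
enum RwError { RW_OK = 0, RW_READ_TIMEOUT = 1 };

// Hypothetical connection state for the simulation.
struct Connection {
    int fd = 3;                 // pretend socket descriptor
    RwError rw_error = RW_OK;   // last read/write error recorded
};

// An error check in the style described above: only a non-zero code counts.
bool is_error(const Connection& c) { return c.rw_error != RW_OK; }

// Called when a receive fails. `code` is what gets recorded on the
// connection; the descriptor is closed either way.
void mark_receive_failed(Connection& c, RwError code) {
    assert(c.fd >= 0);          // only meaningful on an open connection
    c.rw_error = code;
    c.fd = -1;
}
```

Recording RW_OK closes the descriptor but leaves is_error() false, which would match the symptom of a later write hitting fd == -1 without any error having been reported. Recording a non-zero code makes the failure visible to the caller.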
Comment 10 Henrik 2006-08-01 16:31:17 UTC
Hello Derrell,

Sorry for taking so long, but somehow I wasn't notified by Bugzilla that you had posted the patch.
I'll test this first thing in the morning.

Cheers,
Henrik
Comment 11 Henrik 2006-08-02 09:15:38 UTC
Hi Derrell,

It seems that you found the right problem area.

Every time we get errors we get "Receiving SMB: Server stopped responding", BUT smbc_opendir only sometimes returns a value < 0 when the error occurs, so we still get bad file descriptors further down.

Here is the call in our code (this is from the testprogram pasted before so you can find it all there)

    // retrieve contents of current directory
    errno = 0;
    std::cout << "[0](" << strEscapedDirPath << ")" << std::flush;
    if( (dir = smbc_opendir(strEscapedDirPath.c_str())) < 0 )
    {
        std::cout << "[1]Could not open directory [" << strEscapedDirPath << "] (" << errno << ":" << strerror(errno) << ")[/1]" << std::endl;
        smbc_closedir( dir );
        continue;
    }
    std::cout << "(" << dir << ")[/0]" << std::endl;




Here is an output of a run:

./test smb://10.168.1.133/C$/Webkit/LayoutTests/dom/xhtml/level3/core
[0](smb://10.168.1.133/C$/Webkit/LayoutTests/dom/xhtml/level3/core)Receiving SMB: Server stopped responding
(10000)[/0]
874 files, 1 dirs, 0 bytes.[0](smb://10.168.1.133/C$/Webkit/LayoutTests/dom/xhtml/level3/core/.svn)write_data: write failure in writing to client . Error Bad file descriptor
(10000)[/0]
879 files, 2 dirs, 0 bytes.[0](smb://10.168.1.133/C$/Webkit/LayoutTests/dom/xhtml/level3/core/.svn/prop-base)Receiving SMB: Server stopped responding
[1]Could not open directory [smb://10.168.1.133/C$/Webkit/LayoutTests/dom/xhtml/level3/core/.svn/prop-base] (110:Connection timed out)[/1]
[0](smb://10.168.1.133/C$/Webkit/LayoutTests/dom/xhtml/level3/core/.svn/props)write_data: write failure in writing to client . Error Bad file descriptor
Receiving SMB: Server stopped responding
(10000)[/0]
980 files, 3 dirs, 0 bytes.[0](smb://10.168.1.133/C$/Webkit/LayoutTests/dom/xhtml/level3/core/.svn/text-base)write_data: write failure in writing to client . Error Bad file descriptor
Receiving SMB: Server stopped responding
(10000)[/0]
1 082 files, 4 dirs, 0 bytes.[0](smb://10.168.1.133/C$/Webkit/LayoutTests/dom/xhtml/level3/core/.svn/tmp)write_data: write failure in writing to client . Error Bad file descriptor
(10000)[/0]
1 083 files, 5 dirs, 0 bytes.[0](smb://10.168.1.133/C$/Webkit/LayoutTests/dom/xhtml/level3/core/.svn/wcprops)(10000)[/0]
1 083 files, 6 dirs, 0 bytes.[0](smb://10.168.1.133/C$/Webkit/LayoutTests/dom/xhtml/level3/core/.svn/tmp/prop-base)(10000)[/0]
1 083 files, 7 dirs, 0 bytes.[0](smb://10.168.1.133/C$/Webkit/LayoutTests/dom/xhtml/level3/core/.svn/tmp/props)(10000)[/0]
1 083 files, 8 dirs, 0 bytes.[0](smb://10.168.1.133/C$/Webkit/LayoutTests/dom/xhtml/level3/core/.svn/tmp/text-base)(10000)[/0]
1 083 files, 9 dirs, 0 bytes.[0](smb://10.168.1.133/C$/Webkit/LayoutTests/dom/xhtml/level3/core/.svn/tmp/wcprops)(10000)[/0]
1 083 files, 10 dirs, 0 bytes.



Cheers,
Henrik
Comment 12 Derrell Lipman 2006-08-02 09:26:23 UTC
Please confirm that you applied the *second* patch I sent (comment 9), not the first (comment 8).  Ensure that the patch included:

+               cli->smb_rw_error = READ_TIMEOUT;

Thanks,

Derrell
Comment 13 Henrik 2006-08-02 12:40:54 UTC
Yes, that is confirmed.

smbc_opendir still doesn't always return values < 0 even though the new DEBUG message is printed.
Could there be something wrong with the way smbc_opendir handles smb_rw_error?

I applied your patch to samba-latest.

Cheers,
Henrik
Comment 14 Derrell Lipman 2006-08-02 12:51:06 UTC
Ok.  I'll need to look at it again with this new information in mind.  If I haven't gotten back to you by the middle of next week, bug me.

Derrell
Comment 15 Derrell Lipman 2006-08-05 21:33:05 UTC
It looks like when cli_list() has retrieved any files, it does not return an error when the connection times out.  Please try this patch.  It's probably not the correct fix, in that cli_list() should probably handle this error trapping directly, but for the time being, let's try catching it in smbc_opendir_ctx()...

Index: source/libsmb/libsmbclient.c
===================================================================
--- source/libsmb/libsmbclient.c	(revision 17431)
+++ source/libsmb/libsmbclient.c	(working copy)
@@ -2877,7 +2877,8 @@
 			
 			if (cli_list(targetcli, targetpath,
                                      aDIR | aSYSTEM | aHIDDEN,
-                                     dir_list_fn, (void *)dir) < 0) {
+                                     dir_list_fn, (void *)dir) < 0 ||
+                            cli_is_error(targetcli)) {
 
 				if (dir) {
 					SAFE_FREE(dir->fname);
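
The situation described in this comment, a listing call that returns a non-negative count even though the connection died partway through, can be sketched with a small simulation. Conn, list_dir and the two listing_ok_* helpers are hypothetical names made up for illustration; the real cli_list() signature and internals are not reproduced here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical connection state for the simulation.
struct Conn {
    bool timed_out = false;   // set when a receive times out
};

// Simulated listing: delivers entries in batches and returns how many entries
// were received. If the connection dies once `fail_after_batches` batches have
// arrived, it stops early but still returns the non-negative count so far.
int list_dir(Conn& conn, int total_batches, int batch_size,
             int fail_after_batches, std::vector<std::string>& out) {
    assert(batch_size > 0);
    for (int b = 0; b < total_batches; ++b) {
        if (b == fail_after_batches) {
            conn.timed_out = true;   // error is recorded on the connection only
            break;
        }
        for (int i = 0; i < batch_size; ++i)
            out.push_back("entry" + std::to_string(b * batch_size + i));
    }
    return static_cast<int>(out.size());
}

// Checking only the return value misses the truncated listing.
bool listing_ok_naive(int rc) { return rc >= 0; }

// Checking the connection's error state as well catches it.
bool listing_ok_checked(int rc, const Conn& conn) {
    return rc >= 0 && !conn.timed_out;
}
```

With four batches of 100 entries and a failure after the second batch, the naive check accepts a listing of 200 entries, while the combined check rejects it. That is the same shape as adding the cli_is_error() test next to the return-value test in the patch.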
Comment 16 Derrell Lipman 2006-08-05 21:34:42 UTC
BTW, the patch in comment 15 should be applied IN ADDITION TO the patch in comment 9.

Derrell
Comment 17 Henrik 2006-08-07 08:35:15 UTC
Hi Derrell,

Well, it seems that the last patch at least ensures that we always get a timeout error when calling smbc_opendir on large dirs. So we are now able to catch our exceptions.

We still got some weird bad file descriptors but they disappeared when we implemented our latest solution which works as follows:

1. Call smbc_opendir and check that we got a valid handle back.
2. If we get an error, we free our context, initialize a new one with a longer timeout than on the previous try, and try again. We do this 3 times.

We have tested this solution for some time and we always get the right amount of files back.
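
The retry scheme described above can be sketched roughly as follows. This is a self-contained simulation, not the actual test program: OpenFn stands in for "create a context with this timeout and call smbc_opendir", and open_with_retry, OpenResult and slow_server_open are hypothetical names. Doubling the timeout on each attempt is an assumption; the comment only says that each retry uses a longer timeout.

```cpp
#include <cassert>
#include <cerrno>
#include <functional>
#include <vector>

// Hypothetical stand-in for "set up a context with this timeout and open the
// directory": returns a handle >= 0 on success, or < 0 with errno set.
using OpenFn = std::function<int(int timeout_ms)>;

struct OpenResult {
    int handle;               // >= 0 on success, -1 if every attempt failed
    int last_errno;           // errno from the final failed attempt, else 0
    std::vector<int> tried;   // timeouts that were attempted, in order
};

// Try to open with first_timeout_ms, doubling the timeout after each failure,
// for at most max_attempts attempts.
OpenResult open_with_retry(const OpenFn& open_dir, int first_timeout_ms,
                           int max_attempts) {
    assert(first_timeout_ms > 0 && max_attempts > 0);
    OpenResult result{-1, 0, {}};
    int timeout_ms = first_timeout_ms;
    for (int attempt = 0; attempt < max_attempts; ++attempt) {
        result.tried.push_back(timeout_ms);
        errno = 0;
        const int handle = open_dir(timeout_ms);
        if (handle >= 0) {
            result.handle = handle;
            result.last_errno = 0;
            return result;
        }
        result.last_errno = errno;  // capture before anything can overwrite it
        timeout_ms *= 2;
    }
    return result;
}

// Simulated server that needs at least 4000 ms to list a large directory.
int slow_server_open(int timeout_ms) {
    if (timeout_ms < 4000) {
        errno = ETIMEDOUT;
        return -1;
    }
    return 10000;  // arbitrary handle value for the simulation
}
```

Calling open_with_retry(slow_server_open, 1000, 3) tries 1000, 2000 and 4000 ms and succeeds on the third attempt. Starting at 500 ms exhausts all three attempts and reports ETIMEDOUT from the last one.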

If you have any other suggestions or comments on the solution, we would be happy to hear them.

If you need any more tests conducted with code changes we'll gladly help you out.

Thanks,
Henrik
Comment 18 Derrell Lipman 2006-08-07 12:39:06 UTC
Ok, so now we know what's going on.  Try this patch and remove your reinitialization of the context.  Reinitializing the context should not be necessary; we just need to ensure that the connection that's been shut down is not reused.

Note that the second hunk of this patch will fail since you've already applied it.  Not a problem.

Once we get the requirements for a patch worked out, I'll go back and try to get this fixed the right way, i.e. handling the error in cli_list instead of smbc_opendir_ctx.

Index: libsmb/libsmbclient.c
===================================================================
--- libsmb/libsmbclient.c	(revision 17431)
+++ libsmb/libsmbclient.c	(working copy)
@@ -2522,6 +2522,7 @@
         char *p;
 	SMBCSRV *srv  = NULL;
 	SMBCFILE *dir = NULL;
+        struct _smbc_callbacks *cb;
 	struct in_addr rem_ip;
 
 	if (!context || !context->internal ||
@@ -2877,7 +2878,8 @@
 			
 			if (cli_list(targetcli, targetpath,
                                      aDIR | aSYSTEM | aHIDDEN,
-                                     dir_list_fn, (void *)dir) < 0) {
+                                     dir_list_fn, (void *)dir) < 0 ||
+                            cli_is_error(targetcli)) {
 
 				if (dir) {
 					SAFE_FREE(dir->fname);
@@ -2905,6 +2907,27 @@
                                     }
                                 }
 
+                                /*
+                                 * If there was an error and the server is no
+                                 * good any more...
+                                 */
+                                cb = &context->callbacks;
+                                if (cli_is_error(targetcli) &&
+                                    cb->check_server_fn(context, srv)) {
+
+                                    /* ... then remove it. */
+                                    if (cb->remove_unused_server_fn(context,
+                                                                    srv)) { 
+                                        /*
+                                         * We could not remove the server
+                                         * completely, remove it from the
+                                         * cache so we will not get it
+                                         * again. It will be removed when the
+                                         * last file/dir is closed.
+                                         */
+                                        cb->remove_cached_srv_fn(context, srv);
+                                    }
+                                }
 				return NULL;
 
 			}
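
The idea behind this patch, making sure a connection that has been shut down is not handed out again, can be illustrated with a minimal cache simulation. Server, ServerCache and evict_if_dead are hypothetical names for this sketch; they do not correspond to the libsmbclient callback functions used in the patch.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical cached server connection; fd == -1 means it has been closed.
struct Server {
    int fd;
};

// Hypothetical connection cache keyed by "host/share".
class ServerCache {
public:
    // Return the cached server for `key`, creating one if none is cached.
    std::shared_ptr<Server> get(const std::string& key) {
        assert(!key.empty());
        auto it = cache_.find(key);
        if (it != cache_.end()) return it->second;
        auto srv = std::make_shared<Server>(Server{next_fd_++});
        cache_[key] = srv;
        return srv;
    }

    // Forget a server whose connection has been closed so that the next get()
    // builds a fresh one. Returns true if an entry was removed.
    bool evict_if_dead(const std::string& key) {
        auto it = cache_.find(key);
        if (it == cache_.end() || it->second->fd != -1) return false;
        cache_.erase(it);
        return true;
    }

private:
    std::map<std::string, std::shared_ptr<Server>> cache_;
    int next_fd_ = 3;   // pretend descriptor numbers for the simulation
};
```

Without the eviction step, get() keeps returning the entry whose fd is -1, and every later write on it fails with a bad file descriptor. After evict_if_dead() the next get() creates a new entry, which is why reinitializing the whole context should no longer be needed.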
Comment 19 Henrik 2006-08-08 04:19:46 UTC
Hello again,

With all the patches up to and including the one you provided in comment #15 we always get the correct number of files and directories,
which is 8672 files, 22 dirs. The error we get is "errno: 110, Connection timed out", and as we said, if we reinitialize the context we won't get any bad file descriptor problems.

If we also use the patch in comment #18, smbc_opendir will print out "write_data: write failure in writing to client . Error Bad file descriptor" but at least we will still always get an error returned when we should (errno: 9, Bad file descriptor). We also don't need to reinitialize the context anymore, as we had to do before the patch in comment #18.

Only question remaining is: Should we get an EBADF or an ETIMEDOUT when this error occurs?
After the latest patch we only get EBADF and not ETIMEDOUT. This of course doesn't affect our code; I just wanted to point it out. :-)
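
One general way a timeout can end up reported as EBADF is that a later call on the already-closed descriptor overwrites errno before the caller reads it. The sketch below shows that mechanism and the usual save-and-restore remedy. It is an assumption about the general pattern, not a confirmed diagnosis of the library code; cleanup_connection, open_dir_clobbering and open_dir_preserving are hypothetical names.

```cpp
#include <cassert>
#include <cerrno>
#include <unistd.h>

// Simulated cleanup step that fails and overwrites errno, here by closing an
// invalid descriptor (close(-1) fails with EBADF on POSIX systems).
void cleanup_connection() { (void)close(-1); }

// Reports failure but lets the cleanup step clobber errno.
int open_dir_clobbering() {
    errno = ETIMEDOUT;        // the original cause of the failure
    cleanup_connection();     // errno is now EBADF
    return -1;
}

// Saves errno before cleanup and restores it afterwards, so the caller sees
// the original cause.
int open_dir_preserving() {
    errno = ETIMEDOUT;
    const int saved = errno;
    cleanup_connection();
    errno = saved;
    return -1;
}
```

Both functions return -1, but only the second leaves errno at ETIMEDOUT for the caller to inspect.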

Here is the output with the latest patches (Note: we removed the byte count from the output, since it wasn't used).

./test smb://10.168.1.133/C$/Webkit/LayoutTests/dom/xhtml/level3/core
timeout set to: 1000
[0](smb://10.168.1.133/C$/Webkit/LayoutTests/dom/xhtml/level3/core)Receiving SMB: Server stopped responding
write_data: write failure in writing to client . Error Bad file descriptor
[1]Could not open directory [smb://10.168.1.133/C$/Webkit/LayoutTests/dom/xhtml/level3/core] (9:Bad file descriptor)[/1]
timeout set to: 2000
(-1)[/0]
[0](smb://10.168.1.133/C$/Webkit/LayoutTests/dom/xhtml/level3/core)Receiving SMB: Server stopped responding
write_data: write failure in writing to client . Error Bad file descriptor
[1]Could not open directory [smb://10.168.1.133/C$/Webkit/LayoutTests/dom/xhtml/level3/core] (9:Bad file descriptor)[/1]
timeout set to: 4000
(-1)[/0]
[0](smb://10.168.1.133/C$/Webkit/LayoutTests/dom/xhtml/level3/core)timeout set to default: 1000
(10000)[/0]
2 164 files, 1 dirs.[0](smb://10.168.1.133/C$/Webkit/LayoutTests/dom/xhtml/level3/core/.svn)(10000)[/0]

-- snip --

8 672 files, 20 dirs.[0](smb://10.168.1.133/C$/Webkit/LayoutTests/dom/xhtml/level3/core/resources/.svn/tmp/text-base)(10000)[/0]
8 672 files, 21 dirs.[0](smb://10.168.1.133/C$/Webkit/LayoutTests/dom/xhtml/level3/core/resources/.svn/tmp/wcprops)(10000)[/0]
8 672 files, 22 dirs.

Cheers,
Henrik
Comment 20 Henrik 2006-08-31 06:16:44 UTC
Hello Derrell,

I just wanted to know if you plan to add this to the svn repo?

Cheers,
Henrik
Comment 21 Derrell Lipman 2006-08-31 07:39:43 UTC
Yes, this is part of my planned upcoming work.
Comment 22 Derrell Lipman 2006-09-02 19:51:58 UTC
Check-in r18011 should fix this in the correct way.  I'm awaiting confirmation from Jeremy that there are no adverse effects from the change.

It should now be maintaining the correct errno value.  Please confirm.

Derrell
Comment 23 Henrik 2006-09-04 05:01:42 UTC
Hello Derrell,

We've done some tests with revision 18011 and now we get the correct errno (110: Connection timed out).

Here's some output
863 files, 15 dirs.[0](smb://10.168.1.133/C$/Webkit/LayoutTests/dom/xhtml/level2/html)Receiving SMB: Server stopped responding
write_data: write failure in writing to client . Error Bad file descriptor
[1]Could not open directory [smb://10.168.1.133/C$/Webkit/LayoutTests/dom/xhtml/level2/html] (110:Connection timed out)[/1]
timeout set to: 2000
(-1)[/0]
[0](smb://10.168.1.133/C$/Webkit/LayoutTests/dom/xhtml/level2/html)Receiving SMB: Server stopped responding
write_data: write failure in writing to client . Error Bad file descriptor
[1]Could not open directory [smb://10.168.1.133/C$/Webkit/LayoutTests/dom/xhtml/level2/html] (110:Connection timed out)[/1]
timeout set to: 4000
(-1)[/0]
[0](smb://10.168.1.133/C$/Webkit/LayoutTests/dom/xhtml/level2/html)timeout set to: 1000
(10000)[/0]


Looking good hey? :-)

Thanks for a job well done!
Cheers, Henrik
Comment 24 Derrell Lipman 2006-09-04 09:17:04 UTC
Thanks for the report and for testing!  Marking as Fixed.