Bug 5034 - Files missing in folder listings on XP platform
Status: NEW
Product: Samba 3.0
Classification: Unclassified
Component: VFS
Version: 3.0.25a
Hardware: Sparc Windows XP
Importance: P3 normal
Target Milestone: none
Assigned To: Samba Bugzilla Account
QA Contact: Samba QA Contact
Reported: 2007-10-23 01:45 UTC by sam.liapis
Modified: 2007-10-24 20:22 UTC (History)
Description sam.liapis 2007-10-23 01:45:09 UTC
I'm running Samba 3.0.25a on a Solaris 9 box with separate W2K and XP clients 
accessing a share. The issue is that not all files appear in folder 
listings via Explorer/DOS windows on the XP client. This only happens when an
app on the W2K side is simultaneously accessing and creating files on the same 
share, though not necessarily in the same folder. I then started Samba at log 
level 10 and examined the trace after some test reruns. What I found is that 
every file missing from a folder listing corresponds to a stat() call that was 
interrupted during the readdir phase. 


Here's a relevant excerpt from the trace with a bad then good dir entry read: 

____________________________________________________________________________
FAILED PASS:
____________________________________________________________________________
... 
  dos_mode returning a[sparse] 
[2007/10/23 08:48:29, 5] smbd/trans2.c:get_lanman2_dir_entry(1255) 
  get_lanman2_dir_entry found archv6/arc14/0726/1RBE/1192/351-0-0-1.DS fname=351-0-0-1.DS 
[2007/10/23 08:48:29, 10] smbd/trans2.c:get_lanman2_dir_entry(1398) 
  get_lanman2_dir_entry: SMB_FIND_FILE_BOTH_DIRECTORY_INFO 
[2007/10/23 08:48:29, 10] smbd/mangle_hash2.c:name_map(617) 
  name_map: 351-0-0-1.DS -> 7F100938 -> 3Z96YI~G.DS (cache=1) 
[2007/10/23 08:48:29, 8] smbd/trans2.c:get_lanman2_dir_entry(1161) 
  get_lanman2_dir_entry:readdir on dirptr 0x3bd238 now at offset 16784629 
[2007/10/23 08:48:29, 5] smbd/trans2.c:get_lanman2_dir_entry(1221) 
vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
  get_lanman2_dir_entry:Couldn't stat [archv6/arc14/0726/1RBE/1192/353-0-0-1.DV] (Interrupted system call)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
[2007/10/23 08:48:29, 8] smbd/trans2.c:get_lanman2_dir_entry(1161) 
  get_lanman2_dir_entry:readdir on dirptr 0x3bd238 now at offset 16784650 
[2007/10/23 08:48:29, 8] smbd/dosmode.c:dos_mode(371) 
  dos_mode: archv6/arc14/0726/1RBE/1192/353-0-0-1.DS 
[2007/10/23 08:48:29, 8] smbd/dosmode.c:dos_mode_from_sbuf(188) 
  dos_mode_from_sbuf returning a 
[2007/10/23 08:48:29, 8] smbd/dosmode.c:dos_mode(409) 
...
____________________________________________________________________________
GOOD PASS:
____________________________________________________________________________
...
  dos_mode returning a[sparse] 
[2007/10/23 08:48:29, 5] smbd/trans2.c:get_lanman2_dir_entry(1255) 
  get_lanman2_dir_entry found archv6/arc14/0726/1RBE/1192/353-0-0-1.DS fname=353-0-0-1.DS 
[2007/10/23 08:48:29, 10] smbd/trans2.c:get_lanman2_dir_entry(1398) 
  get_lanman2_dir_entry: SMB_FIND_FILE_BOTH_DIRECTORY_INFO 
[2007/10/23 08:48:29, 10] smbd/mangle_hash2.c:name_map(617) 
  name_map: 353-0-0-1.DS -> 21F83E5E -> 39FBAS~E.DS (cache=1) 
[2007/10/23 08:48:29, 8] smbd/trans2.c:get_lanman2_dir_entry(1161) 
  get_lanman2_dir_entry:readdir on dirptr 0x3bd238 now at offset 16784671 
[2007/10/23 08:48:29, 8] smbd/dosmode.c:dos_mode(371) 
  dos_mode: archv6/arc14/0726/1RBE/1192/355-0-0-1.DV 
[2007/10/23 08:48:29, 8] smbd/dosmode.c:dos_mode_from_sbuf(188) 
  dos_mode_from_sbuf returning a 
[2007/10/23 08:48:29, 8] smbd/dosmode.c:dos_mode(409) 
... 

Note: If the W2K app, which creates files at a rapid rate (10 files/sec), isn't 
running, then all files list OK on the XP client. You can even browse folders 
concurrently from the W2K and XP clients and all files show OK. It's only when 
file creation is interleaved with readdirs, stats, closes, etc. on the W2K side 
that files on the same share go missing on the XP client side.

This problem DOESN'T occur if there are multiple W2K-only clients running at the 
same time accessing and even creating, stat-ing, closing, etc. files. I had a 
quick hunt through the source code and was wondering why the sys_stat() 
function in .../source/lib/system.c doesn't cater for EINTR errnos? 

The behaviour is reproducible under the above-mentioned scenario.
Comment 1 Jeremy Allison 2007-10-23 12:14:48 UTC
That's because a system that can return EINTR for a stat call is unimaginably broken.....

Seriously, can Solaris return EINTR on a stat call ? Under what circumstances ?

Jeremy.
Comment 2 sam.liapis 2007-10-24 00:01:38 UTC
(In reply to comment #1)
> That's because a system that can return EINTR for a stat call is unimaginably
> broken....Seriously, can Solaris return EINTR on a stat call ? Under what 
> circumstances ?
> Jeremy.


The Solaris man page for stat() lists EINTR as a possible error:

...
EINTR A signal  was  caught  during  the  execution  of  the
      stat() or lstat() function.
...

The question arose because the smbd trace in the original bug submission
shows this error:

...
get_lanman2_dir_entry:Couldn't stat [.../353-0-0-1.DV] (Interrupted system call)
...

Delivered by this line of source from .../source/smbd/trans2.c:

...
1220               DEBUG(5,("get_lanman2_dir_entry:Couldn't stat [%s] (%s)\n",
1221                       pathreal,strerror(errno)));
...

Indicating the following stat64 call was interrupted during execution:

(From .../source/lib/system.c)
...
273  int sys_stat(const char *fname,SMB_STRUCT_STAT *sbuf)
...
277                ret = stat64(fname, sbuf);
...

Here's the truss of this stat call which produced the errno in question:

...
15150/1@1:  -> get_lanman2_dir_entry(0x419a60, 0x3d52a8, 0x3f56f8, 0xffbfee80)
...
15150/1@1:              -> vfswrap_stat(0x417178, 0xffbfe070, 0xffbfe870, 0x423a12)
15150/1@1:                -> sys_stat(0xffbfe070, 0xffbfe870, 0x0, 0x0)


15150/1:   Received signal #16, SIGUSR1, in stat64() [caught]


15150/1:   stat64("archv6/arc14/0726/1RBE/1192/353-0-0-1.DV", 0xFFBFE870) Err#91 ERESTART


15150/1:   sigprocmask(SIG_SETMASK, 0xFFBFCB34, 0x00000000) = 0
15150/1@1:                  -> sig_usr1(0x10, 0x0, 0xffbfcc18, 0x0)
15150/1@1:                    -> sys_select_signal(0x10, 0x0, 0x0, 0x0)
15150/1:   write(20, "10", 1)                              = 1
15150/1@1:                    <- sys_select_signal() = 0x3b1174
15150/1@1:                  <- sig_usr1() = 16
15150/1:   sigprocmask(SIG_SETMASK, 0xFF38A074, 0xFFBFC8E8) = 0
15150/1:   lwp_unpark(1, 1)                                = 0
15150/1:   setcontext(0xFFBFC8F8)
15150/1@1:                <- sys_stat() = -1
15150/1@1:              <- vfswrap_stat() = -1
15150/1@1:              -> lp_host_msdfs(0xffffffff, 0xffbfe070, 0xffbfe870, 0x423a12)
15150/1@1:              <- lp_host_msdfs() = 1
15150/1@1:              -> lp_msdfs_root(0x2, 0xffbfe070, 0xffbfe870, 0x423a12)
15150/1@1:              <- lp_msdfs_root() = 0
15150/1@1:              -> dptr_TellDir(0x3bf578, 0xffbfe070, 0xffbfe870, 0x423a12)
...

As the man page states, `...A signal was caught during the execution...', in this
case a SIGUSR1 signal. I believe the kernel/libc maps ERESTART to EINTR before
stat64() returns to its caller, hence the `Interrupted system call' message in
the smbd trace.

Sam.
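
On POSIX systems, whether a call interrupted by a caught signal fails with EINTR or is restarted generally depends on whether the handler was installed with SA_RESTART. The sketch below shows that difference using read() on a pipe, not stat64() on Solaris; whether SA_RESTART would change the stat64() behaviour in the truss output above is an assumption that has not been verified here. The names on_alarm, wake_fd and read_across_signal are hypothetical. The handler writes one byte to a pipe, similar in spirit to the write() inside sys_select_signal() in the truss output.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

static int wake_fd = -1;   /* write end of the pipe, used by the handler */

/* Async-signal-safe handler: write one byte so a restarted read() can finish. */
static void on_alarm(int signo)
{
    int saved_errno = errno;
    char byte = (char)signo;

    if (write(wake_fd, &byte, 1) < 0) {
        /* nothing safe to do inside a signal handler */
    }
    errno = saved_errno;
}

/*
 * Block in read() on an empty pipe until SIGALRM arrives about 100 ms later.
 * Returns the result of that single read(); *err receives its errno.
 */
ssize_t read_across_signal(int use_sa_restart, int *err)
{
    int fds[2];
    struct sigaction sa, old_sa;
    struct itimerval timer;
    char buf;
    ssize_t n;

    assert(err != NULL);
    if (pipe(fds) == -1) {
        *err = errno;
        return -1;
    }
    wake_fd = fds[1];

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = use_sa_restart ? SA_RESTART : 0;
    sigaction(SIGALRM, &sa, &old_sa);

    memset(&timer, 0, sizeof(timer));
    timer.it_value.tv_usec = 100000;      /* one shot, 100 ms */
    setitimer(ITIMER_REAL, &timer, NULL);

    errno = 0;
    n = read(fds[0], &buf, 1);            /* interrupted by SIGALRM */
    *err = errno;

    sigaction(SIGALRM, &old_sa, NULL);    /* restore previous disposition */
    close(fds[0]);
    close(fds[1]);
    return n;
}
```

Without SA_RESTART the read() is expected to fail with EINTR; with SA_RESTART it is restarted after the handler returns and then picks up the byte the handler wrote.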

Comment 3 Jeremy Allison 2007-10-24 11:44:59 UTC
Wow - that's amazingly broken :-). On most systems, I believe, the stat() family of calls are fast system calls that can't be interrupted by a signal. So tell me, can unlink() return EINTR also on Solaris ? This opens up a hideous new can of worms on this platform. What other disk system calls can return EINTR ?

Jeremy
Comment 4 sam.liapis 2007-10-24 20:22:12 UTC
(In reply to comment #3)
> Wow - that's amazingly broken :-).  On most systems the stat() family of 
> calls are fast system calls that can't be interrupted by signals I believe. 
> So tell me, can unlink() return EINTR also on Solaris ?  This opens up a 
> hideous new can of worms on this platform.  What other disk system calls
> can return EINTR ?
> Jeremy

Yes, unlink() returns EINTR for the same reason stat() does. Many other system
calls do the same, such as access(), close(), chmod(), chown(), open(), read()
and write(), to name a few. In most if not all cases it's due to signal
interrupts.

Sam.
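
Since several path-based calls are affected, one possible approach is a single retry macro shared by thin wrappers. This is a sketch only: RETRY_ON_EINTR, unlink_retry_eintr and open_retry_eintr are hypothetical names, not existing Samba functions.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical macro: evaluate `call` again while it fails with EINTR. */
#define RETRY_ON_EINTR(result, call)                    \
    do {                                                \
        (result) = (call);                              \
    } while ((result) == -1 && errno == EINTR)

/* unlink() that is retried if a signal interrupts it. */
int unlink_retry_eintr(const char *path)
{
    int ret;

    assert(path != NULL);
    RETRY_ON_EINTR(ret, unlink(path));
    return ret;
}

/* open() that is retried if a signal interrupts it. */
int open_retry_eintr(const char *path, int flags, mode_t mode)
{
    int fd;

    assert(path != NULL);
    RETRY_ON_EINTR(fd, open(path, flags, mode));
    return fd;
}
```

close() is deliberately left out: POSIX leaves the state of the descriptor unspecified when close() fails with EINTR, and on Linux the descriptor is already released, so blindly retrying close() can close an unrelated descriptor that another thread has just opened.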