I'm running samba 3.0.25a on a Solaris 9 box with separate W2K and XP clients accessing a share. The issue I'm having is that not all files appear in folder listings via Explorer/DOS windows on the XP client. This only happens when an app on the W2K side is simultaneously accessing and creating files on the same share, but not necessarily in the same folder.

I then started samba with level 10 trace and examined the content after some test reruns. What I found is that every file missing from a folder listing corresponds to a stat() call that was interrupted during the readdir phase. Here's a relevant excerpt from the trace with a bad then a good dir entry read:

____________________________________________________________________________
FAILED PASS:
____________________________________________________________________________
...
  dos_mode returning a[sparse]
[2007/10/23 08:48:29, 5] smbd/trans2.c:get_lanman2_dir_entry(1255)
  get_lanman2_dir_entry found archv6/arc14/0726/1RBE/1192/351-0-0-1.DS fname=351-0-0-1.DS
[2007/10/23 08:48:29, 10] smbd/trans2.c:get_lanman2_dir_entry(1398)
  get_lanman2_dir_entry: SMB_FIND_FILE_BOTH_DIRECTORY_INFO
[2007/10/23 08:48:29, 10] smbd/mangle_hash2.c:name_map(617)
  name_map: 351-0-0-1.DS -> 7F100938 -> 3Z96YI~G.DS (cache=1)
[2007/10/23 08:48:29, 8] smbd/trans2.c:get_lanman2_dir_entry(1161)
  get_lanman2_dir_entry:readdir on dirptr 0x3bd238 now at offset 16784629
[2007/10/23 08:48:29, 5] smbd/trans2.c:get_lanman2_dir_entry(1221)
vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
  get_lanman2_dir_entry:Couldn't stat [archv6/arc14/0726/1RBE/1192/353-0-0-1.DV] (Interrupted system call)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[2007/10/23 08:48:29, 8] smbd/trans2.c:get_lanman2_dir_entry(1161)
  get_lanman2_dir_entry:readdir on dirptr 0x3bd238 now at offset 16784650
[2007/10/23 08:48:29, 8] smbd/dosmode.c:dos_mode(371)
  dos_mode: archv6/arc14/0726/1RBE/1192/353-0-0-1.DS
[2007/10/23 08:48:29, 8] smbd/dosmode.c:dos_mode_from_sbuf(188)
  dos_mode_from_sbuf returning a
[2007/10/23 08:48:29, 8] smbd/dosmode.c:dos_mode(409)
...

____________________________________________________________________________
GOOD PASS:
____________________________________________________________________________
...
  dos_mode returning a[sparse]
[2007/10/23 08:48:29, 5] smbd/trans2.c:get_lanman2_dir_entry(1255)
  get_lanman2_dir_entry found archv6/arc14/0726/1RBE/1192/353-0-0-1.DS fname=353-0-0-1.DS
[2007/10/23 08:48:29, 10] smbd/trans2.c:get_lanman2_dir_entry(1398)
  get_lanman2_dir_entry: SMB_FIND_FILE_BOTH_DIRECTORY_INFO
[2007/10/23 08:48:29, 10] smbd/mangle_hash2.c:name_map(617)
  name_map: 353-0-0-1.DS -> 21F83E5E -> 39FBAS~E.DS (cache=1)
[2007/10/23 08:48:29, 8] smbd/trans2.c:get_lanman2_dir_entry(1161)
  get_lanman2_dir_entry:readdir on dirptr 0x3bd238 now at offset 16784671
[2007/10/23 08:48:29, 8] smbd/dosmode.c:dos_mode(371)
  dos_mode: archv6/arc14/0726/1RBE/1192/355-0-0-1.DV
[2007/10/23 08:48:29, 8] smbd/dosmode.c:dos_mode_from_sbuf(188)
  dos_mode_from_sbuf returning a
[2007/10/23 08:48:29, 8] smbd/dosmode.c:dos_mode(409)
...

Note: If the W2K app, which creates files at a rapid rate (10 files/sec), isn't running, then all files list OK on the XP client. You can even browse folders concurrently from the W2K and XP clients and all files show OK. It's only when file creation is interleaved with readdirs, stats, closes, etc. on the W2K side that files on the same share go missing on the XP client side. This problem DOESN'T occur if there are multiple W2K-ONLY clients running at the same time, accessing and even creating, stat-ing, closing, etc. files.

I had a quick hunt through the source code and was wondering why the sys_stat() function in .../source/lib/system.c doesn't cater for EINTR errnos?

The behaviour is reproducible under the above-mentioned scenario.
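To make the question above concrete, here is a minimal sketch of what "catering for EINTR" could look like: the call is simply retried while it fails with errno set to EINTR. This is not Samba's code. The function name stat_retry_eintr is hypothetical, and plain stat()/struct stat stand in for the stat64()/SMB_STRUCT_STAT pair that sys_stat() uses, purely to keep the example portable.

```c
#include <errno.h>
#include <sys/stat.h>

/*
 * Hypothetical helper (not part of Samba): call stat() and retry for as
 * long as it fails with EINTR, i.e. a signal was caught during the call.
 * Any other failure is returned to the caller with errno left untouched.
 */
int stat_retry_eintr(const char *fname, struct stat *sbuf)
{
    int ret;

    do {
        ret = stat(fname, sbuf);
    } while (ret == -1 && errno == EINTR);

    return ret;
}
```

A caller would use it exactly like stat(). With a loop of this kind, a signal arriving mid-call would cost one extra system call instead of a directory entry dropped from the listing.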
That's because a system that can return EINTR for a stat call is unimaginably broken..... Seriously, can Solaris return EINTR on a stat call? Under what circumstances?

Jeremy.
(In reply to comment #1)
> That's because a system that can return EINTR for a stat call is unimaginably
> broken....Seriously, can Solaris return EINTR on a stat call ? Under what
> circumstances ?
> Jeremy.

The Solaris man page for stat() lists EINTR as a possible error:

...
EINTR    A signal was caught during the execution of the stat() or lstat() function.
...

The question came about because the smb trace in the original bug submission shows such an error:

...
get_lanman2_dir_entry:Couldn't stat [.../353-0-0-1.DV] (Interrupted system call)
...

It is produced by this line of source from .../source/smbd/trans2.c:

...
1220 DEBUG(5,("get_lanman2_dir_entry:Couldn't stat [%s] (%s)\n",
1221 pathreal,strerror(errno)));
...

This indicates that the following stat64 call was interrupted during execution (from .../source/lib/system.c):

...
273 int sys_stat(const char *fname,SMB_STRUCT_STAT *sbuf)
...
277 ret = stat64(fname, sbuf);
...

Here's the truss of the stat call which produced the errno in question:

...
15150/1@1: -> get_lanman2_dir_entry(0x419a60, 0x3d52a8, 0x3f56f8, 0xffbfee80)
...
15150/1@1: -> vfswrap_stat(0x417178, 0xffbfe070, 0xffbfe870, 0x423a12)
15150/1@1: -> sys_stat(0xffbfe070, 0xffbfe870, 0x0, 0x0)
15150/1: Received signal #16, SIGUSR1, in stat64() [caught]
15150/1: stat64("archv6/arc14/0726/1RBE/1192/353-0-0-1.DV", 0xFFBFE870) Err#91 ERESTART
15150/1: sigprocmask(SIG_SETMASK, 0xFFBFCB34, 0x00000000) = 0
15150/1@1: -> sig_usr1(0x10, 0x0, 0xffbfcc18, 0x0)
15150/1@1: -> sys_select_signal(0x10, 0x0, 0x0, 0x0)
15150/1: write(20, "10", 1) = 1
15150/1@1: <- sys_select_signal() = 0x3b1174
15150/1@1: <- sig_usr1() = 16
15150/1: sigprocmask(SIG_SETMASK, 0xFF38A074, 0xFFBFC8E8) = 0
15150/1: lwp_unpark(1, 1) = 0
15150/1: setcontext(0xFFBFC8F8)
15150/1@1: <- sys_stat() = -1
15150/1@1: <- vfswrap_stat() = -1
15150/1@1: -> lp_host_msdfs(0xffffffff, 0xffbfe070, 0xffbfe870, 0x423a12)
15150/1@1: <- lp_host_msdfs() = 1
15150/1@1: -> lp_msdfs_root(0x2, 0xffbfe070, 0xffbfe870, 0x423a12)
15150/1@1: <- lp_msdfs_root() = 0
15150/1@1: -> dptr_TellDir(0x3bf578, 0xffbfe070, 0xffbfe870, 0x423a12)
...

As the man page states, `...A signal was caught during the execution...', in this case a SIGUSR1 signal. I believe ERESTART is translated/mapped by the kernel/libc to EINTR and then returned from stat/64(), hence `Interrupted system call' in the smb trace.

Sam.
Wow - that's amazingly broken :-). On most systems the stat() family of calls are fast system calls that can't be interrupted by a signal, I believe. So tell me, can unlink() also return EINTR on Solaris? This opens up a hideous new can of worms on this platform. What other disk system calls can return EINTR?

Jeremy
(In reply to comment #3)
> Wow - that's amazingly broken :-). On most systems the stat() family of
> calls are fast system calls that can't be interrupted by signals I believe.
> So tell me, can unlink() return EINTR also on Solaris ? This opens up a
> hideous new can of worms on this platform. What other disk system calls
> can return EINTR.
> Jeremy

Yes, unlink() returns EINTR for the same reason stat() does. Many other system calls do the same, such as access(), close(), chmod(), chown(), open(), read() and write(), to name a few. In most if not all cases it's due to signal interrupts.

Sam.
hmm, is there anything that we can do here? Or do we finally have to declare Solaris as unsuitable for Samba?