Requirements:
- Samba 3.0.25a on FreeBSD 6.2-RELEASE-p5, built from ports.
- Vista/Office2K7 trial versions on a UDF image.
- A Samba config with a share definition that points to the image mount point.

/mnt/office2k7 in this case is a memory disk, created with FreeBSD's mdconfig and then mounted locally via mount_udf (similar to Linux's mount -o loop option).

The bug is reproducible. I can also provide ssh access to the machine running this installation.

Note: this used to work in Samba 3.0.24: a user was able to open a share, and Samba crashed only later, when the user performed some file operations. I'm quite sure this bug is a continuation of bug 3683, described here: https://bugzilla.samba.org/show_bug.cgi?id=3683.

Config follows:

[global]
workgroup = SOFTLAB
machine password timeout = 0
netbios name = PANICBOX
server string = Samba 3.0.25a on FreeBSD 6.2-RELEASE-p5
hosts allow = 192.168. 127. 172.16.
guest account = pcguest
map to guest = bad user
log file = /var/log/samba/log.%m
encrypt passwords = yes
socket options = TCP_NODELAY
dns proxy = no
local master = no
os level = 32
interfaces = fxp0 lo0
bind interfaces only = yes
log level = 9
syslog = 4
deadtime = 15
wins server = 192.168.3.6
printing = BSD
unix charset = KOI8-R
dos charset = 866
passdb backend = smbpasswd
security = user

[office2k7]
comment = Panic Campground
path = /mnt/office2k7
browseable = yes
guest ok = yes
guest only = yes
writeable = no
#posix locking = no

[public]
comment = Public Share
path = /usr/local/public
browseable = yes
guest ok = yes
guest only = yes
writeable = yes

Log output follows:

[2007/06/20 13:01:55, 8] smbd/dosmode.c:dos_mode_from_sbuf(188)
  dos_mode_from_sbuf returning rd
[2007/06/20 13:01:55, 8] smbd/dosmode.c:dos_mode(409)
  dos_mode returning rd
[2007/06/20 13:01:55, 5] smbd/trans2.c:get_lanman2_dir_entry(1255)
  get_lanman2_dir_entry found ./sources fname=sources
[2007/06/20 13:01:55, 8] smbd/trans2.c:get_lanman2_dir_entry(1161)
  get_lanman2_dir_entry:readdir on dirptr 0x8384080 now at offset 156
[2007/06/20 13:01:55, 8] smbd/dosmode.c:dos_mode(371)
  dos_mode: ./support
[2007/06/20 13:01:55, 8] smbd/dosmode.c:dos_mode_from_sbuf(188)
  dos_mode_from_sbuf returning rd
[2007/06/20 13:01:55, 8] smbd/dosmode.c:dos_mode(409)
  dos_mode returning rd
[2007/06/20 13:01:55, 5] smbd/trans2.c:get_lanman2_dir_entry(1255)
  get_lanman2_dir_entry found ./support fname=support
[2007/06/20 13:01:55, 0] lib/fault.c:fault_report(41)
  ===============================================================
[2007/06/20 13:01:55, 0] lib/fault.c:fault_report(42)
  INTERNAL ERROR: Signal 6 in pid 1756 (3.0.25a)
  Please read the Trouble-Shooting section of the Samba3-HOWTO
[2007/06/20 13:01:55, 0] lib/fault.c:fault_report(44)
  From: http://www.samba.org/samba/docs/Samba3-HOWTO.pdf
[2007/06/20 13:01:55, 0] lib/fault.c:fault_report(45)
  ===============================================================
[2007/06/20 13:01:55, 0] lib/util.c:smb_panic(1632)
  PANIC (pid 1756): internal error
[2007/06/20 13:01:55, 0] lib/util.c:log_stack_trace(1786)
  unable to produce a stack trace on this platform
[2007/06/20 13:01:55, 3] smbd/sec_ctx.c:push_sec_ctx(208)
  push_sec_ctx(1004, 1003) : sec_ctx_stack_ndx = 1
[2007/06/20 13:01:55, 3] smbd/uid.c:push_conn_ctx(358)
  push_conn_ctx(103) : conn_ctx_stack_ndx = 0
[2007/06/20 13:01:55, 3] smbd/sec_ctx.c:set_sec_ctx(243)
  setting sec ctx (0, 0) - sec_ctx_stack_ndx = 1
[2007/06/20 13:01:55, 5] auth/auth_util.c:debug_nt_user_token(448)
  NT user token: (NULL)
[2007/06/20 13:01:55, 5] auth/auth_util.c:debug_unix_user_token(474)
  UNIX token of user 0
  Primary group is 0 and contains 0 supplementary groups
[2007/06/20 13:01:55, 0] lib/fault.c:dump_core(181)
  dumping core in /var/log/samba/cores/smbd
Can you run a version with symbols, set the parameter:

panic action = "/bin/sleep 90000"

re-create the panic, and then attach to the parent of the sleep process with gdb and get me a backtrace, please?

Thanks,
Jeremy.
I still cannot obtain the backtrace. :/ When rebuilt with the -O0 -g options and run unstripped, it started to work as intended, or at least reverted to the original behaviour of bug 3683.
I am able to reproduce this. Tested with FreeBSD 6.2-RELEASE, FreeBSD 6.2-RELEASE-p5 and FreeBSD 6.2-STABLE. The problem I have is identical; however, I'm using this with ntfs-3g and/or the regular mount_ntfs.

There is also a report of it occurring with ZFS:
http://lists.freebsd.org/pipermail/freebsd-current/2007-May/072918.html

There's a PR here on the issue with a memory disk:
http://www.freebsd.org/cgi/query-pr.cgi?pr=113158

The PR states the problem doesn't exist in 3.0.24; I confirmed that it also does not exist in 3.0.23. This seems to have been introduced in 3.0.25. I tested it both with an NTFS drive and a file-based image. An empty drive loaded OK, but as soon as I put a file in it, smbd would crash.

It's interesting that I created 3 directories, test, test2 and test3, but reading the output, it runs the get_lanman2_dir_entry stuff on test and test2 and never gets to test3. I examined this on my other drives, and it crashes on reading the last directory on all of them. I downgraded to samba-3.0.23 and the problem went away.
Here's some debug output:

[2007/06/30 21:16:05, 5] smbd/trans2.c:get_lanman2_dir_entry(1255)
  get_lanman2_dir_entry found ./test fname=test
[2007/06/30 21:16:05, 10] smbd/trans2.c:get_lanman2_dir_entry(1398)
  get_lanman2_dir_entry: SMB_FIND_FILE_BOTH_DIRECTORY_INFO
[2007/06/30 21:16:05, 8] smbd/trans2.c:get_lanman2_dir_entry(1161)
  get_lanman2_dir_entry:readdir on dirptr 0x836d180 now at offset 56
[2007/06/30 21:16:05, 8] smbd/dosmode.c:dos_mode(371)
  dos_mode: ./test2
[2007/06/30 21:16:05, 8] smbd/dosmode.c:dos_mode_from_sbuf(188)
  dos_mode_from_sbuf returning d
[2007/06/30 21:16:05, 8] smbd/dosmode.c:dos_mode(409)
  dos_mode returning d
[2007/06/30 21:16:05, 5] smbd/trans2.c:get_lanman2_dir_entry(1255)
  get_lanman2_dir_entry found ./test2 fname=test2
[2007/06/30 21:16:05, 10] smbd/trans2.c:get_lanman2_dir_entry(1398)
  get_lanman2_dir_entry: SMB_FIND_FILE_BOTH_DIRECTORY_INFO
[2007/06/30 21:16:05, 0] lib/fault.c:fault_report(41)
  ===============================================================
[2007/06/30 21:16:05, 0] lib/fault.c:fault_report(42)
  INTERNAL ERROR: Signal 6 in pid 2669 (3.0.25a)
  Please read the Trouble-Shooting section of the Samba3-HOWTO
[2007/06/30 21:16:05, 0] lib/fault.c:fault_report(44)
  From: http://www.samba.org/samba/docs/Samba3-HOWTO.pdf
[2007/06/30 21:16:05, 0] lib/fault.c:fault_report(45)
  ===============================================================
[2007/06/30 21:16:05, 0] lib/util.c:smb_panic(1632)
  PANIC (pid 2669): internal error
[2007/06/30 21:16:05, 0] lib/util.c:log_stack_trace(1786)
  unable to produce a stack trace on this platform
[2007/06/30 21:16:05, 3] smbd/sec_ctx.c:push_sec_ctx(208)
  push_sec_ctx(1001, 100) : sec_ctx_stack_ndx = 1
[2007/06/30 21:16:05, 3] smbd/uid.c:push_conn_ctx(358)

Reproduction is quite simple: get a basic FreeBSD install, then install Samba and ntfs-3g (or use a memory-based disk or ZFS).
I initialized a 50 MB image:

dd if=/dev/zero of=ntfs-test.img count=100000
mkntfs -fF -c 512 -s 512 ntfs-test.img 100000

then mounted it:

ntfs-3g ntfs-test.img /mnt

then added "/mnt" to my share list and put some directories in there:

mkdir /mnt/test1 /mnt/test2

Watch smbd crash. If instructed, I can generate additional crash dumps, but it seems smbd is unable to produce a stack trace, and I'm unsure why (sorry!).
I need a stack trace. I'm not a *BSD user, so it isn't so simple for me to reproduce. As it's happening in the directory code, I have a sneaky feeling it's to do with this code in lib/replace/repdir_getdents.c:

/*
  a replacement for opendir/readdir/telldir/seekdir/closedir for BSD systems

  This is needed because the existing directory handling in FreeBSD
  and OpenBSD (and possibly NetBSD) doesn't correctly handle unlink()
  on files in a directory where telldir() has been used. On a block
  boundary it will occasionally miss a file when seekdir() is used to
  return to a position previously recorded with telldir().

  This also fixes a severe performance and memory usage problem with
  telldir() on BSD systems. Each call to telldir() in BSD adds an
  entry to a linked list, and those entries are cleaned up on
  closedir(). This means with a large directory closedir() can take an
  arbitrary amount of time, causing network timeouts as millions of
  telldir() entries are freed

  Note! This replacement code is not portable. It relies on getdents()
  always leaving the file descriptor at a seek offset that is a
  multiple of DIR_BUF_SIZE. If the code detects that this doesn't
  happen then it will abort(). It also does not handle directories
  with offsets larger than can be stored in a long,
---------------------

This only goes to show that you should never try and work around bugs in the underlying platform; you should always scream until they get fixed properly :-).

I'm guessing the problem is the section that states:

"It relies on getdents() always leaving the file descriptor at a seek offset that is a multiple of DIR_BUF_SIZE. If the code detects that this doesn't happen then it will abort()."

Can you try disabling this code and see if the problem goes away? I'm guessing the seek offset was always a multiple of DIR_BUF_SIZE on ufs, and isn't on other *BSD filesystems.

Can you also check if the underlying bug has been fixed? If so, we'll just remove this code.
If not then you're between a rock and a hard place. *BSD is still broken for large directories. Jeremy.
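To make the DIR_BUF_SIZE requirement concrete, here is a minimal sketch, assuming DIR_BUF_SIZE is 512 (the alignment reported later in this bug), of one way an offset cookie can pack "start of the getdents() buffer" plus "position inside that buffer" into a single long, consistent with the abort() check quoted here. It is not the actual repdir code; make_cookie(), split_cookie() and cookie_roundtrips() are hypothetical names.

```c
#include <assert.h>

/* Assumption for this sketch: 512-byte buffers, as discussed in this bug. */
#define DIR_BUF_SIZE 512L

/* Hypothetical: pack "buffer start + byte index inside the buffer" into one long. */
static long make_cookie(long seekpos, long ofs)
{
    assert(ofs >= 0 && ofs < DIR_BUF_SIZE);
    return seekpos + ofs;
}

/* Hypothetical inverse: only correct if seekpos was a multiple of DIR_BUF_SIZE. */
static void split_cookie(long cookie, long *seekpos, long *ofs)
{
    *seekpos = cookie & ~(DIR_BUF_SIZE - 1);   /* round down to a buffer boundary */
    *ofs = cookie & (DIR_BUF_SIZE - 1);        /* remainder = index inside buffer */
}

/* Returns 1 if a cookie built from (seekpos, ofs) decodes back to the same pair. */
static int cookie_roundtrips(long seekpos, long ofs)
{
    long s, o;
    split_cookie(make_cookie(seekpos, ofs), &s, &o);
    return s == seekpos && o == ofs;
}
```

With an aligned buffer start such as 1024, cookie 1064 splits back into (1024, 40). With an unaligned start such as 1100, cookie 1140 splits into (1024, 116), a different position, which is the situation the real code refuses to continue from by calling abort().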
(In reply to comment #4)

Hi, Jeremy!

> "It relies on getdents() always leaving the file descriptor at a seek offset
> that is a multiple of DIR_BUF_SIZE. If the code detects that this doesn't
> happen then it will abort()."
>
> Can you try disabling this code and see if the problem goes away ? I'm guessing
> the seek offset was always a multiple of DIR_BUF_SIZE on ufs, and isn't on
> other *BSD filesystems.

You were absolutely right in your guess: commenting out abort() let me connect to the MS-DOS partition and get its listing in smbclient. Still, an attempt to fetch a file led to a "short read" message and a 0-length file on the client side. But at least we know where it comes from.

> Can you also check if the underlying bug has been fixed ? If so we'll just
> remove this code. If not then you're between a rock and a hard place. *BSD
> is still broken for large directories.

I remember an old conversation between Tridge and PHK (phk@FreeBSD.org) about this problem and workaround, and IIRC, PHK said there is no bug in FreeBSD, but rather a wrong (and Linux-specific :) approach of the Samba team. I can try to ask him again, but don't expect that anything has changed in the FreeBSD kernel. Is there a possibility for Samba not to rely on this very specific behaviour instead?

With best regards,
Timur.
I recently spoke to Julian Elischer about this. He thinks it might still be a problem in *BSD. You can comment out this replacement code for *BSD, but then you'll randomly miss files in directory listings, and you'll still have the linked-list scaling problem. Someone who *knows the code* needs to examine the *BSD kernels and give a definitive answer on this.

Jeremy.
*** Bug 4858 has been marked as a duplicate of this bug. ***
*** Bug 4738 has been marked as a duplicate of this bug. ***
I still need someone from the FreeBSD community to answer the question raised by this comment in the Samba code:

  This is needed because the existing directory handling in FreeBSD
  and OpenBSD (and possibly NetBSD) doesn't correctly handle unlink()
  on files in a directory where telldir() has been used. On a block
  boundary it will occasionally miss a file when seekdir() is used to
  return to a position previously recorded with telldir().

Has this bug been fixed in FreeBSD? I can remove this replacement code in Samba, but if I do, the underlying bug of missing entries on unlink will return. I urgently need confirmation from FreeBSD kernel developers.

Jeremy.
Just a further note for Timur@FreeBSD.org. You say:

"I remember an old conversation between Tridge and PHK (phk@FreeBSD.org) about this problem and workaround, and IIRC, PHK said there is no bug in FreeBSD, but rather a wrong (and Linux-specific :) approach of the Samba team."

All Samba is doing in our directory code is the following:

    while ((n = vfs_readdirname(conn, dirp->dir))) {
        ....
        dirp->offset = SMB_VFS_TELLDIR(conn, dirp->dir);
    }

where dirp->dir is a directory handle opened with opendir. While this directory handle is open, we expect that a

    SMB_VFS_SEEKDIR(dirp->conn, dirp->dir, offset);

call will return to the same point in the directory, even after an unlink() call. All other platforms that Samba runs on have this property; does *BSD? If you say yes, remember to check the case described above:

"FreeBSD and OpenBSD (and possibly NetBSD) doesn't correctly handle unlink() on files in a directory where telldir() has been used. On a block boundary it will occasionally miss a file when seekdir() is used to return to a position previously recorded with telldir()."

This code isn't going to change in Samba; it makes directory operations from CIFS clients fast even on large directories. Without it we have to rewinddir and seek from the start after every unlink. When deleting a directory this becomes O(n^2), which is not feasible in working code (and a DoS attack to boot).

Jeremy.
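As a hedged illustration of the contract described above, the sketch below reads k entries, records telldir(), reads the next name, drains the rest of the directory, then calls seekdir() and checks that readdir() returns that same name again. It uses only the standard POSIX dirent calls and, unlike the Samba case, performs no unlink() in between; seekdir_roundtrip_ok() is a hypothetical helper, not Samba code.

```c
#define _XOPEN_SOURCE 700   /* expose telldir()/seekdir() under strict C modes */
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: read k entries, remember telldir(), read the
 * next name, drain the directory, then seekdir() back and check that
 * readdir() returns the same name again.
 * Returns 1 if it matched, 0 if not, -1 on error or a too-small directory.
 */
static int seekdir_roundtrip_ok(const char *path, size_t k)
{
    assert(path != NULL);
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    for (size_t i = 0; i < k; i++) {
        if (readdir(dir) == NULL) {
            closedir(dir);
            return -1;
        }
    }

    long saved = telldir(dir);              /* resume point after k entries */
    struct dirent *de = readdir(dir);
    if (de == NULL) {
        closedir(dir);
        return -1;
    }
    char expected[256];
    snprintf(expected, sizeof expected, "%s", de->d_name);

    while (readdir(dir) != NULL)            /* move to the end of the directory */
        ;

    seekdir(dir, saved);                    /* return to the recorded position */
    de = readdir(dir);
    int ok = (de != NULL && strcmp(de->d_name, expected) == 0);
    closedir(dir);
    return ok;
}
```

On a system that honours the identity, with no concurrent changes, seekdir_roundtrip_ok("/", 2) should return 1. What this bug is really about is whether that still holds when files are unlink()ed between the telldir() and the seekdir().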
Actually, it's not true when I say this code isn't going to change. I can change the Samba code here, but if I do, what I will have to do is disable the name cache for *BSD systems. This will mean performance on these systems goes down the toilet for large directories, but at least I won't get bug reports complaining about missing files. All directory operations will require a rewinddir() and a search from the start by name on every FindNext operation. Is this what you want for *BSD? This will hurt.

Jeremy.
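The cost described above can be seen in a small sketch of the cache-less fallback: every FindNext rewinds and rescans by name, so resuming after the i-th entry reads i entries, and listing a directory of n entries in batches of b costs on the order of n^2/b readdir() calls in total. resume_after_name() is a hypothetical helper written for illustration, not a Samba function.

```c
#include <assert.h>
#include <dirent.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical cache-less resume: rewind the directory and read from the
 * start until the entry named last_name is found. On success the stream
 * is positioned just after that entry. *scanned receives how many
 * entries were read, which grows linearly with the resume position.
 */
static int resume_after_name(DIR *dir, const char *last_name, size_t *scanned)
{
    assert(dir != NULL && last_name != NULL && scanned != NULL);
    struct dirent *de;
    size_t n = 0;

    rewinddir(dir);
    while ((de = readdir(dir)) != NULL) {
        n++;
        if (strcmp(de->d_name, last_name) == 0) {
            *scanned = n;
            return 1;                 /* next readdir() continues after last_name */
        }
    }
    *scanned = n;                     /* whole directory read, name not found */
    return 0;
}
```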
James Peach - can you bug Terry Lambert and Jordan Hubbard to get some feedback on the state of this bug within FreeBSD please ? Thanks, Jeremy.
Can you show me where in the documentation for any BSD OS it states that directory entry offsets do not change after deleting files in the directory? I think you are relying on undocumented, unspecified internal behavior on the other platforms, which happens to be different from the behavior on BSD. (It may also be different from how the other platforms behave next week.) Since SVR4 uses Berkeley FFS (ufs) and FFS is the foundation for Veritas vxfs I would expect to see the same problem on more than just BSD systems. QUESTION: What is in your "name cache": ASCII strings representing file names? Numbers representing directory entry offsets? Something else? Please bear with me: I don't have the time to become familiar w/ the Samba source.
No problem then; I'll just disable this code for *BSD. This is going to hurt you much more than it'll hurt me :-). I use Linux :-) :-).

We're depending on seekdir(telldir()) being an identity, so long as the handle isn't closed in the meantime. That seems pretty reasonable to me, and indeed all other platforms seem to guarantee this.

Jeremy.
(In reply to comment #9)
>
> I urgently need confirmation from FreeBSD kernel developers.
>
> Jeremy.

I wish I could point to an appropriate one :( I'll send a call to the mailing lists; maybe someone will speak up.

Timur
Hi, Jeremy!

(In reply to comment #14)
> No problem then - I'll just disable this code for *BSD. This is going to hurt
> you much more than it'll hurt me :-). I use Linux :-) :-).

Can we isolate this code and leave an #ifdef that chooses between the (currently non-portable) caching with the replacement library and the portable but slow rewind/telldir/seekdir sequence? I can't give you an educated answer right now, as I'm not a kernel developer and haven't yet found one familiar enough with the VFS code, but that would be a compromise for the next few versions of Samba.

Two more questions. Is it possible to delete several files in a directory in a batch, so that rewinding is done only once? And how did it happen that prior to 3.0.25 people were able to use non-UFS partitions in FreeBSD without a problem? I.e. 3.0.24 still works OK with ISO9660, UDF, MSDOS, ZFS, etc. And for UFS the fix from Tridge works OK. I remember that it was originally implemented around 2005 (http://samba.sernet.de/irclog/2005/01/20050130-Sun.log), but until now it wasn't a problem. Would it be possible to revert back to the .24 behavior?

With regards,
Timur.
Yes, that's what I'm talking about. It was an easy change to just delete the directory caching for *BSD; the problem is that for the sequence:

findfirst -> findnext -> findnext ....... findnext (end of dir)

for a large directory the performance will be *horrible* without the directory cache, as I have to rewinddir/readdir to find the correct resume point on every findnext. With a seekdir(telldir()) identity system we can associate the last 100 filenames read with a telldir() offset we know we can resume from, which is very efficient.

We can't change the order of deletes, as this is completely client driven. We delete what the client tells us to delete, as it tells us to delete it.

Jeremy.
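The directory name cache described above can be pictured as a small ring buffer of (name, telldir() offset) pairs. The sketch below is a hypothetical illustration with 100 slots to match the comment; it is not Samba's actual data structure, and name_cache_add()/name_cache_find() are invented names. A FindNext that resumes after a cached name could seekdir() straight to the stored offset instead of rescanning, which is only safe if seekdir(telldir()) is an identity; on a miss the server has to fall back to rewinding and rescanning.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sketch, not Samba's real structure: 100 slots, as in the comment above. */
#define NAME_CACHE_SIZE 100
#define CACHE_NAME_LEN 256

struct name_cache_entry {
    char name[CACHE_NAME_LEN];
    long offset;          /* telldir() value to resume from after this name */
    int used;
};

struct name_cache {
    struct name_cache_entry e[NAME_CACHE_SIZE];
    size_t next;          /* slot the next insert overwrites (the oldest once full) */
};

/* Record that resuming after `name` means seekdir(dir, offset). */
static void name_cache_add(struct name_cache *c, const char *name, long offset)
{
    assert(c != NULL && name != NULL);
    struct name_cache_entry *slot = &c->e[c->next];
    strncpy(slot->name, name, CACHE_NAME_LEN - 1);
    slot->name[CACHE_NAME_LEN - 1] = '\0';
    slot->offset = offset;
    slot->used = 1;
    c->next = (c->next + 1) % NAME_CACHE_SIZE;
}

/* Returns 1 and sets *offset if name is cached, otherwise 0 (caller must rescan). */
static int name_cache_find(const struct name_cache *c, const char *name, long *offset)
{
    for (size_t i = 0; i < NAME_CACHE_SIZE; i++) {
        if (c->e[i].used && strcmp(c->e[i].name, name) == 0) {
            *offset = c->e[i].offset;
            return 1;
        }
    }
    return 0;
}
```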
Timur wrote: "Would it be possible to revert back to .24 behavior?"

Nope, sorry. That behaviour was the cause of bug reports, and that's why it was changed.

Jeremy.
(In reply to comment #17)
> Yes, that's what I'm talking about. It was an easy change to just delete the
> directory caching for *BSD - the problem is that for the sequence : findfirst
> -> findnext -> findnext....... findnext (end of dir) for a large directory the

Can we, in the meantime, leave #ifdef'ed code around this caching, something like:

#ifdef FREEBSD_DIR_CACHE_OFF

and make it possible to define the macro during compilation, so that either the caching code with the replacement telldir/seekdir is used or no caching at all (i.e. rewinddir() after each delete)? Then we can collect some user statistics and see what is better for the end user; after all, directories with more than 1000 files are not fast on UFS anyhow, and they are rare.

Another option, but possibly too complex to implement, is to fit the cache into the size of a sector and make sure it doesn't cross the boundary. But that sounds overcomplicated.

So, Jeremy, would it be possible to implement the #ifdef solution for 3.0.26, for example? I guess it's too late for 3.0.25c (but if you give me a patch, I can include it in the FreeBSD package). As for Samba4, let's see what the outcome is with Samba3 first, before making similar changes there.
(In reply to comment #18)
> Timur wrote : "Would it be possible to revert back to .24 behavior?"
>
> Nope, sorry. That behaviour was the cause of bug reports and so that's why it
> was changed.

Just curious: what was the bug, or what is its number in Bugzilla, if any?
[Sorry, I made this comment on bug #4858 earlier; copying it here as well.]

I tested os2_delete.c on FreeBSD/amd64 6.2-STABLE (updated 8/2/2007) and FreeBSD/x86 6.2-RELEASE, and both still fail the test. This issue arises (at least on my machine) because an assumption in the replacement code is violated: that the file position is always padded out to 512-byte alignment[*].

Is it possible to fix the assumption in the replacement code without throwing out the (caching) replacement code?

[*] repdir_getdirentries.c line 135: abort() if (d->seekpos & (DIR_BUF_SIZE-1))
FWIW, Terry says:

There are two bugs; one is in Samba, the other in FreeBSD.

The first is in the Samba assumptions about being able to delete, create, and iterate at the same time. POSIX specifies that it shall be possible to iterate and delete at the same time; however, it also states that an intervening change of position within the directory may result in undefined behaviour:

<http://www.opengroup.org/onlinepubs/009695399/functions/readdir.html>

Specifically, from knowledge of historical implementations which are POSIX conformant, I would argue that any application which does this type of operation (deletion or creation of a file during the iteration) should expect that it may be required to perform duplicate elimination at a higher abstraction layer in their own software. Similarly, historical implementations involving NFS will potentially result in an incomplete iteration of directory contents.

These particular problems arise because of cached multiple entries potentially being in a directory block which spans the area in which a new file is created, or file system or opendir implementations which make it impossible for a single directory entry block to be returned by the system's getdirentries (or equivalent) system call.

In the FreeBSD case in particular, the directory block boundary problem can result in incorrect operation with an intervening delete; this is specific to the read-restart code at the VNOP implementation layer. The NetBSD implementation is more technically correct in this regard (supports arbitrary restart without lost entries), but both are insufficient.

The particular issue is an interaction between the cached getdirentries system call (this is the system call on BSD-based systems) and the library implementation of seekdir/telldir.

There are two ways of addressing this issue, but both will result in the necessity of duplicate suppression being implemented by the calling application.
Particularly, coalescing of free space in directory entry blocks in UFS or, in other filesystems, the rebalancing of btrees makes it impossible to avoid the problem when seeking between offsets representing one cached buffer object and another.
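The duplicate elimination at a higher layer that Terry suggests could look roughly like the following: remember every name already returned for the current search and drop repeats. This is a minimal open-addressing hash set written for illustration only (seen_init(), seen_add() and seen_free() are hypothetical names). Note that it handles duplicates but cannot recover entries that the directory iteration skips entirely, which is the failure reported in this bug.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical set of names already sent to the client during one search. */
struct seen_set {
    char **slots;     /* NULL means empty slot */
    size_t cap;       /* must be a power of two */
    size_t count;
};

/* djb2 string hash */
static unsigned long hash_name(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

static int seen_init(struct seen_set *s, size_t cap)
{
    assert(cap > 1 && (cap & (cap - 1)) == 0);
    s->slots = calloc(cap, sizeof *s->slots);
    s->cap = cap;
    s->count = 0;
    return s->slots != NULL ? 0 : -1;
}

/* Returns 1 if name is new (and now recorded), 0 if it is a duplicate,
 * -1 if the set is full or memory runs out. */
static int seen_add(struct seen_set *s, const char *name)
{
    size_t mask = s->cap - 1;
    size_t i = (size_t)(hash_name(name) & mask);

    while (s->slots[i] != NULL) {           /* linear probing */
        if (strcmp(s->slots[i], name) == 0)
            return 0;                       /* already returned: suppress it */
        i = (i + 1) & mask;
    }
    if (s->count + 1 >= s->cap)             /* keep one slot free so probing ends */
        return -1;

    size_t len = strlen(name) + 1;
    char *copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, name, len);
    s->slots[i] = copy;
    s->count++;
    return 1;
}

static void seen_free(struct seen_set *s)
{
    for (size_t i = 0; i < s->cap; i++)
        free(s->slots[i]);
    free(s->slots);
    s->slots = NULL;
    s->cap = s->count = 0;
}
```

A server would call seen_add() for each name before returning it and skip the entry when the result is 0. Sizing the capacity at least twice the expected number of entries keeps probe sequences short.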
So should I take this:

"I would argue that any application which does this type of operation (deletion or creation of a file during the iteration) should expect that it may be required to perform duplicate elimination at a higher abstraction layer in their own software."

as a "will not fix" from you, Terry? :-). Do you know of *any* file management software that does this?

Jeremy.
Ok, I'm going to parameterize the directory cache so it can be disabled on *BSD systems. I'll try and get this into any releases after 3.0.25c. Jeremy.
Created attachment 2879
Patch to turn off directory cache with new parameter.

To test this, set "directory name cache size = 0". We still need to detect the broken directory handling on *BSD and set the #define accordingly.

Jeremy.
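A hedged sketch of how a build-time #define and the new runtime parameter might fit together: a compile-time default (here reusing the FREEBSD_DIR_CACHE_OFF macro name Timur suggested) chooses between 0 and 100, and an explicit "directory name cache size" value overrides it. effective_cache_size() and use_name_cache() are invented helpers, and using a negative value to mean "not set" is an assumption of this sketch, not how Samba's parameter handling actually works.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical build-time switch, reusing the macro name suggested in this bug. */
#ifdef FREEBSD_DIR_CACHE_OFF
#define DEFAULT_DIR_CACHE_SIZE 0     /* no cache: rewind and rescan on every FindNext */
#else
#define DEFAULT_DIR_CACHE_SIZE 100   /* remember the last 100 names and their offsets */
#endif

/* configured < 0 is this sketch's convention for "parameter not set". */
static size_t effective_cache_size(long configured)
{
    return configured < 0 ? (size_t)DEFAULT_DIR_CACHE_SIZE : (size_t)configured;
}

/* 1 if the name cache should be used for this share, 0 otherwise. */
static int use_name_cache(long configured)
{
    return effective_cache_size(configured) > 0;
}
```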
(In reply to comment #25)
> Created an attachment (id=2879)
> Patch to turn off directory cache with new parameter.
>
> To test this set "directory name cache size = 0". We still need to detect the
> broken directory handling on *BSD and set the #define accordingly.

Sorry for the delay; apparently it's not so easy to test this feature. In fact, I'm still not sure whether it works for me or not...

At minimum, the repdir_* functions have to be disabled in the code, as otherwise they dump core on non-UFS filesystems anyhow. I attach a tiny patch to do this.

OK, with repdir_* disabled it was possible to test directory caching. I created two shares, one on a UFS2 filesystem and another on FAT16, and set 'directory name cache size' for both. On the first run I set the cache to 0 for both shares, created 1200 files on each share and removed them via smbclient. All the files were gone. Then I set the cache size to 200 for both shares, expecting that on FAT16 it would improve speed and still work, and that on UFS2 it would expose the bug, with some files remaining undeleted. But all the files were gone on the UFS2 share as well. So here I'm already puzzled.

I understand that such testing doesn't prove anything, as there is a quite specific OS/2 deletion pattern that has to be followed to expose the bug with caching enabled on UFS2 (and which disabling caching should eliminate). But I'm not sure it is possible to repeat this pattern with smbclient alone. Possibly an smbtorture test exists (or can be created) to test this behavior against an SMB share.

So, at this point I don't know how I can check that the given patch works and addresses the bug in UFS2 directory handling.

With regards,
Timur
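For local experiments without smbclient, the access pattern described above can be approximated directly against the filesystem: list a batch of names, record telldir(), unlink() the batch, seekdir() back to the recorded offset, and repeat, then count what survives. This is only a rough stand-in written for illustration; it is not os2_delete.c or an smbtorture test, delete_in_batches() and count_entries() are hypothetical names, and the number of leftover files will depend on the filesystem and libc in use.

```c
#define _XOPEN_SOURCE 700   /* telldir(), seekdir(), mkdtemp() */
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical: counts entries other than "." and "..", or -1 on error. */
static int count_entries(const char *dirpath)
{
    DIR *dir = opendir(dirpath);
    if (dir == NULL)
        return -1;
    int n = 0;
    struct dirent *de;
    while ((de = readdir(dir)) != NULL)
        if (strcmp(de->d_name, ".") != 0 && strcmp(de->d_name, "..") != 0)
            n++;
    closedir(dir);
    return n;
}

/*
 * Hypothetical approximation of a client-driven delete loop.
 * Creates nfiles empty files, then repeatedly reads up to `batch`
 * names, saves telldir(), unlinks them and seekdir()s back.
 * Returns the number of files left over, or -1 on error.
 */
static int delete_in_batches(const char *dirpath, int nfiles, int batch)
{
    char path[4096];
    char names[64][256];
    assert(batch > 0 && batch <= 64);

    for (int i = 0; i < nfiles; i++) {
        snprintf(path, sizeof path, "%s/f%05d", dirpath, i);
        FILE *f = fopen(path, "w");
        if (f == NULL)
            return -1;
        fclose(f);
    }

    DIR *dir = opendir(dirpath);
    if (dir == NULL)
        return -1;
    for (;;) {
        int got = 0;
        struct dirent *de;
        while (got < batch && (de = readdir(dir)) != NULL) {
            if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
                continue;
            snprintf(names[got++], sizeof names[0], "%s", de->d_name);
        }
        if (got == 0)
            break;                              /* end of directory */
        long resume = telldir(dir);             /* where the next FindNext resumes */
        for (int i = 0; i < got; i++) {
            snprintf(path, sizeof path, "%s/%s", dirpath, names[i]);
            unlink(path);
        }
        seekdir(dir, resume);                   /* resume after the deleted batch */
    }
    closedir(dir);
    return count_entries(dirpath);
}
```

Pointed at a fresh directory (for example one created with mkdtemp()), a non-zero return value means some files were never returned by readdir() after seekdir(), which is the "files remain undeleted" symptom discussed in this bug.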
Created attachment 2905
Q&D patch to disable repdir_* replacement functions.
(In reply to comment #1)
> Can you run a version with symbols, set the parameter :
>
> panic action = "/bin/sleep 90000"
>
> re-create the panic and then attach to the parent of the sleep process with gdb
> and get me a backtrace please ?

I just happened to get the backtrace for the original bug. Although I guess it's too late, I'm adding it for completeness.

[2007/08/29 00:38:44, 1] smbd/service.c:make_connection_snum(1033)
  build (10.10.10.10) connect to service dos initially as user nobody (uid=65534, gid=65534) (pid 78443)
[2007/08/29 00:38:45, 0] lib/fault.c:fault_report(41)
  ===============================================================
[2007/08/29 00:38:45, 0] lib/fault.c:fault_report(42)
  INTERNAL ERROR: Signal 6 in pid 78443 (3.0.25c)
  Please read the Trouble-Shooting section of the Samba3-HOWTO
[2007/08/29 00:38:45, 0] lib/fault.c:fault_report(44)
  From: http://www.samba.org/samba/docs/Samba3-HOWTO.pdf
[2007/08/29 00:38:45, 0] lib/fault.c:fault_report(45)
  ===============================================================
[2007/08/29 00:38:45, 0] lib/util.c:smb_panic(1626)
  smb_panic: clobber_region() last called from [get_lanman2_dir_entry(1140)]
[2007/08/29 00:38:45, 0] lib/util.c:smb_panic(1632)
  PANIC (pid 78443): internal error
[2007/08/29 00:38:45, 0] lib/util.c:log_stack_trace(1736)
  BACKTRACE: 19 stack frames:
   #0 0x8247278 <smb_panic+164> at /var/tmp/samba/sbin/smbd
   #1 0x82346e8 <debug_parse_levels+1224> at /var/tmp/samba/sbin/smbd
   #2 0x88826183 <sigaction+2503> at /usr/lib/libpthread.so.2
   #3 0xbfbfff94
   #4 0x888f39eb <abort+87> at /lib/libc.so.6
   #5 0x822eac0 <seekdir+0> at /var/tmp/samba/sbin/smbd
   #6 0x8236ff7 <sys_telldir+27> at /var/tmp/samba/sbin/smbd
   #7 0x810dce5 <posix_mangle_init+1337> at /var/tmp/samba/sbin/smbd
   #8 0x809e5df <ReadDirName+143> at /var/tmp/samba/sbin/smbd
   #9 0x809e8a2 <dptr_SearchDir+126> at /var/tmp/samba/sbin/smbd
   #10 0x809e977 <dptr_ReadDirName+155> at /var/tmp/samba/sbin/smbd
   #11 0x80d986c <send_trans2_replies+9296> at /var/tmp/samba/sbin/smbd
   #12 0x80db970 <send_trans2_replies+17748> at /var/tmp/samba/sbin/smbd
   #13 0x80e1e84 <handle_trans2+4132> at /var/tmp/samba/sbin/smbd
   #14 0x80e6959 <reply_trans2+2957> at /var/tmp/samba/sbin/smbd
   #15 0x8100615 <smb_fn_name+981> at /var/tmp/samba/sbin/smbd
   #16 0x8101afd <smbd_process+2613> at /var/tmp/samba/sbin/smbd
   #17 0x82f1b41 <main+2261> at /var/tmp/samba/sbin/smbd
   #18 0x8089ce1 <_start+137> at /var/tmp/samba/sbin/smbd
[2007/08/29 00:38:45, 0] lib/util.c:smb_panic(1637)
  smb_panic(): calling panic action [/bin/sleep 999999999]

And the gdb backtrace looks like:

[Switching to LWP 100323]
0x88895db1 in wait4 () at wait4.S:2
2       RSYSCALL(wait4)
(gdb) bt
#0 0x88895db1 in wait4 () at wait4.S:2
#1 0x8885df34 in __system (command=0x88a1efb0 "/bin/sleep 999999999") at /usr/src/lib/libc/stdlib/system.c:91
#2 0x88820423 in _system (string=0x88a1efb0 "/bin/sleep 999999999") at /usr/src/lib/libpthread/thread/thr_system.c:47
#3 0x082472c0 in smb_panic (why=0x834fafa "internal error") at lib/util.c:1638
#4 0x082346e8 in sig_fault (sig=6) at lib/fault.c:47
#5 0x88826183 in _thr_sig_handler (sig=6, info=0xbfbf9980, ucp=0xbfbf96c0) at /usr/src/lib/libpthread/thread/thr_sig.c:392
#6 0xbfbfff94 in ?? ()
#7 0x00000006 in ?? ()
#8 0xbfbf9980 in ?? ()
#9 0xbfbf96c0 in ?? ()
#10 0x00000000 in ?? ()
#11 0x88825ddc in _thr_sig_dispatch (curkse=0xbfbf9a00, sig=-1077962232, info=0x0) at /usr/src/lib/libpthread/thread/thr_sig.c:291
#12 0x888f39eb in abort () at /usr/src/lib/libc/stdlib/abort.c:69
#13 0x0822eac0 in telldir (dir=0x88a0fc00) at lib/replace/repdir_getdirentries.c:135
#14 0x08236ff7 in sys_telldir (dirp=0x88a0fc00) at lib/system.c:492
#15 0x0810dce5 in vfswrap_telldir (handle=0x88a0f030, dirp=0x88a0fc00) at modules/vfs_default.c:127
#16 0x0809e5df in ReadDirName (dirp=0x88970540, poffset=0xbfbfa004) at smbd/dir.c:1169
#17 0x0809e8a2 in dptr_normal_ReadDirName (dptr=0x889b3680, poffset=0xbfbfa004, pst=0xbfbfb420) at smbd/dir.c:563
#18 0x0809e977 in dptr_ReadDirName (dptr=0x889b3680, poffset=0xbfbfa004, pst=0xbfbfb420) at smbd/dir.c:642
#19 0x080d986c in get_lanman2_dir_entry (conn=0x88a16030, inbuf=0x0, outbuf=0x88a49000 "", path_mask=0xbfbfb5b0 "*", dirtype=<error type>, info_level=260, requires_resume_key=4, dont_descend=0, ppdata=0xbfbfb538, base_data=0x88a91000 "`", space_remaining=14168, out_of_space=0xbfbfb53c, got_exact_match=0xbfbfb540, last_entry_off=0xbfbfb544, name_list=0x0, ea_ctx=0x0) at smbd/trans2.c:1149
#20 0x080db970 in call_trans2findfirst (conn=0x88a16030, inbuf=0x88a28000 "", outbuf=0x88a49000 "", bufsize=65535, pparams=0x88a1a368, total_params=18, ppdata=0x88a1a370, total_data=0, max_data_bytes=<error type>) at smbd/trans2.c:1857
#21 0x080e1e84 in handle_trans2 (conn=0x88a16030, state=0x88a1a230, inbuf=0x88a28000 "", outbuf=0x88a49000 "", size=90, bufsize=65535) at smbd/trans2.c:6382
#22 0x080e6959 in reply_trans2 (conn=0x88a16030, inbuf=0x88a28000 "", outbuf=0x88a49000 "", size=90, bufsize=65535) at smbd/trans2.c:6652
#23 0x08100615 in switch_message (type=50, inbuf=0x88a28000 "", outbuf=0x88a49000 "", size=90, bufsize=65535) at smbd/process.c:1003
#24 0x08101afd in smbd_process () at smbd/process.c:1030
#25 0x082f1b41 in main (argc=4, argv=0xbfbfecb0) at smbd/server.c:1120
Current language: auto; currently asm
(In reply to comment #24)
> Ok, I'm going to parameterize the directory cache so it can be disabled on *BSD
> systems. I'll try and get this into any releases after 3.0.25c.

I tried this patch independently and in 3.0.26a; it seems it doesn't address the problem. Running 'smbtorture4 RAW-SEARCH' against a UFS share, with the cache both enabled and disabled and with the native libc seekdir/telldir, leaves files in the test directory. With caching enabled, 673 out of 700 files are deleted; with caching disabled, only 4 are...

Would it be possible to detect the type of filesystem underneath the share and use different seekdir/telldir routines? Actually, that possibly won't help either, as the same code is used for all filesystem types; only the block size differs, which is what breaks the replacement functions...

I'm lost at the moment... How did it work in the pre-3.0.25 era?
What is the current status of this problem on FreeBSD these days?
No feedback. I also assume this is not a problem with the latest Samba and latest FreeBSD these days. Please open a new bug report if it is still a problem.