I have replaced an older AIX system with a new one running AIX 5.3, all the latest patches. It is acting as a PDC (I think irrelevant). The old server was running AIX 4.3.2 with Samba 3.0.14a (upgraded from 2.0.7), and was working 100% fine. I had the old server running 3.0.14a for 6 weeks prior to the upgrade as part of my migration plan.

There are Windows 98 boxes that connect to this server (workgroup), as well as XP SP2 boxes that connect to the server (domain). The shares that I am having problems with are on IBM's "jfs2" filesystem.

The XP boxes are working perfectly. The Windows 98 boxes work to read and save files. HOWEVER... if one "Explores" into one of the folders, Samba goes into an endless loop. The little flashlight in Windows 98 Explorer just keeps waving back and forth.

The behavior can be duplicated by going into a DOS prompt and doing a "DIR" on the shared directory. It is more obvious what is happening, because the screen updates continuously. It just scrolls forever. It gets to the end of the directory listing and starts again at the top...looping forever.

1. AIX 4.3.2, jfs, samba-3.0.14a worked perfectly
2. AIX 5.3, jfs2, samba-3.0.14a & samba-3.0.20pre2 have problem with Windows 98 computers
From samba email list:

Jeremy Allison wrote:
> On Wed, Aug 17, 2005 at 05:26:36PM -0500, Gerald (Jerry) Carter wrote:
>
>> Steve Williams wrote:
>>
>>> My "gut feeling" is that it is related to jfs2. No concrete proof though.
>>> This is the ONLY problem we encountered
>>> with the entire upgrade, and the only thing that we did
>>> "radically" different was use jfs2 rather than JFS. The advantage
>>> we saw was that JFS2 can "shrink" the filesystems, which can
>>> be nice in a year or two when requirements change.
>>>
>>> Did you do testing on AIX? I was not aware that I could get an "ext3" fs on AIX.
>>> If you are interested in pursuing
>>> this further, I will try to set things up to do some
>>> troubleshooting... I am remote to the location & will
>>> need to have someone work with me.. not a big deal, they
>>> have a good summer student... but does need some coordination.
>>
>> I spoke with Jeremy about it. He believes that it is a
>> problem with the way we implement resume keys now. Apparently
>> only win9x uses resume keys these days in the findfirst/findnext
>> sequence. WinNT and later uses resume by name.
>
> Although to confirm it I'd like to see a debug level 10 log
> of one of your clients "looping" with a directory listing
> against a 3.0.20 Samba server please.
>
> Jeremy.

Hi,

That's cool, I will try to get this for you tomorrow morning. How would you like me to get this to you?

Cheers,
Steve Williams
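For readers unfamiliar with the two resumption styles mentioned in the quoted mail, here is a minimal toy sketch in C. It is not Samba code: `resume_by_key` and `resume_by_name` are hypothetical helpers over an in-memory name list, and treating an unknown key or name as "end of listing" is an assumption made for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Resume by key: the client hands back an opaque cookie from the previous
 * batch (here simply an index). Returns the index of the next entry to
 * send, clamped to count so an out-of-range key means "nothing left".
 */
size_t resume_by_key(size_t key, size_t count)
{
    return key < count ? key : count;
}

/*
 * Resume by name: the client hands back the last name it received.
 * Returns the index just after that name, or count if the name is not
 * found (assumption: prefer ending the listing over restarting it).
 */
size_t resume_by_name(const char *const *names, size_t count, const char *last)
{
    assert(names != NULL && last != NULL);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(names[i], last) == 0)
            return i + 1;
    }
    return count;
}
```

In this toy model, a listing loops forever if the "nothing left" case is ever mapped back to index 0, which is the general shape of the symptom reported above.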
Created attachment 1384 [details] 3.0.14a OR 3.0.20 debug from PC looping I am not sure if this is a log file from 3.0.14a or from 3.0.20. To be honest, I am not even sure what level of log file it is! I was trying to troubleshoot the problem in a production environment. I include it here in case it will help. I will try to create a new debug level 10 ASAP
Further investigation has revealed the attached logfile is from 3.0.14a. I am not sure that it is an internal problem in Samba. I was running 3.0.14a on AIX 4.3.2 for several months prior to the upgrade with Windows 98 hosts accessing the system with no problem at all. I installed a new system with AIX 5.3 and a freshly compiled 3.0.14a and encountered the problem. If it was a problem with Samba internals, I would have thought that the problem would have arisen when I upgraded the original server from 2.0.7 to 3.0.14a.
OOPS.. I forgot to mention... I upgraded the server to 3.0.20 to find out if the problem went away. It did not. The 3.0.20 was compiled locally with IBM's C compiler.

./configure --prefix=/usr/local/samba-3.0.20

Looking at the configure and compile logfiles, everything seemed to work fine.
There is a known bug with 3.0.14a that can cause this. Please test the same thing immediately with 3.0.20 as I've done a *lot* of work in this area between the two releases. Thanks, Jeremy.
Created attachment 1386 [details] log level 10 of 3.0.20rc2 and Win98 DOS client This is running on AIX 5.2 ML-06 on a JFS (not JFS2) filesystem.
Jerry, I have an appointment August 18 at 15:00 Eastern Canada time (GMT-5??) to get a debug level 10 from my client's system. Just a FYI.. Cheers, Steve
Created attachment 1388 [details] ZIP file containing debug 10 with AIX 5.3 and Samba 3.0.20

map_drive.log - This is a Windows 98 PC "win98test" connecting to a samba share called "\\OSHAWA\EKG", and mapping it to a drive letter.

log.win98test.part1
log.win98test.part2

These are debug level 10 outputs from the "looping" problem.
Created attachment 1391 [details] lame patch...still digging on bad offset values.

OK, this patch is lame, but it points out the flaw. I had to modify some DEBUG statements to find it. SeekDir calls RewindDir when dptr->offset == END_OF_DIRECTORY_OFFSET. The real problem is either that reply_search doesn't update the values with dptr_fill properly, because of a scope problem in smbd/dir.c get_dir_entry(), or that AIX's telldir is broken. Since 3.0.11 works fine, I'm leaning toward the former.
I don't think this part of the patch is correct:

 void SeekDir(struct smb_Dir *dirp, long offset)
 {
+
+	if ( dirp->offset == END_OF_DIRECTORY_OFFSET )
+		return ;
+

Shouldn't this be

 void SeekDir(struct smb_Dir *dirp, long offset)
 {
+
+	if ( offset == END_OF_DIRECTORY_OFFSET )
+		return ;
+

instead?

Jeremy.
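To illustrate why the check belongs on the requested offset, here is a small self-contained model in C. It is only a sketch: `struct toy_dir`, `toy_read`, `toy_seek` and `toy_list_from` are hypothetical stand-ins for the real smbd/dir.c code, and the sentinel values are assumed rather than copied from Samba.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed sentinel values; the real definitions in Samba may differ. */
#define START_OF_DIRECTORY_OFFSET 0L
#define END_OF_DIRECTORY_OFFSET   (-1L)

/* Toy stand-in for struct smb_Dir: an in-memory list of names. */
struct toy_dir {
    const char **names;  /* directory entries */
    long count;          /* number of entries */
    long offset;         /* index of the next entry, or the END sentinel */
};

/* Return the next name, or NULL (and park at END) when none are left. */
const char *toy_read(struct toy_dir *d)
{
    assert(d != NULL);
    if (d->offset == END_OF_DIRECTORY_OFFSET || d->offset >= d->count) {
        d->offset = END_OF_DIRECTORY_OFFSET;
        return NULL;
    }
    return d->names[d->offset++];
}

/* Seek that tests the *requested* offset, not the current cursor. */
void toy_seek(struct toy_dir *d, long offset)
{
    assert(d != NULL);
    if (offset == END_OF_DIRECTORY_OFFSET) {
        /* Assumption: stay parked at the end instead of rewinding. */
        d->offset = END_OF_DIRECTORY_OFFSET;
        return;
    }
    d->offset = offset;
}

/* Resume a listing at `resume` and count how many entries follow. */
long toy_list_from(struct toy_dir *d, long resume)
{
    long n = 0;
    toy_seek(d, resume);
    while (toy_read(d) != NULL)
        n++;
    return n;
}
```

With this guard, a client that resumes from the end-of-directory value gets an empty continuation rather than the whole listing again.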
Created attachment 1392 [details] Proposed patch. Can you try this patch instead please. I think it may fix the problems with END_OF_DIRECTORY_OFFSET not being handled consistently. Thanks, Jeremy.
Reassigned to me - probably my bug. Jeremy.
Jeremy, What would you like me to try this patch against? In the interest of least change, I would be inclined to test it against 3.0.20pre2. However, I have downloaded & compiled 3.0.20 release. I'd need to put it into production, but that's not a big issue. What is your preference? Thanks, Steve
Try against 3.0.20 - that's what I've applied it to. If your analysis is correct on the mishandling of the END_OF_DIRECTORY "magic" value I'm hoping it'll work. You might want to try it on a non-production server first - especially if the looping behaviour is reproducible on demand. Jeremy
The primary problem is that AIX does not have a "DIR" abstraction between a "normal" directory entry (32 bit??) and 64 bit DIR entry. Instead, they chose to have a DIR, and a DIR64.

The assumption throughout configure and Samba was that "DIR" would always be the "correct" type. Well, on AIX it isn't. This was causing configure to do assorted "random" things, mixing 32 bit & 64 bit calls, thus hammering memory, or subsequent calls not finding what they were expecting.

There was a change to return properly at the end of a directory, as well as changes to configure.in to test for "telldir64", etc. Jeremy added an abstraction "SMB_STRUCT_DIR" which will always be either DIR, or DIR64 as appropriate.

To properly resolve this problem, the following SVN patches were made. These have been applied to the 3.0.20 tree and have resolved the problem. Most of the work was done by William Jojo to troubleshoot this problem.

svn_9456.patch
svn_9481.patch
svn_9484.patch
svn_9534.patch
svn_9536.patch
svn_9545.patch

Cheers,
Steve

This can be considered "RESOLVED".
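The kind of abstraction described above can be sketched roughly as follows. This is not the code from the SVN patches: `SKETCH_HAVE_DIR64` is a hypothetical macro standing in for whatever configure defines, the `SKETCH_*` names and `sketch_count_entries` are invented for the illustration, and the `*64` identifiers in the first branch are assumed from the DIR64 and telldir64 names mentioned in this thread.

```c
#include <assert.h>
#include <dirent.h>
#include <stddef.h>

/*
 * Pick one directory type and one matching set of calls, so the rest of
 * the code never mixes 32 bit and 64 bit variants.
 */
#if defined(SKETCH_HAVE_DIR64)
typedef DIR64 SMB_STRUCT_DIR;
typedef struct dirent64 SKETCH_DIRENT;
#define SKETCH_OPENDIR(p)  opendir64(p)
#define SKETCH_READDIR(d)  readdir64(d)
#define SKETCH_CLOSEDIR(d) closedir64(d)
#else
typedef DIR SMB_STRUCT_DIR;
typedef struct dirent SKETCH_DIRENT;
#define SKETCH_OPENDIR(p)  opendir(p)
#define SKETCH_READDIR(d)  readdir(d)
#define SKETCH_CLOSEDIR(d) closedir(d)
#endif

/*
 * Count the entries in a directory using only the abstracted names.
 * Returns -1 if the directory cannot be opened.
 */
long sketch_count_entries(const char *path)
{
    assert(path != NULL);

    SMB_STRUCT_DIR *dir = SKETCH_OPENDIR(path);
    if (dir == NULL)
        return -1;

    long count = 0;
    SKETCH_DIRENT *entry;
    while ((entry = SKETCH_READDIR(dir)) != NULL)
        count++;

    SKETCH_CLOSEDIR(dir);
    return count;
}
```

The point of the design is that callers only ever name `SMB_STRUCT_DIR`, so the choice between the two families is made once, at configure time, instead of at every call site.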
sorry for the same, cleaning up the database to prevent unnecessary reopens of bugs.
This PR is still not fully resolved. An endless loop over `.' and `..' entries still occurs if the directory only contains entries that the client does not want to see (e.g. entries that are invisible or don't match a requested pattern).

Suggested patch:

*** dir.c.trunk Mon Aug 29 08:38:02 2005
--- dir.c.fix Mon Aug 29 08:37:56 2005
***************
*** 1136,1142 ****
  void RewindDir(struct smb_Dir *dirp, long *poffset)
  {
  	SMB_VFS_REWINDDIR(dirp->conn, dirp->dir);
! 	dirp->file_number = 0;
  	dirp->offset = START_OF_DIRECTORY_OFFSET;
  	*poffset = START_OF_DIRECTORY_OFFSET;
  }
--- 1136,1143 ----
  void RewindDir(struct smb_Dir *dirp, long *poffset)
  {
  	SMB_VFS_REWINDDIR(dirp->conn, dirp->dir);
! 	if (*poffset != DOT_DOT_DIRECTORY_OFFSET)
! 		dirp->file_number = 0;
  	dirp->offset = START_OF_DIRECTORY_OFFSET;
  	*poffset = START_OF_DIRECTORY_OFFSET;
  }
This patch is not correct. Still examining the right way to fix this. Jeremy.
I've now reproduced this. Working on a final fix. Jeremy.
Created attachment 1459 [details] Fix going into 3.0.20a.
Hopefully the long saga of this bug is now at an end.... Please test and let me know. Thanks, Jeremy.
(In reply to comment #21) Yes, this fixes the problem, -thanks!