I have an opaque VFS layer, where symlink call is implemented. When I run it through vfstest, everything is fine and works. If I run through samba mounted share, the symlink succeeds, gets properly created, the function call corretly returns 0, but then later samba freezes for some 10+ seconds and after a timeout it will give Input/Output error (errno=29) to the 'ln' command. VFS Output: jk@zaya:~/src/dbvfs$ ./vfstest Initialising default vfs hooks vfstest $> debuglevel 10 debuglevel is 10 vfstest $> load dbvfs Initialising custom vfs hooks from [dbvfs] Probing module 'dbvfs' Probing module 'dbvfs': Trying to load from /usr/local/samba/lib/vfs/dbvfs.so Module '/usr/local/samba/lib/vfs/dbvfs.so' loaded load: ok vfstest $> connect dbvfs_connect(netlogon,vfstest) vfstest $> chdir opice/slepice/kopec dbvfs_chdir(opice/slepice/kopec) chdir: ok vfstest $> symlink iglu opice/slepice/kopec/eskymo dbvfs_symlink: start errno=0 dbvfs_symlink(iglu,opice/slepice/kopec/eskymo) db_symlink(25,eskymo,41471,iglu) dbvfs_symlink: done errno=0 symlink: ok vfstest $> quit SMBD Output: is_mangled cupak ? is_mangled_component cupak (len 5) ? New file cupak dbvfs_lstat(opice/slepice/kopec/cupak) call_trans2qfilepathinfo: SMB_VFS_LSTAT of opice/slepice/kopec/cupak failed (No such file or directory) set_bad_path_error: err = 2 bad_path = 0 error string = No such file or directory error packet at smbd/trans2.c(2197) cmd=50 (SMBtrans2) eclass=1 ecode=2 size=35 smb_com=0x32 smb_rcls=1 smb_reh=0 smb_err=2 smb_flg=136 smb_flg2=65 smb_tid=1 smb_pid=1 smb_uid=100 smb_mid=5 smt_wct=0 smb_bcc=0 write_socket(26,39) write_socket(26,39) wrote 39 got smb length of 111 got message type 0x0 of len 0x6f Transaction 9 of length 115 size=111 smb_com=0x32 smb_rcls=0 smb_reh=0 smb_err=0 smb_flg=8 smb_flg2=3 smb_tid=1 smb_pid=1 smb_uid=100 smb_mid=6 smt_wct=15 smb_vwv[ 0]= 33 (0x21) smb_vwv[ 1]= 7 (0x7) smb_vwv[ 2]= 64 (0x40) smb_vwv[ 3]=16480 (0x4060) smb_vwv[ 4]= 0 (0x0) smb_vwv[ 5]= 0 (0x0) smb_vwv[ 6]= 0 (0x0) smb_vwv[ 7]= 0 (0x0) smb_vwv[ 8]= 0 (0x0) smb_vwv[ 9]= 33 (0x21) smb_vwv[10]= 68 (0x44) smb_vwv[11]= 7 (0x7) smb_vwv[12]= 104 (0x68) smb_vwv[13]= 1 (0x1) smb_vwv[14]= 6 (0x6) smb_bcc=46 [000] 00 00 00 01 02 00 00 00 00 5C 6F 70 69 63 65 5C ........ .\opice\ [010] 73 6C 65 70 69 63 65 5C 6B 6F 70 65 63 5C 63 75 slepice\ kopec\cu [020] 70 61 6B 00 00 00 00 68 6C 75 70 61 6B 00 pak....h lupak. switch message SMBtrans2 (pid 15789) conn 0x838ea28 change_to_user: Skipping user change - already user unix_convert called on file "opice/slepice/kopec/cupak" stat_cache_lookup: lookup failed for name [OPICE/SLEPICE/KOPEC/CUPAK] stat_cache_lookup: lookup succeeded for name [OPICE/SLEPICE/KOPEC] -> [opice/slepice/kopec] dbvfs_stat(opice/slepice/kopec) dbvfs_stat(opice/slepice/kopec/cupak) unix_convert begin: name = opice/slepice/kopec/cupak, dirpath = opice/slepice/kopec, start = cupak is_mangled cupak ? is_mangled_component cupak (len 5) ? dbvfs_stat(opice/slepice/kopec/cupak) is_mangled cupak ? is_mangled_component cupak (len 5) ? dbvfs_opendir(opice/slepice/kopec) dbvfs_readdir() == (25,.) dbvfs_readdir() == (7,..) dbvfs_readdir() == (26,nanuk) dbvfs_readdir() == (27,za_kacku) dbvfs_readdir() == (28,makacku) dbvfs_readdir() == (29,hlusec) dbvfs_readdir() == (31,eskymo) dbvfs_closedir() is_mangled cupak ? is_mangled_component cupak (len 5) ? New file cupak call_trans2setfilepathinfo(6) opice/slepice/kopec/cupak (fnum -1) info_level=513 totdata=7 dos_mode: opice/slepice/kopec/cupak call_trans2setfilepathinfo: SMB_SET_FILE_UNIX_LINK doing symlink opice/slepice/kopec/cupak -> hlupak dbvfs_symlink: start errno=2 dbvfs_symlink(hlupak,opice/slepice/kopec/cupak) db_symlink(25,cupak,41471,hlupak) dbvfs_symlink: done errno=2 t2_rep: params_sent_thistime = 2, data_sent_thistime = 0, useable_space = 65475 t2_rep: params_to_send = 2, data_to_send = 0, paramsize = 2, datasize = 0 write_socket(26,62) write_socket(26,62) wrote 62 setting sec ctx (0, 0) - sec_ctx_stack_ndx = 0 NT user token: (NULL) UNIX token of user 0 Primary group is 0 and contains 0 supplementary groups change_to_root_user: now uid=(0,0) gid=(0,0)
What happens when you gdb this ? As smbd is one monolithic process with your vfs it should be easy to attach a breakpoint to your symlink call and follow execution from its return. Jeremy.
I have even tried and recompiled the code with debug messages and run the smbd in debugger, but I didn't get anything out of it. The problem does not seem to be some error, but rather a wait in a response which makes the client to timeout and call it an Input/Output error, while the server actually does what is supposed to do correctly, just somehow fails to inform the client in timely manner. The problem of running in debugger is that the client just timeouts very fast and its not the same. I think it actually gets all the way to the select on the socket file descriptor in the process.c (the main loop) and then waits and when the client timeouts it gets a next message from it and that makes it go again. It somehow looks like the client does not get a response or gets a wrong type of response.
The error is more serious than I have previously thought. It affects CHMOD as well. And thanks to this it affects a mkdir command. Since mkdir does a mkdir call, then a chmod call. The only reason why mkdir does not fail is that after timeout it does a stat call, which goes through fine and so it suceeds, but with the timeout delay. (Which makes mkdir terribly slow.)
If this problem doesn't occur without your vfs, but does occur with it, then it's most likely a bug in your vfs. Without real debugging info there's not much I can do to help here. Jeremy.
I have some more details. I have tried to mount a normal non-VFS share and try the chmod. It looks like the problem is there too. It will timeout and give me Input/Output error. The test case is very easy, just create a share for one of your directories on linux, mount it with mount -t smbfs ... and then try to do chmod through samba. It will timeout and fail.
The symlink problem is there too on non-VFS share. Its the same thing. I seriously think its some problem in the call_trans2setfilepathinfo function that handles both calls. Alternatively there could be a bug in the receiving end in the smb client.
Can you send me the ethereal capture trace. It's almost certainly an smbfs client bug - it's not very well maintained whereas that part of the Samba server code is very well tested. Can you get smbclient to generate the same commands ? If so you will easily be able to confirm if it's a smbfs or smbd bug. The other thing you can do is test with the new cifsfs linux client to see if you can reproduce there. Jeremy.
Ok, so there is one more information bit, I have tried to connect from my other linux box with libsmbclient version 3.0.10 and everything went fine against the same 3.0.14a server. Including the VFS shares and non-VFS shares. What more, I have also tried the 3.0.14a client against an unmodified 3.0.10 server and it failed in the same way. (timeout, I/O error) So its definitely a newly introduced bug between 3.0.10 and 3.0.14a in the client handling of the reply from a TRANSACT2_SETFILEINFO / TRANSACT2_SETPATHINFO call. If you look in the diffs for the libsmb/ between 3.0.10 and 3.0.14a you will surely find some new change that can cause this.
As I have already asked - can you reproduce this with smbclient ? Not libsmbclient. Can you send me an ethereal capture trace please. Or am I confused and you're now telling me this is a bug in libsmbclient, not in the smbd server ? If so I'll assign it to Darrel, who is the libsmbclient maintainer. Jeremy.
(In reply to comment #9) > As I have already asked - can you reproduce this with smbclient ? Not > libsmbclient. Can you send me an ethereal capture trace please. Or am I confused > and you're now telling me this is a bug in libsmbclient, not in the smbd server ? > If so I'll assign it to Darrel, who is the libsmbclient maintainer. > Jeremy. > Yes, its very likely a bug in the client. When I tried to connect with the client to a different server, there was still this problem. But when I try to connect with 3.0.10-1-fc2 to this server, everything works. The interesting part is that if I try to downgrade debian packages to 3.0.10-1, its still broken. I tried to downgrade both smbfs and libsmbclient and it didn't help. So this is strange. It looks like the client is at fault though. Also, I cannot reproduce this with smbclient code, I can only reproduce it when I mount the volume using mount -t smbfs.
BTW I do not ignore the request for the ethereal output, but I don't have it installed yet and I don't know how to use it, so give me time to learn :)