Bug 2828 - Call to SMB_VFS_SYMLINK succeeds, but later fails in transfer layer with errno 29
Summary: Call to SMB_VFS_SYMLINK succeeds, but later fails in transfer layer with errn...
Status: ASSIGNED
Alias: None
Product: Samba 3.0
Classification: Unclassified
Component: smbmount (unmaintained) (show other bugs)
Version: 3.0.14a
Hardware: All Linux
: P3 major
Target Milestone: none
Assignee: Samba Bugzilla Account
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2005-06-27 08:50 UTC by Jiri Klouda
Modified: 2005-06-28 13:37 UTC (History)
0 users

See Also:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description Jiri Klouda 2005-06-27 08:50:55 UTC
I have an opaque VFS layer, where symlink call is implemented. When I run it
through vfstest, everything is fine and works. If I run through samba mounted
share, the symlink succeeds, gets properly created, the function call corretly
returns 0, but then later samba freezes for some 10+ seconds and after a timeout
it will give Input/Output error (errno=29) to the 'ln' command.

VFS Output:

jk@zaya:~/src/dbvfs$ ./vfstest
Initialising default vfs hooks
vfstest $> debuglevel 10
debuglevel is 10
vfstest $> load dbvfs
Initialising custom vfs hooks from [dbvfs]
Probing module 'dbvfs'
Probing module 'dbvfs': Trying to load from /usr/local/samba/lib/vfs/dbvfs.so
Module '/usr/local/samba/lib/vfs/dbvfs.so' loaded
load: ok
vfstest $> connect
dbvfs_connect(netlogon,vfstest)
vfstest $> chdir opice/slepice/kopec
dbvfs_chdir(opice/slepice/kopec)
chdir: ok
vfstest $> symlink iglu opice/slepice/kopec/eskymo
dbvfs_symlink: start errno=0
dbvfs_symlink(iglu,opice/slepice/kopec/eskymo)
db_symlink(25,eskymo,41471,iglu)
dbvfs_symlink: done errno=0
symlink: ok
vfstest $> quit

SMBD Output:

is_mangled cupak ?
is_mangled_component cupak (len 5) ?
New file cupak
dbvfs_lstat(opice/slepice/kopec/cupak)
call_trans2qfilepathinfo: SMB_VFS_LSTAT of opice/slepice/kopec/cupak failed (No
such file or directory)
set_bad_path_error: err = 2 bad_path = 0
error string = No such file or directory
error packet at smbd/trans2.c(2197) cmd=50 (SMBtrans2) eclass=1 ecode=2
size=35
smb_com=0x32
smb_rcls=1
smb_reh=0
smb_err=2
smb_flg=136
smb_flg2=65
smb_tid=1
smb_pid=1
smb_uid=100
smb_mid=5
smt_wct=0
smb_bcc=0
write_socket(26,39)
write_socket(26,39) wrote 39
got smb length of 111
got message type 0x0 of len 0x6f
Transaction 9 of length 115
size=111
smb_com=0x32
smb_rcls=0
smb_reh=0
smb_err=0
smb_flg=8
smb_flg2=3
smb_tid=1
smb_pid=1
smb_uid=100
smb_mid=6
smt_wct=15
smb_vwv[ 0]=   33 (0x21)
smb_vwv[ 1]=    7 (0x7)
smb_vwv[ 2]=   64 (0x40)
smb_vwv[ 3]=16480 (0x4060)
smb_vwv[ 4]=    0 (0x0)
smb_vwv[ 5]=    0 (0x0)
smb_vwv[ 6]=    0 (0x0)
smb_vwv[ 7]=    0 (0x0)
smb_vwv[ 8]=    0 (0x0)
smb_vwv[ 9]=   33 (0x21)
smb_vwv[10]=   68 (0x44)
smb_vwv[11]=    7 (0x7)
smb_vwv[12]=  104 (0x68)
smb_vwv[13]=    1 (0x1)
smb_vwv[14]=    6 (0x6)
smb_bcc=46
[000] 00 00 00 01 02 00 00 00  00 5C 6F 70 69 63 65 5C  ........ .\opice\
[010] 73 6C 65 70 69 63 65 5C  6B 6F 70 65 63 5C 63 75  slepice\ kopec\cu
[020] 70 61 6B 00 00 00 00 68  6C 75 70 61 6B 00        pak....h lupak.
switch message SMBtrans2 (pid 15789) conn 0x838ea28
change_to_user: Skipping user change - already user
unix_convert called on file "opice/slepice/kopec/cupak"
stat_cache_lookup: lookup failed for name [OPICE/SLEPICE/KOPEC/CUPAK]
stat_cache_lookup: lookup succeeded for name [OPICE/SLEPICE/KOPEC] ->
[opice/slepice/kopec]
dbvfs_stat(opice/slepice/kopec)
dbvfs_stat(opice/slepice/kopec/cupak)
unix_convert begin: name = opice/slepice/kopec/cupak, dirpath =
opice/slepice/kopec, start = cupak
is_mangled cupak ?
is_mangled_component cupak (len 5) ?
dbvfs_stat(opice/slepice/kopec/cupak)
is_mangled cupak ?
is_mangled_component cupak (len 5) ?
dbvfs_opendir(opice/slepice/kopec)
dbvfs_readdir() == (25,.)
dbvfs_readdir() == (7,..)
dbvfs_readdir() == (26,nanuk)
dbvfs_readdir() == (27,za_kacku)
dbvfs_readdir() == (28,makacku)
dbvfs_readdir() == (29,hlusec)
dbvfs_readdir() == (31,eskymo)
dbvfs_closedir()
is_mangled cupak ?
is_mangled_component cupak (len 5) ?
New file cupak
call_trans2setfilepathinfo(6) opice/slepice/kopec/cupak (fnum -1) info_level=513
totdata=7
dos_mode: opice/slepice/kopec/cupak
call_trans2setfilepathinfo: SMB_SET_FILE_UNIX_LINK doing symlink
opice/slepice/kopec/cupak -> hlupak
dbvfs_symlink: start errno=2
dbvfs_symlink(hlupak,opice/slepice/kopec/cupak)
db_symlink(25,cupak,41471,hlupak)
dbvfs_symlink: done errno=2
t2_rep: params_sent_thistime = 2, data_sent_thistime = 0, useable_space = 65475
t2_rep: params_to_send = 2, data_to_send = 0, paramsize = 2, datasize = 0
write_socket(26,62)
write_socket(26,62) wrote 62
setting sec ctx (0, 0) - sec_ctx_stack_ndx = 0
NT user token: (NULL)
UNIX token of user 0
Primary group is 0 and contains 0 supplementary groups
change_to_root_user: now uid=(0,0) gid=(0,0)
Comment 1 Jeremy Allison 2005-06-27 12:44:30 UTC
What happens when you gdb this ? As smbd is one monolithic process with your vfs
it should be easy to attach a breakpoint to your symlink call and follow
execution from its return.
Jeremy.
Comment 2 Jiri Klouda 2005-06-27 20:07:46 UTC
I have even tried and recompiled the code with debug messages and run the smbd
in debugger, but I didn't get anything out of it. The problem does not seem to
be some error, but rather a wait in a response which makes the client to timeout
and call it an Input/Output error, while the server actually does what is
supposed to do correctly, just somehow fails to inform the client in timely
manner. The problem of running in debugger is that the client just timeouts very
fast and its not the same. 

I think it actually gets all the way to the select on the socket file descriptor
in the process.c (the main loop) and then waits and when the client timeouts it
gets a next message from it and that makes it go again. It somehow looks like
the client does not get a response or gets a wrong type of response.
Comment 3 Jiri Klouda 2005-06-28 11:14:08 UTC
The error is more serious than I have previously thought. It affects CHMOD as
well. And thanks to this it affects a mkdir command. Since mkdir does a mkdir
call, then a chmod call. The only reason why mkdir does not fail is that after
timeout it does a stat call, which goes through fine and so it suceeds, but with
the timeout delay. (Which makes mkdir terribly slow.)
Comment 4 Jeremy Allison 2005-06-28 11:31:36 UTC
If this problem doesn't occur without your vfs, but does occur with it, then
it's most likely a bug in your vfs. Without real debugging info there's not much
I can do to help here.
Jeremy.
Comment 5 Jiri Klouda 2005-06-28 11:38:00 UTC
I have some more details. I have tried to mount a normal non-VFS share and try
the chmod. It looks like the problem is there too. It will timeout and give me
Input/Output error. The test case is very easy, just create a share for one of
your directories on linux, mount it with mount -t smbfs ... and then try to do
chmod through samba. It will timeout and fail.
Comment 6 Jiri Klouda 2005-06-28 11:41:15 UTC
The symlink problem is there too on non-VFS share. Its the same thing. I
seriously think its some problem in the call_trans2setfilepathinfo function that
handles both calls. Alternatively there could be a bug in the receiving end in
the smb client.
Comment 7 Jeremy Allison 2005-06-28 11:59:14 UTC
Can you send me the ethereal capture trace. It's almost certainly an smbfs
client bug - it's not very well maintained whereas that part of the Samba server
code is very well tested.
Can you get smbclient to generate the same commands ? If so you will easily be
able to confirm if it's a smbfs or smbd bug.

The other thing you can do is test with the new cifsfs linux client to see if
you can reproduce there.
Jeremy.
Comment 8 Jiri Klouda 2005-06-28 12:14:06 UTC
Ok, so there is one more information bit, I have tried to connect from my other
linux box with libsmbclient version 3.0.10 and everything went fine against the
same 3.0.14a server. Including the VFS shares and non-VFS shares.

What more, I have also tried the 3.0.14a client against an unmodified 3.0.10
server and it failed in the same way. (timeout, I/O error)

So its definitely a newly introduced bug between 3.0.10 and 3.0.14a in the
client handling of the reply from a TRANSACT2_SETFILEINFO /
TRANSACT2_SETPATHINFO call. 

If you look in the diffs for the libsmb/ between 3.0.10 and 3.0.14a you will
surely find some new change that can cause this.
Comment 9 Jeremy Allison 2005-06-28 12:28:41 UTC
As I have already asked - can you reproduce this with smbclient ? Not
libsmbclient. Can you send me an ethereal capture trace please. Or am I confused
and you're now telling me this is a bug in libsmbclient, not in the smbd server ?
If so I'll assign it to Darrel, who is the libsmbclient maintainer.
Jeremy.
Comment 10 Jiri Klouda 2005-06-28 12:52:21 UTC
(In reply to comment #9)
> As I have already asked - can you reproduce this with smbclient ? Not
> libsmbclient. Can you send me an ethereal capture trace please. Or am I confused
> and you're now telling me this is a bug in libsmbclient, not in the smbd server ?
> If so I'll assign it to Darrel, who is the libsmbclient maintainer.
> Jeremy.
> 

Yes, its very likely a bug in the client. When I tried to connect with the
client to a different server, there was still this problem. But when I try to
connect with 3.0.10-1-fc2 to this server, everything works. 

The interesting part is that if I try to downgrade debian packages to 3.0.10-1,
its still broken. I tried to downgrade both smbfs and libsmbclient and it didn't
help. So this is strange.

It looks like the client is at fault though.

Also, I cannot reproduce this with smbclient code, I can only reproduce it when
I mount the volume using mount -t smbfs.

Comment 11 Jiri Klouda 2005-06-28 12:57:01 UTC
BTW I do not ignore the request for the ethereal output, but I don't have it
installed yet and I don't know how to use it, so give me time to learn :)