In samba 3.0.28 a read of x bytes from a 0 byte file was passed to the file system driver (our custom driver) and this allowed us to do clever things, like returning data based on reading entity pid etc. In 3.2.3 it seems that the number of bytes to read is set to 0 after an fstat shows the file size to be 0 bytes. I can see how this 'optimisation' might be kinda nice, but I don't think it's saving a lot in this case and it's rather inconvenient for us to say the least. Plus, it means that a read via samba is not the same as a direct read, and it really should be, as much as possible.. I thought that was the point.
We thought to solve our trouble by making our 0 byte file a device node ala /dev/random. It seems that samba does not treat a device node, like /dev/random any differently to a 0 byte file. Consequently nothing can be read from a device file via samba. Surely this is a bug? We have solved our trouble by reporting a file size of 1 block, in our case 4096 bytes and simple returning a read of 0 bytes when there is nothing there for the requesting process.
Can we consider this fixed now? Volker
No, it's not 'fixed'. We've avoided the problem. I guess whether it's a bug is up to you, is it a bug that it is impossible to read from a unix device node/file via samba? FYI.. The code in Q that causes the problem is here: [source/smbd/reply.c] static void send_file_readX(connection_struct *conn, struct smb_request *req, files_struct *fsp, SMB_OFF_T startpos, size_t smb_maxcnt) { SMB_STRUCT_STAT sbuf; ssize_t nread = -1; if(SMB_VFS_FSTAT(fsp, &sbuf) == -1) { reply_unixerror(req, ERRDOS, ERRnoaccess); return; } if (startpos > sbuf.st_size) { smb_maxcnt = 0; } else if (smb_maxcnt > (sbuf.st_size - startpos)) { smb_maxcnt = (sbuf.st_size - startpos); <<-- this line } ... nread = read_file(fsp, smb_buf(req->outbuf), startpos, smb_maxcnt); sbuf.st_size is 0 for device files, as it was for our special file also. So, when it comes to the read_file call it has set smb_maxcnt to 0. In read_file we have: [source/smbd/fileio.c] ssize_t read_file(files_struct *fsp,char *data,SMB_OFF_T pos,size_t n) { ssize_t ret=0,readret; ... if (n > 0) { ... readret = SMB_VFS_PREAD(fsp,data,n,pos); So, the actual OS read function is only called if more than 0 bytes have been requested. So, if you attempt to read N bytes from a device file or any file which has a size of 0 bytes you don't actually perform any read at all. Meaning you cannot read from /dev/random or any other device file.
How would you like to see it fixed? Just remove the "if (n > 0)" ? Would you mind to provide a patch for review? Volker
I don't know how to provide a patch (I'm primarily a windows developer and haven't ever used patches), sorry. Removing the "if (n > 0)" is not enough. This would ensure a request was made to the OS to read /but/ the number of bytes it tried to read would be 0, as opposed to what was actually requested by the remote end. I think this code at the start of send_file_readX should be removed: SMB_STRUCT_STAT sbuf; if(SMB_VFS_FSTAT(fsp, &sbuf) == -1) { reply_unixerror(req, ERRDOS, ERRnoaccess); return; } if (startpos > sbuf.st_size) { smb_maxcnt = 0; } else if (smb_maxcnt > (sbuf.st_size - startpos)) { smb_maxcnt = (sbuf.st_size - startpos); } if (smb_maxcnt == 0) { goto normal_read; } The OS is already going to check all these things on read() and give appropriate errors, there is no need to pre-empt it.
Ok, this logic is done to make sure that sendfile() has enough data inside the file to work with. sendfile() needs to do a zero-copy from the kernel buffer cache to the network buffers, and we'll potentially corrupt the file copy on the client if we don't do the check in userspace. Remember, we have to prepare the return SMB header for the client in userspace, before we make the kernel call, so we can't just let the kernel do this. There is a race condition that if another process truncated the file after we've prepared the header, then we could end up returning zero-filled data after the sendfile() returns short, but oplocks make this less likely, plus this is also what Windows does in these situations (we have done tests for this :-). What seems to me to be a better fix is to always use the "slow" (non-sendfile) path without modifying the read size if the file size == 0, or if the file is not a regular file. Would that work for you ? Jeremy.
Yes, I believe that solution will work for situations like ours and sounds generally 'correct' to me. After all, you can't use the sendfile path for /dev/random and other non-regular files (you cannot know how much you will read before you read it) and using the "slow" path for normal 0 length files shouldn't be all that slow. I wasn't sure what the sendfile code path was even for :) In our case it seemed always to be taking the "slow" path and avoiding the sendfile code path. Thanks for the explaination :)
Created attachment 4097 [details] Patch for 3.2 and 3.3 Here's what I'm proposing for 3.2.x and 3.3.x. Jeremy.
Created attachment 4098 [details] Patch for 3.4.x and master. Equivalent patch for 3.4.x and master. Please let me know if this fixes your issue. Thanks, Jeremy.
Created attachment 4101 [details] different patch Alternative patch proposal. Jeremy I don't think your patch catches the case for normal files of size 0. If I got the bug reporter right, then /dev/random was just an example. Volker
Ok, I'm missing what I missed here :-). My patch just ensured we did exactly the smb_maxcnt manipulation we're currently doing but only in the case of S_IFREG (a normal file). Doesn't that include the case with normal files of size zero ? For non-normal files I go to the slow read path which always makes the syscall. Volker, let's discuss on IRC or on the phone on Monday. Cheers, Jeremy.
Ah, now I see the difference :-). The only thing I think you missed is that /dev/random is a device character special file, not a normal file so S_ISREG on /dev/random returns false. Your patch causes the read syscall to be send for normal files of size zero, which is what we used to do. Would eliminate race conditions after we've done the size check.... is this what we want to go back to ? Hmmmm. Probably :-). Let's discuss. Jeremy.
In our original situation the file was a 'normal' file of size zero. (We have a custom kernel driver catching disk access read/write to this file and in specific cases it supplies data to a read request) What we tried when we hit this problem was first making the file a device file (same results) and then after I looked at the samba code decided to advertise a file size (1 block, 4096 bytes) but return short reads in cases where there was less/no data. This last solution works, and we'll likely keep using it to avoid any future problems. As far as I can see Volker's is the patch which would work for our orignal case. I generally like the idea of passing all 'short' reads to the system, it will eliminate the race condition between the file size check and the read and that's probably a good thing. But, am I understanding the patch correctly, will it send the last 'short' read of /every/ file (see "(startpos > sbuf.st_size)") via the non 'sendfile' path? Is that desirable?
Not sure it is desirable, but it does not hurt either. If we expect a short read, the potential benefit from sendfile is small enough to not matter IMO. And IMO the code is simpler this way. Less special cases :-) Volker
Yes, +1 on Volker's patch. Volker, do you want to put this into 3.4, or shall I ? But a "normal" file of size zero actually being a special device file is *evil*. There's a special place in hell reserved for kernel hacks that do that :-). Jeremy.
Yeah, using a 'normal' file was *evil*. Using a device file is actually the "correct" solution, provided samba plays along :)
Pushed this patch to master and 3.4. Not sure if you want this for 3.3 or not. Jeremy.
We hit the problem originally in 3.2.3. I know 3.0.28 didn't have the issue. So it was introduced somewhere between the two but I am not sure when. As we're providing our software and driver for customers and cannot be certain which version of samba they will be running we cannot rely on this fix/patch being present, even if you patched every version back to when the fstat check was added. So, we're going to have to pretend our special file has a fixed size in the meantime and from that point of view we don't mind which versions you patch :) It would be nice if there was a patch for all the versions with the problem, especially if we later discover a problem with advertising a file size, but at this stage we're happy.
Closing as fixed, will be in 3.4