Bug 9049 - access() response wrong for nocase mount / broken inode lookup cache.
access() response wrong for nocase mount / broken inode lookup cache.
Status: ASSIGNED
Product: CifsVFS
Classification: Unclassified
Component: kernel fs
2.6
x86 Linux
: P5 major
: ---
Assigned To: Jeff Layton
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2012-07-18 11:18 UTC by Jan-Marek Glogowski
Modified: 2013-08-21 14:10 UTC (History)
4 users (show)

See Also:


Attachments
Minimal Python bug script. (536 bytes, text/plain)
2012-07-18 11:18 UTC, Jan-Marek Glogowski
no flags Details
Disable the inode lookup cache for caseless mounts. (1.53 KB, patch)
2012-07-18 12:15 UTC, Jan-Marek Glogowski
no flags Details
Sorce of a minimal UTF-8 case fold tool (case up, down, fold). (57.69 KB, application/x-xz)
2013-07-22 21:59 UTC, Jan-Marek Glogowski
no flags Details
Folder renaming test script. (654 bytes, text/x-python)
2013-08-14 17:06 UTC, Jan-Marek Glogowski
no flags Details
win_ucase_convert.pl -- script to convert MS table to set of arrays in C (1.49 KB, patch)
2013-08-20 13:31 UTC, Jeff Layton
no flags Details
Fixed python UTF-8 uppercase / lowercase handling. (590 bytes, text/x-python)
2013-08-20 17:49 UTC, Jan-Marek Glogowski
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Jan-Marek Glogowski 2012-07-18 11:18:17 UTC
Created attachment 7708 [details]
Minimal Python bug script.

I'm on Ubuntu LTS 10.04 with linux-lts-backport-oneiric (3.0.0-16.29~lucid1), which is a Ubuntu patched 3.0.19.

I've also tried the latest v3.5-rc7 kernel[1] on Ubuntu LTS 12.04.

My original problem is a KDE folder renaming problem on a case-insensitive CIFS share from a NetApp filer. The user can eventually delete the whole folder when renaming the folder by just changing the case of the folder name.

The tests were also run against a Samba (Debian 2:3.6.5-1~bpo60+1).

The problem vanishes, if the inode lookup cache is disabled:

echo "0" > /proc/fs/cifs/LookupCacheEnabled

When running the appended Python script against a nocase mounted share, the access system call returns 0 for the removed directory.

strace -e trace=access,mkdir,rmdir,lstat64 ./cifs-nocase-inode-cache-bug.py /mnt/share11

mkdir("/mnt/share11/HELP", 0777)        = 0
lstat64("/mnt/share11/help", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0
access("/mnt/share11/HELP", F_OK)       = 0
rmdir("/mnt/share11/HELP")              = 0
access("/mnt/share11/HELP", F_OK)       = -1 ENOENT (No such file or directory)
access("/mnt/share11/help", F_OK)       = 0

I've had a look at inode.c's rmdir, which just invalides the current inode and doesn't check for the "other-case" inodes.

And I've noticed that lstat returns different inodes for the "same" directory in nocase mode with a different "case" names.



[1] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.5-rc7-quantal
Comment 1 Jan-Marek Glogowski 2012-07-18 12:15:58 UTC
Created attachment 7709 [details]
Disable the inode lookup cache for caseless mounts.

This patch is a minimal workaround. Workaround, because functions like cifs_rmdir should invalidate all matching inodes in the nocase mode. Minimal, because the README seems to be wrong, stating "nocase ... case sensitive is the default if the server supports it). I couldn't find any code assigning nocase, except for the mount option code.
Comment 2 Jan-Marek Glogowski 2012-07-18 13:23:05 UTC
After some more searching I couldn't find a way for a server to indicate its case sensitivity reliably. I found the MSDN "2.2.3.1 The SMB Header" section, which says for the SMB_FLAGS_CASE_INSENSITIVE flag:

Obsolete; if this bit is set then all pathnames in the SMB SHOULD be treated as case-insensitive. This bit is ignored by Windows systems, which always handle pathnames as case-insensitive.

And actually a Win2k8 NEGOTIATION response has this bit set to 0. OTOH my NetApp filer always responds with this bit set, Samba without.
Comment 3 Jan-Marek Glogowski 2012-07-18 13:30:05 UTC
Just to make it clear: the share isn't mounted with the nocase option.
Comment 4 Steve French 2013-07-16 20:50:00 UTC
Is the obvious workaround - adding "actime=0" on mount not sufficient?  The app should obviously be doing a rename not the manual delete and then mkdir, but this is hard to workaround in the client because I don't see any clues that we can use to invalidate this dentry for the narrow case.

What happens on FAT32 in this case?
Comment 5 Steve French 2013-07-16 21:07:45 UTC
(In reply to comment #4)
> Is the obvious workaround - adding "actime=0" on mount not sufficient?  The app
> should obviously be doing a rename not the manual delete and then mkdir, but
> this is hard to workaround in the client because I don't see any clues that we
> can use to invalidate this dentry for the narrow case.
> 
> What happens on FAT32 in this case?

I missed the comment about above disabling the LookupCache (globally, rather than on per-mount).  So obviously you are aware of this.
Comment 6 Steve French 2013-07-16 22:07:04 UTC
In looking at this problem - looks like Samba server has a bug.

I did a mkdir of /mnt/lower

and then did a mkdir of /mnt/LOWER which failed

QueryPathInfo with the case sensitive flag in the header on returns success on LOWER even though lower doesn't exits (this is with unix extensions configured).  I tried this to fairly recent 4.1 and to 3.6 - both look broken - seems weird.
Comment 7 Steve French 2013-07-16 22:07:33 UTC
The Samba server bug is not helpful for the NetApp case - but it is strange.
Comment 8 Björn Jacke 2013-07-18 13:48:12 UTC
(In reply to comment #6)
> In looking at this problem - looks like Samba server has a bug.
> 
> I did a mkdir of /mnt/lower
> 
> and then did a mkdir of /mnt/LOWER which failed
> 
> QueryPathInfo with the case sensitive flag in the header on returns success on
> LOWER even though lower doesn't exits (this is with unix extensions
> configured).  I tried this to fairly recent 4.1 and to 3.6 - both look broken -
> seems weird.

smbclient again a samba server with unix extensions enabled, then enter "case_sensitive" then "posix_mkdir zoo 755" then "posix_mkdir ZOO 755"

... creates "zoo" and "ZOO" as expected. What is the difference to what you did?
Comment 9 Jeff Layton 2013-07-22 15:01:34 UTC
Disabling the lookupcache on a caseless mount seems like you're just papering over the real bug. The real question is this:

"Why are we seeing multiple dentries for names that differ only by case on case-insensitive mounts?"

I suspect that cifs.ko is not d_add'ing the dentries correctly in this situation, but I've not looked at the problem in detail and can't suggest any fix at this point.
Comment 10 Jeff Layton 2013-07-22 15:54:47 UTC
Ok, mounted up a share with -o nocase and turned up cifsFYI. I then did this:

    $ mkdir /mnt/cifs/zoo
    $ mkdir /mnt/cifs/ZOO
    mkdir: cannot create directory ‘/mnt/cifs/ZOO’: File exists

...while running with cifsFYI.

[  701.613970] fs/cifs/dir.c: CIFS VFS: in cifs_lookup as Xid: 32 with uid: 0
[  701.613976] fs/cifs/dir.c: parent inode = 0xffff8800598580f8 name is: zoo and dentry = 0xffff8800598d1e10

[  703.330669] fs/cifs/dir.c: CIFS VFS: in cifs_lookup as Xid: 34 with uid: 0
[  703.330681] fs/cifs/dir.c: parent inode = 0xffff8800598580f8 name is: ZOO and dentry = 0xffff8800598d1280

...so it looks like the VFS is handing cifs two different dentries, when it should be handing the same one down, AFAICT.
Comment 11 Jeff Layton 2013-07-22 17:29:50 UTC
Oof, ok -- I think I see the problem now but it's not trivial to fix correctly in all cases:

static struct nls_table table = {
        .charset        = "utf8",
        .uni2char       = uni2char,
        .char2uni       = char2uni,
        .charset2lower  = identity,     /* no conversion */
        .charset2upper  = identity,
        .owner          = THIS_MODULE,
};

...when you're using utf8 as a character set (and pretty much everyone is these days), it has no concept of upper and lower case characters. I think there is some support in the spec for paired characters, but building such a table is going to be quite an undertaking.

What we probably could do in the near term is simply hack in support for charset2upper/lower when the character in question is in the ASCII range. It'd be ugly and limited, but it would work for ASCII-only names at least.
Comment 12 Jan-Marek Glogowski 2013-07-22 21:57:08 UTC
glib has a well tested UTF-8 implementation. Most of the tables are generated. As a test I ripped enough code from the source to get a minimal use case, which includes the functions g_utf8_strup, g_utf8_strdown and g_utf8_casefold.

The resulting binary just has minimal library dependencies:

__printf_chk
g_free
g_malloc
g_return_if_fail_warning
g_string_append
g_string_append_unichar
g_string_free
g_string_new
memcpy
setlocale
strlen

Most of the binary size (170K) is in rodata (156K) section for the unicode tables.

OTOH it would probably be better to handle the conversation by a userspace tool, just like the DNS resolver (not sure about the speed impact). And I'm not sure the kernel knows of the userspace locale...
Comment 13 Jan-Marek Glogowski 2013-07-22 21:59:32 UTC
Created attachment 9068 [details]
Sorce of a minimal UTF-8 case fold tool (case up, down, fold).
Comment 14 Jeff Layton 2013-07-23 00:59:53 UTC
(In reply to comment #12)

> 
> OTOH it would probably be better to handle the conversation by a userspace
> tool, just like the DNS resolver (not sure about the speed impact). And I'm not
> sure the kernel knows of the userspace locale...
>

You realize that's mean we'd have to upcall on *every* lookup, even ones that come out of the dentry cache? That would absolutely trash any sort of metadata performance. I don't think we want to do that.

The right answer is probably to fix the NLS subsystem such that utf8 can handle case conversion, at least for ASCII. Later we can consider expanding that into other multibyte characters.

A non-trivial bit of work, but I'll see if I can hack together a patchset in the near future.
Comment 15 Jan-Marek Glogowski 2013-07-23 11:09:42 UTC
> https://bugzilla.samba.org/show_bug.cgi?id=9049
> 
> --- Comment #14 from Jeff Layton <jlayton@samba.org> 2013-07-23 00:59:53 UTC ---
> (In reply to comment #12)
> 
>>
>> OTOH it would probably be better to handle the conversation by a userspace
>> tool, just like the DNS resolver (not sure about the speed impact). And I'm not
>> sure the kernel knows of the userspace locale...
>>
> 
> You realize that's mean we'd have to upcall on *every* lookup, even ones that
> come out of the dentry cache? That would absolutely trash any sort of metadata
> performance. I don't think we want to do that.

That was my impression too. I just wanted to get feedback on the idea, because I don't have much kernel knowledge.

> The right answer is probably to fix the NLS subsystem such that utf8 can handle
> case conversion, at least for ASCII. Later we can consider expanding that into
> other multibyte characters.
> 
> A non-trivial bit of work, but I'll see if I can hack together a patchset in
> the near future.

As the current NLS interface just supports changing the case byte by byte with a lookup table it's not sufficient for any unicode case change. I can probably strip down the glib code to compile into a kernel module until Friday, but I won't have time in the next two weeks to finish the work and propose a change to the NLS interface.

At least there aren't many NLS users. grep "nls_\(strnicmp\|toupper\|tolower\)" just showed 10 hits outside of the NLS dir and the include header.
Comment 16 Jeff Layton 2013-07-23 13:49:34 UTC
(In reply to comment #15)
> 
> That was my impression too. I just wanted to get feedback on the idea, because
> I don't have much kernel knowledge.
> 

It's actually even worse. Since we call d_hash on each dentry, then you'd have to upcall on each path component.

> 
> As the current NLS interface just supports changing the case byte by byte with
> a lookup table it's not sufficient for any unicode case change. I can probably
> strip down the glib code to compile into a kernel module until Friday, but I
> won't have time in the next two weeks to finish the work and propose a change
> to the NLS interface.
> 
> At least there aren't many NLS users. grep "nls_\(strnicmp\|toupper\|tolower\)"
> just showed 10 hits outside of the NLS dir and the include header.

Yep. I think the right approach is to add a few new operations vectors to struct nls_table. Something like this maybe:

    int (*char2lower)(const unsigned char *out, const unsigned char *in);
    int (*char2upper)(const unsigned char *out, const unsigned char *in);

...that would convert "in" to "out" in lower and uppercase, and return the length in bytes of the converted character. This of course assumes that
the converted characters are always the same length as the original. If that's not the case then we'd need to modify the API some.

Then, we'd hook up nls_toupper and nls_tolower to call this operation if it exists. If it doesn't then fall back to the old table conversion routine. We'd probably also need to fix up some other callers that reach inside the nls_table directly too.

We'd probably also need something like a strnicmp function too and hook nls_strnicmp to that as well, since you really don't want to be doing byte-by-byte string comparison in utf8 and trying to convert the case.

The utf8 converters could just do an "isascii()" check and then use the tolower and toupper converters if that comes back true. If not, it'll just copy the character as-is. That also allows us to get rid of the silly "identity" table in that code.

We might also be able to get clever with the 8559-1 chars that are > 0x80. Do a mathematical conversion into that charset for the appropriate range of characters, and then call into the nls_iso8859-1.ko to do the actual conversion...or maybe just copy the appropriate tables into utf8 handling code.

Eventually, we could expand the case conversion into other charsets once we determine the proper mappings between them and their utf8 equivalents.
Comment 17 Jeff Layton 2013-07-23 14:12:06 UTC
Of course, the catch here is that this makes assumptions about what sort of case-equivalence the server has.

It's probably reasonable to assume that most servers will handle ASCII letters correctly. Is that true for stuff in the Latin-1 block? What about Greek?

Getting all of those special cases right will probably take time and an iterative approach, so it seems reasonable to me to start with the ones we know we can easily handle and build on that over time.
Comment 18 Jan-Marek Glogowski 2013-07-23 16:54:00 UTC
(In reply to comment #16)
> (In reply to comment #15)
> > 
> > That was my impression too. I just wanted to get feedback on the idea, because
> > I don't have much kernel knowledge.
> > 
> 
> It's actually even worse. Since we call d_hash on each dentry, then you'd have
> to upcall on each path component.
> 
> > 
> > As the current NLS interface just supports changing the case byte by byte with
> > a lookup table it's not sufficient for any unicode case change. I can probably
> > strip down the glib code to compile into a kernel module until Friday, but I
> > won't have time in the next two weeks to finish the work and propose a change
> > to the NLS interface.
> > 
> > At least there aren't many NLS users. grep "nls_\(strnicmp\|toupper\|tolower\)"
> > just showed 10 hits outside of the NLS dir and the include header.
> 
> Yep. I think the right approach is to add a few new operations vectors to
> struct nls_table. Something like this maybe:
> 
>     int (*char2lower)(const unsigned char *out, const unsigned char *in);
>     int (*char2upper)(const unsigned char *out, const unsigned char *in);
> 
> ...that would convert "in" to "out" in lower and uppercase, and return the
> length in bytes of the converted character. This of course assumes that
> the converted characters are always the same length as the original. If that's
> not the case then we'd need to modify the API some.

From the glib source:

 * g_utf8_strup:
 * 
 * Converts all Unicode characters in the string that have a case
 * to uppercase. The exact manner that this is done depends
 * on the current locale, and may result in the number of
 * characters in the string increasing. (For instance, the
 * German ess-zet will be changed to SS.)

> Then, we'd hook up nls_toupper and nls_tolower to call this operation if it
> exists. If it doesn't then fall back to the old table conversion routine. We'd
> probably also need to fix up some other callers that reach inside the nls_table
> directly too.
> 
> We'd probably also need something like a strnicmp function too and hook
> nls_strnicmp to that as well, since you really don't want to be doing
> byte-by-byte string comparison in utf8 and trying to convert the case.
> 
> The utf8 converters could just do an "isascii()" check and then use the tolower
> and toupper converters if that comes back true. If not, it'll just copy the
> character as-is. That also allows us to get rid of the silly "identity" table
> in that code.
> 
> We might also be able to get clever with the 8559-1 chars that are > 0x80. Do a
> mathematical conversion into that charset for the appropriate range of
> characters, and then call into the nls_iso8859-1.ko to do the actual
> conversion...or maybe just copy the appropriate tables into utf8 handling code.
> 
> Eventually, we could expand the case conversion into other charsets once we
> determine the proper mappings between them and their utf8 equivalents.

The upper and lower code in glib seem much more complicated then the casefold code and it depend on the locale. The unicode upper / lower code from libicu also depends on the locale...

But AFAIK we just want to ignore the case and the casefold function seems simpler, doesn't depend on the locale, but still needs those huge tables.

FWIW I already have an nls_utf8.ko module with all the glib code included. It builds without glib - same for the test app, but the glib config header is from an i386 install. Also much more stuff can be cleaned up. To get UTF-8 case changes right seems quite complicated.

Source at: https://github.com/lhm-limux/linux-nls-utf8.git
Comment 19 Jan-Marek Glogowski 2013-07-23 17:53:42 UTC
(In reply to comment #17)
> Of course, the catch here is that this makes assumptions about what sort of
> case-equivalence the server has.
>
> It's probably reasonable to assume that most servers will handle ASCII letters
> correctly. Is that true for stuff in the Latin-1 block? What about Greek?
> 
> Getting all of those special cases right will probably take time and an
> iterative approach, so it seems reasonable to me to start with the ones we know
> we can easily handle and build on that over time.

But all of this doesn't depend on the correct UTF-8 NLS casefold code, because this doesn't distinguish character sets.

But maybe the cifs code would reparse the output of the casefold function to "zero" characters with known conversation problems and pass these strings to the the dentry cache.

But currently I still have the problem, that users are renaming a folder in KDE on a CIFS share and end up without a folder and all included items, if the LookupCache is enabled. OTOH for larger folders it gets very slow...

Shouldn't it be enought to change the iocharset to something like "iso-8859-1"  to "temporarily" fix the problem? Not sure if this helps in the rename case.
Comment 20 Jeff Layton 2013-07-23 18:04:10 UTC
> Shouldn't it be enought to change the iocharset to something like "iso-8859-1" 
> to "temporarily" fix the problem? Not sure if this helps in the rename case.

Yeah, that might work around it. It's worth testing anyway simply as a sanity check to ensure that we understand the problem correctly. Let me know if it helps.
Comment 21 Jan-Marek Glogowski 2013-07-24 16:38:06 UTC
(In reply to comment #20)
> > Shouldn't it be enought to change the iocharset to something like "iso-8859-1" 
> > to "temporarily" fix the problem? Not sure if this helps in the rename case.
> 
> Yeah, that might work around it. It's worth testing anyway simply as a sanity
> check to ensure that we understand the problem correctly. Let me know if it
> helps.

I probably checked it already a year ago ... changing the iocharset for the mount didn't change anything in the result - neither for Samba nor for NetApp.

My userspace locale is still de_DE.UTF-8 and I don't know what iocharset really changes in this case.
Comment 22 Jeff Layton 2013-07-26 15:25:18 UTC
Yes, it looks like things are still not right even with iocharset=iso8859-1. I think the way that new dentries are being added into the dcache in cifs_lookup is not correct for nocase mounts, but I haven't studied the problem in detail and can't suggest a fix just yet.

So, I think the utf8 conversion fixes are a necessary part of this fix but aren't fully sufficient. Sadly, all of this nocase code seems to be quite broken, and in need of a fair bit of work...
Comment 23 Jeff Layton 2013-07-27 11:04:50 UTC
It looks to me like using iocharset=iso8859-1 is a valid workaround in most cases.

One problem is that the root of the mount gets a different sent of dentry ops than the rest of the mount. Al Viro seems to have done that consciously several years ago, so I need to chat with him to figure out why. If you go into a subdirectory of the mount however, then there doesn't seem to be any dentry aliasing doing on.

With negative dentries, I see them getting tossed out of the cache pretty quickly, which may be wrong but shouldn't really be a huge problem.
Comment 24 Jeff Layton 2013-07-30 15:38:40 UTC
Ok, patch sent to fix the d_ops on root dentry:

    http://article.gmane.org/gmane.linux.kernel.cifs/8705

...this is necessary to fix the problem, but not sufficient itself. With that patch though, I suspect that mounting with iocharset=iso8859-1 will work around the rest of the problems. Jan-Marek, can you confirm whether that's the case?

If it does, then that still leaves us needing to fix case-sensitivity in nls_utf8.ko somehow, but one problem at a time...
Comment 25 Jan-Marek Glogowski 2013-08-14 17:06:02 UTC
Created attachment 9134 [details]
Folder renaming test script.
Comment 26 Jan-Marek Glogowski 2013-08-14 18:35:31 UTC
Thanks for the patch. The workaround fixes the access problem, but I still have a my original folder rename problem, and I'm not sure if this is fixable.

For Windows 7 and my NetApp filer I'm unable to rename at all, with the nocase and iocharset=iso8859-1 set:

mkdir("/mnt/win7/RENAME", 0777)         = 0
RENAME
--- SIGCHLD (Child exited) @ 0 (0) ---
rename("/mnt/win7/RENAME", "/mnt/win7/rename") = 0
RENAME
--- SIGCHLD (Child exited) @ 0 (0) ---

I guess this is expected, as man 2 rename states:

 If oldpath and newpath are existing hard links referring to the same file, then rename() does nothing, and returns a success status.

Probably what I need now is a fix in konqueror, but I'm not sure.

The actual use case: rename a folder, but just change the case: "r => R"

In "case" mode:
 * lstat folder r, dentry 1
 * access R, dentry 2
 * Whoo R exists, user: want to overwrite r with R?
  * Yes - delete R, which deletes r - renaming r => R fails - data gone - Mpf
  * No - data is save, but user want's to rename the folder.

For the user the actual message doesn't make any sense.

So I thought - go nocase...

In "nocase" mode with iocharset != utf8:
 * folder r, dentry 1
 * access R, dentry 1
 * Now I thought I could add a check to konqueror like:
   if dentry r == dentry R and R != r => do rename.

But as you can see in the test program, the rename fails.

I'm not sure, if there is a solution for a userspace program, which has no knowledge of "nocase"?
Comment 27 Björn Jacke 2013-08-14 18:58:26 UTC
you should be aware that you get other problems with using iocharset=iso8859-1. When you use non-ASCII characters in a UTF-8 environment then the files will be written like the well known "broken UTF-8" letters to the CIFS server. Windows clients will see it this way then. Also case insensitivity will only work right with ASCII.
Comment 28 Jeff Layton 2013-08-15 15:30:42 UTC
Right. The problem of course is that all of this "nocase" handling is predicated on the assumption that the client knows how the server is going to convert the string, and it's very easy for us to get that wrong.

...as far as the rename no-op goes. We're sort of stuck there. If you enable "nocase" then you're saying that lookups where the case matches should resolve to the same inode. At that point, the VFS shortcuts the operation before it ever gets down to cifs.ko.

The only way I can see ever to "fix" that would be to add a rename operation that operates without ever resolving the last component. Sort of like how atomic_open works in the kernel, but for rename. That's a major effort however and if you start down that path, then you really need "by-name" version of most of the ops vectors.

Your best bet in the near term is probably a workaround that renames twice. Once to an interim name where the dentry doesn't match.

We do still need to add some sort of case-conversion to utf8 as well. I wonder how hard it would be to get the UTF16 case-conversion tables that MS servers use internally? Then we could just do a utf8->utf16 conversion, convert the case using those, and then convert it back to utf8. Fun!
Comment 29 Björn Jacke 2013-08-15 16:02:44 UTC
> We do still need to add some sort of case-conversion to utf8 as well. I wonder
> how hard it would be to get the UTF16 case-conversion tables that MS servers
> use internally?

voila...

3.1.5.3 Mapping UTF-16 Strings to Upper Case:

http://msdn.microsoft.com/en-us/library/hh877830.aspx

http://www.microsoft.com/en-us/download/details.aspx?displaylang=en&id=10921
Comment 30 Jeff Layton 2013-08-15 17:50:16 UTC
Nice! That doesn't look too tough to process the "Windows 8 Upper Case Mapping Table.txt" file into something that we could use in utf8. I do wonder.. would we run afoul of any weirdo licensing or anything if we do so?
Comment 31 Björn Jacke 2013-08-15 18:56:57 UTC
On 2013-08-15 at 17:50 +0000 samba-bugs@samba.org sent off:
> https://bugzilla.samba.org/show_bug.cgi?id=9049
> 
> --- Comment #30 from Jeff Layton <jlayton@samba.org> 2013-08-15 17:50:16 UTC ---
> Nice! That doesn't look too tough to process the "Windows 8 Upper Case Mapping
> Table.txt" file into something that we could use in utf8. I do wonder.. would
> we run afoul of any weirdo licensing or anything if we do so?

as this is just a subset of the data from the unicode consortium
ftp://ftp.unicode.org/Public/UNIDATA/CaseFolding.txt I would say it's not a
problem, that table also links to the Unicode copyleft, which looks okay. IBM
also offers quite the same mapping table here:
http://publib.boulder.ibm.com/infocenter/iseries/v5r4/index.jsp?topic=%2Fnls%2Frbagslowtoupmaptable.htm
Comment 32 Jan-Marek Glogowski 2013-08-16 15:07:31 UTC
For the iocharset != utf8 case: from my point of view wrong names are still better then a deleted folder / lost data on rename.

And with the LookupCacheEnabled disabled cifs becomes awfully slow.

About the UTF-16 casefolding: the MS casefolding pseudo code looks much simpler then the code in libicu, as it just uses the table. Not sure if it's generally correct.

But before I'll do some KDE4 / dolphin testing next week and review it's rename logic and see how it handles CIFS with and without the nocase option.
Comment 33 Jeff Layton 2013-08-16 15:24:59 UTC
So the tricky part is how to make that table lookup efficient. Walking it linearly would be horribly slow. Since this will need to be done a lot, you really need to come up with a way that doesn't involve doing that.

Also, we don't want to eat up tons of memory for a table that is really sparsely populated, so doing this as a set of giant arrays is probably a bad idea. Maybe there's some way to turn this table into a multi-level lookup. Have a top level table that looks at the first byte or something, and then that points to another table where you can lookup the other byte.

Also, there's stuff like this little gem in there:

    1FD7; F; 03B9 0308 0342; # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND PERISPOMENI

...so here we have a 2-byte character that can become 6 bytes. So it can't be a simple unsigned short - unsigned short conversion table either.
Comment 34 Björn Jacke 2013-08-16 21:52:50 UTC
(In reply to comment #33)
> Also, there's stuff like this little gem in there:
> 
>     1FD7; F; 03B9 0308 0342; # GREEK SMALL LETTER IOTA WITH DIALYTIKA AND
> PERISPOMENI
> 
> ...so here we have a 2-byte character that can become 6 bytes. So it can't be a
> simple unsigned short - unsigned short conversion table either.

afaics this is what also Microsoft does *not* include in the case conversion table. as far as i can see they have same byte lengths for lower and upper case versions only. Also they only include mappings that are reversible, for example the "ß"->"SS" mapping is not included because this is irreversibel. We really need to stick to the subset of mapping that they use.

For the performance issue of this: maybe we should just only put UPPSERCASE versions of the filenames into the lookup table. The performance penalty for this should not be higher than the conversions that are being done by the NLS conversion anyhow. Just an idea, I don't really have much insight in how the kernel is working here.
Comment 35 Björn Jacke 2013-08-16 21:55:40 UTC
(In reply to comment #34)
> For the performance issue of this: maybe we should just only put UPPSERCASE
> versions of the filenames into the lookup table

the dentry cache and the lookups we do here I actually mean. And talking about *all* filesystems that are case-insensitive here actually.
Comment 36 Jeff Layton 2013-08-20 10:54:01 UTC
Ok, good point. That simplifies things some...

We have two dcache functions that we need to concern ourselves with here. d_compare is what compares two dentries for equivalence. This is where we need to convert the names to uppercase and then compare them.

There's also d_hash, which computes a hash value that we use to determine which hash bucket a dentry sits on. For this we'll need to again convert the name to some consistent uppercase value and then hash it.

So here's what I think we need to do:

Stop using nls_strnicmp and nls_toupper/nls_tolower. What we should do instead is simply convert each character to utf16 and then convert that to uppercase using the table in "Windows 8 Upper Case Mapping Table.txt".

Looking at that table, I think we want to basically turn it into a 2-level lookup. We'll have one table that is a set of 256 pointers (one for each possible first byte). If there are no uppercase versions of characters with that first byte, we'll just keep a NULL pointer there.

Then we can have a set of second level tables with 256 uint16_t's. One for each possible second byte. We'll just set those to "0x0000" or something if there is no mapping.

Here's a small perl script that I used to gauge what this looks like:

#!/usr/bin/perl -w

while(<>) {
	next if (!/^0x(..)/);
	$counts{$1}++;
}

foreach(keys(%counts)) {
	print $_ . "\t" . $counts{$_} ."\n";
}

...and here's the result:

00	57
01	109
02	58
03	55
04	124
05	56
1d	2
1e	123
1f	96
21	18
24	26
2c	105
2d	38
a6	34
a7	46
ff	26

...the first column is the first byte of the lowercase character. The second column is a count of how many characters in the list have that first byte. What I'll likely do is flesh out this script and use it to generate the tables for us programmatically.

So, we'll only need 16 second-level tables, and a single first-level one. That should give us relatively quick lookup times for this without too much memory consumption:

Quick back of envelope calculation:

    256 * 8 + (16 * (256 * 2)) = 10240 bytes

So, this will cost us about 10k of kernel memory on a 64-bit arch. Not too bad...
Comment 37 Jeff Layton 2013-08-20 13:31:48 UTC
Created attachment 9151 [details]
win_ucase_convert.pl -- script to convert MS table to set of arrays in C

First pass at a script to do the conversion. The result is not yet compile tested. I'll still need to roll up the code to do the actual work too.
Comment 38 Jeff Layton 2013-08-20 15:34:13 UTC
Ok, I have a potential patchset that will fix this using the above scheme. See it here:

    http://git.samba.org/?p=jlayton/linux.git;a=shortlog;h=refs/heads/cifs-3.12

...it's still pretty rough, and I need to flesh out the commit log messages, but it seems to work correctly AFAICT.

If any of you can take that patchset for a spin and let me know if it fixes the issue, then that would be helpful.

It's possible that this would be useful to other filesystems as well. We might want to move this code into the generic NLS code instead, maybe even inside its own module. I think it'd be best to make this part of the cifs code for now until we determine that that is the case.
Comment 39 Jeff Layton 2013-08-20 15:50:44 UTC
Oh and to be clear, the new case conversion code should still work even when the client is using UTF8 for the NLS.
Comment 40 Jan-Marek Glogowski 2013-08-20 17:19:06 UTC
I tested cifs.ko from Ubuntu (Kernel 3.8.0-27-generic) with the following patchset:

01_correct-s_d_op-initialization.patch
02_show-iocharset-in-mounts.patch
03_cifs-add-new-case-insensitive-conversion-routines.patch
04_cifs-convert-case-insensitive-dentry-ops-to-use-new-.patch
05_cifs-netapp-legacy-fallback-fix.patch

which works for me as expected on Samba, Windows 7 and NetApp, including an ü in the test directory name using default iocharset and nocase option and enabled LookupCache.
Comment 41 Jeff Layton 2013-08-20 17:29:04 UTC
Excellent! Thanks for helping test it.

I'll add "Tested-by: " tag to the patches for you, unless you have objections...
Comment 42 Jan-Marek Glogowski 2013-08-20 17:49:36 UTC
Created attachment 9152 [details]
Fixed python UTF-8 uppercase / lowercase handling.

Fixed test program to handle uppercase / lowercase of UTF-8 correctly.
Comment 43 Jan-Marek Glogowski 2013-08-20 17:53:03 UTC
Hmm - my original testcase didn't work as expected, as Python just seems to upper and lowercase ASCII chars:

access("/mnt/netapp/TEST\303\274", F_OK) = -1 ENOENT (No such file or directory)
access("/mnt/netapp/test\303\274", F_OK) = -1 ENOENT (No such file or directory)

I attached a fixed version of the test program, which produces:

access("/mnt/samba/TEST\303\234", F_OK) = -1 ENOENT (No such file or directory)
access("/mnt/samba/test\303\274", F_OK) = -1 ENOENT (No such file or directory)

No changes to the previously reported test results - all seems to work.

Feel free to add the "Tested-by:" tags.
Comment 44 Jeff Layton 2013-08-20 19:52:25 UTC
Patchset posted to linux-cifs and linux-fsdevel mailing lists.
Comment 45 Jan-Marek Glogowski 2013-08-21 13:29:59 UTC
I just checked how vFAT is handling the "ignore case on compare" problem. Seems the vfat module just ignores the file, if the case doesn't match:

lstat64("/tmp/cifs-nocase-inode-cache-bug-utf8.py", {st_mode=S_IFREG|0700, st_size=590, ...}) = 0
mkdir("/mnt/vfat/TEST\303\234", 0777)   = 0                  
lstat64("/mnt/vfat/test\303\274", 0xbfaafea0) = -1 ENOENT (No such file or directory)
access("/mnt/vfat/TEST\303\234", F_OK)  = 0
rmdir("/mnt/vfat/TEST\303\234")         = 0                   
access("/mnt/vfat/TEST\303\234", F_OK)  = -1 ENOENT (No such file or directory)                                                                                                               
access("/mnt/vfat/test\303\274", F_OK)  = -1 ENOENT (No such file or directory)

Probably that could be the way to go for CIFS too? At least rename would work as expected...
Comment 46 Jeff Layton 2013-08-21 13:43:10 UTC
I don't think we can get away with that. Suppose we do a stat() for different case variations of the path, say "test" and "TEST":

The server is going to do that case conversion whether we like it or not, so both on the wire calls will succeed. They'll be the same file at that point, but it'll look like they're hardlinked...sort of.

I don't see any alternative to trying to do the case conversion locally since we can't count on the server to reject the situation where we've sent a "wrong-case" variation of the canonical name.

A pox on microsoft for making this all case-insensitive in the first place. It's nothing but misery for anyone (including them) that have to do this stuff.
Comment 47 Jeff Layton 2013-08-21 13:49:02 UTC
FWIW, you can get the behavior you suggest by simply mounting w/o -o nocase. Also, it looks like you can also make vfat do case conversion internally by mounting it with -o nocase, though it uses a similar scheme as cifs does and is probably just as broken.
Comment 48 Jan-Marek Glogowski 2013-08-21 14:07:06 UTC
No - vFAT and CIFS handle the lstat differently for me. For vFAT I get:

lstat64("/mnt/vfat/test\303\274", 0xbfaafea0) = -1 ENOENT (No such file or directory)

for Win7 CIFS mount I get:

lstat64("/mnt/win7/test\303\274", {st_mode=S_IFDIR|0755, st_size=0, ...}) = 0

Both just mounted with a uid= flag (and CIFS credentials).
Comment 49 Jeff Layton 2013-08-21 14:10:46 UTC
Well no, they handle them the same in that case -- the result is just different:

With vfat, the filesystem has a way to know what the canonical case of the filename is and can simply reject names that don't match exactly. With CIFS, we can't do that since the case conversion is done transparently by the server.