The Samba-Bugzilla – Bug 7816
get_tmpname() can create invalid UTF-8 filenames
Last modified: 2011-01-03 22:13:03 UTC
get_tmpname() creates filenames consisting of the directory, a dot, some bytes from the filename and .XXXXXX\0. It does not take into account that UTF-8 characters can be several bytes long, so arbitrarily truncating the name can create an invalid UTF-8 sequence. Normally this isn't a problem, but if the filesystem strictly enforces UTF-8, the temp file cannot be created and the transfer fails.
An example of the problem is:
sending incremental file list
rsync: mkstemp "/fan/data/.MS_R\#303.001058" failed: Permission denied (13)
ö in UTF-8 is the two-byte sequence \#303\#266, so the temp name above was cut after the first byte of the character.
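To make the failure mode concrete, here is a minimal C sketch of how one could test whether a given cut point lands in the middle of a multi-byte character. The function name cut_splits_utf8_char is hypothetical and not part of rsync, and the sketch assumes the input is already valid UTF-8 and that keep is no larger than the string length.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not rsync code.
 * Returns 1 if keeping only the first `keep` bytes of `s` would cut a
 * multi-byte UTF-8 character in half, 0 otherwise.
 * Assumes `s` is valid UTF-8 and keep <= strlen(s). */
int cut_splits_utf8_char(const char *s, size_t keep)
{
    unsigned char next;

    assert(s != NULL);
    /* The first dropped byte. UTF-8 continuation bytes look like 10xxxxxx,
     * so if the first dropped byte is one, its lead byte stays behind alone. */
    next = (unsigned char)s[keep];
    return (next & 0xC0) == 0x80;
}
```

With a made-up name such as "MS_R\303\266.txt", keeping 5 bytes leaves the lead byte \303 without its continuation byte \266, which is the kind of name shown in the error above. Keeping 4 or 6 bytes cuts on a character boundary.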
We got around the problem by specifying --inplace, which avoids the temp file.
I think that the easiest way to handle the problem is to replace all characters in the file name with # if bit 7 is set.
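A rough sketch of that idea in C, assuming it is applied to a writable copy of the name before the temp name is built. The function mask_high_bit_bytes is a made-up name for illustration and is not part of rsync.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not rsync code.
 * Replaces every byte that has bit 7 set with '#', in place, so the
 * result is pure ASCII and can be truncated at any byte. */
void mask_high_bit_bytes(char *name)
{
    assert(name != NULL);
    for (; *name != '\0'; name++) {
        if ((unsigned char)*name & 0x80)
            *name = '#';
    }
}
```

Note that this works per byte, not per character, so a two-byte character such as ö turns into two # characters. The length of the name is unchanged.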
Created attachment 6086
A simple heuristic that tries to avoid split high-bit characters
Most of the time the name won't be trimmed, as it only happens if the path is long enough that the temp name needs more room to add the unique suffix. The attached patch is a simple heuristic that triggers if the name gets trimmed and there is a high-bit character as both the first-trimmed character and the last retained character. In such a case, we'll just make the name shorter (removing all dangling high-bit characters). If we end up with just a leading dot for the name, the trimming will stop, and the name will be kinda sad, but still usable.
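The heuristic described above could look roughly like the following C sketch. This is not the attached patch, only an illustration of the description. The function name trim_len_avoiding_split and its parameters are assumptions, and name here stands for the file name without the leading dot, so a return value of 0 corresponds to the "just a leading dot" case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not the actual rsync patch.
 * Given a file name and the number of bytes there is room for, returns
 * how many bytes of the name to keep. */
size_t trim_len_avoiding_split(const char *name, size_t keep)
{
    size_t len;

    assert(name != NULL);
    len = strlen(name);
    if (keep >= len)
        return len;     /* the name fits, nothing gets trimmed */

    /* Trigger only if both the first trimmed byte and the last retained
     * byte have the high bit set, i.e. the cut may be inside a character. */
    if (keep > 0
     && ((unsigned char)name[keep] & 0x80)
     && ((unsigned char)name[keep - 1] & 0x80)) {
        /* Drop all dangling high-bit bytes; stop if nothing is left. */
        while (keep > 0 && ((unsigned char)name[keep - 1] & 0x80))
            keep--;
    }
    return keep;
}
```

Because the check only looks at the high bit, this sketch can also shorten the name when the cut falls exactly between two adjacent multi-byte characters. That trims more than strictly necessary, but the result never ends in a partial character, assuming the original name was valid UTF-8.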
This fix will be in 3.0.8.