Samba escapes filenames which cannot be used under Windows, such as COM1, allowing files with these names to be accessed from a Windows machine. However, according to the list from Microsoft, there are some reserved names which Samba does not escape, so those files cannot be accessed from Windows via Samba.
Samba escapes COM[1-3], but not COM[4-9]. It escapes LPT[1-3] but not LPT[4-9].
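For reference, a minimal sketch of a reserved-name check, assuming the classic list from Microsoft's file-naming documentation (CON, PRN, AUX, NUL, COM1-COM9, LPT1-LPT9, matched case-insensitively, with or without an extension). The helper name `is_reserved` is made up for illustration and is not Samba code; newer Microsoft documentation may list additional names.

```python
import re

# Assumed reserved device names: CON, PRN, AUX, NUL, COM1-COM9, LPT1-LPT9.
_RESERVED = re.compile(r"CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9]", re.IGNORECASE)


def is_reserved(filename):
    """Return True if the part before the first dot is a reserved device name."""
    base = filename.split(".", 1)[0]
    return _RESERVED.fullmatch(base) is not None


# The names this report says Samba does not escape: COM4-COM9 and LPT4-LPT9.
unescaped = [f"{dev}{n}" for dev in ("COM", "LPT") for n in range(4, 10)]
```

Every entry in `unescaped` satisfies `is_reserved`, while ordinary names such as "COM10" or "COMMON.txt" do not.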
Good point! We should fix this somehow. The number of illegal filenames frightens me a bit, though. It also reminds me of the discussion about trailing "." in filenames from bug 11255. Instead of mangling all of those names, we might also just add some magic (private range) Unicode character. If we decide to do so, we should coordinate with Linux cifs so that the mapping is done the same way. I'm wondering what macOS SMB clients do when the user tries to create those illegal filenames. Is there anybody with a Mac client who can check?
*** Bug 5290 has been marked as a duplicate of this bug. ***
macOS does not map reserved names. On Samba, the filename "COM1" is created as a mangled name; on Windows 10, it gets a permission denied.
I think we should introduce a well-defined mapping for SMB reserved names, just like the mapping for trailing spaces in filenames.
How about mapping "COM1" to "COM1" followed by U+FFFFD (which is the last code point of Unicode's Supplementary Private Use Area-A)? And the same for all the other reserved names. If we introduced such a name mapping for cifs vfs and Samba, we would be able to handle such filenames. Coordination with Apple would also be nice, so that macOS does not introduce something different in the future.
Adding U+FFFFD would work but wouldn't it make it difficult to type the name if you're in a console?
Would an ASCII solution be possible, like appending "~1" instead, with the number increasing until an available name is found? I like this scheme because it has been used for many years to indicate a mangled name, so it makes it obvious to the user that they may not be looking at the original filename.
Of course either way this would have to work on all basenames so for example "COM1.txt" ends up as "COM1~1.txt" or the like.
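A rough sketch of the "~N" idea, assuming the counter is inserted before the first dot and that the set of names already present in the directory is passed in. The function name `tilde_mangle` and the case-insensitive collision check are assumptions for illustration, not existing Samba behaviour.

```python
def tilde_mangle(filename, existing):
    """Append ~N to the base name, using the first N that is not already taken.

    `existing` is the set of names already present in the directory.
    The comparison is case-insensitive, as a Windows client would see it.
    """
    base, dot, ext = filename.partition(".")
    taken = {name.lower() for name in existing}
    n = 1
    while True:
        candidate = f"{base}~{n}{dot}{ext}"
        if candidate.lower() not in taken:
            return candidate
        n += 1
```

For example, `tilde_mangle("COM1.txt", set())` gives "COM1~1.txt", and if that name is already taken the next call gives "COM1~2.txt".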
This proposal of course makes it impossible to automatically de-mangle the name, whereas the U+FFFFD idea is probably more practical in the case of a user creating "COM1\uFFFFD.txt" on a share and having Samba demangle it automatically, creating "COM1.txt" on the local filesystem.
Is the intention for Samba to mangle and demangle transparently, or only handle the mangling?
Adam: this is something that the client will mainly do. It would not send a filename as "COM1" but as "COM1"+<private-unicode-character>; this way it will be possible for the file to be stored on any SMB server. The client would also map the name back to the normal one if it receives such a name from an SMB server. This is the same way it already works with illegal filenames that have a trailing space or a trailing period. For smbd, there could then also be a mapping (instead of name mangling) which stores "COM1"+<private-unicode-character> locally as "COM1" only and sends "COM1"+<private-unicode-character> over the wire when such a reserved name pops up in the filesystem.
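The client-side mapping described here might look roughly like the following. This is a sketch only: it assumes U+FFFFD as the marker, the classic reserved-name list, and that the marker goes directly after the reserved stem (before any extension). `to_wire` and `from_wire` are hypothetical names, not functions in cifs or Samba.

```python
import re

MARKER = "\U000FFFFD"  # private-use code point proposed in this thread
_STEM = r"(CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9])"  # assumed reserved names

_LOCAL = re.compile(_STEM + r"(\..*)?", re.IGNORECASE | re.DOTALL)
_WIRE = re.compile(_STEM + re.escape(MARKER) + r"(\..*)?",
                   re.IGNORECASE | re.DOTALL)


def to_wire(name):
    """Local name -> name sent over SMB (marker added to reserved names)."""
    m = _LOCAL.fullmatch(name)
    if m is None:
        return name
    return m.group(1) + MARKER + (m.group(2) or "")


def from_wire(name):
    """Name received over SMB -> local name (marker removed again)."""
    m = _WIRE.fullmatch(name)
    if m is None:
        return name
    return m.group(1) + (m.group(2) or "")
```

With this sketch, "COM1.txt" goes over the wire as "COM1" + U+FFFFD + ".txt" and is mapped back unchanged, while names that are not reserved pass through untouched in both directions.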
Mapping COM[1-4] -> COM[1-4]+extra char isn't a great idea. Changing the length of a name component is tricky in the server/client. Is there a private Unicode mapping we can use to map the last character only?
If we don't want to change the length of the filename string, then we might pick a range of 128 code points in the private Unicode range, for example starting from 0xF200, and add the desired ASCII hex value on top of that. That would map
COM1 -> COM + 0xF231
COM2 -> COM + 0xF232
LPT1 -> LPT + 0xF231
NUL -> NU + 0xF24C
and so on...
How about this?
That brings up another question then: case handling. Would we map lowercase "l" to 0xF26C and uppercase "L" to 0xF24C? I think this is what we want, right?
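The length-preserving variant could be sketched as follows, assuming the 0xF200 base suggested above and that the last character of the reserved stem is replaced by 0xF200 plus its ASCII value, which preserves case automatically. Names and structure are illustrative only.

```python
import re

PUA_BASE = 0xF200  # start of the assumed 128-code-point private-use window
_RESERVED = re.compile(r"(CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9])(\..*)?",
                       re.IGNORECASE | re.DOTALL)


def to_wire(name):
    """Replace the last character of a reserved stem with PUA_BASE + ASCII."""
    m = _RESERVED.fullmatch(name)
    if m is None:
        return name
    stem, rest = m.group(1), m.group(2) or ""
    return stem[:-1] + chr(PUA_BASE + ord(stem[-1])) + rest


def from_wire(name):
    """Undo to_wire; names that do not decode to a reserved stem are kept."""
    stem, dot, rest = name.partition(".")
    if not stem:
        return name
    offset = ord(stem[-1]) - PUA_BASE
    if 0 <= offset < 128:
        candidate = stem[:-1] + chr(offset)
        if _RESERVED.fullmatch(candidate):
            return candidate + dot + rest
    return name
```

Because 0xF200-0xF27F lies in the Basic Multilingual Plane, the mapped name has the same number of UTF-16 code units as the original. Uppercase "L" (0x4C) and lowercase "l" (0x6C) land on 0xF24C and 0xF26C respectively, so the original case survives the round trip.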
Probably not too relevant given Jeremy's comment, but I just noticed that the old mtools package (for working with FAT filesystems) handles reserved names by adding "~1". Their example is that "prn.txt" is written as "PRN~1.TXT":
@Björn: What if you replaced the first character instead of the last? Then you would only need two characters, one for "uppercase" and one for "lowercase" with the translated value depending on the few characters following:
<U>OM1 -> _OM=COM -> COM1
<U>om2 -> _OM=COM -> Com2
<L>rn.txt -> _RN=PRN -> prn.txt
<L>ul -> _UL=NUL -> nul
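A sketch of this first-character scheme, to check that the two letters after the marker are enough to recover the original. The two marker code points below are hypothetical placeholders for <U> and <L>; the thread does not pick concrete values.

```python
import re

UPPER_MARK = "\uF200"  # hypothetical stand-in for <U>
LOWER_MARK = "\uF201"  # hypothetical stand-in for <L>

_RESERVED = re.compile(r"(CON|PRN|AUX|NUL|COM[1-9]|LPT[1-9])(\..*)?",
                       re.IGNORECASE | re.DOTALL)

# The two letters following the first one identify each reserved stem uniquely.
_FIRST_LETTER = {"ON": "C", "RN": "P", "UX": "A",
                 "UL": "N", "OM": "C", "PT": "L"}


def to_wire(name):
    """Replace the first character of a reserved name with a case marker."""
    if _RESERVED.fullmatch(name) is None:
        return name
    mark = UPPER_MARK if name[0].isupper() else LOWER_MARK
    return mark + name[1:]


def from_wire(name):
    """Recover the first character from the marker and the next two letters."""
    if name[:1] not in (UPPER_MARK, LOWER_MARK):
        return name
    first = _FIRST_LETTER.get(name[1:3].upper())
    if first is None:
        return name
    if name[0] == LOWER_MARK:
        first = first.lower()
    candidate = first + name[1:]
    return candidate if _RESERVED.fullmatch(candidate) else name
```

This reproduces the four examples above: the marker only records the case of the first letter, and the case of the remaining characters is carried unchanged, so <U>om2 decodes to "Com2".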