Bug 11846 - Directory entries missing when some file names include invalid UTF8 sequences
Status: NEW
Product: Samba 4.1 and newer
Classification: Unclassified
Component: File services
Hardware: x64 Linux
Importance: P5 major
Assigned To: Samba QA Contact
QA Contact: Samba QA Contact
Reported: 2016-04-16 21:37 UTC by Jean-Marc Le Peuvédic
Modified: 2016-04-17 23:37 UTC (History)

How Nautilus shows local incorrectly encoded file names (58.26 KB, image/png)
2016-04-16 21:37 UTC, Jean-Marc Le Peuvédic

Description Jean-Marc Le Peuvédic 2016-04-16 21:37:53 UTC
Created attachment 12000
How Nautilus shows local incorrectly encoded file names

I use Samba to share files from a server running Ubuntu 14.04 LTS x64. The version is reported as 4.1.6+dfsg.

There are public shares and home shares. They are accessed from multiple devices and computers including WinXP virtual machines, Win 7 64 bit and 32 bit PCs, a laptop running Ubuntu 14.04 IA32, a Samsung smart TV, an OUYA console and other Android devices, an iPad...

When at least one entry in a directory has a name with an invalid Unicode encoding, some entries can be missing when the directory listing is obtained using smbclient.

I use smbclient version 4.1.6+dfsg which is the most up to date version released for Ubuntu 14.04 LTS.

How to reproduce:
A file with an incorrectly encoded name is fairly easy to create using Emacs.
1-Create a normal file with 'touch', putting Unicode characters in its name.
2-Open Emacs and in "Options/MULE/Set Coding Systems/For File Name" choose "raw".
3-Using Emacs dired mode, in the "Immediate" menu choose "Edit file name".
4-Emacs should display the Unicode sequence with octal escapes \350\...
5-Move the cursor to the octal escape and delete the first octal character.
6-Save the buffer to rename the file.
7-ls the folder with bash: the incorrectly encoded file name should now contain question marks.
8-On the system where the file resides, browse to the folder and check that Nautilus displays the file with question marks in black diamonds replacing the bad characters and "(invalid encoding)" appended. Note that "(invalid encoding)" is localized and may differ on your system.
9-Using Nautilus on another system, navigate to the same directory exported as a Samba share: the directory now shows up empty.
10-Using smbclient from the same or another system, connect to the share, type in the password if needed, navigate to the correct folder and type "dir": the file does not show up. There is also an error message saying:
cli_list: Error: unable to parse name from info level 260
11-Now create a few other files in the same folder and give them simple names. They are not listed either!
12-Create many files: I used a for loop to create files a a2 a3 a4 a5 a6 b b2 ... After creating 156 normal files, smbclient and Nautilus list only "s5".
13-Delete s5: no file is listed. Create 26 new files a7 to z7: only k7 is listed.
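For anyone who would rather not go through Emacs, the setup above can probably be approximated with a short script. This is only a sketch under assumptions: the helper names (make_test_dir, is_valid_utf8), the "Test" directory and the bad file name are made up for illustration, and it assumes a Linux filesystem that accepts arbitrary bytes in file names. It creates one name containing a lone 0xE8 byte (octal \350) without its continuation bytes, plus the 156 plain files a, a2 ... z6.

```python
import os
import tempfile

# Hypothetical helper: the directory layout and names are illustrative only.
def make_test_dir(root: bytes) -> bytes:
    """Create a directory with one badly encoded name plus 156 normal files."""
    test_dir = os.path.join(root, b"Test")
    os.makedirs(test_dir, exist_ok=True)

    # 0xE8 (octal \350) starts a 3-byte UTF-8 sequence, but it is followed
    # by plain ASCII here, so the name is not valid UTF-8.
    bad_name = b"b\xe8le.txt"
    with open(os.path.join(test_dir, bad_name), "wb"):
        pass

    # Normal files a, a2..a6, b, b2..b6, ..., z, z2..z6 (26 * 6 = 156 files).
    for letter in "abcdefghijklmnopqrstuvwxyz":
        for suffix in ["", "2", "3", "4", "5", "6"]:
            name = (letter + suffix).encode("ascii")
            with open(os.path.join(test_dir, name), "wb"):
                pass
    return test_dir

def is_valid_utf8(name: bytes) -> bool:
    """Return True if the raw file name decodes cleanly as UTF-8."""
    try:
        name.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

if __name__ == "__main__":
    d = make_test_dir(os.fsencode(tempfile.mkdtemp()))
    names = os.listdir(d)  # bytes in, bytes out: no decoding by Python
    bad = [n for n in names if not is_valid_utf8(n)]
    print(len(names), "entries,", len(bad), "with invalid UTF-8")
```

Pointing root at the directory backing a share and then running "dir" from smbclient should give a comparable test case, though I have only described the Emacs route above.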

In many cases it is now very obvious that files and folders are missing. In addition, if the command "dir a*" is given to smbclient, the answer is correct: it lists a a2 a3 a4 a5 a6 a7 (in another order). My bad file name starts with 'b', so I tried "dir b*". I got the error message:

cli_list: Error: unable to parse name from info level 260
NT_STATUS_NO_MEMORY listing \Test\b*

The bug severity is major because it silently affects filesystem copies and some backup systems. Some users might be unable to recover their data from backup disks if whole files and directories are missing. Part of the blame lies with Ubuntu's Nautilus, which, unlike smbclient, does not show any error message. But the underlying bug is the incorrect handling of badly encoded file names.
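Until this is fixed, one way to find out whether a share is exposed might be to scan the exported tree on the server for names that are not valid UTF-8. The following is a minimal sketch, assuming the share is configured for UTF-8 names on disk; find_invalid_names is a hypothetical helper, not part of Samba.

```python
import os
from typing import Iterator

# Hypothetical helper, not part of Samba or any existing tool.
def find_invalid_names(top: bytes) -> Iterator[bytes]:
    """Yield the full path of every entry whose name is not valid UTF-8."""
    # Passing bytes to os.walk makes it return raw bytes names,
    # so nothing is decoded (or silently escaped) before we check it.
    for dirpath, dirnames, filenames in os.walk(top):
        for name in dirnames + filenames:
            try:
                name.decode("utf-8")
            except UnicodeDecodeError:
                yield os.path.join(dirpath, name)

if __name__ == "__main__":
    import sys
    start = os.fsencode(sys.argv[1]) if len(sys.argv) > 1 else b"."
    for path in find_invalid_names(start):
        # Print with escapes so the offending bytes stay visible.
        print(repr(path))
```

Any path it reports is a candidate for hiding its neighbours in a directory listing over SMB, so it would be worth renaming before relying on a copy or backup made through the share.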

I guess that Samba must decode and recode file names to adapt Unix names to the more restrictive rules of Windows. One bad name should not trigger the disappearance of tens of other files.
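To make the expected behaviour concrete, here is a small sketch of what I would hope for: each name is converted on its own, and a failure affects only that entry. This is not Samba's code and makes no claim about how Samba is structured internally; convert_listing is a hypothetical function written purely to illustrate the point.

```python
from typing import Iterable, List, Tuple

# Hypothetical illustration only: this does not reflect Samba's internals.
def convert_listing(raw_names: Iterable[bytes]) -> Tuple[List[str], List[bytes]]:
    """Convert each raw name independently; one failure must not drop the rest."""
    converted: List[str] = []
    rejected: List[bytes] = []
    for raw in raw_names:
        try:
            converted.append(raw.decode("utf-8"))
        except UnicodeDecodeError:
            # Only the offending entry is set aside; the loop keeps going.
            rejected.append(raw)
    return converted, rejected

if __name__ == "__main__":
    ok, bad = convert_listing([b"a", b"b\xe8le.txt", b"s5"])
    print("listed:", ok)
    print("skipped:", bad)
```

Whether the bad entry should be skipped, reported with an error, or shown under a mangled name is a separate design question; the point is only that the other entries should still be listed.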