Depending on the starting point of a Spotlight search in macOS Finder, the search fails.

### Env

smbd -V
Version 4.18.2

Fruit settings:
fruit:veto_appledouble = yes
fruit:aapl = yes
fruit:nfs_aces = no
fruit:metadata = stream
fruit:resource = xattr
fruit:copyfile = yes
vfs objects = catia fruit streams_xattr recycle shadow_copy2

Samba Spotlight backend: Elasticsearch
macOS Ventura 13.2
Filesystem: ZFS as shipped with Ubuntu

### Setup

/myshare/MYSUBDIR_Ä/files/doc1.pdf
/myshare/MYSUBDIR/files/doc2.pdf

### Success and Fail

In macOS Finder, I enter a directory and search from there:

If I start the Spotlight search from folder "/myshare", both "doc1.pdf" and "doc2.pdf" CAN be found by name.
If I start the Spotlight search from folder "/myshare/MYSUBDIR", then "doc2.pdf" CAN be found by name.
If I start the Spotlight search from folder "/myshare/MYSUBDIR_Ä", then "doc1.pdf" CANNOT be found by name.

### My guess

I suspect that Spotlight sends a directory string that is somehow corrupted by the time it reaches Samba. Other non-ASCII characters may be affected too.
I suspect a combining diaeresis. My folder "MYSUBDIR_Ä" contains that nasty two-part character (a plain "A" followed by a combining diaeresis). However, this form is not in the Elasticsearch db. It seems Samba produces it, since the folder name as seen by Linux contains a normal single character. Is there a way to disable the combining diaeresis in Samba, so the Mac gets a real umlaut in the first place? Thanks M
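To illustrate the suspicion above, here is a minimal Python sketch (not part of the original report) showing how the same visible folder name differs between the composed form (NFC, a single U+00C4) and the decomposed form (NFD, "A" followed by the combining diaeresis U+0308). Only the folder name comes from the report; the variable names are made up for illustration.

```python
import unicodedata

# Composed form (NFC): "Ä" is the single code point U+00C4.
name_nfc = "MYSUBDIR_\u00c4"

# Decomposed form (NFD): "A" followed by U+0308 (combining diaeresis).
name_nfd = unicodedata.normalize("NFD", name_nfc)

# Both strings render identically, but their lengths and bytes differ.
print(len(name_nfc), len(name_nfd))
print(name_nfc.encode("utf-8").hex(" "))
print(name_nfd.encode("utf-8").hex(" "))

# A plain string comparison treats them as different names ...
print(name_nfc == name_nfd)

# ... until both sides are brought to the same normalization form.
print(unicodedata.normalize("NFC", name_nfd) == name_nfc)
```

If the index holds one form and the client sends the other, any exact string comparison between the two will fail even though the names look the same on screen.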
Note: Updated to Ventura 13.4. Bug persists.
I ran into this as well, but in my case the share name has the umlaut. Fast searches don't work at all and Finder starts a full directory traversal. Removing the umlaut from the share name instantly fixes the issue, even without reindexing the directory structure. But obviously Samba should support special characters, since native macOS SMB and Spotlight do as well. Hopefully we get a fix at some point.
Hello M Weber, I tried to reproduce the issue with the steps described, but I could not reproduce it. Could you please provide more detailed reproduction steps, or any document for reference?
(In reply to Priyanka Soni from comment #4) Hi Priyanka Soni, have you indexed the directory using Elasticsearch and queried it through Samba? Greets Manu
Hello M Weber,

Please find the update regarding the directory indexing and Samba query issue:

Initial Status: Earlier attempts failed to reproduce the issue because the manual queries used NFC (composed) characters, which matched the existing index data exactly.

The Change: Reproduction was achieved by using printf to force NFD (decomposed) encoding, which is the exact byte sequence sent by macOS Spotlight.

Indexing & Query Confirmation: We have confirmed that the directory is indexed in Elasticsearch, and we successfully reproduced the search failure when querying via Samba.

The Findings: While plain ASCII paths like MYSUBDIR match, searching from MYSUBDIR_Ä fails because Samba/Spotlight sends the path in NFD (decomposed) encoding.

Reproduction: This was verified using curl by forcing NFD encoding in the query, which resulted in 0 hits even though the file exists in the index. The following commands were used to reproduce the issue:

    DECOMPOSED=$(printf "/mnt/cephfs/MYSUBDIR_A\xcc\x88/files/doc2.pdf")
    curl -X GET "localhost:9200/files/_search?pretty" \
      -H 'Content-Type: application/json' \
      -d "{ \"query\": { \"term\": { \"path.real\": \"$DECOMPOSED\" } } }"

Technical Conflict: Since path.real is a keyword field, it performs a strict binary match and treats the composed and decomposed versions of Ä as two different strings.

I will work on the resolution and update the ticket accordingly.

Thanks & Regards,
Priyanka Soni
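The finding above can be sketched in Python as follows. This is only an illustration, assuming (as reported in the comment) that the index stores NFC paths in the path.real keyword field; it is not the fix implemented in Samba. The helper build_term_query() is a hypothetical name, and the path is the one from the curl reproduction. The sketch only builds the query body and does not contact Elasticsearch.

```python
import json
import unicodedata


def build_term_query(path: str, field: str = "path.real") -> str:
    """Build a term-query body with the path normalized to NFC.

    Hypothetical helper for illustration; it assumes the indexed
    paths are stored in NFC.
    """
    normalized = unicodedata.normalize("NFC", path)
    return json.dumps({"query": {"term": {field: normalized}}},
                      ensure_ascii=False)


# Path as sent by macOS: "A" followed by U+0308 (UTF-8 bytes 41 cc 88).
decomposed = "/mnt/cephfs/MYSUBDIR_A\u0308/files/doc2.pdf"

# Path as assumed to be stored in the index: composed U+00C4.
indexed = "/mnt/cephfs/MYSUBDIR_\u00c4/files/doc2.pdf"

# Without normalization the two strings are not equal, so an exact
# term match on a keyword field cannot succeed.
print(decomposed == indexed)

# After normalization the query carries the same string as the index.
body = build_term_query(decomposed)
print(body)
```

Normalizing on the query side is one option; the alternative of changing how the field is indexed would require reindexing, which is why a conversion of the incoming scope path looks like the less invasive route.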
I guess we're missing a call to smb_iconv() on the scope in slrpc_open_query().
Created attachment 18947: screenshot-1
Hello, I have implemented a fix for the handling of directories containing umlaut characters and have attached screenshots demonstrating the Elasticsearch behavior before and after the change. The first attached image (screenshot-1) shows the previous behavior (without the fix), where the umlaut directory handling was incorrect. The second attached image (screenshot-2) shows the current behavior (with the fix), reflecting the updated handling. Could you please review the attachments and confirm whether this aligns with the expected behavior and acceptance criteria? Once this is verified, I will proceed to submit the fix for approval. Thanks, Priyanka
Created attachment 18948: screenshot-2
The first screenshot looks exactly like what I experienced, and the second looks like the correct fix. Thank you for your work! m