Bug 15379 - Spotlight fails if search starts from path containing an Umlaut
Status: NEW
Alias: None
Product: Samba 4.1 and newer
Classification: Unclassified
Component: File services
Version: 4.18.2
Hardware: x64 Linux
Importance: P5 normal
Target Milestone: ---
Assignee: Samba QA Contact
QA Contact: Samba QA Contact
 
Reported: 2023-05-27 18:39 UTC by M Weber
Modified: 2026-04-17 21:53 UTC (History)

Attachments
screenshot-1 (1.47 MB, image/png)
2026-04-17 18:49 UTC, Priyanka Soni
screenshot -2 (1.47 MB, image/png)
2026-04-17 18:55 UTC, Priyanka Soni

Description M Weber 2023-05-27 18:39:41 UTC
Depending on the starting point of a Spotlight search in macOS Finder, the search fails.

### Env
smbd -V
Version 4.18.2

Fruits:
    fruit:veto_appledouble = yes
    fruit:aapl = yes
    fruit:nfs_aces = no
    fruit:metadata = stream
    fruit:resource = xattr
    fruit:copyfile = yes
    vfs objects = catia fruit streams_xattr recycle shadow_copy2

Samba Spotlight Backend: Elasticsearch
macOS Ventura 13.2
Filesystem: ZFS (as shipped with Ubuntu)


### Setup
/myshare/MYSUBDIR_Ä/files/doc1.pdf
/myshare/MYSUBDIR/files/doc2.pdf
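
For anyone trying to reproduce this, the layout above could be recreated roughly as follows. This is only a sketch: the scratch directory stands in for the real share root, the files are empty placeholders, and it assumes the directory name is stored on disk in composed form (U+00C4), which this report does not show at the byte level.

```shell
# Scratch directory standing in for the share root (hypothetical location)
root=$(mktemp -d)

# Directory names: one plain ASCII, one ending in the composed (NFC) "Ä",
# i.e. U+00C4 encoded as the UTF-8 bytes c3 84 (octal \303\204)
plain_dir="$root/myshare/MYSUBDIR"
umlaut_dir=$(printf '%s/myshare/MYSUBDIR_\303\204' "$root")

mkdir -p "$umlaut_dir/files" "$plain_dir/files"

# Empty placeholder files; real PDFs would be needed for content indexing
: > "$umlaut_dir/files/doc1.pdf"
: > "$plain_dir/files/doc2.pdf"

# List the resulting tree
find "$root/myshare" -type f
```

The tree still has to be exported by Samba and indexed by the Elasticsearch backend before a Spotlight search from Finder can be tested against it.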


### Success and Fail
Doing the following in macOS Finder: enter the directory and search.

If I start my Spotlight search from the folder "/myshare", both "doc1.pdf" and "doc2.pdf" CAN be found by name.

If I start my Spotlight search from the folder "/myshare/MYSUBDIR", then "doc2.pdf" CAN be found by name.

If I start my Spotlight search from the folder "/myshare/MYSUBDIR_Ä", then "doc1.pdf" CANNOT be found by name.


### My guess
I suspect that Spotlight sends a directory string that is somehow corrupted by the time it reaches Samba.
Other characters may be affected too.
Comment 1 M Weber 2023-05-29 12:46:54 UTC
I suspect a combining diaeresis.
My folder name "MYSUBDIR_Ä" contains that nasty double character (base letter plus combining diaeresis).
This form, however, is not in the Elasticsearch DB.

It seems Samba produces these, as the folder name seen by Linux uses a normal single character.

Is there a way to disable the combining diaeresis in Samba, so the Mac gets a real (precomposed) umlaut in the first place?

Thanks
M
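
For reference, the two spellings of "Ä" look identical on screen but differ at the byte level. A minimal sketch that makes the difference visible, using plain printf with octal escapes (the variable and function names are only illustrative):

```shell
# Composed (NFC): U+00C4, UTF-8 bytes c3 84
nfc=$(printf 'MYSUBDIR_\303\204')
# Decomposed (NFD): "A" followed by U+0308 (combining diaeresis), UTF-8 bytes 41 cc 88
nfd=$(printf 'MYSUBDIR_A\314\210')

# Print the bytes of the argument as one continuous hex string
hexbytes() { printf '%s' "$1" | od -An -tx1 | tr -d ' \n'; }

hexbytes "$nfc"; echo    # 4d595355424449525fc384
hexbytes "$nfd"; echo    # 4d595355424449525f41cc88

# A byte-wise comparison treats the two names as different strings
if [ "$nfc" = "$nfd" ]; then
    echo "identical"
else
    echo "different byte sequences"
fi
```

Any component that compares paths byte for byte will therefore treat the two names as unrelated unless one side is normalized first.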
Comment 2 M Weber 2023-05-29 12:48:31 UTC
Note:
Updated to Ventura 13.4.
Bug persists.
Comment 3 Perttu Aaltonen 2024-07-10 09:53:16 UTC
I ran into this as well, but in my case the share name has the umlaut. Fast searches don't work at all and Finder starts a full directory traversal.

Removing the umlaut from the share name instantly fixes the issue, even without reindexing the directory structure. But obviously Samba should support special characters since native macOS SMB & Spotlight do as well.

Hopefully we get a fix at some point.
Comment 4 Priyanka Soni 2026-02-02 10:05:06 UTC
Hello M Weber,

I tried to reproduce the issue with the steps mentioned, but could not. Could you please provide detailed steps to reproduce, or any reference document?
Comment 5 M Weber 2026-02-02 14:20:30 UTC
(In reply to Priyanka Soni from comment #4)

Hi Priyanka Soni

Have you indexed the directory using Elasticsearch, and queried with Samba against it?

Greets
Manu
Comment 6 Priyanka Soni 2026-02-23 11:01:33 UTC
Hello M Weber,

Please find the update regarding the directory indexing and Samba query issue:

Initial Status: Earlier attempts failed to reproduce because the manual queries used NFC (Composed) characters, which matched the existing index data perfectly.

The Change: Reproduction was achieved by using printf to force NFD (Decomposed) encoding, which is the exact byte sequence sent by macOS Spotlight.

Indexing & Query Confirmation: We have confirmed that the directory is indexed in Elasticsearch and successfully reproduced the search failure when querying via Samba.

The Findings: While standard ASCII paths like MYSUBDIR match perfectly, searching from MYSUBDIR_Ä fails because Samba/Spotlight sends the path in NFD (Decomposed) encoding.

Reproduction: This was verified using curl by forcing NFD encoding in the query, which resulted in 0 hits despite the file existing in the index.

I used the commands below to reproduce the issue:

DECOMPOSED=$(printf "/mnt/cephfs/MYSUBDIR_A\xcc\x88/files/doc2.pdf")

curl -X GET "localhost:9200/files/_search?pretty" -H 'Content-Type: application/json' -d"
{
  \"query\": {
    \"term\": {
      \"path.real\": \"$DECOMPOSED\"
    }
  }
}"


Technical Conflict: Since path.real is a keyword field, it performs a strict binary match and treats the Composed and Decomposed versions of Ä as two different strings.
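
One way to confirm which form the index actually holds is to query for both spellings at once. The sketch below builds an Elasticsearch "terms" query body containing the composed and the decomposed path; the index name, field name, and host are taken from the commands above, while the function name is hypothetical and no JSON escaping is done (it assumes the paths contain no double quotes or backslashes).

```shell
# Build a "terms" query body that matches either spelling of the path.
# Assumption: neither path contains characters that need JSON escaping.
build_terms_query() {
    printf '{"query":{"terms":{"path.real":["%s","%s"]}}}' "$1" "$2"
}

# Composed (NFC) and decomposed (NFD) spellings of the same path
composed=$(printf '/mnt/cephfs/MYSUBDIR_\303\204/files/doc2.pdf')
decomposed=$(printf '/mnt/cephfs/MYSUBDIR_A\314\210/files/doc2.pdf')

body=$(build_terms_query "$composed" "$decomposed")
printf '%s\n' "$body"

# To send it to the index used above (not executed here):
# curl -s -X GET "localhost:9200/files/_search?pretty" \
#      -H 'Content-Type: application/json' -d "$body"
```

If this query returns a hit while the NFD-only term query returns none, the index stores the composed form and the mismatch is purely one of normalization.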

I will work on the resolution and update the ticket accordingly.

Thanks & Regards,
Priyanka Soni
Comment 7 Ralph Böhme 2026-02-23 11:39:06 UTC
I guess we're missing a call to smb_iconv() on the scope in slrpc_open_query().
Comment 8 Priyanka Soni 2026-04-17 18:49:56 UTC
Created attachment 18947 [details]
screenshot-1

screenshot -1
Comment 9 Priyanka Soni 2026-04-17 18:52:13 UTC
Hello,

I have implemented a fix for the handling of directories containing umlaut characters and have attached screenshots demonstrating the Elasticsearch behavior before and after the change.

The first attached image (screenshot-1) shows the previous behavior (without the fix), where the umlaut directory handling was incorrect.
The second attached image (screenshot -2) shows the current behavior (with the fix), reflecting the updated handling.

Could you please review the attachments and confirm whether this aligns with the expected behavior and acceptance criteria?

Once this is verified, I will proceed to submit the fix for approval.

Thanks,
Priyanka
Comment 10 Priyanka Soni 2026-04-17 18:55:20 UTC
Created attachment 18948 [details]
screenshot -2

screenshot -2
Comment 11 M Weber 2026-04-17 21:53:38 UTC
The first screenshot looks exactly as I experienced it.
And the second looks like the correct fix.

Thank you for your work!

m