We have a problem with Samba's character case conversions under the Turkish locales tr_TR (ISO-8859-9) and tr_TR.UTF-8. Turkish has an odd property with respect to case conversion: certain ASCII characters of the Turkish alphabet ('i' and 'I') turn into non-ASCII characters (multi-byte in UTF-8) when their case is changed. The problem is common enough that I'll leave the explanation to i18nguy: http://www.i18nguy.com/unicode/turkish-i18n.html

As a result of this oddity, Samba
(1) fails totally under tr_TR.ISO-8859-9, even when the service name contains no i/I and no special Turkish character, since the built-in service name IPC$ has an 'I';
(2) fails almost totally under tr_TR.UTF-8 when the service name contains i/I characters.

I've successfully tested the attached patch for Turkish. Permanently changing {upcase,lowcase}.dat for Turkish cannot be a solution, so I'm using a dynamic scheme with a hook mechanism that is triggered under a Turkic language environment (for now, Turkish and Azeri). The patch provides a fairly generic mechanism for handling this kind of oddity, so I think it may be useful for other languages with similar requirements.

As part of the corrections, I've also fixed some wrong assumptions in certain string functions. These changes may have a small negative impact on the optimizations in the string functions, though I think it can be tolerated. However, if you prefer to keep the optimization level, we could implement a set of string wrappers as function pointers that would be frobnicated on startup according to the locale. (I've prototyped and tested such code, but I believe it would complicate things.)

Let me know if you need extra information. I would also be glad for suggestions.

P.S. I hope I've selected the right component for the bug report.
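To make the failure concrete, here is a minimal Python sketch of what Turkish case rules do to the built-in service name. It is not Samba code (Samba is written in C); the mapping tables and the helper names `turkish_lower` and `turkish_upper` are hypothetical and exist only for this illustration.

```python
# Illustration only: hand-written Turkish case rules, not Samba's tables.
# U+0131 is the dotless lowercase i, U+0130 is the dotted capital I.
TURKISH_LOWER = {"I": "\u0131", "\u0130": "i"}
TURKISH_UPPER = {"i": "\u0130", "\u0131": "I"}


def turkish_lower(s: str) -> str:
    """Lowercase s, applying the Turkish i/I rules first."""
    return "".join(TURKISH_LOWER.get(ch, ch.lower()) for ch in s)


def turkish_upper(s: str) -> str:
    """Uppercase s, applying the Turkish i/I rules first."""
    return "".join(TURKISH_UPPER.get(ch, ch.upper()) for ch in s)


lowered = turkish_lower("IPC$")

# The result no longer matches the ASCII name a client would ask for.
print(lowered == "ipc$")                   # False
# In ISO-8859-9 the dotless i is the single byte 0xFD ...
print(lowered.encode("iso8859_9"))         # b'\xfdpc$'
# ... and in UTF-8 it becomes a two-byte sequence.
print(lowered.encode("utf-8"))             # b'\xc4\xb1pc$'
# The reverse direction breaks too: 'i' uppercases to the dotted capital I.
print(ascii(turkish_upper("fooibar")))     # 'FOO\u0130BAR'
```

Any lookup that lowercases "IPC$" under these rules and then compares the result with the ASCII string "ipc$" will therefore miss, which matches the total failure described for tr_TR.ISO-8859-9.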
Created attachment 775: Patch to fix the failure of odd case conversions
The question is: should we call setlocale(LC_ALL, "") at all, or should we force a setlocale call for LC_ALL with en_US.UTF-8, which enforces ASCII-compatible case conversion? At which points do we need locale-dependent case conversions that, for example, convert "I" to "dotless i"?
First of all, I apologize for sending the wrong patch (I've been too tired these days). Attached is the correct one, which also includes support for a new locale: tr_CY [1] (the Turkish locale for Cyprus). Please ignore the first patch and consider using the new one.

Regarding your question: as far as I can observe in my tests, charcnv.c issues a setlocale(LC_ALL, "") whenever it switches to multi-byte mode anyway. This happens for the Turkish service names. As for the latter part of your question, I have traced this I -> 'dotless i' conversion to smbd/service.c, make_connection --> str_lower_m(service), which is activated for reply_tconn_X on an smbclient call. For the ISO-8859-9 encoding this line simply evaluates to "[dotless i]pc$" for the built-in IPC$.

This patch just suggests a fix to the string processing infrastructure for Turkish-like languages. It won't totally fix the "Turkish chars in service names" problem, since we would also have to change some other things on the client side. The problem with 'i/I' will not be completely resolved, but the patch will improve the situation somewhat. For example:

service name --> FOO<I with dot above>BAR

This works:
smbclient //<netbios_name>/<service name written as it is>

But this doesn't work even with the patch applied:
smbclient //<netbios_name>/<service name written as lower case: fooibar>

I've prepared another patch against libsmb/cliconnect.c and will submit a separate bug report for this issue (or should I?), though I'm not so sure it fits the rules for service names in the NetBIOS protocol.

[1] tr_CY is a newly introduced locale; the relevant bugzilla entry is: http://sources.redhat.com/bugzilla/show_bug.cgi?id=531
Comment on attachment 775 (Patch to fix the failure of odd case conversions): Obsoleting the wrong patch.
Created attachment 776: The correct patch
The correct patch, which also incorporates support for the tr_CY locale.
How do Turkish Windows 2k/XP versions react when you have, for example, a share called "fooibar"? Can it be reached from another Turkish Windows 2k/XP machine under the name "FOOIBAR" and/or under the name "FOO<I with dot above>BAR"?
Sorry, I don't use Windows XP and don't have a Windows box at the moment. I used Linux clients for the tests performed here. I will try this later.

Some other notes...

setlocale(LC_ALL, "") in the patch is only needed to determine the so-called 'lang_speciality'. We could temporarily switch to the native locale and, once the lang_speciality has been determined, restore the prior locale state. But apart from the string operations (tolower/toupper), we also need to work in the native locale for some unusual operations, for example when falling back to the lame case table creation in load_case_tables().

Though not evident at first, my patch also fixes another minor bug. In the current code base, load_case_tables() is called before the globals are initialized. As a result, lp_use_mmap() always evaluates to False since the Globals.bUseMmap flag has not been set yet, hence mmap is not used.
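The "switch temporarily, detect, restore" idea could be sketched as follows. This is a Python illustration under stated assumptions, not the patch itself: the names `is_turkic_locale`, `probe_native_locale` and `TURKIC_LANGS` are hypothetical, and the language list only reflects the languages named in this report (Turkish and Azeri).

```python
import locale

# Assumption: language codes taken from the report (Turkish and Azeri).
TURKIC_LANGS = {"tr", "az"}


def is_turkic_locale(name: str) -> bool:
    """Return True if a locale name such as 'tr_TR.UTF-8' is Turkic."""
    if not name:
        return False
    # Strip the codeset (".UTF-8"), the modifier ("@...") and the territory.
    lang = name.split(".")[0].split("@")[0].split("_")[0].lower()
    return lang in TURKIC_LANGS


def probe_native_locale() -> str:
    """Read the environment's LC_CTYPE name, then restore the prior state."""
    saved = locale.setlocale(locale.LC_CTYPE)       # query only, no change
    try:
        native = locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        native = saved                              # environment locale unusable
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)    # put things back
    return native


print(is_turkic_locale("tr_TR.ISO-8859-9"))   # True
print(is_turkic_locale("tr_CY"))              # True
print(is_turkic_locale("en_US.UTF-8"))        # False
print(is_turkic_locale(probe_native_locale()))
```

The sketch touches only LC_CTYPE because that category governs character classification and case conversion; whether the patch needs the wider LC_ALL is not something this thread settles.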
Comment on attachment 776 (The correct patch): Obsoleted by revised patch.
Created attachment 782: Revised patch
I've revised the patch with some sane changes.
Created attachment 783: Test script to validate the patch
This script runs 'torture/t_push_ucs2' and 't_strcmp' with some UTF-8 encoded Turkish test input. Run it under the tr_TR.UTF-8 locale and check the results against the 'equal' and 'non-equal' string comparisons.
Created attachment 784: Output of the test script for the unpatched (current) case
I'm attaching the output of the test script for the unpatched case, for your convenience.
Created attachment 785: Output of the test script for the patched case
And here is the one for the patched case.
To know where we should fix things, we first need to know how Windows reacts. Please try to find out what I asked in #2.
Well, I've finally managed to arrange some tests with Windows boxes. Here are the test results:

Shared names: fooibar, bazIbar, foo<I above dot>baz
Server side: Debian GNU/Linux Sarge with Samba 3.0.7
Client side: Turkish WinXP and Turkish Win98

Case 1
  Samba server locale: tr_TR.ISO-8859-9
  Server unix charset: ISO-8859-9
  Shared names were all encoded in ISO-8859-9.

Result 1
  Total failure for both clients. No machine in the network neighborhood.

Case 2
  Samba server locale: tr_TR.UTF-8
  Server unix charset: (left as default, that is, UTF-8)
  Shared names were all encoded in UTF-8.

Result 2
  Machine appeared in the network neighborhood.

  WinXP: shared names appeared as follows:
    fooibar --> fooibar
    bazIbar --> bazIbar
    foo<I above dot>baz --> foo<I above dot>baz
  Connections:
    fooibar --> failure, couldn't connect. Logged as 'fooIbar'.
    bazIbar --> success, couldn't connect. Logged as it is.
    foo<I above dot>baz --> success. Logged as it is.

  Win98: shared names appeared as follows:
    fooibar --> fooibar
    bazIbar --> bazIbar
    foo<I above dot>baz --> foo (yes, 'foo')
  Connections:
    fooibar --> failure, couldn't connect. Shared name logged as 'fooIbar'.
    bazIbar --> success. Shared name logged as it is.
    foo<I above dot>baz --> failure. Shared name logged as 'foo'.

I couldn't repeat the tests for the patched case. I'll do this in a few days.
Thanks a lot for your tests! There is however another important test we need to do: create a share like "fooibar" on a Turkish Windows 2k/XP machine, try to connect to it from another Turkish Windows 2k/XP machine as "FOOIBAR" and as "fooibar", and see which of them are accessible. This dotted/dotless i is a nightmare ;-)
Hi Björn,

I've created a share named 'fooibar' on a WinXP box and attempted to connect to it from another XP box:

//host/fooibar --> OK
//host/FOOIBAR --> OK
//host/foo<I dot above>bar (or FOO<I dot above>BAR) --> FAILED

This seems to be the same i/<I dot above> issue experienced with DOS filenames:

mkdir fooibar; dir fooibar --> listed as FOOIBAR
cd FOOIBAR --> OK, but
cd FOO<I dot above>BAR --> FAILED
mkdir FOO<I dot above>BAR; cd FOO<I dot above>BAR --> OK

Now, where should we go from here? Should we create a very minimal patch that just fixes this i/<I dot above> issue? (The patch should also address the total failure under the ISO-8859-9 locale.)
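The Windows behaviour reported above is what an ASCII-compatible, case-insensitive comparison would produce. A minimal Python sketch that reproduces the three share-name results follows; the function name `share_names_match` is hypothetical, and nothing here claims to be how Windows or Samba actually implements the comparison.

```python
import string

# Translation table that maps only a-z to A-Z; every other character,
# including the Turkish dotted capital I, is left untouched.
_ASCII_UPPER = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)


def share_names_match(a: str, b: str) -> bool:
    """Compare two share names case-insensitively, using ASCII rules only."""
    return a.translate(_ASCII_UPPER) == b.translate(_ASCII_UPPER)


DOTTED_CAPITAL_I = "\u0130"

print(share_names_match("fooibar", "fooibar"))                          # True
print(share_names_match("fooibar", "FOOIBAR"))                          # True
print(share_names_match("fooibar", "FOO" + DOTTED_CAPITAL_I + "BAR"))   # False
```

A name created with the dotted capital I still matches itself, which is consistent with the last line of the DOS filename experiment.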
can you please try the attached fix?
Created attachment 826: locale fix for ASCII compat string functions
Ok, I'm going to add the simple locale fix for the next release. Can someone confirm this is all that is needed for the fix?

Jeremy.
Sorry for the late response. I'll be able to test it next week. At first glance, the patch seems fine to me; it is simple and much less invasive than my patch. But please note that it doesn't directly address the Turkish irregularity in Samba's built-in multi-byte string library, which my invasive patch targets. Of course, it should still solve the problems experienced by Turkish users, and there is little chance of hitting another Turkish-related bug as far as the scope of Samba's string operations is concerned.

Ok, as I said before, I hope to report the test results in a few days.
I confirm that this patch works. I can now access Samba shares with 'i/I' characters. I've tested it for the worst case, that is, with smbd running under the tr_TR (ISO-8859-9) locale. It even works with 'idotless' and 'Idotabove'. Thanks for your efforts.
Thanks for your tests! Regarding your comment in #20 that we do not address the i/I irregularity in the mb-functions: yes, that's true, but Windows also does not address the i/I rules and it is ASCII-compatible, so this is the easiest and cleanest way to go.
Sorry for the spam, cleaning up the database to prevent unnecessary reopens of bugs.
Recai, can you please test Samba 3.0.22 when it's out and confirm that this dotless i issue has not come up again? A change was made in the code so that we no longer switch the locale to C but use alternative case functions instead.

Thanks in advance
Bjoern
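The "alternative case functions" themselves are not shown in this thread. A minimal Python sketch of what a locale-independent, ASCII-only conversion could look like at the byte level is given below, assuming it only has to touch the bytes a-z; the names `ascii_toupper_byte` and `ascii_strupper` are hypothetical and are not Samba's.

```python
def ascii_toupper_byte(b: int) -> int:
    """Uppercase a single byte if it is ASCII a-z, otherwise return it as is."""
    return b - 0x20 if 0x61 <= b <= 0x7A else b


def ascii_strupper(raw: bytes) -> bytes:
    """Uppercase an encoded string without consulting the process locale."""
    return bytes(ascii_toupper_byte(b) for b in raw)


# The built-in service name survives regardless of the server's charset.
print(ascii_strupper("ipc$".encode("iso8859_9")))    # b'IPC$'

# Bytes >= 0x80 are never modified, so the ISO-8859-9 dotless i (0xFD)
# and multi-byte UTF-8 sequences pass through unchanged.
print(ascii_strupper(b"\xfdpc$"))                    # b'\xfdPC$'
print(ascii_strupper("payla\u015f\u0131m".encode("utf-8")))
```

Because such a function never calls the C library's toupper/tolower, its result does not depend on whether the process runs under tr_TR or C, which is the property the change described above is after.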
(In reply to comment #24)
> Recai, can you please test Samba 3.0.22 when it's out and confirm that this
> dotless i issue has not come up again? A change was made in the code so
> that we no longer switch the locale to C but use alternative case functions
> instead. Thanks in advance

Hi Björn,

Thanks for the notice! Regarding this issue, as your message implies, there has been a regression in the latest versions (i.e. 3.0.20b). Unfortunately I won't be able to run a test for some time (due to my workload these days). But I've contacted one of my friends, and I hope he will deal with the issue.
(In reply to comment #25)
> (In reply to comment #24)
> > Recai, can you please test Samba 3.0.22 when it's out and confirm that this
> > dotless i issue has not come up again? A change was made in the code so
> > that we no longer switch the locale to C but use alternative case functions
> > instead. Thanks in advance

Sorry for the long delay. Here are some test results under the tr_TR.UTF-8 locale:

caglar@pardus source $ svn info
URL: svn://svnanon.samba.org/samba/branches/SAMBA_3_0/source
Revision: 13042

With these shares, all Linux-to-Linux, Linux-to-Windows and Windows-to-Linux cases seem to work without a problem.

[paylasim]
    comment = Ortak Paylasim Alani
    path = /home/samba
    read only = no
    guest ok = yes
    create mask = 0777

[paylaşım]
    comment = Ortak Paylasim Alani
    path = /home/samba
    read only = no
    guest ok = yes
    create mask = 0777

[çÇöÖşŞiİğĞüÜıI]
    comment = Ortak Paylasim Alani
    path = /home/samba
    read only = no
    guest ok = yes
    create mask = 0777

But two more problems exist. The first one is that smbclient still can't understand UTF-8 chars:

pardus samba # smbclient -L pardus
Password:
Domain=[PARDUS] OS=[Unix] Server=[Samba 3.0.22pre1-SVN-build-13042]

    Sharename       Type      Comment
    ---------       ----      -------
    paylasim        Disk      Ortak Paylasim Alani
    payla           Disk      Ortak Paylasim Alani
    çÇöÖ            Disk      Ortak Paylasim Alani
    IPC$            IPC       IPC Service (pardus - is istasyonu)
    ADMIN$          IPC       IPC Service (pardus - is istasyonu)
Domain=[PARDUS] OS=[Unix] Server=[Samba 3.0.22pre1-SVN-build-13042]

    Server               Comment
    ---------            -------
    PARDUS               pardus - is istasyonu

The second one is that smbmount segfaults, at least for me.
Thanks for the tests! For smbmount/smbfs issues see bug #1920 ...