Please avoid setlocale calls in library functions, like in lib/util/charset/charcnv.c:get_conv_handle it breaks applications using samba4 in a way that for example string compare based on the user's locale changes. Seeing the comment that you reset user's locale only because tolower/toupper surprised me quite much, but not as that much as when I realized you use this only for letters between 0 and 128 ascii code, which technically means 'a-z' <-> 'A-Z', thus really no need to reset locale to C in the calling application only because of this. Please see attached proposed patch, which makes things behave sanely.
Created attachment 5795 [details] proposed samba4 patch for samba4; To not use setlocale in lib/util/charset/charcnv.c. I tested this with one application and it seems to do things as I expect.
Jelmer, could you please look into this?
How did you tell that we're only using tolower() and toupper() for letters between 0 and 128 ? I see quite a few calls to these functions and I'm not confident enough to say that none of them will call either of these functions with a different character.
Oops, seems I overlooked some places. I'm not sure how I was grepping sources. The places which I changed made sure they call toupper/tolower on the 7bit letters. And based on place of the use of setlocale I supposed it's enough to take care in this lib/util/charset, because the rest of the code may not expect any such functionality from the toupper/tolower from my point of view. Anyway, it's my fault. I'm at commit 9cddf89 and I care only on samba4, thus I believe I can safely ignore source3 directory, but I should cover all the others. I would go with the comment suggestion from the patch, implement ascii_tolower and ascii_toupper for use in samba. What is the best place I should implement this in (and add include to in the next files), please?
I just tested and tolower/toupper covers only 'a-z' <-> 'A-Z' for letters between 0..255 inclusive for a C locale.
Created attachment 5806 [details] proposed samba4 patch ][ for samba4; Updated patch. I replaced all tolower/toupper with ascii_tolower/ascii_toupper, where it seemed to have a sense. There left some in source3 and one in a comment in source4. I also didn't find any suitable file where to define new functions, thus they are defined in respective modules, or "expanded inline". I wasn't able to test the build, because I cannot compile samba4 git master at commit 0be1820 due to: Compiling librpc/rpc/dcerpc.c librpc/rpc/dcerpc.c: In function ‘dcerpc_ship_next_request’: librpc/rpc/dcerpc.c:1047: error: too few arguments to function ‘ndr_size_GUID’ Anyway, these changes seem very straightforward, so I guess there shouldn't be a problem.
Thanks for the updated patch. I'd rather see the actual logic in a single place, a single ascii_tolower/ascii_toupper that gets called from various places rather than redefining that function in various places and/or reimplementing it inline. lib/util is probably the best place for such functions to live (lib/util/util_str.c ?) We shouldn't really change the calls to toupper()/tolower() in Heimdal. It shouldn't rely on the locale being C anyway.
(In reply to comment #7) > Thanks for the updated patch. > > I'd rather see the actual logic in a single place, a single > ascii_tolower/ascii_toupper that gets called from various places rather than > redefining that function in various places and/or reimplementing it inline. > > lib/util is probably the best place for such functions to live > (lib/util/util_str.c ?) if I got it right, then there is no good place, due to different libraries where the functions are used. Though on the other hand, I believe the first patch was correct and enough to do the job, as only that parts were expecting C locale, no other explicitly, and if implicitly, then it would be written there in a comment, I suppose. > We shouldn't really change the calls to toupper()/tolower() in Heimdal. It > shouldn't rely on the locale being C anyway. Maybe. That's what I'm talking about, the Heimdal technically doesn't know what locale the calling application is using when its functions are called, same as other places here, though they, I guess, expects only 7bit letters, where toupper/tolower doesn't differ between different locales, I suppose. I noticed similar code and comments in source3, where is written that particular functions are used to not change locale to C, so it's changed only once, the values are cached, and then this cache is used. My first patch even avoids the cache, because of known results in the C locale. Please re-consider the first patch, as only that part of source4 seems to depend on the C locale being set, which you can easily accomplish by the patch without using setlocale.
the iniparser d(In reply to comment #8) > (In reply to comment #7) > > Thanks for the updated patch. > > > > I'd rather see the actual logic in a single place, a single > > ascii_tolower/ascii_toupper that gets called from various places rather than > > redefining that function in various places and/or reimplementing it inline. > > > > lib/util is probably the best place for such functions to live > > (lib/util/util_str.c ?) > if I got it right, then there is no good place, due to different libraries > where the functions are used. Though on the other hand, I believe the first > patch was correct and enough to do the job, as only that parts were expecting > locale, no other explicitly, and if implicitly, then it would be written there > in a comment, I suppose. lib/util/ should be linked into everything for which it matters. iniparser and heimdal are the only things as far as I can tell for which the system functions should definitely be sufficient as they are not originally part of Samba. The ntvfs code for example links against libsamba-util. > > We shouldn't really change the calls to toupper()/tolower() in Heimdal. It > > shouldn't rely on the locale being C anyway. > Maybe. That's what I'm talking about, the Heimdal technically doesn't know what > locale the calling application is using when its functions are called, same as > other places here, though they, I guess, expects only 7bit letters, where > toupper/tolower doesn't differ between different locales, I suppose. I noticed > similar code and comments in source3, where is written that particular > functions are used to not change locale to C, so it's changed only once, the > values are cached, and then this cache is used. My first patch even avoids the > cache, because of known results in the C locale. > > Please re-consider the first patch, as only that part of source4 seems to > depend on the C locale being set, which you can easily accomplish by the patch > without using setlocale. Your first patch left a bunch of toupper()/tolower() calls in place so we would end up with non-ascii semantics from those functions with that patch where we were relying on them.
Created attachment 5808 [details] proposed samba4 patch ]I[ for samba; Hopefully the last one. I chose safe_string.h because of its simplicity and because I was unable to use util.h in lib/ldb. Feel free to change anything, I do not know all the samba4 internals, and I have nothing more to add to this.
don't the *_ascii functions from source3/lib/util_unistr.c do what you need? I would propose that those functions move from source3/lib/ to lib/ if you need them in s4 now.
(In reply to comment #11) > don't the *_ascii functions from source3/lib/util_unistr.c do what you need? > > I would propose that those functions move from source3/lib/ to lib/ if you need > them in s4 now. > Thanks, that's a good point. I've looked at this and it appears to rely on quite some s3-specific functions though, so perhaps we could for now focus on naming those functions the same for the time being and then getting rid of one of the implementations.
ping, what is this waiting for?
Jelmer is still on holidays I think. He will return next week.
Jelmer, how to proceed here?
Jelmer, what to do here?
This is still waiting for a correct patch. libreplace should not depend on libutil; the replacement for strcasestr should use the original toupper(), not the ascii one. As Bjoern mentions, we should use the functions from s3 by moving them to lib/util or at the very least putting them in lib/util but giving them the same signature to make it possible to reconcile the two implementations in the future.
Just for your information, this break localization, because when I run the application in other than C language, then anything opened after your setlocale call is back in the C language, instead of the one I run the application from. It's basically a deal breaker and this bug should be marked as a blocker and fixed as soon as possible.
I think we also need to discuss patches like this on samba-technical malling list..
I have brought this up for discussion on the list: http://lists.samba.org/archive/samba-technical/2011-February/076172.html This is a pretty major issue for anything linking against a samba library, as it totally breaks many things revolving around translations and character sets :(
thanks for reporting this, and my apologies for the very slow response! I've removed the setlocale() call in master