Bug 7519 - Avoid setlocale calls in library functions
Avoid setlocale calls in library functions
Status: RESOLVED FIXED
Product: Samba 4.0
Classification: Unclassified
Component: Tools
unspecified
Other Linux
: P3 regression
: ---
Assigned To: Andrew Tridgell
samba4-qa@samba.org
:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2010-06-16 09:22 UTC by Milan Crha
Modified: 2011-02-03 23:04 UTC (History)
6 users (show)

See Also:


Attachments
proposed samba4 patch (2.27 KB, patch)
2010-06-16 09:25 UTC, Milan Crha
no flags Details
proposed samba4 patch ][ (17.75 KB, patch)
2010-06-24 08:13 UTC, Milan Crha
jelmer: review-
Details
proposed samba4 patch ]I[ (10.75 KB, patch)
2010-06-25 06:36 UTC, Milan Crha
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Milan Crha 2010-06-16 09:22:52 UTC
Please avoid setlocale calls in library functions, like in
lib/util/charset/charcnv.c:get_conv_handle
it breaks applications using samba4 in a way that for example string compare based on the user's locale changes. Seeing the comment that you reset user's locale only because tolower/toupper surprised me quite much, but not as that much as when I realized you use this only for letters between 0 and 128 ascii code, which technically means 'a-z' <-> 'A-Z', thus really no need to reset locale to C in the calling application only because of this. Please see attached proposed patch, which makes things behave sanely.
Comment 1 Milan Crha 2010-06-16 09:25:10 UTC
Created attachment 5795 [details]
proposed samba4 patch

for samba4;

To not use setlocale in lib/util/charset/charcnv.c.
I tested this with one application and it seems to do things as I expect.
Comment 2 Matthias Dieter Wallnöfer 2010-06-16 09:32:00 UTC
Jelmer, could you please look into this?
Comment 3 Jelmer Vernooij 2010-06-16 10:06:39 UTC
How did you tell that we're only using tolower() and toupper() for letters between 0 and 128 ? I see quite a few calls to these functions and I'm not confident enough to say that none of them will call either of these functions with a different character.
Comment 4 Milan Crha 2010-06-16 11:36:38 UTC
Oops, seems I overlooked some places. I'm not sure how I was grepping sources. The places which I changed made sure they call toupper/tolower on the 7bit letters. And based on place of the use of setlocale I supposed it's enough to take care in this lib/util/charset, because the rest of the code may not expect any such functionality from the toupper/tolower from my point of view.

Anyway, it's my fault. I'm at commit 9cddf89 and I care only on samba4, thus I believe I can safely ignore source3 directory, but I should cover all the others. I would go with the comment suggestion from the patch, implement ascii_tolower and ascii_toupper for use in samba. What is the best place I should implement this in (and add include to in the next files), please?
Comment 5 Milan Crha 2010-06-16 11:41:09 UTC
I just tested and tolower/toupper covers only 'a-z' <-> 'A-Z' for letters between 0..255 inclusive for a C locale.
Comment 6 Milan Crha 2010-06-24 08:13:53 UTC
Created attachment 5806 [details]
proposed samba4 patch ][

for samba4;

Updated patch. I replaced all tolower/toupper with ascii_tolower/ascii_toupper, where it seemed to have a sense. There left some in source3 and one in a comment in source4. I also didn't find any suitable file where to define new functions, thus they are defined in respective modules, or "expanded inline". I wasn't able to test the build, because I cannot compile samba4 git master at commit 0be1820 due to:

Compiling librpc/rpc/dcerpc.c
librpc/rpc/dcerpc.c: In function ‘dcerpc_ship_next_request’:
librpc/rpc/dcerpc.c:1047: error: too few arguments to function ‘ndr_size_GUID’

Anyway, these changes seem very straightforward, so I guess there shouldn't be a problem.
Comment 7 Jelmer Vernooij 2010-06-24 11:05:02 UTC
Thanks for the updated patch.

I'd rather see the actual logic in a single place, a single ascii_tolower/ascii_toupper that gets called from various places rather than redefining that function in various places and/or reimplementing it inline.

lib/util is probably the best place for such functions to live (lib/util/util_str.c ?)

We shouldn't really change the calls to toupper()/tolower() in Heimdal. It shouldn't rely on the locale being C anyway.
Comment 8 Milan Crha 2010-06-24 11:18:58 UTC
(In reply to comment #7)
> Thanks for the updated patch.
> 
> I'd rather see the actual logic in a single place, a single
> ascii_tolower/ascii_toupper that gets called from various places rather than
> redefining that function in various places and/or reimplementing it inline.
>
> lib/util is probably the best place for such functions to live
> (lib/util/util_str.c ?)

if I got it right, then there is no good place, due to different libraries where the functions are used. Though on the other hand, I believe the first patch was correct and enough to do the job, as only that parts were expecting C locale, no other explicitly, and if implicitly, then it would be written there in a comment, I suppose.
 
> We shouldn't really change the calls to toupper()/tolower() in Heimdal. It
> shouldn't rely on the locale being C anyway.

Maybe. That's what I'm talking about, the Heimdal technically doesn't know what locale the calling application is using when its functions are called, same as other places here, though they, I guess, expects only 7bit letters, where toupper/tolower doesn't differ between different locales, I suppose. I noticed similar code and comments in source3, where is written that particular functions are used to not change locale to C, so it's changed only once, the values are cached, and then this cache is used. My first patch even avoids the cache, because of known results in the C locale.

Please re-consider the first patch, as only that part of source4 seems to depend on the C locale being set, which you can easily accomplish by the patch without using setlocale.
Comment 9 Jelmer Vernooij 2010-06-24 13:00:30 UTC
the iniparser d(In reply to comment #8)
> (In reply to comment #7)
> > Thanks for the updated patch.
> > 
> > I'd rather see the actual logic in a single place, a single
> > ascii_tolower/ascii_toupper that gets called from various places rather than
> > redefining that function in various places and/or reimplementing it inline.
> >
> > lib/util is probably the best place for such functions to live
> > (lib/util/util_str.c ?)
> if I got it right, then there is no good place, due to different libraries
> where the functions are used. Though on the other hand, I believe the first
> patch was correct and enough to do the job, as only that parts were expecting 
> locale, no other explicitly, and if implicitly, then it would be written there
> in a comment, I suppose.
lib/util/ should be linked into everything for which it matters. iniparser and heimdal are the only things as far as I can tell for which the system functions should definitely be sufficient as they are not originally part of Samba.

The ntvfs code for example links against libsamba-util.

> > We shouldn't really change the calls to toupper()/tolower() in Heimdal. It
> > shouldn't rely on the locale being C anyway.
> Maybe. That's what I'm talking about, the Heimdal technically doesn't know what
> locale the calling application is using when its functions are called, same as
> other places here, though they, I guess, expects only 7bit letters, where
> toupper/tolower doesn't differ between different locales, I suppose. I noticed
> similar code and comments in source3, where is written that particular
> functions are used to not change locale to C, so it's changed only once, the
> values are cached, and then this cache is used. My first patch even avoids the
> cache, because of known results in the C locale.
> 
> Please re-consider the first patch, as only that part of source4 seems to
> depend on the C locale being set, which you can easily accomplish by the patch
> without using setlocale.
Your first patch left a bunch of toupper()/tolower() calls in place so we would end up with non-ascii semantics from those functions with that patch where we were relying on them.
Comment 10 Milan Crha 2010-06-25 06:36:06 UTC
Created attachment 5808 [details]
proposed samba4 patch ]I[

for samba;

Hopefully the last one. I chose safe_string.h because of its simplicity and because I was unable to use util.h in lib/ldb. Feel free to change anything, I do not know all the samba4 internals, and I have nothing more to add to this.
Comment 11 Björn Jacke 2010-06-25 08:20:11 UTC
don't the *_ascii functions from source3/lib/util_unistr.c do what you need?

I would propose that those functions move from source3/lib/ to lib/ if you need them in s4 now.
Comment 12 Jelmer Vernooij 2010-06-25 17:27:05 UTC
(In reply to comment #11)
> don't the *_ascii functions from source3/lib/util_unistr.c do what you need?
> 
> I would propose that those functions move from source3/lib/ to lib/ if you need
> them in s4 now.
> 
Thanks, that's a good point. I've looked at this and it appears to rely on quite some s3-specific functions though, so perhaps we could for now focus on naming those functions the same for the time being and then getting rid of one of the implementations.
Comment 13 Milan Crha 2010-07-15 07:15:51 UTC
ping, what is this waiting for?
Comment 14 Matthias Dieter Wallnöfer 2010-07-16 12:04:24 UTC
Jelmer is still on holidays I think. He will return next week.
Comment 15 Matthias Dieter Wallnöfer 2010-09-11 12:35:28 UTC
Jelmer, how to proceed here?
Comment 16 Matthias Dieter Wallnöfer 2010-10-01 08:40:38 UTC
Jelmer, what to do here?
Comment 17 Jelmer Vernooij 2010-10-01 09:34:09 UTC
This is still waiting for a correct patch.

libreplace should not depend on libutil; the replacement for strcasestr should use the original toupper(), not the ascii one.

As Bjoern mentions, we should use the functions from s3 by moving them to lib/util or at the very least putting them in lib/util but giving them the same signature to make it possible to reconcile the two implementations in the future.
Comment 18 Milan Crha 2010-10-19 06:10:29 UTC
Just for your information, this break localization, because when I run the application in other than C language, then anything opened after your setlocale call is back in the C language, instead of the one I run the application from. It's basically a deal breaker and this bug should be marked as a blocker and fixed as soon as possible.
Comment 19 Stefan Metzmacher 2010-10-19 06:22:57 UTC
I think we also need to discuss patches like this on samba-technical malling list..
Comment 20 Sean Finney 2011-02-03 08:09:53 UTC
I have brought this up for discussion on the list:

http://lists.samba.org/archive/samba-technical/2011-February/076172.html

This is a pretty major issue for anything linking against a samba library, as it totally breaks many things revolving around translations and character sets :(
Comment 21 Andrew Tridgell 2011-02-03 23:04:13 UTC
thanks for reporting this, and my apologies for the very slow response!

I've removed the setlocale() call in master