Bug 8310 - toupper_ascii() is broken on big-endian systems
Summary: toupper_ascii() is broken on big-endian systems
Status: RESOLVED FIXED
Alias: None
Product: Samba 3.6
Classification: Unclassified
Component: Config Files (show other bugs)
Version: 3.6.0rc2
Hardware: All All
: P5 regression
Target Milestone: ---
Assignee: Karolin Seeger
QA Contact: Samba QA Contact
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-07-19 11:00 UTC by David Disseldorp
Modified: 2020-12-11 11:24 UTC (History)
1 user (show)

See Also:


Attachments
git-am fix for 3.6.0rc3. (14.89 KB, patch)
2011-07-19 21:20 UTC, Jeremy Allison
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description David Disseldorp 2011-07-19 11:00:54 UTC
In parsing boolean options in smb.conf, the following code path is executed.

lib/util/util.c:
 790 _PUBLIC_ bool set_boolean(const char *boolean_string, bool *boolean)
 791 {          
 792         if (strwicmp(boolean_string, "yes") == 0 ||
 793             strwicmp(boolean_string, "true") == 0 ||
 794             strwicmp(boolean_string, "on") == 0 ||
 795             strwicmp(boolean_string, "1") == 0) {
 796                 *boolean = true;
 797                 return true;

lib/util_str.c:
 229 int strwicmp(const char *psz1, const char *psz2)
 230 {
...
 246                 if (toupper_ascii(*psz1) != toupper_ascii(*psz2) ||
 247                                 *psz1 == '\0' || *psz2 == '\0')
 248                         break;
...
260 int toupper_ascii(int c)
261 {
262         smb_ucs2_t uc = toupper_m(UCS2_CHAR(c));


lib/util/charset/charset.h
 42 #ifdef WORDS_BIGENDIAN
 43 #define UCS2_SHIFT 8
 44 #else
 45 #define UCS2_SHIFT 0
 46 #endif
 47 
 48 /* turn a 7 bit character into a ucs2 character */
 49 #define UCS2_CHAR(c) ((c) << UCS2_SHIFT)


lib/util/charset/codepoints.c
 90 _PUBLIC_ codepoint_t toupper_m(codepoint_t val)
 91 {
 92         if (val < 128) {
 93                 return toupper(val);
 94         }


On big endian systems, the ASCII val passed into toupper_m is pre-shifted meaning toupper() does not get called:
Breakpoint 1, set_boolean (boolean_string=0x10fb7d37 "Yes", boolean=0xfffffffe5f0) at ../lib/util/util.c:792
792             if (strwicmp(boolean_string, "yes") == 0 ||
(gdb) b toupper_m
Breakpoint 2 at 0x10681a80: file ../lib/util/charset/codepoints.c, line 92.
(gdb) c
Continuing.

Breakpoint 2, toupper_m (val=22784) at ../lib/util/charset/codepoints.c:92
92              if (val < 128) {
(gdb) p /x val
$1 = 0x5900
(gdb) l
87      /**
88       Convert a codepoint_t to upper case.
89      **/
90      _PUBLIC_ codepoint_t toupper_m(codepoint_t val)
91      {
92              if (val < 128) {
93                      return toupper(val);
94              }
95              if (upcase_table == NULL) {
96                      load_case_tables_library();
(gdb) bt
#0  toupper_m (val=22784) at ../lib/util/charset/codepoints.c:92
#1  0x000000001068169c in toupper_ascii (c=89) at lib/util_unistr.c:616
#2  0x0000000010677c38 in strwicmp (psz1=0x10fb7d37 "Yes", psz2=0x10da6618 "yes") at lib/util_str.c:246
#3  0x0000000010638bf8 in set_boolean (boolean_string=0x10fb7d37 "Yes", boolean=0xfffffffe5f0) at ../lib/util/util.c:792
#4  0x00000000100bcca0 in lp_bool (s=0x10fb7d37 "Yes") at param/loadparm.c:6168
#5  0x00000000100c3218 in lp_do_parameter (snum=-2, pszParmName=0x10fb7d20 "usershare allow guests", pszParmValue=0x10fb7d37 "Yes")
    at param/loadparm.c:7991
#6  0x00000000100c38b0 in do_parameter (pszParmName=0x10fb7d20 "usershare allow guests", pszParmValue=0x10fb7d37 "Yes", userdata=0x0)
    at param/loadparm.c:8100
#7  0x000000001063acc4 in Parameter (InFile=0x10fb7740, pfunc=@0x10eb8948: 0x100c37c8 <do_parameter>, c=0, userdata=0x0)
    at ../lib/util/params.c:426
#8  0x000000001063ae24 in Parse (InFile=0x10fb7740, sfunc=@0x10eb89c0: 0x100c3fcc <do_section>, 
    pfunc=@0x10eb8948: 0x100c37c8 <do_parameter>, userdata=0x0) at ../lib/util/params.c:482
#9  0x000000001063b234 in pm_process (FileName=0x10fb7230 "/etc/samba/smb.conf", sfunc=@0x10eb89c0: 0x100c3fcc <do_section>, 
    pfunc=@0x10eb8948: 0x100c37c8 <do_parameter>, userdata=0x0) at ../lib/util/params.c:566
#10 0x00000000100c8a0c in lp_load_ex (pszFname=0x10fb58f0 "/etc/samba/smb.conf", global_only=true, save_defaults=false, add_ipc=false, 
    initialize_globals=true, allow_include_registry=false, allow_registry_shares=false) at param/loadparm.c:9564
#11 0x00000000100c90a4 in lp_load_initial_only (pszFname=0x10fb58f0 "/etc/samba/smb.conf") at param/loadparm.c:9664
#12 0x0000000010c2014c in main (argc=7, argv=0xffffffff328) at smbd/server.c:1050


set_boolean() failures result in misinterpretation of smb.conf:
[2011/07/19 12:59:56.775424,  0] param/loadparm.c:6169(lp_bool)
  lp_bool(Yes): value is not boolean!
[2011/07/19 12:59:56.775526,  2] param/loadparm.c:8287(do_section)
  Processing section "[homes]"
[2011/07/19 12:59:56.775759,  0] param/loadparm.c:6169(lp_bool)
  lp_bool(No): value is not boolean!
[2011/07/19 12:59:56.775835,  0] param/loadparm.c:6169(lp_bool)
  lp_bool(No): value is not boolean!
[2011/07/19 12:59:56.775909,  0] param/loadparm.c:6169(lp_bool)
  lp_bool(Yes): value is not boolean!
Comment 1 Jeremy Allison 2011-07-19 17:44:19 UTC
Looks like this got broken in the conversion of the toupper_ascii()  tolower_ascii() isupper_ascii() and islower_ascii() functions to call the xxx_m() functions when previously they called the xxxx_w() functions.

This is a blocker for rc3.

Jeremy.
Comment 2 Jeremy Allison 2011-07-19 18:12:02 UTC
Ok, I've examined this. Andrew Bartlett made a mistake (sorry to point here, but git blame is pretty clear on this) in that smb_ucs2_t != codepoint_t.

smb_ucs2_t is explicitly a little endian uint16_t type.
codepoint_t is a native endian uint32_t type.

This means that we cannot just swap islower_w to islower_m and the same for all other similar calls.

I blame myself for not catching this at the time, but character conversion is tricky.

Jeremy.
Comment 3 Jeremy Allison 2011-07-19 18:27:11 UTC
David - thanks a *lot* for catching this. 3.6.0 would be completely broken on BE systems without a fix for this.

Jeremy.
Comment 4 Karolin Seeger 2011-07-19 18:51:54 UTC
Re-assigning to Jeremy.
Comment 5 Jeremy Allison 2011-07-19 20:08:04 UTC
I have a fix for this. In test now...

Jeremy.
Comment 6 Jeremy Allison 2011-07-19 21:20:15 UTC
Created attachment 6703 [details]
git-am fix for 3.6.0rc3.

David, can you test this please on your big-endian system. Metze, please review for the rc3 release.

Jeremy.
Comment 7 Stefan Metzmacher 2011-07-20 07:02:48 UTC
Comment on attachment 6703 [details]
git-am fix for 3.6.0rc3.

The first patch says that toupper_ascii() would be identical to toupper_m(),
but that's not true, as _ascii() uses UCS2_CHAR() and UCS2_TO_CHAR().
They do bitshifting based on the big endian.
So it's not trivial to ack.

Maybe someone else (who understands this code) can review this.
Comment 8 Andrew Tridgell 2011-07-20 07:28:18 UTC
(In reply to comment #7)
> Comment on attachment 6703 [details]
> git-am fix for 3.6.0rc3.
> 
> The first patch says that toupper_ascii() would be identical to toupper_m(),
> but that's not true, as _ascii() uses UCS2_CHAR() and UCS2_TO_CHAR().
> They do bitshifting based on the big endian.
> So it's not trivial to ack.
> 
> Maybe someone else (who understands this code) can review this.

I've tested on a BE box, and the patch does fix the problem
Comment 9 Andrew Tridgell 2011-07-20 07:30:07 UTC
I think this needs to go in for 3.6
 Tridge
Comment 10 David Disseldorp 2011-07-20 09:04:36 UTC
(In reply to comment #6)
> Created attachment 6703 [details]
> git-am fix for 3.6.0rc3.
> 
> David, can you test this please on your big-endian system.

Works for me too, thanks for the fix.
Comment 11 Jeremy Allison 2011-07-20 16:20:28 UTC
Metze, the reason I can replace toupper_ascii() with toupper_m() (and all the other _ascii() functions) is that the _ascii() versions are always called with a char value (extended to int on the stack). For this case (values < 128) they should be identical. For values > 127 the int will be sign extended and toupper_m will simply return that value.

Karolin please include in 3.6.0rc3.

Thanks,

Jeremy.
Comment 12 Karolin Seeger 2011-07-21 18:15:52 UTC
Pushed to v3-6-test.
Closing out bug report.

Thanks!