8310 – toupper_ascii() is broken on big-endian systems

Bug 8310 - toupper_ascii() is broken on big-endian systems

Summary: toupper_ascii() is broken on big-endian systems

Status:	RESOLVED FIXED

Alias:	None

Product:	Samba 3.6
Classification:	Unclassified
Component:	Config Files (show other bugs)
Version:	3.6.0rc2
Hardware:	All All

Importance:	P5 regression
Target Milestone:	---
Assignee:	Karolin Seeger
QA Contact:	Samba QA Contact

URL:
Keywords:

Depends on:
Blocks:

Reported:	2011-07-19 11:00 UTC by David Disseldorp
Modified:	2020-12-11 11:24 UTC (History)
CC List:	1 user (show)

See Also:

Attachments
git-am fix for 3.6.0rc3. (14.89 KB, patch) 2011-07-19 21:20 UTC, Jeremy Allison	no flags	Details
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description David Disseldorp 2011-07-19 11:00:54 UTC

In parsing boolean options in smb.conf, the following code path is executed.

lib/util/util.c:
 790 _PUBLIC_ bool set_boolean(const char *boolean_string, bool *boolean)
 791 {          
 792         if (strwicmp(boolean_string, "yes") == 0 ||
 793             strwicmp(boolean_string, "true") == 0 ||
 794             strwicmp(boolean_string, "on") == 0 ||
 795             strwicmp(boolean_string, "1") == 0) {
 796                 *boolean = true;
 797                 return true;

lib/util_str.c:
 229 int strwicmp(const char *psz1, const char *psz2)
 230 {
...
 246                 if (toupper_ascii(*psz1) != toupper_ascii(*psz2) ||
 247                                 *psz1 == '\0' || *psz2 == '\0')
 248                         break;
...
260 int toupper_ascii(int c)
261 {
262         smb_ucs2_t uc = toupper_m(UCS2_CHAR(c));


lib/util/charset/charset.h
 42 #ifdef WORDS_BIGENDIAN
 43 #define UCS2_SHIFT 8
 44 #else
 45 #define UCS2_SHIFT 0
 46 #endif
 47 
 48 /* turn a 7 bit character into a ucs2 character */
 49 #define UCS2_CHAR(c) ((c) << UCS2_SHIFT)


lib/util/charset/codepoints.c
 90 _PUBLIC_ codepoint_t toupper_m(codepoint_t val)
 91 {
 92         if (val < 128) {
 93                 return toupper(val);
 94         }


On big endian systems, the ASCII val passed into toupper_m is pre-shifted meaning toupper() does not get called:
Breakpoint 1, set_boolean (boolean_string=0x10fb7d37 "Yes", boolean=0xfffffffe5f0) at ../lib/util/util.c:792
792             if (strwicmp(boolean_string, "yes") == 0 ||
(gdb) b toupper_m
Breakpoint 2 at 0x10681a80: file ../lib/util/charset/codepoints.c, line 92.
(gdb) c
Continuing.

Breakpoint 2, toupper_m (val=22784) at ../lib/util/charset/codepoints.c:92
92              if (val < 128) {
(gdb) p /x val
$1 = 0x5900
(gdb) l
87      /**
88       Convert a codepoint_t to upper case.
89      **/
90      _PUBLIC_ codepoint_t toupper_m(codepoint_t val)
91      {
92              if (val < 128) {
93                      return toupper(val);
94              }
95              if (upcase_table == NULL) {
96                      load_case_tables_library();
(gdb) bt
#0  toupper_m (val=22784) at ../lib/util/charset/codepoints.c:92
#1  0x000000001068169c in toupper_ascii (c=89) at lib/util_unistr.c:616
#2  0x0000000010677c38 in strwicmp (psz1=0x10fb7d37 "Yes", psz2=0x10da6618 "yes") at lib/util_str.c:246
#3  0x0000000010638bf8 in set_boolean (boolean_string=0x10fb7d37 "Yes", boolean=0xfffffffe5f0) at ../lib/util/util.c:792
#4  0x00000000100bcca0 in lp_bool (s=0x10fb7d37 "Yes") at param/loadparm.c:6168
#5  0x00000000100c3218 in lp_do_parameter (snum=-2, pszParmName=0x10fb7d20 "usershare allow guests", pszParmValue=0x10fb7d37 "Yes")
    at param/loadparm.c:7991
#6  0x00000000100c38b0 in do_parameter (pszParmName=0x10fb7d20 "usershare allow guests", pszParmValue=0x10fb7d37 "Yes", userdata=0x0)
    at param/loadparm.c:8100
#7  0x000000001063acc4 in Parameter (InFile=0x10fb7740, pfunc=@0x10eb8948: 0x100c37c8 <do_parameter>, c=0, userdata=0x0)
    at ../lib/util/params.c:426
#8  0x000000001063ae24 in Parse (InFile=0x10fb7740, sfunc=@0x10eb89c0: 0x100c3fcc <do_section>, 
    pfunc=@0x10eb8948: 0x100c37c8 <do_parameter>, userdata=0x0) at ../lib/util/params.c:482
#9  0x000000001063b234 in pm_process (FileName=0x10fb7230 "/etc/samba/smb.conf", sfunc=@0x10eb89c0: 0x100c3fcc <do_section>, 
    pfunc=@0x10eb8948: 0x100c37c8 <do_parameter>, userdata=0x0) at ../lib/util/params.c:566
#10 0x00000000100c8a0c in lp_load_ex (pszFname=0x10fb58f0 "/etc/samba/smb.conf", global_only=true, save_defaults=false, add_ipc=false, 
    initialize_globals=true, allow_include_registry=false, allow_registry_shares=false) at param/loadparm.c:9564
#11 0x00000000100c90a4 in lp_load_initial_only (pszFname=0x10fb58f0 "/etc/samba/smb.conf") at param/loadparm.c:9664
#12 0x0000000010c2014c in main (argc=7, argv=0xffffffff328) at smbd/server.c:1050


set_boolean() failures result in misinterpretation of smb.conf:
[2011/07/19 12:59:56.775424,  0] param/loadparm.c:6169(lp_bool)
  lp_bool(Yes): value is not boolean!
[2011/07/19 12:59:56.775526,  2] param/loadparm.c:8287(do_section)
  Processing section "[homes]"
[2011/07/19 12:59:56.775759,  0] param/loadparm.c:6169(lp_bool)
  lp_bool(No): value is not boolean!
[2011/07/19 12:59:56.775835,  0] param/loadparm.c:6169(lp_bool)
  lp_bool(No): value is not boolean!
[2011/07/19 12:59:56.775909,  0] param/loadparm.c:6169(lp_bool)
  lp_bool(Yes): value is not boolean!

Comment 1 Jeremy Allison 2011-07-19 17:44:19 UTC

Looks like this got broken in the conversion of the toupper_ascii()  tolower_ascii() isupper_ascii() and islower_ascii() functions to call the xxx_m() functions when previously they called the xxxx_w() functions.

This is a blocker for rc3.

Jeremy.

Comment 2 Jeremy Allison 2011-07-19 18:12:02 UTC

Ok, I've examined this. Andrew Bartlett made a mistake (sorry to point here, but git blame is pretty clear on this) in that smb_ucs2_t != codepoint_t.

smb_ucs2_t is explicitly a little endian uint16_t type.
codepoint_t is a native endian uint32_t type.

This means that we cannot just swap islower_w to islower_m and the same for all other similar calls.

I blame myself for not catching this at the time, but character conversion is tricky.

Jeremy.

Comment 3 Jeremy Allison 2011-07-19 18:27:11 UTC

David - thanks a *lot* for catching this. 3.6.0 would be completely broken on BE systems without a fix for this.

Jeremy.

Comment 4 Karolin Seeger 2011-07-19 18:51:54 UTC

Re-assigning to Jeremy.

Comment 5 Jeremy Allison 2011-07-19 20:08:04 UTC

I have a fix for this. In test now...

Jeremy.

Comment 6 Jeremy Allison 2011-07-19 21:20:15 UTC

Created attachment 6703 [details]
git-am fix for 3.6.0rc3.

David, can you test this please on your big-endian system. Metze, please review for the rc3 release.

Jeremy.

Comment 7 Stefan Metzmacher 2011-07-20 07:02:48 UTC

Comment on attachment 6703 [details]
git-am fix for 3.6.0rc3.

The first patch says that toupper_ascii() would be identical to toupper_m(),
but that's not true, as _ascii() uses UCS2_CHAR() and UCS2_TO_CHAR().
They do bitshifting based on the big endian.
So it's not trivial to ack.

Maybe someone else (who understands this code) can review this.

Comment 8 Andrew Tridgell 2011-07-20 07:28:18 UTC

(In reply to comment #7)
> Comment on attachment 6703 [details]
> git-am fix for 3.6.0rc3.
> 
> The first patch says that toupper_ascii() would be identical to toupper_m(),
> but that's not true, as _ascii() uses UCS2_CHAR() and UCS2_TO_CHAR().
> They do bitshifting based on the big endian.
> So it's not trivial to ack.
> 
> Maybe someone else (who understands this code) can review this.

I've tested on a BE box, and the patch does fix the problem

Comment 9 Andrew Tridgell 2011-07-20 07:30:07 UTC

I think this needs to go in for 3.6
 Tridge

Comment 10 David Disseldorp 2011-07-20 09:04:36 UTC

(In reply to comment #6)
> Created attachment 6703 [details]
> git-am fix for 3.6.0rc3.
> 
> David, can you test this please on your big-endian system.

Works for me too, thanks for the fix.

Comment 11 Jeremy Allison 2011-07-20 16:20:28 UTC

Metze, the reason I can replace toupper_ascii() with toupper_m() (and all the other _ascii() functions) is that the _ascii() versions are always called with a char value (extended to int on the stack). For this case (values < 128) they should be identical. For values > 127 the int will be sign extended and toupper_m will simply return that value.

Karolin please include in 3.6.0rc3.

Thanks,

Jeremy.

Comment 12 Karolin Seeger 2011-07-21 18:15:52 UTC

Pushed to v3-6-test.
Closing out bug report.

Thanks!