fie on Blogger. this post isn't visible from the front page or the Dashboard, even though it exists. so I'm reposting it.
UPDATE 2006-02-25: it has been closed as a duplicate.
C99 escapes in Obj-C strings contain UTF-8 interpreted as an 8-bit encoding. as last time, edited only for HTMLification.
since C99, C has a \u escape for Unicode characters. for example, a snowman is \u2603. when used in an Obj-C string literal (@"foo"), however, these escapes are broken.
Steps to Reproduce:
- write a program that displays a string, created from an Obj-C string literal containing a Unicode escape.
- compile it.
- run it.
- observe the display of the string.
Expected Results: the Unicode character is displayed as such.
Actual Results: the Unicode character's UTF-8 representation is displayed in some 8-bit encoding (possibly ISO 8859-1).
the bug only occurs in Obj-C string literals, not plain C string literals.
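as a control case, here is a minimal sketch of the plain C side, which is the part that works. it assumes the compiler's execution character set is UTF-8 (the default for GCC and Clang, as far as I know); the names `snowman`, `is_utf8_snowman` and `dump_bytes` are made up for illustration and are not from the attached test cases.

```c
#include <stdio.h>
#include <string.h>

/* a plain C string literal using the C99 \u escape for U+2603 SNOWMAN. */
const char snowman[] = "\u2603";

/* returns 1 if s holds exactly the UTF-8 bytes E2 98 83, else 0.
   assumption: the execution character set is UTF-8. */
int is_utf8_snowman(const char *s)
{
    static const unsigned char expected[] = { 0xE2, 0x98, 0x83 };
    return strlen(s) == sizeof expected
        && memcmp(s, expected, sizeof expected) == 0;
}

/* prints each byte of s in hex, so the stored encoding can be inspected. */
void dump_bytes(const char *s)
{
    for (; *s != '\0'; s++)
        printf("%02X ", (unsigned)(unsigned char)*s);
    printf("\n");
}
```

calling `dump_bytes(snowman)` shows what the compiler actually stored, and `is_utf8_snowman(snowman)` checks it against the UTF-8 encoding of U+2603.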
it appears that the compiler uses UTF-8 for internal storage, which works. NSConstantString, however, seems to expect ISO 8859-1, and interprets its input as such.
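the suspected mechanism can be simulated in plain C. this is only a sketch of the guess above, not of what NSConstantString really does: it assumes the wrong encoding is ISO 8859-1, and `latin1_to_utf8` is a hypothetical helper written for this illustration. it treats every input byte as one ISO 8859-1 character and re-encodes the result as UTF-8, which is what you would end up seeing if UTF-8 bytes were misread that way.

```c
#include <stddef.h>

/* reinterpret each byte of `in` as an ISO 8859-1 character and write the
   UTF-8 encoding of those characters to `out`, NUL-terminated.
   returns the number of bytes written (not counting the NUL),
   or 0 if `out` is too small (or `in` is empty). */
size_t latin1_to_utf8(const char *in, char *out, size_t outsize)
{
    size_t n = 0;

    for (; *in != '\0'; in++) {
        unsigned char c = (unsigned char)*in;

        if (c < 0x80) {
            /* ASCII range: one byte, unchanged. */
            if (n + 1 >= outsize)
                return 0;
            out[n++] = (char)c;
        } else {
            /* U+0080..U+00FF: two bytes in UTF-8. */
            if (n + 2 >= outsize)
                return 0;
            out[n++] = (char)(0xC0 | (c >> 6));
            out[n++] = (char)(0x80 | (c & 0x3F));
        }
    }

    if (n >= outsize)
        return 0;
    out[n] = '\0';
    return n;
}
```

feeding it the snowman's UTF-8 bytes (E2 98 83) yields six bytes, C3 A2 C2 98 C2 83: an "â" followed by two C1 control characters, rather than one snowman.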
I found the bug when displaying an Obj-C constant string (which had been passed through NSLocalizedString, but there is no matching localisation for it yet) as an NSMenuItem's title. so the problem is not specific to terminals or Terminal's display, nor dependent on the value of any locale environment variables.
I also attached two test cases, in a file named c99-unicode-escapes-test.tbz. of course, GeoCities doesn't let me upload tarballs, so you'll have to settle for the zip file.