
Unicode characters unsupported (left- and right- pointing angle brackets)

  1. afieldworklinguist
    Member

    I am experiencing problems with two Unicode characters on my fully hosted WordPress blog. These characters are:

    1. left-pointing angle bracket
    2. right-pointing angle bracket

    They do not show up properly in my browsers (Safari 7.0.1 and Google Chrome on Mac OS X Mavericks). Even if I enter them with HTML code, they are changed on saving to mathematical angle brackets (with horrible kerning). I cannot tell whether this is font or browser related.

    Is there any solution?

    The blog I need help with is afieldworklinguist.wordpress.com.

  2. I have no problem with displaying them in several ways:

    Source:

    test Greater than and less than < >
    Angle brackets  〈〉〉 〉 〈 〈

    Page: http://sandboxandtests.wordpress.com/2014/01/15/angle-bracket-test/

  3. Whoops I should have escaped the source:

    test Greater than and less than < >
    Angle brackets  〈〉〉 〉 〈 〈

  4. test Greater than and less than < >
    Angle brackets  〈〉〉 〉 〈 〈

  5. How strange, the escaped source is not showing. I'll put "amp" for ampersand

    test Greater than and less than (amp)lt; (amp)gt;
    Angle brackets  〈〉(amp)#12297; (amp)#x3009; (amp)#12296; (amp)#x3008;

  6. afieldworklinguist
    Member

    Thanks for your reply. I'm afraid I wasn't clear enough.

    I do see some brackets in both of my browsers, but they are not the correct ones. According to Unicode, the code points U+3008 and U+3009 are assigned to "left angle bracket" and "right angle bracket", not to "left-pointing angle bracket" (U+2329) and "right-pointing angle bracket" (U+232A). So those characters are not what I'm looking for. When I type the right HTML code for the pointing brackets, I get the normal ones instead.

    Since I write about linguistics and graphemics, I follow the convention of using the left- and right-pointing angle brackets, not the left and right ones. This is also a visual problem: left and right angle brackets have very bad kerning in almost all typefaces, with a huge amount of blank space before or after them, which is not desirable.

  7. afieldworklinguist
    Member

    Let me try to show you the difference here:

    This is a phoneme /s/ and this is a grapheme ⟨s⟩, do not mix them.

    This is a phoneme /s/ and this is a grapheme 〉s〈, do not mix them.

    This is a phoneme /s/ and this is a grapheme ⟨s⟩, do not mix them.

    This is a phoneme /s/ and this is a grapheme 〉s〈, do not mix them.

  8. afieldworklinguist
    Member

    Sorry, I made a mess. Here it is again.

    This is a phoneme /s/ and this is a grapheme ⟨s⟩, do not mix them.

    This is a phoneme /s/ and this is a grapheme 〈s〉, do not mix them.

  9. These work too,
    I have added

    Left and right pointing angle bracket 〈 (amp)#9001; (amp)#x2329; 〉 (amp)#9002; (amp)#x232a;

    to the source

  10. BTW (amp)lang; and (amp)rang; point to MATHEMATICAL LEFT ANGLE BRACKET and MATHEMATICAL RIGHT ANGLE BRACKET in WebKit browsers. These won't display in many fonts.

  11. afieldworklinguist
    Member

    No way. Even if I use (amp)#x2329; in the HTML editor, switching to visual editor changes it to U+3008 (same for right bracket)...

    It seems WordPress is converting the Unicode string incorrectly (maybe because the editor's font lacks those characters).

    (Actually, I have the same problem on a self-hosted WordPress website...)
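One possible explanation for the switch to U+3008, offered as an assumption since nothing in this thread confirms it: in the Unicode Character Database, U+2329 and U+232A are canonically equivalent to U+3008 and U+3009, so any step that applies Unicode normalization replaces the pointing brackets with the CJK ones. The mathematical brackets U+27E8 and U+27E9 have no such equivalence. A minimal Python sketch (the helper name `normalized_forms` is hypothetical, and this does not show what WordPress or any browser actually runs):

```python
import unicodedata

# Hypothetical helper (not part of WordPress or any browser): returns what
# each Unicode normalization form does to a single character.
def normalized_forms(ch):
    return {form: unicodedata.normalize(form, ch)
            for form in ("NFC", "NFD", "NFKC", "NFKD")}

for ch in ("\u2329", "\u232a", "\u27e8", "\u27e9"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
    for form, out in normalized_forms(ch).items():
        # Each result here is a single character, so ord() is safe.
        print(f"  {form}: U+{ord(out):04X} {unicodedata.name(out)}")
```

If normalization is the cause, it would also explain why numeric references such as (amp)#x2329; end up as U+3008 just like the pasted character does.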

  12. I am not getting that. If I put (amp)#x2329; into the text editor and switch to the visual editor I see 〈, which remains the same if I switch back to the text mode. Copying this character and pasting it into http://www.ltg.ed.ac.uk/~richard/utf-8.cgi?input=%E2%8C%A9&mode=char I see that the character is still x2329:

    Character 〈
    Character name LEFT-POINTING ANGLE BRACKET
    Hex code point 2329
    Decimal code point 9001
    Hex UTF-8 bytes E2 8C A9
    Octal UTF-8 bytes 342 214 251
    UTF-8 bytes as Latin-1 characters bytes â <8C> ©
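For anyone who wants to run the same check locally instead of using the web tool, here is a rough Python equivalent of the report above. The function name `describe` is made up for this sketch; only the standard `unicodedata` module is used.

```python
import unicodedata

# Hypothetical helper: summarizes one character the way the report above does.
def describe(ch):
    utf8 = ch.encode("utf-8")
    return {
        "name": unicodedata.name(ch),
        "hex": f"{ord(ch):04X}",
        "decimal": ord(ch),
        "utf8_hex": " ".join(f"{b:02X}" for b in utf8),
        "utf8_octal": " ".join(f"{b:o}" for b in utf8),
    }

# Paste the character copied from the editor in place of the escape below.
for key, value in describe("\u2329").items():
    print(f"{key}: {value}")
```

Pasting the character copied from the preview page in place of `"\u2329"` shows at a glance whether it is still 2329 or has become 3008 or 27E8.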

  13. I'm going to flag this for staff attention. I don't know why the editor should be working differently for you than for me (unless it's theme dependent).

    BTW I am using Google Chrome Version 32.0.1700.76 m

  14. afieldworklinguist
    Member

    Thank you.

    Yes, I tried copying the character from your post just now and it is 2329 (although it has a lot of white space that is not there in, for example, Arial Unicode MS or Times New Roman).

  15. BTW it might be helpful if you gave some screenshots of your editor and the angle bracket test page.

    Here is what I see on the test page: http://sandboxandtests.files.wordpress.com/2014/01/test.png

  16. afieldworklinguist
    Member

    They are OK in the HTML and visual editors, but when I try the preview they are changed to 3008 and 3009. (I didn't try actually posting, but I suppose it would be the same. Or not?)

    https://dl.dropboxusercontent.com/u/58887255/Schermata%202014-01-15%20alle%2015.53.41.png

    https://dl.dropboxusercontent.com/u/58887255/Schermata%202014-01-15%20alle%2015.51.15.png

  17. Howdy,

    I'm looking into this closer. If you're seeing this in self-hosted WordPress too, it may be an issue with the common editor. I'll check into that.

    While doing that, if you use (amp)lang; does it render correctly on preview for you? Unless I'm mistaken lang is the same as 9001.

    Thanks!

  18. afieldworklinguist
    Member

    1. I type (amp)lang; in HTML editor
    2. I click on Preview
    3. I get U+27E8 (mathematical left angle bracket), which is not the left-pointing angle bracket

    :)

    Actually, my theme (Motif) uses Georgia and Georgia does not have pointing brackets anyway...

    This changing behaviour is not new to me. While writing articles on the Italian Wikipedia, I noticed the same thing. We created a template to ease insertion of the right Unicode code point, and now the brackets show up correctly (before the template, inserting the character itself in the editor with a character palette caused the same change once the article was published).

    The "problem" with WordPress is that, as you know, the HTML editor automatically converts HTML entities into character strings, and that is where it fails. I suppose that the font used in the HTML editor doesn't have the pointing angle brackets, just the normal and the mathematical ones, so it switches from the pointing ones to the others. (I don't actually know which font this is.)

  19. We're checking deeper into the conversion that happens between the Text and Visual editors.

    If you add the HTML entities in the Text view and save from there without revisiting the Visual editor (e.g. add (amp)lang;, then publish/save/preview), the actual HTML entity will be in the source and, from there, it is a browser/font decision how it renders.

    Thanks!

  20. I have investigated and I do get a different interpretation if I switch into visual before publishing.

    This test page, where I entered (amp)lang; and (amp)rang; in the text pane and then previewed and published (later adding the output from http://www.ltg.ed.ac.uk/~richard/utf-8.cgi to the post), gives

    Character ⟨
    Character name MATHEMATICAL LEFT ANGLE BRACKET
    Hex code point 27E8
    Decimal code point 10216

    and

    Character ⟩
    Character name MATHEMATICAL RIGHT ANGLE BRACKET
    Hex code point 27E9
    Decimal code point 10217

    If I enter (amp)lang; and (amp)rang; in the text editor then switch to the visual editor before publishing I get:

    Character 〈
    Character name LEFT-POINTING ANGLE BRACKET
    Hex code point 2329
    Decimal code point 9001

    and

    Character 〉
    Character name RIGHT-POINTING ANGLE BRACKET
    Hex code point 232A
    Decimal code point 9002

    It appears that when publishing/previewing without entering the visual editor, the source still contains the (amp)lang; and (amp)rang; entities, which are interpreted by the browser. Unfortunately the interpretation is HTML-version specific:

    In HTML5 these map to:
    U+27E8 MATHEMATICAL LEFT ANGLE BRACKET,
    U+27E9 MATHEMATICAL RIGHT ANGLE BRACKET
    (see spec)

    In HTML 4.01 these map to:
    U+2329 LEFT-POINTING ANGLE BRACKET,
    U+232A RIGHT-POINTING ANGLE BRACKET

    In HTML 4.0 these map to
    U+3008 LEFT ANGLE BRACKET,
    U+3009 RIGHT ANGLE BRACKET,
    (I am not 100% certain that the 4.0 and 4.01 are the right way round here)

    In other words, I think my browser is interpreting the symbols as HTML5, afieldworklinguist's as HTML 4.0, and the visual editor as HTML 4.01.
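The HTML5 and HTML 4.01 mappings listed above can be checked against the two entity tables that ship with Python's standard library: `html.unescape` follows the HTML5 named character references, while `html.entities.name2codepoint` holds the HTML 4.01 definitions. This is only a sketch for comparing the tables (the helper name `entity_mappings` is made up here), and it says nothing about which table a given browser or editor actually applies.

```python
import html
import unicodedata
from html.entities import name2codepoint

# Hypothetical helper: returns (HTML 4.01 code point, HTML5 code point)
# for a named entity such as "lang" or "rang".
def entity_mappings(name):
    html4_cp = name2codepoint[name]           # HTML 4.01 table
    html5_cp = ord(html.unescape(f"&{name};"))  # HTML5 table
    return html4_cp, html5_cp

for name in ("lang", "rang"):
    html4_cp, html5_cp = entity_mappings(name)
    print(f"&{name}; HTML 4.01: U+{html4_cp:04X} {unicodedata.name(chr(html4_cp))}")
    print(f"&{name}; HTML5:     U+{html5_cp:04X} {unicodedata.name(chr(html5_cp))}")
```

Python's standard library has no separate HTML 4.0 table, so this sketch cannot settle the 4.0 versus 4.01 question raised above.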

  21. afieldworklinguist
    Member

    Thank you, now it is clear. I was guessing something like that, actually.

    I've also tried (amp)#9001; and (amp)#x2329; with the same result as (amp)rang; and (amp)lang;

  22. With the Visual editor, the HTML entities added in Text view (e.g. (amp)lang;) are converted to UTF-8 characters. The browser does this through a bit of JavaScript.

    We tested a few different scenarios and it appears this is browser-dependent. Chrome/Safari incorrectly converted those particular entities while Firefox retained the correct ones. It doesn't appear there's a way for us to force the correct mapping at this time.

    In the meantime, we suggest publishing straight from the Text editor or trying it with Firefox.

    Does that match what y'all are seeing too?

    Thanks!

Topic Closed

This topic has been closed to new replies.
