Re: [sup-devel] Arch utf8 vs UTF-8 fix and wide character support



Excerpts from Matti Eiden's message of 2010-05-06 14:02:46 -0400:
> Hey folks,
> 
> I've been experimenting with sup for the past few days, and of course,
> I love it. Firstly I had some trouble with getting unicode display
> going. This problem was already described in an old post on this
> mailing list:
> 
> http://rubyforge.org/pipermail/sup-devel/2010-March/000522.html
> 
> So Arch Linux reports the encoding as utf8, but Iconv requires it to
> be UTF-8. I would say this is a bug in Arch Linux for not following
> standards, but anyway, I fixed it with a small modification to
> sup.rb:
> 
> ## determine encoding and character set
> $encoding = Locale.current.charset
> $encoding = "UTF-8" if $encoding == "utf8"

I've applied this fix, thanks.
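
In case other spellings of the same charset name turn up, here is a
slightly more general sketch of the same idea. It is only an
illustration, not what was committed: normalize_charset is a
hypothetical helper name, and the assumption is that only the
hyphen-less and underscore spellings of UTF-8 need mapping.

```ruby
# Hypothetical helper (not part of sup): map common spellings of the
# UTF-8 charset name to the canonical "UTF-8" form.
def normalize_charset(name)
  # Compare case-insensitively and ignore "-" / "_" separators.
  key = name.to_s.downcase.gsub(/[-_]/, "")
  key == "utf8" ? "UTF-8" : name
end

# Assumed usage, mirroring the two lines quoted above:
#   $encoding = normalize_charset(Locale.current.charset)
```

Any other charset name is passed through untouched, so behavior only
changes for the UTF-8 spellings.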

> Then about wide character support, and I mean really wide, like CJK
> characters. Scandic letters (ä, ö, å) and other accented European
> characters work nicely, as those of us who use them probably know.
> In UTF-8 these characters are 2 bytes long but count as 1 Unicode
> character.
> 
> However, take an example of the following two-character Korean word
> (byte length of such single character is 3 instead of 2!)
> 
> http://www.kotiposti.net/eiden/soulbound/hellovim.png (looking good in vim)
> http://www.kotiposti.net/eiden/soulbound/hellosup.png (sup lost 2
> characters (or bytes) from the line that has the Korean word)
> 
> It seems that for every Korean character with a byte length of 3, one
> byte is lost from the end of the line. In the above example, two bytes
> are missing in sup, as there are two Korean characters on the same
> line.
> 
> If the line consists of a single Korean character, nothing appears in
> sup (last byte out of three is missing?).
> If the line consists of two Korean characters, the last character is
> missing (last two bytes out of six are missing?).
> etc.
> 
> Some sort of miscalculation somewhere is causing this, perhaps
> assuming that unicode characters always have a byte length of 2? Can
> anybody with Ruby skills take a look at this?

It's actually the multiple screen cells that cause problems, not
multiple bytes [1]. Sup currently assumes every character is 1 cell
wide, whereas CJK characters occupy 2 cells. The right fix is probably
a C extension that uses wcswidth.

[1] http://mid.gmane.org/1264629880-sup-9232%40zyrg.net
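
To make the bytes / characters / cells distinction concrete, here is a
rough pure-Ruby sketch. It is not a substitute for wcswidth:
WIDE_RANGES and approx_display_width are made-up names, the range
table is a coarse hand-picked approximation of the wide East Asian
blocks, and combining (zero-width) characters are ignored. The "\u"
escapes assume Ruby 1.9 or later.

```ruby
# Coarse, incomplete list of code point ranges that terminals usually
# draw two cells wide (an approximation, not the full Unicode table).
WIDE_RANGES = [
  0x1100..0x115F,   # Hangul Jamo initial consonants
  0x2E80..0x303E,   # CJK radicals, punctuation (U+303F is narrow)
  0x3040..0xA4CF,   # Kana, CJK ideographs, Yi
  0xAC00..0xD7A3,   # Hangul syllables
  0xF900..0xFAFF,   # CJK compatibility ideographs
  0xFE30..0xFE4F,   # CJK compatibility forms
  0xFF00..0xFF60,   # Fullwidth forms
  0xFFE0..0xFFE6    # Fullwidth signs
]

# Approximate number of screen cells needed to draw a UTF-8 string.
def approx_display_width(str)
  str.unpack("U*").inject(0) do |cells, codepoint|
    wide = WIDE_RANGES.any? { |range| range.include?(codepoint) }
    cells + (wide ? 2 : 1)
  end
end

word = "\uD55C\uAE00"        # two Hangul syllables
word.bytesize                # => 6 (bytes in UTF-8)
word.length                  # => 2 (characters)
approx_display_width(word)   # => 4 (screen cells)
```

A line-truncation routine that counts 2 where it should count 4 will
run past the right edge, which would fit the symptoms in the
screenshots.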