
Re: [sup-devel] Arch utf8 vs UTF-8 fix and wide character support

To: Matti Eiden <snaipperi@gmail.com>
Subject: Re: [sup-devel] Arch utf8 vs UTF-8 fix and wide character support
From: Rich Lane <rlane@club.cc.cmu.edu>
Date: Fri, 07 May 2010 12:46:20 -0400
Cc: sup-devel <sup-devel@rubyforge.org>
User-agent: Sup/git

Excerpts from Matti Eiden's message of 2010-05-06 14:02:46 -0400:
> Hey folks,
> 
> I've been experimenting with sup for the past few days, and of course,
> I love it. Firstly I had some trouble with getting unicode display
> going. This problem was already described in an old post on this
> mailing list:
> 
> http://rubyforge.org/pipermail/sup-devel/2010-March/000522.html
> 
> So Arch Linux defines encoding as utf8, but Iconv requires it to be
> UTF-8. I would say this is a bug in Arch Linux for not following
> standards, but anyway, I fixed it with this small modification to
> sup.rb:
> 
> ## determine encoding and character set
> $encoding = Locale.current.charset
> $encoding = "UTF-8" if $encoding == "utf8"

I've applied this fix, thanks.
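
For the archive, a slightly more general form of the same idea might look like the sketch below. It is only a sketch: normalize_charset is a hypothetical helper name (not something that exists in sup), and it assumes the only problematic spellings are variants of "utf8" that differ in case or separators. Any other charset name is passed through unchanged.

```ruby
# Hypothetical helper, not part of sup: map spellings such as "utf8",
# "UTF8" or "utf_8" to the canonical "UTF-8" and leave every other
# charset name untouched.
def normalize_charset(name)
  if name.to_s.downcase.delete("-_") == "utf8"
    "UTF-8"
  else
    name
  end
end

# Usage, with a literal standing in for Locale.current.charset:
$encoding = normalize_charset("utf8")
```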

> Then about wide character support. And I mean really wide. Like CJK
> characters. Scandics (ä,ö,å) and other European accent characters work
> nicely, as we all who are concerned probably know. These characters
> have a byte length of 2 and unicode length of 1.
> 
> However, take an example of the following two-character Korean word
> (byte length of such single character is 3 instead of 2!)
> 
> http://www.kotiposti.net/eiden/soulbound/hellovim.png (looking good in vim)
> http://www.kotiposti.net/eiden/soulbound/hellosup.png (sup lost 2
> characters (or bytes) from the line that has the Korean word)
> 
> It seems that for every Korean character with a byte length of 3, one
> byte is lost from the end of the line. In the above example, two bytes
> are missing in sup, as there are two Korean characters on the same
> line.
> 
> If the line consists of a single Korean character, nothing appears in
> sup (last byte out of three is missing?).
> If the line consists of two Korean characters, the last character is
> missing (last two bytes out of six are missing?).
> etc.
> 
> Some sort of miscalculation somewhere is causing this, perhaps
> assuming that unicode characters always have a byte length of 2? Can
> anybody with Ruby skills take a look at this?

It's actually the multiple screen cells that cause problems, not
multiple bytes [1]. Sup currently assumes every character is 1 cell
wide, while CJK characters occupy 2 cells in the terminal.
The right thing is probably a C extension that uses wcswidth.
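
To make the distinction between bytes, characters and cells concrete, here is a rough pure-Ruby sketch. It is not the wcswidth-based extension suggested above and not sup's code: approx_cell_width and WIDE_RANGES are hypothetical names, the range table is a coarse approximation of the East Asian wide blocks, and combining characters (which take 0 cells) are ignored. It also assumes Ruby 1.9 or later, where strings carry an encoding.

```ruby
# Coarse, illustrative list of code point ranges that terminals usually
# draw 2 cells wide. A real implementation should defer to wcswidth.
WIDE_RANGES = [
  0x1100..0x115F,  # Hangul Jamo initial consonants
  0x2E80..0xA4CF,  # CJK radicals through Yi (coarse)
  0xAC00..0xD7A3,  # Hangul syllables
  0xF900..0xFAFF,  # CJK compatibility ideographs
  0xFE30..0xFE4F,  # CJK compatibility forms
  0xFF00..0xFF60,  # fullwidth forms
  0xFFE0..0xFFE6   # fullwidth signs
]

# Hypothetical helper: count 2 cells for characters in WIDE_RANGES and
# 1 cell for everything else.
def approx_cell_width(str)
  str.each_char.inject(0) do |cells, ch|
    cells + (WIDE_RANGES.any? { |r| r.cover?(ch.ord) } ? 2 : 1)
  end
end

korean = "\uC548\uB155"  # two Hangul syllables
scandic = "\u00E4"       # a-umlaut

puts [korean.bytesize, korean.length, approx_cell_width(korean)].inspect
puts [scandic.bytesize, scandic.length, approx_cell_width(scandic)].inspect
```

The two-syllable Korean string is 6 bytes and 2 characters but 4 cells, while the Scandic character is 2 bytes, 1 character and 1 cell. Code that budgets one cell per character will therefore be short by one cell for each wide character on the line, which matches the reported symptom.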

[1] http://mid.gmane.org/1264629880-sup-9232%40zyrg.net