Re: [sup-devel] sup 0.13




On 30 April 2013 11:44, Horacio Sanson wrote:
> Great to see Sup getting back on track again.
> 
> I submitted some patches for Heliotrope's Gmail dumper some time ago,
> but the lack of support for non-alphabetic languages (Japanese, Chinese)
> made it impossible for me to keep using heliotrope/turnsole.
> 
> The main issue in supporting Japanese/Chinese with heliotrope was that
> whistlepig (the indexer) lacked the ability to tokenize these languages. Also,
> the half-baked UTF-8 support caused several issues with these languages.
> 
> I would like to help with testing and implementing support for these
> languages, starting with Japanese, but I would need some guidance. First, I
> would like to know if there is a way to configure the Xapian tokenizer
> (segmenter) within sup. Please bear in mind that I am new to both sup and
> Xapian.

Hi Horacio,

Consider opening an issue at
https://github.com/sup-heliotrope/sup/issues to make sure this doesn't
get lost. Some changes will probably be made to the indexer when moving
from RMail to Mail, but I hope to be able to migrate the existing
index. Perhaps it's time to get it right for arbitrary languages as well.
I am unfamiliar with Japanese/Chinese - does UTF-8 cover the needs?

Mail handles UTF-8 better than RMail, and I think there was a fork that
had some extra support for Japanese.

Regards, Gaute