Re: [sup-devel] Heliotrope improving but still found some issues



Spoke too soon...  I found a few more issues.

First, the patch I sent was incomplete: it completely removed the
header section of the web page. Please ignore it and use the one
attached here instead.

Second, the GMail labels work great as long as they are in English.
This is because the labels are UTF-7 encoded and look like this in
Japanese: "+&mlkwrtdrmkiwwzdxmlgw4zdrmpm-". Clicking on any of
these labels in the web interface results in a syntax error. Fixing this
is as simple as replacing line 207 of imap-dumper.rb with the
following code:

labels = (data.attr["X-GM-LABELS"] || []).map do |label|
  Net::IMAP.decode_utf7(label.to_s).downcase
end

I am fairly sure the UTF-7 decoding is language independent and can be
applied safely to labels in any language, but I cannot guarantee it
since I have only tested Japanese. I am not sure whether this conversion
should go in a separate hook or something like that.
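
To illustrate the decoding step in isolation, here is a minimal sketch. It
assumes only Ruby's standard net/imap library; the helper name
normalize_labels and the sample labels are hypothetical and not taken from
imap-dumper.rb. The string "&ZeVnLIqe-" is the modified UTF-7 form of 日本語.

```ruby
require "net/imap"

# Hypothetical helper: `raw_labels` stands in for data.attr["X-GM-LABELS"],
# which may be nil or contain non-String values.
def normalize_labels(raw_labels)
  (raw_labels || []).map do |label|
    # Decode before downcasing: the base64 part of modified UTF-7 is case
    # sensitive, so lowercasing the encoded form would corrupt the label.
    Net::IMAP.decode_utf7(label.to_s).downcase
  end
end

# Plain ASCII labels pass through decode_utf7 unchanged.
puts normalize_labels(["Inbox", "&ZeVnLIqe-", :Work]).join(", ")
```

Since ASCII-only labels are returned unchanged by the decoder, applying it
to every label should be harmless, though this has only been reasoned
through here, not tested against a real Gmail account.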



regards,
Horacio

On Tue, Jul 5, 2011 at 10:52 AM, Horacio Sanson <hsanson@gmail.com> wrote:
>
> So I tried the latest heliotrope with the leveldb-ruby 0.6 gem, whistlepig 0.7
> and MeCab hooks for Japanese text support and it works better than before.
> Unfortunately I ran into two issues:
>
> First, any attempt to search using Japanese text fails with the dreaded
> incompatible character encodings error:
>
> #####################################################
> [2011-07-05 10:22:17] INFO  WEBrick 1.3.1
> [2011-07-05 10:22:17] INFO  ruby 1.9.2 (2010-08-18) [x86_64-linux]
> [2011-07-05 10:22:17] INFO  WEBrick::HTTPServer#start: pid=13523 port=8042
> search(body:"手紙", 0, 20) took 2.1ms
> Encoding::CompatibilityError - incompatible character encodings: ASCII-8BIT
> and UTF-8:
>  bin/heliotrope-server:223:in `block in <class:HeliotropeServer>'
>  /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:1152:in `call'
>  /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:1152:in `block in
> compile!'
>  /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:724:in
> `instance_eval'
>  /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:724:in
> `route_eval'
>  /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:708:in `block (2
> levels) in route!'
>  /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:758:in `block in
> process_route'
>  /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:755:in `catch'
>  /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:755:in
> `process_route'
>  /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:707:in `block in
> route!'
>  /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:706:in `each'
>  /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:706:in `route!'
>  /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:843:in `dispatch!'
>  /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:644:in `block in
> call!'
>  /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:808:in
> `instance_eval'
>  /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:808:in `block in
> invoke'
>  /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:808:in `catch'
>  /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:808:in `invoke'
>  /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:644:in `call!'
>  /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:629:in `call'
>  /var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/head.rb:9:in `call'
>  /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/showexceptions.rb:21:in
> `call'
>  /var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/lint.rb:48:in `_call'
>  /var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/lint.rb:36:in `call'
>  /var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/showexceptions.rb:24:in `call'
>  /var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/commonlogger.rb:18:in `call'
>  /var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/content_length.rb:13:in `call'
>  /var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/handler/webrick.rb:52:in
> `service'
>  /usr/lib/ruby/1.9.1/webrick/httpserver.rb:111:in `service'
>  /usr/lib/ruby/1.9.1/webrick/httpserver.rb:70:in `run'
>  /usr/lib/ruby/1.9.1/webrick/server.rb:183:in `block in start_thread'
> 127.0.0.1 - - [05/Jul/2011 10:22:20] "GET /search?q=%E6%89%8B%E7%B4%99
> HTTP/1.1" 500 89118 0.0331
> localhost - - [05/Jul/2011:10:22:20 JST] "GET /search?q=%E6%89%8B%E7%B4%99
> HTTP/1.0" 500 89118
> - -> /search?q=%E6%89%8B%E7%B4%99
> [2011-07-05 10:22:20] ERROR Errno::ECONNRESET: Connection reset by peer
>        /usr/lib/ruby/1.9.1/webrick/httpserver.rb:56:in `eof?'
>        /usr/lib/ruby/1.9.1/webrick/httpserver.rb:56:in `run'
> #######################################################
>
> The problem seems to be the header method in heliotrope-server, which uses
> multiline strings (e.g. <<-EOS). Forcing the resulting text to UTF-8
> encoding makes the search work as expected with Japanese and non-Japanese
> text (see attached patch).
>
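
As a rough illustration of this class of error, the sketch below reproduces
an Encoding::CompatibilityError and then clears it by relabelling the bytes
as UTF-8, which is what the patch does to the header text. The helper names
as_utf8 and compatible? are hypothetical, and the assumption that an
ASCII-8BIT string reaches the header is a guess about the cause, not
something confirmed in heliotrope-server.

```ruby
# Hypothetical helper: relabel a copy of the string as UTF-8 without
# changing its bytes, guarded the same way as in the patch.
def as_utf8(str)
  str = str.dup
  str.force_encoding(Encoding::UTF_8) if str.respond_to?(:force_encoding)
  str
end

# Returns false when concatenating a and b raises the encoding error.
def compatible?(a, b)
  a + b
  true
rescue Encoding::CompatibilityError
  false
end

title = "Heliotrope: 手紙"   # UTF-8 literal
raw   = "手紙".b             # same bytes, tagged ASCII-8BIT

puts compatible?(title, raw)           # mixing the two encodings fails
puts compatible?(title, as_utf8(raw))  # after relabelling it succeeds
```

Note that force_encoding only changes the label on the bytes; it is safe
here only if the bytes really are valid UTF-8.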
>
> The second problem is not actually a heliotrope problem; it is the artificial
> limits imposed by Gmail. After running heliotrope-add for some time it
> would fail when the IMAP fetch returned nil. Just after it failed I tried to
> use my current email reader (kmail) and got an interesting error saying:
> "exceeded IMAP bandwidth limits". This indicates the nil is due to Gmail
> limiting the maximum bandwidth I can consume downloading emails.
>
> The latest heliotrope now catches this error and ignores it, but after a while
> of ignoring it I started getting sys-write errors on the socket. I believe this
> is also GMail abruptly breaking the socket connection to enforce its
> bandwidth limits.
>
> Maybe limiting the rate of gmail-dumper so it reads mails at a slower pace
> would eliminate these problems, or it could simply stop reading emails for
> some time when it gets the first nil response.
>
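
The "stop reading for some time after a nil response" idea could be
sketched roughly as below. This is not code from heliotrope: the method
fetch_with_backoff, its parameters, and the block standing in for the IMAP
fetch are all hypothetical, and the delays are arbitrary.

```ruby
# Hypothetical sketch: retry a fetch that returns nil when throttled,
# doubling the pause between attempts.
def fetch_with_backoff(max_attempts: 5, base_delay: 1.0,
                       sleeper: Kernel.method(:sleep))
  max_attempts.times do |attempt|
    result = yield
    return result unless result.nil?
    # Wait 1x, 2x, 4x, ... the base delay before the next attempt,
    # but do not wait after the final one.
    sleeper.call(base_delay * (2**attempt)) if attempt < max_attempts - 1
  end
  nil # still throttled after max_attempts tries
end

# Demo with canned responses and a recording sleeper, so nothing
# actually sleeps or touches the network.
responses = [nil, nil, "message body"]
delays = []
result = fetch_with_backoff(base_delay: 0.5, sleeper: ->(s) { delays << s }) do
  responses.shift
end
puts "got #{result.inspect} after pauses of #{delays.inspect}"
```

In real use the block would wrap the actual IMAP fetch and the default
sleeper would pause the dumper; whether Gmail's limits reset quickly enough
for this to help is untested.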
> Overall heliotrope is now usable for Japanese language users (at least for
> me). Now I will start playing with turnsole to see if it can handle Japanese.
>
> --
> regards,
> Horacio Sanson
>
From 90ef5390daf4fa7a05d62c3a61a1eee9b7e8061a Mon Sep 17 00:00:00 2001
From: Horacio Sanson <hsanson@gmail.com>
Date: Tue, 5 Jul 2011 10:31:33 +0900
Subject: [PATCH] Fix encoding exception.

---
 bin/heliotrope-server |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/bin/heliotrope-server b/bin/heliotrope-server
index 15dc897..26ae4c5 100644
--- a/bin/heliotrope-server
+++ b/bin/heliotrope-server
@@ -530,7 +530,7 @@ private
 
   def header title, query=""
     title = escape_html title
-    <<-EOS
+    title = <<-EOS
 <!DOCTYPE html><html><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><title>Heliotrope: #{title}</title>
 <meta name="application-name" content="heliotrope">
 <style type="text/css">
@@ -557,6 +557,9 @@ td {
   <input type="submit" value="go"/>
   </form></div>
     EOS
+
+    title.force_encoding(Encoding::UTF_8) if title.respond_to?(:force_encoding) # sigh...
+    title
   end
 
   def footer
-- 
1.7.4.1
