
[sup-devel] Heliotrope improving but still found some issues



I tried the latest heliotrope with the leveldb-ruby 0.6 gem, whistlepig 0.7,
and the MeCab hooks for Japanese text support, and it works better than before.
Unfortunately, I ran into two issues:

First, any attempt to search using Japanese text fails with the dreaded
incompatible character encodings error:

#####################################################
[2011-07-05 10:22:17] INFO  WEBrick 1.3.1
[2011-07-05 10:22:17] INFO  ruby 1.9.2 (2010-08-18) [x86_64-linux]
[2011-07-05 10:22:17] INFO  WEBrick::HTTPServer#start: pid=13523 port=8042
search(body:"手紙", 0, 20) took 2.1ms
Encoding::CompatibilityError - incompatible character encodings: ASCII-8BIT 
and UTF-8:
 bin/heliotrope-server:223:in `block in <class:HeliotropeServer>'
 /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:1152:in `call'
 /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:1152:in `block in 
compile!'
 /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:724:in 
`instance_eval'
 /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:724:in 
`route_eval'
 /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:708:in `block (2 
levels) in route!'
 /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:758:in `block in 
process_route'
 /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:755:in `catch'
 /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:755:in 
`process_route'
 /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:707:in `block in 
route!'
 /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:706:in `each'
 /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:706:in `route!'
 /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:843:in `dispatch!'
 /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:644:in `block in 
call!'
 /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:808:in 
`instance_eval'
 /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:808:in `block in 
invoke'
 /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:808:in `catch'
 /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:808:in `invoke'
 /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:644:in `call!'
 /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:629:in `call'
 /var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/head.rb:9:in `call'
 /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/showexceptions.rb:21:in 
`call'
 /var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/lint.rb:48:in `_call'
 /var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/lint.rb:36:in `call'
 /var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/showexceptions.rb:24:in `call'
 /var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/commonlogger.rb:18:in `call'
 /var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/content_length.rb:13:in `call'
 /var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/handler/webrick.rb:52:in 
`service'
 /usr/lib/ruby/1.9.1/webrick/httpserver.rb:111:in `service'
 /usr/lib/ruby/1.9.1/webrick/httpserver.rb:70:in `run'
 /usr/lib/ruby/1.9.1/webrick/server.rb:183:in `block in start_thread'
127.0.0.1 - - [05/Jul/2011 10:22:20] "GET /search?q=%E6%89%8B%E7%B4%99 
HTTP/1.1" 500 89118 0.0331
localhost - - [05/Jul/2011:10:22:20 JST] "GET /search?q=%E6%89%8B%E7%B4%99 
HTTP/1.0" 500 89118
- -> /search?q=%E6%89%8B%E7%B4%99
[2011-07-05 10:22:20] ERROR Errno::ECONNRESET: Connection reset by peer
        /usr/lib/ruby/1.9.1/webrick/httpserver.rb:56:in `eof?'
        /usr/lib/ruby/1.9.1/webrick/httpserver.rb:56:in `run'
#######################################################

The problem seems to be the header method in heliotrope-server, which builds
its output with a multiline string (a heredoc, e.g. <<-EOS). Forcing the
resulting text to UTF-8 encoding makes the search work as expected with both
Japanese and non-Japanese text (see the attached patch).
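
To illustrate the failure and the fix outside of heliotrope, here is a minimal
sketch in plain Ruby. The names binary_header, concat_error and to_utf8 are
made up for this example, and the idea that the page text ends up tagged as
ASCII-8BIT is my assumption about the cause, not something I verified in
heliotrope-server itself.

```ruby
# encoding: utf-8

# Stand-in for page text that holds UTF-8 bytes but is tagged ASCII-8BIT
# (assumption: this is roughly what the header method ends up producing).
binary_header = "<title>\xE6\x89\x8B\xE7\xB4\x99</title>".dup.force_encoding(Encoding::ASCII_8BIT)

# A real UTF-8 string, like the decoded search query.
query = "手紙"

# Hypothetical helper: returns the exception raised by a + b, or nil.
def concat_error(a, b)
  a + b
  nil
rescue Encoding::CompatibilityError => e
  e
end

# Hypothetical helper: re-tags a copy of the string as UTF-8 without
# changing its bytes, which is what force_encoding does in the patch.
def to_utf8(str)
  str.dup.force_encoding(Encoding::UTF_8)
end

err = concat_error(binary_header, query)
puts "before: #{err.class}" if err

fixed_header = to_utf8(binary_header)
puts "after:  #{(fixed_header + query).encoding}"
```

Note that force_encoding only changes the encoding tag. If the bytes were not
valid UTF-8 to begin with, the string would still be broken, so checking
valid_encoding? afterwards may be worthwhile.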


The second problem is not actually a heliotrope problem; it is an artificial
limitation imposed by Gmail. After running heliotrope-add for some time, it
would fail when the IMAP fetch returned nil. Just after it failed I tried to
use my current email reader (kmail) and got an interesting error saying
"exceeded IMAP bandwidth limits". This indicates that the nil is due to Gmail
limiting the maximum bandwidth I can consume downloading emails.

The latest heliotrope now catches this error and ignores it, but after
ignoring it for a while I started getting sys-write errors on the socket. I
believe this is also Gmail abruptly breaking the socket connection to enforce
its bandwidth limits.

Limiting the rate of gmail-dumper so it reads mails at a lower pace might
eliminate these problems, or it could simply stop reading emails for some time
when it gets the first nil response.
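
As a rough sketch of the "stop reading for some time" idea, something like the
following could wrap the fetch call. Everything here is hypothetical: the
fetch_with_backoff name, its parameters, and the delay values are mine and not
part of heliotrope or gmail-dumper, and I do not know what pause Gmail
actually needs before it lifts the limit.

```ruby
# Hypothetical helper: calls `fetch` (any callable returning a message or nil)
# and, on a nil response, waits with a doubling delay before trying again.
# `sleeper` is injectable so the logic can be exercised without really waiting.
def fetch_with_backoff(fetch, max_retries: 5, base_delay: 60, sleeper: ->(s) { sleep(s) })
  attempt = 0
  loop do
    result = fetch.call
    return result unless result.nil?

    # Too many empty responses in a row: give up instead of hammering the server.
    raise "giving up after #{max_retries} empty responses" if attempt >= max_retries

    sleeper.call(base_delay * (2**attempt)) # 60s, 120s, 240s, ...
    attempt += 1
  end
end

# Usage with a fake fetcher that returns nil twice and then a message.
responses = [nil, nil, "message body"]
delays = []
message = fetch_with_backoff(-> { responses.shift }, sleeper: ->(s) { delays << s })
puts "got #{message.inspect} after waiting #{delays.inspect} seconds"
```

In the real tool the callable would wrap the IMAP fetch, and a dropped socket
would probably also need a reconnect before the next attempt.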

Overall, heliotrope is now usable for Japanese language users (at least for
me). Next I will start playing with turnsole to see if it can handle Japanese.

-- 
regards,
Horacio Sanson
From a056837d1ebe5054106e65ac7155b4e8e422a382 Mon Sep 17 00:00:00 2001
From: Horacio Sanson <hsanson@gmail.com>
Date: Tue, 5 Jul 2011 10:31:33 +0900
Subject: [PATCH] Fix encoding exception.

---
 bin/heliotrope-server |    3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/bin/heliotrope-server b/bin/heliotrope-server
index 15dc897..def9909 100644
--- a/bin/heliotrope-server
+++ b/bin/heliotrope-server
@@ -557,6 +557,9 @@ td {
   <input type="submit" value="go"/>
   </form></div>
     EOS
+
+    title.force_encoding(Encoding::UTF_8) if title.respond_to?(:force_encoding) # sigh...
+    title
   end
 
   def footer
-- 
1.7.4.1
