[sup-devel] Heliotrope improving but still found some issues
I tried the latest heliotrope with the leveldb-ruby 0.6 gem, whistlepig 0.7,
and the MeCab hooks for Japanese text support, and it works better than before.
Unfortunately, I ran into two issues.
First, any attempt to search using Japanese text fails with the dreaded
incompatible character encodings error:
#####################################################
[2011-07-05 10:22:17] INFO WEBrick 1.3.1
[2011-07-05 10:22:17] INFO ruby 1.9.2 (2010-08-18) [x86_64-linux]
[2011-07-05 10:22:17] INFO WEBrick::HTTPServer#start: pid=13523 port=8042
search(body:"手紙", 0, 20) took 2.1ms
Encoding::CompatibilityError - incompatible character encodings: ASCII-8BIT
and UTF-8:
bin/heliotrope-server:223:in `block in <class:HeliotropeServer>'
/var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:1152:in `call'
/var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:1152:in `block in
compile!'
/var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:724:in
`instance_eval'
/var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:724:in
`route_eval'
/var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:708:in `block (2
levels) in route!'
/var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:758:in `block in
process_route'
/var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:755:in `catch'
/var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:755:in
`process_route'
/var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:707:in `block in
route!'
/var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:706:in `each'
/var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:706:in `route!'
/var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:843:in `dispatch!'
/var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:644:in `block in
call!'
/var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:808:in
`instance_eval'
/var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:808:in `block in
invoke'
/var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:808:in `catch'
/var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:808:in `invoke'
/var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:644:in `call!'
/var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:629:in `call'
/var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/head.rb:9:in `call'
/var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/showexceptions.rb:21:in
`call'
/var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/lint.rb:48:in `_call'
/var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/lint.rb:36:in `call'
/var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/showexceptions.rb:24:in `call'
/var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/commonlogger.rb:18:in `call'
/var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/content_length.rb:13:in `call'
/var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/handler/webrick.rb:52:in
`service'
/usr/lib/ruby/1.9.1/webrick/httpserver.rb:111:in `service'
/usr/lib/ruby/1.9.1/webrick/httpserver.rb:70:in `run'
/usr/lib/ruby/1.9.1/webrick/server.rb:183:in `block in start_thread'
127.0.0.1 - - [05/Jul/2011 10:22:20] "GET /search?q=%E6%89%8B%E7%B4%99
HTTP/1.1" 500 89118 0.0331
localhost - - [05/Jul/2011:10:22:20 JST] "GET /search?q=%E6%89%8B%E7%B4%99
HTTP/1.0" 500 89118
- -> /search?q=%E6%89%8B%E7%B4%99
[2011-07-05 10:22:20] ERROR Errno::ECONNRESET: Connection reset by peer
/usr/lib/ruby/1.9.1/webrick/httpserver.rb:56:in `eof?'
/usr/lib/ruby/1.9.1/webrick/httpserver.rb:56:in `run'
#######################################################
The problem seems to be the header method in heliotrope-server, which builds
its output from a multiline string (a heredoc, e.g. <<-EOS). Forcing the
resulting text to UTF-8 encoding makes the search work as expected with both
Japanese and non-Japanese text (see the attached patch).
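
For anyone unfamiliar with this error, here is a minimal sketch of how it can
arise on Ruby 1.9 and why force_encoding makes it go away. The strings are
made-up stand-ins, not heliotrope's actual data, and I have not confirmed
which string in the header method ends up tagged as ASCII-8BIT.

```ruby
# encoding: utf-8
# Illustrative only: these strings are stand-ins, not heliotrope's data.

# The bytes of a Japanese string, but tagged as binary (ASCII-8BIT).
binary = "手紙".dup.force_encoding(Encoding::ASCII_8BIT)
utf8   = "検索"

# Joining two strings that both contain non-ASCII bytes but carry
# different encodings raises Encoding::CompatibilityError.
error_raised = false
begin
  binary + utf8
rescue Encoding::CompatibilityError => e
  error_raised = true
  puts "failed: #{e.message}"
end

# force_encoding only re-tags the bytes as UTF-8; it does not transcode.
# The respond_to? guard mirrors the patch (Ruby 1.8 has no force_encoding).
fixed = binary.dup
fixed.force_encoding(Encoding::UTF_8) if fixed.respond_to?(:force_encoding)

combined = fixed + utf8
puts combined
```

Note that this is only safe when the bytes really are valid UTF-8, which
seems to be the case for the search queries here.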
The second problem is not actually a heliotrope problem; it is an artificial
limitation imposed by Gmail. After running heliotrope-add for some time, it
fails when the IMAP fetch returns nil. Just after it failed I tried my usual
email reader (kmail) and got an interesting error saying:
"exceeded IMAP bandwidth limits". This indicates that the nil is due to Gmail
limiting the bandwidth I can consume downloading emails.
The latest heliotrope now catches this error and ignores it, but after
ignoring it for a while I started getting sys-write errors on the socket. I
believe this is also Gmail abruptly closing the socket connection to enforce
its bandwidth limits.
Limiting the rate of gmail-dumper so that it reads mail at a lower pace might
eliminate these problems. Alternatively, it could simply stop reading emails
for some time when it gets the first nil response.
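
A rough sketch of that idea, combining both suggestions: pause briefly after
every message and wait much longer whenever a fetch returns nil. The
fetch_with_backoff helper, its parameter names, and the default timings are
all hypothetical; none of this is heliotrope code, and the right delays for
Gmail are a guess.

```ruby
# Hypothetical helper, not part of heliotrope. The block is expected to
# fetch one message by id and return nil when the server refuses.
def fetch_with_backoff(ids, pause: 0.5, backoff: 60, max_retries: 3,
                       sleeper: Kernel.method(:sleep))
  results = []
  ids.each do |id|
    retries = 0
    message = yield(id)

    # A nil response is treated as "slow down": wait longer on each retry.
    while message.nil? && retries < max_retries
      retries += 1
      sleeper.call(backoff * retries)
      message = yield(id)
    end

    # Still nil after all retries: stop and return what we have so far.
    break if message.nil?

    results << message
    sleeper.call(pause) # throttle between successful fetches
  end
  results
end
```

The block would wrap whatever IMAP call the dumper already makes. The sleeper
argument is only there so the delays can be replaced when trying this out
without actually waiting.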
Overall heliotrope is now usable for Japanese language users (at least for
me). Next I will start playing with turnsole to see if it can handle Japanese.
--
regards,
Horacio Sanson
From a056837d1ebe5054106e65ac7155b4e8e422a382 Mon Sep 17 00:00:00 2001
From: Horacio Sanson <hsanson@gmail.com>
Date: Tue, 5 Jul 2011 10:31:33 +0900
Subject: [PATCH] Fix encoding exception.
---
bin/heliotrope-server | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)
diff --git a/bin/heliotrope-server b/bin/heliotrope-server
index 15dc897..def9909 100644
--- a/bin/heliotrope-server
+++ b/bin/heliotrope-server
@@ -557,6 +557,9 @@ td {
<input type="submit" value="go"/>
</form></div>
EOS
+
+ title.force_encoding(Encoding::UTF_8) if title.respond_to?(:force_encoding) # sigh...
+ title
end
def footer
--
1.7.4.1
_______________________________________________
Sup-devel mailing list
Sup-devel@rubyforge.org
http://rubyforge.org/mailman/listinfo/sup-devel