
Re: [sup-devel] Cannot query Japanese characters



I managed to stop the crash when searching for Japanese text by
forcing UTF-8 encoding on the parsed query string (see patch).
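
For reference, the Encoding::CompatibilityError reported further down this thread can be reproduced outside heliotrope with a few lines of plain Ruby. This is a minimal sketch, assuming (as the error message suggests) that `parsed_query_s` comes back tagged as ASCII-8BIT while the rest of the page is UTF-8; `header` and `raw` are hypothetical stand-ins, not heliotrope code. Note that `force_encoding` only relabels the bytes without converting them, so it is safe only when the bytes really are UTF-8 (which `valid_encoding?` checks), and it mutates its receiver, so the sketch calls it on a copy.

```ruby
# encoding: utf-8
# Hypothetical stand-ins for heliotrope's page header (UTF-8) and the
# parsed query string as handed back by Whistlepig (tagged ASCII-8BIT).
header = "<h1>Search: 会</h1>"
raw    = "body:\"会\"".dup.force_encoding(Encoding::ASCII_8BIT)

# Concatenating two non-ASCII strings with different encodings raises.
crashed =
  begin
    header + raw
    false
  rescue Encoding::CompatibilityError => e
    puts "#{e.class}: #{e.message}"
    true
  end

# Relabel the same bytes as UTF-8 (no transcoding), on a copy so that
# the original string keeps its encoding.
fixed = raw.dup.force_encoding(Encoding::UTF_8)
page  = header + fixed if fixed.valid_encoding?

puts crashed        # => true
puts page.encoding  # => UTF-8
```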

But it seems that Whistlepig cannot handle Japanese. I tried the
following small test and, as you can see, I get no results:


> require 'rubygems' => true
> require 'whistlepig' => true
> include Whistlepig => Object
> index = Index.new "index" => #<Whistlepig::Index:0x00000002093f60>
> entry1 = Entry.new => #<Whistlepig::Entry:0x0000000207d328>
> entry1.add_string "body", "研究会" => #<Whistlepig::Entry:0x0000000207d328>
> docid1 = index.add_entry entry1 => 1
> q1 = Query.new "body", "研究" => body:"研究"
> results1 = index.search q1 => []
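
One plausible explanation for the empty result, offered as an unverified guess rather than a reading of the Whistlepig source: if the tokenizer splits text only on whitespace and punctuation, then "研究会" is indexed as a single term and a query for "研究" can never match it, because Japanese does not separate words with spaces. A common workaround in CJK search engines is to index overlapping character bigrams instead. The following is a toy, pure-Ruby illustration of that idea; the function names are hypothetical and none of this is Whistlepig's API.

```ruby
# encoding: utf-8
# Toy tokenizers for illustration only; they are NOT Whistlepig's tokenizer.

# Split on runs of whitespace: "研究会" stays a single term.
def whitespace_tokens(text)
  text.split(/\s+/).reject(&:empty?)
end

# Break each whitespace-delimited run into overlapping two-character terms
# (character bigrams); a one-character run is kept as-is.
def bigram_tokens(text)
  whitespace_tokens(text).flat_map do |run|
    chars = run.chars.to_a
    chars.size < 2 ? [run] : chars.each_cons(2).map(&:join)
  end
end

# A conjunctive query matches when every query term is among the document terms.
def matches?(doc_terms, query_terms)
  (query_terms - doc_terms).empty?
end

doc   = "研究会"
query = "研究"

puts matches?(whitespace_tokens(doc), whitespace_tokens(query))  # => false
puts matches?(bigram_tokens(doc), bigram_tokens(query))          # => true
```

One trade-off: a one-character query such as "会" (the term from the backtrace) still misses in this sketch, because the document is indexed only as "研究" and "究会"; systems that use bigrams usually also index single characters or fall back to prefix matching for short queries.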

I will now dig into the Whistlepig source code to see if I can fix
this, but any pointers or tips on where to start looking would be
greatly appreciated.


On Mon, May 2, 2011 at 12:46 AM, Horacio Sanson <hsanson@gmail.com> wrote:
> I also tried with Ruby 1.8, and heliotrope does not crash, but searching
> for any Japanese word returns no matches, even for search terms I know
> have matches.
>
> And by the way, the installation instructions should mention that for
> Ruby 1.8 we also need to install the json gem, or heliotrope won't
> start.
>
> regards,
> Horacio
>
> On Mon, May 2, 2011 at 12:35 AM, Horacio Sanson <hsanson@gmail.com> wrote:
>> I installed Whistlepig 0.6, but now I get a different error that looks
>> similar to the turnsole problem. Here is the backtrace:
>>
>> http://localhost:8042/search?q=primo -> /search?q=%7Einbox&start=0&num=20
>> 127.0.0.1 - - [02/May/2011 00:31:58] "GET /favicon.ico HTTP/1.1" 404 447 0.0008
>> localhost - - [02/May/2011:00:31:58 JST] "GET /favicon.ico HTTP/1.1" 404 447
>> - -> /favicon.ico
>> search(body:"会", 0, 20) took 0.0ms
>> Encoding::CompatibilityError - incompatible character encodings: UTF-8 and ASCII-8BIT:
>>  bin/heliotrope-server:154:in `block in <class:HeliotropeServer>'
>>  /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:1152:in `call'
>>  /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:1152:in `block in compile!'
>>  /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:724:in `instance_eval'
>>  /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:724:in `route_eval'
>>  /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:708:in `block (2 levels) in route!'
>>  /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:758:in `block in process_route'
>>  /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:755:in `catch'
>>  /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:755:in `process_route'
>>  /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:707:in `block in route!'
>>  /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:706:in `each'
>>  /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:706:in `route!'
>>  /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:843:in `dispatch!'
>>  /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:644:in `block in call!'
>>  /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:808:in `instance_eval'
>>  /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:808:in `block in invoke'
>>  /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:808:in `catch'
>>  /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:808:in `invoke'
>>  /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:644:in `call!'
>>  /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/base.rb:629:in `call'
>>  /var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/head.rb:9:in `call'
>>  /var/lib/gems/1.9.1/gems/sinatra-1.2.5/lib/sinatra/showexceptions.rb:21:in `call'
>>  /var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/lint.rb:48:in `_call'
>>  /var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/lint.rb:36:in `call'
>>  /var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/showexceptions.rb:24:in `call'
>>  /var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/commonlogger.rb:18:in `call'
>>  /var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/content_length.rb:13:in `call'
>>  /var/lib/gems/1.9.1/gems/rack-1.2.2/lib/rack/handler/webrick.rb:52:in `service'
>>  /usr/lib/ruby/1.9.1/webrick/httpserver.rb:111:in `service'
>>  /usr/lib/ruby/1.9.1/webrick/httpserver.rb:70:in `run'
>>  /usr/lib/ruby/1.9.1/webrick/server.rb:183:in `block in start_thread'
>> 127.0.0.1 - - [02/May/2011 00:32:09] "GET /search?q=%E4%BC%9A HTTP/1.1" 500 89861 0.0228
>> localhost - - [02/May/2011:00:32:09 JST] "GET /search?q=%E4%BC%9A HTTP/1.1" 500 89861
>> http://localhost:8042/search?q=%7Einbox&start=0&num=20 -> /search?q=%E4%BC%9A
>> 127.0.0.1 - - [02/May/2011 00:32:09] "GET /favicon.ico HTTP/1.1" 404 447 0.0009
>> localhost - - [02/May/2011:00:32:09 JST] "GET /favicon.ico HTTP/1.1" 404 447
>> - -> /favicon.ico
>>
>> regards,
>> Horacio
>>
>> On Fri, Apr 29, 2011 at 1:52 PM, William Morgan
>> <wmorgan-sup@masanjin.net> wrote:
>>> Reformatted excerpts from William Morgan's message of 2011-04-26:
>>>> Thanks for the bug report on this one too. It's great to have someone
>>>> testing this stuff with non-ASCII code. This is a known bug in
>>>> Whistlepig and I should be releasing a fix soon.
>>>
>>> This is fixed in Whistlepig 0.6. Heliotrope should now be fine with
>>> utf-8 input. I'm still working on this issue in turnsole.
>>>
>>> Let me know if you have any more issues!
>>> --
>>> William <wmorgan-sup@masanjin.net>
>>> _______________________________________________
>>> Sup-devel mailing list
>>> Sup-devel@rubyforge.org
>>> http://rubyforge.org/mailman/listinfo/sup-devel
>>>
>>
>
From 0881630c8b410b6f78df578bf686afacbb78ec64 Mon Sep 17 00:00:00 2001
From: Horacio Sanson <hsanson@gmail.com>
Date: Tue, 3 May 2011 23:18:22 +0900
Subject: [PATCH] Fix crash for non ASCII chars.

---
 bin/heliotrope-server |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/bin/heliotrope-server b/bin/heliotrope-server
index 4793ac2..ed9c3be 100644
--- a/bin/heliotrope-server
+++ b/bin/heliotrope-server
@@ -151,7 +151,7 @@ class HeliotropeServer < Sinatra::Base
       nav += "</div>"
 
       header("Search: #{query.original_query_s}", query.original_query_s) +
-        "<div>Parsed query: #{escape_html query.parsed_query_s}</div>" +
+        "<div>Parsed query: #{escape_html query.parsed_query_s.force_encoding('UTF-8')}</div>" +
         "<div>Search took #{sprintf '%.2f', info[:elapsed]}s and #{info[:continued] ? 'was' : 'was NOT'} continued</div>" +
         "#{nav}<table>" +
         results.map { |r| threadinfo_to_html r }.join +
-- 
1.7.4.1
