Re: [sup-devel] Heliotrope improving but still found some issues



Hello,

On Sunday 10 July 2011 07:22:15 William Morgan wrote:
> Hi Horacio,
> 
> Reformatted excerpts from Horacio Sanson's message of 2011-07-05:
> > First any attempt to search using japanese text fails with the dreaded
> > incompatible character encodings error:
> 
> I'm having trouble reproducing this, or even understanding why your fix
> would help, since all string literals in the code should be UTF-8-encoded.
> 
> Could you please apply this patch and tell me what the output is when
> you feed it a crashing search term? Thanks!
> 
> --- cut here ---
> diff --git a/bin/heliotrope-server b/bin/heliotrope-server
> index c9754d4..ca764c0 100644
> --- a/bin/heliotrope-server
> +++ b/bin/heliotrope-server
> @@ -219,6 +219,19 @@ class HeliotropeServer < Sinatra::Base
>        end
>        nav += "</div>"
> 
> +      puts "start"
> +      p query.original_query_s.encoding
> +      p query.parsed_query_s.encoding
> +      p header("Search: #{query.original_query_s}", query.original_query_s).enc
> +      p "<div>Parsed query: #{escape_html query.parsed_query_s}</div>".encoding
> +      p "<div>Search took #{sprintf '%.2f', info[:elapsed]}s and #{info[:contin
> +      p "#{nav}<table>".encoding
> +      p results.size
> +      p results.map { |r| threadinfo_to_html r }.join.encoding
> +      p "</table>#{nav}".encoding
> +      p footer.encoding
> +      puts "end"
> +
>        header("Search: #{query.original_query_s}", query.original_query_s) +
>        "<div>Parsed query: #{escape_html query.parsed_query_s}</div>" +
>        "<div>Search took #{sprintf '%.2f', info[:elapsed]}s and #{info[:contin
> --- cut here ---

It seems the problem is not in heliotrope. The problem is in my hooks, which
use MeCab to split Japanese words.

If I run a search for Japanese text using my query hook, this is the output:

search(body:"飲み会", 0, 20) took 0.1ms
start
#<Encoding:ASCII-8BIT>
#<Encoding:UTF-8>
#<Encoding:ASCII-8BIT>
#<Encoding:UTF-8>
"<div>Search took 0.00s and was NOT continued</div>"
#<Encoding:UTF-8>
0
#<Encoding:ASCII-8BIT>
#<Encoding:UTF-8>
#<Encoding:UTF-8>
end


If I add a force_encoding call at the end of the hook, I get:

start
#<Encoding:UTF-8>
#<Encoding:UTF-8>
#<Encoding:UTF-8>
#<Encoding:UTF-8>
"<div>Search took 0.00s and was NOT continued</div>"
#<Encoding:UTF-8>
20
#<Encoding:UTF-8>
#<Encoding:UTF-8>
#<Encoding:UTF-8>
end
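
For reference, here is a minimal sketch of what I think is going on. It is
not my actual hook: tokenize_binary and tokenize_utf8 are hypothetical
stand-ins, and I am assuming the tokenizer binding hands back strings tagged
as ASCII-8BIT. A binary-tagged string mixes fine with ASCII-only UTF-8 text
(the result just becomes binary), but raises as soon as it meets UTF-8 text
that also contains non-ASCII characters.

```ruby
# Hypothetical stand-in for a query hook whose tokenizer returns the right
# bytes but tags them as binary (ASCII-8BIT) instead of UTF-8.
def tokenize_binary(text)
  text.dup.force_encoding(Encoding::ASCII_8BIT)
end

# Same hook, but re-tagging the result as UTF-8 before returning it.
# force_encoding only changes the tag; the bytes are left untouched.
def tokenize_utf8(text)
  tokenize_binary(text).force_encoding(Encoding::UTF_8)
end

query = "飲み会"
label = "検索: " # UTF-8 text that contains non-ASCII characters

bad  = tokenize_binary(query)
good = tokenize_utf8(query)

# ASCII-only UTF-8 + binary is allowed; the result silently becomes binary.
header = "Search: " + bad

# Non-ASCII UTF-8 + non-ASCII binary is what raises.
mix_error = nil
begin
  label + bad
rescue Encoding::CompatibilityError => e
  mix_error = e
end

# With the UTF-8 tag restored, concatenation works again.
page = label + good

p header.encoding
p mix_error.class
p page.encoding
```

This would explain why the header comes out as ASCII-8BIT in the first run
without failing by itself: the error only appears once that string is joined
with other non-ASCII UTF-8 content.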

I need to re-index my emails with the new UTF-8 hooks and test the search 
again.


-- 
regards,
Horacio Sanson