Re: [sup-devel] Heliotrope improving but still found some issues
Hello,
On Sunday 10 July 2011 07:22:15 William Morgan wrote:
> Hi Horacio,
>
> Reformatted excerpts from Horacio Sanson's message of 2011-07-05:
> > First, any attempt to search using Japanese text fails with the dreaded
> > incompatible character encodings error:
> I'm having trouble reproducing this, or even understanding why your fix
> would help, since all string literals in the code should be UTF-8-encoded.
>
> Could you please apply this patch and tell me what the output is when
> you feed it a crashing search term? Thanks!
>
> --- cut here ---
> diff --git a/bin/heliotrope-server b/bin/heliotrope-server
> index c9754d4..ca764c0 100644
> --- a/bin/heliotrope-server
> +++ b/bin/heliotrope-server
> @@ -219,6 +219,19 @@ class HeliotropeServer < Sinatra::Base
> end
> nav += "</div>"
>
> + puts "start"
> + p query.original_query_s.encoding
> + p query.parsed_query_s.encoding
> + p header("Search: #{query.original_query_s}", query.original_query_s).enc
> + p "<div>Parsed query: #{escape_html query.parsed_query_s}</div>".encoding
> + p "<div>Search took #{sprintf '%.2f', info[:elapsed]}s and #{info[:contin
> + p "#{nav}<table>".encoding
> + p results.size
> + p results.map { |r| threadinfo_to_html r }.join.encoding
> + p "</table>#{nav}".encoding
> + p footer.encoding
> + puts "end"
> +
> header("Search: #{query.original_query_s}", query.original_query_s)
> + "<div>Parsed query: #{escape_html query.parsed_query_s}</div>" +
> "<div>Search took #{sprintf '%.2f', info[:elapsed]}s and #{info[:contin
> --- cut here ---
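The patch above prints one encoding per page fragment. The same check can be wrapped in a small reusable helper. This is only a sketch: `report_encodings` is a hypothetical name, not part of Heliotrope, and the fragment names and values in the example are made up to mirror the patch.

```ruby
# encoding: utf-8

# Hypothetical debug helper (not part of Heliotrope): print the encoding of
# each named fragment and flag the ones that are not UTF-8. Returns the lines.
def report_encodings(fragments)
  fragments.map do |name, str|
    flag = str.encoding == Encoding::UTF_8 ? "" : "  <-- not UTF-8"
    line = "#{name}: #{str.encoding}#{flag}"
    puts line
    line
  end
end

report_encodings(
  "original_query" => "飲み会".b,   # raw bytes, as a hook might return them
  "parsed_query"   => "body:\"飲み会\"",
  "footer"         => "</body></html>"
)
```

Flagging fragments by name makes it easier to spot the first non-UTF-8 piece than scanning a column of bare `#<Encoding:...>` lines.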
It seems the problem is not Heliotrope. The problem is my hooks, which use
MeCab to split Japanese words.
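Why only Japanese text triggers the error: Ruby concatenates a UTF-8 string with an ASCII-8BIT (binary) string only if at least one of them is ASCII-only. A minimal reproduction, assuming the hook returns raw bytes tagged ASCII-8BIT (`try_concat` is a hypothetical helper used just for this illustration):

```ruby
# encoding: utf-8

# Hypothetical helper: join a UTF-8 fragment with a binary (ASCII-8BIT) one.
# Returns the joined string, or the error class if Ruby refuses.
def try_concat(utf8_part, binary_part)
  utf8_part + binary_part
rescue Encoding::CompatibilityError => e
  e.class
end

page = "<div>検索: "                   # UTF-8 literal with non-ASCII text

p try_concat(page, "body:abc".b)      # ASCII-only bytes: the join succeeds
p try_concat(page, "飲み会".b)         # Japanese bytes: Encoding::CompatibilityError
```

This is also why English-only searches can work fine through the same hook: ASCII-only binary strings are treated as compatible with UTF-8.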
If I run a search for Japanese text using my query hook, this is the output:
search(body:"飲み会", 0, 20) took 0.1ms
start
#<Encoding:ASCII-8BIT>
#<Encoding:UTF-8>
#<Encoding:ASCII-8BIT>
#<Encoding:UTF-8>
"<div>Search took 0.00s and was NOT continued</div>"
#<Encoding:UTF-8>
0
#<Encoding:ASCII-8BIT>
#<Encoding:UTF-8>
#<Encoding:UTF-8>
end
If I add a force_encoding call at the end of the hook, I get:
start
#<Encoding:UTF-8>
#<Encoding:UTF-8>
#<Encoding:UTF-8>
#<Encoding:UTF-8>
"<div>Search took 0.00s and was NOT continued</div>"
#<Encoding:UTF-8>
20
#<Encoding:UTF-8>
#<Encoding:UTF-8>
#<Encoding:UTF-8>
end
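A sketch of that kind of fix at the end of a hook, assuming the tokenizer emits UTF-8 bytes that merely come back tagged as ASCII-8BIT. `to_utf8` is a hypothetical helper, not part of Heliotrope or any MeCab binding. Note that `force_encoding` only relabels the bytes and does not convert them, so the sketch uses `String#scrub` (Ruby 2.1+) to replace any sequences that turn out not to be valid UTF-8.

```ruby
# encoding: utf-8

# Hypothetical helper: relabel raw tokenizer output as UTF-8.
# Assumes the bytes really are UTF-8; force_encoding does not transcode.
def to_utf8(bytes)
  s = bytes.dup.force_encoding(Encoding::UTF_8)
  # Replace any byte sequences that are not valid UTF-8 with U+FFFD.
  s.valid_encoding? ? s : s.scrub
end

# What a binding returning raw bytes might hand back for "飲み 会":
tokens = "飲み 会".b
p tokens.encoding            # ASCII-8BIT (labelled BINARY on newer Rubies)
p to_utf8(tokens).encoding   # => #<Encoding:UTF-8>
```

Validating after relabelling keeps a single malformed byte from the tokenizer from reintroducing the same class of error further down the page-building code.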
I need to re-index my emails with the new UTF-8 hooks and test the search
again.
--
regards,
Horacio Sanson
_______________________________________________
Sup-devel mailing list
Sup-devel@rubyforge.org
http://rubyforge.org/mailman/listinfo/sup-devel