
Re: [sup-devel] [PATCH] switch default index to Xapian



Reformatted excerpts from Rich Lane's message of 2010-01-01:
> Previous versions didn't add an :index entry in config.yaml, so
> preserve compatibility by using Ferret if no index is specified and
> the ferret directory exists.

I have done something a little more extensive in the branch
ferret-deprecation, merged into next, so I'm going to drop this patch,
unless you think I missed something.

The current behavior is:
1. If a Xapian index exists, use Xapian
2. Otherwise, if a Ferret index exists, use Ferret
3. Otherwise (new index), use Xapian.

The choice can be overridden by the environment variable (which I'd like
to remove at some point), by the config option, and by a command-line
flag, --index, added to most things in bin/.
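A minimal Ruby sketch of the selection rules above. It assumes the two indexes live in sibling directories named `xapian` and `ferret` under the sup base directory. The directory names, the `choose_index_type` helper, and its single `override` argument are illustrative assumptions, not sup's actual code. The message doesn't say how the environment variable, config option, and --index flag rank against each other, so the sketch just accepts whichever one was resolved as `override`.

```ruby
# Hypothetical helper: pick an index backend using the rules described
# above. The directory names "xapian" and "ferret" are assumptions.
def choose_index_type(base_dir, override = nil)
  # An explicit choice (env var, config option, or --index flag) wins.
  return override.to_sym if override

  if File.directory?(File.join(base_dir, "xapian"))
    :xapian # 1. an existing Xapian index
  elsif File.directory?(File.join(base_dir, "ferret"))
    :ferret # 2. only an existing Ferret index
  else
    :xapian # 3. fresh install, no index yet
  end
end
```

Checking for Xapian first means a user who has already migrated is never silently sent back to a leftover Ferret directory.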

> Names are stemmed and otherwise munged for convenient searching by
> Xapian::TermGenerator, while email addresses are stored verbatim.
> Xapian::QueryParser needs to do the same alterations to search terms, so the
> parser uses separate from_{name,email} fields. This is not user-friendly but
> could be worked around by having parse_query insert an OR over both fields
> where it sees a from: prefix (same for to).
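The workaround suggested above might look roughly like this. A hypothetical `expand_address_prefixes` step rewrites `from:` into an OR over `from_name:` and `from_email:` before the string reaches Xapian::QueryParser. The helper name is illustrative. The `to_name`/`to_email` field names are assumed by analogy with `from_{name,email}`. The simple regex also does not handle quoted phrases.

```ruby
# Hypothetical pre-processing step for parse_query: turn "from:X" into
# "(from_name:X OR from_email:X)", and likewise for "to:".
def expand_address_prefixes(query)
  query.gsub(/\b(from|to):(\S+)/) do
    field = Regexp.last_match(1)
    term = Regexp.last_match(2)
    "(#{field}_name:#{term} OR #{field}_email:#{term})"
  end
end
```

For example, `expand_address_prefixes("from:bob label:sup")` yields `"(from_name:bob OR from_email:bob) label:sup"`. The parser then stems the name half and matches the email half verbatim.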

I'm fine with this solution. At some point (not necessarily for 0.10)
I'd also like to add more email address munging so that the address
bob@foo.com is matched by bob, foo, foo.com, and bob@foo.com, so maybe
this is an analogous case.
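A sketch of that munging, assuming each address is expanded into several index terms at indexing time. The four forms for bob@foo.com come from the message. The handling of subdomains (adding each non-top-level label and each multi-label suffix) is an assumption, and `address_terms` is a hypothetical name.

```ruby
# Hypothetical term generator: expand an address such as "bob@foo.com"
# into the whole address, the local part, the domain, each
# non-top-level domain label, and each multi-label domain suffix.
def address_terms(address)
  addr = address.downcase
  local, domain = addr.split("@", 2)
  terms = [addr, local]
  if domain && !domain.empty?
    labels = domain.split(".")
    terms << domain
    terms.concat(labels[0...-1])     # "foo" from "foo.com"
    (1..labels.size - 2).each do |i| # "foo.com" from "mail.foo.com"
      terms << labels[i..-1].join(".")
    end
  end
  terms.uniq
end
```

Top-level labels such as "com" are deliberately left out, since they would match nearly every message.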

> A more pernicious issue is that QueryParser defaults to AND if there
> isn't an explicit operator (which is what we want), but if there are
> multiple boolean (label/email) terms over the same field it will OR
> them. So, "label:sup label:patch" will result in the union instead of
> the intersection. Assuming we don't want to write our own query
> parser, this needs to be made configurable in Xapian. I took a stab at
> it a few months ago but didn't get anywhere.

Ok. Unfortunate, but not a dealbreaker by any means, especially if it's
restricted to emails and labels.
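To make the intended semantics concrete: `label:sup label:patch` should match only messages that carry both labels. This hypothetical sketch strips `label:` terms out of the query before parsing and applies them afterwards as an intersection. It only illustrates the desired behaviour and is not a fix proposed in this thread. `extract_labels` and `filter_by_labels` are illustrative names, and the filter runs over plain hashes rather than a Xapian result set.

```ruby
# Hypothetical: pull "label:X" terms out of a query string so they can be
# applied as an intersection instead of the union QueryParser produces.
def extract_labels(query)
  labels = []
  rest = query.gsub(/\blabel:(\S+)/) do
    labels << Regexp.last_match(1)
    ""
  end
  [rest.split.join(" "), labels]
end

# Keep only messages carrying every requested label (intersection).
def filter_by_labels(messages, labels)
  messages.select { |m| labels.all? { |l| m[:labels].include?(l) } }
end
```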

> There's also the issue of long delays when flushing the index to disk
> on exit.  One option is to keep the delay and log an info message
> saying what's going on.  A second option is to set the
> XAPIAN_FLUSH_THRESHOLD environment variable to something low in
> bin/sup, which will limit the final delay but potentially cause short
> delays during normal use. A third option is to detect when the user
> has been idle for a while and flush the index then.

This is something I definitely would like to see fixed before 0.10, but
I would be happy with the silly but trivial option #1. (I suspect #2/#3
will require some back-and-forth to get just right.)
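Option #1 amounts to wrapping the final flush in log messages so the pause on exit is explained. A minimal sketch follows. `flush_with_notice` is a hypothetical name, and the block stands in for whatever call actually writes the index to disk.

```ruby
require "logger"

# Hypothetical wrapper for option #1: tell the user why exit is slow,
# run the (possibly long) flush passed in as a block, and report timing.
def flush_with_notice(log)
  log.info("Flushing index to disk; this may take a while...")
  started = Time.now
  result = yield
  log.info(format("Index flushed in %.1f seconds", Time.now - started))
  result
end
```

For example, `flush_with_notice(Logger.new($stderr)) { index.flush_to_disk }`, where `index.flush_to_disk` is a placeholder for the real save call.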

> We can easily fix the first and third issues before 0.10. Are there
> any others I've forgotten?

There was something with the counts in label-list-mode at some point,
but the whole issue has been swapped out of my head.

Ultimately getting us out of the world of Ferret is worth almost any
amount of pain, so, who cares, and, as always, thank you.
-- 
William <wmorgan-sup@masanjin.net>
_______________________________________________
Sup-devel mailing list
Sup-devel@rubyforge.org
http://rubyforge.org/mailman/listinfo/sup-devel