
Re: [sup-devel] email threading - tree vs. graph



On Sun, Feb 21, 2010 at 08:38:35AM +0200, Tero Tilus wrote:
> > For example, here's a slice from the recent be-devel list as a graph:
> >   ...............
> >   | *-|-\     | | Mon Jan 25  W. Trevor King  Re: Project releases
> >   | * | |     | | Sat Jan 23  Gianluca Montecchi  Re: Project releases
> >   | | | | t   | | Sat Jan 23  Gianluca Montecchi  Re: Project releases
> >   | *-|-|-|-\ | | Fri Jan 22  Ben Finney  Re: Project releases
> >   | | | * | | | | Fri Jan 22  W. Trevor King  Re: Project releases
> >   | | | *-/ | | | Thu Jan 21  Ben Finney  Re: Project releases
> >   | * | |   | | | Thu Jan 21  Gianluca Montecchi  Re: Project releases
> >   *-|-|-/   | | | Thu Jan 21  W. Trevor King  Re: Project releases
> >   ...............
> >          ^--- inheritance graph.
> > You can see that Ben's Fri message and my Mon message both have two
> > parents.
> 
> Honestly.  That took a fair bit of staring and prolly wouldn't have
> opened to me without your explanation.  :) But that might only be due
> to the external noise in the graph.  Without those extra through-lines
> it looks a bit more readable.

It's a snip from the full list sorted by date, so there are a bunch of
through-lines when threads went out of fashion for a while ;).
If you sorted by oldest parent date first, you would probably have
fewer.

> I can't really see how this would lump significantly more mail into
> one thread.  Do you have examples of otherwise disconnected trees
> connected only by multi-parent mail?

  *-\-\-\-\-\-\-\-\ Fri May 16  Chris Ball  how to use the git backend?
  r | | | | | | | | Fri May 16  Christian Garbs  how to use the git backend?
  r-/ | | | | | | | Wed May 14  Jelmer Vernooij  [MERGE] Two tiny fixes
  r---/ | | | | | | Mon Apr 21  Ben Finney  [MERGE] Manual page for 'be' command.
  *-----/ | | | | | Mon Apr 21  Ben Finney  [MERGE] Makefile: add skeleton 'build' ..
  r       | | | | | Mon Apr 21  Ben Finney  [MERGE] Makefile: Add with 'clean' targ..
  *-------/ | | | | Fri Apr 18  Ben Finney  [MERGE] Add bug 00f (now fixed)
  r         | | | | Fri Apr 18  Ben Finney  [MERGE] Updated GPLv2 text and FSF addr..
  r---------/ | | | Fri Apr 18  Ben Finney  [MERGE] Bugs-Everywhere-Web/libbe: Fix ..
  *-----------/ | | Mon Apr 14  j@oil21.org  [PATCH] Bugs-Everywhere-Web identity
  r             | | Mon Apr 14  j@oil21.org  [PATCH] Bugs-Everywhere-Web identity
  r-------------/ | Mon Apr 14  j@oil21.org  [MERGE] update about
  r---------------/ Mon Apr 14  j@oil21.org  [MERGE] Bugs-Everywhere-Web works with .

Where Chris' mail was "...  I also merged patches from...".  Another
case that comes up is that a user will post with a problem that's been
discussed before, and we reply and link to the previous discussion.
These are both software mailing list specific though.  I don't have
examples from other settings.
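
To make the "lumping" effect concrete, here is a minimal sketch (not
code from the be.mailing-list branch) that groups messages into
connected components.  It assumes each message has already been
reduced to a mapping from its ID to the IDs it replies to; the
function name and all IDs below are made up for illustration.

```python
from collections import defaultdict

def thread_groups(parents):
    """Group message IDs into threads (connected components).

    `parents` maps each message ID to the list of IDs it replies to.
    """
    # Build an undirected adjacency map so links can be walked both ways.
    neighbors = defaultdict(set)
    for child, ids in parents.items():
        neighbors[child]  # make sure parentless messages appear too
        for parent in ids:
            neighbors[child].add(parent)
            neighbors[parent].add(child)

    seen = set()
    groups = []
    for start in sorted(neighbors):
        if start in seen:
            continue
        # Depth-first walk collecting everything reachable from `start`.
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbors[node] - group)
        seen |= group
        groups.append(group)
    return groups

# Hypothetical IDs: two independent patch threads...
separate = {"patch-a": [], "review-a": ["patch-a"], "patch-b": []}
# ...and the same list after one "I also merged patches from..." mail.
merged = dict(separate, summary=["review-a", "patch-b"])
```

With single-parent links only, `thread_groups(separate)` keeps the two
patch threads apart; the one multi-parent `summary` message in `merged`
is enough to join them into a single group.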

> > With your thread-centric approach, you'll want to break threads when
> > the topic mutates too far from the original, and that could be
> > difficult for meshy-graphs.
> 
> Topic-mutation happens within a tree as well.  And pruning (boy I've
> wanted to do that quite a few times, I notice) afaik essentially
> requires scanning (and potentially modifying) all the messages in the
> thread.  Technically I don't see how this is very different.
> "Politically" it however could be.  If your mail graph is one big lump
> of spaghetti, it might be difficult to decide where to cut it off.  ;)

My infant be.mailing-list branch (link below) is an attempt to address
this by leaving the spaghetti alone, and attaching entry-point tags
wherever you feel the subject makes a significant shift.  Really, you
*want* the spaghetti, since it's human-generated cross-linking that
reduces duplication-of-search effort.  Assuming you trust the human in
question ;).
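
One way to picture entry-point tags, as a rough sketch only: the graph
is left untouched, and a tag just marks where a reader should start.
The stopping rule used here (a walk from one entry point does not
descend into another tagged message) is an assumption made for
illustration, not necessarily what the branch does, and the names
`messages_under`, `children`, and `entry_points` are hypothetical.

```python
def messages_under(entry, children, entry_points):
    """Return the IDs reachable from `entry` by following replies.

    `children` maps a message ID to the IDs that reply to it.
    `entry_points` maps tagged message IDs to a topic label.
    The walk does not descend into other tagged messages (assumption).
    """
    found, seen, stack = [], set(), [entry]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        found.append(node)
        for child in children.get(node, []):
            # Another entry point starts its own topic; leave it alone.
            if child in entry_points and child != entry:
                continue
            stack.append(child)
    return found

# Hypothetical graph: "d" has two parents, "b" and "c".
children = {"a": ["b", "c"], "b": ["d"], "c": ["d"]}
entry_points = {"a": "releases", "c": "packaging"}
```

Note that `d` shows up under both entry points, which is the point:
the cross-links stay in place and the tags only add places to start
reading.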

> > On an implementation level, I've got the above graph browser going
> > in python/curses, so it should be easy to port to ruby/curses.
> 
> Have a pointer to code?

My code is currently stuffed into an in-transition BE project, but it
should be easy to separate.  Grab the whole repo with Bazaar:
  bzr branch http://www.physics.drexel.edu/~wking/code/bzr/be.mailing-list
Graphing module is libbe/util/graph.py.  My very minimal browser is
misc/mailbox-tools/mailgraph.py.  Set up the BE version file with
  cd be.mailing-list
  make libbe/_version.py
and run the browser with
  misc/mailbox-tools/mailgraph.py *.mbox
Press 'h' for help.

The graph module is pretty clean, but the others are not ;).

> I would love to see sup being able to do something useful with
> multiple parent messages.

If you want to use this in the wild, you'll need to figure out how to
integrate multiple-parent In-Reply-To headers with your current JWZ
threading, which only uses References.  From RFC 2822, section 3.6.4:

   Note: Some implementations parse the "References:" field to display
   the "thread of the discussion".  These implementations assume that
   each new message is a reply to a single parent and hence that they
   can walk backwards through the "References:" field to find the parent
   of each message listed there.  Therefore, trying to form a
   "References:" field for a reply that has multiple parents is
   discouraged and how to do so is not defined in this document.
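
For the multiple-parent side, the same section of RFC 2822 says that
In-Reply-To carries the Message-IDs of all parents.  A minimal sketch
of reading them with Python's standard `email` module follows; the
helper name `parent_ids`, the fallback to the last References entry,
and the sample addresses are assumptions for illustration.

```python
import email
import re

# A msg-id is an angle-bracketed token; this pattern is deliberately loose.
MSG_ID = re.compile(r"<[^<>\s]+>")

def parent_ids(raw_message):
    """Return the list of parent Message-IDs for one raw message."""
    msg = email.message_from_string(raw_message)
    # Prefer In-Reply-To, which may list several parents.
    ids = MSG_ID.findall(msg.get("In-Reply-To", ""))
    if ids:
        return ids
    # Fall back to the last References entry (single-parent assumption).
    refs = MSG_ID.findall(msg.get("References", ""))
    return refs[-1:]

# Hypothetical message replying to two parents at once.
raw = (
    "Message-ID: <c@example.org>\n"
    "In-Reply-To: <a@example.org> <b@example.org>\n"
    "References: <a@example.org>\n"
    "\n"
    "body\n"
)
```

Here `parent_ids(raw)` yields both `<a@example.org>` and
`<b@example.org>`, while a message with only a References header yields
at most one parent.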

One major benefit of JWZ-threading is that it's self-healing.  If some
users don't thread their replies, you can thread them locally and
reply, fixing the threading for others receiving your reply.  The
In-Reply-To alternative is to reply to both the broken message and the
original thread:
  *-\ fixing response
  | r broken message
  * original thread
which is needlessly uglier than
  * fixing response
  * broken message
  * original thread
if it is clear from the content of the broken message that it really
was a reply.

I think it would be best to leave view-graph and
browse-to-other-parent as peripheral options, to be used on curated
archives where you can trust In-Reply-To to be RFC 2822 compliant
(e.g. the eventual be.mailing-list ;).

-- 
This email may be signed or encrypted with GPG (http://www.gnupg.org).
The GPG signature (if present) will be attached as 'signature.asc'.
For more information, see http://en.wikipedia.org/wiki/Pretty_Good_Privacy

My public key is at http://www.physics.drexel.edu/~wking/pubkey.txt
