
Re: [sup-devel] Query for largest msg_id?



I implemented a new version of the GMail -> Heliotrope sync script and attach it
here in the hope that someone will test it and provide feedback. It is by no
means ready for general use, so don't use it for anything other than
experimentation. Since the script only opens mailboxes in read-only mode
(EXAMINE), there should not be any problems with your emails on the server.

This script is GMail specific and has these features:

  - Downloads emails from all mailboxes (except All, Trash and Spam)
    automatically using the XLIST GMail IMAP extension and feeds them to
    Heliotrope via its REST interface.
  - Remembers the last email downloaded so it does not start from the beginning
    every time.
  - Synchronizes GMail labels using the X-GM-LABELS IMAP extension.
  - Synchronizes GMail flags with Heliotrope state flags.
  - Adds a new mailbox property to messages. This may later make it possible to
    implement Heliotrope -> GMail synchronization.
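
The "remembers the last email downloaded" feature can be sketched roughly as
follows. This is only an illustration, not the code in the attached gmail.rb:
the SyncState class, the JSON state file and its field names are all made up
for the example. It stores the UIDVALIDITY and the highest UID per mailbox and
falls back to a full rescan when UIDVALIDITY changes.

```ruby
require "json"
require "tmpdir"

# Hypothetical helper (not part of gmail.rb): remembers, per mailbox, the
# UIDVALIDITY and the highest UID already fed to Heliotrope.
class SyncState
  def initialize(path)
    @path = path
    @data = File.exist?(path) ? JSON.parse(File.read(path)) : {}
  end

  # First UID to fetch. Restart from 1 when the mailbox is unknown or the
  # server reports a different UIDVALIDITY than the stored one.
  def first_uid(mailbox, uidvalidity)
    entry = @data[mailbox]
    return 1 if entry.nil? || entry["uidvalidity"] != uidvalidity
    entry["last_uid"] + 1
  end

  # Persist the position reached in this mailbox.
  def record(mailbox, uidvalidity, last_uid)
    @data[mailbox] = { "uidvalidity" => uidvalidity, "last_uid" => last_uid }
    File.write(@path, JSON.generate(@data))
  end
end

path  = File.join(Dir.mktmpdir, "gmail-sync-state.json")
state = SyncState.new(path)

fresh   = state.first_uid("INBOX", 42)              # nothing stored yet
state.record("INBOX", 42, 1500)
resumed = SyncState.new(path).first_uid("INBOX", 42) # same UIDVALIDITY
rescan  = SyncState.new(path).first_uid("INBOX", 43) # UIDVALIDITY changed
```

The value returned by first_uid would then be the lower bound of the UID range
requested from the server.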

Things to check and do:
  - I am seeing some negative thread_ids in the response. Need to check whether
    this is normal or a bug in Heliotrope or in my script.
  - The GMail inbox label is spelled with a capital "I" ("Inbox") while
    Heliotrope uses a lowercase "i". Shall I downcase all labels, or treat
    Inbox specially?
  - Refactor the script into something more modular and elegant.
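
For the Inbox question, one of the two options (special treatment for Inbox
only) might look like the sketch below. The function names are invented for
this example, and it assumes labels arrive as plain strings such as "Inbox".

```ruby
# Hypothetical normalizer: only GMail's "Inbox" is folded to Heliotrope's
# "inbox"; every other label keeps its original case.
def normalize_label(label)
  label.casecmp("inbox").zero? ? "inbox" : label
end

# Normalize a whole label list and drop duplicates created by the folding.
def normalize_labels(labels)
  labels.map { |l| normalize_label(l) }.uniq
end

normalized = normalize_labels(["Inbox", "School", "inbox"])
# normalized is ["inbox", "School"]
```

Downcasing everything would be a one-line change (labels.map(&:downcase).uniq)
but would also merge user labels that differ only in case.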

To use this script I had to modify heliotrope-server.rb to allow setting labels
and states when posting new messages (see attached patch).
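
A request against the patched route could be built roughly like this. It is a
sketch under assumptions: it uses Net::HTTP from the standard library instead
of RestClient, the build_message_post helper is invented for the example, and
the "labels[]"/"state[]" key names rely on the usual Rack convention for
decoding repeated form keys into arrays. The host, port and /message.json path
are taken from this thread and the patch.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: builds (but does not send) a form-encoded POST for
# the patched /message.json route.
def build_message_post(rawbody, labels: [], state: [], mailbox: "inbox",
                       base: "http://localhost:8042")
  uri = URI.join(base, "/message.json")
  req = Net::HTTP::Post.new(uri)

  form = { "body" => rawbody, "mailbox" => mailbox }
  # Array values are sent as repeated "key[]" pairs; empty lists are left
  # out so the server falls back to its own defaults.
  form["labels[]"] = labels unless labels.empty?
  form["state[]"]  = state  unless state.empty?
  req.set_form_data(form)

  [uri, req]
end

uri, req = build_message_post("Subject: hi\r\n\r\nhello\r\n",
                              labels: %w[inbox school], state: %w[seen])

# To actually send it to a running server:
# Net::HTTP.start(uri.host, uri.port) { |http| puts http.request(req).body }
```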

regards,
Horacio

On Tue, May 17, 2011 at 12:02 AM, Horacio Sanson <hsanson@gmail.com> wrote:
> On Mon, May 16, 2011 at 12:01 AM, William Morgan
> <wmorgan-sup@masanjin.net> wrote:
>> Reformatted excerpts from Horacio Sanson's message of 2011-05-10:
>>> Is there a way to query Heliotrope what is the largest msg_id
>>> currently in the index?
>>
>> Sort of---it's a hack, but if you search for e.g. "a OR -a" you'll get
>> every message in the index, and the first result will be the message
>> with the highest id thanks to Whistlepig's search semantics.
>>
>
> Indeed I have been trying to wrap my head around the IMAP spec and
> still don't get a lot of things. For now I will just keep the max UID
> read from IMAP server somewhere on disk.
>
>>> I am trying to improve the imap-dumper.rb so
>>> it does not download all my emails every time but only the new ones.
>>
>> Sounds great. Unfortunately the Heliotrope message id and the IMAP
>> message id / message uid are completely different things, and
>> maintaining a cross-session mapping of them is impossible for generic
>> IMAP servers, because the uid of every message can change every time you
>> connect to an IMAP server---see the section on IMAP's 'uidvalidity'
>> variable. So you'll have to rescan the inbox every time and rebuild the
>> mapping. Welcome to hell.
>>
>
> When UIDVALIDITY differs I will simply re-scan the whole mailbox and
> feed it to Heliotrope. I trust Heliotrope won't add duplicates.
>
>>> Also while looking at the code I see that messages are stored in the
>>> index using the msg_id as parsed by RMail. There is no further
>>> association with the source or mailbox from where the messages were
>>> downloaded. This I think may cause collisions if we use one Heliotrope
>>> server with more than one email account. Not sure what is the
>>> probability of two messages from two different IMAP servers having the
>>> same msg_id but nothing in the standard rules out that possibility.
>>
>> This is yet another id: the Message-Id header of the email. This is only
>> needed to build up the thread structure and should otherwise be ignored.
>
> I am attaching my first small hack for GMail <-> Heliotrope
> synchronization. For now it only downloads mail from GMail and injects
> them to Heliotrope just as the imap-dumper.rb does. The difference is
> that I keep track of the last message UID and UIDVALIDITY values to
> avoid re-scanning the whole folder every time.
>
> Now I want to take advantage of GMail IMAP extensions (e.g.
> X-GM-LABELS, X-GM-THRID) to allow labels/threads synchronization, but
> I have some doubts about how to correctly use the Heliotrope REST API.
> For example, in Heliotrope::Index the add_message method allows
> inserting a message and assigning it labels, flags and extra parameters
> at the same time. How can I do this with the REST API? The only example
> I see adds just a message body.
>
>    RestClient.post "http://localhost:8042/message", :message => body
>
> Also, what is the ext array used for? Can I use it to add
> an account/mailbox property to each message so I can later retrieve
> all messages associated with a mailbox/account pair?
>
> regards,
> Horacio
>
>> --
>> William <wmorgan-sup@masanjin.net>
>> _______________________________________________
>> Sup-devel mailing list
>> Sup-devel@rubyforge.org
>> http://rubyforge.org/mailman/listinfo/sup-devel
>>
>

Attachment: gmail.rb
Description: application/ruby

From 5504d5732ebe3c8b86b942b9b898221a520d283f Mon Sep 17 00:00:00 2001
From: Horacio Sanson <hsanson@gmail.com>
Date: Tue, 17 May 2011 23:34:14 +0900
Subject: [PATCH] Implement post /message.json.

Now we can add messages to heliotrope with labels, state and mailbox
information in a single POST request. The request body has to be JSON
with a format like:

{
  body: <rfc822 raw message string>,
  labels: ["inbox", "school", "home"],
  state: ["seen", "old"],
  mailbox: "inbox"
}
---
 bin/heliotrope-server |   30 +++++++++++++++++++++++++++---
 1 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/bin/heliotrope-server b/bin/heliotrope-server
index d71989d..93b0490 100644
--- a/bin/heliotrope-server
+++ b/bin/heliotrope-server
@@ -302,10 +302,34 @@ class HeliotropeServer < Sinatra::Base
   end
 
   post "/message.json" do
-    body = params["body"] or raise RequestError, "need a 'body' param"
-    puts body
+    content_type :json
+    begin
+      rawbody = params["body"] or raise RequestError, "need a 'body' param"
+      rawbody.force_encoding "binary" if rawbody.respond_to?(:force_encoding) # sigh...
 
-    {:status => "ok"}.to_json
+      message = Heliotrope::Message.new(rawbody).parse!
+
+      mbox = params["mailbox"] || "inbox"
+      doc_id = nil
+      thread_id = nil
+      state = nil
+      labels = [mbox]
+      if @index.contains_msgid? message.msgid
+        messageinfo = get_message_summary message.msgid
+        doc_id =  messageinfo[:message_id]
+        thread_id = messageinfo[:thread_id]
+        { :response => :ok, :status => :seen, :doc_id => doc_id, :thread_id => thread_id }
+      else
+        # Set state
+        state = params["state"] || ["unseen"]
+        labels = (labels + Array(params["labels"])).uniq if params["labels"]
+        loc = @store.add rawbody
+        doc_id, thread_id = @index.add_message message, state, labels, { :loc => loc, :mbox => mbox }
+        { :response => :ok, :status => state, :doc_id => doc_id, :thread_id => thread_id, :labels => labels }
+      end
+    rescue Heliotrope::InvalidMessageError => e
+      { :response => :error, :error_message => e.message }
+    end.to_json
   end
 
 private
-- 
1.7.4.1
