
Re: [sup-devel] Experimental Gmail Source



Thanks for checking out the source, and sorry for the late response; I can only look into this on the rare free weekend.


On Wed, May 22, 2013 at 6:47 AM, Matthieu Rakotojaona <matthieu.rakotojaona@gmail.com> wrote:
> Hey Horacio,
>
> I took a stab at your gmail_source branch, and made a few
> fixes/improvements [0]:
>
> - Add a configuration option to sup-add
> - Dump the LevelDB path into sources.yaml
> - Add a load_from_yaml method for a source to initialize its working
>   values (for instance, the @db cannot be serialized; it needs to be
>   reconstructed)
> - Fix the msg_att monkey-patch for imap.rb


Great, I will add these changes to my branch.
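The load_from_yaml idea (serialize the LevelDB path, rebuild the handle on load) can be sketched with Ruby's standard Psych hooks encode_with and init_with. This is a minimal sketch under assumptions, not sup's source API: GmailSourceSketch is a hypothetical class, and a plain Hash stands in for the LevelDB handle so the example runs without the leveldb gem.

```ruby
require "yaml"

# Hypothetical source class (not sup's API): only the LevelDB path goes
# into sources.yaml; the database handle is reconstructed on load.
class GmailSourceSketch
  attr_reader :db_path, :db

  def initialize(db_path)
    @db_path = db_path
    open_db
  end

  # Psych hook: serialize the path only, never the live handle.
  def encode_with(coder)
    coder["db_path"] = @db_path
  end

  # Psych hook: called instead of initialize when the object is loaded.
  def init_with(coder)
    @db_path = coder["db_path"]
    open_db
  end

  private

  # Stand-in for opening LevelDB at @db_path; a Hash keeps the sketch
  # self-contained.
  def open_db
    @db = {}
  end
end

source = GmailSourceSketch.new("/home/user/.sup/gmail.db")
source.db["cached"] = "value"
yaml = YAML.dump(source)
restored = YAML.safe_load(yaml, permitted_classes: [GmailSourceSketch])
puts restored.db_path
```

Keeping the handle out of the serialized state is what makes the round trip safe; a real implementation would open the database at db_path inside open_db.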
 
> All in all, the gmail source seems to work. I tested it on my usual
> gmail account; I haven't tried to download all of it, but I did download a
> few dozen emails without a problem. I'd like to warn users about
> LevelDB, though: sad to say, but like other wmorgan projects, the Ruby
> binding looks abandoned. There are at least 2 bugs you will encounter if
> you try it: a configuration problem (fixed in [1]), and you need the
> `snappy` gem to make it work if your db is larger than 4MB [2]. There are
> some up-to-date forks, though.
 
 
> I see LevelDB is used mostly for storing messages and the mailboxes'
> uid{validity/last}, but if we are to use Gmail (it's the only IMAP
> provider that makes sense for sup), I believe we would stick to the All
> Mail label, right? So there is no need to store this in the db; it can go
> in the sources.yaml file. Also, if leveldb-ruby is unreliable (I did run
> into some issues a while back, something to do with glibc...) and we want
> to use it for caching messages, I think we can salvage heliotrope's zmbox
> [3], because it's so simple to use yet far better than plain mbox.

Using zmbox, mbox, maildir or any other mail storage (or a mix) means I need to keep track of three indexes to allow two-way sync between the Gmail source and the Sup index: the Sup index id, the store id (e.g. the zmbox file index) and the Gmail X-GM-MSGID. That complicates things a lot.

Using a key/value store like LevelDB lets me store the messages directly and associate them with the Gmail X-GM-MSGID. LevelDB also compresses stored data, which suits text-heavy emails, and it performs well [1]. The issues you mention seem to be in the Ruby library rather than in LevelDB itself, and they are fixable. Unless there are bigger issues (e.g. data corruption or loss), I will stick with LevelDB.
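A minimal sketch of the single-key layout described above, under stated assumptions: GmailMessageStore is a hypothetical class, a Hash plays the role of LevelDB, Zlib stands in for LevelDB's own compression, and none of this is the leveldb-ruby API. Messages are keyed directly by X-GM-MSGID, and the per-mailbox uidvalidity/last uid state sits in the same store under its own key prefix. The id used in the example is made up.

```ruby
require "zlib"

# Hypothetical in-memory stand-in for a LevelDB-backed message cache.
class GmailMessageStore
  def initialize
    @kv = {} # plays the role of the on-disk key/value store
  end

  # The raw message is keyed by its X-GM-MSGID, so no separate
  # store-id index has to be kept in sync.
  def put_message(gm_msgid, raw)
    @kv["msg/#{gm_msgid}"] = Zlib::Deflate.deflate(raw)
  end

  # Returns the raw message, or nil if it was never stored.
  def get_message(gm_msgid)
    data = @kv["msg/#{gm_msgid}"]
    data && Zlib::Inflate.inflate(data)
  end

  # Per-mailbox IMAP state lives under a different key prefix.
  def put_state(mailbox, uidvalidity, last_uid)
    @kv["state/#{mailbox}"] = "#{uidvalidity}:#{last_uid}"
  end

  def get_state(mailbox)
    value = @kv["state/#{mailbox}"]
    return nil unless value
    uidvalidity, last_uid = value.split(":").map(&:to_i)
    { uidvalidity: uidvalidity, last_uid: last_uid }
  end
end

store = GmailMessageStore.new
store.put_message("1234567890", "Subject: hello\r\n\r\nbody\r\n")
store.put_state("[Gmail]/All Mail", 11, 4242)
puts store.get_message("1234567890")
p store.get_state("[Gmail]/All Mail")
```

With a file-based store such as zmbox, put_message would instead yield an offset or file index that must be recorded in a second map, which is the extra bookkeeping the key/value layout avoids.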

> Regarding your ids question, if you want to access sup's messages
> from the gmail source, you could use the mail's Message-ID header and
> apply the same logic as in Message.sanitize_message_id. A word of
> caution, however: I've already encountered cases where multiple messages
> in Gmail (i.e. multiple X-GM-MSGIDs) have the same Message-ID, so they
> would be considered the same message in sup/heliotrope... yeah, that's
> annoying as hell, and I don't know how we can solve this with multiple
> sources.


Thanks, this comment put me on the right track, and I found a way to get the emails from the index using the message id provided by the source. All I need to do is call Message.build_from_source(source, info), where info is the message id provided by the source; in my case, that is the X-GM-MSGID string.
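To make the Message-ID caveat concrete, here is a minimal sketch that groups Gmail messages by a sanitized Message-ID and reports the ones that collide. The sanitize_message_id helper is a hypothetical approximation (drop whitespace and non-ASCII characters, then truncate), not a copy of sup's Message.sanitize_message_id, and the sample ids are made up.

```ruby
# Hypothetical approximation of sup's Message.sanitize_message_id:
# drop whitespace and non-ASCII characters, then truncate.
def sanitize_message_id(mid)
  mid.gsub(/(\s|[^\x00-\x7f])+/, "")[0, 254]
end

# Returns { sanitized Message-ID => [X-GM-MSGID, ...] } for every
# Message-ID shared by more than one Gmail message.
def colliding_message_ids(messages)
  messages
    .group_by { |m| sanitize_message_id(m[:message_id]) }
    .select { |_, group| group.size > 1 }
    .transform_values { |group| group.map { |m| m[:gm_msgid] } }
end

messages = [
  { gm_msgid: "1001", message_id: "abc@example.com" },
  { gm_msgid: "1002", message_id: " abc@example.com" },
  { gm_msgid: "1003", message_id: "xyz@example.com" }
]
p colliding_message_ids(messages)
```

Every group in the result is a set of distinct Gmail messages that would be treated as a single message if they were looked up by Message-ID alone.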

> If you want to sync back, maybe sup could call a source-level "sync_back"
> method with the current known state? Speaking of which, for general
> synchronization we could reuse offlineimap's elegant sync algorithm
> [4]. The idea is basic: have each source class store a snapshot of the
> state. When a message is modified on the source, diff the change against
> the known state and propagate it to sup; when a message is modified in
> sup, diff against the known state and propagate it to the source.


Interesting and simple algorithm. Let me study it a little more and see how it applies to Sup.
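The snapshot-based idea can be sketched as follows. This is a simplified, hypothetical take on offlineimap's approach, not its actual code: each side is a Hash mapping a message id (e.g. the X-GM-MSGID) to a Set of labels, whole label sets are compared instead of individual flag additions and removals, and Gmail wins when a message changed on both sides since the last snapshot.

```ruby
require "set"

# Hypothetical snapshot sync: sup, gmail and snapshot each map
# message id => Set of labels. All three hashes are updated in place.
def sync!(sup, gmail, snapshot)
  # 1. Changes made on Gmail since the snapshot flow to sup.
  gmail.each do |id, labels|
    next if snapshot[id] == labels # unchanged on Gmail
    sup[id] = labels.dup           # new or relabeled on Gmail
    snapshot[id] = labels.dup
  end
  (snapshot.keys - gmail.keys).each do |id| # deleted on Gmail
    sup.delete(id)
    snapshot.delete(id)
  end

  # 2. Remaining changes made in sup since the snapshot flow to Gmail.
  sup.each do |id, labels|
    next if snapshot[id] == labels # unchanged in sup
    gmail[id] = labels.dup         # new or relabeled in sup
    snapshot[id] = labels.dup
  end
  (snapshot.keys - sup.keys).each do |id| # deleted in sup
    gmail.delete(id)
    snapshot.delete(id)
  end
end

snapshot = { "1" => Set["inbox"], "2" => Set["inbox"], "3" => Set["inbox"] }
sup      = { "1" => Set["inbox", "starred"], "2" => Set["inbox"], "3" => Set["inbox"] }
gmail    = { "1" => Set["inbox"], "2" => Set["archive"], "4" => Set["inbox"] }
sync!(sup, gmail, snapshot)
p sup
```

Here message 1 was starred in sup, message 2 was archived on Gmail, message 3 was deleted on Gmail and message 4 is new on Gmail; after the call, both sides and the snapshot agree. Running the Gmail pass first is what makes Gmail win conflicts; a per-flag diff, as offlineimap does, would let non-overlapping changes from both sides merge.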

[1]  http://leveldb.googlecode.com/svn/trunk/doc/benchmark.html

regards,
Horacio

> Just a brain dump.
>
> [0] https://github.com/rakoo/sup/tree/gmail_source
> [1] https://github.com/wmorgan/leveldb-ruby/pull/27
> [2] https://github.com/wmorgan/leveldb-ruby/issues/23
> [3] https://github.com/sup-heliotrope/heliotrope/blob/64d4b50d5649ec616a311a4cf6955137fdaeb13d/lib/heliotrope/zmbox.rb
> [4] http://offlineimap.org/howitworks.html
>
> Regards,
>
> --
> Matthieu Rakotojaona
>
> _______________________________________________
> Sup-devel mailing list
> Sup-devel@rubyforge.org
> http://rubyforge.org/mailman/listinfo/sup-devel

