Re: [sup-devel] new branch: maildir



Excerpts from Mark Alexander's message of 2010-03-25 07:24:59 -0400:
> Excerpts from Rich Lane's message of Thu Mar 25 03:12:57 -0400 2010:
> > This branch makes some drastic changes to how mbox and maildir sources
> > work.
> 
> Thanks for attacking this problem!
> 
> I just took a quick look at the diffs, and I have some concern
> about this line in maildir.rb:
> 
>   Dir[File.join(subdir, '*')].map do |fn|
> 
> I'm worried about the memory usage with some of my maildirs that have
> tens of thousands of files.  Would it be more memory-efficient to
> use Dir.open and Dir.each?  You'd have to filter out "." and "..",
> of course.

Hence the "XXX use less memory" :). I've been doing my testing on a
30k-message maildir, which works fine. My sup scalability target is a
million messages, and memory becomes a concern there. A maildir filename
is about 30 characters, so a million of them is roughly 30 MB of raw
string data before any Ruby overhead.
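
For illustration, here is a minimal sketch of the streaming approach Mark
suggests, using Dir.foreach so that only one filename is held at a time.
The helper name each_maildir_filename is hypothetical and is not part of
the maildir branch. It skips every name that starts with a dot, which
covers "." and ".." and also matches the glob's behaviour of ignoring
hidden files.

```ruby
# Hypothetical helper (not from the maildir branch): yields the path of each
# file in +subdir+ one at a time instead of building an array of all of them.
def each_maildir_filename(subdir)
  # Without a block, return an Enumerator so callers can chain lazily.
  return enum_for(:each_maildir_filename, subdir) unless block_given?

  Dir.foreach(subdir) do |fn|
    # Skips ".", ".." and any other dotfile, like Dir[File.join(subdir, '*')].
    next if fn.start_with?('.')
    yield File.join(subdir, fn)
  end
end
```

This only lowers memory use if the caller also processes names as they
arrive. Collecting the results into an array brings the original cost back.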

The primitives we have are:
- Iterate through filenames in a directory in arbitrary (?) order.
- Check the existence of a single file in a directory.
- Iterate through filenames with a given prefix stored in the index, in
  lexicographical order.
Any more?

Right now I've taken the easiest route, which loads both the filesystem
filenames and the indexed filenames into arrays and diffs them. Iterating
over the index and checking each file's existence won't detect new
messages. Iterating over the filesystem and checking for existence in the
index won't detect deleted messages. A solution would be to do both, but
that seems expensive. It would be good if we could optimize for the case
where most of the maildir messages have already been indexed.
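
The two strategies above can be sketched roughly as follows. This is an
illustration under assumptions, not the code in the branch: the names
diff_filenames and two_pass_diff are hypothetical, and the index lookup and
file-existence check are passed in as plain callables rather than real sup
index calls.

```ruby
# Strategy 1 (the "easiest route"): hold both lists in memory and diff them.
# Returns [added, deleted]: names only on disk, and names only in the index.
def diff_filenames(fs_names, indexed_names)
  added   = fs_names - indexed_names   # on disk but not yet indexed
  deleted = indexed_names - fs_names   # indexed but gone from disk
  [added, deleted]
end

# Strategy 2 ("do both"): two passes, each using a membership check on the
# other side. +in_index+ and +on_disk+ are hypothetical callables that take a
# filename and return true or false.
def two_pass_diff(fs_names, indexed_names, in_index, on_disk)
  added   = fs_names.reject { |fn| in_index.call(fn) }      # pass 1: new messages
  deleted = indexed_names.reject { |fn| on_disk.call(fn) }  # pass 2: deleted messages
  [added, deleted]
end
```

The second version avoids holding both full lists at once when the inputs
are enumerators, but it costs one lookup per filename on each side, which
is the expense mentioned above.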
_______________________________________________
Sup-devel mailing list
Sup-devel@rubyforge.org
http://rubyforge.org/mailman/listinfo/sup-devel