Build your own mail analyzer for Mac Mail.app


Build your own mail analyzer for Mac Mail.app

You’ve probably read about things like Xoopit and Xobni for analyzing both online mail and your outlook mail. As it turns out, Apple has done something great in this regard that I think has been mostly overlooked. Mail.app stores all of the meta-data for you email in a file called ~/Library/Mail/Envelope Index. You might wonder what the format of this file is… well it is a SQLite3 database. The contents are pretty easy to see, go to the terminal and type:

macpro:~ sam$ sqlite3 ~/Library/Mail/Envelope\ Index
SQLite version 3.6.3
Enter “.help” for instructions
Enter SQL statements terminated with a “;”
sqlite> 


Everything about your mailboxes is stored within this database and the structure of the database is normalized so its very easy to navigate. The tables of most interest for mail analysis are:

sqlite> .tables
addresses mailboxes todo_notes
alarms messages todos
associations properties todos_deleted_log
attachments recipients todos_server_snapshot
calendars subjects
feeds threads

Fortunately, accessing a SQLite database is quite easy from just about any language that you decide to use. I’m just going to do all the queries in straight sqlite3 rather than a language, but they could be embedded in your application. First things first, copy your Envelope Index to another directory:

macpro:tmp sam$ cp ~/Library/Mail/Envelope\ Index .

Now you can use that database without worrying about messing up the locking or corrupting data while Mail.app is using it. Since we might as well do an example that is interesting rather than merely educational, how about we answer the question: “Who are my coworkers with whom that I collaborated?”. This is going to be a multi-query process to extract the information — there may be more efficient ways to do it — but think of this as instructive rather than prescriptive. First I need to limit the query to only those mailboxes which contain work email:

DROP TABLE coworkermailboxes;
CREATE TABLE coworkermailboxes(id);
CREATE INDEX coworkermailboxes_index ON coworkermailboxes(id);
INSERT INTO coworkermailboxes SELECT rowid FROM mailboxes
WHERE
url like 'imap://samp@snv-webmail.corp.yahoo.com/%' OR
url like 'imap://sam@mail.sampullara.com/Yahoo%20Inc%20Archive';

That gives us a table with several mailboxes that I have in Mail.app including Sent Messages. I would peruse the list of mailboxes to ensure that you are grabbing all the correct information. For me I had to also search my archives. Now I am going to take a series of steps to get to the final out put by iteratively processing successive tables of information. The first table, is a list of those people that you have both sent and received an email with directly (they were the sender and you were a receiver or you were the sender and they were the receiver):

DROP TABLE coworkers;
CREATE TABLE coworkers(id);
CREATE INDEX coworkers_index ON coworkers(id);
INSERT INTO coworkers SELECT a.rowid FROM addresses a, messages m, recipients r
WHERE
m.sender = a.rowid AND
m.mailbox IN (SELECT id FROM coworkermailboxes) AND
r.message_id = m.rowid AND r.address_id = 4
INTERSECT
SELECT a.rowid FROM addresses a, messages m, recipients r
WHERE
m.sender = 4 AND
m.mailbox IN (SELECT id FROM coworkermailboxes) AND
r.message_id = m.rowid AND r.address_id = a.rowid
;

Note I have directly inserted my addresses rowid into this query for the sender on the one hand and the receiver on the other. The next step will be to count the actual number of emails you have received from each of those on the list:

DROP TABLE coworkers2;
CREATE TABLE coworkers2(id, recv);
CREATE INDEX coworkers2_index ON coworkers2(id);
SELECT "Get the received mail";
INSERT INTO coworkers2 SELECT w.id, COUNT(*) FROM messages m, recipients r, coworkers w
WHERE m.sender = 4 AND
m.mailbox IN (SELECT id FROM coworkermailboxes) AND
r.message_id = m.rowid AND r.address_id = w.id
GROUP BY w.id ORDER BY COUNT(*)
;

Finally, we count the number of sent emails and also derive a ratio of sent/received so we can judge how collaborative the exchanges have been:

DROP TABLE coworkers3;
CREATE TABLE coworkers3(id, sent float, recv float, ratio float);
CREATE INDEX coworkers3_index ON coworkers3(id);
SELECT "Get the sent mail";
INSERT INTO coworkers3 SELECT w.id, COUNT(*), w.recv, COUNT(*)*1.0/w.recv FROM messages m, recipients r, coworkers2 w
WHERE
m.sender = w.id AND
m.mailbox IN (SELECT id FROM coworkermailboxes) AND
r.message_id = m.rowid AND r.address_id = 4
GROUP BY w.id ORDER BY COUNT(*)
;

You will now have a table named coworkers3 that can be mined for information about your level of correspondence with them. For example, here is way to find relatively equal sends and receives:

SELECT a.comment FROM addresses a, coworkers3 w
WHERE
a.rowid = w.id AND
ratio >= .5 AND
ratio <=2 AND
sent > 10
ORDER BY sent
LIMIT 20;

When I do this I see the people that either I use to find information or that use me to find information. Every interaction is usually a request and then a response. On the other hand, this query will find those that typically made announcements out to the groups that I also worked with:

SELECT a.comment FROM addresses a, coworkers3 w
WHERE
a.rowid = w.id AND
ratio <= 1 AND
sent > 10
ORDER BY ratio
LIMIT 20;

And so on. Adding more filters on top of this you could easily derive your team at work for a particular time period and other insights. With the wealth of information contained in this meta-data store you could figure out all kinds of things:

  • Who sent you an email that you didn’t reply to yet?
  • Who do you respond to the most quickly?
  • Who responds to you most quickly?
  • What are you and your coworkers approximate working hours?
  • What groups of CCs could be made into aliases?

There really is no limit to how far the analysis could go. Ideally, it would be possible to setup a dashboard in Mail.app that let you cut and slice the data in a far more precise way than smart folders currently allow today. Maybe they should come out with super-sql-smart folders!