Hi,
I am trying to write an app to automatically reorganize my mail (Thunderbird).
The first thing I want to do is recreate my folder tree, once for each year.
So I need to read mbox files and rewrite them.

import mailbox

mbx = mailbox.mbox("./in_mbox")
mbx.lock()
of = open("out_mbox", "w")
for k, m in mbx.iteritems():
    of.write(m.as_string())
mbx.unlock()
of.close()

My problem is that during this process I lose the "From " separator line that precedes each mail in the original file.

in_mbox :

From - Mon Jun 16 08:54:05 2008
X-Account-Key: account2
X-UIDL: 919-1206101190
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00000000
Return-path: <adress@prv.com>
Received: from ...

out_mbox :

X-Account-Key: account2
X-UIDL: 919-1206101190
X-Mozilla-Status: 0001
X-Mozilla-Status2: 00000000
Return-path: <adress@prv.com>
Received: from ...

The "From " line is missing.

I tried everything I could find on the email.Message object (get_all, get_unixfrom...) but couldn't find a solution.

Does anyone know what I have missed?
Thanks.


As it is generated by Thunderbird, I have no way to change its format in my situation.
Unless I skip retrieving the "From " line and generate one myself (which should work but is not very clean).

Also, mbox is not one well-defined format but a family of variants:
http://homepage.ntlworld.com./jonathan.deboynepollard/FGA/mail-mbox-formats.html

Thanks for your links...
I knew this one but not the previous one.
In fact, with my little piece of code I can read a Thunderbird file and navigate through the mails, so the most important part is done.
My only problem is that I can't retrieve the data stored in the "From " line: the mailbox.mbox class recognises it but doesn't seem to expose it.

As far as I can see, the mailbox.mbox and email.Message modules won't give me the answer, so I'll see whether I can subclass the mbox class to deal with Thunderbird's format or, if that's too complicated for me, I'll create a brand new "From " line in the same format (but not with exactly the same data).
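
For the fallback idea, here is a minimal sketch of building a new separator line in the same layout as the one shown in in_mbox ("From - " followed by a date). The helper name make_from_line is hypothetical, and using local time is an assumption; I have not checked which clock Thunderbird uses for that field.

```python
import time

def make_from_line(timestamp=None):
    """Return a separator such as 'From - Mon Jun 16 08:54:05 2008'.

    make_from_line is a hypothetical helper. timestamp is seconds since the
    epoch; None means "now". Local time is an assumption.
    """
    # time.asctime() yields the 24-character 'Mon Jun 16 08:54:05 2008' layout.
    return "From - " + time.asctime(time.localtime(timestamp))

print(make_from_line())
```

The resulting line would be written before each message, followed by a newline. The date is new, so it will not match the original line.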

Any help is welcome.
Thanks for having taken time to read my posts and help.

If you got something usable, you might publish it in the code snippets.
Google cannot answer this question...

By exploring the mailbox module code, I finally found what I was looking for.
It is the get_from() method of mboxMessage, which is also mentioned in the module documentation...
So it works all right now.
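
A small sketch of what get_from() gives back, assuming a separator line like the one in in_mbox. The sample message and the temporary path are made up for illustration.

```python
import mailbox
import os
import tempfile

# Build a tiny one-message mbox so the example is self-contained.
sample = (
    "From - Mon Jun 16 08:54:05 2008\n"
    "X-Mozilla-Status: 0001\n"
    "Subject: test\n"
    "\n"
    "body\n"
)
path = os.path.join(tempfile.mkdtemp(), "in_mbox")
with open(path, "w") as f:
    f.write(sample)

mbx = mailbox.mbox(path)
msg = mbx[0]                # an mboxMessage, not a plain email Message
from_rest = msg.get_from()  # the text that followed "From " in the file
mbx.close()

print(repr(from_rest))      # '- Mon Jun 16 08:54:05 2008'
```

So the mbox class keeps the separator's content on the message object itself rather than in the headers, which would explain why get_all() did not turn it up.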

I don't want to convert my mbox files because what I want to do is:
- automatically reorganize my mails (processed with Thunderbird) by creating one folder per year
- automatically detach attached files and replace each with an HTML file that keeps a link to it, to reduce the mailbox size.

So, in the end, I want my mailbox to have exactly the same format as before.
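
The first goal could be sketched roughly as below. This is only an outline under assumptions: the year is taken from each mail's Date header (mails without a usable one go to a file named "unknown"), and the function names and the one-file-per-year layout are hypothetical. It relies on mailbox.mbox.add(), which writes the "From " line of an mboxMessage from its get_from() value.

```python
import mailbox
import os
from email.utils import parsedate_to_datetime

def message_year(msg):
    """Return the year from the Date header, or None if it is missing or bad."""
    date_header = msg["Date"]
    if date_header is None:
        return None
    try:
        return parsedate_to_datetime(str(date_header)).year
    except (TypeError, ValueError):
        return None

def split_by_year(src_path, dest_dir):
    """Copy every message of src_path into dest_dir/<year> (hypothetical layout)."""
    os.makedirs(dest_dir, exist_ok=True)
    outputs = {}  # year name -> open mailbox.mbox
    src = mailbox.mbox(src_path, create=False)
    try:
        for msg in src:
            year = message_year(msg)
            name = str(year) if year is not None else "unknown"
            if name not in outputs:
                outputs[name] = mailbox.mbox(os.path.join(dest_dir, name))
            # add() emits "From " + msg.get_from() before the message itself.
            outputs[name].add(msg)
    finally:
        for out in outputs.values():
            out.close()  # flushes pending writes to disk
        src.close()
    return sorted(outputs)

# Example call (paths are placeholders):
# split_by_year("./in_mbox", "./by_year")
```

If a year file already exists, the messages are appended to it, so running this twice duplicates mails.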

Thanks

commented: good! post your mbox parsing code when it's finished! +4

So, the working code would be :

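import mailbox

mbx = mailbox.mbox("./in_mbox")
mbx.lock()
try:
    with open("out_mbox", "w") as of:
        for k, m in mbx.iteritems():
            # get_from() returns the text that followed "From " in the source file
            of.write("From %s\n" % m.get_from())
            of.write(m.as_string())
            # keep a blank line between mails, as in the original file
            of.write("\n")
finally:
    # release the lock even if writing fails
    mbx.unlock()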