hi everyone i have made a program which extracts information from an email account, it extracts the date, the sender, number of copies written in the body, and it extracts the pdf file from the email and puts it into a directory, it also counts the number of pages in each pdf.

i have strings setup so that in the python out put its outputs the date, teacher, copies, and pages. since there are more than one email, the uotput returns more than one oupt put, for example if there are two emails it will output two dates, two teachers, two number of copies, and two page number.

how do i inpt this information in a table using xlwt. so that each i can create a spreadsheet which has all this information in it.

i want the spreadsheet to look like this

Job Date Teacher Copies Pages
1 Feb. 5, 2012 Osman Mak 5 1
2 Feb. 10, 2012 Jack deve 3 6

currently my program outputs all this information i just need to put it on a spreadsheet, i am fairly new to python so if i can ge any help it would be greatly appreciated.

thanks!!


hers my the code i currently have:

import email, getpass, imaplib, os, string, re
from itertools import takewhile
from operator import methodcaller
import xlwt 

detach_dir = '/Users/defaultuser/Desktop' # directory where to save attachments (default: current)

m = imaplib.IMAP4_SSL('imap.gmail.com')
m.login('******@gmail.com', '*******')
m.list()
# Out: list of "folders" aka labels in gmail.
m.select("inbox") # connect to inbox.


resp, items = m.search(None, "ALL") # you could filter using the IMAP rules here (check http://www.example-code.com/csharp/imap-search-critera.asp)
items = items[0].split() # getting the mails id

for emailid in items:
    resp, data = m.fetch(emailid, "(RFC822)") # fetching the mail, "`(RFC822)`" means "get the whole stuff", but you can ask for headers only, etc
    email_body = data[0][1] # getting the mail content
    mail = email.message_from_string(email_body) # parsing the mail content to get a mail object
    
    


    #Check if any attachments at all
    if mail.get_content_maintype() != 'multipart':
        continue


    teacher = mail["From"]
    subject = mail["Subject"]
    d = mail["date"]
    date = d[0:16]
    






    for part in mail.walk():
        
        # multipart are just containers, so we skip them
        if part.get_content_maintype() == 'multipart':
            continue
        
        # is this part an attachment ?
        if part.get('Content-Disposition') is None:
            continue

        filename = teacher + subject + ".pdf"
        counter = 1

        

        # if there is no filename, we create one with a counter to avoid duplicates
        if not filename:
            filename = 'part-%03d%s' % (counter, 'bin')
            counter += 1

        att_path = os.path.join(detach_dir, filename)

        #Check if its already there
        if not os.path.isfile(att_path) :
            # finally write the stuff
            fp = open(att_path, 'wb')
            fp.write(part.get_payload(decode=True))
            fp.close()


            
    
    for part in mail.walk():
  # multipart are just containers, so we skip them
        if part.get_content_maintype() == 'multipart':
            continue
 
  # we are interested only in the simple text messages
        if part.get_content_subtype() != 'plain':
            continue
 
        payload = part.get_payload()
        
        x = payload
        all=string.maketrans('','')
        nodigs=all.translate(all, string.digits)
        copies =  x.translate(all, nodigs)
        print date
        print teacher
        print subject
        print "Number of Copies:" + copies

        

        
    # we use walk to create a generator so we can iterate on the parts and forget about the recursive headach
    



d = r'/Users/defaultuser/Desktop'
totpages = 0
for f in (pf for pf in os.listdir(d) if pf.endswith('.pdf')):
    fn = os.path.join(d,f)
    with open(fn, 'rb') as pdf:
        text = pdf.read()
        pages = int(''.join(takewhile(methodcaller('isdigit'), text[text.rfind('/Count ')+7:].lstrip())))
   
    print('File %s: %i pages' % (f,pages))



book = xlwt.Workbook(encoding="utf-8") 

sheet1 = book.add_sheet("Python Sheet 1") 


sheet1.write(0, 0, "Job")
sheet1.write(0, 1, "Date")
sheet1.write(0, 2, "Teacher")
sheet1.write(0, 3, "Copies")
sheet1.write(0, 4, "Pages")




book.save("python_spreadsheet.xls")

Recommended Answers

All 4 Replies

Dear ozzyx123,

I think you are on the right track. Just open your xls book before you start reading your email ( i.e before line 18. and add your headers ( line 119 to line 123 ). Initialize a counter before line 74 to 1 ( i.e. you have written a line to your worksheet. ) and use it to write data to new rows by updating the counter. I think you can put the page reading loop ( line 102 to 110 ) inside the for loop that begins on line 74. This is what I riggeed up ( plz donot copy this code blindly as it may contain errors ) :

# All you code upto line 16
....
book = xlwt.Workbook(encoding="utf-8") 
sheet1 = book.add_sheet("Python Sheet 1") 
sheet1.write(0, 0, "Job")
sheet1.write(0, 1, "Date")
sheet1.write(0, 2, "Teacher")
sheet1.write(0, 3, "Copies")
sheet1.write(0, 4, "Pages")

for emailid in items:
    ....
    ....
    ....

    row = 1
    for part in mail.walk():   # This is line 74 of your code
       ....
       ....
       ....
       ....

       # You may place the page counter code here, so that you can get the pages as well.

       # This is where print datestatements come, instead you can put
       sheet.write( row, 0, str( row ) )    # 'Job'
       sheet.write( row, 1, date )          # 'Date'
       sheet.write( row, 2, teacher )       # 'Teacher'
       sheet.write( row, 3, copies )        # 'Copies'
       sheet.write( row, 4, pages )         # 'Pages'

       row += 1

book.save( 'python_spreadsheet.xls' )

There may be a few errors in the code above, but it should give you a general idea.

so basically i did what was suggested and it works great when its not in the for loop but it only inserts the first output of the python program. but when i put in into the for loop so that inputs all the output into the spreadsheet it gives me an error.
plus i put the page counter loop into the for loop how can i edit the page counter loop so that it doesnt repeat itself twice.


this is my output and heres the error

File jhon makhdoom <ozzyx123@yahoo.com>Osman Test 1.pdf: 1 pages
File jhon makhdoom <ozzyx123@yahoo.com>osman test 2.pdf: 6 pages
Thu, 9 Feb 2012
jhon makhdoom <ozzyx123@yahoo.com>
Osman Test 1
Number of Copies:1
File jhon makhdoom <ozzyx123@yahoo.com>Osman Test 1.pdf: 1 pages
File jhon makhdoom <ozzyx123@yahoo.com>osman test 2.pdf: 6 pages
Fri, 10 Feb 2012
jhon makhdoom <ozzyx123@yahoo.com>
osman test 2
Number of Copies:2

Traceback (most recent call last):
File "/Users/defaultuser/Downloads/nono.py", line 124, in <module>
sheet1.write( row, 0, str( row ) ) # 'Job'
File "/Library/Python/2.7/site-packages/xlwt/Worksheet.py", line 1032, in write
self.row(r).write(c, label, style)
File "/Library/Python/2.7/site-packages/xlwt/Row.py", line 233, in write
StrCell(self.__idx, col, style_index, self.__parent_wb.add_str(label))
File "/Library/Python/2.7/site-packages/xlwt/Row.py", line 152, in insert_cell
raise Exception(msg)
Exception: Attempt to overwrite cell: sheetname=u'Python Sheet 1' rowx=1 colx=0

heres my code rite now

import email, getpass, imaplib, os, string, re
from itertools import takewhile
from operator import methodcaller
import xlwt 

detach_dir = '/Users/defaultuser/Desktop' # directory where to save attachments (default: current)



m = imaplib.IMAP4_SSL('imap.gmail.com')
m.login('*****@gmail.com', '*******')
m.list()
# Out: list of "folders" aka labels in gmail.
m.select("inbox") # connect to inbox.


resp, items = m.search(None, "ALL") # you could filter using the IMAP rules here (check http://www.example-code.com/csharp/imap-search-critera.asp)
items = items[0].split() # getting the mails id

book = xlwt.Workbook(encoding="utf-8") 
sheet1 = book.add_sheet("Python Sheet 1") 
sheet1.write(0, 0, "Job")
sheet1.write(0, 1, "Date")
sheet1.write(0, 2, "Teacher")
sheet1.write(0, 3, "Copies")
sheet1.write(0, 4, "Pages")

for emailid in items:
    resp, data = m.fetch(emailid, "(RFC822)") # fetching the mail, "`(RFC822)`" means "get the whole stuff", but you can ask for headers only, etc
    email_body = data[0][1] # getting the mail content
    mail = email.message_from_string(email_body) # parsing the mail content to get a mail object
    
    


    #Check if any attachments at all
    if mail.get_content_maintype() != 'multipart':
        continue


    teacher = mail["From"]
    subject = mail["Subject"]
    d = mail["date"]
    date = d[0:16]
    






    for part in mail.walk():
        
        # multipart are just containers, so we skip them
        if part.get_content_maintype() == 'multipart':
            continue
        
        # is this part an attachment ?
        if part.get('Content-Disposition') is None:
            continue

        filename = teacher + subject + ".pdf"
        counter = 1

        

        # if there is no filename, we create one with a counter to avoid duplicates
        if not filename:
            filename = 'part-%03d%s' % (counter, 'bin')
            counter += 1

        att_path = os.path.join(detach_dir, filename)

        #Check if its already there
        if not os.path.isfile(att_path) :
            # finally write the stuff
            fp = open(att_path, 'wb')
            fp.write(part.get_payload(decode=True))
            fp.close()


            
    row = 1
    d = r'/Users/defaultuser/Desktop'
    for part in mail.walk():
  # multipart are just containers, so we skip them
        if part.get_content_maintype() == 'multipart':
            continue
 
  # we are interested only in the simple text messages
        if part.get_content_subtype() != 'plain':
            continue
 
        payload = part.get_payload()
        
        x = payload
        all=string.maketrans('','')
        nodigs=all.translate(all, string.digits)
        copies =  x.translate(all, nodigs)

        
        for f in (pf for pf in os.listdir(d) if pf.endswith('.pdf')):
            fn = os.path.join(d,f)
            with open(fn, 'rb') as pdf:
                text = pdf.read()
                pages = int(''.join(takewhile(methodcaller('isdigit'), text[text.rfind('/Count ')+7:].lstrip())))
            print('File %s: %i pages' % (f,pages))


        
        print date
        print teacher
        print subject
        print "Number of Copies:" + copies

        

        
    # we use walk to create a generator so we can iterate on the parts and forget about the recursive headach
    



        sheet1.write( row, 0, str( row ) )    # 'Job'
        sheet1.write( row, 1, date )          # 'Date'
        sheet1.write( row, 2, teacher )       # 'Teacher'
        sheet1.write( row, 3, copies )        # 'Copies'
        sheet1.write( row, 4, pages )         # 'Pages'

        row += 1
 
book.save( 'python_spreadsheet.xls' )

Dear ozzyx123,

I think you are on the right track. Just open your xls book before you start reading your email ( i.e before line 18. and add your headers ( line 119 to line 123 ). Initialize a counter before line 74 to 1 ( i.e. you have written a line to your worksheet. ) and use it to write data to new rows by updating the counter. I think you can put the page reading loop ( line 102 to 110 ) inside the for loop that begins on line 74. This is what I riggeed up ( plz donot copy this code blindly as it may contain errors ) :

# All you code upto line 16
....
book = xlwt.Workbook(encoding="utf-8") 
sheet1 = book.add_sheet("Python Sheet 1") 
sheet1.write(0, 0, "Job")
sheet1.write(0, 1, "Date")
sheet1.write(0, 2, "Teacher")
sheet1.write(0, 3, "Copies")
sheet1.write(0, 4, "Pages")

for emailid in items:
    ....
    ....
    ....

    row = 1
    for part in mail.walk():   # This is line 74 of your code
       ....
       ....
       ....
       ....

       # You may place the page counter code here, so that you can get the pages as well.

       # This is where print datestatements come, instead you can put
       sheet.write( row, 0, str( row ) )    # 'Job'
       sheet.write( row, 1, date )          # 'Date'
       sheet.write( row, 2, teacher )       # 'Teacher'
       sheet.write( row, 3, copies )        # 'Copies'
       sheet.write( row, 4, pages )         # 'Pages'

       row += 1

book.save( 'python_spreadsheet.xls' )

There may be a few errors in the code above, but it should give you a general idea.

Oh sorry... That was my mistake. You need to define row = 1 before the first for loop, i.e before line 28. Also I did notice that an improperly formatted pdf file or just any file with an extension as .pdf will give raise to an error on line number 106. Probably try-expect loop will help you deal with the error.

Traceback (most recent call last):
  File "trial.py", line 106, in <module>
    pages = int(''.join(takewhile(methodcaller('isdigit'), text[text.rfind('/Count ')+7:].lstrip())))
ValueError: invalid literal for int() with base 10: ''

Oh sorry... That was my mistake. You need to define row = 1 before the first for loop, i.e before line 28. Also I did notice that an improperly formatted pdf file or just any file with an extension as .pdf will give raise to an error on line number 106. Probably try-expect loop will help you deal with the error.

Traceback (most recent call last):
  File "trial.py", line 106, in <module>
    pages = int(''.join(takewhile(methodcaller('isdigit'), text[text.rfind('/Count ')+7:].lstrip())))
ValueError: invalid literal for int() with base 10: ''

how can i get rid of the loop in this pdf page counter, since i already have it wrapped in another loop theres no reason to have it in a loop again becuase it just repeats its self twice, heres the code and how can i remove the loop within it, and have the loop before it do the same thing to it?

for f in (pf for pf in os.listdir(d) if pf.endswith('.pdf')):
            fn = os.path.join(d,f)
            with open(fn, 'rb') as pdf:
                text = pdf.read()
                pages = int(''.join(takewhile(methodcaller('isdigit'), text[text.rfind('/Count ')+7:].lstrip())))
   
            print('File %s: %i pages' % (f,pages))
Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.