Hi everyone, I am currently building a program that logs into an email account, extracts information such as the date, sender, subject, and number of copies, saves any attachments into a directory, and then counts the number of pages in each PDF. I have already built a program that does all of this (the program is below).

Now I am stuck: I need to put this information into a spreadsheet for accounting, in the following format.

job Date teacher copies pages


with the job number just counting up by 1s,
the date, which is already generated by the output,
the teacher, which is generated by the output,
the number of copies, which is generated by the output, and
the number of pages, which is also generated by the output.

I am fairly new to Python programming and really don't know how to create a spreadsheet with this information. I tried to import xlwt, but I wasn't able to install the module because it kept giving an error when I ran its setup.py on my Mac. Is there anything else I can use to create the spreadsheet I'm looking for?

Another question: since I am working with many different files and emails, and my strings take many different values depending on the number of emails and PDFs I get, how can I put this information into different cells of a spreadsheet?

If anyone could help, it would be greatly appreciated. Feel free to play around with my code and see if you can add to it so that it produces the spreadsheet the way I want. Thank you very much!

import email, getpass, imaplib, os, string, re
from itertools import takewhile
from operator import methodcaller

detach_dir = '/Users/defaultuser/Desktop' # directory where to save attachments (default: current)

m = imaplib.IMAP4_SSL('imap.gmail.com')
m.login('********@*****.com', '*******')
m.list()
# Out: list of "folders" aka labels in gmail.
m.select("inbox") # connect to inbox.


resp, items = m.search(None, "ALL") # you could filter using the IMAP rules here (check http://www.example-code.com/csharp/imap-search-critera.asp)
items = items[0].split() # getting the mails id

for emailid in items:
    resp, data = m.fetch(emailid, "(RFC822)") # fetching the mail, "(RFC822)" means "get the whole stuff", but you can ask for headers only, etc
    email_body = data[0][1] # getting the mail content
    mail = email.message_from_string(email_body) # parsing the mail content to get a mail object
    
    


    #Check if any attachments at all
    if mail.get_content_maintype() != 'multipart':
        continue


    teacher = mail["From"]
    subject = mail["Subject"]
    d = mail["date"]
    date = d[0:16]

You should be able to install xlwt with the pip installer. To do so, type

sudo pip install xlwt

in a terminal (with a working internet connection).

If you don't have pip, then you must install pip first. To do so, type

sudo easy_install pip

If you don't have easy_install, then you must install setuptools first. (I'm assuming you're using Python 2; for Python 3 you can get easy_install by installing distribute.)

Edited 4 Years Ago by Gribouillis: n/a

Comments
thank you so much

OK, thank you so much. I finally got xlwt to work and I am now able to create a spreadsheet. I am trying to put the information I have into the spreadsheet and am having some problems. Basically, I have some strings in my code (date, teacher, copies, pages), each of which takes multiple values, and I need to put each value into a separate cell.

I want the spreadsheet to have

job Date teacher copies pages

as the headings, and under each of them list what my program returns for that string, in separate cells.

This is the code I currently have:

import email, getpass, imaplib, os, string, re
from itertools import takewhile
from operator import methodcaller
import xlwt 

detach_dir = '/Users/defaultuser/Desktop' # directory where to save attachments (default: current)

m = imaplib.IMAP4_SSL('imap.gmail.com')
m.login('******@gmail.com', '*******')
m.list()
# Out: list of "folders" aka labels in gmail.
m.select("inbox") # connect to inbox.


resp, items = m.search(None, "ALL") # you could filter using the IMAP rules here (check http://www.example-code.com/csharp/imap-search-critera.asp)
items = items[0].split() # getting the mails id

for emailid in items:
    resp, data = m.fetch(emailid, "(RFC822)") # fetching the mail, "(RFC822)" means "get the whole stuff", but you can ask for headers only, etc
    email_body = data[0][1] # getting the mail content
    mail = email.message_from_string(email_body) # parsing the mail content to get a mail object
    
    


    #Check if any attachments at all
    if mail.get_content_maintype() != 'multipart':
        continue


    teacher = mail["From"]
    subject = mail["Subject"]
    d = mail["date"]
    date = d[0:16]
    






    for part in mail.walk():
        
        # multipart are just containers, so we skip them
        if part.get_content_maintype() == 'multipart':
            continue
        
        # is this part an attachment ?
        if part.get('Content-Disposition') is None:
            continue

        filename = teacher + subject + ".pdf"
        counter = 1

        

        # if there is no filename, we create one with a counter to avoid duplicates
        if not filename:
            filename = 'part-%03d%s' % (counter, 'bin')
            counter += 1

        att_path = os.path.join(detach_dir, filename)

        #Check if its already there
        if not os.path.isfile(att_path) :
            # finally write the stuff
            fp = open(att_path, 'wb')
            fp.write(part.get_payload(decode=True))
            fp.close()


            
    
    for part in mail.walk():
        # multipart are just containers, so we skip them
        if part.get_content_maintype() == 'multipart':
            continue

        # we are interested only in the simple text messages
        if part.get_content_subtype() != 'plain':
            continue
 
        payload = part.get_payload()
        
        x = payload
        all=string.maketrans('','')
        nodigs=all.translate(all, string.digits)
        copies =  x.translate(all, nodigs)
        print date
        print teacher
        print subject
        print "Number of Copies:" + copies

        

        
    # we use walk to create a generator so we can iterate on the parts and forget about the recursive headache
    



d = r'/Users/defaultuser/Desktop'
totpages = 0
for f in (pf for pf in os.listdir(d) if pf.endswith('.pdf')):
    fn = os.path.join(d,f)
    with open(fn, 'rb') as pdf:
        text = pdf.read()
        pages = int(''.join(takewhile(methodcaller('isdigit'), text[text.rfind('/Count ')+7:].lstrip())))
   
    print('File %s: %i pages' % (f,pages))



book = xlwt.Workbook(encoding="utf-8") 

sheet1 = book.add_sheet("Python Sheet 1") 


sheet1.write(0, 0, "Job")
sheet1.write(0, 1, "Date")
sheet1.write(0, 2, "Teacher")
sheet1.write(0, 3, "Copies")
sheet1.write(0, 4, "Pages")




book.save("python_spreadsheet.xls")

I think you should forget about the spreadsheet at first and print your expected output to the console with 5 entries on each line. Once this works, it will be easy to convert the code to write to a spreadsheet.
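
Here is a rough sketch of what I mean, not tested against your mailbox. The idea is to collect one row per job in a list, print the rows, and only then write the same rows to a sheet. The names make_rows, print_rows, write_rows and RecordingSheet are made up for this example, and so are the sample records. The only thing assumed about the sheet is a write(row, col, value) method, which is how your code already calls sheet1.write. It is written so that it should run on both Python 2 and Python 3.

```python
HEADERS = ("Job", "Date", "Teacher", "Copies", "Pages")


def make_rows(records):
    """Number (date, teacher, copies, pages) tuples as jobs 1, 2, 3, ..."""
    return [(job,) + tuple(rec) for job, rec in enumerate(records, start=1)]


def print_rows(rows):
    """Print the headings, then five entries per line."""
    print(" ".join(HEADERS))
    for row in rows:
        print(" ".join(str(value) for value in row))


def write_rows(sheet, rows):
    """Write the headings in row 0 and one job per row below them.

    `sheet` can be any object with write(row, col, value), for example
    the sheet returned by book.add_sheet(...) in xlwt.
    """
    for col, title in enumerate(HEADERS):
        sheet.write(0, col, title)
    for r, row in enumerate(rows, start=1):
        for col, value in enumerate(row):
            sheet.write(r, col, value)


class RecordingSheet(object):
    """Hypothetical stand-in for an xlwt sheet, so this runs without xlwt."""

    def __init__(self):
        self.cells = {}

    def write(self, row, col, value):
        self.cells[(row, col)] = value


# Made-up records in the order (date, teacher, copies, pages).
records = [
    ("Mon, 01 Jan 2001", "teacher.one@example.com", 30, 4),
    ("Tue, 02 Jan 2001", "teacher.two@example.com", 25, 2),
]
rows = make_rows(records)
print_rows(rows)

sheet = RecordingSheet()
write_rows(sheet, rows)
```

In your script you would append one (date, teacher, copies, pages) tuple to records for each email instead of printing, pass sheet1 to write_rows in place of the stand-in, and then call book.save as you already do.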

I am really stuck because I'm not sure what to do next. My program outputs all the information needed (date, teacher, copies, pages); I just need it to put the information into the spreadsheet.

I am able to make my program output all the entries on one line except for the number of pages. When I try to move the page-count code up in the script so that it prints on the same line as the other entries, it messes up the whole program, which then outputs only one of the emails.

Any advice on how to get all the entries on one line, and how to rearrange my code so that the pages appear on the same line as the other entries?
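
One possible way to get the page count on the same line is to count the pages of a single file inside the email loop, right after that attachment is saved, instead of scanning the whole directory at the end. The sketch below is only an illustration: count_pages and pages_in_file are made-up names, and the logic is the same "/Count" heuristic used in the script above, which is not a real PDF parser and can give wrong answers for some files.

```python
def count_pages(pdf_bytes):
    """Return the number after the last '/Count ' in the data, or 0 if none.

    Same heuristic as the rfind/takewhile line in the script; it can be
    wrong for PDFs whose last /Count is not the document's page total.
    """
    marker = b"/Count "
    pos = pdf_bytes.rfind(marker)
    if pos == -1:
        return 0
    tail = pdf_bytes[pos + len(marker):].lstrip()
    digits = b""
    for i in range(len(tail)):
        ch = tail[i:i + 1]  # slicing keeps this a byte string on Python 2 and 3
        if not ch.isdigit():
            break
        digits += ch
    return int(digits) if digits else 0


def pages_in_file(path):
    """Read one PDF from disk and apply count_pages to its contents."""
    with open(path, "rb") as pdf:
        return count_pages(pdf.read())


# Made-up fragment of a PDF page tree, just to exercise the function.
sample = b"<< /Type /Pages /Count 3 /Kids [4 0 R] >>"
print(count_pages(sample))  # prints 3
```

Inside the attachment loop, after the file has been written to att_path, something like pages = pages_in_file(att_path) would give the count for that one email, so date, teacher, copies and pages are all known at the same point and can be printed together.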
