I'm pretty new to Python, so I'm still finding my feet.

I've written a small FTP program that downloads a file, but it's quite basic: it takes the last element from the directory listing and downloads it.

What is the easiest way to add some logic so that it only downloads a file if it has a .jpa extension and is the newest file in the list?

The code is:

#!/usr/bin/python

    #### IMPORT MODULES ####

import ftplib
import string
import os

    #### DEFINE VARIABLES ####

ftpserver = "??????????"
username = "??????????"
password = "??????????"
localdir = "??????????"

    #### SET EMPTY LIST ####

data = []

    #### CHANGE LOCAL DIRECTORY #####

os.chdir(localdir)

    #### CONNECT TO FTP SERVER ####

ftp = ftplib.FTP(ftpserver)
ftp.login(username, password)

    #### GET LIST OF FILES ####

ftp.dir(data.append)

    #### GET LAST FILENAME ####

data1 = data[-1:]
data2 = "".join(data1)
data3 = data2.split()
data4 = data3[-1:]
dstfile = "".join(data4)


    #### DOWNLOAD FILE ####

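try:
        ftp.retrbinary('RETR '+dstfile, open(dstfile, 'wb').write)
except ftplib.all_errors:
        print "Error transferring", dstfile

    #### GRACEFULLY CLOSE FTP CONNECTION ####

# quit() says goodbye to the server and closes the socket;
# calling close() first would make quit() fail.
ftp.quit()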

Thanks,


"Newest file within the list" is problematic. If you look at the output of ftp.dir() you will see a list of platform-specific lines. You will have to parse each line in a way that is appropriate to that platform, and use that information either to sort the list or just to keep track of the currently youngest item. While doing that, you can simply discard items that do not meet your specification using, perhaps, something.endswith('.jpa').
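For what it's worth, here is a minimal sketch of that parsing step, written for Python 3. The sample lines are made up to look like a Unix-style listing (your server may print something quite different), and jpa_names is just an illustrative helper name. It assumes the file name is the last whitespace-separated field, so it would break on names containing spaces.

```python
# Hypothetical sample of what ftp.dir() might hand back on a Unix-style server.
sample_lines = [
    "-rw-r--r--   1 user group   1048576 Jan 05 10:13 site-website-20110105-101303.jpa",
    "-rw-r--r--   1 user group      2048 Jan 06 09:00 notes.txt",
    "-rw-r--r--   1 user group   1048576 Jan 11 04:02 site-www.website-20110111-040219.jpa",
]


def jpa_names(lines):
    """Return the last field of every listing line whose name ends in .jpa."""
    names = []
    for line in lines:
        fields = line.split()  # split() also copes with blank lines
        if fields and fields[-1].endswith(".jpa"):
            names.append(fields[-1])
    return names


print(jpa_names(sample_lines))
```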

I've changed the code and it now returns a list of .jpa files:

['site-website-20110105-101303.jpa', 'site-website-20110106-163312.jpa', 'site-website-20110108-094822.jpa', 'site-website-20110108-095834.jpa', 'site-website-20110109-203148.jpa', 'site-website-20110110-040326.jpa', 'site-www.website-20110109-040402.jpa', 'site-www.website-20110111-040219.jpa']

Is there an easy way of sorting these by their integers?

You can use the built-in function sorted(). You need a function to extract the key from the filename. Here is one that depends on the exact format of the name.

import datetime
import os
def datetimeKeyFromSpecificFileName(name):
  """split up the name and return a datetime.datetime instance"""
  # format is site-website-yyyymmdd-hhmmss.jpa
  junk, junk, daypart,timepart = os.path.splitext(name)[0].split('-')
  year = int(daypart[:4])
  month = int(daypart[4:6])
  day = int(daypart[-2:])
  hour = int(timepart[:2])
  minute = int(timepart[2:4])
  second = int(timepart[-2:])
  return datetime.datetime(year,month,day,hour,minute,second)

Then your sorted list is sorted(yourlist,key=datetimeKeyFromSpecificFileName)
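If you would rather not slice out each field by hand, a shorter variant is possible with datetime.strptime. This is only a sketch for Python 3, and it assumes every name ends in yyyymmdd-hhmmss followed by a single extension; datetime_key is a made-up name.

```python
import datetime
import os


def datetime_key(name):
    """Parse the trailing yyyymmdd-hhmmss stamp of a file name into a datetime."""
    stem = os.path.splitext(name)[0]  # drop the ".jpa" extension
    stamp = stem[-15:]                # the 15 characters "yyyymmdd-hhmmss"
    return datetime.datetime.strptime(stamp, "%Y%m%d-%H%M%S")


names = [
    "site-website-20110105-101303.jpa",
    "site-website-20110110-040326.jpa",
    "site-www.website-20110109-040402.jpa",
    "site-www.website-20110111-040219.jpa",
]

for name in sorted(names, key=datetime_key):
    print(name)
```

Because the stamp is taken from the end of the name, the extra dot in site-www.website does not get in the way.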

mmmmmm not bad

Looks a little overcomplicated to me. We do not need datetime objects.

import string
def only_numbers(x):
    res= int(x.replace('-','').strip(string.letters + string.punctuation))
#     print res
    return res

files = ('site-website-20110105-101303.jpa', 'site-website-20110108-094822.jpa', 'site-website-20110108-095834.jpa', 'site-website-20110109-203148.jpa',
            'site-website-20110110-040326.jpa', 'site-website-20110106-163312.jpa', 'site-www.website-20110109-040402.jpa', 'site-www.website-20110111-040219.jpa')
print('Files sorted:\n\t' + '\n\t'.join(sorted(files, key = only_numbers)))

Once I saw tonyjv's better solution, I realized it can be simplified even more: There is no need to cast the digits to an int: The ISO 8601 format sorts correctly as a string. You can even leave the dash:

def key_part(x):
  return x.strip(string.letters + string.punctuation)

I don't know whether casting to an int after a replace costs more or less than comparing the strings. I'm sure it doesn't matter for this small problem.
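A quick way to convince yourself that the two keys agree, sketched for Python 3 (where string.letters is spelled string.ascii_letters). It relies on the stamps being fixed width and zero padded; int_key and str_key are illustrative names.

```python
import string

STRIP_CHARS = string.ascii_letters + string.punctuation


def int_key(name):
    """Digits only, compared as one big integer."""
    return int(name.replace("-", "").strip(STRIP_CHARS))


def str_key(name):
    """The yyyymmdd-hhmmss part, compared as a plain string."""
    return name.strip(STRIP_CHARS)


files = (
    "site-website-20110105-101303.jpa", "site-website-20110108-094822.jpa",
    "site-website-20110108-095834.jpa", "site-website-20110109-203148.jpa",
    "site-website-20110110-040326.jpa", "site-website-20110106-163312.jpa",
    "site-www.website-20110109-040402.jpa", "site-www.website-20110111-040219.jpa",
)

# Both keys should put the files in the same order.
print(sorted(files, key=int_key) == sorted(files, key=str_key))
```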

Well how about this then?
;)

alis="""
'site-website-20110105-101303.jpa', 'site-website-20110106-163312.jpa', 
'site-website-20110108-094822.jpa', 'site-website-20110108-095834.jpa',
 'site-website-20110109-203148.jpa', 'site-website-20110110-040326.jpa', 
 'site-www.website-20110109-040402.jpa', 'site-www.website-20110111-040219.jpa' """.split(",")

alis.sort()
info=([(x.split("-")[2:3],([z.split(".")[0] for z in x.split("-")[3:4]])) for x in alis])

Almost, but no banana:
first: your string alis doesn't support the sort() method
edit: Whoops. Missed the trailing split(','). Sorry.

second: 'site-www.website' sorts after 'site-website' ... and I suspect that 'site' and 'website' are both stand-ins for actual strings that OP didn't choose to share (which is good: simplifies the question).

I actually saw that possibility, but thought that learning to sort numbers lexically might confuse the OP more if he has little experience. Also, the numbers seem to sit at a fixed position counted from the end of the name, so a simple slice with negative indexes would suffice as the key. I did not like such a fixed and blind solution.

I think, besides the problem of newbie readability....

There is nothing major wrong with this solution.
Anyway thanks guys.
:)

OK, here is the simplest version then:

files = ('site-website-20110105-101303.jpa', 'site-website-20110108-094822.jpa',
         'site-website-20110108-095834.jpa', 'site-website-20110109-203148.jpa',
         'site-website-20110110-040326.jpa', 'site-website-20110106-163312.jpa',
         'site-www.website-20110109-040402.jpa', 'site-www.website-20110111-040219.jpa')

print('Files sorted:\n\t' + '\n\t'.join(sorted(files, key = lambda x:x[-19:-4])))

Thanks guys...
Can anyone explain how the previous example works?

Sure.

  • '\n\t'.join(an_iterable) works by using the leading string as glue to join the elements of the iterable. So this prints each element on its own line, tab indented.
  • sorted(an_iterable, key=func) works by returning a new list, sorted using func to derive a key from each element of the original iterable.
  • lambda x: x[-19:-4] makes an unnamed function of one argument that returns the first 15 of the last 19 chars of the arg.

Note: join requires that all the elements of the iterable it is given are strings.
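Taking the one-liner apart may help. The sketch below (Python 3) runs the same three pieces one at a time on the sample names; the variable names key, ordered and glued are only there for illustration.

```python
files = (
    "site-website-20110105-101303.jpa", "site-website-20110108-094822.jpa",
    "site-website-20110108-095834.jpa", "site-website-20110109-203148.jpa",
    "site-website-20110110-040326.jpa", "site-website-20110106-163312.jpa",
    "site-www.website-20110109-040402.jpa", "site-www.website-20110111-040219.jpa",
)

# 1. The lambda: drop the last 4 chars (".jpa") and keep the 15 before them.
key = lambda x: x[-19:-4]
print(key(files[0]))  # the date-time part of the first name

# 2. sorted(): a new list, ordered by what the key function returns.
ordered = sorted(files, key=key)

# 3. join(): glue the names together with newline + tab between them.
glued = "\n\t".join(ordered)
print("Files sorted:\n\t" + glued)
```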

OK, thanks, even though I'm still not too sure I fully understand how it works.

It is pretty straightforward if you just think about it in pieces.

  • In your case, you don't want to actually print all the names, so the outer print function is not what you need. Tony used it to show that the data ended up correctly sorted.
  • For the same reason, the join function is also not needed.
  • Any list is iterable, so the sorted function can be used to return a sorted list when given a list.
  • If you just chant sorted(somelist) then you get back a list that was sorted using the entire element as the key. That doesn't work for your case because the prefix part (site-xxx) would 'overpower' the date-time part.
  • To avoid using the entire item as a key, you can pass a function that derives a key from the item. We have shown you several possible functions that will do the job on the data you gave as examples.
  • lambda is syntactic sugar for a normal function definition. Programmers prefer short lambda functions because they are local, anonymous and scoped:
    • (local): As you read the program, you can easily see exactly what was intended without hunting around for the function definition (assuming you can parse the lambda syntax, of course).
    • (anonymous): You don't need to remember its name and can't use it elsewhere.
    • (scoped): Same as a nested function.
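To make the 'overpower' point concrete, here is a small Python 3 sketch comparing a plain sort with a keyed one. It uses a named function, stamp (a made-up name), in place of the lambda to show that the two are interchangeable. Since only the newest file is wanted, max() with the same key is one possible shortcut.

```python
names = [
    "site-website-20110105-101303.jpa", "site-website-20110106-163312.jpa",
    "site-website-20110108-094822.jpa", "site-website-20110108-095834.jpa",
    "site-website-20110109-203148.jpa", "site-website-20110110-040326.jpa",
    "site-www.website-20110109-040402.jpa", "site-www.website-20110111-040219.jpa",
]


def stamp(name):
    """Does the same job as lambda x: x[-19:-4], but has a name."""
    return name[-19:-4]


by_whole_name = sorted(names)        # the prefix decides the order first
by_stamp = sorted(names, key=stamp)  # only the date-time part is compared

print(by_whole_name == by_stamp)     # the two orderings differ for this data
newest = max(names, key=stamp)       # no need to sort just to get one item
print(newest)
```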

Thanks for all your help, guys. I've added the finished program below:

#!/usr/bin/python
#
# Usage : Download latest .jpa file from FTP Server.

import ftplib                                                           ## IMPORT MODULES
import string                                                           ## IMPORT MODULES
import os                                                               ## IMPORT MODULES
import re                                                               ## IMPORT MODULES

ftpserver = "????"                                                     ## ASSIGN VARIABLE
username = "????"                                                      ## ASSIGN VARIABLE
password = "????"                                                      ## ASSIGN VARIABLE
localdir = "????"                                                       ## ASSIGN VARIABLE

data = []                                                               ## SET EMPTY LIST
list2 = []                                                              ## SET EMPTY LIST

os.chdir(localdir)                                                      ## CHANGE LOCAL DIRECTORY

ftp = ftplib.FTP(ftpserver)                                             ## SET FTPLIB MODULE TO FTPSERVER + ASSIGN TO FTP
ftp.login(username, password)                                           ## LOG INTO FTP SERVER

ftp.dir(data.append)                                                    ## GET DIRECTORY LISTING + ASSIGN TO LIST

                                                                        ## CONVERT DIR LISTING LINES TO WORDS FOR EACH ELEMENT
data = " ".join(data)                                                   ## CONVERT DATA LIST INTO STR
data = data.split()                                                     ## CONVERT DATA STR INTO LIST

y = re.compile('.*\.jpa$')                                              ## SET REGEX MATCH CRITERIA

for x in data:                                                          ## LOOP THROUGH DATA LIST AND APPEND ANY ELEMENT WITH DEFINED STR TO LIST2.
        if y.match(x):
                list2.append(x)

dstfile = (' '.join(sorted(list2, key = lambda x:x[-19:-4])))           ## SORT LIST2 BY NUMERIC DIGITS WITHIN EACH ELEMENT + ASSIGN TO STR
dstfile = dstfile.split()                                               ## SPLIT STR TO LIST
dstfile = dstfile[-1]                                                   ## ASSIGN LAST ELEMENT OF LIST TO STR

try:
        ftp.retrbinary('RETR '+dstfile, open(dstfile, 'wb').write)      ## GET FILE
except ftplib.all_errors:                                               ## PRINT EXCEPTION ERROR
        print "ERROR TRANSFERRING", dstfile

ftp.quit()                                                              ## CLOSE FTP CONNECTION GRACEFULLY (QUIT ALSO CLOSES THE SOCKET)

Glad you got it going.
Couple of comments:

  • At lines 26 and 27, 35 and 36 you seem to be spinning your wheels by first doing a join, then a split. Why? I'm guessing it helps with newlines?? in which case explore the rstrip() string member function: Certainly easier to understand, probably more efficient. For instance, at line 32, you could say if y.match(x.rstrip()): and something similar at line 37. The strip functions come in three flavors: strip() operates on both ends, lstrip() and rstrip() on only the left or right end of the string respectively.
  • I was about to suggest using a with statement to protect the ftp connection, but it turns out the ftplib in Python 2 is too ancient to support it (that only arrived in Python 3.3). Instead, to be nice to the server, you want to wrap the use of your ftp connection in try...finally blocks so even if something, ahem, exceptional happens, your clean-up code is run:
    ftp = None
    try:
        ftp = ftplib.FTP(ftpserver)
        # ... all the work is done here
    # the except block is optional, but often very helpful
    except Exception, x:
        print "Yeek:", type(x), x
    # the finally block is always run, even if an exception happens
    finally:
        if ftp:
            try:
                ftp.quit()   # polite goodbye; this also closes the socket
            except ftplib.all_errors:
                ftp.close()  # connection already broken, so just drop it
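For anyone reading this on Python 3, the same clean-up idea could be wrapped in a small function, roughly as below. This is only a sketch: download and its parameters are made-up names, and it assumes it is handed a connection that is already logged in. Opening the local file in a with block also makes sure the file is closed, which the one-line open(...).write version leaves to chance.

```python
import ftplib


def download(ftp, remote_name, local_path):
    """Fetch remote_name into local_path, then hang up whatever happens."""
    try:
        with open(local_path, "wb") as out:
            ftp.retrbinary("RETR " + remote_name, out.write)
    finally:
        try:
            ftp.quit()       # polite goodbye; also closes the socket
        except ftplib.all_errors:
            ftp.close()      # connection already gone, just drop it
```

A call might look like download(ftplib.FTP(ftpserver, username, password), dstfile, dstfile), with the names taken from the program above.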

A meta comment on your comments: "Way overkill", though it isn't wrong, and may be good if it helps you keep track of things. However I do suggest that you use a single '#' for most comments; use normal capitalization, so the comments don't seem to YELL AT YOU; and almost always, put the comment either on a line by itself (as a comment about the next block of code), or only one or two spaces after the end of the line (as a comment about that line only). This is the way most of the rest of us do it. The rule of thumb is that your code should be so blindingly obvious that comments are not needed (self documenting code)... or if not, then a short comment to explain your trick or subtlety. After all, it is the code itself that does the work, so the code should be what your eye is naturally drawn to.

Finally, I want to remind you that you can keep DaniWeb a little cleaner and more functional and slightly boost the reputation of the folks who helped you if you hunt down the 'solved' link at the bottom of the page and click it... after the thread is indeed solved, of course. Only the OP (Original Poster) has access to that link/button.


Thanks. I definitely agree with your spinning-the-wheels comment, but I'm sure my code will get cleaner over time. As for the comments, they were purely a precaution in case I forget any of the commands or functions while I pick up the basics.

Anyways thanks once again for all your efforts ...
