I am continuing with one of my first projects in Python. This module of my larger project will download files (legal, freely published by sporting associations). I am not sure whether my concepts for how best to do this are right. I have read the urllib docs on python.org, and the syntax itself seems okay.

For example, the first site publishes the information in zip files, each containing 5 CSV files. My aim is to save these files to the hard disk as read-only and, in a later module, take the CSVs and format them for input to a database (I am new at that as well; it will probably be MySQL, mainly because it has more docs than PostgreSQL). I also aim to have this current module check sites to see whether I have all the current files and download new files if necessary (but for now just getting the files is my goal).
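A minimal sketch of the "save as read-only, look inside later" step, using only the standard library. The function names `csv_members` and `make_read_only` are made up for illustration, and Python 3 is assumed.

```python
import os
import stat
import zipfile


def csv_members(zip_path):
    """Return the names of the CSV files inside one downloaded archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return [name for name in archive.namelist()
                if name.lower().endswith(".csv")]


def make_read_only(path):
    """Clear the write permission bits and leave the other bits unchanged."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
```

The later database module could call `csv_members` to find the files to load without having to unpack the archive first.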

The website stores data in the format http://someurl.com/results/year/yymmddr.zip. Results range from 2004 onward; however, I am initially only going to collect from 2006 on. The dates within a month do not follow a regular pattern; they are randomish.
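The URL pattern above can be built from a `datetime.date` with a format string. This is only a sketch: `BASE_URL` is the placeholder host from the post, `result_url` is a made-up name, and it assumes the "year" path segment is the four-digit year, which the post does not actually say.

```python
from datetime import date

BASE_URL = "http://someurl.com/results"  # placeholder host from the post


def result_url(day):
    """Build <base>/<year>/<yymmdd>r.zip for one date."""
    # Assumption: the directory is the four-digit year (e.g. 2006).
    return f"{BASE_URL}/{day.year}/{day:%y%m%d}r.zip"


print(result_url(date(2006, 8, 1)))
# http://someurl.com/results/2006/060801r.zip
```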

My two concepts (and I know both could be wrong) are:

1. Since a month has at most 31 days and a year has 12 months, my first concept is to generate every candidate date and check whether a file exists for it.

With yy = year, mm = month and dd = day, I would build each URL for urllib and test whether it exists. So in my example here:
yy = 06
mm = 08
dd = 01
if the file exists, download it to a filename based on the date, in a specified folder
increase dd by 1, then repeat the loop
if dd > 31
then mm + 1 and reset dd to 01
if mm > 12
then yy + 1 and reset mm to 01
Then use the datetime module to get the current date, and stop once the incremented date is later than the current date.
Is this a valid concept? Is there an easier way to do this with urllib? Other sites I will be using may not follow this format.
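One way the first concept could look in Python 3 (where the calls live in `urllib.request`) is sketched below. Stepping a `datetime.date` by one day with `timedelta` avoids the manual day/month/year rollover and never produces impossible dates such as 31 February. The names `dates_between` and `download_results` are made up, the four-digit year directory is an assumption, and treating a 404 response as "no file for that date" is also an assumption about how the site behaves.

```python
import urllib.error
import urllib.request
from datetime import date, timedelta
from pathlib import Path


def dates_between(start, end):
    """Yield every calendar date from start to end, inclusive."""
    day = start
    while day <= end:
        yield day
        day += timedelta(days=1)


def download_results(start, end, folder,
                     base_url="http://someurl.com/results"):
    """Try each date in the range; save files that exist, skip the rest."""
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)
    saved = []
    for day in dates_between(start, end):
        name = f"{day:%y%m%d}r.zip"
        target = folder / name
        if target.exists():
            continue  # already downloaded on an earlier run
        # Assumption: the directory segment is the four-digit year.
        url = f"{base_url}/{day.year}/{name}"
        try:
            with urllib.request.urlopen(url, timeout=30) as response:
                target.write_bytes(response.read())
        except urllib.error.HTTPError as err:
            if err.code == 404:
                continue  # assumed to mean: no results for this date
            raise
        saved.append(target)
    return saved


# Example call (makes network requests, so it is left commented out):
# download_results(date(2006, 1, 1), date.today(), "results")
```

Because files that already exist locally are skipped, running the same function again later would also cover the "check for new files" goal.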

2. My second idea. I am not so confident about this one, but it would be more adaptable in the long run.

Use urllib to read the filenames and return them as a list. Then go through the list and download each item; when all are complete, increment the year by one and read again.

So http://someurl.com/results/year/p.file(read)r.zip
for each item in list p, download it, and so on.
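The second concept only works if the site exposes some page that lists the files, for example an HTML index for each year directory. That is an assumption; the post does not say whether such a page exists. Under that assumption, a sketch of pulling the zip names out of the page's links with the standard library's `html.parser` (the names `LinkCollector` and `zip_names` are made up):

```python
import re
from html.parser import HTMLParser

# Filenames of the form yymmddr.zip, as described in the post.
ZIP_NAME = re.compile(r"\d{6}r\.zip")


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def zip_names(html):
    """Return the yymmddr.zip filenames linked from an index page."""
    parser = LinkCollector()
    parser.feed(html)
    names = []
    for link in parser.links:
        filename = link.rsplit("/", 1)[-1]  # drop any leading path
        if ZIP_NAME.fullmatch(filename):
            names.append(filename)
    return names
```

The HTML text would come from reading the index page with `urllib.request.urlopen`. Each returned name can then be downloaded directly, with no guessing at dates.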

Is either of these concepts valid? Which would you recommend, or would a third option be better?


All 2 Replies

My question is definitely wordy. Is it confusing? Where could I clarify it to get some help with which direction is better practice, or more Pythonic, I guess?

Your first concept should be doable.
