Correct Concept with Urllib

Question

flebber 12 Light Poster

15 Years Ago

Hi

I am continuing with one of my first projects in Python. This module of my larger project will download files (legal, freely published by sporting associations). I am not sure my concepts for how best to do this are right. I have read the urllib docs on python.org, and the syntax itself seems okay.

For example, the first site publishes the information in zip files, each containing 5 csv files. My aim is to save these files to the hard drive as read-only and, in a later module, take the csv files and format them for input to a database (I am new at that as well; it will probably be MySQL, mainly because it has more docs than PostgreSQL). I also aim to have this module check the sites to see whether I have all the current files and download new ones if necessary (but for now just getting the files is my goal).
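A minimal sketch of the "save as read-only, then look inside the zip" step, assuming Python 3 and only the standard library. The function names `save_read_only` and `csv_members` are made up for illustration; nothing here comes from the site itself.

```python
import os
import stat
import zipfile


def save_read_only(data, path):
    """Write the downloaded bytes to path, then drop the write permission."""
    with open(path, "wb") as fh:
        fh.write(data)
    # Read-only for owner, group and others (on Windows this sets the
    # read-only attribute).
    os.chmod(path, stat.S_IREAD | stat.S_IRGRP | stat.S_IROTH)


def csv_members(path):
    """Return the names of the csv files stored inside a zip archive."""
    with zipfile.ZipFile(path) as archive:
        return [name for name in archive.namelist()
                if name.lower().endswith(".csv")]
```

The later database module could then open each csv straight from the archive with `zipfile.ZipFile.open` instead of unpacking everything to disk first.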

The website stores data in the format http://someurl.com/results/year/yymmddr.zip. Results range from 2004 onward; however, I am initially only going to collect from 2006 on. The dates within a month follow no pattern; they are randomish.

My two concepts (and I know both could be wrong) are:

1. I know that the maximum number of days in a month is 31 and the maximum number of months in a year is 12, so my first concept was to generate the dates by counting and check whether each file exists.

so that yy = year
mm = month
dd = day are put into the URL format and I test whether each combination exists. So in my example here
yy = 06
mm = 08
dd = 01
someurl.com/results/2006/060801r.zip
try
if file exists download to filename based on date in specified folder
except
increase dd + 1 - then repeat loop
if dd > 31
then mm + 1
if mm > 12
then year + 1
then use the datetime module to get the current date, to stop it continuing if the incremented date is greater than the current date.
End
Is this a valid concept? Could there be an easier way to use urllib? Other sites I will be using may not follow this format.
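The first concept can be sketched roughly as below. This assumes Python 3 (`urllib.request`), and it lets `datetime` step through real calendar days instead of counting dd up to 31 by hand. `BASE_URL` is the placeholder address from above, and `candidate_urls` and `download_new` are hypothetical names.

```python
import datetime
import os
import urllib.error
import urllib.request

BASE_URL = "http://someurl.com/results"  # placeholder host, not a real site


def candidate_urls(start, end):
    """Yield (date, url) for every calendar day from start to end inclusive."""
    day = start
    while day <= end:
        name = day.strftime("%y%m%d") + "r.zip"  # e.g. 060801r.zip
        yield day, "{}/{}/{}".format(BASE_URL, day.year, name)
        day += datetime.timedelta(days=1)  # handles month and year rollover


def download_new(start, end, folder):
    """Fetch each file that exists on the site and is not saved locally yet."""
    os.makedirs(folder, exist_ok=True)
    for day, url in candidate_urls(start, end):
        target = os.path.join(folder, url.rsplit("/", 1)[-1])
        if os.path.exists(target):
            continue  # already downloaded on an earlier run
        try:
            with urllib.request.urlopen(url) as response:
                data = response.read()
        except urllib.error.HTTPError as err:
            if err.code == 404:
                continue  # nothing published for this day
            raise  # any other HTTP error is a real problem
        with open(target, "wb") as fh:
            fh.write(data)
```

Calling `download_new(datetime.date(2006, 1, 1), datetime.date.today(), "results")` would cover 2006 up to the current date, and running it again later would fetch only the files that are missing.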

2. My second idea. I am not so confident about this one, but it would be more adaptable in the long run.

Use urllib to read the filenames and return them as a list. Then go through the list and download each item, and when all are complete, increment the year by one and read again.

so http://someurl.com/results/year/p.file(read)r.zip
for items in list p download and so on.
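A rough sketch of the second concept, assuming Python 3 and assuming the site serves an HTML index page for each year that links to the zip files. That is not confirmed here, and many servers do not expose such a listing. `ZipLinkParser` and `zip_names` are hypothetical names.

```python
import re
from html.parser import HTMLParser

# Filenames shaped like yymmddr.zip, e.g. 060801r.zip
ZIP_NAME = re.compile(r"\d{6}r\.zip")


class ZipLinkParser(HTMLParser):
    """Collect the filenames of matching zip links found in <a href=...> tags."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for key, value in attrs:
            if key == "href" and value:
                name = value.rsplit("/", 1)[-1]  # drop any leading path
                if ZIP_NAME.fullmatch(name):
                    self.names.append(name)


def zip_names(html_text):
    """Return the zip filenames linked from one page of HTML."""
    parser = ZipLinkParser()
    parser.feed(html_text)
    return parser.names
```

The page text itself would come from something like `urllib.request.urlopen(url).read().decode()`, and each name in the returned list can then be downloaded in turn before moving on to the next year.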

Is either of these concepts valid? Which would you recommend, or would a third option be better?

Thanks



vegaseat 1,735 DaniWeb's Hypocrite

15 Years Ago

Your first concept should be doable.


flebber 12 Light Poster · Answer 1 · 2009-10-04T15:47:25+00:00

My question is definitely wordy. Is it confusing? Where could I clarify it to get some help with which direction is better practice, or more Pythonic, I guess?
