Hello,

I've been downloading my news for offline use with this script:

wget -r --no-parent -Q4096m -U Mozilla -A.stm -erobots=off http://news.bbc.co.uk/2/hi/business/default.stm

But it dumps everything into one folder, and the file names are just numbers. Is there a regular expression or other command I can use to separate the files into folders by date?

I would look up the wget manual.
If that does not help, you can peek into the file, extract the date from it, and put the file wherever you want.

import os
import re
from os.path import join

# Capture the content of the OriginalPublicationDate meta tag.
s = re.compile(r'<meta name="OriginalPublicationDate" content="(.*?)" />')

_dir = "news.bbc.co.uk/2/hi/business/"
for f in os.listdir(_dir):
    if f.endswith("stm"):
        # errors="replace" keeps odd bytes in the HTML from raising an error
        with open(join(_dir, f), errors="replace") as fh:
            st = fh.read()
        ob = s.search(st)
        if ob:
            print(f, ob.group(1))

This prints:

8348437.stm 2009/11/07 16:34:40
8317828.stm 2009/10/21 08:09:31
8253047.stm 2009/09/13 08:55:16
7879565.stm 2009/02/09 16:31:28
8375969.stm 2009/11/24 22:15:41
8388133.stm 2009/12/01 10:42:11
8370035.stm 2009/11/20 13:26:04
4372794.stm 2006/01/31 21:07:28
8063149.stm 2009/05/22 09:35:28
8365018.stm 2009/11/23 23:44:45
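
To go from printing the dates to actually filing the pages, something like the following might work. It is only a sketch: the function name sort_by_date and the YYYY-MM-DD folder naming are my own choices, and it assumes every page carries the same OriginalPublicationDate meta tag in the YYYY/MM/DD format shown above.

```python
import os
import re
import shutil

# Assumes the date is written as YYYY/MM/DD, as in the output above.
DATE_RE = re.compile(
    r'<meta name="OriginalPublicationDate" content="(\d{4})/(\d{2})/(\d{2})'
)


def sort_by_date(src_dir):
    """Move each .stm file in src_dir into a YYYY-MM-DD subfolder.

    Returns a dict mapping each moved file name to its new folder.
    (Hypothetical helper, not part of wget or any library.)
    """
    moved = {}
    for name in os.listdir(src_dir):
        path = os.path.join(src_dir, name)
        if not (name.endswith(".stm") and os.path.isfile(path)):
            continue
        with open(path, errors="replace") as fh:
            match = DATE_RE.search(fh.read())
        if not match:
            continue  # no date found: leave the file where it is
        # Build a folder name such as 2009-11-07 from the captured groups.
        target_dir = os.path.join(src_dir, "-".join(match.groups()))
        os.makedirs(target_dir, exist_ok=True)
        shutil.move(path, os.path.join(target_dir, name))
        moved[name] = target_dir
    return moved
```

Calling sort_by_date("news.bbc.co.uk/2/hi/business/") would then file each page under a folder for its publication day. Note that moving the files can break the relative links between the downloaded pages; if that matters, swapping shutil.move for shutil.copy2 keeps the originals in place.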
