Regular expressions for sorting all files into folders?

Question

npn_ 0 Newbie Poster

15 Years Ago

Hello,

I've been downloading my news for offline use with this script:

wget -r --no-parent -Q4096m -U Mozilla -A.stm -erobots=off http://news.bbc.co.uk/2/hi/business/default.stm

But it dumps everything into one folder, and the file names are just numbers. Is there a regular expression or other command I can use to separate them into folders by date?

python



slate 241 Posting Whiz in Training · Answer 1 · 2009-12-05T05:12:39+00:00

I would look at the wget manual first.
If that does not help, you can open each file, extract the publication date from its OriginalPublicationDate meta tag, and move the file wherever you want.

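import os
import re
from os.path import join

# Capture the content of the OriginalPublicationDate meta tag.
date_re = re.compile(r'<meta name="OriginalPublicationDate" content="(.*?)" />')

_dir = "news.bbc.co.uk/2/hi/business/"
for f in os.listdir(_dir):
    if f.endswith(".stm"):
        # errors="replace" keeps an odd byte in an old page from stopping the loop
        with open(join(_dir, f), encoding="utf-8", errors="replace") as fh:
            st = fh.read()
        # search() finds the tag anywhere in the page
        ob = date_re.search(st)
        if ob:
            print(f, ob.group(1))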

This prints:

8348437.stm 2009/11/07 16:34:40
8317828.stm 2009/10/21 08:09:31
8253047.stm 2009/09/13 08:55:16
7879565.stm 2009/02/09 16:31:28
8375969.stm 2009/11/24 22:15:41
8388133.stm 2009/12/01 10:42:11
8370035.stm 2009/11/20 13:26:04
4372794.stm 2006/01/31 21:07:28
8063149.stm 2009/05/22 09:35:28
8365018.stm 2009/11/23 23:44:45
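
To finish what the question asks for, the extracted date can be used as a folder name and each file moved there. What follows is only a minimal sketch, not something tested against the live BBC pages. It assumes Python 3, the 'YYYY/MM/DD HH:MM:SS' date layout seen in the output above, and a destination folder called by_date. The names folder_for, sort_by_date and by_date are made up for illustration.

```python
import os
import re
import shutil
from datetime import datetime

# Same meta tag as in the snippet above; the capture group is the date string.
DATE_RE = re.compile(r'<meta name="OriginalPublicationDate" content="(.*?)" />')


def folder_for(html):
    """Return a folder name like '2009-11-07', or None if no usable date is found."""
    m = DATE_RE.search(html)
    if not m:
        return None
    try:
        # Assumption: dates look like '2009/11/07 16:34:40', as in the output above.
        stamp = datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S")
    except ValueError:
        return None
    return stamp.strftime("%Y-%m-%d")


def sort_by_date(src_dir, dest_root):
    """Move every .stm file in src_dir into dest_root/<date>/ (hypothetical helper)."""
    moved = {}
    for name in sorted(os.listdir(src_dir)):
        path = os.path.join(src_dir, name)
        if not (name.endswith(".stm") and os.path.isfile(path)):
            continue
        with open(path, encoding="utf-8", errors="replace") as fh:
            folder = folder_for(fh.read())
        if folder is None:
            continue  # files without a recognisable date stay where they are
        target_dir = os.path.join(dest_root, folder)
        os.makedirs(target_dir, exist_ok=True)
        shutil.move(path, os.path.join(target_dir, name))
        moved[name] = folder
    return moved


if __name__ == "__main__":
    src = "news.bbc.co.uk/2/hi/business/"
    if os.path.isdir(src):
        for name, folder in sort_by_date(src, "by_date").items():
            print(name, "->", folder)
```

Using one folder per day is just one choice; changing the strftime format to "%Y-%m" would group the files by month instead.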