Hello Developers,

I am a beginner in Python and need help writing a regular expression to fetch dates and times from HTML documents. In the following code I walk through the HTML files in a folder called event and print the h1 headings using BeautifulSoup. These HTML pages also contain dates and times in different formats, and I want to fetch and display this information as well. The date formats in these HTML documents are:

21 - 27 Nov 2012
1 Dec 2012
30 Nov - 2 Dec 2012
26 Nov 2012

Can someone help me fetch these formats from the HTML documents?
Here is my code for walking through the files:

import re
import os
from bs4 import BeautifulSoup

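for subdir, dirs, files in os.walk("/home/himanshu/event/"):
    for fle in files:
        path = os.path.join(subdir, fle)
        # Close each file after parsing and name the parser explicitly
        with open(path) as f:
            soup = BeautifulSoup(f, "html.parser")

        # Skip pages that have no <h1> tag
        if soup.h1 is not None:
            print(soup.h1.string)

        # Date and time detection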

For the first type, 21 - 27 Nov 2012:

import re
s = "abcd efgh 44 - 88 Dec 2012 xyz"
pat = r'\d{1,2} - \d{1,2} \w{3} \d{4}'
print(re.search(pat, s).group())  # 44 - 88 Dec 2012

For 1 Dec 2012 and 26 Nov 2012:
pat = r'\d{1,2} \w{3} \d{4}'

For 30 Nov - 2 Dec 2012:
pat = r'\d{1,2} \w{3} - \d{1,2} \w{3} \d{4}'
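
If it is more convenient to test one pattern instead of three, the formats above could probably be folded into a single expression. This is only a sketch: the names MONTH, DAY, DATE_RE and find_dates are made up for the example, and it assumes the month is always a three-letter abbreviation and the separators are single spaces, as in the four samples. It uses [A-Za-z]{3} for the month because \w{3} would also accept digits and underscores.

```python
import re

# Assumption: months are three-letter abbreviations such as "Nov" or "Dec".
MONTH = r"[A-Za-z]{3}"
DAY = r"\d{1,2}"

# Hypothetical combined pattern:
# day, optional month, optional "- day", then month and four-digit year.
DATE_RE = re.compile(
    rf"\b{DAY}(?: {MONTH})?(?: - {DAY})? {MONTH} \d{{4}}\b"
)


def find_dates(text):
    """Return every date or date range found in text, in order."""
    return DATE_RE.findall(text)


sample = ("Fair 21 - 27 Nov 2012, talk 1 Dec 2012, "
          "camp 30 Nov - 2 Dec 2012, demo 26 Nov 2012.")
print(find_dates(sample))
# ['21 - 27 Nov 2012', '1 Dec 2012', '30 Nov - 2 Dec 2012', '26 Nov 2012']
```

Inside the loop from the question, something like find_dates(soup.get_text()) should then return the dates for each page, provided a date is not split across several tags.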

Edited 4 Years Ago by rrashkin

Hi rrashkin, thanks for your reply, but I don't understand the following line of code:

s="abcd efgh 44 - 88 Dec 2012 xyz"

What is it actually doing?

That was just setting a string with the target pattern to demonstrate that the search would actually find it. In your case, you would be reading in the string so it's unnecessary.
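
To make the difference concrete, here is a rough sketch in which the string comes from a file instead of being typed in. The helper name first_date_range and the throwaway demo.html file are invented for the illustration, and it searches the raw file contents with the first pattern only, so treat it as a starting point and not a finished solution.

```python
import os
import re
import tempfile

pat = r'\d{1,2} - \d{1,2} \w{3} \d{4}'


def first_date_range(path):
    """Read the file at path and return the first 'd - d Mon yyyy' match, or None."""
    with open(path, encoding="utf-8") as handle:
        s = handle.read()  # here s is read in, not hard-coded
    match = re.search(pat, s)
    return match.group(0) if match else None


# Demonstration with a temporary file standing in for one of the event pages.
with tempfile.TemporaryDirectory() as folder:
    demo_path = os.path.join(folder, "demo.html")
    with open(demo_path, "w", encoding="utf-8") as handle:
        handle.write("<h1>Book fair</h1><p>21 - 27 Nov 2012</p>")
    print(first_date_range(demo_path))  # 21 - 27 Nov 2012
```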

