Hi folks,

I wonder if anyone out there could help me with something.
I have a folder containing many html files that look like this:

Astronomical Applications Dept.
U.S. Naval Observatory
Washington, DC 20392-5420

ST. PETER'S NOVA SCOTIA
o , o ,
W 60 52, N45 39

Altitude and Azimuth of the Sun
Jun 7, 2008
Zone: 3h West of Greenwich

Altitude Azimuth
(E of N)

h m o o
03:50 -11.7 39.0
03:55 -11.1 40.0
04:00 -10.6 41.0
04:05 -10.0 42.0
04:10 -9.4 43.0
04:15 -8.8 44.0
04:20 -8.2 45.0
04:25 -7.6 46.0
04:30 -6.9 47.0
04:35 -6.3 47.9
04:40 -5.6 48.9
04:45 -5.0 49.8
04:50 -4.3 50.8
04:55 -3.6 51.7
05:00 -2.9 52.6
05:05 -2.2 53.5
05:10 -1.5 54.4
05:15 -0.8 55.3
05:20 0.4 56.2
05:25 1.1 57.1
05:30 1.7 58.0
05:35 2.4 58.9
05:40 3.1 59.8
05:45 3.8 60.6
05:50 4.6 61.5
05:55 5.3 62.4
06:00 6.1 63.2
06:05 6.9 64.1
06:10 7.6 64.9
06:15 8.4 65.8
06:20 9.2 66.6
06:25 10.0 67.4
06:30 10.8 68.3
06:35 11.6 69.1


What I would like to do is remove the header information all the way down to the h m o o line. I then need to add a date to the beginning of each line so that it looks like:

2006-08-15 06:35 11.6 69.1

(The date is already the name of the HTML file.)

I then need to concatenate all these files into one large file that can be imported into a program such as Quattro Pro.
Can anyone give me some assistance with this project? Greatly appreciated.

Create the initial test file ...

data = """Astronomical Applications Dept.
U.S. Naval Observatory
Washington, DC 20392-5420

ST. PETER'S NOVA SCOTIA
o , o ,
W 60 52, N45 39

Altitude and Azimuth of the Sun
Jun 7, 2008
Zone: 3h West of Greenwich

Altitude Azimuth
(E of N)

h m o o
03:50 -11.7 39.0
03:55 -11.1 40.0
04:00 -10.6 41.0
04:05 -10.0 42.0
04:10 -9.4 43.0
04:15 -8.8 44.0
04:20 -8.2 45.0
04:25 -7.6 46.0
04:30 -6.9 47.0
04:35 -6.3 47.9
04:40 -5.6 48.9
04:45 -5.0 49.8
04:50 -4.3 50.8
04:55 -3.6 51.7
05:00 -2.9 52.6
"""

fout = open("2006-08-15.dat", "w")
fout.write(data)
fout.close()

Now read the test file back in, process it and write it back out ...

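filename = "2006-08-15"
datafile = filename + ".dat"

data_list = []
data_flag = False
# "with" closes the file automatically when the block ends
with open(datafile) as fin:
    for line in fin:
        # past the header: prefix each non-blank data line with the date
        if data_flag and line.strip():
            new_line = filename + " " + line
            data_list.append(new_line)
        # the "h m o o" line is the last line of the header
        if "h m o o" in line:
            data_flag = True

# test it (each line already ends with a newline) ...
for item in data_list:
    print(item, end="")

# write a test file ...
with open("test.dat", "w") as fout:
    fout.writelines(data_list)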

"""
my output -->
2006-08-15 03:50 -11.7 39.0
2006-08-15 03:55 -11.1 40.0
2006-08-15 04:00 -10.6 41.0
2006-08-15 04:05 -10.0 42.0
2006-08-15 04:10 -9.4 43.0
2006-08-15 04:15 -8.8 44.0
2006-08-15 04:20 -8.2 45.0
2006-08-15 04:25 -7.6 46.0
2006-08-15 04:30 -6.9 47.0
2006-08-15 04:35 -6.3 47.9
2006-08-15 04:40 -5.6 48.9
2006-08-15 04:45 -5.0 49.8
2006-08-15 04:50 -4.3 50.8
2006-08-15 04:55 -3.6 51.7
2006-08-15 05:00 -2.9 52.6
"""

Wow!
Thank you so so much :)
I really appreciate it!

Now...just because I'm a real beginner, I need to ask:

How do I modify this code to loop through each html file in my c:/sun_py folder and create the new output file in the updated format?

It seems I have another issue to solve before I get into the batch processing...

When I test your code, my test.dat file is blank. Any ideas what might be causing this?

Sorry, please disregard the post about the blank test.dat file. That is all good now.
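
For the batch question above (looping over every file in the c:/sun_py folder and concatenating the results), here is a minimal sketch rather than a tested solution. It assumes the files are named like 2006-08-15.html, that each one contains the "h m o o" line exactly as shown in the sample, and that only data rows follow it. If the real files have HTML tags after the table, those lines would need extra filtering. The function names, the "*.html" pattern and the output name combined.dat are made up for illustration.

```python
import glob
import os


def extract_rows(path, marker="h m o o"):
    """Return the data lines after the marker line, each prefixed with
    the file's base name (assumed to be the date, e.g. 2006-08-15)."""
    date = os.path.splitext(os.path.basename(path))[0]
    rows = []
    in_data = False
    with open(path) as fin:
        for line in fin:
            # past the header: keep every non-blank line
            if in_data and line.strip():
                rows.append(date + " " + line.strip() + "\n")
            # the marker is the last header line
            if marker in line:
                in_data = True
    return rows


def combine_folder(folder, pattern="*.html", outname="combined.dat"):
    """Process every matching file in folder and write one combined file."""
    all_rows = []
    # YYYY-MM-DD file names sort chronologically, so sorted() orders by date
    for path in sorted(glob.glob(os.path.join(folder, pattern))):
        all_rows.extend(extract_rows(path))
    with open(os.path.join(folder, outname), "w") as fout:
        fout.writelines(all_rows)
    return all_rows
```

Calling combine_folder("c:/sun_py") would then write c:/sun_py/combined.dat with one "date time altitude azimuth" row per line, which a spreadsheet program should be able to import as space-delimited text.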
