
Hi folks,

I wonder if anyone out there could help me with something.
I have a folder containing many html files that look like this:

Astronomical Applications Dept.
U.S. Naval Observatory
Washington, DC 20392-5420

ST. PETER'S NOVA SCOTIA
o , o ,
W 60 52, N45 39

Altitude and Azimuth of the Sun
Jun 7, 2008
Zone: 3h West of Greenwich

Altitude Azimuth
(E of N)

h m o o
03:50 -11.7 39.0
03:55 -11.1 40.0
04:00 -10.6 41.0
04:05 -10.0 42.0
04:10 -9.4 43.0
04:15 -8.8 44.0
04:20 -8.2 45.0
04:25 -7.6 46.0
04:30 -6.9 47.0
04:35 -6.3 47.9
04:40 -5.6 48.9
04:45 -5.0 49.8
04:50 -4.3 50.8
04:55 -3.6 51.7
05:00 -2.9 52.6
05:05 -2.2 53.5
05:10 -1.5 54.4
05:15 -0.8 55.3
05:20 0.4 56.2
05:25 1.1 57.1
05:30 1.7 58.0
05:35 2.4 58.9
05:40 3.1 59.8
05:45 3.8 60.6
05:50 4.6 61.5
05:55 5.3 62.4
06:00 6.1 63.2
06:05 6.9 64.1
06:10 7.6 64.9
06:15 8.4 65.8
06:20 9.2 66.6
06:25 10.0 67.4
06:30 10.8 68.3
06:35 11.6 69.1


What I would like to do is remove the header information all the way down to the h m o o line. I then need to add a date to the beginning of each line so that it looks like:

2006-08-15 06:35 11.6 69.1

(The date is already the name of the HTML file.)

I then need to concatenate all these files into one large file that can be imported into a program such as Quattro Pro.
Can anyone give me some assistance with this project? Greatly appreciated.

Last Post by turnerca902

Create the initial test file ...

data = """Astronomical Applications Dept.
U.S. Naval Observatory
Washington, DC 20392-5420

ST. PETER'S NOVA SCOTIA
o , o ,
W 60 52, N45 39

Altitude and Azimuth of the Sun
Jun 7, 2008
Zone: 3h West of Greenwich

Altitude Azimuth
(E of N)

h m o o
03:50 -11.7 39.0
03:55 -11.1 40.0
04:00 -10.6 41.0
04:05 -10.0 42.0
04:10 -9.4 43.0
04:15 -8.8 44.0
04:20 -8.2 45.0
04:25 -7.6 46.0
04:30 -6.9 47.0
04:35 -6.3 47.9
04:40 -5.6 48.9
04:45 -5.0 49.8
04:50 -4.3 50.8
04:55 -3.6 51.7
05:00 -2.9 52.6
"""

fout = open("2006-08-15.dat", "w")
fout.write(data)
fout.close()

Now read the test file back in, process it and write it back out ...

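filename = "2006-08-15"
datafile = filename + ".dat"

# collect every line after the "h m o o" column header,
# prefixed with the date taken from the file name
data_list = []
data_flag = False
with open(datafile) as fin:
    for line in fin:
        if data_flag:
            new_line = filename + " " + line
            data_list.append(new_line)
        if "h m o o" in line:
            data_flag = True

# test it ...
for item in data_list:
    # each item already ends with a newline
    # (Python 3 syntax; on Python 2 use: print item,)
    print(item, end="")

# write a test file ...
with open("test.dat", "w") as fout:
    fout.writelines(data_list)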

"""
my output -->
2006-08-15 03:50 -11.7 39.0
2006-08-15 03:55 -11.1 40.0
2006-08-15 04:00 -10.6 41.0
2006-08-15 04:05 -10.0 42.0
2006-08-15 04:10 -9.4 43.0
2006-08-15 04:15 -8.8 44.0
2006-08-15 04:20 -8.2 45.0
2006-08-15 04:25 -7.6 46.0
2006-08-15 04:30 -6.9 47.0
2006-08-15 04:35 -6.3 47.9
2006-08-15 04:40 -5.6 48.9
2006-08-15 04:45 -5.0 49.8
2006-08-15 04:50 -4.3 50.8
2006-08-15 04:55 -3.6 51.7
2006-08-15 05:00 -2.9 52.6
"""

Wow!
Thank you so so much :)
I really appreciate it!

Now...just because I'm a real beginner, I need to ask:

How do I modify this code to loop through each html file in my c:/sun_py folder and create the new output file in the updated format?


It seems I have another issue to solve before I get into the batch processing...

When I test your code, my test.dat file is blank. Any ideas what might be causing this?


Sorry, please disregard the post about the blank test.dat file. That is all good now.
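
For the batch question above (looping over every file in a folder such as c:/sun_py), here is a minimal sketch of one way it could be done. It reuses the "h m o o" flag idea from the earlier reply. The function names extract_rows and combine_folder and the output name combined.dat are made up for illustration. It also assumes each file is named after its date (for example 2006-08-15.html) and holds plain text like the sample, with nothing but data rows and blank lines after the header; if the real files contain HTML tags after the table, those lines would need extra filtering.

```python
import glob
import os


def extract_rows(path, marker="h m o o"):
    """Return the data lines that follow the marker line in one file,
    each prefixed with the date taken from the file name."""
    # "c:/sun_py/2006-08-15.html" -> "2006-08-15"
    date = os.path.splitext(os.path.basename(path))[0]
    rows = []
    in_data = False
    with open(path) as fin:
        for line in fin:
            if in_data:
                # skip blank lines after the table
                if line.strip():
                    rows.append(date + " " + line.strip() + "\n")
            elif marker in line:
                in_data = True
    return rows


def combine_folder(folder, pattern="*.html", out_name="combined.dat"):
    """Process every matching file in folder (sorted by name, so by date)
    and write all rows to a single output file inside that folder."""
    all_rows = []
    for path in sorted(glob.glob(os.path.join(folder, pattern))):
        all_rows.extend(extract_rows(path))
    with open(os.path.join(folder, out_name), "w") as fout:
        fout.writelines(all_rows)
    return all_rows


# hypothetical usage, with the folder name from the question:
# combine_folder("c:/sun_py")
```

The combined file has one space-separated row per reading, so it should be possible to import it into a spreadsheet as space-delimited text. Sorting the file names keeps the rows in date order as long as the names follow the YYYY-MM-DD pattern.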
