Hi folks,

I believe this is a simple task, but I'm having problems getting things to work the way I would expect them to.
What I have is a master file containing about 1000 lines.
Each line looks like this:
2008_06_01 07:55 24.8 83.1

What I need to do is extract all the lines that have a value between 30 and 45 in the column where you currently see "24.8."
I am dealing with both positive and negative values (I need to accept -30 and 30, -31 and 31, and so on).
Once I have read through the file and selected all the records with the values in the desired range, I need to write them to a new output file.
Can anyone suggest a method for doing this?

Any assistance is greatly appreciated.

Here is one example of how to approach this:

"""
assume your test data file (test_data.txt) looks like this
2008_06_01 07:55 24.8 83.1
2008_06_02 07:55 31.2 85.7
2008_06_03 07:55 33.4 86.1
2008_06_04 07:55 -30.8 81.6
2008_06_05 07:55 25.7 82.3
"""

new_list = []
# read the master file and keep lines whose third column
# has an absolute value in the 30-45 range
with open("test_data.txt") as fin:
    for line in fin:
        data_list = line.split()
        # item at index 2 is your critical data
        temp = float(data_list[2])
        # select values whose magnitude falls between 30 and 45
        if 30.0 <= abs(temp) <= 45.0:
            # collect the selected lines
            new_list.append(line)

# save the selection to file test_data2.txt
with open("test_data2.txt", "w") as fout:
    fout.writelines(new_list)

"""
my output of test_data2.txt from editor --->
2008_06_02 07:55 31.2 85.7
2008_06_03 07:55 33.4 86.1
2008_06_04 07:55 -30.8 81.6
"""

Oh, perfect!
Thanks a lot for your help on this.

My code had some of these elements (not in the right sequence), and you really clarified where I was making a mistake.

Thanks again.

There is another part to this task that I wonder if you might have the time to look at as well.

Now that I have my output file with the desired range of values, I need to mess with the organization a bit.

So, the file looks like this:
2008_06_01 08:25 30.1 88.2
2008_06_01 08:30 30.9 89.1
2008_06_01 08:35 31.8 90.0
2008_06_01 08:40 32.7 90.9
2008_06_01 08:45 33.6 91.8
2008_06_01 08:50 34.4 92.7
2008_06_01 08:55 35.3 93.6
2008_06_01 09:00 36.2 94.6
2008_06_01 09:05 37.0 95.5
2008_06_01 09:10 37.9 96.5
2008_06_01 09:15 38.8 97.5
2008_06_01 09:20 39.6 98.4
2008_06_01 09:25 40.5 99.5
2008_06_01 09:30 41.4 100.5
2008_06_01 09:35 42.2 101.5
2008_06_01 09:40 43.1 102.6
2008_06_01 09:45 43.9 103.7
2008_06_01 09:50 44.8 104.8
2008_06_01 16:15 44.4 255.8
2008_06_01 16:20 43.6 256.9
2008_06_01 16:25 42.7 257.9
2008_06_01 16:30 41.9 259.0
2008_06_01 16:35 41.0 260.0
2008_06_01 16:40 40.1 261.0
2008_06_01 16:45 39.3 262.0
2008_06_01 16:50 38.4 263.0
2008_06_01 16:55 37.6 264.0
2008_06_01 17:00 36.7 264.9
2008_06_01 17:05 35.8 265.9
2008_06_01 17:10 34.9 266.8
2008_06_01 17:15 34.1 267.8
2008_06_01 17:20 33.2 268.7
2008_06_01 17:25 32.3 269.6
2008_06_01 17:30 31.5 270.5
2008_06_01 17:35 30.6 271.3
2008_06_02 08:25 30.1 88.1
2008_06_02 08:30 31.0 88.9
2008_06_02 08:35 31.9 89.8
2008_06_02 08:40 32.7 90.7
2008_06_02 08:45 33.6 91.6
2008_06_02 08:50 34.5 92.6
2008_06_02 08:55 35.4 93.5
2008_06_02 09:00 36.2 94.4
2008_06_02 09:05 37.1 95.4
2008_06_02 09:10 38.0 96.3
2008_06_02 09:15 38.8 97.3
2008_06_02 09:20 39.7 98.3
2008_06_02 09:25 40.6 99.3
2008_06_02 09:30 41.4 100.3
2008_06_02 09:35 42.3 101.4
2008_06_02 09:40 43.1 102.4
2008_06_02 09:45 44.0 103.5
2008_06_02 09:50 44.8 104.6
2008_06_02 16:15 44.5 255.9
2008_06_02 16:20 43.7 256.9
2008_06_02 16:25 42.8 258.0
2008_06_02 16:30 42.0 259.1
2008_06_02 16:35 41.1 260.1
2008_06_02 16:40 40.3 261.1
2008_06_02 16:45 39.4 262.1
2008_06_02 16:50 38.5 263.1
2008_06_02 16:55 37.7 264.1
2008_06_02 17:00 36.8 265.0
2008_06_02 17:05 35.9 266.0
2008_06_02 17:10 35.1 266.9
2008_06_02 17:15 34.2 267.8
2008_06_02 17:20 33.3 268.7
2008_06_02 17:25 32.4 269.6
2008_06_02 17:30 31.6 270.5
2008_06_02 17:35 30.7 271.4
2008_06_03 08:25 30.2 87.9
2008_06_03 08:30 31.0 88.8
2008_06_03 08:35 31.9 89.7
2008_06_03 08:40 32.8 90.6
2008_06_03 08:45 33.7 91.5
2008_06_03 08:50 34.5 92.4
2008_06_03 08:55 35.4 93.3
2008_06_03 09:00 36.3 94.3
2008_06_03 09:05 37.2 95.2
2008_06_03 09:10 38.0 96.2
2008_06_03 09:15 38.9 97.2
2008_06_03 09:20 39.8 98.1
2008_06_03 09:25 40.6 99.1
2008_06_03 09:30 41.5 100.2
2008_06_03 09:35 42.3 101.2
2008_06_03 09:40 43.2 102.3
2008_06_03 09:45 44.0 103.3
2008_06_03 09:50 44.9 104.4
2008_06_03 16:15 44.6 255.9
2008_06_03 16:20 43.8 257.0
2008_06_03 16:25 42.9 258.1
2008_06_03 16:30 42.1 259.2
2008_06_03 16:35 41.2 260.2
2008_06_03 16:40 40.4 261.2
2008_06_03 16:45 39.5 262.2
2008_06_03 16:50 38.6 263.2
2008_06_03 16:55 37.8 264.2
2008_06_03 17:00 36.9 265.1
2008_06_03 17:05 36.0 266.1
2008_06_03 17:10 35.2 267.0
2008_06_03 17:15 34.3 267.9


Now, what I need to do is create an output file that states, for a given day, the time the sun first enters the window (@30) and the time it leaves the window (@45). On some days, there are two windows.

So the output file should have columns something like:

Day Sunin_Time Sunout_time Sunin_Time2 Sunout_time2

This type of thing is a bit (read: a lot) over my head, so I am grateful for any and all help out there from you Python folks.

Thanks.

Do you mean the earliest and latest times for one day in the file? That should be fairly straightforward (pseudocode; you will actually have to convert the times to minutes to compare them):
min_time = 99:99
max_time = 00:00
if time < min_time: min_time = time
elif time > max_time: max_time = time
Of course you will have to test for date != previous_date, but that's the general idea. If you are instead asking about the third column, the same principle is used to find the minimum and maximum, possibly with an if 30 <= value <= 45 check thrown in.
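To make the convert-to-minutes step above concrete, here is a minimal sketch (the helper name to_minutes and the sample times are mine):

```python
def to_minutes(hhmm):
    # convert an "HH:MM" string to minutes past midnight
    # so times compare numerically instead of as strings
    h, m = hhmm.split(":")
    return int(h) * 60 + int(m)

times = ["09:50", "08:25", "17:35", "16:15"]
min_time = min(times, key=to_minutes)  # earliest time of the day
max_time = max(times, key=to_minutes)  # latest time of the day
# min_time -> "08:25", max_time -> "17:35"
```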

I'm confused about the nature of the problem. Are you asking "When on June 1, 2008 did the sun enter the window @30 and leave the window @45?"

'Cause with the data you have, the sun never leaves that window...

Jeff

My apologies for not being totally clear; I should have included a bit more background.

So, what I am looking for is the first time of the day that the sun goes above 30 degrees, and I want to record this time in my first column (SUNIN_1). On 2008_06_01, this occurs at 8:25 in the morning, so SUNIN_1 should have 8:25 as its first record.
And then I want to record the last time that the sun is below 45 degrees in this same arc. On 2008_06_01 this would be at 9:50. (SUNOUT_1)
The sun then re-enters the window at 16:15, so this value would be recorded in SUNIN_2.
The sun stays in the second window until 17:35, so 17:35 would be the last column in the table : SUNOUT_2.

I actually have a lot more data than the sample I included. If it were just 3-4 days, I would record the values manually, but since the data covers a couple of months, it would be really great to accomplish this programmatically.

Woooee, I see what you are saying about the mintime/maxtime thing, but how would I make this work on the same values at two different times on the same day?

Thanks for the suggestions so far.

If I understand correctly, this should come close. I break it down into data for each day, and then into the specific range, for however many times that range occurs in a day.

#!/usr/bin/python

input_file = [
'2008_06_01 08:25 30.1 88.2',
'2008_06_01 08:30 30.9 89.1',
'2008_06_01 08:35 31.8 90.0',
'2008_06_01 08:40 32.7 90.9',
'2008_06_01 08:45 33.6 91.8',
'2008_06_01 08:50 34.4 92.7',
'2008_06_01 08:55 35.3 93.6',
'2008_06_01 09:00 36.2 94.6',
'2008_06_01 09:05 37.0 95.5',
'2008_06_01 09:10 37.9 96.5',
'2008_06_01 09:15 38.8 97.5',
'2008_06_01 09:20 39.6 98.4',
'2008_06_01 09:25 40.5 99.5',
'2008_06_01 09:30 41.4 100.5',
'2008_06_01 09:35 42.2 101.5',
'2008_06_01 09:40 43.1 102.6',
'2008_06_01 09:45 43.9 103.7',
'2008_06_01 09:50 44.8 104.8',
'2008_06_01 10:00 46.0 105.0',
'2008_06_01 16:15 44.4 255.8',
'2008_06_01 16:20 43.6 256.9',
'2008_06_01 16:25 42.7 257.9',
'2008_06_01 16:30 41.9 259.0',
'2008_06_01 16:35 41.0 260.0',
'2008_06_01 16:40 40.1 261.0',
'2008_06_01 16:45 39.3 262.0',
'2008_06_01 16:50 38.4 263.0',
'2008_06_01 16:55 37.6 264.0',
'2008_06_01 17:00 36.7 264.9',
'2008_06_01 17:05 35.8 265.9',
'2008_06_01 17:10 34.9 266.8',
'2008_06_01 17:15 34.1 267.8',
'2008_06_01 17:20 33.2 268.7',
'2008_06_01 17:25 32.3 269.6',
'2008_06_01 17:30 31.5 270.5',
'2008_06_01 17:35 30.6 271.3',
'2008_06_02 08:25 30.1 88.1',
'2008_06_02 08:30 31.0 88.9',
'2008_06_02 08:35 31.9 89.8',
'2008_06_02 08:40 32.7 90.7',
'2008_06_02 08:45 33.6 91.6',
'2008_06_02 08:50 34.5 92.6',
'2008_06_02 08:55 35.4 93.5',
'2008_06_02 09:00 36.2 94.4',
'2008_06_02 09:05 37.1 95.4',
'2008_06_02 09:10 38.0 96.3',
'2008_06_02 09:15 38.8 97.3',
'2008_06_02 09:20 39.7 98.3',
'2008_06_02 09:25 40.6 99.3',
'2008_06_02 09:30 41.4 100.3',
'2008_06_02 09:35 42.3 101.4',
'2008_06_02 09:40 43.1 102.4',
'2008_06_02 09:45 44.0 103.5',
'2008_06_02 09:50 44.8 104.6',
'2008_06_02 10:00 46.0 105.0',
'2008_06_02 16:15 44.5 255.9',
'2008_06_02 16:20 43.7 256.9',
'2008_06_02 16:25 42.8 258.0',
'2008_06_02 16:30 42.0 259.1',
'2008_06_02 16:35 41.1 260.1',
'2008_06_02 16:40 40.3 261.1',
'2008_06_02 16:45 39.4 262.1',
'2008_06_02 16:50 38.5 263.1',
'2008_06_02 16:55 37.7 264.1',
'2008_06_02 17:00 36.8 265.0',
'2008_06_02 17:05 35.9 266.0',
'2008_06_02 17:10 35.1 266.9',
'2008_06_02 17:15 34.2 267.8',
'2008_06_02 17:20 33.3 268.7',
'2008_06_02 17:25 32.4 269.6',
'2008_06_02 17:30 31.6 270.5',
'2008_06_02 17:35 30.7 271.4',
'2008_06_03 08:25 30.2 87.9',
'2008_06_03 08:30 31.0 88.8',
'2008_06_03 08:35 31.9 89.7',
'2008_06_03 08:40 32.8 90.6',
'2008_06_03 08:45 33.7 91.5',
'2008_06_03 08:50 34.5 92.4',
'2008_06_03 08:55 35.4 93.3',
'2008_06_03 09:00 36.3 94.3',
'2008_06_03 09:05 37.2 95.2',
'2008_06_03 09:10 38.0 96.2',
'2008_06_03 09:15 38.9 97.2',
'2008_06_03 09:20 39.8 98.1',
'2008_06_03 09:25 40.6 99.1',
'2008_06_03 09:30 41.5 100.2',
'2008_06_03 09:35 42.3 101.2',
'2008_06_03 09:40 43.2 102.3',
'2008_06_03 09:45 44.0 103.3',
'2008_06_03 09:50 44.9 104.4',
'2008_06_03 10:00 46.0 105.0',
'2008_06_03 16:15 44.6 255.9',
'2008_06_03 16:20 43.8 257.0',
'2008_06_03 16:25 42.9 258.1',
'2008_06_03 16:30 42.1 259.2',
'2008_06_03 16:35 41.2 260.2',
'2008_06_03 16:40 40.4 261.2',
'2008_06_03 16:45 39.5 262.2',
'2008_06_03 16:50 38.6 263.2',
'2008_06_03 16:55 37.8 264.2',
'2008_06_03 17:00 36.9 265.1',
'2008_06_03 17:05 36.0 266.1',
'2008_06_03 17:10 35.2 267.0',
'2008_06_03 17:15 34.3 267.9'
]


def ranges(list_in):
    temp_list = []
    ret_list = []
    for rec in list_in:
        substrs = rec.split()
        test_float = abs(float(substrs[2]))
        if 30 <= test_float <= 45:
            ## convert to a zero-padded string so "11" doesn't
            ## sort before "2" when comparing as strings
            test_str = "%07.3f" % test_float
            temp_list.append(test_str + "  " + rec)   ## only values in the 30-45 range
        elif temp_list:                               ## now outside the 30-45 range
            temp_list.sort()
            ## now the lowest value is temp_list[0] and the
            ## highest value is temp_list[-1] = last element
            ret_list.append(temp_list[0] + "**" + temp_list[-1])
            temp_list = []
    if temp_list:             ## process final recs if there are any
        temp_list.sort()
        ret_list.append(temp_list[0] + "**" + temp_list[-1])
    return ret_list


prev_date = ""
day_list = []
for eachrec in input_file:
    substrs = eachrec.split()
    if substrs[0] != prev_date and day_list:
        print(ranges(day_list))    ## one day's data
        day_list = []

    day_list.append(eachrec)
    prev_date = substrs[0]
print(ranges(day_list))            ## process the final day

If the above code does not come close to what you want, then you will have to explain it with even more detail for us who are (fill in your own adjective here) folk.
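Along the same lines, a variant sketch that collects one (entry, exit) pair of times per window, which maps onto the Day/Sunin_Time/Sunout_time columns described earlier (the function name and the abbreviated sample data are mine; it assumes any record outside the 30-45 band closes the current window):

```python
def day_windows(records):
    """Return [(first_time, last_time), ...] for each run of records
    whose third column has a magnitude in the 30-45 range."""
    windows = []
    current = []
    for rec in records:
        parts = rec.split()
        if len(parts) > 2 and 30 <= abs(float(parts[2])) <= 45:
            current.append(parts[1])       # collect times inside the band
        elif current:                      # left the band: close this window
            windows.append((current[0], current[-1]))
            current = []
    if current:                            # close a window still open at day's end
        windows.append((current[0], current[-1]))
    return windows

day = [
    "2008_06_01 08:25 30.1 88.2",
    "2008_06_01 09:50 44.8 104.8",
    "2008_06_01 10:00 46.0 105.0",   # outside the band: ends window 1
    "2008_06_01 16:15 44.4 255.8",
    "2008_06_01 17:35 30.6 271.3",
]
print("2008_06_01", day_windows(day))
```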

That's terrific.
Thanks.

It gets me really close to what I need. (Closer than I would be on my own, that's for sure!)
I'm just wondering why I return an error after directing the script to my actual master file. ("newsunfile.txt")

import os, string, glob
path = "C:\\sun_py\\Sun Angle\\SunTxt\\"
input_file = ["newsunfile.txt"]

The error I get is:

Traceback (most recent call last):
  File "C:\Python25\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py", line 307, in RunScript
    debugger.run(codeObject, __main__.__dict__, start_stepping=0)
  File "C:\Python25\Lib\site-packages\pythonwin\pywin\debugger\__init__.py", line 60, in run
    _GetCurrentDebugger().run(cmd, globals,locals, start_stepping)
  File "C:\Python25\Lib\site-packages\pythonwin\pywin\debugger\debugger.py", line 631, in run
    exec cmd in globals, locals
  File "C:\SUN_PY\Sun Angle\SunTxt\final_sunpy.py", line 32, in <module>
    print ranges(day_list)       ## one day's data
  File "C:\SUN_PY\Sun Angle\SunTxt\final_sunpy.py", line 9, in ranges
    test_float=abs(float(substrs[2]))
IndexError: list index out of range

I'm obviously doing something wrong in the way I am trying to retrieve the file. Can I not open a file this way: input_file = ["filename.txt"]?
Is there another step that I need to do beforehand?

Thanks again for your assistance wise python folk :)

This is the problem area:

for rec in list_in:
    substrs = rec.split()
    test_float = abs(float(substrs[2]))

There are probably some blank lines in the file, so you have to test each line for length/data:

for rec in list_in:
    substrs = rec.split()
    if len(substrs) > 2:
        test_float = abs(float(substrs[2]))
        ## rest of code here
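For what it's worth, the IndexError can also come from the line input_file = ["newsunfile.txt"] itself: that makes input_file a one-element list holding the filename string, not the file's lines, so splitting it yields fewer than three fields. A minimal sketch of reading a file into the list of records that ranges() expects (the tiny sample file written here is only so the sketch is self-contained; in the real script you would open the existing newsunfile.txt by its full path):

```python
# write a small sample file for demonstration purposes
with open("newsunfile.txt", "w") as f:
    f.write("2008_06_01 08:25 30.1 88.2\n\n2008_06_01 08:30 30.9 89.1\n")

# read it back: one stripped, non-blank line per list element
with open("newsunfile.txt") as f:
    input_file = [line.strip() for line in f if line.strip()]
```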

Thanks woooee, I tried out your suggested modification, but I am still getting the same error.

As a side note: how did you add the formatting to the original code you posted?
(That is, the single quotes and the comma at the end of each line.) Manually or using a separate script? Even though it would be awfully ugly, maybe I can just apply this formatting to the entire file and use the same method you did originally, because that worked so smoothly.

-->'2008_06_01 08:25 30.1 88.2',


Thanks for all your help.
I really appreciate it!
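As for wrapping each line in quotes with a trailing comma, that was most likely done by hand or with a quick throwaway script; something like the following would do it (variable names are mine), though reading the file directly makes the quoting unnecessary:

```python
lines = ["2008_06_01 08:25 30.1 88.2", "2008_06_01 08:30 30.9 89.1"]
# wrap each record in single quotes and add a trailing comma,
# matching the list-literal formatting used earlier in the thread
quoted = ["'%s'," % ln for ln in lines]
```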

Ok, please disregard my last post.
Almost everything is working now.

thankyouthankyouthankyou!
