Hi folks,

I wonder if anyone out there could help me with something.
I have a folder containing many cnv files that look like this:

* Sea-Bird SBE 9 Data File:
* FileName = C:\CTD Data\Alg173\stn001.dat
* Software Version Seasave Win32 V 5.38
* Temperature SN = 4977
* Conductivity SN = 3436
* Number of Bytes Per Scan = 27
* Number of Voltage Words = 5
* Number of Scans Averaged by the Deck Unit = 1
* System UpLoad Time = Oct 04 2009 05:18:31
** Ship: Algoa
** Cruise: ACEP II
** Station: C09412
** Latitude: 26 23.607 S
** Longitude: 032 57.018 E
** Transect 1 - Mozambique
** Grazing
# nquan = 16
# nvalues = 31
# units = specified
# name 0 = prDM: Pressure, Digiquartz [db]
# name 1 = depSM: Depth [salt water, m]
# name 2 = t068C: Temperature [ITS-68, deg C]
# name 3 = t090C: Temperature [ITS-90, deg C]
# name 4 = potemp090C: Potential Temperature [ITS-90, deg C]
# name 5 = c0S/m: Conductivity [S/m]
# name 6 = sal00: Salinity [PSU]
# name 7 = sbeox0ML/L: Oxygen, SBE 43 [ml/l]
# name 8 = flECO-AFL: Fluorescence, Wetlab ECO-AFL/FL [mg/m^3]
# name 9 = par: PAR/Irradiance, Biospherical/Licor
# name 10 = spar: SPAR/Surface Irradiance
# name 11 = v1: Voltage 1
# name 12 = sal00: Salinity [PSU]
# name 13 = svCM: Sound Velocity [Chen-Millero, m/s]
# name 14 = sigma-é00: Density [sigma-theta, Kg/m^3]
# name 15 = flag: flag
# span 0 = 5.029, 35.213
# span 1 = 5.000, 35.000
# span 2 = 20.9034, 21.8131
# span 3 = 20.8983, 21.8079
# span 4 = 20.8916, 21.8059
# span 5 = 4.940027, 5.033821
# span 6 = 35.4444, 35.4573
# span 7 = 4.35836, 4.67184
# span 8 = 0.3789, 0.5774
# span 9 = 3.5226e+01, 2.4172e+02
# span 10 = 9.8951e+02, 1.5403e+03
# span 11 = 0.0626, 0.0702
# span 12 = 35.4443, 35.4533
# span 13 = 1525.01, 1527.07
# span 14 = 24.6142, 24.8617
# span 15 = 0.0000e+00, 0.0000e+00
# interval = meters: 1
# start_time = Oct 04 2009 05:18:31
# bad_flag = -9.990e-29
# sensor 0 = Frequency 0 temperature, 4977, 17/04/2008
# sensor 1 = Frequency 1 conductivity, 3436, 15/04/2008, cpcor = -9.5700e-08
# sensor 2 = Frequency 2 pressure, 89112, 21/5/2003
# sensor 3 = Extrnl Volt 0 Oxygen, SBE, primary, 0591, 19/11/03
# sensor 4 = Extrnl Volt 1 WET Labs, ECO_AFL
# sensor 5 = Extrnl Volt 2 userpoly 0, BBRTD-385R, 20/07/2007
# sensor 6 = Extrnl Volt 3 transmissometer, primary, CST-970DR, 06/12/2006
# sensor 7 = Extrnl Volt 4 backscatterance, 2355, 29/10/2003
# sensor 8 = Extrnl Volt 5 irradiance (PAR), primary, 70168, 17/03/2008
# sensor 9 = Extrnl Volt 6 altimeter
# sensor 10 = Extrnl Volt 9 surface irradiance (SPAR), degrees = 0.0
# datcnv_date = Oct 07 2009 05:23:59, 7.16
# datcnv_in = D:\Alg 173\CTD Data\Alg173_Raw CTD data\stn001.dat D:\Alg 173\CTD Data\Alg173_Raw CTD data\algoa_0746_ACEP_072009.con
# datcnv_skipover = 0
# wildedit_date = Oct 07 2009 05:26:26, 7.16
# wildedit_in = D:\Alg 173\CTD Data\Processed data\stn001.cnv
# wildedit_pass1_nstd = 2.0
# wildedit_pass2_nstd = 10.0
# wildedit_pass2_mindelta = 0.000e+000
# wildedit_npoint = 100
# wildedit_vars = prDM c0S/m sal00 sbeox0ML/L
# wildedit_excl_bad_scans = yes
# celltm_date = Oct 07 2009 05:29:16, 7.16
# celltm_in = D:\Alg 173\CTD Data\Processed data\stn001.cnv
# celltm_alpha = 0.0300, 0.0000
# celltm_tau = 7.0000, 0.0000
# celltm_temp_sensor_use_for_cond = primary,
# filter_date = Oct 07 2009 05:32:55, 7.16
# filter_in = D:\Alg 173\CTD Data\Processed data\stn001.cnv
# filter_low_pass_tc_A = 0.030
# filter_low_pass_tc_B = 0.150
# filter_low_pass_A_vars = prDM c0S/m
# filter_low_pass_B_vars =
# loopedit_date = Oct 07 2009 05:36:22, 7.16
# loopedit_in = D:\Alg 173\CTD Data\Processed data\stn001.cnv
# loopedit_minVelocity = 0.250
# loopedit_surfaceSoak: do not remove
# loopedit_excl_bad_scans = yes
# Derive_date = Oct 07 2009 05:39:02, 7.16
# Derive_in = D:\Alg 173\CTD Data\Processed data\stn001.cnv D:\Alg 173\CTD Data\Processed data\algoa_0746_ACEP_072009.con
# binavg_date = Oct 07 2009 05:42:22, 7.16
# binavg_in = D:\Alg 173\CTD Data\Processed data\stn001.cnv
# binavg_bintype = meters
# binavg_binsize = 1
# binavg_excl_bad_scans = yes
# binavg_skipover = 0
# binavg_surface_bin = no, min = 0.000, max = 0.000, value = 0.000
# file_type = ascii
*END*
5.029 5.000 21.8056 21.8003 21.7994 5.032665 35.4488 4.65753 0.4282 2.4172e+02 9.8951e+02 0.0645 35.4489 1526.92 24.6151 0.0000e+00
6.039 6.000 21.8092 21.8039 21.8027 5.033160 35.4494 4.65977 0.4554 1.9693e+02 1.0206e+03 0.0655 35.4495 1526.94 24.6147 0.0000e+00
7.045 7.000 21.8115 21.8063 21.8049 5.033505 35.4499 4.66138 0.4422 1.9115e+02 1.0276e+03 0.0650 35.4500 1526.97 24.6144 0.0000e+00
8.052 8.000 21.8119 21.8067 21.8051 5.033587 35.4498 4.66450 0.4545 1.7034e+02 1.0387e+03 0.0655 35.4499 1526.99 24.6143 0.0000e+00
9.059 9.000 21.8121 21.8069 21.8051 5.033646 35.4499 4.66610 0.4633 1.6056e+02 1.0427e+03 0.0658 35.4499 1527.00 24.6143 0.0000e+00
10.067 10.000 21.8131 21.8079 21.8059 5.033821 35.4500 4.67016 0.4665 1.5411e+02 1.0422e+03 0.0659 35.4501 1527.02 24.6142 0.0000e+00
11.072 11.000 21.8123 21.8070 21.8048 5.033758 35.4499 4.67162 0.4667 1.4018e+02 1.1178e+03 0.0660 35.4499 1527.04 24.6144 0.0000e+00
12.082 12.000 21.8098 21.8045 21.8022 5.033462 35.4493 4.67184 0.4830 1.2395e+02 1.0651e+03 0.0666 35.4493 1527.05 24.6147 0.0000e+00
13.085 13.000 21.8060 21.8007 21.7982 5.033032 35.4488 4.66982 0.4961 1.1477e+02 1.0785e+03 0.0671 35.4486 1527.05 24.6153 0.0000e+00
14.093 14.000 21.8051 21.7999 21.7972 5.032948 35.4484 4.65928 0.4565 1.1152e+02 1.1353e+03 0.0656 35.4483 1527.07 24.6153 0.0000e+00
15.097 15.000 21.7792 21.7740 21.7710 5.030081 35.4471 4.66203 0.4687 1.0431e+02 1.3038e+03 0.0661 35.4466 1527.01 24.6213 0.0000e+00
16.108 16.000 21.7231 21.7179 21.7147 5.024489 35.4497 4.64409 0.4337 9.7048e+01 1.2136e+03 0.0647 35.4481 1526.88 24.6381 0.0000e+00
17.113 17.000 21.6283 21.6232 21.6198 5.015128 35.4548 4.63877 0.4348 8.8181e+01 1.1866e+03 0.0647 35.4514 1526.65 24.6671 0.0000e+00
18.117 18.000 21.5675 21.5623 21.5588 5.009032 35.4568 4.61598 0.4593 8.4134e+01 1.1696e+03 0.0657 35.4528 1526.51 24.6850 0.0000e+00
19.127 19.000 21.5131 21.5080 21.5043 5.003506 35.4573 4.57525 0.5364 8.3381e+01 1.1797e+03 0.0687 35.4533 1526.39 24.7005 0.0000e+00
20.128 20.000 21.4251 21.4200 21.4161 4.994048 35.4552 4.56028 0.5774 7.5384e+01 1.3882e+03 0.0702 35.4502 1526.16 24.7225 0.0000e+00
21.137 21.000 21.2996 21.2945 21.2904 4.980852 35.4549 4.53999 0.5317 6.8363e+01 1.1980e+03 0.0685 35.4483 1525.85 24.7557 0.0000e+00
22.143 22.000 21.2137 21.2086 21.2043 4.972081 35.4550 4.48993 0.4839 6.3423e+01 1.2170e+03 0.0666 35.4488 1525.63 24.7798 0.0000e+00
23.151 23.000 21.1549 21.1499 21.1454 4.965830 35.4534 4.47629 0.4570 5.9354e+01 1.2144e+03 0.0656 35.4470 1525.49 24.7946 0.0000e+00
24.158 24.000 21.1131 21.1080 21.1034 4.961652 35.4541 4.46035 0.4449 5.7152e+01 1.2104e+03 0.0651 35.4479 1525.40 24.8067 0.0000e+00
25.165 25.000 21.0963 21.0913 21.0864 4.960029 35.4535 4.43695 0.4811 5.5788e+01 1.3497e+03 0.0665 35.4484 1525.37 24.8117 0.0000e+00
26.172 26.000 21.0921 21.0871 21.0820 4.959645 35.4526 4.42531 0.4807 5.4281e+01 1.2645e+03 0.0665 35.4485 1525.37 24.8130 0.0000e+00
27.175 27.000 21.0872 21.0821 21.0769 4.959222 35.4524 4.42024 0.4323 5.1083e+01 1.2715e+03 0.0646 35.4488 1525.38 24.8147 0.0000e+00
28.186 28.000 21.0783 21.0733 21.0679 4.958398 35.4519 4.41756 0.4275 4.9388e+01 1.2812e+03 0.0645 35.4492 1525.37 24.8174 0.0000e+00
29.192 29.000 21.0722 21.0672 21.0616 4.957797 35.4513 4.41682 0.4030 4.7150e+01 1.2673e+03 0.0635 35.4491 1525.37 24.8191 0.0000e+00
30.197 30.000 21.0655 21.0605 21.0547 4.957158 35.4511 4.41458 0.4170 4.4495e+01 1.2614e+03 0.0641 35.4491 1525.37 24.8210 0.0000e+00
31.204 31.000 21.0522 21.0472 21.0412 4.955819 35.4508 4.41263 0.4055 4.2201e+01 1.3798e+03 0.0636 35.4491 1525.35 24.8246 0.0000e+00
32.213 32.000 21.0255 21.0204 21.0143 4.952889 35.4494 4.41073 0.3928 3.9706e+01 1.2389e+03 0.0632 35.4474 1525.29 24.8307 0.0000e+00
33.214 33.000 20.9885 20.9835 20.9771 4.949170 35.4495 4.39706 0.3948 3.7106e+01 1.2453e+03 0.0632 35.4478 1525.21 24.8411 0.0000e+00
34.232 34.000 20.9489 20.9438 20.9373 4.944946 35.4483 4.38765 0.3789 3.5226e+01 1.3157e+03 0.0626 35.4464 1525.12 24.8509 0.0000e+00
35.213 35.000 20.9034 20.8983 20.8916 4.940027 35.4444 4.35836 0.4258 3.5964e+01 1.5403e+03 0.0644 35.4443 1525.01 24.8617 0.0000e+00

What I would like to do is remove the header information all the way down to the *END* line. I then need to add the ship, cruise, station, start time, latitude and longitude from the header to the beginning of each data line so that it looks like:

Algoa ACEP II C09412 Oct 04 2009 05:18:31 26 23.607 S 032 57.018 E 35.213 35.000 20.9034 20.8983 20.8916 4.940027 35.4444 4.35836 0.4258 3.5964e+01 1.5403e+03 0.0644 35.4443 1525.01 24.8617 0.0000e+00

I then need to concatenate all these files into one massive file that can be imported into a program such as Excel.
Can anyone give me some assistance with this project? It would be greatly appreciated.

I am very new to Python and have very limited experience. The problem is that I don't know where to start in writing code to extract my information.


You would have to test each record for the specific info you want to keep ("Ship", "start_time", etc.) and store it, probably in a dictionary. If the example you posted has not been "improved" for our readability, you can start writing once you hit *END*: the header values first, then each record in turn. You would probably want a comma-delimited file to import into Excel, provided there aren't any commas in the records themselves. Start with reading one file, one record at a time. Next, initialize a dictionary with the keys you want to search for. It appears that you can look up the dictionary using the first word in each record (strip non-letters and split). Post back with some code for more assistance and see section 10.8, Text Files, here: http://openbookproject.net/thinkcs/python/english2e/ch10.html
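
The dictionary approach described above could look roughly like the sketch below. It is written for Python 3 (an assumption; the other snippets in this thread use Python 2 print statements), and the names WANTED and parse_header are hypothetical, chosen only for illustration. The sample lines are copied from the header posted in the question.

```python
# Hypothetical mapping from header tags to the dictionary keys we want to fill.
WANTED = {
    "** Ship:": "ship",
    "** Cruise:": "cruise",
    "** Station:": "station",
    "** Latitude:": "latitude",
    "** Longitude:": "longitude",
    "# start_time =": "start_time",
}


def parse_header(lines):
    """Return (fields, data_lines) for the lines of one file."""
    fields = {}
    for index, line in enumerate(lines):
        if line.startswith("*END*"):
            # Everything after the *END* marker is data.
            return fields, lines[index + 1:]
        for tag, key in WANTED.items():
            if line.startswith(tag):
                fields[key] = line[len(tag):].strip()
                break
    return fields, []  # no *END* marker found, so no data lines


# A few lines taken from the posted example file.
sample = [
    "** Ship: Algoa\n",
    "** Cruise: ACEP II\n",
    "** Station: C09412\n",
    "# start_time = Oct 04 2009 05:18:31\n",
    "*END*\n",
    "5.029 5.000 21.8056\n",
]
fields, data = parse_header(sample)
print(fields["ship"], fields["station"], len(data))
```

Storing the values by key rather than by position means the output columns can later be written in any order, for example with the start time before the latitude.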

For starters, here is a filter that reads the numeric lines into a list; the newline characters are still in place.

This filter does not rely on the *END* tag: it keeps only lines made up of exactly 16 numeric values separated by 15 single spaces, with nothing else on the line.

def fifteennumbers(a):
    sep=[x for x in a if not x.isdigit() and x not in '.eE-+']
    return sep == [' ']*15+['\n'] # spaces and newline in the end

interesting = filter(fifteennumbers, open("stn001.txt").readlines())
print ''.join(interesting)
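
A variant of the same idea, as a sketch only: instead of comparing separator characters, split each line and try to convert every piece with float(). This is written for Python 3 (an assumption), the name is_data_line is hypothetical, and the default of 16 values comes from the "# nquan = 16" line in the posted header. Unlike the filter above, it tolerates leading spaces or repeated spaces between values.

```python
def is_data_line(line, expected=16):
    """True if the line holds exactly `expected` whitespace-separated numbers."""
    parts = line.split()
    if len(parts) != expected:
        return False
    try:
        for part in parts:
            float(part)  # raises ValueError for anything that is not a number
    except ValueError:
        return False
    return True


# First data line and two header lines from the posted example file.
data_line = ("5.029 5.000 21.8056 21.8003 21.7994 5.032665 35.4488 4.65753 "
             "0.4282 2.4172e+02 9.8951e+02 0.0645 35.4489 1526.92 24.6151 "
             "0.0000e+00\n")
print(is_data_line(data_line))
print(is_data_line("*END*\n"))
print(is_data_line("# span 0 = 5.029, 35.213\n"))
```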

I saved the file from your message as the attached stn001.txt (.cnv is not an allowed extension for uploads).

Here is another filter, which also picks up the header info from the beginning of the file:

take = ["** Ship: ","** Cruise:","** Station:",
      "** Latitude:", "** Longitude:",
      "# start_time ="]

info = ""
lines = []

t=take.pop(0) ## take first marker from beginning of list take
for i in  open("stn001.txt").readlines():
    if lines or i.startswith('*END*'):
        lines.append(i)
    elif i.startswith(t):
            i = i[len(t):]
            info += i.rstrip()
            if take: t=take.pop(0)
lines=lines[1:] ## take out *END* line
for i in  [info+' '+j for j in lines]:
    print i,

Thanks to tonyjv, I can now transform my files into the desired format.

I added a file open and write function to the code. This works well, but how can I get it to open all the txt files in my folder, transform the data of each file and then save it with the "station" as the filename?

take = ["** Ship: ","** Cruise:","** Station:",
      "** Latitude:", "** Longitude:",
      "# start_time ="]
 
info = ""
lines = []

myfile = open('stn001_transformed.txt', 'w')

t=take.pop(0) ## take first marker from beginning of list take
for i in  open("stn001.txt").readlines():
    if lines or i.startswith('*END*'):
        lines.append(i)
    elif i.startswith(t):
            i = i[len(t):]
            info += i.rstrip()
            if take: t=take.pop(0)
lines=lines[1:] ## take out *END* line
for i in  [info+' '+j for j in lines]:
    myfile.write(i)
myfile.close()

How about like this? It puts the transformed files in subdirectory transformed (which you can set by changing the trans variable).

import os
trans = 'transformed'
info = ""
lines = []
list_of_tags = ["** Ship: ","** Cruise:","** Station:",
      "** Latitude:", "** Longitude:",
      "# start_time ="]

txt_list = [ x for x in os.listdir(os.curdir) if x.startswith('stn') and x.endswith('.txt') ]
if not os.path.isdir(trans):
    os.mkdir(trans)
for fn in txt_list:
    print fn,
    take = list_of_tags[:] ## copy of tag list because pop is destructive
    t = take.pop(0) ## take first marker from beginning of list take
    for i in  open(fn).readlines():
        if lines or i.startswith('*END*'):
            lines.append(i)
        elif i.startswith(t):
                i = i[len(t):]
                info += i.rstrip()
                if take: t=take.pop(0)
    lines = lines[1:] ## take out *END* line

    myfile = open(os.path.join(trans,fn), 'w')

    for i in  [info+' '+j for j in lines]:
        myfile.write(i)
    myfile.close()

Hi,

Thank you for the response. It works: it saves the new text files in the new folder and lists all the different files. However, there is a hiccup. It uses the data from the first file (stn001) for the other files. It doesn't use the data from stn002, etc.

lines = [] (and likewise info = "") must obviously be moved to the right place: both need to be emptied for every file. I had only one data file and I made copies of that, so I did not notice the omission.
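
One way to make this kind of omission harder to repeat, sketched here under the assumption of Python 3, is to move the per-file work into a function so that every piece of state is created fresh on each call. The name transform is hypothetical, and the second station's lines are made up purely to show that nothing carries over between files.

```python
def transform(lines, tags):
    """Return the output rows for one file; all state is local to this call."""
    remaining = list(tags)  # copy, because pop() is destructive
    info = []
    rows = []
    in_data = False
    for line in lines:
        if in_data:
            rows.append(line)
        elif line.startswith("*END*"):
            in_data = True
        elif remaining and line.startswith(remaining[0]):
            info.append(line[len(remaining[0]):].strip())
            remaining.pop(0)
    prefix = " ".join(info)
    return [prefix + " " + row for row in rows]


tags = ["** Ship:", "** Station:"]
# Lines from the posted file, followed by a made-up second station.
file_one = ["** Ship: Algoa\n", "** Station: C09412\n", "*END*\n", "5.029 5.000\n"]
file_two = ["** Ship: Algoa\n", "** Station: X0000\n", "*END*\n", "6.039 6.000\n"]
out_one = transform(file_one, tags)
out_two = transform(file_two, tags)
print(out_one, out_two)
```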

Solved?

Hi. Sorry for the late response; I was away on a trip. I got it to work using different files, but I don't know how to empty or dump the previous header information.

You should really study a little more programming. You only need to move the initialisations inside the loop:

import os

trans = 'transformed'  ## output subdirectory
list_of_tags = ["** Ship: ", "** Cruise:", "** Station:",
                "** Latitude:", "** Longitude:",
                "# start_time ="]

txt_list = [x for x in os.listdir(os.curdir)
            if x.startswith('stn') and x.endswith('.txt')]
if not os.path.isdir(trans):
    os.mkdir(trans)
for fn in txt_list:
    print fn,
    lines = []  ## emptied for every file
    info = ""   ## emptied for every file
    take = list_of_tags[:]  ## copy of tag list because pop is destructive
    t = take.pop(0)  ## take first marker from beginning of list take
    for i in open(fn).readlines():
        if lines or i.startswith('*END*'):
            lines.append(i)
        elif i.startswith(t):
            i = i[len(t):]
            info += i.rstrip()
            if take:
                t = take.pop(0)
    lines = lines[1:]  ## take out *END* line

    myfile = open(os.path.join(trans, fn), 'w')
    for i in [info + ' ' + j for j in lines]:
        myfile.write(i)
    myfile.close()
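
The original question also asked for one combined file that Excel can import, which the code above does not produce. A minimal sketch of that step follows, assuming Python 3 and the same stn*.txt naming; read_station, combine and the output name all_stations.csv are hypothetical. The files are read as latin-1 on the assumption that the non-ASCII character in the posted header ("sigma-é00") is stored as a single byte.

```python
import csv
import glob
import os

TAGS = ["** Ship:", "** Cruise:", "** Station:",
        "** Latitude:", "** Longitude:", "# start_time ="]


def read_station(path):
    """Return (header_values, data_rows) for one file."""
    header = []
    rows = []
    in_data = False
    # latin-1 is an assumption; it accepts any byte value without failing.
    with open(path, encoding="latin-1") as handle:
        for line in handle:
            if in_data:
                if line.strip():  # skip blank lines after the data
                    rows.append(line.split())
            elif line.startswith("*END*"):
                in_data = True
            else:
                for tag in TAGS:
                    if line.startswith(tag):
                        header.append(line[len(tag):].strip())
                        break
    return header, rows


def combine(folder, out_path):
    """Write the data rows of every stn*.txt in `folder` to one CSV file."""
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        for path in sorted(glob.glob(os.path.join(folder, "stn*.txt"))):
            header, rows = read_station(path)
            for row in rows:
                writer.writerow(header + row)


if __name__ == "__main__":
    combine(os.curdir, "all_stations.csv")
```

Each output row holds the header values in the order they appear in the file, followed by the numeric columns, one value per cell.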

Thank you so much. It works like a dream.

Please close the thread, and thanks for your reputation comments. I need those, as my reputation is going downhill nowadays.
