Line read problem

Question

halien 0 Newbie Poster

14 Years Ago

Hi,

I have data which begins with TEXT that is exactly 13 lines long (each line has a variable number of characters), followed by BINARY DATA (exactly 262144 bytes). It looks like:

BASELINE NUM: 258
MJD: 54270
SECONDS: 28321
CONFIG INDEX: 0
SOURCE INDEX: 2
FREQ INDEX: 0
POLARISATION PAIR: RR
PULSAR BIN: 0
FLAGGED: 0
WEIGHTS WRITTEN: 0
U (METRES): -18954.5
V (METRES): 99486.5
W (METRES): 54117.25
<F1><AA><E8><U+0084>^]^NDeR.<U+0085><B2><D4>B<D8>m<8B><C1><B3><91><E2>A<AA>f)
<C1>˛<91>Ad<DD> <C1>^L^^%AESC<91><U+037F>c<9E> A^F<BE>0]<8D>@H<F8>^M<C0>m<D8>]@@3<F7>^W@<FF>^Z<96>><B6><B0><E9>@g<AA><CE>?dl<86>?^\<ED><FE><BD><BF><82><88>@<E8>

This combination repeats a few thousand times. What I need to do is read the text, which contains important info, then read the binary data (exactly 262144 bytes) and convert it to floats. The conversion is easy; what is not easy is reading the TEXT and the BINARY DATA to the exact byte.
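Assuming, as the loop further down does, that the 262144 bytes are 32768 consecutive pairs of 4-byte floats (262144 / 8 = 32768 channels), the conversion can be done with a single `struct.unpack` call. A minimal Python 3 sketch; the (real, imaginary) order inside each pair, the little-endian byte order and the function name `decode_visibilities` are assumptions, not something the post confirms:

```python
import math
import struct

PAYLOAD_BYTES = 262144            # one binary block, as stated in the post
N_CHANNELS = PAYLOAD_BYTES // 8   # 8 bytes per channel -> 32768 channels


def decode_visibilities(payload, byteorder="<"):
    """Return (amplitudes, phases in degrees) for one binary block.

    Assumes N_CHANNELS pairs of 32-bit floats stored as (real, imaginary);
    byteorder "<" (little-endian) is also an assumption.
    """
    if len(payload) != PAYLOAD_BYTES:
        raise ValueError("expected %d bytes, got %d" % (PAYLOAD_BYTES, len(payload)))
    # One unpack call for all 65536 floats instead of one call per channel
    values = struct.unpack("%s%df" % (byteorder, 2 * N_CHANNELS), payload)
    reals, imags = values[0::2], values[1::2]
    amplitudes = [math.hypot(re, im) for re, im in zip(reals, imags)]
    # atan2 copes with a zero real part and returns the correct quadrant
    phases = [math.degrees(math.atan2(im, re)) for re, im in zip(reals, imags)]
    return amplitudes, phases


# Synthetic block: channel 0 holds 3 + 4j, every other channel is zero
block = struct.pack("<ff", 3.0, 4.0) + bytes(PAYLOAD_BYTES - 8)
amp, ph = decode_visibilities(block)
print(len(amp), amp[0], round(ph[0], 2))
```

The demo should print `32768 5.0 53.13`. If the amplitudes of real data come out as nonsense, the byte order guess is probably wrong and `byteorder=">"` is worth trying.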

Here is what I tried to do:

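import struct
from math import sqrt, atan2, pi

import numpy as np
import matplotlib.pyplot as plt

# testVis (path to the data file) and PLOTS (output path prefix) are
# defined elsewhere in the script
openVis = open(testVis, 'rb')
readVis = openVis.read()
openVis.close()

size_array = 32768            # number of channels
totalTime = 60                # seconds
timeSize = int(totalTime/2)   # number of records (one every 2 seconds); must be an int for np.zeros
fig = plt.figure()
ax = fig.add_subplot(111)
plt.ylabel('Visibility Amplitude')
plt.xlabel('Channels')
word = "BASELINE"
headerStart = readVis.find(word)
relTime = np.zeros(timeSize)
# find() returns 0 for a match at the very start of the file and -1 when
# nothing is found, so compare against -1 ("> 1" would skip a header at offset 0)
while headerStart > -1:
    # Reading header info at FIXED offsets from "BASELINE";
    # this is what breaks when the header length changes
    baselineIn = float(readVis[headerStart+20:headerStart+23])
    if baselineIn == 258.0:
        print 'Baseline = ', baselineIn
        PoL = readVis[headerStart+162:headerStart+164]
        if PoL == 'RR':
            print 'Polarisation = ', PoL
            time = float(readVis[headerStart+70:headerStart+75])
            print 'Time = ', time
            relTime = time - 28321.0
            sourceIn = readVis[headerStart+118:headerStart+119]
            # Get BINARY DATA, decode, then plot.
            # Assumes the header is exactly 317 bytes long
            Vis = np.zeros(size_array)
            phase = np.zeros(size_array)
            u = readVis[headerStart+317:headerStart+317+262144]
            bit1 = 0
            bit8 = 8
            for i in range(size_array):
                # one channel = two 4-byte floats, assumed to be (real, imaginary)
                visPair = struct.unpack('ff', u[bit1:bit8])
                Vis[i] = sqrt(visPair[0]**2 + visPair[1]**2)
                # atan2 avoids dividing by zero and keeps the correct quadrant
                phase[i] = atan2(visPair[1], visPair[0])*(180/pi)
                bit1 = bit1+8
                bit8 = bit8+8

            lgVis = np.log10(Vis)

            ax.plot(lgVis, 'r+')
            plt.savefig(PLOTS+PoL+str(relTime)+'.eps')
        else:
            print 'wrong polarisation'
    else:
        print 'wrong baseline'

    headerStart = readVis.find(word, headerStart+1)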

This code should loop through 60/2 = 30 data combinations (1 combination = TEXT + BINARY DATA). What I tried was to read the entire data set, find the word "BASELINE", count characters to the end of the TEXT, and then read the BINARY DATA. However, this only works if the TEXT is exactly 317 characters long, and the TEXT changes size by 2-3 characters throughout the data set (although it always remains 13 lines). Is there any way to read the exact size of the TEXT, i.e. its number of characters?
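One way around counting characters at all is to read the file as a stream: in `'rb'` mode, `readline()` returns one header line up to and including its `b"\n"`, so after 13 calls the file position sits exactly on the first binary byte, whatever the header length was, and `read(262144)` then returns the whole block. A minimal Python 3 sketch; the name `read_records` and the parsing of each `KEY: value` line into a dict are illustrative choices, not from the thread:

```python
import io

HEADER_LINES = 13       # every header is exactly 13 text lines
PAYLOAD_BYTES = 262144  # followed by exactly this many bytes of binary data


def read_records(f):
    """Yield (header, payload) for each record of a file opened in 'rb' mode.

    header is a dict such as {"BASELINE NUM": "258", "MJD": "54270", ...};
    payload is the raw binary block as bytes.
    """
    while True:
        header = {}
        for i in range(HEADER_LINES):
            line = f.readline()  # reads up to and including the b"\n"
            if not line:
                if i == 0:
                    return       # clean end of file between records
                raise ValueError("file ends inside a header")
            key, _, value = line.decode("ascii").partition(":")
            header[key.strip()] = value.strip()
        # The file position is now exactly at the first binary byte,
        # however long the 13 header lines were
        payload = f.read(PAYLOAD_BYTES)
        if len(payload) != PAYLOAD_BYTES:
            raise ValueError("truncated binary block")
        yield header, payload


# Demo on an in-memory file: two records whose binary blocks are all b"\n"
header_text = b"BASELINE NUM: 258\nMJD: 54270\n" + b"FLAGGED: 0\n" * 11
demo = io.BytesIO((header_text + b"\n" * PAYLOAD_BYTES) * 2)
for header, payload in read_records(demo):
    print(header["BASELINE NUM"], len(payload))
```

With the real file this would be `with open(path, "rb") as f:` followed by `for header, payload in read_records(f):`, where `path` is the data file; the checks then become `header["BASELINE NUM"] == "258"` and `header["POLARISATION PAIR"] == "RR"`, with no character offsets. Newline bytes inside the binary data do no harm because the block is read by length, never scanned.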

Any help would be greatly appreciated.

cheers,
h.

richieking 44 Master Poster

14 Years Ago

That shouldn't be hard. How exactly is your text formatted? Can you show a copy of it, or is it the same as what you posted above?

This code would loop through 60/2 data combination (1 combination = TEXT + BINARY DATA). What I tried, was to read the entire data set, find the word "BASELINE", count characters to the end of the TEXT, and then read the BINARY DATA.

And also, what do you mean by this? What is the "end of the text"? Do you mean until you reach the next BASELINE, or something else?

More info, please.

woooee 814 Nearly a Posting Maven

14 Years Ago

It looks like you have 13 lines of normal text, by which I mean that you could open the file normally, read the first 13 lines, and count the bytes to skip. Then close the file, reopen it in binary mode, and skip the number of bytes that contained text. It is not easy to tell from the example, though.
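The count-then-skip idea can also be done without reopening the file: if the file stays in binary mode, `f.tell()` after reading the 13 header lines gives the exact byte offset of the binary block (no newline translation can change the count), and `f.seek(262144, io.SEEK_CUR)` skips the block without reading it. A Python 3 sketch that builds an index of every record; `index_records` and `read_payload` are hypothetical names:

```python
import io

HEADER_LINES = 13
PAYLOAD_BYTES = 262144


def index_records(f):
    """Return a list of (header_offset, header_length) for every record.

    f must be opened in binary mode so that byte counts are exact. The
    binary block of each record starts at header_offset + header_length.
    """
    offsets = []
    while True:
        start = f.tell()
        lines = [f.readline() for _ in range(HEADER_LINES)]
        if not lines[0]:
            break                            # end of file
        header_length = f.tell() - start     # exact size of the 13 text lines
        offsets.append((start, header_length))
        f.seek(PAYLOAD_BYTES, io.SEEK_CUR)   # skip the binary block unread
    return offsets


def read_payload(f, header_offset, header_length):
    """Fetch one record's binary block using an entry from index_records."""
    f.seek(header_offset + header_length)
    return f.read(PAYLOAD_BYTES)
```

The index makes it cheap to jump to, say, the 30th record with `read_payload(f, *offsets[29])`. This sketch does not detect a truncated final record, since seeking past the end of a file is allowed.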

halien 0 Newbie Poster · Answer 1 · 2010-12-03T08:10:56+00:00

Sorry, I meant to say that after each block of binary data there is another 13 lines of TEXT (starting with BASELINE), which has a different byte length.

woooee 814 Nearly a Posting Maven · Answer 2 · 2010-12-03T22:44:01+00:00

Since the text in the example posted appears to end with a newline, you should be able to search a reasonable number of characters (I'm using 30) for the newline. If none is found, or 13 text records have been found, you are at the start of the binary data. An example of how you could do that follows. Another way is to search for "BASELINE" and group the data from the start of one "BASELINE" to the next occurrence; you could then slice the 262144 bytes of binary data off the end of each group. The following seems easier to understand to me, but it is a matter of personal choice. For more specific info, you should include a link to part of an actual file with 5-10 of these record groups.

file_read = """BASELINE NUM: 258\n
MJD: 54270\n
SECONDS: 28321\n
CONFIG INDEX: 0\n
SOURCE INDEX: 2\n
FREQ INDEX: 0\n
POLARISATION PAIR: RR\n
PULSAR BIN: 0\n
FLAGGED: 0\n
WEIGHTS WRITTEN: 0\n
U (METRES): -18954.5\n
V (METRES): 99486.5\n
W (METRES): 54117.25\n
<F1><AA><E8><U+0084>^]^NDeR.<U+0085><B2><D4>B<D8>m<8B><C1><B3><91><E2>A<AA>f)
<C1>A<91>Ad<DD> <C1>^L^^%AESC<91><U+037F>c<9E> A^F<BE>0]<8D>@H<F8>^M<C0>m<D8>]@@3<F7>^W@<FF>^Z<96>><B6><B0><E9>@g<AA><CE>?dl<86>?^\<ED><FE><BD><BF><82><88>@<E8>"""

def find_newline(test_this):
    print "test_this =", test_this
    for ctr, ch in enumerate(test_this):
        ## this should work but you may have to use "if ord(ch) == 10"
        if ch == "\n":
            return ctr
    return -1

position = 0
new_position = 0
text_list = []
while new_position > -1:
    new_position = find_newline(file_read[position:position+30])
    print position, new_position
    if new_position > -1:
        text_list.append(file_read[position:position+new_position])  ## removes "\n"
        ## +2 because each line of the example string ends with an escaped "\n"
        ## plus a real line break; in the file the "\n" is a single byte
        ## (decimal 10), so use +1 there
        position += new_position +2          # skip the newline(s)

    if len(text_list) > 12:
        print "***** all text records found"
        break     ## stop so newline bytes in the binary data are not read as text

## position already points just past the last newline, i.e. at the first binary byte
print "final byte location is %d and end of binary is %d" % (position, position+262144)
print text_list
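The "other way" mentioned above, grouping the data from one "BASELINE" to the next, could look like the Python 3 sketch below. Reading "slice 262144 bytes" as taking the last 262144 bytes of each group is an interpretation, and `split_records` is a hypothetical name. It assumes the 8 bytes `BASELINE` never appear inside the binary data, and checks that each header has exactly 13 lines to catch the case where they do:

```python
PAYLOAD_BYTES = 262144
HEADER_LINES = 13


def split_records(data):
    """Split a whole file's bytes into (header_lines, payload) pairs.

    Each group runs from one b"BASELINE" to the next. The binary block is
    the last PAYLOAD_BYTES of the group and the header is everything
    before it, so the header length never has to be known in advance.
    """
    records = []
    for group in data.split(b"BASELINE")[1:]:  # [1:] drops anything before the first header
        group = b"BASELINE" + group            # put back the word split() removed
        header = group[:-PAYLOAD_BYTES]        # b"" if the group is too short
        # A real header is exactly 13 newline-terminated lines; anything else
        # means b"BASELINE" also occurred inside some binary data
        if header.count(b"\n") != HEADER_LINES or not header.endswith(b"\n"):
            raise ValueError("bad group: b'BASELINE' may occur inside binary data")
        records.append((header.decode("ascii").splitlines(), group[-PAYLOAD_BYTES:]))
    return records
```

This keeps the whole file in memory, like the code above; for a 9 MB file that is not a problem.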

halien 0 Newbie Poster · Answer 3 · 2010-12-07T15:36:03+00:00

Hey woooee, your code worked really well with one record.
However, I've been trying to make it loop over all records, with no success. My first idea was to locate the start of each BASELINE with find(), set that as "position" (in your code), count the header size, and repeat this for each BASELINE position, but no luck. I've posted the code below. Here is a link to a larger data file with more than 5 records (~9 MB):
http://www.humyo.com/FXMNxRs/data/?a=8azUhOf4wA4

cheers,
h.

text_list = []
word = "BASELINE"
headerStart = file_read.find(word)
#position = headerStart
new_position = 0
print headerStart
while headerStart >-1:
    position = headerStart
    print 'Header Start=',headerStart
    while new_position > -1:
        new_position = find_newline(file_read[position:position+30])
        print position, new_position
        if new_position > -1:
            text_list.append(file_read[position:position+new_position])  ## removes "\n"
            ## +1 skips the single "\n" byte (decimal 10) that ends each line in the real file
            position += new_position +1          # skip "\n"
 
        if len(text_list) == 13:
            print "***** all text records found"
 
    position += 1     ## skip final decimal 10 ("\n")
    print "final byte location is %d and end of binary is %d" % (position, position+262144)
    print text_list
    headerStart = file_read.find(word,headerStart+1)
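In the loop above, `new_position` is set to 0 only once, before the outer `while`, so after the first header has been scanned it stays at -1 and the inner `while` never runs again for later records; `text_list` is also never emptied, so headers pile up. A minimal Python 3 sketch of a fix (the name `walk_records` is hypothetical): reset the per-record state inside the loop, find the end of each header by locating 13 newlines, and move to the next header by skipping exactly 262144 bytes instead of searching with `find("BASELINE")`:

```python
HEADER_LINES = 13
PAYLOAD_BYTES = 262144


def walk_records(data):
    """Yield (header_lines, payload) for each record in a bytes buffer.

    Instead of searching for the next b"BASELINE", jump straight past each
    record's binary block, so bytes inside it are never mistaken for text.
    """
    position = 0
    while position < len(data):
        header_lines = []          # fresh list for every record
        line_start = position
        for _ in range(HEADER_LINES):
            newline = data.find(b"\n", line_start)
            if newline == -1:
                raise ValueError("incomplete header at byte %d" % position)
            header_lines.append(data[line_start:newline].decode("ascii").rstrip("\r"))
            line_start = newline + 1           # skip the single b"\n" byte
        # line_start is now the first byte of the binary block
        payload = data[line_start:line_start + PAYLOAD_BYTES]
        if len(payload) != PAYLOAD_BYTES:
            raise ValueError("truncated binary block at byte %d" % line_start)
        yield header_lines, payload
        position = line_start + PAYLOAD_BYTES  # start of the next header
```

On the real data this would be `for lines, payload in walk_records(file_read):`, with `file_read` read from the file in `'rb'` mode.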

woooee 814 Nearly a Posting Maven · Answer 4 · 2010-12-10T22:32:05+00:00

This appears to work for the given test data.

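## Python 2: str holds raw bytes, so split() and partition() work on binary data
test_data = open("./tmp/GL581_54270_28321_28325.dat", "rb").read()
all_groups = test_data.split("BASELINE")
print len(all_groups)

fp_out = open("./tmp/test_output", "wb")
for group in all_groups[1:]:     ## skip the empty string before the first "BASELINE"
    sep = group.partition("\n")
    ## sep[0] is the rest of the first line (" NUM: 258"), which already
    ## starts with a space
    fp_out.write("BASELINE%s\n" % (sep[0]))
    for x in range(12):          ## the remaining 12 header lines
        sep = sep[2].partition("\n")
        fp_out.write("%s\n" % (sep[0]))
        fp_out.write("-" * 50)
        fp_out.write("\n")
    fp_out.write(sep[2])         ## everything after the 13th line: the binary data
    fp_out.write("\n")
    fp_out.write("*"*50)
    fp_out.write("\n\n")
fp_out.close()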