What are the rules on the data you want kept?
From what I see it's:
1) Delete the header
2) Ignore ALL punctuation (except asterisk * )
3) Ignore all text after the last parenthesis
4) If it contains "output" also ignore the last field
Is that correct?
thines01
Postaholic
2,433 posts since Oct 2009
Reputation Points: 447
Solved Threads: 408
Skill Endorsements: 7
Well, this is not very elegant, but it's functional:
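import re

# Python 2 style: lines read in "rb" mode are str, so str methods work on them
fileIn = open("input.txt", "rb")
fileOut = open("output.txt", "wb")
for strData in fileIn:
    # Keep only the text before the first '-'
    strData = strData.split('-')[0]
    if "input" in strData:
        # Split on runs of non-word characters (anything but letters, digits, '_')
        a = re.split(r"\W+", strData)
        # Input lines: write fields 1 through 6
        fileOut.write(a[1]+' '+a[2]+' '+a[3]+' '+a[4]+' '+a[5]+' '+a[6]+'\n')
    if "output" in strData:
        a = re.split(r"\W+", strData)
        # Output lines carry two more fields: write fields 1 through 8
        fileOut.write(a[1]+' '+a[2]+' '+a[3]+' '+a[4]+' '+a[5]+' '+a[6]+' '+a[7]+' '+a[8]+'\n')
fileOut.close()
fileIn.close()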
thines01
Great, the output is right. It's strange, though: when I open output.txt in Notepad, it shows:
17 BC_1 CLK input X 16 BC_1 OC_NEG input X 15 BC_1 D 1
But when I use another editor such as WordPad, it shows this (which is what I want):
17 BC_1 CLK input X
16 BC_1 OC_NEG input X
15 BC_1 D 1
I also notice that the code already inserts \n, so each record should be written on a new line. I'm not sure why Notepad shows it differently.
Anyway, this certainly works. Do you mind explaining the logic of your code? I'm trying to understand why there are two ifs, and what a=re.split("\W+", strData) means.
Thanks a lot.
About the end-of-line issue, try opening the output file in mode "w" instead of "wb" to see if it changes anything.
Gribouillis
Posting Maven
3,101 posts since Jul 2008
Reputation Points: 1,130
Solved Threads: 761
Skill Endorsements: 11
You are right, it works after I use the 'w' option instead of 'wb'. Any comment on why Notepad displays it differently? Thanks.
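See the documentation of os.linesep for more information. In binary mode ("wb"), '\n' is written as a single line-feed byte, while text mode ("w") on Windows translates it to '\r\n', the line ending that Notepad expects.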
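To make the difference visible, here is a minimal sketch in Python 3 (the code posted earlier in the thread is Python 2). The helper name written_bytes and the sample text are made up for illustration. Passing newline="\r\n" is an assumption used to reproduce, on any platform, what plain "w" mode does by default on Windows.

```python
import os
import tempfile

def written_bytes(mode, newline=None):
    """Write two short lines using `mode` and return the raw bytes on disk."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        if "b" in mode:
            # Binary mode: bytes go to disk exactly as given
            with open(path, mode) as f:
                f.write(b"17 BC_1\n16 BC_1\n")
        else:
            # Text mode: each '\n' is translated to `newline` on write
            with open(path, mode, newline=newline) as f:
                f.write("17 BC_1\n16 BC_1\n")
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)

binary = written_bytes("wb")
# newline="\r\n" imitates the default text-mode behaviour on Windows
windows_text = written_bytes("w", newline="\r\n")

print(repr(binary))
print(repr(windows_text))
```

The first result contains bare line feeds, which is why Notepad ran the records together. The second contains carriage return plus line feed pairs.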
Also, a[1]+' '+a[2]+' '+a[3]+' '+a[4]+' '+a[5]+' '+a[6] is better written as ' '.join(a[1:7]).
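As a rough illustration of what re.split("\W+", ...) and the join do, here is a small Python 3 sketch. The sample line is hypothetical (the real input file is not shown in this thread), and extract_fields is a made-up helper name, not part of the posted script. The pattern \W+ matches one or more characters that are not letters, digits or underscore, so each run of punctuation and spaces acts as one separator.

```python
import re

def extract_fields(line, count):
    """Return the first `count` word-like fields of `line`, space-separated."""
    # Quotes, parentheses, commas and spaces all fall under \W,
    # so each run of them separates two fields.
    parts = re.split(r"\W+", line)
    # parts[0] is the text before the first separator run (empty when the
    # line starts with punctuation), so the fields start at index 1.
    return " ".join(parts[1:1 + count])

# Hypothetical input line, invented for illustration only
sample = '"17 (BC_1, CLK, input, X)," &'

parts = re.split(r"\W+", sample)
fields = extract_fields(sample, 6)

print(parts)
print(repr(fields))
```

With a count of 6 this covers the "input" branch of the script, and a count of 8 would cover the "output" branch, which is the only difference between the two ifs. For this sample the sixth field is an empty string, so the joined result ends with a trailing space, the same as the original concatenation would produce.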
Gribouillis
Question Answered as of 1 Year Ago by thines01 and Gribouillis