python newbie help
Thread Solved
Join Date: Sep 2006
Posts: 5
Reputation:
Solved Threads: 0
I am a newbie and have a problem. I have a text file (rawfile.txt) like the one below.
NAME
XXXXXXXXXXX
SURNAME
XXXXXXXXXXX
DATE:23.09.2006
A B C D E F G H (column names)
40 250 300 01.01.2006 13:43:21 250 12345678 KENTBANK
31 123 455 02.02.2006 11:22:43 450 CAPITALBANK
.
.
.
.
PAGE 1
40 150 240 01.11.2006 17:41:21 50 12346678 XBANK
31 123 455 02.02.2006 11:22:43 654474151 YBANK
.
.
.
.
PAGE 2
.
PAGE 3
.
.
PAGE 4
.
.
NOTES
XXXXXX XXXXXXX XXXXXXXXXXXXXXXXXXXXXXX
I want to convert it to the format below (tab separated) and save it (resultfile.txt).
A B C D E F G H
40 250 300 01.01.2005 13:43:21 250 12345678 KENTBANK
31 123 455 02.02.2005 11:22:43 450 tab CAPITALBANK
40 150 240 01.11.2005 17:41:21 50 12346678 CITYBANK
31 123 455 02.02.2005 11:22:43 tab 654474151 CITYBANK
How can I do this? Thanks.
If the lines you want all start with a number (other than the header), here is an easy solution:
# this would be rawfile.txt
str1 = """
NAME:XXXXXXXXXXXX
SURNAME:XXXXXXXXXXXX
DATE:23.09.2006
A B C D E F G H (column names)
40 250 300 01.01.2006 13:43:21 250 12345678 KENTBANK
31 123 455 02.02.2006 11:22:43 450 CAPITALBANK
.
.
.
.
PAGE 1
40 150 240 01.11.2006 17:41:21 50 12346678 XBANK
31 123 455 02.02.2006 11:22:43 654474151 YBANK
.
.
.
.
PAGE 2
.
.
PAGE 3
.
.
PAGE 4
.
.
NOTES:XXXXXXX XXXXXXX XXXXXXXXXXXXXXXXXXXXXXX
"""

# convert to something like this ...
"""
A B C D E F G H
40 250 300 01.01.2005 13:43:21 250 12345678 KENTBANK
31 123 455 02.02.2005 11:22:43 450 tab CAPITALBANK
40 150 240 01.11.2005 17:41:21 50 12346678 CITYBANK
31 123 455 02.02.2005 11:22:43 tab 654474151 CITYBANK
"""
# save as resultfile.txt

# create raw_file.txt from str1 for testing
fout = open("raw_file.txt", "w")
fout.write(str1)
fout.close()

# read in raw_file.txt as list of lines/strings
fin = open("raw_file.txt", "r")
line_list1 = fin.readlines()
fin.close()
#print(line_list1)  # test

# process the list of lines
# give the new list proper header
line_list2 = ["A B C D E F G H\n"]
for line in line_list1:
    lead_char = line[0]
    # use only line starting with a number
    if lead_char.isdigit():
        print(line)  # test
        line_list2.append(line)

#print(line_list2)  # test

# convert processed list to string
str2 = ''.join(line_list2)
print(str2)  # test

# write the string to file
fout = open("result_file.txt", "w")
fout.write(str2)
fout.close()
Last edited by bumsfeld; Sep 24th, 2006 at 1:07 pm.
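The filtering step in the post above can also be sketched as a small reusable function. This is only an illustration in Python 3 syntax: the name `keep_data_lines` and the short `sample` list are made up here and are not part of the posted code.

```python
def keep_data_lines(lines, header="A B C D E F G H\n"):
    """Return the header followed by every line that starts with a digit."""
    result = [header]
    for line in lines:
        # line[:1] is "" for an empty string, so this never raises IndexError
        if line[:1].isdigit():
            result.append(line)
    return result


# hypothetical sample, shaped like the raw file in the question
sample = [
    "NAME:XXXXXXXXXXXX\n",
    "40 250 300 01.01.2006 13:43:21 250 12345678 KENTBANK\n",
    ".\n",
    "PAGE 1\n",
    "31 123 455 02.02.2006 11:22:43 654474151 YBANK\n",
]
print("".join(keep_data_lines(sample)))
```

Slicing with `line[:1]` instead of indexing with `line[0]` is a small safety choice for lists that may contain empty strings.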
Thanks guys... it's good for me. If some fields are empty, how can I set default values for them?
for example:
ID A1 A2 Date Time Sec. SID Bank
40 250 300 01.01.2005 13:43:21 250 12345678 KENTBANK
31 123 455 02.02.2005 11:22:43 450 XXX CAPITALBANK
40 150 240 01.11.2005 17:41:21 50 12346678 YYY
31 123 455 02.02.2005 11:22:43 ZZZ 654474151 CITYBANK
XXX, YYY and ZZZ mean empty (not indicated).
SID (default): 111111
Bank (default): NA
Sec (default): 0
and all fields should be tab separated, not space separated.
The thing with XXX, YYY, ZZZ and the default values you have to explain better. For instance, is XXX contained in rawfile.txt and you want it replaced with 111111? You could do that with additional line.replace(what, with) statements.
To get tab-separated output, simply replace this part of the present code:
# process the list of lines
# give the new list proper header
#line_list2 = ["A B C D E F G H\n"]
str3 = "ID A1 A2 Date Time Sec. SID Bank\n"
# replace() returns a new string, so the result has to be assigned
str3 = str3.replace(" ", "\t")
#print(str3)  # test
line_list2 = [str3]
for line in line_list1:
    lead_char = line[0]
    # use only line starting with a number
    if lead_char.isdigit():
        # replace space with tab
        line = line.replace(" ", "\t")
        #print(line)  # test
        line_list2.append(line)
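As a hedged alternative to replacing single spaces, the fields can be split on any run of whitespace and rejoined with tabs. The helper name `to_tab_separated` is made up for this sketch; it assumes no field contains a space of its own.

```python
def to_tab_separated(line):
    """Split on any run of whitespace and rejoin the fields with tabs."""
    # str.split() with no argument also drops the trailing newline
    return "\t".join(line.split()) + "\n"


# a row with an accidental run of three spaces between two fields
row = "31 123   455 02.02.2005 11:22:43 450 CAPITALBANK\n"
print(repr(to_tab_separated(row)))
```

Unlike a plain space-to-tab replacement, this does not produce empty columns when two fields happen to be separated by more than one space.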
ok...assume that column6-7-8 are variables.
ID A1 A2 Date Time Sec. SID Bank
40 250 300 01.01.2005 13:43:21 250 12345678 KENTBANK
31 123 455 02.02.2005 11:22:43 450 column7 CAPITALBANK
40 150 240 01.11.2005 17:41:21 50 12346678 column8
31 123 455 02.02.2005 11:22:43 column6 654474151 CITYBANK
column6(default):0
column7(default):11111111
column8(default):NA
and all lines should be separated with tabs, not only the column names.
additionally,
if A1 or A2 columns start with "2", insert 555.
A1:250 then becomes A1:555250
ID A1 A2 Date Time Sec SID Bank
40 555250 300 01.01.2005 13:43:21 250 12345678 KENTBANK
*****************************************************
if A1 or A2 columns start with "4", insert 666.
ID A1 A2 Date Time Sec. SID Bank
31 123 666452 02.02.2005 11:22:43 450 column7 CAPITALBANK
31 123 666455 02.07.2005 14:22:43 column6 654474151 CITYBANK
*********************************************************
to be continued....
Note: I have decided to learn Python. Python, great! (I wonder about Python and database applications.)
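The prefix rule described in this post could be sketched with a small lookup table. The names `PREFIXES` and `add_prefix` are invented for illustration, and the sketch assumes only the first character of A1 and A2 decides which prefix is inserted.

```python
# assumed mapping: leading digit -> prefix to insert in front of the value
PREFIXES = {"2": "555", "4": "666"}


def add_prefix(value):
    """Return value with its prefix inserted, or unchanged if no rule applies."""
    return PREFIXES.get(value[:1], "") + value


fields = "40 250 300 01.01.2005 13:43:21 250 12345678 KENTBANK".split()
# only columns A1 and A2 (index 1 and 2) are affected
for i in (1, 2):
    fields[i] = add_prefix(fields[i])
print("\t".join(fields))
```

Keeping the rules in a dictionary means a further rule is one more entry rather than another pair of if statements.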
Got to hurry! The waitress at the internet bistro wants to serve my meal.
Luckily, Python makes it easy. So far it has been a brain-teaser; when it gets to be work I will stop!
Here is some more code, hope that satisfies your needs:
# this would be rawfile.txt
str1 = """
NAME:XXXXXXXXXXXX
SURNAME:XXXXXXXXXXXX
DATE:23.09.2006
A B C D E F G H (column names)
40 250 300 01.01.2006 13:43:21 250 12345678 KENTBANK
31 123 455 02.02.2006 11:22:43 450 CAPITALBANK
.
.
.
.
PAGE 1
40 150 240 01.11.2006 17:41:21 50 12346678 XBANK
31 123 455 02.02.2006 11:22:43 654474151 YBANK
.
.
PAGE 2
40 250 240 01.11.2006 17:41:21 50 12346678
.
PAGE 3
.
.
PAGE 4
.
.
NOTES:XXXXXXX XXXXXXX XXXXXXXXXXXXXXXXXXXXXXX
"""

# convert to ...
"""
A B C D E F G H
40 250 300 01.01.2005 13:43:21 250 12345678 KENTBANK
31 123 455 02.02.2005 11:22:43 450 tab CAPITALBANK
40 150 240 01.11.2005 17:41:21 50 12346678 CITYBANK
31 123 455 02.02.2005 11:22:43 tab 654474151 CITYBANK
"""
# save as resultfile.txt

# create raw_file.txt from str1 for testing
fout = open("raw_file.txt", "w")
fout.write(str1)
fout.close()

# read in raw_file.txt as list of lines/strings
fin = open("raw_file.txt", "r")
line_list1 = fin.readlines()
fin.close()
#print(line_list1)  # test

def sub_missing(line):
    """take string line and sub for certain missing items"""
    # convert string to list
    list1 = line.split()
    # if list1[1] (column A1) starts with 2 prefix with 555
    if list1[1].startswith('2'):
        list1[1] = "555" + list1[1]
    # ditto for column A2
    if list1[2].startswith('2'):
        list1[2] = "555" + list1[2]
    # if list1[1] (column A1) starts with 4 prefix with 666
    if list1[1].startswith('4'):
        list1[1] = "666" + list1[1]
    # ditto for column A2
    if list1[2].startswith('4'):
        list1[2] = "666" + list1[2]
    # check if item 6 is a number
    if not list1[6].isdigit():
        # item 5 of list1 would be Sec. or SID
        val = int(list1[5])
        # assume that sec value is < 1000
        if val < 1000:
            # replace missing SID with "111111"
            list1.insert(6, "111111")
        else:
            # replace missing Sec with 0
            list1.insert(5, "0")
    elif len(list1) < 8:
        # case of the missing bank name
        list1.append("NA")
    # convert list to string again, separated by tabs
    str1 = "\t".join(list1)
    return str1 + '\n'

# process the list of lines
# give the new list proper header
#line_list2 = ["A B C D E F G H\n"]
str3 = "ID A1 A2 Date Time Sec SID Bank\n"
# replace() returns a new string, so the result has to be assigned
str3 = str3.replace(" ", "\t")
#print(str3)  # test
line_list2 = [str3]
for line in line_list1:
    lead_char = line[0]
    # use only line starting with a number
    if lead_char.isdigit():
        # replace certain missing data items
        # (sub_missing also joins the fields with tabs)
        line = sub_missing(line)
        #print(line)  # test
        line_list2.append(line)

#print(line_list2)  # test

# convert processed list to string
str2 = ''.join(line_list2)
print(str2)  # test

# write the string to file
fout = open("result_file.txt", "w")
fout.write(str2)
fout.close()
Last edited by bumsfeld; Sep 24th, 2006 at 5:12 pm.
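The default-filling logic in the post above expects at most one missing field per row. A possible variant, sketched below, classifies whatever follows the five fixed columns, so a row with two gaps also gets its defaults. The names `DEFAULTS` and `fill_defaults` are hypothetical, and the sketch keeps the posted code's assumption that Sec values stay below 1000 and that bank names contain no spaces.

```python
# defaults as used in the posted code (SID "111111", Sec "0", Bank "NA")
DEFAULTS = {"sec": "0", "sid": "111111", "bank": "NA"}


def fill_defaults(line):
    """Return the line as 8 tab-separated fields, filling gaps with defaults."""
    fields = line.split()
    fixed, rest = fields[:5], fields[5:]
    found = dict(DEFAULTS)
    for token in rest:
        if not token.isdigit():
            found["bank"] = token
        elif int(token) < 1000:
            # same assumption as the posted code: Sec values stay below 1000
            found["sec"] = token
        else:
            found["sid"] = token
    return "\t".join(fixed + [found["sec"], found["sid"], found["bank"]]) + "\n"


print(repr(fill_defaults("31 123 455 02.02.2005 11:22:43 450 CAPITALBANK")))
print(repr(fill_defaults("31 123 455 02.02.2005 11:22:43 CITYBANK")))
```

The trade-off is that the field type is guessed purely from its value, so an SID below 1000 would be taken for a Sec value.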
1. I have tried the code. It runs very well. How can I pass the file as an argument?
my python file bank.py
i call file using arguments
bank.py <filename>
2. line.replace(" ","\t") doesn't work. I am trying to separate all lines/columns with tabs but I couldn't.
3. And bon appétit
Last edited by canerkocamaz; Sep 24th, 2006 at 5:33 pm.
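The function line.replace(" ","\t") works very well, but how the result looks may depend on how many spaces a tab is set to in your editor. Keep in mind that replace() returns a new string and leaves line itself unchanged, so the result has to be assigned back, as in line = line.replace(" ","\t").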
You could use double tabs like line.replace(" ","\t\t").
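A short check of how `str.replace` behaves may help here. Python strings are immutable, so the method returns a new string and the original stays as it was; the variable names below are only for illustration.

```python
line = "40 250 300"

# the returned string is thrown away, so line stays as it was
line.replace(" ", "\t")
unchanged = line

# assigning the result back is what actually changes line
line = line.replace(" ", "\t")

print(repr(unchanged))
print(repr(line))
```

Printing with `repr()` shows the tab as `\t`, which avoids guessing from how wide the editor or terminal draws a tab.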
To add a commandline argument, change this part of the code ...

# read in raw_file.txt as list of lines/strings
fin = open("raw_file.txt", "r")
line_list1 = fin.readlines()
fin.close()

to this ...

# use commandline argument for filename
# usage eg. Bank.py myfile.txt
import sys, time

if len(sys.argv) > 1:
    filename = sys.argv[1]
else:
    # give it a default filename
    filename = "raw_file.txt"

# read in data file as list of lines/strings
try:
    fin = open(filename, "r")
    line_list1 = fin.readlines()
    fin.close()
    print("Successfully opened file " + filename)
except IOError:
    print("\a Could not find file " + filename)
    time.sleep(3)
    sys.exit(1)

On your question about Python and databases, there are many modules available to make Python interface with the most common databases. You just have to google for them.
Last edited by vegaseat; Sep 25th, 2006 at 12:36 am.
May 'the Google' be with you!
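On the database question raised in the thread, one option that ships with Python is the standard `sqlite3` module. The sketch below loads two tab-separated rows into an in-memory table; the table name `records`, the column names and the sample rows are assumptions made for this example, not something from the thread.

```python
import sqlite3

# hypothetical rows, shaped like the tab-separated result file
rows = [
    "40\t250\t300\t01.01.2005\t13:43:21\t250\t12345678\tKENTBANK",
    "31\t123\t455\t02.02.2005\t11:22:43\t450\t111111\tCAPITALBANK",
]

conn = sqlite3.connect(":memory:")  # in-memory database, nothing is written to disk
conn.execute(
    "CREATE TABLE records "
    "(id TEXT, a1 TEXT, a2 TEXT, date TEXT, time TEXT, sec TEXT, sid TEXT, bank TEXT)"
)
# one "?" placeholder per column; each row is split into its 8 fields
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [row.split("\t") for row in rows],
)
banks = [r[0] for r in conn.execute("SELECT bank FROM records ORDER BY bank")]
print(banks)
conn.close()
```

All columns are stored as text here to keep the sketch short; a real table would probably use numeric types for some of them.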