python newbie help

Thread Solved

#1 canerkocamaz (Newbie Poster), Sep 24th, 2006
I am a newbie and have a problem. I have a text file (rawfile.txt) like the one below.

NAMEXXXXXXXXXXX
SURNAMEXXXXXXXXXXX
DATE:23.09.2006
A B C D E F G H (column names)
40 250 300 01.01.2006 13:43:21 250 12345678 KENTBANK
31 123 455 02.02.2006 11:22:43 450 CAPITALBANK
.
.
.
.
PAGE 1

40 150 240 01.11.2006 17:41:21 50 12346678 XBANK
31 123 455 02.02.2006 11:22:43 654474151 YBANK
.
.
.
.
PAGE 2
.
PAGE 3
.
.
PAGE 4
.
.
NOTESXXXXXX XXXXXXX XXXXXXXXXXXXXXXXXXXXXXX


I want to convert it to the format below (tab-separated) and save it as resultfile.txt.



A B C D E F G H
40 250 300 01.01.2005 13:43:21 250 12345678 KENTBANK
31 123 455 02.02.2005 11:22:43 450 tab CAPITALBANK
40 150 240 01.11.2005 17:41:21 50 12346678 CITYBANK
31 123 455 02.02.2005 11:22:43 tab 654474151 CITYBANK

How can I do this? Thanks.

#2 ghostdog74 (Junior Poster), Sep 24th, 2006
Do all the lines you want end in "BANK"?
Last edited by ghostdog74; Sep 24th, 2006 at 10:23 am.

#3 bumsfeld (Nearly a Posting Virtuoso), Sep 24th, 2006
If the lines you want (other than the header) start with a number, here is an easy solution:
# this would be rawfile.txt
str1 = """
NAME:XXXXXXXXXXXX
SURNAME:XXXXXXXXXXXX
DATE:23.09.2006
A B C D E F G H (column names)
40 250 300 01.01.2006 13:43:21 250 12345678 KENTBANK
31 123 455 02.02.2006 11:22:43 450 CAPITALBANK
.
.
.
.
PAGE 1

40 150 240 01.11.2006 17:41:21 50 12346678 XBANK
31 123 455 02.02.2006 11:22:43 654474151 YBANK
.
.
.
.
PAGE 2

.
.
PAGE 3

.
.
PAGE 4

.
.
NOTES:XXXXXXX XXXXXXX XXXXXXXXXXXXXXXXXXXXXXX
"""

# convert to something like this ...
"""
A B C D E F G H
40 250 300 01.01.2005 13:43:21 250 12345678 KENTBANK
31 123 455 02.02.2005 11:22:43 450 tab CAPITALBANK
40 150 240 01.11.2005 17:41:21 50 12346678 CITYBANK
31 123 455 02.02.2005 11:22:43 tab 654474151 CITYBANK
"""
# save as resultfile.txt

# create raw_file.txt from str1 for testing
fout = open("raw_file.txt", "w")
fout.write(str1)
fout.close()

# read in raw_file.txt as list of lines/strings
fin = open("raw_file.txt", "r")
line_list1 = fin.readlines()
fin.close()

#print(line_list1)  # test

# process the list of lines
# give the new list proper header
line_list2 = ["A B C D E F G H\n"]
for line in line_list1:
    lead_char = line[0]
    # use only line starting with a number
    if lead_char.isdigit():
        print(line)  # test
        line_list2.append(line)

#print(line_list2)  # test

# convert processed list to string
str2 = ''.join(line_list2)

print(str2)  # test

# write the string to file
fout = open("result_file.txt", "w")
fout.write(str2)
fout.close()
Last edited by bumsfeld; Sep 24th, 2006 at 1:07 pm.

#4 canerkocamaz (Newbie Poster), Sep 24th, 2006
Thanks guys, it works for me. If some fields are empty, how can I set them to default values?

For example:
ID A1 A2 Date Time Sec. SID Bank
40 250 300 01.01.2005 13:43:21 250 12345678 KENTBANK
31 123 455 02.02.2005 11:22:43 450 XXX CAPITALBANK
40 150 240 01.11.2005 17:41:21 50 12346678 YYY
31 123 455 02.02.2005 11:22:43 ZZZ 654474151 CITYBANK

XXX, YYY and ZZZ mean empty (not given).
SID (default): 111111
Bank (default): NA
Sec. (default): 0

And all fields should be tab-separated, not space-separated.

#5 bumsfeld (Nearly a Posting Virtuoso), Sep 24th, 2006
Simply replace this part of the present code:
# process the list of lines
# give the new list proper header
#line_list2 = ["A B C D E F G H\n"]
str3 = "ID A1 A2 Date Time Sec. SID Bank\n"
# str.replace() returns a new string, so assign the result
str3 = str3.replace(" ", "\t")
#print(str3)  # test
line_list2 = [str3]
for line in line_list1:
    lead_char = line[0]
    # use only line starting with a number
    if lead_char.isdigit():
        # replace space with tab (assign the result back to line)
        line = line.replace(" ", "\t")
        #print(line)  # test
        line_list2.append(line)
You will have to explain the xxx, yyy, zzz and default values better. For instance, does rawfile.txt literally contain xxx, and do you want it replaced with 111111? You could do that with additional line = line.replace(old, new) statements.
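
A minimal sketch of that idea, assuming the raw file really does contain the literal tokens XXX, YYY and ZZZ (the thread has not confirmed this). The names DEFAULTS and fill_placeholders are made up for illustration. It swaps whole fields rather than calling replace() on the raw line, so a bank name that merely contains "XXX" is left alone:

```python
# Hypothetical placeholder tokens and their defaults, taken from post #4.
DEFAULTS = {"XXX": "111111", "YYY": "NA", "ZZZ": "0"}


def fill_placeholders(line):
    """Return the line, tab-separated, with placeholder fields defaulted."""
    fields = line.split()
    # Swap a field only when the whole field is a known placeholder.
    fields = [DEFAULTS.get(field, field) for field in fields]
    return "\t".join(fields) + "\n"


sample = "31 123 455 02.02.2005 11:22:43 450 XXX CAPITALBANK\n"
print(repr(fill_placeholders(sample)))
```

If the fields are simply absent from the raw file instead, this approach does not apply and the field count has to be inspected.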

#6 canerkocamaz (Newbie Poster), Sep 24th, 2006
OK, assume that column6, column7 and column8 below stand for missing values.
ID A1 A2 Date Time Sec. SID Bank
40 250 300 01.01.2005 13:43:21 250 12345678 KENTBANK
31 123 455 02.02.2005 11:22:43 450 column7 CAPITALBANK
40 150 240 01.11.2005 17:41:21 50 12346678 column8
31 123 455 02.02.2005 11:22:43 column6 654474151 CITYBANK

column6 (default): 0
column7 (default): 11111111
column8 (default): NA
And all lines should be tab-separated, not only the column names.

#7 canerkocamaz (Newbie Poster), Sep 24th, 2006
Additionally,
if the A1 or A2 column starts with "2", prefix it with 555.
A1: 250 becomes A1: 555250
ID A1 A2 Date Time Sec SID Bank
40 555250 300 01.01.2005 13:43:21 250 12345678 KENTBANK

*****************************************************
If the A1 or A2 column starts with "4", prefix it with 666.
ID A1 A2 Date Time Sec. SID Bank
31 123 666452 02.02.2005 11:22:43 450 column7 CAPITALBANK
31 123 666455 02.07.2005 14:22:43 column6 654474151 CITYBANK
*********************************************************
to be continued....

Note: I have decided to learn Python. Python is great! (I wonder about Python and database applications.)

#8 bumsfeld (Nearly a Posting Virtuoso), Sep 24th, 2006
Got to hurry! The waitress at the internet bistro wants to serve my meal.
Here is some more code; I hope it satisfies your needs:
# this would be rawfile.txt
str1 = """
NAME:XXXXXXXXXXXX
SURNAME:XXXXXXXXXXXX
DATE:23.09.2006
A B C D E F G H (column names)
40 250 300 01.01.2006 13:43:21 250 12345678 KENTBANK
31 123 455 02.02.2006 11:22:43 450 CAPITALBANK
.
.
.
.
PAGE 1

40 150 240 01.11.2006 17:41:21 50 12346678 XBANK
31 123 455 02.02.2006 11:22:43 654474151 YBANK
.
.
PAGE 2

40 250 240 01.11.2006 17:41:21 50 12346678
.
PAGE 3

.
.
PAGE 4

.
.
NOTES:XXXXXXX XXXXXXX XXXXXXXXXXXXXXXXXXXXXXX
"""

# convert to ...
"""
A B C D E F G H
40 250 300 01.01.2005 13:43:21 250 12345678 KENTBANK
31 123 455 02.02.2005 11:22:43 450 tab CAPITALBANK
40 150 240 01.11.2005 17:41:21 50 12346678 CITYBANK
31 123 455 02.02.2005 11:22:43 tab 654474151 CITYBANK
"""
# save as resultfile.txt

# create raw_file.txt from str1 for testing
fout = open("raw_file.txt", "w")
fout.write(str1)
fout.close()

# read in raw_file.txt as list of lines/strings
fin = open("raw_file.txt", "r")
line_list1 = fin.readlines()
fin.close()

#print(line_list1)  # test

def sub_missing(line):
    """take string line and sub for certain missing items"""
    # convert string to list
    list1 = line.split()
    # if list1[1] (column A1) starts with 2 prefix with 555
    if list1[1].startswith('2'):
        list1[1] = "555" + list1[1]
    # ditto for column A2
    if list1[2].startswith('2'):
        list1[2] = "555" + list1[2]
    # if list1[1] (column A1) starts with 4 prefix with 666
    if list1[1].startswith('4'):
        list1[1] = "666" + list1[1]
    # ditto for column A2
    if list1[2].startswith('4'):
        list1[2] = "666" + list1[2]
    # check if item 6 is a number
    if not list1[6].isdigit():
        # item 5 of list1 would be Sec. or SID
        val = int(list1[5])
        # assume that sec value is < 1000
        if val < 1000:
            # replace missing SID with "111111"
            list1.insert(6, "111111")
        else:
            # replace missing Sec with 0
            list1.insert(5, "0")
    elif len(list1) < 8:
        # case of the missing bank name
        list1.append("NA")
    # convert list to string again, separated by tabs
    str1 = "\t".join(list1)
    return str1 + '\n'

# process the list of lines
# give the new list proper header
#line_list2 = ["A B C D E F G H\n"]
str3 = "ID A1 A2 Date Time Sec SID Bank\n"
# str.replace() returns a new string, so assign the result
str3 = str3.replace(" ", "\t")
#print(str3)  # test
line_list2 = [str3]
for line in line_list1:
    lead_char = line[0]
    # use only line starting with a number
    if lead_char.isdigit():
        # replace space with tab (assign the result back to line)
        line = line.replace(" ", "\t")
        # replace certain missing data items
        line = sub_missing(line)
        #print(line)  # test
        line_list2.append(line)

#print(line_list2)  # test

# convert processed list to string
str2 = ''.join(line_list2)

print(str2)  # test

# write the string to file
fout = open("result_file.txt", "w")
fout.write(str2)
fout.close()
Luckily, Python makes it easy. So far it has been a brain-teaser; when it starts to feel like work I will stop!
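
Since post #7 hints that more prefix rules may follow, the 555/666 rules could also be written as a lookup table. This is only a sketch of an alternative layout, not the code posted above; PREFIX_RULES and add_prefix are hypothetical names, and the sample row is made up from values in this thread:

```python
# Hypothetical rule table: leading digit -> prefix to insert (from post #7).
PREFIX_RULES = {"2": "555", "4": "666"}


def add_prefix(value):
    """Return value with the prefix for its leading digit, if there is one."""
    return PREFIX_RULES.get(value[:1], "") + value


fields = "40 250 455 01.01.2005 13:43:21 250 12345678 KENTBANK".split()
# Only columns A1 and A2 (indexes 1 and 2) are subject to the rules.
for index in (1, 2):
    fields[index] = add_prefix(fields[index])

print("\t".join(fields))
```

A new rule then becomes one more dictionary entry instead of two more if statements.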
Last edited by bumsfeld; Sep 24th, 2006 at 5:12 pm.

#9 canerkocamaz (Newbie Poster), Sep 24th, 2006
1. I have tried the code. It runs very well. How can I pass the file name as an argument?

My Python file is bank.py.

I want to call it with an argument:

bank.py <filename>


2. line.replace(" ","\t") doesn't work. I am trying to separate all lines/columns with tabs, but I couldn't.
3. And bon appétit!
Last edited by canerkocamaz; Sep 24th, 2006 at 5:33 pm.

#10 vegaseat (Moderator), Sep 25th, 2006
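The function line.replace(" ","\t") does work, but strings are immutable, so it returns a new string instead of changing line. Assign the result back with line = line.replace(" ","\t"). How the result looks also depends on how many spaces a tab is set to in your editor.

You could use double tabs, as in line = line.replace(" ","\t\t").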
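
A small sketch of the likely cause, using a sample line from this thread: str.replace() returns a new string and leaves the original untouched, so the result has to be assigned back. The variable name unchanged is only for illustration.

```python
line = "40 250 300 01.01.2006 13:43:21 250 12345678 KENTBANK\n"

# Calling replace() without using its return value leaves line as it was.
line.replace(" ", "\t")
unchanged = line

# Assigning the result back is what actually swaps spaces for tabs.
line = line.replace(" ", "\t")

# repr() makes the tabs visible as \t regardless of editor tab width.
print(repr(unchanged))
print(repr(line))
```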

To add a command-line argument, change this part of the code ...
# read in raw_file.txt as list of lines/strings
fin = open("raw_file.txt", "r")
line_list1 = fin.readlines()
fin.close()
to this ...
# use command-line argument for filename
# usage e.g. bank.py myfile.txt
import sys
import time

if len(sys.argv) > 1:
    filename = sys.argv[1]
else:
    # give it a default filename
    filename = "raw_file.txt"

# read in data file as list of lines/strings
try:
    fin = open(filename, "r")
    line_list1 = fin.readlines()
    fin.close()
    print("Successfully opened file", filename)
except IOError:
    # "\a" rings the terminal bell
    print("\a Could not find file", filename)
    time.sleep(3)
    sys.exit(1)
On your question about Python and databases, there are many modules available to make Python interface with the most common databases. You just have to google for them.
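
As one hedged illustration, Python's standard library ships the sqlite3 module, so a cleaned row could be stored without installing anything. The table name, the column names and the in-memory database below are assumptions made for this sketch, not something specified in the thread:

```python
import sqlite3

# One already-cleaned, tab-separated row (sample values from this thread).
row = "40\t555250\t300\t01.01.2006\t13:43:21\t250\t12345678\tKENTBANK\n"

# An in-memory database keeps the sketch self-contained; pass a file
# name instead of ":memory:" to keep the data on disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions "
    "(id TEXT, a1 TEXT, a2 TEXT, txn_date TEXT, txn_time TEXT, "
    "sec TEXT, sid TEXT, bank TEXT)"
)

fields = row.rstrip("\n").split("\t")
# The ? placeholders let sqlite3 handle quoting of the values.
conn.execute("INSERT INTO transactions VALUES (?, ?, ?, ?, ?, ?, ?, ?)", fields)
conn.commit()

banks = [record[0] for record in conn.execute("SELECT bank FROM transactions")]
print(banks)
conn.close()
```

Other database systems need their own driver modules, but most follow the same connect, execute, commit pattern.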
Last edited by vegaseat; Sep 25th, 2006 at 12:36 am.
May 'the Google' be with you!

©2003 - 2009 DaniWeb® LLC