Sep 24th, 2006

python newbie help

I am a newbie and have a problem. I have a text file (rawfile.txt) like the one below.

NAMEXXXXXXXXXXX
SURNAMEXXXXXXXXXXX
DATE:23.09.2006
A B C D E F G H (column names)
40 250 300 01.01.2006 13:43:21 250 12345678 KENTBANK
31 123 455 02.02.2006 11:22:43 450 CAPITALBANK
.
.
.
.
PAGE 1

40 150 240 01.11.2006 17:41:21 50 12346678 XBANK
31 123 455 02.02.2006 11:22:43 654474151 YBANK
.
.
.
.
PAGE 2
.
PAGE 3
.
.
PAGE 4
.
.
NOTESXXXXXX XXXXXXX XXXXXXXXXXXXXXXXXXXXXXX


I want to convert it to the format below (tab separated) and save it as resultfile.txt.



A B C D E F G H
40 250 300 01.01.2005 13:43:21 250 12345678 KENTBANK
31 123 455 02.02.2005 11:22:43 450 tab CAPITALBANK
40 150 240 01.11.2005 17:41:21 50 12346678 CITYBANK
31 123 455 02.02.2005 11:22:43 tab 654474151 CITYBANK

How can I do this? Thanks.
canerkocamaz

Sep 24th, 2006

Re: python newbie help

Do all the lines you want end in "BANK"?
Last edited by ghostdog74; Sep 24th, 2006 at 10:23 am.
ghostdog74

Sep 24th, 2006

Re: python newbie help

If the lines you want, other than the header, start with a number, here is an easy solution:

# this would be rawfile.txt
str1 = """
NAME:XXXXXXXXXXXX
SURNAME:XXXXXXXXXXXX
DATE:23.09.2006
A B C D E F G H (column names)
40 250 300 01.01.2006 13:43:21 250 12345678 KENTBANK
31 123 455 02.02.2006 11:22:43 450 CAPITALBANK
.
.
.
.
PAGE 1

40 150 240 01.11.2006 17:41:21 50 12346678 XBANK
31 123 455 02.02.2006 11:22:43 654474151 YBANK
.
.
.
.
PAGE 2

.
.
PAGE 3

.
.
PAGE 4

.
.
NOTES:XXXXXXX XXXXXXX XXXXXXXXXXXXXXXXXXXXXXX
"""

# convert to something like this ...
"""
A B C D E F G H
40 250 300 01.01.2005 13:43:21 250 12345678 KENTBANK
31 123 455 02.02.2005 11:22:43 450 tab CAPITALBANK
40 150 240 01.11.2005 17:41:21 50 12346678 CITYBANK
31 123 455 02.02.2005 11:22:43 tab 654474151 CITYBANK
"""
# save as resultfile.txt

# create raw_file.txt from str1 for testing
fout = open("raw_file.txt", "w")
fout.write(str1)
fout.close()

# read in raw_file.txt as list of lines/strings
fin = open("raw_file.txt", "r")
line_list1 = fin.readlines()
fin.close()

#print(line_list1) # test

# process the list of lines
# give the new list proper header
line_list2 = ["A B C D E F G H\n"]
for line in line_list1:
    # every line from readlines() has at least the newline character
    lead_char = line[0]
    # use only line starting with a number
    if lead_char.isdigit():
        print(line) # test
        line_list2.append(line)

#print(line_list2) # test

# convert processed list to string
str2 = ''.join(line_list2)

print(str2) # test

# write the string to file
fout = open("result_file.txt", "w")
fout.write(str2)
fout.close()
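The filtering step on its own can also be sketched as a small function that works on a list of lines, which makes it easy to try without touching any files. This is only an illustration in Python 3 syntax; the name keep_data_lines and the sample list are made up here and are not part of the code above.

```python
def keep_data_lines(lines, header="A B C D E F G H"):
    """Return the header plus every line whose first character is a digit."""
    result = [header]
    for line in lines:
        # line[:1] is an empty string for an empty line, so this never raises
        if line[:1].isdigit():
            result.append(line.rstrip("\n"))
    return result


# a shortened, made-up sample in the shape of rawfile.txt
sample = [
    "NAME:XXXXXXXXXXXX\n",
    "A B C D E F G H (column names)\n",
    "40 250 300 01.01.2006 13:43:21 250 12345678 KENTBANK\n",
    ".\n",
    "PAGE 1\n",
    "\n",
    "31 123 455 02.02.2006 11:22:43 654474151 YBANK\n",
]

for row in keep_data_lines(sample):
    print(row)
```

Header lines, the dots, the PAGE markers and blank lines are all dropped because none of them starts with a digit.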
Last edited by bumsfeld; Sep 24th, 2006 at 1:07 pm.
bumsfeld

Sep 24th, 2006

Re: python newbie help

Thanks guys, that works for me. If some fields are empty, how can I set them to default values?

For example:
ID A1 A2 Date Time Sec. SID Bank
40 250 300 01.01.2005 13:43:21 250 12345678 KENTBANK
31 123 455 02.02.2005 11:22:43 450 XXX CAPITALBANK
40 150 240 01.11.2005 17:41:21 50 12346678 YYY
31 123 455 02.02.2005 11:22:43 ZZZ 654474151 CITYBANK

XXX, YYY, ZZZ mean empty (not indicated).
SID(default):111111
Bank(default):NA
Sec(default):0

And all fields should be tab separated, not space separated.
canerkocamaz

Sep 24th, 2006

Re: python newbie help

Simply replace this part of the present code:

# process the list of lines
# give the new list proper header
#line_list2 = ["A B C D E F G H\n"]
str3 = "ID A1 A2 Date Time Sec SID Bank\n"
# replace() returns a new string, so the result has to be assigned
str3 = str3.replace(" ", "\t")
#print(str3) # test
line_list2 = [str3]
for line in line_list1:
    lead_char = line[0]
    # use only line starting with a number
    if lead_char.isdigit():
        # replace space with tab
        line = line.replace(" ", "\t")
        #print(line) # test
        line_list2.append(line)
You will have to explain the xxx, yyy, zzz and default values better. For instance, is xxx contained in rawfile.txt, and do you want it replaced with 111111? You could do that with additional line = line.replace(old, new) statements.
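If the placeholders really do appear literally in the file, the substitution could look roughly like the sketch below. That is an assumption, since the thread has not settled it yet. The sketch replaces whole fields rather than substrings, so a bank name that merely contains "XXX" would be left alone. The names DEFAULTS and fill_placeholders are invented for this example, and the mapping follows the defaults listed earlier (SID 111111, Bank NA, Sec 0). Python 3 syntax.

```python
# Assumption: XXX, YYY and ZZZ appear literally in the data as stand-ins
# for a missing SID, Bank and Sec value respectively.
DEFAULTS = {"XXX": "111111", "YYY": "NA", "ZZZ": "0"}


def fill_placeholders(line, defaults=DEFAULTS):
    """Swap placeholder fields for their defaults and join with tabs."""
    fields = line.split()
    # dict.get(f, f) returns the default for a placeholder, else the field itself
    fields = [defaults.get(f, f) for f in fields]
    return "\t".join(fields)


print(fill_placeholders("31 123 455 02.02.2005 11:22:43 450 XXX CAPITALBANK"))
print(fill_placeholders("40 150 240 01.11.2005 17:41:21 50 12346678 YYY"))
print(fill_placeholders("31 123 455 02.02.2005 11:22:43 ZZZ 654474151 CITYBANK"))
```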
bumsfeld

Sep 24th, 2006

Re: python newbie help

OK, assume that columns 6, 7 and 8 are variable.
ID A1 A2 Date Time Sec. SID Bank
40 250 300 01.01.2005 13:43:21 250 12345678 KENTBANK
31 123 455 02.02.2005 11:22:43 450 column7 CAPITALBANK
40 150 240 01.11.2005 17:41:21 50 12346678 column8
31 123 455 02.02.2005 11:22:43 column6 654474151 CITYBANK

column6(default):0
column7(default):11111111
column8(default):NA
And all lines should be separated with tabs, not only the column names.
canerkocamaz

Sep 24th, 2006

Re: python newbie help

Additionally,
if the A1 or A2 column starts with "2", insert 555.
A1:250 then A1:555250
ID A1 A2 Date Time Sec SID Bank
40 555250 300 01.01.2005 13:43:21 250 12345678 KENTBANK

*****************************************************
if A1 or A2 columns start with "4", insert 666.
ID A1 A2 Date Time Sec. SID Bank
31 123 666452 02.02.2005 11:22:43 450 column7 CAPITALBANK
31 123 666455 02.07.2005 14:22:43 column6 654474151 CITYBANK
*********************************************************
to be continued....

Note: I have decided to learn Python. Python is great! (I am curious about Python and database applications.)
canerkocamaz

Sep 24th, 2006

Re: python newbie help

Got to hurry! The waitress at the internet bistro wants to serve my meal.
Here is some more code, hope that satisfies your needs:
# this would be rawfile.txt
str1 = """
NAME:XXXXXXXXXXXX
SURNAME:XXXXXXXXXXXX
DATE:23.09.2006
A B C D E F G H (column names)
40 250 300 01.01.2006 13:43:21 250 12345678 KENTBANK
31 123 455 02.02.2006 11:22:43 450 CAPITALBANK
.
.
.
.
PAGE 1

40 150 240 01.11.2006 17:41:21 50 12346678 XBANK
31 123 455 02.02.2006 11:22:43 654474151 YBANK
.
.
PAGE 2

40 250 240 01.11.2006 17:41:21 50 12346678
.
PAGE 3

.
.
PAGE 4

.
.
NOTES:XXXXXXX XXXXXXX XXXXXXXXXXXXXXXXXXXXXXX
"""

# convert to ...
"""
A B C D E F G H
40 250 300 01.01.2005 13:43:21 250 12345678 KENTBANK
31 123 455 02.02.2005 11:22:43 450 tab CAPITALBANK
40 150 240 01.11.2005 17:41:21 50 12346678 CITYBANK
31 123 455 02.02.2005 11:22:43 tab 654474151 CITYBANK
"""
# save as resultfile.txt

# create raw_file.txt from str1 for testing
fout = open("raw_file.txt", "w")
fout.write(str1)
fout.close()

# read in raw_file.txt as list of lines/strings
fin = open("raw_file.txt", "r")
line_list1 = fin.readlines()
fin.close()

#print(line_list1) # test

def sub_missing(line):
    """take string line and sub for certain missing items"""
    # convert string to list (assumes at most one item is missing)
    list1 = line.split()
    # if list1[1] (column A1) starts with 2 prefix with 555
    if list1[1].startswith('2'):
        list1[1] = "555" + list1[1]
    # ditto for column A2
    if list1[2].startswith('2'):
        list1[2] = "555" + list1[2]
    # if list1[1] (column A1) starts with 4 prefix with 666
    if list1[1].startswith('4'):
        list1[1] = "666" + list1[1]
    # ditto for column A2
    if list1[2].startswith('4'):
        list1[2] = "666" + list1[2]
    # check if item 6 is a number
    if not list1[6].isdigit():
        # item 5 of list1 would be Sec. or SID
        val = int(list1[5])
        # assume that sec value is < 1000
        if val < 1000:
            # replace missing SID with "111111"
            list1.insert(6, "111111")
        else:
            # replace missing Sec with 0
            list1.insert(5, "0")
    elif len(list1) < 8:
        # case of the missing bank name
        list1.append("NA")
    # convert list to string again, separated by tabs
    new_line = "\t".join(list1)
    return new_line + '\n'

# process the list of lines
# give the new list proper header
#line_list2 = ["A B C D E F G H\n"]
str3 = "ID A1 A2 Date Time Sec SID Bank\n"
# replace() returns a new string, so the result has to be assigned
str3 = str3.replace(" ", "\t")
#print(str3) # test
line_list2 = [str3]
for line in line_list1:
    lead_char = line[0]
    # use only line starting with a number
    if lead_char.isdigit():
        # replace certain missing data items;
        # sub_missing() also rejoins the items with tabs
        line = sub_missing(line)
        #print(line) # test
        line_list2.append(line)

#print(line_list2) # test

# convert processed list to string
str2 = ''.join(line_list2)

print(str2) # test

# write the string to file
fout = open("result_file.txt", "w")
fout.write(str2)
fout.close()
Luckily, Python makes it easy. So far it has been a brain-teaser, when it gets to be work I will stop!
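The prefix rules (A1 or A2 starting with 2 gets 555 in front, starting with 4 gets 666) can also be written as a small lookup table, which may be easier to extend if more rules turn up. This is just a sketch in Python 3 syntax; PREFIX_RULES, add_prefix and prefix_columns are names invented for this example.

```python
# Hypothetical helper names; the two rules themselves come from the thread.
PREFIX_RULES = {"2": "555", "4": "666"}


def add_prefix(value, rules=PREFIX_RULES):
    """Put the matching prefix in front of value, or return it unchanged."""
    # value[:1] is the first character, or "" if value is empty
    return rules.get(value[:1], "") + value


def prefix_columns(fields, columns=(1, 2)):
    """Return a copy of fields with the rule applied to A1 and A2 only."""
    result = list(fields)
    for i in columns:
        result[i] = add_prefix(result[i])
    return result


row = "40 250 300 01.01.2005 13:43:21 250 12345678 KENTBANK".split()
print("\t".join(prefix_columns(row)))
```

The ID column (index 0) is left alone even when it starts with 4, because only the listed column positions are touched.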
Last edited by bumsfeld; Sep 24th, 2006 at 5:12 pm.
bumsfeld

Sep 24th, 2006

Re: python newbie help

1. I have tried the code. It runs very well. How can I pass the file name as an argument?

My Python file is bank.py.

I want to call it with an argument:

bank.py <filename>


2. line.replace(" ","\t") doesn't work. I am trying to separate all lines/columns with tabs, but I couldn't.
3. And bon appétit.
Last edited by canerkocamaz; Sep 24th, 2006 at 5:33 pm.
canerkocamaz

Sep 25th, 2006

Re: python newbie help

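The call line.replace(" ","\t") does work, but strings are immutable in Python: replace() returns a new string and leaves line unchanged, so the result has to be assigned back, as in line = line.replace(" ","\t"). How the result looks also depends on how many spaces a tab is set to in your editor.

You could use double tabs like line = line.replace(" ","\t\t").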
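To see the difference between calling replace() and keeping its result, here is a small sketch (Python 3 syntax; the sample line is made up for illustration). It also shows split() and join() as one way to collapse runs of spaces into single tabs, which replace() alone does not do.

```python
line = "40 250  300 01.01.2005"   # note the two spaces after 250

# The return value is discarded here, so line itself is not changed
line.replace(" ", "\t")
unchanged = line

# Keeping the return value gives the converted string;
# the double space becomes two tabs
replaced = line.replace(" ", "\t")

# split() with no argument splits on any run of whitespace,
# so joining the pieces gives exactly one tab between items
tabbed = "\t".join(line.split())

print(repr(unchanged))
print(repr(replaced))
print(repr(tabbed))
```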

To add a commandline argument, change this part of the code ...
# read in raw_file.txt as list of lines/strings
fin = open("raw_file.txt", "r")
line_list1 = fin.readlines()
fin.close()
to this ...
# use commandline argument for filename
# usage eg. Bank.py myfile.txt
import sys, time
if len(sys.argv) > 1:
    filename = sys.argv[1]
else:
    # give it a default filename
    filename = "raw_file.txt"

# read in data file as list of lines/strings
try:
    fin = open(filename, "r")
    line_list1 = fin.readlines()
    fin.close()
    print("Successfully opened file %s" % filename)
except IOError:
    # "\a" sounds the terminal bell
    print("\a Could not find file %s" % filename)
    time.sleep(3)
    sys.exit(1)
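The same idea can be split into two small functions so that each piece can be tried separately. This is a sketch in Python 3 syntax, not a replacement for the code above; pick_filename and read_lines are names made up for this example, and the with statement used here closes the file automatically.

```python
import sys


def pick_filename(argv, default="raw_file.txt"):
    """Return the first command-line argument, or the default if none was given."""
    return argv[1] if len(argv) > 1 else default


def read_lines(filename):
    """Return the file's lines as a list, or None if the file cannot be opened."""
    try:
        with open(filename, "r") as fin:
            return fin.readlines()
    except IOError:
        return None


# In a script, sys.argv would be passed in:
#     line_list1 = read_lines(pick_filename(sys.argv))
print(pick_filename(["bank.py", "myfile.txt"]))
print(pick_filename(["bank.py"]))
```

Returning None instead of exiting leaves it to the caller to decide what to do when the file is missing.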
On your question about Python and databases, there are many modules available to make Python interface with the most common databases. You just have to google for them.
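As one possible starting point, Python 2.5 and later ship with the sqlite3 module, so no extra download is needed for a first experiment. The sketch below (Python 3 syntax) loads two rows shaped like the data in this thread into an in-memory database; the table name, the column names and the sample rows are assumptions made for this illustration.

```python
import sqlite3

# Two sample rows in the shape of the cleaned-up data (illustrative values)
rows = [
    ("40", "250", "300", "01.01.2005", "13:43:21", "250", "12345678", "KENTBANK"),
    ("31", "123", "455", "02.02.2005", "11:22:43", "450", "111111", "CAPITALBANK"),
]

# ":memory:" keeps the database in RAM; a file name would store it on disk
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE records "
    "(id TEXT, a1 TEXT, a2 TEXT, rec_date TEXT, rec_time TEXT, "
    "sec TEXT, sid TEXT, bank TEXT)"
)
# the ? placeholders let sqlite3 handle quoting of the values
conn.executemany("INSERT INTO records VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
conn.commit()

banks = [r[0] for r in conn.execute("SELECT bank FROM records ORDER BY bank")]
print(banks)
conn.close()
```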
Last edited by vegaseat; Sep 25th, 2006 at 12:36 am.
vegaseat

This thread is solved
