i am newbie and have a problem. i have a text file(rawfile.txt) like below.

NAME:XXXXXXXXXXXX
SURNAME:XXXXXXXXXXXX
DATE:23.09.2006
A B C D E F G H (column names)
40 250 300 01.01.2006 13:43:21 250 12345678 KENTBANK
31 123 455 02.02.2006 11:22:43 450 CAPITALBANK
.
.
.
.
PAGE 1

40 150 240 01.11.2006 17:41:21 50 12346678 XBANK
31 123 455 02.02.2006 11:22:43 654474151 YBANK
.
.
.
.
PAGE 2
.
PAGE 3
.
.
PAGE 4
.
.
NOTES:XXXXXXX XXXXXXX XXXXXXXXXXXXXXXXXXXXXXX


i want to convert format below(tab separeted) and save (resultfile.txt).

A B C D E F G H
40 250 300 01.01.2005 13:43:21 250 12345678 KENTBANK
31 123 455 02.02.2005 11:22:43 450 tab CAPITALBANK
40 150 240 01.11.2005 17:41:21 50 12346678 CITYBANK
31 123 455 02.02.2005 11:22:43 tab 654474151 CITYBANK

how can i do this? Thanks.

Does all the lines you want end in "BANK"?

If the lines you want start with number, other than header, here is easy solution:

# this would be rawfile.txt
str1 = """
NAME:XXXXXXXXXXXX
SURNAME:XXXXXXXXXXXX
DATE:23.09.2006
A B C D E F G H (column names)
40 250 300 01.01.2006 13:43:21 250 12345678 KENTBANK
31 123 455 02.02.2006 11:22:43 450 CAPITALBANK
.
.
.
.
PAGE 1

40 150 240 01.11.2006 17:41:21 50 12346678 XBANK
31 123 455 02.02.2006 11:22:43 654474151 YBANK
.
.
.
.
PAGE 2

.
.
PAGE 3

.
.
PAGE 4

.
.
NOTES:XXXXXXX XXXXXXX XXXXXXXXXXXXXXXXXXXXXXX
"""

# convert to something like this ...
"""
A B C D E F G H
40 250 300 01.01.2005 13:43:21 250 12345678 KENTBANK
31 123 455 02.02.2005 11:22:43 450 tab CAPITALBANK
40 150 240 01.11.2005 17:41:21 50 12346678 CITYBANK
31 123 455 02.02.2005 11:22:43 tab 654474151 CITYBANK
"""
# save as resultfile.txt

# create raw_file.txt from str1 for testing
fout = open("raw_file.txt", "w")
fout.write(str1)
fout.close()

# read in raw_file.txt as list of lines/strings
fin = open("raw_file.txt", "r")
line_list1 = fin.readlines()
fin.close()

#print line_list1  # test

# process the list of lines
# give the new list proper header
line_list2 = ["A B C D E F G H\n"]
for line in line_list1:
    lead_char = line[0]
    # use only line starting with a number
    if lead_char.isdigit():
        print line  # test
        line_list2.append(line)

#print line_list2  # test

# convert processed list to string
str2 = ''.join(line_list2)

print str2  # test

# write the string to file
fout = open("result_file.txt", "w")
fout.write(str2)
fout.close()

thanks guys...it's good for me. if some lines are empty, how can i set them default valus?

for example:
ID A1 A2 Date Time Sec.SID Bank
40 250 300 01.01.2005 13:43:21 250 12345678 KENTBANK
31 123 455 02.02.2005 11:22:43 450 XXX CAPITALBANK
40 150 240 01.11.2005 17:41:21 50 12346678 YYY
31 123 455 02.02.2005 11:22:43 ZZZ 654474151 CITYBANK

XXX,YYY,ZZZ means empty (not indicated).
SID(default):111111
Bank(default):NA
Sec(default):0

and all fields should be tab separeted, not space.

Simply replace this part of present code:

# process the list of lines
# give the new list proper header
#line_list2 = ["A B C D E F G H\n"]
str3 = "ID A1 A2 Date Time Sec.SID Bank\n"
str3.replace(" ", "\t")
#print str3  # test
line_list2 = [str3] 
for line in line_list1:
    lead_char = line[0]
    # use only line starting with a number
    if lead_char.isdigit():
        # replace space with tab
        line.replace(" ", "\t")
        #print line  # test
        line_list2.append(line)

The thing with xxx, yyy, zzz and default values you have to explain better. For instance, is xxx contained in rawfile.txt and you want it replaced with 111111? You could do that with additional line.replace(what, with) statements.

ok...assume that column6-7-8 are variables.
ID A1 A2 Date Time Sec.SID Bank
40 250 300 01.01.2005 13:43:21 250 12345678 KENTBANK
31 123 455 02.02.2005 11:22:43 450 column7 CAPITALBANK
40 150 240 01.11.2005 17:41:21 50 12346678 column8
31 123 455 02.02.2005 11:22:43 column6 654474151 CITYBANK

column6(default):0
column7(default):11111111
column8(default):NA
and all lines should be separeted with tab, not only line names.

additionally,
if A1 or A2 columns start with "2", insert 555.
A1:250 then A1:555240
ID A1 A2 Date Time Sec SID Bank
40 555250 300 01.01.2005 13:43:21 250 12345678 KENTBANK

*****************************************************
if A1 or A2 columns start with "4", insert 666.
ID A1 A2 Date Time Sec. SID Bank
31 123 666452 02.02.2005 11:22:43 450 column7 CAPITALBANK
31 123 666455 02.07.2005 14:22:43 column6 654474151 CITYBANK
*********************************************************
to be continued....

note:i have decided to learn Python. Python,great! (i wonder python & database applications)

Got to hurry! The waitress at the internet bistro wants to serve my meal.
Here is some more code, hope that satisfies your needs:

# this would be rawfile.txt
str1 = """
NAME:XXXXXXXXXXXX
SURNAME:XXXXXXXXXXXX
DATE:23.09.2006
A B C D E F G H (column names)
40 250 300 01.01.2006 13:43:21 250 12345678 KENTBANK
31 123 455 02.02.2006 11:22:43 450 CAPITALBANK
.
.
.
.
PAGE 1

40 150 240 01.11.2006 17:41:21 50 12346678 XBANK
31 123 455 02.02.2006 11:22:43 654474151 YBANK
.
.
PAGE 2

40 250 240 01.11.2006 17:41:21 50 12346678
.
PAGE 3

.
.
PAGE 4

.
.
NOTES:XXXXXXX XXXXXXX XXXXXXXXXXXXXXXXXXXXXXX
"""

# convert to ...
"""
A B C D E F G H
40 250 300 01.01.2005 13:43:21 250 12345678 KENTBANK
31 123 455 02.02.2005 11:22:43 450 tab CAPITALBANK
40 150 240 01.11.2005 17:41:21 50 12346678 CITYBANK
31 123 455 02.02.2005 11:22:43 tab 654474151 CITYBANK
"""
# save as resultfile.txt

# create raw_file.txt from str1 for testing
fout = open("raw_file.txt", "w")
fout.write(str1)
fout.close()

# read in raw_file.txt as list of lines/strings
fin = open("raw_file.txt", "r")
line_list1 = fin.readlines()
fin.close()

#print line_list1  # test

def sub_missing(line):
    """take string line and sub for certain missing items"""
    # convert string to list
    list1 = line.split()
    # if list1[1] (column A1) starts with 2 prefix with 555
    if list1[1].startswith('2'):
        list1[1] = "555" + list1[1]
    # dito for column A2
    if list1[2].startswith('2'):
        list1[2] = "555" + list1[2]
    # if list1[1] (column A1) starts with 4 prefix with 666
    if list1[1].startswith('4'):
        list1[1] = "666" + list1[1]
    # dito for column A2
    if list1[2].startswith('4'):
        list1[2] = "666" + list1[2]
    # check if item 6 is a number
    if not list1[6].isdigit():
        # item 5 of list1 would be Sec. or SID
        val = int(list1[5])
        # assume that sec value is < 1000
        if val < 1000:   
            # replace missing SID with "111111"
            list1.insert(6, "111111")
        else:
            # replace missing Sec with 0
            list1.insert(5, "0")
    elif len(list1) < 8:
        # case of the missing bank name
        list1.append("NA")
    # convert list to string again, separated by tabs
    str1 = "\t".join(list1)
    return str1 + '\n'

# process the list of lines
# give the new list proper header
#line_list2 = ["A B C D E F G H\n"]
str3 = "ID A1 A2 Date Time Sec SID Bank\n"
str3.replace(" ", "\t")
#print str3  # test
line_list2 = [str3] 
for line in line_list1:
    lead_char = line[0]
    # use only line starting with a number
    if lead_char.isdigit():
        # replace space with tab
        line.replace(" ", "\t")
        # replace certain missing data items
        line = sub_missing(line)
        #print line  # test
        line_list2.append(line)

#print line_list2  # test

# convert processed list to string
str2 = ''.join(line_list2)

print str2  # test

# write the string to file
fout = open("result_file.txt", "w")
fout.write(str2)
fout.close()

Luckily, Python makes it easy. So far it has been a brain-teaser, when it gets to be work I will stop!

1. i have tried codes. it runs vey well. how can i use files argument?

my python file bank.py

i call file using arguments

bank.py <filename>


2. line.replace(" ","\t") doesnt work. i am trying separete all lines/columns with tab but i couldnt.
3. and bon appetite :)

Function line.replace(" ","\t") works very well, but it may depend on how many spaces your tab is set at in your editor, as you look at the result.

You could use double tabs like line.replace(" ","\t\t").

To add a commandline argument, change this part of the code ...

# read in raw_file.txt as list of lines/strings
fin = open("raw_file.txt", "r")
line_list1 = fin.readlines()
fin.close()

to this ...

# use commandline argument for filename
# usage eg.  Bank.py myfile.txt
import sys, time
if len(sys.argv) > 1:
    filename = sys.argv[1]
else:
    # give it a default filename
    filename = "raw_file.txt"

# read in data file as list of lines/strings
try:
    fin = open(filename, "r")
    line_list1 = fin.readlines()
    fin.close()
    print "Successfully opened file", filename
except IOError:
    print "\a Could not find file", filename
    time.sleep(3)
    sys.exit(1)

On your question about Python and databases, there are many modules available to make Python interface with the most common databases. You just have to google for them.

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.