Hello, I am new to this.

What I am trying to do is take a .txt file and write parts of it to different lists. The file contains lines of the form FCODE,DESCRIPTION, such as this:
DAAS44,AIRSTRIP region ruin/inactive/abandoned

I would like to split each line into FCODE and DESCRIPTION strings and append them to FCODE and DESCRIPTION lists so that I can retrieve entries in those lists later in the program.

I am reading the text file as follows:

infile = open(name, "r")
lines = infile.readlines()

This gives me a list where each line seems to be a string.

## create a regular expression to find all strings beginning with RR
## these represent road feature codes.
reobj = re.compile("^RR[A-Z]")
## set up lists for the fcode and description
fcode=[]
desc=[]
for line in lines:
    if re.search(reobj,line):
        fcline=line.split(",")
        fcode=fcline[0]
        desc=fcline[1]
        print desc
        fcode=re.split("[\b]",fcode)

What I am trying to do is split the fcode and the descriptions and get strings that each have a unique index value. While splitting the fcode and descriptions seems to work, the resulting lists only have index values 0 and 1, so I can't retrieve individual strings by index value.

I thought the problem might be caused by reading in the txt file all at once rather than one line at a time, but the indexing seems to work and I can take a slice of more than just [0:1], so the problem has to be with the way I'm using the search regular expression.
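
One likely reading of the symptom, sketched here with made-up sample lines: inside the loop, fcode=fcline[0] and desc=fcline[1] rebind the names that were meant to hold the lists, so nothing accumulates, and fcline itself only ever has two items (indexes 0 and 1). Appending to separately named lists keeps one entry per matching line. The sample data and the names fcodes and descs are assumptions for illustration, not part of the original program.

```python
import re

# Sample lines standing in for infile.readlines(); the RR codes are made up.
lines = [
    "DAAS44,AIRSTRIP region ruin/inactive/abandoned\n",
    "RRAB10,ROAD paved two lanes\n",
    "RRCD20,ROAD unpaved\n",
]

reobj = re.compile("^RR[A-Z]")

fcodes = []  # one feature code per matching line
descs = []   # the matching descriptions, in the same order

for line in lines:
    if reobj.search(line):
        # Split on the first comma only, in case the description has commas.
        fcode, desc = line.rstrip("\n").split(",", 1)
        fcodes.append(fcode)  # append instead of rebinding the list name
        descs.append(desc)

print(fcodes[1])  # RRCD20
print(descs[1])   # ROAD unpaved
```

With this shape, fcodes[i] and descs[i] always refer to the same line of the file.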

Another thing I had considered doing beyond this is splitting the description into uppercase and lowercase sections and adding each section to its own list; however, I do not know the regular expression to separate uppercase words from lowercase words and numbers.

All 5 Replies

Your program seems too complicated for what you are trying to do. You could split the lines on the first "," without regular expressions:

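def rr_lines(lines):
    # Yield only the lines whose feature code starts with "RR".
    for line in lines:
        if line.startswith("RR"):
            yield line

def splitted_lines(lines):
    # Yield (fcode, desc) pairs, splitting on the first comma only.
    for line in rr_lines(lines):
        # Drop the trailing newline so it does not end up in desc.
        fcode, desc = line.rstrip("\n").split(",", 1)
        yield (fcode, desc)

if __name__ == "__main__":
    name = "filename.txt"
    infile = open(name, "r")
    for item in splitted_lines(infile):
        print(item)
    infile.close()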

OK, that works well for splitting, and I've never used the yield statement before. Thank you.

What I'm wondering now is how I can assign the items it returns to lists, since it returns all fcodes with index 0 and all descriptions with index 1. How can I append these to lists so that each line has a unique index?

Oh, I see. Try this:

fcodes, descs = zip(*splitted_lines(infile))
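
For readers unfamiliar with the zip(*...) idiom, here is a small sketch of what it does. The pairs below are hypothetical data shaped like the output of splitted_lines(); the star unpacks them as separate arguments, and zip regroups them by position, so all first items land in one tuple and all second items in another.

```python
# Hypothetical (fcode, desc) pairs, shaped like the output of splitted_lines().
pairs = [
    ("RRAB10", "ROAD paved two lanes"),
    ("RRCD20", "ROAD unpaved"),
    ("RREF30", "ROAD under construction"),
]

# zip(*pairs) regroups the pairs by position:
# all first items together, then all second items together.
fcodes, descs = zip(*pairs)

print(fcodes[2])  # third feature code
print(descs[2])   # its description, at the same index

# zip produces tuples; convert them if the results need to be modified later.
fcodes = list(fcodes)
descs = list(descs)
```

One caveat: if no line matches, there is nothing to unpack and the assignment raises ValueError, so it may be worth checking for an empty result first.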

OK, that works. I used:

fcode, desc = zip(*splitted_lines(lines))

Thanks a lot for the tips.

I've been working on the second part of my question. I want to take each description string and reduce it to a string containing only the uppercase words and the whitespace between them. I've tried the following:

for items in desc:
    desc1=re.findall("([A-Z,\s])", items)

However, this returns each uppercase letter and each whitespace character as a separate string. Is there a way to extract the words and spaces as a single string and leave the rest?
Thanks in advance.
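
A possible approach, offered as a sketch rather than a definitive answer: the character class [A-Z,\s] matches exactly one character at a time (an uppercase letter, a comma, or a whitespace character), which is why findall returns single characters. Matching whole uppercase words and joining them is one way around this. It assumes an "uppercase word" means a word made only of the letters A-Z, and it replaces the original whitespace with single spaces. The function name uppercase_part and the second sample description are made up for illustration.

```python
import re

# Whole words made only of A-Z; \b keeps "Road" or "A1" from matching partially.
UPPER_WORD = re.compile(r"\b[A-Z]+\b")

def uppercase_part(description):
    """Return the all-uppercase words of description, joined by single spaces."""
    return " ".join(UPPER_WORD.findall(description))

# Sample descriptions; only the first one comes from the original post.
descs = [
    "AIRSTRIP region ruin/inactive/abandoned",
    "ROAD MAIN paved two lanes",
]

upper_parts = [uppercase_part(d) for d in descs]
print(upper_parts)  # ['AIRSTRIP', 'ROAD MAIN']
```

If digits should count as part of an uppercase word, the class could be widened to [A-Z0-9], though that would also pick up standalone numbers.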
