hello everyone,

My file has many long lines with 12 columns, like:

gnl|dbS|13484118 gi|62750812 100 16 .......around 12 columns
gnl|dbS|13484888 gi|62750812 95 20 .......
gnl|dbS|22484118 gi|62750812 92 20 ..........

I want to grab the lines where the value in column 3 lies between 90 and 99.9 and store the result in a new file.

Please help me if you can...!!

Thanks in advance

Here's how to open a file for reading and iterate over it:

f = open(my_file)
for line in f:
    # Do something to each line
f.close()

Here's how to split a string :

>>> my_text = "Here'ssomeTExt WithSomeMoreOver Here and then some more"
>>> my_text.split()
["Here'ssomeTExt", 'WithSomeMoreOver', 'Here', 'and', 'then', 'some', 'more']
>>>

As you can see, split returns a list so you can easily use slicing or indexing to access the third column (which would be index 2, btw)
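
As a small sketch of that indexing step, here is one of the sample rows from the question (with the trailing columns left out) split into columns. The variable names are only illustrative.

```python
# One sample row from the question; the remaining columns are omitted.
row = 'gnl|dbS|13484888 gi|62750812 95 20'

columns = row.split()      # split on runs of whitespace
third_column = columns[2]  # index 2 is the third column; it is still a string
```

Note that `third_column` holds the text `'95'`, not a number, which is why the conversion in the next step matters.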

Here's how to compare a string to a number by converting to float (you can also convert to int for integers, natch):

>>> my_number_text = '95.09'
>>> if float(my_number_text) < 100:
...     print 'That number was less than 100!'
...     
That number was less than 100!
>>>

And finally here's how you open a file for writing

f = open(my_file, 'w')
f.write('Some text here\n')
f.close()

There you go. Everything that you asked for. All in one place, now isn't that nice?

hi jlm699,

I am not able to do it this way. Can you please give me the right code?
many thanks..!!

I am not able to do it this way

Why not? Show me the code you've created and describe the error that you get and I will gladly help you resolve it.

from __future__ import with_statement

with open ('C:\\Documents and Settings\\Desktop\\file2.txt') as fil:
f = fil.readlines
for line in f:
line.split()
for text in line[2]:
text = '90.00'
if float(text) <100:
result = line


with open('C:\\Documents and Settings\\jDesktop\\j3.txt','w')as resultfile:
resultfile.write(result)


I don't understand how to read the third column, apply the condition that the number is between 90 and 100, and get those lines....:(

In the future please use code tags when pasting code in this forum as it makes your code more readable and thus more people will be willing to read your post and answer your question. Here's how to use code tags:

[code=python] # Put your code inside here

[/code]

Now here's what your code would look like had you used code tags:

with open ('C:\\Documents and Settings\\Desktop\\file2.txt') as fil:
    f = fil.readlines
    for line in f:
        line.split()
        for text in line[2]:
            text = '90.00'
            if float(text) <100:
                result = line

Now with the indentation and syntax highlighting I can immediately see that you forgot to actually call readlines . So change the line f = fil.readlines so that it reads f = fil.readlines() .

Next, you forgot to assign the split line to a variable. You can assign it to itself so that your split line looks like this line = line.split() . So now line will contain the split list instead of the actual line.

The rest of your code is confusing to me... I suggest using print statements liberally to see what type of data you have and its contents at each point. That should help you figure out what's wrong with your last bit of code.

Run this to give you an example of what's going on (it's your code with the fixes I suggested above plus some debugging output):

with open ('C:\\Documents and Settings\\Desktop\\file2.txt') as fil:
    f = fil.readlines()
    for line in f:
        line = line.split()
        print ' |LINE| ', line
        print ' |LINE[2]| ', line[2], ' <-- Is this the value I want?'
        for text in line[2]:
            print ' |TEXT| ', text, ' <-- Was this for loop a good idea?'
            text = '90.00'
            print ' |TEXT| ', text, ' <-- Why did I just assign 90?'
            if float(text) <100:
                result = line
                print ' |RESULT| ', result, ' <-- Is this what I want?'

Thank you very much for helping me.

with open ('C:\\Documents and Settings\\Desktop\\file2.txt') as fil:
    f = fil.readlines()
    for line in f:
        line = line.split()
        print ' |LINE| ', line
        print ' |LINE[2]| ', line[2], ' <-- Is this the value I want?'

Yes, this is the column I want to test against the range (90, 100), printing the lines whose values fall inside it.

What should I do after this?

I want my output file to look like this:


gnl|dbS|13484888 gi|62750812 95 20 .......
gnl|dbS|22484118 gi|62750812 92 20 .........
that is, the lines with column 3 values between 90 and 100 (colored red)

Many Thanks

from __future__ import with_statement

with open ('C:\\Documents and Settings\\jguleria\\Desktop\\file2.txt') as fil:
    f = fil.readlines()
    for line in f:
        line = line.split()
        for line[2] in line:
            if i in range(90, 100):
                print line
                #result = line

I want to do something like the above,
but I am getting an error message: i is not defined..!!

Please suggest...Thanks

Alright, so in that case you'll need to remove the for loop that iterates over line[2] and instead compare float(line[2]) to the desired value.

I hope you notice that I'm intentionally not hand feeding you this code because challenging yourself to learn how to put it all together is a stepping stone to becoming a programmer. So please don't take offense, as I'm only trying to help you help yourself.

You never defined i . I believe you meant for i to contain the value of line[2] ; however you'll want to convert it to a float() before assigning it to i .

Also, remove the for line[2] in line: loop. It is legal syntax, but it assigns each field of the list to line[2] in turn, which makes no sense logically here.

EDIT: Also keep in mind the endpoints of range. range(a, b) covers the half-open interval [a, b), so range(90, 100) yields only the integers 90 through 99.

Thank you very much for the help. I understand that you do not want to spoon-feed me but are helping me to learn as well..:)

My current code is:

from __future__ import with_statement

with open ('C:\\Documents and Settings\\Desktop\\file2.txt') as fil:
    f = fil.readlines()
    for line in f:
        line = line.split()
        i = float(line[2])
        if i in range(90, 99.99):
            print line

It gave me an error message:
IndexError: list index out of range

please suggest on this....Thanks

Notice what happens in my interpreter when I use the range that you provided:

>>> range(90,99.99)
C:\Python25\Lib\site-packages\wx-2.8-msw-unicode\wx\py\PyCrust.py:1:
DeprecationWarning: integer argument expected, got float
  """PyCrust is a python shell and namespace browser application."""
[90, 91, 92, 93, 94, 95, 96, 97, 98]
>>>

range returns a list of integers, and is not the best solution for testing the bounds of a floating point number.

Your best bet is probably to use the comparison operators. Luckily for you, Python provides an easy way to check if a value is within bounds.

Consider the following example:

>>> my_numbers = [1,5,15,6,21,10]
>>> for each_number in my_numbers:
...     if 1 <= each_number < 10:
...         print each_number, 'is in the range [1,10)'
...     
1 is in the range [1,10)
5 is in the range [1,10)
6 is in the range [1,10)
>>>

Note that 10 was not included in the output. I used the 'less than' ( < ) operator instead of the 'less than or equal to' ( <= ) operator.
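
To make the difference concrete for floating point values, here is a minimal sketch comparing the two tests side by side. The numbers are hypothetical stand-ins for column 3. The result is the same in Python 2, where range builds a list, and in Python 3, where it is a lazy sequence.

```python
# Hypothetical sample values standing in for column 3 of the file.
values = [95.0, 92.0, 91.62, 91.165]

# Membership in range() only succeeds for values equal to a whole number.
matched_by_range = [v for v in values if v in range(90, 100)]

# A chained comparison accepts any float between the bounds.
matched_by_comparison = [v for v in values if 90 <= v < 100]
```

Here `matched_by_range` keeps only 95.0 and 92.0, while `matched_by_comparison` keeps all four values.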

If you're getting an IndexError it means the location you're referencing in the list doesn't exist. You assume each line has at least three columns, however I'd bet there's an empty line in there that's screwing you up. It's easy to fix in any case. Just check that the length of the list is at least three. Here's your code, plus the fix for the IndexError (the range test is left as you wrote it and still needs replacing with a comparison):

from __future__ import with_statement

with open ('C:\\Documents and Settings\\Desktop\\file2.txt') as fil:
    f = fil.readlines()
    for line in f:
        line = line.split()
        if len(line) >= 3:
            i = float(line[2])
            if i in range(90, 99.99):
                print line

Thank you very much for helping me but there is one problem in result..!!

I am getting only the lines with integers, not the ones with decimals!

I was expecting my result to print 5 lines (95, 95, 91.62, 91.165 and 92).

I got just 3 lines, with 95, 95 and 92.

What do you suggest?

here is the code:

from __future__ import with_statement

with open ('C:\\Documents and Settings\\jguleria\\Desktop\\file2.txt') as fil:
    f = fil.readlines()
    for line in f:
        line = line.split()
        if len(line) >= 3:
            i = float(line[2])
            if i in range(90, 99.99):
                print line

and here is result:

Thanks a lot..!!

I also tried:

from __future__ import with_statement

with open ('C:\\Documents and Settings\\Desktop\\file2.txt') as fil:
    f = fil.readlines()
    for line in f:
        line = line.split()
        if len(line) >= 3:
            i = line[2]
            for each_number in i:
                if 90 <= each_number < 100:
                    print line

Not printing anything...:(
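
A short sketch of what goes wrong in that attempt: line[2] is a string, so looping over it visits single characters rather than a number. Converting the whole field once with float() gives a value that can be compared against the bounds. The sample value is hypothetical.

```python
field = '91.62'  # hypothetical value of line[2]; split() always yields strings

# Looping over a string visits one character at a time.
characters = [ch for ch in field]

# Converting the whole field once gives a number that can be compared.
value = float(field)
in_bounds = 90 <= value < 100
```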

Thank you everyone...


I fixed my problem...


Thanks a lot...:)

Hello there,

Still got another problem...:(

When I print the lines, I get all the correct ones, but when I try to save the result to a file, only one line ends up in my result file.

Can you please suggest something on this??

here is the code

from __future__ import with_statement

with open ('C:\\Documents and Settings\\Desktop\\file2.txt') as fil:
    f = fil.readlines()
    for line in f:
        line = line.split()
        if len(line) >= 3:
            i = float(line[2])
            if 90 <= i < 100:
                line = ' '.join(line)
                result = line
            
               
          


with open('C:\\Documents and Settings\\Desktop\\j3.txt','w')as resultfile:
    resultfile.write(result)

The problem is that you're just assigning "line" to "result" each time, hence "result" will only contain the last line you want to print. What you want to do is make "result" a list, then append each matching line to it. Then when you want to write it all to a file you can just do resultfile.write('\n'.join(result))

It's not working this way... I got one long column in my result file instead of lines....:(

Well here's how I'd change your code. Let me know if this is what you have, or if it doesn't work:

from __future__ import with_statement

with open ('C:\\Documents and Settings\\Desktop\\file2.txt') as fil:
    f = fil.readlines()
    result = []
    for line in f:
        line = line.split()
        if len(line) >= 3:
            i = float(line[2])
            if 90 <= i < 100:
                line = ' '.join(line)
                result.append(line)
            
with open('C:\\Documents and Settings\\Desktop\\j3.txt','w') as resultfile:
    resultfile.write('\n'.join(result))
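
For comparison, the same filtering can be sketched as a small function that keeps each matching line exactly as it appeared in the input, instead of rebuilding it with ' '.join. The name select_lines and its parameters are hypothetical, and the sample rows are adapted from the question. It also skips rows whose third column is not a number, which is an assumption about what the file might contain.

```python
def select_lines(lines, low=90.0, high=100.0, column=2):
    """Return the lines whose `column` field is a number in [low, high)."""
    selected = []
    for line in lines:
        fields = line.split()
        if len(fields) <= column:
            continue  # blank or short line: no such column
        try:
            value = float(fields[column])
        except ValueError:
            continue  # non-numeric field, e.g. a header row
        if low <= value < high:
            selected.append(line.rstrip('\n'))  # keep original spacing
    return selected


# Sample rows adapted from the question (trailing columns omitted).
sample = [
    'gnl|dbS|13484118 gi|62750812 100 16\n',
    'gnl|dbS|13484888 gi|62750812 95 20\n',
    '\n',
    'gnl|dbS|22484118 gi|62750812 91.62 20\n',
]
matches = select_lines(sample)
```

An open file object can be passed in place of `sample`, and `'\n'.join(matches)` written to the result file as in the code above.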

Waoo..!!
It worked..Thanks a lot..:)

Thanks to everyone who helped me in writing my code..!!

Now I want to modify my result, if possible. Please help; it looks a little challenging to me...

From the above code my result is (modified to look simpler):

NC_005111.2|NC_005111 95 20 1 0 68 87 31017559 31017578 4.4 32.3
NC_005111.2|NC_005111 91.67 24 2 0 63 86 35247737 35247714 4.4 32.3
NC_005111.2|NC_005111 91.67 24 2 0 64 87 40549054 40549031 4.4 32.3
NC_005111.2|NC_005111 92 24 2 0 63 86 42462636 42462659 4.4 32.3


Here the numbers colored red are the ones I just parsed.
The problem is that, because some values are decimals and some are whole numbers, the next column is shifted. For example, in the first row "95" is followed by "20", but there should be extra space so that all the columns line up, or you could say the output should be tab delimited.

Is it possible to do this?

Many Thanks...!!!
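
One possible approach, offered as a sketch rather than a definitive answer: either join the fields with a tab character, or pad every field to the width of the widest entry in its column. The function names to_tab_delimited and align_columns are hypothetical. The sample rows are the first two result lines shown above.

```python
def to_tab_delimited(line):
    """Replace the whitespace between fields with a single tab."""
    return '\t'.join(line.split())


def align_columns(lines):
    """Pad every field to the width of the longest entry in its column."""
    rows = [line.split() for line in lines]
    n_cols = max(len(row) for row in rows)
    widths = [0] * n_cols
    for row in rows:
        for idx, field in enumerate(row):
            widths[idx] = max(widths[idx], len(field))
    return [
        ' '.join(field.ljust(widths[idx]) for idx, field in enumerate(row)).rstrip()
        for row in rows
    ]


sample = [
    'NC_005111.2|NC_005111 95 20 1 0 68 87 31017559 31017578 4.4 32.3',
    'NC_005111.2|NC_005111 91.67 24 2 0 63 86 35247737 35247714 4.4 32.3',
]
tabbed = [to_tab_delimited(line) for line in sample]
aligned = align_columns(sample)
```

Tab-delimited output is the easier choice if the file will be opened in a spreadsheet. Padded columns read better in a plain text editor with a fixed-width font.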
