extract column from text file

Question

sofia85 0 Junior Poster in Training

13 Years Ago

Hi,

I'm a beginner at Python and I'm trying to extract a specific column from a txt file.

I want to extract the entire pph2_prob column (i.e. column 16), but only the values, without the pph2_prob header.

How do I accomplish that?

This is part of what the file looks like (the excerpt contains 2 rows and 20 columns):

#o_snp_id o_acc o_pos o_aa1 o_aa2 snp_id acc pos aa1 aa2 nt1 nt2 prediction based_on effect pph2_class pph2_prob pph2_FPR pph2_TPR pph2_FDR
BSND_M1I Q8WZ55 1 M I BSND_M1I Q8WZ55 1 M I G T probably damaging alignment deleterious 0.999 0.00692 0.111 0.0222
BSND_M1K Q8WZ55 1 M K BSND_M1K Q8WZ55 1 M K T A probably damaging alignment deleterious 0.999 0.00692 0.111 0.0222

Best,
Sofia

python

All 10 Replies

richieking 44 Master Poster

13 Years Ago

Sofia, can you please post the data in a more organized form? It is virtually impossible to work with what is shown here.

TrustyTony 888 ex-Moderator

13 Years Ago

nt1  nt2   pred_effect   pph2_class   pph2_prob  pph2_FPR    pph2_TPR   pph2_FDR
G    T	 prob_damaging	 deleterious	0.999	   0.00692	0.111	0.0222
T    A	 prob_damaging	 deleterious	0.999	   0.00692	0.111	0.0222
A    T	 prob_damaging	 deleterious	0.997	    0.0208	0.332	0.0665

I have shortened down the original file (which contains both more columns and rows). But the general problem is how to extract the data underneath column pph2_prob, without the header pph2_prob.
Best,
Sofia

You must use code tags for data as well, to preserve white space.

TrustyTony 888 ex-Moderator

13 Years Ago

Your header does not align exactly with the data columns. I split on tabs whenever a line contains tabs, so that a column can hold several words, including white space inside a data field.

to_extract = 'pph2_prob'
# smaller sample data without tabs and original post data
for fn in ('genetic.txt', 'genetic2.txt'):
    print('Extracting from %r' % fn)
    with open(fn) as data:
        # analyze header
        header = next(data)
        # first post had tab separation, second not. Adapt to situation
        header = [h.strip() for h in header.split('\t' if '\t' in header else None)]
        print(header)
        ind = header.index(to_extract)
        print('Extracting value %i from each column' % ind)
        # tab separation if exist, otherwise all white space splits
        pph2_prob = [float(line.split('\t' if '\t' in line else None)[ind]) for line in data]
        print(pph2_prob)
        print('')
        
"""Output:
Extracting from 'genetic.txt'
['nt1', 'nt2', 'pred_effect', 'pph2_class', 'pph2_prob', 'pph2_FPR', 'pph2_TPR', 'pph2_FDR']
Extracting value 4 from each column
[0.00692, 0.00692, 0.0208]

Extracting from 'genetic2.txt'
['#o_snp_id', 'o_acc', 'o_pos', 'o_aa1', 'o_aa2', 'snp_id', 'acc', 'pos', 'aa1', 'aa2', 'nt1', 'nt2', 'prediction', 'based_on', 'effect', 'pph2_class', 'pph2_prob', 'pph2_FPR', 'pph2_TPR', 'pph2_FDR']
Extracting value 16 from each column
[0.999, 0.999]
"""

Edited 13 Years Ago by TrustyTony because: No need to strip for float
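
One thing worth checking in the first output above: the values printed for genetic.txt (0.00692, 0.00692, 0.0208) match the pph2_FPR column of the sample, not pph2_prob (0.999, 0.999, 0.997). That is consistent with the header being split on white space while the data lines, which contain tabs, are split on tabs. A minimal sketch that splits every line the same way is below. It assumes no field contains internal white space; for the original 20-column file, where "probably damaging" contains a space, you would split both the header and the rows on '\t' instead. The name extract_column and the in-memory sample are illustrative, not from the thread.

```python
import io

def extract_column(lines, name):
    """Return the values under header `name` as floats, header excluded."""
    rows = iter(lines)
    # split() with no argument splits on any run of spaces or tabs
    header = next(rows).split()
    ind = header.index(name)
    # split data rows exactly like the header; skip blank lines
    return [float(line.split()[ind]) for line in rows if line.strip()]

# In-memory stand-in for genetic.txt (spaces in the header, tabs in the rows)
sample = io.StringIO(
    "nt1  nt2   pred_effect   pph2_class   pph2_prob  pph2_FPR    pph2_TPR   pph2_FDR\n"
    "G    T\t prob_damaging\t deleterious\t0.999\t   0.00692\t0.111\t0.0222\n"
    "T    A\t prob_damaging\t deleterious\t0.999\t   0.00692\t0.111\t0.0222\n"
    "A    T\t prob_damaging\t deleterious\t0.997\t    0.0208\t0.332\t0.0665\n"
)
values = extract_column(sample, 'pph2_prob')
print(values)  # [0.999, 0.999, 0.997]
```

With a real file, pass the open file object: with open('genetic.txt') as data: extract_column(data, 'pph2_prob').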


sofia85 0 Junior Poster in Training · Answer 1 · 2011-10-13T21:50:29+00:00

nt1 nt2 pred_effect pph2_class pph2_prob pph2_FPR pph2_TPR pph2_FDR
G T prob_damaging deleterious 0.999 0.00692 0.111 0.0222
T A prob_damaging deleterious 0.999 0.00692 0.111 0.0222
A T prob_damaging deleterious 0.997 0.0208 0.332 0.0665

I have shortened down the original file (which contains both more columns and rows). But the general problem is how to extract the data underneath column pph2_prob, without the header pph2_prob.

Best,
Sofia

sofia85 0 Junior Poster in Training · Answer 2 · 2011-10-13T21:51:45+00:00

As you can see now, the small table has 4 rows and 8 columns.

sofia85 0 Junior Poster in Training · Answer 3 · 2011-10-14T00:02:41+00:00

Hi,
Thank you for your help, but it still doesn't work. Are you using both of the files I posted in this thread in the code? Also, in the loop "for fn in ('genetic.txt', 'genetic2.txt'):", do I put my directory in place of genetic.txt, or just the file name? Right now I get an error message saying "with open(fn) as data: IOError: [Errno 21] Is a directory: '/'". What am I doing wrong?

Best,
Anna

TrustyTony 888 ex-Moderator Team Colleague Featured Poster · Answer 4 · 2011-10-14T00:10:42+00:00

If the files are in the same directory as the script, the file names suffice; otherwise you must use the full path, or change to that directory with os.chdir before the loop.
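
A minimal sketch of both options, using only the standard library. Bare file names are resolved against the current working directory, and open() needs the path of a file: passing a directory such as '/' produces the "Is a directory" error quoted above. The temporary folder and its one-line genetic.txt are stand-ins created for the demo; substitute the folder that actually holds your files.

```python
import os
import tempfile

# Stand-in for the folder holding the data files (hypothetical; use your own path)
data_dir = tempfile.mkdtemp()
with open(os.path.join(data_dir, 'genetic.txt'), 'w') as f:
    f.write('pph2_prob\n0.999\n')

# Option 1: build the full path to the file
path = os.path.join(data_dir, 'genetic.txt')
with open(path) as data:
    first = next(data).strip()

# Option 2: change the working directory, then use the bare file name
old_cwd = os.getcwd()
os.chdir(data_dir)
try:
    with open('genetic.txt') as data:
        second = next(data).strip()
finally:
    os.chdir(old_cwd)  # restore the previous working directory

print(first, second)  # pph2_prob pph2_prob
```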

JoshuaBurleson 23 Posting Whiz · Answer 5 · 2011-10-14T05:07:30+00:00

Also, I tried this with a file someone else posted here a couple of days ago. It had a different format, but I feel the idea is the same; I just broke it down differently because of the format:

File:
115     139-28-4313     1056.30

135     706-02-6945      -99.06

143   595-74-5767     4289.07

155     972-87-1379     3300.26

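# code
def extract_second(file):
    col_2 = []
    with open(file) as f:
        for line in f:
            chars = []
            # strip the newline, then split on single spaces
            line = line.strip().split(' ')
            for char in line:
                # runs of spaces leave empty strings behind; drop them
                if char not in ['', ' ']:
                    chars.append(char)
            if chars:  # skip blank lines
                col_2.append(chars[1])
    return col_2


def extract_col(file, col):
    # same idea, but for any column index (0-based)
    columns = []
    with open(file) as f:
        for line in f:
            chars = []
            line = line.strip().split(' ')
            for char in line:
                if char not in ['', ' ']:
                    chars.append(char)
            if chars:  # skip blank lines
                columns.append(chars[col])
    return columns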

#results
>>> extract_second('help.txt')
['139-28-4313', '706-02-6945', '595-74-5767', '972-87-1379']
>>>
>>> extract_col('help.txt', 0)
['115', '135', '143', '155']

Not sure if that helps at all; I hope it does. The version below worked fine with what I had of your data, but it could be much cleaner. I was just messing around with it.

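def extract_col(file, col):
    column = []
    with open(file) as f:
        for line in f:
            chars = []
            # split on tabs first, dropping empty fields
            line = line.split('\t')
            for char in line:
                if char not in ['', ' ']:
                    chars.append(char)
            # rejoin and split on any white space, so that
            # space-separated values also become separate columns
            chars = ' '.join(chars)
            chars = chars.split()
            column.append(chars[col])
    return column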

output

extract_col('help2.txt',4)
['pph2_prob', '0.999', '0.999', '0.997']
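
The output above still includes the header 'pph2_prob', which the original question wanted to leave out. One possible variant drops the first line before collecting values. This is only a sketch: the skip_header parameter is a hypothetical addition, and the function takes an open file object (here an in-memory sample) rather than a file name, so it can be tried without help2.txt.

```python
import io

def extract_col(f, col, skip_header=True):
    """Collect field number col (0-based) from each non-blank line of f."""
    if skip_header:
        next(f, None)          # discard the header line, if there is one
    column = []
    for line in f:
        fields = line.split()  # handles any mix of tabs and spaces
        if fields:             # skip blank lines
            column.append(fields[col])
    return column

# In-memory stand-in for the shortened data file
sample = io.StringIO(
    "nt1 nt2 pred_effect pph2_class pph2_prob pph2_FPR pph2_TPR pph2_FDR\n"
    "G T prob_damaging deleterious 0.999 0.00692 0.111 0.0222\n"
    "T A prob_damaging deleterious 0.999 0.00692 0.111 0.0222\n"
    "A T prob_damaging deleterious 0.997 0.0208 0.332 0.0665\n"
)
result = extract_col(sample, 4)
print(result)  # ['0.999', '0.999', '0.997']
```

The values come back as strings; wrap each in float() if numbers are needed.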

sofia85 0 Junior Poster in Training · Answer 6 · 2011-10-14T05:56:09+00:00

Thank you so much. I managed to solve my problem!

JoshuaBurleson 23 Posting Whiz · Answer 7 · 2011-10-14T05:58:41+00:00

Thank you so much. I managed to solve my problem!

Can you show us your solution?
