tab-delimited data

Question

grahhh 0 Newbie Poster

17 Years Ago

I'm new to Python, and need a bit of help. I have a large data set that is tab delimited but annoyingly also has some extra spaces in it and I can't seem to get it in a nice array to perform computations on it.

This is a simplification of what the data looks like (hundreds of lines like this):
[two spaces]6.0730000e+003[tab][one space]-9.2027000e+004[tab][two spaces]7.8891354e+01[tab]\r\n

I've tried calling .readlines() on it, going line by line and splitting at the tab, but because of the extra spaces the result still isn't usable. Also, the last element of each list/line is \r\n, which I don't want, though that isn't a big deal.

I also tried a regular expression, but couldn't get it to a series of vectors of usable floating point numbers (Python seems to handle the scientific notation format fine, which is nice).

I'm sure I'm missing something painfully easy and obvious. Can someone steer me in the right direction? Any help is appreciated.

python


All 7 Replies

vegaseat 1,735 DaniWeb's Hypocrite

17 Years Ago

Can you live with this?

"""
[two spaces]6.0730000e+003[tab][one space]-9.2027000e+004[tab][two spaces]7.8891354e+01[tab]\r\n
"""
my_line = '  6.0730000e+003\t -9.2027000e+004\t  7.8891354e+01\t\r\n'
my_list = [eval(n) for n in my_line.split(None)]
print(my_list)
"""
result -->
[6073.0, -92027.0, 78.891354000000007]
"""

vegaseat 1,735 DaniWeb's Hypocrite

17 Years Ago

Thanks for the extra work paddy3118 and grahhh. If you know the type you want, then int() or float() is faster and better than the more general eval().

N317V 10 Newbie Poster · Answer 1 · 2007-09-06T15:38:03+00:00

>>> '  String     '.strip()
'String'

You may also want to look at the module csv.
http://docs.python.org/lib/module-csv.html
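
A minimal sketch of the csv route, assuming Python 3 and a sample line held in memory instead of the poster's real file. The names `sample` and `parse_tab_rows` are made up for illustration. The reader splits on tabs only, and the empty field left by the trailing tab is skipped before conversion.

```python
import csv
import io

# Hypothetical sample mimicking the layout described in the question.
sample = "  6.0730000e+003\t -9.2027000e+004\t  7.8891354e+01\t\r\n"

def parse_tab_rows(handle):
    """Return one list of floats per non-empty row of a tab-delimited file."""
    rows = []
    for fields in csv.reader(handle, delimiter="\t"):
        # float() tolerates the padding spaces; the trailing tab yields an
        # empty field, which the strip() test filters out.
        values = [float(f) for f in fields if f.strip()]
        if values:
            rows.append(values)
    return rows

# newline="" leaves the \r\n line endings for the csv module to handle.
rows = parse_tab_rows(io.StringIO(sample, newline=""))
print(rows)
```

With a real file, the same function should accept a handle opened with `open(path, newline="")`.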

HTH

woooee 814 Nearly a Posting Maven · Answer 2 · 2007-09-07T05:02:13+00:00

This is a simplification of what the data looks like (hundreds of lines like this):
[two spaces]6.0730000e+003[tab][one space]-9.2027000e+004[tab][two spaces]7.8891354e+01[tab]\r\n

Calling split() on a string with no arguments treats all whitespace (space, tab, carriage return, newline) the same, and any run of it counts as a single separator:
s = " 6.0730000e+003\t -9.2027000e+004\t 7.8891354e+01\t\r\n"
print(s.split())

paddy3118 11 Light Poster · Answer 3 · 2007-09-09T16:30:48+00:00

Instead of:

my_list = [eval(n) for n in my_line.split(None)]

from vegaseat's answer, it is always better to keep the use of eval to an absolute minimum. If you know it's a file of floats, use float() like this:

my_list = [float(n) for n in my_line.split(None)]
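
Applied to a whole file, the same idea might look like the sketch below. It assumes Python 3. The function name `read_float_rows` and the throwaway demo file are illustrative and not from the thread.

```python
import os
import tempfile

def read_float_rows(path):
    """Parse a whitespace/tab-delimited text file into a list of float rows."""
    rows = []
    with open(path) as handle:
        for line in handle:
            fields = line.split()  # no argument: split on any run of whitespace
            if fields:             # skip blank lines
                rows.append([float(n) for n in fields])
    return rows

# Demo with a temporary file shaped like the data in the question.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, newline="") as tmp:
    tmp.write("  6.0730000e+003\t -9.2027000e+004\t  7.8891354e+01\t\r\n")
    tmp.write("  1.0000000e+000\t  2.5000000e+000\t -3.0000000e+00\t\r\n")
    demo_path = tmp.name

rows = read_float_rows(demo_path)
columns = [list(col) for col in zip(*rows)]  # transpose rows into column vectors
os.remove(demo_path)
print(columns[0])
```

Transposing with `zip(*rows)` gives one list per column, which is probably closer to the "series of vectors" the question asks for.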

grahhh 0 Newbie Poster · Answer 4 · 2007-09-10T04:40:06+00:00

Ah, sorry for not responding to the comments sooner. I've been a bit busy with this and other projects.

vegaseat's eval(n) for n in my_line.split(None) comment worked as I wanted, but it was extremely slow (I have to process 3 data files each with over 60,000 lines).

paddy3118's float(n) for n in my_line.split(None) was the icing on the cake. It seems to work the same as vegaseat's version but is much faster.
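
One rough way to check the speed difference on your own machine is with the standard timeit module. This is only a sketch, assuming Python 3. The helper names are made up, and the actual timings will vary by interpreter and hardware.

```python
import timeit

line = "  6.0730000e+003\t -9.2027000e+004\t  7.8891354e+01\t\r\n"

def with_eval():
    # eval() compiles and runs each field as a Python expression
    return [eval(n) for n in line.split()]

def with_float():
    # float() only parses a number
    return [float(n) for n in line.split()]

# Time 5000 calls of each helper; results are in seconds.
eval_seconds = timeit.timeit(with_eval, number=5000)
float_seconds = timeit.timeit(with_float, number=5000)
print("eval:  %.3f s" % eval_seconds)
print("float: %.3f s" % float_seconds)
```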

Thanks for all the help!

paddy3118 11 Light Poster · Answer 5 · 2007-09-10T10:42:43+00:00

And it helps when validating input data. Nobody can slip text that removes all your files into the middle of your input file and have it blindly executed by eval :)

- Paddy.
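
The validation point can be sketched as follows, assuming Python 3. The function name `parse_floats` and the tampered line are hypothetical. The injected text here is harmless, but eval() would run it as code, while float() simply refuses it.

```python
def parse_floats(line):
    """Convert one line to floats, rejecting anything that is not a number."""
    values = []
    for field in line.split():
        try:
            values.append(float(field))
        except ValueError:
            raise ValueError("not a number: %r" % field)
    return values

good = parse_floats("  6.0730000e+003\t -9.2027000e+004\t\r\n")

# A hypothetical tampered line: the middle field is code, not a number.
try:
    parse_floats("1.0\t__import__('os').getcwd()\t2.0\r\n")
    rejected = False
except ValueError as err:
    rejected = True
    print(err)
```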
