Views: 2132 | Replies: 7 | Solved
Join Date: Sep 2007
Posts: 5
Reputation:
Rep Power: 0
Solved Threads: 0
I'm new to Python, and need a bit of help. I have a large data set that is tab delimited but annoyingly also has some extra spaces in it and I can't seem to get it in a nice array to perform computations on it.
This is a simplification of what the data looks like (hundreds of lines like this):
[two spaces]6.0730000e+003[tab][one space]-9.2027000e+004[tab][two spaces]7.8891354e+01[tab]\r\n
I've tried doing the .readlines() on it and going line by line and splitting at the tab, but because of the extra spaces, it's still not usable, and plus, the last element of each list/line is \r\n, which I don't want, but isn't a big deal.
I also tried a regular expression, but couldn't get it to a series of vectors of usable floating point numbers (Python seems to handle the scientific notation format fine, which is nice).
I'm sure I'm missing something painfully easy and obvious. Can someone steer me in the right direction? Any help is appreciated.
Join Date: Jul 2006
Location: Ingolstadt, Germany
Posts: 18
Reputation:
Rep Power: 3
Solved Threads: 1
>>> ' String '.strip()
'String'
You may also want to look at the module csv.
http://docs.python.org/lib/module-csv.html
HTH
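For what it's worth, here is a rough sketch of how the csv suggestion above might look for this kind of data. It assumes Python 3 and the line layout from the original question; the function name parse_tab_delimited is just made up for illustration, and the sample is fed from a string instead of a real file.

```python
import csv
import io


def parse_tab_delimited(handle):
    """Return a list of rows, each row a list of floats (illustrative helper)."""
    rows = []
    for record in csv.reader(handle, delimiter="\t"):
        # Strip the stray spaces and drop empty fields, such as the one
        # produced by the trailing tab at the end of each line.
        stripped = (field.strip() for field in record)
        values = [float(field) for field in stripped if field]
        if values:
            rows.append(values)
    return rows


# One sample line in the format described in the question.
sample = io.StringIO(
    "  6.0730000e+003\t -9.2027000e+004\t  7.8891354e+01\t\r\n", newline=""
)
print(parse_tab_delimited(sample))
```

With a real file you would pass something like open(filename, newline="") instead of the StringIO object.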
Can you live with this?
"""
[two spaces]6.0730000e+003[tab][one space]-9.2027000e+004[tab][two spaces]7.8891354e+01[tab]\r\n
"""
my_line = ' 6.0730000e+003\t -9.2027000e+004\t 7.8891354e+01\t\r\n'
my_list = [eval(n) for n in my_line.split(None)]
print my_list
"""
result -->
[6073.0, -92027.0, 78.891354000000007]
"""
May 'the Google' be with you!
Join Date: Dec 2006
Posts: 468
Reputation:
Rep Power: 2
Solved Threads: 65
This is a simplification of what the data looks like (hundreds of lines like this):
[two spaces]6.0730000e+003[tab][one space]-9.2027000e+004[tab][two spaces]7.8891354e+01[tab]\r\n
s=" 6.0730000e+003\t -9.2027000e+004\t 7.8891354e+01\t\r\n"
print s.split()
['6.0730000e+003', '-9.2027000e+004', '7.8891354e+01']
Join Date: Sep 2007
Posts: 19
Reputation:
Rep Power: 2
Solved Threads: 2
Instead of this line from vegaseat's answer (#3):
my_list = [eval(n) for n in my_line.split(None)]
it is always better to keep the use of eval to an absolute minimum. If you know it's a file of floats, use float() like this:
my_list = [float(n) for n in my_line.split(None)]
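To get from single lines to the "series of vectors" the original question asks for, something along these lines might work. It is only a sketch, assuming Python 3 and that every line has the same number of values; the name columns_from_lines and the second sample line are invented for the example.

```python
def columns_from_lines(lines):
    """Parse whitespace/tab separated floats and return one list per column."""
    rows = []
    for line in lines:
        fields = line.split()  # no argument: splits on any run of whitespace
        if fields:             # skip blank lines
            rows.append([float(n) for n in fields])
    # zip(*rows) transposes the list of rows into columns.
    return [list(column) for column in zip(*rows)]


# The first line is from the question; the second is made up for the demo.
sample_lines = [
    "  6.0730000e+003\t -9.2027000e+004\t  7.8891354e+01\t\r\n",
    "  1.0000000e+000\t  2.0000000e+000\t  3.0000000e+000\t\r\n",
]
for column in columns_from_lines(sample_lines):
    print(column)
```

An open file object can be passed in place of the list, since iterating over a file yields its lines one at a time.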
Join Date: Sep 2007
Posts: 5
Reputation:
Rep Power: 0
Solved Threads: 0
Ah, sorry for not responding to the comments sooner. I've been a bit busy with this and other projects.
vegaseat's eval(n) for n in my_line.split(None) comment worked as I wanted, but it was extremely slow (I have to process 3 data files each with over 60,000 lines).
paddy3118's float(n) for n in my_line.split(None) was the icing on the cake. It seems to work the same as vegaseat's version but is much faster.
Thanks for all the help!
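If anyone wants to check the speed difference on their own machine, a small timing sketch like this might do. It assumes Python 3 and the standard timeit module; the function names and the repeat count of 10000 are arbitrary choices, and the actual numbers will vary from machine to machine.

```python
import timeit

# One sample line in the format from the question.
line = " 6.0730000e+003\t -9.2027000e+004\t 7.8891354e+01\t\r\n"


def with_eval():
    return [eval(n) for n in line.split()]


def with_float():
    return [float(n) for n in line.split()]


# Both approaches should give the same numbers for well-formed input.
assert with_eval() == with_float()

eval_time = timeit.timeit(with_eval, number=10000)
float_time = timeit.timeit(with_float, number=10000)
print("eval:  %.3f s" % eval_time)
print("float: %.3f s" % float_time)
```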
Join Date: Sep 2007
Posts: 19
Reputation:
Rep Power: 2
Solved Threads: 2
Thanks for the extra work paddy3118 and grahhh. If you know the type you want, then int() or float() is faster and better than the more general eval().
And it helps when validating input data. With float(), someone can't insert text that removes all your files into the middle of your input file and have it blindly executed by eval.
- Paddy.
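A quick sketch of the validation point, assuming Python 3: float() raises ValueError for anything that is not a number, so a bad field can be reported instead of executed. The helper name parse_line and the "hostile" sample are made up for illustration, and the injected expression is deliberately harmless.

```python
def parse_line(line, line_number):
    """Convert one line to floats, reporting which field is bad (illustrative)."""
    values = []
    for field in line.split():
        try:
            values.append(float(field))
        except ValueError:
            # float() never executes the text; it just refuses to convert it.
            raise ValueError(
                "line %d: not a number: %r" % (line_number, field)
            )
    return values


# A line with a Python expression smuggled into the middle field.
# eval() would run it; float() rejects it.
hostile = "6.0730000e+003\t__import__('os').getcwd()\t7.8891354e+01\r\n"

try:
    parse_line(hostile, 1)
except ValueError as err:
    print(err)
```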