Views: 2132 | Replies: 7 | Solved
grahhh

tab-delimited data

  #1  
Sep 6th, 2007
I'm new to Python and need a bit of help. I have a large data set that is tab-delimited but, annoyingly, also has some extra spaces in it. I can't seem to get it into a nice array to perform computations on.

This is a simplification of what the data looks like (hundreds of lines like this):
[two spaces]6.0730000e+003[tab][one space]-9.2027000e+004[tab][two spaces]7.8891354e+01[tab]\r\n

I've tried calling .readlines() on it, going line by line and splitting at the tab, but because of the extra spaces the result still isn't usable. On top of that, the last element of each list/line is \r\n, which I don't want, though that isn't a big deal.

I also tried a regular expression, but couldn't get it to a series of vectors of usable floating point numbers (Python seems to handle the scientific notation format fine, which is nice).

I'm sure I'm missing something painfully easy and obvious. Can someone steer me in the right direction? Any help is appreciated.
N317V

Re: tab-delimited data

  #2  
Sep 6th, 2007
 
>>> '  String     '.strip()
'String'

You may also want to look at the csv module.
http://docs.python.org/lib/module-csv.html
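
As a rough sketch of the csv route (written in Python 3 syntax, which is an assumption since the thread does not name a version): the reader splits on tabs, and float() copes with the padding spaces because it ignores leading and trailing whitespace. The sample string and the parse_tab_delimited name are made up for illustration.

```python
import csv
import io

# Hypothetical sample standing in for one line of the real data file.
sample = "  6.0730000e+003\t -9.2027000e+004\t  7.8891354e+01\t\r\n"


def parse_tab_delimited(stream):
    """Return a list of rows, each row a list of floats."""
    rows = []
    for fields in csv.reader(stream, delimiter="\t"):
        # The trailing tab produces an empty last field, so skip blank fields.
        # float() itself tolerates the leading spaces in the other fields.
        values = [float(f) for f in fields if f.strip()]
        if values:
            rows.append(values)
    return rows


# newline="" leaves the \r\n line endings for the csv module to handle.
rows = parse_tab_delimited(io.StringIO(sample, newline=""))
print(rows)
```

For a real file, the same function could be given open(path, newline="") instead of the StringIO object.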

HTH
vegaseat

Re: tab-delimited data

  #3  
Sep 6th, 2007
Can you live with this?
"""
[two spaces]6.0730000e+003[tab][one space]-9.2027000e+004[tab][two spaces]7.8891354e+01[tab]\r\n
"""
my_line = ' 6.0730000e+003\t -9.2027000e+004\t 7.8891354e+01\t\r\n'
# split(None) splits on any run of whitespace, so the spaces, tabs and \r\n all drop out
my_list = [eval(n) for n in my_line.split(None)]
print(my_list)
"""
result -->
[6073.0, -92027.0, 78.891354000000007]
"""
May 'the Google' be with you!
woooee

Re: tab-delimited data

  #4  
Sep 6th, 2007
Originally posted by grahhh:
This is a simplification of what the data looks like (hundreds of lines like this):
[two spaces]6.0730000e+003[tab][one space]-9.2027000e+004[tab][two spaces]7.8891354e+01[tab]\r\n

Calling split() on a string with no arguments treats all whitespace (space, tab, newline) the same:
s = " 6.0730000e+003\t -9.2027000e+004\t 7.8891354e+01\t\r\n"
print(s.split())
['6.0730000e+003', '-9.2027000e+004', '7.8891354e+01']
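
To see why splitting on the tab alone was not enough, it may help to compare the two calls side by side. This is only an illustrative sketch in Python 3 syntax; the sample line is reconstructed from the description in the first post.

```python
# Sample line rebuilt from the description in post #1 (padding spaces included).
line = "  6.0730000e+003\t -9.2027000e+004\t  7.8891354e+01\t\r\n"

# Splitting on the tab keeps the padding spaces and leaves '\r\n' as a last element.
by_tab = line.split("\t")
print(by_tab)
# ['  6.0730000e+003', ' -9.2027000e+004', '  7.8891354e+01', '\r\n']

# Splitting with no argument collapses every run of whitespace and drops the ends.
by_whitespace = line.split()
print(by_whitespace)
# ['6.0730000e+003', '-9.2027000e+004', '7.8891354e+01']
```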
paddy3118

Solution Re: tab-delimited data

  #5  
Sep 9th, 2007
Instead of this line from vegaseat's answer #3:
my_list = [eval(n) for n in my_line.split(None)]
it is better to keep the use of eval to an absolute minimum. If you know it's a file of floats, use float() like this:
my_list = [float(n) for n in my_line.split(None)]
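
Since the original question asked for vectors to compute on, here is a minimal sketch of how the float() line might be applied to a whole file and turned into columns. It assumes Python 3 and that every data line has the same number of fields. The load_columns name and the second sample row are invented for the example.

```python
import io


def load_columns(stream):
    """Read whitespace-separated floats and return one list per column."""
    rows = []
    for line in stream:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        rows.append([float(f) for f in fields])
    # zip(*rows) transposes the rows into columns.
    return [list(column) for column in zip(*rows)]


# Stand-in for an open data file; the second row is made-up data.
sample = io.StringIO(
    "  6.0730000e+003\t -9.2027000e+004\t  7.8891354e+01\t\r\n"
    "  1.0000000e+000\t  2.0000000e+000\t  3.0000000e+000\t\r\n"
)
columns = load_columns(sample)
print(columns)
```

With a real file, the StringIO object would be replaced by an open file handle.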
grahhh

Re: tab-delimited data

  #6  
Sep 9th, 2007
Ah, sorry for not responding to the comments sooner. I've been a bit busy with this and other projects.

vegaseat's [eval(n) for n in my_line.split(None)] did what I wanted, but it was extremely slow (I have to process 3 data files, each with over 60,000 lines).

paddy3118's [float(n) for n in my_line.split(None)] was the icing on the cake. It seems to give the same result as vegaseat's version but is much faster.
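
For anyone who wants to check the speed difference on their own machine, a small timeit comparison along these lines should do. It is a sketch in Python 3 syntax; the function names and the repeat count are arbitrary choices, not anything from the posts above.

```python
import timeit

line = " 6.0730000e+003\t -9.2027000e+004\t 7.8891354e+01\t\r\n"


def with_eval():
    # eval() has to compile and run each field as a Python expression.
    return [eval(n) for n in line.split()]


def with_float():
    # float() only has to parse a number.
    return [float(n) for n in line.split()]


repeats = 10000  # arbitrary; raise it for steadier numbers
eval_time = timeit.timeit(with_eval, number=repeats)
float_time = timeit.timeit(with_float, number=repeats)

print("eval : %.4f s" % eval_time)
print("float: %.4f s" % float_time)
```

Actual timings depend on the machine and the Python version, so no figures are quoted here.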

Thanks for all the help!
vegaseat

Re: tab-delimited data

  #7  
Sep 9th, 2007
Thanks for the extra work paddy3118 and grahhh. If you know the type you want, then int() or float() is faster and better than the more general eval().
paddy3118

Re: tab-delimited data

  #8  
Sep 10th, 2007
Originally posted by vegaseat:
Thanks for the extra work paddy3118 and grahhh. If you know the type you want, then int() or float() is faster and better than the more general eval().


And it helps when validating input data. With float(), nobody can slip a command that removes all your files into the middle of your input file and have it blindly executed, which is exactly what eval would do.
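
A small sketch of that point, in Python 3 syntax: float() raises ValueError on anything that is not a number, so a tampered field is rejected instead of being run. The parse_floats name and the hostile field are hypothetical, and the hostile field is deliberately harmless.

```python
def parse_floats(line):
    """Convert every whitespace-separated field to float, or raise ValueError."""
    return [float(field) for field in line.split()]


good_line = " 6.0730000e+003\t -9.2027000e+004\t 7.8891354e+01\t\r\n"
# Hypothetical tampered line: eval() would execute the middle field as code,
# while float() simply refuses to convert it.
bad_line = " 6.0730000e+003\t__import__('os').getcwd()\t 7.8891354e+01\t\r\n"

print(parse_floats(good_line))

rejected = None
try:
    parse_floats(bad_line)
except ValueError as err:
    rejected = str(err)
    print("rejected:", rejected)
```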

- Paddy.