Extract Numbers from Data Stream

Question

bumsfeld 413 Nearly a Posting Virtuoso

18 Years Ago

I decided to start my own thread rather than hijack the thread "Sorting".

I followed ghostdog74's advice and used the re module to extract numeric strings:

import re

data_raw = """[20]
[ 35+ ]
age = 40
(84)
100kg
  $245
"""

# use regex module re to extract numeric string
data_list = re.findall(r"\d+",data_raw)
print(data_list)  # ['20', '35', '40', '84', '100', '245']

That works fine, but when I change one value to a floating point number:

import re

data_raw = """[20]
[ 35+ ]
age = 40
(84)
100kg
  $245.99
"""

# use re to extract numeric strings (however, the float gets split at '.')
data_list = re.findall(r"\d+",data_raw)
print(data_list)  # ['20', '35', '40', '84', '100', '245', '99']

How can I make re handle floating point numbers?

python regex

All 9 Replies

mawe 6 Junior Poster

18 Years Ago

Hi!

data_list = re.findall(r'\d+(?:\.\d+)?', data_raw)

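You want one or more digits \d+, followed by a point and one or more digits \.\d+, or not ? (the ? makes the decimal part optional).
The ?: stops the () from creating a capturing group (a backreference); we just want the parentheses to cluster the two patterns so the ? applies to both. Without the "?:", re.findall returns what the group captured instead of the whole match, as you can see in the output: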

['', '', '', '', '', '.99']
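A small sketch of that point, run on the data_raw from the question (the names whole, groups and via_finditer are just for illustration): with a non-capturing group re.findall returns whole matches, with a capturing group it returns only the group's text, and re.finditer hands back match objects whose group(0) is always the whole match. It uses print() so it also runs under Python 3.

```python
import re

data_raw = """[20]
[ 35+ ]
age = 40
(84)
100kg
  $245.99
"""

# (?:...) is a non-capturing group, so findall returns the whole match
whole = re.findall(r"\d+(?:\.\d+)?", data_raw)

# (...) is a capturing group, so findall returns only the group's text,
# which is '' whenever the optional group did not take part in the match
groups = re.findall(r"\d+(\.\d+)?", data_raw)

# finditer gives match objects; group(0) is the whole match either way
via_finditer = [m.group(0) for m in re.finditer(r"\d+(\.\d+)?", data_raw)]

print(whole)         # ['20', '35', '40', '84', '100', '245.99']
print(groups)        # ['', '', '', '', '', '.99']
print(via_finditer)  # ['20', '35', '40', '84', '100', '245.99']
```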

Hope this was not too confusing ;)

Regards, mawe

vegaseat 1,735 DaniWeb's Hypocrite

18 Years Ago

Two nice solutions, wow! Now I have a question: what if we had "-$245.99" to extract, so that it would give "-245.99"?

vegaseat 1,735 DaniWeb's Hypocrite

18 Years Ago

This re stuff makes my head spin! I can see that it is very powerful for text processing, but also seemingly very complex! Almost another language within Python.

Makes my head spin too, so I used this short 'regular stuff' code ...

# extract numeric value from a data stream
# caveat --> only for one number per data line
# (every '.' and '-' in a line is collected too, e.g. "a-b." gives "-.")

data_raw = """
header
[23 ]
[ 43 ]
[4323]
[-$23.44 ]
[ 12.32 ]
footer
"""
 
data_list = data_raw.split('\n')
print(data_list)  # test
 
num_list = []
for x in data_list:
    s = ""
    for c in x:
        if c in '1234567890.-':
            s += c
    if s:
        num_list.append(s)
 
print(num_list)  # ['23', '43', '4323', '-23.44', '12.32']

... you should be able to figure that one out.
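The code above is limited to one number per line because it glues every digit, '.' and '-' on a line together. A minimal sketch of the same no-re idea that allows several numbers per line, assuming that '$' can simply be deleted (so "-$23.44" keeps its sign) and that any piece float() rejects, such as a lone '.' or '-' or a range like "10-20", should be skipped. The name extract_numbers is hypothetical, not something from the thread.

```python
def extract_numbers(text):
    """Return every int/float found in text as a float, without using re.

    Hypothetical helper: '$' is deleted so "-$23.44" keeps its sign, every
    other character that is not a digit, '.' or '-' becomes a space, and
    only the pieces that float() accepts are kept.
    """
    cleaned = "".join(
        c if c in "0123456789.-" else " "
        for c in text.replace("$", "")
    )
    numbers = []
    for token in cleaned.split():
        try:
            numbers.append(float(token))
        except ValueError:
            # e.g. "." or "-" on their own, or a range like "10-20"
            pass
    return numbers


data_raw = """
header
[23 ]
[ 43 ]
[4323]
[-$23.44 ]
[ 12.32 ]
footer
"""

print(extract_numbers(data_raw))  # [23.0, 43.0, 4323.0, -23.44, 12.32]
print(extract_numbers("[20] [ 35+ ] 100kg $245.99"))  # [20.0, 35.0, 100.0, 245.99]
```

Splitting on whitespace instead of concatenating is what allows more than one number per line; the price is that everything comes back as a float and that forms such as 1e5 are split into separate numbers (the 'e' becomes a separator).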

ghostdog74 57 Junior Poster · Answer 1 · 2007-01-26T06:06:36+00:00

Another way is to use the "|" (alternation) special character.

>>> re.findall(r'\d+\.\d+|\d+',data_raw)
['20', '35', '40', '84', '100', '245.99']
>>>

ghostdog74 57 Junior Poster · Answer 2 · 2007-01-26T07:50:46+00:00

Sorry, somehow I can't find my edit button, but anyway:
a decimal/float looks like this: 245.332 or 4.5 or 74.32.
To match it, we need one or more digits, followed by "." and then one or more digits again, so the expression becomes "\d+\.\d+". The plain "\d+" after the "|" then catches the integers.
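Python's re tries the alternatives of "|" from left to right and stops at the first one that matches at a given position, so the float branch has to come first. A quick sketch comparing both orders on the question's data_raw (float_first and int_first are illustrative names):

```python
import re

data_raw = """[20]
[ 35+ ]
age = 40
(84)
100kg
  $245.99
"""

# Float alternative first: '245.99' is tried as a float before \d+ gets a chance
float_first = re.findall(r"\d+\.\d+|\d+", data_raw)

# Integer alternative first: \d+ already succeeds on '245', so the
# float branch is never tried and '99' becomes a separate match
int_first = re.findall(r"\d+|\d+\.\d+", data_raw)

print(float_first)  # ['20', '35', '40', '84', '100', '245.99']
print(int_first)    # ['20', '35', '40', '84', '100', '245', '99']
```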

ghostdog74 57 Junior Poster · Answer 3 · 2007-01-26T12:18:49+00:00

Two nice solutions, wow! Now I have a question: what if we had "-$245.99" to extract, so that it would give "-245.99"?

Hi, wow, getting harder. :-)
Anyway, here's a rather crude way, and I am sure there are better ways (using re). I did the substitution first, then the rest:

>>> data_raw = """
... header
... [23 ]
... [ 43 ]
... [4323]
... [-$23.44 ]
... [ 12.32 ]
... footer
... """
>>> 
>>> re.findall(r"(-\d+\.\d+|\d+\.\d+|\d+)",re.sub(r"(-\$)","-",data_raw))
['23', '43', '4323', '-23.44', '12.32']

I still prefer not to use re though lol:)
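For comparison, a one-regex variant that skips the re.sub step. It is a sketch, not something posted in the thread: it captures an optional minus sign and the digits in two separate groups, with an optional '$' in between that is not captured, and joins them. Unlike the pattern above it also keeps negative integers such as "-23" (which the pattern above returns as "23"), but it would equally turn the dash in a range like "10-20" into a minus sign.

```python
import re

data_raw = """
header
[23 ]
[ 43 ]
[4323]
[-$23.44 ]
[ 12.32 ]
footer
"""

# group 1: optional minus sign; then an optional '$' (not captured);
# group 2: an integer with an optional decimal part
pattern = re.compile(r"(-?)\$?(\d+(?:\.\d+)?)")

# with two groups, findall returns (sign, digits) tuples
numbers = [sign + digits for sign, digits in pattern.findall(data_raw)]
print(numbers)  # ['23', '43', '4323', '-23.44', '12.32']

print([s + d for s, d in pattern.findall("-$245.99 and -23")])  # ['-245.99', '-23']
```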

sneekula 969 Nearly a Posting Maven · Answer 4 · 2007-01-27T02:04:36+00:00

This re stuff makes my head spin! I can see that it is very powerful for text processing, but also seemingly very complex! Almost another language within Python.

ghostdog74 57 Junior Poster · Answer 5 · 2007-01-27T08:11:08+00:00

This re stuff makes my head spin! I can see that it is very powerful for text processing, but also seemingly very complex! Almost another language within Python.

Most string manipulation problems can be solved with Python's string functions; only very complex ones need a regexp. So try not to use regexps if possible. Of course, if you are good at them, then by all means use them, but you have to think of the next person reading your code, who may not understand regexps. Just my $0.02

bumsfeld 413 Nearly a Posting Virtuoso · Answer 6 · 2007-01-29T23:34:33+00:00

Makes my head spin too, so I used this short 'regular stuff' code ...

... you should be able to figure that one out.

Thank you, I could understand that code! I have been bitten by the re bug a little, too!
