What's wrong with my regex code? Need simple text replacement

Question

[V] 0 Newbie Poster

11 Years Ago

The text I have:

Amiloride-sensitive cation channel, ASIC3 (also called BNC1 or MDEG) which is an acid-sensitive (proton-gated) homo- or hetero-oligomeric cation (Na+ (high affinity), Ca2+, K+) channel. It associates with DRASIC and ASIC1. It mediates touch sensation, being a mechanosensor) (lead inhibited) (Wang et al., 2006). In pulmonary tissue (lung epithelial cells) it and CFTR interregulate each other (Su et al., 2006). ASIC3 is a sensor of acidic and primary inflammatory pain (Deval et al., 2008).

I'm trying to remove all instances of X-sensitive, X-gated, etc.

My Code:

functional = r'\w{1,}( |-)(inducing|inducible|inhibited|inhibiting|responsive|gated|regulated|activated|receptor|modulated|enhanced|repressed|repressible|sensitive|dependent)'
cleantext=re.sub('\(|\)|\[|\]','',cleantext)
cleantext = re.sub(functional,'',cleantext,re.IGNORECASE)
print cleantext

Sometimes the two words are separated by a space or a dash.

But Python will only do a few instances.

cation channel, ASIC3 also called BNC1 or MDEG which is an proton-gated homo- or hetero-oligomeric cation Na+ high affinity, Ca2+, K+ channel. It associates with DRASIC and ASIC1. It mediates touch sensation, being a mechanosensor lead inhibited . In pulmonary tissue lung epithelial cells it and CFTR interregulate each other . ASIC3 is a sensor of acidic and primary inflammatory pain .

Notice that 'proton-gated' is still there? It got rid of: Amiloride-sensitive, acid-sensitive, lead inhibited.

But IGNORES 'PROTON-GATED'

Why is this? I have many instances of this where only random parts are being replaced.

python regex


All 3 Replies

Gribouillis 1,391 Programming Explorer

11 Years Ago

I think it is a parity error. Basically, you are looking for 2 consecutive words. Take the sequence "foo inhibited bar baz gated qux". There are 3 pairs of consecutive words: (foo, inhibited), (bar, baz), (gated, qux). "gated" is not removed because it is not in second position in its pair.
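This hypothesis can be checked directly on the example sequence. The sketch below reuses the pattern from the question (the names `functional`, `sample` and `result` are only for illustration). `re.sub` scans the string position by position rather than in fixed word pairs, so if pairing were the cause, "baz gated" would survive the substitution:

```python
import re

# Pattern copied from the question.
functional = (r'\w{1,}( |-)(inducing|inducible|inhibited|inhibiting|responsive|'
              r'gated|regulated|activated|receptor|modulated|enhanced|repressed|'
              r'repressible|sensitive|dependent)')

sample = 'foo inhibited bar baz gated qux'

# No count and no flags: every non-overlapping match is replaced.
result = re.sub(functional, '', sample)

# repr() makes the leftover spaces visible.
print(repr(result))
```

Both "foo inhibited" and "baz gated" are removed here, which suggests that word pairing is not what stops the replacement in the original code.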


[V] 0 Newbie Poster · Answer 1 · 2013-12-21T22:03:09+00:00

Only the word 'sensitive' is removed. Everything else just sits there...

snippsat 661 Master Poster · Answer 2 · 2013-12-21T23:15:28+00:00

Does it look better if you remove re.IGNORECASE?

import re

data = '''\
Amiloride-sensitive cation channel, ASIC3 (also called BNC1 or MDEG) which is an acid-sensitive (proton-gated)'''

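# Strip parentheses and square brackets.
cleantext = re.sub(r'\(|\)|\[|\]', '', data)
# Remove every "X-gated" / "X-sensitive" term; no re.IGNORECASE argument this time.
cleantext = re.sub(r'\w{1,}( |-)(gated|sensitive)', '', cleantext)
print(cleantext.strip())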

'''Output-->
cation channel, ASIC3 also called BNC1 or MDEG which is an
'''
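A likely reason why dropping re.IGNORECASE helps: the signature is `re.sub(pattern, repl, string, count=0, flags=0)`, so a flag passed as the fourth positional argument is read as `count`. `re.IGNORECASE` has the integer value 2, so the original call stops after two replacements. That matches the output in the question, where 'proton-gated' and 'lead inhibited' are both still present. The sketch below uses a shortened sample text and illustrative names (`FUNCTIONAL`, `limited`, `cleaned`, `cleaned_compiled`); it is one way to show the difference, not the only fix.

```python
import re

# Same alternatives as in the question; [ -] matches a space or a dash.
FUNCTIONAL = (r'\w+[ -](?:inducing|inducible|inhibited|inhibiting|responsive|'
              r'gated|regulated|activated|receptor|modulated|enhanced|repressed|'
              r'repressible|sensitive|dependent)')

text = ('Amiloride-sensitive cation channel which is an acid-sensitive '
        'proton-gated channel, lead inhibited')

# What re.sub(FUNCTIONAL, '', text, re.IGNORECASE) effectively does:
# the flag lands in the count slot, and its integer value is 2.
limited = re.sub(FUNCTIONAL, '', text, count=int(re.IGNORECASE))

# Passing the flag by keyword leaves count at 0, meaning "replace all".
cleaned = re.sub(FUNCTIONAL, '', text, flags=re.IGNORECASE)

# Alternative: compile once with the flag, then call .sub on the pattern object.
cleaned_compiled = re.compile(FUNCTIONAL, re.IGNORECASE).sub('', text)

print(limited)
print(cleaned)
```

With `flags=` given by keyword, case-insensitive matching is kept and all four terms are removed.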
