
The text I have:

Amiloride-sensitive cation channel, ASIC3 (also called BNC1 or MDEG) which is an acid-sensitive (proton-gated) homo- or hetero-oligomeric cation (Na+ (high affinity), Ca2+, K+) channel. It associates with DRASIC and ASIC1. It mediates touch sensation, being a mechanosensor) (lead inhibited) (Wang et al., 2006). In pulmonary tissue (lung epithelial cells) it and CFTR interregulate each other (Su et al., 2006). ASIC3 is a sensor of acidic and primary inflammatory pain (Deval et al., 2008).

I'm trying to remove all instances of X-sensitive, X-gated, etc.

My Code:

import re

# cleantext holds the paragraph quoted above
functional = r'\w{1,}( |-)(inducing|inducible|inhibited|inhibiting|responsive|gated|regulated|activated|receptor|modulated|enhanced|repressed|repressible|sensitive|dependent)'
cleantext = re.sub(r'\(|\)|\[|\]', '', cleantext)  # strip brackets
cleantext = re.sub(functional, '', cleantext, re.IGNORECASE)
print(cleantext)

Sometimes the two words are separated by a space or a dash.

But Python only replaces some of the instances:

cation channel, ASIC3 also called BNC1 or MDEG which is an proton-gated homo- or hetero-oligomeric cation Na+ high affinity, Ca2+, K+ channel. It associates with DRASIC and ASIC1. It mediates touch sensation, being a mechanosensor lead inhibited . In pulmonary tissue lung epithelial cells it and CFTR interregulate each other . ASIC3 is a sensor of acidic and primary inflammatory pain .

Notice that 'proton-gated' is still there? It got rid of Amiloride-sensitive, acid-sensitive and lead inhibited.

But it ignores 'proton-gated'.

Why is this? I have many cases like this where only seemingly random parts get replaced.


When I change my regex to just:

`functional = r'(inducing|inducible|inhibited|inhibiting|responsive|gated|regulated|activated|receptor|modulated|enhanced|repressed|repressible|sensitive|dependent)'`

only the word 'sensitive' is removed. Everything else just sits there...


I think it is a parity error. Basically, you are looking for 2 consecutive words. Take the sequence foo inhibited bar baz gated qux. There are 3 pairs of consecutive words, (foo, inhibited), (bar, baz), (gated, qux). Gated is not removed because it is not in second position in its pair.
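One way to test the pairing theory is to list every match that `re.finditer` reports on the sample sentence from this reply. The scan resumes right after each match instead of stepping through fixed word pairs, so if 'baz gated' shows up in the list, pairing is probably not what drops 'proton-gated'. Here is a minimal sketch using a cut-down version of the question's pattern. The names `pattern`, `sample` and `remaining` are only for illustration:

```python
import re

# Cut-down version of the question's pattern: a word, a space or hyphen, a keyword.
pattern = r'\w+( |-)(inhibited|gated)'
sample = 'foo inhibited bar baz gated qux'

# finditer scans left to right and restarts right after each match,
# so 'bar' is simply skipped rather than "using up" 'baz'.
matches = [m.group(0) for m in re.finditer(pattern, sample)]
remaining = re.sub(pattern, '', sample)  # sub replaces these same matches

print(matches)
print(repr(remaining))
```

Both phrases are found and removed here, so the cause more likely lies in how `re.sub` is called than in the pattern itself.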


Does it look better if you remove re.IGNORECASE?

import re

data = '''\
Amiloride-sensitive cation channel, ASIC3 (also called BNC1 or MDEG) which is an acid-sensitive (proton-gated)'''

cleantext = re.sub(r'\(|\)|\[|\]', '', data)
cleantext = re.sub(r'\w{1,}( |-)(gated|sensitive)', '', cleantext)
print(cleantext.strip())

'''Output-->
cation channel, ASIC3 also called BNC1 or MDEG which is an
'''

Edited by snippsat
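The likely reason dropping `re.IGNORECASE` helps is the signature `re.sub(pattern, repl, string, count=0, flags=0)`. A fourth positional argument is taken as `count`, not as a flag, and `re.IGNORECASE` has the integer value 2. The original call therefore means "replace at most 2 matches". That fits the output in the question: Amiloride-sensitive and acid-sensitive go, while proton-gated and lead inhibited stay. It also explains the follow-up, where only 'sensitive' disappeared, because the first two matches in the text are both 'sensitive'. The sketch below uses a shortened sample sentence (an assumption, not the full text). It compares that call, written out as `count=re.IGNORECASE`, with one that passes the flag by keyword:

```python
import re

# Keyword list from the question; [ -] means "space or hyphen", like ( |-).
functional = (r'\w+[ -](inducing|inducible|inhibited|inhibiting|responsive|gated|'
              r'regulated|activated|receptor|modulated|enhanced|repressed|'
              r'repressible|sensitive|dependent)')

text = ('Amiloride-sensitive cation channel, which is an acid-sensitive '
        '(proton-gated) channel (lead inhibited).')
text = re.sub(r'[()\[\]]', '', text)  # drop brackets first, as in the question

# What the original positional call really did: the flag value 2 became `count`.
buggy = re.sub(functional, '', text, count=re.IGNORECASE)

# Intended call: pass the flag by keyword so every match is replaced.
fixed = re.sub(functional, '', text, flags=re.IGNORECASE)
fixed = ' '.join(fixed.split())  # collapse the extra spaces left by removals

print(buggy)
print(fixed)
```

Passing `flags=` by keyword, or compiling the pattern with `re.compile(functional, re.IGNORECASE)` and calling its `.sub`, avoids the positional mix-up entirely.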
