Regular Expressions Help

Question

rmbrown09 0 Newbie Poster

12 Years Ago

Hello, I am trying to create a program that will find a certain phrase (always between brackets and always with a "'0" in it) and replace it. Basically, I have something like [ k '0 ir e ], and I need to use regular expressions to replace it with k_ir_e.

I also want the output written to a different text file, leaving everything else in the file the same and changing only phrases like the one above. (This is a syllabification document.)

What I have been wrestling with is getting my program to work with re.sub.

I can't get even simple find-and-substitute operations to work.
This is what I have so far, yet it can't even find exact words and replace them, let alone a regex. Any help would be great. Thanks.

#!/usr/bin/env python
import re
#Open the SyllRaw text file so that it can be read 
file = open('syllRaw.txt','r')
#Create a new file where our output will be stored
new = open('wordSyll.txt', 'w')
#Set text to contain contents of syllraw file
text = file.read()
#create variable for desired search pattern
match = re.compile(r'm')
for words in text:
    fixed = match.sub(r')

python regex

Edited 12 Years Ago by rmbrown09 because: lol

TrustyTony 888 ex-Moderator

12 Years Ago

You give very little information about your data, but what do you need re for?

print('_'.join(s for s in "[ k '0 ir e ]".split() if s.isalpha()))
# Output:
# k_ir_e

Edited 12 Years Ago by TrustyTony

snippsat 661 Master Poster

12 Years Ago

A sample of the input data would help; the regex I use below may fail on it.

"What I have been wrestling with is getting my program to seemingly work with re.sub"

>>> import re
>>> s = "abc [ k '0 ir e ] 123"
>>> re.sub(r'\[.*\]', 'k_ir_e', s)
'abc k_ir_e 123'
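
One caveat with the pattern above, shown as a small sketch: `.*` is greedy, so on a line holding more than one bracketed block it matches from the first `[` to the last `]`. The sample string and the placeholder replacement `X` here are made up for illustration; whether the real file ever has two blocks on one line is an assumption.

```python
import re

# Made-up sample with two bracketed blocks on one line
s = "abc [ k '0 ir e ] 123 [ m '0 iy ] xyz"

# Greedy: .* runs to the LAST closing bracket, so both blocks collapse into one match
greedy = re.sub(r'\[.*\]', 'X', s)

# Non-greedy: .*? stops at the first closing bracket, so each block is replaced separately
lazy = re.sub(r'\[.*?\]', 'X', s)

print(greedy)  # abc X xyz
print(lazy)    # abc X 123 X xyz
```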

Remember to close the output file object with new.close().
Don't use file as a variable name; it shadows the built-in file type in Python 2 (it is a built-in name, not a reserved word).
It is better to use with open(); then you don't need to close the file object yourself.
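
A minimal sketch of the with open() advice, assuming file names like the ones in the question. To keep it self-contained it writes a one-line stand-in for syllRaw.txt into a temporary directory first; the sample line and the hard-coded replacement are only for illustration.

```python
import os
import re
import tempfile

# Hypothetical stand-ins for syllRaw.txt and wordSyll.txt, created in a temp directory
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, 'syllRaw.txt')
dst_path = os.path.join(workdir, 'wordSyll.txt')

with open(src_path, 'w') as f:
    f.write("abc [ k '0 ir e ] 123\n")

# Both files are closed automatically when the with block ends
with open(src_path) as infile, open(dst_path, 'w') as outfile:
    text = infile.read()
    outfile.write(re.sub(r'\[.*\]', 'k_ir_e', text))

# Read the output back to check what was written
with open(dst_path) as f:
    result = f.read()

print(infile.closed, outfile.closed)
print(result)
```

Using neutral names such as infile and outfile also avoids shadowing the built-in file.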

rmbrown09 0 Newbie Poster · Answer 1 · 2012-09-18T18:45:59+00:00

I see, that is helpful. I am having trouble coming up with a regular expression for this though; the whole file is full of these short blocks, each with different contents in the syllabification.

This is an example:

Enter ASCII phone string:  Basic pron is /# [ m '0 iy ]  #/

 No. of prons = 1
 They are:
 #  Pronunciation ................         Rate  Lects
 1 /# [ m '0 iy ] #                     /  >0      0

I need a regex that will strip out the '0 and turn that same block into just m_iy.

How can something like that be done?

rmbrown09 0 Newbie Poster · Answer 2 · 2012-09-18T19:45:35+00:00

So I can at least find the first string that I want to replace, with the following:

m = re.search(r'\[.+\'0.+\]',text)
print(m.group())

I just need a way to run through each instance of this and replace what I find. That is my trouble.
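
One possible way to do this, as a sketch rather than the definitive answer: re.sub accepts a function as the replacement, calls it once per match, and substitutes whatever it returns. The helper name to_underscores is made up here, and the pattern uses `[^\]]*` instead of `.+` on the assumption that a block never contains a nested `]`. It also assumes every phone is purely alphabetic, since isalpha() drops any token containing digits or punctuation.

```python
import re

# [^\]]* keeps each match inside a single pair of brackets
pattern = re.compile(r"\[[^\]]*'0[^\]]*\]")

def to_underscores(match):
    # Keep only purely alphabetic tokens, dropping the brackets and the '0 marker
    return '_'.join(tok for tok in match.group().split() if tok.isalpha())

text = """Basic pron is /# [ m '0 iy ]  #/
 1 /# [ m '0 iy ] #                     /  >0      0
"""

# Every bracketed block is replaced; the rest of the text is left untouched
fixed = pattern.sub(to_underscores, text)
print(fixed)
```

The string in fixed can then be written to the output file in one go.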

TrustyTony 888 ex-Moderator Team Colleague Featured Poster · Answer 3 · 2012-09-18T19:55:03+00:00

>>> import re
>>> def process(data):
    for d in data.splitlines():
        m = re.search(r'\[.+\'0.+\]', d)
        if m:
            print('_'.join(s for s in m.group().split() if s.isalpha()))


>>> process("""Enter ASCII phone string:  Basic pron is /# [ m '0 iy ]  #/

 No. of prons = 1
 They are:
 #  Pronunciation ................         Rate  Lects
 1 /# [ m '0 iy ] #                     /  >0      0
""")
m_iy
m_iy
>>>

rmbrown09 0 Newbie Poster · Answer 4 · 2012-09-18T20:37:04+00:00

That would only help me if they all had the same syllabification though, correct? The file has hundreds of these, each with a different pronunciation. The regular expression should be able to find them all though, because they share the same general format. Take these two, for example. I need them to read d_ow_n_t and ae_s_k respectively.

 Enter ASCII phone string:  Basic pron is /# [ d '0 ow n t ] #/


 No. of prons = 1
 They are:
 #  Pronunciation ................         Rate  Lects
 1 /# [ d '0 ow n t ] #                 /  >0      0

 Enter ASCII phone string:  Basic pron is /# [ '0 ae s k ] #/


 No. of prons = 1
 They are:
 #  Pronunciation ................         Rate  Lects
 1 /# [ '0 ae s k ] #                   /  >0      0

TrustyTony 888 ex-Moderator Team Colleague Featured Poster · Answer 5 · 2012-09-18T20:47:18+00:00

What is the problem? Can you post the output you are getting? I seem to get what you want:

>>> process(""" Enter ASCII phone string:  Basic pron is /# [ d '0 ow n t ] #/


 No. of prons = 1
 They are:
 #  Pronunciation ................         Rate  Lects
 1 /# [ d '0 ow n t ] #                 /  >0      0

 Enter ASCII phone string:  Basic pron is /# [ '0 ae s k ] #/


 No. of prons = 1
 They are:
 #  Pronunciation ................         Rate  Lects
 1 /# [ '0 ae s k ] #                   /  >0      0
""")
d_ow_n_t
d_ow_n_t
ae_s_k
ae_s_k
>>>
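
For reuse outside the interactive prompt, the same idea can return the results instead of printing them. This is a sketch under the same assumptions as the session above (every block contains '0 and every phone is alphabetic); the names BLOCK and syllables are made up, and the sample is trimmed to the lines that contain a block.

```python
import re

# One bracketed block containing the '0 marker, never spanning past a closing bracket
BLOCK = re.compile(r"\[[^\]]*'0[^\]]*\]")

def syllables(data):
    """Return one underscore-joined string per bracketed block, in order of appearance."""
    return ['_'.join(tok for tok in m.group().split() if tok.isalpha())
            for m in BLOCK.finditer(data)]

sample = """ Enter ASCII phone string:  Basic pron is /# [ d '0 ow n t ] #/
 1 /# [ d '0 ow n t ] #                 /  >0      0
 Enter ASCII phone string:  Basic pron is /# [ '0 ae s k ] #/
 1 /# [ '0 ae s k ] #                   /  >0      0
"""

found = syllables(sample)
print(found)  # ['d_ow_n_t', 'd_ow_n_t', 'ae_s_k', 'ae_s_k']
```

Each pronunciation shows up twice because it appears on two lines of every entry, which matches the doubled output in the session above.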
