Hello, I am trying to write a program that finds a certain phrase (always between brackets and always with a "'0" in it) and replaces it. Basically I have something like [ k '0 ir e ], and I need to use regular expressions to replace it with k_ir_e.

I also want the output to go to a different text file, leaving everything else in the file the same and changing only phrases like the one above. (This is a syllabification document.)

What I have been wrestling with is getting my program to work with re.sub.

I can't get even a simple find-and-replace to work.
This is what I have so far, yet it can't even find exact words and replace them, let alone regex matches. Any help would be great. Thanks.

#!/usr/bin/env python
import re
#Open the SyllRaw text file so that it can be read 
file = open('syllRaw.txt','r')
#Create a new file where our output will be stored
new = open('wordSyll.txt', 'w')
#Set text to contain contents of syllraw file
text = file.read()
#create variable for desired search pattern
match = re.compile(r'm')
for words in text:
    fixed = match.sub(r')

All 7 Replies

You give very little information about your data, but what do you need re for?

print '_'.join(s for s in "[ k '0 ir e ]".split() if s.isalpha())
# Output:
# k_ir_e

A sample of the input data would help; the regex I use below may fail on it.

What I have been wrestling with is getting my program to work with re.sub

>>> import re
>>> s = "abc [ k '0 ir e ] 123"
>>> re.sub(r'\[.*\]', 'k_ir_e', s)
'abc k_ir_e 123'
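
One caveat about the pattern above, as a small sketch: .* is greedy, so if two bracketed blocks ever appeared on the same line (an assumption; the sample data shown in this thread has one block per line), it would swallow everything from the first [ to the last ]. The non-greedy form .*? stops at the nearest ]. The placeholder X is only for illustration.

```python
import re

# Hypothetical line with two bracketed blocks on it.
s = "[ k '0 ir e ] and [ m '0 iy ]"

# Greedy: matches from the first '[' to the last ']'.
greedy = re.sub(r'\[.*\]', 'X', s)

# Non-greedy: matches each bracketed block separately.
lazy = re.sub(r'\[.*?\]', 'X', s)

print(greedy)  # X
print(lazy)    # X and X
```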

Remember to close the output file with new.close().
Don't use file as a variable name; it is not a reserved word, but it shadows the built-in file type in Python 2.
It is better to use with open(); then you don't need to close the file object yourself.
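
A minimal sketch of the with open() advice, not a definitive implementation. The helper name copy_with_substitution is hypothetical, and the file names in the commented call are the ones from the original post. Note that re.sub is applied to the whole text in one call: a loop such as "for words in text" walks over single characters, not words.

```python
import re

def copy_with_substitution(src_path, dst_path, pattern, repl):
    """Read src_path, apply re.sub to the whole text, write the result to dst_path."""
    # 'with' closes both files automatically, even if an error occurs.
    with open(src_path, 'r') as src, open(dst_path, 'w') as dst:
        text = src.read()
        # Substitute across the entire string at once.
        dst.write(re.sub(pattern, repl, text))

# Example call (file names taken from the original post):
# copy_with_substitution('syllRaw.txt', 'wordSyll.txt', r'm', 'M')
```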

I see, that is helpful. I am having trouble coming up with a regular expression for this though; the whole file is full of these short blocks, each with different contents in the syllabification.

This is an example:

Enter ASCII phone string:  Basic pron is /# [ m '0 iy ]  #/

 No. of prons = 1
 They are:
 #  Pronunciation ................         Rate  Lects
 1 /# [ m '0 iy ] #                     /  >0      0

I need a regex that will strip the '0 and turn the block into just m_iy.

How can something like that be done?

So I can at least find the first string that I want to replace, with the following:

m = re.search(r'\[.+\'0.+\]',text)
print m.group()

I just need a way to run through each instance of this and replace what I find. That is my trouble.

>>> import re
>>> def process(data):
        for d in data.splitlines():
            m = re.search(r'\[.+\'0.+\]', d)
            if m:
                print '_'.join(s for s in m.group().split() if s.isalpha())


>>> process("""Enter ASCII phone string:  Basic pron is /# [ m '0 iy ]  #/

 No. of prons = 1
 They are:
 #  Pronunciation ................         Rate  Lects
 1 /# [ m '0 iy ] #                     /  >0      0
""")
m_iy
m_iy
>>> 
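
The session above prints the converted blocks. To replace them in place and keep the rest of each line, re.sub also accepts a function as the replacement; it is called once per match. The following is a sketch under assumptions, not the poster's method: the names BLOCK, join_phones, and syllabify are hypothetical, and it assumes (as the isalpha() filter above does) that every phone symbol is purely alphabetic.

```python
import re

# One bracketed block that contains the stress marker '0.
# [^\]]* keeps the match inside a single pair of brackets.
BLOCK = re.compile(r"\[[^\]]*'0[^\]]*\]")

def join_phones(match):
    """Turn a match like "[ d '0 ow n t ]" into "d_ow_n_t"."""
    return '_'.join(tok for tok in match.group().split() if tok.isalpha())

def syllabify(text):
    """Replace every matching block in text, leaving everything else unchanged."""
    return BLOCK.sub(join_phones, text)

print(syllabify("Basic pron is /# [ '0 ae s k ] #/"))
# Basic pron is /# ae_s_k #/
```

Because the pattern describes the general shape of a block rather than one specific pronunciation, the same call handles every block in the text, whatever phones it contains.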

That would only help me if they all had the same syllabification though, correct? The file contains hundreds of these, each with a different pronunciation. The regular expression should be able to find them all, because they are in the same general format. Take these two, for example. I need them to read d_ow_n_t and ae_s_k respectively.

 Enter ASCII phone string:  Basic pron is /# [ d '0 ow n t ] #/


 No. of prons = 1
 They are:
 #  Pronunciation ................         Rate  Lects
 1 /# [ d '0 ow n t ] #                 /  >0      0

 Enter ASCII phone string:  Basic pron is /# [ '0 ae s k ] #/


 No. of prons = 1
 They are:
 #  Pronunciation ................         Rate  Lects
 1 /# [ '0 ae s k ] #                   /  >0      0

What is the problem? Can you post the output you are getting? I seem to get what you want:

>>> process(""" Enter ASCII phone string:  Basic pron is /# [ d '0 ow n t ] #/


 No. of prons = 1
 They are:
 #  Pronunciation ................         Rate  Lects
 1 /# [ d '0 ow n t ] #                 /  >0      0

 Enter ASCII phone string:  Basic pron is /# [ '0 ae s k ] #/


 No. of prons = 1
 They are:
 #  Pronunciation ................         Rate  Lects
 1 /# [ '0 ae s k ] #                   /  >0      0
""")
d_ow_n_t
d_ow_n_t
ae_s_k
ae_s_k
>>> 