Hello, I am a newbie in Python. I am trying to write a function that deletes URLs from a file. The function accepts the URL to be deleted as an argument. I want to use a regular expression to match the URL in the file and then delete it (maybe by replacing it with a white space). My problem is that I can't seem to get the regular expression right.

Here is the makeup of the file:

http://www.google.com
http://www.digg.com
http://www.digg.com/signup
http://www.cnn.com

My Code:

import re

def delete_url(url):
    
   file=open('myurls.txt','r+')
   file_content=file.read()
   
   p=re.compile('\b'+url+'\b')     #Url re (maybe wrong)
   new_content=p.sub(' ',url)
   file.write(new_content)      #Write the new string to file

Sorry if my code looks too "noobish". Basically, what I want is a function that deletes URLs from a file based on the URL passed to it. For example,

delete_url("http://www.digg.com")

This should delete http://www.digg.com, but not part of URLs with paths such as http://www.digg.com/people/sigup

Thanks in advance

Did you try re.compile(r"\b%s\b" % url)? Or, probably better, re.compile(r"\b%s\b" % re.escape(url))?
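A small sketch of why this suggestion helps, assuming the same digg URL as in the question (the sample text and variable names here are made up for illustration). In an ordinary string literal, '\b' is the backspace character, so the original pattern never sees a word boundary; a raw string passes \b through to the regex engine. re.escape additionally keeps the dots in the URL literal.

```python
import re

url = "http://www.digg.com"
text = "see http://www.digg.com today"  # hypothetical sample text

# Without the r prefix, '\b' is the backspace character (\x08),
# so this pattern looks for literal backspaces around the URL.
broken = re.compile('\b' + url + '\b')
broken_match = broken.search(text)

# In a raw string, \b reaches the regex engine as a word-boundary assertion,
# and re.escape makes the dots match only literal dots.
fixed = re.compile(r"\b%s\b" % re.escape(url))
fixed_match = fixed.search(text)

# Without re.escape, each dot matches any character.
loose = re.compile(r"\b%s\b" % url)
loose_match = loose.search("http://wwwXdigg.com")

print(broken_match)          # no match found
print(fixed_match.group(0))  # the URL itself
print(loose_match.group(0))  # an unintended match
```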

@Gribouillis

Thanks a lot. I have also tried your method. It finds and replaces a string that matches the regular expression, but the problem is that it also replaces the string when it is found inside a longer one.

for example

url="http://www.yahoo.com"
pattern=re.compile(r'\b%s\b' % re.escape(url))
match=pattern.sub("","I love http://www.yahoo.com so much")
print(match)

#prints: I love  so much
#But

url="http://www.yahoo.com"
pattern=re.compile(r'\b%s\b' % re.escape(url))
match=pattern.sub("","I love http://www.yahoo.com/signup so much")
print(match)

#prints: I love /signup so much

I do not want the above regular expression to replace URLs with paths just because part of them matches the pattern.

Please help me with this

You could try to match the URL only if it is followed by a whitespace character or the end of the string, with a lookahead assertion like this: r"\b%s(?=\s|$)" % re.escape(url). Alternatively, you could match the URL only if it is not followed by a slash, like this: r"\b%s(?![/])" % re.escape(url).
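Putting the lookahead together with the original delete_url idea might look like the sketch below. This is only one possible arrangement, not the poster's final code: the helper remove_url and the path parameter are hypothetical names introduced here, and the file name myurls.txt is taken from the question. The file is read first and then reopened for writing, so the new content replaces the old instead of being appended after it.

```python
import re

def remove_url(text, url):
    """Return text with standalone occurrences of url removed (hypothetical helper)."""
    # re.escape keeps the dots literal; the lookahead accepts the URL only
    # when it is followed by whitespace or the end of the text.
    pattern = re.compile(r"\b%s(?=\s|$)" % re.escape(url))
    return pattern.sub("", text)

def delete_url(url, path="myurls.txt"):
    """Remove url from the file at path (path is an assumed parameter)."""
    with open(path) as f:
        content = f.read()
    # Reopen in write mode so the file is truncated before the new content is written.
    with open(path, "w") as f:
        f.write(remove_url(content, url))

sample = "http://www.digg.com\nhttp://www.digg.com/signup\nhttp://www.cnn.com\n"
cleaned = remove_url(sample, "http://www.digg.com")
print(cleaned)
```

Note that the matched URL is replaced with an empty string, so its line is left behind as a blank line; the URL with the /signup path is untouched.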

Thanks for your quick response. I will try this and get back to you later.

@Gribouillis

Thanks a lot, it worked like a charm. You're the man.
