Hello, I am a newbie in Python. I am trying to write a function that deletes URLs from a file. The function accepts the URL to be deleted as an argument. I want to use a regular expression to match the URL in the file and then delete it (maybe by replacing it with whitespace). My problem is that I can't seem to get the regular expression right.

Here is the makeup of the file:

http://www.google.com
http://www.digg.com
http://www.digg.com/signup
http://www.cnn.com

My Code:

import re

def delete_url(url):
    
   file=open('myurls.txt','r+')
   file_content=file.read()
   
   p=re.compile('\b'+url+'\b')     #Url re (maybe wrong)
   new_content=p.sub(' ',url)
   file.write(new_content)      #Write the new string to file

Sorry if my code looks too "noobish". Basically, what I want is a function that deletes URLs from a file based on the URL passed to it. For example:

delete_url("http://www.digg.com")  

This should delete http://www.digg.com, but not part of a URL that has a path, such as http://www.digg.com/people/sigup.

Thanks in advance

Last Post by codedhands

@Gribouillis

Thanks a lot. I have also tried your method. It finds and replaces a string that matches the regular expression, but the problem is that it also replaces the URL when it is only the beginning of a longer URL.

For example:

url="http://www.yahoo.com"
pattern=re.compile(r'\b%s\b' % re.escape(url))
match=pattern.sub("","I love http://www.yahoo.com so much")
print(match)

#prints: I love  so much
#But

url="http://www.yahoo.com"
pattern=re.compile(r'\b%s\b' % re.escape(url))
match=pattern.sub("","I love http://www.yahoo.com/signup so much")
print(match)

#prints: I love /signup so much

I do not want the regular expression above to replace part of a URL that has a path just because the beginning of that URL matches.

Please help me with this.

You could try to match the URL only if it is followed by a whitespace character or the end of the string, with a lookahead assertion like r"\b%s(?=\s|$)" % re.escape(url). Alternatively, you could match the URL only if it is not followed by a slash, like r"\b%s(?![/])" % re.escape(url). In both cases, re.escape keeps the dots in the URL from matching arbitrary characters.
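
Here is a minimal sketch of the first suggestion, in case it helps. The helper names build_url_pattern and remove_url are made up for illustration, and delete_url assumes the file is small enough to read into memory in one go; myurls.txt is the file name from the original post. Treat it as one possible way to do it rather than the definitive one.

```python
import re


def build_url_pattern(url):
    # Escape the URL so '.' and similar characters match literally, then
    # require whitespace or the end of the string right after it.
    return re.compile(r"\b%s(?=\s|$)" % re.escape(url))


def remove_url(text, url):
    # Replace every stand-alone occurrence of url in text with nothing.
    return build_url_pattern(url).sub("", text)


def delete_url(url, path="myurls.txt"):
    # Read the whole file, drop the matching URLs, and write the result back.
    # Assumption: the file fits comfortably in memory.
    with open(path, "r") as handle:
        content = handle.read()
    with open(path, "w") as handle:
        handle.write(remove_url(content, url))


if __name__ == "__main__":
    print(remove_url("I love http://www.yahoo.com so much", "http://www.yahoo.com"))
    print(remove_url("I love http://www.yahoo.com/signup so much", "http://www.yahoo.com"))
```

The second call leaves the sentence untouched, because the URL is followed by a slash instead of whitespace. Note that delete_url leaves an empty line where the URL used to be; filtering out blank lines before writing would be an easy follow-up if that matters.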

Thanks for your quick response. I will try this and get back to you later.
