Regex Replace for all to display as hyperlink

I need to find every string matching the pattern APP[a-z][a-z][0-9][0-9][0-9] in the body of an HTML document and replace each match with a hyperlink.

I am using Beautiful Soup for the replacement because I am dealing with HTML content.


For example:
APPsd222 to APPsd222 (displayed as a hyperlink)

APPfd333 to APPfd333 (displayed as a hyperlink)

If you are not familiar with Beautiful Soup, please tell me how to do this for a plain string.

pythonnoobie

Would re.sub work for what you're doing? http://www.regular-expressions.info/python.html

Something like this?

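#!/usr/bin/python

import re
import fileinput

# Match APP, two lowercase letters, three digits; group 1 is the whole ID.
pattern = re.compile(r"(APP[a-z]{2}[0-9]{3})")

for line in fileinput.input("test.txt"):
    # line keeps its trailing newline, so print() adds a second one.
    print(pattern.sub(r'<a href="\1">\1</a>', line))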

Here's a test run:

-> cat test.txt 
APPsd222
APPxx333


-> python test.py 
<a href="APPsd222">APPsd222</a>

<a href="APPxx333">APPxx333</a>

I hope this helps! I'm also a Python noob :)
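
Since the question also asks how to do this for a plain string, here is a minimal sketch that applies the same pattern as the script above to a string instead of a file. The function name link_ids is made up for illustration, and the link target is just the ID itself, as in the script.

```python
import re

# Same pattern as the file-based script: APP, two lowercase letters, three digits.
APP_ID = re.compile(r"(APP[a-z]{2}[0-9]{3})")


def link_ids(text):
    """Return text with every matching ID wrapped in an anchor tag.

    link_ids is a hypothetical helper name, not part of any library.
    """
    # \1 is the matched ID, reused as both the link target and the link text.
    return APP_ID.sub(r'<a href="\1">\1</a>', text)


print(link_ids("See APPsd222 and APPxx333."))
# prints: See <a href="APPsd222">APPsd222</a> and <a href="APPxx333">APPxx333</a>.
```

Strings that do not contain a matching ID come back unchanged.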

Gromit

The regex does the job, but I am facing a problem: it also replaces occurrences that are already inside an <a href=...> tag, as described below.

modified_contents = re.sub("([^http://*/s]APP[a-z]{2}[0-9]{2})", "<a href=\"http://python.com=\\1\">\\1</a>", str)

Sample input 1:

Input File contains APPdd34

Output File contains <a href="http://python.com=APPdd34"> APPdd34</a>

Sample input 2:

Input File contains <a href="http://python.com=APPdd34"> APPdd34</a>

Output File contains <a href="http://python.com=<a href="http://python.com=APPdd34"> APPdd34</a>"> <a href="http://python.com=APPdd34"> APPdd34</a></a>

The desired output for sample 2 is the same as sample input 2, because the ID is already linked.

How can I rectify this problem?
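
One possible direction, offered as a sketch rather than a definitive fix: in the attempt above, [^http://*/s] is a character class, so it matches any single character that is not one of h, t, p, :, /, * or s. It does not mean "not preceded by a URL". A regex-only workaround is to match either a complete existing anchor element or a bare ID, and rewrite only the bare IDs. The function name linkify is made up for illustration. The sketch assumes lowercase, non-nested <a ...>...</a> tags and IDs with two or three digits (the samples above use two, the original question three). For arbitrary HTML, walking text nodes with a real parser such as Beautiful Soup would be more robust.

```python
import re

# Assumption: IDs look like APPdd34 or APPsd222 (two letters, two or three digits).
ID = r"APP[a-z]{2}[0-9]{2,3}"

# Alternative 1 (group 1): a whole existing <a ...>...</a> element.
# Alternative 2 (group 2): a bare ID that is not inside such an element.
# The scan runs left to right, so an anchor is consumed before the IDs inside it.
PATTERN = re.compile(r"(<a\b[^>]*>.*?</a>)|\b(" + ID + r")\b", re.DOTALL)


def linkify(html, base="http://python.com="):
    """Wrap bare IDs in links and leave already-linked IDs alone.

    linkify is a hypothetical helper; base is the URL prefix used in the thread.
    """
    def repl(match):
        if match.group(1):
            # Existing anchor: return it unchanged.
            return match.group(1)
        app_id = match.group(2)
        return '<a href="%s%s">%s</a>' % (base, app_id, app_id)

    return PATTERN.sub(repl, html)


print(linkify("Input File contains APPdd34"))
# prints: Input File contains <a href="http://python.com=APPdd34">APPdd34</a>

print(linkify('<a href="http://python.com=APPdd34"> APPdd34</a>'))
# prints: <a href="http://python.com=APPdd34"> APPdd34</a>
```

Because existing anchors pass through untouched, running linkify on its own output does not nest links.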

pythonnoobie

© 2013 DaniWeb® LLC