data = re.sub(u"\u102F*\u102D", u"\u102D\\2\u102F", data)
	data = re.sub(u"\u1031*\u103B", u"\u103B\\2\u1031", data)
	data = re.sub(u"\u1001*\u102C", u"\u1001\\2\u102B", data)
	data = re.sub(u"\u1002*\u102C", u"\u1001\\2\u102B", data)
	data = re.sub(u"\u1004*\u102C", u"\u1001\\2\u102B", data)
	data = re.sub(u"\u1007*\u102C", u"\u1001\\2\u102B", data)
	data = re.sub(u"\u1012*\u102C", u"\u1001\\2\u102B", data)
	data = re.sub(u"\u1013*\u102C", u"\u1001\\2\u102B", data)
	data = re.sub(u"\u1014*\u102C", u"\u1001\\2\u102B", data)
	data = re.sub(u"\u1015*\u102C", u"\u1001\\2\u102B", data)
	data = re.sub(u"\u1016*\u102C", u"\u1001\\2\u102B", data)
	data = re.sub(u"\u1017*\u102C", u"\u1001\\2\u102B", data)
	data = re.sub(u"\u1018*\u102C", u"\u1001\\2\u102B", data)

I am writing a Unicode conversion tool, but I am a bit stuck on the regex. For example, between the two characters U+1001 and U+102C there might be one or two other characters. I tried backreferencing with \2, but got the following error:

raise error, "invalid group reference"
sre_constants.error: invalid group reference

Also, if I wanted to swap the position of two Unicode characters using regex, how would I do it? For example, if I have AB, AC and AD and want to change them to BA, CA and DA? Thanks.

All 2 Replies

Here is a way to swap characters:

import re

# Matches "A" followed by B, C or D.
swap_pat = re.compile(u"A(?:B|C|D)")

def swap(swap_match):
	# Reverse the two matched characters, e.g. "AB" -> "BA".
	s = swap_match.group(0)
	return s[1] + s[0]

def swap_sub(theString):
	return swap_pat.sub(swap, theString)

# The parentheses capture whatever lies between U+1013 and U+102C as group 1.
second_pat = re.compile(u"\u1013(.*)\u102C")

def second(second_match):
	# Keep the captured middle part and replace the two outer characters.
	middle = second_match.group(1)
	return u"\u1001%s\u102B" % middle

def second_sub(theString):
	return second_pat.sub(second, theString)

if __name__ == "__main__":
	s = u"ACCESS TO PADDED DATA ABORTED"
	print(s)
	print(swap_sub(s))

	s = u"Hello \u1013world\u102C"
	print(repr(s))
	print(repr(second_sub(s)))
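
As for the error itself: a backreference such as \2 can only refer to a parenthesized group in the pattern, and the patterns in the question contain no parentheses at all, so there is no group 2 to refer to. Note also that \u1001* means "zero or more U+1001", not "U+1001 followed by anything". Here is a rough sketch of how the same substitutions could be written with backreferences. The names to_tall_aa and swap_backref are made up for illustration, and the bound of two characters in between is taken from the question:

```python
import re

# The question's pattern has no parentheses, hence no groups, so "\\2"
# in the replacement has nothing to refer to and re raises an error.
try:
    re.sub(u"\u1001*\u102C", u"\u1001\\2\u102B", u"\u1001\u103B\u102C")
except re.error as exc:
    print("re.error: %s" % exc)

# Group 1 captures U+1001, group 2 captures zero to two characters in between
# (assumption: "one or two other characters", as described in the question).
tall_aa_pat = re.compile(u"(\u1001)(.{0,2}?)\u102C")

def to_tall_aa(text):
    # Hypothetical helper: keep both groups, replace U+102C with U+102B.
    return tall_aa_pat.sub(u"\\1\\2\u102B", text)

# Swapping with a backreference: capture the second character and emit it first.
swap_backref_pat = re.compile(u"A([BCD])")

def swap_backref(text):
    # Hypothetical helper: "AB" -> "BA", "AC" -> "CA", "AD" -> "DA".
    return swap_backref_pat.sub(u"\\1A", text)

print(repr(to_tall_aa(u"\u1001\u103B\u102C")))
print(swap_backref(u"AB AC AD"))
```

This does the same job as the callback functions above; a callback is still the better choice when the replacement needs logic that a template string cannot express.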

Since you have many substitutions to perform, you should try to combine your patterns into a single regex. Also make sure each regex is compiled only once with re.compile.
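
To illustrate the grouping idea, here is a minimal sketch that folds the eleven U+102C rules from the question into one compiled pattern using a character class. It assumes the intent is to keep whichever consonant matched (the posted rules always write U+1001 back, which may or may not be intended) and that at most two characters sit in between. CONSONANTS, TALL_AA_PAT and convert are names invented for this example:

```python
import re

# Consonants copied from the first character of each U+102C rule in the question.
CONSONANTS = (u"\u1001\u1002\u1004\u1007\u1012\u1013"
              u"\u1014\u1015\u1016\u1017\u1018")

# Compiled once at import time; one pattern stands in for eleven re.sub calls.
# Group 1 = the consonant, group 2 = zero to two characters in between.
TALL_AA_PAT = re.compile(u"([" + CONSONANTS + u"])(.{0,2}?)\u102C")

def convert(text):
    # Assumption: keep the matched consonant and the middle characters,
    # and only replace the trailing U+102C with U+102B.
    return TALL_AA_PAT.sub(u"\\1\\2\u102B", text)

print(repr(convert(u"\u1015\u102C")))
print(repr(convert(u"\u1012\u103D\u102C")))
```

If the consonant really should become U+1001 in every case, replace \\1 in the template with \u1001.
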
Hope this helps ...

Thanks. I will need help with grouping. I will post here once the rules are complete.
