I have a file named test.txt, which I read like this:

file=open("test.txt","r")
obj=file.read()
file.close()
print(obj)

This prints the contents of the file:

a=a

b=b

c=c



d=e
e=d

e=f
f=e

f=g
g=h

What I want to do with obj is write a regular expression such that:

1. If the left-hand side matches the right-hand side, the line should collapse to a single symbol, i.e. a=a should become a.

2. d=e and e=d mean the same thing, so one of them must be removed. The same goes for e=f and f=e.

3. Notice the newlines. Some lines are followed by \n, some by \n\n and some by \n\n\n. Every run should become a single \n.

The output should be

a
b
c
d=e
e=f
f=g
g=h

Could someone please help me write the regular expression? I've tried to find one for ages, but I couldn't. Please help.

Edited 6 Years Ago by knan: n/a

Let's start easy:

lines = []
with open('test.txt', 'r') as f:
  for x in f:
    if x.strip(): # lose empty lines
      lines.append(x.strip())
for line in lines:
  print(line)

This just eliminates the blank lines, then prints out the remainder. Of course you will want to do some more work. You will probably want to do something like lhs,rhs = line.split('=') at some point.
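
To make the lhs,rhs = line.split('=') suggestion concrete, here is a minimal sketch that handles only the first condition (a=a becomes a). The function name collapse_identical is made up for illustration, and the input is assumed to be a list of lines that have already been stripped, with the blank ones dropped.

```python
def collapse_identical(lines):
    """Return the lines with every 'x=x' replaced by 'x'."""
    result = []
    for line in lines:
        if "=" in line:
            lhs, rhs = line.split("=", 1)  # split at the first '=' only
            if lhs == rhs:
                line = lhs  # both sides equal: keep a single symbol
        result.append(line)
    return result


print(collapse_identical(["a=a", "d=e", "e=d"]))
```

The mirrored pairs (d=e and e=d) are left untouched here; the second condition still needs its own step.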

Edited 6 Years Ago by griswolf: n/a

Thank you very much. That was very helpful. How do I achieve the 1st and 2nd conditions? I am still trying, but I couldn't figure out a regular expression...

You can write pseudo code to build the regular expression. You want to match this

pattern:
    either:
        symbol1 equal symbol2
        newline
        symbol2 equal symbol1
    or:
        symbol3 equal symbol3
    or:
        symbol4 equal symbol5
    newlines (0 or more)

Each of these elements has an equivalent regex pattern:

symbol1 ->  (?P<symbol1>[a-z])
symbol2 ->  (?P<symbol2>[a-z])
repeated symbol1  -> (?P=symbol1)
repeated symbol2  -> (?P=symbol2)
equal -> [=]
newline -> \n
zero or more -> *

This should give you hints to build the regular expression.
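
Putting those pieces together, one possible regular expression looks like the sketch below. It is only one way to assemble the hints, not necessarily the exact pattern intended above. The group names same, s1, s2, lhs and rhs and the helper names _collapse and normalize are invented for this example. It assumes every symbol is a single lowercase letter, as in test.txt, and it only recognises a mirrored pair when the two lines follow each other (blank lines in between are tolerated). The a=a case is tried first so that it is never mistaken for a mirrored pair.

```python
import re

# One logical record, followed by any number of newlines.
PATTERN = re.compile(
    r"(?:"
    r"(?P<same>[a-z])=(?P=same)"                        # a=a
    r"|(?P<s1>[a-z])=(?P<s2>[a-z])\n+(?P=s2)=(?P=s1)"   # d=e then e=d
    r"|(?P<lhs>[a-z])=(?P<rhs>[a-z])"                   # any other pair
    r")\n*"
)


def _collapse(match):
    """Build the replacement text for one matched record."""
    if match.group("same") is not None:
        return match.group("same") + "\n"
    if match.group("s1") is not None:
        # Keep the first line of the mirrored pair, drop the second.
        return "{}={}\n".format(match.group("s1"), match.group("s2"))
    return "{}={}\n".format(match.group("lhs"), match.group("rhs"))


def normalize(text):
    """Apply all three conditions from the question in a single pass."""
    return PATTERN.sub(_collapse, text)


# Same content as test.txt in the question.
SAMPLE = "a=a\n\nb=b\n\nc=c\n\n\n\nd=e\ne=d\n\ne=f\nf=e\n\nf=g\ng=h\n"
print(normalize(SAMPLE), end="")
```

Because the trailing \n* swallows every newline after a record and the replacement puts exactly one back, the third condition comes for free. Text that does not match the pattern at all is passed through unchanged.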

Edited 6 Years Ago by Gribouillis: n/a

Comments
Thank you very much!! I think I am nearing the answer.

Good advice, griswolf. It was also right not to give a ready-made solution, since the OP has to solve the problem with a regular expression.

So I am free to post two of my non-RE solutions:

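with open("test.txt","r") as datasource:
    # c, d remember the last pair printed, so that its mirror image can be skipped
    c,d = '',''
    for ab in datasource:
        if '=' in ab:  # this also skips the blank lines
            a,b =  ab.rstrip().split('=')
            if a == b:
                print(a)  # a=a collapses to a
            elif (a,b) != (d,c):  # not the reverse of the previous pair
                print('='.join((a,b)))
                c, d = a, b

print(60 * '-')
with open("test.txt","r") as source:
    # sorting each pair makes d=e and e=d identical, so set() drops the duplicate
    datasource = (sorted(d.rstrip().split('='))
                  for d in source if '=' in d)
    print('\n'.join(sorted(set(a if a==b else a+'='+b for a,b in datasource))))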
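
One thing to keep in mind: the first version only skips a mirrored pair when it comes right after the original, and the second version sorts the output, so the original line order is lost. If neither is acceptable, a third non-RE variant could look like the sketch below. The function name dedupe_pairs is hypothetical, and note that this variant also drops exact repeats of a line, which goes a little beyond what the question asks for.

```python
def dedupe_pairs(lines):
    """Keep the first occurrence of each pair, in the original order."""
    seen = set()
    result = []
    for line in lines:
        line = line.strip()
        if "=" not in line:
            continue  # blank or unrelated line
        lhs, rhs = line.split("=", 1)
        key = frozenset((lhs, rhs))  # d=e and e=d produce the same key
        if key in seen:
            continue
        seen.add(key)
        result.append(lhs if lhs == rhs else line)
    return result


sample = ["a=a", "", "b=b", "c=c", "d=e", "e=d", "e=f", "f=e", "f=g", "g=h"]
print("\n".join(dedupe_pairs(sample)))
```

With the file from the question, the same function can be fed the open file object directly, because iterating over a file yields its lines.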

Edited 6 Years Ago by pyTony: unused itertools removed, with closing the files

Comments
Thank you very much. That solved the problem.