Delete repeating elements in a file.

Question

knan 0 Light Poster

14 Years Ago

I have a file named test.txt. I read it and print its contents like this:

file=open("test.txt","r")
obj=file.read()
file.close()
print obj

The output is:

a=a

b=b

c=c



d=e
e=d

e=f
f=e

f=g
g=h

What I want to do with obj is build a regular expression such that:

1. If the left value matches the right value, the line should collapse to a single value, i.e., a=a should become a.

2. d=e and e=d mean the same thing, so one of them must be removed. The same goes for e=f and f=e.

3. Notice the newlines. Some lines are followed by \n, some by \n\n, and some by \n\n\n. Each run should be collapsed into a single \n.

The output should be

a
b
c
d=e
e=f
f=g
g=h

Could someone please help me write the regular expression? I've been trying to find one for ages, but I couldn't. Please help.

file-system python regex



All 4 Replies

Gribouillis 1,391 Programming Explorer

14 Years Ago

Thank you very much. That was very helpful. How am I going to achieve the 1st and 2nd conditions? I am still trying, but I couldn't figure out a regular expression...

You can write pseudocode to build the regular expression. You want to match this:

pattern:
    either:
        symbol1 equal symbol2
        newline
        symbol2 equal symbol1
    or:
        symbol3 equal symbol3
    or:
        symbol4 equal symbol5
    newlines (0 or more)

Each of these elements has an equivalent regex pattern:

symbol1 ->  (?P<symbol1>[a-z])
symbol2 ->  (?P<symbol2>[a-z])
repeated symbol1  -> (?P=symbol1)
repeated symbol2  -> (?P=symbol2)
equal -> [=]
newline -> \n
zero or more -> *

This should give you hints to build the regular expression.
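As a rough sketch of how those pieces could fit together (one possible assembly, not necessarily the expression Gribouillis had in mind): the pattern below follows the pseudocode, except that the "same symbol on both sides" branch is tried first so that a repeated a=a line is never mistaken for a swapped pair. The names PAIR and normalize are made up for this illustration, and the symbols are assumed to be single lowercase letters, as in the [a-z] classes above.

```python
import re

# Assumption: every symbol is one lowercase letter, as in the [a-z] hints.
PAIR = re.compile(r"""
    (?:
        (?P<symbol3>[a-z]) = (?P=symbol3)           # a=a
      |
        (?P<symbol1>[a-z]) = (?P<symbol2>[a-z]) \n  # d=e, then a newline,
        (?P=symbol2) = (?P=symbol1)                 # then e=d
      |
        (?P<symbol4>[a-z]) = (?P<symbol5>[a-z])     # any other pair
    )
    \n*                                             # newlines (0 or more)
""", re.VERBOSE)


def normalize(text):
    """Rewrite each match as a single line ending in exactly one newline."""
    def repl(match):
        if match.group("symbol3"):
            return match.group("symbol3") + "\n"
        if match.group("symbol1"):
            return "%s=%s\n" % (match.group("symbol1"), match.group("symbol2"))
        return "%s=%s\n" % (match.group("symbol4"), match.group("symbol5"))
    return PAIR.sub(repl, text)


sample = "a=a\n\nb=b\n\nc=c\n\n\n\nd=e\ne=d\n\ne=f\nf=e\n\nf=g\ng=h\n"
print(normalize(sample))
```

Note that this only removes a swapped pair when it sits on the very next line, which is how the pairs appear in test.txt.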


knan commented: Thank you very much!! I think i am nearing the answer. +0


griswolf 304 Veteran Poster · Answer 1 · 2010-10-26T14:38:05+00:00

Let's start easy:

lines = []
with open('test.txt', 'r') as f:
    for x in f:
        if x.strip():  # skip empty lines
            lines.append(x.strip())
for line in lines:
    print(line)

This just eliminates the blank lines, then prints out the remainder. Of course you will want to do some more work. You will probably want to do something like lhs,rhs = line.split('=') at some point.
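To illustrate that last hint only (a small sketch, not a full solution): once a line is split on '=', the two sides are ordinary strings that can be compared directly. The helper name split_pair and the three sample lines are made up for this example.

```python
def split_pair(line):
    """Split 'lhs=rhs' into its two sides.

    Raises ValueError if the line does not contain exactly one '='.
    """
    lhs, rhs = line.strip().split('=')
    return lhs, rhs


lines = ["a=a", "d=e", "e=d"]
pairs = [split_pair(line) for line in lines]
same = [lhs == rhs for lhs, rhs in pairs]  # True where both sides match
print(pairs)
print(same)
```

From here, condition 1 is a comparison of lhs with rhs, and condition 2 is a comparison of one pair with the reverse of another.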

knan 0 Light Poster · Answer 2 · 2010-10-26T17:16:33+00:00

Thank you very much. That was very helpful. How am I going to achieve the 1st and 2nd conditions? I am still trying, but I couldn't figure out a regular expression...

TrustyTony 888 ex-Moderator Team Colleague Featured Poster · Answer 3 · 2010-10-26T20:10:40+00:00

Let's start easy:
lines = []
with open('test.txt', 'r') as f:
    for x in f:
        if x.strip():  # skip empty lines
            lines.append(x.strip())
for line in lines:
    print(line)
This just eliminates the blank lines, then prints out the remainder. Of course you will want to do some more work. You will probably want to do something like lhs,rhs = line.split('=') at some point.

Good advice. It is also good that you did not give a ready-made solution, since the OP must solve the problem with a regular expression.

So I am free to post two of my non-RE solutions:

with open("test.txt","r") as datasource:
    # (c, d) holds the last pair printed, so (d, c) is its mirror image
    c, d = '', ''
    for ab in datasource:
        if '=' in ab:
            a, b = ab.rstrip().split('=')
            if a == b:
                print(a)
            elif (a, b) != (d, c):
                print('='.join((a, b)))
                c, d = a, b

print(60 * '-')
with open("test.txt","r") as source:
    # sort each pair so d=e and e=d compare equal, then let a set drop duplicates
    datasource = (sorted(d.rstrip().split('='))
                  for d in source if '=' in d)
    print('\n'.join(sorted(set(a if a==b else a+'='+b for a,b in datasource))))
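
One thing to keep in mind: the first solution only compares each pair with the pair printed just before it, and the second one sorts the output instead of keeping the file order. If reversed pairs can be far apart and the original order matters, a variant along these lines might help. This is only a sketch; the function name dedupe_pairs is made up here, and unlike the code above it also drops a line that repeats exactly.

```python
def dedupe_pairs(lines):
    """Keep the first occurrence of each unordered pair, in file order."""
    seen = set()
    result = []
    for line in lines:
        line = line.strip()
        if '=' not in line:
            continue  # skip blank or malformed lines
        a, b = line.split('=')
        key = frozenset((a, b))  # d=e and e=d give the same key
        if key in seen:
            continue
        seen.add(key)
        result.append(a if a == b else a + '=' + b)
    return result


sample_lines = ["a=a", "", "b=b", "", "c=c", "", "", "",
                "d=e", "e=d", "", "e=f", "f=e", "", "f=g", "g=h"]
print('\n'.join(dedupe_pairs(sample_lines)))
```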
