I am trying to show the differences between two files from a daily script.
I am unable to get it to read the files correctly and show what is different.

An example of the files:
<ad_xml>
<group name="group1">
</group>
<group name="group2">
<user>
<name>user1</name>
</user>
<user>
<name>user2</name>
</user>
<user>
<name>user3</name>
</user>
<user>
<name>user4</name>
</user>
<user>
<name>user5</name>
</user>
<user>
<name>user6</name>
</user>
<user>
<name>user7</name>
</user>
</group>
<group name="group3">
<user>
<name>user1</name>
</user>
<user>
<name>user2</name>
</user>
<user>
<name>user3</name>
</user>
<user>
<name>user4</name>
</user>
</group>
</ad_xml>

f1 = (line.strip() for line in open(oldfile, 'r'))

f2 = (line.strip() for line in open(filename, 'r'))

name = ('<name>%s</name>' % user)

for name in f2:
    if name in f1:
        pass
    else:
        print "New user: %s" % user

for name in f1:
    if name in f2:
        pass
    else:
        print "User removed: %s" % user

You have to define "different" before you begin. Do you want to compare the first record in the first file to the first record in the second file, the second record to the second record, and so on? What happens if one file has one more record than the other, or does that detail not matter? Generally you would read both files into lists using readlines() and then compare first to first, second to second, etc. for a one-to-one check. You could also read one record from each file, compare them, read the next pair, rinse and repeat. Linux provides this with diff (man diff) if you are running one of the Linux distros.
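
The one-to-one check described above could be sketched like this. It is only an illustration, assuming Python 3 and made-up sample records; the helper name compare_positionally is hypothetical, and a missing record in the shorter file shows up as None:

```python
from itertools import zip_longest


def compare_positionally(old_lines, new_lines):
    """Yield (line_number, old, new) for each position where the lines differ."""
    # zip_longest pads the shorter list with None so extra records are reported too
    pairs = zip_longest(old_lines, new_lines, fillvalue=None)
    for number, (old, new) in enumerate(pairs, start=1):
        if old != new:
            yield number, old, new


# made-up sample data, not taken from the real files
old = ["record 1", "record 2", "record 3"]
new = ["record 1", "record 2 changed", "record 3", "record 4"]

differences = list(compare_positionally(old, new))
for number, old_line, new_line in differences:
    print("line %d: %r -> %r" % (number, old_line, new_line))
```

Note that a single inserted record shifts everything after it, so a purely positional check reports every following line as changed. That is the reason "different" has to be defined first.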

Edited 6 Years Ago by woooee: n/a

I used difflib, but the results come back very unreadable.
I would like it to compare each file's groups and see when a user is removed or added, regardless of where it is.
I am new to a lot of this, so maybe I'm sounding a bit slow.

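import difflib
import sys

# file1 and file2 are the paths of the old and the new file
diff = difflib.ndiff(open(file1).readlines(), open(file2).readlines())

# ndiff returns a generator; write out every line it yields
sys.stdout.writelines(diff)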
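
One way to make the difflib output easier to read is to keep only the lines that ndiff marks as removed ("- ") or added ("+ ") and skip the unchanged lines and the "? " hint lines. A minimal sketch, assuming Python 3 and made-up sample lines; the helper name changed_lines is hypothetical:

```python
import difflib


def changed_lines(old_lines, new_lines):
    """Return (removed, added) lists built from the ndiff line prefixes."""
    removed, added = [], []
    for entry in difflib.ndiff(old_lines, new_lines):
        if entry.startswith("- "):
            removed.append(entry[2:].strip())
        elif entry.startswith("+ "):
            added.append(entry[2:].strip())
        # entries starting with "  " (unchanged) or "? " (hints) are ignored
    return removed, added


# made-up sample data in the same shape as the files in this thread
old = ["<name>user1</name>\n", "<name>user4</name>\n", "<name>user5</name>\n"]
new = ["<name>user1</name>\n", "<name>new1</name>\n", "<name>user5</name>\n"]

removed, added = changed_lines(old, new)
print("removed: %s" % removed)
print("added: %s" % added)
```

This still compares line sequences, so it does not know which group a name belongs to.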

set.difference() would be the easiest if record order doesn't matter; otherwise you would have to index by record number and search the next x records to determine if a record was not found, and then go back to the original point. You could also probably delete records that are just "</group>", "<user>", "</user>", etc.

file_input_1 = [ "record 1\n",
                 "record 2\n",
                 "deleted from 2nd file\n",
                 "record 3\n",
                 "record 4\n",
                 "record 5\n" ]

file_input_2 = [ "record 1\n",
                 "record 2\n",
                 "record 3\n",
                 "record 4\n",
                 "added to 2nd file\n",
                 "record 5\n" ]

set_1 = set(file_input_1)
set_2 = set(file_input_2)

diff_1_2 = set_1.difference(set_2)
diff_2_1 = set_2.difference(set_1)
print(diff_1_2)
print(diff_2_1)
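
Combining the two suggestions (drop the pure tag records, then use set difference) might look like the sketch below. It assumes Python 3 and made-up sample lines, and the helper name name_lines is hypothetical. Because sets discard duplicates, a user who appears in several groups is counted only once, so this ignores group membership:

```python
def name_lines(lines):
    """Keep only the stripped lines that look like <name>...</name> records."""
    stripped = (line.strip() for line in lines)
    return set(line for line in stripped
               if line.startswith("<name>") and line.endswith("</name>"))


# made-up sample data in the same shape as the files in this thread
old_lines = ['<group name="group2">\n', "<user>\n", "<name>user1</name>\n",
             "</user>\n", "<user>\n", "<name>user4</name>\n", "</user>\n",
             "</group>\n"]
new_lines = ['<group name="group2">\n', "<user>\n", "<name>user1</name>\n",
             "</user>\n", "<user>\n", "<name>new1</name>\n", "</user>\n",
             "</group>\n"]

removed = sorted(name_lines(old_lines).difference(name_lines(new_lines)))
added = sorted(name_lines(new_lines).difference(name_lines(old_lines)))
print("removed: %s" % removed)
print("added: %s" % added)
```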

Edited 6 Years Ago by woooee: n/a

If you disregard the tag info, including the groups, which in the example have different sets of users (you must split at <group to deal with them separately):

oldfile = 'user.txt'
filename = 'userchanged.txt'
## generator changed to list so it can be searched more than once
f1 = [line.strip() for line in open(oldfile, 'r') if line.startswith('<name>')]
f2 = [line.strip() for line in open(filename, 'r') if line.startswith('<name>')]
## this does not consider the groups group by group, that is your job
for user in ('user1', 'new1', 'user7', 'new2'): ## loop added because user was not defined
    name = '<name>%s</name>' % user

    if name in f2 and name not in f1:
        print "New user: %s" % user
    elif name in f1 and name not in f2:
        print "User removed: %s" % user
    else:
        print "Nothing changed: %s" % user
Attachments (user.txt, then userchanged.txt):
<ad_xml>
<group name="group1">
</group>
<group name="group2">
<user>
<name>user1</name>
</user>
<user>
<name>user2</name>
</user>
<user>
<name>user3</name>
</user>
<user>
<name>user4</name>
</user>
<user>
<name>user5</name>
</user>
<user>
<name>user6</name>
</user>
<user>
<name>user7</name>
</user>
</group>
<group name="group3">
<user>
<name>user1</name>
</user>
<user>
<name>user2</name>
</user>
<user>
<name>user3</name>
</user>
<user>
<name>user4</name>
</user>
</group>
</ad_xml>
<ad_xml>
<group name="group1">
<user>
<name>new3</name>
</user>
</group>
<group name="group2">
<user>
<name>user1</name>
</user>
<user>
<name>user2</name>
</user>
<user>
<name>user3</name>
</user>
<user>
<name>new1</name>
</user>
<user>
<name>user5</name>
</user>
<user>
<name>user6</name>
</user>
</group>
<group name="group3">
<user>
<name>user1</name>
</user>
<user>
<name>new2</name>
</user>
<user>
<name>user3</name>
</user>
<user>
<name>user4</name>
</user>
</group>
</ad_xml>

I have created this as a temporary solution.
It will suffice until I can maybe use minidom or similar to read the XML data correctly.

Thank you for your help.

# read files and strip surrounding whitespace from each line
f1 = [line.strip() for line in open(oldfile).readlines()]
f2 = [line.strip() for line in open(filename).readlines()]

# lines present only in the new file
for x in f2:
    if x not in f1:
        print "New user: %s" % x

# lines present only in the old file
for x in f1:
    if x not in f2:
        print "User removed: %s" % x
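
For the minidom idea mentioned above, a per-group comparison could be sketched as follows. This is only an illustration, assuming Python 3 and the ad_xml/group/user/name layout shown in this thread; the helper names users_by_group and compare_groups and the shortened sample data are hypothetical:

```python
from xml.dom import minidom


def users_by_group(xml_text):
    """Map each group's name attribute to the set of user names inside it."""
    document = minidom.parseString(xml_text)
    groups = {}
    for group in document.getElementsByTagName("group"):
        names = set()
        for node in group.getElementsByTagName("name"):
            # the text of <name>user1</name> is the element's first child node
            if node.firstChild is not None:
                names.add(node.firstChild.data.strip())
        groups[group.getAttribute("name")] = names
    return groups


def compare_groups(old_xml, new_xml):
    """Return {group: (added, removed)} for every group whose users changed."""
    old_groups = users_by_group(old_xml)
    new_groups = users_by_group(new_xml)
    changes = {}
    for group_name in sorted(set(old_groups) | set(new_groups)):
        old_users = old_groups.get(group_name, set())
        new_users = new_groups.get(group_name, set())
        added = sorted(new_users - old_users)
        removed = sorted(old_users - new_users)
        if added or removed:
            changes[group_name] = (added, removed)
    return changes


# shortened, made-up sample data in the same layout as the files above
OLD = """<ad_xml>
<group name="group1">
</group>
<group name="group2">
<user><name>user1</name></user>
<user><name>user4</name></user>
</group>
</ad_xml>"""

NEW = """<ad_xml>
<group name="group1">
<user><name>new3</name></user>
</group>
<group name="group2">
<user><name>user1</name></user>
<user><name>new1</name></user>
</group>
</ad_xml>"""

changes = compare_groups(OLD, NEW)
for group_name, (added, removed) in sorted(changes.items()):
    print("%s: added %s, removed %s" % (group_name, added, removed))
```

Because each group gets its own set, a user removed from one group but still present in another is reported correctly, which the line-based versions cannot do.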

Consider this code for future development:

import pretty ## my posted code snippet
oldfile = 'user.txt'

gr= open(oldfile, 'r').read().split('<group name=')

pretty.ppr(gr)

pretty.ppr(gr[2].splitlines())

f1 = [line.partition('name')
      for line in gr[2].splitlines()
      if (line.startswith('<name>') or
          line.startswith('<group')
          )
      ]
pretty.ppr(f1)

gr1=[x[0].lstrip('>') for x in [n[2].rsplit('</name>',1) for n in f1]]
print
print('Second groups users are:')
pretty.ppr(gr1)
"""Output:

['<ad_xml>\n', '"group1">\n</group>\n', '"group2">\n<user>\n<name>user1</name>\n</user>\n<user>\n<name>user2</name>\n</user>\n<user>\n<name>user3</name>\n</user>\n<user>\n<name>user4</name>\n</user>\n<user>\n<name>user5</name>\n</user>\n<user>\n<name>user6</name>\n</user>\n<user>\n<name>user7</name>\n</user>\n</group>\n', '"group3">\n<user>\n<name>user1</name>\n</user>\n<user>\n<name>user2</name>\n</user>\n<user>\n<name>user3</name>\n</user>\n<user>\n<name>user4</name>\n</user>\n</group>\n</ad_xml>\n']

['"group2">', '<user>', '<name>user1</name>', '</user>', '<user>', '<name>user2</name>', '</user>', '<user>', '<name>user3</name>', '</user>', '<user>', '<name>user4</name>', '</user>', '<user>', '<name>user5</name>', '</user>', '<user>', '<name>user6</name>', '</user>', '<user>', '<name>user7</name>', '</user>', '</group>']

[
  ('<', 'name', '>user1</name>'), 
  ('<', 'name', '>user2</name>'), 
  ('<', 'name', '>user3</name>'), 
  ('<', 'name', '>user4</name>'), 
  ('<', 'name', '>user5</name>'), 
  ('<', 'name', '>user6</name>'), 
  ('<', 'name', '>user7</name>')]

Second groups users are:

['user1', 'user2', 'user3', 'user4', 'user5', 'user6', 'user7']
"""

Edited 6 Years Ago by pyTony: Tidied prints
