Hello,

I am very new to Python. I have two files, each containing multiple lines of file names. If a file name is present in file1, I need to remove it from file2 and save what remains of file2 to a third file.

For example:

file1:
/users/ux454500/radpres.tar
/paci/ucb/ux453039/source/amr.12.20.2002.tar~
/paci/ucb/ux453039/source/amr.1.25.2003.htar.idx
/paci/ucb/ux453039/source/amr.1.18.2003.htar.idx

file2:
DIRECTORY /paci/ucb/ux453039/.
FILE /paci/ucb/ux453039/.ktb_ux453039 05/31/2006 17:52:04 09/01/2007 14:25:33
FILE /paci/ucb/ux453039/source/amr.12.20.2002.tar~ 03/10/2005 20:56:50 09/02/2007 10:35:41
FILE /paci/ucb/ux453039/source/amr.1.25.2003.htar.idx 02/23/2007 14:20:15 08/27/2007 14:53:48
FILE /paci/ucb/ux453039/bhsf1.0000.out 02/23/2007 14:20:13 08/27/2007 14:53:48
FILE /users/ux454500/AIX.mpCC.DEBUG.ex 02/23/2007 14:20:13 02/28/2007 14:47:55
DIRECTORY /paci/ucb/ux453039/runs/bondi

file3:
DIRECTORY /paci/ucb/ux453039/.
FILE /paci/ucb/ux453039/.ktb_ux453039 05/31/2006 17:52:04 09/01/2007 14:25:33
FILE /paci/ucb/ux453039/bhsf1.0000.out 02/23/2007 14:20:13 08/27/2007 14:53:48
FILE /users/ux454500/AIX.mpCC.DEBUG.ex 02/23/2007 14:20:13 02/28/2007 14:47:55
DIRECTORY /paci/ucb/ux453039/runs/bondi

Kindly help.

With my best regards,

We are glad to check your code. Either push the (CODE) button right before pasting, or select the code you pasted and push the (CODE) button afterwards.



And your effort on the code? Please read the forum rules or, for a simplified version, pytony's signature.

I am sorry. This is my first time logging in and posting a question. Below is the code, which I copied from an earlier post by someone else and tried to modify for my use.

Thank you very much for your help.

#!/usr/local/bin/python -u
# Filename: file_comp.py

file1=['/Users/file_comp/1.md']
file2=['/Users/file_comp/2.out']
file3=['/Users/file_comp/3.comp']


def key(line):
    return tuple(line.strip().split()[0:2])
 
def make_key_set(file_path):
    return set(key(line) for line in open(file_path))
 
 
def filtered_lines(file_path1, file_path2):
    key_set = make_key_set(file_path2)
    return (line for line in open(file_path1) if key(line) in key_set)
 
if __name__ == "__main__":
    file3 = open("file3", "w")
    for line in filtered_lines("file1", "file2"):
        file3.write(line)
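    file3.close()

import os

with open('file_1.txt') as file_1, open('file_2.txt') as file_2:
    f1 = set(i.strip() for i in file_1)  # a set makes the membership test fast
    f2 = [i.strip() for i in file_2]

# Keep only the lines of file_2 that do not appear in file_1.
comp = [f for f in f2 if f not in f1]

# Open the output once; 'w' avoids piling up duplicates on repeated runs.
with open('new.txt', 'w') as f_out:
    for f in comp:
        #print f
        #print os.path.basename(f)
        f_out.write(f + '\n')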

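So if file_1 contains 1.txt and file_2 contains 1.txt, 2.txt and 3.txt, then 2.txt and 3.txt are saved to new.txt.

This will not work for you as it stands, because your lines look strange: the lines in file2 carry a FILE prefix and timestamps around names such as .ktb_ux453039 and amr.12.20.2002.tar~, so they never match the bare paths in file1. In a normal case os.path.basename() would extract the filename.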

>>> import os
>>> os.path.basename('/root/desktop/python.py')
'python.py'
>>> os.path.basename('FILE /paci/ucb/ux453039/.ktb_ux453039 05/31/2006 17:52:04 09/01/2007 14:25:33')
'2007 14:25:33'

If your filenames look like this, I think writing a regex is the only option to extract those strange-looking filenames.
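
For what it is worth, here is a minimal sketch of the regex route. It assumes every line in file2 starts with FILE or DIRECTORY followed by a path that contains no spaces, which is only a guess based on the sample lines in the question. The pattern and the helper name extract_path are made up for illustration.

```python
import re

# Assumed line layout: "<FILE|DIRECTORY> <path> [dates...]"; paths contain no spaces.
LINE_RE = re.compile(r'^(FILE|DIRECTORY)\s+(\S+)')

def extract_path(line):
    """Return the path from a file2-style line, or None if the line does not match."""
    match = LINE_RE.match(line)
    return match.group(2) if match else None

sample = 'FILE /paci/ucb/ux453039/.ktb_ux453039 05/31/2006 17:52:04 09/01/2007 14:25:33'
path = extract_path(sample)
print(path)
```

Once the path is isolated like this, it can be compared directly against the lines of file1.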


# Who need regexp when we have Python ;) (... mostly)
import os

name = 'FILE /paci/ucb/ux453039/.ktb_ux453039 05/31/2006 17:52:04 09/01/2007 14:25:33'
name = name.split()[1]
print name
print os.path.basename(name)
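
Building on the split() idea, here is a rough sketch of the whole filter, with in-memory lists standing in for the two files. The function name filter_listing is hypothetical, and the sketch assumes the path is always the second whitespace-separated field and contains no spaces. It is written so that it should run on Python 3 as well.

```python
def filter_listing(known_paths, listing_lines):
    """Keep listing lines whose path (second field) is not in known_paths."""
    known = set(p.strip() for p in known_paths)
    kept = []
    for line in listing_lines:
        fields = line.split()
        # Lines with fewer than two fields have no path to compare; keep them.
        if len(fields) < 2 or fields[1] not in known:
            kept.append(line)
    return kept

# Sample data taken from the question (shortened).
file1 = [
    '/users/ux454500/radpres.tar',
    '/paci/ucb/ux453039/source/amr.12.20.2002.tar~',
]
file2 = [
    'DIRECTORY /paci/ucb/ux453039/.',
    'FILE /paci/ucb/ux453039/source/amr.12.20.2002.tar~ 03/10/2005 20:56:50 09/02/2007 10:35:41',
    'FILE /paci/ucb/ux453039/bhsf1.0000.out 02/23/2007 14:20:13 08/27/2007 14:53:48',
]
for line in filter_listing(file1, file2):
    print(line)
```

Note that this sketch compares DIRECTORY lines by path as well; if DIRECTORY lines should always be kept, an extra check on fields[0] would be needed.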

Ok, here is my final code for your information. Again, thank you for your time and help.

#!/usr/local/bin/python -u
# Filename: file_comp.py
 
file_path1='/Users/file_comp/1.md'
file_path2='/Users/file_comp/2.out'
file_path3='/Users/file_comp/3.comp'
file_path4='/Users/file_comp/4.missing'

def key_file2(line):
    # Key for a file2 line: the second field (the path after FILE/DIRECTORY).
    t = tuple(line.strip().split()[1:2])
    #print "key_file2: ", t
    return t

def key_file1(line):
    # Key for a file1 line: the first field (the path itself).
    t = tuple(line.strip().split()[0:1])
    # print "key_file1: ", t
    return t

def make_key_set(file_path):
    with open(file_path) as f:
        # Skip blank lines so that no empty key ends up in the set.
        return set(key_file1(line) for line in f if line.strip())

def filtered_lines(file_path1, file_path2):
    key_set = make_key_set(file_path1)

    # file3: lines of file2 whose path is not listed in file1.
    with open(file_path2) as file2, open(file_path3, "w") as file3:
        for line in file2:
            if line.split()[0:1] == ['DIRECTORY']:
                file3.write(line)
            elif key_file2(line) not in key_set:
                file3.write(line)
            else:
                key_set.remove(key_file2(line))

    # file4: paths from file1 that never showed up in file2.
    with open(file_path4, "w") as file4:
        for key in key_set:
            file4.write(key[0] + "\n")

if __name__ == "__main__":
    filtered_lines(file_path1, file_path2)
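
In case it helps anyone reading later, the "missing" part (paths listed in file1 that never appear in file2) can also be pictured as a plain set difference. This is only a sketch with in-memory lists instead of files; the function name missing_paths is made up, and it assumes the same line layout as the samples in the question.

```python
def missing_paths(known_paths, listing_lines):
    """Return paths from known_paths that never appear as the path of a FILE line."""
    known = set(p.strip() for p in known_paths if p.strip())
    seen = set()
    for line in listing_lines:
        fields = line.split()
        if len(fields) >= 2 and fields[0] == 'FILE':
            seen.add(fields[1])
    # Sorted so the output order is stable from run to run.
    return sorted(known - seen)

# Sample data taken from the question (file2 shortened).
file1 = [
    '/users/ux454500/radpres.tar',
    '/paci/ucb/ux453039/source/amr.12.20.2002.tar~',
    '/paci/ucb/ux453039/source/amr.1.25.2003.htar.idx',
    '/paci/ucb/ux453039/source/amr.1.18.2003.htar.idx',
]
file2 = [
    'DIRECTORY /paci/ucb/ux453039/.',
    'FILE /paci/ucb/ux453039/source/amr.12.20.2002.tar~ 03/10/2005 20:56:50 09/02/2007 10:35:41',
    'FILE /paci/ucb/ux453039/source/amr.1.25.2003.htar.idx 02/23/2007 14:20:15 08/27/2007 14:53:48',
]
for path in missing_paths(file1, file2):
    print(path)
```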