
Hello,

I am very new to Python. I have two files, each containing multiple lines of file names. If a file name in file1 is also present in file2, I need to remove that line from file2 and save what remains of file2 into a third file.

For example:

file1:
/users/ux454500/radpres.tar
/paci/ucb/ux453039/source/amr.12.20.2002.tar~
/paci/ucb/ux453039/source/amr.1.25.2003.htar.idx
/paci/ucb/ux453039/source/amr.1.18.2003.htar.idx

file2:
DIRECTORY /paci/ucb/ux453039/.
FILE /paci/ucb/ux453039/.ktb_ux453039 05/31/2006 17:52:04 09/01/2007 14:25:33
FILE /paci/ucb/ux453039/source/amr.12.20.2002.tar~ 03/10/2005 20:56:50 09/02/2007 10:35:41
FILE /paci/ucb/ux453039/source/amr.1.25.2003.htar.idx 02/23/2007 14:20:15 08/27/2007 14:53:48
FILE /paci/ucb/ux453039/bhsf1.0000.out 02/23/2007 14:20:13 08/27/2007 14:53:48
FILE /users/ux454500/AIX.mpCC.DEBUG.ex 02/23/2007 14:20:13 02/28/2007 14:47:55
DIRECTORY /paci/ucb/ux453039/runs/bondi

file3:
DIRECTORY /paci/ucb/ux453039/.
FILE /paci/ucb/ux453039/.ktb_ux453039 05/31/2006 17:52:04 09/01/2007 14:25:33
FILE /paci/ucb/ux453039/bhsf1.0000.out 02/23/2007 14:20:13 08/27/2007 14:53:48
FILE /users/ux454500/AIX.mpCC.DEBUG.ex 02/23/2007 14:20:13 02/28/2007 14:47:55
DIRECTORY /paci/ucb/ux453039/runs/bondi

Kindly help.

With my best regards,


We are glad to check your code. To post it, either press the (CODE) button right before pasting, or select the code you pasted and press the (CODE) button afterwards.

And what about your own effort on the code? Please read the forum rules, or, for a simplified version, pyTony's signature.


I am sorry. This is my first time logging in and posting a question. Below is the code I copied from an earlier post by someone else and tried to modify for my use.

Thank you very much for your help.

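#!/usr/local/bin/python -u
# Filename: file_comp.py

file_path1 = '/Users/file_comp/1.md'    # one file name per line
file_path2 = '/Users/file_comp/2.out'   # DIRECTORY/FILE listing to filter
file_path3 = '/Users/file_comp/3.comp'  # output: file2 without file1's names


def key(line):
    # file1 lines hold only a path; file2 lines look like "FILE <path> <dates>"
    fields = line.split()
    if len(fields) > 1:
        return fields[1]
    return fields[0] if fields else None

def make_key_set(file_path):
    with open(file_path) as f:
        return set(key(line) for line in f if line.strip())

def filtered_lines(file_path1, file_path2):
    # Yield the lines of file_path2 whose path is NOT listed in file_path1
    key_set = make_key_set(file_path1)
    with open(file_path2) as f:
        for line in f:
            if key(line) not in key_set:
                yield line

if __name__ == "__main__":
    with open(file_path3, "w") as file3:
        for line in filtered_lines(file_path1, file_path2):
            file3.write(line)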

with open('file_1.txt') as file_1, open('file_2.txt') as file_2:
    f1 = set(i.strip() for i in file_1)    # names to exclude (set: fast lookups)
    f2 = [i.strip() for i in file_2]
    comp = [f for f in f2 if f not in f1]  # keeps file_2's order

# Open the output once, in "w" mode, so reruns don't append duplicates
with open('new.txt', 'w') as f_out:
    for f in comp:
        f_out.write(f + '\n')

So if file_1.txt contains 1.txt, and file_2.txt contains 1.txt, 2.txt and 3.txt, then 2.txt and 3.txt are saved to new.txt.
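The same whole-line comparison can be written as a small pure function and checked against that example. This is a sketch; the name `difference_in_order` is hypothetical, and turning the exclusion list into a set is an optional speed-up rather than a change in behaviour.

```python
def difference_in_order(exclude, lines):
    """Return the items of `lines` that are not in `exclude`, keeping their order."""
    excluded = set(exclude)  # average O(1) membership test instead of a list scan
    return [line for line in lines if line not in excluded]


print(difference_in_order(['1.txt'], ['1.txt', '2.txt', '3.txt']))
# → ['2.txt', '3.txt']
```

A plain `set(f2) - set(f1)` would give the same names but lose the original order and any duplicate lines, which is why the list comprehension is used.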

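This will not work directly for you, because the lines in your file2 are not bare file names: each path (with names like .ktb_ux453039 or amr.12.20.2002.tar~) sits between a FILE or DIRECTORY keyword and a set of timestamps. In a normal case os.path.basename() would extract the file name, but here it picks up the text after the last slash in the dates: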

>>> import os
>>> os.path.basename('/root/desktop/python.py')
'python.py'
>>> os.path.basename('FILE /paci/ucb/ux453039/.ktb_ux453039 05/31/2006 17:52:04 09/01/2007 14:25:33')
'2007 14:25:33'

If your lines look like this, I think a regex is the only option to extract those strange-looking file names.
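A regex for these lines might look like the minimal sketch below, written for Python 3. It assumes every relevant line starts with FILE or DIRECTORY, followed by whitespace and a path that contains no spaces; `LISTING_RE` is a hypothetical name. Under that same assumption the path is simply the second whitespace-separated field.

```python
import os
import re

# Hypothetical pattern: a FILE or DIRECTORY keyword, whitespace, then a path
# with no spaces in it. Anything after the path (the dates) is ignored.
LISTING_RE = re.compile(r'^(FILE|DIRECTORY)\s+(\S+)')

line = 'FILE /paci/ucb/ux453039/.ktb_ux453039 05/31/2006 17:52:04 09/01/2007 14:25:33'
m = LISTING_RE.match(line)
if m:
    kind, path = m.groups()
    print(kind)                    # FILE
    print(path)                    # /paci/ucb/ux453039/.ktb_ux453039
    print(os.path.basename(path))  # .ktb_ux453039
```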

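# Who needs regexps when we have Python ;) (... mostly)
import os

name = 'FILE /paci/ucb/ux453039/.ktb_ux453039 05/31/2006 17:52:04 09/01/2007 14:25:33'
name = name.split()[1]         # the path is the second whitespace-separated field
print(name)                    # /paci/ucb/ux453039/.ktb_ux453039
print(os.path.basename(name))  # .ktb_ux453039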
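Applying the same `split()[1]` idea to every line gives the whole filter. The sketch below (Python 3) works on in-memory lists holding the sample data from the question, assuming each file1 line is one full path and each file2 line starts with FILE or DIRECTORY; `subtract_listing` is a hypothetical name. On that data it reproduces the expected file3.

```python
def subtract_listing(names, listing):
    """Return the listing lines whose FILE path is not among `names`."""
    remove = {n.strip() for n in names if n.strip()}
    result = []
    for line in listing:
        fields = line.split()
        # "FILE <path> <dates...>": the path is the second field
        if len(fields) > 1 and fields[0] == 'FILE' and fields[1] in remove:
            continue                # named in file1: drop it
        result.append(line)         # DIRECTORY lines and unlisted files stay
    return result


file1 = [
    "/users/ux454500/radpres.tar",
    "/paci/ucb/ux453039/source/amr.12.20.2002.tar~",
    "/paci/ucb/ux453039/source/amr.1.25.2003.htar.idx",
    "/paci/ucb/ux453039/source/amr.1.18.2003.htar.idx",
]
file2 = [
    "DIRECTORY /paci/ucb/ux453039/.",
    "FILE /paci/ucb/ux453039/.ktb_ux453039 05/31/2006 17:52:04 09/01/2007 14:25:33",
    "FILE /paci/ucb/ux453039/source/amr.12.20.2002.tar~ 03/10/2005 20:56:50 09/02/2007 10:35:41",
    "FILE /paci/ucb/ux453039/source/amr.1.25.2003.htar.idx 02/23/2007 14:20:15 08/27/2007 14:53:48",
    "FILE /paci/ucb/ux453039/bhsf1.0000.out 02/23/2007 14:20:13 08/27/2007 14:53:48",
    "FILE /users/ux454500/AIX.mpCC.DEBUG.ex 02/23/2007 14:20:13 02/28/2007 14:47:55",
    "DIRECTORY /paci/ucb/ux453039/runs/bondi",
]

file3 = subtract_listing(file1, file2)
for line in file3:
    print(line)
```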

Ok, here is my final code for your information. Again, thank you for your time and help.

#!/usr/local/bin/python -u
# Filename: file_comp.py

file_path1 = '/Users/file_comp/1.md'       # file names to remove
file_path2 = '/Users/file_comp/2.out'      # DIRECTORY/FILE listing
file_path3 = '/Users/file_comp/3.comp'     # file2 without the names in file1
file_path4 = '/Users/file_comp/4.missing'  # names in file1 never seen in file2


def key_file2(line):
    # "FILE <path> <dates...>": the path is the second field
    return tuple(line.strip().split()[1:2])

def key_file1(line):
    # file1 lines contain only the path
    return tuple(line.strip().split()[0:1])

def make_key_set(file_path):
    with open(file_path) as f:
        return set(key_file1(line) for line in f if line.strip())

def filtered_lines(file_path1, file_path2):
    key_set = make_key_set(file_path1)
    found = set()

    with open(file_path2) as src, open(file_path3, "w") as file3:
        for line in src:
            fields = line.split()
            if not fields:
                continue                    # skip blank lines
            if fields[0] == 'DIRECTORY':
                file3.write(line)           # always keep directory entries
            elif key_file2(line) not in key_set:
                file3.write(line)           # not listed in file1: keep it
            else:
                found.add(key_file2(line))  # listed in file1: drop it

    # Names from file1 that never appeared in file2
    with open(file_path4, "w") as file4:
        for key in sorted(key_set - found):
            file4.write(key[0] + "\n")

if __name__ == "__main__":
    filtered_lines(file_path1, file_path2)
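To try this approach without editing the hardcoded paths, the logic can be wrapped in a function that takes the four paths as parameters and run against a trimmed copy of the sample data in a temporary directory. This is a Python 3 sketch; `compare_files` is a hypothetical name. It follows the same idea as the script above: DIRECTORY lines are kept, FILE lines whose path appears in file1 are dropped, and file1 names never seen in file2 go to the "missing" file.

```python
import os
import tempfile


def compare_files(names_path, listing_path, out_path, missing_path):
    """Copy listing lines not named in names_path; record names never found."""
    with open(names_path) as f:
        names = {line.strip() for line in f if line.strip()}
    found = set()
    with open(listing_path) as src, open(out_path, "w") as out:
        for line in src:
            fields = line.split()
            if len(fields) > 1 and fields[0] == 'FILE' and fields[1] in names:
                found.add(fields[1])   # listed in file1: drop it
            elif fields:
                out.write(line)        # DIRECTORY or unlisted FILE: keep it
    with open(missing_path, "w") as missing:
        for name in sorted(names - found):
            missing.write(name + "\n")


# Quick check on a trimmed copy of the sample data in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    names_path = os.path.join(tmp, "1.md")
    listing_path = os.path.join(tmp, "2.out")
    out_path = os.path.join(tmp, "3.comp")
    missing_path = os.path.join(tmp, "4.missing")

    with open(names_path, "w") as f:
        f.write("/users/ux454500/radpres.tar\n"
                "/paci/ucb/ux453039/source/amr.12.20.2002.tar~\n")
    with open(listing_path, "w") as f:
        f.write("DIRECTORY /paci/ucb/ux453039/.\n"
                "FILE /paci/ucb/ux453039/source/amr.12.20.2002.tar~ "
                "03/10/2005 20:56:50 09/02/2007 10:35:41\n"
                "FILE /paci/ucb/ux453039/bhsf1.0000.out "
                "02/23/2007 14:20:13 08/27/2007 14:53:48\n")

    compare_files(names_path, listing_path, out_path, missing_path)
    with open(out_path) as f:
        comp = f.read()
    with open(missing_path) as f:
        missing = f.read()

print(comp, end="")
print(missing, end="")
```

Here the amr.12.20.2002.tar~ entry is removed from the output, and radpres.tar is reported as missing because it never appears in the listing.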