Problem Description
-----------------------

I tried to download (leech) my favourite site of essays on history with a crawl depth of 4. For some reason, say a problem with my download tool, I could not download all the files. I had a list of the files I had downloaded (list1) and a list of the files present on the site (list2).

I only wanted to download the difference. Some may say that a better site ripper would solve this problem. I agree, but the problem is generic: I have two lists and I want to find the delta, that is, the entries of list2 that are not in list1.
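Stated generically, the delta is every entry of list2 (everything that exists) that is not in list1 (everything you already have). A minimal sketch, assuming both lists fit in memory; the function name `missing_items` and the file names are hypothetical. Unlike a plain set difference, it keeps the order of list2 and reports duplicates only once, which can be handy when the result is fed back to a downloader.

```python
def missing_items(have, wanted):
    """Return entries of `wanted` that are not in `have`, in `wanted` order.

    Duplicates in `wanted` are reported only once.
    """
    have_set = set(have)   # average O(1) membership tests
    seen = set()           # entries already added to the result
    result = []
    for item in wanted:
        if item not in have_set and item not in seen:
            seen.add(item)
            result.append(item)
    return result


if __name__ == "__main__":
    downloaded = ["index.html", "rome.html"]
    on_site = ["index.html", "greece.html", "rome.html", "egypt.html", "greece.html"]
    print(missing_items(downloaded, on_site))  # ['greece.html', 'egypt.html']
```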

I am quite comfortable with shell scripting, and the obvious fix seemed to be sort, diff and the like.
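The sort/diff route works because, once both lists are sorted, a single merge-style pass finds every line that appears only in the second list; for two sorted files this is what `comm -13 list1 list2` reports. A minimal Python sketch of that idea, for comparison with the set-based script; `only_in_second` is a hypothetical name, and unlike `comm` it collapses duplicate lines.

```python
def only_in_second(first, second):
    """Return sorted lines present in `second` but not in `first`.

    Sort both inputs, then walk them in one merge pass, the same idea
    as sorting two files and comparing them. Duplicate lines are collapsed.
    """
    a = sorted(set(first))
    b = sorted(set(second))
    i = j = 0
    result = []
    while j < len(b):
        if i < len(a) and a[i] < b[j]:
            i += 1               # a[i] exists only in `first`: skip it
        elif i < len(a) and a[i] == b[j]:
            i += 1               # line is in both: skip it in both lists
            j += 1
        else:
            result.append(b[j])  # b[j] has no match in `first`
            j += 1
    return result


print(only_in_second(["b", "a"], ["c", "a", "d", "b"]))  # ['c', 'd']
```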

But then I thought, let me try Python. Wow, I could not have imagined shorter code!

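#!/usr/bin/env python3
# Usage: delta.py list1 list2
#   list1: files already downloaded
#   list2: files present on the site
# Writes the files in list2 that are missing from list1 to new_dwnl.txt.

import sys


def read_list(path):
    """Read one entry per line into a set, ignoring blank lines.

    strip() removes the trailing newline, so the last line of a file
    still matches even if the file does not end with a newline.
    """
    with open(path) as f:
        return set(line.strip() for line in f if line.strip())


# Open list1 and list2 and read them into sets
set1 = read_list(sys.argv[1])
set2 = read_list(sys.argv[2])

# Find delta: on the site (set2) but not yet downloaded (set1)
delta = set2 - set1

# Dump delta, one entry per line, sorted for stable output
with open('new_dwnl.txt', 'w') as f:
    for name in sorted(delta):
        f.write(name + '\n')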
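Set difference is not symmetric, so the order of the operands decides what ends up in new_dwnl.txt. A small illustration with made-up file names (hypothetical data): `site - downloaded` gives what is still to fetch, `downloaded - site` gives local files the site no longer lists, and `^` (symmetric difference) gives both at once.

```python
downloaded = {"index.html", "rome.html", "old_page.html"}
site = {"index.html", "rome.html", "greece.html", "egypt.html"}

to_fetch = site - downloaded   # on the site, not yet downloaded
stale = downloaded - site      # downloaded, but no longer on the site
either = site ^ downloaded     # in exactly one of the two sets

print(sorted(to_fetch))  # ['egypt.html', 'greece.html']
print(sorted(stale))     # ['old_page.html']
print(sorted(either))    # ['egypt.html', 'greece.html', 'old_page.html']
```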