I am studying the book "Think Python: How to Think Like a Computer Scientist" by Allen B. Downey, Version 1.1.22, chapter 13, section 13.6, Dictionary subtraction. It contains Exercise 6, which I solved without using set types. Will the result be the same?
Could you please show me how to use set types in this script? Also, my version of the script is probably too long. Is it possible to make it simpler and easier to understand? I wrote it as well as I can right now, since I am a beginner in this field.

Here is the script:

import string
def subtract(filename1,filename2):
    d1=file1(filename1)
    d2=file2(filename2)
    res = dict()
    for key in d1:
        if key not in d2:
            res[key] = None
    print res


def file1(filename):
    d1 = dict()
    fp = open(filename)
    for line in fp:
        process_line1(line, d1)
    return d1

def file2(filename):
    d2 = dict()
    fp = open(filename)
    for line in fp:
        process_line1(line, d2)
    return d2

def process_line1(line, d1):
    line = line.replace('-', ' ')
    for word in line.split():
        word = word.strip(string.punctuation + string.whitespace)
        word = word.lower()
        d1[word] = d1.get(word, 0) +1


d1 = file1('emma.txt')
d2 = file2('words.txt')

subtract('emma.txt','words.txt')

file emma.txt contains these words:
Hossa-už-dlhšie patrí medzi najlepších hrácov v lige, ale to už všetci vedia, preto si nan dávajú velký pozor. Otvára sa tak šanca pre dalších, napríklad aj pre mna. Hossa vie vynikajúco pracovat s pukom, má výborný prehlad v hre," cituje ESPN krídelníka Troya Brouwera, ktorý po dvoch prihrávkach od staršieho z bratov Hossovcov dvakrát rozvlnil siet za gólmanom hostí Leightonom. Hossa Hossa Hossa Hossa Hossa Hossa Hossa Hossa Hossa Hossa Hossa Hossa Hossa Hossa Hossa Hossa Hossa Hossa Hossa

file words.txt contains these words:
Finding the words from the book that are not in the word list from words.txt is a problem you might recognize as set subtraction; that is, we want to find all the words from one set (the words in the book) that are not in another set (the words in the list).

Thank you for your help and for pushing me further.

Vlady

Try the following code:

import string

# this function returns a set of distinct words from a file
def readwords(filename):
	res = set()
	f = open(filename, 'r')
	# iterate over lines
	for line in f:
		line = line.replace('-', ' ')
		words = line.split()
		# iterate over words of each line
		for word in words:
			word = word.strip(string.punctuation + string.whitespace)
			word = word.lower()
			# add all words that we encounter
			# the set will ignore the duplicates automatically
			res.add(word)
	# close the file
	f.close()
	return res

# read distinct words from each file
emma  = readwords('emma.txt')
words = readwords('words.txt')
# words that are in emma, but not in words
diff = emma.difference(words)
print diff
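
As for whether the result will be the same: yes, as far as I can tell. A dictionary-based subtraction and a set difference should give you the same words, because the keys of a dictionary are distinct just like the elements of a set. Here is a small sketch you can run to check this. The sample lines and the name count_words are made up for illustration so that the example does not need any files, and print is written with parentheses so it runs on both Python 2 and Python 3:

```python
import string


def count_words(lines):
    """Build a word -> count dictionary, cleaning words as in your script."""
    counts = dict()
    for line in lines:
        line = line.replace('-', ' ')
        for word in line.split():
            word = word.strip(string.punctuation + string.whitespace).lower()
            counts[word] = counts.get(word, 0) + 1
    return counts


def subtract(d1, d2):
    """Dictionary subtraction: keep the keys of d1 that are not in d2."""
    res = dict()
    for key in d1:
        if key not in d2:
            res[key] = None
    return res


# Hypothetical in-memory lines standing in for emma.txt and words.txt
book_lines = ['Hossa vie pracovat s pukom,', 'Hossa Hossa']
list_lines = ['hossa', 'pukom']

d1 = count_words(book_lines)
d2 = count_words(list_lines)

dict_result = subtract(d1, d2)
set_result = set(d1) - set(d2)   # set(d) contains the keys of d

print(sorted(dict_result))              # words found by dictionary subtraction
print(sorted(set_result))               # words found by set difference
print(set(dict_result) == set_result)   # True when both approaches agree
```

Note that a single function handles both inputs here, so you do not need separate file1 and file2 functions. The - operator between two sets does the same thing as the difference method.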

Hope this helps.

Edited 6 Years Ago by sergb: n/a
