Comparing two text files

Question

Nyaato 0 Newbie Poster

15 Years Ago

I'm a little stuck on this piece of code I'm working on. I'm supposed to check the contents of one text file (original) and compare it with another text file (filter) to see if any words match.

What I have now only half works: the code only detects the last word in the original as matching the word in the filter. What puzzles me is that there are several matching words before the last one, but the code doesn't detect them as matching the word in the filter.

Here's what I have now...

def open_file():
        f = open("c:/temp/test.txt","r")
        g = open("c:/temp/filter.txt","r")
        line = f.readlines()
        line2 = g.readlines()
        array_size = 0
        for loop in line:
                if line[array_size] == line2[0]:
                        print 'OFFENSIVE'
                        print line[array_size]

                if line[array_size] != line2[0]:
                        print 'NOT OFFENSIVE'
                        print line[array_size]
                array_size+=1
        g.close()
        f.close()

open_file()

If it helps, here's the original text:

filter
lol
filter
lol
lol
filter
lol

The word that is supposed to be filtered is "filter".

Any help would be greatly appreciated.


shadwickman 159 Posting Pro in Training

15 Years Ago

You can try the built-in filter function. Here's what I tried in the interpreter:

>>> a = [
	'filter',
	'lol',
	'filter',
	'lol',
	'lol',
	'filter',
	'lol'
        ]
>>> b = ['filter']
>>> c = filter(lambda x: if x in b, a)
>>> c
['filter', 'filter', 'filter']

As you can see, filter calls the function passed to it on each item of the list (in this case, "a") and returns a list of the items for which the function returned True.
In this case, the lambda function returns True if the current item (x) is in list "b". I hope that simplifies your code a lot :P
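Here is a minimal, self-contained version of that interpreter session, with the "if" left out of the lambda (the corrected form given later in the thread). It is written to run on Python 3, where filter returns an iterator, so the result is wrapped in list(); on Python 2.6 the extra list() call is harmless. The list comprehension at the end is an equivalent alternative, not something from the original post.

```python
# Sample data from the thread: the words read from the original file
a = ['filter', 'lol', 'filter', 'lol', 'lol', 'filter', 'lol']
# Words to look for
b = ['filter']

# filter() calls the lambda on each item of a and keeps the items
# for which it returns True; the lambda body is just "x in b"
c = list(filter(lambda x: x in b, a))
print(c)  # ['filter', 'filter', 'filter']

# A list comprehension with a condition gives the same result
same = [x for x in a if x in b]
print(same == c)  # True
```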

Dive Into Python has sections on filter and lambda that are worth reading.

Nyaato commented: Thanks for helping! +1

shadwickman 159 Posting Pro in Training

15 Years Ago

Oh damn! I made a mistake there. That line should read:

c = filter(lambda x: x in b, a)

That's embarrassing haha... I typed that in wrong. The expression x in b just returns a boolean saying whether the value "x" is an item in list "b"; basically, "is x in b?". The "if" shouldn't be there, because it starts a conditional statement, and the body of a lambda has to be a single expression. Sorry about that!

Anyway, lambda functions are just a way of declaring simple, single-expression functions on the fly without giving them a name.
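A short sketch (Python 3) of the two points above: "x in b" is an ordinary True/False expression, and a lambda is interchangeable with a named function that returns that expression. The name in_b is hypothetical, chosen for illustration. The last part uses the built-in compile() to show that putting "if" in the lambda body is rejected as a syntax error.

```python
words = ['filter', 'lol']
b = ['filter']

# "x in b" is an ordinary expression that evaluates to True or False
print('filter' in b)  # True
print('lol' in b)     # False

# Hypothetical named function equivalent to: lambda x: x in b
def in_b(x):
    return x in b

# filter() accepts either form and gives the same result
with_lambda = list(filter(lambda x: x in b, words))
with_def = list(filter(in_b, words))
print(with_lambda, with_def)  # ['filter'] ['filter']

# "if" in a lambda body is rejected at compile time, because it would
# start a statement, and a lambda body must be an expression
try:
    compile("lambda x: if x in b", "<example>", "eval")
    rejected = False
except SyntaxError:
    rejected = True
print(rejected)  # True
```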


Nyaato 0 Newbie Poster · Answer 1 · 2009-06-29T13:58:38+00:00

Thanks for the help!

However, I'm really confused by filter and lambda. This is the first time I've ever touched Python, so I'm quite new to all of this.

I tried entering c = filter(lambda x: if x in b, a) into the interpreter, but it reports invalid syntax at "if". Why is that?

Nyaato 0 Newbie Poster · Answer 2 · 2009-06-29T14:52:57+00:00

Ah, I've gotten it to work in the interpreter. However, I'm still rather confused about how to use it when I actually write it into my code.

I've tried several times (while reading the filter and lambda articles), and every attempt returns a single 'filter' result. From there, I have absolutely no idea how to proceed...

shadwickman 159 Posting Pro in Training · Answer 3 · 2009-06-29T14:56:04+00:00

What do you mean, a single "filter" result? filter returns a list... oh wait. Are you using Python 2.x or Python 3.0? If you're on 3.0, filter actually returns an iterator instead of a list. Anyway, what did you mean by "single result"?
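To illustrate the 2.x/3.x difference: on Python 3 the object returned by filter is a lazy iterator, so printing it shows something like <filter object at 0x...> rather than the matches. Wrapping it in list() works on both versions. This is a minimal sketch using the thread's sample data; the comment about the second list() call applies to Python 3 only.

```python
a = ['filter', 'lol', 'filter', 'lol', 'lol', 'filter', 'lol']
b = ['filter']

result = filter(lambda x: x in b, a)
# Python 3: prints something like <filter object at 0x...>
# Python 2: prints the list of matches directly
print(result)

# list() consumes the iterator and gives the matches on both versions
matches = list(result)
print(matches)  # ['filter', 'filter', 'filter']

# On Python 3 the iterator is now exhausted, so this prints []
print(list(result))
```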

Nyaato 0 Newbie Poster · Answer 4 · 2009-06-29T15:25:38+00:00

I'm using 2.6.2 at the moment.

I've attached a picture to go along with this. I'm not very good at explaining it, because I'm pretty confused myself...

Anyway, I've added some print statements to my code so I can debug it better. Here's the code:

def open_file():
        f = open("c:/temp/test.txt","r")
        g = open("c:/temp/filter.txt","r")
        line = f.readlines()
        line2 = g.readlines()
        # Added this to check what's in memory after
        # reading the files.
        print line
        print '-----------'
        print line2
        # End add
        array_size = 0
        for loop in line:
                # Added one print here for testing:
                print 'Checking: ', loop, ' for ', line2[0]
                # End add
                print cmp(line[array_size],line2[0])
                array_size+=1
        g.close()
        f.close() 

open_file()

And I think I've found the problem. The output window shows this for the list I'm supposed to search through:

['filter \n', 'lol \n', 'filter \n', 'lol \n', 'lol \n', 'filter \n', 'filter']

However, filter.txt only contains:

['filter ']

So I think it's the \n that is affecting the comparisons. Is there a way to remove the \n from the items in the list?
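One way to confirm that hidden whitespace is what breaks an exact == comparison is to print each item with repr(), which makes characters like '\n' visible, and compare before and after stripping. The values below are copied from the debug output above; the sketch is written for Python 3.

```python
# Lists copied from the debug output in this post
line = ['filter \n', 'lol \n', 'filter \n', 'lol \n', 'lol \n', 'filter \n', 'filter']
line2 = ['filter ']

target = line2[0]
for item in line:
    # repr() shows hidden characters; == compares strings exactly
    print(repr(item), item == target, item.strip() == target.strip())

# Count matches with and without stripping whitespace
raw_matches = sum(1 for item in line if item == target)
stripped_matches = sum(1 for item in line if item.strip() == target.strip())
print(raw_matches, stripped_matches)  # 0 4
```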

shadwickman 159 Posting Pro in Training · Answer 5 · 2009-06-29T15:34:35+00:00

Oh! That's a simple problem. Use the str object's strip() method. It removes all leading and trailing whitespace, like this:

>>> a = " \t Hello World!\n "
>>> b = a.strip()
>>> b
'Hello World!'

As you can see, all spaces, tabs, newlines, etc. get removed. When you compare the items in the list, strip each one first. Alternatively, you can strip each line with a list comprehension when you store the readlines() lists, like so:

line_list = [x.strip() for x in open("filename", "r").readlines()]

That way the lists get stored without any leading or trailing whitespace in their items.
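Putting the pieces of this thread together, here is a minimal sketch (Python 3; the demo uses tempfile.TemporaryDirectory, which Python 2.6 lacks) that strips every line and prints OFFENSIVE or NOT OFFENSIVE like the original code. The function name label_lines is hypothetical, and storing the filter words in a set is a design choice of this sketch rather than something from the thread; a plain list works too for a short filter file. Opening the files with "with" closes them automatically. Temporary files stand in for c:/temp/test.txt and c:/temp/filter.txt.

```python
import os
import tempfile


def label_lines(original_path, filter_path):
    """Label each stripped line of original_path as OFFENSIVE or NOT OFFENSIVE."""
    # Read the filter words into a set, skipping blank lines
    with open(filter_path, "r") as g:
        banned = set(line.strip() for line in g if line.strip())
    # Strip leading/trailing whitespace (including the newline) from each line
    with open(original_path, "r") as f:
        words = [line.strip() for line in f]
    return [(w, "OFFENSIVE" if w in banned else "NOT OFFENSIVE") for w in words]


# Demo: temporary files stand in for the paths used in the thread
with tempfile.TemporaryDirectory() as tmp_dir:
    test_path = os.path.join(tmp_dir, "test.txt")
    filter_path = os.path.join(tmp_dir, "filter.txt")
    with open(test_path, "w") as out:
        out.write("filter \nlol \nfilter \nlol \nlol \nfilter \nlol\n")
    with open(filter_path, "w") as out:
        out.write("filter \n")
    results = label_lines(test_path, filter_path)

for word, label in results:
    print(label, word)
```

With this sample data, the three "filter" lines are labelled OFFENSIVE and the four "lol" lines NOT OFFENSIVE.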

Nyaato 0 Newbie Poster · Answer 6 · 2009-06-29T15:45:02+00:00

Awesome! Thanks a lot! Finally fixed the problem!

I'm kind of new to Python, so thanks a lot for bearing with my questions!