Newbie here! I am trying to write a simple Python program that, when called from the command line, searches all the .txt files in a given directory for a given string. It would be run by typing something like this:
> python mygrep.py ":" "C:\"

Also, I specifically want to use the fileinput module.

I have several issues with the code below. The least of them is that my .txt file filter doesn't work 100% of the time, and I have no idea why. The other is a weird traceback error I get at the command prompt. Please help.

import fileinput, sys, string, os
def main():
    searchterm = sys.argv[1]
    targetdir  = os.listdir(sys.argv[2])

    for item in targetdir:
        if not('.txt' in item):
            targetdir.remove(str(item))
            print item, "removed"
        else:
            print item

    print targetdir

    for line in fileinput.input(targetdir):
       num_matches = string.count(line, searchterm)
       if num_matches:                     # a nonzero count means there was a match
           print "found '%s' %d times in %s on line %d." % (searchterm, num_matches,
               fileinput.filename(), fileinput.filelineno())

if __name__ == '__main__':
    main()


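The issue with your txt file filter is that you are removing items from targetdir while iterating over targetdir. This does not work: each removal shifts the remaining items one position to the left, so the loop skips the item that follows every removed one. Look at this example: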

>>> L = range(20)
>>> for item in L:
...  print item,
...  if item % 3:
...   L.remove(item) # DONT DO THAT !!!!
... 
0 1 3 4 6 7 9 10 12 13 15 16 18 19 # not all items were processed
>>> print L
[0, 2, 3, 5, 6, 8, 9, 11, 12, 14, 15, 17, 18] # not the expected result

On the other hand

>>> L = range(20)
>>> for item in list(L): # <-- iterate over a copy of the initial list
...  print item,
...  if item % 3:
...   L.remove(item)
... 
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
>>> L
[0, 3, 6, 9, 12, 15, 18] # ok

It is shorter to use list comprehension syntax

>>> [ x for x in range(20) if not (x % 3) ]
[0, 3, 6, 9, 12, 15, 18]

In your case, it would be

targetdir = [ name for name in targetdir if name.endswith(".txt") ]
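
A minimal sketch of the same filter wrapped in a function, written for Python 3 (where print is a function). The helper name list_txt_files is hypothetical. It also joins each name with the directory, on the assumption that fileinput needs full paths whenever the target directory is not the current one, since os.listdir returns bare file names.

```python
import os


def list_txt_files(directory):
    """Return the paths of the .txt files directly inside `directory`."""
    # Build a new list instead of removing items from the one being iterated.
    # endswith() is stricter than "'.txt' in name", which would also match
    # a name such as "notes.txt.bak".
    return [
        os.path.join(directory, name)
        for name in sorted(os.listdir(directory))
        if name.endswith(".txt")
    ]


if __name__ == "__main__":
    print(list_txt_files("."))
```

The sorted() call is only there to make the order predictable; os.listdir itself returns names in arbitrary order.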

Otherwise, if you have a weird traceback, why don't you post the traceback?
Do you know that such a grep-like Python program already exists? See http://pypi.python.org/pypi/grin . I even wrote a small script once to send grin's output directly to the web browser; see this post: http://www.daniweb.com/software-development/python/code/298477/1295460#post1295460

OK, using your suggestion I got everything to work and the traceback error no longer happens. (I don't even know what traceback means.) I realize that scripts like this one already exist, but I'm trying to learn on my own, and since you've already taught me quite a bit, my plan is working!
How can I give the command line a directory without typing a double "\\"? I get an error if I just type "C:\", which I understand because the interpreter thinks \ is a special character.

(I don't even know what traceback means)

A traceback is what you get when something goes wrong.

>>> a
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
NameError: name 'a' is not defined
>>> # We must define a
>>> a = 5
>>> a + '6'
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'int' and 'str'
>>> # We must convert to int or str
>>> str(a) + '6'
'56'
>>> a + int('6')
11

As you can see, a traceback gives a very good clue to what went wrong.
Then you just have to fix it. As you use Python more, this will become very familiar.
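
To make the reading of a traceback concrete, here is a small sketch (Python 3) that captures one as text using the standard traceback module. The helper name describe_failure is made up for illustration. The point is that the last line of a traceback names the exception type and the reason, and the lines above it show where it happened.

```python
import traceback


def describe_failure(func):
    """Call func() and return the traceback text if it raises, else None."""
    try:
        func()
    except Exception:
        # format_exc() returns the same text the interpreter would print.
        return traceback.format_exc()
    return None


if __name__ == "__main__":
    text = describe_failure(lambda: 5 + "6")
    # The last line names the exception type and the reason.
    print(text.strip().splitlines()[-1])
```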

How can I give the command line a directory without typing a double "\\"? I get an error if I just type "C:\", which I understand because the interpreter thinks \ is a special character.

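r'C:\something'   # ok in Python source; a raw string cannot end with a single backslash
'c:/something/'   # ok, forward slashes also work on Windows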
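
A short sketch of how these spellings behave inside Python source (Python 3). It uses ntpath, the Windows flavour of os.path, so it runs the same on any platform. One caveat worth noting: a raw string literal cannot end with a single backslash, so a trailing separator is easier to add with join(). None of this applies to what is typed at the command prompt, where the r prefix and quotes are not Python syntax.

```python
import ntpath  # the Windows flavour of os.path, importable on any platform

# In Python source, a raw string keeps backslashes as they are typed.
raw = r"C:\something"
escaped = "C:\\something"
print(raw == escaped)

# A raw string cannot end with a single backslash, so a trailing
# separator is easier to add with join() than to type.
with_sep = ntpath.join(raw, "")
print(with_sep)

# Forward slashes are accepted on Windows; normpath() converts them.
print(ntpath.normpath("c:/something/notes.txt"))
```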

r'C:\' does not work on the command line:

Traceback (most recent call last):
  File "mygrep.py", line 35, in <module>
    main()
  File "mygrep.py", line 21, in main
    file  = os.listdir(sys.argv[2])
WindowsError: [Error 3] The system cannot find the path specified: 'rC:"/*.*'

Also, how could I make the directory optional, so that if sys.argv[2] isn't given the script runs in the current directory?

I didn't put it in quotes; I literally typed r'C:\', which makes things even more confusing because, as you can see in the error message, the r ends up inside the quotes.

But you did that in Python code, not in the command interpreter, which is an altogether different thing. sys.argv[2] comes from the command line, which passes \ through directly, so no r prefix or quotes are needed (double quotes can be used on the command line, but not Python's single quotes, of course).

import sys
print sys.argv[1] if sys.argv[1:] else 'No arguments given'

Run from the command prompt:
Microsoft Windows XP [versio 5.1.2600]
(C) Copyright 1985 - 2001 Microsoft Corp.

to 08.09.2011 23:10:04,62 K:\test
>argv_ex.py k:\test
 k:\test

to 08.09.2011 23:10:18,37 K:\test
>
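
On the question of making the directory optional, here is one possible sketch of the whole script in Python 3, still built on the fileinput module. The function name find_matches and the usage text are illustrative choices, not part of the original script. It assumes that falling back to os.getcwd() is what "run in the current directory" should mean, and it passes full paths to fileinput so the search also works when the target is not the current directory.

```python
import fileinput
import os
import sys


def find_matches(searchterm, directory="."):
    """Yield (filename, line_number, count) for each matching line."""
    paths = [
        os.path.join(directory, name)
        for name in sorted(os.listdir(directory))
        if name.endswith(".txt")
    ]
    if not paths:
        # An empty list would make fileinput read standard input instead.
        return
    with fileinput.input(files=paths) as stream:
        for line in stream:
            count = line.count(searchterm)
            if count:  # a nonzero count means there was a match
                yield stream.filename(), stream.filelineno(), count


def main(argv):
    if len(argv) < 2:
        print("usage: mygrep.py SEARCHTERM [DIRECTORY]")
        return
    searchterm = argv[1]
    # argv[2] is optional; fall back to the current directory.
    directory = argv[2] if len(argv) > 2 else os.getcwd()
    for filename, lineno, count in find_matches(searchterm, directory):
        print("found '%s' %d times in %s on line %d."
              % (searchterm, count, filename, lineno))


if __name__ == "__main__":
    main(sys.argv)
```

With this layout, `python mygrep.py ":"` searches the current directory and `python mygrep.py ":" k:\test` searches the given one.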