Hi everyone,

I'm a Python beginner so please bare with me. I have done the first few Project Euler problems, but that's it.

I'm trying to write a script that will recursively search all .txt, .htm and .html files in multiple directories for a string, and then write the whole line that the string is on to a new text file.

So far I've got to here, and now I'm stuck:

import os

for root, dirs, files in os.walk('C:\New Folder'):
    for file in [f for f in files if f.endswith(".html" or ".htm" or ".txt")]:
        fh = open(file)
        for line in fh:
            print(line)
        fh.close()

Am I on the right track? Can anyone quickly provide a sample?

Also, the folders are on a network drive. Can Python read this?

Thanks

Roboguy

All 6 Replies

so please bare with me

You must be a nudist. Please test your code before you post it here.

for file in [f for f in files if f.endswith(".html" or ".htm" or ".txt")]:

So first, get the file names one at a time and print them on the screen so you know you are testing the correct names. Next, you should split off the last 4 characters of each name, and can test for
split_off in ["html", ".htm", ".txt"]
You can try list comprehensions some other time. For now, just write code that you understand.
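
A minimal sketch of that step-by-step approach. Note that it swaps the fixed 4-character slice for os.path.splitext, so ".html" does not need special handling. The helper name has_wanted_extension and the sample file names are made up for illustration.

```python
import os

WANTED = [".html", ".htm", ".txt"]

def has_wanted_extension(filename):
    # os.path.splitext("page.HTML") returns ("page", ".HTML");
    # lower() makes the comparison case-insensitive
    split_off = os.path.splitext(filename)[1].lower()
    return split_off in WANTED

# Print each name with the result so you can see what is being tested
for name in ["index.html", "notes.TXT", "photo.jpg", "README"]:
    print(name, has_wanted_extension(name))
```

A name with no extension, such as README, gives an empty string from splitext, so it is simply not in the list.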

for file in [f for f in files if f.endswith(".html" or ".htm" or ".txt")]:

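is equal to

for file in [f for f in files if f.endswith(".html")]:

because "or" returns its first true operand, and a non-empty string such as ".html" is always true. The ".htm" and ".txt" suffixes are never tested.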
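
You can check what the expression inside endswith() evaluates to in the interpreter. The short sketch below also shows one possible fix: str.endswith accepts a tuple of suffixes. The file names are invented for the example.

```python
# "or" returns its first truthy operand, so the whole expression is just ".html"
suffix = ".html" or ".htm" or ".txt"
print(suffix)

# str.endswith accepts a tuple, which tests every suffix in it
extensions = (".html", ".htm", ".txt")
names = ["a.html", "b.htm", "c.txt", "d.py"]
matches = [n for n in names if n.endswith(extensions)]
print(matches)
```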

Don't use "file" as a variable name (identifier), since it shadows the built-in file type in Python 2.

Escape "C:\New Folder" as "C:\\New Folder" or add an "r" before the first quotation mark.
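
A quick way to convince yourself that the two spellings give the same string, and to see what goes wrong when a backslash happens to start a real escape sequence. The "C:\temp" path is a made-up example.

```python
escaped = "C:\\New Folder"   # backslash escaped by doubling it
raw = r"C:\New Folder"       # raw string: backslashes are kept as typed
print(escaped == raw)

# Without escaping, "\t" is read as a tab character, not backslash + t
tabbed = "C:\temp"
print(repr(tabbed))
```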

The space between 'New' and 'Folder' is not a problem inside a Python string.
Backslashing the space, as in "New\ Folder", is only needed in a shell; in Python it would put a literal backslash into the path.

Just to make it explicit, the correct form for this:

for file in [f for f in files if f.endswith(".html" or ".htm" or ".txt")]:

could be for example:

import os
filenames = os.listdir(os.curdir)
for filename in (f for f in filenames
                 if any(f.endswith(extension)
                        for extension in ('.html', '.htm', '.txt'))):
    print(filename)

Another way that I think is OK to read.

import os

filenames = os.listdir(os.curdir)
for filename in filenames:
    for ext in ['txt', 'htm', 'html']:
        if filename.lower().endswith(ext):
            print(filename)

Yes, snippsat, only you are missing a break in the if statement to get the short-circuit behaviour that any() gives (if a filename ends with txt, it cannot also end with htm). It may also be better to keep the . in the extensions, since we do not want to match a file named 'nextxt', for example (on Unix-like systems file names without an extension are common, but not so much in the Windows world).
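
Putting the points from this thread together, here is one possible sketch of the script the original post asks for. It is not the only way to do it. The function name find_lines, its parameters and the UTF-8 encoding are assumptions for illustration. Note that os.walk yields bare file names, so each one has to be joined with root before it can be opened; calling open(file) on the bare name only works for files in the current directory.

```python
import os

EXTENSIONS = (".html", ".htm", ".txt")

def find_lines(top, needle, out_path):
    """Copy every line containing `needle` from matching files under `top`
    into `out_path`. Returns the number of lines written."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for root, dirs, files in os.walk(top):
            for filename in files:
                # endswith accepts a tuple; lower() also catches ".TXT" etc.
                if not filename.lower().endswith(EXTENSIONS):
                    continue
                path = os.path.join(root, filename)
                if os.path.abspath(path) == os.path.abspath(out_path):
                    continue  # do not search the output file itself
                # errors="replace" keeps going on bytes that are not valid UTF-8
                with open(path, encoding="utf-8", errors="replace") as fh:
                    for line in fh:
                        if needle in line:
                            # normalise the line ending so a final line
                            # without a newline does not merge with the next
                            out.write(line.rstrip("\n") + "\n")
                            count += 1
    return count
```

A call might look like find_lines(r"C:\New Folder", "some text", "matches.txt"), where the search string and output name are placeholders. As for the network drive: os.walk works on any path the operating system can open, so a mapped drive letter or a UNC path should work as long as your account has access to it.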
