I'm writing a script that will search file names looking for certain file extensions. The thing is that I am looking for multiple extensions, and the list may change.

I could write an if statement using "and", but the line just gets a bit long and can become difficult to manage.

I was hoping that I might be able to create a list or use a regex, so that there is one place where I can list all of the extensions, and then use that list (or regex) to search the file names.

I'm still a Python newbie and I'm not sure how I could implement something like this.

Any help would be appreciated.

Thanks.

Jason


If I understood you correctly:

import os

extensions = ['.txt', '.jpg', '.zip']

# filename is the name being checked; do_something() is your own handler
f_name, f_extension = os.path.splitext(filename)

for extension in extensions:
    if f_extension == extension:
        do_something()
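Since the loop above only asks whether the extension is one of several values, a membership test does the same job in one line. A minimal sketch, assuming the wanted extensions are kept lowercase in a set; `wanted_extensions` and `has_wanted_extension` are hypothetical names:

```python
import os

# Hypothetical set of wanted extensions; edit this one place when the list changes.
wanted_extensions = {'.txt', '.jpg', '.zip'}

def has_wanted_extension(filename):
    """Return True if the last extension of filename is in wanted_extensions."""
    _, ext = os.path.splitext(filename)
    # Lowercase so that 'PHOTO.JPG' matches '.jpg'.
    return ext.lower() in wanted_extensions

for name in ['notes.txt', 'PHOTO.JPG', 'script.py']:
    if has_wanted_extension(name):
        print('Would process', name)
```

A set lookup does not depend on how many extensions are listed, so the list can grow without slowing the check.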

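Another way, with regular expressions:

import re
extensions = ['txt', 'jpg', 'zip']
# raw strings keep the backslash in '\.' as a regex escape (a literal dot)
pat = r'\.(' + '|'.join(extensions) + r')$'
extRE = re.compile(pat)
for f in getFilenames():  # getFilenames(): whatever yields your file names
    if extRE.search(f):
        do_something(f)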

Note the absence of the '.' in the extensions: I wanted to write it only once, but you could duplicate it if that makes the UI simpler. I don't know whether a full search is faster than a splitext followed by a short search. If it would be useful, you could do this instead:

if extRE.search(os.path.splitext(f)[1]):

You can also parenthesize the individual extensions and check the groups() of the match object, if it would be nice to do a sub-dispatch on each extension.
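The parenthesized group in the pattern already tells you which extension matched, so it can drive a per-extension dispatch. A minimal sketch, assuming one handler per extension; the handler functions, `handlers` and `dispatch` are hypothetical names, and re.IGNORECASE is added here so that upper-case extensions also match:

```python
import re

extensions = ['txt', 'jpg', 'zip']
# Group 1 captures the extension without the dot; $ anchors it at the end of the name.
ext_re = re.compile(r'\.(' + '|'.join(map(re.escape, extensions)) + r')$',
                    re.IGNORECASE)

def handle_text(name):
    return 'text: ' + name

def handle_image(name):
    return 'image: ' + name

def handle_archive(name):
    return 'archive: ' + name

# Hypothetical handler table keyed by the lowercase extension.
handlers = {'txt': handle_text, 'jpg': handle_image, 'zip': handle_archive}

def dispatch(name):
    """Call the handler for name's extension, or return None if nothing matches."""
    match = ext_re.search(name)
    if match is None:
        return None
    return handlers[match.group(1).lower()](name)

print(dispatch('holiday.JPG'))  # image: holiday.JPG
print(dispatch('setup.exe'))    # None
```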

Here is an alternative with "endswith" that works fine for this.

aFile = 'test.jpg'
for ext in ['.txt', '.jpg', '.zip']:
    if aFile.lower().endswith(ext):  # lower() so that .JPG also matches
        print('Do something')


Just one small thing: a file name can hardly end with both .txt and .jpg, so I would fix the logic a little:

aFile = 'test.jpg'
for ext in ['.txt', '.jpg', '.zip']:
    if aFile.lower().endswith(ext):
        print('Do something')
        break  # done; use return instead if this is deep inside a function

If there are many extensions, I would consider using rpartition (or an os.path function, see Beat_Slayer's post) to separate the extension and then do a dict lookup on it to get the processing function for that file type. That replaces the linear search with a direct lookup. The .lower() call is quite essential, as files are often named .jpg or .JPG; that is a good catch.
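A minimal sketch of the rpartition idea, assuming a dict that maps lowercase extensions to handler functions (`handlers`, `describe_text`, `describe_image`, `get_extension` and `process` are hypothetical names). `str.rpartition('.')` splits at the last dot; when there is no dot the separator comes back empty, so such names are treated as having no extension:

```python
def describe_text(name):
    return 'text file ' + name

def describe_image(name):
    return 'image file ' + name

# Hypothetical handler table: one dict lookup instead of a linear search.
handlers = {'txt': describe_text, 'jpg': describe_image, 'jpeg': describe_image}

def get_extension(name):
    """Return the lowercase text after the last dot, or '' if there is no dot."""
    _, sep, ext = name.rpartition('.')
    return ext.lower() if sep else ''

def process(name):
    handler = handlers.get(get_extension(name))
    if handler is None:
        return 'no handler for ' + name
    return handler(name)

print(process('Photo.JPG'))  # image file Photo.JPG
print(process('README'))     # no handler for README
```

This assumes bare file names rather than full paths, and unlike os.path.splitext it treats a leading-dot name such as '.bashrc' as having the extension 'bashrc'.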

There is, however, the case of .tar.gz and similar files in Linux/*nix environments, where it can be better to split the extension at the first dot instead of the last one, which is what os.path.splitext(filename) does. A .tar.gz file is, however, a gz file, not a tar file.
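To make the difference concrete: os.path.splitext splits at the last dot, so 'backup.tar.gz' gives '.gz'. Splitting blindly at the first dot would break names such as 'v1.2.notes.txt', so this minimal sketch swaps in a different approach: it keeps an explicit tuple of known multi-part suffixes and falls back to splitext otherwise (`COMPOUND_SUFFIXES` and `split_extension` are hypothetical names):

```python
import os

# Hypothetical list of multi-part suffixes that should stay together.
COMPOUND_SUFFIXES = ('.tar.gz', '.tar.bz2', '.tar.xz')

def split_extension(filename):
    """Like os.path.splitext, but keeps known compound suffixes such as '.tar.gz'."""
    lower = filename.lower()
    for suffix in COMPOUND_SUFFIXES:
        if lower.endswith(suffix):
            return filename[:-len(suffix)], filename[-len(suffix):]
    return os.path.splitext(filename)

print(os.path.splitext('backup.tar.gz'))  # ('backup.tar', '.gz')
print(split_extension('backup.tar.gz'))   # ('backup', '.tar.gz')
print(split_extension('v1.2.notes.txt'))  # ('v1.2.notes', '.txt')
```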


See the two previous posts; I just copied that list, and I think most people understand it's only used as an example.

About the point that a .tar.gz file is really a gz file, not a tar file:

This will check for a .gz file:

aFile = 'file.tar.gz'
for ext in ['.rar', '.gz', '.zip']:
    if aFile.lower().endswith(ext):
        print('Do something')

There are many ways to do this; I think this one is quite readable.
"Readability counts", as the Zen of Python says.
http://www.python.org/dev/peps/pep-0020/

It just looks more logically correct to me that the program stops checking the other alternatives when only one is possible and it has been found. Your code keeps checking the .zip ending even after it finds .gz, for example. No offense meant.

My comment about .tar.gz was about consistency between Linux- and Windows-style environments, since that file has its "own file type", tgz, even though it is just one file type inside another.

I love using .endswith myself; it makes for nice code. Also, if the number of file types is large, putting the file type list in a separate module, importing it, and formatting the list logically over several lines will improve the final code.
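`str.endswith` also accepts a tuple of suffixes and returns True if the string ends with any of them, which keeps the list in one place without an explicit loop or break. A minimal sketch, assuming the suffixes are kept in a constant (`WANTED_SUFFIXES` and `is_wanted` are hypothetical names; in a larger program the constant could live in its own module and be imported):

```python
# Hypothetical constant; one suffix per line makes the list easy to maintain.
WANTED_SUFFIXES = (
    '.txt',
    '.jpg',
    '.zip',
    '.tar.gz',
)

def is_wanted(filename):
    # endswith() with a tuple is True if the name ends with any suffix in it.
    return filename.lower().endswith(WANTED_SUFFIXES)

for name in ['notes.TXT', 'backup.tar.gz', 'setup.exe']:
    print(name, is_wanted(name))
```

Because it matches on the end of the string, this also handles compound suffixes like '.tar.gz' directly.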

Here is a start for a dict-based solution that maps extensions to functions:

import os

def textfunc(filename):
    print('Text processing %s' % filename)

def rtffunc(filename):
    print('RTF processing %s' % filename)

def pyfunc(filename):
    print('PY processing %s' % filename)

def jpgfunc(filename):
    print('JPG processing %s' % filename)

def gzfunc(filename):
    print('gz processing %s' % filename)

def zipfunc(filename):
    print('zip processing %s' % filename)

filefuncs = {'.txt': textfunc, '.rtf': rtffunc, '.py': pyfunc,  # text files
             '.jpg': jpgfunc,  # pictures
             '.gz': gzfunc, '.zip': zipfunc,  # compressed
             # a trailing comma makes adding entries easier
             }

for this_file in os.listdir(os.curdir):
    _, ext = os.path.splitext(this_file)
    ext = ext.lower()  # so that .JPG is handled like .jpg
    if ext in filefuncs:
        filefuncs[ext](this_file)  # call the handler for this file type
    else:
        print('Handler not written for %s filetype' % ext)

input('Ready')

Thank you all for your input.

Here is what I ended up doing, and it seems to be working.

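import re

# \. is a literal dot, $ anchors at the end; the flag replaces a mid-pattern (?i)
doNotSearch = re.compile(r"\.(exe|gif|jpeg|jpg|png|dll|jar|wpc|sys|ocx|cnv|cpl"
                         r"|sdb|ime|hlp|mp3|wav|mpeg|chm|msi|msp|mst|olb)$",
                         re.IGNORECASE)

if not doNotSearch.search(j):  # j is the file name being tested
    pass  # search this file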

Here is a simple script that prints the files in the current directory whose types are not in the disallowed list (I added pyc):

from __future__ import print_function  # lets the script run on Python 2 as well
import os
ignore_filetypes = set("exe|gif|jpeg|jpg|png|dll|jar|wpc|sys|ocx|cnv|cpl\
|sdb|ime|hlp|mp3|wav|mpeg|chm|msi|msp|mst|olb|pyc".split('|'))

for i in (f for f in os.listdir(os.curdir) if os.path.isfile(f)):  # regular files only, no directories
    _, ext = os.path.splitext(i)
    if ext[1:].lower() in ignore_filetypes:  # [1:] drops the '.', lower() catches .JPG etc.
        print('Not', i)
        continue
    print('-' * 30, " %18s " % i, '-' * 30)
    with open(i) as fh:  # the file is closed after reading
        print(fh.read())  # print the (presumably text) file