I want to compare a list of files with extension .rtf in my directory against a list of files at a given URL, and download those files at the URL that are not found in my directory.

This is where I am at, but I cannot figure out how to filter a list based on file extension.

import urllib

file_list = urllib.urlretrieve("http://www.tvn.com.au/tvnlive/v1/system/modules/org.tvn.website/resources/sectionaltimes/")
dir_list = urllib.urlretrieve("c:/MyPy/")
# create a list of files from url
url_files = []
url_files = file_list
# create a list of files from directory
dir_files = []
dir_files = dir_list
# compare lists -  for *.rtf files url_files not in dir_files download
for url_files in dir_files.iteritems():
	del url_files
# can't figure out how filter a list by file extension.
# download those rtf files not in dir_files to c:/MyPy

Example:

filetype = 'rtf'
files = ['doc.txt', 'stuff.rtf', 'artfunction.exe', 'rtfunc.bat', 'doc2.rtf']

print('\n'.join(filename for filename in files if filename.endswith('.'+filetype)))

BTW, iteritems is deprecated (it was removed entirely in Python 3); use the items method instead.
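For the comparison step itself, you don't need to loop and delete at all — a set difference gives the files to download directly. A minimal sketch, assuming both sides are plain lists of filenames (the names here are made up for illustration):

```python
# Filenames found at the URL and in the local directory (sample data).
url_files = ['race1.rtf', 'race2.rtf', 'race3.rtf']
dir_files = ['race1.rtf', 'notes.txt']

# Files present at the URL but missing locally - these are the ones to download.
missing = sorted(set(url_files) - set(dir_files))
print(missing)
```

Sets discard ordering and duplicates, which is fine here since filenames in one directory are unique anyway.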


Another method

import fnmatch
pattern = '*.rtf'
files = ['doc.txt', 'stuff.rtf', 'artfunction.exe', 'rtfunc.bat', 'doc2.rtf']
print('\n'.join(filename for filename in fnmatch.filter(files, pattern)))
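fnmatch pairs naturally with os.listdir for the local side of the comparison — a small sketch (the helper name is my own):

```python
import fnmatch
import os

def local_rtf_files(path):
    """Return the .rtf filenames found directly in a directory."""
    return fnmatch.filter(os.listdir(path), '*.rtf')
```

Note that fnmatch matches case-insensitively on Windows, so STUFF.RTF would also be picked up there.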

I have got it to this:

import urllib
import fnmatch
import os

"""Module to download only files not in directory from a given url"""
file_list = urllib.urlretrieve("http://www.tvn.com.au/tvnlive/v1/system/modules/org.tvn.website/resources/sectionaltimes/")
path="C:\\MyPy"
dir_list=os.listdir(path)
pattern = '*.rtf'
# create a list of files from url
url_files = []
url_files = ('\n'.join(filename for filename in fnmatch.filter(file_list, pattern)))
# create a list of files from directory
dir_files = []
dir_files = ('\n'.join(filename for filename in fnmatch.filter(dir_list, pattern)))
# compare lists -  for *.rtf files url_files not in dir_files download
for url_files in dir_files.items():
    del url_files
print(url_files)
# can't figure out how filter a list by file extension.
# download those rtf files not in dir_files to c:/MyPy

But I am receiving an HTTP error?

>>> python -u "retrieve.py"
Traceback (most recent call last):
  File "retrieve.py", line 12, in <module>
    url_files = ('\n'.join(filename for filename in fnmatch.filter(file_list, pattern)))
  File "C:\Python27\lib\fnmatch.py", line 63, in filter
    if match(os.path.normcase(name)):
  File "C:\Python27\lib\ntpath.py", line 46, in normcase
    return s.replace("/", "\\").lower()
AttributeError: HTTPMessage instance has no attribute 'replace'
>>> Exit Code: 1

You don't have a proper file list:

>>> file_list = urllib.urlretrieve("http://www.tvn.com.au/tvnlive/v1/system/modules/org.tvn.website/resources/sectionaltimes/")
>>> print(file_list)
('c:\\docume~1\\veijal~1.yks\\locals~1\\temp\\tmpq6ol5c', <httplib.HTTPMessage instance at 0x00ED12B0>)
>>> help(urllib.urlretrieve)
Help on function urlretrieve in module urllib:

urlretrieve(url, filename=None, reporthook=None, data=None)

>>> print(list(file_list[1]))
['content-length', 'set-cookie', 'expires', 'server', 'connection', 'date', 'content-type']
>>> print(open(file_list[0]).read())


<html>
<title>tvn.com.au</title>
<body leftmargin="0" topmargin="0" marginwidth="0" marginheight="0">
<table width="100%" height="100%" border="0" cellpadding="0" cellspacing="0">
  <tr>
    <td align="center"><img src="/tvnlive/v1/system/modules/org.tvn.website/resources/graphics/g_404_error.gif">
    <br><br>
    <font color="#919191"; size="1"; face="Verdana">The page you requested is not available. Please click <a href="/tvnlive/v1/system/modules/org.tvn.website/jsptemplates/tvn_main_menu.jsp;jsessionid=53391838F909BEE13C315632CE7E2BC1.tvnEngine2?TVNSESSION=53391838F909BEE13C315632CE7E2BC1.tvnEngine2" style="color: red; text-decoration: none;  image-decoration: none; font-weight: bolder;">here</a> to return to the homepage.</font>
    </td>
  </tr>
</table>
</body>
</html>
>>>

Print the values to check what you have.
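Putting the pieces together: urlretrieve fetches a single resource, so the .rtf names have to be scraped out of the directory page's HTML first, then compared against the local directory. Here is a sketch in modern Python 3 (urllib.request instead of Python 2's urllib), assuming the server returns an HTML index whose links end in .rtf — the regex parsing is my own assumption, the URL and path are the ones from the thread, and note the page was returning a 404 above, so the URL itself may need correcting before any of this can work:

```python
import os
import re
import urllib.request

BASE_URL = ("http://www.tvn.com.au/tvnlive/v1/system/modules/"
            "org.tvn.website/resources/sectionaltimes/")
LOCAL_DIR = r"C:\MyPy"

def rtf_links(html):
    """Extract .rtf targets from the href attributes of an HTML index page."""
    return re.findall(r'href="([^"]+\.rtf)"', html)

def download_missing(base_url, local_dir):
    """Download every .rtf file listed at base_url that is not already in local_dir."""
    html = urllib.request.urlopen(base_url).read().decode("utf-8", "replace")
    url_files = set(os.path.basename(name) for name in rtf_links(html))
    dir_files = set(f for f in os.listdir(local_dir) if f.endswith(".rtf"))
    for name in sorted(url_files - dir_files):
        urllib.request.urlretrieve(base_url + name,
                                   os.path.join(local_dir, name))

if __name__ == "__main__":
    download_missing(BASE_URL, LOCAL_DIR)
```

A proper HTML parser (html.parser or BeautifulSoup) would be more robust than the regex, but for a plain auto-generated directory index the pattern above is usually enough.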
