I'm working on a cleanup script for TV shows that I download. Right now I'm just looking for a file larger than 50 MB, but there should be a better way.

import os
import shutil

# Raw string so the backslashes are not treated as escape sequences.
dir = r"C:\Users\Bobe\Downloads\TV"

for folder in os.listdir(dir):
    folderpath = os.path.join(dir, folder)
    if os.path.isdir(folderpath):
        for file in os.listdir(folderpath):
            filelocation = os.path.join(folderpath, file)
            if os.path.getsize(filelocation) > 50000000:
                # Move the big file next to its folder, named after the folder.
                shutil.move(filelocation, folderpath + ".avi")
            else:
                os.remove(filelocation)

        shutil.rmtree(folderpath)

What is your problem with the code you have?

I feel that looking for a file larger than 50 MB is kind of a hack. Is there a simple way to return the filename of the largest file in a directory?

OK, check this out. The code you're using to find files is pretty good. If you're looking to improve that part of the code you could use recursion so that your function will descend into subdirectories and pull out those files too. Here is some code I whipped up that will print the number of files in all the subdirectories under a folder.

import os

# os.sep is the path separator for the current system
# ('/' on Linux, '\\' on Windows), so nothing needs uncommenting.
systemslash = os.sep

def get_list_of_files(inDirectory, container=None):
    # Create a fresh list per top-level call; a mutable default
    # argument would be shared between calls.
    if container is None:
        container = []
    for entry in os.listdir(inDirectory):
        if os.path.isdir(inDirectory+systemslash+entry):
            get_list_of_files(inDirectory+systemslash+entry,container)
        else:
            # Only files are added, so the count matches the description.
            container.append(inDirectory+systemslash+entry)
    return container

Final_List_of_Files = get_list_of_files('/Users/kevin')
print(len(Final_List_of_Files))

Now you want to get the list of files and the file size, so I would suggest putting them into a list of tuples which you can sort to get the biggest file of them all. Change the line where we add the file name to the list so that it adds a tuple containing the file name and size.

filesize = os.path.getsize(inDirectory+systemslash+entry)
fileandsize = (filesize, inDirectory+systemslash+entry)
container.append(fileandsize)
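
The reason the size goes first in the tuple is that Python compares tuples element by element, so sorting orders them by size and only falls back to the path when two sizes are equal. A minimal sketch of that behaviour, using made-up sizes and paths that are not from the original post:

```python
# Hypothetical (size_in_bytes, path) pairs, made up for illustration.
files = [
    (700, "show/sample.avi"),
    (350000, "show/episode.avi"),
    (12, "show/info.nfo"),
]

# Tuples compare element by element, so the size decides the order
# and the path only breaks ties between equal sizes.
files.sort(reverse=True)

largest_size, largest_path = files[0]
print(largest_path)  # show/episode.avi
```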

Then your last task is to sort the list of tuples with Final_List_of_Files.sort(). You'll have to reverse the sort order so that you get the largest file in the top position. Here is the final code:

import os

# os.sep is the path separator for the current system
# ('/' on Linux, '\\' on Windows), so nothing needs uncommenting.
systemslash = os.sep

def get_list_of_files(inDirectory, container=None):
    # Create a fresh list per top-level call; a mutable default
    # argument would be shared between calls.
    if container is None:
        container = []
    for entry in os.listdir(inDirectory):
        entry = inDirectory+systemslash+entry
        if os.path.isdir(entry):
            get_list_of_files(entry,container)
        else:
            # Only files go into the list, as (size, path) tuples.
            filesize = os.path.getsize(entry)
            fileandsize = (filesize, entry)
            container.append(fileandsize)
    return container

Final_List_of_Files = get_list_of_files('/Users/kevin/Documents')
Final_List_of_Files.sort(reverse=True)

print(Final_List_of_Files[0])
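
If you would rather not write the recursion yourself, the standard library's os.walk does the descent for you. What follows is only a sketch of the same idea, not a drop-in replacement: the function name largest_file is my own, and it tracks the best candidate as it goes instead of building and sorting a full list.

```python
import os

def largest_file(top):
    """Return (size, path) of the largest file under top, or None if there are no files."""
    best = None
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # Broken symlink or a file removed mid-scan: skip it.
                continue
            if best is None or size > best[0]:
                best = (size, path)
    return best

if __name__ == "__main__":
    # '/Users/kevin/Documents' is the example path used in the post above.
    print(largest_file('/Users/kevin/Documents'))
```

Because only one tuple is kept at a time, this avoids sorting, which matters little for a download folder but helps on large trees.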

Well, you could do something like this:

# File_lister2.py
# create a list of all the files and sizes in a given directory
# and optionally any of its subdirectories (Python2 & Python3)
# snee

import os

def file_lister(directory, subs=False):
    """
    returns a list of (size, full_name) tuples of all files
    in a given directory
    if subs=True also any of its subdirectories
    """
    mylist = []
    for fname in os.listdir(directory):
        # add directory to filename for a full pathname
        full_name = os.path.join(directory, fname)
        # size in kb
        size = int(os.path.getsize(full_name)//1024) + 1
        if not os.path.isdir(full_name):
            # append a (size, full_name) tuple
            mylist.append((size, full_name))
        elif subs:
            # optionally recurse into subdirs and keep their results
            mylist.extend(file_lister(full_name, subs))
    return mylist

#dir_name = r"C:\Python31\Tools"  # Windows
dir_name = "/home/dell/Downloads"  # Linux
file_list = file_lister(dir_name)

# show the list sorted by size
for file_info in sorted(file_list, reverse=True):
    print(file_info)

print('-'*66)

print( "The largest file is: \n%s (%skb)" % \
    (max(file_list)[1], max(file_list)[0]) )

"""a typical partial output -->
(24144, '/home/dell/Downloads/ActivePython-2.6.2.2-linux-x86.tar.gz')
(23320, '/home/dell/Downloads/ActivePython-3.1.0.1-linux-x86.tar.gz')
(9288, '/home/dell/Downloads/Python-3.1.tar.bz2')
...
...
------------------------------------------------------------------
The largest file is:
/home/dell/Downloads/ActivePython-2.6.2.2-linux-x86.tar.gz (24144kb)
"""

Arrr, too much code noise! ;-)

If you know you have at least one file:

import os, glob
largest = sorted( (os.path.getsize(s), s) for s in glob.glob('yourdir/*.avi') )[-1][1]

If not, you split the code a bit:

import os, glob
files = glob.glob('yourdir/*.avi')
largest = sorted((os.path.getsize(s), s) for s in files)[-1][1] if files else ''
if largest:
  ... # do something with it
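
A possible variation on the two snippets above, assuming Python 3.4 or later: max() accepts a key function and a default, so the sort and the empty-list check can both go away. The helper name largest_match is hypothetical, and 'yourdir/*.avi' is the same placeholder pattern as above.

```python
import glob
import os

def largest_match(pattern):
    """Return the path of the largest file matching pattern, or None."""
    # key=os.path.getsize compares paths by file size;
    # default=None is returned when nothing matches the pattern.
    return max(glob.glob(pattern), key=os.path.getsize, default=None)

largest = largest_match('yourdir/*.avi')
if largest is not None:
    print(largest)
```

On Python 2 the default argument is not available, so the sorted() version with the explicit check is the one to use there.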

Another very useful Linux script to find the largest files can be found here:

http://www.thegeekscope.com/linux-script-to-find-largest-files/

This script finds the top x largest files available in a specific folder on your Linux system and provides the following information for these files.

  • File size in bytes
  • Percentage of total disk space occupied by the file
  • File Owner
  • Last modified time of the file
  • File name along with full path

The script accepts two mandatory arguments.

  • The folder you want to search for the largest files.
  • A number x for extracting the top x largest files.

Following is the sample output of the script.

sh get_largest_files.sh / 5

[SIZE (BYTES)] [% OF DISK] [OWNER] [LAST MODIFIED ON] [FILE]

56421808 0% root 2012-08-02 14:58:51 /usr/lib/locale/locale-archive
32464076 0% root 2008-09-18 18:06:28 /usr/lib/libgcj.so.7rh.0.0
29147136 0% root 2012-08-02 15:17:40 /var/lib/rpm/Packages
20278904 0% root 2008-12-09 13:57:01 /usr/lib/xulrunner-1.9/libxul.so
16001944 0% root 2012-08-02 15:02:36 /etc/selinux/targeted/modules/active/base.linked

Total disk size: 23792652288 Bytes
Total size occupied by these files: 154313868 Bytes [ 0% of Total Disc Space ]

*** Note: 0% represents less than 1% ***

Hope you will find this useful !!
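
For anyone who wants the same "top x largest files" idea in Python, here is a rough sketch using os.walk and heapq.nlargest. It is not a port of the shell script: it reports only size and path, not owner, modification time or disk percentage, and the function name largest_files is my own.

```python
import heapq
import os

def largest_files(top, count):
    """Return up to count (size, path) tuples under top, largest first."""
    def sizes():
        for dirpath, dirnames, filenames in os.walk(top):
            for name in filenames:
                path = os.path.join(dirpath, name)
                # Skip symlinks so a link target is not counted twice.
                if os.path.islink(path):
                    continue
                try:
                    size = os.path.getsize(path)
                except OSError:
                    continue  # unreadable or vanished file
                yield size, path
    # nlargest keeps only count items in memory while scanning.
    return heapq.nlargest(count, sizes())

# Example call, mirroring "sh get_largest_files.sh / 5":
# for size, path in largest_files('/', 5):
#     print(size, path)
```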

commented: Not Python! -3