Hi,

I am trying to write a script that traverses a directory and its subdirectories and counts how many files fall into each size range. For example 0-1 KB: 3, 1-4 KB: 4, 4-16 KB: 4, 16-64 KB: 11, and so on, with each upper limit 4 times the previous one. I can already count files, print sizes in human-readable format and count the files in a single size range. But I feel my code is very messy and nowhere near standard. I need help cleaning it up.

```python
import os

suffixes = ['B', 'KB', 'MB', 'GB', 'TB', 'PB']
route = input('Enter a location: ')


def human_Readable(nbytes):
    if nbytes == 0:
        return '0 B'
    i = 0
    while nbytes >= 1024 and i < len(suffixes) - 1:
        nbytes /= 1024.
        i += 1
    f = ('%.2f' % nbytes).rstrip('0').rstrip('.')
    return '%s %s' % (f, suffixes[i])


def file_Dist(path, start, end):
    # Count the files under path whose size is in [start, end),
    # so neighbouring ranges neither overlap nor leave gaps
    counter = 0
    for dirpath, subdirs, files in os.walk(path):
        for name in files:
            size = os.path.getsize(os.path.join(dirpath, name))
            if start <= size < end:
                counter += 1
    print("Number of files from %s up to (not including) %s:"
          % (human_Readable(start), human_Readable(end)), counter)


file_Dist(route, 0, 1024)
file_Dist(route, 1024, 4096)
file_Dist(route, 4096, 16384)
file_Dist(route, 16384, 65536)
file_Dist(route, 65536, 262144)
file_Dist(route, 262144, 1048576)
file_Dist(route, 1048576, 4194304)
file_Dist(route, 4194304, 16777216)
```
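Typing each range limit by hand, as in the eight `file_Dist` calls above, makes it easy to leave gaps or overlaps between neighbouring ranges. A minimal sketch of generating the limits instead, assuming the same scheme (first range starts at 0, each upper limit 4 times the previous one); `size_ranges` is a hypothetical helper name, not part of the original code:

```python
def size_ranges(count, base=1024, factor=4):
    """Return `count` half-open (start, end) byte ranges.

    The first range is [0, base); every later range starts exactly where
    the previous one ended, and its upper limit is `factor` times larger.
    """
    ranges = []
    start, end = 0, base
    for _ in range(count):
        ranges.append((start, end))
        # the next range begins at the current upper limit
        start, end = end, end * factor
    return ranges
```

The calls could then become `for start, end in size_ranges(8): file_Dist(route, start, end)`. Because neighbouring ranges share a limit, the size test inside `file_Dist` needs to be `start <= size < end` for every file to be counted exactly once.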

Since each upper limit is 4 times the previous one, you could work out a file's range directly from its size (divide the size by 1024, then count how many times the result can be divided by 4) instead of testing every range. But to keep the form you posted, you first want to traverse the directory tree only once, instead of once per function call, and store the counts in a list. This is more straightforward and flexible IMHO, but you will have to decide whether you like it better.

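```python
import os


def update_list(file_size, sizes_list):
    """Add 1 to the first bucket whose upper limit is above file_size.

    Returns from the function as soon as the correct size is found.
    Files at least as large as the largest limit are not counted.
    """
    for bucket in sizes_list:
        if file_size < bucket[0]:
            bucket[1] += 1
            return sizes_list
    # larger than the largest limit
    return sizes_list


def file_Dist(path, sizes_list):
    # Walk the directory tree once, sorting each file into its bucket
    for dirpath, subdirs, files in os.walk(path):
        for name in files:
            this_size = os.path.getsize(os.path.join(dirpath, name))
            sizes_list = update_list(this_size, sizes_list)
    # all processing complete
    previous = 0
    for size, ctr in sizes_list:
        print("%d to %d = %d" % (previous, size - 1, ctr))
        previous = size


# [upper limit in bytes, file count] pairs: 1 KB, 4 KB, 16 KB, ... (8 buckets)
sizes_list = []
num = 1
for _ in range(8):
    sizes_list.append([num * 1024, 0])
    num *= 4
print(sizes_list)

route = input('Enter a location: ')
file_Dist(route, sizes_list)
```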
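The first suggestion above, working out the range from the size itself instead of testing each range in turn, might look roughly like the sketch below. It assumes the same ranges as in the thread (below 1 KB, then upper limits of 4 KB, 16 KB, ...). The names `bucket_index`, `bucket_bounds`, `count_by_bucket` and `print_report` are hypothetical, and the list of counters is swapped for a `collections.Counter` keyed by bucket number, so files larger than any fixed limit still get a bucket instead of being dropped.

```python
import os
from collections import Counter


def bucket_index(size):
    """Map a file size in bytes to a bucket number.

    Bucket 0 is [0, 1 KB), bucket 1 is [1 KB, 4 KB), bucket 2 is
    [4 KB, 16 KB), and so on: each upper limit is 4 times the last.
    """
    kb = size // 1024          # whole kilobytes
    index = 0
    while kb:                  # each division by 4 moves up one bucket
        kb //= 4
        index += 1
    return index


def bucket_bounds(index):
    """Return the half-open byte range [start, end) covered by a bucket."""
    if index == 0:
        return 0, 1024
    start = 1024 * 4 ** (index - 1)
    return start, start * 4


def count_by_bucket(path):
    """Walk the tree under path once and count the files in each bucket."""
    counts = Counter()
    for dirpath, subdirs, files in os.walk(path):
        for name in files:
            full = os.path.join(dirpath, name)
            counts[bucket_index(os.path.getsize(full))] += 1
    return counts


def print_report(path):
    """Print one line per non-empty bucket, smallest sizes first."""
    counts = count_by_bucket(path)
    for index in sorted(counts):
        start, end = bucket_bounds(index)
        print("%d to %d = %d" % (start, end - 1, counts[index]))
```

For example, `print_report(input('Enter a location: '))` prints the same "start to end = count" lines as the answer's code, but only for ranges that actually contain files.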