Hey there!

I've written a program, and separately from it I want to write a new Python program that generates lists (or in some cases maybe dictionaries) from certain directories and the files in them.

I've experimented with the os.path module, the os.chdir() and os.getcwd() functions and the glob.glob() function, and I've tried to see how os.walk() works.

Say I have the following file structure:

I have a folder named "Library" with the subfolders "Detached", "Semi_Detached" and "Appartments".
Each subfolder contains a number of files like:
"1_simple.max", "1_detailed.max", "1.csv", "2_simple.max", "2_detailed.max", "2.csv", and so on up to "x_simple.max", "x_detailed.max", "x.csv".

Library
    Detached
        1_simple.max
        1_detailed.max
        1.csv
        2_simple.max
        2_detailed.max
        2.csv
    Semi_Detached
        1_simple.max
        1_detailed.max
        1.csv
        2_simple.max
        2_detailed.max
        2.csv
    Appartments
        1_simple.max
        1_detailed.max
        1.csv
        2_simple.max
        2_detailed.max
        2.csv

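I want to create lists like:
living_types = ["Detached", "Semi_Detached", "Appartments"]
and, for the folder Detached, separate lists of:
max_simple = ["1_simple.max", "2_simple.max", ...]
max_detailed = ["1_detailed.max", "2_detailed.max", ...]
csv = ["1.csv", "2.csv", ...]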

This needs to be done for the other subfolders too. I also want to know which subfolder each max_simple list belongs to (maybe this is just a matter of naming, or maybe there are better ways).

All these lists will later be used by the first program, which will do operations on them.
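A minimal sketch of one way to build these lists while keeping track of which subfolder each one belongs to: a dictionary keyed by subfolder name, where each value holds the max_simple, max_detailed and csv lists. It assumes the naming scheme described above (<n>_simple.max, <n>_detailed.max, <n>.csv); the function name collect_library is hypothetical.

```python
import os

def collect_library(library_path):
    """Map each subfolder of library_path to its max_simple, max_detailed and csv lists.

    Assumes the naming scheme from the question: <n>_simple.max, <n>_detailed.max, <n>.csv.
    """
    library = {}
    for living_type in sorted(os.listdir(library_path)):
        folder = os.path.join(library_path, living_type)
        if not os.path.isdir(folder):
            continue  # skip loose files sitting next to the subfolders
        lists = {"max_simple": [], "max_detailed": [], "csv": []}
        for fname in sorted(os.listdir(folder)):
            if fname.endswith("_simple.max"):
                lists["max_simple"].append(fname)
            elif fname.endswith("_detailed.max"):
                lists["max_detailed"].append(fname)
            elif fname.endswith(".csv"):
                lists["csv"].append(fname)
        # the dictionary key records which subfolder these lists came from
        library[living_type] = lists
    return library
```

With library = collect_library(r"G:\Afstuderen\Library"), list(library) gives the subfolder names (the living_types list), and library["Detached"]["max_simple"] is the max_simple list for Detached, so the subfolder is simply the dictionary key rather than part of a variable name.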

Here's some code I came up with, but I'm not sure which approach to use because I'm fairly new to working with directories and files in Python.

from __future__ import print_function  # makes print() behave the same in Python 2 and 3
import os
import glob

# where are we?
cwd = os.getcwd()
print("1", cwd)

# go up
os.chdir(os.pardir)
print("2", os.getcwd())

# go to Library
os.chdir("Library")
print("3", os.getcwd())

print(glob.glob("*.*"))

library = r"G:\Afstuderen\Library"  # raw string, so backslashes are never read as escapes
type_list = []
for root, dirs, files in os.walk(library):
    print("root =", root)
    if dirs:
        print("dirs =", dirs)
    if files:
        print("files =", files)
    if root == library:
        # only the folders directly under Library are living types
        type_list = list(dirs)
type_list.sort()
print(type_list)

# number the types 1, 2, 3, ...
type_dict = dict(enumerate(type_list, start=1))
print(type_dict)
for key in sorted(type_dict):
    print(key, type_dict[key])

This is the kind of output I get now, which isn't really what I want :P

1 G:\Afstuderen\Python
2 G:\Afstuderen
3 G:\Afstuderen\Library
root = G:\Afstuderen\Library
dirs = ['1_Vrijstaand', '4_Appartementen', '2_Twee-onder-een kap', '3_Rij']
files = ['info over typen.doc', 'properties_templates.xls', 'render_vrijstaand.max', 'render.max']
root = G:\Afstuderen\Library\1_Vrijstaand
files = ['1.csv', '1.max', '2.csv', '2.max', '3.csv', '3.max', '4.csv', '4.max', 'kavel.max', 'render_back.jpg', 'render_front.jpg', 'vrijstaand.txt', '1_small.csv']
root = G:\Afstuderen\Library\4_Appartementen
root = G:\Afstuderen\Library\2_Twee-onder-een kap
files = ['1.csv', '1.max', '2.csv', '2.max', '3.csv', '3.max', '4.csv', '4.max', '5.csv', '5.max', 'render_back.jpg', 'render_front.jpg', 'render_twee-onder-een kap.max', 'twee-onder-een kap.txt']
root = G:\Afstuderen\Library\3_Rij
files = ['1.csv', '1.max', '2.csv', '2.max', '3.csv', '3.max', '4.csv', '4.max', 'render_back.jpg', 'render_front.jpg', 'render_rijtjeswoning.max', 'rijtjes.txt', 'Thumbs.db']
['1_Vrijstaand', '2_Twee-onder-een kap', '3_Rij', '4_Appartementen']
{1: '1_Vrijstaand', 2: '2_Twee-onder-een kap', 3: '3_Rij', 4: '4_Appartementen'}
1 1_Vrijstaand
2 2_Twee-onder-een kap
3 3_Rij
4 4_Appartementen

Don't mind the other files listed in the subfolders' output.
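Since the living types are only the folders directly under Library, the full walk is not needed to find them. A small sketch, assuming the goal is just the sorted top-level folder names plus the 1-based numbering of type_dict above; the helper names top_level_folders and numbered are hypothetical. Because os.walk yields the top folder first, next() on it returns only that first (root, dirs, files) tuple.

```python
import os

def top_level_folders(library_path):
    """Return the folders directly inside library_path, sorted by name."""
    # The first tuple os.walk yields describes library_path itself,
    # so next() stops before any subfolder is visited.
    root, dirs, files = next(os.walk(library_path))
    return sorted(dirs)

def numbered(names):
    """Map 1, 2, 3, ... to the names, like type_dict in the question."""
    return dict(enumerate(names, start=1))
```

One caveat: if the folder does not exist, os.walk silently yields nothing, so next() raises StopIteration rather than a clearer "folder not found" error.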

I hope my question is clear enough and that some of you can give me some pointers, or even a place to start learning about working with directories and files in Python at a fairly basic level.

Thanks in advance :)


First, the directory structure shown by walk doesn't quite appear to match what you said the structure would be.

For example, the directories under Library appear to be prefixed with a number (as in 1_Vrijstaand).

For another example, the files under the individual directories don't show the 1_simple.max, 1_detailed.max, 1.csv structure you indicated, just 1.max and 1.csv.

Are the 1_simple.max and 1_detailed.max files going to exist?

Will it always be 'simple' and 'detailed', or might there be other words between "1_" and ".max" that you have to collect as well?
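If other words besides simple and detailed may appear between "1_" and ".max", one option is to capture the number and the word with a regular expression instead of hard-coding the suffixes. A sketch, assuming names always look like <number>_<word>.max; parse_max_name is a hypothetical helper, and re.fullmatch needs Python 3.4 or later.

```python
import re

# Assumed naming scheme: "<number>_<word>.max", e.g. "1_simple.max" or "10_detailed.max".
# The word is captured whatever it is, so labels other than simple/detailed are kept too.
MAX_NAME = re.compile(r"(\d+)_(.+)\.max")

def parse_max_name(fname):
    """Return (number, word) for names like '1_simple.max', or None if the name doesn't fit."""
    match = MAX_NAME.fullmatch(fname)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)
```

Names that do not fit the pattern, such as 1.max or render_vrijstaand.max from the listing above, come back as None, so they can be skipped or handled separately.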

Personally, I try not to change directories once my application has started. I'm not against you doing it if it makes sense, but since you passed an absolute path to os.walk, I suspect it wasn't required.
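For example, the Library folder can be located relative to the starting folder by building the path instead of changing directories. A sketch that mirrors the os.chdir(os.pardir) and os.chdir("Library") steps from the question; library_dir is a hypothetical name.

```python
import os

def library_dir(start_dir):
    """Return the Library folder next to start_dir, without calling os.chdir()."""
    # Equivalent to going up one level and then into "Library",
    # but the program's current working directory is left untouched.
    return os.path.normpath(os.path.join(start_dir, os.pardir, "Library"))
```

Calling library_dir(os.getcwd()) gives the same folder the question's code changes into, and the result can be passed straight to os.walk, os.listdir or glob.glob.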

I'm going to recommend that you create some classes to hold more of your data; then you can fill the classes as you either walk the library or use an alternative iteration.

I'm thinking of an instance of a class for each subdirectory under Library. Inside that class instance, you could create an instance of another class for each csv file you find in the subdirectory. That class would also store the names of all of the max files that match the csv name. (You will need to be careful not to put '10_simple.max' under the class for '1.csv'.) Note that this is not quite the structure you proposed, but it seems to make more sense based on the data available. (Feel free to ignore this part or tell me I'm wrong.)
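A rough sketch of that class layout, working on a plain list of file names so it can be fed from either os.listdir or glob. The class names LivingType and HouseVariant are made up for illustration, and the matching rule (a .max file belongs to a csv if its name is the csv number alone, or the number followed by "_") is an assumption based on the names seen so far.

```python
import os

class HouseVariant:
    """One csv file plus the .max files that share its number (hypothetical class)."""

    def __init__(self, csv_name):
        self.csv_name = csv_name
        self.number = os.path.splitext(csv_name)[0]  # "1.csv" -> "1"
        self.max_files = []

    def owns(self, max_name):
        stem = os.path.splitext(max_name)[0]  # "1_simple.max" -> "1_simple"
        # Exact number, or number followed by "_": "10_simple" does not start
        # with "1_", so "10_simple.max" is not filed under "1.csv".
        return stem == self.number or stem.startswith(self.number + "_")

class LivingType:
    """One subfolder of Library, e.g. "1_Vrijstaand" (hypothetical class)."""

    def __init__(self, name, filenames):
        self.name = name
        names = sorted(filenames)
        self.variants = [HouseVariant(f) for f in names if f.endswith(".csv")]
        for fname in names:
            if fname.endswith(".max"):
                for variant in self.variants:
                    if variant.owns(fname):
                        variant.max_files.append(fname)
```

Files such as kavel.max that match no csv number are simply left out; if they matter, LivingType could keep them in a separate list.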

An alternative to os.walk for creating the structure might be os.listdir.

You could do something like:

import os
import glob

librarypath = "G:\\Afstuderen\\Library"
for libname in os.listdir(librarypath):
    pathname = os.path.join(librarypath, libname)
    if os.path.isdir(pathname):
        # create the class for the subdirectory here
        # one option for finding the files inside the subdirectory
        for fname in os.listdir(pathname):
            fullname = os.path.join(pathname, fname)
            base, ext = os.path.splitext(fullname)
            if ext == ".csv":
                # create the class for the csv here
                # we could either look for the .max files now,
                # or come back later and look for them...
                pass  # placeholder so the block is valid Python
        # second option for finding the files in the subdirectory
        for csvname in glob.glob(os.path.join(pathname, "*.csv")):
            # create the class for the csv here
            pass  # placeholder so the block is valid Python

If you don't understand something I used, please ask about it.
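One caveat with both options: os.listdir and glob.glob return names in arbitrary order, and sorting them as strings puts "10.csv" before "2.csv". A sketch, assuming the csv files are named by number, that collects the numbers with glob and sorts them numerically; csv_numbers is a hypothetical helper.

```python
import glob
import os

def csv_numbers(folder):
    """Return the numbers of the numbered csv files in folder, sorted numerically."""
    numbers = []
    for path in glob.glob(os.path.join(folder, "*.csv")):
        stem = os.path.splitext(os.path.basename(path))[0]  # ".../10.csv" -> "10"
        if stem.isdigit():  # skips names like "1_small.csv"
            numbers.append(int(stem))
    # Sorting integers gives 1, 2, 10 instead of the string order 1, 10, 2.
    return sorted(numbers)
```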

PS: When posting Python code, please use Python code tags:
[code=python]
# Your code here
[/code]

First of all, thanks for your post and your suggestions.

I'm going to look into it more now.

PS: I don't know what went wrong with posting the code. Normally my posted code looks fine :)

Your posted code did have [code][/code] tags, but if you use the language-specific tags ([code=python] [/code], for example), you get line numbers and syntax highlighting as well. The syntax highlighting makes the code easier to read, and the line numbers give you a point of reference if you need to talk about just part of the code.
