I need to know how to copy some, but not all, folders from a source tmp/folder. I'm new to Python and I want to understand it better. I can't use copytree because I don't wish to copy all folders. I want to be able to pick the folders I want to copy from a folder.txt file. Please help. Thanks.
So basically what I'm trying to do is
1- Copy some, but not all, folders from a tmp/directory; the ones to copy are listed in a .txt file
2- Before copying to the destination location, the destination location will be empty
3- Once I copy the folders, the destination location should only have the folders as specified in the .txt file
4- The copying of the folders should be recursive
5- Linux environment.
I can easily do this with shell scripting, but I want to move away from shell scripting. Please help.
thanks.
Here are some functions you can look at:
- os.listdir(mydir) lists all the entries (files and sub-directories) in "mydir"
- fnmatch.filter(files_list, "*.txt") gives you a list of all the text files in files_list (tip: fnmatch.filter(os.listdir(mydir), "*.txt"))
- os.path.join(mydir, file) gives you the full path of the file (absolute only if mydir is absolute)
- os.walk(mydir) walks mydir and all its subdirs, giving you a tuple (root, dirs, files) at each step, where root is the current dir, dirs its subdirs and files the files of the current dir. Tip: if you remove a dir from dirs, it won't be visited.
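To make the four calls above a bit more concrete, here is a minimal sketch in Python 3 syntax. The helper names list_text_files and walk_without are made up for illustration; they are not part of any library.

```python
import fnmatch
import os


def list_text_files(mydir, pattern="*.txt"):
    """Return the full paths of the entries in mydir whose names match pattern."""
    names = fnmatch.filter(os.listdir(mydir), pattern)
    return sorted(os.path.join(mydir, name) for name in names)


def walk_without(mydir, excluded):
    """Yield every directory under mydir, skipping sub-directories named in excluded."""
    for root, dirs, files in os.walk(mydir):
        # Editing dirs in place tells os.walk not to descend into the removed names.
        dirs[:] = [d for d in dirs if d not in excluded]
        yield root
```

Note that the pruning only works because os.walk goes top-down by default, so the dirs list is read again after your loop body has changed it.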
Not sure yet how I'm going to put these together as I'm new to Python, but I'll give it a try. Thanks for your suggestions.
I'll give you more details tomorrow...
But it would be easier if you posted some code you've written and described the situation precisely (what the folder.txt file contains, which files are to be copied, which are not...)
As your question is general, I gave you general ideas you can use for your particular problem...
Anyway, you can easily google for examples of each of the functions I gave you.
They are not very difficult to understand.
An example of code:
import os
import os.path
import shutil
import fnmatch

list_of_dirs_to_copy = ['path/to/dir/1', 'path/to/dir/2'] # List of source dirs
excluded_subdirs = ['dir1', 'dir2'] # subdir to exclude from copy
dest_dir = 'path/to/my/dest/dir' # folder for the destination of the copy
files_patterns = ['*.txt', '*.doc']

for root_path in list_of_dirs_to_copy:
    for root, dirs, files in os.walk(root_path): # recurse walking
        for dir in excluded_subdirs:
            if dir in dirs:
                dirs.remove(dir) # remove the dir from the subdirs to visit
        if not os.path.exists(dest_dir):
            os.makedirs(dest_dir) # create the dir if not exists
        for pattern in files_patterns:
            for thefile in fnmatch.filter(files, pattern): # filter the files to copy
                shutil.copy2(os.path.join(root, thefile), dest_dir) #copy file
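The example above puts every matching file directly into dest_dir. If the goal is the one from the original question (copy whole folders, recursively, keeping their structure), one possible sketch is to call shutil.copytree once per wanted folder. The function name copy_selected and its parameters are assumptions for illustration. It is written for Python 3 and expects that the destination does not contain those folders yet.

```python
import os
import shutil


def copy_selected(src_root, dest_root, folder_names):
    """Copy each named sub-folder of src_root into dest_root, recursively."""
    copied = []
    for name in folder_names:
        src = os.path.join(src_root, name)
        if not os.path.isdir(src):
            continue  # silently skip names that are not folders in src_root
        # copytree creates the destination folder (and missing parents) itself.
        shutil.copytree(src, os.path.join(dest_root, name))
        copied.append(name)
    return copied
```

Something like copy_selected('tmp/folder', 'path/to/my/dest/dir', ['Folder1', 'Folder3']) would then leave only those two trees in the destination, provided it was empty before.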
Thanks a lot Jice for your great help. I'll be working on these all day today, and I will post code/questions later. This is a great start, as it gives me an idea of what approach to take. Like I said before, I'm really new to Python, so I've been spending a lot of time learning about the different modules Python has to offer. For file manipulation and operating system functions, I now realize that for what I'm trying to do I need to learn the shutil and os modules well. One thing I was surprised to learn about Python is that I have to create the destination folders before copying into them; that was a little odd, I thought. In Linux a simple cp -r source destination takes care of that. I noticed that in your code you created a list of the folders I don't want to copy; however, in my case I need to read the folders I don't wish to copy from a file. So in my case I'll be working with a tuple :) at least I'm getting the Python lingo correct, I hope.
My filedirlist.txt looks something like this
+ Folder1
- Folder2
+ Folder3
+ Folder4
+ Folder5
- Folder6
+Folder7
The folders with a plus next to them are the ones I want to include; the ones with a minus I don't want to include. The code I wrote to iterate through the list is this
myfile1 = open("/home/user/locationoffile/filedirlist.txt", "r") # open file for reading
"""This function reads the .txt file"""
def readSource(myfile1):
for line in myfile1:
if line.startswith('-'): #lines to ignore
continue
else:
cpfolders = line.strip('+ \n') #strip the + and empty lines from the list
print cpfolders #I know is not necessary just want to see the variable value
This simple function prints the folders I want to copy and assigns them to the cpfolders variable.
Is this ok Jice? Or do you think there’s a better way to do this?
Thanks
Droid.
I just figured out how to post indented code; here's the function code I posted in my previous reply
myfile1 = open("/home/user/locationoffile/filedirlist.txt", "r") # open file for reading
"""This function reads the .txt file"""
def readSource(myfile1):
    for line in myfile1:
        if line.startswith('-'): #lines to ignore
            continue
        else:
            cpfolders = line.strip('+ \n') #strip the + and empty lines from the list
            print cpfolders #I know is not necessary just want to see the variable value
Sorry for this late reply...
Some comments:
1. Put the comment IN the function. It then becomes a docstring and is used to document your programs.
2. I don't understand how this works:
"-" dirs are ignored, so why are they in the file?
Your file could contain only the dirs you want to copy, so that you don't need to deal with the "+" and "-". Your code would be much simpler (you wouldn't need a function to read the file).
Your function doesn't return anything... so what is the function used for?
cpfolders only contains the last line, so how do you want your main program to copy each dir?
Note: this would be possible if your function were a generator, but I don't think you meant it to be one...
Here is the code to make your function work as a generator (look for "python generator" on Google for more details)
myfile1 = "/home/user/locationoffile/filedirlist.txt"
# For myfile, I'd just assign the name to the variable and open the file in the function.
# In your code, I can't see the closing of the file.
# When you do it like I did, the closing is implicit at the end of the loop.
def readSource(myfile1):
    """This function reads the .txt file"""
    for line in open(myfile1):
        if line.startswith('-'): #lines to ignore
            continue
        else:
            cpfolder = line.strip('+ \n') #strip the "+", the spaces and the newline
            yield cpfolder

for mydir in readSource(myfile1):
    print(mydir)
Easier to understand: you can return a list
myfile1 = "/home/user/locationoffile/filedirlist.txt"
# For myfile, I'd just assign the name to the variable and open the file in the function.
# In your code, I can't see the closing of the file.
# When you do it like I did, the closing is implicit at the end of the loop.
def readSource(myfile1):
    """This function reads the .txt file"""
    cpfolders = [] # create the list once, before the loop
    for line in open(myfile1):
        if line.startswith('-'): #lines to ignore
            continue
        else:
            cpfolders.append(line.strip('+ \n')) # add the stripped line to the list
    return cpfolders

for mydir in readSource(myfile1):
    print(mydir)
Note that the most classical way to use a file would be (for Python 2.6 and later):
myfile1 = "/home/user/locationoffile/filedirlist.txt"
def readSource(myfile1):
    """This function reads the .txt file"""
    with open(myfile1) as in_file:
        for line in in_file:
            ...
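For what it's worth, here is one way a complete function in that style could look. This is only a sketch and not necessarily what the "..." above stood for. The name read_source is hypothetical, and it assumes the "+ Folder" / "- Folder" file format posted earlier in the thread.

```python
def read_source(list_path):
    """Return the folder names from the lines of list_path that do not start with '-'."""
    folders = []
    with open(list_path) as in_file:  # the file is closed when the block ends
        for line in in_file:
            name = line.strip()
            if not name or name.startswith("-"):
                continue  # skip blank lines and excluded folders
            folders.append(name.lstrip("+ "))
    return folders
```

The with statement closes the file even if an exception is raised inside the loop, which is the main reason to prefer it over a bare open().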
Note that in your case, all this is of no use if you leave the "-" lines out of the file, so you can do
myfile1 = "/home/user/locationoffile/filedirlist.txt"
with open(myfile1) as in_file:
    for my_dir in in_file:
        copy_dir(my_dir.strip()) # strip the newline; copy_dir is a function that copies the dir, like the code shown above.
LAST TIP:
To filter lines, you can use a "list comprehension" (google it for more details).
Let's say you want to keep only the "+" lines of your file (even if it's of no use here); you can avoid writing a function for this simple task:
myfile1 = "/home/user/locationoffile/filedirlist.txt"
with open(myfile1) as in_file:
    for my_dir in [line.strip('+ \n') for line in in_file if line.startswith('+')]:
        copy_dir(my_dir) # where copy_dir is a function that copies the dir, like the code shown above.
Incredible, thanks. I've been having such a hard time trying to put everything together. It's not as easy as I thought it was going to be, but it's not as difficult either. I'll keep you posted, thanks a million
Thank you, thank you, thank you for the generator function tip... Thank you!
If you're interested in generators (and you should be), here is the link that made me use generators and list comprehension:
http://www.dabeaz.com/generators/Generators.pdf
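As a small taste of that style, here is a sketch of two generators chained together, applied to lines in the "+ Folder" / "- Folder" format from this thread. The function names and the sample data are made up for illustration and are not taken from the slides.

```python
def wanted_lines(lines):
    """Yield only the lines marked with a leading '+'."""
    for line in lines:
        if line.startswith("+"):
            yield line


def folder_names(lines):
    """Yield each line with the '+' marker and surrounding whitespace removed."""
    for line in lines:
        yield line.strip("+ \n")


sample = ["+ Folder1\n", "- Folder2\n", "+ Folder3\n"]
names = list(folder_names(wanted_lines(sample)))
```

Each stage pulls one line at a time from the previous one, so the same chain also works on an open file without loading the whole file into memory.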
Jice, you cannot imagine how much I appreciate your help, thanks buddy. At the moment I'm in the Python learning process. Being familiar with shell scripting has helped me a great deal, and I've been spending quite some time understanding the structure of the Python language. Thanks for the reference
Thank you for your feedback
Hi Jice, I'm still very much involved in working with Python. I figured out how to make my Python scripts work, and your feedback and support have been invaluable. Recently I realized that the best way to manage and manipulate files and directories is with glob. I like glob because I can have all files and directories listed in one place; glob also gives you the path of each match, not just its name, and it allows you to use wildcards. I haven't posted in the last couple of days because I've been extremely busy; once I catch up with work and the many things I'm trying to do, I'll post frequently and give you more feedback. If you recall, I was trying to read the directories I didn't want to include from a text file. What I'm trying to do now is the same, but this time using glob, and in my list I want to add/remove both files and directories.
Example of type of data the .txt file will have
+ app\bin
- app\bin\*.txt
+ app\docs
+ app\manuals
- app\manuals\*.xls
- app\source
I really like the way you explain things. Can you please explain how I can iterate through my .txt file and copy or leave out entries according to what I have specified (+ include, - don't include) using glob?
Thanks in advance
Another problem I'm having with my list is that I don't know how to tell python that in my list I have both files and directories. How do I do that?
You're right... glob is a very convenient way to deal with files and dirs.
But I'm so used to using os.walk and fnmatch that I hardly ever use glob (but I should).
I'll come back to your questions tomorrow, but what I can say about the last one is that you can use os.path.isdir(your_path) (see http://docs.python.org/library/os.path.html )
Something like:
for one_line in open(my_file):
    my_glob = one_line.strip()
    if os.path.isdir(my_glob):
        my_glob = os.path.join(my_glob, "*") # "*" matches every entry; "*.*" would skip names without a dot
The problem I can see with your way is that I don't think you can remove "*.txt" files using glob. This may be one reason to use the old-style method I use.
I'll look at that tomorrow
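To illustrate the os.path.isdir idea, here is a minimal sketch (Python 3 syntax) that turns one entry of the list into the paths it stands for. The function name expand_entry is hypothetical, and it assumes the leading "+" or "-" marker has already been removed from the entry.

```python
import glob
import os


def expand_entry(entry):
    """Return the paths matched by one list entry.

    A directory is expanded to everything directly inside it;
    anything else is treated as a glob pattern (a plain file name matches itself).
    """
    if os.path.isdir(entry):
        entry = os.path.join(entry, "*")
    return sorted(glob.glob(entry))
```

One thing to keep in mind: glob does not return names starting with a dot for a "*" pattern, so hidden files would need a separate pattern.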
Thank you so much Jice
First of all, I can't understand why you need to add "- app\source", as all the dirs you want to include are already listed...
Anyway, I don't see how you can handle your need with glob alone, so I'll explain it with the old-fashioned method I use.
Unless you really need your dir list to be formatted as you posted it, I would do this another way:
On each line, you put a dir to be copied (those that don't need to be copied simply don't need to be in the file),
then, separated by some character (I'll use ";" because it's easier to see than a tab), the patterns of the files you want to include (+) or exclude (-).
You'll never have a + and a - on the same line: if you choose to select only some files (+), those you want to exclude won't be in the list (normally).
Anyway, with this method you can add other ';' separators and put other information after them.
Note: for my very complicated backup scripts, I even use YAML ini files with plenty of parameters to select very precisely the files I want to back up (drawback: the ini file takes a long time to write).
So here is the look of the file :
app\bin;-.txt,.log
app\docs
app\manuals;-.xls
app\source;+.py
Notice that the first line lists 2 extension patterns separated by ",", but the separator could be whatever you prefer (if you choose the same one as the first separator, it will be more complicated but possible, as the split function has a max split parameter).
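As a rough sketch of how one line of such a file could be taken apart, here is a small helper using the max split parameter mentioned above. The name parse_line and the shape of its return value are assumptions for illustration.

```python
def parse_line(line):
    """Split 'dir;+.ext1,.ext2' into (directory, mode, extensions).

    mode is '+' (copy only these extensions), '-' (copy everything except
    these extensions) or '' when the line lists no pattern at all.
    """
    # maxsplit=1: only the first ';' separates the directory from the patterns.
    parts = line.strip().split(";", 1)
    directory = parts[0]
    patterns = parts[1] if len(parts) > 1 else ""
    if not patterns:
        return directory, "", []
    return directory, patterns[0], patterns[1:].split(",")
```

Because of the max split of 1, a line without any ";" (like the app\docs one) simply comes back with an empty mode and an empty extension list.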
And now, the solution
import os
import os.path
import shutil

my_file = 'your_filename_here.txt'
dst_dir = 'where/you/want/to/copy/the/files/'

def go_on():
    for line in open(my_file): # each line is a dir to explore
        line = line.strip() # remove the newline
        if not line:
            continue # skip blank lines
        # separate the dir from the pattern list (the pattern part is optional)
        src_dir, _, patterns = line.partition(";")
        pattern_list = patterns[1:].split(",") # get the pattern list
        for root, dirs, files in os.walk(src_dir): # We go recursively in the tree
            dst_subdir = root.replace(src_dir, dst_dir, 1) # We compute the dest dir name
            if not os.path.exists(dst_subdir):
                os.makedirs(dst_subdir) # create the dir if not exists
            for one_file in files: # we process each file
                file_ext = os.path.splitext(one_file)[1] # separate the extension
                if (not patterns) \
                        or (patterns[0] == "+" and file_ext in pattern_list) \
                        or (patterns[0] == "-" and file_ext not in pattern_list):
                    # if no pattern is listed for this dir we assume we want all files
                    shutil.copy2(os.path.join(root, one_file), dst_subdir)
Now, you should be able to do what you want.
glob is interesting when you simply want to do the equivalent of fnmatch.filter(os.listdir(my_path), my_pattern).
When you want something more complicated, I think you need to use os and os.path.
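To show that equivalence side by side, and one possible way to get "+" and "-" behaviour out of glob by subtracting one set of matches from another, here is a sketch in Python 3 syntax. The three function names are made up for illustration, and the set-difference idea is only one option among others.

```python
import fnmatch
import glob
import os


def match_with_glob(my_path, my_pattern):
    """Paths in my_path matching my_pattern, found with glob."""
    return sorted(glob.glob(os.path.join(my_path, my_pattern)))


def match_with_fnmatch(my_path, my_pattern):
    """The same paths, found with os.listdir and fnmatch.filter."""
    names = fnmatch.filter(os.listdir(my_path), my_pattern)
    return sorted(os.path.join(my_path, name) for name in names)


def select(includes, excludes):
    """Paths matched by any include pattern minus those matched by any exclude pattern."""
    wanted = set()
    for pattern in includes:
        wanted.update(glob.glob(pattern))
    for pattern in excludes:
        wanted.difference_update(glob.glob(pattern))
    return sorted(wanted)
```

The two match functions differ for hidden files: glob does not match names starting with a dot unless the pattern itself starts with a dot, while fnmatch.filter has no such rule.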