I need to know how to copy some, but not all, folders from a source tmp/folder. I'm new to Python and want to understand it better. I can't use copytree because I don't want to copy every folder; I want to pick the folders to copy from a folder.txt file. Please help. Thanks.

So basically what I'm trying to do is:

1- Copy some, but not all, folders from a tmp/directory; the folders to copy are listed in a .txt file
2- The destination location will be empty before copying
3- Once the folders are copied, the destination location should contain only the folders specified in the .txt file
4- The folders should be copied recursively
5- Linux environment.

I can easily do this with shell scripting, but I want to move away from shell scripting. Please help.

thanks.

Here are some functions you can look at:
- os.listdir(mydir) will list all the entries (files and subdirectories) in "mydir"
- fnmatch.filter(files_list, "*.txt") will give you a list of all text files in files_list (tip: fnmatch.filter(os.listdir(mydir), "*.txt"))
- os.path.join(mydir, file) will give you the path of the file inside mydir (absolute only if mydir is absolute)
- os.walk(mydir) will walk mydir and all its subdirs, yielding a tuple (root, dirs, files) for each directory, where root is the current dir, dirs its subdirs and files the files of the current dir. Tip: if you remove a dir from dirs in place, it won't be visited.
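
To make the four functions above concrete, here is a minimal sketch that runs them against a throwaway tree. The helper names (list_text_files, walk_without) and the folder and file names are made up for illustration, and the syntax is Python 3.

```python
import fnmatch
import os
import tempfile


def list_text_files(mydir):
    """Return the paths of the *.txt entries directly inside mydir."""
    names = os.listdir(mydir)                  # files AND subdirectories
    matches = fnmatch.filter(names, "*.txt")   # keep names matching the pattern
    return sorted(os.path.join(mydir, name) for name in matches)


def walk_without(mydir, excluded):
    """Return the directories os.walk visits when names in `excluded` are pruned."""
    visited = []
    for root, dirs, files in os.walk(mydir):
        # Edit `dirs` in place: os.walk will not descend into removed names.
        dirs[:] = [d for d in dirs if d not in excluded]
        visited.append(os.path.relpath(root, mydir))
    return sorted(visited)


# Build a small throwaway tree (hypothetical names).
base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, "keep"))
os.makedirs(os.path.join(base, "skip", "deep"))
for name in ("notes.txt", "image.png"):
    open(os.path.join(base, name), "w").close()

print(list_text_files(base))
print(walk_without(base, {"skip"}))
```

The slice assignment dirs[:] = ... matters: rebinding dirs to a new list would not affect the walk, because os.walk only looks at the list object it handed out.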

Not sure yet how I'm going to put these together as I'm new to Python, but I'll give it a try. Thanks for your suggestions.

I'll give you more details tomorrow...
But it would be easier if you posted some code you've written and described the situation precisely (what the folder.txt file contains, which files are to be copied, which are not...).
As your question is general, I gave you general ideas you can use for your particular problem...
Anyway, you can easily google for examples of each of the functions I gave you.
They are not very difficult to understand.

An example:

import os
import os.path
import shutil
import fnmatch

list_of_dirs_to_copy = ['path/to/dir/1', 'path/to/dir/2'] # List of source dirs
excluded_subdirs = ['dir1', 'dir2']  # subdir to exclude from copy
dest_dir = 'path/to/my/dest/dir'     # folder for the destination of the copy
files_patterns = ['*.txt', '*.doc']
for root_path in list_of_dirs_to_copy:
    for root, dirs, files in os.walk(root_path): # recurse walking
        for dir in excluded_subdirs:
            if dir in dirs:
                dirs.remove(dir)   # remove the dir from the subdirs to visit
        if not os.path.exists(dest_dir):
            os.makedirs(dest_dir)  # create the dir if not exists
        for pattern in files_patterns:
            for thefile in fnmatch.filter(files, pattern):  # filter the files to copy
                shutil.copy2(os.path.join(root, thefile), dest_dir) #copy file

Thanks a lot, Jice, for your great help. I'll be working on this all day today and will post code/questions later. This is a great start, as it gives me an idea of what approach to take. Like I said before, I'm really new to Python, so I've been spending a lot of time learning about the different modules it has to offer. I now realize that for what I'm trying to do I need to learn the shutil and os modules well. One thing that surprised me is that, in your example, the destination folders have to be created before files are copied into them; I thought that was a little odd. In Linux a simple cp -r source destination takes care of that. I noticed that in your code you created a list of the folders I don't want to copy; in my case, however, I need to read those folders from a file. So in my case I'll be working with a tuple :) at least I'm getting the Python lingo correct, I hope.

My filedirlist.txt looks something like this

+ Folder1
- Folder2
+ Folder3
+ Folder4
+ Folder5
- Folder6
+Folder7

The folders with a plus next to them are the ones I want to include; the ones with a minus I don't want to include. The code I wrote to iterate through the list is this:

myfile1 = open("/home/user/locationoffile/filedirlist.txt", "r") # open file for reading

"""This function reads the .txt file"""
def readSource(myfile1):
    for line in myfile1:
        if line.startswith('-'):  # lines to ignore
            continue
        else:
            cpfolders = line.strip('+ \n') # strip the +, spaces and newline from the line
            print(cpfolders) # not necessary, I just want to see the variable's value

This simple function prints the folders I want to copy and assigns each one to the cpfolders variable.

Is this ok Jice? Or do you think there’s a better way to do this?

Thanks

Droid.

Sorry for this late reply...

Some comments:
1. Put the comment INSIDE the function. It then becomes a docstring, which is used to document your programs.

2. I don't understand how this works:
"-" dirs are ignored, so why are they in the file?
Your file could contain only the dirs you want to copy, so that you don't need to deal with the "+" and "-". Your code would be much simpler (you wouldn't need a function to read the file).

Your function doesn't return anything... so what is the function used for?

cpfolders only ever holds the last line read, so how is your main program supposed to copy each dir?
Note: This would be possible if your function were a generator, but I don't think you meant it to be one...
Here is the code to make your function work as a generator (look for "python generator" on Google for more details):

myfile1 = "/home/user/locationoffile/filedirlist.txt"
# For myfile1, I'd just assign the name to the variable and open the file in the function.
# In your code, I can't see where the file is closed.
# Done this way, the file is closed implicitly once the loop is over and the file
# object is discarded (CPython behaviour; a "with" block is more reliable).

def readSource(myfile1):
    """This function reads the .txt file"""
    for line in open(myfile1):
        if line.startswith('-'):  # lines to ignore
            continue
        else:
            cpfolder = line.strip('+ \n') # strip the +, spaces and newline from the line
            yield cpfolder

for mydir in readSource(myfile1):
    print(mydir)

Easier to understand: you can return a list

myfile1 = "/home/user/locationoffile/filedirlist.txt"
# For myfile1, I'd just assign the name to the variable and open the file in the function.
# In your code, I can't see where the file is closed.
# Done this way, the file is closed implicitly once the loop is over and the file
# object is discarded (CPython behaviour; a "with" block is more reliable).

def readSource(myfile1):
    """This function reads the .txt file"""
    cpfolders = []  # create the list once, before the loop
    for line in open(myfile1):
        if line.startswith('-'):  # lines to ignore
            continue
        else:
            cpfolders.append(line.strip('+ \n')) # add the stripped line to the list
    return cpfolders


for mydir in readSource(myfile1):
    print(mydir)

Note that the most common way to use a file (Python 2.6 and later) would be:

myfile1 = "/home/user/locationoffile/filedirlist.txt"

def readSource(myfile1):
    """This function reads the .txt file"""
    with open(myfile1) as in_file:
        for line in in_file:
            ...
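
The loop body above is left elided. As one possible way to complete it, assuming the +/- list format from earlier in the thread, a sketch could look like this (read_included and the demo file contents are made up for illustration; Python 3 syntax):

```python
import os
import tempfile


def read_included(list_path):
    """Return the folder names found on the '+' lines of a +/- list file."""
    folders = []
    with open(list_path) as in_file:       # the file is closed when the block ends
        for line in in_file:
            line = line.strip()
            if not line.startswith("+"):
                continue                    # skips '-' lines and blank lines
            folders.append(line.lstrip("+").strip())
    return folders


# Demo with a temporary list file (contents are made up).
fd, demo_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as out:
    out.write("+ Folder1\n- Folder2\n\n+Folder7\n")

result = read_included(demo_path)
os.remove(demo_path)
print(result)   # ['Folder1', 'Folder7']
```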

Note that in your case all this is unnecessary if the file lists only the dirs to copy (no "-" lines), so you can do

myfile1 = "/home/user/locationoffile/filedirlist.txt"

with open(myfile1) as in_file:
    for my_dir in in_file:
        copy_dir(my_dir.strip()) # strip the newline; copy_dir is the function that copies the dir, as shown above.
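
copy_dir itself is never spelled out in the thread. A minimal sketch of what it might look like, assuming the folder names are relative to one source root and that shutil.copytree is acceptable for a single folder (the two extra parameters source_root and dest_root are my assumption, as are the demo names):

```python
import os
import shutil
import tempfile


def copy_dir(name, source_root, dest_root):
    """Recursively copy source_root/name to dest_root/name and return the new path."""
    src = os.path.join(source_root, name)
    dst = os.path.join(dest_root, name)
    # copytree creates dst (and any missing parents) itself;
    # it raises an error if dst already exists.
    shutil.copytree(src, dst)
    return dst


# Demo on throwaway directories (hypothetical names).
source_root = tempfile.mkdtemp()
dest_root = tempfile.mkdtemp()
os.makedirs(os.path.join(source_root, "Folder1", "sub"))
with open(os.path.join(source_root, "Folder1", "sub", "a.txt"), "w") as f:
    f.write("hello")
os.makedirs(os.path.join(source_root, "Folder2"))   # not listed, so not copied

copy_dir("Folder1", source_root, dest_root)
print(sorted(os.listdir(dest_root)))
```

Because copytree is called once per listed folder, only the chosen folders end up in the destination, and each one is copied with all of its subfolders.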

LAST TIP:
To filter lines, you can use a "list comprehension" (google it for more details).
Suppose you want to keep only the "+" lines in your file (even if it's of no use here); you can avoid writing a function for this simple task:

myfile1 = "/home/user/locationoffile/filedirlist.txt"

with open(myfile1) as in_file:
    for my_dir in [line.strip('+ \n') for line in in_file if line.startswith('+')]:
        copy_dir(my_dir) # copy_dir is the function that copies the dir, as shown above.

Incredible thanks, I've been having such a hard time trying to put everything together, it's not as easy as I thought it was going to be, but it's not as difficult either. I'll keep you posted, thanks a million

Thank you, thank you, thank you for the generator function tip ....... Thank you!

If you're interested in generators (and you should be), here is the link that made me use generators and list comprehension:
http://www.dabeaz.com/generators/Generators.pdf

Jice, you cannot imagine how much I appreciate your help, thanks buddy. At the moment I'm still in the Python learning process. Being familiar with shell scripting has helped me a great deal; I've been spending quite some time understanding the language structure of Python. Thanks for the reference.

Thank you for your feedback

Hi Jice, I'm still very much involved in working with Python. I figured out how to make my scripts work, and your feedback and support have been invaluable. Recently I realized that, for me, the best way to manage and manipulate files and directories is with glob. I like glob because I can have all files and directories listed in one place, it returns each match together with its path, and it allows wildcards. I haven't posted in the last couple of days because I've been extremely busy; once I catch up with work and the many things I'm trying to do, I'll post more frequently and give you more feedback. If you recall, I was trying to read the directories I didn't want to include from a text file. What I'm trying to do now is the same, but this time using glob, and in my list I want to add/remove both files and directories.

Example of type of data the .txt file will have

+ app\bin
- app\bin\*.txt
+ app\docs
+ app\manuals
- app\manuals\*.xls
- app\source

I really like the way you explain things. Can you please explain how I can iterate through my .txt file and copy or remove entries according to what I have specified (+ include, - don't include) using glob?

Thanks in advance

Another problem I'm having with my list is that I don't know how to tell Python that it contains both files and directories. How do I do that?

You're right... glob is a very convenient way to deal with files and dirs.
But I'm so used to using os.walk and fnmatch that I hardly ever use glob (but I should).
I'll come back to your questions tomorrow, but what I can say about the last one is that you can use os.path.isdir(your_path) (see http://docs.python.org/library/os.path.html),
something like:

for one_line in open(my_file):
    my_glob = one_line.strip()
    if os.path.isdir(my_glob):
        my_glob = os.path.join(my_glob, "*")  # "*" matches every entry; "*.*" would miss names without a dot

The problem I can see with your approach is that I don't think you can exclude "*.txt" files using glob. This may be one reason to use the old-style method I use.
I'll look at that tomorrow.
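
As a rough illustration of the os.path.isdir idea, here is a sketch that expands a pattern with glob and sorts the matches into directories and files. The function name classify and the demo tree are made up, and the paths use "/" separators as on Linux (Python 3 syntax).

```python
import glob
import os
import tempfile


def classify(pattern):
    """Expand a shell-style pattern and split the matches into dirs and files."""
    dirs, files = [], []
    for path in glob.glob(pattern):
        if os.path.isdir(path):
            dirs.append(path)
        elif os.path.isfile(path):
            files.append(path)
    return sorted(dirs), sorted(files)


# Demo tree (hypothetical names).
base = tempfile.mkdtemp()
os.makedirs(os.path.join(base, "app", "bin"))
os.makedirs(os.path.join(base, "app", "docs"))
open(os.path.join(base, "app", "readme.txt"), "w").close()

found_dirs, found_files = classify(os.path.join(base, "app", "*"))
print(found_dirs)
print(found_files)
```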

Thank you so much Jice

First of all, I can't understand why you need to add

- app\source

as all the dirs you want to include are already listed...

Anyway, I don't see how you can meet your need with glob alone, so I'll explain it with the old-fashioned method I use.

Unless you really need your dir list to be formatted as you posted it, I would do this another way:

On each line, you put a dir to be copied (those that don't need to be copied simply don't appear in the file),
then, separated by some character (I'll use ";" because it's easier to see than a tab), the patterns of the files you want to include (+) or exclude (-).
You'll never have both a + and a - on one line: if you choose to select only some files (+), those you want to exclude won't be in the list (normally).
Using this method, you can also add further ';'-separated fields carrying other information.

Note: For my very complicated backup scripts, I even use YAML ini files and put plenty of parameters in them to select very precisely the files I want to back up (drawback: the ini file takes a long time to write).

So here is what the file looks like:

app\bin;-.txt,.log
app\docs
app\manuals;-.xls
app\source;+.py

Notice that the first line lists 2 extension patterns separated by ",", but the separator could be whatever you prefer (if you choose the same one as the first separator, it will be more complicated but still possible, as the split function has a max split parameter).
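
To show how a line in this format could be taken apart, here is a small sketch. parse_line and wanted are hypothetical helper names, the max split parameter mentioned above is used so that only the first ";" separates the dir from the patterns, and the syntax is Python 3.

```python
import os.path


def parse_line(line):
    """Split 'dir;+.ext1,.ext2' into (directory, mode, extensions)."""
    # maxsplit=1: only the first ';' separates the dir from the patterns.
    parts = line.strip().split(";", 1)
    directory = parts[0]
    patterns = parts[1] if len(parts) > 1 else ""
    if not patterns:
        return directory, None, []        # no pattern: every file is wanted
    mode = patterns[0]                    # '+' to include, '-' to exclude
    extensions = patterns[1:].split(",")
    return directory, mode, extensions


def wanted(filename, mode, extensions):
    """Decide whether a file should be copied under the given mode."""
    ext = os.path.splitext(filename)[1]   # e.g. '.txt'
    if mode is None:
        return True
    if mode == "+":
        return ext in extensions
    return ext not in extensions


print(parse_line(r"app\bin;-.txt,.log"))
print(parse_line(r"app\docs"))
print(wanted("notes.txt", "-", [".txt", ".log"]))
```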

And now, the solution

import os
import os.path
import shutil

my_file = 'your_filename_here.txt'
dst_dir = 'where/you/want/to/copy/the/files/'
def go_on():
    for line in open(my_file): # each line is a dir to explore
        # separate the dir from the pattern list; partition also copes with lines that have no ";"
        src_dir, _, patterns = line.strip().partition(";")
        if not src_dir:
            continue # skip blank lines
        pattern_list = patterns[1:].split(",") # get the list of extensions
        for root, dirs, files in os.walk(src_dir): # We go recursively in the tree
            dst_subdir = root.replace(src_dir, dst_dir, 1) # We compute the dest dir name
            if not os.path.exists(dst_subdir):
                os.makedirs(dst_subdir) # create the dir if not exists
            for one_file in files: # we process each file
                file_ext = os.path.splitext(one_file)[1] # extract the extension, e.g. ".txt"
                if (not patterns) \
                        or (patterns[0] == "+" and file_ext in pattern_list) \
                        or (patterns[0] == "-" and file_ext not in pattern_list):
                    # if no pattern is listed for this dir we assume we want all files
                    shutil.copy2(os.path.join(root, one_file), dst_subdir)

Now, you should be able to do what you want.
glob is interesting when you simply want the equivalent of fnmatch.filter(os.listdir(my_path), my_pattern).
When you want more complicated behaviour, I think you need to use os and os.path.

Thank you so much, Jice, for your excellent suggestions. I'm really, really happy to tell you that I figured out how to use glob.glob to get the script to do what I want. I'm cleaning up my function, and when I complete it I'll post it so that you can see how I did it. Thanks.

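Oops, the line has to be stripped before it is split, otherwise the trailing newline stays attached to the last pattern:

src_dir, patterns = line.strip().split(";")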

Hi Jice,

Thanks so much for all your help with my Python question. This thread may be closed because my issue has been solved. If you're interested, here's how I did it. I did learn a few things from you and I'll always be grateful for that. Read my comments below. I'm posting my code because I want to show that, with dedication, Python is easy to learn. I found some great YouTube videos on Python; just search for python.

To all Linux/Unix guys: Python is the best way to slowly get rid of all your shell scripts.

#-1 No explanation needed: this opens the myfile.txt file. I learned this from you.
#-2 Strips either the - or the + from the line read from myfile.txt
#-3 I create a variable with the pattern for all the files I'll be removing
#-4 This is where the magic happens. The glob module has two functions that are extremely useful
#   if you work with Linux or Unix: glob.glob returns a list and glob.iglob returns an iterator of the
#   paths that match a shell-style pattern. *Brilliant!* IMHO.
#   If you're a Linux/Unix person this is a very powerful tool, especially if you work with lots of files.
#-5a,b This took me a while to figure out. I needed os.path.relpath because this function returns
#   the path of the file relative to a start directory (here source_folder), so I can rebuild the same
#   path under dest_folder.
#-6 Because I'm working with multiple files and there's plenty of room for unwanted results, I have to use try
#-7 In Python I discovered that there is no single function that deletes both files and directories:
#   os.remove only removes files and os.rmdir only removes empty directories.
#   So to delete directories recursively I need to use shutil.rmtree.
#-8 For the files I do want, the process is similar, in reverse
#
# It's amazing how once you get things to work, everything looks so simple; it's really hard when you're a beginner, though.
                

import glob
import os
import shutil

def prepare_pkg(def_filename, source_folder, dest_folder):
    for line in open(def_filename):#-1
        if line.startswith('-'): #-2
            rm = line.strip('- \n')
            allrm = os.path.join(source_folder, rm) #-3
            for files in glob.glob(allrm): #-4
                fidir2rm = os.path.join(dest_folder) #-5a
                fidir2rm = os.path.join(fidir2rm, os.path.relpath(files, source_folder)) #-5b
                try: #-6
                    if os.path.isfile(fidir2rm): #-7a
                        print("2Remove :" + fidir2rm)
                        os.remove(fidir2rm)
                    elif os.path.isdir(fidir2rm): #-7b
                        print("2Remove :" + fidir2rm)
                        shutil.rmtree(fidir2rm)
                except (OSError, shutil.Error):
                    pass # ignore entries that cannot be removed
        elif line.startswith('+'): #-8
            cp = line.strip('+ \n')
            allcp = os.path.join(source_folder, cp)
            for fidir2cp in glob.glob(allcp):
                destt = os.path.join(dest_folder)
                destt = os.path.join(destt, os.path.relpath(fidir2cp, source_folder))
                try:
                    if os.path.isdir(fidir2cp):
                        print("2Copy from :" + fidir2cp + " To :" + destt)
                        shutil.copytree(fidir2cp, destt)
                    elif os.path.isfile(fidir2cp):
                        print("2Copy from :" + fidir2cp + " To :" + destt)
                        shutil.copy2(fidir2cp, destt)
                except (OSError, shutil.Error):
                    pass # ignore entries that cannot be copied

# myfile1, source and targPath are set earlier in the script (not shown)
prepare_pkg(myfile1, source, targPath)
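
Two of the building blocks in the function above can be tried in isolation: rebuilding a source path under the destination with os.path.relpath, and deleting something that may be either a file or a directory. The sketch below uses made-up helper names (mirror_path, remove_any) and throwaway directories, in Python 3 syntax; it is only meant to illustrate the idea, not to replace the script.

```python
import os
import shutil
import tempfile


def mirror_path(path, source_folder, dest_folder):
    """Return where `path` (inside source_folder) would live under dest_folder."""
    return os.path.join(dest_folder, os.path.relpath(path, source_folder))


def remove_any(path):
    """Delete a file or a whole directory tree; return True if something was removed."""
    if os.path.isdir(path):
        shutil.rmtree(path)     # recursive delete, works on non-empty directories
        return True
    if os.path.isfile(path):
        os.remove(path)         # files only
        return True
    return False


# Demo on throwaway directories (hypothetical layout).
source = tempfile.mkdtemp()
dest = tempfile.mkdtemp()
os.makedirs(os.path.join(dest, "app", "source"))

target = mirror_path(os.path.join(source, "app", "source"), source, dest)
print(target)
removed = remove_any(target)
print(removed)
```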

Thanks for posting your result.
It's always useful to see the result of searching...
