Hi everyone,

I'm a relative beginner at writing UNIX scripts. In the past, I've been able to hack together simple scripts. Now I need a script which is a little more complex than I'm used to, and I really need help. I'm up against a tight deadline and am growing desperate, as I can't seem to find a solution either on the web or in my UNIX programming book.

Here's the problem: I'm on a SunOS system. On the machine, I have a large number of files scattered across a vast directory structure. I have to copy all those target files into my home directory. Luckily, the directory structure is well-organized. It looks like this:

/root/projects/*ARCHIVE*/date/*SUBARCHIVE*/output/*SUBSUBARCHIVE*/The_Files_I_Need

The directories in lowercase have constant names - I don't have to worry about them ever changing. But the directories I've named with *ALLCAPS* do change names. Think of them as wild cards.

Put another way: If I wanted to exhaustively list every directory, the top tier would look like this:

/root/projects/PROJECT01/
/root/projects/PROJECT02/
/root/projects/PROJECT03/
/root/projects/PROJECT04/
...
/root/projects/PROJECT20/

The next tier would look like this:

/root/projects/PROJECT01/date/JAN2000/
/root/projects/PROJECT01/date/FEB2000/
/root/projects/PROJECT01/date/MAR2000/
...
/root/projects/PROJECT01/date/MAY2010/
/root/projects/PROJECT02/date/JAN1995/
...
/root/projects/PROJECT20/date/DEC2005/

And the next tier would look like this:

/root/projects/PROJECT01/date/JAN2000/output/SAMPLE0001/
/root/projects/PROJECT01/date/JAN2000/output/SAMPLE0002/
/root/projects/PROJECT01/date/JAN2000/output/SAMPLE0003/
...
/root/projects/PROJECT01/date/JAN2000/output/SAMPLE1328/
...
/root/projects/PROJECT20/date/DEC1995/output/SAMPLE483822/

And so on. The target files I ultimately need to read are in those final directories.

/root/projects/PROJECT01/date/JAN2000/output/SAMPLE0001/TARGET_102932
/root/projects/PROJECT01/date/JAN2000/output/SAMPLE0001/TARGET_32323
/root/projects/PROJECT01/date/JAN2000/output/SAMPLE0001/TARGET_32999293
...

There are literally thousands of these target files, all with dynamic names.

So the problem I'm having is that I can't just do a "cp /root/projects/*/date/*/output/*/*" because the pathnames become too long. I can't hardwire the directory names I don't know because there are obviously too many of them. I've been experimenting with code, but my results have been frankly pitiful. I'm sure there's some way of doing this as a loop-within-a-loop-within-a-loop... but I can't figure out how to do it.

Here's the quasi-code I've been trying to get to work:

==============================================================================

#!/bin/bash

# create tmp directory into which I'll copy the files
mkdir ${HOME}/TMP


# jump into first common directory, start to drill down
cd /root/projects
for i in PROJECT01 PROJECT02 PROJECT03 (...) PROJECT20
do
  cd $i/date
  ls > SUBARCHIVE_LIST              #how to dynamically store the *SUBARCHIVE* values?
  for j in SUBARCHIVE_LIST
    do
      cd $j/output
      ls > SUBSUBARCHIVE_LIST       #same problem here!
      for k in SUBSUBARCHIVE_LIST
        do
          cd $k
          cp * ${HOME}/TMP          #here I copy the files
        done
    done
done

==============================================================================

Can anyone help? I hope so! I'm hoping this is a relatively easy problem for you experienced folks.


PS - sorry for the very long text; I try to be precise

Last Post by phummon

Break it into steps

1. Find files

find /root/projects/PROJECT* -type f > $HOME/allProjectFiles.txt

Gets you something like this

$ cat $HOME/allProjectFiles.txt
/root/projects/PROJECT01/date/JAN2000/output/SAMPLE0001/TARGET_102932
/root/projects/PROJECT02/date/JAN2001/output/SAMPLE0002/TARGET_32323
/root/projects/PROJECT03/date/JAN2001/output/SAMPLE0003/TARGET_32999293
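
One detail worth checking here: find prints every path with the starting point exactly as you typed it, so a relative starting point gives relative paths and an absolute one gives absolute paths. That matters later, because the awk field numbers and the cp commands both depend on which form ends up in the list. Below is a small sketch that shows the difference on a throwaway tree. The scratch directory, the variable names (work, relative, absolute) and the file TARGET_1 are made up for the demo, and it assumes mktemp is available.

```shell
#!/bin/bash
# Build a one-file miniature of the layout in a scratch directory (hypothetical names).
work=$(mktemp -d)
mkdir -p "$work/projects/PROJECT01/date/JAN2000/output/SAMPLE0001"
touch "$work/projects/PROJECT01/date/JAN2000/output/SAMPLE0001/TARGET_1"

# Relative starting point: the paths start at PROJECT01.
# The cd happens inside the $( ) subshell, so the caller's directory is unchanged.
relative=$(cd "$work/projects" && find PROJECT* -type f)

# Absolute starting point: the paths start at the scratch directory's root.
absolute=$(find "$work"/projects/PROJECT* -type f)

echo "relative: $relative"
echo "absolute: $absolute"
```

The absolute form is the safer one to store in a file, since the list stays valid no matter which directory you are in when you read it back.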

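2. Extract ones of interest

$ awk -F'/' '$6 ~ /JAN2001/' $HOME/allProjectFiles.txt > $HOME/selectedProjectFiles.txt

Now you can make the condition (the part between the single quotes) as complicated as you want.
You can match literal strings or regular expressions. For example,

$4 == "PROJECT01" && $6 ~ /JAN200[0-4]/

selects the PROJECT01 files for January in the first half of the decade. With -F'/' and absolute paths like the ones listed above, the field before the leading slash is empty, so the project name is $4 and the month is $6.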
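
If you're unsure which field number holds which directory, it may help to run a single path through awk and print a few fields before writing the real condition. This is only a sketch: the path is one of the sample lines from the listing in step 1, and the variable names (path, fields, matched) are made up for the demo.

```shell
#!/bin/bash
# One absolute path from the sample listing. With -F'/' the text before the
# leading slash is an empty field 1, so "root" is field 2.
path=/root/projects/PROJECT01/date/JAN2000/output/SAMPLE0001/TARGET_102932

# Print the project, month and sample fields with their numbers.
fields=$(printf '%s\n' "$path" | awk -F'/' '{ print "4=" $4, "6=" $6, "8=" $8 }')
echo "$fields"

# A combined condition: a literal match on the project, a regex on the month.
# awk prints the whole line when the condition is true.
matched=$(printf '%s\n' "$path" | awk -F'/' '$4 == "PROJECT01" && $6 ~ /JAN200[0-4]/')
echo "$matched"
```

If the list holds relative paths instead (starting at PROJECT01), every field number shifts down by three.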

3. Copy files

$ cat $HOME/selectedProjectFiles.txt | while read line ; do echo cp $line ${HOME}/TMP ; done

When you're happy that the printout of copy commands looks good, just delete the echo and it will actually do the copying.
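
For completeness, the loop-within-a-loop idea from the original question can also be made to work by letting the shell expand the wildcards in the for lines, instead of saving ls output to files. This is a rough sketch, not a tested solution for the real machine: it builds a two-file miniature of the tree in a scratch directory so it can be run anywhere, and the names work, src, dest, TARGET_1 and TARGET_2 are invented for the demo. On the real system, src would be /root/projects and dest would be ${HOME}/TMP.

```shell
#!/bin/bash
# Scratch copy of the layout (hypothetical names), assuming mktemp is available.
work=$(mktemp -d)
src=$work/projects
dest=$work/TMP
mkdir -p "$src/PROJECT01/date/JAN2000/output/SAMPLE0001" \
         "$src/PROJECT02/date/FEB2001/output/SAMPLE0002" \
         "$dest"
touch "$src/PROJECT01/date/JAN2000/output/SAMPLE0001/TARGET_1" \
      "$src/PROJECT02/date/FEB2001/output/SAMPLE0002/TARGET_2"

# Each for line expands one wildcard level, so no list files and no cd
# are needed, and each cp call only receives one directory's files.
for project in "$src"/PROJECT*; do
  for month in "$project"/date/*; do
    for sample in "$month"/output/*; do
      [ -d "$sample" ] || continue   # skip unmatched patterns and plain files
      cp "$sample"/* "$dest"/
    done
  done
done

# Count what arrived in the destination directory.
copied=$(ls "$dest" | wc -l | tr -d ' ')
echo "copied $copied files"
```

Two caveats apply to this and to the cp loop in step 3: files with the same name in different directories will overwrite each other in the single destination directory, and cp will print an error for any sample directory that is empty.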

Outstanding! Thank you so much, this is exactly what I hoped for! :)
