Hey

I've been asked (and I may add this to my own system as well) to write a shell script that backs up all files in one location, /x, to another location, /y. The catch is that the backup must only be made if at least one file has changed. Example:

/x contains:

a (1 byte)
b (1 byte)
c (1 byte)

/y contains:

nothing

I'll put this script into cron's crontab so that it executes at a set time (say 3). 3 arrives, there is nothing in /y, and the script copies all files in /x to /y. To save space, I think a good idea (this I will do in the script for my own system) is to gzip it all up. Anyway, the current state is now:

/x contains:

a (1 byte)
b (1 byte)
c (1 byte)

/y contains:

a (1 byte)
b (1 byte)
c (1 byte)

3 arrives again, and /y is the same as /x, so nothing changes and nothing happens. But now I open c in /x and type something into it, so that it is:

/x contains:

a (1 byte)
b (1 byte)
c (2 bytes)

This changes things, so when the script executes at 3 it has to delete everything in /y and write it all over again (regardless of the fact that the other files haven't changed).

How do I do this? I imagine something like running ls -a on /x, using cut to pick out certain columns, and checking whether they match /y's file names and file sizes. But I'm not sure how to relate a file name to its file size and, a bit humiliating, I don't really know how to use cut either (I know a little, but still).

If someone could help, thank you very much.

Why don't you just use rsync? This is a significant undertaking when the wheel has already been invented.

Thanks, but I'd rather use a self-made shell script. Any tips?

Yeah .. look at the rsync source for ideas :)

I'd rather not "rip" code from another utility.

I'd just like to make this using a shell script. Please, no more comments suggesting rsync. Thank you :)

Fair enough, I won't badger you anymore :P

OK, well, build a file index of your source and destination directories. Then build a list of the files that exist in only one of the two directories and add or delete them as necessary. For files that exist in both directories, you should use an md5sum check, a file size check, or both, depending on how careful you want to be. Collision rates for MD5 sums are very low (unless malicious and intentional), so you could probably get away with just that.

You could use ls and parse the columns, or use du or stat:

root@svn:/root/backup# du -b backup.sh | sed 's/\([0-9]*\)\(.*\)/\1/'
4252

Just because the size is the same doesn't mean the file has not changed, so also compare the MD5 checksum:

root@svn:/root/backup# md5sum backup.sh  | sed 's/\([a-f0-9]*\)\(.*\)/\1/'
241cca25e509409c1fbd6f1d46ec94e6

Now just automate all of that and you're good to go!
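
For anyone wanting to see what "automate all of that" might look like, here is a rough sketch, not a finished backup tool. It builds a manifest (checksum, size, relative path) for each directory, compares the two, and wipes and recopies the destination only when they differ, as the original question asked. The function names manifest, needs_backup and backup_if_changed are made up for illustration, and the sketch assumes GNU-style md5sum, directory arguments without a trailing slash, and file names without newlines.

```shell
#!/bin/sh
# Sketch only: function names are made up for illustration.

# manifest DIR: print "md5 size relative/path" for every regular file, sorted.
# Assumes DIR has no trailing slash and file names contain no newlines.
manifest() {
    find "$1" -type f | LC_ALL=C sort | while IFS= read -r f; do
        sum=$(md5sum < "$f" | cut -d ' ' -f 1)
        size=$(wc -c < "$f" | tr -d ' ')
        printf '%s %s %s\n' "$sum" "$size" "${f#"$1"/}"
    done
}

# needs_backup SRC DST: succeed (exit status 0) when the two trees differ.
needs_backup() {
    [ "$(manifest "$1")" != "$(manifest "$2")" ]
}

# backup_if_changed SRC DST: wipe DST and recopy everything if anything differs.
backup_if_changed() {
    [ -n "$1" ] && [ -n "$2" ] || return 1   # never run rm -rf on an empty path
    if needs_backup "$1" "$2"; then
        rm -rf "$2" && mkdir -p "$2" && cp -Rp "$1/." "$2/"
    fi
}

# Demo on throwaway directories standing in for /x and /y.
work=$(mktemp -d)
mkdir "$work/x" "$work/y"
printf 'a' > "$work/x/c"
backup_if_changed "$work/x" "$work/y"      # y is empty, so everything is copied
needs_backup "$work/x" "$work/y" || echo "up to date"
printf 'b' > "$work/x/c"                   # same size, different content
needs_backup "$work/x" "$work/y" && echo "backup needed"
rm -rf "$work"
```

The second half of the demo is the case a size-only comparison would miss: c is still 1 byte, but its checksum has changed.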

Thank you for helping :)

Let's analyze everything.
du reports the size of the file it is given, -b shows that size in bytes, and sed cuts off everything after the number. That I understand.

md5sum would actually show me the md5 of backup.sh right? Not the actual files in the backed up file/folder/etc. And why would I cut off those chars if md5 includes letters and numbers?

Also, please re-explain the part about "build a file index of your source directories and destination directories. Then build a list of source/dest files that do not exist in either folders and delete/add them as necessary."

Would I have to do something like:

ls /x > /tmp/xlist
ls /y > /tmp/ylist
if ! cmp -s /tmp/xlist /tmp/ylist; then
....

Is that what you meant?

>>md5sum would actually show me the md5 of backup.sh right? Not the actual files in the backed up file/folder/etc. And why would I cut off those chars if md5 includes letters and numbers?

The expression I gave for md5sum didn't cut anything off the hash; it kept the entire hash and dropped only the file name that follows it. As with du, you will need to run md5sum on every file, so that you end up with a file size and a checksum for each one.
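
Since cut came up earlier in the thread, here is a small sketch of the same two extractions done with cut instead of sed. The helper names file_sum and file_size are hypothetical, and du -b is assumed to be the GNU version, as in the examples above.

```shell
#!/bin/sh
# file_sum and file_size are hypothetical helper names.

# md5sum prints "<hash>  <name>"; keep only the first space-separated field.
file_sum() {
    md5sum "$1" | cut -d ' ' -f 1
}

# du -b (GNU du) prints "<bytes><TAB><name>"; tab is cut's default delimiter.
file_size() {
    du -b "$1" | cut -f 1
}

# Try both helpers on a throwaway file.
tmp=$(mktemp)
printf 'hello\n' > "$tmp"
echo "size: $(file_size "$tmp")"
echo "md5:  $(file_sum "$tmp")"
rm -f "$tmp"
```

cut -d sets the field delimiter and -f picks the field number, which is usually easier to read than a sed substitution for this kind of column picking.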

As for your idea for building the directory listings: yes, that is how you would go about it. However, ls prints out a lot of crap you don't need for this, so you would be better off using find /src/dir -type d to list the directories (and recreate them with mkdir -p), and then find /src/dir -type f to get the files.
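
A possible shape for that two-pass approach, as a sketch only: mirror_tree is a made-up name, and the code assumes the source path has no trailing slash and that file names contain no newlines.

```shell
#!/bin/sh
# mirror_tree SRC DST (hypothetical helper): recreate SRC's directories
# under DST, then copy the regular files into them.
mirror_tree() {
    src=$1
    dst=$2
    # Pass 1: directories first, so every file has a place to land.
    find "$src" -type d | while IFS= read -r d; do
        mkdir -p "$dst${d#"$src"}"
    done
    # Pass 2: regular files; cp -p keeps mode and timestamps.
    find "$src" -type f | while IFS= read -r f; do
        cp -p "$f" "$dst${f#"$src"}"
    done
}

# Demo on a throwaway tree.
work=$(mktemp -d)
mkdir -p "$work/src/sub"
printf 'data' > "$work/src/sub/file"
mirror_tree "$work/src" "$work/dst"
find "$work/dst" -type f
rm -rf "$work"
```

The ${d#"$src"} expansion strips the source prefix from each path, so the remainder can be appended to the destination without ever changing directory.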

I believe /src/dir is the current directory, so I would have to change it. Is there any variable in Linux that shows the current directory? Could I maybe use pwd?

.... you have a long road ahead of you :)

Take your pick

sk@sk:~$ pwd
/home/wheel/sk
sk@sk:~$ echo ${PWD}
/home/wheel/sk

Rome wasn't built in a day, right? Even though I have to get Rome done in a lot fewer days...

AFAIK, pwd is just a command that prints the current path, and $PWD is just a variable that changes each time I'm in a new path.

Correct? So the best way (IMO) would be to simply use the path...

You shouldn't be cd'ing in a shell script to do backups, in my opinion, so that should be irrelevant.

Not sure what you meant by cd'ing... I haven't used cd in anything I've said.

Well, pwd gives you the current directory. In a shell script, if you're not using 'cd', the result of 'pwd' never changes, which raises the question: why use it?
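
A quick way to convince yourself of this, as a sketch with made-up variable names: working on an absolute path leaves the current directory alone, and only cd moves it (here confined to a subshell so the script itself stays put).

```shell
#!/bin/sh
before=$(pwd)
ls -d / > /dev/null          # touching an absolute path does not move us
after_abs=$(pwd)
after_cd=$(cd / && pwd)      # the cd happens inside a subshell only
echo "before:    $before"
echo "after ls:  $after_abs"
echo "inside cd: $after_cd"
echo "now:       $(pwd)"
```

The first, second and last lines print the same directory; only the third prints /. A backup script that always names /x and /y in full therefore has no need for pwd at all.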
