Shell script to detect duplicate files and directory checks
Join Date: Apr 2009
Posts: 19
Reputation:
Solved Threads: 0
Hello,
I'm still fairly new at shell scripting. Can someone please show me how I would go about writing a script to tackle these tasks? I have a server set up.
I want to find the duplicate files that are taking up all the space on the hard drive. I want to group the files by size and then by their MD5 checksum, since two files are presumed to be identical if they have the same MD5 checksum.
I want my shell script to generate a list of the identical files along with their locations.
Can someone help? I think this will be very useful.
I actually made a little script once that takes an MD5 checksum of the files in most $PATH directories and uploads the results to a remote server for intrusion detection. It was more academic than useful, but you want the same concept:
It consists of five files: install.sh, md5.conf, md5check, md5compare and md5update.
install.sh
```bash
#!/bin/bash
# Refuse to run until the user confirms md5.conf has been edited
if [ "$1" != "-f" ]; then
    echo ""
    echo "Edit md5.conf before you proceed"
    echo "Once you are ready to install, run:"
    echo "$0 -f"
    echo ""
    exit 1
fi

ifiles="/usr/sbin/md5check /usr/sbin/md5compare /usr/sbin/md5update /etc/md5.conf"

# Never overwrite an existing installation
for i in $ifiles
do
    if test -f "$i"; then
        echo "Destination file already exists. Exiting"
        exit 1
    fi
done

cp md5check md5compare md5update /usr/sbin
cp md5.conf /etc

# Scripts: read/execute for root only; config: read-only for root
chmod 500 /usr/sbin/md5check /usr/sbin/md5compare /usr/sbin/md5update
chmod 400 /etc/md5.conf
chown root:root $ifiles

# Mark the files immutable so they cannot be modified, renamed or
# deleted until the flag is cleared with chattr -i
for i in $ifiles
do
    chattr +i "$i"
done
```
md5.conf
```bash
# md5 tripwire config
# box hostname
hname=`hostname -s`
# server ip
sip=1.2.3.4
# server port
sport=22
# login name for remote machine
lname=sk
# directories to search (space delimited)
dsearch="/bin/ /sbin/ /usr/bin/ /usr/sbin/ /lib/ /usr/lib/ /usr/local/ /etc/ /boot/"
```
md5check
```bash
#!/bin/bash
source /etc/md5.conf

# Today's database, e.g. md5-myhost.20090715.txt
outfile=$(date +"md5-$hname.%Y%m%d.txt")

# Start fresh if a database from today already exists
if test -f "$outfile"; then
    rm -f "$outfile"
fi

echo ""
echo "Calculating md5 database"
for dir in $dsearch
do
    # -print0 / -0 keep file names containing spaces intact
    find "$dir" -type f -print0 | xargs -0 /usr/bin/md5sum >> "$outfile"
done
echo "post installation md5 database calculated"
echo ""
```
md5compare
```bash
#!/bin/bash
source /etc/md5.conf

if [ "$UID" != "0" ]; then
    echo "ERROR: Must be root to run"
    exit 1
fi

# Fetch the baseline database that md5update stored on the remote server
oldfile="md5-$hname.txt.bak"
scp -P "$sport" "$lname@$sip:~/.$oldfile.tgz" . 2>/dev/null
if test -f ".$oldfile.tgz"; then
    tar -zxf ".$oldfile.tgz"
fi
rm -f ".$oldfile.tgz"
if ! test -f "$oldfile"; then
    echo "Error retrieving md5 database from server"
    exit 1
fi

# Build today's database (md5check writes md5-<host>.<YYYYMMDD>.txt)
md5check
newfile=$(date +"md5-$hname.%Y%m%d.txt")
if ! test -f "$newfile"; then
    echo "Error generating new md5 database"
    exit 1
fi

diff "$newfile" "$oldfile" > changes
rm -f md5-"$hname"* .md5-"$hname"*
if [ ! -s changes ]; then
    echo "No changes were detected. Cleaning up."
    rm -f changes
else
    echo "@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@"
    echo "@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@"
    echo " Changes were detected. View _changes_ for details."
    echo "@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@"
    echo "@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@"
fi
```
md5update
```bash
#!/bin/bash
source /etc/md5.conf
logname=$(date +"md5-$hname.%Y%m%d.txt")
cpname="md5-$hname.txt.bak"

# Generate a fresh database and package it as the new baseline
md5check
mv "$logname" "$cpname"
tar czf "$cpname.tgz" "$cpname"

echo "Please hit enter to continue."
read -r
# Only report success if scp actually succeeded
if scp -P "$sport" "$cpname.tgz" "$lname@$sip:~/.$cpname.tgz" 2>/dev/null; then
    echo ""
    echo "File copied to remote host"
    echo ""
else
    echo "Error copying file to remote host"
fi
rm -f "$cpname" "$cpname.tgz"
```
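As an alternative to md5compare's approach of generating a second database and diffing it against the baseline, GNU md5sum can re-check a saved database directly with its -c option. Below is a minimal sketch, assuming GNU coreutils and findutils; the function names build_db and verify_db and the build/verify subcommands are hypothetical, not part of the scripts above.

```bash
#!/bin/bash
# Sketch (assumes GNU coreutils and findutils): instead of building a
# second database and diffing it against the baseline, re-check the
# baseline in place with `md5sum -c`.
# Usage: ./md5verify.sh build DIR DB   write a database for DIR into DB
#        ./md5verify.sh verify DB      re-hash every file listed in DB

# build_db DIR DB: record "md5  path" for every regular file under DIR.
build_db() {
    # -print0 / -0 keep names with spaces intact; -r skips md5sum if nothing is found
    find "$1" -type f -print0 | xargs -0 -r md5sum > "$2"
}

# verify_db DB: print changed or missing files; return 1 if there are any.
verify_db() {
    # --quiet suppresses the per-file "OK" lines, so only "FAILED" lines remain
    if md5sum --quiet -c "$1"; then
        echo "No changes were detected."
        return 0
    fi
    echo "Changes were detected."
    return 1
}

case "${1:-}" in
    build)  build_db "$2" "$3" ;;
    verify) verify_db "$2" ;;
    *)      echo "usage: $0 build DIR DB | verify DB" >&2 ;;
esac
```

Because md5sum stores the path exactly as find printed it, building the database from absolute directories (as dsearch does) lets the check run from any working directory.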
So basically your script takes the files in the specified directories, computes their checksums and sends them off to another server as your baseline?
Do you think you could help me here? I think my script will be much simpler; I'm very new.
Can you show me a script that will find duplicate files and group them by size and then by their MD5 checksum? I want the script to then generate a list of the files along with their locations, and either display it on standard output or write it to a text file.
Any ideas? You seem pretty advanced.
Well, yes, but I was thinking you would take those as an example and formulate a solution. I'm alright with shell scripting and I'm sure there is a better way to do this, but this does what you want:
```
sk@sk:~$ cd /tmp
sk@sk:/tmp$ mkdir -p ./daniweb/dir1/a ./daniweb/dir1/b ./daniweb/dir2/a ./daniweb/dir2/b
sk@sk:/tmp$ cd daniweb
sk@sk:/tmp/daniweb$ dd if=/dev/zero of=./dir1/a/file1 bs=1024 count=1024 >> /dev/null 2>&1
sk@sk:/tmp/daniweb$ dd if=/dev/zero of=./dir2/b/file1 bs=1024 count=1024 >> /dev/null 2>&1
sk@sk:/tmp/daniweb$ echo "abc123" >> ./dir1/b/file2
sk@sk:/tmp/daniweb$ echo "abc123" >> ./dir2/a/file2
sk@sk:/tmp/daniweb$ rm -fr .tmp
sk@sk:/tmp/daniweb$ for i in `find ./ -type f`; do echo `du ${i} | awk '{ print $1 }'` `md5sum ./dir1/a/file1 | awk '{ print $1 }'` ${i} >> .tmp; done
sk@sk:/tmp/daniweb$ sort -k1 -r -n .tmp
1029 b6d81b360a5672d80c27430f39153e2c ./dir2/b/file1
1029 b6d81b360a5672d80c27430f39153e2c ./dir1/a/file1
1 b6d81b360a5672d80c27430f39153e2c ./dir2/a/file2
1 b6d81b360a5672d80c27430f39153e2c ./dir1/b/file2
sk@sk:/tmp/daniweb$
```
Note: in this session the loop runs md5sum on ./dir1/a/file1 on every pass instead of on ${i}, so all four rows show the checksum of the 1 MiB zero file. Hash "${i}" to get each file's own checksum.
And the commands without my prompt:
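```bash
cd /tmp
mkdir -p ./daniweb/dir1/a ./daniweb/dir1/b ./daniweb/dir2/a ./daniweb/dir2/b
cd daniweb
# Two identical 1 MiB files and two identical small files
dd if=/dev/zero of=./dir1/a/file1 bs=1024 count=1024 >> /dev/null 2>&1
dd if=/dev/zero of=./dir2/b/file1 bs=1024 count=1024 >> /dev/null 2>&1
echo "abc123" >> ./dir1/b/file2
echo "abc123" >> ./dir2/a/file2
rm -f .tmp
# One line per file: "<disk usage in 1K blocks> <md5 of that file> <path>"
# (.tmp is excluded so the output file is never hashed itself)
find ./ -type f ! -name .tmp | while read -r i; do
    echo "$(du "$i" | awk '{ print $1 }')" "$(md5sum "$i" | awk '{ print $1 }')" "$i" >> .tmp
done
# Largest first, then by checksum, so identical files end up on adjacent lines
sort -k1,1nr -k2,2 .tmp
```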
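The loop above only sorts the files; it does not actually group them or drop files that have no duplicate. Here is a sketch of the size-then-MD5 grouping asked for in this thread, assuming GNU find (for -printf) and file names without tabs or newlines. The function name find_dups and the report format are illustrative choices, not part of the original answer. Hashing only files whose exact byte size is shared by another file keeps the expensive MD5 step off files that cannot have a duplicate.

```bash
#!/bin/bash
# Sketch (assumes GNU find for -printf, and file names without tabs or
# newlines): list groups of identical files under DIR.
# Files are first grouped by exact size in bytes; only sizes shared by two
# or more files are hashed, since files of different sizes cannot match.
# Usage: ./dupfind.sh DIR

find_dups() {
    local dir="$1"
    # "size<TAB>path" for every non-empty regular file
    find "$dir" -type f ! -empty -printf '%s\t%p\n' |
    # Keep only files whose size occurs more than once
    awk -F'\t' '{ size[NR] = $1; path[NR] = $2; count[$1]++ }
        END { for (i = 1; i <= NR; i++) if (count[size[i]] > 1) print size[i] "\t" path[i] }' |
    # Prepend the MD5 of each candidate: "md5<TAB>size<TAB>path"
    while IFS=$'\t' read -r size path; do
        printf '%s\t%s\t%s\n' "$(md5sum < "$path" | cut -d' ' -f1)" "$size" "$path"
    done |
    # Largest files first, then by checksum, so identical files are adjacent
    LC_ALL=C sort -t $'\t' -k2,2nr -k1,1 -k3,3 |
    # Print every checksum shared by two or more files as one group
    awk -F'\t' '
        function flush() {
            if (n > 1) printf "%s (%s bytes, %d copies)\n%s\n", prev, size, n, list
        }
        $1 != prev { flush(); prev = $1; n = 0; list = "" }
        { n++; size = $2; list = list "  " $3 "\n" }
        END { flush() }'
}

if [ $# -gt 0 ]; then
    find_dups "$1"
fi
```

Running ./dupfind.sh /tmp/daniweb prints one block per set of identical files, largest first; ./dupfind.sh /home > duplicates.txt writes the same report to a text file instead of standard output.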
I also want to make sure users' home directories don't contain world-writable directories, directories owned by other users, or other potential security problems. I'd like to print any directory where one user's home directory can be modified by another user.
Can someone help me with these additions? I think this would be very important as well.
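A minimal sketch of such a check, assuming GNU find, that home directories are looked up with getent passwd, and that regular accounts have UID 1000 or above (a common Linux default, not a universal rule). check_home and the all/USER arguments are hypothetical names chosen for illustration. It flags directories that are world-writable, group-writable, or not owned by the home's owner; a group-writable directory is only a problem if other users belong to that group.

```bash
#!/bin/bash
# Sketch (assumes GNU find for -printf): report directories inside a home
# directory that someone other than its owner may be able to modify.
# Usage: ./homecheck.sh all    check every account with UID >= 1000
#        ./homecheck.sh USER   check a single account
# Run as root so that every home directory can be read.

# check_home USER DIR: print suspicious directories under DIR (DIR included).
check_home() {
    local user="$1" dir="$2"
    # o+w bit set: any user can add, remove or rename entries in it
    find "$dir" -type d -perm -0002 -printf 'world-writable: %p\n' 2>/dev/null
    # g+w bit set: every member of the directory's group can modify it
    find "$dir" -type d -perm -0020 -printf 'group-writable (%g): %p\n' 2>/dev/null
    # Directory owned by an account other than the home's owner
    find "$dir" -type d ! -user "$user" -printf 'owned by %u: %p\n' 2>/dev/null
}

case "${1:-}" in
    "")
        echo "usage: $0 all|USER" >&2
        ;;
    all)
        # passwd fields: name:password:uid:gid:gecos:home:shell
        getent passwd | while IFS=: read -r name _ uid _ _ home _; do
            if [ "$uid" -ge 1000 ] && [ -d "$home" ]; then
                check_home "$name" "$home"
            fi
        done
        ;;
    *)
        home=$(getent passwd "$1" | cut -d: -f6)
        if [ -n "$home" ] && [ -d "$home" ]; then
            check_home "$1" "$home"
        else
            echo "No home directory found for $1" >&2
        fi
        ;;
esac
```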
In fact, I have those scripts right next to my md5 generator scripts. If you mark this thread solved and start a new thread for your new question, I would be more than willing to assist.
As you requested, this is marked solved, and here is my new post:
http://www.daniweb.com/forums/post92...tml#post923699