Shell script to detect duplicate files and directory checks

Thread Solved

ViLeNT, post #1, Jul 17th, 2009
Hello,

I'm still fairly new to shell scripting. Can someone please show me how I would go about writing a script to tackle the following task? I've got a server set up.

I want to find duplicate files that are taking up all the space on the hard drive. I want to group the files by size and then by their MD5 checksum, since two files are presumed to be the same if they have the same MD5 checksum.

I want my shell script to generate a list of identical files along with their locations.

Can someone help? I think this will be very useful.

sknake, post #2, Jul 17th, 2009
I actually made a little script once that takes an md5 checksum of the files in most $PATH directories and uploads the checksums to a remote server for intrusion detection. It was more academic than useful, but you want the same concept:

install.sh
  #!/bin/bash
  # Refuse to install until the user confirms with -f that md5.conf was edited.
  if [ "$1" != "-f" ]; then
    echo ""
    echo "Edit md5.conf before you proceed"
    echo "once you are ready to install: "
    echo "$0 -f"
    echo ""
    exit 0
  fi

  ifiles="/usr/sbin/md5check /usr/sbin/md5compare /usr/sbin/md5update /etc/md5.conf"

  # Never overwrite an existing installation.
  for i in $ifiles
  do
    if test -f "$i"; then
      echo "Destination file already exists. Exiting"
      exit 1
    fi
  done

  cp md5check md5compare md5update /usr/sbin
  cp md5.conf /etc

  # Scripts: read and execute for root only. Config: read-only for root.
  chmod 500 /usr/sbin/md5check /usr/sbin/md5compare /usr/sbin/md5update
  chmod 400 /etc/md5.conf

  chown root:root $ifiles

  # Mark the installed files immutable so they cannot be altered silently.
  for i in $ifiles
  do
    chattr +i "$i"
  done

md5.conf
  # md5 tripwire config

  # box hostname
  hname=$(hostname -s)

  # server ip
  sip=1.2.3.4

  # server port
  sport=22

  # login name for remote machine
  lname=sk

  # directories to search (space delimited)
  dsearch="/bin/ /sbin/ /usr/bin/ /usr/sbin/ /lib/ /usr/lib/ /usr/local/ /etc/ /boot/"

md5check:
  #!/bin/bash
  source /etc/md5.conf

  # Database file for today: md5-<hostname>.<YYYYMMDD>.txt
  outfile="md5-$hname.$(date +%Y%m%d).txt"

  # Start from a clean file if one was already generated today.
  if test -f "$outfile"; then
    rm -f "$outfile"
  fi

  echo ""
  echo "Calculating md5 database"

  for dir in $dsearch
  do
    # -print0 and -0 keep file names containing spaces intact.
    find "$dir" -type f -print0 | xargs -0 /usr/bin/md5sum >> "$outfile"
  done

  echo "post installation md5 database calculated"
  echo ""

md5compare
  #!/bin/bash
  source /etc/md5.conf
  if [ "$UID" != "0" ]; then
    echo "ERROR: Must be root to run"
    exit 1
  fi

  # Fetch and unpack the baseline database stored on the remote server.
  oldfile="md5-$hname.txt.bak"
  scp -P "$sport" "$lname@$sip:~/.$oldfile.tgz" . 2>/dev/null
  if test -f ".$oldfile.tgz"; then
    tar -zxf ".$oldfile.tgz"
  fi
  rm -f ".$oldfile.tgz"
  if ! test -f "$oldfile"; then
    echo "Error retrieving md5 database from server"
    exit 1
  fi

  # Generate a fresh database and locate it (the pattern must be quoted).
  md5check
  newfile=$(find ./ -iname "*md5-$hname*.txt")
  if ! test -f "$newfile"; then
    echo "Error generating new md5 database"
    exit 1
  fi

  # Any difference between the two databases is written to "changes".
  diff "$newfile" "$oldfile" > changes
  rm -rf "md5-$hname"* ".md5-$hname"*
  if [ ! -s changes ]; then
    echo "No changes were detected. Cleaning up."
    rm -f changes
  else
    echo "@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@"
    echo "@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@"
    echo " Changes were detected. View _changes_ for details."
    echo "@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@"
    echo "@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@"
  fi

md5update:
  #!/bin/bash
  source /etc/md5.conf
  logname="md5-$hname.$(date +%Y%m%d).txt"
  cpname="md5-$hname.txt.bak"

  # Build today's database, then package it as the new baseline.
  md5check
  mv "$logname" "$cpname"
  tar czf "$cpname.tgz" "$cpname"
  echo "Please hit enter to continue."
  read -r

  # Report success only if the upload actually worked.
  if scp -P "$sport" "$cpname.tgz" "$lname@$sip:~/.$cpname.tgz" 2>/dev/null; then
    echo ""
    echo "File copied to remote host"
    echo ""
  else
    echo "Error copying file to remote host"
  fi
  rm -f "$cpname" "$cpname.tgz"
Scott Knake
Custom Software Development
Apex Software, Inc.
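
The compare step above diffs two checksum listings. As an alternative sketch (not what the posted scripts do), GNU md5sum can verify a saved listing directly with -c. The helper names make_baseline and check_baseline are hypothetical, and the code assumes GNU find, sort, xargs and md5sum. Note that this only reports files that changed or disappeared; files added after the baseline was taken are not noticed.

```shell
#!/bin/bash
# Hypothetical helpers, not part of the scripts posted in this thread.

# make_baseline DIR OUTFILE
# Write "md5  path" lines for every regular file under DIR, in sorted order.
# OUTFILE should live outside DIR so it is not hashed while being written.
make_baseline() {
    find "$1" -type f -print0 | sort -z | xargs -0 -r md5sum > "$2"
}

# check_baseline LISTFILE
# Re-hash every file named in LISTFILE. With --quiet only problems are
# printed (for example "path: FAILED"); the exit status is non-zero if
# any file no longer matches.
check_baseline() {
    md5sum -c --quiet "$1" 2>/dev/null
}
```

For example, make_baseline /usr/sbin /root/sbin.md5 followed later by check_baseline /root/sbin.md5 would print one line per altered file (both paths are placeholders).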

ViLeNT, post #3, Jul 17th, 2009
So basically your script takes files from the specified directories, computes their checksums, and sends them off to another server as your baseline?

Do you think you could help me here? I think my script is much simpler, and I'm very new.

Can you show me a script that will find duplicate files, group them by size and then by their MD5 checksum? I want the script to then generate a list of files along with their locations and either display it on standard output or write it to a text file.

Any ideas? You seem pretty advanced.

sknake, post #4, Jul 17th, 2009
Well, yes, but I was thinking you would take those as an example and formulate a solution. I'm alright with shell scripting, and I'm sure there is a better way than this, but it does what you want:

  sk@sk:~$ cd /tmp
  sk@sk:/tmp$ mkdir -p ./daniweb/dir1/a ./daniweb/dir1/b ./daniweb/dir2/a ./daniweb/dir2/b
  sk@sk:/tmp$ cd daniweb
  sk@sk:/tmp/daniweb$ dd if=/dev/zero of=./dir1/a/file1 bs=1024 count=1024 >> /dev/null 2>&1
  sk@sk:/tmp/daniweb$ dd if=/dev/zero of=./dir2/b/file1 bs=1024 count=1024 >> /dev/null 2>&1
  sk@sk:/tmp/daniweb$ echo "abc123" >> ./dir1/b/file2
  sk@sk:/tmp/daniweb$ echo "abc123" >> ./dir2/a/file2
  sk@sk:/tmp/daniweb$ rm -fr .tmp
  sk@sk:/tmp/daniweb$ for i in `find ./ -type f`; do echo `du ${i} | awk '{ print $1 }'` `md5sum ./dir1/a/file1 | awk '{ print $1 }'` ${i} >> .tmp; done
  sk@sk:/tmp/daniweb$ sort -k1 -r -n .tmp
  1029 b6d81b360a5672d80c27430f39153e2c ./dir2/b/file1
  1029 b6d81b360a5672d80c27430f39153e2c ./dir1/a/file1
  1 b6d81b360a5672d80c27430f39153e2c ./dir2/a/file2
  1 b6d81b360a5672d80c27430f39153e2c ./dir1/b/file2
  sk@sk:/tmp/daniweb$

And the commands without my prompt:
  cd /tmp
  mkdir -p ./daniweb/dir1/a ./daniweb/dir1/b ./daniweb/dir2/a ./daniweb/dir2/b
  cd daniweb
  dd if=/dev/zero of=./dir1/a/file1 bs=1024 count=1024 > /dev/null 2>&1
  dd if=/dev/zero of=./dir2/b/file1 bs=1024 count=1024 > /dev/null 2>&1
  echo "abc123" >> ./dir1/b/file2
  echo "abc123" >> ./dir2/a/file2

  rm -f .tmp
  # Record "size md5 path" for every file. md5sum must hash ${i} itself: the
  # transcript above hashes ./dir1/a/file1 on every pass, which is why all four
  # checksums there are identical.
  for i in `find ./ -type f`; do echo `du "${i}" | awk '{ print $1 }'` `md5sum "${i}" | awk '{ print $1 }'` "${i}" >> .tmp; done
  # Sort by size (largest first), then by checksum, so duplicates end up adjacent.
  sort -k1,1nr -k2,2 .tmp
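
The goal stated at the top of the thread, grouping by size first and only then by MD5, can also be sketched as one function that prints nothing but the duplicates. This is a minimal sketch and not the method posted above: find_dupes is a hypothetical name, and it assumes GNU find (for -printf), md5sum, and file names without tabs or newlines. Only files whose size occurs more than once are hashed, since a file with a unique size cannot have a duplicate.

```shell
#!/bin/bash
# find_dupes [DIR]  (hypothetical helper, not part of the thread's scripts)
# Prints "size md5 path" for every file that has at least one identical twin.
find_dupes() {
    local root=${1:-.}
    # Step 1: emit "size<TAB>path" for every regular file.
    find "$root" -type f -printf '%s\t%p\n' |
    # Step 2: keep only files whose size occurs more than once.
    awk -F'\t' '{ n[$1]++; line[NR] = $0; size[NR] = $1 }
        END { for (i = 1; i <= NR; i++) if (n[size[i]] > 1) print line[i] }' |
    # Step 3: hash the remaining candidates.
    while IFS=$'\t' read -r size path; do
        sum=$(md5sum < "$path" | awk '{ print $1 }')
        printf '%s %s %s\n' "$size" "$sum" "$path"
    done |
    # Step 4: sort by size (largest first), then checksum, then path.
    sort -k1,1nr -k2,2 -k3 |
    # Step 5: keep only size+checksum pairs that appear more than once.
    awk '{ key = $1 " " $2; cnt[key]++; rows[NR] = $0; keys[NR] = key }
        END { for (i = 1; i <= NR; i++) if (cnt[keys[i]] > 1) print rows[i] }'
}
```

Run against the test tree, find_dupes /tmp/daniweb should list the two file1 copies and the two file2 copies; redirecting the output (find_dupes /tmp/daniweb > dupes.txt, with dupes.txt as a placeholder name) gives the text-file variant.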

ViLeNT, post #5, Jul 17th, 2009
Excellent, thank you for your help. Let me study this and I'll let you know if I have any questions or problems. I appreciate your time.

ViLeNT, post #6, Jul 19th, 2009
I also want to make sure users' home directories don't contain world-writable directories, directories owned by other users, or other potential security problems. I'd like to echo any directory where one user's home directory can be modified by another user.

Can someone help me with these additions? I think this would be very important as well.
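
The two checks named here can be sketched with find. This is only an illustration under stated assumptions: check_homes is a hypothetical name, home directories are assumed to sit directly under one base directory (/home by default), the owner of each home directory itself is taken as the expected owner, and GNU find and stat are assumed. Group-writable directories and other problems are not covered.

```shell
#!/bin/bash
# check_homes [BASE]  (hypothetical helper; BASE defaults to /home)
# Reports directories inside each home that are world-writable or that
# belong to someone other than the owner of the home directory.
check_homes() {
    local base=${1:-/home} home uid
    for home in "$base"/*/; do
        [ -d "$home" ] || continue
        home=${home%/}
        # Numeric uid of the home directory's owner (GNU stat).
        uid=$(stat -c %u "$home")
        # Directories that any user may write to.
        find "$home" -type d -perm -0002 -printf 'world-writable: %p\n'
        # Directories owned by a different user than the home's owner.
        find "$home" -type d ! -uid "$uid" -printf 'owned by %u: %p\n'
    done
}
```

Running check_homes as root avoids permission errors when descending into other users' directories.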

sknake, post #7, Jul 20th, 2009
In fact, I have those scripts right next to my md5 generator scripts. If you want to mark this thread solved and start a new thread for your new question, I would be more than willing to assist.

ViLeNT, post #8, Jul 20th, 2009
As you requested, this is marked solved, and here is my new post.

http://www.daniweb.com/forums/post92...tml#post923699