I have an unusual situation: a hard drive developed some bad sectors, so I moved the data to a new drive. I did the move with cp -Rfp, redirecting the output with >& logfile. I moved the data (~285GB) with about 20 copy commands. The >& redirect should have logged any files that could not be copied. There were some, and I restored good copies of those from backups.

The problem is that the old and new drives are not using the same amount of storage now. The original drive is using 305,921,662,976 bytes and the new drive is using 304,556,769,280 bytes. That is a difference of 1,364,893,696 bytes, a bit more than a gigabyte. Neither drive has compression enabled.

I need to figure out whether files are actually missing, and if so, which ones. This is a Windows XP box with Cygwin, so I have more or less the full kit of Linux tools, plus Python, Perl, Ruby, etc., and I don't really care which technology is used. Is there an ls command I can run that will tell me how many files are on the entire partition? That would help, because if the number of files is different, I know for certain that I am missing something. The number of files is in the millions, so this is not something I can realistically try with Windows Explorer.

Thanks for the advice,

LMHmedchem

All 4 Replies

If the original drive was bad, what makes you think that the numbers it's reporting are valid?

If you find that the numbers are good, then I would suggest you start with the list of files that had problems and that you had to replace from backups. Do a diff on them (man diff).

If that does not work then the following will at least tell you where to start looking:

du "${DIR_A}" | sort -n > dir_a.dat
du "${DIR_B}" | sort -n > dir_b.dat
diff -u dir_a.dat dir_b.dat

The differences there will indicate where you should start to look.
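
One caveat with comparing du output from two different drives: every path carries its own drive prefix, and sorting by size can shuffle directories into a different order on each side. A minimal sketch of a variant that avoids both, assuming a POSIX shell as in Cygwin, is below. The name du_by_path is made up for illustration. Keep in mind that du reports allocated space, so the numbers can differ slightly between two filesystems even when the contents are identical.

```shell
# Sketch only: du_by_path is a hypothetical helper, not a standard command.
# It prints "<size in KB><TAB><relative path>" for every directory under $1,
# sorted by path instead of by size, so two listings line up under diff.
du_by_path() {
    # Run du from inside the directory so the output has no drive-specific prefix.
    # Sort on the path (field 2, tab-separated) in the C locale for a stable order.
    ( cd "$1" && du -k . ) | LC_ALL=C sort -t "$(printf '\t')" -k2
}

# Example usage (DIR_A and DIR_B are the old and new locations):
#   du_by_path "$DIR_A" > dir_a.dat
#   du_by_path "$DIR_B" > dir_b.dat
#   diff -u dir_a.dat dir_b.dat
```

A directory that shows up in only one listing, or with a clearly different size, is where to start looking.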

Thanks for the tip. I am not at all sure about the numbers from the bad drive, but I don't feel I can just assume that is the issue. There were about 30 files in total that I needed to restore from backup. Most of these had a file size of 0 on the bad drive, but they were very small files in general. I could check that, but I would be very surprised if the total was more than a MB or so. There is nothing there that would account for a GB, if that number is real.

I will try the shell commands you posted and see what they come up with. I should have backups for everything, but I would like to check thoroughly before I DBAN the drive and send it back.

LMHmedchem

I have attached the final diff file. It is quite large and I am not sure how to interpret it.

LMHmedchem

The diff indicates the changes between the two directories. The first two lines tell you which marker belongs to which file: lines beginning with - come from the first file and lines beginning with + come from the second.

After looking at the file, though, the data needs more massaging. Sorting by file size probably impaired diff's ability to line up matching entries in this case. Another way to get the file sizes with names in a uniform order is ls -laR.
Perhaps something along the lines of:

ls -laR "${DIR_A}" > ls.dir1.txt
ls -laR "${DIR_B}" > ls.dir2.txt
diff -U0 ls.dir1.txt ls.dir2.txt

That will recursively list all files (with the sizes). The diff, if there isn't too much difference between the two, should indicate where you have differences in the listings.
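
Since the original question was also about counting files, here is a rough sketch along the same lines that prints one line per file with its relative path and size, so the listings can be both counted and diffed. It assumes GNU find (the -printf option), which is what Cygwin ships. The name list_files is made up for illustration.

```shell
# Sketch only: list_files is a hypothetical helper, not a standard command.
# It prints "<relative path><TAB><size in bytes>" for every regular file
# under $1, sorted by path. Requires GNU find for -printf.
list_files() {
    # %P is the path with the starting directory stripped, %s is the size in bytes.
    find "$1" -type f -printf '%P\t%s\n' | LC_ALL=C sort
}

# Example usage (DIR_A and DIR_B are the old and new locations):
#   list_files "$DIR_A" > files_a.txt
#   list_files "$DIR_B" > files_b.txt
#   wc -l files_a.txt files_b.txt    # file counts (assumes no newlines in file names)
#   diff files_a.txt files_b.txt     # files missing on one side, or sizes that differ
```

Because the paths are relative, a file that copied cleanly produces an identical line on both sides, and only missing files or size mismatches appear in the diff.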

There is also the option of exploring rsync, since it is designed to find differences between two directory trees. I don't know the tool well enough to answer whether it can just indicate differences without acting on them, so you'll have to explore that a bit yourself.
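
For what it's worth, rsync does have a dry-run mode (-n, or --dry-run) that reports what it would transfer without touching either side. A minimal sketch, assuming rsync is installed under Cygwin and that comparing by file size alone is good enough for a first pass, is below. The name rsync_diff is made up for illustration.

```shell
# Sketch only: rsync_diff is a hypothetical helper, not part of rsync.
# It lists what rsync WOULD copy from $1 to $2; nothing is actually changed.
rsync_diff() {
    # -r  recurse into directories
    # -n  dry run: report only, copy nothing
    # -i  itemize each file that would be transferred
    # --size-only  treat files with equal sizes as identical (ignore timestamps)
    rsync -rni --size-only "$1"/ "$2"/
}

# Example usage (DIR_A is the old drive, DIR_B the new one):
#   rsync_diff "$DIR_A" "$DIR_B" > rsync_report.txt
```

Files that exist only on the old drive, or whose size differs on the new one, should each get a line in the report. Dropping --size-only makes rsync compare modification times as well.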
