Made a small project to help compare two files that I am testing...but for some reason the line I highlighted isn't working as its supposed to. I originally had

c1 != c2

and then changed it to

!strcmp(c1, c2)

Maybe its a dumb logic bug, but would appreciate some help :)

// Compare two files to see if the contents exactly match

#include <iostream>
#include <fstream>
#include <string>
#include <cstring>

int main(int argc, char *argv[])
{
  std::string error1 = "Not enough parameters passed.";
  std::string error2 = "Invalid file name passed.";
  std::string error3 = "File sizes do not match.";
  std::string error4 = "Files have different contents.";
  std::cout << "Running comparison program." << std::endl;
  try 
    {
      if (argc != 3)
	throw error1;
      std::cout << "Filename 1: " << argv[1] << std::endl;
      std::cout << "Filename 2: " << argv[2] << std::endl;
      std::ifstream inFile1(argv[1], std::ios::in);
      std::ifstream inFile2(argv[2], std::ios::in);
      if (!inFile1.is_open())
	throw error2;
      if (!inFile2.is_open())
	throw error2;
      inFile1.seekg(0, std::ios::end);
      std::streamsize len1 = inFile1.tellg();
      inFile1.seekg(0, std::ios::beg);
      inFile2.seekg(0, std::ios::end);
      std::streamsize len2 = inFile2.tellg();
      inFile2.seekg(0, std::ios::beg);
      if (len1 != len2)
	throw error3;
      char *c1 = new char;
      char *c2 = new char;
      char *cArray1 = new char[20];
      char *cArray2 = new char[20];
      inFile1.read(c1, 1);
      inFile2.read(c2, 1);
	  /***** ERROR HERE *****/
      if ( !strcmp(c1, c2) )
	{
	  std::cout << "c1: " << c1 << " c2: " << c2 << std::endl;
	  throw error4;
	}
      for (std::streamsize i = 1; i < len1; i++)
	{
	  delete c1;
	  delete c2;
	  c1 = new char;
	  c2 = new char;
	  inFile1.read(c1, 1);
	  inFile2.read(c2, 1);
	  if (c1 != c2)
	    {
	      i -= 20;
	      inFile1.read(cArray1, 20);
	      inFile2.read(cArray2, 20);
	      for (int j = 0; j < 20; j++)
		std::cout << cArray1[j];
	      std::cout << std::endl;
	      for (int j = 0; j < 20; j++)
		std::cout << cArray2[j];
	      std::cout << std::endl;
	      throw error4;
	    }
	}
      delete c1;
      delete c2;
      inFile1.close();
      inFile2.close();
      std::cout << "The two files match exactly." << std::endl;
      return 0;
    }
  catch (std::string e)
    {
      std::cout << "Exception: " << e << std::endl;
    }
}

Recommended Answers

All 9 Replies

The strcmp() function requires c-strings which must be terminated with '\0'. What your passing to strcmp() is a char pointer to one character.

c-strings

char *str1 = "this is a c string";//compiler adds the '\0'
char str2[] = "this is another c string";//again the compiler adds the '\0'
char str3[] = {'a','b','c','\0'};//another c string
char str4[] = {'a','b','c'};//not a c string no terminating '\0'

You want line 42: if(*c1 != *c2) . The read method does not append trailing nulls, so your current version cannot work; and of course c1 and c2 are pointers, so comparing them directly is useless for your task.

You might want to consider using method get() (with no arguments) which returns static_cast<int>([I]one-byte[/I]) so you don't have to dereference to compare.

Finally, I'll suggest that a double loop, reading blocks of data at a time rather than just one byte at a time might be more efficient on real files (YMMV).

My question is why are you using dynamic memory to create a pointer then a single character? Unless this is a required part of an assignment, it's ludicrous.

Not an assignment, just a simple thing i did on my own, guess ur right doing 1 byte is stupid, and i just realized as well that i am indeed using a pointer so i have to dereference.

What would be a good way of using a buffer? the idea has always confused me slightly.

What would be a good way of using a buffer? the idea has always confused me slightly.

Start by defining a need for the buffer. If you need a character, you don't need a buffer.

It also depends on what you mean by buffer. I'm not sure of your definition, but it doesn't seem to be what most of us would call a buffer.

Using a buffer, in very broad strokes, and obviously not checked for syntax etc.

unsigned int size = 1000;
buffer1 = char[size];
buffer2 = char[size];
// open files, etc
still_more_characters = 1;
while (still_more_characters) {
  // deal with last characters in the file: fewer than 1000, eh?
  count1 = read_into_buffer1; // do this however you like
  count2 = read_into_buffer2; // ditto
  if(count1 != count2) { 
    fail(); 
  }
  for(int i = 0; i < count1; ++i) {
    // unset still_more_characters and break on mismatch
    handle buffer1[i] and buffer2[i]; 
  }
  if (count1 < size) {
    still_more_characters = 0;
  }
}

You can do multi-byte comparisons by casting the buffer contents to, say, unsigned long. That way you get sizeof(unsigned long) bytes compared all at once. Of course that makes the last (odd number of) bytes a little more trouble. (Think about padding with "enough" zeros on the last round)

So I made the file right now so that it reads the entire files into one char array....but I was originally trying to make this program memory efficient, for instance if I had to use two 50 mb file comparisons....thats 100 mb of memory being used.

Would it be possible to use 100 char array chunks of the readings? And if so lets say the files did not have multiples of 100, so lets say in the last reading of the file there is only 45 bytes to read in and not 100, what happens with the extra 55 bytes of allocated memory, if i try to read over those values and match them will the program crash? Or since nothing is in that memory the program will say that they are equal (aka null)? Ty :) Learning a lot from this small project about memory management and comparing files.

In practice, probably, if the wind is from the correct quarter, you can go ahead and check past the end of the last chunk of data in your buffers without causing havoc. But if you do that, you will be depending on things that are not guaranteed by the language; and you may get anything from core dumps to quietly incorrect results... possibly some of each, and possibly depending on things that you cannot reasonably predict. Quietly produced wrong answers are the worst sort of error because you don't know you got the wrong answer... One thing you can pretty much depend on is that the un-initialized bytes aren't null.

Rather, do what I suggested at lines 8 and 9 of my pseudocode: Use a system call that puts data into the buffer AND returns the number of bytes it put. (There is a reason these functions work that way: It is very useful!). Then loop over the number of bytes that were read in.

100 is a very small buffer size. I'm sure the OS is buffering in chunks bigger than that. I suggest no fewer than 2^10 (1024) bytes. You may get very slightly more efficient behavior if your buffers are a nice round binary number. If you can discover what size buffers the filesystem uses, go thou and do likewise, eh?

If the files are in fact identical, wouldn't they be identical 100 bytes at a time? And if you use the proper inputs, you will know exactly how many bytes were read, so your 45 byte problem isn't a problem.

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.