If you know some Linux scripting, you could write a simple shell script around the diff utility, which outputs the differences between two files. GNU diff can ignore whitespace and blank lines (for example with the -w and -B options), which makes it well suited to the job. The script could accept a threshold value that controls how similar two students' projects must be before they are flagged. Structure it as a pair of nested loops that runs diff on every pair of students' files and prints the names of the pairs with the highest similarity. You can then open those files and check them by hand.
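The nested-loop idea can be sketched roughly as follows. This is only an illustration, and it swaps the diff utility for Python's standard `difflib` module so that the similarity comes out as a single number between 0 and 1. The function names, the 0.8 default threshold, and the `*.c` file pattern are all assumptions, not anything prescribed above.

```python
import difflib
import itertools
from pathlib import Path


def normalize(text):
    """Drop all whitespace inside each line and skip blank lines
    (roughly what diff -w -B would ignore)."""
    lines = []
    for line in text.splitlines():
        squeezed = "".join(line.split())
        if squeezed:
            lines.append(squeezed)
    return lines


def similarity(text_a, text_b):
    """Return a line-based similarity ratio in the range [0.0, 1.0]."""
    matcher = difflib.SequenceMatcher(
        None, normalize(text_a), normalize(text_b), autojunk=False
    )
    return matcher.ratio()


def rank_pairs(sources, threshold=0.8):
    """Compare every pair of submissions once.

    sources maps a file name to its text. Returns (score, name_a, name_b)
    tuples at or above the threshold, most similar first.
    """
    flagged = []
    for (name_a, text_a), (name_b, text_b) in itertools.combinations(
        sorted(sources.items()), 2
    ):
        score = similarity(text_a, text_b)
        if score >= threshold:
            flagged.append((score, name_a, name_b))
    flagged.sort(reverse=True)
    return flagged


def load_sources(directory, pattern="*.c"):
    """Read every matching file in a directory (the pattern is hypothetical)."""
    return {
        path.name: path.read_text(errors="replace")
        for path in Path(directory).glob(pattern)
    }
```

Something like `rank_pairs(load_sources("submissions"), threshold=0.7)` would then list the pairs worth opening by hand. Note that this compares whole lines only, so renamed variables will still lower the score.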
Would it need to run across multiple files in a project?
Would it need to ignore library calls?
On Windows, the fc command can compare two files, but it's not very robust.
Would compiling two programs and comparing the executables be of any benefit?
- That would at least tell you whether only the variable names were changed while the functioning code stayed the same, assuming both programs are built with the same compiler and options.
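One cheap way to try the executable comparison is to hash each compiled binary and group the files whose digests match. A minimal sketch, assuming the binaries were already built with the same compiler and options and without debug information; `file_digest` and `identical_groups` are hypothetical names. Builds that embed timestamps or file paths can differ even for identical source, so a mismatch here proves nothing by itself.

```python
import hashlib
from collections import defaultdict


def file_digest(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def identical_groups(paths):
    """Group paths whose contents are byte-for-byte identical.

    Returns a list of sorted groups, keeping only groups with two or
    more members.
    """
    by_digest = defaultdict(list)
    for path in paths:
        by_digest[file_digest(path)].append(str(path))
    return [sorted(group) for group in by_digest.values() if len(group) > 1]
```

Any group it returns is a set of executables worth tracing back to their source files.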