A bare-bones code snippet to remove duplicate lines from a file. There are a number of ways to accomplish this task, but this is a fast and dependable method that uses Perl's in-place editing and a simple hash to get the job done.

This probably should not be used for really big files, because the hash keeps every unique line in memory, but files with a few thousand lines or even a few tens of thousands of lines should be OK. The bigger the file, the longer it may take to run.

Comments
I was looking for an example of using inplace editing without having to put the filename on the command line and there it was: local anonymous block and local @ARGV. Thanks.
#!/usr/bin/perl

use strict;
use warnings;

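my $file = '/path/to/file.txt';
my %seen = ();   # lines already printed
{
   # localizing @ARGV and $^I confines the in-place edit to this block
   local @ARGV = ($file);
   local $^I = '.bac';   # the original is kept as file.txt.bac
   while(<>){
      $seen{$_}++;
      next if $seen{$_} > 1;   # skip lines we have already seen
      print;   # goes into the edited file, not the terminal
   }
}
print "finished processing file.\n";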

That's madly tiny and cryptic :cheesy:

I've been working in Java today for a college project; I used Java for years before I used Perl.

To do something similar to that in Java would be a mammoth task. There seems to be 'no such thing' as a useful Java hash, and reading files line by line isn't made easy either.

I certainly prefer the Perl way these days...

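It probably is a bit cryptic. But code is that way if you don't understand the syntax of a particular language. It could be written very cryptically as a one-liner. Something like this (unchecked for accuracy; on a Unix shell, use single quotes instead of double quotes so the shell does not expand $seen and $_ itself):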

perl -i.bac -ne "next if ++$seen{$_}>1; print;" file.txt

Perl is quite bad (or good depending on how you look at it) for crypticness.

I never use those superglobal variables as implicit parameters or targets; it scares me ;)

But I'd much rather be scared by something powerful at my potential disposal than irritated by the overhead and safety checks involved in doing a lot of conceptually simple things in Java...

I guess they certainly aren't languages for the same purpose. But hey; my college project involves string processing, and could definitely make good use of untyped hashes, and it's gotta be done in Java. :mad:

Took me < 5 min in Java? What are you talking about MattEvans? Besides, you could do this even easier in Haskell!

Set<String> lines = new HashSet<String>();

BufferedWriter bw = new BufferedWriter(new FileWriter(args[1].toString()));
BufferedReader br = new BufferedReader(new FileReader(args[0].toString()));
while(br.ready()) {
	String line = br.readLine();
	if (lines.contains(line))  {
		bw.write(line);
		bw.newLine();
	 } else {
		lines.add(line);
	 }
}

Shocking mistake there. :-) It should read...

if (!lines.contains(line))  {
	bw.write(line);
	bw.newLine();
	lines.add(line);
}
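
For anyone who wants to paste and run it, here is one way the corrected loop might be wrapped into a complete program. This is only a sketch: the class name DedupeLines, the dedupe helper, and the UTF-8 encoding are my own assumptions, not part of the posts above. It also swaps br.ready() for the usual readLine() != null test, uses the return value of Set.add() in place of the separate contains() check, and closes both files with try-with-resources so the buffered output is actually flushed to disk.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashSet;
import java.util.Set;

// Hypothetical class name, chosen for illustration only.
public class DedupeLines {

    // Copies each line of `in` to `out` the first time it is seen,
    // keeping the original order. Returns the number of lines written.
    static int dedupe(Path in, Path out) throws IOException {
        Set<String> seen = new HashSet<>();
        int kept = 0;
        // try-with-resources closes (and flushes) both streams.
        try (BufferedReader br = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter bw = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                // Set.add() returns false when the line is already present.
                if (seen.add(line)) {
                    bw.write(line);
                    bw.newLine();
                    kept++;
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java DedupeLines <input> <output>");
            return;
        }
        dedupe(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Unlike the Perl version, this writes to a second file rather than editing in place, so the input and output paths should be different.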