Renaming duplicates in a file

Question

Newbi1984 0 Newbie Poster

14 Years Ago

Hi there,

I'm very new to Perl, so I hope that you can help me.

I have a list of duplicated entries in one file and what I want to do is to search another file for those entries that appear in the first file and then rename each duplicate consecutively.

File I want to change (tab delimited):

ID Entry Entry
1 0 0
2 1 0
2 0 1
3 1 1
4 0 0
4 0 0

List of duplicates:
ID
2
4

Desired output:
ID Entry Entry
1 0 0
2.1 1 0
2.2 0 1
3 1 1
4.1 0 0
4.2 0 0

So what I thought of doing is reading both files into arrays:

$bimfile="";
$n="A";

open(BIM,"$bimfile");
my @bim=split(/\s+/,BIM);

open(DUPLICATES,"<FILENAME>");
my(@duplicates)=chomp(DUPLICATES);

foreach $duplicate (@duplicates){
if

I'm not sure what to put in the if statement. I think that if an entry from the @duplicates array matches an entry in the @bim array, it should be renamed with .1, then .2, and so on consecutively; otherwise the line should just be printed.

It is important that I keep the order of the bim file and print out each line as it is.

Is this the best way of going about this or is there an easier way?

Thanks!

perl

mitchems 12 Posting Whiz in Training

14 Years Ago

I'm not sure how your code is going to work properly. Did you solve your problem? If so, mark it as solved. If you do still want help, let us know.

d5e5 109 Master Poster

14 Years Ago

#!/usr/bin/perl
use strict;
use warnings;

#I'm reading my __DATA__ section but you can open your file
# instead and slurp it as follows: (I don't read a list of duplicates, only the first file)
undef $/;
my $whole_file = <DATA>; # 'slurp' mode
$/ = "\n"; #Put it back the way it was

#regular expression substitute in multi-line and global mode
$whole_file =~ s/^(\w+)([^\n]+\n)(?=\1)/$1.1$2/mg;
print $whole_file;

__DATA__
ID Entry Entry
1 0 0
2 1 0
2 0 1
3 1 1
4 0 0
4 0 0

This gives the following output:

ID Entry Entry
1 0 0
2.1 1 0
2 0 1
3 1 1
4.1 0 0
4 0 0
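
For anyone trying to follow the substitution above, here is the same idea written out with the /x flag so each part can carry a comment. This is only a sketch: the sub name mark_first_of_pair is made up for illustration, and the two \b anchors are my own addition, not part of the code posted above. As far as I can tell, without them \w+ can backtrack to a shorter prefix, so an ID such as 12 followed by a line starting with 13 would be rewritten as 1.12.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): marks the first line of each
# run of equal IDs with ".1", like the one-line substitution above.
sub mark_first_of_pair {
    my ($text) = @_;
    $text =~ s{
        ^(\w+)\b       # group 1: the whole ID at the start of a line
        ([^\n]*\n)     # group 2: the rest of that line, newline included
        (?=\1\b)       # lookahead: the next line starts with the same whole ID
    }{$1.1$2}mgx;
    return $text;
}

# Multi-digit IDs: only the first "13" line should change.
print mark_first_of_pair("12 0 0\n13 1 0\n13 0 1\n");
```

The lookahead does not consume the second line, so the /g search resumes at that line and can compare it with the one after it.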

mitchems commented: David - that's just an awesome regex! +2
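
The original question asked for every duplicate to be numbered consecutively (2.1, 2.2, ...) using the separate list of duplicates. One possible way to do that is a hash that counts how often each listed ID has been seen so far. This is a minimal sketch under some assumptions: the sub name number_duplicates is hypothetical, the rows are already chomped and tab delimited with the ID in the first column, and the header line of the duplicates list ("ID") has been dropped before the list is passed in.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): appends .1, .2, ... to the ID
# of every row whose ID appears in the duplicates list. Row order is kept.
sub number_duplicates {
    my ($rows, $dups) = @_;
    my %is_dup = map { $_ => 1 } @$dups;   # lookup table of duplicated IDs
    my %seen;                              # occurrences of each ID so far
    my @out;
    for my $row (@$rows) {
        my ($id, $rest) = split /\t/, $row, 2;
        if (defined $id && $is_dup{$id}) {
            my $n = ++$seen{$id};          # 1 for the first row, 2 for the next
            $id = "$id.$n";
        }
        push @out, defined $rest ? "$id\t$rest" : $row;
    }
    return @out;
}

# Sample data from the question, held in memory instead of read from files.
my @rows = ("ID\tEntry\tEntry", "1\t0\t0", "2\t1\t0", "2\t0\t1",
            "3\t1\t1", "4\t0\t0", "4\t0\t0");
my @dups = (2, 4);
print "$_\n" for number_duplicates(\@rows, \@dups);
```

Because the file is processed one row at a time, the duplicate rows do not need to be next to each other for the counter to work.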

Newbi1984 0 Newbie Poster · Answer 1 · 2010-08-20T23:10:06+00:00

I only need to replace the 1st duplicate so I think I've solved my problem:

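#!/usr/bin/perl
use strict;
use warnings;

my $bimfile = "";   # path to the file to change
my $dup     = "";   # path to the list of duplicates

open(BIM, "<", $bimfile) || die("Could not open file!");
my @bim = <BIM>;
close(BIM);

open(DUPLICATES, "<", $dup) || die("Could not open file!");
chomp(my @duplicates = <DUPLICATES>);   # strip newlines so the IDs can match
close(DUPLICATES);

foreach my $bim (@bim) {
	foreach my $duplicates (@duplicates) {
		# anchor at the start of the line so only the ID column is changed
		$bim =~ s/^\Q$duplicates\E(?=\s)/$duplicates.1/;
	}
	print $bim;   # BIM was opened for reading, so print to STDOUT instead
}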

Newbi1984 0 Newbie Poster · Answer 2 · 2010-08-21T16:39:15+00:00

Hi,

I do still need help. Here is my code so far:

#!/usr/bin/perl
use strict;
use warnings;

open (NEWFILE,">test.txt") || die("Could not open file!");

open (BIM,"NBS_22_4eigenstrat.bim") || die("Could not open file!");
my (@bim)=<BIM>;
print "$bim[0]\n";   # a single element is $bim[0], not @bim[0]

open (DUPLICATES,"58C_22_4eigenstrat_duplicates.txt") || die("Could not open file!");
chomp(my @duplicates=<DUPLICATES>);   # strip newlines so the IDs can match mid-line
print "$duplicates[0]\n";

foreach my $bim (@bim){
	foreach my $duplicates (@duplicates){
		my $replace="$duplicates"."_1";
		$bim=~ s/\Q$duplicates\E/$replace/;
	}
	print NEWFILE $bim;
}

close(BIM);
close(DUPLICATES);
close(NEWFILE);

I only need to rename one of the duplicates, so I thought that I would use search and replace to rename the first duplicate, but this code replaces all the matches. Is there a way to replace only the first match?

Thanks
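
One way to rename only the first row of each duplicated ID is to remember which IDs have already been handled. The following is a rough sketch, not a tested drop-in for the real .bim file: the sub name rename_first_only is hypothetical, and it assumes the ID is the first whitespace-delimited field on each line and that the duplicates list has already been chomped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): appends "_1" to the first row
# of each listed ID and leaves every later row with that ID untouched.
sub rename_first_only {
    my ($rows, $dups) = @_;
    my %pending = map { $_ => 1 } @$dups;   # IDs that still need renaming
    my @out;
    for my $line (@$rows) {
        my $row  = $line;                   # copy, so the caller's data is unchanged
        my ($id) = $row =~ /^(\S+)/;        # first whitespace-delimited field
        # delete returns a true value only the first time this ID is met
        if (defined $id && delete $pending{$id}) {
            $row =~ s/^\S+/${id}_1/;
        }
        push @out, $row;
    }
    return @out;
}

print "$_\n" for rename_first_only(["2 1 0", "2 0 1", "3 1 1", "4 0 0", "4 0 0"],
                                   ["2", "4"]);
```

Matching the whole first field, rather than substituting the ID anywhere in the line, also avoids changing a longer ID or another column that happens to contain the same characters.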

mitchems 12 Posting Whiz in Training · Answer 3 · 2010-08-22T07:17:10+00:00

David,

That's an absolutely elegant solution. Great job.

Mike

Newbi1984 0 Newbie Poster · Answer 4 · 2010-08-22T20:44:41+00:00

Thanks very much!