Hi there,

I'm very new to Perl, so I hope that you can help me.

I have a list of duplicated IDs in one file. I want to search a second file for the IDs that appear in that list and rename each occurrence consecutively.


File I want to change (tab delimited):

ID Entry Entry
1 0 0
2 1 0
2 0 1
3 1 1
4 0 0
4 0 0

List of duplicates:
ID
2
4

Desired output:
ID Entry Entry
1 0 0
2.1 1 0
2.2 0 1
3 1 1
4.1 0 0
4.2 0 0

So what I thought of doing is reading both files into arrays:

$bimfile="";
$n="A";

open(BIM,"$bimfile");
my @bim=split(/\s+/,BIM);

open(DUPLICATES,"<FILENAME>");
my(@duplicates)=chomp(DUPLICATES);

foreach $duplicate (@duplicates){
if

I'm not sure what to put in the if statement. My idea is: if an entry in the @duplicates array matches an entry in the @bim array, rename the matches .1, then .2, consecutively; otherwise just print the line.

It is important that I keep the order of the bim file and print every other line unchanged.

Is this the best way of going about this or is there an easier way?

Thanks!
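
For reference, the approach described in the question (look up each ID in the list of duplicates, number the matches consecutively, and print every other line unchanged) can be sketched with a hash lookup instead of nested array loops. This is a minimal sketch rather than a definitive implementation: the sub name number_duplicates and the in-memory sample arrays are assumptions made for illustration, and the lines are assumed to be chomped and tab delimited with the ID in the first column.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: takes a reference to the data lines and a reference to
# the list of duplicated IDs. Returns the lines in their original order, with
# each listed ID renamed ID.1, ID.2, ... in order of appearance.
sub number_duplicates {
    my ($lines, $dup_ids) = @_;
    my %is_dup = map { $_ => 1 } @$dup_ids;   # lookup table of IDs to rename
    my %seen;                                 # occurrences so far, per ID
    my @out;
    for my $line (@$lines) {
        my ($id, $rest) = split /\t/, $line, 2;
        my $renamed = $line;                  # default: pass the line through
        if (defined $id && $is_dup{$id}) {
            my $n = ++$seen{$id};
            $renamed = defined $rest ? "$id.$n\t$rest" : "$id.$n";
        }
        push @out, $renamed;
    }
    return @out;
}

# Sample data from the question, held in memory instead of read from files
my @sample_bim  = ("ID\tEntry\tEntry", "1\t0\t0", "2\t1\t0", "2\t0\t1",
                   "3\t1\t1", "4\t0\t0", "4\t0\t0");
my @sample_dups = (2, 4);
print "$_\n" for number_duplicates(\@sample_bim, \@sample_dups);
```

When reading the real files, chomp each line first and skip the ID header line of the duplicates list; otherwise the header row of the data file would be renamed as well.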

All 6 Replies

I only need to replace the 1st duplicate so I think I've solved my problem:

#!usr/bin/perl;

$bimfile="";
$dup="";

open(BIM,"<$bimfile>") || die("Could not open file!");
my (@bim)=BIM;

open(DUPLICATES,"<$dup>")|| die("Could not open file!");
my(@duplicates)=DUPLICATES;

foreach $bim (@bim){
		foreach $duplicates (@duplicates){
			$bim=~s/$duplicates/$duplicates.1/
		}
	print BIM $bim
};

I'm not sure how your code is going to work properly. Did you solve your problem? If so, do you still want our help or not? If not, mark the thread as solved. If you do want help, let us know.

Hi,

I do still need help. Here is my code so far:

#!usr/bin/perl;
 
open (NEWFILE,">test.txt") || die("Could not open file!");

open (BIM,"NBS_22_4eigenstrat.bim") || die("Could not open file!");
my (@bim)=<BIM>;
print "@bim[0]\n";

open (DUPLICATES,"58C_22_4eigenstrat_duplicates.txt") || die("Could not open file!");
my(@duplicates)=<DUPLICATES>;
print "@duplicates[0]\n";


foreach $bim (@bim){
	foreach $duplicates (@duplicates){
	$replace="$duplicates"."_1";
	$bim=~ s/$duplicates/$replace/;
		}
	print NEWFILE $bim;
};

close(BIM);
close(DUPLICATES);
close(NEWFILE);

I only need to rename one of the duplicates, so I thought I would use search and replace to rename the first duplicate. However, this code replaces all the matches. Is there a way to replace only the first match?

Thanks
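
The nested loops above run the substitution against every line of the bim file, so every line that contains a listed ID gets renamed. Lines read from DUPLICATES also keep their trailing newline unless they are chomped, which changes what the pattern matches. One possible way to rename only the first occurrence is to keep the IDs that are still pending in a hash and remove each one once it has been renamed. This is a minimal sketch under assumptions: lines are chomped and tab delimited with the ID in the first column, and rename_first and the sample data are hypothetical names used for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: appends "_1" to the ID on the first line only for each
# listed ID, leaving later occurrences and all other lines untouched.
sub rename_first {
    my ($lines, $dup_ids) = @_;
    my %pending = map { $_ => 1 } @$dup_ids;  # IDs not yet renamed
    my @out;
    for my $line (@$lines) {
        my ($id) = split /\t/, $line, 2;
        my $renamed = $line;
        # delete returns the stored value only if the ID was still pending
        if (defined $id && delete $pending{$id}) {
            # Anchored and quoted, so only the ID column can change
            $renamed =~ s/^\Q$id\E/${id}_1/;
        }
        push @out, $renamed;
    }
    return @out;
}

my @example_lines = ("1\t0\t0", "2\t1\t0", "2\t0\t1", "4\t0\t0", "4\t0\t0");
print "$_\n" for rename_first(\@example_lines, [2, 4]);
```

Because the hash lookup compares whole IDs, an ID such as 2 does not affect a line whose ID is 22, which an unanchored substitution could.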

#!/usr/bin/perl
use strict;
use warnings;

# I'm reading from my __DATA__ section, but you can open your file instead
# and slurp it the same way. (I don't read the list of duplicates, only the first file.)
my $whole_file = do {
    local $/;    # input record separator undefined inside this block: 'slurp' mode
    <DATA>;
};

# Regular expression substitution in multi-line (/m) and global (/g) mode:
# append ".1" to an ID when the next line starts with the same ID.
$whole_file =~ s/^(\w+)([^\n]+\n)(?=\1)/$1.1$2/mg;
print $whole_file;

__DATA__
ID Entry Entry
1 0 0
2 1 0
2 0 1
3 1 1
4 0 0
4 0 0

This gives the following output:

ID Entry Entry
1 0 0
2.1 1 0
2 0 1
3 1 1
4.1 0 0
4 0 0
commented: David - that's just an awesome regex! +2
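
To make the substitution above easier to follow, here is the same idea written with the /x modifier so each part of the pattern can carry a comment. It is a sketch, not the original poster's code: the sub name mark_first_of_pair is hypothetical, and the [ \t] guards are an added assumption that columns are separated by tabs or spaces. The guards keep an ID such as 2 from being treated as a duplicate of a neighbouring 22, which a bare prefix comparison would allow.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical wrapper: appends ".1" to the ID of a line when the next line
# carries exactly the same ID. Expects the whole file as one string.
sub mark_first_of_pair {
    my ($text) = @_;
    $text =~ s{
        ^ (\w+)                 # group 1: the ID at the start of a line
        ( [ \t] [^\n]* \n )     # group 2: separator, rest of the line, newline
        (?= \1 [ \t] )          # lookahead: next line has the same ID, then a separator
    }{$1.1$2}xmg;
    return $text;
}

my $sample = "ID\tEntry\tEntry\n1\t0\t0\n2\t1\t0\n2\t0\t1\n3\t1\t1\n4\t0\t0\n4\t0\t0\n";
print mark_first_of_pair($sample);
```

The lookahead does not consume the following line, so with /g the search resumes at that line. Like the original, this only detects duplicates that sit on adjacent lines.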

David,

That's an absolutely elegant solution. Great job.

Mike

Thanks very much!
