Identify duplicates and update the last 2 digits to 0 for both the Orig and

Question

farawaydsky 0 Newbie Poster

13 Years Ago

Hi,

I need to identify duplicates in a file based on the first 6 characters of each row (it is a fixed-width file with 12-character rows). Whenever a duplicate row is found, the last 2 characters of both the original row and the duplicate row should be set to 00 if they differ. In other words, the original and the duplicate must end in the same 2 digits; if they do not, default both to 00, otherwise keep them as they are.

I could do this with multiple loops, but I need something faster.

Here is the sample input and output:

input:
1251233Y1234
1221249N8821
1231116Y9945
1231113Y2123
1231109Y3212
1231123N1214
1231126N1214

output should be:
1251233Y1234
1221249N8821
1231116Y9900
1231113Y2100
1231109Y3212
1231123N1214
1231126N1214 (Since last 2 digits are same nothing changed)

Any help in achieving the above result using either awk or sed would be greatly appreciated.

Thanks,
Faraway

shell-scripting


L7Sqr 227 Practically a Master Poster

13 Years Ago

Are you required to use awk/sed? This would be much easier if you could use something like Perl, Ruby, or Python.
In general, the loop body would look something like:

if first_line
    previous_line = current_line
    continue
end if

saved_current_line = current_line

# compare the first 6 characters; zero the tails only when they differ
if previous_line[0..5] == current_line[0..5] and previous_line[-2..-1] != current_line[-2..-1]
    previous_line[-2..-1] = "00"
    current_line[-2..-1] = "00"
end if

previous_line = saved_current_line

output previous_line
output current_line

Of course, this completely ignores the case when you have an odd number of sequential lines with a matching prefix - you'd have to add logic in to handle that.
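
For runs of any length, one option is to buffer each run of consecutive lines that share a prefix and decide once the run ends. The following is a minimal awk sketch under stated assumptions, not a tested production script: duplicates are assumed to be adjacent, as in the sample data, and the names fix_adjacent_dups and emit_run are made up for illustration.

```shell
#!/bin/sh
# fix_adjacent_dups: hypothetical helper; reads fixed-width rows on stdin.
# Assumption: rows sharing the same first 6 characters are adjacent.
fix_adjacent_dups() {
    awk '
    # Print the buffered run; zero the last 2 characters if the tails differed.
    function emit_run(   i) {
        for (i = 1; i <= n; i++) {
            if (mixed)
                print substr(run[i], 1, length(run[i]) - 2) "00"
            else
                print run[i]
        }
        n = 0
        mixed = 0
    }
    {
        key  = substr($0, 1, 6)              # prefix that identifies a row
        tail = substr($0, length($0) - 1)    # last 2 characters
        if (n > 0 && key != prev_key)        # prefix changed: the run is over
            emit_run()
        if (n > 0 && tail != prev_tail)      # same run, but the tails disagree
            mixed = 1
        run[++n] = $0
        prev_key = key
        prev_tail = tail
    }
    END { emit_run() }                       # flush the final run
    '
}
```

Called as `fix_adjacent_dups < input.txt` (input.txt being a placeholder file name), it reads each line once and holds only the current run in memory.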


L7Sqr 227 Practically a Master Poster · Answer 1 · 2012-06-18T17:51:26+00:00

Whoops. The last part of that should output previous_line before reassigning it, and should print current_line only on the last line instead of blindly printing it on every loop iteration.

So something like:

output previous_line

previous_line = saved_current_line

if last_line
    output current_line
end if
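
Since the question asked for awk, here is a hedged sketch of a whole-file variant that does not depend on duplicates being adjacent. It assumes the file fits in memory and that any disagreement among rows sharing a prefix zeroes all of them; the function name fix_dups is hypothetical. It records the first tail seen for each 6-character prefix, flags prefixes whose tails disagree, and rewrites the flagged rows in the END block.

```shell
#!/bin/sh
# fix_dups: hypothetical helper; reads fixed-width rows on stdin.
# Assumption: the whole file fits in memory.
fix_dups() {
    awk '
    {
        key  = substr($0, 1, 6)              # prefix that identifies a row
        tail = substr($0, length($0) - 1)    # last 2 characters
        line[NR] = $0                        # keep rows in input order
        if (!(key in first))
            first[key] = tail                # first tail seen for this prefix
        else if (first[key] != tail)
            differ[key] = 1                  # duplicate with a different tail
    }
    END {
        for (i = 1; i <= NR; i++) {
            key = substr(line[i], 1, 6)
            if (key in differ)
                print substr(line[i], 1, length(line[i]) - 2) "00"
            else
                print line[i]
        }
    }
    '
}
```

On the sample input from the question, `fix_dups < input.txt` (input.txt is a placeholder) zeroes only the two 123111 rows and leaves every other row untouched, including the 123112 pair whose last 2 digits already match. Each row is visited twice regardless of file size, so no nested loops are needed.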
