Hi,

I need to identify duplicates in a file based on the first 6 characters of each line (it is a fixed-width file with 12-character lines). Whenever a duplicate row is found, the last 2 characters of both the original and the duplicate row should be set to 00 if they are not the same. (That is, the last 2 digits of the original and the duplicate should match; if they do not, default both to 00, otherwise keep them as is.)

I can get the result with multiple loops, but I need something faster.

Here is the sample input and output:

input:
1251233Y1234
1221249N8821
1231116Y9945
1231113Y2123
1231109Y3212
1231123N1214
1231126N1214

output should be:
1251233Y1234
1221249N8821
1231116Y9900
1231113Y2100
1231109Y3212
1231123N1214
1231126N1214 (since the last 2 digits are the same, nothing changed)

Any help in achieving the above result using either awk or sed would be greatly appreciated.

Thanks,
Faraway

Last Post by L7Sqr

Are you required to use awk/sed? This would be much easier if you could employ something like Perl/Ruby/Python.
In general, the loop body would look something like:

if first_line
    previous_line = current_line
    continue
end if

saved_current_line = current_line

if previous_line[0..5] == current_line[0..5] and previous_line[-2..-1] != current_line[-2..-1]
    previous_line[-2..-1] = "00"
    current_line[-2..-1] = "00"
end if

previous_line = saved_current_line

output previous_line
output current_line

Of course, this completely ignores the case when you have an odd number of sequential lines with a matching prefix; you'd have to add logic to handle that.
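One way to cover runs of any length (two, three, or more sequential lines with the same prefix) is to process each run as a unit instead of comparing only neighbouring pairs. Below is a minimal Python sketch of that idea, not a definitive implementation. It assumes matching lines are adjacent, that newlines have already been stripped, and that a run is zeroed whenever its lines do not all share the same last 2 characters (the original post only describes pairs, so that rule is my reading). The function name and the `key_len`/`tail_len` parameters are made up for illustration.

```python
from itertools import groupby


def zero_mismatched_suffixes(lines, key_len=6, tail_len=2):
    """Yield lines unchanged, except that every line in a run of adjacent
    lines sharing the same prefix gets its tail replaced by zeros when the
    tails within that run are not all identical."""
    for _, run in groupby(lines, key=lambda line: line[:key_len]):
        run = list(run)
        tails = {line[-tail_len:] for line in run}
        if len(tails) > 1:  # same prefix, but the tails disagree
            run = [line[:-tail_len] + "0" * tail_len for line in run]
        yield from run


# Sample input from the original post (assumed already stripped of newlines).
sample = [
    "1251233Y1234",
    "1221249N8821",
    "1231116Y9945",
    "1231113Y2123",
    "1231109Y3212",
    "1231123N1214",
    "1231126N1214",
]

result = list(zero_mismatched_suffixes(sample))
for line in result:
    print(line)
```

This makes a single pass over the input and holds only one run in memory at a time, so it should stay fast on large files as long as the adjacency assumption holds.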


Whoops. The last part should output previous_line before reassigning it, and should print current_line only when the last line is reached instead of blindly printing it on every loop iteration.

So something like:

output previous_line

previous_line = saved_current_line

if last_line
    output current_line
end if
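
The previous-line loop above only compares neighbouring lines, so it relies on lines with the same prefix being next to each other. If the file is not guaranteed to be ordered that way, a two-pass approach is one alternative. The Python sketch below is only an illustration under assumptions: the whole file fits in memory, newlines are already stripped, and every line whose prefix occurs with more than one distinct tail is zeroed. The function name and parameters are hypothetical.

```python
from collections import defaultdict


def zero_mismatched_anywhere(lines, key_len=6, tail_len=2):
    """Return a new list in which every line whose prefix appears with more
    than one distinct tail (anywhere in the input) has its tail zeroed."""
    lines = list(lines)

    # Pass 1: collect the set of tails seen for each prefix.
    tails_by_prefix = defaultdict(set)
    for line in lines:
        tails_by_prefix[line[:key_len]].add(line[-tail_len:])

    # Pass 2: rewrite only the lines whose prefix has conflicting tails.
    return [
        line[:-tail_len] + "0" * tail_len
        if len(tails_by_prefix[line[:key_len]]) > 1
        else line
        for line in lines
    ]


# The duplicates here are deliberately not adjacent.
unordered = [
    "1231116Y9945",
    "1251233Y1234",
    "1231113Y2123",
    "1231123N1214",
    "1231126N1214",
]

fixed = zero_mismatched_anywhere(unordered)
for line in fixed:
    print(line)
```

Both passes are linear in the number of lines, so this avoids the nested loops mentioned in the question while preserving the original line order.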