I have been working with web traces and I need to convert a web trace from one format to another.

for example

from

cs21 793468639 122791 173 icons-html/HomePageIcon.gif ALT=" 0 0.0
cs21 793468639 122791 173 icons-html/HomePageIcon.gif ALT=" 0 0.0
cs21 793468639 122791 173 icons-html/HomePageIcon.gif ALT=" 0 0.0

to

cs21 793468639 122791 173 -unique id- 0 0.0
cs21 793468639 122791 173 -unique id- 0 0.0
cs21 793468639 122791 173 -unique id- 0 0.0

In each line, the 5th column (which is a string) has to be converted to a unique ID (an integer value). There are a few thousand such lines in the text file.
How should I do this in a shell script?
Is there a way to do this?

please help me out !

Strings will repeat, so it should generate one unique number per distinct string and reuse that number every time the string occurs again.

hi,

awk is the tool.

It has associative arrays: the subscript can be the string itself, and the value can be a counter that is incremented whenever the subscript isn't in the array yet.

So, if the string is already in the array, replace the string with the array's value;
otherwise increment the counter and store it under that string first.

Could you please explain a bit more... how do we actually pass a string to an array?
I have tried but I end up getting errors. I use Ubuntu 10 and bash for shell scripts.

awk is the tool to use
It's easier to ask awk whether an element is in an array (or not) than to ask bash the same thing.

echo "abc def ghi
jkl def mno
abc pqr stu" | awk '!($2 in a ){a[$2]=++n}{$2=a[$2];print}'
abc 1 ghi
jkl 1 mno
abc 2 stu

you see how simple it is.
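
Applied to the trace format from the first post, the same idea could look like the sketch below. Note that the string there contains a space (`icons-html/HomePageIcon.gif ALT="` is two awk fields), so this version assumes the first four and the last two fields never contain spaces and treats everything in between as the string. The function name number_trace and the second sample line (index.html) are made up for illustration.

```shell
# Hypothetical helper: replace fields 5..NF-2 with a per-string integer id.
# Assumption: the first four and the last two fields never contain spaces.
number_trace() {
    awk '{
        key = $5
        for (i = 6; i <= NF - 2; i++)      # rebuild a string that spans several fields
            key = key " " $i
        if (!(key in id))                  # first time this string is seen
            id[key] = ++n                  # give it the next number
        print $1, $2, $3, $4, id[key], $(NF-1), $NF
    }' "$@"
}

# Sample run; the middle line is invented test data.
printf '%s\n' \
    'cs21 793468639 122791 173 icons-html/HomePageIcon.gif ALT=" 0 0.0' \
    'cs21 793468640 122792 200 index.html 0 0.0' \
    'cs21 793468641 122793 173 icons-html/HomePageIcon.gif ALT=" 0 0.0' |
number_trace
# cs21 793468639 122791 173 1 0 0.0
# cs21 793468640 122792 200 2 0 0.0
# cs21 793468641 122793 173 1 0 0.0
```

For a real file, number_trace tracefile > newfile should do; the ids are handed out in order of first appearance.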

Thanks a lot!!! Yes, it was simple, but I was stuck somewhere and couldn't get it...
But now I've got it... thank you so much.

I have got another problem. It is the same kind of conversion but a bit more complicated: now I want a unique number for a substring of each string, in a text file containing a few thousand lines (strings).

If a substring repeats in another line and was already assigned a number, the same number must be used again.

For example, the lines in the text file look like:

"http://cs-www.bu.edu/"
"http://cs-www.bu.edu/lib/logo/Ray_Chou_Logo0.small.gif"
"http://cs-www.bu.edu/pointers/Home.html"
"http://cs-www.bu.edu/lib/pics/bu-logo.gif"
"http://www.cs.indiana.edu/cstr/search"
"http://www.cs.indiana.edu/cstr/search"
"http://www.cs.indiana.edu/cstr/search?csl-tr-92-505"
"http://www.cs.indiana.edu/cstr/search"
"http://cs-www.bu.edu/faculty/heddaya/navigation.html"
"http://cs-tr.cs.cornell.edu/TR/Search/"
"http://cs-tr.cs.cornell.edu/TR/Search/?olean=and&author=Singh&title=&abstract=foreign"
"http://cs-tr.cs.cornell.edu/TR/Search/"

Here the server part (i.e. "http://cs-www.bu.edu) must be assigned a number that stays the same in every line where it appears; whenever a new server appears, the number should increase by 1, and so on until the end of the file.

So the output looks like:

1/"
1/lib/logo/Ray_Chou_Logo0.small.gif"
1/pointers/Home.html"
1/lib/pics/bu-logo.gif"
2/cstr/search"
2/cstr/search"
2/cstr/search?csl-tr-92-505"
2/cstr/search"
1/faculty/heddaya/navigation.html"
3/TR/Search/"
3/TR/Search/?olean=and&author=Singh&title=&abstract=foreign"
3/TR/Search/"

hi,

still pretty easy ;)

awk '{n=split($0,a,"/"); s=a[1]a[2]a[3]; if(s in A){}else{A[s]=++m}; sub(s,""); print A[s],$0}' UrFile
1 /"
1 /lib/logo/Ray_Chou_Logo0.small.gif"
1 /pointers/Home.html"
1 /lib/pics/bu-logo.gif"
2 /cstr/search"
2 /cstr/search"
2 /cstr/search?csl-tr-92-505"
2 /cstr/search"
1 /faculty/heddaya/navigation.html"
3 /TR/Search/"
3 /TR/Search/?olean=and&author=Singh&title=&abstract=foreign"
3 /TR/Search/"

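oops :(
I made a small mistake: split() throws away the "/" separators, so s has to be rebuilt with "//" for sub() to match and remove the server address. The output above is what the corrected command prints:

awk '{split($0,a,"/"); s=a[1]"//"a[3]; if(!(s in A)){A[s]=++m}; sub(s,""); print A[s],$0}' UrFile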
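
Two small caveats with the one-liner: sub() treats s as a regular expression (so every "." in the host name matches any character), and the comma in print puts a space between the number and the path, which the requested output doesn't have. A possible variant, only a sketch, cuts the prefix off by length with substr() instead and concatenates without a space. It assumes every line starts with scheme://host; the function name number_servers is made up.

```shell
# Hypothetical helper: replace the leading "scheme://host part with a number.
# Assumption: every input line starts with scheme://host.
number_servers() {
    awk '{
        split($0, a, "/")                  # a[1] = scheme part, a[2] = empty, a[3] = host
        s = a[1] "//" a[3]                 # server part, with the slashes put back
        if (!(s in A))                     # new server
            A[s] = ++m                     # gets the next number
        print A[s] substr($0, length(s) + 1)   # literal cut, no regex, no space
    }' "$@"
}

printf '%s\n' \
    '"http://cs-www.bu.edu/"' \
    '"http://www.cs.indiana.edu/cstr/search"' \
    '"http://cs-www.bu.edu/pointers/Home.html"' |
number_servers
# 1/"
# 2/cstr/search"
# 1/pointers/Home.html"
```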