Thanks a lot for the suggestion.
The problem is that I have to process one line at a time rather than the whole file at once. The reason is a tag whose content is processed differently from the rest of the file.
I could put the sed commands in a separate file, but I'd have to create it dynamically, since all the replacement strings are collected into an array of over 300 elements, populated from another HTML page that could change in the future. Or I could loop through the array and run a sed command in each iteration, but I believe that would be much slower.
This is generally a good solution for processing the whole file, but it would be too slow for line-by-line processing.
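For reference, generating such a sed script from the array would take only a few lines. This is a rough sketch, not what I ended up using. It assumes references is an associative array keyed by anchor name (bash 4 or later), and that neither keys nor values contain characters special to sed such as |, &, \ or regex metacharacters. The sample entries and the name build_sed_script are made up for illustration.

```shell
#!/usr/bin/env bash
# Made-up sample entries; the real array is filled from another HTML page.
declare -A references=( [intro]="Introduction" [setup]="Installation" )

# Hypothetical helper: emit one substitution per array element, then a
# catch-all rule that deletes named anchors without a replacement.
build_sed_script() {
    local key
    for key in "${!references[@]}"; do
        # \\\\ in the printf format yields \\ in the script, which sed
        # turns into a single literal backslash in the output.
        printf 's|<a name="%s">|\\\\index{%s}|g\n' "$key" "${references[$key]}"
    done
    printf 's|<a name="[^"]*">||g\n'
}

script=$(mktemp)
build_sed_script > "$script"
result=$(printf '%s\n' '<a name="intro">Intro <b>x</b> <a name="none">y' | sed -f "$script")
rm -f "$script"
echo "$result"
```

The script is built once and then applied with a single sed -f call, which is why this approach suits whole-file processing better than calling sed once per line.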
In the meantime, I came up with this solution that doesn't even involve sed:
# Only touch lines that contain a named anchor.
if [[ $line == *'<a name="'* ]]; then
    OIFS=$IFS
    IFS='<'                             # split the line into segments at each '<'
    line2=
    for word in $line; do
        pos=$(expr index "$word" '>')   # 1-based position of '>', 0 if absent
        if [ "$pos" -ne 0 ]; then
            if [ "${word:0:8}" = 'a name="' ]; then
                indx=${word:8:pos-10}   # anchor name between the quotes
                if [ -n "${references[$indx]}" ]; then
                    word="\\index{${references[$indx]}}${word:pos}"
                else
                    word=${word:pos}    # no replacement known: drop the tag
                fi
            else
                word="<$word"           # some other tag: put the '<' back
            fi
        fi
        line2+=$word
    done
    IFS=$OIFS
    line=$line2
fi
$line is the line of text currently being processed, and references is the array containing the replacement strings, indexed by anchor name (so unless the names are plain numbers it has to be an associative array, declared with declare -A).
The idea is to break the line on '<'s, and then search for '>' in each segment. If a segment starts with a name="..."> then we replace that tag with the matching array element (or delete it if there is none). If not, that means we found some other (not yet processed) tag, so we put back the '<'. Works like a charm.
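To try the idea in isolation, here is the same logic wrapped in a function, with parameter expansion in place of expr. Treat it as a sketch: the name replace_anchors and the sample entry are made up for illustration, and it assumes bash 4 or later for the associative array.

```shell
#!/usr/bin/env bash
# Made-up sample entry; the real array is populated from another HTML page.
declare -A references=( [intro]="Introduction" )

# Hypothetical wrapper: prints the converted line. The body runs in a
# subshell, so the IFS and 'set -f' changes do not leak to the caller.
replace_anchors() (
    IFS='<'     # split on '<' only
    set -f      # no filename expansion on the unquoted $line below
    line=$1
    out=
    for word in $line; do
        if [[ $word == *'>'* ]]; then
            if [[ $word == 'a name="'* ]]; then
                name=${word#'a name="'}   # strip the leading: a name="
                name=${name%%'"'*}        # keep text up to the closing quote
                rest=${word#*'>'}         # everything after the tag
                if [[ -n $name ]] && [[ -n ${references[$name]} ]]; then
                    word="\\index{${references[$name]}}$rest"
                else
                    word=$rest            # unknown anchor: drop the tag
                fi
            else
                word="<$word"             # some other tag: restore the '<'
            fi
        fi
        out+=$word
    done
    printf '%s\n' "$out"
)

replace_anchors '<a name="intro">Intro <b>x</b> <a name="none">y'
# prints: \index{Introduction}Intro <b>x</b> y
```

The known anchor becomes an \index{} entry, the unknown one is dropped, and the <b> tags come through unchanged.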