Hi guys,
I am writing a Perl script.
I have a text file, vec.txt, which looks something like this:
<T> chemical- and bio-terrorism </T>
<T> <C> nerve agents </C> </T> , <T> toxic proteins </T>
<T> toxic protein </T>
<T> terroristic chemical attacks </T>
<T> chemical terroristic attacks </T>
<T> terroristic bombing </T> . To fight <T> chemical terrorism </T>
<T> antiterroristic </T>
<T> terroristic chemical attack </T>
<T> chemical terrorism </T>
<T> chemical-related events </T>
<T> terrorism attacks </T>
<T> human poisoning </T>
<T> toxicology </T>
<T> terrorist group </T>
<T> casualties </T> living close to the <C> sarin </C> release <T> died </T>
<T> kill </T>
<T> murdered </T>
<T> coordinated attack </T>
<T> "casualties" </T>
<T> poisoned </T>
<T> mass casualty </T>
<T> terrorism preparedness </T>
<T> chemical </T>
<T> radiological </T>
<T> anthrax, chemical, and radiological exposures </T>
<T> chemical </T>
<T> terrorism </T>
<T> radiological terrorism </T>
<T> radiological terrorism </T>
<T> suicide scenarios </T>
<T> Weapons of mass destruction </T>
<T> weapons of mass destruction </T>
<T> chemical terrorism </T>
<T> radiological terrorism </T>
<T> chemical, or radiological terrorism </T>
<T> radiological terrorism </T>
<T> acts of terrorism </T>
<T> terrorism </T> scenario is the use of a conventional <T> explosive </T>


The script is meant to extract whatever is inside the <T> ... </T> tags and write each match on its own line in a separate file.

The script is as follows:

#!/usr/bin/perl
#PERL SCRIPT BEGINS
open(FILE,"vec.txt");
open(FF,">vec2.txt");
while (<FILE>)
{
    chomp($_);
    @arr=split("",$_);
    $len=@arr;
    for ($i=0;$i<$len;$i++)
    {
        if (($arr[$i]=="<")&&($arr[$i+1]=="T")&&($arr[$i+2]==">"))
        {
            $line="";
            chomp($line);
            $i=$i+4;
            do
            {
                $line=$line.$arr[$i++];
                chomp($line);
            }
            while (!(($arr[$i]=="<")&&($arr[$i+1]=="/")&&($arr[$i+2]=="T")&&($arr[$i+3]==">")));
            print FF "$line\n";
        }
        else
        {
            next;
        }
    }
}
close FILE;
close FF;

Now I expect the output to be something like:
chemical- and bio-terrorism
<C> nerve agents </C>
toxic proteins
toxic protein
terroristic chemical attacks
chemical terroristic attacks
terroristic bombing
chemical terrorism
antiterroristic
terroristic chemical attack
chemical terrorism
chemical-related events
.....


But instead what I get is:


a
d
t
i
T
<
r
e
/
T
T
i
t
<

t
p
n

t
i
c
a
a
/

c
a
r
i
a
/

t
i
b
g
.....


Please tell me what is going wrong.
Any help would be appreciated.
Thanks.

All 5 Replies

Apparently, part of the problem starts at this statement:

@arr=split("",$_); #This splits the string into single characters

To test, I ran the following:

print join("\n",split("",'abcdefghijklmno'));

And the output was

a
b
c
d
e
f
g
h
i
j
k
l
m
n
o

I know that doesn't help much. If nobody else figures out how to do it, I'll have another look when there's time.

Hey d5e5,
I know that is what it does, and that is what it is supposed to do in my program, don't you think? I initially thought the output looked like this because of some unwanted newlines, so I added chomp statements wherever possible, but that did not help. I think the only possible error is that the $line variable is somehow not being concatenated properly: even if you print $line after every concatenation, the output does not change. Somehow the inner do..while loop ends after only one iteration, but I don't understand why. Thanks, guys.
If anybody comes up with a solution, please let me know.

Even the assignment in line number 9 is wrong. You are assigning a vector to a scalar.

No, Anurag, that assignment is all right: it only assigns the length of the array to a scalar. But yes, I now know that I used the wrong operator.
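
Both points in this reply can be checked with a small sketch. The sample string "<T> x" is made up for illustration; @arr and $len are the names from the script above. Because == compares numerically, Perl converts each operand to a number first, and a non-numeric character such as "<" or "x" becomes 0. That would explain why the if test in the script succeeds for almost every character and why the do..while loop stops after one iteration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @arr = split //, "<T> x";   # five single-character elements
my $len = @arr;                # an array in scalar context yields its element count
print "length: $len\n";

my ($numeric_same, $string_same);
{
    # Silence the "isn't numeric" warnings that == would otherwise raise here.
    no warnings 'numeric';
    # == compares numerically; "<" and "x" both convert to 0, so this is true.
    $numeric_same = ("<" == "x") ? 1 : 0;
}
# eq compares as strings, which is what the tag test needs.
$string_same = ("<" eq "x") ? 1 : 0;

print "numeric comparison says equal: $numeric_same\n";
print "string comparison says equal:  $string_same\n";
```

Running the original script with "use warnings;" enabled should surface the same problem as a stream of "isn't numeric" warnings.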

Again, you need to use "eq", not "==", when comparing strings.
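
Swapping eq in for == should make the character-by-character loop behave as intended. As an alternative, here is a minimal sketch that uses a regular expression instead of the loop. This is a different technique from the one in the question, and extract_terms is a hypothetical helper name, not something from the original script. The sample lines are copied from the vec.txt excerpt above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the text of every <T> ... </T> span found in one line,
# with the surrounding whitespace trimmed.
sub extract_terms {
    my ($text) = @_;
    # .*? is non-greedy, so each match stops at the nearest closing </T>.
    # In list context, a /g match returns all captured groups.
    return $text =~ m{<T>\s*(.*?)\s*</T>}g;
}

# Sample lines copied from the question's vec.txt.
my @sample = (
    '<T> chemical- and bio-terrorism </T>',
    '<T> <C> nerve agents </C> </T> , <T> toxic proteins </T>',
    '<T> terroristic bombing </T> . To fight <T> chemical terrorism </T>',
);

for my $line (@sample) {
    print "$_\n" for extract_terms($line);
}
```

To process the real file, the sample loop could be replaced by one that reads vec.txt line by line and prints each result to vec2.txt. Nested tags such as <C> ... </C> are left in place, which matches the expected output given in the question.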
