Parsing information in Perl
Hi,
I have strings like this:
    $string="OWN - NLM STAT- Publisher DA - 20091005 AU - Gannon AM AU - Turner EC AU - Reid HM AU - Kinsella BT AU- XYZ AD - UCD School of Biomolecular and Biomedical Sciences";
I want to parse these tags and create a hash for this string.
The output should be like this:
    OWN  - NLM
    STAT - Publisher
    DA   - 20091005
    AU   - Gannon AM Turner EC Kinsella BT XYZ
    AD   - UCD School of Biomolecular and Biomedical Sciences
I tried parsing with a regular expression, but sometimes the format is different.
    $string =~ /OWN-(.*)AU(.*)/g;
How can I parse all of the information and create a hash?
Regards
Vandita
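A sketch of one way to do this, assuming a tag is always a run of capital letters, optionally followed by spaces and then a hyphen, sitting at the start of the string or after whitespace. When the pattern given to Perl's `split` contains capturing parentheses, `split` also returns the captured text, so one call yields alternating tags and values. Repeated tags such as AU are collected into an array instead of overwriting each other. The helper name `parse_tags` is made up for this example. Note that under this rule any capitalised word directly before a hyphen counts as a tag, which is exactly the rule the later "TP - beta-" example breaks.

```perl
use strict;
use warnings;

# Hypothetical helper: parse "TAG - value TAG - value ..." into a hash of arrays.
# Assumption: a tag is a run of capital letters, optionally followed by spaces,
# then a hyphen, and it sits at the start of the string or after whitespace.
sub parse_tags {
    my ($string) = @_;
    my (%hash, @order);    # @order keeps first-seen tag order; hash order is not fixed
    # Because the pattern has a capturing group, split also returns the tags:
    # (text before the first tag, TAG1, value1, TAG2, value2, ...)
    my (undef, @fields) = split /(?:^|\s+)([A-Z]+)\s*-\s*/, $string;
    while (my ($tag, $value) = splice @fields, 0, 2) {
        $value = '' unless defined $value;    # a tag at the very end has no value
        push @order, $tag unless exists $hash{$tag};
        push @{ $hash{$tag} }, $value;
    }
    return (\%hash, \@order);
}

my $string = "OWN - NLM STAT- Publisher DA - 20091005 AU - Gannon AM AU - Turner EC "
           . "AU - Reid HM AU - Kinsella BT AU- XYZ "
           . "AD - UCD School of Biomolecular and Biomedical Sciences";
my ($hash, $order) = parse_tags($string);
for my $tag (@$order) {
    print "$tag - ", join(' ', @{ $hash->{$tag} }), "\n";
}
```

This prints the five tags (OWN, STAT, DA, AU, AD) in the order they first appear, with all five AU names on one line.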
#2 Oct 16th, 2009
The following program parses the string and stores the substrings in an array or list. So far I haven't figured out how to put the contents of this array into a hash that would be useful.
    #!/usr/bin/perl -w
    use strict;
    $_ = "OWN - NLM " . "STAT- Publisher " . "DA - 20091005 "
       . "AU - Gannon AM " . "AU - Turner EC " . "AU - Reid HM "
       . "AU - Kinsella BT " . "AU- XYZ "
       . "AD - UCD School of Biomolecular and Biomedical Sciences";
    print; print "\n";
    #Parse the above string into key-value pairs by marking each tag with ###.
    s/([A-Z]+\s*)-/###$1###/g;
    #Put the resulting substrings into a list
    my @list = split /\s*###/;
    #Remove leading and trailing spaces from each element
    foreach my $i (@list) {
        $i =~ s/^\s+//;
        $i =~ s/\s+$//;
    }
    #The first element is empty. Remove it with shift.
    shift @list;
    #Here's what the list contains now
    for (0..$#list) {
        print "Element $_: $list[$_]\n";
    }
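On the question of getting the list into a hash: assigning it directly (`my %hash = @list;`) works when every key is unique, but each later AU would overwrite the earlier one, leaving only the last author. A minimal sketch that walks the list two elements at a time and keeps every value in an array per key; `pairs_to_hash` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a flat (key, value, key, value, ...) list into a
# hash of arrays, so repeated keys such as AU keep all of their values.
sub pairs_to_hash {
    my @list = @_;
    my %hash;
    # Step two elements at a time; an unpaired trailing key is skipped.
    for (my $i = 0; $i + 1 < @list; $i += 2) {
        push @{ $hash{ $list[$i] } }, $list[$i + 1];
    }
    return %hash;
}

# The first few elements that the program above prints.
my @list = ('OWN', 'NLM', 'STAT', 'Publisher', 'DA', '20091005',
            'AU', 'Gannon AM', 'AU', 'Turner EC');
my %hash = pairs_to_hash(@list);
print "AU - @{ $hash{AU} }\n";    # prints "AU - Gannon AM Turner EC"
```

If the list has an odd number of elements, the loop condition simply skips the final unpaired key.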
#3 Oct 17th, 2009
This should do it.
    #!/usr/bin/perl -w
    #ParseStringCreateHash.pl
    use strict;
    $_ = "OWN - NLM STAT- Publisher DA - 20091005 AU - Gannon AM AU - Turner EC AU - Reid HM AU - Kinsella BT AU- XYZ AD - UCD School of Biomolecular and Biomedical Sciences";
    print "\n";
    #Parse the above string into key-value pairs by marking each tag with ###.
    s/([A-Z]+\s*)-/###$1###/g;
    #Put the resulting substrings into an array
    my @array = split /\s*###/;
    #Remove leading and trailing spaces from each element
    foreach my $i (@array) {
        $i =~ s/^\s+//;
        $i =~ s/\s+$//;
    }
    #The first element is empty. Remove it with shift.
    shift @array;
    #Here's what the array contains now
    print "Here's what the array contains now:\n";
    for (0..$#array) {
        print "Element $_: $array[$_]\n";
    }
    #Create a hash from the above array
    my %hash = ();
    for (0..$#array) {
        if ($_ % 2 == 0) {
            #$_ is even: the array element is a hash key
            $hash{$array[$_]} = "" unless exists $hash{$array[$_]};
        } else {
            #$_ is odd: the array element is part of the value for the preceding key
            $hash{$array[$_ - 1]} .= "$array[$_] ";
        }
    }
    print "\nHash keys and values separated by '-' :\n";
    for (keys %hash) {
        print "$_ - $hash{$_}\n";
    }
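One thing to be aware of in the program above: `keys %hash` returns the keys in no particular order, and since Perl 5.18 that order can change from one run to the next. That may be why the output in the next post shows TP before AB. A sketch that remembers the order in which keys first appear and joins repeated values with a space (the helper name `build_ordered` is illustrative only):

```perl
use strict;
use warnings;

# Hypothetical helper: build a hash from a (key, value, key, value, ...) array,
# joining the values of repeated keys with a space, and record the order in
# which each key first appears so the output order is predictable.
sub build_ordered {
    my @array = @_;
    my (%hash, @order);
    for (my $i = 0; $i + 1 < @array; $i += 2) {
        my ($key, $value) = @array[$i, $i + 1];
        push @order, $key unless exists $hash{$key};
        $hash{$key} = exists $hash{$key} ? "$hash{$key} $value" : $value;
    }
    return (\%hash, \@order);
}

my ($hash, $order) = build_ordered(
    'OWN', 'NLM', 'AU', 'Gannon AM', 'AU', 'Turner EC',
    'AD', 'UCD School of Biomolecular and Biomedical Sciences',
);
# Always prints OWN, then AU, then AD, whatever order keys %$hash would give.
print "$_ - $hash->{$_}\n" for @$order;
```

If alphabetical order is acceptable, `for (sort keys %hash)` in the original program is the simpler fix.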
#4 Oct 29th, 2009
Hi,
Thanks for the reply.
I have one question. Suppose, for example, the string looks like this:
    $str=" AB - Thromboxane plays an essential role in hemostasis, regulating platelet aggregation and vessel tone. TP - beta- isoforms that are transcriptionally regulated by distinct Prm1 and Prm3, respectively";
The output should be like this:
    AB - Thromboxane plays an essential role in hemostasis, regulating platelet aggregation and vessel tone. TP - beta- isoforms that are transcriptionally regulated by distinct Prm1 and Prm3, respectively.
But the output is not like the one above. I get this instead:
    TP - beta- isoforms that are transcriptionally regulated by distinct Prm1 and Prm3, respectively
    AB - Thromboxane plays an essential role in hemostasis, regulating platelet aggregation and vessel tone
Actually, the whole paragraph belongs to the AB tag only. It is not two separate tags, TP and AB; it all falls under one tag, AB.
How can I get the desired output?
AB and AD are the main tags, so only the tag at the beginning of each field (AB -, AD -) should be parsed. A sentence that contains "TP - beta- isoforms" should not be split into a new tag.
Regards
Archana
Last edited by Vandithar; Oct 29th, 2009 at 4:06 am.
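Since AB and AD are the tags that matter, one approach is to stop treating every capitalised word before a hyphen as a tag and split only on a list of known tags. This is a sketch under the assumption that the full set of real tags is known in advance; the list `@KNOWN_TAGS` and the helper `parse_known_tags` are hypothetical, and the list only contains tags seen in this thread, so "TP" stays inside the AB text.

```perl
use strict;
use warnings;

# Hypothetical list of the tags that really start a field; it only contains
# tags that appear in this thread. Anything else, such as the "TP" in
# "TP - beta-", is treated as ordinary text.
my @KNOWN_TAGS = qw(OWN STAT DA AU AD AB);

# Hypothetical helper: split only on the given tags and return a hash of arrays.
sub parse_known_tags {
    my ($string, @tags) = @_;
    my $alt = join '|', map { quotemeta($_) } @tags;    # e.g. OWN|STAT|DA|...
    # Capturing ($alt) makes split return the tag names between the values.
    my (undef, @fields) = split /(?:^|\s+)($alt)\s*-\s*/, $string;
    my %hash;
    while (my ($tag, $value) = splice @fields, 0, 2) {
        push @{ $hash{$tag} }, defined $value ? $value : '';
    }
    return \%hash;
}

my $str = " AB - Thromboxane plays an essential role in hemostasis, regulating platelet "
        . "aggregation and vessel tone. TP - beta- isoforms that are transcriptionally "
        . "regulated by distinct Prm1 and Prm3, respectively";
my $hash = parse_known_tags($str, @KNOWN_TAGS);
print "$_ - @{ $hash->{$_} }\n" for sort keys %$hash;
```

This prints a single AB line containing the whole paragraph. The drawback is that a tag missing from the list is silently merged into the previous value, so the list has to be complete for the data being parsed.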
#5 Oct 29th, 2009
    <ID tags>
        elements: AB AD TP etc.
        elements: <id string>
    <id string>
        elements: any character that is not an id tag.
    <School Tags>
        elements: AU <school info> etc.
    <school info>
        elements: any character that is not a School Tag
Now that you have given us a second example of a string to be parsed, we need to infer a rule that will handle both examples. Say we assign your first example to a variable called $str1 and the second to $str2. Your program needs to parse both of the following strings into keys and values to put in a hash:
I assume you want a program that can parse both $str1 and $str2 into hash keys and values. Is that correct? Right now I can't think of one set of rules that would handle both. Maybe someone else can suggest one.
    #RULE 1:
    # Every substring consisting of capital letters followed by
    # a hyphen (-) represents a key and whatever follows before the
    # next substring consisting of capital letters followed by
    # a hyphen is the value associated with that key.
    $str1="OWN - NLM STAT- Publisher DA - 20091005 AU - Gannon AM AU - Turner EC AU - Reid HM AU - Kinsella BT AU- XYZ AD - UCD School of Biomolecular and Biomedical Sciences";
    #
    #RULE 2:
    # A substring consisting of capital letters followed by a hyphen (-)
    # represents a key only if it occurs at the beginning of a line.
    # It may (or must?) be preceded by a space which we will
    # remove when we put the key in the hash.
    # Whatever follows, including substrings of capital letters
    # followed by a hyphen, up until the end of the line
    # (indicated by a newline character?) is the value associated
    # with that key.
    $str2=" AB - Thromboxane plays an essential role in hemostasis, regulating platelet aggregation and vessel tone. TP - beta- isoforms that are transcriptionally regulated by distinct Prm1 and Prm3, respectively";
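RULE 2 can only work if the line breaks are still present in the data. Here is a sketch of it, assuming the records come in PubMed's MEDLINE text layout: a tag at the start of a line, padded with spaces, then "- ", with continuation lines indented by several spaces. The line breaks in the sample record below are assumed for illustration, and `parse_lines` is a hypothetical name. Following RULE 2, at most one leading space is allowed before a tag. On the flattened one-line $str1 this would put everything under OWN, so it does not remove the need for the newlines.

```perl
use strict;
use warnings;

# Hypothetical helper implementing RULE 2: a key starts a line (after at most
# one space); a line that starts with more whitespace continues the previous value.
# Assumption: line breaks are intact, as in PubMed's MEDLINE text layout.
sub parse_lines {
    my ($text) = @_;
    my (%hash, @order);
    my $last;    # tag of the field currently being built
    for my $line (split /\n/, $text) {
        if ($line =~ /^ ?([A-Z]+)\s*- ?(.*)$/) {
            # New field: "TAG - value" at the start of the line.
            my ($tag, $value) = ($1, $2);
            push @order, $tag unless exists $hash{$tag};
            push @{ $hash{$tag} }, $value;
            $last = $tag;
        }
        elsif (defined $last && $line =~ /^\s+(\S.*)$/) {
            # Continuation line: append it to the most recent value.
            $hash{$last}[-1] .= " $1";
        }
    }
    return (\%hash, \@order);
}

# Sample record; the line breaks and indentation are assumed for illustration.
my $record = join "\n",
    "OWN - NLM",
    "AU  - Gannon AM",
    "AU  - Turner EC",
    "AB  - Thromboxane plays an essential role in hemostasis, regulating",
    "      platelet aggregation and vessel tone. TP - beta- isoforms that",
    "      are transcriptionally regulated by distinct Prm1 and Prm3, respectively";
my ($hash, $order) = parse_lines($record);
print "$_ - ", join('; ', @{ $hash->{$_} }), "\n" for @$order;
```

Because "TP - beta-" sits on a continuation line, it ends up inside the single AB value rather than becoming a key of its own.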