
extract multiple nodes from xml files

hi,

using the sample xml below:

asdfabc
RTY 08
SDF 05
some textdata here
SXF 05
xyz
WER 10
TRS 10
WER 10
qwert


i need to extract the label attribute value of each sect element plus all of its descendant cite elements.

the cite elements are always wrapped within a sect element (i.e. a sect element with a role="para" attribute and a label="9999" attribute, where "9999" is the paragraph number).
a cite element can have more siblings and can appear at lower levels of the xml tree, but always within the wrapper sect element.

can somebody please help me construct the xpath expression which should give a result that looks something like the one below:

RTY 08
SDF 05
SXF 05
WER 10
TRS 10
WER 10

somebody suggested the following css expression but 'SXF 05' in the example was missed because it appeared one level lower than the other cite elements.

p doc.css('cite[role = "rg"]').map { |x| [x.text, x.parent['label']] }
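The one-liner above reads the label from `x.parent`, so a cite nested one level deeper gets the label of whatever element wraps it (usually nil). A minimal sketch of one way around that is to climb from each cite to the closest enclosing sect that has a label. It uses REXML instead of Nokogiri only to stay self-contained. The sample XML is an assumption reconstructed from the description in the post, and `nearest_label` is a hypothetical helper, not part of any library.

```ruby
require 'rexml/document'

# Hypothetical sample: the original markup did not survive in the post,
# so this structure is only a guess based on the description.
xml = <<~XML
  <doc>
    <sect role="para" label="1">
      <cite role="rg">RTY 08</cite>
      <cite role="rg">SDF 05</cite>
      <p>some textdata here <cite role="rg">SXF 05</cite></p>
    </sect>
    <sect role="para" label="2">
      <cite role="rg">WER 10</cite>
    </sect>
  </doc>
XML

doc = REXML::Document.new(xml)

# Hypothetical helper: walk up the parent chain until a sect element
# carrying a label attribute is found (or the top of the tree is reached).
def nearest_label(node)
  node = node.parent
  node = node.parent until node.nil? || (node.name == 'sect' && node.attributes['label'])
  node && node.attributes['label']
end

# Pair every cite, at any depth, with the label of its wrapper sect.
pairs = REXML::XPath.match(doc, "//cite[@role='rg']").map do |cite|
  [cite.text, nearest_label(cite)]
end

pairs.each { |text, label| puts "#{label}: #{text}" }
```

With Nokogiri the same idea should carry over by replacing the parent lookup with an ancestor search, but that variant is not shown here.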


thanks in advance,
emmanuel

ermercado
Newbie Poster
4 posts since Jul 2010
Reputation Points: 10
Solved Threads: 0
 

This is pretty easy. //sect|//cite

iceandrews
Junior Poster
185 posts since May 2010
Reputation Points: 10
Solved Threads: 30
 

hi iceandrews,

thanks for replying.

i only need the label attribute value of each sect element (not the whole sect element) plus all of its descendant cite elements.

picking up from your suggestion, below is what i have tried so far:

//sect[@label] | //cite[@role='rg']

i tried using the string() function to fetch the label attribute value, i.e.

string(//sect/@label)

for some reason, it only returns the first attribute value.
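That behaviour is expected in XPath 1.0: string() applied to a node-set returns the string value of the first node in document order only. A minimal sketch of the usual workaround is to select the nodes and read each value in the host language. It uses REXML and a made-up two-paragraph sample, both assumptions for illustration.

```ruby
require 'rexml/document'

# Hypothetical sample with two labelled paragraphs.
xml = <<~XML
  <doc>
    <sect role="para" label="1"><cite role="rg">RTY 08</cite></sect>
    <sect role="para" label="2"><cite role="rg">WER 10</cite></sect>
  </doc>
XML

doc = REXML::Document.new(xml)

# Select every sect that has a label, then read the attribute per element
# instead of collapsing the whole node-set with string().
labels = REXML::XPath.match(doc, "//sect[@label]").map { |sect| sect.attributes['label'] }

puts labels.inspect
```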

any other ideas?

thanks and regards,
emmanuel

ermercado
 

//sect/@label | //cite
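The union above yields a mixed node-set: label attribute nodes and cite elements, normally in document order, so each label comes before the cites inside its sect. A rough sketch of the same interleaved listing, done as a plain document-order walk with REXML rather than by evaluating the union itself. The sample XML is an assumption, since the original markup is not shown.

```ruby
require 'rexml/document'

# Hypothetical sample reconstructed from the description in the thread.
xml = <<~XML
  <doc>
    <sect role="para" label="1">
      <cite role="rg">RTY 08</cite>
      <p>some textdata here <cite role="rg">SXF 05</cite></p>
    </sect>
    <sect role="para" label="2">
      <cite role="rg">WER 10</cite>
    </sect>
  </doc>
XML

doc = REXML::Document.new(xml)

rows = []
# each_recursive visits every descendant element in document order.
doc.each_recursive do |el|
  if el.name == 'sect' && el.attributes['label']
    rows << "label #{el.attributes['label']}"   # stands in for the @label node
  elsif el.name == 'cite'
    rows << el.text                              # stands in for the cite element
  end
end

puts rows
```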

iceandrews
 

thank you so much mate! this will do. : )

ermercado
 

This question has already been solved
