hi,

using the sample xml below:

<hdr>
   <sect role="para" label="1234">
      <legref>asdf</legref>
      <cite role="com">abc</cite>
      <cite role="rg">RTY 08</cite>
      <cite role="rg">SDF 05</cite>
      <othertag>some textdata here
         <cite role="rg">SXF 05</cite>
      </othertag>
   </sect>
   <sect role="para" label="2345">
      <cite role="com">xyz</cite>
      <cite role="rg">WER 10</cite>
      <cite role="rg">TRS 10</cite>
      <cite role="rg">WER 10</cite>
      <legref>qwert</legref>
   </sect>
</hdr>

i need to extract the label attribute value of each <sect role="para" label="9999"> plus all of its descendant <cite role="rg"> elements.

<cite role="rg"> elements are always wrapped within a <sect role="para" label="9999"> (i.e. a sect element with a role="para" attribute and a label attribute, where "9999" stands for the paragraph number).
a <cite role="rg"> can have more siblings, and can appear at lower levels of the xml tree, but always within the wrapper <sect...> element.

can somebody please help me construct the xpath expression which should give a result that looks something like the one below:

<sect label="1234">
      <cite role="rg">RTY 08</cite>
      <cite role="rg">SDF 05</cite>
      <cite role="rg">SXF 05</cite>
<sect label="2345">
      <cite role="rg">WER 10</cite>
      <cite role="rg">TRS 10</cite>
      <cite role="rg">WER 10</cite>

somebody suggested the following css expression but 'SXF 05' in the example was missed because it appeared one level lower than the other cite elements.

p doc.css('cite[role = "rg"]').map { |x| [x.text, x.parent['label']] }

thanks in advance,
emmanuel

This is pretty easy. //sect|//cite

hi iceandrews,

thanks for replying.

i only need the label attribute value of each <sect role="para" label="9999"> (not the whole sect element) plus all of its descendant <cite role="rg"> elements.

picking up from your suggestion, below is what i have tried so far:

//sect[@label] | //cite[@role='rg']

i tried using the string() function to fetch the label attribute value, i.e.

string(//sect/@label)

for some reason, it only returns the first attribute value.

any other ideas?

thanks and regards,
emmanuel
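
A note on the string() result above: in XPath 1.0, string() applied to a node-set returns the string value of only the first node in document order, so string(//sect/@label) can never return more than one label. One way around it is to select the attribute nodes themselves and read each value in the host language. The snippet below is a minimal sketch of that idea; it assumes REXML (shipped with Ruby) instead of the Nokogiri setup from the question, and uses a trimmed copy of the sample xml.

```ruby
require 'rexml/document'

# Trimmed copy of the sample document (assumption: same structure as the original).
xml = <<~XML
  <hdr>
    <sect role="para" label="1234"><cite role="rg">RTY 08</cite></sect>
    <sect role="para" label="2345"><cite role="rg">WER 10</cite></sect>
  </hdr>
XML

doc = REXML::Document.new(xml)

# Select every label attribute node, then read each value in Ruby
# rather than wrapping the expression in string().
labels = REXML::XPath.match(doc, '//sect/@label').map(&:value)
p labels
```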

//sect/@label | //cite
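
Two caveats on this expression, for anyone reusing it: as written it also matches <cite role="com">, and the union comes back as one flat list in document order, so the grouping by label has to be rebuilt afterwards. The css attempt in the question missed 'SXF 05' for a related reason: x.parent for that cite is <othertag>, which has no label attribute. Below is a rough sketch of one alternative, which walks each sect and collects its descendant cite[@role='rg'] elements at any depth. It assumes REXML rather than Nokogiri so that it runs without extra gems, and the variable names are only illustrative.

```ruby
require 'rexml/document'

# The sample document from the question.
xml = <<~XML
  <hdr>
    <sect role="para" label="1234">
      <legref>asdf</legref>
      <cite role="com">abc</cite>
      <cite role="rg">RTY 08</cite>
      <cite role="rg">SDF 05</cite>
      <othertag>some textdata here
        <cite role="rg">SXF 05</cite>
      </othertag>
    </sect>
    <sect role="para" label="2345">
      <cite role="com">xyz</cite>
      <cite role="rg">WER 10</cite>
      <cite role="rg">TRS 10</cite>
      <cite role="rg">WER 10</cite>
      <legref>qwert</legref>
    </sect>
  </hdr>
XML

doc = REXML::Document.new(xml)

# For each wrapper sect, pair its label with the text of every
# descendant cite whose role is "rg" (".//" searches at any depth).
grouped = REXML::XPath.match(doc, "//sect[@role='para']").map do |sect|
  cites = REXML::XPath.match(sect, ".//cite[@role='rg']").map(&:text)
  [sect.attributes['label'], cites]
end

grouped.each { |label, cites| puts "#{label}: #{cites.join(', ')}" }
```

With Nokogiri the same two XPath strings should work through doc.xpath and sect.xpath, but that variant is untested here.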

thank you so much mate! this will do. : )
