
<text>
<p><s id="s1"><ng><w pws="yes" id="w1" p="CD">25</w> <w l="woman" pws="yes" id="w4" p="NNP" common="true" headn="yes">Woman</w></ng> <vg tense="past" voice="act" asp="simple" modal="no"><w l="create" pws="yes" id="w10" p="VBD" headv="yes">created</w></vg> <pg><w pws="yes" id="w18" p="IN">for</w></pg> <ng><w vstem="succeed" l="success" pws="yes" id="w22" p="NN" headn="yes">success</w> <w l="barbie" pws="yes" id="w30" p="NN" headn="yes">barbie</w></ng> <ng><enamex type="location"><w l="reynold" pws="yes" id="w37" p="NNP" locname="single">Reynolds</w></enamex> <w l="sep" pws="yes" id="w46" p="NN" headn="yes">sep</w></ng> <ng><timex type="date"><w pws="yes" id="w50" p="CD">1986</w></timex></ng> <ng><enamex type="organization"><w l="pari" pws="yes" id="w55" p="NNP" locname="single">Paris</w> <w orgname="single" l="google" pws="yes" id="w61" p="NNP">Google</w> <w l="limited" pws="yes" id="w68" p="NNP" common="true">Limited</w></enamex></ng></s></p>
</text>

=========================================================
From the XML above, I want to get the content of <enamex type="location"><w l="reynold" pws="yes" id="w37" p="NNP" locname="single">Reynolds</w></enamex>

That is, I want to retrieve the text "Reynolds" from that tag.

Any help, please! I am using Python.

Thanks in advance!

Reply by snippsat:

Look at this post, where I explain a little and give a link explaining why regex is not the right tool for XML/HTML:
http://www.daniweb.com/software-development/python/threads/375186

# Python 2 with BeautifulSoup 3, where BeautifulStoneSoup is the XML parser
from BeautifulSoup import BeautifulStoneSoup

xml = '''\
<text>
<p><s id="s1"><ng><w pws="yes" id="w1" p="CD">25</w> <w l="woman" pws="yes" id="w4" p="NNP"
common="true" headn="yes">Woman</w></ng> <vg tense="past" voice="act" asp="simple" modal="no"><w l="create"
pws="yes" id="w10" p="VBD" headv="yes">created</w></vg> <pg><w pws="yes" id="w18" p="IN">for</w></pg> <ng><w
vstem="succeed" l="success" pws="yes" id="w22" p="NN" headn="yes">success</w> <w l="barbie" pws="yes" id="w30"
p="NN" headn="yes">barbie</w></ng> <ng><enamex type="location"><w l="reynold" pws="yes" id="w37" p="NNP"
locname="single">Reynolds</w></enamex> <w l="sep" pws="yes" id="w46" p="NN" headn="yes">sep</w></ng>
<ng><timex type="date"><w pws="yes" id="w50" p="CD">1986</w></timex></ng> <ng><enamex type="organization"><w l="pari"
pws="yes" id="w55" p="NNP" locname="single">Paris</w> <w orgname="single" l="google" pws="yes" id="w61" p="NNP">Google</w>
<w l="limited" pws="yes" id="w68" p="NNP" common="true">Limited</w></enamex></ng></s></p>
</text>'''

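soup = BeautifulStoneSoup(xml)
# findAll returns every <w> with locname="single": Reynolds, but also Paris
tag = soup.findAll('w', {'locname': 'single'})
print tag[0].text  # Reynolds (the first match in document order)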
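BeautifulSoup 3 and the print statement only run on Python 2. Here is a minimal sketch of the same lookup on Python 3. It uses the standard-library xml.etree.ElementTree module instead of BeautifulSoup, a swapped-in parser rather than what the reply above used. Instead of matching locname="single", which also hits Paris, it selects the <enamex type="location"> element the question asks about. The helper name entity_texts is hypothetical, and joining a multi-word entity's <w> tokens with single spaces is an assumption.

```python
import xml.etree.ElementTree as ET

# The same XML as in the reply above.
XML = """\
<text>
<p><s id="s1"><ng><w pws="yes" id="w1" p="CD">25</w> <w l="woman" pws="yes" id="w4" p="NNP"
common="true" headn="yes">Woman</w></ng> <vg tense="past" voice="act" asp="simple" modal="no"><w l="create"
pws="yes" id="w10" p="VBD" headv="yes">created</w></vg> <pg><w pws="yes" id="w18" p="IN">for</w></pg> <ng><w
vstem="succeed" l="success" pws="yes" id="w22" p="NN" headn="yes">success</w> <w l="barbie" pws="yes" id="w30"
p="NN" headn="yes">barbie</w></ng> <ng><enamex type="location"><w l="reynold" pws="yes" id="w37" p="NNP"
locname="single">Reynolds</w></enamex> <w l="sep" pws="yes" id="w46" p="NN" headn="yes">sep</w></ng>
<ng><timex type="date"><w pws="yes" id="w50" p="CD">1986</w></timex></ng> <ng><enamex type="organization"><w l="pari"
pws="yes" id="w55" p="NNP" locname="single">Paris</w> <w orgname="single" l="google" pws="yes" id="w61" p="NNP">Google</w>
<w l="limited" pws="yes" id="w68" p="NNP" common="true">Limited</w></enamex></ng></s></p>
</text>"""


def entity_texts(xml_string, entity_type):
    """Hypothetical helper: return the text of every <enamex type=entity_type>."""
    root = ET.fromstring(xml_string)
    results = []
    # ElementTree's limited XPath supports [@attr='value'] predicates.
    for enamex in root.iterfind(f".//enamex[@type='{entity_type}']"):
        # A multi-word entity (e.g. the organization) holds several <w> tokens.
        words = [w.text for w in enamex.iter("w")]
        results.append(" ".join(words))
    return results


print(entity_texts(XML, "location"))      # ['Reynolds']
print(entity_texts(XML, "organization"))  # ['Paris Google Limited']
```

Because the match is on the enamex type rather than on a word-level attribute, it keeps working if more <w> elements carry locname="single".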

For fun, here is one with regex:

import re

# Capture the word right after 'single">'; r is ['Reynolds', 'Paris']
r = [match.group(1) for match in re.finditer(r'single">(\w+)', xml)]
print r[0]  # Reynolds
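This small Python 3 sketch shows why regex is the wrong tool here. The pattern from the regex example only finds the word when the attributes appear in exactly that order and use double quotes. The three fragments below are made-up variants of the Reynolds element, not taken from the original data. An XML parser (the standard-library xml.etree.ElementTree, assumed here) reads all three the same way.

```python
import re
import xml.etree.ElementTree as ET

# The pattern from the regex example: the word right after 'single">'.
PATTERN = re.compile(r'single">(\w+)')

# Hypothetical fragments: equivalent XML, written three different ways.
variants = [
    '<w l="reynold" locname="single">Reynolds</w>',  # attribute order as in the question
    '<w locname="single" l="reynold">Reynolds</w>',  # same attributes, different order
    "<w l='reynold' locname='single'>Reynolds</w>",  # single-quoted attribute values
]

results = []
for fragment in variants:
    match = PATTERN.search(fragment)
    regex_result = match.group(1) if match else None
    parser_result = ET.fromstring(fragment).text
    results.append((regex_result, parser_result))
    print("regex:", regex_result, "| parser:", parser_result)

# The regex finds only the first variant; the parser finds all three.
```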
