Extracting data from RDF/XML files

Hello experts, you have been a great help to me with XSLT. Here is another problem: I am trying to extract data from RDF/XML files, and I don't know how to handle prefixed terms such as dcterms. The namespaces are declared in the XML file, but I don't know how to use them to extract the data. The XML file looks something like this:

<?xml version="1.0"?>
<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:dcterms="http://purl.org/dc/terms/"
  xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/"
  xmlns:foaf="http://xmlns.com/foaf/0.1/"
  xmlns="http://www.connotea.org/2005/01/schema#"
>
  
  <dcterms:URI rdf:about="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=17477949">
    <link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=17477949</link>
    <dc:title></dc:title>
    
    <tag>Formicidae</tag>
    <tag>RNA virus</tag>
    <tag>strand rna</tag>
    <tag>RNA viruses</tag>
    <tag>genome characteristics</tag>
    <tag>RNA polymerase</tag>
    <tag>genome structure</tag>
    <tag>Solenopsis invicta</tag>
    <tag>red imported fire ant</tag>
    <tag>pubmed</tag>
    <tag>picornaviridae</tag>
    <tag>Polycistronic</tag>
    <tag>helicase</tag>
    <tag>codons</tag>
    <tag>protease</tag>
    <tag>orf</tag>
    <tag>orientation</tag>
    <tag>cdna synthesis</tag>
    <tag>expressed sequence tag</tag>
    
    <postedBy>semant</postedBy>
    
    <postCount>1</postCount>
    <hash>34d77b6b622570e5a215702ff6d7156e</hash>
    <bookmarkID>830485</bookmarkID>
    <created>2007-05-05T22:58:43Z</created>
    <updated>2007-07-13T23:02:01Z</updated>
    <firstUser>semant</firstUser>
    
        <citation>
          <rdf:Description>
            <citationID>482422</citationID>
            <prism:title>A new positive-strand RNA virus with unique genome characteristics from the red imported fire ant, Solenopsis invicta.</prism:title>
            
            <foaf:maker>
              <foaf:Person>
                <foaf:name>Steven M Valles</foaf:name>
              </foaf:Person>
            </foaf:maker>
            
            <foaf:maker>
              <foaf:Person>
                <foaf:name>Charles A Strong</foaf:name>
              </foaf:Person>
            </foaf:maker>
            
            <foaf:maker>
              <foaf:Person>
                <foaf:name>Yoshifumi Hashimoto</foaf:name>
              </foaf:Person>
            </foaf:maker>
            
            <dc:date>2007-05-01T00:00:00Z</dc:date>
            
            <journalID>449933</journalID>
            <prism:publicationName>Virology</prism:publicationName>
            
            <prism:issn>0042-6822</prism:issn>
            
            <doiResolver rdf:resource="http://dx.doi.org/10.1016/j.virol.2007.03.043"/>
            <dc:identifier>doi:10.1016/j.virol.2007.03.043</dc:identifier>
            
            <pmidResolver rdf:resource="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=17477949"/>
            <dc:identifier>PMID: 17477949</dc:identifier>
            
          </rdf:Description>
        </citation>
    
    <rdfs:seeAlso rdf:resource="http://www.connotea.org/data/uri/34d77b6b622570e5a215702ff6d7156e" /> <!-- GET this URI to retrieve further information -->
  </dcterms:URI>


I simply want to extract data from this file so that the output looks something like this:

<uri>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=17477949</uri>
<title>A new positive-strand RNA virus with unique genome characteristics from the red imported fire ant, Solenopsis invicta.</title>
<author>Steven M Valles</author>
<author>Charles A Strong</author>
<author>Yoshifumi Hashimoto</author>
<PubmedID>PMID: 17477949</PubmedID>


I have worked with XML files before, but I always excluded the namespaces. I don't know how to extract data when namespaces are present.
There is much more data than this; I am showing only a snapshot of it. I am also going to generalize the code, so the extraction should not be specific to this file.
Any help is greatly appreciated.

Thank you,
Sammed

smandape

Hello experts, any ideas on this, please?

smandape

I have a problem with your XML.

My parser has a problem with

&db in

<dcterms:URI rdf:about="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=17477949"/>
xml_looser
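
The error reported above is consistent with the XML rule that a literal & must be written as &amp; in element text and attribute values, so the rdf:about, link, and pmidResolver URLs in the posted file are not well-formed as shown. The cleanest fix is to escape them where the file is produced. If that is not possible, a rough pre-processing step can be sketched as below. It is written in Python rather than XSLT, because an XSLT processor cannot read the file at all until it is well-formed. The function names and the regular expression are illustrative assumptions, and the approach would wrongly alter a raw & inside a CDATA section or comment.

```python
import re
import xml.etree.ElementTree as ET

# Matches "&" that does not already start an entity or character reference
# such as &amp; or &#38; or &#x26; (illustrative pattern, not a full XML lexer).
BARE_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(text):
    """Replace each raw '&' with '&amp;', leaving existing references alone."""
    return BARE_AMP.sub("&amp;", text)


def parses(text):
    """Return True if the text is well-formed XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# The <link> element from the posted file, with its raw ampersands.
BROKEN = (
    "<link>http://www.ncbi.nlm.nih.gov/entrez/query.fcgi"
    "?cmd=Retrieve&db=pubmed&dopt=Abstract&list_uids=17477949</link>"
)

fixed = escape_bare_ampersands(BROKEN)

# After parsing, the text holds the plain URL again, with single "&" characters.
link_text = ET.fromstring(fixed).text
```

Here parses(BROKEN) is False and parses(fixed) is True, and the parser hands back the original URL because &amp; is decoded on read.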

I got the same error at that line too, and I don't know what to do. I am still trying to figure it out.
Thank you,
Sammed

smandape
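
On the namespace question itself: once the ampersands are escaped, a namespace-aware query works by binding each prefix to the URI declared in the file and using those prefixes in the path. The sketch below uses Python's standard xml.etree.ElementTree as a stand-in for XSLT, so it is not the stylesheet the original post asks for. The "con" prefix, the function names, and the <record> wrapper element are made up for illustration, and SAMPLE is a trimmed copy of the posted file with its ampersands escaped. The same idea carries over to an XSLT 1.0 stylesheet: declare the same namespace URIs on xsl:stylesheet and use the prefixes in the XPath expressions. Unprefixed names in XPath 1.0 match only elements in no namespace, so elements in the Connotea default namespace (link, tag, citation) need a prefix there too.

```python
import xml.etree.ElementTree as ET

# Prefix -> namespace URI, copied from the xmlns declarations in the post.
# "con" is a made-up prefix: the file declares that URI as the default
# namespace, so unprefixed elements such as <citation> belong to it.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "prism": "http://prismstandard.org/namespaces/1.2/basic/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "con": "http://www.connotea.org/2005/01/schema#",
}
# ElementTree stores namespaced attributes under "{uri}localname" keys.
RDF_ABOUT = "{%s}about" % NS["rdf"]

SAMPLE = """<rdf:RDF
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:dcterms="http://purl.org/dc/terms/"
  xmlns:prism="http://prismstandard.org/namespaces/1.2/basic/"
  xmlns:foaf="http://xmlns.com/foaf/0.1/"
  xmlns="http://www.connotea.org/2005/01/schema#">
  <dcterms:URI rdf:about="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?cmd=Retrieve&amp;db=pubmed&amp;dopt=Abstract&amp;list_uids=17477949">
    <citation>
      <rdf:Description>
        <prism:title>A new positive-strand RNA virus with unique genome characteristics from the red imported fire ant, Solenopsis invicta.</prism:title>
        <foaf:maker><foaf:Person><foaf:name>Steven M Valles</foaf:name></foaf:Person></foaf:maker>
        <foaf:maker><foaf:Person><foaf:name>Charles A Strong</foaf:name></foaf:Person></foaf:maker>
        <foaf:maker><foaf:Person><foaf:name>Yoshifumi Hashimoto</foaf:name></foaf:Person></foaf:maker>
        <dc:identifier>doi:10.1016/j.virol.2007.03.043</dc:identifier>
        <dc:identifier>PMID: 17477949</dc:identifier>
      </rdf:Description>
    </citation>
  </dcterms:URI>
</rdf:RDF>
"""


def extract_bookmarks(root):
    """Return one dict per dcterms:URI element under the rdf:RDF root."""
    records = []
    for uri in root.findall("dcterms:URI", NS):
        title, authors, pubmed_id = None, [], None
        desc = uri.find("con:citation/rdf:Description", NS)
        if desc is not None:
            title = desc.findtext("prism:title", default=None, namespaces=NS)
            authors = [
                name.text
                for name in desc.findall("foaf:maker/foaf:Person/foaf:name", NS)
            ]
            # Several dc:identifier elements may exist; keep the PMID one.
            for ident in desc.findall("dc:identifier", NS):
                if (ident.text or "").startswith("PMID"):
                    pubmed_id = ident.text
                    break
        records.append({
            "uri": uri.get(RDF_ABOUT),
            "title": title,
            "authors": authors,
            "pubmed_id": pubmed_id,
        })
    return records


def to_xml(record):
    """Serialize one record in the shape requested in the post, wrapped in a
    hypothetical <record> element so that the output has a single root."""
    out = ET.Element("record")
    ET.SubElement(out, "uri").text = record["uri"]
    ET.SubElement(out, "title").text = record["title"]
    for author in record["authors"]:
        ET.SubElement(out, "author").text = author
    ET.SubElement(out, "PubmedID").text = record["pubmed_id"]
    return ET.tostring(out, encoding="unicode")


records = extract_bookmarks(ET.fromstring(SAMPLE))
for record in records:
    print(to_xml(record))
```

Because the loop runs over every dcterms:URI child of the root, the same code should handle a file with many bookmarks, as long as they follow the structure shown in the post.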
