Hello experts,
I am trying to extract data from an XML file using XSLT. I want to write a general stylesheet that can handle similar XML files that differ slightly from each other.

The XML I am working on can have the following four scenarios for the [FUNCTION] field, which is the one I am trying to extract. [FUNCTION] may be in the middle, at the start, or at the end of the comment. If I try to use tokenize with the delimiter ';', the problem is that a ';' sometimes appears inside the [FUNCTION] statement itself, as it does here:

<GBSeq_comment>On or before Feb 16, 2007 this sequence version replaced gi:121945493, gi:121751.; [FUNCTION] Facilitative glucose transporter. This isoform may be responsible for constitutive or basal glucose uptake. Has a very broad substrate specificity; can transport a wide range of aldoses including both pentoses and hexoses.; [SUBCELLULAR LOCATION] Cell membrane; Multi-pass </GBSeq_comment>

or

<GBSeq_comment>On or before Feb 16, 2007 this sequence version replaced gi:121945493, gi:121751.; [FUNCTION] Facilitative glucose transporter. This isoform may be responsible for constitutive or basal glucose uptake. Has a very broad substrate specificity; can transport a wide range of aldoses including both pentoses and hexoses.</GBSeq_comment>

or

<GBSeq_comment>[FUNCTION] Facilitative glucose transporter. This isoform may be responsible for constitutive or basal glucose uptake. Has a very broad substrate specificity; can transport a wide range of aldoses including both pentoses and hexoses.</GBSeq_comment>

or

<GBSeq_comment>[FUNCTION] Facilitative glucose transporter. This isoform may be responsible for constitutive or basal glucose uptake. Has a very broad substrate specificity; can transport a wide range of aldoses including both pentoses and hexoses.; [SUBCELLULAR LOCATION] Cell membrane; Multi-pass </GBSeq_comment>

I want to write code that works for all four of these. I have the following XSLT code that works for scenarios 1 and 3 (thanks to xml_looser), but it doesn't work for 2 and 4.

The code is:

<xsl:for-each select="GBSeq_comment">
    <field name="protein_function">
        <xsl:choose>
            <xsl:when test="contains(.,'[FUNCTION]') and contains(.,'; [')">
                <xsl:value-of select="substring-before(substring-after(.,'; [FUNCTION] '),'; [')"/>
            </xsl:when>
            <xsl:when test="contains(.,'[FUNCTION] ')">
                <xsl:value-of select="substring-after(.,'[FUNCTION] ')"/>
            </xsl:when>
        </xsl:choose>
    </field>
</xsl:for-each>

Could anyone please help me? I greatly appreciate your help and your time.
Thank you,
Sammed

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
	<xsl:template match="/">
		<xsl:apply-templates select="root"/>
	</xsl:template>
	<xsl:template match="root">
		<xsl:apply-templates select="GBSeq_comment"/>
	</xsl:template>
	<xsl:template match="GBSeq_comment">
		<xsl:choose>
			<!-- [FUNCTION] text ends with a period and is followed by a [SUB...] section; the final period is dropped -->
			<xsl:when test="contains(.,'[FUNCTION] ') and contains(.,'.; [SUB')">
				<xsl:value-of select="substring-before(substring-after(.,'[FUNCTION] '),'.; [SUB')"/>
			</xsl:when>
			<!-- [FUNCTION] is preceded by '; ' and followed by a [SUB...] section with no period before it -->
			<xsl:when test="contains(.,'; [FUNCTION] ') and contains(.,'; [SUB')">
				<xsl:value-of select="substring-before(substring-after(.,'; [FUNCTION] '),'; [SUB')"/>
			</xsl:when>
			<!-- [FUNCTION] is the last section: take everything after the label -->
			<xsl:when test="contains(.,'[FUNCTION] ')">
				<xsl:value-of select="substring-after(.,'[FUNCTION] ')"/>
			</xsl:when>
		</xsl:choose>
	</xsl:template>
</xsl:stylesheet>
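
The stylesheet above ends the match at the literal '[SUB', so it only works when [SUBCELLULAR LOCATION] is the section that follows. A possible generalization, offered only as a sketch: first take everything after '[FUNCTION] ', then cut at the next '; [' if there is one. This assumes every following section starts with '; [' and that the [FUNCTION] text itself never contains '; ['. The `fields` wrapper element and the `//GBSeq_comment` selection are illustrative choices, not something taken from the original files.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
	<xsl:output method="xml" indent="yes"/>
	<xsl:template match="/">
		<!-- 'fields' is a hypothetical wrapper so the output has a single root element -->
		<fields>
			<xsl:apply-templates select="//GBSeq_comment"/>
		</fields>
	</xsl:template>
	<xsl:template match="GBSeq_comment">
		<!-- Everything after the label; empty if the comment has no [FUNCTION] section -->
		<xsl:variable name="tail" select="substring-after(., '[FUNCTION] ')"/>
		<field name="protein_function">
			<xsl:choose>
				<!-- Another section follows: keep only the text before its '; [' opener -->
				<xsl:when test="contains($tail, '; [')">
					<xsl:value-of select="substring-before($tail, '; [')"/>
				</xsl:when>
				<!-- [FUNCTION] is the last section: keep the whole remainder -->
				<xsl:otherwise>
					<xsl:value-of select="$tail"/>
				</xsl:otherwise>
			</xsl:choose>
		</field>
	</xsl:template>
</xsl:stylesheet>
```

Because the cut point is '; [' rather than a bare ';', the semicolon in "specificity; can transport" is left alone. The position of [FUNCTION] (start, middle, or end) no longer matters, since the label is located before the end delimiter is searched for. Unlike the first branch of the stylesheet above, this keeps the final period of the [FUNCTION] text.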