For example, the HTML file is as follows:
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=gb2312" />
<title>HTML form </title>
</head>
</html>
Then, after running it through htmlparser, the XML file should be as follows:
<?xml version="1.0" encoding="utf-8" ?>
<pagestructure>
<pageNodeList>
<pageNode id="0" name="" tagName="html" parentNodeId="-1">
<nodeValue />
</pageNode>
<pageNode id="1" name="" tagName="head" parentNodeId="0">
<nodeValue />
</pageNode>
<pageNode id="2" name="" tagName="meta" parentNodeId="1">
<nodeAttributeList>
<nodeAttribute id="0" attributeName="http-equiv" attributeValue="Content-Type" />
<nodeAttribute id="1" attributeName="content" attributeValue="text/html; charset=gb2312" />
</nodeAttributeList>
<nodeValue />
</pageNode>
<pageNode id="3" name="" tagName="title" parentNodeId="1">
<nodeValue>HTML form</nodeValue>
</pageNode>
</pageNodeList>
</pagestructure>
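A rough sketch of the numbering shown above, written with Python's standard html.parser and xml.etree.ElementTree rather than the htmlparser library itself. The class name PageStructureBuilder, the function html_to_page_structure and the VOID_TAGS set are hypothetical names made up for this illustration, and putting the stripped text of an element into nodeValue is an assumption about the intended format. Ids are handed out in document order, and the parent id is taken from a stack of currently open elements.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Assumption: tags that have no closing tag in HTML and so never become parents.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class PageStructureBuilder(HTMLParser):
    """Hypothetical helper: builds the pagestructure tree while parsing."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("pagestructure")
        self.node_list = ET.SubElement(self.root, "pageNodeList")
        self.next_id = 0   # next pageNode id, assigned in document order
        self.stack = []    # (tagName, id, pageNode element) of open elements

    def handle_starttag(self, tag, attrs):
        # The parent is the innermost open element, or -1 at the top level.
        parent_id = self.stack[-1][1] if self.stack else -1
        node = ET.SubElement(self.node_list, "pageNode", {
            "id": str(self.next_id),
            "name": "",
            "tagName": tag,
            "parentNodeId": str(parent_id),
        })
        if attrs:
            attr_list = ET.SubElement(node, "nodeAttributeList")
            for i, (name, value) in enumerate(attrs):
                ET.SubElement(attr_list, "nodeAttribute", {
                    "id": str(i),
                    "attributeName": name,
                    "attributeValue": value or "",
                })
        ET.SubElement(node, "nodeValue")
        if tag not in VOID_TAGS:
            self.stack.append((tag, self.next_id, node))
        self.next_id += 1

    def handle_endtag(self, tag):
        # Close the nearest matching open element; ignore unmatched end tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Assumption: non-blank text is stored in the innermost node's nodeValue.
        text = data.strip()
        if text and self.stack:
            value = self.stack[-1][2].find("nodeValue")
            value.text = (value.text or "") + text


def html_to_page_structure(html):
    builder = PageStructureBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


sample = """<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=gb2312" />
<title>HTML form </title>
</head>
</html>"""

root = html_to_page_structure(sample)
print(ET.tostring(root, encoding="unicode"))
```

With the sample page this gives html, head, meta and title the ids 0 to 3, with meta and title both pointing at head as their parent.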
That is to say, I want to give each pageNode an id and record it in the XML.
If any of you has sample code, please share it with me. Your suggestions are greatly appreciated.
Thanks
luoyi2008061424