I am trying to extract the important text, and only the important text, from an RSS feed. I have determined that all of the items in the feed have the important text underneath a tag that starts
<div id="important text"> ...
In some of the files the important text is under <p> tags that are all children of the above <div> tag. In others all the important text is under <div> tags that are children of the <div id="important text"> tag. It is easy enough to extract the text, but there are cases where the formatting gets messed up. These are cases where the important text, while under the "first level" div tags, is actually inside <div> tags that are children of <div> tags that are children of the <div id="important text"> tag. Sometimes the important text is actually in great-grandchildren.
In order to select the tags where it is useful to put in line breaks, I thought to count the number of text characters that live within "first level" div tags but do not live within "second level" div tags, those that live within "second level" div tags but not within "third level" div tags, etc. I am enough of a mathematician to realize that all I really need to do is to find out how many text characters live within first level div tags, how many live within second level div tags, how many live within third level div tags, etc. The "within this level but not the next" counts are then just differences of consecutive totals.
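To make the counting concrete, here is a minimal sketch of what I have in mind, assuming Python and its standard-library html.parser module. The names DivDepthCounter and count_by_depth are made up for illustration, and the sample markup is invented, not taken from the real feed. Level 0 is the <div id="important text"> tag itself, level 1 is the "first level" div tags, and so on. Whitespace counts as text here, and malformed markup is not handled.

```python
from collections import Counter
from html.parser import HTMLParser


class DivDepthCounter(HTMLParser):
    """Hypothetical helper: count text characters per <div> nesting level."""

    def __init__(self, root_id):
        super().__init__()
        self.root_id = root_id
        self.depth = -1          # -1 means we are outside the root div
        self.counts = Counter()  # level -> characters at that level or deeper

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return  # only <div> tags change the nesting level
        if self.depth >= 0:
            self.depth += 1
        elif dict(attrs).get("id") == self.root_id:
            self.depth = 0  # entered the root div

    def handle_endtag(self, tag):
        if tag == "div" and self.depth >= 0:
            self.depth -= 1

    def handle_data(self, data):
        # Text at depth d lives within every div level from 0 through d.
        for level in range(self.depth + 1):
            self.counts[level] += len(data)


def count_by_depth(html, root_id):
    parser = DivDepthCounter(root_id)
    parser.feed(html)
    parser.close()  # flush any buffered text
    return dict(parser.counts)


# Invented sample, not real feed content.
sample = ('<div id="important text"><p>abc</p>'
          '<div>de<div>fgh</div></div></div><div>zz</div>')
print(count_by_depth(sample, "important text"))  # {0: 8, 1: 5, 2: 3}
```

Subtracting consecutive entries gives the characters that live at exactly one level: here 8 - 5 = 3 directly under the root, 5 - 3 = 2 in first level div tags only, and 3 in second level div tags.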
To cut a long story short, I would be quite happy to count the number of text characters contained within a specific tag and all of its descendants. How do I do that?
I promise to keep reading and trying to solve it myself while you are thinking about helping me.