XML to RDF

My requirement is to parse an HTML document (which will be an article) and come up with a few keywords that describe what the document is about.

For example, if a professor writes an essay about, say, information security, I want to parse this HTML document and come up with keywords such as information security, secure data storage, etc.

Are there any tools available for such parsing? Are there any algorithms used to parse a document and come up with keywords?

VJ.

2 contributors, 5 replies, 3-day discussion span, last updated 1 year ago, 6 views
vijiraghs
Light Poster
30 posts since Mar 2011
Reputation Points: 10
Solved Threads: 0
Skill Endorsements: 0

Maybe this article can help. Or this tool.

pritaeas
Posting Prodigy
Moderator
9,543 posts since Jul 2006
Reputation Points: 1,194
Solved Threads: 1,495
Skill Endorsements: 98

I read both of them. One tool parses the entire document and lists the word density. I could use this tool if I assume that the article is about the word that occurs most often, but that assumption will not always hold. So if there are any other tools that might be useful, please let me know.
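For reference, the word-density idea can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not the linked tool's actual method: the `TextExtractor` class, the `word_density` function, and the tiny `STOP_WORDS` set are all names assumed here for the example.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "for",
              "on", "it", "that", "this", "with", "as", "be", "are"}


class TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def word_density(html, top_n=5):
    """Return the top_n (word, share of all counted words) pairs."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    # Keep alphabetic words only and drop stop words before counting.
    words = [w for w in re.findall(r"[a-z]+", text) if w not in STOP_WORDS]
    total = len(words)
    return [(word, count / total)
            for word, count in Counter(words).most_common(top_n)]
```

As the post notes, the most frequent word is only a rough guess at the topic: a sketch like this counts single words, so a phrase such as "secure data storage" would never surface as one keyword.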

VJ.

vijiraghs
pritaeas

You are my saviour... Gracias, señor!

VJ

vijiraghs

The following is my requirement.

The user writes a blog post. It contains a heading, a few keywords/tags (which the user specifies explicitly), and the content. The user then submits the post.

Before getting stored, the content section of the blog must be scanned using

http://www.alchemyapi.com/api/keyword/

This will yield, say, 10 keywords. Then I have to check if the keywords returned by the scanner are semantically related to the ones that the user has explicitly tagged. If they are, then they are included in a separate field in the database.

* I will have a **semantic library containing the keywords related to the keywords tagged by the user.

** I think "WordNet" will be sufficient, but most of the posts on my website will be computer science oriented. So, if there are any semantic libraries meant for computer science, please let me know.
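The filtering step described above might look roughly like the sketch below. It assumes the keyword scanner's output is already available as a plain list of strings, and it uses a small hand-built table, `RELATED_TERMS`, as a hypothetical stand-in for a semantic library such as WordNet. The function names and the table contents are invented for illustration only.

```python
# Hypothetical hand-built table standing in for a semantic library such as
# WordNet: each tag maps to the terms treated as related to it.
RELATED_TERMS = {
    "information security": {"encryption", "secure data storage",
                             "access control"},
    "databases": {"sql", "indexing", "transactions"},
}


def normalize(term):
    """Lower-case a term and collapse runs of whitespace."""
    return " ".join(term.lower().split())


def related_keywords(extracted, user_tags, related_terms=RELATED_TERMS):
    """Return extracted keywords that equal or relate to any user tag.

    Order of first appearance is preserved and duplicates are dropped.
    """
    allowed = set()
    for tag in user_tags:
        key = normalize(tag)
        allowed.add(key)
        allowed.update(normalize(t) for t in related_terms.get(key, ()))

    kept = []
    for keyword in extracted:
        norm = normalize(keyword)
        if norm in allowed and norm not in kept:
            kept.append(norm)
    return kept
```

The returned list is what would go into the separate database field. An exact-match lookup like this is deliberately simple; a real semantic library would also have to handle synonyms and near matches that a fixed table misses.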

Also, I want to know if I can achieve my requirement using "Protégé". If not, are there any other tools or platforms with which I can do the above?

vijiraghs

© 2013 DaniWeb® LLC