
Text Extraction-Categorization-Summarization

Hello everybody. I'm new to the forum, and lately I've been stressed out over my project.
I wanted to know whether Perl would be a good choice for implementing an automatic program that extracts text from online biomedical journals (which can be in PDF or HTML format), categorizes and summarizes the text, and stores it in a database.
Thanks

prgmkevin
Newbie Poster
2 posts since Jan 2008

Perl would be a very good language to do this in. I worked on a school project last semester where I grabbed news articles off the internet and stored them in a database so that another program could analyze them. The hard part is getting the information you want out of the pages. You might try looking at the HTML::Parser Perl module to get you started.
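Perl's HTML::Parser works by calling handler routines as it meets start tags, end tags, and text. Python's standard html.parser.HTMLParser uses the same event-driven model, so the idea can be sketched in Python: collect the text events, drop anything inside <script> or <style>, and insert a space at block-level tags so words from adjacent paragraphs don't run together. The class name, the tag sets, and the sample page are illustrative assumptions, not part of the original project, and real journal pages would need site-specific rules to pick out just the article body.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from an HTML page, skipping <script> and <style>."""

    # Assumed tag sets; extend them for the pages you actually scrape.
    SKIP_TAGS = {"script", "style"}
    BLOCK_TAGS = {"p", "div", "br", "li", "tr", "td", "title",
                  "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()        # convert_charrefs=True turns &lt; etc. into text
        self._skip_depth = 0      # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self._chunks.append(" ")   # keep block elements from gluing together

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in self.BLOCK_TAGS:
            self._chunks.append(" ")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Collapse all runs of whitespace into single spaces.
        return " ".join("".join(self._chunks).split())


def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    page = """<html><head><title>Abstract</title>
    <style>p { color: red; }</style></head>
    <body><h1>Results</h1><p>Gene X was <b>upregulated</b>.</p>
    <script>var x = 1;</script></body></html>"""
    print(extract_text(page))  # prints: Abstract Results Gene X was upregulated.
```

Inline tags such as <b> are deliberately not treated as boundaries, so "upregulated." stays attached to its period.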

stupidenator
Junior Poster
192 posts since Mar 2005

Extracting data from a PDF file could be difficult. I have seen many questions posted on forums over the years dealing with PDF files, and it seems the Perl modules available for reading them leave something to be desired. The rest should be relatively easy, but probably not for a beginner.
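"The rest" here means categorizing, summarizing, and storing text that has already been extracted. A minimal Python sketch of that part follows, under loud assumptions: the keyword lists, stopword list, table schema, and function names are all hypothetical; categorization is plain keyword counting rather than a trained classifier; and the summary is a naive word-frequency extractive method. SQLite (Python's built-in sqlite3 module) stands in for whatever database the project actually uses.

```python
import re
import sqlite3
from collections import Counter

# Hypothetical keyword lists; a real project would use a proper vocabulary
# or a trained classifier instead of hand-picked words.
CATEGORIES = {
    "oncology": {"tumor", "cancer", "carcinoma", "metastasis"},
    "cardiology": {"heart", "cardiac", "artery", "hypertension"},
    "genetics": {"gene", "genome", "mutation", "dna"},
}

STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "was", "were",
             "with", "for", "on", "by", "that", "this"}


def words(text):
    """Lower-cased alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def categorize(text):
    """Return the category whose keywords occur most often, or 'uncategorized'."""
    counts = Counter(words(text))
    scores = {cat: sum(counts[k] for k in kws) for cat, kws in CATEGORIES.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "uncategorized"


def summarize(text, max_sentences=2):
    """Naive extractive summary: keep the sentences whose words are most frequent overall."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(w for w in words(text) if w not in STOPWORDS)

    def score(sentence):
        return sum(freq[w] for w in words(sentence) if w not in STOPWORDS)

    # Rank sentence indices by score (stable, so ties keep document order),
    # then restore the original order for the selected sentences.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


def store(conn, title, text):
    """Categorize, summarize, and insert one article into an 'articles' table."""
    conn.execute("""CREATE TABLE IF NOT EXISTS articles (
        id INTEGER PRIMARY KEY, title TEXT, category TEXT, summary TEXT, body TEXT)""")
    conn.execute(
        "INSERT INTO articles (title, category, summary, body) VALUES (?, ?, ?, ?)",
        (title, categorize(text), summarize(text), text),
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    store(conn, "Example abstract",
          "The tumor grew quickly. Cancer cells spread to the liver. Patients were treated.")
    for row in conn.execute("SELECT title, category, summary FROM articles"):
        print(row)
```

Keeping the three steps as separate functions makes it easy to swap in a better categorizer or summarizer later without touching the storage code.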

KevinADC
Posting Shark
921 posts since Mar 2006

Thanks. This is my university project and I'm completely lost. I've been trying to study and learn Perl, but it's a challenging task and I have a time limit. What if I switch to PHP? And can a Perl script be run from PHP?
I mean using a Perl script as a backend to handle some tasks, like the parsing, with all the other tasks accomplished through PHP.

prgmkevin
Newbie Poster
2 posts since Jan 2008

If you have PHP questions, ask them in the PHP forum, but I am sure a PHP script can run a Perl program. If you are at a total loss as to how to even start your project, though, I really have no idea what to suggest.
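The front-end/back-end split described in this thread comes down to one program launching another and exchanging data over standard input and output. In PHP, functions such as shell_exec() and proc_open() fill that role; the same pattern is sketched here with Python's subprocess module so it can be run and checked. The command ["perl", "parse.pl"] mentioned in the docstring is a hypothetical script name, and the demo uses a tiny Python one-liner as a stand-in backend so it runs without Perl installed.

```python
import subprocess
import sys


def run_backend(command, text, timeout=60):
    """Send `text` to a backend program on stdin and return what it prints.

    `command` is an argument list, e.g. ["perl", "parse.pl"] (hypothetical
    script). Passing a list avoids shell quoting problems. Raises
    subprocess.CalledProcessError if the backend exits with a non-zero status.
    """
    result = subprocess.run(
        command,
        input=text,           # delivered to the backend's stdin
        capture_output=True,  # collect stdout and stderr
        text=True,            # work with str instead of bytes
        timeout=timeout,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Stand-in backend: a one-line Python program that upper-cases its input.
    backend = [sys.executable, "-c",
               "import sys; sys.stdout.write(sys.stdin.read().upper())"]
    print(run_backend(backend, "parsed by the backend"))
```

Exchanging plain text (or a simple format such as one record per line) over stdin/stdout keeps the two halves independent, so the parsing backend can be written in Perl while the rest of the application stays in PHP.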

KevinADC
Posting Shark
921 posts since Mar 2006

