I'm looking for a tool that can read text on a web page, or within a web-based document viewer, and insert a polling plugin of sorts at the end of each paragraph.

More specifically, I would like my website to display documents inside some kind of reader. Then, I'd like the ability to automatically insert a 'thumbs-up' or 'thumbs-down' button at the end of each paragraph or sentence. I know this could be done manually on a standard web page, but I manage lengthy documents that multiple people collaborate on, so a manual solution would be too time-consuming.

I am not an IT professional, but I've been told this would be a text-parsing type of solution and that the Perl community may be able to advise whether such a thing exists, or suggest viable alternatives to what I'm trying to achieve.

Any insight would be appreciated.

Hello,

You could of course do this using Perl running as a CGI script on a web server. But as I understand your idea, I would rather use PHP for this task. For this kind of exercise, PHP has a much gentler learning curve.

How I would approach it: Create a "tunnel" website where you can navigate to a URL by typing it into a standard text input box. As the form's handler script, use whatever you end up choosing, either a CGI script or a PHP script that does the processing. The script would then parse the web page's DOM and split it according to criteria you specify. This is non-trivial and depends a lot on the raw material that you get off the web. Not all web pages are well-formed; some have syntax errors. Paragraphs could be separated from each other with <p>, <div>, <br> or whatever other element you can think of.

Now you would insert your polling code after each of the elements that you got from the previous step, reassemble it and pipe it out to the viewer.
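A minimal sketch of the parse, insert and reassemble steps described above. It is written in Python with only the standard library, purely for illustration; the same idea carries over to Perl or PHP. The poll markup, the class name PollInserter and the function insert_polls are all hypothetical, and the sketch assumes paragraphs are marked with explicit <p>...</p> pairs. Pages that separate paragraphs with <div> or <br>, or that omit closing tags, would need extra handling.

```python
from html.parser import HTMLParser

# Hypothetical poll markup. A real widget would also need client-side
# script and a server endpoint to record the votes.
POLL_TEMPLATE = (
    '<span class="poll" data-paragraph="{n}">'
    '<button name="vote" value="up">thumbs-up</button>'
    '<button name="vote" value="down">thumbs-down</button>'
    '</span>'
)


class PollInserter(HTMLParser):
    """Re-emits the HTML it is fed, adding a poll before each </p>."""

    def __init__(self):
        # Keep entities such as &amp; untouched so they can be re-emitted as-is.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.count = 0

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == "p":
            # End of a paragraph: number it and append the poll inside it.
            self.count += 1
            self.out.append(POLL_TEMPLATE.format(n=self.count))
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def insert_polls(html):
    """Return the given HTML with a poll appended to every closed <p>."""
    parser = PollInserter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    print(insert_polls("<p>First paragraph.</p><p>Second paragraph.</p>"))
```

The data-paragraph number gives each poll a stable identifier, so votes could later be tied back to a specific paragraph of the document.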

Not rocket science and, as I mentioned, most likely easier to do in PHP. But it is definitely possible with Perl as well.

Thanks for the insight, maba001. I think I see where you're headed with your approach. With your solution, how would you handle large documents so that my colleagues could navigate them easily?

From my point of view, the easiest thing would be to upload my files (in .doc or .pdf) to my site and have them accessed through a document viewer, or something else that allows quick page turning or scrolling without rendering a new page each time the reader advances. However, I'm guessing this would leave no page elements in the DOM for the script to recognize, and so cripple the solution. Am I right?

If I am correct in my assessment, do you know of a way to upload a document, publish it in a form that preserves the page elements, yet offers easy/quick navigation?

Thanks,

PK

Hello PK,

This is not a simple question. An average web server would not really have a problem parsing a large website, so I don't think it is a resource question.

The key is to define the requirements and think about a solution for each individual requirement. One part of the requirements is a specification of the expected document size. Then you would specify whether large documents are passed through as one big file for the reviewers to work on, or whether you want to artificially split the original into multiple documents so that each reviewer works on part of the file.

Bottom line: it all depends on how you specify what you want to achieve.

Best regards
Maba

Thanks, Maba. I'm looking into different solutions to handle the document as we speak. I appreciate your input as it is helping me narrow down the field.

PK
