
I'm helping a friend with a research project. She has a list of websites (~100) and wants to download all the text from several pages of each site for use in another program. She is not technology-savvy (she has never used Linux or programmed before), but she is very smart and willing to learn. She originally planned to spend several weeks copy-pasting by hand, but she has now been convinced to try a computational solution. Is there any tool you would suggest?

Thanks,

Agilemind


The easiest way would probably be to set up a MySQL database and run a simple PHP crawler, or, since it is only the text you want, simply fetch each page with cURL.
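If she goes the cURL route, something along these lines would do it. This is only a minimal sketch, assuming PHP with the cURL extension installed; the URLs, the output file names, and the simple strip_tags() cleanup are placeholders for whatever actually suits her sites:

    <?php
    // List of pages to download -- replace with the real URLs.
    $urls = [
        'http://example.com/page1',
        'http://example.com/page2',
    ];

    foreach ($urls as $i => $url) {
        // Fetch the raw HTML with cURL.
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the page as a string
        curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);  // follow redirects
        $html = curl_exec($ch);
        curl_close($ch);

        if ($html === false) {
            echo "Failed to fetch $url\n";
            continue;
        }

        // Strip the HTML tags so only the text is left, then save it
        // as a plain .txt file that her analysis program can read.
        $text = strip_tags($html);
        file_put_contents("page_$i.txt", $text);
    }

One caveat: strip_tags() keeps everything between the tags, including the contents of script and style blocks, so the output may need a little manual cleanup, but it gets each page into a plain text file.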

I do, however, have some concerns about copyright infringement and plagiarism. Even for a research paper, there is no reason for your friend to be copying entire pages' worth of information, especially not from 100 or more sites. To research effectively, she should find the relevant facts and statistics on each page and then cite the source they were taken from in the appropriate format.

Could you elaborate a bit more, please? I might be misunderstanding, but it does seem that she is plagiarising or researching incorrectly.


Thanks,

She is comparing the text from various websites to find common and differing themes, messages, patterns, etc. in how a particular subject is presented. She has a program she is familiar with for the text analysis, but it requires either a plain text document or a Word document as input.

