I want to be able to fetch the HTML of many different sites and parse it to pull out the article text. It can't be specific to one site, as I want to use it for many sites. What would be the best way of doing this?
Answered by diafol:
First, make sure you have permission to use data from these sites.
Once that's cleared up you have a number of options.
Those sites may have an API/XML/RSS/REST interface for extracting data. If so, use that; it will be more reliable than trying to scrape data from a page.
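To illustrate the feed option, here is a minimal sketch in Python (the thread doesn't name a language, so that choice is an assumption). It reads a standard RSS 2.0 feed using only the standard library. The function names, the sample feed, and the User-Agent string are made up for the example, and real feeds vary (Atom feeds, namespaces, summary-only descriptions), so treat it as a starting point rather than a finished solution.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical sample feed, used only to demonstrate the parser offline.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>First article</title>
      <link>https://example.com/articles/1</link>
      <description>Text of the first article.</description>
    </item>
    <item>
      <title>Second article</title>
      <link>https://example.com/articles/2</link>
      <description>Text of the second article.</description>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_data):
    """Return a list of dicts (title, link, description) from RSS 2.0 XML."""
    root = ET.fromstring(xml_data)
    items = []
    # Every article in an RSS 2.0 feed is an <item> element under <channel>.
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "description": (item.findtext("description") or "").strip(),
        })
    return items


def fetch_rss_items(feed_url, timeout=10):
    """Download a feed and parse it. Only call this for sites you may use."""
    request = urllib.request.Request(
        feed_url,
        headers={"User-Agent": "article-fetcher-example/0.1"},  # assumed name
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_rss_items(response.read())


if __name__ == "__main__":
    for entry in parse_rss_items(SAMPLE_FEED):
        print(entry["title"], "-", entry["link"])
```

Because every RSS 2.0 feed uses the same item/title/link/description layout, the same function works across sites without per-site rules, which is what makes a feed more dependable than scraping each page's markup.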