I want to be able to get the HTML of many different sites and parse it to pull out the article text. It can't be specific to one site, as I want to use it for many sites. What would be the best way of doing this?
First, make sure you have permission to use data from these sites.
Once that's cleared up, you have a number of options.
Those sites may offer an API or a structured feed (XML/RSS/REST) for extracting data. If so - use that - it will be more reliable than trying to scrape data from a page.
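As a rough sketch of the feed route, here's how you could read an RSS 2.0 feed with PHP's SimpleXML. The feed XML is inlined for illustration (the site name and URLs are made up); in practice you'd fetch it from the site's advertised feed URL first:

```php
<?php
// Minimal sketch: parsing an RSS 2.0 feed with SimpleXML.
// The feed below is a hard-coded example; normally you would download
// the real feed from the site's feed URL.
$rss = <<<XML
<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Site</title>
    <item>
      <title>First article</title>
      <link>http://example.com/first</link>
      <description>Abstract of the first article.</description>
    </item>
  </channel>
</rss>
XML;

$feed = simplexml_load_string($rss);
foreach ($feed->channel->item as $item) {
    // Each <item> gives you structured fields - no fragile HTML scraping.
    echo $item->title, "\n";
    echo $item->link, "\n";
    echo $item->description, "\n";
}
```

Because the feed hands you title, link and abstract as separate fields, none of the per-site HTML guesswork below is needed.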
Otherwise, you'll be looking at using file_get_contents or cURL. BUT beware - pulling in remote content is a potential security issue unless you lock down and sanitise all the data properly.
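A hedged sketch of that approach: fetch the page with cURL, then pull out likely article text with DOMDocument and XPath. The heuristic here (prefer paragraphs inside a semantic `<article>` element, fall back to all `<p>` tags) is an assumption that works on many sites, not a universal rule - every site lays its markup out differently:

```php
<?php
// Fetch a remote page with cURL. Returns '' on failure.
function fetch_html(string $url): string {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    $html = curl_exec($ch);
    curl_close($ch);
    return $html === false ? '' : $html;
}

// Site-agnostic text extraction. The <article>-then-<p> preference is an
// assumed heuristic; you may need per-site tweaks for stubborn layouts.
function extract_article(string $html): string {
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // real-world HTML is rarely valid
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Prefer paragraphs inside a semantic <article> element if present.
    $nodes = $xpath->query('//article//p');
    if ($nodes->length === 0) {
        $nodes = $xpath->query('//p');  // fall back to every paragraph
    }
    $text = [];
    foreach ($nodes as $p) {
        $text[] = trim($p->textContent);
    }
    return implode("\n", array_filter($text));
}
```

Suppressing libxml errors matters because `loadHTML` will otherwise flood your logs with warnings about the tag soup most sites serve.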
Extracting data from remote sites will slow your page down significantly, so think carefully about how many requests you really NEED to make.

Also take care with images: even where a site gives you permission to use its data, it may not allow you to use its images - the licensing rights to those may not be held by the site owner. Check the small print for syndication terms. Some RSS feeds that I'm allowed to reproduce on my site stipulate that only a certain number of articles (and then only the abstracts) may be reproduced.
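One common way to cut down the number of remote requests is a simple local cache. This is a minimal sketch, assuming a writable temp directory and a time-to-live of an hour; the `$fetch_remote` callable is a hypothetical stand-in for your cURL or file_get_contents call:

```php
<?php
// Minimal file cache: only hit the remote site when the cached copy
// is older than $ttl seconds. $fetch_remote is a hypothetical callback
// wrapping whatever fetch mechanism you use (cURL, file_get_contents).
function cached_get(string $url, callable $fetch_remote, int $ttl = 3600): string {
    $file = sys_get_temp_dir() . '/cache_' . md5($url);
    if (is_file($file) && time() - filemtime($file) < $ttl) {
        return file_get_contents($file);  // cache hit: no remote request
    }
    $body = $fetch_remote($url);          // cache miss: fetch and store
    file_put_contents($file, $body);
    return $body;
}

// Usage: the fetcher only runs on a cache miss.
$html = cached_get('http://example.com/article', fn($url) => "<html>stub for $url</html>");
```

With something like this in front of your fetches, repeated page loads serve the cached copy instead of hammering the remote site on every request.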