RSS feeds to PDF
Join Date: Jun 2009
Posts: 1
Reputation:
Solved Threads: 0
I want to build a website along the lines of "Feedjournal" and "Tabbloid" that collects important RSS feeds and blogs into a newsletter for our company, and then automatically emails it to all staff at set intervals. I am very new to this and want to know how to start. Which programming language should I use? PHP? Any suggestions on how to begin?
You might like to look at XSLT. It will act directly on RSS feeds, which are just XML conforming to a particular schema.
XSLT is very powerful and can be applied in more than just the normal browser environment. I think you will need to run it server-side along with something that knows how to form PDFs.
quick Google .....
Here you are, FOP from the Apache Project - open source:
http://www.onjava.com/pub/a/onjava/2002/10/16/fop.html
Have fun - it won't be easy - but at least there are sample files to support that article.
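To illustrate the point that an RSS feed is just XML, here is a minimal sketch in Python (not XSLT, and not FOP) that pulls the items out of a feed using only the standard library. The sample feed and the function name `read_items` are made up for illustration, and it assumes a plain RSS 2.0 layout with `channel/item` elements.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written RSS 2.0 document (made up for illustration).
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
      <description>Hello &lt;b&gt;world&lt;/b&gt;</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/2</link>
      <description>More news</description>
    </item>
  </channel>
</rss>"""


def read_items(rss_text):
    """Return a list of (title, link, description) tuples from RSS 2.0 text."""
    root = ET.fromstring(rss_text)
    items = []
    # In RSS 2.0 the items sit under the single <channel> element.
    for item in root.findall("./channel/item"):
        items.append((
            item.findtext("title", default=""),
            item.findtext("link", default=""),
            item.findtext("description", default=""),
        ))
    return items


if __name__ == "__main__":
    for title, link, _description in read_items(SAMPLE_RSS):
        print(title, link)
```

Note that the description usually carries escaped HTML, so whatever produces the PDF still has to deal with that markup.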
Airshow
50% of the solution lies in accurately describing the problem!
Join Date: Jul 2009
Posts: 2
Reputation:
Solved Threads: 0
I wanted to do something similar a few months ago and I actually wrote my own free software/open source service in PHP. You can try it out here: http://fivefilters.org/pdf-newspaper/
Source code is available to download and you can modify it however you like. Hope that helps.
Keyvan
Looks like a good piece of work Keyvan. The OP could wish for nothing more.
I have bookmarked it for future reference.
From your list of libraries, I guess it's even more complex than I indicated in my post above.
Could you possibly run through the libs and say what each one does in this app, please? Having done some RSS work myself (RSS to HTML), I am very interested.
Airshow
Join Date: Jul 2009
Posts: 2
Reputation:
Solved Threads: 0
Thanks Airshow,
The starting point was looking for a free PDF library that had some HTML support. I settled on TCPDF as it seems to be updated fairly regularly and had an example file showing multi-column support (example number 10). Its support for HTML is limited, though: you have to pass it well-formed XHTML and it only handles a small set of elements. I was happy with it, though, as I was only interested in a few elements; the goal wasn't to create a PDF showing the content as it appeared on the original site.
So a lot of the initial work is actually turning the RSS content into clean XHTML. For each feed item I first run its content through HTML Tidy. The next step is to remove HTML elements that I have no need for. For that I use HTML Purifier: I give it a list of HTML elements and attributes I can deal with and it strips the rest. I then pass the result through SmartyPants to turn the punctuation into a somewhat prettier form (e.g. curly quotes, ellipses, en and em dashes).
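For anyone who wants to see the idea of the clean-up steps without reading the PHP, here is a rough Python sketch using only the standard library. It is not HTML Tidy, HTML Purifier or SmartyPants, and it is not a security filter. The whitelist in `ALLOWED` is hypothetical (the real list is not given in this thread), and `strip_to_whitelist` and `prettify_punctuation` are names invented for this example.

```python
import re
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: tag name -> attributes that may be kept.
ALLOWED = {"p": set(), "em": set(), "strong": set(), "a": {"href"}}


class WhitelistFilter(HTMLParser):
    """Keep whitelisted tags and attributes; drop other tags but keep their text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED:
            kept = "".join(
                ' {}="{}"'.format(name, escape(value or "", quote=True))
                for name, value in attrs
                if name in ALLOWED[tag]
            )
            self.out.append("<{}{}>".format(tag, kept))

    def handle_endtag(self, tag):
        if tag in ALLOWED:
            self.out.append("</{}>".format(tag))

    def handle_data(self, data):
        # Re-escape text so the output stays well-formed.
        self.out.append(escape(data, quote=False))


def strip_to_whitelist(html_text):
    """Return html_text with every tag and attribute outside ALLOWED removed."""
    parser = WhitelistFilter()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


def prettify_punctuation(text):
    """Crude punctuation clean-up for plain text (not for markup)."""
    # Convention chosen here: "---" becomes an em dash, "--" an en dash.
    text = text.replace("---", "\u2014").replace("--", "\u2013")
    text = text.replace("...", "\u2026")
    # A double quote at the start, or after whitespace or an opening bracket, opens.
    text = re.sub(r'(^|[\s(\[])"', "\\1\u201c", text)
    # Any double quote left over is treated as a closing quote.
    return text.replace('"', "\u201d")


if __name__ == "__main__":
    dirty = '<div class="x"><p style="color:red">Hi <a href="http://example.com" onclick="x()">there</a></p></div>'
    print(strip_to_whitelist(dirty))
    print(prettify_punctuation('"Hello" -- wait...'))
```

The punctuation function is meant for text only; run over markup it would also rewrite the quotes around attribute values, which is one reason a dedicated library is the better choice for real use.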
When I've gone through the items I want, I simply pass the XHTML to TCPDF (I've extended the multi-column example for better spacing and formatting) and it does the rest. :)
As for the other libraries: I use SimplePie to parse the feed and loop through the items, and an OPML parser to give users the option of submitting multiple feeds.
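The OPML part is simple in principle: an OPML file is an XML outline whose feed entries carry an `xmlUrl` attribute. As a rough illustration in Python (standard library only, not the OPML parser used in the PHP app), something like the following would collect the feed URLs. The sample file and the name `feed_urls` are invented for this example.

```python
import xml.etree.ElementTree as ET

# A small hand-written OPML file (made up for illustration).
SAMPLE_OPML = """<?xml version="1.0"?>
<opml version="1.0">
  <head><title>My feeds</title></head>
  <body>
    <outline text="News">
      <outline text="Example A" type="rss" xmlUrl="http://example.com/a.xml"/>
      <outline text="Example B" type="rss" xmlUrl="http://example.org/b.xml"/>
    </outline>
    <outline text="No feed here"/>
  </body>
</opml>"""


def feed_urls(opml_text):
    """Return the xmlUrl of every outline element that has one, in document order."""
    root = ET.fromstring(opml_text)
    # Outlines can be nested to any depth, so walk the whole tree.
    return [o.get("xmlUrl") for o in root.iter("outline") if o.get("xmlUrl")]


if __name__ == "__main__":
    for url in feed_urls(SAMPLE_OPML):
        print(url)
```

Each URL found this way would then be fetched and handed to the feed parser in turn.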
If you want to see how all the libraries are used, the makepdf.php source ties them all together.
That's great Keyvan. Thank you very much.
I wish I had known about these libs when I wrote my RSS reader (as yet unpublished). I did everything in PHP and I know that some of it could be done a lot better.
I'd like to discuss this further but don't want to hijack splower's topic. I will send a PM later today.
Good luck with your project splower.
Must rush, work beckons.
Airshow