Hello all,
My question is: can I apply the code below to just part of the board,
in order to get a "copy" of the board's categories 17 and 3? See here:
http://www.nukeforums.com/forums/viewforum.php?f=17
http://www.nukeforums.com/forums/viewforum.php?f=3
I look forward to hearing from you.
Let's talk about Perl, since this is a Perl forum:
#!/usr/bin/perl
use strict;
use warnings;
use LWP::RobotUA;
use HTML::LinkExtor;
use HTML::TokeParser;
use URI::URL;
use Data::Dumper; # for show and troubleshooting
my $url = "http://www.nukeforums.com/forums/viewforum.php?f=17";
# LWP::RobotUA requires an agent name and a contact address;
# both values here are placeholders, so substitute your own.
my $ua = LWP::RobotUA->new('my-forum-reader/0.1', 'me@example.com');
my $lp = HTML::LinkExtor->new(\&wanted_links);
my @links;
get_threads($url);
foreach my $page (@links) { # loop over each thread link collected from the index
    my $r = $ua->get($page);
    if ($r->is_success) {
        my $stream = HTML::TokeParser->new(\$r->content) or die "Parse error in $page: $!";
        # just printing what was collected
        print Dumper get_thread($stream);
        # a database insert statement would go here instead
    } else {
        warn $r->status_line;
    }
}
sub get_thread {
    my $p = shift;
    my ($title, $name, @thread);
    while (my $tag = $p->get_tag('a','span')) {
        if (exists $tag->[1]{'class'}) {
            if ($tag->[0] eq 'span') {
                if ($tag->[1]{'class'} eq 'name') {
                    $name = $p->get_trimmed_text('/span');
                } elsif ($tag->[1]{'class'} eq 'postbody') {
                    my $post = $p->get_trimmed_text('/span');
                    push @thread, {'name'=>$name, 'post'=>$post};
                }
            } else {
                if ($tag->[1]{'class'} eq 'maintitle') {
                    $title = $p->get_trimmed_text('/a');
                }
            }
        }
    }
    return {'title'=>$title, 'thread'=>\@thread};
}
sub get_threads {
    my $page = shift;
    # fetch the page that was passed in, not the global $url
    my $r = $ua->request(HTTP::Request->new(GET => $page), sub {$lp->parse($_[0])});
    # Expand URLs to absolute ones
    my $base = $r->base;
    return [map { $_ = url($_, $base)->abs; } @links];
}
sub wanted_links {
    my($tag, %attr) = @_;
    return unless exists $attr{'href'};
    return if $attr{'href'} !~ /^viewtopic\.php\?t=/;
    push @links, values %attr;
}
If you have the necessary modules installed and run it from the command line, you'll see output such as the following:
$VAR1 = {
'thread' => [
{
'post' => 'Hello, I\'m pretty new to PHPNuke. I\'ve got my site up and running great! I\'m now starting to make modifications, add modules etc. I\'m using the most recent RavenPHP76. I want to display the 5 most recent forum posts at the top of the forum page. I\'m not sure if this functionality is built in, if so, how to activate. Or if there is a module or block made to do this. I looked at Raven\'s Collapsing Forum block but wasn\'t crazy about the format, and I don\'t want it to be collapsable. Thanks! mopho',
'name' => 'mopho'
},
{
'post' => 'hi there',
'name' => 'sail'
},
{
'post' => 'thanks for asking this; :not very sure if i got you right; Do you want to have a feed of the last forumthreads? guess the easiest way is to go to raven and ask how he did it. hth sail.',
'name' => 'sail'
},
{
'post' => 'Thanks. i found what I was looking for. It wasn\'t so easy to find! It\'s called glance_mod. mopho',
'name' => 'mopho'
},
{
'post' => 'hi there thx',
'name' => 'sail'
},
{
'post' => 'it sound interesting - i will have also a look i google after it - and try to find out more regards sailor',
'name' => 'sail'
}
],
'title' => 'Recent Forum Posts Module'
};
This is really preliminary. It just grabs the basic text from the threads and doesn't handle quoted text correctly yet. I don't think that would be hard to fix. There are many parsing approaches that can be taken in Perl; I just don't have more time tonight.
You obviously also have to set up a database to capture the information you want to store.
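As a rough sketch of the step before any database insert, the hash returned by get_thread could be flattened into one row per post. The helper name thread_to_rows, the column order (title, position, name, post) and the sample post texts are my own assumptions for illustration, not part of the script above.

```perl
use strict;
use warnings;

# Turn the hash returned by get_thread() into flat rows, one per post.
# ASSUMPTION: the column layout (title, position, name, post) is illustrative only.
sub thread_to_rows {
    my ($thread) = @_;
    my $position = 0;
    return map { [ $thread->{title}, ++$position, $_->{name}, $_->{post} ] }
           @{ $thread->{thread} };
}

# Sample data in the same shape as the Dumper output; the post texts are made up.
my $sample = {
    title  => 'Recent Forum Posts Module',
    thread => [
        { name => 'mopho', post => 'first post' },
        { name => 'sail',  post => 'a reply' },
    ],
};
print join(' | ', @$_), "\n" for thread_to_rows($sample);
```

Each row could then be handed to a prepared INSERT statement with four placeholders, for example through DBI, assuming a table with matching columns exists.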
Additionally, I only looped over the first index page; I didn't set up a loop to grab each of the index pages, but I consider that trivial.
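A minimal sketch of that loop over the index pages might look like the following. It assumes the board pages its topic list with a "start" offset in the URL, as phpBB-style boards commonly do; the helper name index_page_urls, the page size of 50 and the page count of 3 are assumptions, so check the board's own "next page" links first.

```perl
use strict;
use warnings;

# Build the URLs of a forum's index pages.
# ASSUMPTION: paging uses a "start" offset; verify against the real board.
sub index_page_urls {
    my ($forum_id, $topics_per_page, $page_count) = @_;
    my $base = "http://www.nukeforums.com/forums/viewforum.php?f=$forum_id";
    return map { $base . '&start=' . ($_ * $topics_per_page) } 0 .. $page_count - 1;
}

# Page size 50 and page count 3 are made-up values for illustration.
print "$_\n" for index_page_urls(17, 50, 3);
```

Each of these URLs would then be passed to get_threads in turn, instead of only the first index page.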
Continue with Perl, or use some other language. There will not be a ready-made product that takes exactly what you want from the web. You will have to make a little effort no matter which method you use.
This is super; what is written here is obviously a great idea. Now my question is:
can I apply the code to just part of the board, in order to get
a "copy" of the board's categories 17 and 3?
http://www.nukeforums.com/forums/viewforum.php?f=17
http://www.nukeforums.com/forums/viewforum.php?f=3
Can this be done with the code written above?
Well, I am very happy.
The demonstration is very impressive and makes me think that Perl is very, very powerful.
I will try to harvest these two categories of the forum (only these two are of
interest to me, nothing more):
http://www.nukeforums.com/forums/viewforum.php?f=3
http://www.nukeforums.com/forums/viewforum.php?f=17
I want to discuss a small change here. The minimal change consists of changing
my $url = "http://www.nukeforums.com/forums/viewforum.php?f=17";
my $ua = LWP::RobotUA->new('my-forum-reader/0.1', 'me@example.com'); # placeholder agent and address
my $lp = HTML::LinkExtor->new(\&wanted_links);
my @links;
get_threads($url);
foreach my $page (@links) {
    ...
}
to
my $ua = LWP::RobotUA->new('my-forum-reader/0.1', 'me@example.com'); # placeholder agent and address
my $lp = HTML::LinkExtor->new(\&wanted_links);
my @links;
foreach my $forum_id (17, 3) {
    my $url = "http://www.nukeforums.com/forums/viewforum.php?f=$forum_id";
    @links = (); # yuck!
    my $links = get_threads($url);
    foreach my $page (@$links) {
        ...
    }
}
As this shows, the minimal change still relies on the global variable @links.
We're forced to provide and reset a variable that should be local to get_threads. Here's the fix:
#!/usr/bin/perl
use strict;
use warnings;
use LWP::RobotUA;
use HTML::LinkExtor;
use HTML::TokeParser;
use URI::URL;
use Data::Dumper; # for show and troubleshooting
# The agent name and contact address are placeholders; substitute your own.
my $ua = LWP::RobotUA->new('my-forum-reader/0.1', 'me@example.com');
foreach my $forum_id (17, 3) {
    my $url = "http://www.nukeforums.com/forums/viewforum.php?f=$forum_id";
    my $links = get_threads($url);
    foreach my $page (@$links) {
        ...
    }
}
sub get_thread {
    ...
}
sub get_threads {
    my $page = shift;
    my @links;
    my $lp = HTML::LinkExtor->new(sub {
        my($tag, %attr) = @_;
        return unless exists $attr{'href'};
        return if $attr{'href'} !~ /^viewtopic\.php\?t=/;
        push @links, values %attr;
    });
    # $url is not visible inside this sub, so use the argument
    my $request = HTTP::Request->new(GET => $page);
    my $response = $ua->request($request, sub {$lp->parse($_[0])});
    # Expand URLs to absolute ones
    my $base = $response->base;
    return [ map { url($_, $base)->abs } @links ];
}
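The reason this version no longer needs a global @links is that the anonymous sub closes over a lexical array that is created fresh on every call. Here is a small standalone sketch of that idea, with no network access; collect_matching and its arguments are hypothetical stand-ins for get_threads and the parser callback.

```perl
use strict;
use warnings;

# Hypothetical stand-in for get_threads(): each call creates its own @found,
# and the anonymous sub closes over that fresh array.
sub collect_matching {
    my ($pattern, @candidates) = @_;
    my @found;                          # private to this call
    my $callback = sub {
        my ($href) = @_;
        push @found, $href if $href =~ $pattern;
    };
    $callback->($_) for @candidates;    # a real parser would invoke the callback
    return \@found;
}

my $first  = collect_matching(qr/^viewtopic\.php\?t=/, 'viewtopic.php?t=1', 'index.php');
my $second = collect_matching(qr/^viewtopic\.php\?t=/, 'viewtopic.php?t=2');
print scalar(@$first), " ", scalar(@$second), "\n";   # prints "1 1"
```

The second call does not see links left over from the first, which is exactly what the "@links = (); # yuck!" line had to arrange by hand.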
Discussion:
With these changes I am able to run the code against both full categories:
http://www.nukeforums.com/forums/viewforum.php?f=3
http://www.nukeforums.com/forums/viewforum.php?f=17
Question: am I able to get the results for the two forum categories mentioned above, and can I get the forum threads that are stored in these two forums? I look forward to hearing from you and from all the other readers here.
Regards