Hello all,

Now my question is: can I apply the code to just part of the board, in order
to get a "copy" of the board's categories 17 and 3? See here:

http://www.nukeforums.com/forums/viewforum.php?f=17
http://www.nukeforums.com/forums/viewforum.php?f=3
To all the readers here: I look forward to hearing from you.

Let's talk about Perl, since this is a Perl forum:

#!/usr/bin/perl
use strict;
use warnings;
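use LWP::RobotUA;
use HTTP::Request;
use HTML::LinkExtor;
use HTML::TokeParser;
use URI::URL;
use Data::Dumper; # for show and troubleshooting
my $url = "http://www.nukeforums.com/forums/viewforum.php?f=17";
# LWP::RobotUA requires an agent name and a contact e-mail address (placeholders here).
# It obeys robots.txt and waits between requests to the same server (1 minute by default).
my $ua = LWP::RobotUA->new(agent => 'forum-harvester/0.1', from => 'you@example.com');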
my $lp = HTML::LinkExtor->new(\&wanted_links);
my @links;
get_threads($url);
foreach my $page (@links) { # this loops over each link collected from the index
 my $r = $ua->get($page);
 if ($r->is_success) {
  my $stream = HTML::TokeParser->new(\$r->content) or die "Parse error in $page: $!";
  # just printing what was collected
  print Dumper get_thread($stream);
  # would instead have database insert statement at this point
  } else {
  warn $r->status_line;
  }
}
sub get_thread {
 my $p = shift;
 my ($title, $name, @thread);
 while (my $tag = $p->get_tag('a','span')) {
  if (exists $tag->[1]{'class'}) {
   if ($tag->[0] eq 'span') {
    if ($tag->[1]{'class'} eq 'name') {
     $name = $p->get_trimmed_text('/span');
    } elsif ($tag->[1]{'class'} eq 'postbody') {
     my $post = $p->get_trimmed_text('/span');
     push @thread, {'name'=>$name, 'post'=>$post};
    }
   } else {
    if ($tag->[1]{'class'} eq 'maintitle') {
     $title = $p->get_trimmed_text('/a');
    }
   }
  }
 }
 return {'title'=>$title, 'thread'=>\@thread};
}
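sub get_threads {
 my $page = shift;
 # Fetch the index page passed in, feeding each chunk to the link extractor
 my $r = $ua->request(HTTP::Request->new(GET => $page), sub {$lp->parse($_[0])});
 $lp->eof; # tell the parser the document is complete
 # Expand URLs to absolute ones (map aliases $_, so @links itself is rewritten)
 my $base = $r->base;
 return [map { $_ = url($_, $base)->abs; } @links];
}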
sub wanted_links {
 my($tag, %attr) = @_;
 return unless $tag eq 'a' && exists $attr{'href'};
 return if $attr{'href'} !~ /^viewtopic\.php\?t=/;
 push @links, $attr{'href'}; # keep only the link, not every attribute value
}

If you have the necessary modules installed and run it from the command line, you'll see output such as the following:

$VAR1 = {
          'thread' => [
                        {
                          'post' => 'Hello, I\'m pretty new to PHPNuke. I\'ve got my site up and running great! I\'m now starting to make modifications, add modules etc. I\'m using the most recent RavenPHP76. I want to display the 5 most recent forum posts at the top of the forum page. I\'m not sure if this functionality is built in, if so, how to activate. Or if there is a module or block made to do this. I looked at Raven\'s Collapsing Forum block but wasn\'t crazy about the format, and I don\'t want it to be collapsable. Thanks! mopho',
                          'name' => 'mopho'
                        },
                        {
                          'post' => 'hi there',
                          'name' => 'sail'
                        },
                        {
                          'post' => 'thanks for asking this; :not very sure if i got you right; Do you want to have a feed of the last forumthreads? guess the easiest way is to go to raven and ask how he did it. hth sail.',
                          'name' => 'sail'
                        },
                        {
                          'post' => 'Thanks. i found what I was looking for. It wasn\'t so easy to find! It\'s called glance_mod. mopho',
                          'name' => 'mopho'
                        },
                        {
                          'post' => 'hi there thx',
                          'name' => 'sail'
                        },
                        {
                          'post' => 'it sound interesting - i will have also a look i google after it - and try to find out more regards sailor',
                          'name' => 'sail'
                        }
                      ],
          'title' => 'Recent Forum Posts Module'
        };

This is really preliminary. It just grabs the basic text from the threads and doesn't handle quoted text correctly yet, though I don't think that would be hard to fix. There are many parsing approaches that can be taken in Perl; I just don't have more time tonight.
You obviously also have to set up a database to capture information you want to store.
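The database side isn't shown in the thread. As a first step, the nested structure that get_thread returns can be flattened into one row per post. This is a minimal sketch in core Perl; flatten_thread and the column order (title, position, name, post) are hypothetical choices, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: flatten the hashref returned by get_thread(),
#   { title => ..., thread => [ { name => ..., post => ... }, ... ] },
# into one row per post: [title, position, name, post].
sub flatten_thread {
    my ($thread) = @_;
    my @rows;
    my $position = 0;
    for my $post (@{ $thread->{thread} }) {
        $position++;    # 1-based position of the post within the thread
        push @rows, [ $thread->{title}, $position, $post->{name}, $post->{post} ];
    }
    return @rows;
}

# Two posts taken from the Data::Dumper output above.
my $example = {
    title  => 'Recent Forum Posts Module',
    thread => [
        { name => 'sail', post => 'hi there' },
        { name => 'sail', post => 'hi there thx' },
    ],
};
print join("\t", @$_), "\n" for flatten_thread($example);
```

Each row could then be passed to a DBI prepared statement, for example an INSERT with four placeholders; the table layout itself is up to you.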
Additionally, I only looped over the first index page; I didn't set up a loop to grab each of the index pages, but I consider that trivial.
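Looping over the index pages could look roughly like this. The sketch assumes the board pages its topic list with a start offset in the query string (phpBB-style boards do this) and shows $per_page topics per index page; both the parameter name and the page size are assumptions to check against the board's own "next page" links. index_page_urls is a hypothetical helper, and the total topic count would have to be read from the index itself; here it is simply passed in.

```perl
use strict;
use warnings;

# Hypothetical helper: build the URL of every index page of one forum.
# Assumptions: pages are selected with a "start" offset parameter and
# each index page lists $per_page topics.
sub index_page_urls {
    my ($base_url, $forum_id, $total_topics, $per_page) = @_;
    # Number of index pages, rounding up; an empty forum still has page one.
    my $pages = $total_topics > 0
        ? int(($total_topics + $per_page - 1) / $per_page)
        : 1;
    return map { "$base_url?f=$forum_id&start=" . ($_ * $per_page) } 0 .. $pages - 1;
}

# Example: a forum with 120 topics, 50 per page, gives start=0, 50 and 100.
my @index_pages = index_page_urls(
    'http://www.nukeforums.com/forums/viewforum.php', 17, 120, 50);
print "$_\n" for @index_pages;
```

Each of these URLs would then go through get_threads in place of the single $url.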
Continue with Perl, or use some other language. There will not be a ready-made product that takes exactly what you want from the web; you will have to make a little effort whatever method you use.

This is superb; it is obviously a great idea. Now my question is:
can I apply the code to just part of the board, in order to get
a "copy" of categories 17 and 3?

http://www.nukeforums.com/forums/viewforum.php?f=17
http://www.nukeforums.com/forums/viewforum.php?f=3
Can this be done with the code written above?!

Well, I am very happy.
The demonstration is very impressive, and it makes me think that Perl is very, very powerful.
I will try to harvest these categories of the forum (note that only these two categories
are of interest to me, nothing more):
http://www.nukeforums.com/forums/viewforum.php?f=3
http://www.nukeforums.com/forums/viewforum.php?f=17

I want to discuss a small change here. The minimal change to handle both forums consists of changing

my $url = "http://www.nukeforums.com/forums/viewforum.php?f=17";
my $ua = LWP::RobotUA->new;
my $lp = HTML::LinkExtor->new(\&wanted_links);
my @links;
get_threads($url);
foreach my $page (@links) {
    ...
}

to

my $ua = LWP::RobotUA->new;
my $lp = HTML::LinkExtor->new(\&wanted_links);
my @links;
foreach my $forum_id (17, 3) {
    my $url = "http://www.nukeforums.com/forums/viewforum.php?f=$forum_id";
    @links = ();  # yuck!
    my $links = get_threads($url);
    foreach my $page (@$links) {
        ...
    }
}

As you can see, the problem is the global variable @links: we're forced to declare and re-initialize, outside get_threads, a variable that should be local to it. Here's the fix:

#!/usr/bin/perl
use strict;
use warnings;
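use LWP::RobotUA;
use HTTP::Request;
use HTML::LinkExtor;
use HTML::TokeParser;
use URI::URL;
use Data::Dumper; # for show and troubleshooting
# LWP::RobotUA requires an agent name and a contact e-mail address (placeholders here)
my $ua = LWP::RobotUA->new(agent => 'forum-harvester/0.1', from => 'you@example.com');
foreach my $forum_id (17, 3) {
    my $url = "http://www.nukeforums.com/forums/viewforum.php?f=$forum_id";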
    my $links = get_threads($url);
    foreach my $page (@$links) {
        ...
    }
}
sub get_thread {
    ...
}
sub get_threads {
    my $page = shift;
    my @links;
    my $lp = HTML::LinkExtor->new(sub {
        my($tag, %attr) = @_;
        return unless $tag eq 'a' && exists $attr{'href'};
        return if $attr{'href'} !~ /^viewtopic\.php\?t=/;
        push @links, $attr{'href'}; # keep only the link, not every attribute value
    });
    my $request = HTTP::Request->new(GET => $page); # the index page passed in as the argument
    my $response = $ua->request($request, sub {$lp->parse($_[0])});
    $lp->eof; # tell the parser the document is complete
    # Expand URLs to absolute ones
    my $base = $response->base;
    return [ map { url($_, $base)->abs } @links ];
}
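To see why the closure version needs no reset between forums, here is a minimal, network-free sketch in plain Perl. collect_topic_links is a hypothetical stand-in for get_threads: instead of running HTML::LinkExtor over a page, it feeds a list of made-up hrefs to the callback, using the same ($tag, %attr) arguments LinkExtor passes.

```perl
use strict;
use warnings;

# Hypothetical stand-in for get_threads: the hrefs are passed in directly
# instead of coming from HTML::LinkExtor, so this runs without a network.
sub collect_topic_links {
    my @hrefs = @_;
    my @links;    # a fresh array on every call; nothing to reset
    # Closure over this call's @links, taking the ($tag, %attr) arguments
    # that HTML::LinkExtor passes to its callback.
    my $wanted = sub {
        my ($tag, %attr) = @_;
        return unless $tag eq 'a' && exists $attr{'href'};
        return if $attr{'href'} !~ /^viewtopic\.php\?t=/;
        push @links, $attr{'href'};
    };
    $wanted->('a', href => $_) for @hrefs;
    return \@links;
}

my $forum17 = collect_topic_links('viewtopic.php?t=1', 'viewforum.php?f=3', 'viewtopic.php?t=2');
my $forum3  = collect_topic_links('viewtopic.php?t=9');
print scalar(@$forum17), ' ', scalar(@$forum3), "\n";    # prints "2 1"
```

The second call starts with an empty @links because each call creates its own lexical array, which is exactly what the "@links = ();  # yuck!" line was compensating for.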

Discussion:
With these changes I am able to run the code against both full categories:

http://www.nukeforums.com/forums/viewforum.php?f=3
http://www.nukeforums.com/forums/viewforum.php?f=17


Question: am I able to get the results for the forum categories mentioned above, and can I get the threads stored in those two forums? I would love to hear from you, and from all the other readers here.

I really look forward to hearing from you.

Regards

:D

Your post is much too long; probably nobody is going to read all of it. My question to you is: have you tried the code? Did it work or not? If it did not, were there any error messages?
