Help with this scrape, please

Question

DjFumon 0 Newbie Poster

11 Years Ago

Hello, I have this code to extract the categories of the movies on a site:

<?php 

$jm_anime_genero = cut_str($pagina, '<tr>
        <td width="133" align="right" valign="top">Genero : </td>
        <td width="329" align="left" valign="top">', '</td>
        </tr>');

echo $jm_anime_genero;

 ?>

This is the original source:

<tr>
        <td width="133" align="right" valign="top">Genero : </td>
        <td width="329" align="left" valign="top"><a href='http://www.yaske.net/es/peliculas/genero/drama'>Drama</a><a href='http://www.yaske.net/es/peliculas/genero/action'>Accion</a><a href='http://www.yaske.net/es/peliculas/genero/biography'>Biografias</a><a href='http://www.yaske.net/es/peliculas/genero/sport'>Deporte</a></td>
        </tr>

But this is the result:

<a href="http://www.yaske.net/es/peliculas/genero/drama">Drama</a><a href="http://www.yaske.net/es/peliculas/genero/action">Accion</a><a href="http://www.yaske.net/es/peliculas/genero/biography">Biografias</a><a href="http://www.yaske.net/es/peliculas/genero/sport">Deporte</a>

I need only this:

Drama, Accion, Biografias, Deporte

I hope you can help me.

php


All 18 Replies

cereal 1,524 Nearly a Senior Poster

11 Years Ago

cut_str() is a user-defined function, so if you want to use it you have to show us its code. Otherwise use preg_match_all(); here's an example:

function scrapit($data)
{
        preg_match_all('/[<a[^>]*?>(.*)<\/a>/i', $data, $matches);

        if($matches === false || count($matches) == 0) return false;
        return implode(', ', $matches[1]);
}

To test it:

$html = <<<EOT
        <tr>
        <td width="133" align="right" valign="top">Genero : </td>
        <td width="329" align="left" valign="top">
                <a href='http://www.yaske.net/es/peliculas/genero/drama'>Drama</a>
                <a href='http://www.yaske.net/es/peliculas/genero/action'>Accion</a>
                <a href='http://www.yaske.net/es/peliculas/genero/biography'>Biografias</a>
                <a href='http://www.yaske.net/es/peliculas/genero/sport'>Deporte</a>
        </td>
        </tr>
EOT;


print_r(scrapit($html));

It will output:

Drama, Accion, Biografias, Deporte

Docs: http://php.net/manual/en/function.preg-match-all.php

Edited 11 Years Ago by cereal

cereal 1,524 Nearly a Senior Poster

11 Years Ago

As I wrote:

cut_str() is a user-defined function, so if you want to use it you have to show us its code

To be more precise, show the code that defines cut_str(), not the code that calls it, i.e.:

function cut_str()
{
    # show us this code
}

Also you could use strip_tags() around it, for example:

echo strip_tags($jm_anime_genero);
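
One caveat with strip_tags() on this particular markup: the links sit back to back with nothing between them, so stripping the tags alone runs the names together. A minimal sketch of one way around that, assuming the cell contents look like the original post; `$fragment` and `genres_from_fragment()` are illustrative names, not something from this thread:

```php
<?php
// Cell contents shaped like the original post: adjacent links, no separators.
$fragment = "<a href='http://www.yaske.net/es/peliculas/genero/drama'>Drama</a>"
          . "<a href='http://www.yaske.net/es/peliculas/genero/action'>Accion</a>";

// strip_tags() alone removes the markup but leaves no separator.
echo strip_tags($fragment), PHP_EOL; // DramaAccion

// Hypothetical helper: add a separator after every closing </a>,
// strip the tags, then trim the trailing separator.
function genres_from_fragment(string $fragment): string
{
    $withSeparators = str_ireplace('</a>', '</a>, ', $fragment);
    return rtrim(strip_tags($withSeparators), ', ');
}

echo genres_from_fragment($fragment), PHP_EOL; // Drama, Accion
```

This only works as long as the genre names themselves contain no markup and do not end in a comma or space.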

cereal 1,524 Nearly a Senior Poster

11 Years Ago

Do you have something like include or require at the top of your script? Check in those files.

cereal 1,524 Nearly a Senior Poster

11 Years Ago

I would like to help, as AG would, but we still don't know what cut_str() does. You're showing us how the function is called, not its source code.

This is the source, which we don't know and only you can check:

function cut_str($arg1, $arg2)
{
    # do the magic here
}

This is the call:

echo cut_str($pagina, $codetomatch);

We need to see the source. Have you enabled some PHP module that gives you extra functions?

If you run this code:

$functions = get_defined_functions();
echo 'User Defined Functions:' . PHP_EOL;
print_r($functions['user']);
echo 'Included files:' . PHP_EOL;
print_r(get_included_files());
echo 'Loaded Extensions:' . PHP_EOL;
$exts = get_loaded_extensions();
natcasesort($exts);
print_r($exts);

What do you get?
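
A more direct check is also possible. As a rough sketch, function_exists() says whether a function is defined at all, and ReflectionFunction can report which file defines it. `locate_function()` is a made-up helper name for illustration:

```php
<?php
// Hypothetical helper: report where a function is defined, if it exists at all.
function locate_function(string $name): string
{
    if (!function_exists($name)) {
        return "{$name}() is not defined in this script or its includes";
    }

    $ref = new ReflectionFunction($name);
    if ($ref->isInternal()) {
        // Built into PHP or provided by a loaded extension.
        return "{$name}() is built in (extension: " . $ref->getExtensionName() . ")";
    }

    // User-defined: show the file and line where the definition starts.
    return "{$name}() is defined in " . $ref->getFileName()
         . " on line " . $ref->getStartLine();
}

echo locate_function('cut_str'), PHP_EOL;
echo locate_function('strip_tags'), PHP_EOL;
```

Run from the same script that calls cut_str(), the first line of output should point at the file to post here, or confirm that the function does not exist.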

Apart from the above, your function tries to match this:

<tr>
    <td width="133" align="right" valign="top">Genero : </td>
    <td width="329" align="left" valign="top"><a href='http://www.yaske.net/es/peliculas/genero/drama'>Drama</a><a href='http://www.yaske.net/es/peliculas/genero/action'>Accion</a><a href='http://www.yaske.net/es/peliculas/genero/biography'>Biografias</a><a href='http://www.yaske.net/es/peliculas/genero/sport'>Deporte</a></td>
</tr>

All the links are inside a single <td>, so I think the match should be done in two passes: the first to get the <td> contents, the second to strip the <a> tags. A quick and rough example is this:

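<?php

$url = "http://www.yaske.net/es/pelicula/0003843/ver-rush-online.html";
$pagina = file_get_contents($url);
if($pagina === false) die('Error!');

// First pass: capture the contents of the <td> that follows the "Genero :" label.
// The lazy (.*?) stops at the first closing </td></tr> after the label.
preg_match_all('/<tr>\s*<td[^>]*?>Genero\s*:\s*<\/td>\s*<td[^>]*?>(.*?)<\/td>\s*<\/tr>/i', $pagina, $links);

if(count(array_filter($links)) == 0) die('Error!');

// Second pass: split on </a>, drop empty pieces, strip the remaining tags.
$links = array_filter(explode('</a>', $links[1][0]));
$links = implode(', ', array_map('strip_tags', $links));
echo $links;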

That outputs:

Drama, Accion, Biografias, Deporte
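
As an alternative to regular expressions (a different technique from the one above, not a rewrite of it), the same cell can be read with PHP's DOM extension. This is only a sketch: `genres_via_dom()` is a hypothetical name, and it assumes the label cell contains the text "Genero" and the links are in the next <td> of the same row.

```php
<?php
// Hypothetical helper: return the link texts from the cell that follows
// the "Genero" label cell, using DOMDocument and DOMXPath.
function genres_via_dom(string $html): array
{
    if ($html === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Real pages are rarely valid HTML, so collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query(
        '//td[contains(normalize-space(.), "Genero")]/following-sibling::td[1]//a'
    );

    $genres = [];
    foreach ($nodes as $a) {
        $genres[] = trim($a->textContent);
    }
    return $genres;
}

// Sample row shaped like the one in this thread.
$sample = "<table><tr><td>Genero : </td>"
        . "<td><a href='#'>Drama</a><a href='#'>Accion</a></td></tr></table>";
echo implode(', ', genres_via_dom($sample)), PHP_EOL;
```

The trade-off is that this does not depend on exact whitespace or attribute values in the markup, at the cost of requiring the DOM extension to be loaded.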

cereal 1,524 Nearly a Senior Poster

11 Years Ago

Ok, if we are done here, please mark it solved.


DjFumon 0 Newbie Poster · Answer 1 · 2013-12-31T19:00:59+00:00

It doesn't work. I have this code:

<?php 
$url = 'http://www.example.net/es/pelicula/'.$link.'.html';
$pagina = file_get_contents($url);
$pagina = utf8_decode(utf8_encode(utf8_decode($pagina)));
$jm_anime_genero = cut_str($pagina, '<tr>
        <td width="133" align="right" valign="top">Genero : </td>
        <td width="329" align="left" valign="top">', '</td>
        </tr>');

echo $jm_anime_genero;

 ?>

The categories change with every movie.

DjFumon 0 Newbie Poster · Answer 2 · 2013-12-31T20:46:17+00:00

I don't understand. I need to scrape Drama, Accion, Biografias, Deporte from http://www.yaske.net/es/pelicula/0003843/ver-rush-online.html

Assembly Guy 72 Posting Whiz · Answer 3 · 2013-12-31T23:12:41+00:00

cut_str() isn't included in the standard PHP installation; it has probably been written by you or someone else in a PHP source file. Nobody else has a copy of this function's code because it only exists in your code. Please post the cut_str function's code.

DjFumon 0 Newbie Poster · Answer 4 · 2013-12-31T23:19:29+00:00

$jm_anime_genero = cut_str($pagina, '<tr>
<td width="133" align="right" valign="top">Genero : </td>
<td width="329" align="left" valign="top">', '</td>
</tr>');

DjFumon 0 Newbie Poster · Answer 5 · 2013-12-31T23:52:52+00:00

I don't have any include or require.

Assembly Guy 72 Posting Whiz · Answer 6 · 2013-12-31T23:55:19+00:00

Please post all of the code in the PHP file :)

DjFumon 0 Newbie Poster · Answer 7 · 2014-01-01T00:05:51+00:00

<?php 
$url = 'http://www.yaske.net/es/pelicula/'.$link.'.html';
$pagina = file_get_contents($url);
$pagina = utf8_decode(utf8_encode(utf8_decode($pagina)));
$jm_anime_genero = cut_str($pagina, '<tr>
        <td width="133" align="right" valign="top">Genero : </td>
        <td width="329" align="left" valign="top">', '</td>
        </tr>');

echo $jm_anime_genero;

 ?>

Assembly Guy 72 Posting Whiz · Answer 8 · 2014-01-01T00:10:42+00:00

Are you using a PHP framework like CodeIgniter or anything similar?

Edited 11 Years Ago by Assembly Guy

DjFumon 0 Newbie Poster · Answer 9 · 2014-01-01T00:22:29+00:00

no :S

Assembly Guy 72 Posting Whiz · Answer 10 · 2014-01-01T00:26:07+00:00

Well then cut_str() hasn't been defined, and you therefore cannot use it. Look at Cereal's first reply to this thread, he's covered the method you need.
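
For reference, helpers with this kind of name usually return the text between two marker strings. The real cut_str() was never posted in this thread, so the following is purely a guess at its behaviour, written under a different name (`cut_str_sketch()`) to make that clear:

```php
<?php
// HYPOTHETICAL stand-in: the real cut_str() was never shown in this thread.
// Returns the text between the first $start and the next $end,
// or false if either marker is missing.
function cut_str_sketch(string $haystack, string $start, string $end)
{
    $from = strpos($haystack, $start);
    if ($from === false) {
        return false;
    }
    $from += strlen($start); // move past the start marker

    $to = strpos($haystack, $end, $from);
    if ($to === false) {
        return false;
    }
    return substr($haystack, $from, $to - $from);
}

echo cut_str_sketch('<td>Genero : </td><td>Drama</td>', '</td><td>', '</td>'), PHP_EOL; // Drama
```

Because the markers are compared byte for byte, any difference in whitespace or line endings between the marker string and the page source makes the lookup fail, which is one reason the regex approach is more forgiving.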

DjFumon 0 Newbie Poster · Answer 11 · 2014-01-01T10:43:06+00:00

I have implemented the method you told me, @cereal, but I have a small problem:

$link=$_GET['jm'];
$url = 'http://www.yaske.net/es/pelicula/'.$link.'.html';
$pagina = file_get_contents($url);
$pagina = utf8_decode(utf8_encode(utf8_decode($pagina)));


$jm_anime_genero = cut_str($pagina, '<tr>
        <td width="133" align="right" valign="top">Genero : </td>
        <td width="329" align="left" valign="top">', '</td>
        </tr>');

preg_match_all('/[<a[^>]*?>(.*)<\/a>/i', $jm_anime_genero, $matches);
if($matches === false || count($matches) == 0) return false;


echo implode(', ', $matches[1]);

output:

Drama<a href="http://www.yaske.net/es/peliculas/genero/action">Accion</a><a href="http://www.yaske.net/es/peliculas/genero/biography">Biografias</a><a href="http://www.yaske.net/es/peliculas/genero/sport">Deporte
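
The output above is what a greedy quantifier produces when all the links are on one line: `(.*)` runs to the last `</a>` it can find, so everything between the first opening tag and the final closing tag lands in a single match. The stray `[` at the start of the pattern also turns `[<a[^>]` into a character class rather than a literal `<a`. A small sketch of the difference, using a shortened sample line with made-up href values:

```php
<?php
// Same kind of input as in the post above: all links on a single line.
$line = "<a href='#drama'>Drama</a><a href='#action'>Accion</a><a href='#sport'>Deporte</a>";

// Greedy (.*) runs to the LAST </a> on the line, so only one match is found.
preg_match_all('/<a[^>]*>(.*)<\/a>/i', $line, $greedy);

// Lazy (.*?) stops at the FIRST </a>, giving one match per link.
preg_match_all('/<a[^>]*>(.*?)<\/a>/i', $line, $lazy);

echo count($greedy[1]), PHP_EOL;       // 1
echo implode(', ', $lazy[1]), PHP_EOL; // Drama, Accion, Deporte
```

The earlier test input happened to work with the greedy pattern only because each link was on its own line and `.` does not match a newline by default.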

DjFumon 0 Newbie Poster · Answer 12 · 2014-01-02T11:56:40+00:00

Can anyone help me, please?

DjFumon 0 Newbie Poster · Answer 13 · 2014-01-02T16:51:19+00:00

Thanks mate, it works perfectly.
