Hello, I have this code to extract the movie categories from a site:

<?php 

$jm_anime_genero = cut_str($pagina, '<tr>
        <td width="133" align="right" valign="top">Genero : </td>
        <td width="329" align="left" valign="top">', '</td>
        </tr>');

echo $jm_anime_genero;

 ?>

This is the original source:

<tr>
        <td width="133" align="right" valign="top">Genero : </td>
        <td width="329" align="left" valign="top"><a href='http://www.yaske.net/es/peliculas/genero/drama'>Drama</a><a href='http://www.yaske.net/es/peliculas/genero/action'>Accion</a><a href='http://www.yaske.net/es/peliculas/genero/biography'>Biografias</a><a href='http://www.yaske.net/es/peliculas/genero/sport'>Deporte</a></td>
        </tr>

But this is the result:

<a href="http://www.yaske.net/es/peliculas/genero/drama">Drama</a><a href="http://www.yaske.net/es/peliculas/genero/action">Accion</a><a href="http://www.yaske.net/es/peliculas/genero/biography">Biografias</a><a href="http://www.yaske.net/es/peliculas/genero/sport">Deporte</a>

I need only this:

Drama, Accion, Biografias, Deporte

I hope you can help me.

0

cut_str() is a user-defined function, so if you want to use it you have to show us its code. Otherwise use preg_match_all(). Here's an example:

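function scrapit($data)
{
        // Lazy (.*?) stops at the nearest </a>, so adjacent links on one line match separately.
        $count = preg_match_all('/<a\b[^>]*>(.*?)<\/a>/i', $data, $matches);

        // preg_match_all() returns false on error and 0 when nothing matched.
        if($count === false || $count === 0) return false;
        return implode(', ', $matches[1]);
}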

To test it:

$html = <<<EOT
        <tr>
        <td width="133" align="right" valign="top">Genero : </td>
        <td width="329" align="left" valign="top">
                <a href='http://www.yaske.net/es/peliculas/genero/drama'>Drama</a>
                <a href='http://www.yaske.net/es/peliculas/genero/action'>Accion</a>
                <a href='http://www.yaske.net/es/peliculas/genero/biography'>Biografias</a>
                <a href='http://www.yaske.net/es/peliculas/genero/sport'>Deporte</a>
        </td>
        </tr>
EOT;


print_r(scrapit($html));

It will output:

Drama, Accion, Biografias, Deporte

Docs: http://php.net/manual/en/function.preg-match-all.php

Edited by cereal

0

It doesn't work. I have this code:

<?php 
$url = 'http://www.example.net/es/pelicula/'.$link.'.html';
$pagina = file_get_contents($url);
$pagina = utf8_decode(utf8_encode(utf8_decode($pagina)));
$jm_anime_genero = cut_str($pagina, '<tr>
        <td width="133" align="right" valign="top">Genero : </td>
        <td width="329" align="left" valign="top">', '</td>
        </tr>');

echo $jm_anime_genero;

 ?>

The categories change with every movie.

Edited by DjFumon

0

As I wrote:

The cut_str() is a user defined function, so if you want to use it you have to show us your code

To be more precise, show the code that defines cut_str(), not its usage, i.e.:

function cut_str()
{
    # show us this code
}

You could also wrap the result in strip_tags(), for example:

echo strip_tags($jm_anime_genero);
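
Note that strip_tags() on its own would run the adjacent link texts together (DramaAccion...), because the source has no separator between the links. Here is a minimal sketch of one way around that, assuming the links sit back to back as in the original source and that no genre name contains a comma. The helper name links_to_list() is hypothetical, not part of the thread's code:

```php
<?php
// Hypothetical helper: turn adjacent <a> links into a comma-separated list.
function links_to_list($html)
{
    // Put a separator after every closing tag so the names do not run together.
    $separated = str_ireplace('</a>', '</a>,', $html);
    // Remove the tags, split on the separator, and trim each piece.
    $names = array_map('trim', explode(',', strip_tags($separated)));
    // Drop the empty piece left behind by the trailing separator.
    $names = array_filter($names, 'strlen');
    return implode(', ', $names);
}

$sample = "<a href='http://www.yaske.net/es/peliculas/genero/drama'>Drama</a>"
        . "<a href='http://www.yaske.net/es/peliculas/genero/action'>Accion</a>";

echo links_to_list($sample), PHP_EOL;
```

For the two-link sample this prints "Drama, Accion".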
0

cut_str() isn't included in the standard PHP installation. It was probably written by you or someone else in a PHP source file. Nobody else has a copy of this function's code, because it only exists in your project. Please post the code of the cut_str() function.
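
For readers wondering what such a helper usually looks like: the real cut_str() was never posted, so the following is only a guess at its behaviour, namely returning the text between a start marker and an end marker. The name cut_str_sketch() and its return values are assumptions made for illustration:

```php
<?php
// Hypothetical reconstruction: the real cut_str() is unknown, this only
// illustrates the "text between two markers" idea with strpos()/substr().
function cut_str_sketch($haystack, $start, $end)
{
    $from = strpos($haystack, $start);
    if ($from === false) {
        return false; // start marker not found
    }
    $from += strlen($start);

    $to = strpos($haystack, $end, $from);
    if ($to === false) {
        return false; // end marker not found after the start marker
    }
    return substr($haystack, $from, $to - $from);
}

echo cut_str_sketch('<td>Genero : </td><td>Drama</td>', '</td><td>', '</td>'), PHP_EOL;
```

A helper like this matches the markers byte for byte, so any difference in whitespace or indentation between the marker string and the page source makes it fail.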

0

$jm_anime_genero = cut_str($pagina, '<tr>
<td width="133" align="right" valign="top">Genero : </td>
<td width="329" align="left" valign="top">', '</td>
</tr>');

0
<?php 
$url = 'http://www.yaske.net/es/pelicula/'.$link.'.html';
$pagina = file_get_contents($url);
$pagina = utf8_decode(utf8_encode(utf8_decode($pagina)));
$jm_anime_genero = cut_str($pagina, '<tr>
        <td width="133" align="right" valign="top">Genero : </td>
        <td width="329" align="left" valign="top">', '</td>
        </tr>');

echo $jm_anime_genero;

 ?>
0

Well then, cut_str() hasn't been defined, and you therefore cannot use it. Look at cereal's first reply to this thread; he's covered the method you need.

0

I have implemented the method you told me, @cereal, but I have a small problem:

$link=$_GET['jm'];
$url = 'http://www.yaske.net/es/pelicula/'.$link.'.html';
$pagina = file_get_contents($url);
$pagina = utf8_decode(utf8_encode(utf8_decode($pagina)));


$jm_anime_genero = cut_str($pagina, '<tr>
        <td width="133" align="right" valign="top">Genero : </td>
        <td width="329" align="left" valign="top">', '</td>
        </tr>');

preg_match_all('/[<a[^>]*?>(.*)<\/a>/i', $jm_anime_genero, $matches);
if($matches === false || count($matches) == 0) return false;


echo implode(', ', $matches[1]);

Output:

Drama<a href="http://www.yaske.net/es/peliculas/genero/action">Accion</a><a href="http://www.yaske.net/es/peliculas/genero/biography">Biografias</a><a href="http://www.yaske.net/es/peliculas/genero/sport">Deporte
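
This output is what a greedy (.*) produces when all the links sit on one line: the capture runs from the first ">" to the last "</a>" and swallows the tags in between. In the pattern above, the stray "[" before "<a" also turns the start of the expression into a character class. The sketch below compares a greedy and a lazy capture on a shortened, made-up sample line (the href values are placeholders):

```php
<?php
// Two adjacent links on one line, as in the page source (hrefs shortened).
$line = "<a href='/genero/drama'>Drama</a><a href='/genero/action'>Accion</a>";

// Greedy: (.*) extends to the LAST </a>, swallowing the tags in between.
preg_match_all('/<a\b[^>]*>(.*)<\/a>/i', $line, $greedy);

// Lazy: (.*?) stops at the NEAREST </a>, giving one match per link.
preg_match_all('/<a\b[^>]*>(.*?)<\/a>/i', $line, $lazy);

echo implode(', ', $greedy[1]), PHP_EOL;
echo implode(', ', $lazy[1]), PHP_EOL;
```

The greedy version yields a single match that still contains markup, while the lazy version yields "Drama, Accion".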
0

I would say the same as AG: we still don't know what cut_str() does. You're showing us how the function is used, not its source code.

This is the source that we don't know and that only you can check:

function cut_str($arg1, $arg2)
{
    # do the magic here
}

This is the usage:

echo cut_str($pagina, $codetomatch);

We need to see the source. Have you enabled some PHP module that gives you extra functions?

If you run this code:

$functions = get_defined_functions();
echo 'User Defined Functions:' . PHP_EOL;
print_r($functions['user']);
echo 'Included files:' . PHP_EOL;
print_r(get_included_files());
echo 'Loaded Extensions:' . PHP_EOL;
$exts = get_loaded_extensions();
natcasesort($exts);
print_r($exts);

What do you get?
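
If cut_str() does show up among the user-defined functions, reflection can point to the file that declares it. This is a small sketch using PHP's ReflectionFunction class; the helper name locate_function() is hypothetical:

```php
<?php
// Hypothetical helper: report where a function is defined.
function locate_function($name)
{
    if (!function_exists($name)) {
        return false; // not defined in this request
    }
    $ref = new ReflectionFunction($name);
    if ($ref->isInternal()) {
        // Built-in functions have no source file, only an extension name.
        return 'built-in (' . $ref->getExtensionName() . ')';
    }
    // User-defined functions report their file and starting line.
    return $ref->getFileName() . ':' . $ref->getStartLine();
}

var_dump(locate_function('cut_str'));
```

A result of false means the function is not loaded at that point in the script. Otherwise the returned file and line show where the source to post can be found.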

Apart from the above, your function tries to match this:

<tr>
    <td width="133" align="right" valign="top">Genero : </td>
    <td width="329" align="left" valign="top"><a href='http://www.yaske.net/es/peliculas/genero/drama'>Drama</a><a href='http://www.yaske.net/es/peliculas/genero/action'>Accion</a><a href='http://www.yaske.net/es/peliculas/genero/biography'>Biografias</a><a href='http://www.yaske.net/es/peliculas/genero/sport'>Deporte</a></td>
</tr>

Only one <td> contains the links. I think the match should be done in two passes: the first to get the <td> contents, the second to strip the <a> tags. An easy and rough example is this:

<?php

$url = "http://www.yaske.net/es/pelicula/0003843/ver-rush-online.html";
$pagina = file_get_contents($url);
preg_match_all('/<tr>\s*<td[^>]*?>Genero\s*:\s*<\/td>\s*<td[^>]*?>(.*)<\/td>\s*<\/tr>/i', $pagina, $links);

if(count(array_filter($links)) == 0) die('Error!');

$links = array_filter(explode('</a>', $links[1][0]));
$links = implode(', ', array_map('strip_tags', $links));
echo $links;

That outputs:

Drama, Accion, Biografias, Deporte
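
As an alternative to regular expressions, the same list could be extracted with PHP's DOM extension. This is only a sketch, assuming the genre row keeps a label cell starting with "Genero" followed by the cell of links. The function name genres_from_html() is hypothetical, the sample markup is shortened, and the DOM extension must be enabled:

```php
<?php
// Hypothetical helper: read the genre names with DOMDocument and XPath.
function genres_from_html($html)
{
    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML, so collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Links inside the first cell that follows the "Genero" label cell.
    $nodes = $xpath->query(
        '//td[starts-with(normalize-space(.), "Genero")]/following-sibling::td[1]//a'
    );

    $names = array();
    foreach ($nodes as $node) {
        $names[] = trim($node->textContent);
    }
    return implode(', ', $names);
}

$html = '<table><tr>'
      . '<td>Genero : </td>'
      . "<td><a href='/genero/drama'>Drama</a><a href='/genero/action'>Accion</a></td>"
      . '</tr></table>';

echo genres_from_html($html), PHP_EOL;
```

For the sample row this prints "Drama, Accion". Because the query relies on the table structure rather than exact whitespace, it should tolerate small changes in indentation or attributes better than a literal string match.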