Hello, I have this code to extract the categories of the movies from a site:

<?php 

$jm_anime_genero = cut_str($pagina, '<tr>
        <td width="133" align="right" valign="top">Genero : </td>
        <td width="329" align="left" valign="top">', '</td>
        </tr>');

echo $jm_anime_genero;

 ?>

This is the original source:

<tr>
        <td width="133" align="right" valign="top">Genero : </td>
        <td width="329" align="left" valign="top"><a href='http://www.yaske.net/es/peliculas/genero/drama'>Drama</a><a href='http://www.yaske.net/es/peliculas/genero/action'>Accion</a><a href='http://www.yaske.net/es/peliculas/genero/biography'>Biografias</a><a href='http://www.yaske.net/es/peliculas/genero/sport'>Deporte</a></td>
        </tr>

But this is the result:

<a href="http://www.yaske.net/es/peliculas/genero/drama">Drama</a><a href="http://www.yaske.net/es/peliculas/genero/action">Accion</a><a href="http://www.yaske.net/es/peliculas/genero/biography">Biografias</a><a href="http://www.yaske.net/es/peliculas/genero/sport">Deporte</a>

I need only this:

Drama, Accion, Biografias, Deporte

I hope you can help me.


cut_str() is a user-defined function, so if you want to use it you have to show us your code. Otherwise use preg_match_all(); here's an example:

function scrapit($data)
{
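        // Lazy (.*?) stops at the first </a>, so each link is captured separately.
        $count = preg_match_all('/<a[^>]*>(.*?)<\/a>/i', $data, $matches);

        // preg_match_all() returns false on error and 0 when nothing matches.
        if($count === false || $count === 0) return false;
        return implode(', ', $matches[1]);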
}

To test it:

$html = <<<EOT
        <tr>
        <td width="133" align="right" valign="top">Genero : </td>
        <td width="329" align="left" valign="top">
                <a href='http://www.yaske.net/es/peliculas/genero/drama'>Drama</a>
                <a href='http://www.yaske.net/es/peliculas/genero/action'>Accion</a>
                <a href='http://www.yaske.net/es/peliculas/genero/biography'>Biografias</a>
                <a href='http://www.yaske.net/es/peliculas/genero/sport'>Deporte</a>
        </td>
        </tr>
EOT;


print_r(scrapit($html));

It will output:

Drama, Accion, Biografias, Deporte

Docs: http://php.net/manual/en/function.preg-match-all.php

It doesn't work. I have this code:

<?php 
$url = 'http://www.example.net/es/pelicula/'.$link.'.html';
$pagina = file_get_contents($url);
$pagina = utf8_decode(utf8_encode(utf8_decode($pagina)));
$jm_anime_genero = cut_str($pagina, '<tr>
        <td width="133" align="right" valign="top">Genero : </td>
        <td width="329" align="left" valign="top">', '</td>
        </tr>');

echo $jm_anime_genero;

 ?>

The categories change with every movie.

As I wrote:

cut_str() is a user-defined function, so if you want to use it you have to show us your code

To be more precise, show the code that defines cut_str(), not the code that calls it, i.e.:

function cut_str()
{
    # show us this code
}

You could also wrap the result in strip_tags(), for example:

echo strip_tags($jm_anime_genero);
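One caveat with the strip_tags() suggestion: the links in the page source sit directly next to each other, so stripping the tags leaves the names run together. A minimal sketch of the difference, assuming input shaped like the sample in the question (the `$cell` variable and the shortened two-link input are only for illustration):

```php
<?php
// Sample input copied from the question, shortened to two links.
$cell = "<a href='http://www.yaske.net/es/peliculas/genero/drama'>Drama</a>"
      . "<a href='http://www.yaske.net/es/peliculas/genero/action'>Accion</a>";

// strip_tags() alone removes the markup but leaves nothing between the names.
$plain = strip_tags($cell);

// Adding ", " after each closing tag first keeps the names apart;
// rtrim() then drops the trailing separator.
$separated = rtrim(strip_tags(str_replace('</a>', '</a>, ', $cell)), ', ');

echo $plain, PHP_EOL;      // DramaAccion
echo $separated, PHP_EOL;  // Drama, Accion
```

This only works if every name is wrapped in its own <a> element, which is an assumption based on the sample markup.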

cut_str() isn't included in the standard PHP installation; it was probably written by you or someone else in a PHP source file. Nobody else has a copy of this function's code because it only exists in your code. Please post the cut_str() function's code.

$jm_anime_genero = cut_str($pagina, '<tr>
<td width="133" align="right" valign="top">Genero : </td>
<td width="329" align="left" valign="top">', '</td>
</tr>');

Do you have something like include or require at the top of your script? Check in those files.

I don't have any include or require.

Please post all of the code in the PHP file :)

<?php 
$url = 'http://www.yaske.net/es/pelicula/'.$link.'.html';
$pagina = file_get_contents($url);
$pagina = utf8_decode(utf8_encode(utf8_decode($pagina)));
$jm_anime_genero = cut_str($pagina, '<tr>
        <td width="133" align="right" valign="top">Genero : </td>
        <td width="329" align="left" valign="top">', '</td>
        </tr>');

echo $jm_anime_genero;

 ?>

Are you using a PHP framework like CodeIgniter or anything similar?

no :S

Well then, cut_str() hasn't been defined, and you therefore cannot use it. Look at Cereal's first reply to this thread; he's covered the method you need.

I have implemented the method you suggested, @cereal, but I have a small problem:

$link=$_GET['jm'];
$url = 'http://www.yaske.net/es/pelicula/'.$link.'.html';
$pagina = file_get_contents($url);
$pagina = utf8_decode(utf8_encode(utf8_decode($pagina)));


$jm_anime_genero = cut_str($pagina, '<tr>
        <td width="133" align="right" valign="top">Genero : </td>
        <td width="329" align="left" valign="top">', '</td>
        </tr>');

preg_match_all('/[<a[^>]*?>(.*)<\/a>/i', $jm_anime_genero, $matches);
if($matches === false || count($matches) == 0) return false;


echo implode(', ', $matches[1]);

Output:

Drama<a href="http://www.yaske.net/es/peliculas/genero/action">Accion</a><a href="http://www.yaske.net/es/peliculas/genero/biography">Biografias</a><a href="http://www.yaske.net/es/peliculas/genero/sport">Deporte

Can anyone help me, please?
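The output above is what a greedy `(.*)` produces when all the links are on one line: it runs from the first link's text to the last `</a>`, so everything in between ends up in a single capture. The stray `[` at the start of the pattern also turns `[<a[^>]` into a character class, which is probably not what was intended. In the multi-line test input earlier in the thread each link sits on its own line, and `.` does not match a newline by default, so a greedy pattern looks correct there. A small sketch of greedy versus lazy matching (the shortened hrefs are placeholders, not the real URLs):

```php
<?php
// Two adjacent links on one line, as in the page source (shortened).
$cell = "<a href='/genero/drama'>Drama</a><a href='/genero/action'>Accion</a>";

// Greedy: (.*) extends to the LAST </a>, producing one oversized capture.
preg_match_all('/<a[^>]*>(.*)<\/a>/i', $cell, $greedy);

// Lazy: (.*?) stops at the FIRST </a>, producing one capture per link.
preg_match_all('/<a[^>]*>(.*?)<\/a>/i', $cell, $lazy);

print_r($greedy[1]);
echo implode(', ', $lazy[1]), PHP_EOL; // Drama, Accion
```

Replacing `(.*)` with `(.*?)` and dropping the leading `[` should be enough to get one entry per genre.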

I would like to help, as AG would, but we still don't know what cut_str() does. You're showing us how the function is used, not its source code.

This is the source, which we don't know and which only you can check:

function cut_str($arg1, $arg2)
{
    # do the magic here
}

This is the usage:

echo cut_str($pagina, $codetomatch);

We need to see the source. Have you enabled some PHP module that gives you extra functions?

If you run this code:

$functions = get_defined_functions();
echo 'User Defined Functions:' . PHP_EOL;
print_r($functions['user']);
echo 'Included files:' . PHP_EOL;
print_r(get_included_files());
echo 'Loaded Extensions:' . PHP_EOL;
$exts = get_loaded_extensions();
natcasesort($exts);
print_r($exts);

What do you get?
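If the lists above are long, a narrower check may be easier to read. The sketch below uses function_exists() and ReflectionFunction to report where a single function comes from. The helper name `locate_function` is hypothetical and only for illustration:

```php
<?php
// Hypothetical helper: reports where a function is defined, or null if it is not.
function locate_function(string $name): ?string
{
    if (!function_exists($name)) {
        return null; // never defined in this request
    }

    $ref = new ReflectionFunction($name);
    if ($ref->isInternal()) {
        return 'built-in'; // provided by PHP itself or by an extension
    }

    // User-defined: report the file and the line where the definition starts.
    return $ref->getFileName() . ':' . $ref->getStartLine();
}

var_dump(locate_function('cut_str'));
```

A null result means cut_str() was never defined in the running script, which would match the "undefined function" situation discussed in this thread.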

Apart from the above, your function tries to match this:

<tr>
    <td width="133" align="right" valign="top">Genero : </td>
    <td width="329" align="left" valign="top"><a href='http://www.yaske.net/es/peliculas/genero/drama'>Drama</a><a href='http://www.yaske.net/es/peliculas/genero/action'>Accion</a><a href='http://www.yaske.net/es/peliculas/genero/biography'>Biografias</a><a href='http://www.yaske.net/es/peliculas/genero/sport'>Deporte</a></td>
</tr>

Only one <td> contains the links, so I think the match should be done in two passes: the first to get the <td> contents, the second to strip the <a> tags. An easy and rough example is this:

<?php

$url = "http://www.yaske.net/es/pelicula/0003843/ver-rush-online.html";
$pagina = file_get_contents($url);
preg_match_all('/<tr>\s*<td[^>]*?>Genero\s*:\s*<\/td>\s*<td[^>]*?>(.*)<\/td>\s*<\/tr>/i', $pagina, $links);

if(count(array_filter($links)) == 0) die('Error!');

$links = array_filter(explode('</a>', $links[1][0]));
$links = implode(', ', array_map('strip_tags', $links));
echo $links;

That outputs:

Drama, Accion, Biografias, Deporte
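The two-pass idea mentioned above (first isolate the <td> after the "Genero :" label, then pull the text out of each <a>) could also be sketched with two regular expressions. The function name `extract_genres` and the shortened sample are assumptions for illustration, and the patterns rely on the markup looking like the sample in the question:

```php
<?php
// Hypothetical helper; not part of the thread's original code.
function extract_genres(string $html): array
{
    // Pass 1: capture the contents of the <td> that follows the "Genero :" label.
    $cellPattern = '/<td[^>]*>\s*Genero\s*:\s*<\/td>\s*<td[^>]*>(.*?)<\/td>/is';
    if (preg_match($cellPattern, $html, $cell) !== 1) {
        return []; // label not found
    }

    // Pass 2: capture the text of every link inside that cell.
    preg_match_all('/<a[^>]*>(.*?)<\/a>/is', $cell[1], $links);

    return array_map('trim', $links[1]);
}

$sample = <<<HTML
<tr>
    <td width="133" align="right" valign="top">Genero : </td>
    <td width="329" align="left" valign="top"><a href='http://www.yaske.net/es/peliculas/genero/drama'>Drama</a><a href='http://www.yaske.net/es/peliculas/genero/action'>Accion</a></td>
</tr>
HTML;

echo implode(', ', extract_genres($sample)), PHP_EOL; // Drama, Accion
```

Returning an array rather than a joined string leaves the choice of separator to the caller. If the site changes its table layout, the first pattern will stop matching and the function returns an empty array.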

Thanks mate, works perfectly.

Ok, if we are done here, please mark it solved.
