Hi All,

I'm trying to remove the <head> tag and its contents from some HTML, but I'm having no luck. The <head> portion of the HTML is:

<head><meta http-equiv=Content-Type content="text/html; charset=utf-8"><meta name=Generator content="Microsoft Word 15 (filtered medium)"><!--[if !mso]><style>v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style><![endif]--><title>Resume (Origin design)</title><style><!--
/* Font Definitions */
@font-face
    {font-family:"Cambria Math";
    panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
    {font-family:Calibri;
    panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
    {font-family:"Gill Sans MT";
    panose-1:2 11 5 2 2 1 4 2 2 3;}
@font-face
    {font-family:"Bookman Old Style";
    panose-1:2 5 6 4 5 5 5 2 2 4;}
@font-face
    {font-family:"Wingdings 3";
    panose-1:5 4 1 2 1 8 7 7 7 7;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
    {margin:0cm;
    margin-bottom:.0001pt;
    font-size:12.0pt;
    font-family:"Times New Roman",serif;}
a:link, span.MsoHyperlink
    {mso-style-priority:99;
    color:blue;
    text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
    {mso-style-priority:99;
    color:purple;
    text-decoration:underline;}
p
    {mso-style-priority:99;
    mso-margin-top-alt:auto;
    margin-right:0cm;
    mso-margin-bottom-alt:auto;
    margin-left:0cm;
    font-size:12.0pt;
    font-family:"Times New Roman",serif;}
p.boldtext, li.boldtext, div.boldtext
    {mso-style-name:boldtext;
    mso-margin-top-alt:auto;
    margin-right:0cm;
    mso-margin-bottom-alt:auto;
    margin-left:0cm;
    font-size:12.0pt;
    font-family:"Times New Roman",serif;
    font-weight:bold;}
p.boxbackground, li.boxbackground, div.boxbackground
    {mso-style-name:boxbackground;
    mso-margin-top-alt:auto;
    margin-right:0cm;
    mso-margin-bottom-alt:auto;
    margin-left:0cm;
    background:#E2EFFC;
    font-size:12.0pt;
    font-family:"Times New Roman",serif;}
p.fatlink, li.fatlink, div.fatlink
    {mso-style-name:fatlink;
    mso-margin-top-alt:auto;
    margin-right:0cm;
    mso-margin-bottom-alt:auto;
    margin-left:0cm;
    font-size:7.5pt;
    font-family:"Times New Roman",serif;
    color:#0066CC;
    font-weight:bold;}
p.orange, li.orange, div.orange
    {mso-style-name:orange;
    mso-margin-top-alt:auto;
    margin-right:0cm;
    mso-margin-bottom-alt:auto;
    margin-left:0cm;
    font-size:12.0pt;
    font-family:"Times New Roman",serif;
    color:#F09261;}
p.sectionheading, li.sectionheading, div.sectionheading
    {mso-style-name:sectionheading;
    mso-margin-top-alt:auto;
    margin-right:0cm;
    mso-margin-bottom-alt:auto;
    margin-left:0cm;
    background:#FF6600;
    font-size:12.0pt;
    font-family:"Times New Roman",serif;}
p.smallboldtext, li.smallboldtext, div.smallboldtext
    {mso-style-name:smallboldtext;
    mso-margin-top-alt:auto;
    margin-right:0cm;
    mso-margin-bottom-alt:auto;
    margin-left:0cm;
    font-size:7.5pt;
    font-family:"Times New Roman",serif;
    font-weight:bold;}
p.smalltext, li.smalltext, div.smalltext
    {mso-style-name:smalltext;
    mso-margin-top-alt:auto;
    margin-right:0cm;
    mso-margin-bottom-alt:auto;
    margin-left:0cm;
    font-size:8.5pt;
    font-family:"Times New Roman",serif;}
p.whitebold, li.whitebold, div.whitebold
    {mso-style-name:whitebold;
    mso-margin-top-alt:auto;
    margin-right:0cm;
    mso-margin-bottom-alt:auto;
    margin-left:0cm;
    font-size:12.0pt;
    font-family:"Times New Roman",serif;
    color:white;
    font-weight:bold;}
p.xslnormboldtext, li.xslnormboldtext, div.xslnormboldtext
    {mso-style-name:xslnormboldtext;
    mso-margin-top-alt:auto;
    margin-right:0cm;
    mso-margin-bottom-alt:auto;
    margin-left:0cm;
    font-size:12.0pt;
    font-family:"Times New Roman",serif;
    font-weight:bold;}
p.xslnormitalictext, li.xslnormitalictext, div.xslnormitalictext
    {mso-style-name:xslnormitalictext;
    mso-margin-top-alt:auto;
    margin-right:0cm;
    mso-margin-bottom-alt:auto;
    margin-left:0cm;
    font-size:12.0pt;
    font-family:"Times New Roman",serif;
    font-style:italic;}
p.xslnormtext, li.xslnormtext, div.xslnormtext
    {mso-style-name:xslnormtext;
    mso-margin-top-alt:auto;
    margin-right:0cm;
    mso-margin-bottom-alt:auto;
    margin-left:0cm;
    font-size:12.0pt;
    font-family:"Times New Roman",serif;}
p.xslreditalictext, li.xslreditalictext, div.xslreditalictext
    {mso-style-name:xslreditalictext;
    mso-margin-top-alt:auto;
    margin-right:0cm;
    mso-margin-bottom-alt:auto;
    margin-left:0cm;
    font-size:12.0pt;
    font-family:"Times New Roman",serif;
    color:red;
    font-style:italic;}
p.xslresponsetext, li.xslresponsetext, div.xslresponsetext
    {mso-style-name:xslresponsetext;
    mso-margin-top-alt:auto;
    margin-right:0cm;
    mso-margin-bottom-alt:auto;
    margin-left:0cm;
    font-size:12.0pt;
    font-family:"Times New Roman",serif;}
p.xslscreeningkb, li.xslscreeningkb, div.xslscreeningkb
    {mso-style-name:xslscreeningkb;
    mso-margin-top-alt:auto;
    margin-right:0cm;
    mso-margin-bottom-alt:auto;
    margin-left:15.0pt;
    font-size:12.0pt;
    font-family:"Times New Roman",serif;}
p.xslsummarytable, li.xslsummarytable, div.xslsummarytable
    {mso-style-name:xslsummarytable;
    mso-margin-top-alt:auto;
    margin-right:0cm;
    mso-margin-bottom-alt:auto;
    margin-left:0cm;
    font-size:12.0pt;
    font-family:"Times New Roman",serif;}
span.xsltealtext
    {mso-style-name:xsltealtext;}
span.xslgraytext
    {mso-style-name:xslgraytext;}
span.boldtext1
    {mso-style-name:boldtext1;
    font-weight:bold;}
span.whitebold1
    {mso-style-name:whitebold1;
    color:white;
    font-weight:bold;}
span.orange1
    {mso-style-name:orange1;
    color:#F09261;}
span.EmailStyle39
    {mso-style-type:personal-reply;
    font-family:"Calibri",sans-serif;
    color:#1F497D;}
.MsoChpDefault
    {mso-style-type:export-only;
    font-size:10.0pt;}
@page WordSection1
    {size:612.0pt 792.0pt;
    margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
    {page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]--></head>

So far I've tried:

$string = preg_replace('`(<head>)(.*?)(</head>)`', '', $string);

with no effect whatsoever. I've tried variations of this, but the closing tag doesn't seem to be picked up.

I've tried using the DOMDocument but the loadHTML method doesn't like the XML in the <head>.

My last attempt was to use:

$start='<head>';
$end='</head>';

if($pos1=stripos($string,$start)!==false){
    if($pos2=stripos($string,$end)!==false){
        $string=substr_replace($string,'',$pos1,$pos2-$pos1+strlen($end));
    }
}

but that failed also. It's giving me $pos1=1 and $pos2=1 which makes no sense to me as $pos1 should surely be 0 and $pos2 whatever. But it says it's finding them.

I'm confused.

Thanks for any help offered.

preg functions aren't really cut out for html. You need to escape special characters like < and > and literal /.

Alternatively, you could use something like:

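$start = strpos($string, '<head>') + 6; // 6 = strlen('<head>'), so $start sits just past the opening tag
$end = strpos($string, '</head>');      // offset where the closing tag begins
$length = $end - $start;
// Removes only the text between the tags; the empty <head></head> pair is left behind.
$string = substr_replace($string, '', $start, $length);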

Not tested, so you'd probably have to tinker a bit.
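
For reference, here is one way the strpos idea above might be tightened so that it removes the tags as well as the contents and does nothing when either tag is missing. This is only a sketch: removeHeadBlock() is a hypothetical helper name, not something from the thread, and it assumes a bare <head> tag with no attributes.

```php
<?php
// Hypothetical helper (not from the thread): drop the first <head>...</head> block.
function removeHeadBlock(string $html): string
{
    $open  = '<head>';   // assumption: opening tag carries no attributes
    $close = '</head>';

    $start = stripos($html, $open);
    if ($start === false) {
        return $html;    // no opening tag, leave the input alone
    }

    // Search for the closing tag only after the opening tag.
    $end = stripos($html, $close, $start);
    if ($end === false) {
        return $html;    // no closing tag, leave the input alone
    }

    // Span runs from the first character of <head> to the last of </head>.
    $length = $end - $start + strlen($close);
    return substr_replace($html, '', $start, $length);
}
```

Comparing against false with === matters here, because a tag found at offset 0 would otherwise be treated as "not found".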

Thanks Diafol, but I tried using strpos stuff and it came out as I explained above.
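
The $pos1=1 and $pos2=1 result described earlier looks like an operator precedence issue rather than a search failure. In PHP, !== binds more tightly than =, so `$pos1=stripos(...)!==false` stores the boolean result of the comparison, and true prints as 1. A minimal sketch of the difference, using a shortened sample string that is an assumption for illustration:

```php
<?php
// Shortened stand-in for the real document (assumption for illustration).
$string = '<html><head><title>x</title></head><body>Hi</body></html>';
$start  = '<head>';
$end    = '</head>';

// Without parentheses the comparison runs first, so a boolean is assigned.
$wrong = stripos($string, $start) !== false;   // true, which echo shows as "1"

// With parentheses the assignment runs first, then the comparison.
if (($pos1 = stripos($string, $start)) !== false) {
    if (($pos2 = stripos($string, $end, $pos1)) !== false) {
        // Remove everything from the start of <head> through the end of </head>.
        $string = substr_replace($string, '', $pos1, $pos2 - $pos1 + strlen($end));
    }
}
```

With the extra parentheses, $pos1 and $pos2 hold real offsets and the substr_replace call from the original attempt can stay as it was.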

Ah. My mistake. No need to escape. I hate regex :) heh heh.
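
On the regex attempt: the pattern itself needs no escaping, but by default . does not match newline characters, and the <head> block in the question spans many lines. Adding the s modifier is one likely fix. A small sketch, again with a shortened sample string that is an assumption for illustration:

```php
<?php
// Shortened multi-line stand-in for the real document (assumption for illustration).
$string = "<html><head>\n<title>x</title>\n</head><body>Hi</body></html>";

// Original pattern: . stops at the first newline, so nothing matches.
$without = preg_replace('`(<head>)(.*?)(</head>)`', '', $string);

// Same pattern with the s (dot-all) modifier: . now matches newlines too.
$with = preg_replace('`(<head>)(.*?)(</head>)`s', '', $string);
```

Here $without comes back unchanged, while $with has the whole block removed. Adding the i modifier as well would also catch <HEAD> in upper case. preg_replace returns null if the match fails with an error, so it may be worth checking for that before overwriting $string.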
