Hi All,

I'm trying to remove the <head> tag and its contents from some HTML, but I'm having no luck. The <head> portion of the HTML is:

<head><meta http-equiv=Content-Type content="text/html; charset=utf-8"><meta name=Generator content="Microsoft Word 15 (filtered medium)"><!--[if !mso]><style>v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style><![endif]--><title>Resume (Origin design)</title><style><!--
/* Font Definitions */
@font-face
    {font-family:"Cambria Math";
    panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
    {font-family:Calibri;
    panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
    {font-family:"Gill Sans MT";
    panose-1:2 11 5 2 2 1 4 2 2 3;}
@font-face
    {font-family:"Bookman Old Style";
    panose-1:2 5 6 4 5 5 5 2 2 4;}
@font-face
    {font-family:"Wingdings 3";
    panose-1:5 4 1 2 1 8 7 7 7 7;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
    {margin:0cm;
    margin-bottom:.0001pt;
    font-size:12.0pt;
    font-family:"Times New Roman",serif;}
a:link, span.MsoHyperlink
    {mso-style-priority:99;
    color:blue;
    text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
    {mso-style-priority:99;
    color:purple;
    text-decoration:underline;}
p
    {mso-style-priority:99;
    mso-margin-top-alt:auto;
    margin-right:0cm;
    mso-margin-bottom-alt:auto;
    margin-left:0cm;
    font-size:12.0pt;
    font-family:"Times New Roman",serif;}
p.boldtext, li.boldtext, div.boldtext
    {mso-style-name:boldtext;
    mso-margin-top-alt:auto;
    margin-right:0cm;
    mso-margin-bottom-alt:auto;
    margin-left:0cm;
    font-size:12.0pt;
    font-family:"Times New Roman",serif;
    font-weight:bold;}
p.boxbackground, li.boxbackground, div.boxbackground
    {mso-style-name:boxbackground;
    mso-margin-top-alt:auto;
    margin-right:0cm;
    mso-margin-bottom-alt:auto;
    margin-left:0cm;
    background:#E2EFFC;
    font-size:12.0pt;
    font-family:"Times New Roman",serif;}
p.fatlink, li.fatlink, div.fatlink
    {mso-style-name:fatlink;
    mso-margin-top-alt:auto;
    margin-right:0cm;
    mso-margin-bottom-alt:auto;
    margin-left:0cm;
    font-size:7.5pt;
    font-family:"Times New Roman",serif;
    color:#0066CC;
    font-weight:bold;}
p.orange, li.orange, div.orange
    {mso-style-name:orange;
    mso-margin-top-alt:auto;
    margin-right:0cm;
    mso-margin-bottom-alt:auto;
    margin-left:0cm;
    font-size:12.0pt;
    font-family:"Times New Roman",serif;
    color:#F09261;}
p.sectionheading, li.sectionheading, div.sectionheading
    {mso-style-name:sectionheading;
    mso-margin-top-alt:auto;
    margin-right:0cm;
    mso-margin-bottom-alt:auto;
    margin-left:0cm;
    background:#FF6600;
    font-size:12.0pt;
    font-family:"Times New Roman",serif;}
p.smallboldtext, li.smallboldtext, div.smallboldtext
    {mso-style-name:smallboldtext;
    mso-margin-top-alt:auto;
    margin-right:0cm;
    mso-margin-bottom-alt:auto;
    margin-left:0cm;
    font-size:7.5pt;
    font-family:"Times New Roman",serif;
    font-weight:bold;}
p.smalltext, li.smalltext, div.smalltext
    {mso-style-name:smalltext;
    mso-margin-top-alt:auto;
    margin-right:0cm;
    mso-margin-bottom-alt:auto;
    margin-left:0cm;
    font-size:8.5pt;
    font-family:"Times New Roman",serif;}
p.whitebold, li.whitebold, div.whitebold
    {mso-style-name:whitebold;
    mso-margin-top-alt:auto;
    margin-right:0cm;
    mso-margin-bottom-alt:auto;
    margin-left:0cm;
    font-size:12.0pt;
    font-family:"Times New Roman",serif;
    color:white;
    font-weight:bold;}
p.xslnormboldtext, li.xslnormboldtext, div.xslnormboldtext
    {mso-style-name:xslnormboldtext;
    mso-margin-top-alt:auto;
    margin-right:0cm;
    mso-margin-bottom-alt:auto;
    margin-left:0cm;
    font-size:12.0pt;
    font-family:"Times New Roman",serif;
    font-weight:bold;}
p.xslnormitalictext, li.xslnormitalictext, div.xslnormitalictext
    {mso-style-name:xslnormitalictext;
    mso-margin-top-alt:auto;
    margin-right:0cm;
    mso-margin-bottom-alt:auto;
    margin-left:0cm;
    font-size:12.0pt;
    font-family:"Times New Roman",serif;
    font-style:italic;}
p.xslnormtext, li.xslnormtext, div.xslnormtext
    {mso-style-name:xslnormtext;
    mso-margin-top-alt:auto;
    margin-right:0cm;
    mso-margin-bottom-alt:auto;
    margin-left:0cm;
    font-size:12.0pt;
    font-family:"Times New Roman",serif;}
p.xslreditalictext, li.xslreditalictext, div.xslreditalictext
    {mso-style-name:xslreditalictext;
    mso-margin-top-alt:auto;
    margin-right:0cm;
    mso-margin-bottom-alt:auto;
    margin-left:0cm;
    font-size:12.0pt;
    font-family:"Times New Roman",serif;
    color:red;
    font-style:italic;}
p.xslresponsetext, li.xslresponsetext, div.xslresponsetext
    {mso-style-name:xslresponsetext;
    mso-margin-top-alt:auto;
    margin-right:0cm;
    mso-margin-bottom-alt:auto;
    margin-left:0cm;
    font-size:12.0pt;
    font-family:"Times New Roman",serif;}
p.xslscreeningkb, li.xslscreeningkb, div.xslscreeningkb
    {mso-style-name:xslscreeningkb;
    mso-margin-top-alt:auto;
    margin-right:0cm;
    mso-margin-bottom-alt:auto;
    margin-left:15.0pt;
    font-size:12.0pt;
    font-family:"Times New Roman",serif;}
p.xslsummarytable, li.xslsummarytable, div.xslsummarytable
    {mso-style-name:xslsummarytable;
    mso-margin-top-alt:auto;
    margin-right:0cm;
    mso-margin-bottom-alt:auto;
    margin-left:0cm;
    font-size:12.0pt;
    font-family:"Times New Roman",serif;}
span.xsltealtext
    {mso-style-name:xsltealtext;}
span.xslgraytext
    {mso-style-name:xslgraytext;}
span.boldtext1
    {mso-style-name:boldtext1;
    font-weight:bold;}
span.whitebold1
    {mso-style-name:whitebold1;
    color:white;
    font-weight:bold;}
span.orange1
    {mso-style-name:orange1;
    color:#F09261;}
span.EmailStyle39
    {mso-style-type:personal-reply;
    font-family:"Calibri",sans-serif;
    color:#1F497D;}
.MsoChpDefault
    {mso-style-type:export-only;
    font-size:10.0pt;}
@page WordSection1
    {size:612.0pt 792.0pt;
    margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
    {page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]--></head>

So far I've tried:

$string = preg_replace('`(<head>)(.*?)(</head>)`', '', $string);

with no effect whatsoever. I've tried variations of this, but the closing tag never seems to get picked up.
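
A likely reason the pattern matches nothing: in PCRE, `.` does not match newline characters unless the `s` (dot-all) modifier is set, and this <head> spans many lines. The sketch below assumes the goal is to drop the whole element, tags included. The `$html` sample is hypothetical, and the `i` (case-insensitive) modifier is an added assumption in case the markup uses <HEAD>.

```php
<?php
// Hypothetical sample: a <head> spanning several lines, like Word's output.
$html = "<html><head><title>Resume</title>\n<style>\np {margin:0}\n</style></head><body><p>Hi</p></body></html>";

// Without `s`, `.` stops at "\n", so the lazy `.*?` can never reach
// "</head>" and preg_replace() returns the string unchanged.
$unchanged = preg_replace('`<head>.*?</head>`', '', $html);

// With `s`, `.` also matches newlines; `i` makes <HEAD> match as well.
// `.*?` is lazy, so the match ends at the first "</head>".
$stripped = preg_replace('`<head>.*?</head>`si', '', $html);

echo $stripped, "\n"; // <html><body><p>Hi</p></body></html>
```

Regex is fine for a one-off cleanup like this, but it will not cope with unusual markup such as a <head> that carries attributes.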

I've tried using DOMDocument, but its loadHTML method doesn't like the XML in the <head>.
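
If the complaints from loadHTML are libxml parse warnings (an assumption; the exact messages aren't shown), they can be collected silently with libxml_use_internal_errors(), after which the <head> element can be removed from the tree. A minimal sketch with a hypothetical input string:

```php
<?php
// Hypothetical sample input; in practice this is the Word-generated HTML.
$html = '<html><head><meta http-equiv=Content-Type content="text/html; charset=utf-8">'
      . '<title>Resume</title></head><body><p>Hi</p></body></html>';

$doc = new DOMDocument();

// Buffer libxml parse warnings internally instead of emitting them,
// then discard them and restore the previous setting.
$previous = libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Remove the first <head> element together with everything inside it.
$head = $doc->getElementsByTagName('head')->item(0);
if ($head !== null) {
    $head->parentNode->removeChild($head);
}

$result = $doc->saveHTML();
echo $result;
```

Note that saveHTML() re-serializes the whole document (for example, it adds a DOCTYPE when none was present), so the output will not be byte-for-byte identical to the input apart from the removed <head>.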

My last attempt was to use:

$start='<head>';
$end='</head>';

if($pos1=stripos($string,$start)!==false){
    if($pos2=stripos($string,$end)!==false){
        $string=substr_replace($string,'',$pos1,$pos2-$pos1+strlen($end));
    }
}

but that failed too. It gives me $pos1=1 and $pos2=1, which makes no sense to me: $pos1 should surely be 0, and $pos2 should be wherever </head> starts. Yet it says it's finding them.
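
The 1s come from operator precedence: `!==` binds more tightly than `=`, so `$pos1=stripos($string,$start)!==false` stores the boolean result of the comparison, and `true` prints as 1. A minimal sketch with the assignments parenthesized; the sample `$string` is hypothetical, and searching for </head> starting from `$pos1` is an added choice, not part of the original attempt.

```php
<?php
// Hypothetical sample input.
$string = "<head><title>Resume</title></head><body><p>Hi</p></body>";
$start = '<head>';
$end = '</head>';

// Unparenthesized, this evaluates as $bad = (stripos(...) !== false),
// so $bad holds true (printed as 1), not the position.
$bad = stripos($string, $start) !== false;
var_dump($bad); // bool(true)

// Parenthesize each assignment so the position itself is stored.
// Comparing with !== false still works when the match is at offset 0.
if (($pos1 = stripos($string, $start)) !== false
    && ($pos2 = stripos($string, $end, $pos1)) !== false) {
    // Remove from '<head>' through the end of '</head>'.
    $string = substr_replace($string, '', $pos1, $pos2 - $pos1 + strlen($end));
}

echo $string, "\n"; // <body><p>Hi</p></body>
```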

I'm confused.

Thanks for any help offered.

diafol:

preg functions aren't really cut out for html. You need to escape special characters like < and > and literal /.

Alternatively, you could use something like:

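$start = strpos($string, '<head>');  // offset of the opening tag
$end = strpos($string, '</head>');   // offset of the closing tag
if ($start !== false && $end !== false) {
    $length = $end + strlen('</head>') - $start; // include '</head>' itself
    $string = substr_replace($string, '', $start, $length);
}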

Not tested, so you'd probably have to tinker a bit.

Thanks, diafol, but I've already tried the strpos approach and got the results I described above.

YES! You beautiful wizard Pritaeas!

diafol:

Ah. My mistake. No need to escape. I hate regex :) heh heh.
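
This correction is easy to check: < and > are not PCRE metacharacters, and / only needs escaping when it is also the pattern delimiter. A small sketch with a hypothetical sample string; all three patterns below should give the same result.

```php
<?php
// Hypothetical multi-line sample.
$html = "<head>\n<title>x</title>\n</head><body></body>";

// Backtick delimiter: <, > and / are ordinary characters.
$plain = preg_replace('`<head>.*?</head>`s', '', $html);

// Escaping them anyway is harmless: a backslash before a
// non-alphanumeric character just matches that character literally.
$escaped = preg_replace('`\<head\>.*?\<\/head\>`s', '', $html);

// With / as the delimiter, the / inside the pattern must be escaped.
$slashDelim = preg_replace('/<head>.*?<\/head>/s', '', $html);

echo $plain, "\n"; // <body></body>
```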
