Hi, I have written some code to convert an uploaded .doc file and display it as HTML.
I am not able to see the first line, and all the bold strings look like ordinary strings.

code

$fileHandle = fopen($userDoc, "rb");
$line = @fread($fileHandle, filesize($userDoc));
fclose($fileHandle);

// Split the raw bytes on carriage returns (0x0D)
$lines = explode(chr(0x0D), $line);
$outtext = "";
foreach ($lines as $line_num => $thisline) {
    // Only look at the first 151 chunks
    if ($line_num >= 0 && $line_num <= 150) {
        $pos = strpos($thisline, chr(0x00));
        // Skip empty chunks and chunks containing a NUL byte (binary data)
        if (($pos !== FALSE) || (strlen($thisline) == 0)) {
            continue;
        }
        $outtext = $thisline;
        // Replace every character outside the whitelist with a space
        $outtext = preg_replace("/[^a-zA-Z0-9\s\,\.\-\n\r\t@\/\_\(\)]/", " ", $outtext);
        echo "<table>";
        echo "<tr><td>" . htmlspecialchars($outtext) . "</td></tr>";
        echo "</table>";
    }
}

Could you post some example output?

As it is, you're passing everything through htmlspecialchars($outtext), so any markup would be escaped and shown as plain text. On top of that, the code only ever reads the text itself, never the formatting information.
To convert the formatting to HTML, you'll have to know the .doc formatting syntax (it's probably version dependent), then convert each piece of .doc formatting into the equivalent HTML formatting.
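
To illustrate the escaping point, here is a minimal sketch. It assumes the text has already been split into runs, each flagged as bold or not; the `renderRuns` function and the shape of the `$runs` array are hypothetical, and parsing the .doc file itself is not shown. The idea is to escape only the text and add the HTML tag afterwards, so the tag stays real markup.

```php
<?php
// Hypothetical input: text runs already extracted from the document,
// each flagged as bold or not. Extracting them from a .doc is not shown.
function renderRuns(array $runs): string
{
    $html = '';
    foreach ($runs as $run) {
        // Escape the text only, so characters like < and & display literally.
        $text = htmlspecialchars($run['text'], ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
        // Wrap after escaping, so the tag itself is not escaped.
        $html .= $run['bold'] ? "<strong>$text</strong>" : $text;
    }
    return "<p>$html</p>";
}

$runs = [
    ['text' => 'Total: ', 'bold' => false],
    ['text' => '5 < 10', 'bold' => true],
];
echo renderRuns($runs), "\n";
// prints: <p>Total: <strong>5 &lt; 10</strong></p>

// Escaping finished markup instead turns the tags into visible text:
echo htmlspecialchars('<strong>bold</strong>'), "\n";
// prints: &lt;strong&gt;bold&lt;/strong&gt;
```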

An open-source program that handles .doc files pretty well is OpenOffice. It's written mostly in C++, with some Java. You can browse the source code to see just how they do it, though it may be abstracted a bit, so any references you can find on the .doc format would probably get you there faster.

Hi,
I have attached the code in the first file and the output in the second file.
The expected output is only 139 lines, but the output file also displays junk values.

Hi,
I was trying to read MS Word documents, but without good results because of those strange characters.

I then started looking for something on Google and found your code above.
After some changes, I managed to read the first line and remove the junk at the end of the document.

It worked with Word 97-2003 .doc files.

Thanks a lot, without your code I wouldn't have done it.

Here's the code:

<?php
	// Read the file and split it into lines
	$pathToFile = "path\\to\\file.doc";
	$lines = explode(chr(0x0D), file_get_contents($pathToFile));
	
	$outText = "";
	
	// The first chunk holds binary data followed by the first line of text:
	// keep only what follows the last NUL byte, and remove the chunk from the lines array
	$firstLine = explode(chr(0x00), array_shift($lines));
	$outText .= "<p>".$firstLine[count($firstLine)-1]."</p>\n";
	
	// Read each line found in the doc
	foreach ($lines as $line){
		// Stop at the first line containing a NUL byte (the junk after the text)
		if (strpos($line, chr(0x00)) !== false) break;
		
		// No NUL byte: add to outText, keeping only word characters and spaces
		// (note that this also strips punctuation)
		$line = preg_replace("/[^\w ]/", "", $line);
		$outText .= "<p>".$line."</p>\n";	
	}
	
	// Print the results
	echo ($outText);
?>

I created an account here just to thank you!
All I can tell you about the bold and formatting stuff is that all of that information is written at the end of the file, and you need to read the .doc file specification if you want to learn about it.
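
Since the code was only tried on 97-2003 .doc files, it may be worth checking the file type before parsing. The sketch below is an assumption of mine rather than part of the code above: it looks at the leading bytes for the OLE2 compound file signature used by the old binary Office formats, and for the ZIP signature that .docx files start with. The `detectContainer` function name is hypothetical.

```php
<?php
// Hypothetical helper: classify a file by its leading bytes.
function detectContainer(string $bytes): string
{
    // OLE2 compound file signature (Word 97-2003 .doc, but also old .xls/.ppt).
    if (strncmp($bytes, "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1", 8) === 0) {
        return 'ole2';
    }
    // ZIP local file header signature; .docx files are ZIP archives.
    if (strncmp($bytes, "PK\x03\x04", 4) === 0) {
        return 'zip';
    }
    return 'unknown';
}

// Usage with a real file (the path is a placeholder):
// $head = file_get_contents('path/to/file.doc', false, null, 0, 8);
// echo detectContainer($head === false ? '' : $head), "\n";

echo detectContainer("\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1rest"), "\n"; // prints: ole2
echo detectContainer("PK\x03\x04rest"), "\n";                       // prints: zip
echo detectContainer("plain text"), "\n";                           // prints: unknown
```

A result of 'ole2' only says the container is the old binary format, not that the file is a Word document, so the splitting approach above can still produce junk on other files.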

Thanks, and if I find something to make this code better, I'll tell you.

(Sorry for my English.)
