Hi,

I want to extract text from a PDF in C# ASP.NET. I am using the code from the following link:

Link: http://www.codeproject.com/KB/cs/PDFToText.aspx

But this code is not working properly. The main problem is that the output file does not contain the content that is in the input file.

Is there any way to resolve this problem?

Please help me.

Thanks in advance.

Pankaj

I tried the same code with the latest referenced DLLs. On some PDFs I got junk text, while on others I got everything. It probably has to do with the format of the PDF (e.g. OCR'd, not indexed, non-standard text). Debug the code and put a breakpoint at line 224 of PDFparser.cs to see what it is grabbing. Something else I noticed is that the output file is UTF-8 encoded.
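
If it helps to tell "empty output" apart from "junk output" before stepping through the parser, here is a rough sketch that reads the output file as UTF-8 and measures how much of it looks like ordinary text. It is not part of the CodeProject code; the class name `ExtractedTextCheck`, the method names, and the 0.8 threshold are all my own assumptions, so tune them to your files.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

// Hypothetical helper, not part of the CodeProject PDFParser.
public static class ExtractedTextCheck
{
    // Fraction of characters that are letters, digits, punctuation, or whitespace.
    // Returns 0.0 for null or empty input.
    public static double ReadableRatio(string text)
    {
        if (string.IsNullOrEmpty(text)) return 0.0;
        int readable = text.Count(c =>
            char.IsLetterOrDigit(c) || char.IsPunctuation(c) || char.IsWhiteSpace(c));
        return (double)readable / text.Length;
    }

    // Reads the extractor's output file as UTF-8 and gives a rough verdict.
    public static string Describe(string outputPath)
    {
        string text = File.ReadAllText(outputPath, Encoding.UTF8);
        if (text.Trim().Length == 0) return "empty";
        // 0.8 is an arbitrary cut-off chosen for illustration.
        return ReadableRatio(text) < 0.8 ? "mostly junk" : "looks like text";
    }
}
```

Calling `ExtractedTextCheck.Describe(@"C:\temp\output.txt")` (the path is just an example) on a few outputs should show whether the problem PDFs produce nothing at all or produce garbage. An empty result often points to a scanned PDF that has no text layer, which this kind of parser cannot read.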

HTH
