Extract Text from PDF in C#

Question

Pankaj18 -1 Light Poster

15 Years Ago

Hi,

I want to extract text from a PDF in C# (ASP.NET). I am using the code from the following link:

http://www.codeproject.com/KB/cs/PDFToText.aspx

But this code is not working properly. The main problem is that the output file does not contain the text that is in the input file.
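A simple extractor that scans page content streams for text can come back empty for several reasons. Content streams are usually Flate-compressed, text may be written as hexadecimal strings `<...>` instead of literal `(...)` strings, and a scanned PDF may contain only images with no text operators at all. The sketch below is a hypothetical, deliberately naive parser (it is NOT the code from the linked article). It assumes an already-decompressed content stream and only understands `(...) Tj` inside `BT ... ET` text objects, which is enough to show how such a parser silently drops text it does not recognise.

```csharp
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical, deliberately naive extractor (not the CodeProject code).
// Input: an already-decompressed page content stream as a string.
// It keeps only literal strings "(...)" shown with Tj inside BT ... ET.
public static class NaiveContentStreamText
{
    // Matches a literal string followed by Tj, e.g. "(Hello) Tj".
    // Escape sequences are kept undecoded; nested parentheses and
    // TJ arrays are not handled at all.
    private static readonly Regex LiteralTj =
        new Regex(@"\(((?:[^()\\]|\\.)*)\)\s*Tj");

    // Matches each BT ... ET text object, possibly spanning lines.
    private static readonly Regex TextObject =
        new Regex(@"\bBT\b(.*?)\bET\b", RegexOptions.Singleline);

    public static string Extract(string contentStream)
    {
        var sb = new StringBuilder();
        foreach (Match obj in TextObject.Matches(contentStream))
        {
            // Hex strings such as <48656C6C6F> Tj never match here,
            // so that text is silently skipped.
            foreach (Match m in LiteralTj.Matches(obj.Groups[1].Value))
            {
                sb.Append(m.Groups[1].Value);
            }
        }
        return sb.ToString();
    }
}
```

For example, `NaiveContentStreamText.Extract("BT /F1 12 Tf 72 712 Td (Hello, PDF) Tj ET")` returns `Hello, PDF`, while the same text written as `<48656C6C6F2C20504446> Tj` yields an empty string, and an image-only page yields nothing either.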

Is there any way to resolve this problem ?

Please help me.

Thanks in advance.

Pankaj

asp.net c# pdf



sinnerFA 9 Junior Poster in Training · Answer 1 · 2010-03-16T04:38:00+00:00

I tried the same code with the latest versions of the referenced DLLs. On some PDFs I got junk text, while on others I got everything. It probably has to do with the format of the PDF (e.g. OCR'd, not indexed, non-standard text, etc.). Debug the code and put a breakpoint at line 224 of PDFparser.cs to see what it is grabbing. Something else I noticed is that the output file is UTF-8 encoded.
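If the output looks garbled when opened, one possible cause is that the viewer is decoding the UTF-8 file in a different encoding. A minimal sketch, assuming you control the code that writes and reads the output file (the class and method names here are hypothetical): write UTF-8 with a byte-order mark so that editors which rely on it can detect the encoding, and always decode explicitly as UTF-8 when reading it back.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper for the extractor's output file.
public static class Utf8OutputFile
{
    // Writes the text as UTF-8 with a byte-order mark (EF BB BF),
    // which helps editors that rely on the BOM to detect the encoding.
    public static void Write(string path, string text)
    {
        File.WriteAllText(path, text, new UTF8Encoding(encoderShouldEmitUTF8Identifier: true));
    }

    // Reads the file back, decoding it explicitly as UTF-8.
    // The BOM is detected and is not included in the returned string.
    public static string Read(string path)
    {
        return File.ReadAllText(path, Encoding.UTF8);
    }
}
```

Writing `"Résumé: naïve café"` with `Write` and reading it back with `Read` returns the identical string; opening the same bytes as a single-byte ANSI code page would instead show the accented characters as two-character junk sequences.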

HTH