Hello everyone,

I am writing a program that converts PDF to text. It works fine when I convert a simple paragraph or table into text, but it throws an error when I convert another file, which was sent to me by someone else and contains thousands of pages.

Can you tell me where I am making a mistake, or which condition I forgot to check? I am pasting the error below.

Parsing text from PDF file d:\CUTOFFENGG2006.pdf....
Exception in thread "main" java.lang.NoClassDefFoundError: org/fontbox/cmap/CMapParser
	at org.pdfbox.pdmodel.font.PDFont.parseCmap(PDFont.java:534)
	at org.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:387)
	at org.pdfbox.util.PDFStreamEngine.showString(PDFStreamEngine.java:325)
	at org.pdfbox.util.operator.ShowText.process(ShowText.java:64)
	at org.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:452)
	at org.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:215)
	at org.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:174)
	at org.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:336)
	at org.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:259)
	at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:216)
	at org.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:149)
	at PDFTextParser.pdftoText(PDFTextParser.java:47)
	at PDFTextParser.main(PDFTextParser.java:88)
Caused by: java.lang.ClassNotFoundException: org.fontbox.cmap.CMapParser
	at java.net.URLClassLoader$1.run(Unknown Source)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(Unknown Source)
	at java.lang.ClassLoader.loadClass(Unknown Source)
	at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
	at java.lang.ClassLoader.loadClass(Unknown Source)
	... 13 more

Please reply as soon as possible.


As the error message states, something goes wrong at line 47, in the pdftoText method of your PDFTextParser class. You did not provide the code, so we cannot help further.

Secondly, text extraction from PDF only works on documents that were converted to PDF from text, or on scanned documents that have been run through OCR (Optical Character Recognition). If your documents are just images that were converted to PDF without OCR, you will not be able to extract any text from them.
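
One more thing worth checking: the first line of the stack trace, `NoClassDefFoundError: org/fontbox/cmap/CMapParser`, means the JVM could not find that class at run time. That usually points to a missing jar on the classpath rather than to a problem in the PDF itself, and it would explain why simple files work (the CMap code path is apparently never reached for them). Below is a minimal sketch for checking this. The class name `ClasspathCheck` and the method `isClassAvailable` are made up for illustration and are not part of PDFBox or FontBox.

```java
public class ClasspathCheck {

    /**
     * Returns true if the named class can be located by this class's
     * class loader, false if it is missing or cannot be linked.
     */
    public static boolean isClassAvailable(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the stack trace in the question
        String name = "org.fontbox.cmap.CMapParser";
        System.out.println(name + " available: " + isClassAvailable(name));

        // Show what the JVM is actually searching
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
    }
}
```

Run it with the same classpath you use for your converter. If it reports that the class is not available, adding the FontBox jar that matches your PDFBox version to the classpath is the likely fix.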
