Hello everyone,

I am writing a program that converts PDF to text. It works fine when I convert a simple paragraph or table into text, but it throws an error when I convert another file, sent to me by someone else, that contains thousands of pages.

Can you tell me where I am making a mistake, or which condition I forgot to check? I am pasting the error:

Parsing text from PDF file d:\CUTOFFENGG2006.pdf....
Exception in thread "main" java.lang.NoClassDefFoundError: org/fontbox/cmap/CMapParser
	at org.pdfbox.pdmodel.font.PDFont.parseCmap(PDFont.java:534)
	at org.pdfbox.pdmodel.font.PDFont.encode(PDFont.java:387)
	at org.pdfbox.util.PDFStreamEngine.showString(PDFStreamEngine.java:325)
	at org.pdfbox.util.operator.ShowText.process(ShowText.java:64)
	at org.pdfbox.util.PDFStreamEngine.processOperator(PDFStreamEngine.java:452)
	at org.pdfbox.util.PDFStreamEngine.processSubStream(PDFStreamEngine.java:215)
	at org.pdfbox.util.PDFStreamEngine.processStream(PDFStreamEngine.java:174)
	at org.pdfbox.util.PDFTextStripper.processPage(PDFTextStripper.java:336)
	at org.pdfbox.util.PDFTextStripper.processPages(PDFTextStripper.java:259)
	at org.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:216)
	at org.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:149)
	at PDFTextParser.pdftoText(PDFTextParser.java:47)
	at PDFTextParser.main(PDFTextParser.java:88)
Caused by: java.lang.ClassNotFoundException: org.fontbox.cmap.CMapParser
	at java.net.URLClassLoader$1.run(Unknown Source)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(Unknown Source)
	at java.lang.ClassLoader.loadClass(Unknown Source)
	at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
	at java.lang.ClassLoader.loadClass(Unknown Source)
	... 13 more

Please reply as soon as possible.

All 2 Replies

As the error message states, something happened on line 47, in the pdftoText method of your PDFTextParser class. You did not provide the code, so we cannot help.

Secondly, text extraction from a PDF only works on documents that were converted to PDF from text, or on scanned documents that have been through OCR (Optical Character Recognition). If you have image-only documents that were converted without OCR, you will not be able to extract any text from them.
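
Going by the stack trace alone, the root cause line is a ClassNotFoundException for org.fontbox.cmap.CMapParser. That usually means the class was available when PDFBox was compiled but its JAR (here, FontBox) is not on the classpath at run time. It may only show up with some files because PDFont.parseCmap is reached only when a font in the document needs a CMap, which a simple test PDF might not. A minimal sketch for checking this before calling PDFBox follows; the class name ClasspathCheck and the helper isClassAvailable are made up for illustration, and the two class names in main are taken from the stack trace above.

```java
public class ClasspathCheck {

    /**
     * Returns true if the named class can be located by this class's
     * class loader, false if it is missing or cannot be linked.
     */
    public static boolean isClassAvailable(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class names copied from the stack trace in the question
        String[] required = {
            "org.pdfbox.util.PDFTextStripper",
            "org.fontbox.cmap.CMapParser"
        };
        for (String name : required) {
            String status = isClassAvailable(name) ? "found" : "MISSING from classpath";
            System.out.println(name + " -> " + status);
        }
    }
}
```

If the second line reports MISSING, adding the FontBox JAR that belongs to your PDFBox release to the classpath (the same way the PDFBox JAR is added) would be the first thing to try.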

Thanks for the information... I will come back to you later on.
