I just wanted to share my thoughts on PDFs and how bad they are to work with. I have 8 bank sources that provide my department with quarterly forecast data, and their outputs are always in PDF format. I know PDFs are a convenient way to output data and are typically smaller in file size than, say, MS Word documents, but extracting data from them is a nightmare. I have researched PDF to Excel/Word converters and have been using Nuance PDF Converter, but even that software is limited in its ability to split the data properly. The only way I have found to split out data from a PDF is to treat sections as tables, but this is not ideal for PDFs that are saved as (basically) images.
This leaves the users having to copy and paste data from PDF to Excel, but even then a whole page pastes down a single column in Excel. Because of this, my department's automation plans will be affected and, quite possibly, so will data integrity. I'm currently trying to write code that uses a word search to extract specific data accurately and leave behind what I call the fluff. This is not an easy task and is burning up time as far as requirements planning is concerned. However, once it is done I will be quite happy, I'm sure.
I really don't like PDFs, and with today's storage capacities I can't see file size being an issue for your average company. If I had my way, I would destroy all PDFs and make people/organizations output their data in formats that are usable in the information industry.
Sorry for the rant but I’m knee deep in this sh.. – hoopla right now just trying to figure out a way for the user to transfer PDF data into Excel so that I can attempt to work with it.
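For anyone attempting the same word-search approach, here is a minimal Python sketch of the idea. It assumes the PDF text has already been pasted or exported to plain text and that each figure sits on the same line as its label. The function name `extract_values`, the labels, and the sample text are all made up for illustration; they are not from any real bank report.

```python
import re


def extract_values(text, labels):
    """Return {label: first number that follows the label on the same line}.

    Hypothetical helper: labels without a match are simply left out.
    """
    results = {}
    for label in labels:
        # Label, then any non-numeric filler on the same line, then a number
        # that may carry a minus sign, thousands separators, and decimals.
        pattern = re.compile(
            re.escape(label) + r"[^\d\n-]*(-?\d[\d,]*\.?\d*)",
            re.IGNORECASE,
        )
        match = pattern.search(text)
        if match:
            results[label] = float(match.group(1).replace(",", ""))
    return results


# Invented sample standing in for one pasted page of a forecast report.
sample = """Quarterly Outlook
GDP growth forecast: 2.4 %
Some commentary that should be ignored.
Unemployment rate    5.1
CPI inflation (y/y)  -0.3
"""

if __name__ == "__main__":
    wanted = ["GDP growth", "Unemployment rate", "CPI inflation"]
    print(extract_values(sample, wanted))
```

Everything that is not next to a known label (the "fluff") is ignored. Real reports will probably need per-bank label lists and extra checks, since a label can appear more than once on a page.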
Recommended Answers
Blasphemy! I really like PDFs. Perhaps your issue isn't with the actual format but with the fact that you are using them incorrectly. They're a good portable format when documents need to be read (not so much modified and worked with, which seems to be your dilemma). They're small in …
As I pointed out, your work is simply requiring that you use the wrong tool for the job. Not to say that a screwdriver isn't loads useful, but it's rather frustrating when you need it to do the job of a hammer. Sure, you can sit there banging its handle …
I like PDFs too, and having Adobe Acrobat Pro helps a lot.
Download Foxit Reader. This will allow you to save your PDFs as text (*.txt) files. The text file separates the cell data with tabs; it's very readable and, in that format, importable.
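If the exported text really is tab-separated as described above, the import step can also be scripted rather than done by hand. The following is a rough Python sketch using only the standard library. The helper names `tab_text_to_rows` and `rows_to_csv` and the sample data are hypothetical, and actual exports may not be this tidy.

```python
import csv
import io


def tab_text_to_rows(text):
    """Split tab-separated text into rows of trimmed cells, skipping blank lines."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return [
        [cell.strip() for cell in row]
        for row in reader
        if any(cell.strip() for cell in row)
    ]


def rows_to_csv(rows):
    """Serialize rows as CSV text, which Excel can open directly."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Invented sample standing in for a tab-separated text export.
sample = "Bank\tQ1\tQ2\nAlpha\t1.2\t1.4\n\nBeta\t0.9\t1.1\n"

if __name__ == "__main__":
    print(rows_to_csv(tab_text_to_rows(sample)))
```

Writing the result to a `.csv` file gives Excel one value per cell instead of a whole page in one column.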
Sorry, I should have explained this better. Foxit Reader will save the PDF data into a text file.
Run Excel and choose 'Open' as if to open a file. Change the format that you want to open in the drop-down list from "All Excel Files..." to "Text Files", and locate …