I just wanted to share my thoughts on PDFs and how bad they are to work with. Eight bank sources send my department quarterly forecast data, and their output is always a PDF. I know PDFs are a convenient way to publish data and are typically smaller than, say, MS Word files, but extracting data from them is a nightmare. I have researched PDF-to-Excel/Word converters and have been using Nuance PDF Converter, but even that software is limited in how well it splits the data out. The only approach that works at all is treating sections of the page as tables, and that falls apart for PDFs that are saved as (basically) images.
This leaves users copying and pasting data from the PDF into Excel, and even then a full page pastes down a single column. Because of this, the automation plans in my department will be affected, and quite possibly so will data integrity. I'm currently writing code that uses a keyword search to extract specific data accurately and ignore what I call the fluff that is left over. This is not an easy task, and it is eating into the time set aside for requirements planning. However, once it is done I'm sure I will be quite happy.
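For anyone attempting the same thing, here is a minimal Python sketch of the keyword-search idea. It assumes the PDF text has already been pasted or exported as one line per cell, the way a page lands in a single Excel column. The helper name extract_by_keyword, the keyword labels and the sample lines are all hypothetical; real bank reports will need their own labels and probably more careful number parsing (footnote markers, parentheses for negatives, and so on).

```python
import re

# Numbers such as 1.8, -0.25 or 1,234.5; thousands separators are stripped before conversion.
NUMBER = re.compile(r"-?\d+(?:,\d{3})*(?:\.\d+)?")


def extract_by_keyword(lines, keywords):
    """Map each keyword to the numbers on the first line that contains it.

    Lines that match no keyword are treated as "fluff" and ignored.
    A keyword that is never found maps to None so missing data stays visible.
    """
    results = {keyword: None for keyword in keywords}
    for keyword in keywords:
        for line in lines:
            if keyword.lower() in line.lower():  # case-insensitive match
                results[keyword] = [float(m.replace(",", "")) for m in NUMBER.findall(line)]
                break  # keep the first matching line only
    return results


# Hypothetical text from one report page, one pasted Excel cell per line.
pasted_page = [
    "Bank A Quarterly Outlook",
    "Prepared for clients only",
    "Real GDP growth (%) 1.8 2.1 2.3",
    "Unemployment rate (%) 6.2 6.0 5.9",
    "Overnight rate (%) 4.75 4.50 4.25",
    "Disclaimer: forecasts are subject to change",
]

result = extract_by_keyword(pasted_page, ["Real GDP", "Unemployment", "Overnight rate", "CPI"])
print(result)
```

Returning None instead of silently skipping a missing label makes it easy to flag a report whose layout has changed, which is where the data-integrity risk comes in.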
I really don't like PDFs, and with today's storage capacity I can't see file size being an issue for the average company. If I had my way, I would destroy all PDFs and make people and organizations output their data in formats that are actually usable in the information industry.
Sorry for the rant, but I'm knee-deep in this sh... er, hoopla right now, just trying to figure out a way for users to transfer PDF data into Excel so that I can attempt to work with it.