I have 300,000 files and need a way to quickly classify them and copy them into a logical file structure. The files are named sequentially, but their extensions are intact, so they could be opened one by one. At one minute apiece, even with a quick, free tool, that's 180 man-weeks of time. I would prefer to point a program at a directory and have it evaluate every file based on its contents (titles, dates, reference numbers, worksheet tabs) and its file extension. The extensions include all the Office formats, TXT, PDF, DAT, OCR, and a number of unusual ones. The program should then write a copy of each file to a directory structure determined by the discovery scripts defined in the search.

Finding a tool that does this at the record level is no problem, but I have not found one that does the job at the file level. Am I missing something obvious?
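For illustration, the rule-based classification described above might look roughly like this in Python. It's a minimal sketch that assumes the text has already been extracted from each file; Office, PDF and OCR formats would need extraction libraries first. The rule list, the regex patterns, the reference-number format and the folder names are all hypothetical examples, not part of any existing tool.

```python
import re
from pathlib import Path

# Hypothetical discovery rules: (category folder, regex to look for in the text).
# The first rule that matches decides the destination folder.
RULES = [
    ("invoices", re.compile(r"\bINV-\d{6}\b")),            # made-up reference-number format
    ("contracts", re.compile(r"\bcontract\b", re.IGNORECASE)),
]

# ISO-style dates such as 2004-03-17 (an assumed date format).
DATE_RE = re.compile(r"\b(19|20)\d{2}-\d{2}-\d{2}\b")


def classify(filename: str, text: str) -> str:
    """Return a relative destination folder such as 'invoices/2004' or 'unsorted/pdf'."""
    ext = Path(filename).suffix.lower().lstrip(".") or "noext"
    for category, pattern in RULES:
        if pattern.search(text):
            # Sub-folder by year if a date appears anywhere in the text.
            date = DATE_RE.search(text)
            year = date.group(0)[:4] if date else "undated"
            return f"{category}/{year}"
    # Nothing matched: fall back to grouping by file extension.
    return f"unsorted/{ext}"
```

For example, `classify("000123.txt", "Invoice INV-004512 dated 2004-03-17")` gives `"invoices/2004"`, while a spreadsheet with no matching text lands in `"unsorted/xls"`. Falling back to the extension means nothing gets lost; unmatched files just pile up where you can write more rules for them.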


If I were given that task, I'd probably write some PowerShell scripts. PowerShell makes it easy to walk directory trees, look at file contents, and copy files around. The catch is that it takes some time to get familiar with PowerShell (less than 180 weeks, though), so if you have a tight deadline you may want to look elsewhere.
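The same walk-inspect-copy idea, sketched in Python rather than PowerShell. The `destination_for` rule here is a hypothetical placeholder (group by extension, plus one made-up content check on text files); real content rules would replace it. It assumes the target folder is outside the source tree and that file names are unique, which the sequential naming suggests.

```python
import shutil
from pathlib import Path


def destination_for(path: Path) -> str:
    """Hypothetical rule: group by extension, with one example content check for .txt files."""
    ext = path.suffix.lower().lstrip(".") or "noext"
    if ext == "txt":
        # Peek at the contents; undecodable bytes are skipped rather than raising.
        text = path.read_text(encoding="utf-8", errors="ignore").lower()
        if "invoice" in text:
            return "txt/invoices"
    return ext


def sort_files(source: Path, target: Path) -> int:
    """Copy every file under `source` into `target/<folder>/`; return how many were copied.

    `target` must not be inside `source`, or the copies would be picked up again.
    """
    copied = 0
    for path in source.rglob("*"):          # walks the whole directory tree
        if not path.is_file():
            continue
        dest_dir = target / destination_for(path)
        dest_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, dest_dir / path.name)  # copy2 also keeps timestamps
        copied += 1
    return copied
```

Calling `sort_files(Path("incoming"), Path("sorted"))` leaves the originals untouched and builds the new structure under `sorted`, so a bad rule only costs a re-run.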

Good luck!

>>At one minute apiece, that's 180 man weeks

It will not take nearly that long if you do it in C or C++: maybe one second or less per file. Reading PDF, OCR, and DOC files could take a little longer because their formats are much more complicated than plain TXT files, but even those can be processed in seconds or less. A lot depends on exactly what has to be done with each file.
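To put those numbers side by side, a quick back-of-the-envelope helper. The `workers` parameter is an added assumption: it only holds if the per-file work is independent and splits evenly across processes or machines.

```python
def estimate_hours(file_count: int, seconds_per_file: float, workers: int = 1) -> float:
    """Wall-clock hours to process `file_count` files, assuming the work splits evenly across `workers`."""
    return file_count * seconds_per_file / workers / 3600
```

`estimate_hours(300_000, 60)` is 5000.0 hours for one minute per file by hand, while `estimate_hours(300_000, 1)` is about 83.3 hours of unattended runtime, or about 20.8 hours with `workers=4`.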
