Hi folks,

I need a library to analyse the references in a scientific document. The library should be able to identify citations in the full text (for instance [1], [2], ... or Author A (1995), ... Author B & C (1968), ...), and it should be able to identify the individual elements of each entry in the reference list. For instance, if the reference list looks like this:

...
Smith, J. 1982, A new method for reference analysing, Journal of Information Technology, vol. 23, no. 5, pp. 234-238.
...

the library/algorithm should return something like an array or list of fields:

| surname | initial | year | title | journal | ... |
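For the first requirement (spotting citations in the running text), a rough starting point is a pair of regular expressions: one for numeric citations like [1] or [2, 3], and one for author-year citations like Smith (1995) or Jones & Brown (1968). This is a minimal Python sketch that assumes only those two styles. The names `find_citations`, `NUMERIC` and `AUTHOR_YEAR` are hypothetical. Real papers will need more patterns, for example "et al.", "1995a", or several citations inside one pair of parentheses.

```python
import re

# Numeric citations such as [1], [2, 3] or [4-6].
NUMERIC = re.compile(r"\[\d+(?:\s*[,\-]\s*\d+)*\]")

# Author-year citations such as Smith (1995) or Jones & Brown (1968).
# Assumption: single capitalised surnames only; "et al.", "van der ..."
# and year suffixes like "1995a" are not handled.
AUTHOR_YEAR = re.compile(
    r"\b[A-Z][a-z]+(?:\s+(?:&|and)\s+[A-Z][a-z]+)?\s+\(\d{4}\)"
)


def find_citations(text):
    """Return (kind, citation_text) pairs in the order they occur in text."""
    hits = [(m.start(), "numeric", m.group(0)) for m in NUMERIC.finditer(text)]
    hits += [(m.start(), "author-year", m.group(0))
             for m in AUTHOR_YEAR.finditer(text)]
    hits.sort()  # merge both styles by their position in the text
    return [(kind, cited) for _, kind, cited in hits]


if __name__ == "__main__":
    sample = ("Earlier work [1] and [2, 3] was extended by "
              "Smith (1995) and Jones & Brown (1968).")
    for kind, cited in find_citations(sample):
        print(kind, cited)
```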

Does a library like this exist for C++, Java, C# or any other language? I would be willing to pay for one if necessary.

Best regards
Jochen

The format doesn't really matter. I would prefer PDF, but if it's DOC, plain text or something else, I will find a way to convert it.

The format is important. Otherwise, how will you know that Smith is a name and not part of the title? You need to specify the format first and then parse the documents to extract all the information required.

If these documents follow a fixed format for the references that won't change at all, then it shouldn't be too hard to write a program that reads them. I don't see what the real problem is, unless you're not a programmer and need someone to write this for you. Just realize that the format of the references cannot vary: the fields need to be in exactly the expected order, otherwise the program will produce garbage results (you might get "1992" in the surname field, for example).
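The point about fixed ordering can be illustrated with a small Python sketch. It parses entries in the exact layout of the example reference in the question: Surname, Initials Year, Title, Journal, vol., no., pp. The pattern and the names `parse_reference` and `Reference` are illustrative assumptions, not an existing library. The sketch also assumes that titles and journal names contain no commas.

```python
import re
from typing import NamedTuple, Optional


class Reference(NamedTuple):
    surname: str
    initials: str
    year: int
    title: str
    journal: str
    volume: Optional[str]
    issue: Optional[str]
    pages: Optional[str]


# Assumed layout (hypothetical, modelled on the example entry):
# "Surname, I. YEAR, Title, Journal, vol. V, no. N, pp. P-Q."
# Titles and journal names must not contain commas.
REF_PATTERN = re.compile(
    r"(?P<surname>[^,]+),\s*"
    r"(?P<initials>(?:[A-Z]\.\s?)*[A-Z]\.)\s+"
    r"(?P<year>\d{4}),\s*"
    r"(?P<title>[^,]+),\s*"
    r"(?P<journal>[^,]+?)"
    r"(?:,\s*vol\.\s*(?P<volume>\w+))?"
    r"(?:,\s*no\.\s*(?P<issue>\w+))?"
    r"(?:,\s*pp\.\s*(?P<pages>\d+(?:\s*-\s*\d+)?))?"
    r"\.?\s*$"
)


def parse_reference(line: str) -> Optional[Reference]:
    """Split one reference into fields, or return None if it doesn't fit."""
    m = REF_PATTERN.match(line.strip())
    if m is None:
        return None  # wrong order or unexpected layout: reject, don't guess
    d = m.groupdict()
    return Reference(
        surname=d["surname"].strip(),
        initials=d["initials"],
        year=int(d["year"]),
        title=d["title"].strip(),
        journal=d["journal"].strip(),
        volume=d["volume"],
        issue=d["issue"],
        pages=d["pages"],
    )


if __name__ == "__main__":
    line = ("Smith, J. 1982, A new method for reference analysing, "
            "Journal of Information Technology, vol. 23, no. 5, pp. 234-238.")
    print(parse_reference(line))
```

Returning `None` for a non-matching line is a deliberate choice. It makes format violations visible instead of silently putting "1992" into the surname field.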
