I am doing some research for work (this is potentially a commercial project, not a student one).
We have a database of authors and publications. Unfortunately, authors don't have unique IDs and are identified only by inconsistent names (sometimes "J Smith", sometimes "John Smith", etc.).
I was hoping to develop some software to help group together all the publications by a single author.
It would be semi-automatic: the user chooses a publication by the person of interest, and the algorithm looks through the other publications and tries to match them based on the metadata (e.g. author name, year, title/subject matter, coauthors, etc.).
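To make the matching step concrete, here is a minimal sketch of a pairwise score between two publication records. Everything in it is an assumption for illustration. The `Publication` fields, the rule that an initial matches any given name starting with the same letter, the crude title-word overlap, and the weights (0.4/0.4/0.2) are all hypothetical and would need tuning against hand-labelled pairs.

```python
from dataclasses import dataclass, field


@dataclass
class Publication:
    # Hypothetical record layout; real metadata fields will differ.
    author: str
    year: int
    title: str
    coauthors: set[str] = field(default_factory=set)


def names_compatible(a: str, b: str) -> bool:
    """True if surnames match and given names agree, allowing initials."""
    pa, pb = a.replace(".", " ").split(), b.replace(".", " ").split()
    if not pa or not pb or pa[-1].lower() != pb[-1].lower():
        return False
    # Compare given-name tokens pairwise; an initial matches any name starting with it.
    for x, y in zip(pa[:-1], pb[:-1]):
        x, y = x.lower(), y.lower()
        if len(x) == 1 or len(y) == 1:
            if x[0] != y[0]:
                return False
        elif x != y:
            return False
    return True


def jaccard(a: set, b: set) -> float:
    """Overlap of two sets: |A & B| / |A | B| (0.0 if both are empty)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def title_words(title: str) -> set[str]:
    # Crude stand-in for stopword removal: keep only words longer than 3 letters.
    return {w for w in title.lower().split() if len(w) > 3}


def pair_score(p: Publication, q: Publication) -> float:
    """Hypothetical weighted similarity in [0, 1]; 0.0 if names are incompatible."""
    if not names_compatible(p.author, q.author):
        return 0.0
    year_sim = 1.0 / (1 + abs(p.year - q.year) / 5)  # halves at a 5-year gap
    return (0.4 * jaccard(p.coauthors, q.coauthors)
            + 0.4 * jaccard(title_words(p.title), title_words(q.title))
            + 0.2 * year_sim)


a = Publication("J Smith", 2010, "Bayesian record linkage for bibliographic data", {"A Jones"})
b = Publication("John Smith", 2012, "Record linkage in bibliographic databases", {"A Jones", "K Lee"})
print(round(pair_score(a, b), 3))  # 0.543
```

A hand-weighted sum like this is only a starting point; with enough labelled match/non-match pairs, the same features could instead feed a trained classifier.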
I am a complete novice, so I realise that very similar things must have been done before, but I don't know where to look for information on them.
What I do know is that in some ways this is very similar to information retrieval: the query is the information about the known author, and the software returns matches ranked by relevance.
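In standard information-retrieval terms, one common baseline is TF-IDF weighting with cosine similarity. The sketch below ranks candidate titles against a query text using only the Python standard library. It assumes plain whitespace tokenisation and computes IDF over just the query plus the candidates; both are simplifications that a real system would replace.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase, split on whitespace, keep purely alphabetic tokens.
    return [w for w in text.lower().split() if w.isalpha()]


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """One sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF so a term present in every document still gets a positive weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rank(query: str, candidates: list[str]) -> list[tuple[float, str]]:
    """Return (score, candidate) pairs, most relevant first."""
    vecs = tfidf_vectors([query] + candidates)
    q, rest = vecs[0], vecs[1:]
    scored = [(cosine(q, v), c) for v, c in zip(rest, candidates)]
    return sorted(scored, key=lambda s: s[0], reverse=True)


for score, title in rank("record linkage of author names",
                         ["author name disambiguation in digital libraries",
                          "protein folding simulation",
                          "record linkage methods"]):
    print(f"{score:.3f}  {title}")
```

Note that "names" and "name" do not match here; stemming or fuzzy matching would be an obvious next step.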
However, I want the user to be able to select more than one publication as the original query. This extra information should add confidence when finding other similar publications. Any idea how I should quantify this extra confidence? Or is there anything you think I should read that might be relevant to the project?
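One simple way to quantify the extra confidence is to score each candidate against every selected seed publication and then aggregate. The sketch below assumes each per-seed score already lies in [0, 1]. `mean` and `max` are conventional aggregates. `noisy_or` (1 - prod(1 - s_i)) rises as more seeds agree, but only under the loose assumption that the per-seed signals are independent. Seeds by the same author will not strictly satisfy that assumption.

```python
import math


def aggregate(seed_scores: list[float]) -> dict[str, float]:
    """Combine one candidate's similarity to each selected seed publication.

    Assumes every score is in [0, 1]. 'noisy_or' treats the scores as
    independent match probabilities, which is an assumption, not a given.
    """
    if not seed_scores:
        raise ValueError("need at least one seed score")
    if any(not 0.0 <= s <= 1.0 for s in seed_scores):
        raise ValueError("scores must lie in [0, 1]")
    return {
        "mean": sum(seed_scores) / len(seed_scores),
        "max": max(seed_scores),
        # Chance that at least one seed signals a match, if signals were independent.
        "noisy_or": 1.0 - math.prod(1.0 - s for s in seed_scores),
    }


print(aggregate([0.5]))       # one seed
print(aggregate([0.5, 0.5]))  # two seeds that agree
```

With two seeds each scoring 0.5, the mean stays at 0.5 while the noisy-OR rises from 0.5 to 0.75. That rise is the "extra confidence" effect, though whether it is justified depends on how correlated the seeds are.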