Hi All,

I am doing some research for work (this is potentially a commercial project, not a student one).

We have a database of authors and publications. Unfortunately, authors don't have unique IDs and are only identified by inconsistent names (sometimes J Smith, sometimes John Smith, etc.).

I was hoping to develop some software to help group publications by a single author.

It would be semi-automatic: the user chooses a publication by the person of interest, and the algorithm looks through the other publications and tries to match them based on the metadata (e.g. author name, year, title, subject matter, coauthors).
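To make the name-matching part concrete, here is a minimal sketch of a first-pass filter that treats "J Smith" and "John Smith" as possibly the same person. The function names (parse_name, names_compatible) are made up for illustration, and it assumes names are written "Given Surname" with the surname last; formats like "Smith, J" would need extra handling.

```python
import re

def parse_name(name):
    """Split a 'Given Surname' string into (surname, given-name tokens).

    Assumption: the surname is the last token. Punctuation is dropped,
    so 'J. Smith' and 'J Smith' parse the same way.
    """
    tokens = re.sub(r"[^\w\s]", " ", name).lower().split()
    if not tokens:
        return "", []
    return tokens[-1], tokens[:-1]

def names_compatible(a, b):
    """Return True if two name strings could refer to the same person."""
    surname_a, given_a = parse_name(a)
    surname_b, given_b = parse_name(b)
    if surname_a != surname_b:
        return False
    # Compare given names position by position. An initial matches any
    # name that starts with the same letter; full names must be equal.
    for x, y in zip(given_a, given_b):
        if len(x) == 1 or len(y) == 1:
            if x[0] != y[0]:
                return False
        elif x != y:
            return False
    return True
```

A filter like this only says two names are not contradictory. Two different people called J Smith would still pass, which is why the other metadata (coauthors, year, subject) is needed to rank the candidates it lets through.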

I am a complete novice. I realise that very similar things must have been done before, but I don't know where to look for information on them.

What I do know is that in some ways this is very similar to information retrieval. The query is the information about the known author, and the software returns matches ranked according to relevance.

However, I want the user to be able to select more than one publication as the original query. This extra information would add confidence to finding other similar publications. Any idea how I should quantify this extra confidence? Or is there anything you think I should read that might be relevant to the project?
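One possible way to quantify the extra confidence, sketched below, is to score a candidate against each selected publication separately and then combine the scores with a "noisy-OR" rule, so that every additional matching seed pushes the total closer to 1. This is only an illustration, not an established recipe for this problem: the record fields ("title", "coauthors"), the weights and the function names are all assumptions, and noisy-OR treats each seed as independent evidence, which is optimistic.

```python
def jaccard(a, b):
    """Overlap between two collections: |A and B| / |A or B|, in [0, 1]."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def pair_score(seed, candidate, w_coauthors=0.6, w_title=0.4):
    """Similarity of one seed publication to one candidate, in [0, 1].

    The weights are arbitrary placeholders and would need tuning on real data.
    """
    coauthor_sim = jaccard(seed["coauthors"], candidate["coauthors"])
    title_sim = jaccard(seed["title"].lower().split(),
                        candidate["title"].lower().split())
    return w_coauthors * coauthor_sim + w_title * title_sim

def combined_score(seeds, candidate):
    """Noisy-OR combination: 1 - product of (1 - score) over all seeds."""
    miss = 1.0
    for seed in seeds:
        miss *= 1.0 - pair_score(seed, candidate)
    return 1.0 - miss

def rank(seeds, candidates):
    """Return the candidates sorted from most to least likely match."""
    return sorted(candidates,
                  key=lambda c: combined_score(seeds, c),
                  reverse=True)

# Hypothetical records, just to show the shape of the data.
seed1 = {"title": "Graph matching methods", "coauthors": ["A Jones"]}
seed2 = {"title": "Fast graph algorithms", "coauthors": ["B Lee"]}
candidate = {"title": "Graph matching algorithms",
             "coauthors": ["A Jones", "B Lee"]}
unrelated = {"title": "Medieval pottery", "coauthors": ["C Wu"]}

ranked = rank([seed1, seed2], [unrelated, candidate])
```

With these records the candidate scores higher against both seeds together than against either one alone, which is the "extra confidence" effect. A simpler alternative is to pool all the seeds into one profile (all coauthors, all title words) and score against that.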

Thank you!


Thanks Ezzaral, that looks like an interesting night's reading!

I know what you mean about getting deep quickly. Even when I try to think about it myself for a few minutes, I find myself on an endless trail of 'what if I could do this...'!

I think I will need to actually try a few things on some real data to find out what level of complexity is required for good results (unfortunately we would need close to 100% accuracy to make this worthwhile).

