Hey all

I'm trying to write a multi-document summarization (MDS) system as proposed by Chali et al. (2009), see attached. After a lot of reading, I've managed to wrap my head around most of the concepts.
However, one of the things I need to build is a component that parses sentences into syntactic trees. I've been searching on Google for a while and haven't found anything really useful. There are systems that draw the trees given a specific notation, but I have no way of creating that notation. The paper references a parser (Charniak (1999)); however, I can only read the abstract of that article, not the full text.
If anybody could suggest an algorithm (or at least point me in the right direction), I would be really grateful.

Thanks in advance
M

There are many natural language parsing implementations.

If you can, I strongly suggest using NLTK, the Natural Language Toolkit. It is implemented in Python, though; from Java, for example, you may be able to interface with it via Jython.

Its big advantage is that it comes with ready-to-go POS tagger and natural-language grammar parser implementations, though you will still have to spend a fair amount of time figuring out how to actually use them.

Otherwise, if you pick a random standalone parser from the web, it will likely require tokenized, POS-tagged input.

Note, though, that no parser produces a 100% correct result: because of ambiguity, a single correct parse often does not exist, and even where one does, the parser only approximates it.
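
Since you asked for an algorithm: below is a minimal sketch of CKY chart parsing in plain Python, just to show the idea. To be clear, this is not the Charniak parser and not NLTK's API. The toy grammar, the tag names and the function name `cky_parse` are all made up for illustration, and the grammar has to be in Chomsky normal form (every rule has exactly two children). It takes tokenized, POS-tagged input and returns bracketed trees, which is likely the kind of notation the tree-drawing tools expect.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form, for illustration only.
# Maps (left child label, right child label) -> list of possible parent labels.
BINARY_RULES = {
    ("NP", "VP"): ["S"],
    ("Det", "N"): ["NP"],
    ("NP", "PP"): ["NP"],
    ("V", "NP"): ["VP"],
    ("VP", "PP"): ["VP"],
    ("P", "NP"): ["PP"],
}


def cky_parse(tagged, root="S"):
    """Return all bracketed trees labelled `root` for a list of (word, tag) pairs."""
    n = len(tagged)
    # chart[(i, j)][label] holds the bracketed subtrees covering tokens i..j-1.
    chart = defaultdict(lambda: defaultdict(list))

    # Spans of width 1: each token is a leaf under its POS tag.
    for i, (word, tag) in enumerate(tagged):
        chart[(i, i + 1)][tag].append(f"({tag} {word})")

    # Wider spans: try every split point and combine the two halves.
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            for split in range(start + 1, end):
                left_cell = list(chart[(start, split)].items())
                right_cell = list(chart[(split, end)].items())
                for left_label, left_trees in left_cell:
                    for right_label, right_trees in right_cell:
                        parents = BINARY_RULES.get((left_label, right_label), [])
                        for parent in parents:
                            for lt in left_trees:
                                for rt in right_trees:
                                    chart[(start, end)][parent].append(
                                        f"({parent} {lt} {rt})"
                                    )
    return chart[(0, n)][root]


# Tokenized, POS-tagged input (tags chosen to match the toy grammar above).
tagged = [
    ("the", "Det"), ("dog", "N"), ("saw", "V"), ("the", "Det"),
    ("man", "N"), ("with", "P"), ("the", "Det"), ("telescope", "N"),
]
for tree in cky_parse(tagged):
    print(tree)
```

With this toy grammar the sentence gets two trees, one attaching "with the telescope" to "the man" and one attaching it to "saw", which is exactly the ambiguity problem mentioned above. A statistical parser attaches probabilities to the rules and keeps the most likely tree instead of returning all of them.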
