I am using the Lucene Highlighter 2.4.1 in my application. I use the highlighter to get the best-matching fragments and display them.
I call the function String[] getFragmentsWithHighlightedTerms(Analyzer analyzer, Query query, String fieldName, String fieldContents, int fragmentsNumber, int fragmentSize). For example:

String text = doc.get("MetaData");
// Ask for up to 5 fragments of roughly 100 characters each
String[] fragments = getFragmentsWithHighlightedTerms(analyzer, query, "MetaData", text, 5, 100);

The function getFragmentsWithHighlightedTerms() is defined as follows:

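import java.io.IOException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CachingTokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.highlight.*;

private static String[] getFragmentsWithHighlightedTerms(Analyzer analyzer, Query query,
        String fieldName, String fieldContents, int fragmentsNumber, int fragmentSize)
        throws IOException
    {
        // Re-analyze the stored field text with the given analyzer
        TokenStream stream = TokenSources.getTokenStream(fieldName, fieldContents, analyzer);
        // The scorer reads tokens through a CachingTokenFilter wrapped around that stream
        SpanScorer scorer = new SpanScorer(query, fieldName, new CachingTokenFilter(stream));
        // Fragments of about fragmentSize characters that try not to break up matched spans
        Fragmenter fragmenter = new SimpleSpanFragmenter(scorer, fragmentSize);

        Highlighter highlighter = new Highlighter(scorer);
        highlighter.setTextFragmenter(fragmenter);
        highlighter.setMaxDocCharsToAnalyze(Integer.MAX_VALUE);

        // getBestFragments receives the original stream, not the CachingTokenFilter
        String[] fragments = highlighter.getBestFragments(stream, fieldContents, fragmentsNumber);

        return fragments;
    }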

My problem is that highlighter.getBestFragments() returns duplicate fragments: if I display the first 5 fragments, fragments 1 and 3 are identical. I do not understand what is causing this. Is there a problem with my code?
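Before digging into Lucene internals, it may help to confirm which positions are exact string duplicates of earlier ones (including any highlight markup), rather than fragments that only look alike on screen. The following is a minimal plain-Java sketch; `FragmentDuplicates` and `findDuplicatePositions` are hypothetical helper names, not part of Lucene, and the sample fragments are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FragmentDuplicates {

    /**
     * Hypothetical helper: reports each fragment that exactly repeats an earlier one,
     * using 1-based positions so the output matches how fragments are displayed.
     */
    static List<String> findDuplicatePositions(String[] fragments) {
        Map<String, Integer> firstSeen = new HashMap<>();
        List<String> duplicates = new ArrayList<>();
        for (int i = 0; i < fragments.length; i++) {
            // putIfAbsent returns the earlier position if this exact text was already seen
            Integer earlier = firstSeen.putIfAbsent(fragments[i], i + 1);
            if (earlier != null) {
                duplicates.add(earlier + " and " + (i + 1));
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for getBestFragments() output
        String[] fragments = {"first match", "second match", "first match", "third match"};
        System.out.println(findDuplicatePositions(fragments)); // prints [1 and 3]
    }
}
```

If this reports no duplicates for the real output, the fragments differ somewhere (whitespace, markup, offsets) and the problem is likely in how they are displayed rather than in the Highlighter.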

IMO this question would be better suited to the Lucene mailing list, where you have a much better chance of getting a quick answer. Anyway, have you read the Javadocs for the class and method in question? It is possible that their contract allows duplicate fragments.
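If duplicates do turn out to be permitted, one workaround is to filter them on the client side: request a few more fragments than you need, drop repeats while keeping the order in which they were returned, and trim to the desired count. This is a sketch under that assumption; `FragmentDeduper` and `distinctFragments` are hypothetical names, and the filter only hides the duplicates, it does not address whatever produces them.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

public class FragmentDeduper {

    /**
     * Hypothetical helper: removes repeated fragments, keeping the first occurrence
     * of each in the original order, then returns at most maxFragments of them.
     * maxFragments must be non-negative.
     */
    static String[] distinctFragments(String[] fragments, int maxFragments) {
        // LinkedHashSet preserves insertion order and silently ignores later duplicates
        Set<String> unique = new LinkedHashSet<>(Arrays.asList(fragments));
        return unique.stream().limit(maxFragments).toArray(String[]::new);
    }

    public static void main(String[] args) {
        String[] fragments = {"one", "two", "one", "three", "two", "four"};
        System.out.println(Arrays.toString(distinctFragments(fragments, 3)));
        // prints [one, two, three]
    }
}
```

For example, distinctFragments(getFragmentsWithHighlightedTerms(analyzer, query, "MetaData", text, 10, 100), 5) would ask for 10 candidates and keep up to 5 distinct ones; over-fetching by a factor of two is an arbitrary choice.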
