I want to extract unique elements from a string.

import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
import java.util.StringTokenizer;

/**
 *
 * @author Animesh Pandey
 */

public class Main {

    /**
     * Splits str on the separator characters and returns the distinct tokens.
     */
    public static Set<String> getUniqueTokens(String str, String separator) {
        StringTokenizer tokenizer = new StringTokenizer(str, separator);
        Set<String> tokens = new HashSet<>();
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    /**
     * @param args the command line arguments
     */
    public static void main(String[] args) {
        String s1 = "The Map interface maps unique keys to value means it associate value to unique keys which you use to retrieve value at a later date interface maps unique keys to value means "
                + "associate value to unique keys which you.";
        Set<String> unique = getUniqueTokens(s1, " ");
        // Print each distinct word once; a HashSet gives no particular order.
        Iterator<String> uniset = unique.iterator();
        while (uniset.hasNext()) {
            System.out.println(uniset.next());
        }
    }
}

This program does that, but what should I do if I need to ignore words like 'it, to, a, an, the, is, are'?
Please help!

You can create a Collection (e.g. an ArrayList) of Strings containing all the words you want to ignore. Then, immediately before your tokens.add(tokenizer.nextToken()); call, read the next token into a variable (each nextToken() call consumes a token) and use the contains(...) method to check whether it is in your collection of words to ignore. Only add it if it's not in the "ignore" Collection.
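
Here is a rough sketch of that suggestion, not a definitive version. The class name StopWordFilter and the IGNORED set are made up for illustration, the ignore list is just the words named in the question, and lower-casing each token before the lookup is my own assumption so that "The" is treated like "the". I used a HashSet instead of an ArrayList for the ignore list only because contains(...) is faster on it; an ArrayList works the same way.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.StringTokenizer;

class StopWordFilter {

    // Hypothetical ignore list, taken from the words named in the question.
    static final Set<String> IGNORED = new HashSet<>(
            Arrays.asList("it", "to", "a", "an", "the", "is", "are"));

    static Set<String> getUniqueTokens(String str, String separator) {
        StringTokenizer tokenizer = new StringTokenizer(str, separator);
        Set<String> tokens = new HashSet<>();
        while (tokenizer.hasMoreTokens()) {
            // Read the token once; every nextToken() call consumes a token.
            String token = tokenizer.nextToken();
            // Assumption: compare in lower case so "The" matches "the".
            if (!IGNORED.contains(token.toLowerCase(Locale.ROOT))) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints the distinct words except the ignored ones, in no particular order.
        System.out.println(getUniqueTokens("The Map interface maps unique keys to value", " "));
    }
}
```

Note that the tokens are stored with their original case, so "Map" and "map" would still count as two different words unless you also lower-case what you add.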

I want to find the Jaccard index between two strings.

public class jaccardIndex {

    public static Set getUniqueTokens(String str, String separator) {
        StringTokenizer tokenizer = new StringTokenizer(str, separator);
        Set tokens = new HashSet();
        while (tokenizer.hasMoreTokens()) {
                tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public float Jaccard (String str1, String str2) {
        Set unique1 = getUniqueTokens(str1, " ");
        Set unique2 = getUniqueTokens(str2, " ");
        // Unfinished: the index still has to be computed from unique1 and unique2.
    }

}

I want to store all the unique strings in a list so that they can be counted and compared for common elements, like this:

n1 = LengthOfunique1;
n2 = LengthOfunique2;

JaccardIndex = No. of common elements / (n1 + n2 - No. of common elements);

How should I do that?

How will you determine if the Strings are unique? A Set will help you do that.
Or use an ArrayList to hold the Strings and check if the next String you get is already in the list before adding it to the list.
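
A minimal sketch of the Set approach, under a few assumptions of mine. The class name JaccardSketch is hypothetical, the method is made static for easy calling, and returning 0 when both strings are empty is my own choice. The standard Jaccard index divides the number of common elements by the size of the union, which is n1 + n2 minus the number of common elements.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.StringTokenizer;

class JaccardSketch {

    static Set<String> getUniqueTokens(String str, String separator) {
        StringTokenizer tokenizer = new StringTokenizer(str, separator);
        Set<String> tokens = new HashSet<>();
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    static float jaccard(String str1, String str2) {
        Set<String> unique1 = getUniqueTokens(str1, " ");
        Set<String> unique2 = getUniqueTokens(str2, " ");

        // Intersection: copy the first set, then keep only what the second also has.
        Set<String> common = new HashSet<>(unique1);
        common.retainAll(unique2);

        // Union: copy the first set, then add everything from the second.
        Set<String> union = new HashSet<>(unique1);
        union.addAll(unique2);

        // Assumption: two empty inputs give 0 instead of dividing by zero.
        if (union.isEmpty()) {
            return 0f;
        }
        return (float) common.size() / union.size();
    }

    public static void main(String[] args) {
        // {a, b, c} and {b, c, d} share 2 of 4 distinct words.
        System.out.println(jaccard("a b c", "b c d"));
    }
}
```

The copies matter because retainAll and addAll modify the set they are called on, so calling them directly on unique1 would destroy it before the second step.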

I am trying to read strings from a database and then tokenize them.
I want to find the Jaccard index between all possible pairs of strings in the database.
How should I do that?

              .
              .
              .
              Statement st = null;
              ResultSet rs = null;
              st = conn.createStatement();
              String query = "select * from books";
              rs = st.executeQuery(query);
              while (rs.next()) {
                  String title = rs.getString(2);
                  String synopsis = rs.getString(3);
              }
              .
              .
              .

The string I am talking about is stored in synopsis. Is there a way to store all strings in a list or any other collection, so that I can access them after the whole database has been read?
Please help!

a way to store all strings in a list

Define a list like an ArrayList and add the Strings to it as they are read in.
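
Something along these lines might work, as a sketch only. The class PairwiseJaccard and its methods are hypothetical names, and the sample strings in main stand in for the database rows so the example runs without a connection. With the JDBC loop from the question, the synopses.add(...) call would go inside while (rs.next()) using rs.getString(3).

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.StringTokenizer;

class PairwiseJaccard {

    static Set<String> tokens(String text) {
        Set<String> result = new HashSet<>();
        StringTokenizer tokenizer = new StringTokenizer(text, " ");
        while (tokenizer.hasMoreTokens()) {
            result.add(tokenizer.nextToken());
        }
        return result;
    }

    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        int unionSize = a.size() + b.size() - common.size();
        // Assumption: two empty sets give 0 instead of dividing by zero.
        return unionSize == 0 ? 0.0 : (double) common.size() / unionSize;
    }

    // Fills scores[i][j] for every pair with i < j; the rest stays 0.
    static double[][] pairwise(List<String> texts) {
        List<Set<String>> sets = new ArrayList<>();
        for (String text : texts) {
            sets.add(tokens(text)); // tokenize each string only once
        }
        double[][] scores = new double[texts.size()][texts.size()];
        for (int i = 0; i < sets.size(); i++) {
            for (int j = i + 1; j < sets.size(); j++) {
                scores[i][j] = jaccard(sets.get(i), sets.get(j));
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        List<String> synopses = new ArrayList<>();
        // Stand-in data; with JDBC this would be synopses.add(rs.getString(3))
        // inside the while (rs.next()) loop.
        synopses.add("maps unique keys to value");
        synopses.add("unique keys which you use");
        synopses.add("retrieve value at a later date");

        double[][] scores = pairwise(synopses);
        for (int i = 0; i < synopses.size(); i++) {
            for (int j = i + 1; j < synopses.size(); j++) {
                System.out.println(i + " vs " + j + ": " + scores[i][j]);
            }
        }
    }
}
```

The inner loop starts at i + 1 so each pair is compared once and no string is compared with itself. For n strings that is n * (n - 1) / 2 comparisons, which can get slow on a large table.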
