yamini222 0 Newbie Poster

Hello,

I'm writing code to compare the words in two sentences. First I compare the query with the target (forward) and store the scores, then I compare the target with the query (reverse) and store those scores as well. I'm doing this in order to solve the split-word problem.

For example:

Sentence1: "Im working on java code which is not working"
Sentence2: "Im work ing on a code written in java which is not working"

Now you can see that the word "working" is split into two words in Sentence2. I want the algorithm to treat these two words as one. So I developed an amateur algorithm which compares the two sentences in the forward and reverse directions.

I made a list of words from each sentence by splitting it on spaces, and stored the words in a map.

Map<Integer, String> Qmap = new HashMap<Integer, String>();
Qmap.put(1, Qword1);
Qmap.put(2, Qword2);
Qmap.put(3, Qword3);
Qmap.put(4, Qword4);
Qmap.put(5, Qword5);
Qmap.put(6, Qword6);
Qmap.put(7, Qword7);
Qmap.put(8, Qword8);
Qmap.put(9, Qword9);

Map<Integer, String> Tmap = new HashMap<Integer, String>();
Tmap.put(1, Tword1);
Tmap.put(2, Tword2);
Tmap.put(3, Tword3);
Tmap.put(4, Tword4);
Tmap.put(5, Tword5);
Tmap.put(6, Tword6);
Tmap.put(7, Tword7);
Tmap.put(8, Tword8);
Tmap.put(9, Tword9);
Tmap.put(10, Tword10);
Tmap.put(11, Tword11);
Tmap.put(12, Tword12);
Tmap.put(13, Tword13);
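
The hand-written put calls above could also be generated from the sentence itself. This is only a minimal sketch: the class name SentenceWords and the method toPositionMap are hypothetical, and a LinkedHashMap is assumed here (instead of HashMap) so that iteration follows word order.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SentenceWords {
    // Splits a sentence on whitespace and stores each word under its 1-based position.
    static Map<Integer, String> toPositionMap(String sentence) {
        Map<Integer, String> words = new LinkedHashMap<>();
        int position = 1;
        for (String word : sentence.trim().split("\\s+")) {
            if (!word.isEmpty()) { // an empty sentence yields one empty token; skip it
                words.put(position++, word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        Map<Integer, String> qmap = toPositionMap("Im working on java code which is not working");
        Map<Integer, String> tmap = toPositionMap("Im work ing on a code written in java which is not working");
        System.out.println(qmap);
        System.out.println(tmap);
    }
}
```

With the two example sentences this gives positions 1 to 9 for the query and 1 to 13 for the target, the same key ranges as the maps above.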

Now I ask the user for a window size in order to deal with split words. For example, if the user gives a window size of 2 and the query word is longer than the target word, the algorithm combines the target word with the immediately following word and compares the combination with the query word.

Note: each word in the query is compared to every word in the target.
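
The window idea can be illustrated on its own, separately from the scoring. The following is a rough sketch under my own assumptions: WindowCandidates and mergedCandidates are hypothetical names, the words are held in a plain List, and positions are 1-based with labels such as "2+3".

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class WindowCandidates {
    // For the word at 1-based position 'start', builds the merged candidates
    // covering 2..windowSize adjacent words. Key: label like "2+3"; value: merged text.
    static Map<String, String> mergedCandidates(List<String> words, int start, int windowSize) {
        Map<String, String> candidates = new LinkedHashMap<>();
        StringBuilder label = new StringBuilder(Integer.toString(start));
        StringBuilder text = new StringBuilder(words.get(start - 1));
        for (int len = 2; len <= windowSize && start + len - 1 <= words.size(); len++) {
            int pos = start + len - 1;          // position of the word being appended
            label.append('+').append(pos);
            text.append(words.get(pos - 1));
            candidates.put(label.toString(), text.toString());
        }
        return candidates;
    }

    public static void main(String[] args) {
        List<String> target = Arrays.asList("Im", "work", "ing", "on");
        System.out.println(mergedCandidates(target, 2, 2));
    }
}
```

With a window size of 2, position 2 of the example target yields the single candidate "2+3" with the text "working"; the last position yields no candidate because there is no following word.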

I store the score for each word, along with the possible hits, in another map:

Multimap<Integer, Multimap<Double, String>> forward = comparing(Qmap,Tmap);
Multimap<Integer, Multimap<Double,  String>> reverse = comparing(Tmap,Qmap);


    public static Multimap<Integer, Multimap<Double, String>> comparing(Map<Integer, List<String>> Qmap, Map<Integer, List<String>> Tmap)
    {
        int next_itr_q = 0;
        Multimap<Double,String> forwardscores = ArrayListMultimap.create();
        Multimap<Integer, Multimap<Double, String>> Forward_bestscores = ArrayListMultimap.create();
        for(Entry<Integer, List<String>> query :Qmap.entrySet())
        {
            double bestval = 0;
            next_itr_q = next_itr_q+1;
            int next_itr_t = 0;

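            // Use a fresh multimap per query word; clearing and reusing one instance
            // would make every entry of Forward_bestscores point at the same scores.
            forwardscores = ArrayListMultimap.create();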
            for(Entry<Integer, List<String>> target :Tmap.entrySet())
            {
                next_itr_t = next_itr_t+1;
                // Target is at least as long as the query: compare the two words directly.
                if(query.getValue().size() <= target.getValue().size())
                {
                    double score = 1-((double)LevenshteinDistance.computeLevenshteinDistance(query.getValue(),target.getValue())/(Math.max(query.getValue().size(),target.getValue().size())));
                    forwardscores.put(score, Integer.toString(next_itr_t));
                }
                else if(query.getValue().size() > target.getValue().size())
                {

                    if(next_itr_t+1 <= Tmap.size())
                    {
                        List<String> t1 = Tmap.get(next_itr_t);
                        List<String> t2 = Tmap.get(next_itr_t+1);
                        String itr = next_itr_t+"+"+(next_itr_t+1);
                        List<String> newList = new ArrayList<String>();
                        newList.addAll(t1);
                        newList.addAll(t2);

                        double score = 0.0;
                        score = 1-((double)LevenshteinDistance.computeLevenshteinDistance(query.getValue(),newList)/(Math.max(query.getValue().size(),newList.size())));

                        forwardscores.put(score, itr);
                    }
                }
            }
            Forward_bestscores.put(next_itr_q, forwardscores);
        }
        System.out.println(Forward_bestscores);
        return Forward_bestscores;

    }
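
The LevenshteinDistance class used above is not shown in this post. As a stand-in, here is a minimal sketch of the same normalised score, 1 - distance / max(length), computed on plain strings. The class and method names are hypothetical, and treating two empty strings as a perfect match is my own assumption.

```java
class NormalizedSimilarity {
    // Classic dynamic-programming Levenshtein distance, keeping only two rows.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Score in [0, 1]: 1.0 for identical strings, 0.0 when nothing matches.
    static double similarity(String a, String b) {
        int longest = Math.max(a.length(), b.length());
        if (longest == 0) {
            return 1.0; // assumption: two empty strings count as identical
        }
        return 1.0 - (double) levenshtein(a, b) / longest;
    }

    public static void main(String[] args) {
        System.out.println(similarity("working", "work"));
        System.out.println(similarity("working", "work" + "ing"));
    }
}
```

Merging "work" and "ing" before scoring raises the similarity with "working" to 1.0, which is the effect the window is meant to achieve.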

Now, for each position, I have the possible hits and their scores.

The value format of the forward and reverse maps is:

Qmap = {1=[{0.30000000000000004=[8+9], 0.8571428571428572=[2+3], 0.5=[3+4, 4+5, 7+8], 0.1428571428571429=[1+2], 0.33333333333333337=[6], 0.3571428571428571=[5+6]}], 2=[{0.0=[1], 0.5555555555555556=[6], 0.8=[3], 0.6666666666666667=[2+3], 1.0=[4, 5], 0.5=[7+8], 0.8333333333333334=[8]}], 3=[{0.0=[1], 0.5555555555555556=[6], 0.8=[3], 0.6666666666666667=[2+3], 1.0=[4, 5], 0.5=[7+8], 0.8333333333333334=[8]}], 4=[{0.7142857142857143=[4+5], 0.8571428571428572=[5+6], 0.4285714285714286=[2+3], 0.3571428571428571=[8+9], 0.6428571428571428=[3+4], 0.1428571428571429=[1+2], 0.5714285714285714=[7+8], 0.9285714285714286=[6+7]}], 5=[{0.0=[1, 7, 9], 0.19999999999999996=[3], 0.33333333333333337=[8], 0.16666666666666663=[2+3], 0.4=[4, 5], 0.2222222222222222=[6]}], 6=[{0.0=[1, 9], 0.6666666666666667=[8], 0.5=[7], 0.33333333333333337=[2+3, 6], 0.4=[3, 4, 5]}], 7=[{0.0=[2+3, 3, 4, 5, 6, 7, 8], 0.4=[1], 0.75=[9]}]}

Tmap = {1=[{0.0=[1, 2, 3, 5+6], 0.0714285714285714=[4], 0.5=[6+7]}], 2=[{0.0=[5, 7], 0.0714285714285714=[4], 0.1428571428571429=[1], 0.19999999999999996=[2, 3], 0.25=[6]}], 3=[{0.8=[2, 3], 0.7142857142857143=[1], 0.6666666666666667=[5+6], 0.125=[6+7], 0.3571428571428571=[4]}], 4=[{0.7142857142857143=[1], 1.0=[2, 3], 0.8333333333333334=[5+6], 0.125=[6+7], 0.3571428571428571=[4]}], 5=[{0.7142857142857143=[1], 1.0=[2, 3], 0.8333333333333334=[5+6], 0.125=[6+7], 0.3571428571428571=[4]}], 6=[{0.9=[2+3], 0.4736842105263158=[3+4], 0.75=[1+2], 0.5555555555555556=[5+6], 0.2222222222222222=[6+7], 0.6428571428571428=[4]}], 7=[{0.8=[2, 3], 0.0=[7], 0.2857142857142857=[4], 0.6666666666666667=[5+6], 0.5=[6], 0.5714285714285714=[1]}], 8=[{1.0=[5+6], 0.5=[2+3], 0.26315789473684215=[3+4], 0.5714285714285714=[1], 0.3571428571428571=[4], 0.25=[6+7]}], 9=[{0.0=[1, 2, 3, 4, 5+6, 6], 0.75=[7]}]}

For example, in {1=[{0.0=[2+3, 3, 4, 5, 6, 7, 8], 0.75=[9], 0.4=[1]}]

this says that for the 1st word of the query (1=), the score 0.0 (0.0=) was obtained against the target words [2+3, 3, 4, 5, 6, 7, 8]. Here 2+3 means that target words 2 and 3 were combined before being compared with the query word.

I want to align the two sentences based on the maximum scores from the two maps for each word; if a word has already been used, the next-highest hit should be taken instead. This is proving hard to code: the code I have tried so far fails to produce the alignment I want.

For the scores I gave, I expect the alignment output to look something like this:

words:     1       2      3      4     5     6     7
Hits:     2+3      4      5     6+7    8     8     9
scores:   0.85    1.0    1.0    0.92  0.33  0.66  0.75    
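
One possible reading of the selection rule ("take the best hit, and if its words are already used take the next best") is a greedy pass over the query words in order. The sketch below is only an illustration of that reading, not a finished solution: GreedyAlignment, align and parsePositions are hypothetical names, and the input is assumed to be simplified to a map from query position to (hit label -> score) instead of the nested Multimap above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class GreedyAlignment {
    // scores: query position -> (hit label such as "4" or "2+3" -> score).
    // Returns query position -> chosen hit label; positions with no free hit are omitted.
    static Map<Integer, String> align(Map<Integer, Map<String, Double>> scores) {
        Set<Integer> used = new HashSet<>();
        Map<Integer, String> alignment = new TreeMap<>();
        // TreeMap copy: visit query positions in ascending order.
        for (Map.Entry<Integer, Map<String, Double>> query : new TreeMap<>(scores).entrySet()) {
            List<Map.Entry<String, Double>> hits = new ArrayList<>(query.getValue().entrySet());
            // Highest score first; the sort is stable, so ties keep the map's iteration order.
            hits.sort((x, y) -> Double.compare(y.getValue(), x.getValue()));
            for (Map.Entry<String, Double> hit : hits) {
                Set<Integer> positions = parsePositions(hit.getKey());
                if (Collections.disjoint(used, positions)) {
                    used.addAll(positions);
                    alignment.put(query.getKey(), hit.getKey());
                    break;
                }
            }
        }
        return alignment;
    }

    // "2+3" -> {2, 3}; "4" -> {4}
    static Set<Integer> parsePositions(String label) {
        Set<Integer> positions = new HashSet<>();
        for (String part : label.split("\\+")) {
            positions.add(Integer.parseInt(part.trim()));
        }
        return positions;
    }
}
```

Two caveats on this sketch. Under this strict reading a target position is never reused, so it would not give the same hit (8) to both words 5 and 6 as in the table above. Also, a greedy pass in query order does not necessarily maximise the total score; ties between equal scores are broken by the iteration order of the inner map, so a LinkedHashMap would be needed for repeatable results.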

Any help would be greatly appreciated.