Hi,

I currently have an ArrayList containing some bigrams, and I would like it to be organized like this:

[am] - [amazing, amber] and so on
[bi] - [big, bigas] and so on

How should I go about it? Thanks very much for any suggestions.

You could use a HashMap, but it depends on how you want it to behave. For your example it would work perfectly (the key is a string of length 2). However, if you have to use an ArrayList, just keep the list sorted whenever you insert a new element.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

List<String> list = new ArrayList<String>();

// pushUp(list, str) inserts the incoming String at the proper index,
// so the list stays sorted. (ArrayList itself has no pushUp method.)
public static void pushUp(List<String> list, String str) {
  // search for the proper index in the sorted list
  int index = Collections.binarySearch(list, str);
  if (index < 0) {
    index = -index - 1;  // not found: binarySearch returns -(insertion point) - 1
  }
  // add the element to the list at the resulting index
  list.add(index, str);
}

pushUp(list, "big");
pushUp(list, "amazing");
pushUp(list, "bigas");
pushUp(list, "amber");

// As a result, the list contains ["amazing", "amber", "big", "bigas"]
// in sorted order.
// Keeping the ArrayList sorted gives an advantage when you search it
// (you can use binary search).
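
For comparison, the HashMap alternative mentioned above could look roughly like the sketch below. It groups each word under its first two letters, which matches the [am] - [amazing, amber] layout from the question. The names PrefixGroups and groupByPrefix are made up for this illustration, and a TreeMap is used here (an assumption, not something the thread asks for) only so that the keys come out in sorted order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PrefixGroups {
    // Groups words by their first two characters; shorter words are skipped.
    static Map<String, List<String>> groupByPrefix(List<String> words) {
        Map<String, List<String>> groups = new TreeMap<String, List<String>>();
        for (String word : words) {
            if (word.length() < 2) {
                continue; // no two-letter key can be formed
            }
            String key = word.substring(0, 2);
            List<String> bucket = groups.get(key);
            if (bucket == null) {
                bucket = new ArrayList<String>();
                groups.put(key, bucket);
            }
            bucket.add(word);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("big", "amazing", "bigas", "amber");
        System.out.println(groupByPrefix(words));
        // prints {am=[amazing, amber], bi=[big, bigas]}
    }
}
```

Note that this only looks at the start of each word. Matching a bigram anywhere inside a word needs a different check.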

Edited 5 Years Ago by Taywin: n/a

I think what you want can be accomplished by implementing a trie.

Anyway, the ArrayList approach is wrong.
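
To make the trie suggestion concrete, here is a minimal sketch of a prefix trie. It is only an illustration: the class PrefixTrie and its methods insert and wordsWithPrefix are hypothetical names, and it answers "which words start with this prefix", not "which words contain this bigram anywhere".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PrefixTrie {
    private static final class Node {
        // TreeMap keeps children sorted, so lookups come back in sorted order.
        final Map<Character, Node> children = new TreeMap<Character, Node>();
        boolean isWord;
    }

    private final Node root = new Node();

    // Adds a word, creating one node per character as needed.
    void insert(String word) {
        Node node = root;
        for (char c : word.toCharArray()) {
            Node next = node.children.get(c);
            if (next == null) {
                next = new Node();
                node.children.put(c, next);
            }
            node = next;
        }
        node.isWord = true;
    }

    // Returns every stored word that starts with the given prefix.
    List<String> wordsWithPrefix(String prefix) {
        List<String> result = new ArrayList<String>();
        Node node = root;
        for (char c : prefix.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return result; // nothing stored under this prefix
            }
        }
        collect(node, new StringBuilder(prefix), result);
        return result;
    }

    // Depth-first walk that rebuilds each word from the path taken.
    private void collect(Node node, StringBuilder path, List<String> out) {
        if (node.isWord) {
            out.add(path.toString());
        }
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            path.append(e.getKey().charValue());
            collect(e.getValue(), path, out);
            path.deleteCharAt(path.length() - 1);
        }
    }

    public static void main(String[] args) {
        PrefixTrie trie = new PrefixTrie();
        for (String w : new String[] {"big", "amazing", "bigas", "amber"}) {
            trie.insert(w);
        }
        System.out.println(trie.wordsWithPrefix("bi")); // prints [big, bigas]
        System.out.println(trie.wordsWithPrefix("am")); // prints [amazing, amber]
    }
}
```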

@end3r
I think it is too harsh to say that the approach is wrong. What is your definition of a 'right' and a 'wrong' approach? To me, 'right' means the program produces the correct answer and 'wrong' means the opposite. I would call it an 'inefficient' approach rather than a wrong one. :) Still, it is what the TC wants to do, and it should be a good way to learn how to program anyway.

Thanks for the feedback.

How do I do the comparison? For example, "bi" should match the terms "bible" and "bobilo". I tried using regex, but it does not seem to do the trick here.

Thanks.

:) Right you are.

Didn't mean to be harsh on you, just wanted to point out another more efficient way to accomplish your task. I guess that "wrong" in this context means inefficient.

BR,
Ender

Assuming "bible" is of type String, you could check

"bible".contains("bi")

which returns true in this case.
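
A small sketch of the check, with two pitfalls worth knowing. The class ContainsDemo and its helper methods are hypothetical names for this example. One possible reason the regex attempt failed (a guess, since that code is not shown in the thread) is that String.matches() only succeeds when the whole string matches the pattern, whereas Matcher.find() looks for a match anywhere.

```java
import java.util.regex.Pattern;

class ContainsDemo {
    // True if the bigram occurs anywhere in the word.
    static boolean hasBigram(String word, String bigram) {
        return word.contains(bigram);
    }

    // Same check with a regex; find() searches for a match anywhere in the input.
    static boolean hasBigramRegex(String word, String bigram) {
        return Pattern.compile(Pattern.quote(bigram)).matcher(word).find();
    }

    public static void main(String[] args) {
        System.out.println(hasBigram("bible", "bi"));       // true
        System.out.println(hasBigram("bobilo", "bi"));      // true
        System.out.println(hasBigram("bi", "bible"));       // false: the arguments are swapped
        System.out.println("bible".matches("bi"));          // false: matches() needs the whole string to match
        System.out.println(hasBigramRegex("bobilo", "bi")); // true
    }
}
```

The direction of the call matters: the longer string (the word) has to be the one that contains() is called on.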

Edited 5 Years Ago by end3r: n/a

Thanks again

What if the data structure is an ArrayList {bi, ab, ba, ui} that has to be compared with {bible, tibbit, etc.}? How do I make the comparison in this case and extract the words with a matching bigram? (By the way, I am doing bigram matching here.) Thanks.

I think I would do something like:

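// list1 holds the words, list2 holds the bigrams
Map<String, Set<String>> map = new HashMap<String, Set<String>>();
for (String element1 : list1) {  // iterate over the first list (words)
  Set<String> result = new HashSet<String>();  // use a Set so you won't have duplicates
  for (String element2 : list2) {  // iterate over the second list (bigrams)
    if (element1.contains(element2)) {  // does the word contain this bigram?
      result.add(element2);
    }
  }
  map.put(element1, result);  // word -> bigrams found in it
}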
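
If the goal is the layout from the question at the top of the thread, with each bigram as the key and the matching words as the values, the roles of the two lists can be swapped. The following is a sketch under that assumption; BigramMatcher and wordsByBigram are hypothetical names, and LinkedHashMap/LinkedHashSet are used only to keep the output in insertion order.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class BigramMatcher {
    // Maps every bigram to the set of words that contain it.
    static Map<String, Set<String>> wordsByBigram(List<String> bigrams, List<String> words) {
        Map<String, Set<String>> map = new LinkedHashMap<String, Set<String>>();
        for (String bigram : bigrams) {
            Set<String> matches = new LinkedHashSet<String>();
            for (String word : words) {
                if (word.contains(bigram)) { // the word is searched for the bigram
                    matches.add(word);
                }
            }
            map.put(bigram, matches);
        }
        return map;
    }

    public static void main(String[] args) {
        List<String> bigrams = Arrays.asList("bi", "ab", "ba", "ui");
        List<String> words = Arrays.asList("bible", "tibbit");
        System.out.println(wordsByBigram(bigrams, words));
        // prints {bi=[bible, tibbit], ab=[], ba=[], ui=[]}
    }
}
```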

Edited 5 Years Ago by end3r: correction

Hi there,

I tried it, but it displays an empty ArrayList.

I recently modified the message I posted... Did you try it using the map?
If so, you could paste some code.

public static ArrayList<String> NGrams(String word, ArrayList<String> bigrams) {

        ArrayList<String> bigram_ = new ArrayList<String>();
        HashMap abc = new HashMap();
        //lower case the terms
        String lower = word.toLowerCase();
        char[] w = lower.toCharArray();
        // bigram_.add(lower);

        for (int a = 0; a < w.length; a++) {
            if (a + 2 > w.length) {
                break;
            }

            String bigram;
            bigram = lower.substring(a, a + 2);
            bigrams.add(bigram);

            //remove duplicate using Hash set
            Set set = new HashSet();
            List newList = new ArrayList();
            for (Iterator iter = bigrams.iterator(); iter.hasNext();) {
                Object element = iter.next();
                if (set.add(element)) {
                    newList.add(element);
                }
            }
            bigrams.clear();
            bigrams.addAll(newList);

            //matching bigrams

            ArrayList<String> result = new ArrayList<String>(); //create Set so you won't have duplicates

                for (String element1 : bigram_) { // iterate first list
                    for (String element2 : bigrams) { // iterate second list
                        if (element1.contains(element2)) {
                            result.add(element2);
                            abc.put(element1, element2);
                            System.out.println(result);
                        }
                    }
                }
        }

        return bigrams;
    }

Yep, above is the code. Any suggestions or help are most welcome :)

Here is what I noticed:

1. You added the code, but you don't use the map or the list anywhere further.
2. The first list should be the "bigrams" and the second list should be "bigram_".
3. The "bigram_" list is empty. Shouldn't it contain at least the word "bible"? This is why your result list is empty (it won't enter the loop).
4. Why did you use the "String word" parameter? You said you have two lists, one with words and another with bigrams. The code I provided takes this into account, so you would have to pass the two lists and it would create a map with (word, bigrams) pairs.
5. When using generics, it is best to specify the type parameters (for example Set<String> instead of a raw Set).
6. Use a more generic parameter type like List instead of ArrayList.

You should probably think more about what the method should do. You could start by separating the tasks into more methods.
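
As one possible way to separate the tasks, the bigram extraction could live in its own small method that does nothing else. This is only a sketch: BigramExtractor and bigramsOf are hypothetical names, and a LinkedHashSet is assumed for removing duplicates while keeping the order of first appearance.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class BigramExtractor {
    // Returns the distinct bigrams of a word, lower-cased,
    // in the order in which they first appear.
    static List<String> bigramsOf(String word) {
        String lower = word.toLowerCase();
        Set<String> seen = new LinkedHashSet<String>();
        for (int i = 0; i + 2 <= lower.length(); i++) {
            seen.add(lower.substring(i, i + 2));
        }
        return new ArrayList<String>(seen);
    }

    public static void main(String[] args) {
        System.out.println(bigramsOf("Bible"));  // prints [bi, ib, bl, le]
        System.out.println(bigramsOf("banana")); // prints [ba, an, na]
    }
}
```

The matching step can then be a second method that takes the bigram list and the word list as parameters, which keeps each piece easy to test on its own.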

BR,
ender

Hi ender,

The bigram1 list does indeed have a value. This code has been modified to be similar to yours.

public static ArrayList<String> NGrams(String word, ArrayList<String> bigrams) {

        ArrayList<String> bigram_ = new ArrayList<String>();
        Map<String, Set> map = new HashMap<String, Set>();
        ArrayList<String> bigram1 = new ArrayList<String>();
        //lower case the terms
        String lower = word.toLowerCase();
        char[] w = lower.toCharArray();
        // bigram_.add(lower);

        for (int a = 0; a < w.length; a++) {
            if (a + 2 > w.length) {
                break;
            }

            String bigram;
            bigram = lower.substring(a, a + 2);
            bigrams.add(bigram);

            //remove duplicate using Hash set
            Set set = new HashSet();
            List newList = new ArrayList();
            for (Iterator iter = bigrams.iterator(); iter.hasNext();) {
                Object element = iter.next();
                if (set.add(element)) {
                    newList.add(element);
                }
            }
            bigrams.clear();
            bigrams.addAll(newList);
            bigram1.add(lower);
           
            //matching bigrams

                for (String element1 : bigrams) { 
                     Set<String> result = new HashSet<String>();
                    for (String element2 : bigram1) { 
                        if (element1.contains(element2)) {
                            result.add(element2);    
                        }
                    }
                    map.put(element1, result);
                }
                
        }
        System.out.println(map);
        return bigrams;
    }