0

Hello gentlemen. I have a question. With the following piece of code I want to extract data from a document. The question is: how can I assign the sentences and words to the array?

String data[][]; //[files][sentences][words]
    // Needs java.io.File, java.util.Scanner and java.util.regex.Pattern
    public void ScanSearch() {
        // Sentence delimiter: '.', '!' or '?' followed by whitespace (compiled once)
        Pattern p = Pattern.compile("[.!?]\\s+");

        for (int i = 0; i < files.length; i++) {
            try {
                Scanner scanner = new Scanner(new File(files[i][1]));
                while (scanner.hasNextLine()) {
                    String line = scanner.nextLine();
                    Scanner lineScanner = new Scanner(line);
                    lineScanner.useDelimiter(p);

                    while (lineScanner.hasNext()) {
                        String Sentence = lineScanner.next();
                        System.out.println(Sentence);
                        String[] words = Sentence.split(" ");
                        // The loop condition must test j, not i
                        for (int j = 0; j < words.length; j++) {
                            System.out.println(words[j]);
                        }
                    }
                }
                scanner.close();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

Help greatly appreciated.

Last Post by JamesCherrill
0

To parse the words, you can split on runs of non-word characters with Pattern w = Pattern.compile("\\W+"); . The sentences are more problematic, since a sentence can span several lines. After parsing the sentences in a line, you have to check whether the last fragment actually ends a sentence; if not, you need to read the next line (or lines, depending on the length of the sentence) and concatenate all of the fragments into one sentence. A very rough draft would be something like this:

  1. Retrieve a line from the file.
  2. Parse all the sentences from it.
  3. Check whether the last part is indeed a complete sentence.
  4. If not, keep a variable that records that the sentence is not finished yet.
  5. Parse the words, then continue to the next line.
  6. When you reach the end of a sentence, check whether you are holding an unfinished fragment from before; if you are, prepend it to the sentence you are parsing now.
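
The steps above could be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the class name SentenceAssembler and the helper names are made up for this example, the sentence delimiter is the one from the question, and it assumes that every '.', '!' or '?' followed by whitespace (or by the end of a line) really ends a sentence, which is not true for abbreviations such as "e.g.".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class SentenceAssembler {
    // Sentence delimiter from the question: '.', '!' or '?' followed by whitespace.
    private static final Pattern SENTENCE_END = Pattern.compile("[.!?]\\s+");
    // Word delimiter suggested above: one or more non-word characters.
    private static final Pattern WORD_SPLIT = Pattern.compile("\\W+");

    /** Joins lines into sentences, then splits each sentence into words. */
    public static List<String[]> parse(Iterable<String> lines) {
        List<String[]> sentences = new ArrayList<>();
        StringBuilder pending = new StringBuilder(); // unfinished sentence carried over
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue;
            }
            if (pending.length() > 0) {
                pending.append(' ');
            }
            pending.append(line.trim());
            // The extra space lets a terminator at the very end of the line match too.
            // With limit -1 the last element is whatever follows the final terminator.
            String[] parts = SENTENCE_END.split(pending + " ", -1);
            for (int i = 0; i < parts.length - 1; i++) {
                addSentence(sentences, parts[i]);
            }
            // Keep the unfinished tail for the next line.
            pending.setLength(0);
            pending.append(parts[parts.length - 1].trim());
        }
        addSentence(sentences, pending.toString()); // text left over at end of input
        return sentences;
    }

    private static void addSentence(List<String[]> out, String sentence) {
        // Drop leading punctuation so the split does not start with an empty word.
        String cleaned = sentence.trim().replaceFirst("^\\W+", "");
        if (!cleaned.isEmpty()) {
            out.add(WORD_SPLIT.split(cleaned));
        }
    }
}
```

Calling SentenceAssembler.parse with the two lines "Hello world. This sentence" and "spans two lines! Done?" should give three word arrays, the second of which holds the sentence that was split across the lines.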
0

Wow. My question was way too simple.
I want to fill an array data[Sentence][Word] from the Sentence String that is already in the code and from each word's String.

How can I assign them?

0

2D arrays in Java are just arrays of arrays. You have arrays of words, so you can just add those to an array of sentences - roughly like this...

String[][] data = new String[99][]; // first dim big enough for all sentences
int sentenceNum = 0;
while (lineScanner.hasNext()){
   String sentence = lineScanner.next();
   String[] words = sentence.split(" ");
   data[sentenceNum] = words;
   sentenceNum++;
}

for (int s = 0; s < sentenceNum; s++) { // only the filled rows; the remaining ones are still null
   System.out.println("Sentence " + s);
   for (int w = 0; w < data[s].length; w++) {
       System.out.println("   Word " + w + ": " + data[s][w]);
   }
}
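
For reference, here is a self-contained variant of the same idea that can be run on its own. It is only a sketch: the class name JaggedArrayDemo, the method toSentenceWords and the maxSentences cap are invented for this example, and the input is a plain String rather than a file. It also trims the oversized first dimension with Arrays.copyOf, so the caller never sees the unused (null) rows.

```java
import java.util.Arrays;

public class JaggedArrayDemo {
    /** Splits text into sentences, then each sentence into words. */
    public static String[][] toSentenceWords(String text, int maxSentences) {
        String[][] data = new String[maxSentences][]; // every row is null until assigned
        int sentenceNum = 0;
        for (String sentence : text.split("[.!?]\\s+")) {
            if (sentenceNum == maxSentences) {
                break; // array is full; a real program might grow it instead
            }
            if (sentence.trim().isEmpty()) {
                continue; // skip empty fragments
            }
            // Each row can have its own length: Java 2D arrays are arrays of arrays.
            data[sentenceNum++] = sentence.trim().split("\\s+");
        }
        // Cut the first dimension down to the rows that were actually filled.
        return Arrays.copyOf(data, sentenceNum);
    }

    public static void main(String[] args) {
        String[][] data = toSentenceWords("How are you? I am fine. Bye", 99);
        for (int s = 0; s < data.length; s++) {
            System.out.println("Sentence " + s);
            for (int w = 0; w < data[s].length; w++) {
                System.out.println("   Word " + w + ": " + data[s][w]);
            }
        }
    }
}
```

Because the returned array has exactly as many rows as there were sentences, looping up to data.length is safe here.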
0

PS
Personally I would avoid the problem of sizing the first dimension of the array by using an ArrayList, although I'd keep the array for the words, since that's what split gives you.

ArrayList<String[]> data = new ArrayList<String[]>();
...   
   data.add(sentence.split(" "));
0

By putting String[] elements in the ArrayList, can I achieve the same as with a 2D array?
I got a bit confused by this.
Could you please write a bit more about what the code would look like with the ArrayList?

0

An ArrayList of String[] is an ArrayList where each cell contains an array of String. This can be looked at as a dynamic 2d matrix. For example:

ArrayList<String[]> arr2d = new ArrayList<String[]>(); //The proper way to declare an ArrayList of String[]
String[] str1 = new String[2];
str1[0] = "hello";
str1[1] = "world";
String[] str2 = new String[3];
str2[0] = "how";
str2[1] = "are";
str2[2] = "you?";
arr2d.add(str1);
arr2d.add(str2);

This will create an ArrayList with 2 cells: the first one contains an array of length 2, and the second one contains an array of length 3.
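
To show how such a list stands in for a 2D array, here is a small runnable sketch. The class name ListOfArraysDemo and the build method are made up for this example; get, size and toArray are the standard java.util.List methods. Reading arr2d.get(s)[w] plays the role of data[s][w], and toArray can turn the list into a real String[][] once all sentences have been collected.

```java
import java.util.ArrayList;
import java.util.List;

public class ListOfArraysDemo {
    /** Builds the same two-row structure as in the post above. */
    public static List<String[]> build() {
        List<String[]> arr2d = new ArrayList<>();
        arr2d.add(new String[] {"hello", "world"});
        arr2d.add(new String[] {"how", "are", "you?"});
        return arr2d;
    }

    public static void main(String[] args) {
        List<String[]> arr2d = build();

        // get(row)[column] is the list equivalent of data[row][column].
        System.out.println(arr2d.get(1)[2]); // prints "you?"

        // Once the list is complete it can be copied into a plain 2D array.
        String[][] fixed = arr2d.toArray(new String[0][]);
        System.out.println(fixed[0][1]); // prints "world"
    }
}
```

The list grows as needed, so there is no first dimension to guess in advance.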

0

Populate it as in my previous post (and apine's longer example). Access it like this:

for (String[] sa : data) {  // loop thru all the string arrays in the arraylist
   for (int w = 0; w < sa.length; w++) {
       System.out.println("   Word " + w + ": " + sa[w]);
   }
}