Hello gentlemen. I have a question. With the following piece of code I want to extract data from a document. How can I assign the sentences and words to the array?

import java.io.File;
import java.util.Scanner;
import java.util.regex.Pattern;

String data[][]; //[files][sentences][words]

public void ScanSearch() {
    // A sentence ends at '.', '!' or '?' followed by whitespace.
    Pattern p = Pattern.compile("[\\.\\!\\?]\\s+", Pattern.MULTILINE);
    for (int i = 0; i < files.length; i++) {
        // files[i][1] is used as the path of the i-th document.
        try (Scanner scanner = new Scanner(new File(files[i][1]))) {
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine();
                Scanner lineScanner = new Scanner(line);
                lineScanner.useDelimiter(p);
                while (lineScanner.hasNext()) {
                    String Sentence = lineScanner.next();
                    System.out.println(Sentence);
                    String[] words = Sentence.split(" ");
                    // Print each word of the sentence.
                    for (int j = 0; j < words.length; j++) {
                        System.out.println(words[j]);
                    }
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Help greatly appreciated.


To parse the words, you can split on a pattern such as Pattern w = Pattern.compile("\\W+"); . The sentences are more problematic, since a sentence can span several lines. After parsing the sentences on a line, you have to check whether the last fragment ends a sentence; if it does not, you need to read the next line (or lines, depending on the length of the sentence) and concatenate all of the fragments into one sentence. A very rough draft would be something like this:

  1. Retrieve a line from the file.
  2. Parse all the sentences from it.
  3. Check whether the last part is indeed a complete sentence.
  4. If not, keep a variable that records that the sentence is not finished yet.
  5. Parse the words, continue to the next line.
  6. When reaching the end of the sentence, check whether you are holding an unfinished fragment from before. If you are, prepend it to the sentence you are parsing now.
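
The steps above could be sketched roughly as follows. This is only a minimal sketch under some assumptions, not a definitive implementation: the class SentenceJoiner and its methods parse, endsSentence and toWords are hypothetical names, the file is assumed to be already read into a list of lines, and a sentence is assumed to end at '.', '!' or '?' (so an abbreviation such as "e.g." would be split in the wrong place).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of the original code.
class SentenceJoiner {

    // Returns one String[] of words per sentence, joining sentences that span lines.
    static List<String[]> parse(List<String> lines) {
        List<String[]> sentences = new ArrayList<>();
        StringBuilder pending = new StringBuilder(); // unfinished fragment from earlier lines
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            if (pending.length() > 0) {
                pending.append(' ');
            }
            pending.append(trimmed);
            // Split after each terminator; the lookbehind keeps '.', '!' or '?' in the sentence.
            String[] parts = pending.toString().split("(?<=[.!?])\\s+");
            pending.setLength(0);
            for (int k = 0; k < parts.length; k++) {
                boolean isLast = (k == parts.length - 1);
                if (isLast && !endsSentence(parts[k])) {
                    pending.append(parts[k]); // not finished yet: hold it for the next line
                } else {
                    sentences.add(toWords(parts[k]));
                }
            }
        }
        if (pending.length() > 0) {
            sentences.add(toWords(pending.toString())); // text after the last terminator
        }
        return sentences;
    }

    static boolean endsSentence(String s) {
        char last = s.charAt(s.length() - 1);
        return last == '.' || last == '!' || last == '?';
    }

    // Splits on runs of non-word characters, as suggested with the \W+ pattern.
    static String[] toWords(String sentence) {
        String cleaned = sentence.replaceAll("^\\W+", ""); // avoid an empty first word
        return cleaned.isEmpty() ? new String[0] : cleaned.split("\\W+");
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("Hello world. This is", "a test! Done?");
        for (String[] words : parse(lines)) {
            System.out.println(Arrays.toString(words));
        }
    }
}
```

Here the fragment "This is" at the end of the first line is held back and joined with "a test!" from the second line, so the two input lines yield three sentences.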

Wow. My question was way too simple.
I want to get the array data[Sentence][Word], which will hold the sentence's String (as it is already used in the code) and the word's String.

How can I assign them?

2d arrays in Java are just an array of arrays. You have arrays of words, so you can just add those to an array of sentences - roughly like this...

String[][] data = new String[99][]; // first dim big enough for all sentences
int sentenceNum = 0;
while (lineScanner.hasNext()){
   String sentence = lineScanner.next();
   String[] words = sentence.split(" ");
   data[sentenceNum] = words;
   sentenceNum++;
}

// only the first sentenceNum rows were filled; the remaining rows are still null
for (int s = 0; s < sentenceNum; s++) {
   System.out.println("Sentence " + s);
   for (int w = 0; w < data[s].length; w++) {
       System.out.println("   Word " + w + ": " + data[s][w]);
   }
}

Great!
Thanks a lot for the guidance.

P.S.
Personally I would avoid the problem of sizing the first dim of the array by using an ArrayList, although I'd keep the array for words since that's what split gives you.

ArrayList<String[]> data = new ArrayList<String[]>();
...   
   data.add(sentence.split(" "));

So by storing String[] in the ArrayList I can achieve the same as with a 2d array?
I got a bit confused by this.
Could you please write a bit more about how the code would look with the ArrayList?

An ArrayList of String[] is an ArrayList where each cell contains an array of String. This can be looked at as a dynamic 2d matrix. For example:

ArrayList<String[]> arr2d = new ArrayList<String[]>(); //The proper way to declare an ArrayList of String[]
String[] str1 = new String[2];
str1[0] = "hello";
str1[1] = "world";
String[] str2 = new String[3];
str2[0] = "how";
str2[1] = "are";
str2[2] = "you?";
arr2d.add(str1);
arr2d.add(str2);

This will create an ArrayList with 2 cells: the first one contains an array of length 2, and the second one an array of length 3.
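
To make the "dynamic 2d matrix" view concrete, here is a small sketch of how individual cells might be read back. The class name Arr2dDemo and the method build are hypothetical, chosen only for this illustration; get(row)[column] plays the role that data[row][column] plays for a plain 2d array.

```java
import java.util.ArrayList;

// Hypothetical demo class, not part of the original code.
class Arr2dDemo {

    // Builds the same two rows as in the example above.
    static ArrayList<String[]> build() {
        ArrayList<String[]> arr2d = new ArrayList<String[]>();
        arr2d.add(new String[] {"hello", "world"});
        arr2d.add(new String[] {"how", "are", "you?"});
        return arr2d;
    }

    public static void main(String[] args) {
        ArrayList<String[]> arr2d = build();
        System.out.println(arr2d.size());        // number of rows
        System.out.println(arr2d.get(0).length); // length of the first row
        System.out.println(arr2d.get(1)[2]);     // row 1, column 2
        arr2d.add(new String[] {"fine"});        // rows can be appended later
        System.out.println(arr2d.size());        // one more row than before
    }
}
```

Unlike the fixed-size first dimension of a String[][], the number of rows grows as needed, while each row keeps whatever length its array had when it was added.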

Populate it as in my previous post (and apine's longer example). Access it like this:

for (String[] sa : data) {  // loop thru all the string arrays in the arraylist
   for (int w = 0; w < sa.length; w++) {
       System.out.println("   Word " + w + ": " + sa[w]);
   }
}
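
If a plain String[][] is still wanted at the end (as in the original data[Sentence][Word] question), the list could be converted once all sentences have been collected. The following is a minimal sketch, assuming the sentences are already available as strings; the class SentenceTable and the method toTable are hypothetical names, not something from the posts above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of the original code.
class SentenceTable {

    // Collects one String[] per sentence, then converts the list to a 2d array.
    static String[][] toTable(List<String> sentences) {
        List<String[]> rows = new ArrayList<String[]>();
        for (String sentence : sentences) {
            rows.add(sentence.split(" "));
        }
        // toArray sizes the first dimension to exactly rows.size()
        return rows.toArray(new String[0][]);
    }

    public static void main(String[] args) {
        String[][] data = toTable(Arrays.asList("hello world", "how are you?"));
        for (int s = 0; s < data.length; s++) {
            System.out.println("Sentence " + s);
            for (int w = 0; w < data[s].length; w++) {
                System.out.println("   Word " + w + ": " + data[s][w]);
            }
        }
    }
}
```

Because the array is created only after the list is complete, there is no need to guess a size for the first dimension and no unfilled rows to skip.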