I am writing a program that reads a user-specified text file and displays all of its words (excluding duplicates) in ascending order. The code runs and displays the words; the only problem is punctuation. I tested it with a sample from a fanfiction I wrote a long time ago, and the output still contains quotation marks as well as the three periods used to denote a pause in speech. For example, one of the sentences is "Jackson...help them!", and the sorted output contains '"Jackson...help'. In addition, hyphenated words (like 'color-blind') are displayed as one word with the - still in place, and the same happens with words like goin' (short for going): the output shows goin' instead of goin. I'm fairly sure the problem lies in my split statement, but I can't find it. Is it in the split statement or elsewhere? Thanks in advance for any help.

import java.io.*;
import java.util.*;
public class ProblemOne {
    public static void main(String[] args) throws IOException{

        Scanner input = new Scanner(System.in);
        System.out.println("Enter a text file, from which all words(excluding their duplicates) are sorted in alphabetical order." + "\nNote! You must input the ENTIRE FILE LOCATION." + "\nExample: C:/Users/Username/File Location/Filename.txt" );
        String text = input.nextLine();
        File file = new File(text);
        if (!file.exists()) {
            System.out.println("The input file you specified either does not exist or is not in the designated location.");
        }
        else
            System.out.println("After sorting, and removing duplicates, the words from the text sorted in alphabetical order are: ");
            BufferedReader wordRead = null;
            Set<String> noDuplicates = new TreeSet<String>();
            String[] words;

        try {   
                wordRead = new BufferedReader(new FileReader(text));
                String nextSentence;
                while ((nextSentence = wordRead.readLine()) != null) {
                    words = nextSentence.split("[ \n \t \r . , ; : '  - ... ! ? ( ) { } ]");
                    for (int i = 0; i < words.length; i++) {
                        noDuplicates.add(words[i]);
                    }
                }

            }
        catch (IOException e) {
            e.printStackTrace();
        }

        finally {
            wordRead.close(); //closes the reader
        }

        List<String> noDuplicateWords = new ArrayList<String>(noDuplicates);
        Collections.sort(noDuplicateWords);
        for (String word : noDuplicateWords) {
            System.out.println(word);
        }
            }
}


How about replacing all those difficult characters with blanks before splitting the sentence?
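A rough sketch of that idea, assuming it is acceptable to treat every character that is not a letter as a separator (so 'color-blind' becomes two words). The class name `BlankThenSplit` and the helper `toWords` are made up for illustration:

```java
import java.util.Arrays;

class BlankThenSplit {
    // Hypothetical helper: blank out every run of non-letter characters,
    // then split what is left on whitespace.
    static String[] toWords(String sentence) {
        // \p{L} matches any Unicode letter, so [^\p{L}]+ matches runs of everything else.
        String cleaned = sentence.replaceAll("[^\\p{L}]+", " ").trim();
        if (cleaned.isEmpty()) {
            return new String[0]; // nothing but punctuation or blanks on this line
        }
        return cleaned.split("\\s+");
    }

    public static void main(String[] args) {
        // prints [Jackson, help, them]
        System.out.println(Arrays.toString(toWords("\"Jackson...help them!\"")));
        // prints [He, is, color, blind, and, goin, home]
        System.out.println(Arrays.toString(toWords("He is color-blind and goin' home")));
    }
}
```

If digits or apostrophes inside words should be kept, the character class would need to be widened accordingly.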

I still can't get it to remove the grammar and punctuation characters. I even tried the replaceAll method, but that doesn't yield an output without them.

import java.io.*;
import java.util.*;
public class ProblemOne {
    public static void main(String[] args) throws IOException{

        Scanner input = new Scanner(System.in);
        System.out.println("Enter a text file, from which all words(excluding their duplicates) are sorted in alphabetical order." + "\nNote! You must input the ENTIRE FILE LOCATION." + "\nExample: C:/Users/Username/File Location/Filename.txt" );
        String text = input.nextLine();
        File file = new File(text);
        if (!file.exists()) {
            System.out.println("The input file you specified either does not exist or is not in the designated location.");
        }

        else 
            System.out.println("The words, excluding duplicates, in ascending order are: ");
            BufferedReader wordRead = null;
            TreeSet<String> words = new TreeSet<String>();
            try {
                wordRead = new BufferedReader(new FileReader(text));
                String line;
                while ((line = wordRead.readLine()) != null) {
                    line.replaceAll("[ |\n|\t|\r|.|,|;|:|!|?|(|)|{|}|\"]", " ");
                    String[] removals = line.split("[ |\n|\t|\r|.|,|;|:|!|?|(|)|{|}|\"]");

                    for (int i = 0; i < removals.length; i++)
                        words.add(removals[i]);
                }
            }

            catch (IOException e) {
                e.printStackTrace();
            }

            finally {
                wordRead.close();
            }

            for (String word : words) {
                System.out.println(word);
            }
        }
    }

replaceAll does not modify the current string; it returns a new string, which you have ignored. Your regex also seems too complicated: after the replacement you just need to split on whitespace.
Here's a simple example:

    String s = "a b-c, d;  e";
    // replace any of , ; - with spaces...
    s = s.replaceAll ( "[,;-]" , " " );
    // split on one or more white space characters...
    System.out.println(Arrays.toString(s.split("\\s+")));
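Applied to the original problem, the same two steps could look roughly like the sketch below. It assumes the lines have already been read from the file, and the class name `UniqueSortedWords`, the helper `collect`, and the exact punctuation list are illustrative choices, not the only way to do it:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class UniqueSortedWords {
    // Hypothetical helper: collect the distinct words of all lines in sorted order.
    static Set<String> collect(List<String> lines) {
        Set<String> words = new TreeSet<>(); // a TreeSet drops duplicates and keeps its elements sorted
        for (String line : lines) {
            // Strings are immutable, so keep the value that replaceAll returns.
            // The '-' comes last in the class so it is a literal hyphen, not a range.
            String cleaned = line.replaceAll("[.,;:!?(){}\"'-]", " ");
            for (String word : cleaned.split("\\s+")) {
                if (!word.isEmpty()) { // a leading blank yields one empty token
                    words.add(word);
                }
            }
        }
        return words;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "\"Jackson...help them!\"",
                "Jackson is color-blind, goin' home.");
        // prints [Jackson, blind, color, goin, help, home, is, them]
        System.out.println(collect(lines));
    }
}
```

Because the TreeSet is already sorted, no separate Collections.sort call is needed. Upper-case letters sort before lower-case ones in the natural String ordering, and typographic (curly) quotes are different characters from ' and ", so they would have to be added to the character class if the text contains them.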