It is quite rare to see me post in the Java forum, but I have been picking it up as part of a long-distance university course I am doing.

I am currently trying to read in an ASCII-encoded .txt file and output it, but I am seeing some funny characters showing up.

The content is loaded using

        BufferedReader inputReader;

        System.out.println("Loading book...");

        try {
            inputReader = new BufferedReader(new InputStreamReader(new FileInputStream(fileToRead), "ASCII"));

I then split the text on the new page character into book "pages".

When I come to print these out to console using

        try {
            PrintStream output = new PrintStream(System.out, true, "ASCII");

            for (String theWord : words) {


This: Sorrow came—a gentle sorrow—but not at all in the shape of any disagreeable
Comes as: Sorrow came???a gentle sorrow???but not at all in the shape of any disagreeable

NOTE: The dash is an em-dash

Edited by Mike Askew


If you have an em-dash then the file is not ASCII encoded!
ASCII is a 7-bit code that includes only the English alphabet, digits and a handful of punctuation marks (not em-dashes), so any other characters will be unreadable when you specify ASCII as the character set.
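
To see where the three question marks come from, here is a minimal sketch, assuming the file is really UTF-8. In UTF-8 an em-dash (U+2014) is stored as three bytes (E2 80 94). None of them is a valid ASCII code, so each one is decoded to a replacement character, and each replacement character is then written out as '?'. The class and method names are made up for illustration, and the em-dash is written as \u2014 so the source file's own encoding does not matter.

```java
import java.nio.charset.StandardCharsets;

class EmDashDemo {

    // Simulates the round trip from the question: UTF-8 bytes are decoded
    // as ASCII, then the resulting text is encoded as ASCII again for output.
    static String roundTripThroughAscii(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        // Bytes >= 0x80 are not ASCII; each becomes the replacement char U+FFFD.
        String decoded = new String(utf8Bytes, StandardCharsets.US_ASCII);
        // U+FFFD cannot be encoded in ASCII, so each one is written as '?'.
        byte[] asciiBytes = decoded.getBytes(StandardCharsets.US_ASCII);
        return new String(asciiBytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        String line = "Sorrow came\u2014a gentle sorrow\u2014but";
        // prints: Sorrow came???a gentle sorrow???but
        System.out.println(roundTripThroughAscii(line));
    }
}
```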

Simply leaving out the character set will give you the default Charset for your machine, which will work 95% of the time, unless you are importing files from places that use a different language.

Otherwise, you could try ISO-8859-1 (ISO Latin 1) or UTF-8.
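
As a rough sketch of the explicit-charset option, assuming the file turns out to be UTF-8: the reader is built the same way as in the question, only with the charset swapped, and the output stream is given the same charset. The class name, the readAll helper and the "book.txt" default path are placeholders, not anything from the original program.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;

class Utf8BookReader {

    // Reads the whole file as UTF-8 text, keeping one '\n' per line read.
    static String readAll(String fileToRead) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader inputReader = new BufferedReader(
                new InputStreamReader(new FileInputStream(fileToRead), StandardCharsets.UTF_8))) {
            String line;
            while ((line = inputReader.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // "book.txt" is a hypothetical default; pass the real path as an argument.
        String fileToRead = args.length > 0 ? args[0] : "book.txt";
        // Encode the output as UTF-8 too, so the em-dash survives printing.
        PrintStream output = new PrintStream(System.out, true, "UTF-8");
        output.print(readAll(fileToRead));
    }
}
```

Whether the em-dash then shows up correctly also depends on the console itself being set to display UTF-8, which is outside Java's control.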

Edited by JamesCherrill


You, sir, are a genius. I did misread which encoding it was using; UTF-8 is the correct one.

Running with my machine default char-set works fine.


No, no genius, just been doing it a long time...
(I started using Java when I still lived in Woodham myself - in the last century)
