I am working on a shell script that takes a single command line parameter, a file path (might be relative or absolute). The script should examine that file and print a single line consisting of the phrase:

Windows ASCII

if the file is an ASCII text file with CR/LF line terminators, or

Something else

if the file is binary or ASCII with “Unix” LF line terminators.

So far I have tried the following three scripts.

#!/bin/sh
file=$1
if grep -q "\r\n" $file;then
echo Windows ASCII
else
echo Something else
fi


#!/bin/sh
if test -f "$file"
then
echo  Windows ASCII
else
echo Something else
fi



#!/bin/sh
file=$1
case $(file $file) in
*"ASCII test, with CRLF lin terminators")
echo "Windows ASCII"
;;
*)
echo "Something else"
;;
esac

All three scripts run and print output, but when I pass something that is not Windows ASCII, such as /bin/cat or SomeFile.sh, they still identify it as Windows ASCII. When I pass a .doc file, it displays "Something else" as expected; it is only on folders that it displays "Windows ASCII". I think I am not handling it properly, but I am unsure. Any pointers on how to fix this issue?

I took a different approach to solving this problem. Instead of grepping for newline characters from the start, I used the file command to tell me the MIME type of the file. If the file is not plain text, the script says so and exits. If the file is a text file, it is then grepped for CRLF ("\r\n") line endings to determine if it uses Windows line endings.

You can take this and modify it to suit your needs of course.

#!/bin/bash
# Determines whether or not a file is a text file
# with Windows line endings.

# Helper function to print usage.
print_usage () {
    # Print the usage message and exit with an error code.
    echo -e "\nUsage: ./istextfile.sh <file>\n"
    exit 1
}

# Make sure the user entered valid arguments.
if [[ -z "$1" ]]; then
    print_usage
elif [[ ! -f "$1" ]]; then
    echo -e "\nFile does not exist: $1"
    print_usage
fi

# Get the mime type for this file. (--brief omits the file name from the
# output, so paths containing spaces are handled correctly.)
filetype="$(file --brief --mime-type "$1")"
if [[ "$filetype" == "text/plain" ]]; then
    echo "This is a text file: $1"
else
    # This is not a text file, we have no use for it.
    echo "Not a text file: $1 ($filetype)"
    exit 1
fi

# Determine line endings by looking for a carriage return at the end of a line.
# ($'\r' is a literal CR in bash; grep does not interpret "\r\n" as CRLF.)
if grep -q $'\r$' "$1"; then
    echo "Windows: Yes"
else
    echo "Windows: No"
fi
exit 0
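
To get exactly the two phrases from the question, the same idea can be reduced to something like the sketch below. It is only an illustration: classify_file is a made-up helper name, it is written for plain sh like the scripts in the question, and it assumes that any text/* MIME type counts as text, so it does not distinguish strict ASCII from other text encodings. Directories and missing paths fall through to "Something else".

```shell
#!/bin/sh
# Hypothetical helper (not part of the script above): prints one of the two
# phrases from the question for the path given as its first argument.
classify_file () {
    # Directories and missing paths are never "Windows ASCII".
    if [ ! -f "$1" ]; then
        echo "Something else"
        return 0
    fi

    # --brief omits the "name:" prefix, so paths containing spaces are safe.
    filetype="$(file --brief --mime-type "$1")"

    # A literal carriage return; printf is used because plain sh has no $'\r'.
    cr="$(printf '\r')"

    case "$filetype" in
        text/*)
            # Match a carriage return at the end of any line.
            if grep -q "${cr}\$" "$1"; then
                echo "Windows ASCII"
                return 0
            fi
            ;;
    esac
    echo "Something else"
}

# Classify the path given on the command line, if there is one.
if [ "$#" -gt 0 ]; then
    classify_file "$1"
fi
```

Checking for a regular file first is what keeps folders from being reported as Windows ASCII.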

The file command alone can give you some insight into what type of file you are dealing with.

# Get file type description.
filedesc="$(file "$1")"

# Use regex to test for certain strings.
if [[ "$filedesc" =~ "ASCII text" ]]; then
    echo "This file is an ASCII text file."
elif [[ "$filedesc" =~ "UTF-8 Unicode text" ]]; then
    echo "This file is a UTF-8 text file."
else
    echo "This is not ASCII or UTF-8."
fi

I've read that file will sometimes say "ASCII text, with CRLF line terminators", which is something you can also test for using the same method as above.
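
A minimal sketch of that check, assuming file reports such files with the wording quoted above. The helper name is_windows_ascii is hypothetical, and a case pattern is used instead of [[ =~ ]] so that it also runs under plain sh. The wildcard between the two phrases allows for extra notes that file may insert, and a file with mixed line endings is deliberately not matched.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) when file(1) describes the
# path as ASCII text with CRLF line terminators, and fails otherwise.
is_windows_ascii () {
    # --brief omits the file name, so the name itself cannot affect the match.
    desc="$(file --brief "$1")"
    case "$desc" in
        *"ASCII text"*"with CRLF line terminators"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print one of the two phrases for the path given on the command line, if any.
if [ "$#" -gt 0 ]; then
    if is_windows_ascii "$1"; then
        echo "Windows ASCII"
    else
        echo "Something else"
    fi
fi
```

Because file describes a folder as a directory instead of as text, folders and binaries such as /bin/cat end up in the "Something else" branch.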
