while ((line = br.readLine()) != null) {
                md.update(line.getBytes());
                byte[] bytes = md.digest();
                }

Is it like this, James? DigestInputStream does not accept a BufferedReader as input.

That's exactly the problem I warned you about. You risk character set conversions, so line.getBytes() may or may not return the same bytes as were in the file (and what happened to the end-of-line delimiter?).
DigestInputStream does not accept BufferedReader as input because that's a character-oriented stream that does character set conversions. Message digest needs to read all the original unconverted bytes in the file and doesn't care what they represent.

Have a look at this

Hi James,

Does that mean I should use UTF-8 first? And does "the actual bytes from the file" mean the whole strings in the file are taken as single bytes? Thanks.

No. Forget Strings. Just read the bytes.
See you tomorrow.

Hi James,

But I still need the values to be converted into a string, and in binary (my initial idea).

I don't know which values you are referring to.

Is SHA unable to take the strings one by one? The result comes out as one big chunk of binary string. Any advice, guys? Thanks.

Scanner scanner = new Scanner(file);
            //Read File Line By Line
            //while ((line = br.readLine()) != null) {
            while(scanner.hasNextLine()) {
                line = scanner.nextLine();

        md.update(line.getBytes());

        byte byteData[] = md.digest();

        //convert each byte to 8 binary digits
        for (int m = 0; m < byteData.length; m++) {
         sb.append(Integer.toString((byteData[m] & 0xff) + 0x100, 2).substring(1));
        }

I made a simple text file which has the values:

1
2

I read the first line, hash it using SHA and convert it to binary. My plan is then to go back, read the second line, and hash and convert it again. Is it because of the append that System.out.println shows only one line of binary values?

I don't think anyone else understands your question either!
SHA processes the bytes from your file and returns a message digest in the form of 160 bits, which the Java code returns to you as a 20 element byte array. That's what it does.
Using a character-oriented input stream and processing the file as lines of String is WRONG WRONG WRONG - see my previous posts.
If you want to convert the digest to a hex or a binary String that's OK.
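As a minimal sketch of that conversion, the helpers below turn a digest byte array into a hex or a binary String. The class name `DigestFormat` and its method names are made up for illustration; the binary variant uses the same `+ 0x100 ... substring(1)` trick as the code posted earlier, so each byte keeps its leading zeros.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class, for illustration only.
final class DigestFormat {
    // Render each byte as two lowercase hex digits.
    static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Render each byte as eight binary digits, keeping leading zeros.
    static String toBinary(byte[] digest) {
        StringBuilder sb = new StringBuilder(digest.length * 8);
        for (byte b : digest) {
            sb.append(Integer.toString((b & 0xff) + 0x100, 2).substring(1));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-1");
        byte[] digest = md.digest("1".getBytes(StandardCharsets.US_ASCII));
        System.out.println(toHex(digest));    // 40 hex digits for a 20-byte digest
        System.out.println(toBinary(digest)); // 160 binary digits
    }
}
```

Either form is only a display format; the digest itself is still the 20-byte array.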

Hi James,

Can I just process the bytes from one string using SHA, and then get the second string and process it as well?

Sorry, I'm just kind of confused about the conversion. It seems it's only processing one string value while my input file has 2 string values. Thanks for your advice!

Sorry, but this is going to be my last try to explain this.

Your file contains text stored in 1 byte per letter using a character set that we don't know.
If you do anything to read that as Java chars or Strings then Java performs a conversion to 16 bits per character Unicode. The conversion uses the character set and/or locale to re-code the text.
When you use getBytes to convert the String back to 8 bit values Java once again does a conversion from 16 bits back to 8 bits based on the locale and a character set definition. Unless you take complete control of the character sets and the values in the input file there is no guarantee that getBytes will give you the same bytes as were in the file.
To make things worse, when you use readLine Java discards the end-of-line characters, so these will never be used in computing the file's SHA hash.
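To make the two problems above concrete, here is a small sketch. It assumes, purely for illustration, that the file was written in ISO-8859-1 and that the String is re-encoded as UTF-8; the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, for illustration only.
final class RoundTripDemo {
    // Decode bytes as ISO-8859-1, then encode the String as UTF-8.
    // The two charsets are assumptions; the point is that they may differ.
    static byte[] decodeThenEncode(byte[] fileBytes) {
        String text = new String(fileBytes, StandardCharsets.ISO_8859_1);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Count the characters that readLine actually hands back.
    static int charsReturnedByReadLine(String content) {
        int total = 0;
        try (BufferedReader br = new BufferedReader(new StringReader(content))) {
            String line;
            while ((line = br.readLine()) != null) {
                total += line.length();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return total;
    }

    public static void main(String[] args) {
        // "cafe" with an accented e, as stored in ISO-8859-1 (4 bytes).
        byte[] original = {0x63, 0x61, 0x66, (byte) 0xE9};
        byte[] after = decodeThenEncode(original);
        System.out.println(Arrays.toString(original)); // 4 bytes
        System.out.println(Arrays.toString(after));    // 5 bytes: the accent became 2 bytes
        // "1\n2\n" holds 4 characters, but readLine returns only 2 of them.
        System.out.println(charsReturnedByReadLine("1\n2\n"));
    }
}
```

Plain ASCII text survives this particular round trip unchanged, which is why the problem is easy to miss in small tests.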

Any code you use that converts the file to a Java String or Strings cannot guarantee that the result will be the correct SHA1 hash of the original file.

If you don't care whether your hash is correct or not then you can read lines from the file and pass them to the digest's update method. That will give you a 20 byte message digest, but it won't be the one defined by the SHA1 standard.

If you want a valid standard SHA1 hash then you have to read the file as bytes, not as characters or Strings, so they will be processed without conversion to/from Unicode and without dropping the end-of-line terminators.
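A minimal sketch of reading the file as bytes might look like the following. The class name `FileSha1` and the use of a command-line argument for the path are assumptions for illustration; a `DigestInputStream` wrapped around the same `InputStream` would be another way to do it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper class, for illustration only.
final class FileSha1 {
    // Feeds the raw bytes of the stream to SHA-1; no chars or Strings are involved,
    // so nothing is re-encoded and end-of-line bytes are included.
    static String sha1Hex(InputStream in) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                md.update(buffer, 0, n);
            }
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest()) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0] is a placeholder for the path of the file to hash.
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(sha1Hex(in));
        }
    }
}
```

Because the end-of-line bytes are hashed too, a file containing `1` followed by a newline gives a different hash from a file containing only `1`.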

I really hope that's clear, because I don't know how to make it any clearer. Maybe someone else can help?

Hi James,

Sorry for troubling you to explain the SHA part.

So in that case, if I read the file as bytes, it means all the values inside the text file will become one block of byte data, right? Then the final result from SHA is going to be one single hash value, am I right? Then we won't be able to differentiate the values from original text file inside the SHA, because it has become one single hash value?

the values inside the text file will become

The file on disk is made of a sequence of bytes. Reading those bytes into a byte array in the program does not change the contents or the order of the bytes.

differentiate the values from original text file inside the SHA

What are you trying to do? Get a SHA1 hash of parts of the file, not the whole file?
How do you separate the parts of the file for each SHA1 hash value?

Hi Norm,

I'm just trying to get a SHA1 from each value inside the text file (each one is a string), but the text file contains several different values.

for example text file contains values:

1
2
10
20

I want to get SHA values from each of the string values in the text file. Is this possible? Or does SHA just take the bytes from the whole text file, as explained by James? Thanks.

<Light goes on> OK, now I suddenly understand the confusion. You're NOT trying to get an SHA1 for the file.
<restart>
If you are happy that the text doesn't have any foreign characters, accents etc, then you can get a String (from the file or anywhere else), get its bytes, update the digest, and calculate the hash.
Then when you want to do another String you call the digest's reset() to clear the existing contents, update with the new String, calculate the new hash etc.
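The per-String approach described above could be sketched as follows. The class and method names are hypothetical, and UTF-8 is chosen here as an explicit charset (an assumption; for plain digits it gives the same bytes as ASCII). Note that `MessageDigest.digest()` itself resets the digest once the hash has been computed, so the explicit `reset()` call in the sketch is harmless but not strictly needed.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, for illustration only.
final class PerStringSha1 {
    // Returns one SHA-1 hex string per input value, reusing a single digest.
    static List<String> sha1HexOfEach(List<String> values) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
        List<String> hashes = new ArrayList<>();
        for (String value : values) {
            md.update(value.getBytes(StandardCharsets.UTF_8)); // explicit charset
            byte[] digest = md.digest();                        // also resets md
            md.reset();                                         // redundant, kept for clarity
            StringBuilder sb = new StringBuilder();             // fresh builder per value
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            hashes.add(sb.toString());
        }
        return hashes;
    }

    public static void main(String[] args) {
        for (String hash : sha1HexOfEach(Arrays.asList("1", "2", "10", "20"))) {
            System.out.println(hash);
        }
    }
}
```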

Yes, you can get a SHA hash for any byte array. The problem is getting the correct byte array.
See James's explanations for the possible problems when reading the bytes from the file and converting them to a String and then back to bytes.

Hi james,

Yep, that was my intention. I apologise for frustrating you over the SHA thing. I will write some code and post it here. Thanks once again.

Hmmm, will characters like comma and dot have a big impact on the SHA?

Here's a table showing all the characters that are "safe" - i.e. the same as a byte or as Unicode. Stick to those and you'll be OK.

Scanner scanner = new Scanner(file);
            //read line by line
            while(scanner.hasNextLine()) {
                line = scanner.nextLine();
        //get each string in text file and hash it using SHA
        md.update(line.getBytes());
        byte byteData[] = md.digest();

        //convert the byte to binary format method 
        for (int m = 0; m < byteData.length; m++) {
         //into a single line of binary (0 and 1)
         sb.append(Integer.toString((byteData[m] & 0xff) + 0x100, 2).substring(1));          
        }
        //reset and get another line (while loop)
        md.reset();

Strange, I still get one single value from System.out.println(sb.toString());

I should be getting two binary values, though, because I used a test text file which contains:

text file (2 values only for test)

1
2

I guess you forgot to post a closing } after line 15?
You append all the hashes to the same string buffer, so it's only 1 value that keeps getting longer, but you should find it contains 160 x (number of lines) bits

Hi James,

Nope. I'm still figuring out why it only gives one value. It does not seem to hash the second line of the text file, which contains the value "2".

Yep, it should be a 160-bit value.

I just feel that this code should give me two SHA values, but I get just a single SHA value. Did I put the reset in the wrong line of code?

Scanner scanner = new Scanner(file);
//Read File Line By Line
//while ((line = br.readLine()) != null) {
while (scanner.hasNextLine()) {
    line = scanner.nextLine();
    md.update(line.getBytes());
    byte byteData[] = md.digest();
    //convert each byte to 8 binary digits
    for (int m = 0; m < byteData.length; m++) {
        sb.append(Integer.toString((byteData[m] & 0xff) + 0x100, 2).substring(1));
    }
    md.reset();
    s = sb.toString();
    System.out.println(s + "\n");
}

The reset looks OK to me.
Try debugging! Add some print statements inside the loop(s) to confirm that you are reading both lines and processing all the values that you should be processing.

Hi James,

Yes, this is the strange thing. I did put a print statement after line 5 of the code to make sure that the program reads line 1 ("1") and line 2 ("2") of the text file, but they don't show up separately after being digested. And I am sure the code seems fine. Hmm....

StringBuilder sb = new StringBuilder();
String s;
int m;
//try {
Scanner scanner = new Scanner(file);
//Read File Line By Line
//while ((line = br.readLine()) != null) {
while (scanner.hasNextLine()) {
    line = scanner.nextLine();
    System.out.println(line);
    md.update(line.getBytes());
    byte byteData[] = md.digest();
    //convert the bytes to hex format
    for (m = 0; m < byteData.length; m++) {
        sb.append(Integer.toString((byteData[m] & 0xff) + 0x100, 16).substring(1));
    }
    md.reset();
    System.out.println(sb.toString());
}

And my print result:

1
356a192b7913b04c54574d18c28d46e6395428ab
2
356a192b7913b04c54574d18c28d46e6395428abda4b9237bacccdf19c0760cab7aec4a8359010b0

The value of 1 seems to be added onto the value of 2. Both start with the same hex digits. I'm testing with hex conversion because it's easier to see. Does that mean it was never reset? Thanks.

See my earlier post - I already explained that.

Ah, I know why: it's because of the append. It also appends the SHA hex value of 2 to the same buffer. Is there any way to make sure that when it reads the second value, it does not append to the first value? Thanks.

You could start a new StringBuffer each time thru the loop, or clear the existing one ...
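Both options can be sketched in a few lines. The example below clears one builder with `setLength(0)` at the top of each pass; creating a `new StringBuilder()` inside the loop would work equally well. The class and method names are hypothetical, and the digests are passed in as plain byte arrays to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, for illustration only.
final class BuilderPerLine {
    // Converts each digest to its own hex string instead of one ever-growing string.
    static List<String> hexPerDigest(List<byte[]> digests) {
        List<String> out = new ArrayList<>();
        StringBuilder sb = new StringBuilder();
        for (byte[] digest : digests) {
            sb.setLength(0); // clear what the previous iteration appended
            for (byte b : digest) {
                sb.append(Integer.toString((b & 0xff) + 0x100, 16).substring(1));
            }
            out.add(sb.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<byte[]> digests = Arrays.asList(new byte[] {0x35, 0x6a}, new byte[] {(byte) 0xda, 0x4b});
        for (String hex : hexPerDigest(digests)) {
            System.out.println(hex); // one line per digest, no carry-over
        }
    }
}
```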

Hi James,

Yes, haha, that's what I did. It works now. I just used a small text file to test. I'm going to see how it performs with big files. Thanks a lot for the advice, James and Norm.

Hi guys,

I have this code:

for (int a = 0; a < table1.size(); a++) {

            //search every index in the list
            if (array_table1.get(a).contentEquals(s)) {
                System.out.println("found");
            } else {
                System.out.println("not found");
            }
        }
        for (int b = 0; b < table2.size(); b++) {
            if (array_table2.get(b).contentEquals(s)) {
                System.out.println("found");
            } else {
                System.out.println("not found");
            }
        }

When the value is in table 1, the "found" message is shown, but then I don't want to continue and search the second table.

Also, if the value is in table 2, how do I skip the first table and just find the value in the second table? Thanks.
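One possible way to approach this, sketched under the assumption that both tables are `List<String>` (the thread does not show their types): check the first table, and only look in the second if the first has no match. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, for illustration only.
final class TableSearch {
    // Returns 1 or 2 for the first table that holds the value, or 0 if neither does.
    static int findTable(List<String> table1, List<String> table2, String s) {
        if (table1.contains(s)) {
            return 1; // found in table 1, so table 2 is never searched
        }
        if (table2.contains(s)) {
            return 2;
        }
        return 0;
    }

    public static void main(String[] args) {
        List<String> table1 = Arrays.asList("a", "b");
        List<String> table2 = Arrays.asList("c");
        int where = findTable(table1, table2, "c");
        System.out.println(where == 0 ? "not found" : "found in table " + where);
    }
}
```

The first table still has to be checked before the second; what the early return avoids is searching table 2 after a match, and printing "not found" once per element.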
