Hello to all in forum,

Maybe some Java expert could help me.

How can I read a binary file one chunk at a time? The chunks are of variable length and are separated by the
byte D2 followed by A7. That is, wherever D2A7 is found, it marks the end of one chunk and the beginning of the next.

The code I have so far is below and in general does this:

1- Store 1024 bytes in "inputBytes" on each iteration of the for loop
2- Convert the content of "inputBytes" to a hexadecimal string
3- Replace all the "D2A7" with carriage return \r (in order to be able to use the scanner.nextLine feature)

The issue is that the last chunk in each block of 1024 bytes is incomplete, so I need a way to read 1024 bytes
each time starting from a given offset, but the form "read(inputBytes,Pos,1024)" is not working.

Maybe somebody has an alternative way to do it, or knows how to fix my code. The code is below:

package ReadbyChunks;

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.util.Scanner;
import javax.xml.bind.DatatypeConverter;

public class ReadbyChunks {

    /**
    * @param args the command line arguments
    */
    public static void main(String[] args) {
        // TODO code application logic here
        File inputFile = new File("./binary");
        int lastPos = 0; //To store position of last delimiter

        try (InputStream input = new FileInputStream(inputFile)) {    
            for(int i=1; i<3; i++){ //Loop to read more than one chunk
                byte inputBytes[] = new byte[1024];
                int readBytes = input.read(inputBytes); // Storing 1024 bytes in "inputBytes"
                //Converting to a string the Hexadecimal content of the variable "inputBytes"
                String hexstr=DatatypeConverter.printHexBinary(inputBytes);  

                lastPos=hexstr.lastIndexOf("D2A7")-2; //Storing position of last delimiter
                //Replacing all delimiters in "inputBytes" with \r in order to process each chunk
                String str = hexstr.replaceAll("D2A7", "\r"); 

                //Now process each "line" (chunk) since they are separated with \r
                try (Scanner scanner = new Scanner(str)) {
                    while (scanner.hasNextLine()) {
                        String line = scanner.nextLine();
                        // process the line
                        System.out.println(line);
                    }  
                }       
            }
        } 
        catch (FileNotFoundException ex) {System.err.println("Couldn't read file: " + ex);} 
        catch (IOException ex) {System.err.println("Error while reading file: " + ex);}                        
    }    
}

Thanks in advance

Edited 2 Years Ago by Garidius

Hello James,

Actually all the content of the file is binary, but if the scanner is able to use a delimiter such as the sequence 0xD2 0xA7 and return each token as the documentation says, I'd be able to convert that token into a hexadecimal string.

I'm trying the code below, but I cannot test it since a message appears that says "No suitable constructor found for scanner".

What would be the correct way to set up the scanner to analyse the binary file?

Thanks for any help

package binaryscanner;

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class BinaryScanner {
    public static void main(String[] args) throws FileNotFoundException {
     Scanner input = new Scanner(new File("./binary"));
        try (Scanner s = new Scanner(input).useDelimiter("0xDA0xA7")) {
            System.out.println(s.next());
        }         
    }    
}

Regards

There's nothing wrong with new Scanner(File), but line 10 has a new Scanner(Scanner) and that's completely wrong (and unnecessary).
Also, the delimiter is wrong. Your delimiter is not the String "0xDA0xA7"; it should be a String of length 2. You need something like

byte[] delimBytes = {(byte) 0xd2, (byte) 0xa7}; // the delimiter bytes are D2 A7
String delimString = new String(delimBytes);
input.useDelimiter(delimString);

then you can loop calling input.next() to get all the segments between the delimiters.

Having said all that, Scanner was never designed for parsing binary files - it's designed for text. You will read bytes from the file and convert them to Unicode (16 bit) chars. Depending on your default Charset you may find some binary values or sequences being interpreted as chars that you didn't expect. You must take time to test all possible byte values.
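
To illustrate the character set point above, here is a small sketch (the class and method names are made up for the example). It decodes a few bytes to a String and encodes them back again. With ISO-8859-1 every byte maps to exactly one char, so the round trip should be lossless; with UTF-8, bytes that do not form a valid sequence are replaced during decoding and the original values are lost. Note also that the two delimiter bytes D2 A7 happen to form one valid UTF-8 character, so the delimiter String would have length 1 rather than 2 under that charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetRoundTrip {

    // Hypothetical helper: decode bytes to a String, then encode the String back to bytes.
    static byte[] roundTrip(byte[] raw, Charset cs) {
        return new String(raw, cs).getBytes(cs);
    }

    public static void main(String[] args) {
        // The delimiter D2 A7, two bytes that are not valid UTF-8 on their own, and an 'A'.
        byte[] raw = {(byte) 0xD2, (byte) 0xA7, (byte) 0xFF, (byte) 0x80, 0x41};

        boolean latin1Lossless = Arrays.equals(roundTrip(raw, StandardCharsets.ISO_8859_1), raw);
        boolean utf8Lossless = Arrays.equals(roundTrip(raw, StandardCharsets.UTF_8), raw);

        System.out.println("ISO-8859-1 round trip lossless: " + latin1Lossless);
        System.out.println("UTF-8 round trip lossless: " + utf8Lossless);
    }
}
```

If Scanner is used on binary data at all, creating both the Scanner and the delimiter String with the same explicit single-byte charset is probably the safer choice, but that is an assumption to check against your own file.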

To be completely safe you will need to read the file as bytes and process them yourself, buffering them in an output array until you hit the delimiter sequence. The code isn't difficult, but it is fairly tedious to work out and debug the first time because you have to deal with full and part-full input buffers from the file.
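
A minimal sketch of that byte-level approach, under a few assumptions: the names ChunkSplitter and split are invented for the example, empty chunks (two delimiters in a row) are skipped, and the whole result is kept in memory. Instead of managing full and part-full input buffers by hand, it reads one byte at a time and expects the caller to pass a BufferedInputStream, which does the block reads internally. A D2 that is not followed by A7 is treated as ordinary data.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class ChunkSplitter {
    static final int FIRST = 0xD2;   // first delimiter byte
    static final int SECOND = 0xA7;  // second delimiter byte

    /** Returns the runs of bytes found between D2 A7 delimiters (assumption: empty runs are skipped). */
    static List<byte[]> split(InputStream in) throws IOException {
        List<byte[]> chunks = new ArrayList<>();
        ByteArrayOutputStream current = new ByteArrayOutputStream();
        boolean heldFirst = false; // true when the previous byte was D2 and has not been emitted yet
        int b;
        while ((b = in.read()) != -1) {
            if (heldFirst) {
                heldFirst = false;
                if (b == SECOND) {
                    // Full delimiter seen: close the current chunk.
                    if (current.size() > 0) {
                        chunks.add(current.toByteArray());
                    }
                    current.reset();
                    continue;
                }
                // The held D2 was plain data after all.
                current.write(FIRST);
            }
            if (b == FIRST) {
                heldFirst = true; // wait for the next byte before deciding
            } else {
                current.write(b);
            }
        }
        if (heldFirst) {
            current.write(FIRST); // file ended right after a lone D2
        }
        if (current.size() > 0) {
            chunks.add(current.toByteArray()); // trailing chunk without a closing delimiter
        }
        return chunks;
    }
}
```

It could be called with something like ChunkSplitter.split(new BufferedInputStream(new FileInputStream("./binary"))). For very large files it would make more sense to process each chunk as soon as it is complete rather than collect them all in a list.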

Edited 2 Years Ago by JamesCherrill

Hello James, thanks for your good answer. I've added the code you shared and it works if the input is a text file, but fails with the binary one, confirming that Scanner doesn't work with binaries.

Besides the other way you suggest (read byte by byte), I've been testing with RandomAccessFile and InputStream (as shown in the first post), reading in blocks of 1024 bytes. The idea is almost working, but I'm stuck at the following point.

1- In the first iteration I read 1024 bytes and find 4 sequences of D2A7, resulting in 3 complete chunks and 1 incomplete chunk (224 bytes).
2- So, from those 1024 bytes I really parse only 800 bytes.
3- Now I'd like to read another 1024 bytes, not beginning at byte 1025 but beginning at byte 800. This is in order to analyse completely chunk #4, which was incomplete in the first iteration.
The problem is that I don't know which method to use to tell Java to read groups of 1024 bytes, beginning at a different position each time (variable offset).

For example:
- In 1st iteration read 1024 bytes to buffer with offset=0 bytes
- In 2nd iteration read 1024 bytes to buffer with offset=800 bytes
- In 3rd iteration read 1024 bytes to buffer with offset=1540 bytes

The offset would be measured from the beginning of the file.

Is there a method or function to do this?

I've tried this in the code in my original post using the variable "lastPos", but
it is not working when I use this variable inside read().

Thanks again for great help.

Regards
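
One possible way to read a block from a variable file offset is RandomAccessFile.seek() followed by read(). The sketch below is only an illustration and the names OffsetReader and readAt are made up. It also shows why read(inputBytes, pos, 1024) does not do what was hoped: the second argument of read(byte[], int, int) is the index in the array where the bytes are stored, not a position in the file.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Arrays;

public class OffsetReader {

    /**
     * Hypothetical helper: reads up to len bytes starting at fileOffset,
     * where fileOffset is counted from the beginning of the file.
     * Returns fewer bytes (possibly none) if the end of the file is reached.
     */
    static byte[] readAt(String path, long fileOffset, int len) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            raf.seek(fileOffset); // move the file pointer to the wanted position
            byte[] buf = new byte[len];
            int total = 0;
            while (total < len) {
                // Second argument is the index into buf, not a file position.
                int n = raf.read(buf, total, len - total);
                if (n == -1) {
                    break; // end of file
                }
                total += n;
            }
            return Arrays.copyOf(buf, total);
        }
    }
}
```

With the numbers from the example above, the three iterations would be readAt("./binary", 0, 1024), readAt("./binary", 800, 1024) and readAt("./binary", 1540, 1024). Reopening the file on every call keeps the sketch short; in a real loop it is probably better to keep one RandomAccessFile open and just call seek() before each read.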
