Split file into multiple byte[]

Question

Krokcy 14 Newbie Poster

12 Years Ago

So I have a working network file transfer program now (if you want to see some of the code, look at my other 'article').

But now that I'm trying larger files, I'm running into a problem where I run out of memory (which is understandable when the array is 1,000,000+ in length). I can't figure out how to split a file up into smaller arrays. At the moment I just load the file into one big array:

            byte[] fileArray;

            JFileChooser chooser = new JFileChooser();
            chooser.showDialog(this, "Send");

            File file = chooser.getSelectedFile();

            Path path = Paths.get(file.getAbsolutePath());
            try {
                fileArray = Files.readAllBytes(path);
            } catch (IOException e1) {
                e1.printStackTrace();
                fileArray = null;
            }

I haven't been able to find another way to load the file than Files.readAllBytes(Path), which doesn't allow me to split it up.
Help is, as always, appreciated!

EDIT: I've found a few solutions suggesting third-party libraries, but I'd really prefer it if I could stay with the Java framework.

file-system java

Edited 12 Years Ago by Krokcy because: new information

4 Contributors
15 Replies
4K Views
4 Days Discussion Span
Latest Post 12 Years Ago Latest Post by Krokcy

All 15 Replies

JamesCherrill 4,733 Most Valuable Poster

12 Years Ago

The normal way to copy a file from an input stream to an output stream looks like this:

                byte[] data = new byte[4096];
                int count;
                // open input and output streams
                while ((count = in.read(data)) != -1) {
                    out.write(data, 0, count);
                }
                // close input and output streams

Under all normal circumstances that code will not be a bottleneck - any slow-down will be from whatever I/O is being done. Just ensure you use buffered input/output streams.
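As a minimal sketch of how that loop might look once the streams are actually opened, buffered and closed, here is a file-to-file version. The class and method names (BufferedFileCopy, copyFile) are made up for illustration, and it assumes Java 7 or later for try-with-resources:

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, not part of the original post.
final class BufferedFileCopy {
    /** Copies source to target in 4096-byte chunks and returns the number of bytes copied. */
    static long copyFile(Path source, Path target) throws IOException {
        byte[] data = new byte[4096];
        long total = 0;
        // try-with-resources closes both streams; closing the buffered
        // output stream also flushes whatever is still in its buffer.
        try (InputStream in = new BufferedInputStream(Files.newInputStream(source));
             OutputStream out = new BufferedOutputStream(Files.newOutputStream(target))) {
            int count;
            while ((count = in.read(data)) != -1) {
                out.write(data, 0, count);
                total += count;
            }
        }
        return total;
    }
}
```

Only one 4096-byte array is ever allocated, no matter how large the file is.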

Edited 12 Years Ago by JamesCherrill

~s.o.s~ 2,560 Failure as a human

12 Years Ago

Krokcy wrote: "I haven't been able to find another way to load the file than Files.readAllBytes(Path), which doesn't allow me to split it up."

Reading the entire file into memory is almost always a bad solution unless you know you are always dealing with really small files; streaming is the way to go. The de facto solution is the one mentioned by James: have a small buffer which reads data from the source and writes it to the destination. A helper method which you might find helpful is:

public static long copy(InputStream from, OutputStream to, int bufsz) throws IOException {
    final byte[] buf = new byte[bufsz];
    long total = 0;
    int len = -1;
    while ((len = from.read(buf)) != -1) {
        to.write(buf, 0, len);
        total += len;
    }
    return total;
}

There are of course other efficient ways of doing it if you are really hard-pressed for performance (using Java NIO), but they are a bit complicated.

Krokcy wrote: "I don't know how slow it actually is. It took about 12 minutes or so to send a 75 MB file from my hard disk to a flash drive..."

I/O operations on flash drives are dog slow when compared to regular disk I/O, hence the slowdown. But FWIW, copying a 70 MiB file from disk to a flash drive took me around 2:30 minutes. Your code is a bit inefficient in the sense that it sends the file to the client a single byte at a time, which implies a syscall for every single byte written to the output stream.
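For the curious, one NIO route is FileChannel.transferTo, which asks the channel to move the bytes itself instead of going through a byte[] in your own code. This is only a sketch under assumptions: the class and method names (NioCopy, send) are invented here, and the target is assumed to be a blocking channel.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class, not part of the original post.
final class NioCopy {
    /** Sends the whole file to the target channel and returns the number of bytes transferred. */
    static long send(Path file, WritableByteChannel target) throws IOException {
        try (FileChannel source = FileChannel.open(file, StandardOpenOption.READ)) {
            long position = 0;
            long size = source.size();
            // transferTo may move fewer bytes than requested, so loop until done.
            // Assumes a blocking target; a non-blocking one could return 0 repeatedly.
            while (position < size) {
                position += source.transferTo(position, size - position, target);
            }
            return position;
        }
    }
}
```

For a network transfer the target would typically be a SocketChannel; for a quick local experiment, Channels.newChannel(someOutputStream) also gives a WritableByteChannel.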

Krokcy wrote: "At the moment I'm not using buffered streams. I have never worked with them before. Should I use buffered streams for the network streams as well, or just for the file I/O?"

Buffered streams are wrappers over regular streams which add "buffering" capabilities to them. In your case, if you can't use them for whatever reason, you can do without them. They really shine when you need to read small, varying lengths of data from the underlying source without hurting performance too much (more read calls -> more system calls -> more context switching -> reduced performance). This is done by bulk-reading a fixed amount of data from the underlying source in advance and supplying it when asked for, without actually calling the real "read" method (which in turn entails a system call).
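To make the "fewer real reads" point concrete, here is a small sketch that counts how often the underlying stream is actually asked for data. CountingInputStream and drainByteByByte are hypothetical helpers written for this illustration, and a ByteArrayInputStream stands in for a real file or socket, so the count is a proxy for system calls rather than a measurement of them:

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

final class BufferingDemo {
    /** Hypothetical helper: counts how often the wrapped stream is asked for data. */
    static final class CountingInputStream extends FilterInputStream {
        int calls = 0;
        CountingInputStream(InputStream in) { super(in); }
        @Override public int read() throws IOException {
            calls++;
            return super.read();
        }
        @Override public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    /** Reads the stream one byte at a time and returns the number of bytes seen. */
    static int drainByteByByte(InputStream in) throws IOException {
        int n = 0;
        while (in.read() != -1) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100_000];

        // Every single-byte read goes straight to the underlying stream.
        CountingInputStream raw = new CountingInputStream(new ByteArrayInputStream(data));
        drainByteByByte(raw);

        // The buffered wrapper refills in 8192-byte chunks and serves reads from memory.
        CountingInputStream counted = new CountingInputStream(new ByteArrayInputStream(data));
        drainByteByByte(new BufferedInputStream(counted, 8192));

        System.out.println("underlying reads without buffering: " + raw.calls);
        System.out.println("underlying reads with buffering:    " + counted.calls);
    }
}
```

The caller's code is identical in both cases; only the wrapper changes how often the real source is touched.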

~s.o.s~ 2,560 Failure as a human

12 Years Ago

The code you posted is extremely confusing and doesn't make a whole lot of sense. What is numArrays? Why are you looping that many times? Why is each loop converting the file to an array? Why are you writing the length to the stream on every iteration? Is that outDataStream variable of type DataOutputStream? If yes, why not a simple OutputStream, which you get from the established socket connection?

Also, more importantly, have you incorporated the suggestions provided in the thread which simply transfer across the data from one stream to another without assuming the file size and by using a single byte array?
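A side note on the length-prefixed approach itself: a likely (though not confirmed in this thread) reason for the client seeing lengths the server never wrote is that InputStream.read(byte[], int, int) is allowed to return after reading fewer bytes than requested. The next readInt() then lands in the middle of the file data. If you do keep the length prefix, DataInputStream.readFully blocks until the whole chunk has arrived. A sketch, with hypothetical names (Framing, writeChunk, readChunk):

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical helper class, not taken from the posted code.
final class Framing {
    /** Writes one chunk preceded by its length. */
    static void writeChunk(DataOutputStream out, byte[] chunk, int length) throws IOException {
        out.writeInt(length);
        out.write(chunk, 0, length);
    }

    /** Reads one length-prefixed chunk, blocking until all of it has arrived. */
    static byte[] readChunk(DataInputStream in) throws IOException {
        int length = in.readInt();
        byte[] chunk = new byte[length];
        // Unlike read(), readFully never returns a partially filled array;
        // it throws EOFException if the stream ends early.
        in.readFully(chunk, 0, length);
        return chunk;
    }
}
```

For a plain file transfer the framing is not needed at all, which is the point of the suggestions above.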

Edited 12 Years Ago by ~s.o.s~

~s.o.s~ 2,560 Failure as a human

12 Years Ago

Krokcy wrote: "And I assume the suggestions have the same problem?"

No, really. Right now your code is inefficient and complicated because:

- You are reading the entire file into memory; this is obviously going to cause problems when multiple clients request several big files concurrently, which would result in an OutOfMemoryError.
- If your only aim is to transfer a file, there are simpler ways of knowing when a file ends; the trick is that read returns -1 when there is no more data to be read.
- The actual code/algorithm is much simpler than what you have right now:
  - When the client requests a file, open the file using FileInputStream on the server.
  - Create an intermediate buffer, as in the sample code snippet in my previous post.
  - Keep reading data from the file input stream by passing in the buffer.
  - As long as there is more data to be read, the read call returns the number of bytes read from the file. When the end of the file is reached, read returns -1, in which case you know you need not read any further.

Changing the code which you have should be as simple as just using the copy method on both the client and server piece of code. The only difference is where you get those streams from. In the case of client, the InputStream will be the input stream from the Socket which you created and the OutputStream will be the output stream for the new file which you have to write on the client. In the case of server, the InputStream will be the input stream from the file which you have to transfer to the client and the OutputStream will be the output stream which you get from the client socket.
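A rough sketch of what that could look like on each side. The copy method is the helper from the earlier post; FileTransfer, sendFile and receiveFile are names invented for this example, and the socket streams are passed in as plain InputStream/OutputStream so nothing here depends on how the connection was set up:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical wrapper class for illustration.
final class FileTransfer {
    /** The copy helper from the earlier post. */
    static long copy(InputStream from, OutputStream to, int bufsz) throws IOException {
        final byte[] buf = new byte[bufsz];
        long total = 0;
        int len;
        while ((len = from.read(buf)) != -1) {
            to.write(buf, 0, len);
            total += len;
        }
        return total;
    }

    /** Server side: file input stream -> output stream obtained from the client socket. */
    static long sendFile(File file, OutputStream socketOut) throws IOException {
        try (InputStream fileIn = new FileInputStream(file)) {
            long sent = copy(fileIn, socketOut, 4096);
            socketOut.flush(); // matters if socketOut is a buffered wrapper
            return sent;
        }
    }

    /** Client side: input stream obtained from the socket -> output stream of the new file. */
    static long receiveFile(InputStream socketIn, File target) throws IOException {
        try (OutputStream fileOut = new FileOutputStream(target)) {
            return copy(socketIn, fileOut, 4096);
        }
    }
}
```

With real sockets, socketOut would come from the client socket's getOutputStream() on the server and socketIn from the socket's getInputStream() on the client. Keep in mind that the client only sees -1 once the server closes the socket (or shuts down its output), so this simple form suits one file per connection.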

Krokcy commented: Informative, concise and understanding. +1

NormR1 563 Posting Sage Team Colleague · Answer 1 · 2012-06-24T00:55:21+00:00

Can you use a loop that reads some bytes into an array and then writes those bytes? Then reads some more and writes some more until done.

Krokcy 14 Newbie Poster · Answer 2 · 2012-06-24T02:51:57+00:00

Yeah, I was kind of hoping there was a smarter (well, at least easier ;) ) method.

But this is what I've come up with:

            byte[] fileArray = new byte[BYTES_PER_ARRAY];
            try {
                    int bytesAct = stream.read(fileArray, 0, BYTES_PER_ARRAY);
                    if(bytesAct!=BYTES_PER_ARRAY) { //to make sure there is no empty spaces
                        byte[] toReturn = new byte[bytesAct];
                        for(int i = 0;i<toReturn.length;i++) {
                            toReturn[i] = fileArray[i];
                        }
                        return toReturn;
                    }

            } catch (FileNotFoundException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            }
            return fileArray;

After each read I send the array, byte by byte, to the client.
It seems to be working fairly well, even though it's still very slow. At the moment BYTES_PER_ARRAY is 4080*1024, which seems fairly low. But I don't want to load too much into memory each time, and on the other hand having the client and server stop and write/read every 4.1 MB is slowing the process down a fair bit (I think, at least?). Do you think I can increase the array size? Or does anyone have any tips on how to optimize this?

EDIT: I don't know how slow it actually is. It took about 12 minutes or so to send a 75 MB file from my hard disk to a flash drive...

Krokcy 14 Newbie Poster · Answer 3 · 2012-06-24T10:50:18+00:00

At the moment I'm not using buffered streams. I have never worked with them before. Should I use buffered streams for the network streams as well, or just for the file I/O? At a quick glance it doesn't look like they support reading a specific type, which would make the throw-catch relationship my server and client have harder to control.

JamesCherrill 4,733 Most Valuable Poster Team Colleague Featured Poster · Answer 4 · 2012-06-24T11:56:52+00:00

Yes, in general you will always use a buffered stream unless there's some special reason not to. They don't restrict what you can do with the data in any way, nor do they affect the exceptions that can be thrown.
For example, if you have

DataOutputStream outbound = 
    new DataOutputStream(clientSocket.getOutputStream());

you can add buffering by changing it to

DataOutputStream outbound = new DataOutputStream(
    new BufferedOutputStream(clientSocket.getOutputStream()));

Krokcy 14 Newbie Poster · Answer 5 · 2012-06-25T15:58:58+00:00

Thanks for your help! It makes sense in my head, but in reality it doesn't quite work. Right now the client is reading things the server never wrote to the stream.
This is my current sending/receiving routine:

// Sending the information
for (int y = 0; y < numOfArrays; y++) {
    byte[] fileArray = fileToArrays(file, fileStream);
    int length = fileArray.length;

    outDataStream.writeInt(length);
    text.append("\nlength send: " + String.valueOf(length));
    outDataStream.write(fileArray, 0, length);
    text.append("\nPackets send: " + String.valueOf(packetCounter));
}

// ************* Receiving the information
for (int y = 0; y < numOfArrays; y++) {
    int length = stream.readInt();

    fileArray = new byte[length];
    text.append("\nlength recieved: " + String.valueOf(length));
    stream.read(fileArray, 0, length);
    packetCounter = packetCounter + length;
    text.append("\nPackets recived: " + String.valueOf(packetCounter));
    writeData(fos); // fos is my file output stream
}

And here is what was sent and what was received:

// Sent from the server:

num of arrays send 77 // tells the client how many arrays the file will be split into
length send: 2097152 // tells the client how long the current array is
// I assume it's writing this array to the stream, since it hasn't said that the send was complete

// Received by the client:

Num of arrays 77
length recieved: 2097152
Packets recived: 2097152 // It can't have received this since the server didn't actually send it yet.
                         // Up to here it looks good, but then it starts receiving
                         // info never sent by the server.
length recieved: 256
Packets recived: 2097408
length recieved: 25600
Packets recived: 2123008
length recieved: 511
Packets recived: 2123519 // then it crashes because the length becomes ridiculous.

The file on disk is 2kb large, so it seems that the first array was sent across completely in the end. My guess is that the client starts reading the 'length' while the server is still writing the array, and it gets mixed up from there, but I can't seem to fix it :s
Also, sorry for the wall of text!

Krokcy 14 Newbie Poster · Answer 6 · 2012-06-25T18:42:10+00:00

I'm sending the length of the array each time so the client knows when to stop reading, and so that the client array doesn't end up with trailing empty space.

Yeah, and sorry for the code being confusing. I didn't want to put it all in, but clearly I didn't explain enough of it.

- numArrays is the number of arrays the file will be divided into, i.e. filesize/BYTES_PER_ARRAY.
- The outDataStream is a DataOutputStream. If it's the naming you mean, I made Socket.getOutputStream() a variable named outputStream.
- On incorporating the suggestions: I haven't, because I tried (not in that way) transferring the information without any assumption as to the file size, but I ended up with files that had a lot of empty bytes in them (in a text file I had a million spaces, for instance). And I assume the suggestions have the same problem?

Krokcy 14 Newbie Poster · Answer 7 · 2012-06-28T02:21:07+00:00

Okay! Got it working beautifully now! Thanks for your help!

There are a few questions though, if you don't mind:

When I use a buffered stream to write to the file, it doesn't write the last bit. As far as I understand it, the buffered stream only calls the OS APIs when the buffer is full. So if it's not full right at the end, the last bit is never written to disk. How do I solve this? At the moment I use a non-buffered stream to write the file to disk.
I also have the same issue when sending short commands over the buffered DataStream. The only workaround (I guess it is) I have found is to use the flush() method, but that seems wrong? Should I make an unbuffered stream for stuff like that?

NormR1 563 Posting Sage Team Colleague · Answer 8 · 2012-06-28T02:27:21+00:00

Did the code close the file?

Krokcy 14 Newbie Poster · Answer 9 · 2012-06-28T02:37:32+00:00

Yes, the streams are closed:

// *** Client
byte[] fileArray = new byte[4096];
int bytesReceived;
int byteCounter = 0;
while ((bytesReceived = inStream.read(fileArray)) != -1) {
    fos.write(fileArray, 0, bytesReceived);

    byteCounter = byteCounter + bytesReceived;
    text.append("\nBytes: " + byteCounter);
}

// close streams
fosNoBuff.close(); // file output stream no buffer
fos.close();       // file output stream with buffer

inStream.close();
outStream.close();
text.append("Streams closed");


// *** Server
byte[] fileArray = new byte[4096];
int byteSend;
while ((byteSend = fis.read(fileArray)) != -1) {
    outStream.write(fileArray, 0, byteSend);
    byteSendTracker = byteSendTracker + byteSend;

    text.append("\nBytes: " + byteSendTracker);
}
outStream.flush(); // doesn't send the last array without this line

// close streams
fisNoBuff.close(); // file input stream no buffer
fis.close();       // file input stream with buffer

inStream.close();
outStream.close();
text.append("Streams closed");

JamesCherrill 4,733 Most Valuable Poster Team Colleague Featured Poster · Answer 10 · 2012-06-28T07:21:50+00:00

Short answer: Stick with buffered streams, use flush() whenever you need to force your output to be physically sent/written. That's perfectly normal.
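A tiny sketch of what flush() and close() do to a buffered stream, in case it helps to see it. A ByteArrayOutputStream stands in for the file or socket, and FlushDemo and observeSizes are names made up for this example:

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical demo class for illustration.
final class FlushDemo {
    /** Returns the sink size after a small write, after flush(), and after close(). */
    static int[] observeSizes() throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for a file or socket
        BufferedOutputStream out = new BufferedOutputStream(sink, 8192);

        out.write(new byte[100]);  // fits in the 8192-byte buffer, so nothing reaches the sink yet
        int afterWrite = sink.size();

        out.flush();               // pushes the buffered bytes to the sink
        int afterFlush = sink.size();

        out.write(new byte[50]);
        out.close();               // close() flushes before closing the underlying stream
        int afterClose = sink.size();

        return new int[] { afterWrite, afterFlush, afterClose };
    }

    public static void main(String[] args) throws IOException {
        System.out.println(Arrays.toString(observeSizes())); // prints [0, 100, 150]
    }
}
```

One thing worth checking in code like the above client: closing the buffered wrapper is what writes out the final partial buffer. If the underlying unbuffered stream is closed first, the wrapper may have nowhere left to flush to, which could explain a missing last chunk.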

Krokcy 14 Newbie Poster · Answer 11 · 2012-06-28T23:41:40+00:00

Awesome, thanks for your help everyone!

~s.o.s~ commented: Good luck with your project +14
