Best way to solve this encryption file handling problem.

SeanC
Light Poster
44 posts since Jul 2010
Reputation Points: 0 [?]
Q&As Helped to Solve: 0 [?]
Skill Endorsements: 0 [?]

Hi all, I'm writing a simple encryption algorithm and have stumbled upon a problem relating to the file handling itself.

My program reads text from a file, encrypts it, and writes the result to another file. That works fine until a large file is used as input. I tested it with a 30MB file and a 300MB file, and got an out-of-memory error ('no space on heap' or something like that).

I imagine that this is caused by the way I'm handling the file.

To solve it, I decided to use a BufferedReader to read a character at a time using the 'read()' method. After the character is read, it is encrypted and then written to a file using the BufferedWriter's 'write()' method.

This method works for large files (I tested it on a 300MB file; it took around 2 minutes to finish, but it worked).

My main concern is that, done this way, the hard disk is constantly being accessed for every single character. I'm sure it's highly inefficient.

Can anyone suggest something I can do to improve efficiency? Please note that when I used 'readLine()' instead of 'read()', I got the Java heap error, so I'm guessing I have to read a character at a time.

Also, I cannot post code as this is for an assignment. Just some guidelines/suggestions would suffice and would be greatly appreciated :)

Query also posted here: http://www.java-forums.org/new-java/36531-best-way-solve-encryption-file-handling-problem.html#post165491

Taywin
Posting Maven
2,632 posts since Apr 2010
Reputation Points: 134 [?]
Q&As Helped to Solve: 378 [?]
Skill Endorsements: 17 [?]

How about using BufferedOutputStream? You could build a byte array of whatever size you want, then use BufferedOutputStream to write it out each time. For example, if your byte array size is 4KB, you would write out 4KB each time instead of 1 character at a time.
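A rough sketch of that idea, not a definitive implementation. The XOR "cipher", the key, and the class and method names are all hypothetical stand-ins chosen for illustration; only the 4KB chunk size comes from the post.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

final class ByteChunkCipher {
    // Hypothetical stand-in for the real cipher: XOR each byte with a fixed key.
    static void encryptInPlace(byte[] buf, int len, byte key) {
        for (int i = 0; i < len; i++) {
            buf[i] ^= key;
        }
    }

    // Reads up to 4KB at a time, transforms what was read, and writes it out.
    static void process(InputStream in, OutputStream out, byte key) throws IOException {
        byte[] buf = new byte[4 * 1024];
        int len;
        while ((len = in.read(buf)) != -1) {
            encryptInPlace(buf, len, key);
            out.write(buf, 0, len); // only the first len bytes are valid
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        byte[] plain = "hello".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(plain));
             OutputStream out = new BufferedOutputStream(sink)) {
            process(in, out, (byte) 0x2A);
        }
        System.out.println(sink.size() + " bytes written");
    }
}
```

For real files, the in-memory streams in main would presumably be replaced by FileInputStream and FileOutputStream.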

SeanC

Thanks for the reply. Actually it has to use the BufferedReader/Writer strictly.

This is what I thought of doing:

while (a character is read from file and buffered via the BufferedReader) {
    encrypt the character
    write the character to file
}

What do you think?

Taywin

I just looked at the API: you can read in and write out using a char array. Adapting your approach, you could read into a char array of a certain size, encrypt all the chars in the array, and then write that array out to the file in one call. Then reuse the array and repeat. Would that reduce the HD access enough for you?
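Something along these lines, as a sketch only. The Caesar shift, the 8K array size, and the names CharChunkCipher, shift and process are assumptions made for illustration, and the shift k is assumed to be between 0 and 25.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

final class CharChunkCipher {
    // Hypothetical cipher: Caesar shift on ASCII letters, everything else unchanged.
    static char shift(char c, int k) {
        if (c >= 'a' && c <= 'z') return (char) ('a' + (c - 'a' + k) % 26);
        if (c >= 'A' && c <= 'Z') return (char) ('A' + (c - 'A' + k) % 26);
        return c;
    }

    // Reads a block of chars, encrypts it in place, and writes the block out.
    static void process(Reader reader, Writer writer, int k) throws IOException {
        char[] cbuf = new char[8 * 1024];
        int len;
        while ((len = reader.read(cbuf)) != -1) {
            for (int i = 0; i < len; i++) {
                cbuf[i] = shift(cbuf[i], k);
            }
            writer.write(cbuf, 0, len); // write only the chars just read
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        process(new StringReader("Hello, World!"), out, 3);
        System.out.println(out);
    }
}
```

Because process only depends on Reader and Writer, it should work unchanged with a BufferedReader and BufferedWriter if the assignment requires them.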

~s.o.s~
Failure as a human
10,399 posts since Jun 2006
Reputation Points: 2,496 [?]
Q&As Helped to Solve: 992 [?]
Skill Endorsements: 72 [?]
Administrator
Featured

SeanC wrote: "My main concern is that like this, the harddisk is constantly being accessed for every single character - I'm sure it's highly inefficient."

Yes, it sure is.

SeanC wrote: "Please note that when I used 'readLine()' instead of 'read()', i got the java heap error, so im guessing i have to read a character at a time."

Something's fishy here. The default buffer size of Buffered streams is AFAIK 8KB. Plus if you are looping over the input stream reading the bytes, encrypting them and writing them to a file, garbage collection should ensure that the bytes previously read are collected before throwing an OOME. Are you sure you are not keeping references to previously read data?

SeanC wrote: "Actually it has to use the BufferedReader/Writer strictly."

Again, fishy. The encryption algorithm doesn't (and shouldn't) know what kind of content it is encrypting, so it should have been BufferedInputStream and BufferedOutputStream instead of their Reader/Writer counterparts, unless you are using some sort of special encryption algorithm which operates strictly on textual data. :-)

SeanC

Yes, it's purely textual data - just for an assignment ;) With the current code it takes around 8 minutes to encrypt a 600MB text file using a simple substitution cipher. What do you think, is it decent?

~s.o.s~

Nope; because for a simple substitution cipher like a Caesar cipher, I can encrypt a 600 MB file in 40 seconds. :-)

Are you still reading the file character by character? That would explain your timings.

I think you are getting an OutOfMemoryError when using readLine() because your sample 600MB file does not contain any newline, and hence the Reader tries to read the entire file content into a single String. The most effective solution here would be to use FileReader/FileWriter and implement your own buffering (a 32KB buffer would be a good one).
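To illustrate the difference, here is a small sketch, assuming an input with no newline at all. The class and method names are made up for this example, and a 100,000-character string stands in for the 600MB file: readLine() has to build one String holding the whole input, while read(char[]) never holds more than one buffer's worth.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Arrays;

final class ReadLineVsChunk {
    // Length of the longest String that readLine() has to build for this input.
    static int longestLine(Reader source) throws IOException {
        BufferedReader br = new BufferedReader(source);
        int longest = 0;
        String line;
        while ((line = br.readLine()) != null) {
            longest = Math.max(longest, line.length());
        }
        return longest;
    }

    // Largest number of chars delivered by a single read(char[]) call.
    // bufferSize is assumed to be positive.
    static int largestChunk(Reader source, int bufferSize) throws IOException {
        char[] cbuf = new char[bufferSize];
        int largest = 0;
        int len;
        while ((len = source.read(cbuf)) != -1) {
            largest = Math.max(largest, len);
        }
        return largest;
    }

    public static void main(String[] args) throws IOException {
        char[] noNewlines = new char[100_000];
        Arrays.fill(noNewlines, 'x');
        String text = new String(noNewlines);
        System.out.println("readLine: " + longestLine(new StringReader(text)));
        System.out.println("chunked:  " + largestChunk(new StringReader(text), 32 * 1024));
    }
}
```

With a newline-free input, the first number grows with the input size, whereas the second is capped by the buffer size.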

SeanC

This is what I'm doing:

int c;
while ((c = inp.read()) != -1) {
    // encrypt the character c
    // write the encrypted character to the file
}

Any thoughts?

~s.o.s~

Is that a single character being read? If so, that method is painfully slow. As already mentioned, if readLine() throws an OOME, it is possible that your entire file contains a single line. In that case, just use the read(char[]) overload to read a specific number of characters at a time rather than an entire line. An 8K char buffer would be a good start.

SeanC

Thanks for the reply. How exactly do I set it to read 8KB though?

~s.o.s~

A sample snippet:

private void process(final Reader reader, final Writer writer) {
    try {
        final char[] cbuf = new char[8 * 1024];
        int len = -1;
        while((len = reader.read(cbuf)) != -1) {
            // translate is your method which takes a string and translates/encrypts it
            writer.write(translate(new String(cbuf, 0, len)));
        }
    } catch(final Exception e) {
        throw new RuntimeException(e);
    }
}

Given that the buffering is done by the method, you need not even use a Buffered reader/writer; a File reader/writer should suffice.
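For completeness, a self-contained sketch of how the snippet might be wired up to plain FileReader/FileWriter. The ROT13 body of translate(), the FileCipherDemo and encryptFile names, and the temporary files are assumptions made for illustration; note that FileReader and FileWriter use the platform default charset here.

```java
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class FileCipherDemo {
    // Hypothetical translate(): ROT13 on ASCII letters, everything else unchanged.
    static String translate(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 'a' && c <= 'z') c = (char) ('a' + (c - 'a' + 13) % 26);
            else if (c >= 'A' && c <= 'Z') c = (char) ('A' + (c - 'A' + 13) % 26);
            sb.append(c);
        }
        return sb.toString();
    }

    // Same loop as the snippet: read up to 8K chars, translate them, write them.
    static void process(Reader reader, Writer writer) throws IOException {
        char[] cbuf = new char[8 * 1024];
        int len;
        while ((len = reader.read(cbuf)) != -1) {
            writer.write(translate(new String(cbuf, 0, len)));
        }
    }

    // Opens unbuffered file reader/writer; try-with-resources closes both.
    static void encryptFile(Path in, Path out) throws IOException {
        try (Reader r = new FileReader(in.toFile());
             Writer w = new FileWriter(out.toFile())) {
            process(r, w);
        }
    }

    public static void main(String[] args) throws IOException {
        Path in = Files.createTempFile("plain", ".txt");
        Path out = Files.createTempFile("cipher", ".txt");
        Files.write(in, "Hello, World!".getBytes(StandardCharsets.US_ASCII));
        encryptFile(in, out);
        System.out.println(new String(Files.readAllBytes(out), StandardCharsets.US_ASCII));
        Files.delete(in);
        Files.delete(out);
    }
}
```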

SeanC

I tried implementing something like that, using the read(char[], int, int) method, but something bizarre happened: for some reason the text was being duplicated, i.e. the reader just read the first 'x' characters and pasted them over and over into the output file :/ My code was pretty similar to yours, except that I used read(cbuf, 0, 1000) instead of just read(cbuf). Shouldn't this read the first 1000 characters, place them in the array and then, once they're written, overwrite the array starting from 0 again?

SeanC

Oh wait, I think I know what's going on. I foolishly assumed that the char[] cbuf automatically flushes its contents when it's written to the file. I should have known better hehe.

The only solution I can see is to initialise the array after 'while((len = reader.read(cbuf)) != -1) {'. However, this causes an error, as the reader is unable to read its contents and place them in cbuf unless I initialise it before this block.

Any suggestions? :)

~s.o.s~

I'm not sure what your issue here is because the snippet I posted would work *out of the box* without any modifications as far as the reading and writing part is concerned. I'd recommend reading the Javadocs for the read() method and writing small snippets to understand how it actually works.

SeanC

I've read the Javadocs, but they didn't mention whether the array is flushed or not. I'll just have to experiment, I suppose.

~s.o.s~

I'm not sure why you use the word "flush".

It goes like this: you create a char array (which initially contains all '\0' characters) and pass the array to the read method. This method fills the array with the characters read and returns the number of characters read (n). You then use the first n characters of that same array, which are the ones that have just been read. Rinse and repeat with the same array; the next read() call simply overwrites the old data; there is no flushing. Simple, no? :-)

SeanC

Yes, I see what you mean :) The only thing is that at some point the array does not fill completely, as the reader runs out of text, i.e. the amount of text left to read is less than the size of the array. Hence, there would be 'old' values still contained in the array. I'm wondering how we can make sure those old characters are not there ;)

~s.o.s~

SeanC wrote: "how we can make sure those old characters are not there"

You don't need to; read my previous post again. read() returns an int which represents the number of characters read. So even if your buffer isn't full after the last read, it really doesn't matter, since you know which portion of the array contains the newly read values. If you look at the original snippet I posted, I use a String constructor which creates a String object from the "valid slice" of the array, using this same return value of the read() method.
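A small sketch of the point about the "valid slice", under the assumption that the duplicated text reported earlier came from writing the whole array rather than only the chars just read. The class and method names and the tiny 4-char buffer are illustrative only.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

final class ValidSliceDemo {
    // Buggy variant: ignores how many chars were read and writes the whole array,
    // so stale chars from the previous read leak into the output.
    static void copyWholeBuffer(Reader reader, Writer writer, int size) throws IOException {
        char[] cbuf = new char[size];
        while (reader.read(cbuf) != -1) {
            writer.write(cbuf);
        }
    }

    // Correct variant: writes only the first len chars, i.e. the ones just read.
    static void copyValidSlice(Reader reader, Writer writer, int size) throws IOException {
        char[] cbuf = new char[size];
        int len;
        while ((len = reader.read(cbuf)) != -1) {
            writer.write(cbuf, 0, len);
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter whole = new StringWriter();
        StringWriter slice = new StringWriter();
        copyWholeBuffer(new StringReader("ABCDEFGHIJ"), whole, 4);
        copyValidSlice(new StringReader("ABCDEFGHIJ"), slice, 4);
        System.out.println("whole buffer: " + whole);
        System.out.println("valid slice:  " + slice);
    }
}
```

With a StringReader and a 4-char buffer, the last read delivers only "IJ", so the whole-buffer variant produces ABCDEFGHIJGH (the trailing GH is left over from the previous read), while the slice variant produces ABCDEFGHIJ.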

SeanC

The reason I'm asking all this is that I would like to fill the old characters with something else. To do that, would I need to manually loop through the array from the last character read up to cbuf.length, and fill them myself?

Thanks so much for your assistance by the way :)
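If overwriting the stale tail really is what's wanted, one possible sketch uses java.util.Arrays.fill instead of a hand-written loop. The readPadded helper, the PaddedRead class, and the '_' pad character are hypothetical names and values chosen for illustration; nothing in the thread says this is needed for the encryption itself.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Arrays;

final class PaddedRead {
    // Reads into cbuf, then overwrites everything after the chars just read
    // with pad. Returns the number of chars read, or -1 at end of input.
    static int readPadded(Reader reader, char[] cbuf, char pad) throws IOException {
        int len = reader.read(cbuf);
        if (len != -1) {
            Arrays.fill(cbuf, len, cbuf.length, pad); // no-op when the buffer is full
        }
        return len;
    }

    public static void main(String[] args) throws IOException {
        Reader reader = new StringReader("ABCDEFGHIJ");
        char[] cbuf = new char[4];
        int len;
        while ((len = readPadded(reader, cbuf, '_')) != -1) {
            System.out.println(len + " read, buffer now: " + new String(cbuf));
        }
    }
}
```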

~s.o.s~

Are you talking about re-using the character array for something else? If not, you've got me all lost there; post some sample code/pseudocode as to what that *something else* is and what you are doing right now.
