bguild 163 Posting Whiz

It's surprising how little documentation java.io.InputStream has for a class of its complexity and importance. Almost every Java application uses it in one way or another, but how to use it correctly is not completely spelled out.

Naturally the documentation allows most of the methods to throw an IOException if "an I/O error occurs." Since no attempt is made to define "I/O error," an exception could be thrown for any reason at all. The only methods that are restrictive about their exceptions are mark, markSupported, and reset. mark and markSupported cannot throw IOExceptions at all, and reset only throws them under refreshingly specific circumstances. Surprisingly, even available may throw an IOException on the slightest whim.

It's not hard to assume that IOExceptions are like natural disasters, able to come crashing down upon an application at any time and totally beyond control. This is probably necessary because we are interacting with physical devices that may be affected by unpredictable forces, perhaps even actual natural disasters. There are no usage guidelines that will protect your application from the disk it is reading being destroyed by an earthquake. Even so, I wish the documentation promised that IOExceptions come only from outside forces like these, rather than letting us wonder whether some combination of reads might cause an IOException where a different combination of reads could have gotten the same bytes without the exception. With the documentation as it is, a certain InputStream might throw an exception upon any attempt to read an odd number of bytes.

The skip method is the most worrying of all. Even in comparison to the other methods of InputStream, skip makes very few promises about what it will do. At least read promises to read at least one byte unless the end of the stream has been reached. On the other hand, skip might skip 0 bytes because of "any number of conditions." With read you can use a loop to wait for everything you need to read, and each time through the loop you are guaranteed to get at least one byte. With skip you have no guarantee that the loop will ever end, or even that skip will do any blocking to prevent your loop from consuming your CPU. Is it good practice to use skip in a loop with a Thread.sleep and a Thread.interrupted so the rest of the application can abort the read if it seems like the skip loop might go on forever? Or would the better practice be to read one byte between skips to guarantee that the loop will block when waiting for input and to check for the end of the stream?
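To make the second option concrete, here is a minimal sketch of a loop that reads one byte whenever skip makes no progress. The class name StreamSkipper and the method name skipFully are my own invention for illustration, not anything from the JDK, and the sketch assumes that a single-byte read() blocks until data arrives or the stream ends, as its documentation says.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of the JDK.
final class StreamSkipper {

    /**
     * Skips exactly n bytes, or throws EOFException if the stream ends first.
     * Any IOException from skip or read is passed straight to the caller.
     */
    static void skipFully(InputStream in, long n) throws IOException {
        long remaining = n;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped > 0) {
                remaining -= skipped;
            } else {
                // skip made no progress. Fall back to a single-byte read,
                // which blocks until a byte is available and returns -1
                // at the end of the stream, so the loop cannot spin forever.
                if (in.read() == -1) {
                    throw new EOFException(
                            "stream ended with " + remaining + " bytes left to skip");
                }
                remaining--;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {10, 20, 30, 40, 50});
        skipFully(in, 3);
        System.out.println(in.read()); // prints 40
    }
}
```

This does nothing about the worry that skip might throw where a read would not; it only guarantees that the loop either makes progress, blocks, or ends. On Java 12 and later, InputStream.skipNBytes appears to offer a similar all-or-EOFException contract.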

I have seen people, even in tutorials, use skip to skip over just a few bytes that aren't needed in a stream, ignoring the return value and the minefield of possible behaviours that skip could have. There is even a suggestion that skip might throw an IOException when an equivalent read would not: the skip documentation says that it throws an IOException "if the stream does not support seek, or if some other I/O error occurs." Since there is no apparent way to check whether seek is supported, does that mean that the correct way to use skip is something like this?

try {
    skipped = in.skip(n);
} catch (IOException e) {
    // Assume seek is unsupported and fall back to reading the bytes
    skipped = in.read(new byte[n]);
}

Naturally that won't work if n is very large, but it's much worse than that. It swallows an IOException that might be caused by something more serious than seek not being supported, and we have no way of being sure that the read would rethrow the same exception instead of reading corrupted data, for example if skip had already skipped half the bytes before it threw the exception.
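For comparison, here is a sketch of the cautious alternative: never call skip at all, and discard the bytes by reading them into a small reusable buffer. That handles a large n without a large allocation and never swallows an exception. The names StreamDiscarder, discardFully, and SCRATCH_SIZE are hypothetical, and the 8192-byte buffer size is an arbitrary choice. It relies on the documented promise that read(byte[], int, int) with a positive length returns at least one byte or -1.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not part of the JDK.
final class StreamDiscarder {

    // Arbitrary scratch buffer size chosen for this sketch.
    private static final int SCRATCH_SIZE = 8192;

    /**
     * Reads and throws away exactly n bytes without ever calling skip.
     * Throws EOFException if the stream ends first; every other
     * IOException reaches the caller untouched.
     */
    static void discardFully(InputStream in, long n) throws IOException {
        byte[] scratch = new byte[SCRATCH_SIZE];
        long remaining = n;
        while (remaining > 0) {
            // Never ask for more than the buffer holds or than is still needed.
            int want = (int) Math.min(scratch.length, remaining);
            int got = in.read(scratch, 0, want);
            if (got == -1) {
                throw new EOFException(
                        "stream ended with " + remaining + " bytes left to discard");
            }
            remaining -= got;
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3, 4, 5, 6});
        discardFully(in, 4);
        System.out.println(in.read()); // prints 5
    }
}
```

The cost is exactly the one being weighed here: every discarded byte is actually read, so whatever time skip might have saved is given up.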

On one hand you naturally want to save time and memory by skipping bytes that you don't need, but on the other hand you are faced with the possibility that by trying to skip bytes you may cause an IOException to derail your reading of a stream that would have been read successfully otherwise.