Streams

All I/O in Java is handled by streams. Streams are an abstract idea and, like everything else in Java, are embodied in classes. A stream is an ordered sequence of bytes. Java, like other languages, builds the handling of external data on the model of sources and sinks, treating the data as a kind of fluid that moves in streams. You put the data you want to output into a stream and the system moves it to its destination. Streams allow Java programs to treat the standard output, files of different sorts, and even internal memory all the same way. The same is true of input.

You can write a program that generates output and connect it to different streams as you need to. This means that your program can initially write data to the screen and then easily be changed to write to a file. The same can be done in the other direction.

Programs can be built and run to accept input from the keyboard and then be changed to read from a file, without having to change the processing code.
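As a minimal sketch of this idea, the processing code below only knows that it has been handed some OutputStream. The class name ReportDemo and the method writeReport are hypothetical names chosen for illustration. The same method is pointed first at the console and then at an in-memory buffer; a FileOutputStream could be passed in exactly the same way.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class ReportDemo {
    // The processing code: it writes to whatever stream it is given.
    static void writeReport(OutputStream out) throws IOException {
        out.write("total=42\n".getBytes(StandardCharsets.US_ASCII));
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        // Destination 1: the console.
        writeReport(System.out);

        // Destination 2: internal memory. No change to writeReport is needed.
        ByteArrayOutputStream memory = new ByteArrayOutputStream();
        writeReport(memory);
        System.out.println("bytes captured in memory: " + memory.size());
    }
}
```

Only the line that chooses the destination changes; the method that produces the output stays the same.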

Streams are meant to run in one direction. Each stream connects a source to a destination. Streams can be connected one to another. For example, you can create a file input stream, which reads raw data from a file, and connect it to a data input stream, which can be used to extract the information in a certain format, for example as integers.
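The connection described above can be sketched as follows. The class ChainDemo, the method writeThenSum and the temporary file are assumptions made for this illustration. The file stream supplies raw bytes, and the data stream wrapped around it turns each group of four bytes into an int.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

class ChainDemo {
    // Writes three ints to the file, then reads them back through a
    // FileInputStream connected to a DataInputStream and returns their sum.
    static int writeThenSum(File file) throws IOException {
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(file))) {
            out.writeInt(10);
            out.writeInt(20);
            out.writeInt(30);
        }
        int sum = 0;
        try (DataInputStream in = new DataInputStream(new FileInputStream(file))) {
            for (int i = 0; i < 3; i++) {
                sum += in.readInt();
            }
        }
        return sum;
    }

    public static void main(String[] args) throws IOException {
        File temp = File.createTempFile("chain-demo", ".bin");
        temp.deleteOnExit();
        System.out.println("sum = " + writeThenSum(temp));
    }
}
```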

Since streams are objects, they have methods and properties like any other. These include flushing the stream, which means pushing everything that is in the pipeline to its final destination. You can close a stream, which means you are not going to use it any more. You can ask how many bytes are available in an input stream, read data from it or write data to an output stream, and perform other activities.
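A small sketch of some of these methods, using in-memory streams so that nothing touches the disk. The class and method names are hypothetical. The first method shows available() shrinking as bytes are read. The second shows that a byte written to a buffered stream does not reach its destination until flush() is called.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class StreamMethodsDemo {
    // Returns the number of available bytes before and after reading one byte.
    static int[] availableBeforeAndAfterRead(byte[] data) throws IOException {
        try (InputStream in = new ByteArrayInputStream(data)) {
            int before = in.available();
            in.read();
            int after = in.available();
            return new int[] {before, after};
        } // leaving the try block closes the stream
    }

    // Returns the size of the destination before and after flush().
    static int[] sinkSizeBeforeAndAfterFlush() throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (BufferedOutputStream out = new BufferedOutputStream(sink)) {
            out.write(65);            // the byte sits in the buffer
            int before = sink.size(); // nothing has reached the sink yet
            out.flush();              // push the buffered byte through
            int after = sink.size();
            return new int[] {before, after};
        }
    }

    public static void main(String[] args) throws IOException {
        int[] a = availableBeforeAndAfterRead(new byte[5]);
        System.out.println("available: " + a[0] + " then " + a[1]);
        int[] s = sinkSizeBeforeAndAfterFlush();
        System.out.println("sink size: " + s[0] + " then " + s[1]);
    }
}
```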

Streams in Java fall into two major categories. One has to do with streams of bytes and the other has to do with streams of characters. Since in many languages characters are represented by a single byte, it may seem that these are the same thing. However, in Java, characters are represented in Unicode and are not just one byte. Byte streams are used for binary data such as images, sounds and other files that are not human readable.

Character streams are used for text data. They can be viewed as a long unbroken sequence of characters or as a collection of chunks such as lines. The top classes of the input stream hierarchy are InputStream for byte streams and Reader for character streams. One of the main methods on these input streams is read(). In both cases, this returns an integer. In the case of byte streams this integer is limited to values between zero and 255. For character streams, the value is limited to the range from zero to 65535. The difference is that a single byte is eight bits and thus has values from zero to 255, while a character is sixteen bits and has a larger range. In both cases read() returns -1 when the end of the stream has been reached, which is why the return type is an integer rather than a byte or a char.
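The behaviour of read() can be tried out with in-memory streams. This is only a sketch; the class ReadDemo and its two helper methods are hypothetical. Note that read() also returns -1 once the stream is exhausted, for byte streams and character streams alike.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;

class ReadDemo {
    // Reads a single byte from a byte stream built over the given array.
    static int firstByte(byte[] data) throws IOException {
        try (InputStream in = new ByteArrayInputStream(data)) {
            return in.read();
        }
    }

    // Reads a single character from a character stream built over the text.
    static int firstChar(String text) throws IOException {
        try (Reader in = new StringReader(text)) {
            return in.read();
        }
    }

    public static void main(String[] args) throws IOException {
        // The byte 0xFF comes back as the int 255, not as -1.
        System.out.println(firstByte(new byte[] {(byte) 0xFF}));
        // The euro sign, U+20AC, does not fit in one byte.
        System.out.println(firstChar("\u20AC"));
        // An empty stream reports end of stream.
        System.out.println(firstByte(new byte[0]));
    }
}
```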

In the output stream hierarchy, the top classes are OutputStream for byte streams and Writer for character streams. Like the input streams, the output streams have a main method called write() that takes either a single byte or a single character as an argument represented as an integer.

Exceptions

There is also a set of exception classes just for the IO streams. They are descended from the IOException class. Most IO operations should be wrapped in a try block.
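One common shape for such a try block is sketched below. The class ExceptionDemo, the method describe and the path used in main are hypothetical. FileNotFoundException is one of the classes descended from IOException, so it is caught first; the more general IOException handler catches everything else. The try-with-resources form closes the stream automatically.

```java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;

class ExceptionDemo {
    // Tries to read the first byte of a file and reports what happened.
    static String describe(String path) {
        try (InputStream in = new FileInputStream(path)) {
            return "first byte: " + in.read();
        } catch (FileNotFoundException e) {
            // The more specific subclass must be caught before IOException.
            return "not found";
        } catch (IOException e) {
            return "I/O error";
        }
    }

    public static void main(String[] args) {
        // A path that is assumed not to exist.
        System.out.println(describe("no-such-directory/no-such-file.txt"));
    }
}
```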

Files

Files on the disk are represented by the File class. This class has methods for checking a file's properties, getting information about the file and changing the state of the file. The File class is not involved in actually reading from the file or writing to the file. This is done by the streams.

The File object is passed to the streams to tell them what to operate on. Generally, you create a File object to refer to the file that you want to operate on. Then you will pass that file to the constructor of a basic file stream object. That file stream object will be used in the constructor of a more specific file stream object. This last is the one you will use for your operations.

The File class has several constructors, of which three are commonly used. The most common takes a string argument, which is the name of the file to be used. Another takes two strings: the first is a path to the directory that contains the file and the second is the name of the file. The third takes a first argument of type File. This File object represents the directory that contains the file we are interested in. The second argument is again a string that is the name of the file.

There are a number of methods on the File class that we can use. Some of these check the type or properties of a file. For example, we can check if the file exists in the file system, if it can be read or written, and if it is a directory or a file. There are methods to get the name of the file, the path leading to the file and the directory the file is in, and there are operations like delete and rename. In our examples we will see the uses of some of these methods.
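A short sketch of the constructors and a few of the query methods follows. The class FileInfoDemo, the method describe and the file name notes.txt are made up for illustration, and the system temporary directory is used only so that the example can create and delete a real file safely.

```java
import java.io.File;
import java.io.IOException;

class FileInfoDemo {
    // Summarizes a few properties of the given file.
    static String describe(File f) {
        return f.getName()
                + " exists=" + f.exists()
                + " file=" + f.isFile()
                + " dir=" + f.isDirectory();
    }

    public static void main(String[] args) throws IOException {
        String tmp = System.getProperty("java.io.tmpdir");

        // Three ways of naming a file; none of them touches the disk.
        File byName = new File("notes.txt");
        File byTwoStrings = new File(tmp, "notes.txt");
        File byDirectory = new File(new File(tmp), "notes.txt");
        System.out.println(byName.getName());
        System.out.println(byTwoStrings.getParent());
        System.out.println(byDirectory.getPath());

        // Create a real file so the property checks have something to find.
        File temp = File.createTempFile("demo", ".txt");
        System.out.println(describe(temp));
        System.out.println("readable=" + temp.canRead() + " writable=" + temp.canWrite());
        System.out.println("deleted=" + temp.delete());
        System.out.println(describe(temp));
    }
}
```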

One thing to note is to be careful about references to paths. In Windows, the separator between sections of a path is a backslash (\). In Unix-based operating systems, the separator is a forward slash (/). There is a constant in the File class called separator which can be used when concatenating strings to construct a path.
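For instance, a path can be assembled without hard-coding either slash. The class PathDemo, the method join and the names data and scores.txt are hypothetical.

```java
import java.io.File;

class PathDemo {
    // Joins a directory and a file name using the platform's separator.
    static String join(String directory, String name) {
        return directory + File.separator + name;
    }

    public static void main(String[] args) {
        String path = join("data", "scores.txt");
        System.out.println(path);

        // The two-string File constructor inserts the separator for you.
        System.out.println(new File("data", "scores.txt").getPath());
    }
}
```

Both lines print the same path, with a backslash on Windows and a forward slash on Unix-based systems.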

To start using files in your program, you should first create a File object and then create a basic file stream. Which file stream object you use depends on the kind of stream you're trying to make.

If you're working with a character file, the basic file stream you would use is a FileReader. If you are using a binary file, then you would use a FileInputStream. There are similar objects for output streams of the two types: FileWriter and FileOutputStream.

These stream objects only provide you the very low level read and write routines that work one byte or one character at a time. To get a more useful set of methods to operate on your stream, you need to connect this basic file stream to a more powerful stream object.
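The sequence of steps, from File object to basic file stream to more powerful stream, might look like the sketch below for a character file. The class LayeredReadDemo and the method countLines are hypothetical, and BufferedReader is used here as one example of a more powerful stream because it adds a readLine() method.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

class LayeredReadDemo {
    // Counts the lines in a text file.
    static int countLines(File file) throws IOException {
        // Step 1: the File object names the file (passed in by the caller).
        // Step 2: FileReader is the basic character stream for that file.
        // Step 3: BufferedReader wraps it and adds readLine().
        try (BufferedReader in = new BufferedReader(new FileReader(file))) {
            int lines = 0;
            while (in.readLine() != null) {
                lines++;
            }
            return lines;
        }
    }

    public static void main(String[] args) throws IOException {
        File temp = File.createTempFile("layered-demo", ".txt");
        temp.deleteOnExit();
        try (FileWriter out = new FileWriter(temp)) {
            out.write("one\ntwo\nthree\n");
        }
        System.out.println("lines: " + countLines(temp));
    }
}
```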

There are two parent classes of streams that process other streams. These are called filter streams, and there is one base class for input streams, FilterInputStream, and one for output streams, FilterOutputStream.

These base classes have a number of child classes that are more directly useful. For example, the FilterInputStream class has children that include DataInputStream, which handles binary data of primitive types like integers; LineNumberInputStream, which keeps track of the number of lines that have been read (it is deprecated in favor of the character-based LineNumberReader); and PushbackInputStream, which lets you look one byte ahead and put that byte back into the stream if it is not the one you want. This is useful for processing input text that needs some parsing.
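As a sketch of the look-ahead idea, the hypothetical method readNumber below reads decimal digits until it meets a byte that is not a digit, and then pushes that byte back so that the next reader still sees it. The class in java.io is spelled PushbackInputStream. The class name PushbackDemo and the input "123+4" are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;

class PushbackDemo {
    // Reads leading decimal digits and leaves the first non-digit in the stream.
    static int readNumber(PushbackInputStream in) throws IOException {
        int value = 0;
        int b;
        while ((b = in.read()) != -1) {
            if (b < '0' || b > '9') {
                in.unread(b); // not part of the number: put it back
                break;
            }
            value = value * 10 + (b - '0');
        }
        return value;
    }

    public static void main(String[] args) throws IOException {
        byte[] text = "123+4".getBytes(StandardCharsets.US_ASCII);
        try (PushbackInputStream in = new PushbackInputStream(new ByteArrayInputStream(text))) {
            int left = readNumber(in);
            char operator = (char) in.read(); // the byte that was pushed back
            int right = readNumber(in);
            System.out.println(left + " " + operator + " " + right);
        }
    }
}
```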

The FilterOutputStream class has children like DataOutputStream, which is useful for writing data in binary format, and PrintStream, which writes data in its text form and so is good for human readable output.

Data streams

Data streams handle data input and output at a binary level. The data is written to and read from the stream in a machine independent way. Different computers, especially those with different processors, encode things like integers in slightly different ways. This means that data written out in binary format from one type of processor may not be readable on a machine with a different processor.

Using the data streams and the methods that go with them will prevent this problem, since any Java program will understand the format in which the data was written. There are methods on these streams to process the basic primitive types like int, long, float, char and boolean, as well as strings.
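A minimal round trip through the data streams is sketched below. The class DataStreamDemo, its encode and decode methods, and the sample values are hypothetical. An in-memory byte array stands in for a file. The values must be read back in the same order, and with the matching methods, in which they were written.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class DataStreamDemo {
    // Writes four values of different types in binary form.
    static byte[] encode(int id, double score, boolean active, String name) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(id);
            out.writeDouble(score);
            out.writeBoolean(active);
            out.writeUTF(name);
        }
        return bytes.toByteArray();
    }

    // Reads the four values back in the order they were written.
    static String decode(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int id = in.readInt();
            double score = in.readDouble();
            boolean active = in.readBoolean();
            String name = in.readUTF();
            return id + " " + score + " " + active + " " + name;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = encode(7, 91.5, true, "Ada");
        System.out.println(data.length + " bytes");
        System.out.println(decode(data));
    }
}
```

DataOutputStream always writes an int as four bytes with the high byte first, whatever processor the program runs on, which is what makes the format portable.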

Print streams

When writing output to a file or other stream where it should be human readable, you should use a PrintWriter object. We have seen something like this already: the System.out object behaves much like a PrintWriter (strictly, it is a PrintStream, but it offers the same print and println methods). These methods write values out in a human readable form.

In the case of System.out, this stream writes to the console display. If you create a PrintWriter that is connected to a disk file, the output that you print or println will appear much like it does on the console. There is also a PrintStream class, which is used for byte streams. Since the intent is to output in human readable form, you will usually use a PrintWriter rather than a PrintStream.
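Connecting a PrintWriter to a disk file might look like the following sketch. The class PrintWriterDemo, the method writeScores and the temporary file are hypothetical. The calls to print and println are the same ones used with System.out.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;

class PrintWriterDemo {
    // Writes two lines of human readable text to the given file.
    static void writeScores(File file) throws IOException {
        try (PrintWriter out = new PrintWriter(new FileWriter(file))) {
            out.print("score: ");
            out.println(42);
            out.println(3.5);
        } // closing the writer flushes the text to the file
    }

    public static void main(String[] args) throws IOException {
        File temp = File.createTempFile("print-demo", ".txt");
        temp.deleteOnExit();
        writeScores(temp);
        System.out.println("wrote " + temp.length() + " bytes to " + temp.getName());
    }
}
```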

Buffered streams

The streams that we have seen so far essentially read one byte at a time. This is not very efficient so Java provides a set of buffered streams. These read a block of input from the stream at a time and provide information to you as needed. Reading in larger chunks from a disk file is more efficient than reading one byte at a time. So we have buffered input streams, buffered output streams, buffered readers and buffered writers. These provide the same functionality as the other streams but are more efficient in their access of data from the disk.
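A sketch of a file copy that uses buffered byte streams is shown below. The class BufferedCopyDemo and the method copy are hypothetical. The loop still asks for one byte at a time, but each buffered stream talks to the disk in larger blocks behind the scenes.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class BufferedCopyDemo {
    // Copies source to target and returns the number of bytes copied.
    static long copy(File source, File target) throws IOException {
        long copied = 0;
        try (InputStream in = new BufferedInputStream(new FileInputStream(source));
             OutputStream out = new BufferedOutputStream(new FileOutputStream(target))) {
            int b;
            while ((b = in.read()) != -1) { // served from the input buffer
                out.write(b);               // collected in the output buffer
                copied++;
            }
        } // closing the output stream flushes whatever is still buffered
        return copied;
    }

    public static void main(String[] args) throws IOException {
        File source = File.createTempFile("copy-source", ".bin");
        File target = File.createTempFile("copy-target", ".bin");
        source.deleteOnExit();
        target.deleteOnExit();
        try (OutputStream out = new FileOutputStream(source)) {
            out.write(new byte[1000]);
        }
        System.out.println("copied " + copy(source, target) + " bytes");
    }
}
```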

Console input and output

Java provides streams to allow access to the console. System.out is an example of a stream that we can use to output text to the console. We can construct objects that allow us to read from the input as well. The files3 example shows how to set this up and use it.
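Independently of the files3 example, one common way to read a line from the console is to wrap System.in, which is a byte stream, in an InputStreamReader and then in a BufferedReader. The sketch below is only an illustration; the class ConsoleDemo and the method readLineFrom are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

class ConsoleDemo {
    // Reads one line of text from the given byte stream.
    // The reader is deliberately not closed, since closing it would
    // also close the underlying stream (System.in in main below).
    static String readLineFrom(InputStream source) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(source));
        return in.readLine();
    }

    public static void main(String[] args) throws IOException {
        System.out.print("Name: ");
        String name = readLineFrom(System.in);
        if (name == null) {
            System.out.println("No input was available.");
        } else {
            System.out.println("Hello, " + name);
        }
    }
}
```

Because readLineFrom accepts any InputStream, the same code can be pointed at a file or an in-memory buffer instead of the keyboard.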

Object streams

Often in Java applications, you create a number of objects that you want to save between executions of the program. This is referred to as persistence. One way to do that is to store the information in a database. We will address that in a later lecture.

We can also store the objects in a file. If we want to do this, we have two choices. We have already seen one of them: we can use the binary output streams discussed above to write out the data one piece at a time. This is a complicated technique, because we not only have to write out each private data element in our class, but we also have to do this for all the data elements we inherit from other classes. This is not only tedious but error prone. If the class changes, we have to add more code to the output methods to handle the changes.

Java provides us with an easier mechanism for doing this, as it is a fairly common operation. We have to mark our class as one that can be written to a file. The process of writing an object to a file is called serialization. To tell Java that we want to do this, we have to implement an interface. This is called a marker interface, because it has no methods that we have to implement. It just tells Java's serialization mechanism that we want the objects to be saveable. We simply mark the class by declaring that it implements Serializable.

Once we have marked our class, we can create object output streams that we can use to save the objects in a file. We can also make object input streams to bring the objects back into the application at a later time.

There are two restrictions that need to be addressed. One is that static data is not written out when an object is serialized. The other is that there may be data fields in the class that are not themselves serializable or that we don't want to write out. An example of this is a class that has a Thread object embedded in it. The Thread class does not implement Serializable and so can't be written out. To handle this, you can mark data elements in your class with the keyword transient. These data fields will not be written out when the object is written to the stream. If your class contains objects of other classes, and these are serializable, they will be written as well.
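A minimal sketch of serialization with a transient field follows. The Account class, its fields and the roundTrip method are hypothetical, and an in-memory byte array stands in for a file; a FileOutputStream and FileInputStream could be wrapped in the same way.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializationDemo {
    // Writes the object to an object output stream and reads a copy back.
    static Account roundTrip(Account original) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(original);
        }
        try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Account) in.readObject();
        }
    }

    public static void main(String[] args) throws IOException, ClassNotFoundException {
        Account copy = roundTrip(new Account("Ada", 100, "secret-token"));
        System.out.println(copy.owner + " " + copy.balance + " " + copy.sessionToken);
    }
}

// Marked as saveable by implementing the marker interface.
class Account implements Serializable {
    private static final long serialVersionUID = 1L;

    String owner;
    int balance;
    transient String sessionToken; // not written to the stream

    Account(String owner, int balance, String sessionToken) {
        this.owner = owner;
        this.balance = balance;
        this.sessionToken = sessionToken;
    }
}
```

The owner and balance survive the round trip, while the transient field comes back as null because it was never written.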