Tuesday, May 22, 2012

What's the best way to read a file in Java?

The title of this post is an innocuous and seemingly straightforward question from StackOverflow that caught my eye (http://stackoverflow.com/questions/4716503/best-way-to-read-a-text-file). The poster had noticed that there was more than one way to read an ASCII file and, as a noob, was curious which way was "best". The answers reached hardly any consensus, and I believe there is something to be learned from that lack of a quick consensus.

First, some background. I do not profess to be a Java expert. My needs in this department are simple, so I usually mimic whatever I find in the first text I grab. Different texts take different approaches, so I'll lay those out first.

From Savitch:
Scanner inputStream = new Scanner(new FileInputStream("stuff.txt"));

From Gaddis:
BufferedReader inputFile = new BufferedReader(new FileReader("stuff.txt"));
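To make the two textbook lines concrete, here is a minimal sketch (my own, not code from either book) that wraps each one in a method reading every line of a file into a list. The class and method names are hypothetical. Both versions use the platform's default character encoding, which is fine for an ASCII file.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

class TextbookReaders {
    // Savitch style: a Scanner wrapped around a FileInputStream.
    static List<String> readWithScanner(String fileName) throws IOException {
        List<String> lines = new ArrayList<>();
        try (Scanner inputStream = new Scanner(new FileInputStream(fileName))) {
            while (inputStream.hasNextLine()) {
                lines.add(inputStream.nextLine());
            }
            // Note: Scanner swallows IOExceptions raised while reading;
            // inputStream.ioException() returns the last one, if any.
        }
        return lines;
    }

    // Gaddis style: a BufferedReader wrapped around a FileReader.
    static List<String> readWithBufferedReader(String fileName) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader inputFile = new BufferedReader(new FileReader(fileName))) {
            String line;
            // readLine() returns null at end of file
            while ((line = inputFile.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        String fileName = args.length > 0 ? args[0] : "stuff.txt";
        System.out.println(readWithScanner(fileName));
        System.out.println(readWithBufferedReader(fileName));
    }
}
```

Run against the same ASCII file, both methods should print the same list of lines; the differences between them lie elsewhere, for example in how read errors surface.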

Among the answers, pangea suggests Gaddis' approach, as does Knubo in his first suggestion. However, Knubo also offers an alternative that uses the FileInputStream directly rather than wrapping it in a Scanner object. Jesus Ramos also agrees with the first suggestion. Juan Carlos Kuri Pinto argues that his way is better, but it still uses the Gaddis classes.

Peter Lawrey offered a novel approach.

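// Apache Commons IO: FileUtils.readLines takes a File (not a String) plus a charset name
for (String line : FileUtils.readLines(new File("my-text-file"), "UTF-8"))
    System.out.println(line);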

Claude suggested something similar:

The methods within org.apache.commons.io.FileUtils may also be very handy, e.g.:
/**
 * Reads the contents of a file line by line to a List
 * of Strings using the default encoding for the VM.
 */
static List<String> readLines(File file) throws IOException
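Commons IO is a third-party library. Since Java 7 the JDK has a similar one-call method, java.nio.file.Files.readAllLines(Path, Charset). The sketch below is my substitute for the FileUtils call, not what either poster wrote; it assumes UTF-8 (a superset of ASCII) and reuses the file name my-text-file. The class name is hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

class ReadAllLinesDemo {
    // JDK 7+ counterpart of FileUtils.readLines: reads the whole file into memory at once.
    static List<String> readLines(Path file) throws IOException {
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args.length > 0 ? args[0] : "my-text-file");
        for (String line : readLines(file)) {
            System.out.println(line);
        }
    }
}
```

Like FileUtils.readLines, this loads the entire file into a list before the loop starts, which is convenient for small files but a poor fit for very large ones.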

One poster, jzd, referenced an Oracle tutorial page that gives an overview of the different methods and how they differ (http://docs.oracle.com/javase/tutorial/essential/io/file.html).

So if I were to reply to this poster with all of this information, what would I say? All of these suggestions produce code that reads an ASCII file. So what are the differences? What interests me is that they lie not in the gross functionality of the code but in the non-functional qualities of the resulting system.

Let's take a trivial and obvious example: the try-catch block. If we know that all our data is good and we want a quick-and-dirty piece of code, we might skip writing this block, because we want the code written quickly and we control the file well enough that error handling seems unnecessary. (Java still makes us acknowledge the checked IOException, so in practice skipping the block means declaring throws IOException and letting the exception propagate.) Here we are trading robustness for coding speed (time-to-market): the first exception will halt the program. In many cases we don't care and choose quick-and-dirty.
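A sketch of the two ends of that trade-off, using the Gaddis-style BufferedReader to count lines. The class name, method names and the -1 sentinel are hypothetical choices for illustration. The quick version declares throws IOException, so calling it on a missing file halts the caller with a FileNotFoundException; the robust version always closes the file and reports the error instead.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

class ReadingStyles {
    // Quick-and-dirty: no try-catch. Java still forces us to acknowledge the
    // checked IOException, so we declare it and let the first failure propagate.
    static int countLinesQuick(String fileName) throws IOException {
        BufferedReader in = new BufferedReader(new FileReader(fileName));
        int count = 0;
        while (in.readLine() != null) {
            count++;
        }
        in.close(); // skipped if readLine throws, leaking the file handle
        return count;
    }

    // More robust: try-with-resources always closes the file, and the catch
    // turns a failure into a reported error instead of halting the caller.
    static int countLinesRobust(String fileName) {
        try (BufferedReader in = new BufferedReader(new FileReader(fileName))) {
            int count = 0;
            while (in.readLine() != null) {
                count++;
            }
            return count;
        } catch (IOException e) {
            System.err.println("Could not read " + fileName + ": " + e.getMessage());
            return -1; // hypothetical "unreadable" sentinel chosen for this sketch
        }
    }

    public static void main(String[] args) {
        // Assuming no such file exists: prints an error on stderr, then -1.
        System.out.println(countLinesRobust("no-such-file.txt"));
    }
}
```

The extra lines in the robust version are exactly the coding time the quick version saves.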

There are two important qualities that vary among these approaches. The first is the relationship between the running code (the dynamic structures) and time. Buffering makes reading more efficient for a static file that is read and processed in one pass, as it might be in a batch system, because the reader fetches a block of characters per request to the underlying file instead of one character at a time. Presumably, though, this feature has some price in processor and memory load. On modern hardware that price is likely negligible for most systems, but for very large, high-performance systems it cannot always be taken for granted. The trade-off may become an architectural consideration if this file reading is in the critical path of a high-performance system, or of an embedded system with limited hardware.
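To see what buffering buys, the sketch below counts how many read requests reach the underlying FileReader while a file is read one character at a time. CountingReader is a hypothetical helper written for this example, and passing 0 as the buffer size means no BufferedReader at all. Without a buffer every character is a separate request; BufferedReader fetches up to bufferSize characters per request, at the cost of holding a char array of that size in memory. Counting calls is only a proxy for cost; how expensive each call is depends on the JDK.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;

class BufferingDemo {
    // Hypothetical helper: counts read requests that reach the wrapped reader.
    static class CountingReader extends FilterReader {
        int calls = 0;

        CountingReader(Reader in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(char[] cbuf, int off, int len) throws IOException {
            calls++;
            return super.read(cbuf, off, len);
        }
    }

    // Reads the whole file one character at a time and returns how many
    // requests reached the FileReader. bufferSize <= 0 means no buffering.
    static int underlyingReads(String fileName, int bufferSize) throws IOException {
        CountingReader counter = new CountingReader(new FileReader(fileName));
        Reader in = bufferSize > 0 ? new BufferedReader(counter, bufferSize) : counter;
        try {
            while (in.read() != -1) {
                // consume one character per iteration
            }
        } finally {
            in.close(); // closing the outer reader also closes the FileReader
        }
        return counter.calls;
    }

    public static void main(String[] args) throws IOException {
        String fileName = args.length > 0 ? args[0] : "stuff.txt";
        System.out.println("unbuffered:    " + underlyingReads(fileName, 0) + " reads");
        System.out.println("8192-char buf: " + underlyingReads(fileName, 8192) + " reads");
    }
}
```

For an ASCII file the unbuffered count is one more than the number of characters (the last request reports end of file), while the buffered count should be far smaller; the exact figure depends on the JDK's internals.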

The second quality that differs among the coding options is maintainability. As jzd observes, Oracle now offers some classes that provide much faster processing, but at the cost of readability. This illustrates how quality choices are often trade-offs between two or more competing requirements. I suspect that, in the end, we code what we know best unless some requirement forces us to consider other options.
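I don't know exactly which classes jzd had in mind; I'm assuming the java.nio channel and buffer classes described in the Oracle tutorial. The sketch contrasts a one-line Files.readAllLines count with a FileChannel loop that scans raw bytes for newline characters. It assumes an ASCII file with \n or \r\n line endings, the class and method names are hypothetical, and I make no claim that the channel version is faster on any particular system. The point is how much more a maintainer has to read to convince themselves it is correct.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

class LineCounting {
    // Readable: one call, but the whole file is decoded into Strings in memory.
    static long countLinesReadable(Path file) throws IOException {
        return Files.readAllLines(file, StandardCharsets.US_ASCII).size();
    }

    // Lower level: scan raw bytes through a reusable buffer and count '\n'.
    static long countLinesChannel(Path file) throws IOException {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buffer = ByteBuffer.allocateDirect(64 * 1024);
            long lines = 0;
            byte last = '\n'; // treat an empty file as ending with a newline
            while (channel.read(buffer) != -1) {
                buffer.flip(); // switch the buffer from filling to draining
                while (buffer.hasRemaining()) {
                    last = buffer.get();
                    if (last == '\n') {
                        lines++;
                    }
                }
                buffer.clear(); // make the whole buffer available for the next read
            }
            // A non-empty file whose last byte is not '\n' has one more line.
            if (last != '\n') {
                lines++;
            }
            return lines;
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Paths.get(args.length > 0 ? args[0] : "stuff.txt");
        System.out.println(countLinesReadable(file) + " lines (readAllLines)");
        System.out.println(countLinesChannel(file) + " lines (FileChannel)");
    }
}
```

The channel version avoids building a String per line, but it needs explicit buffer bookkeeping and an end-of-file special case that the one-liner gets for free.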

Normally I have to find more complicated examples to illustrate this concept. As a design gets more complex, so does the design task.
