Saturday, February 11, 2012

Understanding and Documenting Programs

This essay is from the book I'm reading, Foundations of Empirical Software Engineering: The Legacy of Victor Basili, and was originally published in 1980. It gives me a platform for some of the thoughts I have had about programs and systems and what is really meant by documenting them.

The essay discusses an experiment in which Basili sought to answer questions about a math routine, given only some complex code and a gross statement of the routine's semantics. It brings to mind a discussion I had with my adviser about a project I was working on, where I had initially titled the work as the documentation of the system I intended to extend. She pointed out the negative connotation of the phrase "to document" and persuaded me it understated the work I had done. I immediately understood her comment, but I am still coming to grips with its implications years later. What was I doing if not documenting? I was clearly engaged in a form of reverse engineering and attempting to do exactly what Basili describes here. The code by itself certainly worked; that much I had demonstrated. But extending the system required a deeper understanding of the structure and decomposed semantics of its parts. That required significant work (it was over 100 KSLOC of Python written by sophisticated developers), and the most visible artifact of that work was the documentation. In the end I called it architectural reconstruction, and that still seems the best description of what I achieved in that project. What Basili's paper shows is the painful reasoning that must be done to recover knowledge that existed at an earlier point in the development of the code artifact. It clearly shows that the design artifacts that preceded the code can have real economic value to the maintenance coder who must perform repair and maintenance work. I also find this essay instructive for demonstrating what the SEI maintains: that the structure of a program is driven not so much by its functional requirements as by its non-functional, or quality, requirements.

In my professional practice, I was frequently involved in projects to upgrade systems for a client. While the reasons varied, the one constant was the question, "Why do you have to gather requirements again? We already have a system that does all those things. Why can't you just look at that to get the requirements?" While the practitioners reading this may cringe, it is an apt question that I believe is still important today. It has to do with both the entropy of the development process and the weighing of the runnable artifact's importance against that of its antecedents. After all, if the requirements document from the prior project were available and sufficiently self-explanatory, a new requirements phase would not be required except to discover any delta from that prior effort.

One obvious reason why a new requirements elicitation phase was required was the simple absence of the document. The executable is well protected to ensure its integrity, and (usually) so is the source code that created it. But the farther removed from the source code you get, the less likely it is that good artifacts exist. By the time you get to artifacts like the charter or the early design documents, you will likely need to engage in a fair amount of modern archaeology to find them. Once found, they still need to be verified, comprehended, and extended to include the newest concerns of the organization. What is remarkable is that no matter how extensive the document is, there is inevitably information that just never made it in. Even in the most document-driven organization, a significant amount of oral tradition exists. To call the recovery of this requirements documentation a documentation task is indeed condescending given the skills required.

When confronted with dense code, a software engineer doesn't so much document what he finds as use documentation to record the results of his reasoning, so as to supplement his memory. As the essay explains, the reasoning is hardly trivial. Even if the engineer can remember the key structures and embedded semantics of the program, they exist only in that engineer's mind and are not readily available to other engineers.
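
To make that concrete, here is a small, hypothetical sketch in Python (the routine, names, and rationale are invented for illustration and are not taken from Basili's paper). The comments do not restate what the code does; they record the reasoning a maintainer would otherwise have to reconstruct from scratch:

```python
def running_mean(samples):
    """Incremental mean of a stream of floats.

    Recovered reasoning, recorded so the next reader need not redo it
    (hypothetical example):
      - the incremental update is used instead of sum(samples) / len(samples)
        because callers pass very long, lazily generated streams that we do
        not want to hold in memory;
      - an empty stream yields 0.0 rather than an error, because a legacy
        caller treats "no data" and "zero signal" the same way.
    """
    mean = 0.0
    for n, x in enumerate(samples, start=1):
        mean += (x - mean) / n  # standard incremental-mean update
    return mean
```

None of those decisions are visible in the code alone; once they are written down, they stop being oral tradition.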

I can't help but make a small digression. My generation of software engineers fell prey to a social pathology that may or may not still exist in practice today. Management at the time was clueless about non-executable project artifacts (with the possible exception of the military-industrial complex) and hence required little beyond properly functioning code. The result was that maintenance software engineers would need to consult the more senior engineers who first wrote the code. There was obvious pride of authorship, which is good, but also a clear sense of superiority and of power that these more senior engineers felt in being sought out. I would like to say they were all paragons of maturity, but the industry seemed to attract a somewhat emotionally stunted type of person who did not always hold the organization's best interests as their greatest imperative. I'm not exaggerating when I say some were pompous asses who would lord their knowledge over the newbie and exact great stress as a form of rite of passage. It is little wonder that maintenance programming was a phase of their careers that engineers sought to end as soon as possible.

My ultimate interest in this essay, though, is not the reverse engineering process but how the essay brings out the SEI's thesis. The code in question clearly did not exhibit high modifiability if answering a straightforward question required this much analysis. The code was highly tuned to perform well. The quality of performance was far more important than the quality of modifiability, and the design choices clearly show that. This is largely self-evident to anyone who understands the context of a math routine in a larger system: it will be very stable once it functions properly and exhibits the proper blend of qualities. Therefore the increased maintenance cost for the few changes or enhancements that future generations may find for this code can be accepted as a trade-off for the increased speed and efficiency of the more complicated algorithm. The structure of this code was obviously highly influenced by this quality requirement.
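
As a minimal, hypothetical illustration of that trade-off (this is not the routine from Basili's paper), compare a sine computation whose intent is self-evident with a hand-tuned approximation whose derivation, valid range, and error bound a maintainer must reconstruct:

```python
import math

def sine_readable(x):
    """Intent is obvious; modifiability is essentially free."""
    return math.sin(x)

def sine_tuned(x):
    """Hand-unrolled polynomial approximation of sine.

    The coefficients here are simply truncated Taylor terms, used only for
    illustration; real tuned routines bake in minimax coefficients, range
    reduction, and error budgets that exist nowhere but the code and the
    original author's notes. Valid only for small x.
    """
    x2 = x * x
    return x * (1.0 + x2 * (-1.0 / 6.0 + x2 * (1.0 / 120.0 - x2 / 5040.0)))
```

The second version is the kind of code Basili had to reason about: every decision that made it fast also made it opaque, and that opacity is the maintenance cost the trade-off accepts.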

I first encountered Basili in my software metrics class and misunderstood his oeuvre from that context. I am now coming to understand his contribution to software engineering and am beginning to see him in the same class as Parnas. Why did it take me so long to discover him?
