Monday, May 13, 2019

The Semantics of "Bug" in Empirical Software Engineering

In my graduate studies I regularly read papers and hear people talk about software bugs. But I often find that the word bug carries a very specific, and often unstated, meaning in their work. This short essay voices some of my concerns about what this means for serious software engineering research.

Perhaps the best comment about the meaning of the word bug comes from Dijkstra, who famously railed against its use at all. The term, perhaps apocryphally, came into use because of the physical nature of post-war computers, whose relays could be fouled by insects. It stuck as a way of talking about something that causes an algorithm to behave in an unintended way. But it carries the unfortunate connotation of a logic error that is deus in machina, not an error in thought but something that could not have been foreseen. That provides far too large a loophole for algorithm designers, distancing them from their own lack of rigor.

In contemporary research the most common definition of a bug is a reported defect or error in some bug tracking system. That can be almost anything, depending upon the context of the system and the human process by which those reports are created. It can include such obvious failures as an abnormal end, a crash in which the operating system terminates the program uncontrolled, or potentially even a crash of the operating system itself. It could be some unquantified quality that is missing, such as too low an MTTF because the sources of errors are difficult to find. It could be some expected behavior of the system that was not met but, on reflection, was always an expectation of the client that commissioned the system's creation. It can also include defects that exist only because of a change in the environment or requirements that was not anticipated at the time of the original creation. These should never be judged as bugs of the same category, but the practicalities of empirical software engineering force them to become equal. I personally find it difficult to do any meta-analysis of this research, and I doubt the results of individual papers that do not properly consider the heterogeneity of the data sources used. Can we find some agreement on the semantics of the word rather than allow it to become any observed deviation from what the end user expects? I think the answer is clearly yes. But it opens a Pandora's box of semantics, as it forces us to deal with design and specification, not merely the implementation.
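To make that heterogeneity concrete, here is a minimal sketch in Python of what a mined-repository study implicitly does when it counts tracker entries as bugs. The report fields and category labels are hypothetical, not drawn from any particular tracker; the point is only that a single undifferentiated count erases exactly the distinctions an analysis may depend on.

    from collections import Counter
    from dataclasses import dataclass

    @dataclass
    class Report:
        id: int
        summary: str
        category: str  # rarely recorded in real trackers; assigned here by hand

    # Four very different kinds of "bug", all of which land in the same tracker.
    reports = [
        Report(1, "Service crashes on malformed input", "crash"),
        Report(2, "MTTF below the contractual target", "quality shortfall"),
        Report(3, "Client expected a batch export that was never built", "unmet expectation"),
        Report(4, "Parser breaks under a new OS release", "environment change"),
    ]

    # What most mined datasets give us: one undifferentiated bug count.
    print(len(reports))

    # What a meta-analysis would actually need: counts by kind of deviation.
    print(Counter(r.category for r in reports))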

A formal definition of a bug would probably be some observable deviation of behavior from what was specified. But outside the realm of formal methods there are almost never well-formed specifications that can support this metric. For functional behavior such specifications are infrequent enough, but for what used to be called "non-functional requirements" (software qualities) they are rarer still outside a few domains such as military or aerospace systems. Even when they exist, I believe the will to keep records relating a particular code change to the quality specifications within its scope is lacking in most commercial products.
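As a toy illustration of "bug as an observable deviation from a specification," consider the sketch below. The function and its specification are invented for this example; the point is that the deviation can be checked mechanically and attributed to a specific code unit, which is exactly what a mined tracker entry cannot guarantee.

    def spec_holds(xs: list[int], ys: list[int]) -> bool:
        """Specification: ys is xs sorted ascending (same elements, same multiplicity)."""
        return ys == sorted(xs)

    def my_sort(xs: list[int]) -> list[int]:
        # Deliberately faulty implementation: converting to a set drops duplicates.
        return sorted(set(xs))

    inp = [3, 1, 3, 2]
    out = my_sort(inp)
    # Prints False: an observable, attributable deviation from the stated specification.
    print(spec_holds(inp, out))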

One technique that offers a different approach is Test Driven Development (TDD), which packages a code unit with a series of tests that become part of the automated regression testing. This addresses the shortcoming by providing an easily identified specification to match to the code unit. But I have never heard of a regression testing system that includes the more difficult kinds of tests, such as capacity testing or performance testing. And of course testing for usability or maintainability is so difficult that I doubt anyone tries to include it in regression testing. So while TDD offers some promise, it is far from providing a framework for the kind of empirical study that the industry could benefit from.
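A small sketch of why that is, using pytest and an invented code unit: the functional test below is cheap to write and stable to run, while even a crude performance test is machine-dependent and arbitrary, which is one reason quality attributes rarely survive in a regression suite.

    import time

    def parse_record(line: str) -> dict:
        # Hypothetical unit under test.
        key, _, value = line.partition("=")
        return {key.strip(): value.strip()}

    # Functional regression test: the TDD-style executable specification.
    def test_parse_record_functional():
        assert parse_record("name = Ada") == {"name": "Ada"}

    # Crude performance "test": the budget is arbitrary and the timing is
    # machine-dependent, so tests like this tend to be dropped from the suite.
    def test_parse_record_performance_budget():
        start = time.perf_counter()
        for _ in range(100_000):
            parse_record("name = Ada")
        assert time.perf_counter() - start < 1.0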

In my conversations with Silicon Valley software engineers I have noticed a distinct trend away from anything that even bears a resemblance to the old waterfall methodologies. TDD may be the last vestige of a process that requires thought about how to articulate the behavior that is desired. This has been spurred by the strong reaction against waterfall methodologies that consulting practitioners brought to the industry. I have not yet read anything that uses the words I do to contrast the two approaches to system creation, so let me explain the contrast I see.

Waterfall was big design up front. It required deep analysis and frequently caused paralysis through analysis, as everyone became afraid of making an error. Agile broke that logjam by insisting on fast turnaround and the delivery of function in small increments. It encouraged an organic approach in which some prototype of the function was continually reworked and accreted function until the gap between the desired system and the delivered prototype was small enough to "satisfice" and the result was accepted into production.

But the Agile process has an inherent defect: it discourages deep thought about global issues of system design. For small systems, or systems that do not require integration later, this is perfectly acceptable. However, many large systems that will accrete their functionality over a long period of time in a partly unknown domain will not lend themselves to the kind of immediate insight that allows for prescient decision making from the first prototype. There are some qualities of a system that are only evident in the whole and cannot be found in the constituent parts. These can be emergent, or they can be qualities impaired by the failure of a single component. We use the term "software architecture" when we try to discuss design issues of the large ensemble of components that comprise a system, and a failure to properly appreciate the interconnectedness of these components at the beginning can lead to some very painful refactoring later in the project. It is this trend toward needed refactoring, coupled with management's reluctance to acknowledge and fund the refactoring by denying the technical debt the system has accumulated, that sets the stage for the blame that follows. Software engineers will call this a management failure, but management will call it an engineering failure, as the relationship between the incremental functionality and the cost of implementing that functionality diverges because of that refactoring. At its most dysfunctional, an existing system will be abandoned and rebuilt, while a project not yet implemented may be canceled.

So I argue that use of the term "bug" can be indicative of a software engineer who lacks the professional maturity that comes only after many years of watching these forces play out on real projects. I have a tendency to extrapolate from my own experience to see this in current software engineering practice. But I hear enough stories to convince myself that no one has found a magic bullet that lets a software engineer work with a process in any way analogous to those used in other engineering fields. The reasons for this are interesting in their own right but beside the point; we must accept that real software quality cannot be attained when the focus on the delivered software product is limited to the most easily, and often most badly, quantified measures, measures that sweep the subtlety of software defects away in an effort to use an existing dataset and avoid the time and expense of data gathering.
