Tuesday, February 14, 2012

On the efficacy of software quality metrics

For the past decade at least, the dialogue about software quality has been intense. Since one way of looking at a problem says that if you can't measure it, you can't be sure it exists, there has also been an inquiry into software quality metrics and the decomposition of software quality into its components. It has now been several generations since the Japanese clearly demonstrated that the quality of automobiles can be analyzed and understood, and can lead to actionable management strategies that yield a product that both scores high on these quality metrics and is judged a quality product in the marketplace. This is the topic I will be looking at, at least for the next few weeks.

To be honest, I eventually found the Basili book a bit tiresome. In my metrics class we had already covered his GQM methodology, and I get it. The remainder of the essays seems to offer a historical perspective on how he arrived at that formulation. The book does end with an essay that hints at things beyond this idea, but I think those thoughts will naturally work into this topic.

As I imagine is the norm for contemporary researchers these days, my first stop this morning was Google to see what pops up for the term "software quality metrics." As is no surprise to most people who care, Wikipedia occupied the first few results, which I promptly ignored. The first non-trivial result was from the site www.developer.com, specifically a post called Software Quality Metrics [1]. Since it acknowledges the work of SEI and Watts Humphrey as well as the work on TQM, it passes my filter for serious consideration.

They quickly acknowledge that quality is a multi-dimensional attribute. A good start. They claim that IBM measures customer satisfaction along 8 dimensions:

  1. capability or functionality, 
  2. usability, 
  3. performance, 
  4. reliability, 
  5. installability, 
  6. maintainability, 
  7. documentation, and 
  8. availability.

While the authors are comfortable with this immediate link between software quality metrics and the dimensions, I want to take a moment to deconstruct this for myself.

There is no doubt that in the marketplace, customer satisfaction is an important metric. You'll need to pardon my cynicism on the topic, however, since I know how shamelessly companies manipulate this metric in the commercial setting. Yet at a conceptual level I concede that any company that delivers a product judged to be of inferior quality (whatever that means at this point) cannot maintain a high customer satisfaction rating for long.

But customer satisfaction is itself a multi-faceted attribute, and there are components that have little to do with product quality, depending upon how you define the product. Sales force honesty and integrity, post-sales support, and the cultural fit between the company and the customer can all affect reported satisfaction yet have nothing to do with the software artifact. Since the given taxonomy does not seem to include those things but instead focuses on mostly the same qualities as the SEI SAPP taxonomy, I'll assume this was handled elsewhere. I am still dropping a red flag on this topic, since I will want to understand how we connect customer satisfaction to perceived product quality. Indeed, I will want to see how they even define the product.

The authors do take the time to discuss fitness for use as an important dimension. What I don't see in this article on first read is a good discussion of the various stakeholders and their unique evaluations of the quality of the product. An end user will be the single most important stakeholder to a packaged software company selling a mass-produced shrink-wrap product. But even this is moderated by other qualities that are of no interest to the customer, such as profitability, maintainability, supportability, testability, or extensibility. These qualities are important to the ownership stakeholders: shareholders, management, and staff. Product management is ideally centrally involved in the design tradeoffs needed when balancing these two different sets of qualities. Time-to-market and unity of design can easily be in opposition, and a reasonable decision must be made to ensure the optimal balance for that organization, with that product, at that point in time.

I want to take a moment to drill into how customer-perceived satisfaction changes over time. Complex products are not mastered in the first few hours of use but instead create a curve of frustration and satisfaction. The IBM taxonomy uses documentation as a dimension. At Andersen 20 years ago we stopped talking about documentation and instead stressed integrated performance support. To speak of documentation is to assume an artifact separate from the product itself. My position is that whatever documentation must be provided should be delivered in the most integrated manner possible, so as to minimize any difference between the two artifacts. At its best, the user does not perceive a separation between them.

What I'm also mindful of in reading this article is how the authors do not challenge the taxonomies of quality. A central point of SEI's treatment of quality requirements is how ultimately meaningless the taxonomy becomes. The ideal statement of a quality property is a metric against some use case. When you accept this as the ultimate statement of the quality requirement, the hierarchy by which these are aggregated becomes less meaningful for a researcher. In fact, you can have several different taxonomies, in addition to other tree structures, to organize any number of metrics. Unless I am driven away from this approach, I will cling to SEI's way of looking at this, since it naturally places the emphasis on the metrics and defers the sometimes endless semantic discussion of which category a particular metric should be placed into.
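To keep myself honest about what "a metric against some use case" looks like, here is a minimal sketch in Python. The names and category tags are my own illustration, not SEI's notation; the point is simply that the same metric can be filed under several taxonomy branches without its meaning changing.

from dataclasses import dataclass, field

@dataclass
class QualityMetric:
    # A quality requirement stated as a measurable target against a use case.
    use_case: str      # the scenario being measured
    measure: str       # what we measure
    target: str        # the pass/fail threshold
    categories: set = field(default_factory=set)  # taxonomy tags; deliberately plural

# Illustrative only: one metric that could live under "performance",
# "usability", or "capacity" depending on which taxonomy you prefer.
report_latency = QualityMetric(
    use_case="operator submits the end-of-day report under peak load",
    measure="95th-percentile response time",
    target="<= 2 seconds",
    categories={"performance", "usability", "capacity"},
)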

The authors suggest that IEEE's formulation of quality [2] is derivative of VOC and QFD. While I also studied these in my metrics class, I have not looked at IEEE's interpretation. The standard should provide good guidance on how to roll the metrics up into a projected customer-satisfaction number, or how to drill down from customer statements into the specific metrics that drive the process, but I'll reserve judgment until I get a chance to review it. As they say, "TQM methodology is based on the teachings of such quality gurus as Philip B. Crosby, W. Edwards Deming, Armand V. Feigenbaum, Kaoru Ishikawa, and Joseph M. Juran."
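Until I have read the standard, the mechanics I imagine are nothing more exotic than a weighted rollup. A toy sketch, assuming normalized 0-to-1 scores on the IBM dimensions listed above and placeholder equal weights; this is my guess at the shape of the calculation, not IEEE's method:

def rollup(scores, weights):
    # Weighted average of normalized (0..1) dimension scores into one number.
    total = sum(weights[d] for d in scores)
    return sum(scores[d] * weights[d] for d in scores) / total

scores = {"capability": 0.9, "usability": 0.7, "performance": 0.8,
          "reliability": 0.95, "installability": 0.6, "maintainability": 0.8,
          "documentation": 0.5, "availability": 0.99}
weights = {d: 1.0 for d in scores}   # placeholder: equal weights
print(round(rollup(scores, weights), 2))   # projected satisfaction index, 0..1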

The authors present some classic material on defect correction before a section they title Software Science. I am intrigued. They say, "In 1977 Professor Maurice H. Halstead distinguished software science from computer science by describing programming as a process of collecting and arranging software tokens, which are either operands or operators." I am embarrassed to admit I have not yet studied Halstead's work. The authors give a high-level review of some of his metrics. What I note in passing is that I have not yet heard of anyone who has attempted to apply these concepts to the requirements engineering phase. I am inclined to think that a simple count of the tokens in a requirements document may be a reasonable place to begin with requirements metrics. Function point analysis has a long history, but it was also reviled in the communities I worked in. However, now as an academic, I need to reopen my mind to this material and see what I think.
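So that I have something concrete to play with, here is a minimal sketch of Halstead's basic measures. The formulas (vocabulary, length, volume, difficulty, effort) are the standard ones; the toy token lists are my own, and deciding what counts as an operator versus an operand in a requirements document would be the real work.

import math

def halstead(operators, operands):
    # operators/operands are flat lists of every token occurrence in the artifact
    n1, n2 = len(set(operators)), len(set(operands))   # distinct operators / operands
    N1, N2 = len(operators), len(operands)             # total occurrences
    vocabulary = n1 + n2
    length = N1 + N2
    volume = length * math.log2(vocabulary) if vocabulary > 1 else 0.0
    difficulty = (n1 / 2) * (N2 / n2) if n2 else 0.0
    effort = difficulty * volume
    return {"vocabulary": vocabulary, "length": length, "volume": volume,
            "difficulty": difficulty, "effort": effort}

# toy example: tokens from the expression  total = price * qty + tax
print(halstead(operators=["=", "*", "+"], operands=["total", "price", "qty", "tax"]))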

The authors give a brief overview of cyclomatic complexity.
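For my own notes, the idea reduces to counting the decision points in a routine and adding one. A rough Python approximation over the syntax tree follows; a faithful McCabe count would treat compound boolean conditions and case arms more carefully, so this is a sketch, not a reference implementation.

import ast

# Node types that add an independent path through the code (approximate).
DECISIONS = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp, ast.BoolOp)

def cyclomatic_complexity(source):
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, DECISIONS) for node in ast.walk(tree))

sample = '''
def classify(x):
    if x < 0:
        return "negative"
    elif x == 0:
        return "zero"
    for _ in range(x):
        pass
    return "positive"
'''
print(cyclomatic_complexity(sample))   # 4: two branches, one loop, plus 1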

The authors have one paragraph that I've read 3 times without feeling certain I have a complete grasp of the point they are making:

Availability and Customer Satisfaction Metrics
To the end user of an application, the only measures of quality are in the performance, reliability, and stability of the application or system in everyday use. This is "where the rubber meets the road," as users often say. Developer quality metrics and their assessment are often referred to as "where the rubber meets the sky." This article is dedicated to the proposition that we can arrive at a priori user-defined metrics that can be used to guide and assess development at all stages, from functional specification through installation and use. These metrics also can meet the road a posteriori to guide modification and enhancement of the software to meet the user's changing needs. Caution is advised here, because software problems are not, for the most part, valid defects, but rather are due to individual user and organizational learning curves. The latter class of problem places an enormous burden on user support during the early days of a new release. The catch here is that neither alpha testing (initial testing of a new release by the developer) nor beta testing (initial testing of a new release by advanced or experienced users) of a new release with current users identifies these problems. The purpose of a new release is to add functionality and performance to attract new users, who initially are bound to be disappointed, perhaps unfairly, with the software's quality. The DFTS approach we advocate in this article is intended to handle both valid and perceived software problems.

I am inclined to agree that it should be possible to develop customer satisfaction targets a priori and use those to guide development through the creation of various metrics. This is a tall order. This paragraph also makes the point I observed earlier: product evaluation is not immediate but is best represented as a time series.

Their next section talks about the current state of metrics. They cite this book as their primary source:

About the Source of the Material

Design for Trustworthy Software: Tools, Techniques, and Methodology of Developing Robust Software
By Bijay Jayaswal and Peter Patton

Published: Aug 31, 2006; Hardcover: 840 pages
Copyright 2007 Pearson Education, Inc.
ISBN: 0131872508
Retail price: $64.99

I found it used on Amazon for $10, so I'll have it next week. I'll save my review of that material until after I've read their source.

REFERENCES
[1] http://www.developer.com/tech/article.php/3644656/Software-Quality-Metrics.htm
[2] IEEE, Standard for a Software Quality Metrics Methodology (New York: IEEE, Inc., 1993)

MY TODO LIST
Review VOC, QFD, and IEEE's Standard for a Software Quality Metrics Methodology [2]
Study Halstead's software science and function point analysis (International Function Point Users Group Standard, IFPUG, 1999)

