Saturday, May 4, 2013

The Accretion of Structure in Large, Dynamic Software Systems: A Socio-Technical View

On the outside chance there is really anyone following this blog, apologies. If you acknowledge reading this I will be more inclined to post more often. But this blog serves as my sounding board for my developing research thoughts. The title of this post could potentially be a dissertation but it is too ambitious to be mine. Yet it does indicate the direction my thoughts are taking.

Large software systems that reach any great level of popularity do not fall from the sky fully formed. They may have germinated from the seed of some idea that one individual had or they may have been created to solve some problem. Either way, that is only the origin story of the product. Reaching its mature form required the efforts of many people over some period of time, and we classically refer to this effort as a project. A project has the creation of the product as its end goal. Once that goal is achieved, the classical view is that some organization, perhaps one created at the same time as the product, will take ownership of the product and use that asset for the betterment of the organization. If the product is the sole asset of that organization, the organization's fortune will be tied to the product's life-cycle, and the organization will cease to exist when the product is retired.

Contemporary software engineering research owes a debt of gratitude to the open source movement. The benefit of the movement to research is the open nature of the work. Rather than guard the artifacts of software development behind the shroud of proprietary secrets, the open source community thrives on an atmosphere of complete transparency. Even better, it has a tradition of maintaining these records for posterity, making longitudinal study possible in a way that is unthinkable with for-profit corporate development. Among the oldest and best known of these efforts is the Apache web server and the organization it spawned. It is this organization, and the many products that can be tied to it, that have been my latest object of research.

The history of the Apache Software Foundation is well known. It has matured as an organization over the years and epitomizes a form of software creation and stewardship that is neither completely altruistic and selfless nor constrained by the commercial marketplace for software. As such, it may, or may not, result in software that is qualitatively different over the product life-cycle from the products of such well known software houses as Microsoft and Oracle. But whether they are different or not, it is possible to study the inception, growth, decay and retirement of these software products in ways that are impossible for commercial products. It is the forces of this evolution that I am drawn to and am looking very closely at in the various Apache projects. (Note that unlike a classic project, an Apache project is really an organization that is responsible for the creation, maintenance and general stewardship of the software product until its retirement.)

A thesis I have is that program correctness is not the dominant concern in ongoing code maintenance that most of my colleagues think it is. Rather, I believe that it is the change in the non-functional behavior of the product that drives more decisions and ultimately predicts the failure of the product. The froth that is seen around the correctness of the product is certainly important. But a product that fails to deliver in performance, scalability, maintainability, adaptability or any of dozens of other qualities will either have to settle for a niche or need to be re-engineered to meet these new challenges.

I say re-engineered in the sense that the narrow focus of most enhancement requests or bug reports does not allow a scope sufficient for the refactoring that is most often needed to achieve these ends. In many cases the skills gained in creating the product result in a functional team that is motivated either to significantly enhance the product to give it those qualities or to leave the product as is and create a new one, via fork or greenfield development, that improves upon the prior product in these key qualities. Often the relationships between these distinct products are overlooked, and the debt one product owes to a prior product is lost in the mists of time.

The history of the Apache Software Foundation is celebrated. But every day there are projects that choose to put their products into retirement because of lack of interest. If other projects were spawned by the project, they are sporadically documented, sometimes in the project archives and communications, and sometimes through the press that covers open source development. What I have lately come to realize is that there is little interest in documenting this history, and as web links rot or repositories are taken offline, history is being lost every day. It is perhaps quixotic to concern myself with trying to prevent this loss, but I do. As such, I am trying to document as many origin stories for Apache projects as I can, in the belief that if I do not, some of this information will eventually become unobtainable. What is harder is to justify this effort in terms of my own research.

No major software product exists that did not require thousands of decisions, some big, most small, made by the team members over time. Software developers are notoriously averse to documentation, so these decisions are as often inferred as observed. This is where Big Data and the statistical methods of empirical software engineering are most helpful. But where the data is not available in a form usable by these techniques, the techniques are useless. Since gathering this data and putting it into a form suitable for quantitative analysis is exceedingly tedious, it is usually not done. In this reluctance to tackle difficult data gathering tasks, I see an opportunity: both to do a deep dive into the more interesting decisions made during software development and their relationship to the form and structure of the software system, and to make a significant contribution to the preservation of some important history that may have value for researchers in the future. Perhaps I am suffering from a touch of hubris in this, but it helps my motivation to feel that my contribution may surpass my own prosaic goals of satisfying institutional requirements for a degree or getting some papers published.

I'll end my post here since the work is ongoing and quite fluid. But this post gives me a short statement that I can share with others who may take an interest in my current direction.

