Friday, May 18, 2012

The Cathedral and the Bazaar

Against all odds, I see hits against this blog. So for those of you who may be following me, I'd like to offer a few words about my absence. I am teaching a course at a local JC titled Introduction to Computer Science, which has no articulation to any four-year CS curriculum. While the book the supervisor of this course has chosen is probably the best in its class, that is very faint praise. These texts fall into one of two camps: they either mirror a first course in computer science, which is often a programming methodology course, or they are computer literacy. I rebel against the thought that this course should fall into either of those two camps. I cannot countenance the thought of making non-CS majors competent at any level of programming to the exclusion of all other topics. Nor do I accept the "dummies" approach to teaching skills that the average high-school student has already mastered. Computer literacy is not something that should earn a college credit, even at a JC. Instead I approach the course as teaching core topics that are resonant with material from other non-CS classes but which also illustrate important concepts, and vocabulary, from CS. I sometimes spend way too much time thinking about this and trying to twist the material to my liking.

Were this all I was doing, I'd still have plenty of time to blog. But as I'm sure I've already mentioned, I start a PhD program in the fall and am looking to ease into that program with the least amount of trauma and stress possible. To that end, I enrolled in the natural language processing course offered online by the Stanford professors. I took the AI class last semester, which did not include programming assignments, so I looked forward to the programming in this course. What I did not count on was the linguistic background this course seems to take for granted. So while I was getting up to speed with the mechanics of downloading their scaffolding code, I was mildly challenged to complete their assignments in the week they were given. For me, two assignments, admittedly the hardest of the course, just became too time consuming to take seriously. The first was the creation of a probabilistic context-free grammar for a restricted lexicon and the other a probabilistic CYK parser that would be trained from a treebank they provide. I love both assignments, but I find that I am not willing to spend the time right now to get an acceptable solution to them to the exclusion of the other things going on. I expect to complete them after the course is through but not in time to get any credit for them. Not that the credit matters to me anyway.

So I am posting today to give my thoughts on an article I just read that, in part, addresses a question P. posed some weeks ago, "What does an open source project lose by not having a traditional project manager?". I think the question is one that deserves a good answer since if I cannot articulate the differences between open source projects and traditional closed source projects, I am not as familiar with this form of organization as I need to be.

In my research into open source software, I came across a paper by Eric Steven Raymond titled "The Cathedral and the Bazaar", first presented in 1997. I suspect this is well known to people more familiar with open source software than I am, but I have a tendency to enjoy the search for beginnings and this looked like a good place to start. What I find in this paper is the articulation of many core beliefs I have about the right way to develop software, set in the context of his own experience developing the fetchmail program. I'll review some of his key points and my thoughts about them.

His first lesson: every good work of software starts by scratching a developer's personal itch. I don't agree with this formulation of the thought but see a basic truth expressed here about why open source software can be so powerful. Systems workers use many tools to do their work: operating systems, compilers, integrated development environments, databases, etc. We become intimately familiar with these tools, including their shortcomings and strengths. In the world before open source software, these were products that needed to be purchased from an organization, mostly for-profit organizations. Prior to the PC, the tools of production were too expensive for the average person to afford, and vendors had an incentive to price them for corporations and not people. This placed the cost of production for software out of reach of a hobbyist or even a garage entrepreneur. But the open source movement, spawned by the transparency brought by UNIX and the dramatic reduction in the cost of hardware, changed that. We now have the ability to change what we don't like in our tools and many incentives to do so. When we have an itch to add a new feature, we now have the ability to scratch it. Those of us who have the requisite skills and motivation can craft our own tools by making small modifications to what is already available. This is very empowering and, I believe, a major motive force behind open source software. Now I just need to find the evidence to support it or refute it.

Where I disagree with this maxim is the suggestion that this broadly applies to all software. Is it reasonable to assume that there will always be a group of people who will have an itch to develop any imaginable piece of software and be able to grow a circle of supporters around it? I find this hard to accept. That would suggest that any commercial product out there could eventually fall to an open source alternative. I'm not yet confident enough to say that this is unrealistic, yet I believe this is treating open source software as a silver bullet, and we all know how successful those have been in the history of software engineering. Yes, a great many software products are likely to be at least partly open source software in the future, but I believe this will be far more graduated than this maxim suggests.

I cannot consider this maxim without also considering Karl Marx. One of the popular conceptions from his work is that the workers should own the means of production. Sadly I have not yet studied his work, and this is something I will want to do to see if it explains some of the open source software movement.

2. "Good programmers know what to write. Great ones know what to rewrite (and reuse)."
On the surface, this maxim acknowledges that great programmers recognize good code and see how to reuse it. What I immediately get out of this is how other programmers cannot quickly see the value of an existing piece of code and can only understand code that they have written. Alternatively other programmers, in hubris, feel they can always do it better and dismiss code written by others. I think this poses a good research question, "What factors are cited by programmers who have access to other similar code when they duplicate functions?". I also believe, without good evidence, that better programmers are better designers and have greater powers of abstraction for seeing how existing components can be reassembled for a novel application.

3. "Plan to throw one away: you will anyhow." from The Mythical Man-Month by Brooks
This maxim is often cited but I have not yet seen a good theory for why this should be so. In my opinion, this happens because the design space is novel to the programmer and the process of creating the first version is an exploration of that space. Often early design decisions necessitate later design choices and block others. It is not uncommon for the programmer to see a better alternative only after investing a significant amount of effort, by which point the refactoring it would require meets resistance. The second time, if starting from a blank page, the programmer will avoid the bad decisions of the first construction while retaining the good decisions made in that design.

4. "If you have the right attitude, interesting problems will find you."
It is not clear from his essay what this maxim really means to him. It can mean that embracing a collaborative mindset will make you open to seeing possibilities for interesting work. I am not sure this is so much a software engineering maxim as it is a philosophy of life. There is so much need for high-quality software that the opportunities are boundless. Yet few people find software development interesting. Even those who do can take a parochial attitude that if they are going to do work, it should be for a company that will pay them for their effort. In software, we are lucky enough to enjoy a lifestyle that both rewards us monetarily and offers us an opportunity to do what we love. But as for artists or philosophers, the show Cabaret tells us what happens to love for something when there is no money: "...and the fat little pastor tells you to love evermore, and the hunger comes a rat-tat-a-tat at your window, and your love flies out the door. Money makes the world go around..."

5. "When you lose interest in a program your last duty to it is to hand it off to a competent successor."
How is this software engineering? This is social consciousness 101; it applies to any work for the public good and is seen on boards all across the country. Many a good organization was built by a competent lead and then languished under the leadership, or lack thereof, of subsequent people. If you really love something, you have an obligation that goes beyond your own interests.

6. "Treating your users as co-developers is your least-hassle route to rapid code improvement and effective debugging."
There may be a software engineering truth here but first let's separate the business truth. No business can be assured of continued customer goodwill if they do not treat their customers with respect. The organization must create value for its customers or else it will be vulnerable. The world is not static and an organization has probably never gotten the function set exactly right in the first place. Either an organization will be responsive to its customers or not. If it is not responsive to its customers, it is vulnerable. Where is the software engineering principle that is separate from the business truth?

Customer intimacy is practiced by organizations that are committed to being responsive. In open source software, the customers are the users, who are often potential developers or at least valued members of the "halo". The collapsing of these relationships dramatically reduces the communications noise and filtering that would exist in a closed source environment. There is no product manager, or any management for that matter, to block or alter a message. Since the communication comes from someone who is more invested in the product, they are willing to give more to see that feature added. This can take the form of extended discussion to work out details of behavior, prototyping, or testing. For many reasons, this is often not the case in closed source software. In an open source project, these additional contributions are made at the right place and time to speed the work of the developer.

One aspect that is not discussed in the paper, though, is the decision-making process that a suggestion must go through. In a closed source environment, this is part of the organization's processes. I understand there are some standard processes to be found in open source software (OSS) projects, but the reality is that someone must decide whether a suggestion is acted upon if the requester does not have the skills needed to make the addition themselves. Earlier I talked about how OSS can differ in the self-selected nature of the people who will work on a project. They are often drawn to work on the tools of their own production. What if the product were a legal database system that served the needs of lawyers? While developers who earned a living consulting to law firms would have a vested interest in the product, the end-users are the lawyers. How would this work in an OSS environment? Would lawyers themselves be posting their suggestions? Would they be articulate enough to provide actionable specifications for the new features they need? Would they remain engaged with the developer long enough to see their suggestion developed into a product feature? These are all questions in my mind regarding the applicability of OSS outside the domain of systems software.

7. "Release early. Release often. And listen to your customers."
I think "listen to your customers" is a repeat of a prior point.

This software engineering maxim has proven its worth but I don't think this essay really explains why. Nothing demonstrates an organization's commitment to its customers better than an immediate response. OSS projects differ from ordinary organizations in that the person who reports a problem is likely to also be the person who submits the proto-solution, whether for a bug fix or for an enhancement. Whatever the form of leadership, its job is far easier than in a closed source software environment. First, the submitter is motivated to submit a working solution and has most likely demonstrated it in their own environment, so the OSS project has less of an investment to make. Second, the philosophy of the project is to get the new code in front of as many eyeballs as possible as quickly as possible. There is no expectation of exhaustive quality assurance before the code is seen by a set of users who are inclined to use the test version. The more the code is exercised, as a form of black box testing, and examined by other programmers, as a form of white box testing or static analysis, the faster any questionable code will be found. This keeps the path to production streamlined and imposes a caveat emptor on the product, pushing more responsibility onto users to test the product against their own acceptance criteria.

While I can see the benefit on the customer side, what I believe is the bigger justification for this maxim is the impact it has on the development side. Long product life-cycles were the norm in the traditional waterfall methodologies.
