In today's symposium, a comment was tossed to me concerning a question of programmer productivity. In the discussion, P drew a diagram of the economic 1% (x-axis: income; y-axis: number of people) and compared it to the distribution of contributions in a typical open source project (x-axis: contributions; y-axis: number of people). Why, he asked, should the shapes of these two curves be so similar? U suggested that it was because of the lower cost of creation that this 1% experienced; that their ability to crank out code faster gave them an advantage. V suggested that it was due to the distribution of talent in the population. I objected to the suggestion that talent was the primary factor, and that led P to suggest that some basic research into the factors that influence contributions to open source projects is in order. Perhaps he's right.
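To make the comparison concrete, here is a minimal sketch, with an invented shape parameter, that samples per-person "output" (income, or contributions to a project) from a heavy-tailed Pareto distribution, a common model for both, and reports the share produced by the top 1%:

    import numpy as np

    # A minimal sketch: sample per-person "output" (income, or contributions
    # to a project) from a heavy-tailed Pareto distribution. The shape
    # parameter is invented for illustration, not an empirical fit.
    rng = np.random.default_rng(0)
    output = np.sort(rng.pareto(a=1.5, size=100_000) + 1)

    top_1_percent = output[-len(output) // 100:]
    share = top_1_percent.sum() / output.sum()
    print(f"The top 1% account for {share:.0%} of the total output")

Note that many different generative mechanisms produce heavy tails like this, which is exactly why the similarity of the two curve shapes alone cannot settle the debate between the cost-of-creation and talent explanations.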
For anyone who has worked in a software shop, we easily accept the oft-made assertion that there is a factor of ten between the productivity of a good programmer and that of a poor programmer. I, like I suspect most people who have written code, believe that there is clearly a talent component in one's ability to perform the kind of abstract reasoning and design needed to create computer code. However, I also see people generalize far too quickly and accept these assertions uncritically. I, for one, can suggest many reasons that could explain differences in contributions in addition to, and possibly instead of, talent.
In my own experience, my periods of highest coding productivity happened when coding was my full-time job. It seems pretty self-evident that if all you are doing is coding, day after day, you get pretty good at it. This alone can explain a lot of variation in open source software contribution.
Not everyone who contributes to an open source project may be a full-time coder. (Q: How many hours a week does a contributor spend coding? How many of those are for the open source project?)
Even when a contributor is a full-time coder, there may be differences between the coding environment of the open source project and the environment of their other coding work. Unlike a shop where everyone is using a small set of languages, possibly only one, with significant support for that language, an open source project will pull in a wide variety of people using a wide variety of languages at various levels. Where the contributor either works full-time on the open source project or happens to code in the same environment in their other work, this may contribute to high contribution levels. (Q: When not all coding is for the open source project, what is the environment of the open source project? What is the environment for the other coding work? Language, version, IDE, etc.)
Friday, May 18, 2012
The Cathedral and the Bazaar
Against all odds, I see hits against this blog. So for those of you who may be following me, I'd like to offer a few words about my absence. I am teaching a course at a local JC titled Introduction to Computer Science which has no articulation to any four-year CS curriculum. While the book the supervisor of this course has chosen is probably the best in its class, that is very faint praise. These texts fall into one of two camps: they either mirror a first course in computer science, which is often a course in programming methodology, or they teach computer literacy. I rebel against the thought that this course should fall into either of those two camps. I cannot countenance the thought of making non-CS majors competent at any level of programming to the exclusion of all other topics. Nor do I accept the "dummies" approach to teaching skills that the average high-school student has already mastered. Computer literacy is not something that should earn a college credit, even at a JC. Instead I approach the course as teaching core topics that resonate with material from other non-CS classes but which also illustrate important concepts, and vocabulary, from CS. I sometimes spend way too much time thinking about this and trying to twist the material to my liking.
Were this all I am doing, I'd still have plenty of time to blog. But as I'm sure I've already mentioned, I start a PhD program in the fall and am looking to ease into that program with the least amount of trauma and stress possible. To that end, I enrolled in the natural language processing course offered online by the Stanford professors. I took the AI class last semester, which did not include programming assignments, so I looked forward to the programming in this course. What I did not count on was the linguistic background this course seems to take for granted. So while I was getting up to speed with the mechanics of downloading their scaffolding code, I was mildly challenged to complete their assignments in the week they were given. For me, two assignments, admittedly the hardest of the course, just became too time consuming to take seriously. The first was the creation of a probabilistic context-free grammar for a restricted lexicon and the other a probabilistic CYK parser that would be trained from a treebank they provide. I love both assignments, but I find that I am not willing to spend the time right now to get an acceptable solution to them to the exclusion of the other things going on. I expect to complete them after the course is through, but not in time to get any credit for them. Not that the credit matters to me anyway.
So I am posting today to give my thoughts on an article I just read that, in part, addresses a question P. posed some weeks ago, "What does an open source project lose by not having a traditional project manager?". I think the question is one that deserves a good answer since if I cannot articulate the differences between open source projects and traditional closed source projects, I am not as familiar with this form of organization as I need to be.
In my research into open source software, I came across a paper by Eric Steven Raymond titled "The Cathedral and the Bazaar" from about 1997. I suspect this is well known to people more familiar with open source software than I am, but I have a tendency to enjoy the search for beginnings and this looked like a good place to start. What I find in this paper is the articulation of many core beliefs I have about the right way to develop software, set in the context of his own experience developing the fetchmail program. I'll review some of his key points and my thoughts about them.
His first lesson: every good work of software starts by scratching a developer's personal itch. I don't agree with this formulation of the thought, but I see a basic truth expressed here about why open source software can be so powerful. Systems workers use many tools to do their work: operating systems, compilers, integrated development environments, databases, etc. We become intimately familiar with these tools, including their shortcomings and strengths. In the world before open source software these were products that needed to be purchased from an organization, mostly for-profit organizations. Prior to the PC, the tools of production were too expensive for the average person to afford, and corporations had an incentive to price them for corporations and not people. This placed the cost of production for software outside the hands of a hobbyist or even a garage entrepreneur. But the open source movement, spawned by the transparency brought by UNIX and the dramatic reduction in the cost of hardware, changed that. We now have the ability to change what we don't like in our tools and many incentives to do so. When we have an itch to add a new feature, we now have the ability to scratch it. For those who have the requisite skills and motivation, we can craft our own tools by making small modifications to what is already available. This is very empowering, and I believe a major motive force behind open source software. Now I just need to find the evidence to support it or refute it.
Where I disagree with this maxim is the suggestion that it broadly applies to all software. Is it reasonable to assume that there will always be a group of people who will have an itch to develop any imaginable piece of software and be able to grow a circle of supporters around it? I find this hard to accept. That would suggest that any commercial product out there could eventually fall to an open-source alternative. I'm not yet confident enough to say that this is unrealistic, yet I believe this is treating open source software as a silver bullet, and we all know how successful those have been in the history of software engineering. Yes, a great many software products are likely to be at least partly open source software in the future, but I believe this will be far more gradual than this maxim suggests.
I cannot consider this maxim without also considering Karl Marx. One of the popular conceptions from his work is that the workers should own the means of production. Sadly I have not yet studied his work, and this is something I will want to do to see if it explains some of the open source software movement.
2. "Good programmers know what to write. Great ones know what to rewrite (and reuse). "
On the surface, this maxim acknowledges that great programmers recognize good code and see how to reuse it. What I immediately get out of this is how other programmers cannot quickly see the value of an existing piece of code and can only understand code that they have written themselves. Alternatively, other programmers, in their hubris, feel they can always do it better and dismiss code written by others. I think this poses a good research question: "What factors are cited by programmers who have access to other similar code when they duplicate functions?" I also believe, without good evidence, that better programmers are better designers and have greater powers of abstraction for seeing how existing components can be reassembled for a novel application.
3. "Plan to throw one away: you will anyhow." from Mythical Man-Month by Brooks
This maxim is often cited, but I have not yet seen a good theory for why it should be so. In my opinion, this happens because the design space is novel to the programmer and the process of creating the first version is an exploration of that space. Often early design decisions necessitate later design choices and block others. It is not uncommon for the programmer to recognize an alternative design only after a significant amount of effort, and then to resist the refactoring of the code that would be needed. The second time, starting from a blank page, the programmer will avoid the bad decisions of the first construction while retaining the good decisions made in that design.
4. "If you have the right attitude, interesting problems will find you."
It is really not clear from his essay what this maxim really means to him. It can mean that embracing a collaborative mindset will make you open to seeing possibilities for interesting work. I am not sure this is so much a software engineering maxim as a philosophy of life. There is so much need for high-quality software that the opportunities are boundless. Yet few people find software development interesting. Even those who do can take a parochial attitude that if they are going to do work, it should be for a company that will pay them for their effort. In software, we are lucky enough to enjoy a lifestyle that both rewards us monetarily and offers us an opportunity to do what we love. But as for artists or philosophers, the show Cabaret tells us what happens to love for something when there is no money: "...and the fat little pastor tells you to love evermore, and the hunger comes a rat-tat-a-tat at your window, and your love flies out the door. Money makes the world go around..."
5. "When you lose interest in a program your last duty to it is to hand it off to a competent successor."
How is this software engineering? This is social consciousness 101; it applies to any work for the public good and is seen on boards all across the country. Many a good organization was built by a competent lead and languished under the leadership, or lack thereof, of subsequent people. If you really love something, you have an obligation that goes beyond your own interests.
6. Treating your users as co-developers is your least-hassle route to rapid code improvement and effective debugging.
There may be a software engineering truth here, but first let's separate out the business truth. No business can be assured of continued customer goodwill if it does not treat its customers with respect. The organization must create value for its customers or else it will be vulnerable. The world is not static, and an organization has probably never gotten the feature set exactly right in the first place. Either an organization is responsive to its customers or it is not; if it is not, it is vulnerable. Where is the software engineering principle that is separate from this business truth?
Customer intimacy is practiced by organizations that are committed to being responsive. In open source software, the customers are the users, who are often potential developers or at least valued members of the project's "halo". The collapsing of these relationships dramatically reduces the communication noise and filtering that would exist in a closed-source environment. There is no product manager, or any management for that matter, to block or alter a message. Since the communication comes from someone who is more invested in the product, they are willing to give more to see that feature added. This can extend to discussion to work out details of behavior, to prototyping, or to testing. For many reasons this is often not the case in closed-source software; in open source, these additional contributions arrive at the right place and time to speed the work of the developer.
One aspect that is not discussed in the paper, though, is the decision-making process that a suggestion must go through. In a closed-source environment, this is part of the organization's processes. I understand there are some standard processes to be found in open source software (OSS) projects, but the reality is that someone must decide whether a suggestion gets acted upon if the requester does not have the skills needed to make the addition themselves. Earlier I talked about how OSS can differ in the self-selected nature of the people who will work on a project. They are often drawn to work on the tools of their own production. What if the product were a legal database system that served the needs of lawyers? While developers who earned a living consulting to law firms would have a vested interest in the product, the end-users are the lawyers. How would this work in an OSS environment? Would lawyers themselves be posting their suggestions? Would they be articulate enough to provide actionable specifications for the new features they need? Would they remain engaged with the developer long enough to see their suggestion developed into a product feature? These are all questions in my mind regarding the applicability of OSS outside the domain of systems software.
7. Release early. Release often. And listen to your customers.
I think "listen to your customers" is a repeat of a prior point.
This software engineering maxim has proven its worth, but I don't think this essay really explains why. Nothing demonstrates an organization's commitment to its customers better than an immediate response. OSS projects differ from ordinary organizations in that the submitter of an issue is likely to also be the person who submits the proto-solution, whether for a bug fix or for an enhancement. Whatever the form of leadership, its job is far easier than in a closed-source software environment. First, the submitter is motivated to submit a working solution and has most likely demonstrated it in their own environment, so the OSS project has less of an investment to make. Second, the philosophy of the project is to get the new code in front of as many eyeballs as possible as quickly as possible. There is no expectation of exhaustive quality assurance before the code is seen by a set of users who are inclined to use the test version. The more the code is exercised, as a form of black-box testing, and examined by other programmers, as a form of white-box testing or static analysis, the faster any questionable code will be found. This keeps the process streamlined toward production and imposes a caveat emptor on the product, pushing more responsibility onto users to test the product against their own acceptance criteria.
While I can see the benefit on the customer side, what I believe is the bigger justification for this maxim is the impact it has on the development side. Long product life-cycles were the norm in the traditional waterfall methodologies.
Is computer code really a language?
I've already discussed the paper that is the main topic of this post in an earlier entry. I am trying to refine that post and set up for further work in this area.
This post's title is a complete rip-off of a paper I just finished reading (again) titled On the Naturalness of Software by Hindle, Barr, Gabel, Su, and Devanbu, which has been accepted for ICSE 2012. The hypothesis is that code utterances are amenable to the same kind of simple language models that have worked well for NLP over the past decade. The paper suggests that token completion and suggestions in an IDE can be considerably enhanced by the use of a (relatively) simple language model. What catches my interest is that the going-in position is that the "naturalness" of the coder's use of the language must be established. I am glad this is being done, but I find this a nearly established fact in my mind.
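To make the idea concrete, here is a minimal sketch of the kind of simple model the paper has in mind: a bigram model over code tokens that suggests the most frequent next tokens seen in a training corpus. The toy corpus and tokenization below are invented stand-ins, not the paper's actual setup.

    from collections import Counter, defaultdict

    # Toy corpus of tokenized code; a real study would tokenize a large repository.
    corpus = [
        ["for", "(", "int", "i", "=", "0", ";", "i", "<", "n", ";", "i", "++", ")"],
        ["for", "(", "int", "j", "=", "0", ";", "j", "<", "m", ";", "j", "++", ")"],
    ]

    # Count bigrams: how often each token follows each context token.
    bigrams = defaultdict(Counter)
    for tokens in corpus:
        for prev, cur in zip(tokens, tokens[1:]):
            bigrams[prev][cur] += 1

    def suggest(prev_token, k=3):
        """Return the k most frequent next tokens after prev_token."""
        return [tok for tok, _ in bigrams[prev_token].most_common(k)]

    print(suggest("for"))   # ['(']
    print(suggest("int"))   # ['i', 'j']

The striking claim of the paper is that such embarrassingly simple counting works on code even better than it works on English, precisely because coders write far more repetitively than the language requires.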
Clearly computer languages, and the way they are used, are not human languages in the sense that they primarily serve human-to-human communication. The obvious receiver for the messages is the computer, and we know how limited the language capabilities of compilers and interpreters are. But code is written as much for other humans as it is for the computer. The use of white space is mostly compliant with norms in the industry, but when you teach an intro programming class you realize how much of that is a cultural norm and not a requirement of the language. Pretty-print niceties like the vertical alignment of parallel structures, naming conventions, when a temporary result is committed to a variable, and the choices someone makes for helper methods are all hallmarks of an individual's style. When that style diverges from norms and is not consistent or clear, reading the code is simply painful. This alone is enough to convince me that human communication is an inherent property of computer code, and thereby that computer code, with the expressive capabilities it offers, is a natural language in this capacity.
My own research interests are less with computer code itself than with the broader context in which computer code is created. In particular I am fascinated with the transformations of language that span the life-cycle: starting with problem recognition; project definition; problem statement; requirements definition; specification; code construction; and all the feedback loops reversing the waterfall. I am all too aware of how difficult the research challenges are outside of code and approach them with great caution. I need to start small.
Having asserted that code is still (at least in part) a human language, I am now concerned with asserting another hypothesis: that pre-code artifacts written in a common natural language are in fact more structured than other non-fiction prose. That is, a requirements document can be shown to use a more restricted form of its natural language and may have clues as to how the language can be more narrowly defined so as to improve the subsequent quality of the product to be built. Even this is a challenging assignment.
In preparation for some of the challenges that are inherent in the above two research directions, I am interested in doing a narrow study to see if the text artifacts in an OSS project can be mechanically and successfully categorized in a way that makes a reasonable prediction of, or correlation to, some attribute of the later construction. My first stab at this would be to look at the text in bug reports, use a relatively simple language model, and train it on that corpus with some hand-coded bugs using different categories that I come up with through intuition and common practice. As I think of this research it sounds like an exploratory survey of the corpus. I would do this first for one of the larger and well-respected OSS projects (Apache?) to see if any categorization can achieve a reasonable prediction using that language model.
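A minimal sketch of what that first stab might look like, using scikit-learn's bag-of-words vectorizer feeding a naive Bayes classifier; the bug reports and category labels below are invented placeholders for the hand-coded corpus described above:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    # Hand-labeled bug reports; texts and categories are invented
    # placeholders for the hand-coded corpus described above.
    reports = [
        "server crashes with null pointer exception on startup",
        "segmentation fault when parsing large config file",
        "button label is misspelled on the settings page",
        "documentation for the install step is out of date",
    ]
    labels = ["crash", "crash", "cosmetic", "docs"]

    # Bag-of-words counts feeding a naive Bayes classifier: about the
    # simplest language model one could try for this kind of survey.
    model = make_pipeline(CountVectorizer(), MultinomialNB())
    model.fit(reports, labels)

    print(model.predict(["app crashes when the file is too large"]))  # ['crash']

Even a model this crude would provide a baseline against which fancier language models could be judged in the exploratory survey.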
Saturday, May 12, 2012
My Notes on Qualitative Techniques in Empirical Software Engineering Research - Part 1
I think my current line of study can best be summarized by a paragraph from the book Guide to Advanced Empirical Software Engineering by Shull, Singer, and Sjøberg. This paragraph is from chapter two, which is a paper by Carolyn Seaman titled Qualitative Methods:
The study of software engineering has always been complex and difficult. The complexity arises from technical issues, from the awkward intersection of machine and human capabilities, and from the central role of the people performing software engineering tasks. The first two aspects provide more than enough complex problems to keep empirical software engineering researchers busy. But the last factor, the people themselves, introduces aspects that are especially difficult to capture. However, studies attempting to capture human behavior as it relates to software engineering are increasing and, not surprisingly, are increasingly employing qualitative methods.

This post is my first attempt to distill some of what I am learning about qualitative techniques in software engineering research. My first stop is a text I have from a software metrics class which seems to frame this well for me. In Software Metrics: A Rigorous and Practical Approach by Norman Fenton and Shari Lawrence Pfleeger, they state that there are three investigative techniques: survey, case study, and formal experiment. They characterize surveys as research in the large, formal experiments as research in the small, and case studies as research in the typical. They also point out that surveys are most often retrospective, while case studies and formal experiments require a decision regarding what will be investigated. They present four principles of investigation that are common to all three types of investigation. The principles of investigation are:
- choosing an investigative technique
- stating the hypothesis
- maintaining control over variables
- making your investigation meaningful
At this time, I am more interested in looking at case studies than the other two techniques. On page 148, section 4.3 is on Planning Case Studies.
While many issues are shared between formal experiments and case studies, the book explores some differences.
A case study usually compares one situation with another: the results of using one method or tool with the results of using another, for example. To avoid bias and make sure that you are testing the relationship you hypothesize, you can organize your study in one of three ways: sister project, baseline, or random selection.
Sister Projects
Suppose your organization is interested in modifying the way it performs code inspections. You decide to perform a case study to assess the effects of using a new inspection technique. To perform such a study, you select two projects, called sister projects, each of which is typical of the organization and has similar values for the state variables that you have planned to measure. For instance, the projects may be similar in terms of application domain, implementation language, specification technique, and design method. Then, you perform inspections the current way on the first project, and the new way on the second project. By selecting projects that are as similar as possible, you are controlling as much as you can. This situation allows you to attribute any differences in result to the difference in inspection technique.
Baselines
If you are unable to find two projects similar enough to be sister projects, you can compare your new inspection technique with a general baseline. Here, your company or organization gathers data from its various projects, regardless of how different one project is from another. In addition to the variable information mentioned above, the data can include descriptive measures, such as product size, effort expended, number of faults discovered, and so on. Then, you can calculate measures of central tendency and dispersion on the data in the database, so you have some idea of the "average" situation that is typical in your company. Your case study involves completing a project using the new inspection technique, and then comparing the results with the baseline. In some cases, you may be able to select from the organization database a subset of projects that is similar to the one using the new inspection technique; again, the subset adds a degree of control to your study, giving you more confidence that any differences in result are caused by the difference in inspection technique.
Random selection
Sometimes, it is possible to partition a single project into parts, where one part uses the new technique while the other does not. Here, the case study resembles a formal experiment, because you are taking advantage of randomization and replication in performing your analysis. It is not a formal experiment, however, because the project was not selected at random from among the others in the company or organization. In this case, you randomly assign the code components to either the old inspection technique or the new. As with an experiment, the randomization helps to reduce the experimental error and balance out the confounding factors.
This type of case study design is particularly useful for situations where the method being studied can take on a variety of values. For example, you may want to determine whether preparation time affects the effectiveness of the inspections. You record the preparation time as well as component size and faults discovered. You can then investigate whether increased preparation time results in a higher detection rate.
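As a minimal sketch of that last analysis, assuming invented illustration data, one might check for a monotonic relationship between preparation time and detection rate with a rank correlation:

    from scipy.stats import spearmanr

    # Invented illustration data: hours of inspection prep per component,
    # and defects found per KLOC in that component.
    prep_hours     = [1.0, 2.5, 2.0, 4.0, 3.5, 0.5]
    detection_rate = [2.1, 3.0, 3.2, 6.5, 5.9, 1.4]

    # Spearman rank correlation assumes only a monotonic relationship,
    # not a linear one, which suits exploratory case study data.
    rho, p_value = spearmanr(prep_hours, detection_rate)
    print(f"rho = {rho:.2f}, p = {p_value:.3f}")

With only a handful of components, as in a single-project case study, the p-value will be weak; the point of the sketch is the shape of the analysis, not its power.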
At the end of this chapter, there are some suggestions for further reading. The ones that catch my eye are:
Curtis, B., "Measurement and experimentation in software engineering," Proceedings of the IEEE, 68(9), pp. 1144-57, 1980
Basili, V.R., Selby, R.W., and Hutchens, D.H., "Experimentation in software engineering," IEEE Transactions on Software Engineering, 12(7), pp. 733-43, 1986
Shen, V.Y., Conte, S.D., and Dunsmore, H.E., "Software science revisited: A critical analysis of the theory and its empirical support," IEEE Transactions on Software Engineering, 9(2), pp. 155-65, 1983
Swanson, E.B. and Beath, C.M., "The use of case study data in software management research," Journal of Systems and Software, 8, pp. 63-71, 1988
Kitchenham, B., Pickard L., and Pfleeger, S.L., "Case studies for method and tool evaluation," IEEE Software, 12(4), pp. 52-62, 1995
Friday, May 11, 2012
What Gay Marriage and Sloppy Programming Have In Common
This week marked a high-water mark for the progressive agenda in the president's acknowledgement that he does not see a fundamental difference between gay and straight marriage. As today's NYT observed, the speed with which social change occurs seems to be accelerating, and they posit that it is the result of media. I'll leave that question to other researchers like Barton Friedland to ponder. But for me this comes the same week our group looked at a paper titled Sloppy Programming by Little, Miller, Chou, Bernstein, Cypher, and Lau from MIT's CSAIL and IBM's Almaden Research Center. The discussion raised my mojo and seems to hit close to where my thesis may go. I also see a rather significant connection between these two events, and this takes some explanation.
The "sloppy programming" referenced by the title of the paper refers to the error that occurs when a coder enters a line of code that fails for lack of correct syntax or semantics, among other things. The authors explore several ways in which this error can be handled, including Quack, an Eclipse plugin. They attempt to mimic the auto-complete functionality that Eclipse offers in Java, whereby it offers a reasonable list of things that might come next while the code is being entered.
The existing auto-complete function draws its power from the fact that as the line is typed, the possible tokens that can come next become severely constrained. Once one types an object and then a period, the set of possible tokens is limited to the methods and attributes of that object. The authors attempt to expand this concept to encompass other constraints in an attempt to interpret tokens that are not syntactically correct as they appear on the line. The existing auto-complete cannot function here, since the context for the typed statements cannot be as easily interpreted, and the task is to look at what they COULD mean rather than what they DO mean. They do this using techniques from NLP that basically look at the context for the statement and search for syntactically valid statements that include the typed words. I don't find the paper itself groundbreaking, since this seems like a very straightforward attempt to offer greater support to the coder in the task of creating both a semantically and syntactically correct statement. What fascinates me about the paper is how it was received.
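As a rough sketch of the flavor of this approach, and not the paper's actual algorithm, one could rank the methods visible in the current context by how many of the typed words appear in their names; the candidate API below is invented for illustration:

    # Rank candidate methods by overlap with the user's "sloppy" input.
    # The candidate list is an invented stand-in for whatever methods
    # are visible in the current scope.
    candidates = [
        "addActionListener",
        "removeActionListener",
        "setBackgroundColor",
        "getSelectedText",
    ]

    def camel_words(name):
        """Split a camelCase identifier into a set of lowercase words."""
        words, cur = [], ""
        for ch in name:
            if ch.isupper() and cur:
                words.append(cur.lower())
                cur = ch
            else:
                cur += ch
        words.append(cur.lower())
        return set(words)

    def rank(sloppy_input):
        """Order candidates by how many typed words their names contain."""
        typed = set(sloppy_input.lower().split())
        scored = [(len(typed & camel_words(c)), c) for c in candidates]
        return [c for score, c in sorted(scored, reverse=True) if score > 0]

    print(rank("listener add action"))  # ['addActionListener', ...]

The interesting engineering in the real paper lies in doing this against the full set of syntactically valid statements rather than a flat list of names, but the basic move, scoring what the tokens could mean, is the same.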
My biggest turnoff in my undergrad experience was the cultural norms of the engineering school I attended. I was completely and utterly an outsider. Coming out at the time did nothing to help me feel comfortable in my own skin or at the institution. In retrospect, it is little wonder that I dropped out and took a very bitter aftertaste of academic life with me.
From my own perspective, engineers culturally have a surprisingly uniform and distinct world view from other disciplines. I often joke that every engineer is a bit autistic but in my heart I'm not so sure there isn't a bit of truth in that joke. Engineers see the beauty in logic and math and have crystalline visions for how systems should work. As long as they are dealing with purely physical systems, they do brilliant work. However most engineers I know begin to act like a misfiring engine when faced with human systems. The management decision processes, the illogical behavior of markets and human behavior are accepted to various degrees but usually grudgingly.
Business people are sometimes diametrically opposed to this world view, embracing the humanity of systems, often with little or no true comprehension of the mechanical world they depend upon. I had a very competent lawyer friend who once needed very close handholding on how far to insert a document into a fax auto-feeder, having no feel for anything so mechanical. The intuition that you put it in until you feel the resistance was utterly foreign to his consciousness; even though he had the haptic sensations, he couldn't integrate them into the task. I mention this cultural divide because I think it still exists today, albeit at an attenuated level, and I see that divide in this paper and the reactions to it.
Engineers are a macho lot on the whole. Software engineers are not immune to this pull. I see it in the pride they take in their mastery of the various arcane skills that are needed to create quality systems. This paper seems to cut against the grain for some of them by offering a "crutch" to help coders by reducing the cognitive load a language may impose. In a spirited debate after the paper, D. and I discussed the value of this work, with D. questioning whether empirical evidence could be gathered to show that this innovation would do any good. He also rightly pointed out that there would need to be a better statement of the problem and context before the empirical study could be approached. He is, of course, correct.
But besides the research questions, which I intend to return to, I find it interesting that his knee-jerk reaction to the work was negative. I also heard a similar reference to "imprecision" in a comment by V. questioning the paper. Most damning is the title itself. Sloppy is pejorative. While the intent is clearly understood, it betrays a certain judgement that if the coder does not get the syntax correct, it is some failure to properly learn the language so as to avoid the error in the first place. For my taste, at least, I could have accepted "Fuzzy Programming" as a more accurate way of conveying the same idea.
I hear echoes of a very old argument from Dijkstra in this line. He properly observed that "bug", given its origin story in computing, is not the correct term for the errors that are found in source code. A bug is beyond human control, almost literally a deus ex machina that cannot be predicted or prevented (OK, maybe putting the screens back in the windows would have prevented them). A modern equivalent would be the overheating of a rack due to the failure of a cooling component in the machine room. But to ascribe the label bug to an error in coding is to disown and distance oneself from the error in logic that was made. In the very bad old days of mainframe computing, professionals became amazingly good at desk-checking for syntax errors simply because it was necessary for improving productivity.
The strength of D.'s resistance to the paper stems from another paper he remembers reading that suggested that code crutches delayed language proficiency rather than accelerated it. Hence his reference to this mechanism as a crutch. He saw a difference between this mechanism and the auto-complete suggestions of Eclipse Java, but we didn't have a chance to drill down into where the distinction exists in his view. I suspect it has to do with the extent to which the mechanism can be ignored by an experienced coder. I want to explore this in a bit more depth before I go on.
D. echoed something I also deeply believe: computer languages are not so different from human languages in that you do not learn them superficially. He went so far as to suggest that you learn to dream in them, and while I don't quite see it that way, I do believe that the constructs of the language go deeper than our linguistic center and allow us to envision constructs of statements in that language and express them in the syntax very quickly once we become proficient. In his discussion, he made an impassioned argument for why committing a language to memory is vital to gaining full mastery of it. A part of me agrees that maximum productivity in a language will never be achieved if you continually struggle for the syntactically correct way to express a semantic point. Just as a speaker not yet comfortable in conversational English will struggle to find the correct word or improperly conjugate a verb, maximum communication over that channel cannot be achieved, since these errors can impede understanding.
But where I begin to differ with D. regarding the direction of this paper has to do with the larger context of computer languages and our orientation to them. The authors of this paper started by showing a browser command line interface that would accept more natural language and attempt to interpret user commands relative to the current page. It seems clear that the primary thought for a command line such as this would not be for a keyboard interface but rather for a spoken language interface for the casual user where verbal utterances often use linguistic devices that assume the current context. This is a markedly different context for this feature than the proficient programmer coding in a favored language that D. had objected to. Yet I don't see the bright line between the two that D. does. Rather I take a cue for interpreting this from HCI literature.
In HCI, the user is never taken as a member of a homogeneous group. If the tool has any complexity at all, there will be noobs, intermediate users, and experts in the tool. The type and level of support change with the experience and expectations of the user. D.'s reaction was at the level of the expert, or those who would want to become an expert. One assumption here is that everyone using a given language intends to become an expert in that language. An expert will value innate knowledge of the syntax and a wide vocabulary of the keywords in that language. Anything that would stand in the way of that expertise is to be avoided. This leads me to a research question that would be interesting: RQ1: given a support tool such as described by this paper, how is the learning curve moved by its presence or absence over the long term? I suspect that in the study D. recalls, the cohort was not as motivated to achieve high levels of proficiency in the language under study. I would expect that result if the cohort were students who were not committed to using the language over a significant period of time and had no significant incentives to improve productivity. The kind of effort required to memorize the nuances or quirks of a language represents a significant cognitive investment. Like any investment, motivation will depend upon the payback. For a professional committed to a given language over a career, the payback is easy to see. For a student with no commitment to the continued use of the language, the payback is not so clear.
D. questioned the value of the enhancement for the noob. I agree that for someone who has not yet begun to grasp the fundamentals of the language, this type of tool would have very limited utility. But in the broader application of the concept in various contexts, I can see the value of adding additional intelligence into the application to go beyond a simple rejection of the statement as invalid. This will have its greatest use for someone who is at an intermediate level of proficiency but who may occasionally forget or misremember a keyword or misspell a variable name while coding. In HCI literature, the point is made that most experts become intermediate users at various points in their use of a product. And this brings me to the main objection I have: a computer language is best viewed as a product and not a language.
While for some purposes it makes sense to accept the metaphor of a computer language as a language, it is not always the most helpful metaphor. If you take a code-monkey who spends more than 4 hours a day writing in a specific language and virtually no other, the metaphor is completely apt. Their familiarity and recall in that language can quickly ramp up to the point where supplementary materials are simply unneeded. This is to be expected, as any professional becomes intimately familiar with the tools of their daily trade. But the days when only a few languages existed have long passed. No software engineer today should expect to spend the rest of their career coding in even the most common languages of today 30 years hence. There is simply too much innovation in the field to think that these languages will remain static. Nor should the junior software engineer expect never to need to master new languages in their career. I don't have any data to back it up, but I don't believe the cognitive load on a new software engineer will be less than it was on a senior software engineer. (RQ2: over the past 50 years, how many languages have software engineers needed to use over their careers? What is the cognitive load of each language? Has this been consistent over time, or have the cognitive load and complexity of the languages been increasing?)
Exploring the metaphor of computer language as language, let's compare it to multi-lingualism. It is quite common for people to learn multiple languages in their life. Some become completely fluent in multiple languages. However this is not the norm, especially for languages learned after adolescence. Most people will speak with an accent, a fact I feel we can safely ignore in this metaphor, but more importantly, will stumble on the cultural idioms of that language and will often make mistakes in the conjugation and declension of words or make odd diction choices when expressing themselves. The errors rarely severely degrade communication either in writing or speech due to the dialog nature of verbal communication and the redundancies of the language and our ability to fill in syntactic missteps.
This flexibility in the communications does not extend to communications with a computer at this time. Only in the past few decades have people begun to suggest that human utterances should be met by a less inscrutable host. This one-way communication had been the norm and had been the major stumbling block in the use of computers by non-trained users; ie. experts. The statistical techniques of the 90s and the renewed interest in using natural language is showing how computers can now engage in something that is less like a dialog from human to a computer and moving more toward a dialog. Errors of syntax and semantics are naturally handled in a dialog since the receiver will point out the confusion and wait for the sender to clarify. At its heart, this is what I think this paper is attempting to address. As we must use more languages in our jobs and as people who are not dedicated to full-time use of a language must grapple with expressing their thoughts in that language grow, I think this form of just-in-time end-user support makes more sense than the language model.
The forces of cultural conservatism will naturally resist some of this movement with the expected calls of the loss of discipline among the younger adherents and the we're-all-going-to-hell-in-a-handbasket attitude toward these sops to the sloppy programmers who are too lazy to really learn the languages. I'm clearly not one of them. Rather I have grown frustrated with how little innovation there has been in the tools we use to program computers over my career. I have not yet memorized the correct spelling of the method names for even the most common classes in Java yet and unless I am going to be working in that language for more than 6 months full time I don't see the point. For me the auto complete feature saves me from making a query to lookup the exact spelling. This tool would actually help me even more since if I type what I believe is the correct method name, a feature like this could recognize the small edit distance between what I typed and what was needed and propose it as a correction.
I believe the human truth at work here is the limited speed with which a culture can change. Engineering cultures are far less willing to change than most others and this runs headlong into the rapid innovation in the field. I don't think it is any accident that the most rapid innovation is occurring in the open source marketplace. Open source is highly influenced by the commercial acceptance of new products. The wall between the development lab and the marketplace is virtually transparent there and the culture of business permeates open source and brings a dynamism that drives change. But even progressive cultures have a speed limit. Conservative cultures an even slower speed limit, one that sometimes approaches zero.
This finally brings me to what I see as shared between this movement toward more intelligent and helpful programming tools and the cultural movement toward same-sex marriage. Clearly for many Americans now, it is difficult for them to understand why a relationship built on sexual attraction, shared responsibility and a life-long commitment should be different for two members of the same sex than for two people of different sex. The bonds between reproduction and sex were severed decades ago and the pocketbook issues have nothing to do with any religious tenets. Yet for a majority of Americans today, gay marriage is simply a change they won't accept. I believe that change happens at different speeds for different peoples at different times. It is clear that at some point this will all be old news. But for the moment there is significant resistance to this change.
So too will the engineering culture begin to view programming languages as products which serve as the tools which help us build better software systems and not tests of manhood. The tools we use are not and should not be static but should be fluid objects that grow as our understanding of the tasks at hand grow. Rather than embracing an ethic that says that these tools will change with each new release. In time, it will be common for the tool to suggest improvements in the way we have coded something so as to take advantage of a new feature which we may not yet be familiar with. Is this a bad thing? Will it be wrong if we insist that it accept a deprecated syntax instead of recasting it into the new form? I don't think so but I suspect some contemporary software engineers will just feel they have gone over the hill when they are reduced to this diminished role as coder.
In a perfect world of research, I would propose a longitudinal study to look at the use of language and tools over the natural arc of a software engineer's career. That is clearly beyond the reach of my own research with the possible exception of doing in-depth interviews of senior software engineers of their recollections of the early part of their careers and comparing them to the current generation. Hmm, maybe that isn't too bad an idea.
The Sloppy Programming referenced in the title of the paper refers to the errors that occur when a coder enters a line of code that fails for reasons of syntax or semantics, among other things. The authors explore several ways in which such errors can be handled, including Quack, an Eclipse plugin. They attempt to mimic the auto complete functionality that Eclipse offers for Java, whereby it offers a reasonable list of things that might come next while the code is being entered.
The existing auto complete function draws its power from the fact that as the line is typed, the possible tokens that can come next become severely constrained. Once one types an object and then a period, the set of possible tokens is limited to the methods and attributes of that object. The authors attempt to expand this concept to encompass other constraints, in an attempt to interpret tokens that are not syntactically correct as they appear on the line. In that setting the existing auto complete cannot function, since the context for the typed statements cannot be as easily interpreted; the task is to look at what they COULD mean rather than what they DO mean. They do this using techniques from NLP that look at the context of the statement and search for syntactically valid statements that include the typed words. I don't find the paper itself groundbreaking, since this seems like a very straightforward attempt to offer greater support to the coder in the task of creating a statement that is both semantically and syntactically correct. What fascinates me about the paper is how it was received.
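To make the mechanism concrete, here is a little sketch in Python of the ranking idea. It assumes we already have a list of syntactically valid candidate statements in scope; the word-overlap scoring is my simplification, not the authors' algorithm.

    import re

    def tokens(text):
        """Split camelCase identifiers and punctuation into lowercase words."""
        return set(word.lower() for word in re.findall(r'[A-Z]?[a-z]+', text))

    def rank_candidates(typed, candidates):
        """Rank syntactically valid statements by coverage of the typed words."""
        typed_tokens = tokens(typed)
        scored = [(len(typed_tokens & tokens(c)), c) for c in candidates]
        return [c for score, c in sorted(scored, reverse=True) if score > 0]

    # Hypothetical candidates: the statements legal at this point in the code.
    candidates = ['message.toLowerCase()', 'message.trim()', 'messages.add(msg)']
    print(rank_candidates('lower case the message', candidates))
    # ['message.toLowerCase()', 'message.trim()']

The real system constrains the candidate set with the surrounding context; the point of the sketch is only that "what they COULD mean" reduces to scoring valid statements against sloppy input.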
The biggest turnoff of my undergrad experience was the set of cultural norms at the engineering school I attended. I was completely and utterly an outsider. Coming out at the time did nothing to help me feel comfortable in my own skin or at the institution. In retrospect, it is little wonder that I dropped out and took a very bitter aftertaste of academic life with me.
From my own perspective, engineers have a world view that is surprisingly uniform and distinct from that of other disciplines. I often joke that every engineer is a bit autistic, but in my heart I'm not so sure there isn't a bit of truth in that joke. Engineers see the beauty in logic and math and have crystalline visions of how systems should work. As long as they are dealing with purely physical systems, they do brilliant work. However most engineers I know begin to act like a misfiring engine when faced with human systems. Management decision processes, the illogical behavior of markets, and human behavior in general are accepted to various degrees, but usually grudgingly.
Business people are sometimes diametrically opposed to this world view, embracing the humanity of systems, often with little or no true comprehension of the mechanical world they depend upon. I had a very competent lawyer friend who once needed very close handholding on how far to insert a document into a fax auto feeder, having no feel for anything so mechanical. The intuition that you push the page in until you feel resistance was utterly foreign to his consciousness; even though he had the haptic sensations, he couldn't integrate them into the task. I mention this cultural divide because I think it still exists today, albeit at an attenuated level, and I see that divide in this paper and the reactions to it.
Engineers are a macho lot on the whole. Software engineers are not immune to this pull. I see it in the pride they take in their mastery of the various arcane skills that are needed to create quality systems. This paper seems to cut against the grain for some of them by offering a "crutch" to help coders by reducing the cognitive load a language imposes. In a spirited debate after the paper, D. and I discussed the value of this work, with him questioning whether empirical evidence could be gathered to show that this innovation would do any good. He also rightly pointed out that there would need to be a better statement of the problem and context before the empirical study could be approached. He is, of course, correct.
But beside the research questions, which I intend to return to, I find it interesting that his knee-jerk reaction to the work was negative. I also heard a similar reference to "imprecision" in a comment by V. questioning the paper. Most damning is the title itself. Sloppy is pejorative. While the intent is clearly understood, it betrays a certain judgement: that if the coder does not get the syntax correct, it is some failure to properly learn the language so as to avoid the error in the first place. For my taste, at least, "Fuzzy Programming" would have been a more accurate way of conveying the same idea.
I hear echoes of a very old argument from Dijkstra in this line. He properly observed that "bug", for all its computing origin story, is not the correct term for the errors found in source code. A bug is beyond human control, almost literally a deus ex machina that cannot be predicted or prevented (OK, maybe putting the screens back in the windows would have prevented them). A modern equivalent would be the overheating of a rack due to the failure of a cooling component in the machine room. But to ascribe the label bug to an error in coding is to disown and distance oneself from the error in logic that was made. In the very bad old days of mainframe computing, professionals became amazingly good at desk-checking for syntax errors simply because it was necessary for improving productivity.
The strength of D.'s resistance to the paper stems from another paper he remembers reading, which suggested that code crutches delayed language proficiency rather than accelerating it. Hence his reference to this mechanism as a crutch. He saw a difference between this mechanism and the auto complete suggestions of Eclipse for Java, but we didn't have a chance to drill down into where the distinction lies in his view. I suspect it has to do with the extent to which the mechanism can be ignored by an experienced coder. I want to explore this in a bit more depth before I go on.
D. echoed something I also deeply believe: computer languages are not so different from human languages in that you do not learn them superficially. He went so far as to suggest that you learn to dream in them, and while I don't quite see it that way, I do believe that the constructs of the language go deeper than our linguistic center, allowing us to envision constructs of statements in that language and express them in the syntax very quickly once we become proficient. In his discussion, he made an impassioned argument for why committing a language to memory is vital to gaining full mastery of it. A part of me agrees that maximum productivity in a language will never be achieved if you continually struggle for the syntactically correct way to express a semantic point. Just as a speaker not yet comfortable in conversational English will struggle to find the correct word or improperly conjugate a verb, maximum communication over that channel cannot be achieved, since these errors can impede understanding.
But where I begin to differ with D. regarding the direction of this paper has to do with the larger context of computer languages and our orientation to them. The authors of this paper started by showing a browser command line interface that would accept more natural language and attempt to interpret user commands relative to the current page. It seems clear that the primary use for a command line such as this would not be a keyboard interface but rather a spoken language interface for the casual user, where verbal utterances often use linguistic devices that assume the current context. This is a markedly different context for the feature than the proficient programmer coding in a favored language that D. had objected to. Yet I don't see the bright line between the two that D. does. Rather, I take my cue for interpreting this from the HCI literature.
In HCI, the user is never taken as a member of a homogeneous group. If the tool has any complexity at all, there will be noobs, intermediate users, and experts in the tool. The type and level of support change with the experience and expectations of the user. D.'s reaction was at the level of the expert, or of those who want to become experts. One assumption here is that everyone using a given language intends to become an expert in that language. An expert will value ingrained knowledge of the syntax and a wide vocabulary of the keywords in that language; anything that would stand in the way of that expertise is to be avoided. This leads me to a research question that would be interesting. RQ1: given a support tool such as described by this paper, how is the long-term learning curve moved by its presence or absence? I suspect that in the study D. recalls, the cohort was not strongly motivated to achieve high levels of proficiency in the language under study. I would expect a different result if the cohort were coders committed to using the language over a significant period of time, with significant incentives to improve productivity. The kind of effort required to memorize the nuances or quirks of a language represents a significant cognitive investment. Like any investment, motivation will depend upon the payback. For a professional committed to a given language over a career, the payback is easy to see. For a student with no commitment to the continued use of the language, the payback is not so clear.
D. questioned the value of the enhancement for the noob. I agree that for someone who has not yet begun to grasp the fundamentals of the language, this type of tool would have very limited utility in a programming context. But in the broader application of the concept, I can see the value of adding intelligence to an application so that it goes beyond a simple rejection of the statement as invalid. This will have its greatest use for someone who is at an intermediate level of proficiency but who may occasionally forget or mis-remember a keyword or misspell a variable name while coding. The HCI literature makes the point that most experts become intermediate users at various points in their use of a product. And this brings me to the main objection I have: a computer language is best viewed as a product, not a language.
While for some purposes it makes sense to accept the metaphor of a computer language as a language, it is not always the most helpful metaphor. If you take a code-monkey who spends more than 4 hours a day writing in a specific language and virtually no other, the metaphor is completely apt. Their familiarity and recall in that language can quickly ramp up to the point where supplementary materials are simply unneeded. This is to be expected, as any professional becomes intimately familiar with the tools of their daily trade. But the days when only a few languages existed have long passed. No software engineer today should expect to spend the rest of their career coding in even the most common languages of today 30 years hence. There is simply too much innovation in the field to think that these languages will remain static. Nor should the junior software engineer expect never to need to master new languages in their career. I don't have any data to back it up, but I don't believe the cognitive load on a software engineer entering the field today will be any less than it was on today's senior software engineers when they started. (RQ2: over the past 50 years, how many languages have software engineers needed to use over their careers? What is the cognitive load of each language? Has this been consistent over time, or have the cognitive load and complexity of the languages been increasing?)
Exploring the metaphor of computer language as language, let's compare it to multi-lingualism. It is quite common for people to learn multiple languages in their life. Some become completely fluent in several. However this is not the norm, especially for languages learned after adolescence. Most people will speak with an accent, a fact I feel we can safely ignore in this metaphor, but more importantly they will stumble on the cultural idioms of the language, make mistakes in the conjugation and declension of words, or make odd diction choices when expressing themselves. These errors rarely degrade communication severely, in writing or in speech, thanks to the dialog nature of verbal communication, the redundancies of the language, and our ability to fill in syntactic missteps.
This flexibility does not extend to communications with a computer at this time. Only in the past few decades have people begun to suggest that human utterances should be met by a less inscrutable host. This one-way communication had been the norm, and it had been the major stumbling block in the use of computers by non-trained users, i.e., non-experts. The statistical techniques of the 90s and the renewed interest in natural language are showing how the interaction can become less of a monolog from human to computer and more of a dialog. Errors of syntax and semantics are naturally handled in a dialog, since the receiver will point out the confusion and wait for the sender to clarify. At its heart, this is what I think this paper is attempting to address. As we must use more languages in our jobs, and as the number of people who must express their thoughts in a language without being dedicated full-time users of it grows, I think this form of just-in-time end-user support makes more sense than the language model.
The forces of cultural conservatism will naturally resist some of this movement, with the expected laments about the loss of discipline among the younger adherents and the we're-all-going-to-hell-in-a-handbasket attitude toward these sops to the sloppy programmers who are too lazy to really learn the languages. I'm clearly not one of them. Rather, I have grown frustrated with how little innovation there has been over my career in the tools we use to program computers. I have not memorized the correct spelling of the method names for even the most common classes in Java, and unless I am going to be working in that language full time for more than 6 months I don't see the point. For me the auto complete feature saves me from making a query to look up the exact spelling. This tool would actually help me even more: if I type what I believe is the correct method name, a feature like this could recognize the small edit distance between what I typed and what was needed and propose it as a correction.
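As a quick illustration of how cheaply that correction could work (my sketch, not the paper's mechanism), Python's standard difflib already does this kind of fuzzy matching; the method names here are just examples:

    import difflib

    # Method names from a common class, e.g. java.lang.String:
    methods = ['toLowerCase', 'toUpperCase', 'lastIndexOf', 'codePointAt']

    # A near-miss as typed by the coder:
    typo = 'toLowercase'

    # Propose the known name within a small edit distance of what was typed.
    print(difflib.get_close_matches(typo, methods, n=1, cutoff=0.8))
    # ['toLowerCase']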
I believe the human truth at work here is the limited speed with which a culture can change. Engineering cultures are far less willing to change than most others, and this runs headlong into the rapid innovation in the field. I don't think it is any accident that the most rapid innovation is occurring in the open source marketplace. Open source is highly influenced by the commercial acceptance of new products. The wall between the development lab and the marketplace is virtually transparent there, and the culture of business permeates open source, bringing a dynamism that drives change. But even progressive cultures have a speed limit. Conservative cultures have an even slower one, sometimes approaching zero.
This finally brings me to what I see as shared between this movement toward more intelligent and helpful programming tools and the cultural movement toward same-sex marriage. Clearly it is difficult for many Americans now to understand why a relationship built on sexual attraction, shared responsibility, and a life-long commitment should be different for two members of the same sex than for two people of different sexes. The bonds between reproduction and sex were severed decades ago, and the pocketbook issues have nothing to do with any religious tenets. Yet for a majority of Americans today, gay marriage is simply a change they won't accept. I believe that change happens at different speeds for different peoples at different times. It is clear that at some point this will all be old news. But for the moment there is significant resistance to this change.
So too will the engineering culture begin to view programming languages as products which serve as the tools that help us build better software systems, and not as tests of manhood. The tools we use are not and should not be static; they should be fluid objects that grow as our understanding of the tasks at hand grows, and we should embrace an ethic that expects these tools to change with each new release. In time, it will be common for the tool to suggest improvements in the way we have coded something so as to take advantage of a new feature we may not yet be familiar with. Is this a bad thing? Will it be wrong if we insist that it accept a deprecated syntax instead of recasting our code into the new form? I don't think so, but I suspect some contemporary software engineers will just feel they have gone over the hill when they are reduced to this diminished role as coder.
In a perfect world of research, I would propose a longitudinal study to look at the use of languages and tools over the natural arc of a software engineer's career. That is clearly beyond the reach of my own research, with the possible exception of doing in-depth interviews with senior software engineers about their recollections of the early parts of their careers and comparing them to the current generation. Hmm, maybe that isn't too bad an idea.
Friday, April 13, 2012
The Naturalness of Computer Code
I am not proud of the gap in this journal. I have been taking the NLP class offered online by Stanford profs Jurafsky and Manning, which, together with my teaching and laziness, has contributed to this gap. But I am compelled to write about my thoughts on the naturalness of computer code. This is in response to two papers I have recently read: Learning a Metric for Code Readability by Buse and Weimer (together with a presentation of a paper by Posnett which extended this work) and On the Naturalness of Software by Hindle, Barr, Su and Devanbu. They touch on a topic I find I have strong opinions about, and this journal entry is my first attempt to work those opinions out.
The second paper states a conjecture that "most software is also natural, in the sense that it is created by humans at work...". The first attempts to derive a metric that is predictive of a human's ability to comprehend code. I would suggest that neither goes far enough in its thesis. I propose that all software is a specialized type of non-fiction writing that fits into a continuum with other non-fiction writing. That is to say, there is no bright line between computer code and other forms of human utterances that are committed to paper for the purpose of achieving some economic (as opposed to artistic) end. I must confess I am not even convinced of the latter part of this assertion, but I will take on the first before considering the second.
Casting a piece of computer code in this light, let us consider some of the maxims of non-fiction writing. The most oft repeated is to know your audience. I think most people naively believe that the audience for a piece of computer code, if they even think of it that way, is the computer itself. I think that this assertion fails almost as quickly as it is uttered. It cannot be denied that the code must be "comprehensible" to the compiler and that the language uttered by the compiler must be understood by the machine it is targeted for. However, if machine understanding were all that is required, there would not be so much ink spilt over the language debates. In fact I believe I can support the assertion that the human comprehensibility of the code may be the single most important factor in achieving "high-quality" code (whatever that term may mean to you). Computer code has many different human readers. First is the author themselves. Mechanistically, writing is viewed as some form of transcription from the author's mind to some artifact. However anyone who has proceeded beyond high school English understands that the process of writing is not simple transcription from some internal form to its external artifact. Writing is thinking; writing is editing; writing is analysis; writing is synthesis. All of these processes occur when one writes computer code, and to deny that is to deny the human element of the software creation process.
Even once the product has been committed to some final form that is held acceptable both to the author and to the machine in which it will become embodied, human eyes are destined to read it again. As our software artifacts live longer lives, the need to modify the code for correction and extension becomes more common. Given the separation in time between the original authorship and the modification, even when the author is the one to make the modification, they are a different person. So code modifiability is a software quality with economic impact on the owner, and metrics of this quality are worthy of study. This underlies the justification for the research on code readability, one important component of modifiability.
A central thesis of the SEI is that software structure is dictated not by functionality but by required qualities (OK, that is my statement of their thesis). That thesis is nowhere as clear as when attention is given to code readability. Buse, Weimer, and Posnett identify many of the traits frequently encouraged in texts about good software writing; intelligent use of whitespace, good naming standards, and the complexity of statement structure are three that are sufficient to make my point. Two programs with exactly the same functionality in the same language, and the same qualities in use, can differ dramatically just from the manipulation of these three variables. I think Posnett showed this quite convincingly in his illustration of a piece of code structured to show the letter pi.
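A contrived pair of my own makes the point; both functions below compute exactly the same thing, and only naming, whitespace, and statement structure differ:

    # Same functionality, same language; only whitespace, naming, and
    # statement structure differ.
    def d(u,v):
        s=0
        for i in range(len(u)):s+=u[i]*v[i]
        return s

    def dot_product(left, right):
        """Sum of the pairwise products of two equal-length vectors."""
        total = 0
        for left_value, right_value in zip(left, right):
            total += left_value * right_value
        return total

    print(d([1, 2], [3, 4]) == dot_product([1, 2], [3, 4]))  # True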
It is obvious that non-fiction writing also needs readability and, in many cases, modifiability. Organizational procedures manuals, civil codes, want ads, and web sites are only four of a multitude of human media that depend upon modifiability to achieve their purpose in our society. It is no accident that the structure of these artifacts becomes as important as the text (and images) they contain. Processes and procedures exist to ensure consistency, accuracy, integrity, and availability. Is it just me, or does this not begin to sound a lot like computer code? After thinking about this a great deal, I'm convinced there is no meaningful distinction to be made there. So, returning to the theme of "know your audience", we can directly confront the challenge of the comprehensibility of code.
Writers strive for clarity in non-fiction writing. Clarity is high when the reader follows the text without confusion, frustration, or the need to put it aside to check some reference. The need is the same in computer code. A complicating factor, especially when teaching how to program, is the invisibility of any other human reader and the lack of temporal distance between the author who writes the code and the author who modifies it. This robs the student of the ability to experience the weaknesses in their own writing. Grader comments help, no doubt, but are often lost in the race for a final grade, and students generally lack the meaningful experience in which to assimilate the information, assuming of course that the grader has done anything more than enforce a few rubrics given in class. Students rarely learn how to write good code until they have been employed for a time and been taught by journeymen in the art of clear computer code.
This art and its maxims are not unknown, at least in part. As the Buse argument goes, complexity plays a part. This cannot be a surprise, since it plays just as much a part in non-fiction text. My own non-fiction writing once had the complexity and rhythm of computer code. Ditty-ditty-dah. Ditty-ditty-dah. The lack of any grace in my text embarrassed me even when it was perfectly functional prose. But that has as much to do with art and the finer aspects of writing as it does with clarity; the text was completely clear and comprehensible, just boring and artless. I was not born a good writer and do not consider myself to be good, just better. The tie between complexity and art is that artistic effect comes from the near infinite variation possible in human language and the way that plays upon the reader.
To be clear and artful is a laudable goal for human writing but far beyond what is needed for computer code. Yet the ability to vary complexity exists just as much in computer code as in non-fiction text, driven by the needs of the audience. If I am writing exemplars for a beginning programming course, I will use very simple statements that don't go beyond ditty-ditty-dah: verb-object constructs in a procedural language. But I am also just as capable of railing at code written by very sophisticated programmers who manage to chain a dozen tokens together with dot constructs, making the code all but incomprehensible to anyone beyond the author, assuming even they remember how it all worked. The complexity of the statement structures is invariably dictated by the needs of both the author (who, after all, is the one reader who is never ignored) and anyone else who has influence over the author.
I didn't see it cited in the papers I've read so far on code readability, but I assume there was some influence from the work on text readability by Flesch and others (William F. Buckley comes to mind). While I don't completely buy into their reduction of text readability to a simple number, I must accept its practical use over large corpora, especially when they are from the same or similar authors, such as newspapers. I believe the number gives an indication but that the true behavior is more subtle. However, given its ease of use, it is a good place to start, and I take the Buse direction in that same spirit. "You cannot manage what you cannot measure" (who originally said that?) throws down a gauntlet to empiricists, and I doubt Buse, Weimer, or Posnett believe they wrote the final chapter in that book. But I don't want to get lost in the thicket of numbers before reflecting on the intuition behind the factors that detract from code readability.
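For reference, the Flesch Reading Ease formula is exactly this kind of reduction to a single number; a sketch, with the example counts invented:

    def flesch_reading_ease(words, sentences, syllables):
        """Standard Flesch Reading Ease score; higher means easier to read."""
        return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)

    # A hypothetical 100-word passage in 5 sentences with 130 syllables:
    print(flesch_reading_ease(100, 5, 130))  # about 76.6 on the 0-100 scale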
I can't resist a digression into a discussion I had with a colleague about a recent ACM article about Turing. He was derisive of the article, in large part because of the obtuse nature of its prose. This was a direct example of how a writer must not become self-indulgent in their style but must treat themselves as a servant of the reader. There was no doubt that this author allowed a stuffy, English-bred Oxford style of prose to permeate his argument to the point where only the most devout reader was likely to extract whatever wisdom existed in the piece. Before we could have even begun to discuss the merits of his argument, we would have been forced to agree on what it was he was trying to say. In the end it just wasn't worth the work to us that day.
This brings me to the present, where I am looking at the successes of NLP and the assertion that code is a form of natural language that will lend itself to the same processes that have worked with other natural languages. I find resistance in myself to completely embracing this research direction, and that resistance is the motivation for this posting. Yes, I see the value in the readability metric, but it isn't very difficult to construct code that generates a "good" number on that metric yet is utterly incomprehensible. The number is far from complete, and I have some reservations about whether comprehensibility can ever be captured in a number, even while I am driven to attempt to find one. Before I pursue the quantitative, I want to do some qualitative work in this direction.
For instance, why does white space matter? We know it does, but I have not seen a good discussion yet that demonstrates an understanding of something so basic. This brings to mind some lessons from layout design and the language of visual communication. Computer science is far too focused on the linear model of a one-dimensional string of text. But code is never comprehended from a linear reading of the text. Our visual mechanism is highly adapted to chunking, and the insertion of white space is no different from punctuation in providing structure to the code. We insert it to make paragraphs of the statements we write. And, as with prose, we almost immediately reduce a paragraph to some abstraction of it. We could reduce it to a sub-routine or other linguistic device that explicitly reduces it to a smaller number of tokens. But this loss of linearity can detract from the flow of thought and decrease, rather than increase, the readability. I experienced this when I was trying to illustrate how 3 numbers could be sorted by a series of if statements and one student introduced subroutines to capture the recurring statements. My instinct rejected that construction, since the parallelism of the exercise was lost in the distraction of referring to non-inline code. There are times when cohesion is maximized by the absence of abstraction.
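A sketch of the exercise as I intended it (my reconstruction, not the student's code), with the parallel comparisons kept inline so the eye sees them as one chunk:

    def sort3(a, b, c):
        """Sort three values with a series of parallel if statements."""
        if a > b: a, b = b, a
        if b > c: b, c = c, b
        if a > b: a, b = b, a
        return a, b, c

    print(sort3(3, 1, 2))  # (1, 2, 3)

Extracting each swap into a helper would shrink the token count but break the visual parallelism that makes the logic graspable at a glance.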
A pet peeve of mine is the poor use of vertical formatting in many pieces of code. We are all familiar with the need to "pretty print" control structures to clearly show the conditional blocks. The eye naturally sees these as the coherent blocks they are and immediately grasps which are "inside" and which are "outside". But I have often seen, and been complimented on, how a long statement has had its comprehensibility improved by the judicious use of line continuation and vertical columnar alignment of repeating constructs. Take a complex if statement with many predicates, but one with a parallelism that makes it easy to understand once that parallelism is seen. If you put those predicates in a long list with haphazard line continuation, the communication of that parallelism is all but lost. A text formatter can easily give you the proper indentation for control structures, but it cannot bring out the parallelism of this kind of statement, simply because the first task is purely syntactic while the second is semantic. The comprehensibility of the two code segments will be very different even while their numeric readability scores can be made identical. At its core, comprehensibility cannot be divorced from the human communication being performed by these utterances. To focus too quickly on the numeric assessment of a piece of code, in my mind, distracts from the inherently human activity that is the ultimate aim of this metric.
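A small example of what I mean, with the names and bounds invented for illustration:

    from collections import namedtuple

    Point = namedtuple('Point', 'x y z')
    p = Point(1, 2, 3)
    x_min, x_max, y_min, y_max, z_min, z_max = 0, 10, 0, 10, 0, 10

    # Haphazard line continuation buries the parallel structure:
    if (x_min <= p.x and p.x <= x_max and y_min <= p.y and
            p.y <= y_max and z_min <= p.z and p.z <= z_max):
        print('inside (hard to scan)')

    # Columnar alignment lets the eye see the three parallel range checks:
    if (x_min <= p.x <= x_max and
        y_min <= p.y <= y_max and
        z_min <= p.z <= z_max):
        print('inside (easy to scan)')

Any readability metric built on token counts would score these two conditions identically, yet one communicates its structure and the other hides it.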
Given the inaccessibility of the inherently human activity of comprehension, it is too easy to reject a quantitative approach outright. I have not reached that level of certainty, given the successes of NLP and the low entropy of human utterances. I accept without question that any artifact created by a human will display statistical evidence of authorship. I feel certain it would be possible to determine the authorship of computer code as easily, perhaps even more easily, than it is to assess human authorship in fiction or non-fiction prose. What is even more intriguing is that computer code does not exist in a vacuum but ordinarily exists in the context of many artifacts. While entity naming may vary from person to person, the shared context of these utterances should result in statistically significant correlations that could potentially tie testing, requirements, and code together mechanistically. My research dreams see the cohesion of all project artifacts, together with the organizational artifacts, creating a transparent and comprehensible system that can demonstrably connect the needs of the organization to the mechanisms that enable the solution.
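A toy sketch of the statistical signal I have in mind, in the spirit of the n-gram models in the naturalness paper (my simplification, with invented token streams): a bigram model trained on one author's tokens should find more of that author's code less surprising than someone else's.

    import math
    from collections import Counter

    def bigrams(tokens):
        return list(zip(tokens, tokens[1:]))

    def train(tokens):
        """Count bigrams and unigrams from a training token stream."""
        return Counter(bigrams(tokens)), Counter(tokens)

    def cross_entropy(model, tokens, vocab_size):
        """Average surprise (bits per bigram), with add-one smoothing."""
        bigram_counts, unigram_counts = model
        pairs = bigrams(tokens)
        total = 0.0
        for a, b in pairs:
            p = (bigram_counts[(a, b)] + 1) / (unigram_counts[a] + vocab_size)
            total -= math.log2(p)
        return total / len(pairs)

    # Invented token streams standing in for two authors' code:
    author_a = 'for i in range ( n ) : total = total + x [ i ]'.split()
    author_b = 'result = reduce ( add , values , 0 )'.split()
    model = train(author_a)
    vocab = len(set(author_a) | set(author_b))
    # The model finds its own author's stream less surprising:
    print(cross_entropy(model, author_a, vocab) < cross_entropy(model, author_b, vocab))  # True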
With that I'll end this stream of consciousness and get back to some real work.
Thursday, April 5, 2012
Analysis and synthesis in software engineering
"In engineering, as in other creative arts, we must learn to do analysis to support our efforts in synthesis. One cannot build a beautiful and functional bridge without a knowledge of steel and dirt, and a considerable mathematical technique for using this knowledge to compute the properties of structures. Similarly, one cannot build a beautiful computer system without a deep understanding of how to 'previsualize' the process generated by the code one writes."
~Abelson and Sussman
Engineering, unlike art, aims to satisfy a client's needs. This involves tradeoffs between the different qualities of the final artifact. To achieve an acceptable tradeoff, the designer must be able to predict the qualities the final artifact will exhibit before it is built. Software engineering currently has a poor track record of predictably achieving a balance of qualities in the final software product. We have achieved some success in predictive models for performance and availability, or at least I hope we have. But there are many other qualities for which we still lack good measures, let alone predictive models for assessing a design: end-user usability, code comprehension, modifiability, and traceability are just some that come to my mind. I think this quote is one of the best at capturing the essence of the problem.