Friday, May 11, 2012

What Gay Marriage and Sloppy Programming Have In Common

This week marked a high-water mark for the progressive agenda in the president's acknowledgement that he does not see a fundamental difference between gay and straight marriage. As today's NYT observed, the speed with which social change occurs seems to be accelerating, and it posits that this is the result of media. I'll leave that question to other researchers like Barton Friedland to ponder. But for me this came the same week our group looked at a paper titled Sloppy Programming by Little, Miller, Chou, Bernstein, Cypher and Lau from MIT's CSAIL and IBM's Almaden Research Center. The discussion raised my mojo and seems to hit close to where my thesis may go. I also see a rather significant connection between these two events, and that takes some explanation.

The Sloppy Programming of the paper's title refers to the errors that occur when a coder enters a line of code that fails for lack of correct syntax or semantics, among other things. The authors explore several ways in which such errors can be handled, including Quack, an Eclipse plugin. They attempt to mimic the auto-complete functionality that Eclipse offers in Java, whereby the editor offers a reasonable list of things that might come next while the code is being entered.

The existing auto-complete function draws its power from the fact that as the line is typed, the possible tokens that can come next become severely constrained. Once one types an object and then a period, the set of possible tokens is limited to the methods and attributes of that object. The authors attempt to expand this concept to encompass other constraints, in an attempt to interpret tokens that are not syntactically correct as they appear on the line. The existing auto-complete cannot function here, since the context for the typed statements cannot be as easily interpreted; the task is to look at what they COULD mean rather than what they DO mean. They do this using techniques from NLP that look at the context of the statement and search for syntactically valid statements that include the typed words. I don't find the paper itself groundbreaking, since this seems like a very straightforward attempt to offer greater support to the coder in the task of creating a statement that is both semantically and syntactically correct. What fascinates me about the paper is how it was received.
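To make the idea concrete, here is a minimal sketch of that kind of interpretation — my own illustration, not the paper's actual algorithm. Candidates that are syntactically valid in the current context are ranked by how many word tokens they share with the user's free-form input; the method names below are hypothetical examples.

```python
import re

def tokenize(s):
    """Split an identifier or phrase into lowercase word tokens
    (camelCase is broken apart before lowercasing)."""
    return set(re.findall(r"[a-z]+", re.sub(r"([A-Z])", r" \1", s).lower()))

def interpret(sloppy_input, candidates):
    """Rank context-valid candidates by word overlap with the sloppy input;
    candidates sharing no words are dropped."""
    typed = tokenize(sloppy_input)
    scored = [(len(typed & tokenize(c)), c) for c in candidates]
    return [c for score, c in sorted(scored, reverse=True) if score > 0]

# Hypothetical context: the methods available on the current object.
methods = ["getElementById", "createTextNode", "appendChild"]
print(interpret("element by id", methods))  # ['getElementById']
```

A real system would weight matches by word order, abbreviation, and type information, but even this crude overlap score captures the shift from "what the tokens ARE" to "what they COULD mean."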

My biggest turnoff in my undergrad experience was the cultural norms of the engineering school I attended. I was completely and utterly an outsider. Coming out at the time did nothing to help me feel comfortable in my own skin or at the institution. In retrospect, it is little wonder that I dropped out and took a very bitter aftertaste of academic life with me.

From my own perspective, engineers culturally have a world view that is surprisingly uniform and distinct from that of other disciplines. I often joke that every engineer is a bit autistic, but in my heart I'm not so sure there isn't a bit of truth in that joke. Engineers see the beauty in logic and math and have crystalline visions of how systems should work. As long as they are dealing with purely physical systems, they do brilliant work. However, most engineers I know begin to act like a misfiring engine when faced with human systems. Management decision processes, the illogical behavior of markets, and human behavior generally are accepted to various degrees, but usually grudgingly.

Business people are sometimes diametrically opposed to this world view, embracing the humanity of systems, often with little or no true comprehension of the mechanical world they depend upon. I had a very competent lawyer friend who once needed very close handholding on how far to insert a document into a fax auto-feeder, having no feel for anything so mechanical. The intuition that you put it in until you feel resistance was utterly foreign to his consciousness; even while he had the haptic sensations, he couldn't integrate them into the task. I mention this cultural divide because I think it still exists today, albeit at an attenuated level, and I see that divide in this paper and the reactions to it.

Engineers are a macho lot on the whole, and software engineers are not immune to this pull. I see it in the pride they take in their mastery of the various arcane skills that are needed to create quality systems. This paper seems to cut against the grain for some of them by offering a "crutch" to help coders by reducing the cognitive load a language imposes. In a spirited debate after the paper, D. and I discussed the value of this work, with him questioning whether empirical evidence could be gathered to show that this innovation would do any good. He also rightly pointed out that there would need to be a better statement of the problem and context before an empirical study could be approached. He is, of course, correct.

But besides the research questions, which I intend to return to, I find it interesting that his knee-jerk reaction to the work was negative. I also heard a similar reference to "imprecision" in a comment by V. questioning the paper. Most damning is the title itself. Sloppy is pejorative. While the intent is clearly understood, it betrays a certain judgment: that if the coder does not get the syntax correct, it is some failure to properly learn the language so as to avoid the error in the first place. For my taste, at least, "Fuzzy Programming" would have conveyed the same idea more accurately.

I hear echoes of a very old argument from Dijkstra in this line. He properly observed that "bug," in its computing origin story, is not the correct term for the errors found in source code. A bug is beyond human control, almost literally a deus ex machina that cannot be predicted or prevented (OK, maybe putting the screens back in the windows would have prevented them). A modern equivalent would be the overheating of a rack due to the failure of a cooling component in the machine room. But to ascribe the label bug to an error in coding is to disown and distance oneself from the error in logic that was made. In the very bad old days of mainframe computing, professionals became amazingly good at desk-checking for syntax errors simply because it was necessary for improving productivity.

The strength of D.'s resistance to the paper stems from another paper he remembers reading, which suggested that code crutches delayed language proficiency rather than accelerating it; hence his reference to this mechanism as a crutch. He saw a difference between this mechanism and the auto-complete suggestions of Eclipse for Java, but we didn't have a chance to drill down into where the distinction lies in his view. I suspect it has to do with the extent to which the mechanism can be ignored by an experienced coder. I want to explore this in a bit more depth before I go on.

D. echoed something I also deeply believe: computer languages are not so different from human languages in that you do not learn them superficially. He went so far as to suggest that you learn to dream in them, and while I don't quite see it that way, I do believe that the constructs of the language go deeper than our linguistic center and allow us to envision constructs of statements in that language and express them in the syntax very quickly once we become proficient. In his discussion, he made an impassioned argument for why committing a language to memory is vital to gaining full mastery of it. A part of me agrees that maximum productivity in a language will never be achieved if you continually struggle for the syntactically correct way to express a semantic point. Just as a speaker not yet comfortable in conversational English will struggle to find the correct word or improperly conjugate a verb, maximum communication over that channel cannot be achieved, since these errors can impede understanding.

But where I begin to differ with D. regarding the direction of this paper has to do with the larger context of computer languages and our orientation to them. The authors of this paper started by showing a browser command-line interface that would accept more natural language and attempt to interpret user commands relative to the current page. It seems clear that the primary use for a command line such as this would be not a keyboard interface but a spoken-language interface for the casual user, where verbal utterances often rely on linguistic devices that assume the current context. This is a markedly different context for the feature than the proficient programmer coding in a favored language that D. had objected to. Yet I don't see the bright line between the two that D. does. Rather, I take a cue for interpreting this from the HCI literature.

In HCI, the user is never taken as a member of a homogeneous group. If the tool has any complexity at all, there will be noobs, intermediate users and experts in the tool. The type and level of support change with the experience and expectations of the user. D.'s reaction was at the level of the expert, or those who want to become an expert. One assumption here is that everyone using a given language intends to become an expert in that language. An expert will value innate knowledge of the syntax and a wide vocabulary of the keywords in that language, and anything that would stand in the way of that expertise is to be avoided. This leads me to a research question that would be interesting. RQ1: given a support tool such as the one described by this paper, how is the learning curve moved by its presence or absence over the long term? I suspect that in the study D. recalls, the cohort was not especially motivated to achieve high levels of proficiency in the language under study. I would expect such a result if the cohort were students who were not committed to using the language over a significant period of time and who lacked significant incentives to improve productivity. The kind of effort required to memorize the nuances and quirks of a language represents a significant cognitive investment. Like any investment, motivation will depend upon the payback. For a professional committed to a given language over a career, the payback is easy to see. For a student with no commitment to the continued use of the language, the payback is not so clear.

D. questioned the value of the enhancement for the noob. I agree that for someone who has not yet begun to grasp the fundamentals of the language, this type of tool would have very limited utility. But in the broader application of the concept in various contexts, I can see the value of adding additional intelligence into the application to go beyond a simple rejection of the statement as invalid. It will have its greatest use for someone who is at an intermediate level of proficiency but who may occasionally forget or mis-remember a keyword or misspell a variable name while coding. The HCI literature makes the point that most experts become intermediate users at various points in their use of a product. And this brings me to the main objection I have: a computer language is best viewed as a product and not a language.

While for some purposes it makes sense to accept the metaphor of computer language, it is not always the most helpful metaphor. If you take a code-monkey who spends more than 4 hours a day writing in a specific language and virtually no other, the metaphor is completely apt. Their familiarity and recall in that language can quickly ramp up to the point where supplementary materials are simply unneeded. This is to be expected as any professional becomes intimately familiar with the tools of their daily trade. But the days when only a few languages existed have long passed. No software engineer today should expect to spend the rest of their career coding in even the most common languages of today 30 years hence. There is simply too much innovation in the field to think that these languages will remain static. Nor should the junior software engineer expect never to need to master new languages in their career. I don't have any data to back it up, but I don't believe the cognitive load on a new software engineer will be less than it was on a senior software engineer. (RQ2: over the past 50 years, how many languages have software engineers needed to use over their careers? What is the cognitive load of each language? Has this been consistent over time, or have the cognitive load and complexity of languages been increasing?)

Exploring the metaphor of computer language as language, let's compare it to multi-lingualism. It is quite common for people to learn multiple languages in their life. Some become completely fluent in multiple languages. However, this is not the norm, especially for languages learned after adolescence. Most people will speak with an accent, a fact I feel we can safely ignore in this metaphor; more importantly, they will stumble on the cultural idioms of that language, will often make mistakes in the conjugation and declension of words, or will make odd diction choices when expressing themselves. These errors rarely degrade communication severely, either in writing or speech, thanks to the dialog nature of verbal communication, the redundancies of the language, and our ability to fill in syntactic missteps.

This flexibility does not yet extend to communications with a computer. Only in the past few decades have people begun to suggest that human utterances should be met by a less inscrutable host. One-way communication had been the norm and had been the major stumbling block in the use of computers by non-trained users, i.e., non-experts. The statistical techniques of the 90s and the renewed interest in natural language are showing how computers can engage in something that is less like a monologue from human to computer and more like a dialog. Errors of syntax and semantics are naturally handled in a dialog, since the receiver will point out the confusion and wait for the sender to clarify. At its heart, this is what I think this paper is attempting to address. As we must use more languages in our jobs, and as the number of people who must express their thoughts in a language without being dedicated to its full-time use grows, I think this form of just-in-time end-user support makes more sense than the language model.

The forces of cultural conservatism will naturally resist some of this movement with the expected laments about the loss of discipline among the younger adherents and a we're-all-going-to-hell-in-a-handbasket attitude toward these sops to the sloppy programmers who are too lazy to really learn the languages. I'm clearly not one of them. Rather, I have grown frustrated with how little innovation there has been in the tools we use to program computers over my career. I have not memorized the correct spelling of the method names for even the most common classes in Java, and unless I am going to be working in that language for more than 6 months full time, I don't see the point. For me, the auto-complete feature saves me from making a query to look up the exact spelling. This tool would help me even more: if I type what I believe is the correct method name, a feature like this could recognize the small edit distance between what I typed and what was needed and propose it as a correction.
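That correction mechanism is easy to sketch. The following is my own illustration, assuming plain Levenshtein distance; the names being matched are hypothetical examples, not any specific API.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance via dynamic programming,
    keeping only the previous row of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def suggest(typed, known_names, max_dist=2):
    """Propose the closest known name within max_dist edits, else None."""
    best = min(known_names, key=lambda n: edit_distance(typed, n))
    return best if edit_distance(typed, best) <= max_dist else None

print(suggest("lenght", ["length", "charAt", "substring"]))  # length
```

The threshold matters: too generous and the tool "corrects" names you meant, too strict and it misses real typos. A production tool would also weight the candidates by context, as the existing auto-complete already does.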

I believe the human truth at work here is the limited speed with which a culture can change. Engineering cultures are far less willing to change than most others, and this runs headlong into the rapid innovation in the field. I don't think it is any accident that the most rapid innovation is occurring in the open source marketplace. Open source is highly influenced by the commercial acceptance of new products. The wall between the development lab and the marketplace is virtually transparent there, and the culture of business permeates open source and brings a dynamism that drives change. But even progressive cultures have a speed limit. Conservative cultures have an even slower speed limit, one that sometimes approaches zero.

This finally brings me to what I see as shared between this movement toward more intelligent and helpful programming tools and the cultural movement toward same-sex marriage. It is difficult for many Americans now to understand why a relationship built on sexual attraction, shared responsibility and a life-long commitment should be different for two members of the same sex than for two people of different sexes. The bonds between reproduction and sex were severed decades ago, and the pocketbook issues have nothing to do with any religious tenets. Yet for a majority of Americans today, gay marriage is simply a change they won't accept. I believe that change happens at different speeds for different people at different times. It is clear that at some point this will all be old news. But for the moment there is significant resistance to this change.

So too will the engineering culture come to view programming languages as products which serve as tools to help us build better software systems, not as tests of manhood. The tools we use are not and should not be static; they should be fluid objects that grow as our understanding of the tasks at hand grows, and we should embrace an ethic that expects these tools to change with each new release. In time, it will be common for the tool to suggest improvements in the way we have coded something so as to take advantage of a new feature with which we may not yet be familiar. Is this a bad thing? Will it be wrong if we insist that it accept a deprecated syntax instead of recasting it into the new form? I don't think so, but I suspect some contemporary software engineers will just feel they have gone over the hill when they are reduced to this diminished role as coder.

In a perfect world of research, I would propose a longitudinal study to look at the use of languages and tools over the natural arc of a software engineer's career. That is clearly beyond the reach of my own research, with the possible exception of doing in-depth interviews of senior software engineers about their recollections of the early part of their careers and comparing them to the current generation. Hmm, maybe that isn't too bad an idea.
