Saturday, May 4, 2013

An Outline for a New GE Course in Computer Science

I have just completed teaching a course at a community college for the second time. Before the volatile thoughts I had are completely lost, I want to sketch out the unique ideas I have about how a course like this should be taught. What I find generally lacking is a narrative thread that connects the disparate arcana that must be covered and turns them into a motivating course.

I. Introduction
Is this course really about "computer science"? If so, what is "information technology"? "Management information systems"? And why is it focused on "computation"? What I teach is really information technology and the central role of information for individuals, groups, organizations and society. While the motivation for most innovations through history has been computation or communications, we are now looking toward an information convergence in which all forms of information will be handled by the same technology. To be an educated and productive member of our society you must gain certain basic understandings of this technology, how it is used and the various ways you can approach it in your academic studies.

Show some entertaining Rube Goldberg machines and discuss causality and the design of mechanisms to achieve some end. Emphasize that computers are machines. While details may be daunting, there is nothing going on that a motivated person cannot understand.

I like to present computers as part of a continuum of mechanized development. The tie between Jacquard and information processing is well known, but I think the tie between continued mechanical development and the industrial age is not fully explored. This is important simply because students with mechanical aptitude will find the mechanistic aspect of computers easy to grasp, and dwelling on the mechanical nature of the machines avoids alienating students who lack an intuitive grasp of them.



II. History
What is information? Is there information without humans? Does it serve human needs? If so, which ones? Who historically funded the innovations in information technology and communications? Commerce, the military, government, religion; eventually we see entertainment. Taking commerce as a starting point, you must understand the basics of numbers and numeric representation. This motivates an algebraic understanding of our positional number system and its variants in different bases. Depending on the depth possible in the course, negative numbers and real numbers can be covered in binary notation. Even more interesting is the digitization of numbers, even in real-number representation, and the inherently limited size of machine memory: no physical memory is unlimited, so there are inherent limits on the size of the numbers it can represent. An interesting philosophical point is the innovation of zero and the difficulty of representing nothing. This comes up again with null sets, the representation of a blank space and other places in the curriculum.
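
To make the positional idea concrete, here is a minimal sketch in Python (the language is incidental; everything here is illustrative) of how a digit string gets its value, and of why a fixed-width memory cell bounds the numbers it can hold:

    # The value of a digit string in any base is a sum of digits times
    # powers of that base (Horner's method below avoids explicit powers).
    def digits_to_value(digits, base):
        """Interpret a list of digits, most significant first."""
        value = 0
        for d in digits:
            value = value * base + d
        return value

    print(digits_to_value([1, 0, 1, 1], 2))   # 1011 in binary -> 11
    print(digits_to_value([1, 0, 1, 1], 10))  # the same digits in base 10 -> 1011

    # A fixed-width cell has hard limits: n bits give exactly 2**n patterns,
    # so an unsigned 8-bit cell can hold only 0 .. 255.
    n = 8
    print((1 << n) - 1)  # 255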

The basic arithmetic algorithms for converting one base to another, versus the conceptual understanding of those representations. The algorithm for long division. The concept of an algorithm as a series of steps executed to achieve some result. Summarize the rules of binary addition in a table.
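
For the conversion, a sketch of the classic repeated-division algorithm, written out so each step is visible; the function name and list-of-digits output are my own choices for the example:

    def to_base(n, base):
        """Convert a non-negative integer to a digit list in the given base."""
        if n == 0:
            return [0]
        digits = []
        while n > 0:
            n, remainder = divmod(n, base)  # one long-division step
            digits.append(remainder)        # the remainders are the digits
        return digits[::-1]                 # most significant digit first

    print(to_base(11, 2))    # [1, 0, 1, 1]
    print(to_base(255, 16))  # [15, 15], i.e. FF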

Introduce the concept of mathematical series to calculate irrational numbers. Present enough to convince students that this work becomes tedious. Conclude with the method of calculating polynomial expressions using a series of differences.
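
The method of differences reduces to a pleasingly small program. A minimal sketch, assuming the leading differences have been worked out by hand first; for p(x) = x^2 + x + 41 they are 41, 2 and 2:

    def tabulate(leading_diffs, count):
        """Tabulate a polynomial from [p(0), 1st difference, 2nd difference, ...]
        using nothing but additions -- the insight behind Babbage's engine."""
        diffs = list(leading_diffs)
        results = []
        for _ in range(count):
            results.append(diffs[0])
            for level in range(len(diffs) - 1):
                diffs[level] += diffs[level + 1]  # p += d1, d1 += d2, ...
        return results

    print(tabulate([41, 2, 2], 8))  # [41, 43, 47, 53, 61, 71, 83, 97]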

Jump to text and some of the innovations. Transition from oral society to literate society. The first representation of spoken words through writing. Code of Hamirabi. Pictographic representation of thoughts. Development of phonetic alphabet. The limited growth of literacy. Look at literacy rates of 18th century France for grounding.

Use the Greeks to discuss discourse and rhetoric. The codification of the laws of argumentation and thinking. The introduction of deductive logic. Cover the basic forms of AND, OR (both inclusive and exclusive) and NOT. Introduce truth tables. Nesting of expressions.
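
A small illustrative sketch that prints a truth table for any nested expression over named variables (eval is fine at classroom scale):

    from itertools import product

    def truth_table(expr, names):
        print(*names, expr, sep="\t")
        for values in product([False, True], repeat=len(names)):
            env = dict(zip(names, values))
            result = eval(expr, {}, env)
            print(*(int(v) for v in values), int(result), sep="\t")

    # Nesting three variables and all three connectives:
    truth_table("a and (b or not c)", ["a", "b", "c"])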

I like to cover the reality of the Medieval period in Europe and the growth of the Islamic world, since major innovations occurred outside Europe and were introduced in the Renaissance: Arabic numerals, zero, algebra and double-entry bookkeeping. It was the growth of trade that made this bookkeeping necessary, and trade also provided the explosive growth that led to the middle class, the growth of literacy, the Reformation and the importance of the movable-type press. Lower case, typesetting, italics, serifs: these are all needed to understand web technologies and word processors.

Cover the concept of a variable in algebra if needed: the unknown which becomes a container for a value to be determined. If possible and necessary, cover rules of algebra like the commutative, associative and distributive laws.

Discuss early attempts at mechanical computation, like the abacus (used for commerce and still popular) and the Pascal calculator. Show how innovations in timekeeping and metallurgy made these possible. Discuss the decade counter (carry) and the odometer. If desired, discuss how a negative can be represented on an odometer (-1 = 9999) and tie this to complement arithmetic: the wrap-around is ten's complement, the decimal analogue of the two's complement used in binary machines.
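
A sketch of the odometer arithmetic with a hypothetical four-wheel counter; the only machinery needed is arithmetic modulo 10,000:

    MODULUS = 10_000  # four decimal wheels

    def wrap(n):
        return n % MODULUS

    print(wrap(0 - 1))          # 9999, the odometer's -1
    print(wrap(9999 + 1))       # 0, so adding 1 to "-1" gives 0 as it should
    print(wrap(42 + wrap(-7)))  # 35, subtraction done purely by addition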

Communication innovations: lanterns in towers, flags. All depended on some system of alphabetic or message encoding. Discuss the difference and give examples; this motivates the ASCII code later on. Military applications and encoding for secrecy. Discuss bandwidth, parallel versus serial transmission, representing null messages, secrecy and ways of encrypting messages.
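
One way to make the encoding idea concrete: assign each letter a fixed number of on/off signals, the way a lantern or flag system must. The numbering below (A = 0, B = 1, ...) is invented for the example, not any historical code; five signals suffice because 2**5 = 32 exceeds 26:

    def encode(message):
        """Turn capital letters into five-signal on/off patterns."""
        return ["{:05b}".format(ord(c) - ord("A")) for c in message]

    print(encode("ATTACK"))
    # ['00000', '10011', '10011', '00000', '00010', '01010']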

Introduce the Enlightenment and the Industrial Revolution. Discuss basic electromagnetism if needed. Discuss the need for communication to achieve coordination, and the parallel growth of railroads and communication. Morse code, and the mechanics of receiving and forwarding discrete messages throughout a telegraph network. Do this in a way that motivates the packet-switched network and introduces basic concepts like point-to-point, broadcast and routing. Also cover the growth in the publishing of books of tables of transcendental functions for engineers and scientists.
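
A sketch of store-and-forward relaying over an invented three-hop network; each station hands the complete message one hop closer to its destination, which is the germ of packet routing:

    NEXT_HOP = {  # (current station, destination) -> neighbor to forward to
        ("Boston", "Chicago"): "Albany",
        ("Albany", "Chicago"): "Buffalo",
        ("Buffalo", "Chicago"): "Chicago",
    }

    def relay(source, destination, message):
        station = source
        while station != destination:
            nxt = NEXT_HOP[(station, destination)]
            print(f"{station} forwards to {nxt}: {message!r}")
            station = nxt
        print(f"{destination} delivers: {message!r}")

    relay("Boston", "Chicago", "TRAIN 7 DELAYED")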

Cover more logic: Boolean algebra. Discuss how the rules of Boolean algebra can be carried out by simple mechanical devices. Introduce the notion that binary addition is identical to simple Boolean expressions.
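
The identity can be shown in a few lines: the sum bit of one-bit addition is exclusive OR, and the carry bit is AND. A sketch:

    for a in (0, 1):
        for b in (0, 1):
            sum_bit = (a or b) and not (a and b)  # exclusive OR
            carry = a and b                       # AND
            print(f"{a} + {b} = carry {int(carry)}, sum {int(sum_bit)}")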

Talk about the tedium of weaving, especially for complex Jacquard fabrics. Describe the method of weaving them prior to the mechanical loom. Discuss how punched cards were used to control the needles and could be laced into a loop to repeat the pattern. This allowed one loom to create many different patterns by changing the cards; the master weaver could then create card decks for the various patterns.

Discuss Babbage's first creation, the Difference Engine, which mechanized the tabulation of polynomial values with a geared machine; input was via dials and levers. Discuss how his later Analytical Engine used the Jacquard innovation to make machine setup easier, and Ada Lovelace describing the sequences of cards that would solve a problem. Hollerith, the 1890 census and the success of punched cards for holding information.

Introduce the typewriter as a descendant of the movable-type press. Show how the combination of telegraph and typewriter led to the teletype. Discuss the difficulty of having national codes when communication goes global. Cover the Baudot code if time allows, then the ASCII code. Multiplexing to combine signals onto a shared channel and separate them again.
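
A few lines suffice to show what ASCII actually assigns, one fixed seven-bit pattern per character, so any text can travel as a stream of bits:

    for ch in "Hi!":
        print(ch, format(ord(ch), "07b"))
    # H 1001000
    # i 1101001
    # ! 0100001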

The telephone. Analog versus digital; the waveform. How to digitize? PCM. How do you represent digital using analog? The modem. Circuit switching can be covered if desired. Emphasize bandwidth. If desired, cover the carrier signal and its encoding, and explain this as the mechanism behind most DSL.
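
A toy sketch of the PCM step: sample one cycle of a sine wave and quantize each sample to four bits. The sample rate and bit depth are invented classroom values, not telephony standards:

    import math

    SAMPLES = 8   # samples per cycle (toy value)
    LEVELS = 16   # 4-bit quantization

    pcm = []
    for i in range(SAMPLES):
        analog = math.sin(2 * math.pi * i / SAMPLES)      # -1.0 .. 1.0
        digital = round((analog + 1) / 2 * (LEVELS - 1))  # 0 .. 15
        pcm.append(digital)

    print(pcm)  # [8, 13, 15, 13, 8, 2, 0, 2] -- the wave, in 4-bit steps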

There is a great deal of history on non-contemporary computer architecture; I skip it and jump ahead to the concept of gates. I like to emphasize that a gate can be implemented with very simple switches. Progress through the innovations that sped up switching: relays, tubes, transistors.

Revisit logic, introduce NAND and NOR, and then describe the half-adder circuit. Make the tie between logic and arithmetic explicit. Motivate the need for speed with a discussion of military uses: munitions that could fire beyond the horizon, making visual correction impossible; the element of surprise that is lost if the first shell does not land on target; the widespread use of "computers" (people) to do many complex algebraic computations. This led to the rush to create a machine to perform the theoretical calculations needed to create the A-bomb and later the H-bomb.
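
As for the half-adder itself, here is a sketch of it built from nothing but NAND gates, which also shows that NAND alone is universal; four NANDs make the XOR (sum) and a fifth, inverting the shared gate, makes the AND (carry):

    def nand(a, b):
        return 1 - (a & b)

    def half_adder(a, b):
        n1 = nand(a, b)
        total = nand(nand(a, n1), nand(b, n1))  # XOR of a and b
        carry = nand(n1, n1)                    # NOT n1, i.e. a AND b
        return carry, total

    for a in (0, 1):
        for b in (0, 1):
            print(a, "+", b, "->", half_adder(a, b))  # (carry, sum)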

The Red Scare and the race for space spurred new innovations like the integrated circuit. Moore's law.




privacy (anonymity) versus secrecy

**********************************

Addendum. I am reconsidering this approach: more math must be introduced earlier. While I still believe it is better to introduce language and its various representations before math, math should still appear almost immediately in the form of numeric representation. I think the leap from addition and subtraction to multiplication and division would be good, particularly at the community college level, since long division offers a good introductory algorithm that is not beyond the students' mathematical maturity. This ties in nicely with the transition from clocks to calculators, whose builders grappled with these operations using gears.
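
Long division really is a tidy introductory algorithm. A sketch of it written out step by step, for non-negative decimal operands:

    def long_division(dividend, divisor):
        quotient_digits = []
        remainder = 0
        for digit in str(dividend):
            remainder = remainder * 10 + int(digit)  # bring down the next digit
            quotient_digits.append(remainder // divisor)
            remainder %= divisor
        return int("".join(map(str, quotient_digits))), remainder

    print(long_division(1234, 7))  # (176, 2): 176 * 7 + 2 == 1234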

Structuring Software Engineering Case Studies to Cover Multiple Perspectives

This post is my riff on a paper titled Structuring Software Engineering Case Studies to Cover Multiple Perspectives by Emil Boerjesson and Robert Feldt from Chalmers University. The paper offers suggestions on how multiple perspectives can be ensured by using a six-step process. In their case study, the authors wanted to ensure they looked at the V&V process from four different perspectives: Business, Architecture, Process and Organization (BAPO), as well as from three distinct temporal perspectives: past, current and future (PCF).

The paper does not make any deep contribution to the case study approach to software engineering research; however, it does provide an accessible starting point for understanding how case studies can be used in software engineering research.

They characterize a case study as:

  • an observational research method,
  • a way to investigate a phenomenon in its context,
  • applicable when there is no clear distinction between the phenomenon and its context,
  • covered by guidelines recently published by Runeson and Hoest [1].

Their six-step methodology is:
  1. Get knowledge about the domain
  2. Develop focus questions/areas
  3. Choice of detailed research methods
  4. Data collection
  5. Data analysis and alignment
  6. Valuation discussion

[1] P. Runeson and M. Hoest, "Guidelines for conducting and reporting case study research in software engineering," Empirical Software Engineering, vol. 14, no. 2, pp. 131–164, 2009.


The Accretion of Structure in Large, Dynamic Software Systems: A Socio-Technical View

On the off chance that anyone is really following this blog, apologies. If you acknowledge reading this I will be more inclined to post more often. But this blog serves as my sounding board for my developing research thoughts. The title of this post could potentially be a dissertation, but it is too ambitious to be mine. Still, it indicates the direction my thoughts are taking.

Large software systems that reach any great level of popularity do not fall from the sky fully formed. They may have germinated from the seed of some idea that one individual had, or they may have been created to solve some problem. Either way, that is only the origin story of the product. Reaching its mature form required the efforts of many people over some period of time, and we classically refer to this as a project. A project has the creation of the product as its end goal. Once that goal is achieved, the classical view is that some organization, perhaps one created coincident with the creation of the product, will take ownership of the product and use that asset for the betterment of the organization. If the single product is the sole asset of that organization, the organization's fortunes will be tied to the product life-cycle of that asset, and it will cease to exist with the retirement of that product.

Contemporary software engineering research owes a debt of gratitude to the open source movement. The benefit of the movement to research is the open nature of the work. Rather than guard the artifacts of software development behind the shroud of proprietary secrets, the open source community thrives on an atmosphere of complete transparency. Even better, it has a tradition of maintaining these records for posterity, making longitudinal study possible in a way that is unthinkable with for-profit corporate development. The oldest and best known of these efforts is the Apache web server and the organization it spawned. It is this organization, and the many products tied to it, that has been my latest object of research.

The history of the Apache Software Foundation is well known. It has matured as an organization over the years and epitomizes a form of software creation and stewardship that is neither completely altruistic and selfless nor constrained by the commercial marketplace for software. As such, it may, or may not, result in software that is qualitatively different over the product life-cycle than the products from such well known software houses as Microsoft and Oracle. But whether they are different or not, it is possible to study the inception, growth, decay and retirement of these software products in ways that are impossible for commercial products. It is the forces of this evolution that I am drawn to and am looking at very closely in the various Apache projects. (Note that unlike a classic project, an Apache project is really an organization that is responsible for the creation, maintenance and general stewardship of the software product until its retirement.)

One thesis I hold is that program correctness is not really the central concern of dynamic code maintenance that most of my colleagues think it is. Rather, I believe it is the change in the non-functional behavior of the product that drives more decisions and ultimately predicts the failure of the product. The froth that is seen around the correctness of the product is certainly important. But a product that fails to deliver in performance, scalability, maintainability, adaptability or any of dozens of other qualities will either find a niche or will need to be re-engineered to meet these new challenges.

I say re-engineered in the sense that the small focus of most enhancement requests or bug reports does not allow for a scope sufficient to address the refactoring that is most often needed to achieve these ends. In many cases the skills gained in the creation of the product result in a functional team that is motivated either to significantly enhance the product to give it those qualities, or to leave the product as is and create a new one, via fork or greenfield development, that will improve upon the prior product in these key qualities. Often the relationships between these distinct products are overlooked, and the debt one product owes to a prior product is lost in the mists of time.

The history of the Apache Software Foundation is celebrated. But every day there are projects that choose to put their products into retirement because of lack of interest. If other projects were spawned by a project, they are sporadically documented, sometimes in the project archives and communications, sometimes through the press that covers open source development. What I have lately come to realize is that there is no interest in documenting this history, and as web links rot or repositories are taken offline, history is being lost every day. It is perhaps quixotic to concern myself with trying to prevent this complete loss, but I do. As such, I am trying to document as many origin stories for Apache projects as I can, in the belief that if I do not, some of this information will eventually become unobtainable. What is harder is to justify this effort in terms of my own research.

No major software product exists that did not require thousands of decisions, some big, most small, made by the team members over time. Software developers are notoriously averse to documentation, so these decisions are as often inferred as observed. This is where Big Data and the statistical methods of empirical software engineering are most helpful. But where the data is not available in a form usable by these techniques, the techniques are useless. Since gathering this data and putting it into a form suitable for quantitative analysis is exceedingly tedious, it is usually not done. In this reticence to tackle difficult data gathering tasks, I see an opportunity: to do the deep dive into the more interesting decisions made during software development and their relationship to the form and structure of the software system, and also to make a significant contribution to the preservation of some important history that may have value for researchers in the future. Perhaps I am suffering from a touch of hubris in this, but it helps my motivation to feel that my contribution may surpass my own prosaic goals of satisfying institutional requirements for a degree or getting some papers published.

I'll end my post here since the work is ongoing and quite fluid. But this post gives me a short statement that I can share with others who may take an interest in my current direction.


Saturday, March 23, 2013

Quotes from The End of History and the Last Man by Francis Fukuyama. First Free Press trade paperback edition 2006

A seatmate of mine on a flight from DC to Sacramento once suggested that this book was influential for the neocon movement, and hence it went on my list of books to read. What follows are quotes that I might be able to use in the future.

"But the truth is considerably more complicated, for the success of liberal politics and liberal economics frequently rests on irrational forms of recognition that liberalism was supposed to overcome. For democracy to work, citizens need to develop an irrational pride in their own democratic institutions, and must also develop what Tocqueville called the 'art of associating' which rests on the prideful attachment to small communities. These communities are frequently based on religion, ethnicity, or other forms of recognition that fall short of the universal recognition on which the liberal state is based. The same is true for liberal economics. Labor has traditionally been understood in the Western liberal economic tradition as an essentially unpleasant activity undertaken for the sake of the satisfaction of human desires and the relief of human pain. But in certain cultures with a strong work ethic, such as that of the Protestant entrepreneurs who created European capitalism, or of the elites who modernized Japan after the Meiji restoration, work was also undertaken for the sake of recognition. To this day, the work ethic in many Asian countries is sustained not so much by material incentives as by the recognition provided for work by overlapping social groups, from the family to the nation, on which these societies are based. This suggests that liberal economics succeeds not simply on the basis of liberal principles, but requires irrational forms of thymos as well."
pp. xix–xx


Friday, January 25, 2013

Our paper this week is Echoes of Power: Language Effects and Power Differences in Social Interaction, which explores how to identify power differences between people in a domain-independent way. http://www.mpi-sws.org/~cristian/Echoes_of_power_files/echoes_of_power.pdf

The central thought is that the way in which one person "coordinates" their linguistic style to the style of the person or group with which they are communicating can be an indicator of the power relationship between them. If this is true for open source software as well as for the Wikipedia and Supreme Court corpora they explore, it can be exploited to create a power hierarchy in these open source projects. With a graph of the power network it may be possible to infer project outcomes from various network metrics of that power network.

This paper also makes me wonder to what extent it could be extended to code style. The references suggest that they are building on other work that finds a prose style characteristic of an individual. Given the high level of semantics in the tokens and the rigid syntax of a computer program, is it even possible to find some domain-independent marker of programmer style? I know this has been extensively explored, and this paper makes me more curious to see what has already been found.

Friday, August 10, 2012

Free Software and Communism

Sac State has a program through MSDN that provides Microsoft's current OS to students for free. UC Davis does not (or at least I haven't found it yet if they do). Curious. But the bigger issue is my changing thoughts about what I expect to spend for software.

In the 30 years I have had an association with Microsoft I have probably spent in excess of $10,000 buying their products. Despite this continued relationship, Microsoft doesn't know I exist and provides some of the worst customer service I have ever experienced. A few times, when it was a bug in their product I wanted to talk about, I was given the run-around or asked to fork over $100 to talk to someone. Microsoft has historically charged too much for an inferior product for far too long. They seem to relate to the consumer market the way a farmer relates to a field crop.

While I myself am not the sort inclined to tinker with an OS, many people I have known are. Linux and the open source movement have been incredibly empowering for those of us who have the capability and inclination to tinker. First, it is free. That of course is a huge advantage, seeing that Microsoft has the audacity to float an undiscounted price for Windows 7 that is over $500. If I hadn't been getting their products for free from school I would have abandoned them long ago. There is a certain hubris in asking the very people they might want to recruit to write operating systems to spend scarce resources during their education buying Microsoft products. It would be in Microsoft's interest to ensure that every CS major receives generous support from them, if for no other reason than good PR. So at some level good CS students must find it galling that they are asked to pay a high price for an inferior product. Is it any wonder that students are often the most creative in finding ways to avoid paying for these products?

Since CS students, and the professionals they become, use software as the tools of their trade daily over their careers, it makes perfect sense that they will be the most critical consumers of these tools. Any worker who uses a tool daily will choose and maintain her tools with great care. When the market provides only a mediocre tool, there is an opportunity for some worker to turn their attention to the tool itself. Of course many workers do not have the skills to tinker with their tools. A machine shop worker probably does not have the technology at hand to tinker with a high-precision lathe. But a software engineer with the source code for a major piece of systems software does. While it is only a minority of these engineers who are inclined to tinker, it is enough to create an alternative in the tool market. We have certainly seen that exhibited in the market for open source software and the active support it gets.

In some idealized libertarian universe, these talented tinkerers could have formed an alternative to Microsoft and made some money from their collective efforts. But creating and managing a business is not a trivial affair, and it is certainly not what motivates talented software engineers. Engineers in general are great admirers of functionality, not of management. There are significant intrinsic rewards in the creation of a good product but few, for them, in its marketing and sales. Yes, there are some rare individuals who excel at both. But their scarcity relative to the number of highly talented engineers is precisely my point. The open source model was appealing because it gave them control over their means of production and empowered them to do things for their clients that are impossible in a closed model.

Some have tried to cast these engineers as proto-communists contributing to a common good ("from each according to his ability, to each according to his needs"). But if that attitude exists in the open source community, I have not yet seen it baldly stated. Rather than being driven by some ideological position, the market seems to be moving along a highly pragmatic path of least resistance. The closed software market has not met these engineers' need to provide high-quality, responsive solutions to the clients using their tools. Instead of forming highly responsive relationships with these integrators, the companies seem to have put the integrators' needs behind their own. Doesn't it make sense that someone responsible for providing the company with a high-quality web server would be more inclined to support Apache than IIS? It is in their self-interest to choose this simply because they have greater control over the resolution of any problem or enhancement than they would over a closed product at the leading edge of its deployment. The fact that other people gain access to a high quality product as a consequence is irrelevant to them and takes nothing from them. In fact, the widespread adoption of Apache enhances their marketability, since employers who desire to avoid IIS, for whatever reason, will seek them out for employment.

No, I am not finding any incipient communism in the growing open source movement, but instead a marketplace that is responding to the maxim "information wants to be free" plus an enlightened self-interest among the most talented software engineers of our day. They seem to be having a blast. Now if we can only figure out where the margins of this new market lie.

Wednesday, August 1, 2012

A First Look at Commit Data for an Open Source Software Project

I got a little something to sink my teeth into. I am looking at commit data for some open source projects. This is mostly an exercise in regaining my SQL chops and learning R. Here is my first plot:


The plot has the alias of the submitter along the X axis and the timestamp along the Y axis. What you can see for this one project is that there are two heavily dedicated submitters who both started working on the project at about the same time; one who is more sporadic and started shortly before them; and two who appear to have started the project but commit only periodically, although their activity is relatively consistent over the entire length of the project. What is somewhat surprising is how many submitters have almost no commit activity (there is some doubt regarding whether this alias is the submitter or the committer, although it is supposed to be the commit id). It seems odd that someone would gain committer status and then stop committing. Will I see this pattern in other projects? Who are these heavy committers, and how do they differ in role from the people who appear to have started the project?
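
For the record, here is a minimal sketch of how such a plot can be produced, in Python rather than R; the commit list and aliases below are stand-ins for the real query results:

    import matplotlib.pyplot as plt
    from datetime import datetime

    commits = [  # (alias, timestamp) pairs, as they might come from SQL
        ("alice", datetime(2010, 3, 1)), ("alice", datetime(2010, 4, 2)),
        ("bob",   datetime(2010, 3, 15)), ("carol", datetime(2011, 1, 7)),
    ]

    aliases = sorted({alias for alias, _ in commits})
    x = [aliases.index(alias) for alias, _ in commits]
    y = [ts for _, ts in commits]

    plt.scatter(x, y, s=8)
    plt.xticks(range(len(aliases)), aliases, rotation=90)
    plt.xlabel("submitter alias")
    plt.ylabel("commit timestamp")
    plt.show()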