Saturday, May 4, 2013

An Outline for a New GE Course in Computer Science

I have just completed teaching a course at a community college for the second time. Before the volatile thoughts I had are completely lost, I want to sketch out the ideas I have about how a course like this should be taught. What I find generally lacking is a narrative thread that connects the disparate arcana that must be covered into a motivating whole.

I. Introduction
Is this course really about "computer science"? If so, what is "information technology"? "Management information systems"? And why is it focused on "computation"? What I teach is really information technology and the central role of information for individuals, groups, organizations and society. While the motivation for most innovations through history has been computation or communications, we are now looking toward an information convergence in which all forms of information are handled by the same technology. To be an educated and productive member of our society you must gain a basic understanding of this technology, how it is used and the various ways you can approach it in your academic studies.

Show some entertaining Rube Goldberg machines and discuss causality and the design of mechanisms to achieve some end. Emphasize that computers are machines. While details may be daunting, there is nothing going on that a motivated person cannot understand.

I like to present computers as part of a continuum of mechanized development. The tie between Jacquard and information processing is well known; I think the tie between continued development and the industrial age is not fully explored. This is important simply because students with mechanical aptitude will find the mechanistic aspect of computers easy to grasp, and dwelling on the more mechanical nature of the machines avoids alienating those students who lack an intuitive grasp of machines.



II. History
What is information? Is there information without humans? Does it serve human needs? If so, which ones? Who historically funded the innovations in information technology and communications? Commerce, the military, government, religion; eventually we see entertainment. Taking commerce as a starting point, you must understand the basics of numbers and numeric representation. This motivates an algebraic understanding of our positional number system and its variants in different bases. Depending on the depth possible in the course, negative numbers and real numbers can be covered in binary notation. Even more interesting is the digitization of numbers, even in real-number representation, and the inherently limited size of machine memory. No physical memory is unlimited, so every memory places an inherent limit on the size of the numbers it can represent. An interesting philosophical point is the innovation of zero and the difficulty of representing nothing. This comes up again with null sets, representing a space, and other places in the curriculum.
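A short in-class sketch can make the memory-limit point concrete. This is my own illustrative Python, not taken from any particular curriculum; the function name and width are arbitrary:

```python
def to_binary(n, bits):
    """Write a non-negative integer as a fixed-width binary string.

    A register of `bits` bits can only hold values 0 .. 2**bits - 1;
    anything larger simply does not fit -- no physical memory is unlimited.
    """
    if not 0 <= n < 2 ** bits:
        raise OverflowError(f"{n} does not fit in {bits} bits")
    return format(n, f"0{bits}b")

print(to_binary(13, 8))   # 00001101
print(2 ** 8 - 1)         # 255, the largest value an 8-bit register can hold
```

Trying `to_binary(256, 8)` raises an error, which is the whole point: the representation, not the mathematics, imposes the limit.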

Cover the basic arithmetic algorithms for converting one base to another versus the conceptual understanding of those representations. The algorithm for long division. The concept of an algorithm as a series of steps executed to achieve some result. Summarize the rules of binary addition in a table.
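The conversion algorithm itself is short enough to show in class. A sketch in Python (my own illustration; the function name is arbitrary):

```python
def convert_base(n, base):
    """Convert a non-negative integer to the given base by repeated
    division, collecting remainders -- the same steps students do by hand."""
    digits = "0123456789ABCDEF"
    if n == 0:
        return "0"
    out = []
    while n > 0:
        n, r = divmod(n, base)   # quotient carries on, remainder is a digit
        out.append(digits[r])
    return "".join(reversed(out))

print(convert_base(45, 2))    # 101101
print(convert_base(255, 16))  # FF
```

The binary addition table (0+0=0, 0+1=1, 1+0=1, 1+1=0 carry 1) then lets students check small sums against the converted values.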

Introduce the concept of mathematical series to calculate irrational numbers. Present enough to convince students that this work becomes tedious. Conclude with the method of calculating polynomial expressions using a series of differences.
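The method of differences is easy to demonstrate with a few lines of Python (my own sketch): tabulating n² needs nothing but addition once the initial differences are known, which is exactly what made it mechanizable.

```python
def differences(values):
    """One round of finite differences between consecutive table entries."""
    return [b - a for a, b in zip(values, values[1:])]

squares = [n * n for n in range(7)]   # [0, 1, 4, 9, 16, 25, 36]
first = differences(squares)          # [1, 3, 5, 7, 9, 11]
second = differences(first)           # [2, 2, 2, 2, 2] -- constant for degree 2

# Rebuild the table forward using addition alone:
val, diff = 0, 1
table = [val]
for _ in range(6):
    val += diff      # next table entry
    diff += 2        # add the constant second difference
    table.append(val)
print(table)         # [0, 1, 4, 9, 16, 25, 36]
```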

Jump to text and some of its innovations. The transition from an oral society to a literate one. The first representation of spoken words through writing. The Code of Hammurabi. Pictographic representation of thoughts. Development of the phonetic alphabet. The limited growth of literacy; look at literacy rates in 18th-century France for grounding.

Use the Greeks to discuss discourse and rhetoric. The codification of the laws of argumentation and thinking. The introduction of deductive logic. Cover the basic forms AND, OR (both inclusive and exclusive) and NOT. Introduce truth tables and the nesting of expressions.
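Truth tables are also a natural place for a first bit of code. A hypothetical classroom snippet in Python, treating 1 as true and 0 as false:

```python
from itertools import product

# Build the truth table for AND, inclusive OR, exclusive OR and NOT.
rows = []
for a, b in product([0, 1], repeat=2):
    rows.append((a, b, a & b, a | b, a ^ b, 1 - a))

print("A B | AND OR XOR NOT-A")
for a, b, and_, or_, xor_, not_a in rows:
    print(f"{a} {b} |  {and_}   {or_}   {xor_}    {not_a}")
```

Nested expressions like `(a & b) | (1 - c)` can then be evaluated row by row in the same way.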

I like to cover the reality of the Medieval period in Europe and the growth of the Islamic world, since major innovations occurred outside Europe and were introduced during the Renaissance: Arabic numerals, zero, algebra and double-entry bookkeeping. It was the growth of trade that made this bookkeeping necessary; trade also provided the explosive growth that led to the middle class, the growth of literacy, the Reformation and the importance of the moveable-type press. Lower case, typesetting, italics, serifs: these are all needed to understand web technologies and word processors.

Cover the concept of a variable in algebra if needed: the unknown, which becomes a container for a value to be determined. If possible and necessary, cover the rules of algebra such as the commutative, associative and distributive laws.

Discuss early attempts at mechanical computation like the abacus, used for commerce and still popular, and the Pascal calculator. Show how innovations in timekeeping and metallurgy made this possible. Discuss the decade counter (carry) and the odometer. If desired, discuss how a negative can be represented on an odometer (-1 = 9999) and tie this back to complement representation (in binary, two's complement).
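The odometer analogy translates directly into code. A small sketch (mine, with an arbitrary register width) showing that the binary analog of 9999 is a register of all ones, and that wraparound addition performs subtraction:

```python
BITS = 4
MOD = 2 ** BITS          # a 4-bit register holds 16 distinct patterns

def pattern(n):
    """Bit pattern a 4-bit register shows for the signed integer n."""
    return format(n % MOD, f"0{BITS}b")

print(pattern(-1))       # 1111 -- the binary analog of the odometer's 9999
print(pattern(-2))       # 1110
print((5 + (-1)) % MOD)  # 4  -- subtracting 1 by adding its complement
```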

Communication innovations: lanterns in towers, flags. All depended on some system of alphabetic or message encoding. Discuss the difference and give examples; this motivates the ASCII code later on. Military applications and encoding for secrecy. Discuss bandwidth, parallel versus serial transmission, representing null messages, secrecy and ways of encrypting messages.

Introduce the Enlightenment and Industrial Revolution. Discuss basic electromagnetism if needed. Discuss the need for communication to achieve coordination and the parallel growth of railroads and communication. Morse code, and the mechanics of receiving and forwarding discrete messages throughout a telegraph network. Do this in a way that motivates the packet-switched network and introduces basic concepts like point-to-point, broadcast and routing. Note the growth in the publishing of tables of transcendental functions for engineers and scientists.

Cover more logic: Boolean algebra. Discuss how the rules of Boolean algebra can be carried out by simple mechanical devices. Introduce the notion that binary addition is identical to simple Boolean expressions.

Talk about the tedium of weaving, especially complex Jacquard fabrics, and describe the method of weaving them prior to the mechanical loom. Discuss how punched cards were used to control the needles and could be looped together to repeat a pattern. This allowed one loom to create many different patterns by changing the cards; the master weaver could then create many card decks for the various patterns.

Discuss Babbage's first creation, the Difference Engine, which mechanized the evaluation of polynomials with a geared machine; input was via dials and levers. Discuss how he used the Jacquard innovation to make machine setup easier, and Ada Lovelace describing cards that would solve a problem. Hollerith, the 1890 census and the success of punched cards for holding information.

Introduce the typewriter as an innovation on the moveable-type press. Show how the combination of telegraph and typewriter led to the teletype. Discuss the difficulty of having national codes when communication goes global. Cover the Baudot code if time allows, then the ASCII code. Multiplexing to combine and separate signals.
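A quick way to ground ASCII is to print a few code points; Python's built-in `ord` gives the code directly (my own illustration):

```python
# ASCII assigns each character a 7-bit number; text on the wire is numbers.
for ch in ("A", "a", "0", " "):
    print(repr(ch), ord(ch), format(ord(ch), "07b"))
```

Students can see that 'A' (1000001) and 'a' (1100001) differ by a single bit, a deliberate design choice in the code.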

The telephone. Analog versus digital; the waveform. How do you digitize? PCM. How do you represent digital data using analog? The modem. Circuit switching can be covered if desired. Emphasize bandwidth. If desired, cover the carrier and its encoding, and explain this as the mechanism behind most DSL.
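PCM can be sketched in a few lines: sample the waveform at regular intervals and quantize each sample to a fixed number of levels. The sample count and level count below are arbitrary illustrations, not telephone-standard values:

```python
import math

SAMPLES_PER_CYCLE = 8
LEVELS = 16                     # 4-bit quantization

samples = []
for i in range(SAMPLES_PER_CYCLE):
    x = math.sin(2 * math.pi * i / SAMPLES_PER_CYCLE)  # analog value in [-1, 1]
    q = round((x + 1) / 2 * (LEVELS - 1))              # quantize to 0..15
    samples.append(q)
print(samples)   # eight small integers now stand in for the analog wave
```

Real telephony uses 8,000 samples per second at 8 bits per sample, but the principle is the same.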

There is a great deal of history on non-contemporary computer architecture. I skip it and jump ahead to the concept of gates. I like to emphasize that a gate can be implemented with very simple switches, then progress through the innovations that sped up switching: relays, tubes, transistors.

Revisit logic to introduce NAND and NOR, and then describe the half-adder circuit, making the tie between logic and arithmetic explicit. Motivate the need for speed with a discussion of military uses: munitions that could fire beyond the horizon, making visual correction impossible; the element of surprise that is lost if the first shell does not land on target; the widespread use of "computers" (people) to do many complex algebraic computations. This leads to the rush to create a machine to perform the theoretical calculations needed to create the A-bomb and later the H-bomb.
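The half adder makes the logic-arithmetic tie concrete in two operators. A minimal sketch of my own in Python:

```python
def half_adder(a, b):
    """Add two bits: the sum is XOR, the carry is AND -- nothing but logic."""
    return a ^ b, a & b   # (sum bit, carry bit)

for a in (0, 1):
    for b in (0, 1):
        s, c = half_adder(a, b)
        print(f"{a} + {b} -> carry {c}, sum {s}")
```

Chaining two half adders with an OR gate on the carries yields the full adder, and a row of full adders adds whole binary numbers.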

The Red Scare and the race for space spurred new innovations like the integrated circuit. Moore's law.




privacy (anonymity) versus secrecy

**********************************

Addendum. I am reconsidering this approach in that more math must be introduced earlier. While I still believe it is better to introduce language and its various representations before math, math should still be introduced almost immediately in the form of numeric representation. I think the leap from addition and subtraction to multiplication and division would be good, particularly at the community college level, since long division offers a good introductory algorithm that is not beyond students' mathematical maturity. This ties in nicely with the transition from clocks to calculators, whose builders grappled with these operations using gears.

Structuring Software Engineering Case Studies to Cover Multiple Perspectives

This post is my riff on a paper titled Structuring Software Engineering Case Studies to Cover Multiple Perspectives by Emil Börjesson and Robert Feldt of Chalmers University. The paper offers suggestions on how multiple perspectives can be ensured by using a six-step process. In their case study, they wanted to ensure they looked at the V&V process from four different perspectives: Business, Architecture, Process and Organization (BAPO), as well as from three temporal perspectives: past, current and future (PCF).

The paper does not make a deep contribution to the case study approach to software engineering research; however, it does provide an accessible starting point for understanding how case studies can be used in software engineering research.

They call a case study:

  • an observational research method,
  • a way to investigate a phenomenon in its context,
  • applicable when there is no clear distinction between the phenomenon and its context,
  • covered by guidelines recently published by Runeson and Höst [1]

Their six-step methodology is:
  1. Get Knowledge About the Domain
  2. Develop Focus Questions/Areas
  3. Choice of Detailed Research Methods
  4. Data Collection
  5. Data Analysis and Alignment
  6. Valuation Discussion














[1] P. Runeson and M. Höst, "Guidelines for conducting and reporting case study research in software engineering," Empirical Software Engineering, vol. 14, no. 2, pp. 131-164, 2009.


The Accretion of Structure in Large, Dynamic Software Systems: A Socio-Technical View

On the outside chance there is really anyone following this blog, apologies. If you acknowledge reading this I will be more inclined to post more often. But this blog serves as my sounding board for my developing research thoughts. The title of this post could potentially be a dissertation but it is too ambitious to be mine. Yet it does indicate the direction my thoughts are taking.

Large software systems that reach any great level of popularity do not fall from the sky fully formed. They may have germinated from the seed of some idea one individual had, or they may have been created to solve some problem. Either way, that is only the origin story of the product. Reaching its mature form required the efforts of many people over some period of time, and we classically refer to this as a project. A project has the creation of the product as its end goal. Once that goal is achieved, the classical view is that some organization, perhaps one created coincident with the creation of the product, will take ownership of the product and use that asset for the betterment of the organization. If the single product is the sole asset of that organization, the organization's fortune will be tied to the product life-cycle of that asset, and it will cease to exist with the retirement of that product.

Contemporary software engineering research owes a debt of gratitude to the open source movement. The benefit of the movement to research is the open nature of the work. Rather than guard the artifacts of software development behind the shroud of proprietary secrets, the open source community thrives on an atmosphere of complete transparency. Even better, it has a tradition of maintaining these records for posterity, making longitudinal study possible in a way that is unthinkable with for-profit corporate development. The oldest and best known of these efforts is the Apache web server and the organization it spawned. It is this organization, and the many products tied to it, that has been my latest object of research.

The history of the Apache Software Foundation is well known. It has matured as an organization over the years and epitomizes a form of software creation and stewardship that is neither completely altruistic and selfless nor constrained by the commercial marketplace for software. As such, it may, or may not, produce software that is qualitatively different over the product life-cycle from the products of such well known software houses as Microsoft and Oracle. But whether they are different or not, it is possible to study the inception, growth, decay and retirement of these software products in ways that are impossible for commercial products. It is the forces of this evolution that I am drawn to and am looking very closely at in the various Apache projects. (Note that unlike a classic project, an Apache project is really an organization responsible for the creation, maintenance and general stewardship of the software product until its retirement.)

A thesis I have is that program correctness is not the most important aspect of dynamic code maintenance that most of my colleagues think it is. Rather, I believe it is change in the non-functional behavior of the product that drives more decisions and ultimately predicts the product's failure. The froth around the correctness of the product is certainly important, but a product that fails to deliver on performance, scalability, maintainability, adaptability or any of dozens of other qualities will either find a niche or need to be re-engineered to meet these new challenges.

I say re-engineered in the sense that the small focus of most enhancement requests and bug reports does not allow a scope sufficient to address the refactoring that is most often needed to achieve these ends. In many cases the skills gained in creating the product result in a functional team that is motivated either to significantly enhance the product to give it those qualities, or to leave the product as is and create a new one, via a fork or greenfield development, that improves upon the prior product in these key qualities. Often the relationships between these distinct products are overlooked, and the debt one product owes to a prior product is lost in the mists of time.

The history of the Apache Software Foundation is celebrated. But every day there are projects that choose to retire their products for lack of interest. If other projects were spawned by a project, they are sporadically documented, sometimes in the project archives and communications, sometimes through the press that covers open source development. What I have lately come to realize is that there is little interest in documenting this history, and as web links rot and repositories are taken offline, history is being lost every day. It is perhaps quixotic to concern myself with trying to prevent this loss, but I do. As such, I am trying to document as many origin stories for Apache projects as I can, in the belief that if I do not, some of this information will eventually become unattainable. What is harder is to justify this effort in terms of my own research.

No major software product exists that did not require thousands of decisions, some big, most small, made by team members over time. Software developers are notoriously averse to documentation, so these decisions are as often inferred as observed. This is where Big Data and the statistical methods of empirical software engineering are most helpful. But where the data is not available in a form usable by these techniques, the techniques are useless. Since gathering this data and putting it into a form suitable for quantitative analysis is exceedingly tedious, it is usually not done. In this reluctance to tackle difficult data-gathering tasks, I see an opportunity: to do a deep dive into the more interesting decisions made during software development and their relationship to the form and structure of the software system, and also to make a significant contribution to the preservation of some important history that may have value for researchers in the future. Perhaps I am suffering from a touch of hubris in this, but it helps my motivation to feel that my contribution may surpass my own prosaic goals of satisfying institutional requirements for a degree or getting some papers published.

I'll end my post here since the work is ongoing and quite fluid. But this post gives me a short statement that I can share with others who may take an interest in my current direction.