
Sunday, November 3, 2019

Reading List: Sunburst and Luminary

Eyles, Don. Sunburst and Luminary. Boston: Fort Point Press, 2018. ISBN 978-0-9863859-3-3.
In 1966, the author graduated from Boston University with a bachelor's degree in mathematics. He had no immediate job prospects or career plans. He thought he might be interested in computer programming due to a love of solving puzzles, but he had never programmed a computer. When asked, in one of numerous job interviews, how he would go about writing a program to alphabetise a list of names, he admitted he had no idea. One day, walking home from yet another interview, he passed an unimpressive brick building with a sign identifying it as the “MIT Instrumentation Laboratory”. He'd heard a little about the place and, on a lark, walked in and asked if they were hiring. The receptionist handed him a long application form, which he filled out, and he was then immediately sent to interview with a personnel officer. Eyles was amazed when the personnel man seemed bent on persuading him to come to work at the Lab. After reference checking, he was offered a choice of two jobs: one in the “analysis group” (whatever that was), and another on the team developing computer software for landing the Apollo Lunar Module (LM) on the Moon. That sounded interesting, and the job had another benefit attractive to a 21-year-old just graduating from university: it came with deferment from the military draft, which was going into high gear as U.S. involvement in Vietnam deepened.

Near the start of the Apollo project, MIT's Instrumentation Laboratory, led by the legendary “Doc” Charles Stark Draper, won a sole source contract to design and program the guidance system for the Apollo spacecraft, which came to be known as the “Apollo Primary Guidance, Navigation, and Control System” (PGNCS, pronounced “pings”). Draper and his laboratory had pioneered inertial guidance systems for aircraft, guided missiles, and submarines, and had in-depth expertise in all aspects of the challenging problem of enabling the Apollo spacecraft to navigate from the Earth to the Moon, land on the Moon, and return to the Earth without any assistance from ground-based assets. In a normal mission, it was expected that ground-based tracking and computers would assist those on board the spacecraft, but in the interest of reliability and redundancy it was required that completely autonomous navigation would permit accomplishing the mission.

The Instrumentation Laboratory developed an integrated system composed of an inertial measurement unit consisting of gyroscopes and accelerometers that provided a stable reference from which the spacecraft's orientation and velocity could be determined, an optical telescope which allowed aligning the inertial platform by taking sightings on fixed stars, and an Apollo Guidance Computer (AGC), a general purpose digital computer which interfaced to the guidance system, thrusters and engines on the spacecraft, the astronauts' flight controls, and mission control, and was able to perform the complex calculations for en route maneuvers and the unforgiving lunar landing process in real time.

Every Apollo lunar landing mission carried two AGCs: one in the Command Module and another in the Lunar Module. The computer hardware, basic operating system, and navigation support software were identical, but the mission software was customised due to the different hardware and flight profiles of the Command and Lunar Modules. (The commonality of the two computers proved essential in getting the crew of Apollo 13 safely back to Earth after an explosion in the Service Module cut power to the Command Module and disabled its computer. The Lunar Module's AGC was able to perform the critical navigation and guidance operations to put the spacecraft back on course for an Earth landing.)

By the time Don Eyles was hired in 1966, the hardware design of the AGC was largely complete (although a revision, called Block II, was underway which would increase memory capacity and add some instructions which had been found desirable during the initial software development process), the low-level operating system and support libraries (implementing such functionality as fixed point arithmetic, vector, and matrix computations), and a substantial part of the software for the Command Module had been written. But the software for actually landing on the Moon, which would run in the Lunar Module's AGC, was largely just a concept in the minds of its designers. Turning this into hard code would be the job of Don Eyles, who had never written a line of code in his life, and his colleagues. They seemed undaunted by the challenge: after all, nobody knew how to land on the Moon, so whoever attempted the task would have to make it up as they went along, and they had access, in the Instrumentation Laboratory, to the world's most experienced team in the area of inertial guidance.

Today's programmers may be amazed it was possible to get anything at all done on a machine with the capabilities of the Apollo Guidance Computer, much less fly to the Moon and land there. The AGC had a total of 36,864 15-bit words of read-only core rope memory, in which every bit was hand-woven to the specifications of the programmers. As read-only memory, the contents were completely fixed: if a change was required, the memory module in question (which was “potted” in a plastic compound) had to be discarded and a new one woven from scratch. There was no way to make “software patches”. Read-write storage was limited to 2048 15-bit words of magnetic core memory. The read-write memory was non-volatile: its contents were preserved across power loss and restoration. (Each memory word was actually 16 bits in length, but one bit was used for parity checking to detect errors and not accessible to the programmer.) Memory cycle time was 11.72 microseconds. There was no external bulk storage of any kind (disc, tape, etc.): everything had to be done with the read-only and read-write memory built into the computer.

The AGC software was an example of “real-time programming”, a discipline with which few contemporary programmers are acquainted. As opposed to an “app” which interacts with a user and whose only constraint on how long it takes to respond to requests is the user's patience, a real-time program has to meet inflexible constraints in the real world set by the laws of physics, with failure often resulting in disaster just as surely as hardware malfunctions. For example, when the Lunar Module is descending toward the lunar surface, burning its descent engine to brake toward a smooth touchdown, the LM is perched atop the thrust vector of the engine just like a pencil balanced on the tip of your finger: it is inherently unstable, and only constant corrections will keep it from tumbling over and crashing into the surface, which would be bad. To prevent this, the Lunar Module's AGC runs a piece of software called the digital autopilot (DAP) which, every tenth of a second, issues commands to steer the descent engine's nozzle to keep the Lunar Module pointed flamy side down and adjusts the thrust to maintain the desired descent velocity (the thrust must be constantly adjusted because as propellant is burned, the mass of the LM decreases, and less thrust is needed to maintain the same rate of descent). The AGC/DAP absolutely must compute these steering and throttle commands and send them to the engine every tenth of a second. If it doesn't, the Lunar Module will crash. That's what real-time computing is all about: the computer has to deliver those results in real time, as the clock ticks, and if it doesn't (for example, it decides to give up and flash a Blue Screen of Death instead), then the consequences are not an irritated or enraged user, but actual death in the real world. Similarly, every two seconds the computer must read the spacecraft's position from the inertial measurement unit. If it fails to do so, it will hopelessly lose track of which way it's pointed and how fast it is going. Real-time programmers live under these demanding constraints and, especially given the limitations of a computer such as the AGC, must deploy all of their cleverness to meet them without fail, whatever happens, including transient power failures, flaky readings from instruments, user errors, and completely unanticipated “unknown unknowns”.

The software which ran in the Lunar Module AGCs for Apollo lunar landing missions was called LUMINARY, and in its final form (version 210) used on Apollo 15, 16, and 17, consisted of around 36,000 lines of code (a mix of assembly language and interpretive code which implemented high-level operations), of which Don Eyles wrote in excess of 2,200 lines, responsible for the lunar landing from the start of braking from lunar orbit through touchdown on the Moon. This was by far the most dynamic phase of an Apollo mission, and the most demanding on the limited resources of the AGC, which was pushed to around 90% of its capacity during the final landing phase where the astronauts were selecting the landing spot and guiding the Lunar Module toward a touchdown. The margin was razor-thin, and that's assuming everything went as planned. But this was not always the case.

It was when the unexpected happened that the genius of the AGC software and its ability to make the most of the severely limited resources at its disposal became apparent. As Apollo 11 approached the lunar surface, a series of five program alarms, with codes 1201 and 1202, interrupted the display of altitude and vertical velocity being monitored by Buzz Aldrin and read off to guide Neil Armstrong in flying to the landing spot. These codes both indicated out-of-memory conditions in the AGC's scarce read-write memory. The 1201 alarm was issued when all five of the 44-word vector accumulator (VAC) areas were in use when another program requested to use one, and 1202 signalled exhaustion of the eight 12-word core sets required by each running job. The computer had a single processor and could execute only one task at a time, but its operating system allowed lower priority tasks to be interrupted in order to service higher priority ones, such as the time-critical autopilot function and reading the inertial platform every two seconds. Each suspended lower-priority job used up a core set and, if it employed the interpretive mathematics library, a VAC, so exhaustion of these resources usually meant the computer was trying to do too many things at once. Task priorities were assigned so the most critical functions would be completed on time, but computer overload signalled something seriously wrong: a condition in which it was impossible to guarantee all essential work was getting done.
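The resource accounting behind the two alarm codes can be sketched in a few lines. This is only an illustrative model, not AGC code: the pool sizes and alarm numbers come from the description above, while the `Executive` class, its method names, and the order in which the two pools are checked are my own assumptions.

```python
# Illustrative model only: class and method names are hypothetical, not AGC code.
CORE_SETS = 8   # 12-word core sets, one per scheduled job (figure from the text)
VAC_AREAS = 5   # 44-word vector accumulator areas (figure from the text)


class ProgramAlarm(Exception):
    """Raised when a resource pool is exhausted."""

    def __init__(self, code):
        super().__init__(f"program alarm {code:04d}")
        self.code = code


class Executive:
    """Toy job table that tracks core sets and VAC areas in use."""

    def __init__(self):
        self.jobs = []  # entries are (priority, name, uses_vac)

    def schedule(self, name, priority, uses_vac=False):
        vacs_in_use = sum(1 for _, _, vac in self.jobs if vac)
        if uses_vac and vacs_in_use >= VAC_AREAS:
            raise ProgramAlarm(1201)   # no VAC area free
        if len(self.jobs) >= CORE_SETS:
            raise ProgramAlarm(1202)   # no core set free
        self.jobs.append((priority, name, uses_vac))

    def run_next(self):
        """Remove and return the name of the highest-priority waiting job."""
        if not self.jobs:
            return None
        self.jobs.sort(key=lambda job: job[0])
        return self.jobs.pop()[1]


executive = Executive()
try:
    # Jobs pile up faster than they complete: the ninth request finds no core set.
    for n in range(CORE_SETS + 1):
        executive.schedule(f"JOB{n}", priority=n)
except ProgramAlarm as alarm:
    print(alarm)   # program alarm 1202
```

The point of the sketch is that neither alarm reports a bug in any one job: each simply says that work is being requested faster than it is being retired.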

In this case, the computer would throw up its hands, issue a program alarm, and restart. But this couldn't be a lengthy reboot like customers of personal computers with millions of times the AGC's capacity tolerate half a century later. The critical tasks in the AGC's software incorporated restart protection, in which they would frequently checkpoint their current state, permitting them to resume almost instantaneously after a restart. Programmers estimated around 4% of the AGC's program memory was devoted to restart protection, and some questioned its worth. On Apollo 11, it would save the landing mission.
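The idea of restart protection can be sketched as follows. This is a minimal illustration under my own assumptions: the function name, the dictionary standing in for non-volatile read-write memory, and the way a restart is simulated are all hypothetical and bear no relation to the actual AGC phase tables.

```python
# Illustrative sketch only: names and structure are hypothetical, not AGC code.
def run_job(steps, saved, restarts_at=()):
    """Run `steps` in order, checkpointing after each one.

    `saved` stands in for non-volatile memory: it holds the index of the next
    step ("phase") and the results committed so far ("data"). A simulated
    restart discards the working copy and resumes from the last checkpoint.
    """
    pending = set(restarts_at)
    while saved["phase"] < len(steps):
        phase = saved["phase"]
        scratch = dict(saved["data"])   # working copy, lost on restart
        steps[phase](scratch)
        if phase in pending:            # simulated restart during this phase
            pending.discard(phase)
            continue                    # scratch is thrown away, phase unchanged
        saved["data"] = scratch         # commit the results...
        saved["phase"] = phase + 1      # ...and advance the checkpoint
    return saved["data"]


steps = [
    lambda d: d.update(a=1),
    lambda d: d.update(b=d["a"] + 1),
    lambda d: d.update(c=d["b"] * 10),
]

clean = run_job(steps, {"phase": 0, "data": {}})
interrupted = run_job(steps, {"phase": 0, "data": {}}, restarts_at=(1,))
print(clean == interrupted)   # True
```

A job interrupted in the middle of its second step repeats only that step and ends with the same result as an undisturbed run, which is the property that let the landing continue through the alarms.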

Shortly after the Lunar Module's landing radar locked onto the lunar surface, Aldrin keyed in the code to monitor its readings and immediately received a 1202 alarm: no core sets to run a task; the AGC restarted. On the communications link Armstrong called out “It's a 1202.” and Aldrin confirmed “1202.” This was followed by fifteen seconds of silence on the “air to ground” loop, after which Armstrong broke in with “Give us a reading on the 1202 Program alarm.” At this point, neither the astronauts nor the support team in Houston had any idea what a 1202 alarm was or what it might mean for the mission. But the nefarious simulation supervisors had cranked in such “impossible” alarms in earlier training sessions, and controllers had developed a rule that if an alarm was infrequent and the Lunar Module appeared to be flying normally, it was not a reason to abort the descent.

At the Instrumentation Laboratory in Cambridge, Massachusetts, Don Eyles and his colleagues knew precisely what a 1202 was and found it deeply disturbing. The AGC software had been carefully designed to maintain a 10% safety margin under the worst case conditions of a lunar landing, and 1202 alarms had never occurred in any of their thousands of simulator runs using the same AGC hardware, software, and sensors as Apollo 11's Lunar Module. Don Eyles' analysis, in real time, just after a second 1202 alarm occurred thirty seconds later, was:

Again our computations have been flushed and the LM is still flying. In Cambridge someone says, “Something is stealing time.” … Some dreadful thing is active in our computer and we do not know what it is or what it will do next. Unlike Garman [AGC support engineer for Mission Control] in Houston I know too much. If it were in my hands, I would call an abort.

As the Lunar Module passed 3000 feet, another alarm, this time a 1201—VAC areas exhausted—flashed. This is another indication of overload, but of a different kind. Mission control immediately calls up “We're go. Same type. We're go.” Well, it wasn't the same type, but they decided to press on. Descending through 2000 feet, the DSKY (computer display and keyboard) goes blank and stays blank for ten agonising seconds. Seventeen seconds later another 1202 alarm, and a blank display for two seconds—Armstrong's heart rate reaches 150. A total of five program alarms and resets had occurred in the final minutes of landing. But why? And could the computer be trusted to fly the return from the Moon's surface to rendezvous with the Command Module?

While the Lunar Module was still on the lunar surface, Instrumentation Laboratory engineer George Silver figured out what happened. During the landing, the Lunar Module's rendezvous radar (used only during return to the Command Module) was powered on and set to a position where its reference timing signal came from an internal clock rather than the AGC's master timing reference. If these clocks were in a worst case out of phase condition, the rendezvous radar would flood the AGC with what we used to call “nonsense interrupts” back in the day, at a rate of 12,800 per second, each consuming one 11.72 microsecond memory cycle. This imposed an additional load of more than 13% on the AGC, which pushed it over the edge and caused tasks deemed non-critical (such as updating the DSKY) not to be completed on time, resulting in the program alarms and restarts. The fix was simple: don't enable the rendezvous radar until you need it, and when you do, put the switch in the position that synchronises it with the AGC's clock. But the AGC had proved its excellence as a real-time system: in the face of unexpected and unknown external perturbations it had completed the mission flawlessly, while alerting its developers to a problem which required their attention.
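The load figure is simple arithmetic: stolen memory cycles per second multiplied by the cycle time. In the sketch below, the 11.72 microsecond cycle time is taken from the text, while the rate of 12,800 stolen cycles per second is the figure I recall from Eyles's published accounts and should be treated as an assumption. The function names are mine.

```python
CYCLE_S = 11.72e-6   # AGC memory cycle time in seconds (from the text)


def stolen_fraction(cycles_per_second, cycle_s=CYCLE_S):
    """Fraction of each second lost to cycle-stealing interrupts."""
    return cycles_per_second * cycle_s


def rate_for_load(fraction, cycle_s=CYCLE_S):
    """Stolen cycles per second needed to impose the given fractional load."""
    return fraction / cycle_s


print(f"{stolen_fraction(12_800):.1%}")   # 15.0%
print(f"{rate_for_load(0.13):,.0f}")      # 11,092
```

In other words, a load above 13% requires upward of 11,000 stolen cycles every second. Added to a computer already running at around 90% of capacity in the final landing phase, that is enough to exhaust the margin.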

The creativity of the AGC software developers and the merit of computer systems sufficiently simple that the small number of people who designed them completely understood every aspect of their operation was demonstrated on Apollo 14. As the Lunar Module was checked out prior to the landing, the astronauts in the spacecraft and Mission Control saw the abort signal come on, which was supposed to indicate the big Abort button on the control panel had been pushed. This button, if pressed during descent to the lunar surface, immediately aborted the landing attempt and initiated a return to lunar orbit. This was a “one and done” operation: no Microsoft-style “Do you really mean it?” tea ceremony before ending the mission. Tapping the switch made the signal come and go, and it was concluded the most likely cause was a piece of metal contamination floating around inside the switch and occasionally shorting the contacts. The abort signal caused no problems during lunar orbit, but if it should happen during descent, perhaps jostled by vibration from the descent engine, it would be disastrous: wrecking a mission costing hundreds of millions of dollars and, coming on the heels of Apollo 13's mission failure and narrow escape from disaster, possibly bringing an end to the Apollo lunar landing programme.

The Lunar Module AGC team, with Don Eyles as the lead, was faced with an immediate challenge: was there a way to patch the software to ignore the abort switch, protecting the landing, while still allowing an abort to be commanded, if necessary, from the computer keyboard (DSKY)? The answer to this was obvious and immediately apparent: no. The landing software, like all AGC programs, ran from read-only rope memory which had been woven on the ground months before the mission and could not be changed in flight. But perhaps there was another way. Eyles and his colleagues dug into the program listing, traced the path through the logic, and cobbled together a procedure, then tested it in the simulator at the Instrumentation Laboratory. While the AGC's programming was fixed, the AGC operating system provided low-level commands which allowed the crew to examine and change bits in locations in the read-write memory. Eyles discovered that by setting the bit which indicated that an abort was already in progress, the abort switch would be ignored at the critical moments during the descent. As with all software hacks, this had other consequences requiring their own work-arounds, but by the time Apollo 14's Lunar Module emerged from behind the Moon on course for its landing, a complete procedure had been developed which was radioed up from Houston and worked perfectly, resulting in a flawless landing.
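The trick of masking the switch by setting a flag can be illustrated with a toy example. Everything here is hypothetical: the bit position, the name `ABORT_IN_PROGRESS`, and the `abort_monitor` function are invented for illustration and do not correspond to the actual erasable-memory layout or LUMINARY routines.

```python
WORD_MASK = 0o77777             # 15 data bits per AGC word (from the text)
ABORT_IN_PROGRESS = 1 << 3      # hypothetical bit position, for illustration only


def abort_monitor(flagword, switch_closed):
    """Return True if the (toy) monitor should start an abort.

    The monitor ignores the switch when the flag says an abort is
    already under way, since there would be nothing further to start.
    """
    if flagword & ABORT_IN_PROGRESS:
        return False
    return switch_closed


flagword = 0
print(abort_monitor(flagword, switch_closed=True))    # True

# The crew keys in a new value for the word, setting the flag by hand.
flagword = (flagword | ABORT_IN_PROGRESS) & WORD_MASK
print(abort_monitor(flagword, switch_closed=True))    # False
```

The sketch also hints at why such a hack has side effects: any other code that consults the same flag now believes an abort is under way, and each such consequence needs its own work-around.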

These and many other stories of the development and flight experience of the AGC lunar landing software are related here by the person who wrote most of it and supported every lunar landing mission as it happened. Where technical detail is required to understand what is happening, no punches are pulled, even to the level of bit-twiddling and hideously clever programming tricks such as using an overflow condition to skip over an EXTEND instruction, converting the following instruction from double precision to single precision, all in order to save around forty words of precious non-bank-switched memory. In addition, this is a personal story, set in the context of the turbulent 1960s and early ’70s, of the author and other young people accomplishing things no humans had ever before attempted.

It was a time when everybody was making it up as they went along, learning from experience, and improvising on the fly; a time when a person who had never written a line of computer code would write, as his first program, the code that would land men on the Moon, and when the creativity and hard work of individuals made all the difference. Already, by the end of the Apollo project, the curtain was ringing down on this era. Even though a number of improvements had been developed for the LM AGC software which improved precision landing capability, reduced the workload on the astronauts, and increased robustness, none of these were incorporated in the software for the final three Apollo missions, LUMINARY 210, which was deemed “good enough” and the benefit of the changes not worth the risk and effort to test and incorporate them. Programmers seeking this kind of adventure today will not find it at NASA or its contractors, but instead in the innovative “New Space” and smallsat industries.

Posted at 13:32

Friday, November 1, 2019

Reading List: Always Another Dawn

Crossfield, Albert Scott and Clay Blair. Always Another Dawn. Seattle, CreateSpace, [1960] 2018. ISBN 978-1-7219-0050-3.
The author was born in 1921 and grew up in Southern California. He was obsessed with aviation from an early age, wangling a ride in a plane piloted by a friend of his father (an open cockpit biplane) at age six. He built and flew many model airplanes and helped build the first gasoline-powered model plane in Southern California, with a home-built engine. The enterprising lad's paper route included a local grass field airport, and he persuaded the owner to trade him a free daily newspaper (delivery boys always received a few extra) for informal flying lessons. By the time he turned thirteen, young Scott (he never went by his first name, “Albert”) had accumulated several hours of flying time.

In the midst of the Great Depression, his father's milk processing business failed, and he decided to sell out everything in California, buy a 120-acre run-down dairy farm in rural Washington state, and start over. Patiently, taking an engineer's approach to the operation (recording everything, controlling costs, optimising operations) and with the entire family pitching in on the unceasing chores, they built the ramshackle property into a going concern and then a showplace.

Crossfield never abandoned his interest in aviation, and soon began to spend some of his scarce free time at the local airport, another grass field operation, where he continued to take flight lessons from anybody who would give them for the meagre pocket change he could spare. Finally, with a total of seven or eight hours dual control time, one of the pilots invited him to “take her up and try a spin.” This was highly irregular and, in fact, illegal: he had no student pilot certificate, but things were a lot more informal in those days, so off he went. Taking the challenge at its word, he proceeded to perform three spins and spin recoveries during his maiden solo flight.

In 1940, at age eighteen, Scott left the farm. His interest in aviation had never flagged, and he was certain he didn't want to be a farmer. His initial goal was to pursue an engineering degree at the University of Washington and then seek employment in the aviation industry, perhaps as an engineering test pilot. But the world was entering a chaotic phase, and this chaos perturbed his well-drawn plans. “[B]y the time I was twenty I had entered the University, graduated from a civilian aviation school, officially soloed, and obtained my private pilot's license, withdrawn from the University, worked for Boeing Aircraft Company, quit to join the Air Force briefly, worked for Boeing again, and quit again to join the Navy.” After the U.S. entered World War II, the Navy was desperate for pilots and offered immediate entry to flight training to those with the kind of experience Crossfield had accumulated.

Despite having three hundred flight hours in his logbook, Crossfield, like many military aviators, had to re-learn flying the Navy way. He credits it for making him a “professional, disciplined aviator.” Like most cadets, he had hoped for assignment to the fleet as a fighter pilot, but upon completing training he was immediately designated an instructor and spent the balance of the war teaching basic and advanced flying, gunnery, and bombing to hundreds of student aviators. Toward the end of the war, he finally received his long-awaited orders for fighter duty, but while in training the war ended without his ever seeing combat.

Disappointed, he returned to his original career plan and spent the next four years at the University of Washington, obtaining Bachelor of Science and Master of Science degrees in Aeronautical Engineering. Maintaining his commission in the Naval Reserve, he organised a naval stunt flying team and used it to hone his precision formation flying skills. As a graduate student, he supported himself as chief operator of the university's wind tunnel, then one of the most advanced in the country, and his work brought him into frequent contact with engineers from aircraft companies who contracted time on the tunnel for tests on their designs.

Surveying his prospects in 1950, Crossfield decided he didn't want to become a professor, which would be the likely outcome if he continued his education toward a Ph.D. The aviation industry was still in the postwar lull, but everything changed with the outbreak of the Korean War in June 1950. Suddenly, demand for the next generation of military aircraft, which had been seen as years in the future, became immediate, and the need for engineers to design and test them was apparent. Crossfield decided the most promising opportunity for someone with his engineering background and flight experience was as an “aeronautical research pilot” with the National Advisory Committee for Aeronautics (NACA), a U.S. government civilian agency founded in 1915 and chartered with performing pure and applied research in aviation, which was placed in the public domain and made available to all U.S. aircraft manufacturers. Unlike returning to the military, where his flight assignments would be at the whim of the service, at NACA he would be assured of working on the cutting edge of aviation technology.

Through a series of personal contacts, he eventually managed to arrange an interview with the little-known NACA High Speed Flight Test Station at Edwards Air Force Base in the high desert of Southern California. Crossfield found himself at the very Mecca of high speed flight, where Chuck Yeager had broken the sound barrier in October 1947 and a series of “X-planes” were expanding the limits of flight in all directions.

Responsibility for flying the experimental research aircraft at Edwards was divided three ways. When a new plane was delivered, its first flights would usually be conducted by company test pilots from its manufacturer. These pilots would have been involved in the design process and worked closely with the engineers responsible for the plane. During this phase, the stability, maneuverability, and behaviour of the plane in various flight regimes would be tested, and all of its component systems would be checked out. This would lead to “acceptance” by the Air Force, at which point its test pilots would acquaint themselves with the new plane and then conduct flights aimed at expanding its “envelope”: pushing parameters such as speed and altitude to those which the experimental plane had been designed to explore. It was during this phase that records would be set, often trumpeted by the Air Force. Finally, NACA pilots would follow up, exploring the fine details of the performance of the plane in the new flight regimes it opened up. Often the plane would be instrumented with sensors to collect data as NACA pilots patiently explored its flight envelope. NACA's operation at Edwards was small, and it played second fiddle to the Air Force (and Navy, who also tested some of its research planes there). The requirements for the planes were developed by the military, who selected the manufacturer, approved the design, and paid for its construction. NACA took advantage of whatever was developed, when the military made it available to them.

However complicated the structure of operations was at Edwards, Crossfield arrived squarely in the middle of the heroic age of supersonic flight, as chronicled (perhaps a bit too exuberantly) by Tom Wolfe in The Right Stuff. The hangars were full of machines resembling those on the covers of the pulp science fiction magazines of Crossfield's youth, and before them were a series of challenges seemingly without end: Mach 2, 3, and beyond, and flight to the threshold of space.

It was a heroic time, and a dangerous business. Writing in 1960, Crossfield notes, “Death is the handmaiden of the pilot. Sometimes it comes by accident, sometimes by an act of God. … Twelve out of the sixteen members of my original class at Seattle were eventually killed in airplanes. … Indeed, come to think of it, three-quarters of all the pilots I ever knew are dead.” As an engineer, he has no illusions or superstitions about the risks he is undertaking: sometimes the machine breaks and there's nothing that can be done about it. But he distinguishes being startled with experiencing fear: “I have been startled in an airplane many times. This, I may say, is almost routine for the experimental test pilot. But I can honestly say I have never experienced real fear in the air. The reason is that I have never run out of things to do.”

Crossfield proceeded to fly almost all of the cutting-edge aircraft at Edwards, including the rocket powered X-1 and the Navy's D-558-2 Skyrocket. By 1955, he had performed 99 flights under rocket power, becoming the most experienced rocket pilot in the world (there is no evidence the Soviet Union had any comparable rocket powered research aircraft). Most of Crossfield's flights were of the patient, data-taking kind in which the NACA specialised, albeit with occasional drama when these finicky, on-the-edge machines malfunctioned. But sometimes, even at staid NACA, the blood would be up, and in 1953, NACA approved taking the D-558-2 to Mach 2, setting a new world speed record. This was more than 25% faster than the plane had been designed to fly, and all the stops were pulled out for the attempt. The run was planned for a cold day, when the speed of sound would be lower at the planned altitude and cold-soaking the airframe would allow loading slightly more fuel and oxidiser. The wings and fuselage were waxed and polished to a high sheen to reduce air friction. Every crack was covered by masking tape. The stainless steel tubes used to jettison propellant in an emergency before drop from the carrier aircraft were replaced by aluminium which would burn away instants after the rocket engine was fired, saving a little bit of weight. With all of these tweaks, on November 20, 1953, at an altitude of 72,000 feet (22 km), the Skyrocket punched through Mach 2, reaching a speed of Mach 2.005. Crossfield was the Fastest Man on Earth.
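The reasoning behind choosing a cold day can be made concrete with the ideal-gas relation for the speed of sound, a = sqrt(γRT): Mach number is airspeed divided by a, so colder air means Mach 2 is reached at a lower true airspeed. The sketch below is mine, not from the book. The 216.65 K figure is the standard-atmosphere stratospheric temperature, and the ten-degree-colder case is a purely hypothetical illustration, not the actual condition on the day of the flight.

```python
import math

GAMMA = 1.4       # ratio of specific heats for air
R_AIR = 287.05    # specific gas constant for dry air, J/(kg K)


def speed_of_sound(temp_k):
    """Speed of sound in air (m/s) at absolute temperature temp_k."""
    return math.sqrt(GAMMA * R_AIR * temp_k)


def airspeed_for_mach(mach, temp_k):
    """True airspeed (m/s) needed to reach the given Mach number."""
    return mach * speed_of_sound(temp_k)


standard = airspeed_for_mach(2.0, 216.65)   # standard stratosphere
colder = airspeed_for_mach(2.0, 206.65)     # hypothetical colder day
print(f"Mach 2 needs {standard:.0f} m/s on a standard day, "
      f"{colder:.0f} m/s when the air is 10 K colder")
```

Since the speed of sound scales with the square root of temperature, a drop of a few percent in temperature buys only about half that in required airspeed, but with the plane being pushed well past its design speed, every bit helped.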

By 1955, Crossfield concluded that the original glory days of Edwards were coming to an end. The original rocket planes had reached the limits of their performance, and the next generation of research aircraft, the X-15, would be a project on an entirely different scale, involving years of development before it was ready for its first flight. Staying at NACA would, in all likelihood, mean a lengthy period of routine work, with nothing as challenging as his last five years pushing the frontiers of flight. He concluded that the right place for an engineering test pilot, one with such extensive experience in rocket flight, was on the engineering team developing the next generation rocket plane, not sitting around at Edwards waiting to see what they came up with. He resigned from NACA and took a job as chief engineering test pilot at North American Aviation, developer of the X-15. He would provide a pilot's perspective throughout the protracted gestation of the plane, including cockpit layout, control systems, life support and pressure suit design, simulator development, and riding herd on the problem-plagued engine.

Ever wonder why the space suits used in the X-15 and by the Project Mercury astronauts were silver coloured? They said it was something about thermal management, but in fact when Crossfield was visiting the manufacturer he saw a sample of aluminised fabric and persuaded them to replace the original khaki coverall outer layer with it because it “looked like a real space suit.” And they did.

When the X-15 finally made its first flight in 1959, Crossfield was at the controls. He would go on to make 14 X-15 flights before turning the ship over to Air Force and NASA (the successor agency to the NACA) pilots. This book, originally published in 1960, concludes before the record-breaking period of the X-15, conducted after Crossfield's involvement with it came to an end.

This is a personal account of a period in the history of aviation in which records fell almost as fast as they were set and rocket pilots went right to the edge and beyond, feeling out the treacherous boundaries of the frontier.

A Kindle edition is available, at this writing, for just US$0.99. The Kindle edition appears to have been prepared by optical character recognition with only a rudimentary and slapdash job of copy editing. There are numerous errors including many involving the humble apostrophe. But, hey, it's only a buck.

Posted at 01:08