16th International System Safety Conference

September 14-19, 1998

Seattle, Washington USA

System Safety - The Total Solution

http://www.system-safety.org

 

Potential Impact of the Millennium Bug on the Flight Deck

Robert B. Barnes; Robert B. Barnes Associates, Inc.;
Scottsdale, Arizona, USA

Keywords: Avionics, Classic Air Transport Aircraft, Flight Crew Workload, Risk Management, Upgrade/Retrofit, Year 2000 (Y2K)

 

Abstract

When members of the operational aviation community were recently asked "what will be the impact of the Year 2000 Bug on flight deck operations?", the typical response was "It’s a non-issue." The general feeling seemed to be that this is only a concern for a very few large organizations using poorly designed database systems, not for flight crews.

Unfortunately, this is a frequently heard technical response to the Year 2000 issue across many industries. However, there may be a much more vital concern for all of us in the aviation community – Risk Management.

This issue has been spotlighted by the aviation insurance industry. It has challenged airlines to prove that their fleet avionics are free of the "millennium bug" which threatens to disrupt computer software. Failure to do so will cause the loss of insurance coverage for any incidents which might result from it.

Technically, many systems will not suffer significant disruption. However, what is the risk to an organization if a critical system fails and this failure results in a business loss or personal injury liability? The remediation costs will pale by comparison with the business and legal costs that could result.

Generally acknowledged statistics are that between 3% and 5% of all integrated circuits ("chips") will fail and that up to 20-30% of embedded software applications could be directly affected. In view of these odds, prudence should dictate extensive disaster planning and risk mitigation activity.

This paper will summarize current risk mitigation initiatives in systems related to the flight deck and suggest additional issues that need to be addressed by the operational aviation community.

Introduction

The problems posed for global computing and electronic system users by the Year 2000 deadline are providing plenty of fuel for the Doomsday prophets. It is not the intention of this paper to generate any additional feelings of gloom or total helplessness. One need only visit the myriad of Internet Websites dedicated to this problem to grasp the enormity and significance of the entire issue. Should any reader be interested in delving deeper into this general topic, a very good place to start is at:

http://www.year2000.com.

It is the intent of this paper, rather, to focus on a specific area of concern for flight crews – the potential impact of the millennium bug on the flight deck. Some already-made-public examples of both processes and specific Y2K "fixes" will be provided for both reference and reflection in the context of this paper’s focus. In some cases, the limits of these processes and/or fixes will be noted, not to disparage the action but to recognize its limits.

"When informed of the Year-2000 problem, most people – including (ironically) many computer professionals – shrug their shoulders and say, ‘Well, I guess it could be a problem. I sure hope they’re working on it …’ " (ref. 15, p. 1)

As Risk Managers and System Safety Engineers, we are a very important part of the "THEY" to whom such people refer. Further, we need to remember that …

"… no matter how good a job we do up front, we must still keep in mind that designs and products fail; people will make mistakes; sequences will be out of control; and unexpected environments will occur." (ref. 2)

The Year 2000 problem is enormous and extremely complex, especially when multiple component or system interactions are involved. As a result, it is quite possible that all of the necessary Year 2000 fixes may simply not be accomplished. Therefore, we have a responsibility to anticipate what might occur and provide mitigation recommendations to our operators – the flight crews.

"As always, it is better to foresee and manage risks than to wait until they come to fruition." (ref. 13, p. 550)

So What’s the Big Problem?

The Year 2000 problem involves date interpretation by ANY computer-based system (more on this later). If a computer interprets the Year 2000 as solely 00, errors in date representation and date calculations can occur, leading to a system failure. Some of the most common failures (ref. 13, p. xxxv) are:

  • Misinterpreting 00 as 1900 rather than 2000
  • Sharing system information erroneously
  • Malfunction due to missing programs and information
  • Sorting data incorrectly
  • Treating dates as flags
  • Leap year problems
  • Shutdown because the date was used in a calculation (divide by 0)
  • Password and license expiration.

Very simply, our electronic systems may not be able to deal with the date 01/01/00 and may also not be able to deal with operational "cycles" such as "do X every 100 days." (ref. 9)
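Several of the failure modes listed above can be reproduced in a few lines. The following is a minimal sketch only, not code from any avionics product; the function names (expand_19yy, leap_naive, leap_gregorian) are hypothetical and chosen for illustration.

```python
from datetime import date

def expand_19yy(yy: int) -> int:
    """Expand a two-digit year the way much legacy logic did: always 19yy."""
    return 1900 + yy

def leap_naive(year: int) -> bool:
    """Incomplete rule: every century year is treated as a common year."""
    return year % 4 == 0 and year % 100 != 0

def leap_gregorian(year: int) -> bool:
    """Full Gregorian rule: century years are leap years only if divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

# Misinterpreting 00 as 1900 rather than 2000.
rollover_year = expand_19yy(0)

# Sorting data incorrectly: keyed on yy alone, the year 2000 record sorts first.
sorted_yy = sorted([98, 99, 0])

# Date calculations: the day after 12/31/99 appears to lie far in the past.
elapsed_days = (date(expand_19yy(0), 1, 1) - date(expand_19yy(99), 12, 31)).days

# Leap year problems: 2000 is a leap year, but the incomplete rule says otherwise.
print(rollover_year, sorted_yy, elapsed_days < 0,
      leap_naive(2000), leap_gregorian(2000))
```

The negative elapsed time is the kind of value that can break interval logic such as "do X every 100 days" if the surrounding code never expected time to run backwards.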

As a leader in the aerospace industry, Boeing has been at the forefront of addressing the Year 2000 issue for its products. For example, the following comments are from a 27 March 98 message to its customers:

"Many operators have identified to Boeing that they have heard media reports of computer difficulties when changing from the year 1999 to 2000. This information has led to inquiries asking Boeing to confirm that the Year 2000 software issue will not affect any airplane systems, Boeing-provided, supplier provided, and BFE-provided on any Boeing model airplanes … Boeing has established a team to coordinate with suppliers, review supplier and Boeing data, and determine the system impacts (if any) of the Year 2000 on any of the Boeing provided systems on all of our models … Operator installation of airborne systems installed by a supplemental type certificate (STC) will not be included as part of the Boeing review." (ref. 1)

For the purposes of this paper, the critical element of the foregoing statement is the following:

"Operator installation of airborne systems installed by a supplemental type certificate (STC) will not be included as part of the Boeing review." (ref. 1)

To the flight crew, this is an extremely significant limit to Boeing’s review. Let’s examine why …

The "Classics"

The term "Classic Aircraft" refers to an airplane that is out of production; however, through various upgrades, modifications, etc., it still remains in revenue service. In the broadest sense, this definition could even include the DC-3 but for purposes of this discussion it will only refer to transport category aircraft still in global commercial use.

How large is the current population of such aircraft?

In a recent census of the world’s aging aircraft, it was reported that there are more than 6,500 transport category aircraft which were built prior to 1983 and are still in operational service plus more than 650 others of these same types currently being stored. Of these operational "classics," 22% are Boeing 727, 12% are Boeing 737-100/200, and 12% are MDC DC-9. The balance includes such airplanes as the B-747 (early models), L-1011, and DC-10. (ref. 8)

Many operators have extended the life of their classic fleets by adding upgraded avionics systems and related life-extension modifications. In the United States, such retrofits are accomplished through the Federal Aviation Administration’s Supplemental Type Certificate (STC) certification process generally by aftermarket specialists without any participation from the aircraft manufacturer.

Since the driving force behind these retrofits is typically economic, the operator’s goal is to add as much increased functionality as possible for the least cost. This can and does lead to a mix of avionics products being installed on the flight deck. Each of these avionics products contains embedded logic and some of these systems interact with other systems on the flight deck.

[Note: Although the focus of this paper is transport category aircraft, it is important to recognize that there may be similar numbers of executive or business category aircraft facing these same issues.]

The Embedded Problem

Earlier, it was mentioned that the Year 2000 problem involves date interpretation by ANY computer-based system. Basically, an embedded chip is a "micro-computer" which has been "embedded" within some larger piece of engineering equipment or industrial product. Historically, embedded systems have provided the intelligence associated with "process control" or "data acquisition" systems. However, over the past decade, computer "chips" have become dramatically smaller, cheaper, and much more sophisticated. Today, the term "embedded system" encompasses almost any device that has "built-in" computer logic. (ref. 15, p. 285)

The first reaction of most people is that the "classics" are still steam-gauge (i.e. analog) aircraft except for some minor additions which don’t communicate with each other. However, the possibility that there may be chips embedded somewhere on the flight deck which use timing elements starts to raise some concern. Generic devices known to use such chips are collision avoidance, windshear, ground proximity, radar, and digital communications systems (the typical avionics upgrades often found in the "classics"). Could two devices be installed from different manufacturers that need to cross talk? The answer is "Yes." And, if either device has a chip that uses timing intervals, there could very well be a potential problem.

How serious could the failure of just one date-sensitive, embedded micro-processor be to an integrated system? At midnight on New Year’s Eve, 1996-7, a computer glitch at an aluminum smelter in the South Island of New Zealand shut down all operations simultaneously and without warning. The failure was traced to a faulty computer software program which failed to account for 1996 being a leap year. Since the computer was not programmed to handle the 366th day of the year, all 660 process control computers hung up simultaneously at midnight. (ref. 15, p. 284) Whether such an error is in software code or firmware is immaterial; the result will still be the same – a total system failure.
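The class of error described in this incident can be illustrated with a short sketch. This is not the smelter's actual software, which is not described in the reference; it simply assumes a hypothetical controller that indexes a fixed 365-entry schedule by day of year. The names process_day_naive and process_day_checked are invented for the example.

```python
import calendar

DAYS_IN_YEAR_ASSUMED = 365  # the faulty assumption: no year has a 366th day

def process_day_naive(day_of_year: int) -> str:
    """Hypothetical controller step that looks up a fixed 365-entry schedule."""
    schedule = ["run"] * DAYS_IN_YEAR_ASSUMED
    return schedule[day_of_year - 1]  # raises IndexError on day 366

def process_day_checked(year: int, day_of_year: int) -> str:
    """Same step, but the valid range depends on whether the year is a leap year."""
    days_in_year = 366 if calendar.isleap(year) else 365
    if not 1 <= day_of_year <= days_in_year:
        raise ValueError(f"day {day_of_year} is not valid for {year}")
    return "run"

print(process_day_checked(1996, 366))  # 1996 was a leap year, so day 366 is valid
try:
    process_day_naive(366)
except IndexError:
    print("naive controller stopped on day 366")
```

If every controller in a plant runs the same logic, they all reach the unhandled day at the same instant, which is consistent with the simultaneous shutdown reported.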

Boeing states in its March 1998 message that "the systems that are impacted are due to the use of navigational databases (NDB) whose date effectivity is either checked or displayed by operational software within these systems. The effects of the Year 2000 rollover range from transient nuisance messages during the one month period from December 1999 to January 2000, to recurring effects, to total loss of operational capability." (ref. 1)

However, does this statement take into account all the data which may be date stamped and embedded in avionics systems that have found their way onto the flight deck of the "classics?" Not necessarily. One must consider just how much data could actually be date stamped.

There does not have to be a date displayed by the system for it to have date sensitivity. For example, some chips are programmed to look at intervals but these intervals are based upon a date stamp burned into the embedded chip by its manufacturer (this is done to give the chip a starting date so it will be "year aware"). There is simply no way to trick the system into the century change to test the system since it doesn’t display or even accept date change inputs. It will simply keep pressing on until it reaches the century change over and either continues to function normally or starts doing something different (which may or may not be good).

In this case the only way to be absolutely certain of Y2K compliance is to go back to the chip design level and determine if the chip uses a date stamp function for any of its operations. Of course, the next question is who designed and built the chip or chips and how does one determine this bit of minutia? Going to the device manufacturer may be a starting point but it is quite possible that the chip was purchased without even the device designer knowing this fact. In addition, different chips can find their way into the same devices based upon available supply at the time of manufacture.

The fact that an embedded system is year-aware and, thereby, Year-2000 vulnerable, does not necessarily mean that it will fail on January 1, 2000. It simply means that we should take steps to find out in advance if it will fail and what the consequences of a failure might be.

The Issue of Compliance

Unfortunately, the definition of Year 2000 compliance often differs significantly from one organization to the next. In fact, one of the first steps to Year 2000 compliance suggested in the literature is typically "Your compliance definition should identify possible sources of Year 2000 problems and should describe the functionality of a Year 2000 compliant system." (ref. 13, p. 13)

What does being "compliant" really mean? Is the device "in compliance" or simply "in readiness?" Are you absolutely certain that even though a device is compliant it is truly compliant with other systems with which it may need to interact?

"Compliance" means that the device will readily handle the century change-over and thereafter handle dates normally forever. "Readiness" on the other hand simply means that the device will handle the century change due to a software "patch" of some sort. However, this "patch" will have a finite life and unfortunately will re-open the date change issue again at some point in the future. The logic behind using a "readiness" solution is that the product can be made acceptable for a set period of time assumed to be its maximum useful life. (But wasn’t that how we got into the Year 2000 issue in the first place?)

Although a device may be shown to be "compliant" or even "ready" this also does not necessarily solve potential interface issues with other devices. For example:

A date field today may be shown as mmddyy. It can be made "ready" through a process called "windowing." The manufacturer simply inserts an arbitrary > or < line of code into the program to cause the device to make a decision as to 19xx or 20zz. Unfortunately there is no standardization which determines the yy value. It could be yy <30 or 40 or 50 or >10, etc.

To be perfectly Y2K compliant, the mmddyy should be changed to mmddyyyy which is fine inside one device but what happens if that device sends information to another device which is merely "ready" and still uses the mmddyy format but with the > or < code added? Will the two devices talk to each other correctly or simply shut down? No one really knows the answer to this because most Y2K compliance work today is being done at the individual product level in isolation from other systems.
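The interface problem can be made concrete with a small sketch. It assumes a hypothetical six-character mmddyy field and two "ready" devices that chose different pivot years; the function names and the sample field values are illustrative only and do not describe any particular product.

```python
from typing import Tuple

def expand_year(yy: int, pivot: int) -> int:
    """Windowing: two-digit years below the pivot become 20yy, the rest 19yy."""
    return 2000 + yy if yy < pivot else 1900 + yy

def parse_mmddyy(field: str, pivot: int) -> Tuple[int, int, int]:
    """Parse a six-digit mmddyy field into (year, month, day) using a pivot."""
    if len(field) != 6 or not field.isdigit():
        raise ValueError(f"expected six digits (mmddyy), got {field!r}")
    month, day, yy = int(field[0:2]), int(field[2:4]), int(field[4:6])
    return expand_year(yy, pivot), month, day

# The same field read by two "ready" devices with different pivots.
device_a = parse_mmddyy("061535", pivot=30)
device_b = parse_mmddyy("061535", pivot=50)
print(device_a, device_b)  # the two devices disagree by a century

# A fully compliant sender emits mmddyyyy; a strict mmddyy receiver rejects it.
sent = "01012000"
try:
    parse_mmddyy(sent, pivot=30)
    strict_result = "accepted"
except ValueError:
    strict_result = "rejected"

# A lenient receiver that reads only the first six characters gets a wrong year.
truncated = parse_mmddyy(sent[:6], pivot=30)
print(strict_result, truncated)
```

Neither device is faulty in isolation. The disagreement only appears when the two exchange data, which is why testing at the individual product level cannot settle the question.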

The bottom line is that there is simply no standard approach being used to achieve Year 2000 compliance.

Evaluating System Compliance

The safest approach for anyone concerned with potential Y2K problems is to assume your critical systems will have an issue and develop a certification process to prove total system compliance rather than assume that everything will continue to interact properly. There is a growing body of examples and experiences where testing the date change yielded unexpected and often catastrophic system-wide results.

An electrical plant in England suffered a plant wide shut down when clocks were reset to just before midnight, 12/31/99, because a sensor in a smokestack decided that it had not been serviced in 100 years and initiated a shut down. The result was uncomfortably similar to the New Zealand incident mentioned earlier but fortunately both of these examples occurred on the ground. (ref. 11)
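The reference does not describe the sensor's internal logic, so the following is only one plausible mechanism, sketched under stated assumptions: the device stores dates as mmddyy, expands every year as 19yy, and treats any elapsed time outside its expected service interval (including a negative one) as a missed service. All names and the 365-day interval are hypothetical.

```python
from datetime import date
from typing import Callable

SERVICE_INTERVAL_DAYS = 365  # hypothetical service interval

def always_19yy(yy: int) -> int:
    """Legacy expansion: every two-digit year is assumed to be 19yy."""
    return 1900 + yy

def windowed(yy: int, pivot: int = 50) -> int:
    """Windowed expansion with an assumed pivot of 50."""
    return 2000 + yy if yy < pivot else 1900 + yy

def to_date(mmddyy: str, expand: Callable[[int], int]) -> date:
    """Convert an mmddyy string to a date using the given year expansion."""
    month, day, yy = int(mmddyy[0:2]), int(mmddyy[2:4]), int(mmddyy[4:6])
    return date(expand(yy), month, day)

def service_missed(last: str, now: str, expand: Callable[[int], int]) -> bool:
    """True when elapsed time since last service falls outside the interval."""
    elapsed = (to_date(now, expand) - to_date(last, expand)).days
    return not 0 <= elapsed <= SERVICE_INTERVAL_DAYS

# Serviced on 12/01/99; the check passes the day before rollover ...
print(service_missed("120199", "123199", always_19yy))
# ... and trips one day later, when 00 is read as 1900.
print(service_missed("120199", "010100", always_19yy))
# With windowing, the same dates are only 31 days apart.
print(service_missed("120199", "010100", windowed))
```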

One approach to achieving Y2K certification has been provided to defense suppliers by the DoD and USAF (ref. 14). This Y2K weapon system evaluation strategy applies to aircraft and associated ground support systems such as training and mission planning systems, automated test equipment, system/software engineering support environments (S/SEEs), and development/integration laboratory systems. It consists of five phases:

  • Awareness involves gathering all potential stakeholders and gaining consensus on the scope of the system-wide problem.
  • Assessment focuses on developing a complete inventory of affected components and assessing the scope of their Y2K impact.
  • Renovation involves the actual "fixing" of non-compliant system components.
  • Validation focuses on verifying and certifying systems for Y2K compliance.
  • Implementation places Y2K compliant systems into use (an important component of this phase is the updating and distribution of a risk management and contingency strategy).

Compliance Assessment

Although not mentioned in the DoD approach, there are actually two main reasons to conduct a thorough assessment (or inventory) of both the hardware and software components of any integrated system – such as the flight deck of a classic.

First, there have been and will continue to be many press releases from manufacturers across all industries regarding the ability of specific devices to handle the date change and what correction methods, if any, the manufacturer recommends. One should be skeptical of broad manufacturer claims that its products are ready for the date change. Recently, several companies have publicly announced full confidence in their products, only to later retract those statements upon further testing. (ref. 10)

Also, as has been mentioned earlier, all manufacturers’ claims will naturally have certain limits. It is essential that these limits are fully understood as they relate to your specific operational situation.

Secondly, a thorough assessment will also assist risk managers in comparing the cost of possible system upgrades with the cost of compliance repair. Since compliance repair can become extremely expensive, it may very well be more cost-effective to simply replace suspected devices with demonstrably compliant upgrade devices.

If remediation is the chosen path, then one must be ready to define the depth of intended system compliance. For example, will the compliance assessment only be concerned with "safety of flight" issues or will it need to address more fundamental issues such as the safety to continue all approved operations (e.g. ETOPS, etc.)? In any case, the following system analysis approach may be a good place to start:

  1. Define the various systems involved (e.g. flight controls, engine and fuel controls, navigation systems, communication systems, etc.) and their sub-components (e.g. GPWS, FMS, EFIS, flow controls, actuators, etc.)
  2. Determine how these systems and their sub-systems interact.
  3. Determine how to test the individual sub-components and/or verify method of Y2K compliance.
  4. Determine how to test these sub-components for Y2K compliance when integrated into a specific system.
  5. Determine how to test the totally integrated system (e.g. the "classic" flight deck with upgrades)
  6. Conduct a Fault Tree Analysis to determine the requirement for any additional risk mitigation.
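Steps 1, 2, and 4 above amount to building an inventory and then examining each interface between components. A minimal sketch of that bookkeeping follows. The component names are taken from the examples in step 1, but every date-handling value, pivot, and link shown is invented sample data, and the Component class and links_to_test function are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass(frozen=True)
class Component:
    name: str
    date_handling: str           # "none", "yyyy", "window", or "unknown"
    pivot: Optional[int] = None  # only meaningful when date_handling == "window"

def links_to_test(components: Dict[str, Component],
                  links: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Flag interfaces whose two ends may disagree about dates."""
    flagged = []
    for a_name, b_name in links:
        a, b = components[a_name], components[b_name]
        handling = (a.date_handling, b.date_handling)
        if "unknown" in handling:
            flagged.append((a_name, b_name, "date handling not yet determined"))
        elif "none" in handling:
            continue  # one end has verified no date logic, so nothing to mismatch
        elif (a.date_handling, a.pivot) != (b.date_handling, b.pivot):
            flagged.append((a_name, b_name, "date representations differ"))
    return flagged

# Invented sample inventory for illustration only.
inventory = {
    "FMS": Component("FMS", "yyyy"),
    "GPWS": Component("GPWS", "window", pivot=50),
    "EFIS": Component("EFIS", "none"),
    "COMM": Component("COMM", "unknown"),
}
interfaces = [("FMS", "GPWS"), ("FMS", "EFIS"), ("GPWS", "COMM")]

for link in links_to_test(inventory, interfaces):
    print(link)
```

Treating "unknown" as a finding in its own right reflects the point made earlier: a device that displays no date may still be date sensitive, so an undetermined entry should not be assumed safe.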

Risk Mitigation

One approach to risk mitigation is to break potential system problems down into three levels of significance (much like a medical "triage"). Level-1 might involve safety of flight items (hardware and/or software); Level-2 might involve mission essential items (e.g. those supporting specific operational capabilities such as ETOPS); and Level-3 might involve nuisance items (e.g. passenger entertainment systems, etc.).

One concern with the triage approach expressed by some is that there may be a communications gap between what the people running the Y2K effort think is critical and what operational personnel know is really critical. The Year 2000 problem is not simply one for programmers to solve. It involves everyone in the operational chain and, therefore, the process of identifying levels of criticality needs to include all potentially affected participants.
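One way to keep that communications gap visible is to collect criticality ratings from both groups and keep the more severe rating wherever they differ. The sketch below is an illustration under that assumption, not a prescribed process; the Level enum, the triage function, and all sample items and ratings are hypothetical.

```python
from enum import IntEnum
from typing import Dict, List, Tuple

class Level(IntEnum):
    SAFETY_OF_FLIGHT = 1
    MISSION_ESSENTIAL = 2
    NUISANCE = 3

def triage(y2k_team: Dict[str, Level],
           operations: Dict[str, Level]) -> Tuple[Dict[str, Level], List[str]]:
    """Merge two sets of ratings, keeping the more severe level for each item.

    Items rated by only one group, or rated differently by the two groups,
    are returned in a separate list for joint review.
    """
    merged: Dict[str, Level] = {}
    needs_review: List[str] = []
    for item in sorted(set(y2k_team) | set(operations)):
        ratings = [r[item] for r in (y2k_team, operations) if item in r]
        merged[item] = min(ratings)  # lower number means more severe
        if len(ratings) < 2 or len(set(ratings)) > 1:
            needs_review.append(item)
    return merged, needs_review

# Invented sample ratings for illustration only.
team = {"GPWS": Level.SAFETY_OF_FLIGHT, "FMS database": Level.NUISANCE}
ops = {"GPWS": Level.SAFETY_OF_FLIGHT,
       "FMS database": Level.MISSION_ESSENTIAL,
       "cabin entertainment": Level.NUISANCE}

merged, needs_review = triage(team, ops)
print(merged)
print(needs_review)
```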

Obviously, Level-1 items must be fixed. However, some Level-2 and Level-3 items may not get fixed for no other reason than that they were never discovered. How, then, do their effects get mitigated? Can all such risk mitigation be left to the initiative and skills of the flight crew and other support personnel?

For example, here is a current mitigation statement for two different Level-2 flight deck systems which have been found to either display erroneous scratch pad messages or date ranges as a result of Y2K:

"… if flight and maintenance personnel are aware of this condition, the above listed versions can continue to be used after the Year 2000 with no [product] changes … Alternatively, operators may resolve the display condition by upgrading to [new versions of the product]." (ref. 1)

How many such identified Level-2 and Level-3 situations can be accepted on the flight deck before they begin to have a measurable effect on flight crew workload and, hence, performance? And, what about those Y2K failures that occur without warning because they simply were not identified? This brings to mind the frequent pilot comment "what’s that thing doing now?" – a certain indicator of pilot distraction and increased workload.

Questions to Ponder

There are a number of stakeholders involved with Y2K risk mitigation issues involving the "classics." Here are some examples of their possible concerns –

Operators: Are all of my aircraft Y2K compliant? How do I know (each "classic" has nearly become a custom design)? Are there any data sharing issues? What is my liability if a Y2K failure occurs?

Regulators: What is the applicant’s definition of Year 2000 compliance? How does this solution relate to the definition/solutions used by other similar or complementary products in the same cockpit? Is cross-talk an issue? What are the effects of the combined mitigation methods on flight crew workload?

STC Holders/Applicants: How are my design changes impacted by other systems or products already in place? What is my potential liability because an STC implies that the aircraft will work as designed and approved? How can the effects of a Y2K failure be mitigated without increasing flight crew workload?

Flight Crews: How are Y2K failures most likely to appear on my flight deck and what do I do about them if they do?

Maintenance Personnel: How are Y2K failures most likely to appear in my aircraft or operations and what do I do about them if they do?

Training Personnel (for both maintenance and flight): What do the personnel that I am responsible for training need to know about Y2K and its potential impact upon our company’s operations?

In Summary

How significant is the already perceived Y2K risk potential for aircraft operations? One need only consider the actions of the world aviation insurance market –

"Airline insurers in the U.K. are working on the final draft of an exclusion clause for the risks associated with ‘millenium bug’ glitches in avionics and computer software. A joint working group of the Aviation Insurance Offices Association and Lloyd’s Aviation Underwriters Association also is developing a questionnaire for carriers to assess the measures they are taking to minimize the potential risks involved if computer systems do not recognize the date change to 2000. Tony Medniuk, managing director of the British Aviation Insurance Group, said the aim is to achieve ‘clarity.’ He said insurers need an ‘informed basis’ upon which to consider specially agreed coverage with carriers based on precautionary measures they have taken. London-based insurers represent 20-30% of the world aviation insurance market." (ref. 5, p. 19)

and …

Although a technical solution to the millennium problem is actually quite simple, the added costs and follow-up charges caused by it are considerable. The "millennium bug" is certain to be a major concern for the insurance industry. In the meantime, the following rule of thumb holds true for the computer date dilemma: "burning buildings are not insurable." [emphasis added] (ref. 4)

References

  1. Boeing External Message Number M-7200-98-01196, "Boeing Study on the Impact to all Models Regarding the Rollover to the Year 2000," (for multiple external distribution), 27 Mar 98.
  2. D’Antonio, P.E., "President’s Message," Hazard Prevention, Journal of the System Safety Society, Volume 34, No. 1, 1998.
  3. "FAA: Y2K Issue Under Control," Aviation Week & Space Technology, McGraw-Hill, NY, April 13, 1998, p. 78.
  4. Favre, R., Gantner, D., and Wiest, R., "Computer problems 2000: The millennium muddle." Swiss Reinsurance Company, Zurich, 1997.
  5. "Fine Print on Y2K Insurance," Aviation Week & Space Technology, McGraw-Hill, NY, April 13, 1998, p. 19.
  6. Happel, D.A., "GPS User Equipment Year 2000 Rollover Effects on the FMS-800 Flight Management System," Collins Avionics & Communication Divisions, Rockwell International, 29 August 1997.
  7. IEEE Standard Glossary of Software Engineering Terminology.
  8. Kingsley-Jones, M., and Sheppard, I. "Ageing Airliner Census 1997," Flight International, Reed International Business, UK, 9-15 July 1997, pp. 37-50.
  9. Lyons, M., "Businesses should already have a recovery plan," The Irish Times, April 10, 1998.
  10. Pegalis, A.M., "Perspective; For Risk Managers, The Year 2000 is Now; Here is a Guide to Help in the Critical Process of Making Systems Compliant," Business Insurance, Crain Communications, December 23, 1996.
  11. Report on EPRI, Year 2000 Embedded Systems Workshop, Scottsdale, AZ, September 1997.
  12. Rowe, W.D., An Anatomy of Risk, John Wiley & Sons, NY, 1977.
  13. Sims, D., Robinson, J.M., McConnell, C., Silo, E., Shapiro, J., and Wilbanks, C., How To 2000™, Raytheon E-Systems, IDG Books Worldwide, Inc., Foster City, CA, 1998.
  14. Weapon System Strategy for Year 2000, U.S. Department of Defense, Rev G, 29 August 1997.
  15. Yourdon, E., and Yourdon, J., Time Bomb 2000—What the Year 2000 Computer Crisis Means to You!, Prentice Hall PTR, Upper Saddle River, NJ., 1998.

Author’s Biography

Robert B. Barnes, President, Robert B. Barnes Associates, Inc., 8711 E. Pinnacle Peak Road, #337, Scottsdale, AZ, 85255, USA, telephone – +(602) 585-5703, facsimile – +(602) 585-5703, e-mail – RbarnesAZ@worldnet.att.net

His company has specialized for 10 years in helping technology-based companies enter the commercial marketplace. As a result, he is regularly involved with the commercialization of leading-edge systems in the aerospace, computer electronics, and semiconductor sectors. Frequently, he provides human factors-related project management assistance to aerospace companies.

He holds a Bachelor’s Degree in Aerospace Engineering and a Master’s Degree in Educational Psychology plus a Professional Certificate in Aviation Safety. A former USAF instructor pilot and flying safety officer, he also holds a Certificate of Master Instructor in USAF Flying Training. His experience includes the certification of after-market flight deck modifications to Classic air transport aircraft.

Mr. Barnes is a member of the American Institute of Aeronautics and Astronautics, Society of Automotive Engineers, System Safety Society, Association of Aviation Psychologists, and the SAE’s G-10 Committee on Aerospace Behavioral Engineering Technology.




Copyright © 1996-2005 by Neil C. Krey unless otherwise indicated.
Non-commercial reproduction rights granted if the following notice is included:
"Source: Neil Krey's CRM Developers Forum, http://www.crm-devel.org"