16th International System Safety Conference
September 14-19, 1998
Seattle, Washington, USA
System Safety - The Total Solution
http://www.system-safety.org
Potential Impact of the Millennium Bug on the Flight Deck
Robert B. Barnes; Robert B. Barnes Associates, Inc.;
Scottsdale, Arizona, USA
Keywords: Avionics, Classic Air Transport
Aircraft, Flight Crew Workload, Risk Management,
Upgrade/Retrofit, Year 2000 (Y2K)
Abstract
When members of the operational aviation
community were recently asked "what will be the impact of the
Year 2000 Bug on flight deck operations?", the typical response
was "It's a non-issue." The general feeling seemed
to be that this is only a concern for a very few large organizations using
poorly designed database systems, not for flight crews.
Unfortunately, this is a frequently heard
technical response to the Year 2000 issue across many industries. However,
there may be a much more vital concern for all of us in the aviation community:
Risk Management.
This issue has been spotlighted by the
aviation insurance industry. It has challenged airlines to prove that
their fleet avionics are free of the "millennium bug" which
threatens to disrupt computer software. Failure to do so will cause the
loss of insurance coverage for any incidents which might result from it.
Technically, many systems will not suffer
significant disruption. However, what is the risk to an organization if
a critical system fails and this failure results in a business loss or
personal injury liability? The remediation costs will pale by comparison
with the business and legal costs that could result.
Generally acknowledged statistics are
that between 3% and 5% of all integrated circuits ("chips")
will fail and that up to 20-30% of embedded software applications could
be directly affected. In view of these odds, prudence should dictate extensive
disaster planning and risk mitigation activity.
This paper will summarize current risk
mitigation initiatives in systems related to the flight deck and suggest
additional issues that need to be addressed by the operational aviation
community.
Introduction
The problems posed for global computing
and electronic system users by the Year 2000 deadline are providing plenty
of fuel for the Doomsday prophets. It is not the intention of this paper
to generate any additional feelings of gloom or total helplessness. One
need only visit the myriad of Internet Websites dedicated to this problem
to grasp the enormity and significance of the entire issue. Should any
reader be interested in delving deeper into this general topic, a very
good place to start is at:
http://www.year2000.com.
It is the intent of this paper, rather,
to focus on a specific area of concern for flight crews: the potential
impact of the millennium bug on the flight deck. Some already-public
examples of both processes and specific Y2K "fixes" will be
provided for both reference and reflection in the context of this paper's
focus. In some cases, the limits of these processes and/or fixes will
be noted, not to disparage the action but to recognize
the limits of such action.
"When informed of the Year-2000
problem, most people, including (ironically) many computer professionals,
shrug their shoulders and say, 'Well, I guess it could be
a problem. I sure hope they're working on it ...'"
(ref. 15, p. 1)
As Risk Managers and System Safety Engineers,
we are a very important part of the "THEY" to whom such people
refer. Further, we need to remember that
"... no matter how good a
job we do up front, we must still keep in mind that designs and products
fail; people will make mistakes; sequences will be out of control; and
unexpected environments will occur." (ref. 2)
The Year 2000 problem is enormous and
extremely complex, especially when multiple component or system interactions
are involved. As a result, it is quite possible that not all of the necessary
Year 2000 fixes will be accomplished. Therefore, we have a responsibility
to anticipate what might occur and provide mitigation recommendations
to our operators: the flight crews.
"As always, it is better to foresee
and manage risks than to wait until they come to fruition." (ref.
13, p. 550)
So What's the Big Problem?
The Year 2000 problem involves date interpretation
by ANY computer-based system (more on this later). If a computer interprets
the Year 2000 as solely 00, errors in date representation and date calculations
can occur leading to a system failure. Some of the most common failures
(ref. 13, p. xxxv) are:
- Misinterpreting 00 as 1900 rather than 2000
- Sharing system information erroneously
- Malfunction due to missing programs and information
- Sorting data incorrectly
- Treating dates as flags
- Leap year problems
- Shutdown because the date was used in a calculation
(divide by 0)
- Password and license expiration.
Very simply, our electronic systems may
not be able to deal with the date 01/01/00 and may also not be able to
deal with operational "cycles" such as "do X every 100
days." (ref. 9)
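Several of the failure modes listed above can be reproduced in a few lines. The following is a minimal sketch only, assuming a system that stores years as two digits; the function names are hypothetical and are not taken from any actual avionics product.

```python
def expand_as_1900s(yy):
    """Interpret a two-digit year the way many legacy systems do: always 19yy."""
    return 1900 + yy


def elapsed_years(start_yy, end_yy):
    """Elapsed years computed directly on two-digit year fields."""
    return end_yy - start_yy


def average_per_year(total, yy):
    """Hypothetical calculation that divides by a two-digit year field."""
    return total / yy


print(expand_as_1900s(0))          # 1900, not 2000
print(elapsed_years(99, 0))        # -99 instead of 1
print(sorted(["98", "99", "00"]))  # ['00', '98', '99']: year 2000 sorts first

try:
    average_per_year(1200, 0)
except ZeroDivisionError:
    print("calculation halted: division by zero")
```

None of these functions is wrong for years 01 through 99; each one breaks only when the year field reads 00.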
As a leader in the aerospace industry,
Boeing has been at the forefront of addressing the Year 2000 issue for
its products. For example, the following comments are from a 27 March
98 message to its customers:
"Many operators have identified
to Boeing that they have heard media reports of computer difficulties
when changing from the year 1999 to 2000. This information has led to
inquiries asking Boeing to confirm that the Year 2000 software issue will
not affect any airplane systems, Boeing-provided, supplier provided, and
BFE-provided on any Boeing model airplanes ... Boeing has established
a team to coordinate with suppliers, review supplier and Boeing data,
and determine the system impacts (if any) of the Year 2000 on any of the
Boeing provided systems on all of our models ... Operator installation
of airborne systems installed by a supplemental type certificate (STC)
will not be included as part of the Boeing review." (ref. 1)
For the purposes of this paper, the critical
element of the foregoing statement is the following:
"Operator installation of airborne
systems installed by a supplemental type certificate (STC) will not be
included as part of the Boeing review." (ref. 1)
To the flight crew, this is an extremely
significant limit to Boeing's review. Let's examine why.
The "Classics"
The term "Classic Aircraft"
refers to an airplane that is out of production; however, through various
upgrades, modifications, etc., it still remains in revenue service. In
the broadest sense, this definition could even include the DC-3 but for
purposes of this discussion it will only refer to transport category aircraft
still in global commercial use.
How large is the current population of
such aircraft?
In a recent census of the world's
aging aircraft, it was reported that there are more than 6,500 transport
category aircraft which were built prior to 1983 and are still in operational
service, plus more than 650 others of these same types currently in
storage. Of these operational "classics," 22% are Boeing 727,
12% are Boeing 737-100/200, and 12% are MDC DC-9. The balance includes
such airplanes as the B-747 (early models), L-1011, and DC-10. (ref. 8)
Many operators have extended the life
of their classic fleets by adding upgraded avionics systems and related
life-extension modifications. In the United States, such retrofits are
accomplished through the Federal Aviation Administration's Supplemental
Type Certificate (STC) certification process, generally by aftermarket
specialists without any participation from the aircraft manufacturer.
Since the driving force behind these retrofits
is typically economic, the operator's goal is to add as much increased
functionality as possible for the least cost. This can and does lead to
a mix of avionics products being installed on the flight deck. Each of
these avionics products contains embedded logic, and some of these systems
interact with other systems on the flight deck.
[Note: Although the focus of this paper
is transport category aircraft, it is important to recognize that there
may be similar numbers of executive or business category aircraft facing
these same issues.]
The Embedded Problem
Earlier, it was mentioned that the Year
2000 problem involves date interpretation by ANY computer-based system.
Basically, an embedded chip is a "micro-computer" which has
been "embedded" within some larger piece of engineering equipment
or industrial product. Historically, embedded systems have provided the
intelligence associated with "process control" or "data
acquisition" systems. However, over the past decade, computer "chips"
have become dramatically smaller, cheaper, and much more sophisticated.
Today, the term "embedded system" encompasses almost any device
that has "built-in" computer logic. (ref. 15, p. 285)
The first reaction of most people is that
the "classics" are still steam-gauge (i.e. analog) aircraft
except for some minor additions which don't communicate with each
other. However, the possibility that there may be chips embedded somewhere
on the flight deck which use timing elements starts to raise some concern.
Generic devices known to use such chips are collision avoidance, windshear,
ground proximity, radar, and digital communications systems (the typical
avionics upgrades often found in the "classics"). Could two
devices from different manufacturers be installed that need to cross talk?
The answer is "Yes." And if either device has a chip that uses
timing intervals, there could very well be a problem.
How serious could the failure of just
one date-sensitive, embedded micro-processor be to an integrated system?
At midnight on New Year's Eve, 1996-97, a computer glitch at an aluminum
smelter in the South Island of New Zealand shut down all operations simultaneously
and without warning. The failure was traced to a faulty computer software
program which failed to account for 1996 being a leap year. Since the
computer was not programmed to handle the 366th day of the
year, all 660 process control computers hung up simultaneously at midnight.
(ref. 15, p. 284) Whether such an error is in software code or firmware
is immaterial; the result will still be the same: a total system
failure.
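The leap year trap can be illustrated with a short sketch. The smelter's actual code is not described in the source, so the lookup function below is purely hypothetical: it assumes a controller that indexes a fixed 365-entry table by day of year. The leap year rule itself is the standard Gregorian one, under which 2000 is a leap year and 1900 is not.

```python
from datetime import date


def is_leap_year(year):
    """Gregorian rule: every 4th year, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def days_in_year(year):
    """Number of days the controller must be able to handle in a given year."""
    return 366 if is_leap_year(year) else 365


def day_of_year(d):
    """1-based day number within the year (366 on Dec 31 of a leap year)."""
    return (d - date(d.year, 1, 1)).days + 1


def lookup_naive(table_365, d):
    """Hypothetical controller lookup that assumes every year has 365 days."""
    return table_365[day_of_year(d) - 1]  # raises IndexError on day 366


print(day_of_year(date(1996, 12, 31)))         # 366
print(is_leap_year(1900), is_leap_year(2000))  # False True
```

Calling `lookup_naive` with a 365-entry list and 31 December 1996 raises an `IndexError`, which is the software equivalent of every controller hanging at the same moment.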
Boeing states in its March 1998 message
that "the systems that are impacted are due to the use of navigational
databases (NDB) whose date effectivity is either checked or displayed
by operational software within these systems. The effects of the Year
2000 rollover range from transient nuisance messages during the one month
period from December 1999 to January 2000, to recurring effects, to total
loss of operational capability." (ref. 1)
However, does this statement take into
account all the data which may be date stamped and embedded in avionics
systems that have found their way onto the flight deck of the "classics?"
Not necessarily. One must consider just how much data could actually be
date stamped.
There does not have to be a date displayed
by the system for it to have date sensitivity. For example, some chips
are programmed to look at intervals but these intervals are based upon
a date stamp burned into the embedded chip by its manufacturer (this is
done to give the chip a starting date so it will be "year aware").
There is simply no way to trick the system into the century change to
test it, since it doesn't display or even accept date change
inputs. It will simply keep pressing on until it reaches the century
changeover and either continues to function normally or starts doing something
different (which may or may not be good).
In this case the only way to be absolutely
certain of Y2K compliance is to go back to the chip design level and determine
if the chip uses a date stamp function for any of its operations. Of course,
the next question is who designed and built the chip or chips and how
does one determine this bit of minutia? Going to the device manufacturer
may be a starting point but it is quite possible that the chip was purchased
without even the device designer knowing this fact. In addition, different
chips can find their way into the same devices based upon available supply
at the time of manufacture.
The fact that an embedded system is year-aware
and, thereby, Year-2000 vulnerable, does not necessarily mean that it
will fail on January 1, 2000. It simply means that we should take steps
to find out in advance if it will fail and what the consequences of a
failure might be.
The Issue of Compliance
Unfortunately, the definition of Year
2000 compliance often differs significantly from one organization to the
next. In fact, one of the first steps to Year 2000 compliance suggested
in the literature is typically "Your compliance definition should
identify possible sources of Year 2000 problems and should describe the
functionality of a Year 2000 compliant system." (ref. 13, p.
13)
What does being "compliant"
really mean? Is the device "in compliance" or simply "in
readiness?" Are you absolutely certain that even though a device
is compliant it is truly compliant with other systems with which it may
need to interact?
"Compliance" means that the
device will readily handle the century change-over and thereafter handle
dates normally forever. "Readiness" on the other hand simply
means that the device will handle the century change due to a software
"patch" of some sort. However, this "patch" will have
a finite life and unfortunately will re-open the date change issue again
at some point in the future. The logic behind using a "readiness"
solution is that the product can be made acceptable for a set period of
time assumed to be its maximum useful life. (But wasn't that how
we got into the Year 2000 issue in the first place?)
Although a device may be shown to be "compliant"
or even "ready" this also does not necessarily solve potential
interface issues with other devices. For example:
A date field today may be shown as mmddyy.
It can be made "ready" through a process called "windowing."
The manufacturer simply inserts a greater-than (>) or less-than (<) comparison
into the program so that the device decides whether a two-digit year yy
belongs to 19yy or 20yy. Unfortunately, no standard determines the threshold
value of yy used in that comparison. It could be yy < 30, or 40, or 50, or
yy > 10, etc.
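A minimal sketch of windowing follows. The function name and the pivot values are illustrative assumptions, not any manufacturer's actual patch; the point is only that the same two-digit year lands in different centuries depending on the pivot chosen.

```python
def expand_year(yy, pivot):
    """Windowing: two-digit years below the pivot are read as 20yy,
    all others as 19yy."""
    if not 0 <= yy <= 99:
        raise ValueError("yy must be in 0..99")
    return 2000 + yy if yy < pivot else 1900 + yy


for pivot in (30, 50):
    print(pivot, expand_year(0, pivot), expand_year(35, pivot), expand_year(99, pivot))
# pivot 30 -> 2000, 1935, 1999
# pivot 50 -> 2000, 2035, 1999
```

The sketch also shows why such a patch has a finite life: with a pivot of 30, a device still in service in 2030 would read that year as 1930.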
To be fully Y2K compliant, the mmddyy field
should be changed to mmddyyyy. That is fine inside one device, but what
happens if that device sends information to another device which is merely
"ready" and still uses the mmddyy format with the > or
< comparison added? Will the two devices talk to each other correctly or
simply shut down? No one really knows the answer, because most
Y2K compliance work today is being done at the individual product level,
in isolation from other systems.
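The interface question can be made concrete with a small sketch. Both receivers below are hypothetical: one rejects anything that is not exactly six digits, the other leniently reads only the first six characters. Neither is taken from a real device; they simply represent the two outcomes the paragraph above asks about (shut down, or carry on with a wrong date).

```python
from datetime import date


def format_mmddyyyy(d):
    """Sender that has been made fully compliant: eight-digit date field."""
    return d.strftime("%m%d%Y")


def parse_mmddyy_windowed(text, pivot):
    """Strict receiver that is merely 'ready': six digits plus a pivot."""
    if len(text) != 6 or not text.isdigit():
        raise ValueError(f"expected 6 digits (mmddyy), got {text!r}")
    month, day, yy = int(text[0:2]), int(text[2:4]), int(text[4:6])
    year = 2000 + yy if yy < pivot else 1900 + yy
    return date(year, month, day)


def parse_mmddyy_truncating(text, pivot):
    """Hypothetical lenient receiver: reads only the first six characters."""
    return parse_mmddyy_windowed(text[:6], pivot)


message = format_mmddyyyy(date(2000, 1, 1))   # '01012000'
print(parse_mmddyy_truncating(message, 30))   # 2020-01-01: silently wrong
try:
    parse_mmddyy_windowed(message, 30)
except ValueError as err:
    print("receiver rejected message:", err)
```

The strict receiver fails loudly, which at least can be detected in testing. The lenient one reads "0101" followed by "20" and reports 1 January 2020 without any warning.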
The bottom line is that there is simply
no standard approach being used to achieve Year 2000 compliance.
Evaluating System Compliance
The safest approach for anyone concerned
with potential Y2K problems is to assume your critical systems will have
an issue and develop a certification process to prove total system compliance
rather than assume that everything will continue to interact properly.
There is a growing body of examples and experiences where testing the
date change yielded unexpected and often catastrophic system-wide results.
An electrical plant in England suffered
a plant-wide shutdown when clocks were reset to just before midnight,
12/31/99, because a sensor in a smokestack decided that it had not been
serviced in 100 years and initiated a shutdown. The result was uncomfortably
similar to the New Zealand incident mentioned earlier, but fortunately
both of these examples occurred on the ground. (ref. 11)
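The internal logic of that sensor is not described in the source, so the following is only a sketch of how a service-interval check of the "do X every 100 days" kind could behave across the rollover. It assumes a device that always reads two-digit years as 19yy and treats any elapsed time outside the allowed interval as a fault; all names are hypothetical.

```python
from datetime import date


def naive_date(mm, dd, yy):
    """Legacy interpretation (an assumption): two-digit years are always 19yy."""
    return date(1900 + yy, mm, dd)


def service_check(last_service, today, interval_days):
    """Return 'ok' or 'shutdown' based on days since the last service.
    Any elapsed value outside 0..interval_days is treated as a fault."""
    elapsed = (today - last_service).days
    return "ok" if 0 <= elapsed <= interval_days else "shutdown"


last = naive_date(12, 1, 99)                           # serviced 1 Dec 1999
print(service_check(last, naive_date(12, 31, 99), 100))  # ok
print(service_check(last, naive_date(1, 1, 0), 100))     # shutdown
print(service_check(date(1999, 12, 1), date(2000, 1, 1), 100))  # ok
```

One day of real time separates the first and second checks, yet the device sees a gap of roughly a century and trips. With four-digit years the same check passes.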
One approach to achieving Y2K certification
has been provided to defense suppliers by the DoD and USAF (ref. 14).
This Y2K weapon system evaluation strategy applies to aircraft and associated
ground support systems such as training and mission planning systems,
automated test equipment, system/software engineering support environments
(S/SEEs), and development/integration laboratory systems. It consists
of five phases:
- Awareness involves gathering all potential
stakeholders and gaining consensus on the scope of the system-wide problem.
- Assessment focuses on developing a complete
inventory of affected components and assessing the scope of their Y2K
impact.
- Renovation involves the actual "fixing"
of non-compliant system components.
- Validation focuses on verifying and certifying
systems for Y2K compliance.
- Implementation places Y2K compliant systems
into use (an important component of this phase is the updating and distribution
of a risk management and contingency strategy).
Compliance Assessment
Although not mentioned in the DoD approach,
there are actually two main reasons to conduct a thorough assessment (or
inventory) of both the hardware and software components of any integrated
system such as the flight deck of a classic.
First, there have been and will continue
to be many press releases from manufacturers across all industries regarding
the ability of specific devices to handle the date change and what correction
methods, if any, the manufacturer recommends. One should be skeptical
of broad manufacturer claims that its products are ready for the date
change. Recently, several companies have publicly announced full confidence
in their products, only to later retract those statements upon further
testing. (ref. 10)
Also, as has been mentioned earlier, all
manufacturers' claims will naturally have certain limits. It is essential
that these limits are fully understood as they relate to your specific
operational situation.
Secondly, a thorough assessment will also
assist risk managers in comparing the cost of possible system upgrades
with the cost of compliance repair. Since compliance repair can become
extremely expensive, it may very well be more cost-effective to simply
replace suspected devices with demonstrably compliant upgrade devices.
If remediation is the chosen path, then
one must be ready to define the depth of intended system compliance. For
example, will the compliance assessment only be concerned with "safety
of flight" issues or will it need to address more fundamental issues
such as the safety to continue all approved operations (e.g. ETOPS, etc.)?
In any case, the following system analysis approach may be a good place
to start:
- Define the various systems involved (e.g. flight controls,
engine and fuel controls, navigation systems, communication systems,
etc.) and their sub-components (e.g. GPWS, FMS, EFIS, flow controls,
actuators, etc.)
- Determine how these systems and their sub-systems interact.
- Determine how to test the individual sub-components
and/or verify method of Y2K compliance.
- Determine how to test these sub-components for Y2K
compliance when integrated into a specific system.
- Determine how to test the totally integrated system
(e.g. the "classic" flight deck with upgrades)
- Conduct a Fault Tree Analysis to determine the requirement
for any additional risk mitigation.
Risk Mitigation
One approach to risk mitigation is to
break potential system problems down into three levels of significance
(much like a medical "triage"). Level-1 might involve safety
of flight items (hardware and/or software); Level-2 might involve mission
essential items (e.g. those supporting specific operational capabilities
such as ETOPS); and Level-3 might involve nuisance items (e.g. passenger
entertainment systems, etc.).
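As a rough illustration, the three levels can be recorded against an equipment inventory and used to order the work. The item names and level assignments below are illustrative assumptions only, not an assessment of any real system. Items with no level assigned are reported separately, since those are the ones that have not yet been examined.

```python
from enum import IntEnum


class Level(IntEnum):
    SAFETY_OF_FLIGHT = 1   # Level-1
    MISSION_ESSENTIAL = 2  # Level-2
    NUISANCE = 3           # Level-3


def triage(inventory):
    """Split an inventory {item: Level or None} into a work order
    (most critical first) and a list of items not yet assessed."""
    assessed = sorted((level, name) for name, level in inventory.items()
                      if level is not None)
    unassessed = sorted(name for name, level in inventory.items()
                        if level is None)
    return [name for _, name in assessed], unassessed


# Hypothetical inventory; the levels shown are for illustration only.
inventory = {
    "passenger entertainment": Level.NUISANCE,
    "ground proximity warning": Level.SAFETY_OF_FLIGHT,
    "navigation database": Level.MISSION_ESSENTIAL,
    "STC-installed datalink": None,
}
work_order, not_assessed = triage(inventory)
print(work_order)
print(not_assessed)
```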
One concern with the triage approach expressed
by some is that there may be a communications gap between what the people
running the Y2K effort think is critical and what operational personnel
know is really critical. The Year 2000 problem is not simply one for programmers
to solve. It involves everyone in the operational chain and, therefore,
the process of identifying levels of criticality needs to include all
potentially affected participants.
Obviously, Level-1 items must be fixed.
However, Level-2 and Level-3 items may not all get fixed, if for no
other reason than that they were not discovered. How, then, do their effects
get mitigated? Can all such risk mitigation be left to the initiative
and skills of the flight crew and other support personnel?
For example, here is a current mitigation
statement for two different Level-2 flight deck systems which have been
found to display either erroneous scratch pad messages or erroneous date
ranges as a result of Y2K:
"... if flight and maintenance
personnel are aware of this condition, the above listed versions can continue
to be used after the Year 2000 with no [product] changes ... Alternatively,
operators may resolve the display condition by upgrading to [new versions
of the product]." (ref. 1)
How many such identified Level-2 and Level-3
situations can be accepted on the flight deck before they begin to have
a measurable effect on flight crew workload and, hence, performance? And
what about those Y2K failures that occur without warning because they
simply were not identified? This brings to mind the frequent pilot comment
"what's that thing doing now?", a certain
indicator of pilot distraction and increased workload.
Questions to Ponder
There are a number of stakeholders involved
with Y2K risk mitigation issues involving the "classics." Here
are some examples of their possible concerns:
Operators: Are all of my aircraft
Y2K compliant? How do I know (each "classic" has nearly become
a custom design)? Are there any data sharing issues? What is my liability
if a Y2K failure occurs?
Regulators: What is the applicant's
definition of Year 2000 compliance? How does this solution relate to the
definition/solutions used by other similar or complementary products in
the same cockpit? Is cross-talk an issue? What are the effects of the
combined mitigation methods on flight crew workload?
STC Holders/Applicants: How are
my design changes impacted by other systems or products already in place?
What is my potential liability because an STC implies that the aircraft
will work as designed and approved? How can the effects of a Y2K failure
be mitigated without increasing flight crew workload?
Flight Crews: How are Y2K failures
most likely to appear on my flight deck and what do I do about them if
they do?
Maintenance Personnel: How are
Y2K failures most likely to appear in my aircraft or operations and what
do I do about them if they do?
Training Personnel (for both maintenance
and flight): What do the personnel that I am responsible for training
need to know about Y2K and its potential impact upon our company's
operations?
In Summary
How significant is the already perceived
Y2K risk potential for aircraft operations? One need only consider the
actions of the world aviation insurance market:
"Airline insurers in the U.K.
are working on the final draft of an exclusion clause for the risks associated
with "millenium bug" glitches in avionics and computer software.
A joint working group of the Aviation Insurance Offices Association and
Lloyd's Aviation Underwriters' Association also is developing a questionnaire
for carriers to assess the measures they are taking to minimize the potential
risks involved if computer systems do not recognize the date change to
2000. Tony Medniuk, managing director of the British Aviation Insurance
Group, said the aim is to achieve "clarity." He said insurers
need an "informed basis" upon which to consider specially agreed
coverage with carriers based on precautionary measures they have taken.
London-based insurers represent 20-30% of the world aviation insurance
market." (ref. 5, p. 19)
and
"Although a technical solution to the
millennium problem is actually quite simple, the added costs and follow-up
charges caused by it are considerable. The 'millennium bug'
is certain to be a major concern for the insurance industry. In the meantime,
the following rule of thumb holds true for the computer date dilemma:
'burning buildings are not insurable.'" [emphasis
added] (ref. 4)
References
1. Boeing External Message Number M-7200-98-01196, "Boeing Study on the Impact to all Models Regarding the Rollover to the Year 2000," (for multiple external distribution), 27 Mar 98.
2. D'Antonio, P.E., "President's Message," Hazard Prevention, Journal of the System Safety Society, Volume 34, No. 1, 1998.
3. "FAA: Y2K Issue Under Control," Aviation Week & Space Technology, McGraw-Hill, NY, April 13, 1998, p. 78.
4. Favre, R., Gantner, D., and Wiest, R., "Computer problems 2000: The millennium muddle," Swiss Reinsurance Company, Zurich, 1997.
5. "Fine Print on Y2K Insurance," Aviation Week & Space Technology, McGraw-Hill, NY, April 13, 1998, p. 19.
6. Happel, D.A., "GPS User Equipment Year 2000 Rollover Effects on the FMS-800 Flight Management System," Collins Avionics & Communication Divisions, Rockwell International, 29 August 1997.
7. IEEE Standard Glossary of Software Engineering Terminology.
8. Kingsley-Jones, M., and Sheppard, I., "Ageing Airliner Census 1997," Flight International, Reed International Business, UK, 9-15 July 1997, pp. 37-50.
9. Lyons, M., "Businesses should already have a recovery plan," The Irish Times, April 10, 1998.
10. Pegalis, A.M., "Perspective; For Risk Managers, The Year 2000 is Now; Here is a Guide to Help in the Critical Process of Making Systems Compliant," Business Insurance, Crain Communications, December 23, 1996.
11. Report on EPRI, Year 2000 Embedded Systems Workshop, Scottsdale, AZ, September 1997.
12. Rowe, W.D., An Anatomy of Risk, John Wiley & Sons, NY, 1977.
13. Sims, D., Robinson, J.M., McConnell, C., Silo, E., Shapiro, J., and Wilbanks, C., How To 2000, Raytheon E-Systems, IDG Books Worldwide, Inc., Foster City, CA, 1998.
14. Weapon System Strategy for Year 2000, U.S. Department of Defense, Rev G, 29 August 1997.
15. Yourdon, E., and Yourdon, J., Time Bomb 2000: What the Year 2000 Computer Crisis Means to You!, Prentice Hall PTR, Upper Saddle River, NJ, 1998.
Author's Biography
Robert B. Barnes, President, Robert
B. Barnes Associates, Inc., 8711 E. Pinnacle Peak Road, #337, Scottsdale,
AZ, 85255, USA, telephone +(602) 585-5703, facsimile +(602)
585-5703, e-mail RbarnesAZ@worldnet.att.net
His company has specialized for 10 years
in helping technology-based companies enter the commercial marketplace.
As a result, he is regularly involved with the commercialization of leading-edge
systems in the aerospace, computer electronics, and semiconductor sectors.
Frequently, he provides human factors-related project management assistance
to aerospace companies.
He holds a Bachelor's Degree in Aerospace
Engineering and a Master's Degree in Educational Psychology plus
a Professional Certificate in Aviation Safety. A former USAF instructor
pilot and flying safety officer, he also holds a Certificate of Master
Instructor in USAF Flying Training. His experience includes the certification
of after-market flight deck modifications to Classic air transport aircraft.
Mr. Barnes is a member of the American
Institute of Aeronautics and Astronautics, Society of Automotive Engineers,
System Safety Society, Association of Aviation Psychologists, and the
SAE's G-10 Committee on Aerospace Behavioral Engineering Technology.