An ASRS Incident Report Study of Flight Deck Automation and Cockpit Task Management
Synopsis: This page describes a study of aircraft incident reports conducted to determine the effect, if any, of level of flight deck automation on Cockpit Task Management (CTM). The findings suggest that CTM may be more challenging and prone to error in advanced technology aircraft than in traditional technology aircraft.
Keywords: cockpit task management, flight deck automation
Authors:
Jennifer Wilson <Jennifer.Wilson@ResearchIntegrations.com>, Research Integrations, Inc., Tempe, Arizona, USA
Ken Funk <funkk@engr.orst.edu>, Department of Industrial and Manufacturing Engineering, Oregon State University, Corvallis, Oregon, USA
Last Update: 23 Dec 99. This is a Work in Progress and its contents are subject to continual revision.
The last two decades have brought significant change to the commercial transport aircraft flight deck. Complex functions once performed by human pilots are now routinely performed by automated systems. Autoflight systems control aircraft altitude, heading, and speed as selected by the pilots. Flight management systems (FMSs) allow the human flightcrew to pre-program flight paths, complete with altitude and speed restrictions, then couple through the autoflight systems to fly those paths accurately and economically.
Flight deck automation has generally been well received by pilots and the aviation industry. There is little doubt that the addition of flight deck automation has made significant contributions to the safety and efficiency of operations, and accident records tend to corroborate this point: the hull loss accident rates for advanced technology (more automated) aircraft are generally lower than those for comparable traditional technology (less automated) aircraft (Boeing, 1998).
Yet several accidents involving advanced technology aircraft have called some of these perceived benefits into question. For example, on February 14, 1990, an Indian Airlines Airbus A320 aircraft crashed just short of the runway at the Bangalore, India airport, destroying the aircraft and killing 92 people on board. The investigators determined that the probable cause of the accident was the failure of the pilots to realize the gravity of the situation and immediately apply thrust. The pilots spent the final seconds of the flight trying to understand why the autoflight system was in idle/open descent mode rather than taking appropriate action to avoid impact with the ground (Ministry of Civil Aviation, 1990).
The reports from the investigation of this and other recent advanced technology aircraft accidents suggest that besides some of the obvious problems with the design of human-machine interfaces, these aircraft may pose substantial challenges to the flightcrew as they perform a crucial process we call Cockpit Task Management (CTM). CTM is the process by which pilots prioritize and perform the multiple, concurrent tasks that compete for their attention. In CTM the pilot chooses (consciously or automatically) what task(s) to attend to. The concept is closely related to attention allocation.
The purpose of this document is to show that recent flight deck automation human factors research suggests that attention allocation or CTM is a critical safety issue in advanced technology aircraft, to relate that finding to task management research, and to suggest a course for future research to address that issue.
With the advent of advanced technology aircraft and the transfer of safety-critical functions away from human awareness and control, pilots, scientists, and aviation safety experts have expressed reservations about flight deck automation. Investigators like Wiener (1989), Wise and his colleagues (1993), Billings (1997), and Sarter and Woods (1995), as well as a human factors team chartered by the US Federal Aviation Administration (1996), have identified a variety of automation issues related to such things as the design of automation interfaces, the complexity of automated systems, pilots' lack of understanding of automation, the possibility that automation actually increases rather than reduces pilot workload, and the tendency for automated systems to distract pilots from safety-critical flight control tasks.
Yet until recently there did not exist a comprehensive list of such issues. This prevented a full understanding of flight deck automation issues and a coordinated effort to address those issues using limited research, development, manufacturing, operational, and regulatory resources. To address that lack, we conducted a more comprehensive review of flight deck issues. This study is described briefly below and more fully in a paper (Funk et al, 1999) and a website (Flight Deck Automation Issues).
The objectives of this automation issues study were to:
Our methodology involved two phases. In Phase 1 we compiled a list of possible problems with, or concerns about, flight deck automation, as expressed by pilots, scientists, engineers, and flight safety experts. We reviewed 960 source documents, including papers and articles from the scientific literature as well as the trade and popular press, accident reports, incident reports, questionnaires filled out by pilots and others, and documentation from our own analyses, recording citations of problems and concerns. In Phase 1 we did not attempt to substantiate the claims made about automation problems. Rather, we merely identified and recorded citations of people's perceptions of problems and their concerns about automation as a prelude to our Phase 2 work.
In Phase 2 we located and recorded evidence related to the possible problems and concerns identified in Phase 1 from a wide variety of sources. Since an issue is "[a] point of discussion, debate, or dispute ..." [9], in the following we refer to these possible problems and concerns as flight deck automation issues or, just issues, except where referring to the results of Phase 1, where we refer to them as problems and concerns.
The more than 100 sources we reviewed for evidence included accident reports, documents describing incident report studies, and documents describing scientific experiments, surveys and other studies. We also conducted a survey of automation experts, individuals with broad knowledge related to human factors and flight deck automation. We reviewed these sources for data and other objective information related to an issue. For each instance of this evidence we assessed the extent to which it supported one side of the issue or the other, and assigned a numeric strength rating between -5 and +5. We assigned a positive strength rating to evidence supporting that side of the issue claiming that a problem truly exists (supportive evidence) and a negative strength rating to evidence supporting the other side (contradictory evidence).
For example, consider the statement of the workload issue alluded to above, issue079: Automation may increase overall pilot workload, or increase pilot workload at high workload times and reduce pilot workload at low workload times, possibly resulting in excess workload and/or boredom. When we found evidence in a source indicating that pilot workload is increased by automation (at least under some circumstances), we recorded an excerpt from the source document and assigned this supportive evidence a positive rating, perhaps as great as +5. When we found evidence in a source indicating that no such problem exists (at least under some circumstances), we recorded an excerpt and assigned this contradictory evidence a negative rating, perhaps as great as -5.
We developed detailed strength assignment guidelines for evidence from each type of information source. For example, in pilot surveys of automation issues, if at least 90 per cent of the respondents were in agreement with a statement consistent with an issue statement, we assigned a strength of +5. If at least 90 per cent were reported as agreeing with a statement contradictory to an issue statement, we assigned a strength of -5. We developed similar guidelines for each type of evidence source.
For each instance of evidence found, we recorded in a database the related issue, an excerpt from the source document describing the evidence, source document reference information, the type of aircraft and equipment to which the evidence applied (if specified), and a strength rating. During the process of collecting and recording evidence, we revised, updated, consolidated, and organized the issues, yielding 92 flight deck automation issues. For each one, we compiled supportive and contradictory evidence related to it in preparation for dissemination.
We also performed a meta-analysis of this data to summarize the evidence in order to identify those issues that are problems in need of solutions, those issues that do not appear to represent problems, and those issues which require more research. We ranked the issues based on several criteria, including the number of citations (from Phase 1), the extent to which the experts in our survey agreed that the issue is truly a problem, their rating of the significance of that problem, and the sum of strength assignments made in our reviews.
In Phase 1, we found 2,428 specific citations of 114 possible flight deck automation problems and concerns. In Phase 2 we identified and recorded more than 700 pieces of evidence for 92 distinct automation issues. The issue ranking highest in supportive evidence alone was issue105: Pilots may not understand the structure and function of automation or the interaction of automation devices well enough to safely perform their duties. (Issue numbers greater than 92 are possible because more than 92 issues were identified, but some were eliminated or combined with others in the process.) Other issues ranking high in supportive evidence were related to automation behavior that surprises pilots and automation-induced complacency.
We found primarily contradictory evidence for several issues. In particular, issue079 (automation may adversely affect pilot workload, see above) ranked second in contradictory evidence despite its rank of eighth in number of Phase 1 citations.
A particularly noteworthy issue was issue102: The attentional demands of pilot-automation interaction may significantly interfere with performance of safety-critical tasks. (e.g., "head-down time", distractions, etc.). By virtue of its focus on how automation may influence the allocation of attention among tasks, this issue is clearly related to CTM. In the meta-analysis it ranked second with respect to the extent that our automation experts agreed that it is truly a problem and it ranked first in number of Phase 1 citations. It also ranked in the top 20 of all rankings based on other criteria.
In general, we consider those issues with the greatest overall supportive evidence (such as issue105, above) and especially those issues ranking highest in multiple criteria (e.g., issue102, above) to indicate real problems which require solutions, and resources should be dedicated to finding those solutions. Along those same lines, we consider that those issues with the greatest overall contradictory evidence are not significant problems, and resources would be better used in solving real problems or further exploring unresolved issues, that is, those ranking high in neither supportive nor contradictory evidence.
In particular, we find the contrast between issue102 (attention), which ranked high in multiple criteria, and issue079 (workload), which ranked high in contradictory evidence, interesting. In Phase 2, we found strong evidence and agreement that automation can and often does draw pilot attention away from safety-critical flight control tasks. On the other hand, despite the fact that workload and attention are closely related, we also found fairly strong evidence that automation does not significantly increase workload, per se. Clearly, the solution to the attention problem requires further investigation.
Even before our meta-analysis had revealed issue102 as so significant, we had already begun a study investigating it. Results from earlier studies (Chou et al, 1996) suggested that CTM, which is in fact attention allocation, may be particularly problematic in advanced technology, highly automated aircraft.
There are several reasons behind this speculation.
First, there are a greater number of tasks to be performed in the automated aircraft. All the flight control tasks found in the traditional technology aircraft must still be performed in the advanced technology aircraft but, in addition to these tasks, there are now tasks associated with communicating with and managing the automation. Adding tasks to the queue of tasks demanding attention increases the demands on the flightcrew. While the automation provides additional external resources for the flightcrew to utilize, these resources must be managed, which increases demands on the function of CTM.
Second, the same resources may be overloaded in the automated aircraft. Some of the demands added by automation require the cognitive processing resources that are already taxed in the traditional technology aircraft. Because of this, more prioritization may be required because more tasks are demanding the same resources.
Third, some of the advanced systems, such as the Flight Management System (FMS), may inappropriately draw the attention of the flightcrew away from more critical tasks. When the FMS fails to behave as expected, the flightcrew's attention can be drawn away from the highest priority tasks required for flying the aircraft. Two factors contribute to the ability of the FMS to draw the flightcrew's attention. First, because of the nature of the FMS the flightcrew often cannot proceed with other tasks until they either satisfy its needs or they turn it off. If pilots have an incentive to keep the FMS on then they must correct the problem before their attention can be turned elsewhere. Second, when the FMS fails to behave as expected the flightcrew's attention is drawn toward it, as suggested by schema theory. As the functioning of the FMS defies explanation within the currently active schema, attention will be directed toward finding a better fitting schema. This phenomenon is sometimes referred to as 'novel pop out' (Johnston et al, 1990).
Because our interest lies in CTM as it occurs in real flight operations, we ideally would like to collect data from real flight operations. However, this method is often impractical. A viable alternative to viewing actual line operations is the use of incident reports submitted by pilots, such as those submitted to the Aviation Safety Reporting System (ASRS).
The ASRS was created as a means to collect reports of situations that compromise safety so that strategies to prevent these situations from becoming accidents could be created (Chappell, 1994). These reports are called "incident reports" and are submitted voluntarily by aviation operations personnel (e.g., pilots, Air Traffic Controllers, flight attendants, ground personnel). The reports contain a description of a situation occurring in flight operations that the reporter believes has safety implications. With each report providing a description of an event that occurred in operations, they can be used as a practical way to view real line operations from a pilot's perspective.
Following is an example of an ASRS incident report.
Synopsis
ACR MLG ALT DEVIATION EXCURSION FROM CLRNC ALT. REPORTER SAYS FMA CHANGED FLT MODE AND ALT SELECT BY ITSELF.
Narrative
THE F/O WAS FLYING THE ACFT. WE HAD BEEN ISSUED SEVERAL VECTORS AND TURNS BY ATC FOR FLOW CTL INTO CHICAGO O'HARE. I WAS ON THE P/A EXPLAINING THE ENRTE DELAY TO THE PAX WHEN I NOTICED THE FMA HAD CHANGED FROM "PERF CRUISE" TO "PERF DSCNT," AND THE ALT SELECT HAD CHANGED FROM 35000 TO 33000'. I ASKED THE F/O IF WE HAD BEEN CLRED TO FL330. HE SAID NO. THE ACFT ALT WAS 34600' WHEN I NOTICED THE PROB. THE DSCNT WAS STOPPED AT 34500'. I DON'T KNOW WHY THE AUTOPLT ENTERED A DSCNT MODE. AN ALT WARNING DIDN'T OCCUR BECAUSE THE ALT SELECT HAD CHANGED ALSO. I SUSPECT A PWR SURGE IN THE ELECTRICAL SYS MAY HAVE CAUSED THE PROB. I HAVE EXPERIENCED THIS PROB IN THE PAST WITH THE MLG FLT GUIDANCE SYS WHEN A HYD PUMP IS TURNED FROM LOW TO HIGH.
The abbreviations used can make the report difficult to understand so following is a more readable translation of this example.
Accession #92507
Synopsis
A medium-large transport aircraft used by an air carrier committed an altitude deviation. The aircraft made an excursion from the clearance altitude. The reporter says that the Flight Mode Annunciator (FMA) changed flight mode and altitude select by itself.
Narrative
The first officer was flying the aircraft. We had been issued several vectors and turns by Air Traffic Control to control the flow of traffic into Chicago O'Hare International Airport. I was on the public address explaining the enroute delay to the passengers when I noticed the FMA had changed from "PERF CRUISE" to "PERF DSCNT," and the altitude select had changed from 35000 to 33000 feet. I asked the first officer if we had been cleared to a flight level of 33000 feet. He said no. The aircraft's altitude was 34600 feet when I noticed the problem. The descent was stopped at 34500 feet. I don't know why the autopilot entered a descent mode. An altitude warning didn't occur because the altitude select had changed also. I suspect a power surge in the electrical system may have caused the problem. I have experienced this problem in the past with the medium-large aircraft flight guidance system when a hydraulic pump is turned from low to high.
In the past, owing to the nature of the data, ASRS incident reports have been used primarily for descriptive analyses. In this study, however, we felt it more useful to conduct an inferential analysis. Such an analysis may be conducted by carefully constructing a research question and choosing an appropriate statistical test. Because few researchers have taken this approach, there are not many examples of effective inferential analysis using ASRS incident report data.
The flightcrew's function of CTM on the commercial flight deck is an important part of flight operations, and committing errors in CTM can have severe consequences. There is reason to believe that the level of automation may affect CTM; however, before this study there had been little research that directly addressed this effect. Thus, the primary objective of this study was to begin evaluating the relationship between CTM of commercial airline pilots and the level of automation on the flight deck by determining how automation affects the frequency of Task Prioritization errors as reported in Aviation Safety Reporting System (ASRS) incident reports.
Because ASRS incident reports are primarily used for descriptive analyses, a methodology for conducting a good statistical comparison analysis is lacking. Therefore, the secondary objective of this study was to create a methodology that models an effective way to use ASRS incident report data in an inferential analysis.
We met the objectives of this study by carefully constructing a study to ensure that a fair comparison was made between the advanced and traditional technology populations. To accomplish this, we drew representative data samples from an ASRS incident report database and analyzed them using an analysis tool constructed specifically for this study.
We compared two samples of ASRS incident reports in this study to determine if level of automation on the commercial aircraft flight deck affected the frequency of Task Prioritization errors. The first sample was composed of 210 incident reports submitted by pilots flying advanced technology aircraft and the second sample was composed of 210 incident reports submitted by pilots flying traditional technology aircraft. In total, 420 incident reports were analyzed.
The possibility exists that the effect of the level of technology of the aircraft could be confounded with differences in experience level because the advanced aircraft are comparatively new to commercial air carriers' fleets. To help avoid this confounding effect, we divided the two samples into three sub-samples each made up of 70 reports submitted during a specified time period: 1988-1989, 1990-1991, and 1992-1993. These submission periods were based on the availability of incident reports with narratives in the CD-ROM database we used.
We determined the sample sizes by performing a power analysis using the following values: power = 0.80, significance level α = 0.05, and effect size index w = 0.20. We determined that a sample size of 196 incident reports was required to detect an effect of that size, or in other words, to reject the null hypothesis and conclude that there is a significant difference between the frequencies found in the two samples. Because each sample was to be divided into 3 sub-samples (196/3 = 65.333), the sample size was rounded up to 210 (210/3 = 70).
We performed a second power analysis to determine if the sub-sample size of 70 was adequate. With power = 0.80, significance level α = 0.05, and effect size index w = 0.40, we determined that a sub-sample size of 49 incident reports was required to reject the null hypothesis. Because 70 is greater than 49, the sub-sample size of 70 was determined to be adequate.
It should be noted that the two power analyses conducted each used a different effect size index. The effect size index for each of the power analyses was chosen specifically for the effect that was to be detected. For the two aircraft technology type samples, we wanted to detect the smallest effect size without the sample size becoming prohibitively large. If a difference between the frequency rates of Task Prioritization errors between the two technology types existed, we wanted to detect it. The effect size of w = 0.20 (loosely referred to as a 'medium-small' effect) was chosen for these samples. For the submission period sub-samples, we were interested only in detecting an effect of submission period that was large enough to significantly confound the effect of aircraft technology. It was not necessary to detect as small an effect for the sub-samples as was required for the aircraft technology type samples. Thus, we chose to use an effect size of w = 0.40 (loosely referred to as a 'medium-large' effect) for the submission period sub-samples.
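The two required sample sizes can be approximated with a short calculation. The sketch below is not necessarily the procedure used in the study: it assumes a chi-square test with 1 degree of freedom and uses the common normal approximation N ≈ (z(1 - α/2) + z(power))² / w². The function name `approx_sample_size` is illustrative only.

```python
from statistics import NormalDist


def approx_sample_size(w, alpha=0.05, power=0.80):
    """Approximate N for a 1-df chi-square test with effect size index w.

    Assumption: normal approximation to the noncentral chi-square
    distribution, which ignores the negligible far-tail term.
    """
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # two-sided critical value
    z_power = z.inv_cdf(power)            # quantile for the desired power
    return (z_alpha + z_power) ** 2 / w ** 2


n_samples = approx_sample_size(0.20)      # close to the 196 reported above
n_subsamples = approx_sample_size(0.40)   # close to the 49 reported above
print(round(n_samples), round(n_subsamples))
```

Under these assumptions the approximation lands within a fraction of a report of the reported values; halving the detectable effect size roughly quadruples the required sample.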
We obtained the ASRS incident reports used in this study from the ASRS Aeroknowledge CD-ROM database (DOS Version Release 96-1). Homogeneity between samples is very important for statistical comparison studies. In an effort to collect homogeneous samples, the sample populations were constrained so that the level of automation (i.e., aircraft technology type) and the submission period were the only two differences between the samples. For example, all the reports from both the advanced technology and the traditional technology samples were constrained to reports submitted by a member of the flightcrew flying a two-person commercial air carrier aircraft in which the aircraft was classified as a medium-large transport, large transport or widebody transport aircraft.
Another parameter that we held constant was phase of flight. Because over half of all commercial hull loss accidents (Boeing, 1998) and approximately 50% of incidents reported to ASRS by commercial air carrier pilots (Wilson, 1998) occur during the terminal phases of flight, these phases of flight were considered a good place to look for errors. Thus, all reports analyzed were classified as having occurred during the descent or approach phase of flight.
We collected the reports from the database in the following way to ensure that the samples were representative of the population. First, we compiled the six populations (i.e., the two aircraft technology populations each divided into three submission periods) from the database based on the population parameters described above. Second, based on the total number of reports in each of the six populations we generated 70 random numbers for each sample to determine which of the reports would be included in the sample. This allowed the samples to be drawn randomly without replacement. Third, we then tagged the appropriate reports and downloaded them into a word processing document. Fourth, we removed all information related to the report except for the ASRS number, the synopsis, and the narrative. This was done so that the analyst would be unable to use this information to identify the report during analysis. Any information in the synopsis or the narrative that identified the report was not removed because the deletions would have left the data incomplete.
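The random selection step can be sketched as follows. This is only an illustration of drawing 70 report positions without replacement; the population size of 500 is a made-up figure (the actual population sizes are not given here), and Python's `random` module stands in for whatever random number source was actually used.

```python
import random


def draw_sample(population_size, sample_size=70, seed=None):
    """Pick report positions (1-based) at random without replacement."""
    rng = random.Random(seed)  # seeded only so the draw can be repeated
    positions = rng.sample(range(1, population_size + 1), sample_size)
    return sorted(positions)


# Hypothetical population of 500 reports for one technology/period cell.
picked = draw_sample(population_size=500, seed=1)
print(len(picked), picked[:5])
```

The same call would be repeated once for each of the six populations, so that every sub-sample is drawn independently.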
We developed an incident analysis form specifically for use in this project. This form allowed the analyst to classify the ASRS incident reports as either containing a Task Prioritization error or not, based on the description given in the narrative of the report. Using the form, we identified the tasks that were being performed during the incident period reported. We evaluated prioritization by identifying whether the active tasks were related to the task categories of aviate, navigate, communicate, manage systems, or non-flight related tasks. If a task of lower priority was active while a task of higher priority that required resources was not active, we classified the report as containing a Task Prioritization error.
The incident analysis form contained a listing of all tasks that must be performed during the descent and approach phases of flight. The task listing used was based on a functional analysis of a generic commercial air transport mission. We organized these tasks into five categories and the priority of the task was determined by the category to which it belonged (where 1 is highest and 5 is lowest): 1. Aviate, 2. Navigate, 3. Communicate, 4. Manage Systems, and 5. Non-Flight Related. There was no further prioritization within a category; it was assumed that all tasks that fell in a particular category were of the same priority. We defined each listed task not only in terms of performing the task itself, but also as maintaining awareness of the task's status. For example, the task '1.5 Control/monitor vertical profile' included controlling the vertical profile either manually or using the autopilot and monitoring the status of the vertical profile.
To illustrate how the analysis form was filled out, consider incident report #92507 shown above and its corresponding analysis form shown below.
Associated with each task listed on the form were three sets of boxes that were marked to highlight the parameters that were considered in the analysis. When any of the boxes were marked for a given task, we entered an excerpt or short summary upon which the judgment to mark the box had been based in the column called 'Related Excerpt/Comment.'
Starting on the left, the first set of boxes, 'Reported Tasks,' were used to indicate all of the tasks that were reported as being performed during the block of time described in the incident. This set of boxes was used to give a rough summary of all tasks that the reporter had described. We marked the 'explicitly stated' box if the reporter specifically mentioned the task in the narrative of the report. For example, given the following statement from incident report #92507:
"WE HAD BEEN ISSUED SEVERAL VECTORS AND TURNS BY ATC FOR FLOW CTL INTO CHICAGO O'HARE."
We would mark the 'explicitly stated' box for the Task 3.1 'Communicate with ATC' and include the excerpt 'ISSUED SEVERAL VECTORS AND TURNS BY ATC.' Reading on from this statement, it is implied, though not explicitly stated, that the flight crew began to carry out these requests given by the ATC.
"WE HAD BEEN ISSUED SEVERAL VECTORS AND TURNS BY ATC FOR FLOW CTL INTO CHICAGO O'HARE. I WAS ON THE P/A EXPLAINING THE ENRTE DELAY TO THE PAX..."
We would mark the 'strongly implied' box for the Task 1.3 'Control/monitor lateral profile' and again include the excerpt 'ISSUED SEVERAL VECTORS AND TURNS BY ATC.'
We marked the next box, 'ACTIVE TASKS during CRITICAL PERIOD,' when the task was active during the critical period of the incident. The critical period consisted of all the events that took place between the time that the "desired state" was defined and the time that the flightcrew became aware that the desired state was not or would not be met (i.e., a deviation occurred). We entered the critical period in the appropriate space at the bottom of the form. In incident report #92507, the critical period was "given clearance altitude" to "I noticed the problem." This would indicate that the critical period included all tasks that occurred between the point that the desired state of maintaining the cleared altitude was declared and the point that the flightcrew realized that they had overshot this altitude. In this report, the clearance for their desired altitude had been given before the window of time described in this incident report so all the tasks described up to the point that the captain noticed the problem were considered active tasks.
We marked the last set of boxes, 'STATUS during CRITICAL PERIOD,' if the task was active during the critical period (i.e., had been marked 'ACTIVE TASKS during CRITICAL PERIOD'). We marked the 'Unknown' box when we were unable to discern the task's status from the narrative. For example, it cannot be determined from this narrative whether the public address system was working correctly and whether the passengers actually heard the captain's announcement. In this case we would mark Task 3.5 'Communicate with passengers' as status 'Unknown.'
We marked the 'Satisfactory' box when the desired state of the task had been and/or would be achieved given the current trend of activities. For example, given the following statement:
"...I ASKED THE F/O IF WE HAD BEEN CLRED TO FL330. HE SAID NO..."
We would mark the 'Satisfactory' box for the Task 3.4 'Communicate with flight crew'. The first officer and the captain effectively communicated this information.
We marked the 'Unsatisfactory' box when the reporter stated in the narrative that the desired state of the task had not and/or would not be achieved given the current trend of activities. For example, given the following statement:
"...THE ALT SELECT HAD CHANGED FROM 35000 TO 33000'. I ASKED THE F/O IF WE HAD BEEN CLRED TO FL330. HE SAID NO. THE ACFT ALT WAS 34600' WHEN I NOTICED THE PROB..."
We would mark the 'Unsatisfactory' box for the Task 1.6 'Maintain clearances and restrictions.' In this example, the desired altitude was 35,000 feet yet the altitude of the aircraft was 34,600 feet, a discrepancy of 400 feet.
Once all the appropriate boxes were marked on the analysis form, we classified the incident report as to whether a Task Prioritization error was committed by circling 'yes' or 'no.' We classified using the following rule:
If the status of a higher priority task is unsatisfactory and it is not active AND a lower priority task is active, then the incident report is classified as "TP error occurred" (otherwise it is classified as "no TP error occurred").
When a report was classified as containing a Task Prioritization error then we listed the tasks involved in this error in the space provided at the bottom of the analysis form. In incident report #92507, Task 1.6 'Maintain clearances and restrictions' was not active and unsatisfactory while the lower priority tasks 3.4 'Communicate with flight crew' and 3.5 'Communicate with passengers' were active, thus we classified this incident report as containing a Task Prioritization error.
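The classification rule above can be expressed compactly in code. The sketch below is an illustration, not the analysis tool used in the study: the `Task` record and the `has_tp_error` function are hypothetical names, and the priority category is assumed to be the leading digit of the task number (1 = Aviate through 5 = Non-Flight Related).

```python
from dataclasses import dataclass


@dataclass
class Task:
    number: str   # task number from the form, e.g. "1.6"
    active: bool  # marked 'ACTIVE TASKS during CRITICAL PERIOD'
    status: str   # "satisfactory", "unsatisfactory", or "unknown"

    @property
    def priority(self):
        # Assumption: the category (1 highest, 5 lowest) is the leading digit.
        return int(self.number.split(".")[0])


def has_tp_error(tasks):
    """Apply the rule: an unsatisfactory, inactive higher priority task
    combined with an active lower priority task is a TP error."""
    for high in tasks:
        if high.status == "unsatisfactory" and not high.active:
            if any(t.active and t.priority > high.priority for t in tasks):
                return True
    return False


# Entries taken from the discussion of incident report #92507.
report_92507 = [
    Task("1.6", active=False, status="unsatisfactory"),
    Task("3.4", active=True, status="satisfactory"),
    Task("3.5", active=True, status="unknown"),
]
print(has_tp_error(report_92507))
```

Because tasks within a category share the same priority, two tasks from the same category can never trigger the rule against each other.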
We analyzed each incident report using the incident report analysis form described above. To minimize bias during the analysis, the two samples (including the three sub-samples within each) were randomly mixed and the sample to which each incident report belonged was not specified until all analyses were complete. After all reports had been analyzed, we sorted the reports and summarized the data.
Of the 420 incident reports analyzed, we classified 43 (10.2%) as containing Task Prioritization errors. Of these, 28 were from the advanced technology sample and 15 were from the traditional technology sample (see table).
| Submission Period | Task Prioritization Errors, Advanced Technology | Task Prioritization Errors, Traditional Technology | Total Errors by Submission Period |
| 1988-1989 | 13 | 7 | 20 |
| 1990-1991 | 11 | 5 | 16 |
| 1992-1993 | 4 | 3 | 7 |
| Total Errors by Aircraft Technology | 28 | 15 | 43 |
We used the chi-square (χ²) test to determine whether the difference between the 28 Task Prioritization errors found in advanced technology incident reports and the 15 found in traditional technology incident reports was statistically significant. The calculated χ² value was 4.379 at 1 degree of freedom, with a p-value of 0.036. Using a significance level of α = 0.05, we concluded that this difference was statistically significant.
We next used the χ² test to compare the frequencies for advanced and traditional technology aircraft within each submission period. For each of the three submission periods, the difference between the technology types was not statistically significant (p-value > 0.10).
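The reported statistics can be checked from the table counts with a short script. The sketch below assumes the test was a Pearson chi-square without continuity correction on a 2x2 table of error versus no-error reports, with 210 reports per technology type overall and 70 per type within each submission period. Under that assumption the overall comparison reproduces the reported values. The helper names `pearson_chi2` and `p_value_df1` are ours.

```python
import math


def pearson_chi2(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def two_by_two(errors_adv, errors_trad, reports_per_type):
    """Rows: advanced, traditional. Columns: TP error, no TP error."""
    return [
        [errors_adv, reports_per_type - errors_adv],
        [errors_trad, reports_per_type - errors_trad],
    ]


# Overall comparison: 28 vs. 15 errors, 210 reports per technology type.
overall_chi2 = pearson_chi2(two_by_two(28, 15, 210))
overall_p = p_value_df1(overall_chi2)
print(f"overall: chi2 = {overall_chi2:.3f}, p = {overall_p:.3f}")

# Comparison within each submission period: 70 reports per technology type.
period_counts = {"1988-1989": (13, 7), "1990-1991": (11, 5), "1992-1993": (4, 3)}
period_p = {}
for period, (adv, trad) in period_counts.items():
    chi2 = pearson_chi2(two_by_two(adv, trad, 70))
    period_p[period] = p_value_df1(chi2)
    print(f"{period}: chi2 = {chi2:.3f}, p = {period_p[period]:.3f}")
```

With these counts the overall statistic is approximately 4.379 with a p-value of approximately 0.036, and each of the three within-period p-values is above 0.10.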
We divided each of the two samples into three sub-samples of 70 reports, each submitted during a specified time period: 1988-1989, 1990-1991, and 1992-1993. The data for each submission period from the advanced technology and traditional technology aircraft were combined, and the χ² test was used to determine whether the differences between the submission periods were significant. The χ² value was 6.891 at 2 degrees of freedom, with a p-value of 0.032. Using a significance level of α = 0.05, we concluded that this difference was statistically significant.
We then calculated the χ² value using only the data from the advanced technology aircraft to compare the three submission periods. The χ² value was 5.522 at 2 degrees of freedom, with a p-value of 0.063. This was significant at α = 0.10.
The same approach taken in analyzing the advanced technology sample frequency data by submission period was used to analyze the traditional technology data. The result was not statistically significant (p-value = 0.423).
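The three submission-period comparisons can be checked the same way. The sketch below assumes a Pearson chi-square on a 3x2 table (period by error versus no-error), with 140 reports per period for the combined data and 70 per period for each technology type. With 2 degrees of freedom the upper-tail probability has the closed form exp(-χ²/2), so no statistics library is needed. The function names are ours.

```python
import math


def pearson_chi2(table):
    """Pearson chi-square statistic for a contingency table (list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def p_value_df2(chi2):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-chi2 / 2.0)


def period_table(errors_by_period, reports_per_period):
    """Rows: submission periods. Columns: TP error, no TP error."""
    return [[e, reports_per_period - e] for e in errors_by_period]


# Error counts for 1988-1989, 1990-1991, 1992-1993, taken from the table.
comparisons = {
    "combined": ([20, 16, 7], 140),
    "advanced": ([13, 11, 4], 70),
    "traditional": ([7, 5, 3], 70),
}

results = {}
for name, (errors, size) in comparisons.items():
    chi2 = pearson_chi2(period_table(errors, size))
    results[name] = (chi2, p_value_df2(chi2))
    print(f"{name}: chi2 = {chi2:.3f}, p = {results[name][1]:.3f}")
```

Under these assumptions the p-values come out at approximately 0.032 (combined), 0.063 (advanced technology), and 0.423 (traditional technology), matching the values reported above.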
The primary objective of this study was to begin evaluating the relationship between the CTM of commercial airline pilots and the level of automation on the flight deck by determining how automation affects the frequency of Task Prioritization errors as reported in ASRS incident reports. We found that Task Prioritization errors occurred in both advanced technology and traditional technology aircraft, and that overall there was a statistically significant difference between the number of reports classified as containing a Task Prioritization error in the advanced and traditional technology aircraft. This difference in the frequency of Task Prioritization errors suggests that Task Management may be more difficult in advanced technology aircraft.
We cannot unequivocally state that the difference was caused by the nature of the design of the automation because this is confounded by the novelty of the advanced aircraft in air carrier fleets. In an attempt to better understand the effect of aircraft technology type, we looked more closely at the difference by submission period between the advanced and traditional technology samples. However, we found that the difference by submission period between aircraft technology was not statistically significant. Why would this be the case? The answer is in the power of the statistical test. For the overall test in which the three submission periods' frequency data were combined for the two technology types, the power of the test was such that a medium-small effect could be detected. For the tests conducted by submission period, however, the power of the test was such that a medium-large effect could be detected. This difference in effect size detection was due to the difference in sample size. In the population of ASRS incident reports, the actual effect that we were trying to detect was smaller than medium-large and therefore the test by submission period lacked the appropriate power to detect it. To determine if there was a significant difference between aircraft technology in each submission period, the sub-sample size would have needed to be increased.
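The dependence of detectable effect size on sample size can be illustrated numerically. The sketch below is not the power analysis performed in the study, whose target power and effect-size measure are not stated here. It assumes Cohen's effect size w for a chi-square test with 1 degree of freedom, a target power of 0.80, and α = 0.05, and the function names are ours. Cohen's conventional benchmarks are w = 0.1 (small), 0.3 (medium), and 0.5 (large). The "medium-small" and "medium-large" labels in the text may rest on different conventions, so only the relative change with sample size should be read from this example.

```python
import math
from statistics import NormalDist


def power_df1(w, n, alpha=0.05):
    """Power of a 1-df chi-square test for effect size w and total sample n.

    With 1 degree of freedom the noncentral chi-square statistic is the square
    of a normal variable with mean w * sqrt(n), so the power is a sum of two
    normal tail probabilities.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)
    shift = w * math.sqrt(n)
    return (1.0 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)


def min_detectable_w(n, power=0.80, alpha=0.05):
    """Smallest w detectable with the requested power, found by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if power_df1(mid, n, alpha) < power:
            lo = mid
        else:
            hi = mid
    return hi


w_overall = min_detectable_w(420)     # all reports, both technology types
w_per_period = min_detectable_w(140)  # one submission period, both types
print(f"N = 420: w >= {w_overall:.3f}")
print(f"N = 140: w >= {w_per_period:.3f}")
print(f"ratio = {w_per_period / w_overall:.3f}")
```

Because the power depends on w only through w√N, cutting the sample to one third raises the smallest detectable effect by a factor of √3, about 1.73, whatever power target is chosen.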
We also looked at the effect of submission period on Task Prioritization errors. By separating the two samples into three equal sub-samples based on submission period, a decrease in the frequency of Task Prioritization errors in both the advanced technology sample and the traditional technology sample over time became apparent. This difference was statistically significant for the advanced technology sample; however, it was not statistically significant for the traditional technology sample. These data are consistent with the idea that industry experience with the advanced technology aircraft played a role in the differences in the frequency of Task Prioritization errors, but this cannot be stated conclusively. It may be the case that improved pilot training programs, or any number of other factors could have contributed to this reduction in Task Prioritization errors and that this reduction may have occurred in all aircraft, regardless of their level of technology. Further research is required to determine if the novelty of the advanced aircraft indeed played the critical role in creating the difference of frequency of Task Prioritization errors between the two aircraft types.
When evaluating the results of this study, one must bear in mind the limitations of ASRS incident report data. The samples used in this study were drawn from a non-random sample of events occurring in aviation operations and the ASRS incident reports themselves reflect reporting biases. What can be said with confidence, however, is that Task Prioritization errors do exist in actual line operations and their existence warrants thoughtful consideration. This study sheds some light on one factor, automation on the commercial flight deck, which may affect the frequency of these errors.
The secondary objective of this study was to create an effective methodology for using ASRS incident reports for inferential analysis. By carefully constructing a research question and choosing an appropriate statistical test, an inferential analysis was conducted on the data collected. In this study statistically significant results were derived, supporting the notion that ASRS incident reports can be effectively used both for descriptive analyses and for inferential analyses.
By using ASRS data, we took advantage of several strengths of this type of data. First, the reports provided a practical alternative to collecting data from the jumpseat of a commercial aircraft. The narratives described situations that had occurred in line operations, which gave this study ecological validity and avoided the possibility that the effect found was an artifact of a laboratory experiment. Second, the large number of incident reports available made it possible to construct a study with enough power to detect a medium-small effect.
While Task Prioritization errors occur in both advanced technology aircraft and traditional technology aircraft, these errors occur more frequently in the advanced technology aircraft. The increased frequency of Task Prioritization errors suggests that Task Management may be more difficult in advanced technology aircraft. The submission period effect suggests that there is a downward trend in Task Prioritization errors in advanced technology aircraft.
Based on these conclusions, we have two recommendations. First, we recommend that further research be conducted to differentiate the effect of automation due to the nature of its design and the effect of automation based on its novelty in air carrier fleets. One way this could be accomplished is by analyzing additional submission periods and adding these data to the results presented here. The results of such a study could also be used to determine if the overall downward trend of Task Prioritization errors that appeared in this study continues.
Second, we recommend that this information be disseminated to pilots as part of training programs for advanced technology aircraft. The information could raise awareness of pilots' susceptibility to Task Prioritization errors in advanced technology aircraft. It is possible that a heightened awareness could counteract this susceptibility.
This work was supported in part by the Federal Aviation Administration, Office of the Chief Scientific and Technical Advisor for Human Factors (AAR-100) under grant 93-G-039 titled Comparative Analysis of Flight Deck Automation. During the course of this project, the three technical monitors for the grant project have been John Zalenchak, Tom McCloy, and Eleana Edens. We appreciate the assistance of Dr. Ed McDowell, formerly of Oregon State University, in selecting and performing the appropriate statistical analysis techniques and in reviewing our results. We thank Rolf Braune of Braune and Associates, Inc., whose suggestion led us to consider report submission period as a possible factor in reported CTM errors. Finally, we would like to thank Beth Lyall, President of Research Integrations, Inc., for her valuable contributions during this study and for her review of this document.
Billings, C.E. (1997). Aviation automation. Mahwah, NJ: Lawrence Erlbaum Associates.
Boeing Commercial Airplane Group (1998). Statistical summary of commercial jet airplane accidents, worldwide operations, 1959-1997. Seattle, WA: Boeing Commercial Airplane Group, Airplane Safety Engineering.
Chappell, S.L. (1994). Using voluntary incident reports for human factors evaluations. In N. Johnston, N. McDonald, & R. Fuller (Eds.), Aviation Psychology in Practice. Brookfield, VT: Avebury Technical. pp. 149-169.
Chou, C., Madhavan, D., & Funk, K. (1996). Studies of cockpit task management errors. International Journal of Aviation Psychology, 6(4), pp. 307-320.
FAA Human Factors Team (1996). The interfaces between flightcrews and modern flight deck systems. Washington: Federal Aviation Administration.
Funk, K., Lyall, B., Wilson, J., Vint, R., Niemczyk, M., Suroteguh, C., & Owen, G. (1999). Flight deck automation issues. International Journal of Aviation Psychology, 9(2), pp. 109-123.
Johnston, W.A., Hawley, K.J., Plewe, S.H., Elliot, J.M.G., & DeWitt, M.J. (1990). Attention capture by novel stimuli. Journal of Experimental Psychology: General, 119, pp. 397-411.
Morris, W. (Ed.) (1969). The American Heritage Dictionary of the English Language. Boston: Houghton Mifflin.
Ministry of Civil Aviation, India (1990). Report on accident to Indian Airlines Airbus A-320 Aircraft VT-EPN at Bangalore on 14th February 1990. Government of India, Ministry of Civil Aviation.
Sarter, N.B., & Woods, D.D. (1995). "How in the world did we ever get into that mode?" Mode error and awareness in supervisory control. Human Factors, 37(1), pp. 5-19.
Wiener, E.L. (1989). Human factors of advanced technology ("glass cockpit") transport aircraft (NASA CR 177528). Moffett Field, CA: NASA Ames Research Center.
Wilson, J.R. (1998). The Effect of Automation on the Frequency of Task Prioritization Errors on Commercial Aircraft Flight Decks: An ASRS Incident Report Study. Unpublished thesis, Oregon State University.
Wise, J.A., Abbott, D.W., Tilden, D., Dyck, J.L., Guide, P.C., & Ryan, L. (1993). Automation in corporate aviation: Human factors issues (CAAR-15406-93-1). Daytona Beach, FL: Center for Aviation/Aerospace Research, Embry-Riddle Aeronautical University.
Listed below, most recent first, are changes made to this page since its creation.