Intervention rate as a leading indicator
The intervention rate across a flight-test campaign is one of the more useful leading indicators of program risk. Almost every program we work with under-instruments it, undercounting interventions by a factor of two, and is surprised when the trend turns.
The under-instrumentation pattern is consistent. Programs count overt interventions: the remote pilot took control, the safety pilot took control, the test conductor called a knock-it-off. They miss the partial interventions: the operator nudged a control input that the autonomy was about to make on its own, the safety pilot disabled a logic mode briefly, the operator silenced a contingency-mode alert that they would have responded to had the test card permitted it. These are interventions; the autonomy did not run the encounter unaided.
Our practice is to instrument every operator action that occurs during an autonomy-active window and to classify each action post-flight as one of three types: independent operator activity (the operator was doing something the test card explicitly required), passive monitoring (the operator action was not in response to the autonomy), or intervention (the operator action was a response to, or a pre-emption of, autonomy behavior). An automated rule recommends a classification for each action, but the final classification is made by the test team.
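A minimal sketch of how a recommending rule could sit beneath the test team's review. The field names, the action kinds, and the rule ordering (test-card requirement checked first) are illustrative assumptions, not a fixed schema; a real rule would be driven by whatever the flight-test data actually records.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class ActionClass(Enum):
    INDEPENDENT = "independent operator activity"
    PASSIVE = "passive monitoring"
    INTERVENTION = "intervention"


@dataclass
class OperatorAction:
    # Hypothetical fields; map them to the recorded flight-test data.
    t: float                     # seconds since start of flight
    kind: str                    # e.g. "control_nudge", "mode_disable", "alert_silence"
    required_by_test_card: bool  # the test card explicitly called for this action
    responds_to_autonomy: bool   # action responded to or pre-empted autonomy behavior
    reviewed_class: Optional[ActionClass] = None  # set by the test team post-flight


def actions_in_windows(actions: List[OperatorAction],
                       windows: List[Tuple[float, float]]) -> List[OperatorAction]:
    """Keep only actions that fall inside an autonomy-active (start, end) window."""
    return [a for a in actions
            if any(start <= a.t <= end for start, end in windows)]


def recommend(action: OperatorAction) -> ActionClass:
    """Recommend a class. This is advisory only; it never sets reviewed_class."""
    if action.required_by_test_card:
        return ActionClass.INDEPENDENT
    if action.responds_to_autonomy:
        return ActionClass.INTERVENTION
    return ActionClass.PASSIVE


def final_class(action: OperatorAction) -> ActionClass:
    """The test team's classification wins; fall back to the recommendation."""
    if action.reviewed_class is not None:
        return action.reviewed_class
    return recommend(action)
```

Keeping the recommendation and the reviewed classification in separate places means a reviewer's override is always visible next to what the rule suggested, which makes disagreements between the two easy to audit.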
The trend over a campaign is the artifact that matters. A flat or improving intervention rate over a campaign of varied test cards is evidence the autonomy is generalizing; a worsening rate, even if every individual test card "passed," is evidence the autonomy is overfitting to the easier encounters. In the past 18 months we have caught two programs that were trending in the wrong direction even as the program-office reporting showed flat results. In both cases, the cause was a quiet drift in the test-card mix toward easier encounters; in both cases, surfacing the trend in the program review changed the plan for the remaining test cards.
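The trend computation can be sketched as below, under stated assumptions: the rate is normalized as interventions per autonomy-active hour, each test card carries a hypothetical difficulty score assigned by the test team, and the trend is a plain least-squares slope over test-card order. None of these choices is prescribed above; they are one reasonable way to make the trend and any drift in the test-card mix visible side by side.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CardResult:
    # Hypothetical per-test-card summary, in the order the cards were flown.
    card_id: str
    interventions: int        # actions classified as intervention
    autonomy_active_s: float  # total autonomy-active time on this card, seconds
    difficulty: float         # assumed 0-1 difficulty score from the test team


def rate_per_hour(card: CardResult) -> float:
    """Interventions per autonomy-active hour for one test card."""
    return card.interventions / (card.autonomy_active_s / 3600.0)


def slope(ys: List[float]) -> float:
    """Least-squares slope of ys against their index 0..n-1."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points to fit a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(ys))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def campaign_trend(cards: List[CardResult]) -> Dict[str, float]:
    """Slope of intervention rate and of difficulty across the campaign.

    A positive rate_slope means the intervention rate is worsening.
    A negative difficulty_slope means the mix is drifting toward easier cards.
    """
    return {
        "rate_slope": slope([rate_per_hour(c) for c in cards]),
        "difficulty_slope": slope([c.difficulty for c in cards]),
    }
```

Reporting the two slopes together is the point of the design: a worsening intervention rate alongside a falling difficulty score is the combination that pass/fail reporting on individual test cards tends to hide.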
The instrumentation cost of doing this properly is small (one extra reduction step on existing flight-test data) and the program-management value is large. We make it part of the engagement on every flight-test program we support.