Formalized Pilot Study of Safety-Critical Software Anomalies

Dr. Robyn Lutz and Carmen Mikulski
Software Assurance Group, Jet Propulsion Laboratory, California Institute of Technology
NASA Code Q Software Program Center Initiative UPN 323-08; Kenneth McGill, Research Lead
OSMA Software Assurance Symposium, Sept. 5-7, 2001

Topics
- Overview
- Preliminary Results
  - Quantitative analysis
  - Evolution of requirements
  - Visualization tools
- Work-in-progress
- Benefits
Overview: Goal
To reduce the number of safety-critical software anomalies that occur during flight by providing a quantitative analysis of previous anomalies as a foundation for process improvement.

Overview: Approach
- Analyzed anomaly data using the Orthogonal Defect Classification (ODC) method
  - Developed at IBM; widely used by industry
  - Quantitative approach
  - Used here to detect patterns in anomaly data
- Evaluated ODC using a Formalized Pilot Study
  - R. Glass [97] detailed a rigorous process for obtaining valid results: 35 steps divided into 5 phases
  - Used here to evaluate ODC for NASA use

Overview: Status
- Year 2 of a planned 3-year study (phases: Plan, Design, Conduct, Evaluate, Use)
- Adapted ODC categories to operational spacecraft software at JPL:
  - Activity: what was taking place when the anomaly occurred?
  - Trigger: what was the catalyst?
  - Target: what was fixed?
  - Type: what kind of fix was done?
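To make the four adapted categories concrete, here is a minimal Python sketch of a classified anomaly record; the class and field names are hypothetical, and the allowed values are taken from the charts later in this presentation.

```python
# Sketch only: one Incident/Surprise/Anomaly report (ISA) classified with the
# four adapted ODC categories. Names are illustrative, not the actual schema.
from dataclasses import dataclass

ACTIVITIES = {"Flight Operations", "System Test"}
TRIGGERS = {"Data Access/Delivery", "Hardware Failure", "Normal Activity",
            "Recovery", "Special Procedure", "Cmd Seq Test", "Inspection/Review",
            "Software Configuration", "Hardware Configuration",
            "Start/Restart/Shutdown", "Unknown"}
TARGETS = {"Flight Software", "Ground Software", "Ground Resources",
           "Information Development", "Hardware", "Build Package", "None/Unknown"}

@dataclass
class ClassifiedISA:
    isa_id: str        # anomaly report identifier
    spacecraft: str    # e.g. "Cassini", "DS-1", "MGS"
    activity: str      # what was taking place when the anomaly occurred
    trigger: str       # what was the catalyst
    target: str        # what was fixed
    fix_type: str      # what kind of fix was done

    def __post_init__(self):
        # Keep classifications inside the adapted ODC vocabulary.
        assert self.activity in ACTIVITIES, self.activity
        assert self.trigger in TRIGGERS, self.trigger
        assert self.target in TARGETS, self.target
```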
Preliminary Results: Quantitative Analysis
- Analyzed 189 Incident/Surprise/Anomaly reports (ISAs) of highest criticality
- 7 spacecraft: Cassini, Deep Space 1, Galileo, Mars Climate Orbiter, Mars Global Surveyor, Mars Polar Lander, Stardust
- Institutional defect database -> Access database of data of interest -> Excel spreadsheet with ODC categories
- Pivot tables with multiple views of data
  - 1-D and 2-D frequency counts of Activity, Trigger, Target, Type, Trigger within Activity, Type within Target, etc.
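The pivot-table analysis described above was done in Excel; the following is a rough pandas equivalent showing the same kinds of 1-D and 2-D frequency counts. The input file and column names are assumptions for illustration, not the actual spreadsheet layout.

```python
# Sketch: 1-D and 2-D frequency counts over classified ISA data.
# "classified_isas.csv" and its column names are hypothetical.
import pandas as pd

isas = pd.read_csv("classified_isas.csv")   # one row per classified ISA

# 1-D frequency count: how often each Target appears.
target_counts = isas["Target"].value_counts()

# 2-D frequency count: Trigger within Activity.
trigger_by_activity = pd.crosstab(isas["Trigger"], isas["Activity"])

# Same kind of view restricted to a user-selected set of spacecraft.
subset = isas[isas["Spacecraft"].isin(["MGS", "DS-1", "Cassini"])]
type_by_target = pd.crosstab(subset["Type"], subset["Target"])

print(target_counts)
print(trigger_by_activity)
print(type_by_target)
```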
Preliminary Results: Quantitative Analysis
- User-selectable representation of analysis results: tables, pie charts, bar graphs
- User-selectable sets of spacecraft for comparisons
- Provides rapid quantification of data
- Assists in detecting unexpected patterns, confirming expected patterns

Preliminary Results: Quantitative Analysis
[Pie chart: Target distribution (count of Target), all projects. Segments: Flight Software, Ground Software, Ground Resources, Information Development, Hardware, Build Package, None/Unknown; segment sizes range from 2% to 30%.]
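As an example of the user-selectable chart views, here is a minimal matplotlib sketch that renders the Target counts as a pie chart and a bar graph; it is not the original Excel workbook and reuses the hypothetical DataFrame from the earlier sketch.

```python
# Sketch: pie and bar views of Target counts, as in the slide's chart.
# Input file and column names are hypothetical.
import matplotlib.pyplot as plt
import pandas as pd

isas = pd.read_csv("classified_isas.csv")
target_counts = isas["Target"].value_counts()

fig, (ax_pie, ax_bar) = plt.subplots(1, 2, figsize=(10, 4))

# Pie chart of the Target distribution across all projects.
target_counts.plot.pie(ax=ax_pie, autopct="%1.0f%%", ylabel="")
ax_pie.set_title("Target Distribution")

# Bar graph of the same counts.
target_counts.plot.bar(ax=ax_bar)
ax_bar.set_title("Count of Target")
ax_bar.set_xlabel("Target")

plt.tight_layout()
plt.show()
```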
Preliminary Results: Quantitative Analysis
[Bar chart: Distribution of Triggers within Activity (count of Trigger), all projects. Activities: Flight Operations, System Test. Triggers: Data Access/Delivery, Hardware Configuration, Hardware Failure, Normal Activity, Recovery, Special Procedure, Cmd Seq Test, Inspection/Review, Software Configuration, Start/Restart/Shutdown, Unknown.]
Preliminary Results: Quantitative Analysis
[2-D chart: count of Trigger by Activity and Trigger (e.g., Flight Operations with Special Procedure, Normal Activity, Data Access/Delivery), broken out by Target (Flight Software, Ground Software, Ground Resources, Hardware, Information Development, None/Unknown), all projects.]
Preliminary Results: Evolution of Safety-Critical Requirements Post-Launch
- Anomalies sometimes result in changes to software requirements
- Looked at 86 critical ISAs from 3 spacecraft (MGS, DS-1, Cassini)
- 17 of 86 had Target (what was fixed) = Flight Software
  - 8 of 17 changed code only; 1 was an incorrect patch; 1 used a contingency command
- Focused on the remaining 7 with new software requirements resulting from a critical anomaly
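The drill-down above can be expressed as a few filters over the classified ISA data; the sketch below continues the hypothetical DataFrame from the earlier examples, and the "Resolution" column is an assumption used only to illustrate the final filtering step.

```python
# Sketch: reproducing the drill-down on hypothetical classified ISA data.
import pandas as pd

isas = pd.read_csv("classified_isas.csv")

# 86 critical ISAs from MGS, DS-1, and Cassini.
critical = isas[isas["Spacecraft"].isin(["MGS", "DS-1", "Cassini"])]

# 17 of 86: anomalies whose fix targeted the flight software.
flight_sw = critical[critical["Target"] == "Flight Software"]

# Remaining subset of interest: fixes that introduced new software requirements
# ("Resolution" is a hypothetical column, not in the actual ISA database).
new_reqs = flight_sw[flight_sw["Resolution"] == "New Requirement"]

print(len(critical), len(flight_sw), len(new_reqs))
```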
Preliminary Results: Evolution of Safety-Critical Requirements Post-Launch
- Found that requirements changes are not due to earlier requirement errors
- Instead, requirements changes are due to:
  - Need to handle a rare event or scenario (4; software adds fault tolerance)
  - Need to compensate for hardware failure or limitations (3; software adds robustness)

Preliminary Results: Evolution of Safety-Critical Requirements Post-Launch
- Confirms value of requirements completeness for fault tolerance
- Confirms value of contingency planning to speed change
- Contradicts assumption that what breaks is what gets fixed
- Suggests need for better requirements engineering for maintenance
- Results presented at the IFIP WG 2.9 Workshop on Requirements Engineering, Feb. 2001, and
  the 5th IEEE International Symposium on Requirements Engineering, Aug. 2001

Preliminary Results: Evolution of Safety-Critical Requirements Post-Launch
[Timeline diagram: Requirements Engineering before Launch; Requirements Evolution during Maintenance after launch.]
Preliminary Results: Web-based Visualization Tool
- Results of Peter Neubauer (ASU), Caltech/JPL Summer Undergraduate Research Fellow, 2001
- Developed alternate visualizations of data results to support users' analyses
- Web-based tool assists distributed users
- Sophisticated tool architecture builds on existing freeware
- Demo at QA Section Managers meeting (FAQ: Would this work for our project?)
- Demo to D. Potter's JPL group developing the next-generation Failure Anomaly Management System

Preliminary Results: Web-based Visualization Tool
- Objective: Investigate and characterize the common causes of safety-critical, in-flight software anomalies on spacecraft. The work uses a defect-analysis technology called Orthogonal Defect Classification, developed at IBM. A rigorous pilot study approach using the Glass criteria is currently underway.
- 7 space missions: 189 defects classified; the chart shows one of the 6 possible 2-way views into this information
[Annotated chart: Trigger (what was happening that caused the defect to be noticed) vs. Target (what was changed to respond to the defect). Triggers include Data Access/Delivery, H/W Failure, H/W Config., Normal Activity, Recovery, Special Procedure, Cmd Seq Test, Inspection/Review, S/W Config., Start/Restart/Shutdown, Calibration, Unknown; Targets include Flight Software, Ground Software, Ground Resources, Hardware, Operations, Build Package, None/Unknown.]
- Discover and present useful information
- Large number of defects seen while sending commands to / receiving data from the spacecraft; of these, many were responded to by changing operational procedures or software on the ground
- For other defects, changes to flight software were more prevalent
Work-in-progress
- Several patterns noted but not yet quantified
  - Ex: Procedures often implicated
- Profile by mission phase
  - Ex: Cruise, orbit insertion, entry, landing
- Better way to disseminate mini-LLs?
  - Ex: Corrective action sometimes notes the need for a similar action on a future mission
- Incorporate standardized ODC classifications in the next-generation database to support automation and visualization (see the sketch after this list)
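One possible shape for carrying standardized ODC classifications in such a database is sketched below using sqlite3; the table and column names are hypothetical and are only meant to show the classifications stored as first-class, constrained fields.

```python
# Sketch (assumptions throughout): standardized ODC classifications as
# first-class columns in a next-generation anomaly database.
import sqlite3

conn = sqlite3.connect("anomalies.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS odc_value (
    category TEXT NOT NULL,       -- 'Activity', 'Trigger', 'Target', or 'Type'
    value    TEXT NOT NULL,       -- e.g. 'Flight Operations', 'Flight Software'
    PRIMARY KEY (category, value)
);

CREATE TABLE IF NOT EXISTS isa (
    isa_id       TEXT PRIMARY KEY,  -- anomaly report identifier
    spacecraft   TEXT NOT NULL,
    odc_activity TEXT NOT NULL,     -- values drawn from odc_value
    odc_trigger  TEXT NOT NULL,
    odc_target   TEXT NOT NULL,
    odc_type     TEXT NOT NULL
);
""")
conn.commit()
conn.close()
```

Standardizing the vocabulary in a table like odc_value is what would let later visualization tools populate their category axes automatically.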
Benefits
- Data mining of historical and current databases of incidents / surprises / anomalies
- Uses metrics information to identify and focus on problem areas
- Provides a quantitative foundation for process improvement
- Equips us with a methodology to continue to learn as projects and processes evolve