Adriane crash 

Dutch videotext had a topic this evening that said that ESA found that the
Adriana-5 lauch failed because the software of its guidance systems was
accidentally replaced by the Adriane-4 version.

Has anyone heard anything more about this?

(If it's true, it must be the world's most spectacular example of a
configuration management failure :-)

--
-----------------------------------------------------------------------

--  Banking Consultant   --              Member Team-Ada             --
--  Ordina Finance BV    --    Located at Haarlem, The Netherlands   --



Sat, 09 Jan 1999 03:00:00 GMT  


: Dutch videotext had a topic this evening that said that ESA found that the
: Adriana-5 lauch failed because the software of its guidance systems was
  ^^    ^     nice typo    :-(                (for the Ariane)
--

Pfaffenwaldring 27, 70569 Stuttgart Uni Computeranwendungen
Team Ada: "C'mon people let the world begin" (Paul McCartney)



Mon, 11 Jan 1999 03:00:00 GMT  

Quote:

> Dutch videotext had a topic this evening that said that ESA found that the
> Adriana-5 lauch failed because the software of its guidance systems was
> accidentally replaced by the Adriane-4 version.

Close, but not quite. Based on my read of the report:

Ariane 4 & 5 use the same inertial measurement units and it appears that they did
not fully analyze the effect of the Ariane 5's flight characteristics against
these units.  Also, both Arianes 4 and 5 use dual redundant units which are,
unfortunately, identical in both hardware and software.  The result was that
higher (but acceptable for Ariane 5) acceleration levels caused a conversion
operation to overflow, an exception was raised, and both units completely shut
down leaving the flight control software with no navigation data!  It also
appeared from the report that the flight control software interpreted bogus data
as good and as a result commanded the engine nozzles to full deflection resulting
in the aerodynamic destruction of the vehicle.
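
To make the common-mode point concrete, here is a minimal Python sketch (not the actual flight software). The class and function names are hypothetical, and the 16-bit range is an assumption taken from the conversion discussed elsewhere in this thread. It shows why two identical units give no protection against the same out-of-range input: the backup fails on exactly the value that took down the primary.

```python
# Assumed target range: a 16-bit signed integer (illustrative only).
INT16_MIN, INT16_MAX = -32768, 32767


class UnitShutDown(Exception):
    """Raised when a unit is (or has just gone) offline."""


class InertialUnit:
    """Hypothetical stand-in for one inertial measurement unit."""

    def __init__(self, name):
        self.name = name
        self.online = True

    def read(self, measurement):
        if not self.online:
            raise UnitShutDown(self.name)
        value = int(measurement)  # truncating conversion
        if not INT16_MIN <= value <= INT16_MAX:
            # Policy described above: any exception is treated as a
            # hardware failure, so the unit takes itself offline.
            self.online = False
            raise UnitShutDown(self.name)
        return value


def navigation_data(units, measurement):
    """Return the reading of the first unit still online, else None."""
    for unit in units:
        try:
            return unit.read(measurement)
        except UnitShutDown:
            continue
    return None


units = [InertialUnit("primary"), InertialUnit("backup")]
in_range = navigation_data(units, 1200.0)       # primary answers
out_of_range = navigation_data(units, 40000.0)  # both units shut down
```

After the second call both units are offline, so even a later in-range value yields no navigation data. Redundancy of this kind only helps against independent (typically hardware) failures.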

On some really sad notes: 1) the software that experienced the overflow had no
real value during that phase of flight and should have been disabled, 2) the
decision not to protect the conversion from overflow was influenced by a
requirement for a max of 80% processor utilization, and 3) the units were
_required_ to shut down as a result of any exception (rather than make the best of
it and continue in a degraded mode, if possible) on the assumption that it was
caused by a hardware failure.  Does the phrase 'penny wise, pound foolish' apply
here?

So, lots of intertwined assumptions, mistakes, etc. led to this failure but
definitely an avoidable problem.

--
Steve O'Neill                      | "No,no,no, don't tug on that!
Sanders, A Lockheed Martin Company |  You never know what it might

(603) 885-8774  fax: (603) 885-4071|    Buckaroo Banzai



Mon, 11 Jan 1999 03:00:00 GMT  


Quote:
>Dutch videotext had a topic this evening that said that ESA found that the
>Adriana-5 lauch failed because the software of its guidance systems was
>accidentally replaced by the Adriane-4 version.
>Has anyone heard anything more about this?
>(If it's true, it must be the world's most spectacular example of a
>configuration management failure :-)

That isn't what the report said. Here is a copy of the report obtained

------------------------------------------------------------------------------

Quote:
>Date:         Tue, 23 Jul 1996 16:59:23 EST


JOINT ESA/CNES PRESS RELEASE
N° 33-96  -  Paris, 23 July 1996

Ariane 501 - Presentation of Inquiry Board report

Attached is a summary of the Inquiry Board report on the
failure of the first Ariane 5 flight.

The full report is available on written request from ESA and
CNES Public Relations.

     ESA     Tel.: + 33.1.53.69.72.82
                Fax: + 33.1.53.69.76.90

     CNES   Tel.: + 33.1.44.76.76.87
                Fax: + 33.1.44.76.78.16

ARIANE 501
Presentation of Inquiry Board report

On 4 June 1996 the maiden flight of the Ariane 5 launcher ended
in a failure.  Only about 40 seconds after initiation of the flight
sequence, at an altitude of about 3700 m, the launcher veered off
its flight path, broke up and exploded.

Mr Jean-Marie Luton, ESA Director General, and Mr Alain
Bensoussan, CNES Chairman, immediately set up an independent
Inquiry Board (see ESA-CNES Press Release of 10 June 1996),
which has now submitted its report.

The report begins by presenting the causes of the failure, analysis
of the flight data having indicated:

-   nominal behaviour of the launcher up to H0 + 36 seconds;
-   simultaneous failure of the two inertial reference systems;
-   swivelling into the extreme position of the nozzles of the two
solid boosters and, slightly later, of the Vulcain engine, causing
the launcher to veer abruptly;
-   self-destruction of the launcher correctly triggered by rupture
of the electrical links between the solid boosters and the core
stage.

A chain of events, their inter-relations and causes have been
established, starting with the destruction of the launcher and
tracing back in time towards the primary cause.  These provide
the technical explanations for the failure of the 501 flight, which
lay in the flight control and guidance system.  A detailed account
is given in the report, which concludes:

"  The failure of Ariane 501 was caused by the complete loss of
guidance and attitude information 37 seconds after start of the
main engine ignition sequence (30 seconds after lift-off).  This
loss of information was due to specification and design errors in
the software of the inertial reference system.

  The extensive reviews and tests carried out during the Ariane
5 development programme did not include adequate analysis and
testing of the inertial reference system or of the complete flight
control system, which could have detected the potential failure."

Despite the series of tests and reviews carried out under the
programme, in the course of which thousands of corrections were
made, shortcomings in the system approach concerning the
software resulted in failure to detect the fault.  It is stressed that
the alignment function of the inertial reference system, which served
a purpose only before lift-off (but remained operative afterwards),
was not taken into account in the simulations and that the
equipment and system tests were not sufficiently representative.

Without implicating the system architecture, the report makes a
series of recommendations for ensuring that the launcher's
software operates correctly.  The Ariane 5 programme will be
taking action in line with all these recommendations, as follows:

-   correction of the problem in the SRI (inertial reference
system) that led to the accident;
-   reexamination of all software embedded in equipment;
-   improvement of the representativeness (vis-à-vis the launcher)
of the qualification testing environment;
-   introduction of overlaps and deliberate redundancy between
successive tests:
     .   at equipment level,
     .   at stage level,
     .   at system level;
-   improvement and systematisation of the two-way flow of
information:

     .   up from equipment to system:  nominal and failure-mode
behaviour;
     .   down from system to equipment:  use of equipment items
in flight.

More specifically, the following corrective measures will be
applied:

-   to the inertial reference system:
     .   switch-off or inhibition of the alignment function after
liftoff,
     .   analysis/modification of processing, particularly on
detection of a fault (no processor shutdown),
     .   testing to check the coverage of the SRI flight domain;

-   to the system qualification environment:
     .   general improvement of representativeness through
systematic use of real equipment and components wherever
possible,
     .   simulation of real trajectories on SRI electronics.

-   In addition, the following general measures will be taken:
     .   critical reappraisal of all software (flight program and
embedded software),
     .   review of mechanisms for managing double failures,
     .   improvement of facilities for acquisition and retrieval of
telemetry data,
     .   improvement of overall coordination relating to software.

The ESA Director General and CNES Chairman will be making
a joint presentation of the plan of action put into effect and its
programmatic consequences at a press conference in September.
-------------------------------------------------------------------

Hope this is useful. So basically it _was_ a software fault - the
software didn't ignore signals it was receiving after launch from a
system whose signals are only valid prior to launch.
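
One way to picture the "switch-off or inhibition of the alignment function after liftoff" measure from the release is the rough Python sketch below. Everything in it (the Phase enum, run_cycle, the callables passed in) is a hypothetical illustration, not the SRI design: a function that only has meaning before launch is simply not executed once the phase changes, so a fault inside it cannot affect the in-flight outputs.

```python
from enum import Enum


class Phase(Enum):
    PRE_LAUNCH = 1
    IN_FLIGHT = 2


def run_cycle(phase, navigation, alignment):
    """Run one processing cycle and return the results that were computed.

    `navigation` and `alignment` are hypothetical zero-argument callables.
    The alignment function is only executed before launch.
    """
    results = {"navigation": navigation()}
    if phase is Phase.PRE_LAUNCH:
        results["alignment"] = alignment()
    return results


def failing_alignment():
    # Stands in for a pre-launch computation that overflows in flight.
    raise OverflowError("alignment value out of range")


# In flight the failing alignment function is never called.
in_flight = run_cycle(Phase.IN_FLIGHT, lambda: "nav-ok", failing_alignment)
```

The same call with Phase.PRE_LAUNCH would still raise, which is the intended behaviour of the sketch: the gate removes the function from the flight phase, it does not hide faults on the ground.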

What I want to know is, who wrote that software, and if there was an
ESA representative responsible for it, who was he?

Not that I want to apportion blame of course, just interested!

Best Regards



Mon, 11 Jan 1999 03:00:00 GMT  

Quote:

> Adriana-5 lauch failed because the software of its guidance systems was
> accidentally replaced by the Adriane-4 version.

I think you mean Ariane 5 :-)  The report is available at

http://www.esrin.esa.it/htdocs/tidc/Press/Press96/ariane5rep.html


Tue, 12 Jan 1999 03:00:00 GMT  


        >JOINT ESA/CNES PRESS RELEASE N  33-96  -  Paris, 23 July 1996

        >Ariane 501 - Presentation of Inquiry Board report

        >-------------------------------------------------------------------

        >Hope this is useful. So basically it _was_ a software fault

---Is this a euphemism for a programming error?  because that's
what it was -- a programming error.

   The error was in assuming that a value would not overflow.
The specific error was that a conversion of a double-precision
floating-point value (~58 significant bits) to 15 significant
bits caused fixed-point overflow.  The conversion was not
checked for overflow.  It should have been.  This is, after all,
a real-time system.  It's a fundamental check that a programmer
experienced in real-time systems should have carried out.

   Control was then passed to the interrupt handler, which
shut down the system.
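
For illustration only, the kind of range check being argued about can be sketched in Python as below. The function names are made up, the 16-bit signed target is an assumption based on the "15 significant bits" figure above, and inputs are assumed to be finite. The first variant raises on an out-of-range value; the second clamps to the nearest representable value instead.

```python
# Assumed target: 16-bit signed integer (15 value bits plus sign).
INT16_MIN, INT16_MAX = -2**15, 2**15 - 1


def to_int16_checked(x: float) -> int:
    """Convert x to a 16-bit signed integer, raising if it does not fit."""
    n = int(x)  # truncates toward zero
    if not INT16_MIN <= n <= INT16_MAX:
        raise OverflowError(f"{x!r} does not fit in 16 bits")
    return n


def to_int16_saturating(x: float) -> int:
    """Convert x to a 16-bit signed integer, clamping at the range limits."""
    return max(INT16_MIN, min(INT16_MAX, int(x)))
```

Which of the two is appropriate is a design decision rather than a coding detail: the checked form needs a handler that does something sensible, and the saturating form silently produces a value that downstream code must be able to tolerate.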

   The question is, basically, why was Ada used for this work?
PL/I has specific facilities for real-time programming,
and especially for simulating exactly this (and other)
exceptions -- as if the exceptions had actually occurred.
The SIGNAL statement is designed for this purpose.  The
programmer would have discovered this problem the FIRST time
he used it!  And he could have included an exception handler
for this and other similar kinds of trivial errors.  These
exception handlers would have returned control to the code.

   A PL/I programmer and/or a real-time systems programmer
would have OBJECTED to the stupid requirement of shutting
down the system when a trivial error occurred.

        >What I want to know is, who wrote that software, and if their was an
        >ESA representative responsible for it, who was he!
        >Not that I want to apportion blame of course, just interested!




Tue, 12 Jan 1999 03:00:00 GMT  

: : Adriana-5 lauch failed because the software of its guidance systems was
:   ^^    ^     nice typo    :-(                (for the Ariane)

Yes, it seems GNAT is a better spelling checker than ispell :-)

--
-----------------------------------------------------------------------

--  Banking Consultant   --              Member Team-Ada             --
--  Ordina Finance BV    --    Located at Haarlem, The Netherlands   --



Wed, 13 Jan 1999 03:00:00 GMT  


Quote:

> ---Is this a euphemism for a programming error?  because that's
> what it was -- a programming error.

>    The error was in assuming that a value would not overflow.

The error was assuming that the Ariane 4 design would be adequate
for the Ariane 5 system.

Quote:
> The specific error was that a conversion of a double-precision
> floating-point value (~58 significant bits) to 15 significant
> bits caused fixed-point overflow.  The conversion was not
> checked for overflow.  It should have been.

It was checked, hence the exception and an exception handler to
take corrective action.  Unfortunately the corrective action was
to assume that the SRI had failed and to shut it down.  The
software performed exactly as designed.

Quote:
>  This is, after all,
> a real-time system.  It's a fundamental check that a programmer
> experienced in real-time systems should have carried out.

>    Control was then passed to the interrupt handler, which
> shut down the system.

Exactly as designed.

Quote:
>    The question is, basically, why was Ada used for this work?

The failure is not a language issue; this is not the question.

-Bob



Fri, 15 Jan 1999 03:00:00 GMT  


Quote:

>    >JOINT ESA/CNES PRESS RELEASE N  33-96  -  Paris, 23 July 1996
>    >Ariane 501 - Presentation of Inquiry Board report
>    >-------------------------------------------------------------------
>    >Hope this is useful. So basically it _was_ a software fault
>---Is this a euphemism for a programming error?  because that's
>what it was -- a programming error.

Having read the report, I don't consider it to be a programming error,
it was a design and management error. It sounds like whoever designed
the system didn't pay enough attention to the requirements, and
whoever was managing it didn't pay enough attention to its conformance
to the requirements.

I don't think the overflow was due to a programming oversight. After
all, the analyses had been done and a decision not to check that
variable had been made (*see additional note below). And seeing as
that variable should not have been in use at that point, I don't
think you can blame whoever wrote that code.

Quote:
>   The error was in assuming that a value would not overflow.
>The specific error was that a conversion of a double-precision
>floating-point value (~58 significant bits) to 15 significant
>bits caused fixed-point overflow.  The conversion was not
>checked for overflow.  It should have been.  This is, after all,
>a real-time system.  It's a fundamental check that a programmer
>experienced in real-time systems should have carried out.
>   Control was then passed to the interrupt handler, which
>shut down the system.
>   The question is, basically, why was Ada used for this work?

ESA Ada preference/mandate(?).

<..snip..>

*Note: I hope this makes ESA look a bit closer at why they want to
limit processor loading and how the margin should be reduced through
the design and development phases. My own project has an ESA-enforced
limit of 70%, which is quite ridiculous given the equipment we're using
(GPS MA31750 10MHz MIL-STD-1750 processor). We cannot meet that but
have requested a waiver on that - I believe that's much better than
compromising the safety of the mission.

ESA's loading margins are really supposed to take account of a
requirement for future modifications to software once it has been
delivered. There's no way this should have been enforced for Ariane 5.

From the sound of the report, I think a pretty poor job has been done,
not by the programmers who wrote the code and performed the analysis
of what variables could safely be left unchecked; instead I think
whoever performed the requirement analysis and all levels of
management / reviewers above that have been completely negligent.

Best Regards



Fri, 15 Jan 1999 03:00:00 GMT  



        >>
        >> ---Is this a euphemism for a programming error?  because that's
        >> what it was -- a programming error.
        >>
        >>    The error was in assuming that a value would not overflow.

        >The error was assuming that the Ariane 4 design would be adequate
        >for the Ariane 5 system.

        >> The specific error was that a conversion of a double-precision
        >> floating-point value (~58 significant bits) to 15 significant
        >> bits caused fixed-point overflow.  The conversion was not
        >> checked for overflow.  It should have been.

        >It was checked, hence the exception and an exception handler to
        >take corrective action.

---The SRI computer (& its backup) had an exception
handler, to be sure, but it did not have an exception
handler to take corrective action.  The exception handler
shut the computer down.

        > Unfortunately the corrective action was
        >to assume that the SRI had failed and to shut it down.  The
        >software performed exactly as designed.

---The software did not perform as designed.  It was
intended to shut down the computer only in the event of
a hardware error.  The software shut down the computer
because of a programming error.  The software performed
only as written!

        >>  This is, after all,
        >> a real-time system.  It's a fundamental check that a programmer
        >> experienced in real-time systems should have carried out.
        >>
        >>    Control was then passed to the interrupt handler, which
        >> shut down the system.

        >Exactly as designed.

---Again, not as designed.  It was designed to shut down only
in the event that the SRI computer failed.  Then the backup
would be used.



Sat, 16 Jan 1999 03:00:00 GMT  
 