Hi Ian, I might have a text (or at least the basis) for your last book.
It is an email I send to the Formal Community after FM2009, picking up a
leaflet on a survey of formal methods. I also did some investigation in car safety, and the issues are
much wider than adding the right IF statements. The root cause is incomplete
requirements analysis to start with, but mainly that the automotive people have
not yet understood the difference between the continuous and the discrete
domain. Mechanical parts are in the first domain and have a natural “graceful degradation”
property. Electronics and software are in the second domain and can stop
working in one clock cycle (say 10 nanoseconds). Even with a very low probability
of failure at the hardware level, after some time in the human domain (e.g.
driving a car for 5 years), it is a certainty that it will happen. Even using a
dual-core controller (like TI has one with ARMs) will not cover the common mode
failures. Hence our controller will be heterogeneous. My conclusions is that subsystems like traction control,
braking, etc need to be SIL4 => fault tolerance through triplication (pedal,
sensors, CPU, SW, …) and voting. If this fails your are at SIL3 (fail safe mode),
which is now the “standard” practice => often it means they put the engine at
high throttle. Still very risky at high speed on a highway. Humans need a bit
more than 20 nanosecs to take corrective action. Best regards, Eric See below the mail I sent (to John Fitzgerald). Dear FM authors, I picked up the leaflet on the FM report at FM2009, read
your excellent report, and while I contributed to the survey, I believe
that the kind of experience we have with using FM is a bit missing from the
report. In 2005 we started a project (with support from the Flemish
IWT Gov organization) to develop a distributed RTOS (OpenComRTOS) from scratch
but using FM from the beginning. (this was under the banner of Open License
Society). The purpose twofold: 1.
To learn to know the practical use of Formal Methods.
(we had no prior experience, just awareness, mostly of CSP). 2.
To develop a “correct” (read error-free and clean)
networked RTOS. There was prior art. Virtuoso was a similar RTOS, developed by
Eonic Systems in the 90’s and sold to Wind River in 2001. The team was composed of three people: -
Myself, mainly as project manager and RTOS architect. -
Gjalt de Jong, PhD in electronics but well versed in
telecom and software implementor. -
Jose Faria, an engineer, electro-mechanical engineer:
he was our formal methods engineer (no prior experience). The FM work was supervised by Prof. Raymond Boute. The first 6 months were mostly spend getting familiar in a
rather informal way with what FM were, looking at tools, reading some books and
papers, and finally selecting a tool. We originally had planned to use SPIN (as Promela is
C-like), but Prof. Boute suggested to use TLA+/TLC. While we were originally
less enthousiastic about the mathematical notation, in the end this might
have been beneficial because it raised the level of abstraction, expecially in
the way of thinking about the problem. The way of working was that only Jose worked full time
(although he left after one year). Gjalt and myself worked in parallel at a
company managing embedded software development but with freedom to work on our
RTOS project when we wanted. The way of working was mainly by email (Jose
was at another location) and by meeting face-to-face every 2 to 4 weeks. Mainly
discussing using a white board. So what happened? First of all, the original TLA models were useless. It
turned out that we were still thinking in terms of (C) implementation. E.g.
when a waiting list was to be modeled, Jose initially wrote TLA code mimicking
a linked list with pointers, etc. Turned out that the models then exploded very
fast and became rather complex. Until we realized this and simple used a set.
Another example was the discovery that waiting lists and buffers were actually
the same thing, etc. So we started again and then quickly as well incrementally build
up the models. Not of the full RTOS (that is beyond the capabilities) but of
the core functionality, like scheduling, semaphores, events FIFOs, priority
inheritance, etc. Initial models were just 2 pages, but after some 25
iterations they could grow to a 20- 30 pages. However, between each iteration,
often only 30 minutes was needed. We ran the model until no errors were found
and then refined it, ran it again, etc. Running the models was in minutes. I
must add that the interactions of the three people involved were very
stimulating and helped a lot because each of use looked at the problem from a
different angle. At some point, we discovered that most traditional RTOS
services are actually variations on a theme. Basically each service has a
synchronisation point, a synchronization predicate and a synchronization
action. We called this generic entity a “hub”. Another novelty was that we had
consistently designed the RTOS as a packet switching system (in a very
systematic way). This process lasted about 12 months. Two weeks later we had
running code. This code was derived by hand from the models that served as
architectural design. Since, we have found errors, but most of them were
related to the underlying “HAL”, either an OS like Windows of Linux, either a
specific embedded processor. Noteworthy is that doing this we discovered two
serious “bugs” in Windows and Linux. In Windows the “CriticalSection” system
call crashed when stressed, so did the Linux semaphore. The major surprise however was the code size. Compared with
Virtuoso we have 5 to 10 times less code for more or less the same basic
functionality. This was even more surprising as Virtuoso had a very good
reputation for size and performance, it was hand tweaked, assembler optimized,
3 generations of the RTOS were developed and about 150 person-years went into
it. It was also very hard to port. OpenComRTOS on the other was almost fully
written in ANSI-C (Misra checked), easily ported and much more scalable as well
as safer. Code size figures: typically 5 to 10 Kbytes for a full version (MP).
An optimized SP version has been compiled to less than 1 KB. The Virtuoso
RTOS typically started at 5 Kbytes (just task scheduling) and easily went up to
70 – 80 Kbytes (typically on a 32bit machine). Once the software was stable, new Formal models in TLA were
developed from the implementation source code. This was quite straightforward.
Lessons: 1.
FM methods are not that difficult, but they must be
used as a supporting tool from the beginning when developing the architecture
and even when narrowing down the specifications. 2.
One doesn’t need large teams and efforts to use them,
although dedication is a must. 3.
The biggest gain is in the architecture. 4.
Once a software has been developed with FM,
verification (using model checking) is easy. What we did not do, is to verify the C code using static
checkers or proofing tools. This is probably more appropriate for numerical
code. This brings me to a final conclusion. The current dominant
attitude in the FM community is still that FM are to be used to verify
software. The problem is that this can still be badly designed and inefficient
software. Hence the verification effort will also be much larger than needed.
FM methods really shine when used as a tool and method supporting abstract
thinking, together with informal techniques, pseudo-formal techniques (like
architectural diagrams), etc. I recently attended a presentation by MS on the
Hypervisor verification (60 000 lines of C ). I was astonished to hear that each
C function is more or less manually proofed with as only input for the formal
engineer the C code! I presume that if this Hypervisor had been developed using
FM from the start, it could have been a lot smaller and likely a lot cleaner
and more performant. Just image that all embedded software had been developed
with FM resulting in code sizes that are maybe 20% of what we have now. The
world would have been different. It would also have meant a serious reduction
in power consumption. It is also my opinion that with a well trained team and
using such “formalized” methodology, projects don’t need to take more effort
that the still dominant manual methods. Hence, it can really a cost reduction
and certainly a serious risk reducing approach (the traditional methodology is
very prone to run-away after the project was officially finished). Best regards, Eric Verhulst PS 1.
In the meantime we have started to work on the time-out
functionality. Here using TLA is problematic, so we switched to UPPAAL. We are also
starting to use Frama-C and B. 2.
A paper was published at FM 2008 on this project. From: Mailing_List_Robot
[mailto:sympa@xxxxxxxxxx] On Behalf Of Ian East Hi Tony Thanks for this. Very worrying, given I drove my
family around Florida recently in a Prius! It's interesting how simple basics can get drowned out.
I suppose it happened to NASA with Challenger. One of the things that attracts me to rail, over road,
(forthcoming book The Cost of the Car) is the engineering heritage.
Safety engineering began with rail, and signalling makes a nice student
exercise in sequential system design, and, more recently, in formal analysis. I worry whether the same art is being applied to car design.
Of course, it is entirely absent in road design, which is
inherently lethal anyway, as the annual carnage shows. And, of course, it is absent in the training of most 'software
engineers', as is engineering per se. I too began with µprocessor programming in assembly
language, and believe as much has been lost as gained in the use of 'standard'
HLLs. I have had unending difficulty adapting to universal hacking – the
lack of any real specification, design, or analysis, with software. I'm trying to assemble a series of texts called Electronics,
Systems, and Programming (http://www.openchannelpublishing.com/esp.html) that
combine a more formal approach to software, and would welcome a proposal,
particularly wrt embedded concurrent/reactive systems. (You get £2.50
back a book, instead of 80p at most, and a lower selling price, hence more
sales. And yes, that is realistic, especially with e-books on the rise.) The CPA community is still badly in need of books that
reflect its idea and ideals. The right people are always too busy, I
guess. Best Ian On 31 Mar 2010, at 03:04, Tony Gore wrote:
Hi Ian See this from an engineer who used to work at Ford in around the same era
that I was working in automotive. The key phrase he uses is Although not common over here in the UK because of our penchant
for stick shift (manual gearboxes), most cars in the US have automatic
gearboxes and cruise control. One of the standard signals to cruise control is
the brake signal to disengage it. Tony Gore email tony@xxxxxxxxxxxx Ian East Open Channel Publishing Ltd. (Reg. in England, Company Number 6818450) |