[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Race condition...



A couple of points.

(a) The quality of this discussion is exceedingly high, and a political step needs to be made: Someone please announce to the regulators and to the auto manufacturers that we (this list) are the ones who can solve this problem! It is nothing more than the honest truth, and there is some chance we will be listened to, this time. An example is Tony Gore's and Denis Nicole's exchange:

===========
On Wed, 31 Mar 2010, Denis Nicole wrote:
On Wed, 31 Mar 2010, Tony Gore wrote:
But Toyota's problems would have been considerably less for a line of code 
that said something like
IF
 brake
   engine := idle
 TRUE
   engine := throttle

But then I wouldn't be able to double declutch to change down while braking. 
That's a serious hazard; my brakes will burn out before the bottom of the 
hill.
===========

The notion that you start out with five lines of code and then refine practically will, at first, be unbelievable to those outside our community. Perhaps we need to be armed with some demos of old Transputer hardware.

(b) In addition to Eric's distinction between the continuous and the discrete domain, we need to emphasize the "discontinuous response" characteristic of software. My (quasi-humorous) example was:

Become fiery ball? [N,y]

where the N means the default answer is "no". That means you have to have a y followed by an Enter to get a fiery ball, which is about a 1 in 6000 chance on an 80-key keyboard. So the retest is unlikely to reproduce the error, but it will kill a lot of people among millions of users.

This means that all the critical branches must be in the outermost, most visible layers of the code (as we are used to doing in Transputer occam programs with priority 0 right on the outside). Once again, we are the ones who know how to do it - I mean interrupt code in occam too, not just user code.

Larry

On Mar 31, 2010, at 2:06 AM, Eric Verhulst wrote:

Hi Ian,
 
I might have a text (or at least the basis) for your last book. It is an email I send to the Formal Community after FM2009, picking up a leaflet on a survey of formal methods.
 
I also did some investigation in car safety, and the issues are much wider than adding the right IF statements. The root cause is incomplete requirements analysis to start with, but mainly that the automotive people have not yet understood the difference between the continuous and the discrete domain. Mechanical parts are in the first domain and have a natural “graceful degradation” property. Electronics and software are in the second domain and can stop working in one clock cycle (say 10 nanoseconds). Even with a very low probability of failure at the hardware level, after some time in the human domain (e.g. driving a car for 5 years), it is a certainty that it will happen. Even using a dual-core controller (like TI has one with ARMs) will not cover the common mode failures. Hence our controller will be heterogeneous.
 
My conclusions is that subsystems like traction control, braking, etc need to be SIL4 => fault tolerance through triplication (pedal, sensors, CPU, SW, …) and voting. If this fails your are at SIL3 (fail safe mode), which is now the “standard” practice => often it means they put the engine at high throttle. Still very risky at high speed on a highway. Humans need a bit more than 20 nanosecs to take corrective action.
 
 
Best regards,
 
Eric
 
See below the mail I sent (to John Fitzgerald).
 
Dear FM authors,
 
I picked up the leaflet on the FM report at FM2009, read your excellent report,  and while I contributed to the survey, I believe that the kind of experience we have with using FM is a bit missing from the report.
 
In 2005 we started a project (with support from the Flemish IWT Gov organization) to develop a distributed RTOS (OpenComRTOS) from scratch but using FM from the beginning. (this was under the banner of Open License Society). The purpose twofold:
1.       To learn to know the practical use of Formal Methods. (we had no prior experience, just awareness, mostly of CSP).
2.       To develop a “correct” (read error-free and clean) networked RTOS. There was prior art. Virtuoso was a similar RTOS, developed by Eonic Systems in the 90’s and sold to Wind River in 2001.
 
The team was composed of three people:
-          Myself, mainly as project manager and RTOS architect.
-          Gjalt de Jong, PhD in electronics but well versed in telecom and software implementor.
-          Jose Faria, an engineer, electro-mechanical engineer: he was our formal methods engineer (no prior experience).
The FM work was supervised by Prof. Raymond Boute.
 
The first 6 months were mostly spend getting familiar in a rather informal way with what FM were, looking at tools, reading some books and papers, and finally selecting a tool.
We originally had planned to use SPIN (as Promela is C-like), but Prof. Boute suggested to use TLA+/TLC. While we were originally less enthousiastic about the mathematical notation,  in the end this might have been beneficial because it raised the level of abstraction, expecially in the way of thinking about the problem.
 
The way of working was that only Jose worked full time (although he left after one year). Gjalt and myself worked in parallel at a company managing embedded software development but with freedom to work on our RTOS project when we wanted.  The way of working was mainly by email (Jose was at another location) and by meeting face-to-face every 2 to 4 weeks. Mainly discussing using a white board.
 
So what happened?
 
First of all, the original TLA models were useless. It turned out that we were still thinking in terms of (C) implementation. E.g. when a waiting list was to be modeled, Jose initially wrote TLA code mimicking a linked list with pointers, etc. Turned out that the models then exploded very fast and became rather complex. Until we realized this and simple used a set. Another example was the discovery that waiting lists and buffers were actually the same thing, etc.
 
So we started again and then quickly as well incrementally build up the models. Not of the full RTOS (that is beyond the capabilities) but of the core functionality, like scheduling, semaphores, events FIFOs, priority inheritance, etc. Initial models were just 2 pages, but after some 25 iterations they could grow to a 20- 30 pages. However, between each iteration, often only 30 minutes was needed. We ran the model until no errors were found and then refined it, ran it again, etc. Running the models was in minutes. I must add that the interactions of the three people involved were very stimulating and helped a lot because each of use looked at the problem from a different angle.
 
At some point, we discovered that most traditional RTOS services are actually variations on a theme. Basically each service has a synchronisation point, a synchronization predicate and a synchronization action. We called this generic entity a “hub”. Another novelty was that we had consistently designed the RTOS as  a packet switching system (in a very systematic way).
 
This process lasted about 12 months. Two weeks later we had running code. This code was derived by hand from the models that served as architectural design. Since, we have found errors, but most of them were related to the underlying “HAL”, either an OS like Windows of Linux, either a specific embedded processor. Noteworthy is that doing this we discovered two serious “bugs” in Windows and Linux. In Windows the “CriticalSection” system call crashed when stressed, so did the Linux semaphore.
 
The major surprise however was the code size. Compared with Virtuoso we have 5 to 10 times less code for more or less the same basic functionality. This was even more surprising as Virtuoso had a very good reputation for size and performance, it was hand tweaked, assembler optimized, 3 generations of the RTOS were developed and about 150 person-years went into it. It was also very hard to port. OpenComRTOS on the other was almost fully written in ANSI-C (Misra checked), easily ported and much more scalable as well as safer. Code size figures: typically 5 to 10 Kbytes for a full version (MP).  An optimized SP version has been compiled to less than 1 KB. The Virtuoso RTOS typically started at 5 Kbytes (just task scheduling) and easily went up to 70 – 80 Kbytes (typically on a 32bit machine).
 
Once the software was stable, new Formal models in TLA were developed from the implementation source code. This was quite straightforward.  
 
Lessons:
 
1.       FM methods are not that difficult, but they must be used as a supporting tool from the beginning when developing the architecture and even when narrowing down the specifications.
2.       One doesn’t need large teams and efforts to use them, although dedication is a must.
3.       The biggest gain is in the architecture.
4.       Once a software has been developed with FM, verification (using model checking) is easy.
 
What we did not do, is to verify the C code using static checkers or proofing tools. This is probably more appropriate for numerical code.
 
This brings me to a final conclusion. The current dominant attitude in the FM community is still that FM are to be used to verify software. The problem is that this can still be badly designed and inefficient software. Hence the verification effort will also be much larger than needed. FM methods really shine when used as a tool and method supporting abstract thinking, together with informal techniques, pseudo-formal techniques (like architectural diagrams), etc. I recently attended a presentation by MS on the Hypervisor verification (60 000 lines of C ). I was astonished to hear that each C function is more or less manually proofed with as only input for the formal engineer the C code! I presume that if this Hypervisor had been developed using FM from the start, it could have been a lot smaller and likely a lot cleaner and more performant.   
 
Just image that all embedded software had been developed with FM resulting in code sizes that are maybe 20% of what we have now. The world would have been different. It would also have meant a serious reduction in power consumption. It is also my opinion that with a well trained team and using such “formalized” methodology, projects don’t need to take more effort that the still dominant manual methods. Hence, it can really a cost reduction and certainly a serious risk reducing approach (the traditional methodology is very prone to run-away after the project was officially finished).
 
Best regards,
 
Eric Verhulst
 
 
PS
 
1.       In the meantime we have started to work on the time-out functionality. Here using TLA is problematic, so we switched to UPPAAL. We are also starting to use Frama-C and B.  
2.       A paper was published at FM 2008 on this project.
 
 
From: Mailing_List_Robot [mailto:sympa@xxxxxxxxxx] On Behalf Of Ian East
Sent: Wednesday, March 31, 2010 10:32 AM
To: Tony Gore
Cc: Jones, Chris C (UK Warton); occam-com@xxxxxxxxxx
Subject: Re: Race condition...
 
Hi Tony
 
Thanks for this.  Very worrying, given I drove my family around Florida recently in a Prius!
It's interesting how simple basics can get drowned out.  I suppose it happened to NASA with Challenger.
 
One of the things that attracts me to rail, over road, (forthcoming book The Cost of the Car) is the engineering heritage.  Safety engineering began with rail, and signalling makes a nice student exercise in sequential system design, and, more recently, in formal analysis.
 
I worry whether the same art is being applied to car design.  Of course, it is entirely absent in road design, which is inherently lethal anyway, as the annual carnage shows.
 
And, of course, it is absent in the training of most 'software engineers', as is engineering per se.
 
I too began with µprocessor programming in assembly language, and believe as much has been lost as gained in the use of 'standard' HLLs.  I have had unending difficulty adapting to universal hacking – the lack of any real specification, design, or analysis, with software.
 
I'm trying to assemble a series of texts called Electronics, Systems, and Programming (http://www.openchannelpublishing.com/esp.html) that combine a more formal approach to software, and would welcome a proposal, particularly wrt embedded concurrent/reactive systems.  (You get £2.50 back a book, instead of 80p at most, and a lower selling price, hence more sales.  And yes, that is realistic, especially with e-books on the rise.)
 
The CPA community is still badly in need of books that reflect its idea and ideals.  The right people are always too busy, I guess.
 
Best
Ian
 
On 31 Mar 2010, at 03:04, Tony Gore wrote:


Hi Ian
 
See this
 
 
from an engineer who used to work at Ford in around the same era that I was working in automotive. The key phrase he uses is
 
 
Although not common over here in the UK because of our penchant for stick shift (manual gearboxes), most cars in the US have automatic gearboxes and cruise control. One of the standard signals to cruise control is the brake signal to disengage it.
 
 
Tony Gore
email  tony@xxxxxxxxxxxx 
tel +44-1278-761000  FAX +44-1278-760006  GSM +44-7768-598570 
URL: www.aspen.uk.com 
Aspen Enterprises Limited 
Registered in England and Wales no. 3055963 Reg.Office Aspen House, Burton Row, Brent Knoll, Somerset TA9 4BW.  UK
 

 

Ian East
Open Channel Publishing Ltd.
(Reg. in England, Company Number 6818450)