[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Abstract Syntax Trees



Oyvind Teig <Oyvind.Teig@xxxxxxxxxxxx>  says,

> I have read the "Java Security" book by McGraw and Felten.
>
> On page 124 they argue that the Java to bytecode translation
> would have been safer had it been done with Abstract Syntax
> Trees (AST).
>
> 1. AST will "replace the need for global dataflow analysis",
> 2. AST also has the "same semantics as the source language
>    they represent."
>
> I gather that SPOC uses AST to translate from Occam to C,
> so this triggered my interest. Does KROC use ASTs?
>
> Does anybody have time to explain a little about ASTs,
> or point me to a somewhat top-level description - and
> perhaps describe point 1 and 2 above?

Abstract Syntax Tree(AST) is the hierarchical representation of
the semantic features of a program based on its abstract syntax.
Abstract syntax deals with only the semantic entities, unlike
the concrete syntax which also describes what kind syntactic
symbols will be used to describe the semantic features.

For example, to assign the variable 'a' with the value 5,
the concrete syntax should tell you whether you have to write
	a = 5		 /*  as in C   */
  OR
    	a := 5		 --  as in occam

The abstract syntax defines only what you do rather than how you do.
Therefore the same abstract syntax may be used for both of the above
assignments. They may be represented in an AST as follows:

		 ASSIGN

		 /    \
		a	5

A typical tree based on the concrete syntax would be:

                  ASSIGN		    ( great fun :)
		/   |	 \
              VAR  OP	EXPRESSION
              /	    |	   \
      IDENTIFIER    |	    LITERAL
            /	    |		\
           a	    =		5

As you see, the AST is the reduced tree from the parse tree where all
the intermediate nodes are eliminated.

Traversing the AST is like browsing through the program. You can
deduct a level of abstract meaning depending on the level of branching
you are standing.

Yes, AST has the "same semantics as the source language they represent."
This is an advantage over the byte-code so that you can quickly establish
the intention of the program at every level. With the byte-code
the purpose of security test would be rather limited to	 whether each
piece of instructions may do harm or not.
In both cases, I am not sure that you can 100% verify for the security.

One drawback is that, the file containing the AST of a program may be
several times larger in size when compared to the size its byte-code.

I can see a another advantage, though it may be a backfire for the
popularity of Java. Who needs Java as long as you end up with an AST
similar to the one which can be produced for a Java program?
You can use your favourite language to describe the same thing,
and the AST will be the common language.  Let everyone program
in their own language in a distributed programming environment all
around the Web and ASTs will be our interpreters to convey our
intentions.

Cheers,
Vedat
--
Vedat Demiralp
Computing Lab.,	  University of Kent,  Canterbury CT2 7NF,   UK
Tel: 44-(0)1227-827684		Fax:  44-(0)1227-762 811
---------------------------------------------------------------