by Rick Beton
Occam-Pi is quite like Occam2 but with new additions in the areas of shared channels, mobility, recursion, barriers etc. However, it doesn’t address some of the shortcomings that make Occam a difficult choice for general-purpose programs.
The following proposals aim to fill in the gaps. The change is radical: the syntax is not backward compatible. There is also a broad objective to make things simpler in a language that has tended to drift away from its eponymous philosopher’s ideals.
Namespaces via Packages
This proposal is unlike https://www.cs.kent.ac.uk/research/groups/plas/wiki/OEP/130, which is not radical enough. The proposal is:
- #INCLUDE and #USE are dropped.
- There is no distinction between headers and code.
- Package semantics are loosely similar to Java etc. except / is the absolute root and is required.
- The widely-used reverse DNS naming convention is recommended but not obligatory.
- A PACKAGE statement may optionally appear as the first declaration in a source file and affects all the declarations following it up to the end of the file.
- IMPORT may appear any number of times in a source file. It serves in effect to shorten the fully-qualified names when they refer to declarations in other packages. Each one applies to all the following declarations in its file, including subsequent IMPORT statements.
- IMPORT may specify a package
this shortens references by removing the need for the leading /a in the first case, or /a/b/c in the second.
- IMPORT may also specify a sub-package
this shortens references by removing the need for leading /a and /a/b/c.
- Fully-qualified usage is always possible
x := /a/b/c/calculate(2)
- Shortened usage is also possible thanks to the IMPORT /a/b/c statement
x := calculate(2)
- Intermediate usage is also possible thanks to the IMPORT /a statement
x := b/c/calculate(2)
- IMPORT may also specify a particular declaration
x := calculate(2)
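The lookup rules above can be sketched in Python. This is only an illustration of how a compiler might expand shortened references: the function name resolve, the declared set, and the decision to reject ambiguous references are assumptions, not part of the proposal.

```python
def resolve(name, imports, declared):
    """Return the fully-qualified name that `name` refers to.

    name     -- reference as written in the source, e.g. "b/c/calculate"
    imports  -- IMPORT targets in source order, e.g. ["/a", "/a/b/c"]
    declared -- set of every fully-qualified declaration known to the compiler
    """
    # A leading "/" marks an absolute (fully-qualified) reference.
    if name.startswith("/"):
        if name in declared:
            return name
        raise LookupError(f"unknown declaration {name}")

    candidates = set()
    for target in imports:
        # Case 1: the IMPORT names a package; the reference is relative to it.
        joined = f"{target}/{name}"
        if joined in declared:
            candidates.add(joined)
        # Case 2: the IMPORT names one declaration; its last segment is the short name.
        if target in declared and target.rsplit("/", 1)[-1] == name:
            candidates.add(target)

    if len(candidates) == 1:
        return candidates.pop()
    if not candidates:
        raise LookupError(f"cannot resolve {name}")
    # Assumption: an ambiguous short name is an error rather than first-match-wins.
    raise LookupError(f"ambiguous reference {name}: {sorted(candidates)}")


# Hypothetical declarations used only for this illustration.
declared = {"/a/b/c/calculate", "/a/other/calculate"}
```

With these declarations, resolve("calculate", ["/a/b/c"], declared) and resolve("b/c/calculate", ["/a"], declared) both yield /a/b/c/calculate, matching the shortened and intermediate usages listed above.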
Compiling & Linking
- The compiler will create output files in the build subfolder, creating it if necessary.
- The compiler finds dependencies by considering all the binaries in the build subfolder and all the zipped binaries in the libs subfolder.
- The linker automatically considers all the binaries in the build subfolder and all the zipped binaries in the libs subfolder.
- The name of the build and libs directories can be changed using command-line switches on both compiler and linker.
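A rough Python sketch of the discovery step follows. The proposal does not name a binary file extension or say how zipped members are identified, so the ".bin" suffix, the "archive!member" naming and the function name find_binaries are all assumptions made for illustration.

```python
from pathlib import Path
import zipfile


def find_binaries(project_root, build="build", libs="libs", suffix=".bin"):
    """List candidate binaries: loose files under build, zipped files under libs.

    `build` and `libs` mirror the command-line switches that rename the folders.
    `suffix` is a hypothetical extension for compiled output.
    """
    root = Path(project_root)
    found = []

    build_dir = root / build
    if build_dir.is_dir():
        for path in sorted(build_dir.rglob(f"*{suffix}")):
            found.append(path.relative_to(root).as_posix())

    libs_dir = root / libs
    if libs_dir.is_dir():
        for archive in sorted(libs_dir.glob("*.zip")):
            with zipfile.ZipFile(archive) as zf:
                for member in sorted(zf.namelist()):
                    if member.endswith(suffix):
                        # Assumed notation: archive name, "!", then the member path.
                        found.append(f"{archive.name}!{member}")
    return found
```

Both the compiler and the linker could share a routine like this, which is one way of keeping their view of the dependencies identical.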
Initialisation and Inline Declarations
- Let’s apply ‘kiss’; INITIAL does not read well and is verbose.
INITIAL INT x IS 0 :
is replaced by
INT x := 0 :
which has explicit assignment.
- The same syntax applies to fields in record data types (https://www.cs.kent.ac.uk/research/groups/plas/wiki/OEP/175), e.g.
DATA TYPE Point
REAL32 x := 0:
REAL32 y := 0:
- A SEQ is required before sequential groups of statements. In Occam2, all declarations covering the scope of the SEQ must be before it. This is inconvenient for multi-step algorithms. Now, declarations appearing within a SEQ body have a scope that runs until the end (i.e. outdent) of that SEQ. Consider this (contrived) example:
REAL32 x := 49:
REAL32 g := 1.0:
BOOL looping := TRUE:
g := (g + (x / g)) / 2
VAL ok := isGoodEnough(x)
looping := NOT ok
- Compared with the proposal to make SEQ optional (https://www.cs.kent.ac.uk/research/groups/plas/wiki/OEP/145), this suggestion has less impact because there is simply an implied SEQ after each declaration (or group of declarations). Unlike OEP/145, the first SEQ is still required.
- Drop the VALOF keyword and reduce the indentation one level.
- Drop the RESULT keyword and let the last expression be the result.
- The result may be a comma-separated list of expressions (as per Occam2).
- Allow no-argument functions to omit the empty parentheses.
Functions as Types and Values
- It is possible to declare function types via the signature:
TYPE F0 IS T0 FUNC (…):
- No change to function implementation, which is separate.
T0 FUNC y(…) IS … :
- Now we can assign functions to variables (i.e. aliases)
F0 x := y:
T0 a := x(…) -- invokes function x, an alias of y
- Occam disallows the aliasing of variables because of lack of transparency and potential race conditions; aside from mobile variables, pass-by-copy is the norm. Function references are different: like immutable values, pass-by-reference is a legitimate optimisation.
- We can pass functions as parameters so we can write reusable processes with parameterisable behaviour.
- We can send functions down channels so we can transport evaluation as well as values.
- There is no distinction between normal function references and mobile function references, so the latter are not needed.
- Function references don’t allow access to any free variables. This is the only way that they differ from in-place function definitions. (can we relax this? should we implement closures?)
- Functions still have no side-effects.
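For readers more familiar with mainstream languages, the ideas in this section can be approximated in Python. This is an analogy only: the names IntFunc, double and apply_all are invented for the example, and a queue stands in for an occam channel.

```python
import queue
from typing import Callable, List

# Analogue of TYPE F0 IS T0 FUNC (...): a function type given by its signature.
IntFunc = Callable[[int], int]


def double(n: int) -> int:
    # A pure function: the result depends only on the argument.
    return n * 2


def apply_all(f: IntFunc, values: List[int]) -> List[int]:
    # A reusable routine whose behaviour is parameterised by the function passed in.
    return [f(v) for v in values]


# Analogue of F0 x := y: the variable x is an alias of double.
x: IntFunc = double

# Sending a function down a channel (here, a thread-safe queue stands in).
channel = queue.Queue()
channel.put(double)
received = channel.get()
```

Because the functions are free of side-effects and capture no free variables, passing the reference rather than a copy changes nothing observable, which is the argument made above for treating function references like immutable values.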
Methods are Functions within Data Types
- Functions are allowed inside data types - these are called methods.
- Methods have access to the values of the instance of the data type in which they belong.
- Methods are always free from side-effects (as are all functions), so they cannot change their instance’s data.
DATA TYPE Point
REAL32 x, y:
REAL32 FUNCTION rms IS
sqrt((x * x) + (y * y))
- Methods with operator names are invoked via infix notation.
- Methods with identifier names are invoked using the back-tick ` syntax (similar to Ada, whose attributes use an apostrophe in the same position).
- No-argument methods with identifier names can also be invoked using [name] syntax, the same as the fields.
- In addition, fields can be accessed just like methods using the ` syntax.
- Methods are scoped within their enclosing data type, i.e. they do not exist independently.
- Public / private visibility of fields and methods is not considered to be a requirement; everything is public, except when it is contained within the scope of something else (for example, functions are permitted within methods and these inner functions are not visible externally).
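As a point of comparison, the Point data type with its side-effect-free rms method might look like this in Python. It is a sketch by analogy, not a rendering of the proposed semantics: a frozen dataclass approximates the rule that methods cannot change their instance's data, and the "+" operator method is a hypothetical addition to show infix invocation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: neither methods nor callers can modify the fields
class Point:
    x: float
    y: float

    def rms(self) -> float:
        # Same expression as the rms method in the Point example above.
        return math.sqrt((self.x * self.x) + (self.y * self.y))

    def __add__(self, other: "Point") -> "Point":
        # Hypothetical operator-named method, invoked with infix notation: p + q.
        # It returns a new value instead of altering either operand.
        return Point(self.x + other.x, self.y + other.y)


p = Point(3.0, 4.0)
```

Here p.rms() plays the role of p`rms, and p.x corresponds to both p[x] and p`x in the proposed syntax.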
The compiler will automatically provide some standard methods - these are called implicit methods because they exist without explicit declaration. We can use the implicit methods to reduce the number of language keywords.
- When `size is applied to an array, it returns the size of the array, exactly equivalent to the old SIZE keyword. SIZE is no longer a keyword. A warning is raised if this method is ever overridden.
- When `hash is applied to any value, it returns a hash integer constructed from the value in a consistent way. A warning is raised if this method is ever overridden.
- When `toString is applied to any value, it returns a string representing the value. All the primitive types will behave by default in the obvious way, but record data types will merely have a default representation unless one is provided explicitly.
- Polymorphism is possible because data types can share common interfaces. An interface type is defined simply by a set of method signatures. A value of interface type can hold any value that implements those methods. For example
REAL32 FUNCTION x:
REAL32 FUNCTION y:
REAL32 FUNCTION rms:
- For zero-argument functions, more brevity is allowed:
REAL32 FUNCTION x, y, rms:
- A data type implements an interface simply by having all of its methods. There is no explicit remark to indicate that it does so, but the compiler will reject attempts to use an incompatible data type where an interface is expected. (Go has a similar principle.)
- Every field in a data type is directly equivalent to a ‘getter’ method of the same name with the same return type. (Note therefore that it is problematic to have fields called size or hash due to the name clash with standard implicit methods).
- Consider the Point data type and the PointLike interface above. Point is compatible with (and therefore implements) PointLike because it provides the rms method and it has two fields that look like the x and y methods.
- Point also implements Point2DLike (below), which has fewer methods, but not Point3DLike, because some Point3DLike methods are not implemented in Point:
REAL32 FUNCTION x, y:
-- no rms in this interface
REAL32 FUNCTION x, y, z: -- extra method z
REAL32 FUNCTION rms:
REAL32 FUNCTION theta, phi: -- extra methods
- A data type can, obviously, implement any number of interfaces or none at all.
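Python's typing.Protocol offers a similar structural notion of interface, so the compatibility rules above can be sketched with it. This is an analogy under assumptions: fields are modelled as attributes and rms, theta and phi as methods, whereas the proposal treats all of them uniformly as zero-argument methods. The runtime isinstance check only tests that the members are present; full signature checking would be the job of a static checker (or, in the proposal, the compiler).

```python
import math
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@runtime_checkable
class Point2DLike(Protocol):
    x: float
    y: float
    # no rms in this interface


@runtime_checkable
class PointLike(Protocol):
    x: float
    y: float

    def rms(self) -> float: ...


@runtime_checkable
class Point3DLike(Protocol):
    x: float
    y: float
    z: float  # extra member

    def rms(self) -> float: ...
    def theta(self) -> float: ...  # extra methods
    def phi(self) -> float: ...


@dataclass(frozen=True)
class Point:
    # Point never mentions any of the interfaces; it matches by shape alone.
    x: float
    y: float

    def rms(self) -> float:
        return math.sqrt(self.x * self.x + self.y * self.y)


p = Point(3.0, 4.0)
```

With these definitions, isinstance(p, PointLike) and isinstance(p, Point2DLike) hold, while isinstance(p, Point3DLike) does not, because z, theta and phi are missing.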
Implicit Type Conversions
- Type conversions are normally explicit in Occam2; there is often no ambiguity and so this is excessively pedantic. The syntax for explicit casting is to precede the expression of type T1 with the required type T2.
- Implicit conversions can occur whenever the source expression and the destination variable have the same underlying representation. For example
TYPE Counter IS INT:
INT x := 1:
Counter c := x: -- implicit conversion of x
- An example of where this is useful is when indexing an array: types based on INT can be used as indexes without ceremony.
- Lossless conversion of data type is possible for every pair of data types that contain the same content (regardless of methods). The following
DATA TYPE Complex32
REAL32 real, imag:
is equivalent to Point (above) so it is possible to cast between the two types explicitly, and furthermore implicit conversion will happen when it is unambiguous.
- Methods can only be accessed when the type is explicitly known. If an equivalent type has methods that are needed, the value must be cast explicitly before the methods become available. That is, no implicit casting happens before a method is invoked.
- Data types support single inheritance via the new EXTENDS keyword.
- Extra fields can be added but none can be removed.
- Extra methods can be added. Existing methods can be overridden.
DATA TYPE Point3d EXTENDS Point
-- REAL32 x, y are inherited
REAL32 FUNCTION rms IS
… overridden implementation
REAL32 FUNCTION theta IS
REAL32 FUNCTION phi IS
- There is an implicit method `isA(type) predicate on all record data types that allows compatibility testing. Typically this is useful for testing whether a value of a certain interface or base type is also a concrete subtype of interest. It is not for testing whether a data type implements a particular interface because that’s known at compile time. (to be confirmed)
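The two ideas in this section, lossless casts between record types with the same content and single inheritance with an isA-style test, can be approximated in Python as follows. The helper names same_content and cast are invented for the sketch, and the z field on Point3d is a hypothetical extra field (the example above does not show which fields Point3d adds).

```python
import math
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Point:
    x: float
    y: float

    def rms(self) -> float:
        return math.sqrt(self.x * self.x + self.y * self.y)


@dataclass(frozen=True)
class Complex32:
    real: float
    imag: float


def same_content(a, b) -> bool:
    """True when two record types hold the same field types in the same order."""
    return [f.type for f in fields(a)] == [f.type for f in fields(b)]


def cast(value, target):
    """Explicit, lossless cast between record types with the same content."""
    if not same_content(type(value), target):
        raise TypeError(f"{type(value).__name__} and {target.__name__} differ in content")
    return target(*(getattr(value, f.name) for f in fields(value)))


@dataclass(frozen=True)
class Point3d(Point):      # analogue of DATA TYPE Point3d EXTENDS Point
    z: float               # hypothetical extra field; x and y are inherited

    def rms(self) -> float:  # overrides the inherited method
        return math.sqrt(self.x * self.x + self.y * self.y + self.z * self.z)
```

A Complex32 value has no rms of its own, so it must be cast first: cast(Complex32(3.0, 4.0), Point).rms(). Python's isinstance(value, Point3d) is the closest counterpart to the `isA(type) predicate.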
UTF Encoding and Strings
All source code is UTF8 unless specified otherwise by a BOM.
- Therefore, no ISO8859 encodings (inter alia) are supported (‘kiss’)
- A new built-in STRING type will exist containing immutable sequences of characters.
- Literal strings are represented using the new type, not BYTE.
- Literal strings may include any Unicode character.
- Literal characters are integers large enough to hold any Unicode character (21 bits), so the compiler will actually use INT.
- STRING includes several implicit methods
- `size gives the number of characters (code-points) in the string.
- `at(INT) gives the character at a specified position, represented by INT.
- `from(INT) gives the substring starting at a particular position and running to the end of the string
- `for(INT) gives the substring starting at the beginning and consisting of the specified number of characters if the number is positive or zero. Otherwise, it gives the substring at the end of the string whose length is the negation of the specified number (so `for(-3) gives the last three characters).
Note that the expression s`from(i)`for(n) is likely to be commonly used.
- `toUppercase returns the same string but converted to uppercase.
- `toLowercase returns the same string but converted to lowercase.
- `toUTF8 returns a BYTE array containing the UTF-8 encoded string.
- `toUTF16 returns an INT16 array containing the UTF-16 encoded string.
- “+”(STRING) returns a new string formed by concatenating the two strings.
- Conversion functions are provided to create strings from BYTE UTF-8 arrays and INT16 UTF-16 arrays.
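The intended behaviour of the slicing and encoding methods can be illustrated with a short Python sketch. Python strings are also immutable sequences of code points, which makes the analogy fairly direct. The function names are invented, rejecting a negative start position is an assumption (the text does not define it), and to_utf16 returns unsigned code-unit values where the proposal says INT16.

```python
from typing import List


def str_from(s: str, start: int) -> str:
    """Analogue of `from(INT): the substring from `start` to the end."""
    if start < 0:
        # Assumption: negative positions are not defined by the proposal.
        raise ValueError("start must not be negative")
    return s[start:]


def str_for(s: str, count: int) -> str:
    """Analogue of `for(INT): first `count` characters, or the last -count if negative."""
    if count >= 0:
        return s[:count]
    return s[count:]


def to_utf8(s: str) -> bytes:
    """Analogue of `toUTF8: the UTF-8 encoded bytes."""
    return s.encode("utf-8")


def to_utf16(s: str) -> List[int]:
    """Analogue of `toUTF16: the UTF-16 code units as integers."""
    raw = s.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]
```

The commonly used combination s`from(i)`for(n) then corresponds to str_for(str_from(s, i), n), and len(s) corresponds to `size because both count code points, not bytes.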
Sets and Hashtables
A wrapper for arrays of primitive or record types is introduced using SET TYPE.
- This example provides a set of (distinct) INTs.
SET TYPE Distinct IS INT:
- This example provides a set of (distinct) STRINGs.
SET TYPE StringSet IS STRING:
- This example provides a hashtable of strings, indexed by strings.
DATA TYPE KeyValue IS
STRING key, value:
SET TYPE Hashtable IS KeyValue:
- If the type is a record data type, the first item in the record data type is considered to be the key. This would not normally be a BOOL for obvious reasons, so the compiler will raise an error in this case.
- Keys must be unique.
- Indexing is done using the key, optimised by the compiler via a built-in hashtable algorithm. This has the same square bracket [ ] syntax as for arrays, except the value within the brackets must be of the same type as the key instead of an integer.
- Several implicit methods come with sets (here, S is the set type, V is the underlying type and K is the key’s type):
- There is an implicit predicate TYPE IsS IS BOOL FUNC (V value):
- `keys returns a set type containing just the keys.
- `elements returns an array of the underlying type containing all the elements. This is similar to reshaping.
- `isEmpty returns true iff the size is zero.
- `size returns the number of elements in the collection.
- `contains(K key) returns true iff the set includes a given key.
- `subset(IsS predicate) returns those elements for which the predicate returns TRUE.
- `partition(IsS predicate) returns two sets, one containing those elements for which the predicate returns TRUE and the other containing all the other elements.
- “+”(V value) returns either:
- a new set type containing all the elements in this set plus a new value if the new value’s key is not in this set, or
- a new set type consisting of all the values in this set with one value updated if the new value’s key matches an existing key.
- “/\”(S set) returns a new set of all the elements where only the intersection of keys is retained.
- “\/”(S set) returns a new set of all the elements where the union of keys is obtained.
- “-”(S set) returns a new set of all the elements where the difference of keys is obtained.
- “-”(K key) returns a new set with one element removed, if it was present (otherwise it returns a clone of the original).
- If these methods are applied to a VAL set and the receiver is also VAL, the methods elements, subset, partition, “/\”, “\/” and “-” might return slices of the original without any copying, at the discretion of the compiler.
- This is also true for results that are used for iteration (see below) because the iterator element is effectively a VAL of the underlying type V.
- If the set or the receiver is mutable however, the returned sets need to be deep copies and this has a performance overhead (similar to non-mobile transmission of the data over a channel).
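To make the semantics of the implicit set methods concrete, here is a minimal Python model. It is a sketch under assumptions, not the proposed implementation: records are modelled as tuples whose first item is the key, primitives act as their own key, Python's &, | and - stand in for /\, \/ and -, and on a key clash in a union the right-hand operand's element wins (the text does not say which should). Every operation returns a new collection, which corresponds to the deep-copy case; the slice optimisation for VAL sets is not modelled.

```python
class KeyedSet:
    """Immutable keyed collection; the key of an element is its first field."""

    def __init__(self, elements=()):
        self._items = {}
        for element in elements:
            # A later element with an existing key replaces the earlier one.
            self._items[self._key(element)] = element

    @staticmethod
    def _key(element):
        # Records are tuples keyed by their first item; primitives are their own key.
        return element[0] if isinstance(element, tuple) else element

    def size(self):
        return len(self._items)

    def is_empty(self):
        return self.size() == 0

    def keys(self):
        return KeyedSet(self._items.keys())

    def elements(self):
        return list(self._items.values())

    def contains(self, key):
        return key in self._items

    def __getitem__(self, key):          # square-bracket indexing by key
        return self._items[key]

    def subset(self, predicate):
        return KeyedSet(v for v in self._items.values() if predicate(v))

    def partition(self, predicate):
        return self.subset(predicate), self.subset(lambda v: not predicate(v))

    def __add__(self, value):            # "+"(V value): insert or update
        return KeyedSet(self.elements() + [value])

    def __and__(self, other):            # "/\"(S set): intersection of keys
        return self.subset(lambda v: other.contains(self._key(v)))

    def __or__(self, other):             # "\/"(S set): union of keys
        return KeyedSet(self.elements() + other.elements())

    def __sub__(self, other):            # "-"(S set) or "-"(K key)
        if isinstance(other, KeyedSet):
            return self.subset(lambda v: not other.contains(self._key(v)))
        return self.subset(lambda v: self._key(v) != other)
```

For example, KeyedSet([("a", "1"), ("b", "2")]) + ("a", "9") yields a two-element collection in which the element keyed by "a" has been updated, while the original is left unchanged.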
- This is an extension of https://www.cs.kent.ac.uk/research/groups/plas/wiki/OEP/133.
- It is sometimes useful to be able to iterate over an array -- for example, searching for an unused slot. The obvious semantics for this use abbreviation:
SEQ element IN array
out.thing (element, out!)
- The set type equivalent is easy:
SEQ element IN set`elements
out.thing (element, out!)
- No identifiers will be all-uppercase; only reserved words are all-uppercase.
- Types and interfaces will use CamelCase names starting with a capital letter.
- Variables, fields, methods and procedures will use camelCase names starting with a small letter.
- The use of dots in identifiers will be deprecated.
- The standard library will be revised to follow these conventions and to have package names introduced.
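These naming conventions are simple enough to check mechanically. The following Python sketch shows one possible checker. The function name, the two "kind" labels, the restriction to ASCII letters and digits, and the decision to let a name with a single letter (such as T0) escape the all-uppercase rule are assumptions.

```python
import re

# Types and interfaces: CamelCase starting with a capital letter.
TYPE_NAME = re.compile(r"[A-Z][A-Za-z0-9]*")
# Variables, fields, methods and procedures: camelCase starting with a small letter.
VALUE_NAME = re.compile(r"[a-z][A-Za-z0-9]*")


def follows_convention(name: str, kind: str) -> bool:
    """Check `name` against the conventions; kind is "type" or "value"."""
    letters = [c for c in name if c.isalpha()]
    # All-uppercase names are reserved for keywords. Names with only one
    # letter, such as T0, are let through (an assumption of this sketch).
    if len(letters) > 1 and all(c.isupper() for c in letters):
        return False
    pattern = TYPE_NAME if kind == "type" else VALUE_NAME
    # fullmatch also rejects dots, which the conventions deprecate.
    return pattern.fullmatch(name) is not None
```

Under this check, PointLike passes as a type and isGoodEnough as a value, while out.thing and SIZE are both rejected.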
The following revisions would make Occam-Tau more like other popular languages, reducing the barriers to adoption. These are breaking changes and are for discussion as options only.
- Identifiers no longer allow a dot character, but will allow underscore instead.
- Consequently, methods could be invoked using dot as a separator instead of back-tick.
- Strings drop * as the special escape and use \ instead.