Plasma Language Reference

Copyright (C) Plasma Team
Distributed under CC BY-SA 4.0

Updated: April 2023
Source: on github

As the language is under development this is a working draft. Many choices may be described only as bullet points. As the language develops these will be filled out and terms will be clarified.

Lexical analysis and parsing

The "front end" passes of Plasma compilation work as follows:

  • Tokenisation converts a character stream into a token stream.

  • Parsing converts the token stream into an AST.

  • AST→Core transformation converts the AST into the core representation. This phase also performs symbol resolution, converting textual identifiers in the AST into unique references.

Lexical analysis

  • Input files are UTF-8

  • Comments begin with a // and extend to the end of the line, or are delimited by /* and */ and may cover multiple lines. Note that comments ending in **/ aren’t currently supported as they confuse our limited tokeniser.

  • Curly braces delimit blocks and scopes

  • Whitespace is only significant when it separates two tokens that would otherwise form a single token

  • Statements and declarations are not delimited. The end of a statement can be determined by the statement alone. Therefore: there are no statement terminators or separators (such as semicolons in C) nor significant whitespace (as in Python or Haskell).

  • String constants are surrounded by double quotes and may contain the following escapes: \n \r \t \v \f \b \\. Escaping the double quote character is not currently supported, nor are escapes using character codes. Escaping any other character produces that character unchanged; this allows \' to work as many programmers may expect, even though it’s not necessary.
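The escape rules above can be modelled by a small decoder. This is a minimal Python sketch, not the Plasma tokeniser: it assumes the tokeniser has already extracted the text between the double quotes (so that text cannot contain a "), and the name decode_escapes is hypothetical.

```python
# Escapes listed in the reference; any other escaped character maps to itself.
ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "v": "\v", "f": "\f", "b": "\b", "\\": "\\"}


def decode_escapes(body: str) -> str:
    """Decode the text between a string constant's double quotes."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\":
            if i + 1 == len(body):
                # Assumption: a trailing lone backslash is treated as an error.
                raise ValueError("string ends with a lone backslash")
            nxt = body[i + 1]
            # Known escapes are translated; anything else (e.g. \') is kept as-is.
            out.append(ESCAPES.get(nxt, nxt))
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

For example, decode_escapes(r"it\'s") gives "it's", showing why \' works even though it is unnecessary.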

Parsing

Plasma’s grammar is given in pieces throughout this document as concepts are introduced. However, the top level and some shared definitions are given here. In the grammar definitions I use ( and ) to denote groups, and ?, + and * to denote optional, one or more, and zero or more, respectively.

Plasma := ModuleDecl ToplevelItem*

ToplevelItem := ImportDirective
              | TypeDefinition
              | ResourceDefinition
              | FuncDefinition
              | Pragma

ModuleQualifiers := ( ident . )*
QualifiedIdent := ModuleQualifiers ident

IdentList := ident ( , ident )*
QualifiedIdentList := QualifiedIdent ( , QualifiedIdent )*

A note on case and style.

It is desirable to use case to distinguish symbols in different namespaces that may appear in the same expression. It should never be required since there are scripts that do not have a notion of case. This is the suggested convention:

Symbol             Suggestion   Notes
Variable           lower_case
Function Name      lower_case
Module Name        UpperCase    Case insensitive
Type Name          UpperCase
Type Variable      lower_case   Will use the ' sigil to disambiguate from types
Data constructor   UpperCase    To distinguish construction from function application or variable use
Field selector     lower_case   Must follow the same convention as function names
Interface          UpperCase
Instance           lower_case   Not first class, but may appear in expressions
Resources          lower_case

Note that there may be more symbol namespaces in the future.

The general rationale for these suggestions is that things that are different should look different.

Variables, functions and field selectors

The most common symbols should be in lower case, using _ to separate words. This is preferred, but not enforced.

Modules, types and constructors

It is useful to visually distinguish these more meta symbols. They’re part of the organisation of the program but not really part of the program.

Interfaces and instances

I’m unsure what’s best here. We may wish to make them distinct so that instance and module qualification do not overlap.

Types and type variables

Type variables must be distinguished from types. This is because free type variables can appear in type expressions without being introduced, and we’d like to distinguish free type variables from misspelt type names. So that Plasma can be used in scripts without lower and upper case, we use a sigil: type variables are always preceded by a ' (apostrophe).

A list of values of any type t (but it must be the same type for each element):

List('t)

A list of t, that is values whose type is t, a defined type.

List(t)

Environment

The environment is a concept we will consider for Plasma’s scoping rules. The environment maps symbols to their underlying items (modules, types, functions, variables etc). Even though no environment exists at runtime, and the compile-time structure is an implementation detail of the compiler (pre.env), it is useful to think of scoping in these terms, as it explains most scoping behaviours.

Some languages allow overloading of symbols, usually based on a symbol’s type and sometimes on its arity. Plasma does not support any overloading.

Scopes

When a new name is defined it is added to the current environment.

print!(x)         // x does not exist.
var x = "hello"   // x (a variable) is added to the environment.
print!(x)         // We may now refer to x.

When a nested block starts, it creates a new environment based upon the old environment.

var x = "hello"
if (...) {
    print!(x)   // Ok
}

When a nested block ends, the original environment is restored.

if (...) {
    var x = "Hello"
    print!(x)   // Ok
}
print!(x)     // Error

Shadowing

Shadowing refers to a new binding with the same name as an old binding being permitted and dominant in an inner or later scope. Shadowing is not permitted for variables at all. It is permitted for other symbols.

Note
TODO: Decide on rules for a symbol of one kind shadowing a symbol of another kind. For example, it should probably be an error for a module import to shadow an interface declaration, but it’s probably okay for a variable to shadow a function, unless that function is defined within another function (a closure).

Variables

A variable cannot shadow another variable.

var x = 3
var x = 4   // Error

if (...) {
  var x = 5 // Error
}
Note
We are considering a special syntax to use with variables that allows shadowing.
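The scoping and shadowing rules so far can be summarised as a stack of environments. The following Python sketch only models the rules as described; it is not how the compiler’s pre.env works, the Env class and its methods are hypothetical, and it permits every kind of shadowing except variable-over-variable because the other combinations are still a TODO.

```python
class Env:
    """A stack of scopes; each scope maps a name to a (kind, item) pair."""

    def __init__(self):
        self.scopes = [{}]

    def enter_block(self):
        # A nested block starts with a new, empty scope on top of the outer ones.
        self.scopes.append({})

    def leave_block(self):
        # Names added inside the block are discarded, restoring the outer environment.
        self.scopes.pop()

    def find(self, name):
        # The innermost binding wins; this is what makes shadowing work.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None

    def lookup(self, name):
        binding = self.find(name)
        if binding is None:
            raise NameError(f"{name} does not exist")
        return binding

    def add(self, name, kind, item=None):
        existing = self.find(name)
        # A variable may never shadow another variable, in this or any outer scope.
        # Other combinations are allowed in this model (still undecided in the text).
        if kind == "var" and existing is not None and existing[0] == "var":
            raise NameError(f"variable {name} cannot shadow another variable")
        self.scopes[-1][name] = (kind, item)
```

Adding a module named Set, entering a block and adding another Set makes lookup return the inner one until leave_block is called, mirroring the SortedListSet/RBTreeSet example in this section.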

Other symbols

Symbols other than variables allow shadowing; for example, module imports can shadow the contents (types, functions etc.) of earlier imports, including when import is used with a wildcard. Therefore we can use a different Set implementation in an inner scope:

import SortedListSet as Set
...
// some code
...
{
  import RBTreeSet as Set
  ...
  // some code using RBTreeSets
  ...
}
...
// back to SortedListSet
...

(Yes, module imports may appear within function bodies and so-on.)

However, a binding that cannot be observed such as:

import SortedListSet as Set
import RBTreeSet as Set

does not make sense, and the compiler should generate a warning.

TODO: Figure out if context always tells us enough about the role of a symbol that modules do not need to shadow types and constructors. I suspect this is true but I’ll have to define the rest of the language first.

Namespaces

The environment maps names to items. Names might be qualified, and if so the qualifier is required to refer to that name. For example:

import Set
my_set1 = Set.new   // Ok
my_set2 = new       // Undefined symbol

TODO: Probably need to create a new keyword to introduce these, the equivalent of var.

Or they can be unqualified:

import Set.new
my_set1 = Set.new   // Undefined symbol Set
my_set2 = new       // Ok

The name within the namespace does not need to correspond to the name as it was defined.

import Set.new as new_set
my_set = new_set    // Ok

This applies to all symbols except for variables, which can never be qualified. There is no syntax that would allow a variable to be defined with a qualifier.
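The import forms in this section differ only in which names they add to the environment. A minimal Python sketch, assuming each module is represented as a dictionary of the names it exports; import_names is a hypothetical helper, and it follows the intended design here rather than the current "always qualify" restriction described under Imports.

```python
def import_names(directive, modules):
    """Return the environment entries added by one import directive.

    modules maps a module name to a dict of the names it exports.
    """
    words = directive.split()
    if len(words) not in (2, 4) or words[0] != "import" or (len(words) == 4 and words[2] != "as"):
        raise SyntaxError(f"malformed import: {directive!r}")
    target = words[1]
    alias = words[3] if len(words) == 4 else None
    if target in modules:
        # Importing a whole module: its names must be qualified,
        # either with the module's own name or with its renaming.
        prefix = alias or target
        return {f"{prefix}.{name}": item for name, item in modules[target].items()}
    # Importing one name: it is added unqualified, optionally renamed.
    module, _, name = target.rpartition(".")
    return {alias or name: modules[module][name]}
```

With a module Set exporting new, "import Set" adds only "Set.new", "import Set.new" adds "new", and "import Set.new as new_set" adds "new_set", matching the three examples above.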

Modules

Each file is a module; the file name must match the module name (case insensitive, with - and _ characters stripped). By convention CamelCase is used.
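The file-name rule amounts to comparing normalised names. This Python sketch assumes the file extension has already been removed and only handles a single, undotted module name; normalise and file_matches_module are hypothetical names.

```python
def normalise(name: str) -> str:
    # Compare case-insensitively, ignoring '-' and '_' characters.
    return name.replace("-", "").replace("_", "").lower()


def file_matches_module(file_stem: str, module_name: str) -> bool:
    return normalise(file_stem) == normalise(module_name)
```

Under this rule my_module, my-module and MyModule are all acceptable file names for module MyModule.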

Each module begins with a module declaration.

ModuleDecl := 'module' QualifiedIdent

For example:

module MyModule

Modules may be organised into a hierarchy by placing dots between identifiers. Imagine a set of modules for networking such as:

Net.HTTP
Net.HTTP.Extension   // Some custom extension
Net.Common           // Common code used internally by the Net libraries
Net.FTP
Net.SSH

Each of these will have its fully qualified name in the module declaration at the beginning of its file.

Future work:

Exports

Resources, types and functions (all described below) can be exported from a module by placing the export keyword in front of them:

export
resource MyRes from IO

export
type MyMaybe('x) = Nothing
                 | Some(x : 'x)

export
func myFunc() uses IO {
  print!("Hello from a foreign module!\n")
}

Types may additionally be opaque-exported:

export opaque
type Tree('k, 'v) = Empty
                  | Node(
                      k : 'k,
                      v : 'v,
                      l : Tree('k, 'v),
                      r : Tree('k, 'v)
                    )

This exports the type name but not how it is constructed. The example above exports a tree without exposing that it is a binary tree, and ensures that all the code for keeping the tree correctly ordered or balanced is in a single module.

An exported thing may only refer to types and resources that are also exported. More precisely, a non-opaque exported thing may only refer, in its declaration but not its body, to types and resources that are either exported or opaque-exported.

For example:

resource A from IO

export
resource B from A

export
func foo() uses A {
  ...
}

B and foo cannot be exported because A is not exported.

Likewise:

type A = ...

export opaque
type B = ...

export
type C = C(a : A)

export
type D = D(b : B)

C cannot be exported because A is not exported. D may be exported because B is opaque-exported.
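The export rule can be checked with one pass over the declarations. A minimal Python sketch, assuming each declaration records its export status and the names its declaration (not its body) refers to; the Decl layout and export_errors are hypothetical, and references to other modules are not checked.

```python
from dataclasses import dataclass, field


@dataclass
class Decl:
    name: str
    export: str  # "private", "export" or "opaque"
    # Types and resources named in the declaration itself (not the body).
    refers_to: list = field(default_factory=list)


def export_errors(decls):
    """Return (declaration, referenced name) pairs that break the export rule."""
    by_name = {d.name: d for d in decls}
    errors = []
    for d in decls:
        # Only non-opaque exports are constrained; opaque exports hide their definition.
        if d.export != "export":
            continue
        for ref in d.refers_to:
            target = by_name.get(ref)
            if target is not None and target.export == "private":
                errors.append((d.name, ref))
    return errors
```

Encoding the two examples above, the checker rejects B and foo (they mention the private A) and C (it mentions A), while D is accepted because B is opaque-exported.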

Future work:

Imports

ImportDirective := 'import' QualifiedIdent
                 | 'import' QualifiedIdent 'as' ident

Modules may be imported with an import declaration.

import RBTreeMap
import RBTreeMap as Map

The import directive imports a module. The two lines above add a module name (RBTreeMap or Map, respectively) to the current environment.

For now, all references to symbols from other modules must be module qualified, either with the module’s name (e.g. RBTreeMap) or with a renaming of the module (e.g. Map).

A module cannot be used without an import declaration.

Types

The Plasma type system supports:

  • Algebraic types

  • parametric polymorphism (aka generics)

  • Abstract types

  • Other features may be considered for a later version

  • Type variables begin with a ' sigil.

  • By convention type variables are lower case while type names begin with an uppercase letter.

See also Type system design which reflects more up-to-date ideas.

Type expressions refer to types.

TypeExpr := TypeName ( '(' TypeExpr ( ',' TypeExpr )* ')' )?
          | 'func' '(' ( TypeExpr ( ',' TypeExpr )* )? ')' Uses* RetTypes?
          | '\'' TypeVar

# Uses denotes which resources a function may use.
Uses := 'uses' QualifiedIdent
      | 'uses' '(' QualifiedIdentList ')'
      | 'observes' QualifiedIdent
      | 'observes' '(' QualifiedIdentList ')'

RetTypes := '->' TypeExpr
          | '->' '(' TypeExpr ( ',' TypeExpr )* ')'

TypeName := QualifiedIdent
TypeVar := ident

We can define new types using type definitions

TypeDefinition := ('export' 'opaque'?)?
    'type' ident TypeParams? = OrTypeDefn ( '|' OrTypeDefn )*

TypeParams := '(' '\'' ident ( ',' '\'' ident )* ')'

OrTypeDefn := ConstructorName
            | ConstructorName '(' TypeField ( ',' TypeField )* ')'

TypeField := FieldName ':' TypeExpr
           | TypeExpr         # Not supported

ConstructorName := ident
FieldName := ident

TypeParams is a comma separated list of lowercase identifiers.

TypeField will need lookahead, so for now all fields must be named, but the anonymous name (_) is supported.

TODO: We use vertical bars to separate or types. Vertical bars mean "or" and are used in Haskell, but in C commas (for enums) and semicolons (for unions) are used. Which is best? Mercury uses semicolons as these mean "or" in Mercury.

TODO: We use parens around the arguments of constructors, like Mercury, and because fancy brackets aren’t required. However curly braces would be more familiar to C programmers.

Builtin types

How "builtin" these are varies. Ints are completely builtin and handled by the compiler, whereas a List has some compiler support (for special symbols, and no import is required to say "List(t)") but operations may be via library calls.

  • Int

  • Uint

  • Int8, Int16, Int32, Int64

  • Uint8, Uint16, Uint32, Uint64

  • CodePoint (a Unicode code point)

  • Float (not implemented yet)

  • Array('t)

  • List('t)

  • String (neither a CString nor a list of chars).

  • Function types

These types are implemented in the standard library.

  • CString

  • Map('k, 'v)

  • Set('t)

  • etc…

User types

User defined types support discriminated unions (here a Map is either a Node or Empty), and generics ('k and 'v are type parameters).

type Map('k, 'v) = Node(
                      m_key   : 'k,
                      m_value : 'v,
                      m_left  : Map('k, 'v),
                      m_right : Map('k, 'v)
                  )
                | Empty

TODO: Syntax will probably change. I don’t like , as a separator; I prefer a terminator, or nothing, to match the rest of the language. Curly braces? | is also used as a separator here.

Types may also be defined opaquely, with their details hidden behind module abstraction.

Interfaces

Interfaces are a lot like OCaml modules. They are not like OO classes and only a little bit like Haskell typeclasses.

Interfaces are used to say that some type and/or code behaves in a particular way.

The Ord interface says that values of type Ord.t are totally ordered and provides a generic comparison function for Ord.t.

type CompareResult = LessThan | EqualTo | GreaterThan

interface Ord {
    type t

    func compare(t, t) -> CompareResult
}

t is not a type parameter but Ord itself may be a parameter to another interface, which is what enables t to represent different types in different situations; compare may also represent different functions in different situations.

We can create instances of this interface.

instance ord_int : Ord {
    type t = Int

    func compare(a : Int, b : Int) -> CompareResult {
        if (a < b) {
            LessThan
        } else if (a > b) {
            GreaterThan
        } else {
            EqualTo
        }
    }
}

Note that in this case each member has a definition. This is what makes this an interface instance (plus the different keyword), rather than an (abstract) interface. The importance of this distinction is that interfaces cannot be used by code directly, instances can.

Code can now use this instance.

r = ord_int.compare(3, 4)

Interfaces can also be used as parameter types for other interfaces. Here we define a sorting algorithm interface using an instance (o) of the Ord interface.

interface Sort {
    type t
    func sort(List(t)) -> List(t)
}

instance merge_sort(o : Ord) : Sort {
    type t = o.t
    func sort(l : List(t)) -> List(t) {
        ...
    }
}

merge_sort is an instance (each of its members has a definition), but it cannot be used without passing an argument (an instance of the Ord interface). A list of Ints can now be sorted using:

sorted_list = merge_sort(ord_int).sort(unsorted_list)
Note
This example is somewhat contrived, I think it’d be more convenient for sort to take a higher order parameter. But the example is easy to follow.

merge_sort(ord_int) is an instance expression, and so is ord_int in the example above. Instance expressions will also allow developers to name and reuse specific instances, for example:

instance s = merge_sort(ord_int)
sorted_list = s.sort(unsorted_list)

More powerful expressions may also be added.

Instances can also be made implicit within a context:

implicit_instance merge_sort(ord_int)
sorted_list = sort(unsorted_list)

This is useful when an instance defines one or more operators, as it makes using the interface more convenient. Suitable instances for the basic types such as Int are implicitly made available in this way.

Only one implicit instance for the given interface and types may be used at a time.

Resources

ResourceDefinition := 'export'? 'resource' Ident 'from' QualifiedIdent

This defines a new resource. The resource has the given name and is a child resource of the specified resource. SuperRes is the ultimate resource and is already defined, along with its child resources such as IO. See Handling effects below.

Code

Functions

FuncDefinition := 'export'? 'func' ident '(' ( Param ( ',' Param )* )? ')'
                      Uses* RetTypes? Block

Param := ident ':' TypeExpr
       | '_' ':' TypeExpr
       | TypeExpr                (Only in interfaces)

RetTypes := '->' TypeExpr
         | '->' '(' TypeExpr ( ',' TypeExpr )* ')'

Block := '{' BlockThing* Return? '}'

BlockThing := Statement
            | Definition

Uses is defined above in the type declarations section.

TODO: Probably add support for naming return parameters

TODO: Consider adding optional parens to enclose return parameters.

TODO: More expressions and statements

Code is organised into functions.

A function has the following form.

func name(arg1 : Type1, arg2 : Type2, ...)
        Resources? -> (RetType1, RetType2)
Block

In the future if the types are omitted from a non-exported function’s argument list the compiler will attempt to infer them. For now all types are required.

TODO: Find a way that return parameters can be named. This will change the behaviour of functions WRT having the value of their last statement.

TODO: What if neither the name or type of a return value is specified?

Resources is optional and may contain uses clauses, observes clauses, or both. Each clause is the uses or observes keyword followed by either a single resource name or a parenthesised, comma-separated list of resource names.

The special symbol _ can be used as a parameter to ignore any arguments passed in that position, the type is still enforced.

Note that function bodies may contain definitions. This allows functions to be nested, and in the future other definitions may also be scoped within function bodies.

If the definition is preceded by export then the function is made available to other modules.

Statements

Statement := FuncDefinition
           | VarDeclaration
           | Assignment
           | Call
           | MatchStmt
           | ITEStmt

Nested functions

Plasma supports nested functions, which may also be closures.

var greeting = "Hello "
func hi(name : String) -> String {
    return greeting ++ name ++ "\n"
}

print!(hi("Paul"))

Other than being able to close over values, the only difference from top-level functions is that nested functions do not (yet) support mutual recursion (bug #177).

In the future we also intend to support lambda expressions (bug #165) and partial application (bug #164).

Variable declaration

VarDeclaration := 'var' Ident

This syntax declares a variable without giving it a value. It may be given a value with an assignment later. This is useful if a variable is given a value within branches of an if statement but it needs to be visible outside that statement.

For example:

var variable

Declares a new uninitialised variable.

Assignment

Assignment := Pattern ( ',' Pattern )* '=' TupleExpr

Pattern := Number
         | '[' ']'
         | '[' Pattern '|' Pattern ']'
         | 'var' Ident
         | '_'
         | QualifiedIdent ( '(' Pattern ( ',' Pattern )* ')' )?

The right-hand side (RHS) of an assignment is a series of expressions separated by commas (a TupleExpr). More than one expression is used when there is more than one pattern on the left-hand side. A single expression may also be used when that expression’s arity matches the number of patterns (e.g. a call that returns multiple values).

Plasma is a single assignment language. Variables have two possible states, uninitialised and initialised (aka assigned). Each variable can only be initialised once along any execution path, and must be initialised on each execution path that falls-through (see Environment).

In an assignment a pattern must be irrefutable (it always matches); this means that only the last three syntactic forms of Pattern make sense in an assignment. The other forms may also be used if we allow refutable patterns in some contexts in the future.

Identifiers in the pattern must be one of the following, and are checked in this order:

  • Data constructors, if followed by (.

  • New variables appearing fresh in the pattern (they have the var keyword).

  • Uninitialised variables declared ahead of the pattern.

  • Data constructors (constants).

This is sound because (TODO):

  • The compiler will warn if a programmer shadows a constructor with a new variable.

  • The compiler will warn (lint level) if the programmer didn’t need to separate variable declaration from initialisation.
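The lookup order for identifiers in a pattern can be written as a small decision procedure. This Python sketch assumes the compiler already knows the data constructors in scope and the declared-but-uninitialised variables; classify_ident and its result strings are hypothetical.

```python
def classify_ident(name, followed_by_paren, has_var, uninitialised, constructors):
    """Classify an identifier in a pattern, checking the documented cases in order."""
    # 1. Followed by '(': it must be a data constructor.
    if followed_by_paren:
        if name in constructors:
            return "constructor"
        raise NameError(f"{name} is not a data constructor")
    # 2. Marked with var: a new variable.
    if has_var:
        return "new variable"
    # 3. Declared earlier with var but not yet initialised.
    if name in uninitialised:
        return "uninitialised variable"
    # 4. Otherwise it must be a constant data constructor.
    if name in constructors:
        return "constructor"
    raise NameError(f"{name} is neither an uninitialised variable nor a constructor")
```

Applied to the examples below, x in "Point(x, _) = expr" is an uninitialised variable when "var x" precedes it, and otherwise must be a constant constructor.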

Examples:

variable = expr

Initialise a previously declared variable.

var variable = expr

Declares and initialises a new variable (this is preferred where possible).

var var1, var var2 = expr1, expr2

Both variables are declared, var1 is initialised to the value of expr1 and var2 to expr2.

var var1, var var2 = expr

The expression returns two values, var1 takes the first value and var2 the second.

var div, _ = div_and_quot(7, 5)

The wildcard symbol _ matches everything and is used to ignore some values. The function call returns two values but only the first is captured.

Point(var x, var y) = expr

The expression returns a value built with the data constructor Point (irrefutably). The Point is deconstructed, and x and y are new variables that take their values from it.

You may also use the wildcard _ and other constructors (provided they’re irrefutable) within a pattern.

var x
Point(x, _) = expr

The first statement declares the variable x and the second statement binds it.

Point(x, _) = expr

x is not a variable in this context and therefore it must be a data constructor, and must be matched irrefutably.

_ = close!(file)

Ignores the result of a function call that affects a resource.

Function call

Call := ExprPart1 '!'? '(' Expr ( , Expr )* ')'

Function calls often return values; however, functions that do not return anything can be called as a statement. Such a function only makes sense if it affects a resource, and therefore the call will have a !. However, the grammar and semantics allow calls to functions that have no effect (the compiler will almost certainly optimise these away).

function_name!(arg1, arg2)

Calls may also be expressions (see below); as an expression, a call may still use or observe some resource. However, only one call per statement may observe the same or a related resource; this ensures that effects happen in a clear order.
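Checking the one-call-per-statement rule needs a notion of related resources. This Python sketch assumes resources form a tree rooted at SuperRes (see Resources) and that two resources are related when they are equal or one is an ancestor of the other. That definition of related, and the example resources Time and File, are assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical resource tree: each resource maps to its parent.
PARENT = {"SuperRes": None, "IO": "SuperRes", "Time": "SuperRes", "File": "IO"}


def ancestors(res):
    """Return res followed by every resource above it, ending at SuperRes."""
    chain = []
    while res is not None:
        chain.append(res)
        res = PARENT[res]
    return chain


def related(a, b):
    # Assumption: related means equal, or one is an ancestor of the other.
    return a in ancestors(b) or b in ancestors(a)


def statement_ok(calls):
    """calls holds one collection of resources per call in a single statement."""
    return not any(
        related(a, b)
        for first, second in combinations(calls, 2)
        for a in first
        for b in second
    )
```

Under these assumptions, a statement combining a call on File with a call on IO is rejected, while calls on IO and Time may share a statement.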

Return

Return := 'return' TupleExpr
        | 'return'

For example:

// Return one thing
return expr

// Return two things
return expr1, expr2

// Return nothing
return

A function that returns one or more values must always end in a return statement, or in a branching statement that (indirectly) ends in a return statement on each branch.

TODO: This will need to be relaxed for code that aborts.

TODO: Named returns.

Functions that return nothing may optionally use a return statement; this can be used to implement early return.

Functions and blocks do not have values. This is deliberate to keep functions and expressions semantically separate. This means that the last statement of a block does not have any special significance as it does in some other languages.

Pattern matching

MatchStmt := 'match' Expr '{' Case+ '}'

Case := Pattern '->' Block

Pattern matching is a statement (as well as an expression). Cases are tried in the order they are written; the compiler should provide a warning if a case will never be executed, or if a value is not covered by any case.

var beer
match (n) {
  0 -> {
    beer = "There's no beer!"
  }
  1 -> {
    beer = "There's only one beer"
  }
  var m -> {
    beer = "There are " ++ show(m) ++ " bottles of beer"
  }
}
print!(beer)

If a variable declared outside the match is assigned by one of the cases (like beer) then it must be assigned by every case (see Environment).

Currently either all cases must have a return statement or none of them. TODO: Matches where some cases return and others do not will be added in the future.

Note that a pattern match can bind a variable declared in the outer scope:

var x
match (...) {
  ...
  x -> { ... }
}

// x is now set.

If-then-else

ITEStmt := 'if' Expr Block ( 'else' ElsePart )?
ElsePart := ITEStmt
          | Block

if (expr) {
    statements
} else if (expr) {
    statements
} else {
    statements
}

Note: the parens around the condition are optional.

There may be zero or more else if parts.

Plasma’s single-assignment rules imply that if the "then" part of an if-then-else binds a non-local variable, then there must be an else part that also binds the variable (or does not fall-through). Else branches aren’t required if the then branch does not fall-through or does not bind anything (it may have an effect).
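The branch rule above, and the matching rule for match cases, can be checked by comparing what each branch binds. This Python sketch models each branch as the set of non-local variables it binds plus whether it falls through; an if without an else counts as a second branch that binds nothing and falls through. It illustrates the rule only and is not the compiler’s algorithm; bound_after_branches is a hypothetical name.

```python
def bound_after_branches(branches):
    """branches: list of (bound_vars, falls_through) pairs, one per branch.

    Returns the set of variables bound after the if-then-else or match,
    or raises ValueError if the falling-through branches disagree.
    """
    falling = [set(bound) for bound, falls_through in branches if falls_through]
    if not falling:
        # No branch falls through, so the code after the statement is unreachable.
        return set()
    first = falling[0]
    for bound in falling[1:]:
        if bound != first:
            raise ValueError(f"branches bind different variables: {sorted(first ^ bound)}")
    return first
```

The beer example passes because every case binds beer, while an if whose then branch binds x and which has no else is rejected.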

Loops

Note
Not implemented yet.
Note
I’m seeking feedback on this section in particular.
// Loop over both structures in a pairwise way.
for [var x <- xs, var y <- ys] {
    // foo0 and foo form an accumulator starting at 0.  The value of foo
    // becomes the value of foo0 in the next iteration.
    accumulator foo0 foo initial 0

    // The loop body.
    var z = f(x, y)
    foo = foo0 + bar(x)

    // This loop has three outputs.  "list" and "sum" are names of
    // reductions.  Reductions are instances of the reduction
    // interfaces.  They "reduce" the values produced by each iteration
    // into a single value.
    output zs = list of z
    output sum = sum of x
    // foo is not visible outside the loop, an output is required to
    // expose it.  value is a keyword, it is handled specially and
    // simply takes the last value encountered.
    output foo_final = value of foo
}
Note
The accumulator syntax will probably change after the introduction of some kind of state variable notation.

TODO: Introduce a more concise syntax for one-liners and expressions, like list comprehensions (see Generators below).

The loop will iterate over corresponding items from multiple inputs. When they’re not of equal length the loop will stop after the shortest one is exhausted. This decision allows them to be used with a mix of finite and infinite sequences.
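The pairwise iteration, accumulator and outputs of the example loop can be approximated in Python, where zip also stops at the shortest input. This is a sketch of the proposed semantics only, since the loop is not implemented yet; f and bar are hypothetical stand-ins for the functions in the example, and foo_final being None for an empty loop is an assumption.

```python
from itertools import count


def f(x, y):
    return x * y    # hypothetical stand-in for f in the example


def bar(x):
    return x + 1    # hypothetical stand-in for bar in the example


def example_loop(xs, ys):
    foo0 = 0                      # accumulator foo0 foo initial 0
    zs, total = [], 0
    foo_final = None              # stays None if there are no iterations
    for x, y in zip(xs, ys):      # pairwise; stops when the shorter input runs out
        z = f(x, y)
        foo = foo0 + bar(x)
        zs.append(z)              # output zs = list of z
        total += x                # output sum = sum of x
        foo_final = foo           # output foo_final = value of foo
        foo0 = foo                # foo becomes foo0 in the next iteration
    return zs, total, foo_final


# A finite list paired with an infinite sequence (10, 11, 12, ...).
result = example_loop([1, 2, 3], count(10))
```

Pairing the finite list with the infinite count(10) terminates after three iterations, which is the point of stopping at the shortest input.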

Looping over the Cartesian combination of all items should also be supported (syntax not yet defined, maybe use &). This is equivalent to using nested loops in many other languages.

Valid input structures are: lists, arrays and sequences. Sequences are coroutines and therefore can be used to iterate over the keys and values of a dictionary, or generate a list of numbers.

TODO: Possibly allow this to work on keys and values in dictionaries. If the keys are unmodified during the loop then the output dictionary can be rebuilt more easily, its structure doesn’t need to change. Lua has the ability to require keys to be sorted, or to drop this requirement.

The output declarations include a reduction. This is how the loop should build the result.

TODO: Reduction isn’t a good word for it, since the output type can be either a scalar or a vector.

The reduction’s result can be completely different from the type of any of the inputs. For example, this loop builds an array from a list (or other ADT) using the array reduction:

for [var x <- xs] {
    var y = f(x)
    output ys = array of y
}

Many reductions will be possible: array, list, sequence, min, max, sum, product, concat_list. Developers will be able to create their own as these are interfaces.
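Since reductions are meant to be interfaces, one way to picture them is as a value with an initial state, a step function and a finishing function. This Python sketch uses that shape; the Reduction class, its fields and the tuple standing in for an array are assumptions, not a proposed Plasma interface.

```python
from dataclasses import dataclass
from typing import Any, Callable


def _identity(state):
    return state


def _append(acc, value):
    acc.append(value)
    return acc


@dataclass(frozen=True)
class Reduction:
    initial: Callable[[], Any]                # builds the state before the first iteration
    step: Callable[[Any, Any], Any]           # folds one iteration's value into the state
    finish: Callable[[Any], Any] = _identity  # turns the final state into the output


list_of = Reduction(initial=list, step=_append)
array_of = Reduction(initial=list, step=_append, finish=tuple)  # tuple stands in for an array
sum_of = Reduction(initial=lambda: 0, step=lambda acc, v: acc + v)
max_of = Reduction(initial=lambda: None, step=lambda acc, v: v if acc is None or v > acc else acc)


def run_reduction(reduction, values):
    state = reduction.initial()
    for value in values:
        state = reduction.step(state, value)
    return reduction.finish(state)
```

Because the output type is decided by the reduction rather than the input, the same list of values can become a list, an array, a sum or a maximum.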

Loops are implemented in terms of coroutines. Coroutines produce the values for the inputs, and the loop body and coroutines build the values of the outputs (list and sum are coroutines above). Coroutines offer the most flexibility, as some of their state is kept on the stack.

Simpler implementations should be used as an optimisation when it is possible. In these cases some loops may be optimised to calls to map or foldl, or even simpler inline code.

Auto-parallelisation (a future goal) will work better with reductions that are known to be either:

  • Order independent

  • Associative / commutative, but whose input type is the same as the output

  • Mergeable, with a known identity value.
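The associative and mergeable cases are what make splitting work possible: chunks can be reduced independently and the partial results merged. A Python sketch, assuming the input type equals the output type and using a sequential list of chunks to stand in for parallel workers; chunked_reduce is a hypothetical name.

```python
from functools import reduce


def chunked_reduce(values, identity, merge, chunk_size=2):
    """Reduce each chunk independently, then merge the partial results.

    Correct whenever merge is associative and identity is its identity value;
    commutativity is not needed because chunk order is preserved.
    """
    values = list(values)
    chunks = [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]
    # In a parallel implementation each chunk could go to a different worker.
    partials = [reduce(merge, chunk, identity) for chunk in chunks]
    return reduce(merge, partials, identity)
```

String concatenation is associative but not commutative, so it still gives the right answer here because chunks are merged in order.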

Accumulators are implemented more directly (not coroutines). However they require the iterations to be processed in a specific order and may inhibit parallelisation. A dependency analysis on the body and separating out the code for each accumulator may mitigate this, especially if it can be combined with the same analyses as reductions above.

TODO: Consider allowing for loops as expressions, maybe a simplified case. This will be similar to a list comprehension.

Note that Plasma’s for loops may be similar to some languages’ query syntax, like LINQ. TODO: Look there for other ideas.

TODO: skip statements. A skip statement is like the opposite of the where part in some languages’ list comprehensions, but perhaps more flexible, like C’s continue statement. Technically if we can support this then we can also support break, but I don’t like it because it doesn’t encourage a preferred style. Furthermore, if we go this far it’s a simple step to use any generator (below) with break to create something as general as a while loop. It may even look very similar to a while loop with the right sugar.

TODO: Consider also the "scan" or "search" loop pattern, where once we find what we’re looking for we break, maybe potentially removing the item from a collection?

Is filter part of the scan pattern or the fold pattern?

Expressions

Expressions are broken into two parts. This allows us to parse call expressions properly, with the correct precedence and without a left recursive grammar. Binary operators are described as a left recursive grammar, but are not implemented this way, their precedence rules are documented below.

TupleExpr := Expr ( ',' Expr )*

Expr := 'match' Expr '{' (Pattern '->' TupleExpr)+ '}'
      | 'if' Expr 'then' TupleExpr 'else' TupleExpr
      | Expr BinOp Expr
      | UOp Expr
      | ExprPart1 '!'? '(' Expr ( , Expr )* ')'         % A call or
                                                        % construction
      | ExprPart1 '[' '-'? Expr ( '..' '-'? Expr )? ']' % array access
      | ExprPart1

ExprPart1 := '(' Expr ')'
           | '[' ListExpr ']'
           | '[:' TupleExpr? ':]'       # An array
           | QualifiedIdent             # A value
           | Const                      # A constant value

BinOp := '+'
       | '-'
       | '*'
       | '/'
       | '%'
       | '++'
       | '>'
       | '<'
       | '>='
       | '<='
       | '=='
       | '!='
       | 'and'
       | 'or'

UOp := '-'      # Minus
     | 'not'    # Logical negation

UOp operators have higher precedence than BinOp. BinOp precedence is as follows, from tightest to loosest: group 1: * / %, group 2: + -, group 3: < > <= >= == !=, groups 4 to 7: and, or, ++ and ,. See [precedence].
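The three clearly listed groups can be parsed with precedence climbing. This Python sketch covers only groups 1 to 3, assumes binary operators are left-associative, works on a ready-made token list and ignores unary minus; none of it is taken from the Plasma parser.

```python
# Groups from the list above; a lower number binds more tightly.
GROUP = {"*": 1, "/": 1, "%": 1,
         "+": 2, "-": 2,
         "<": 3, ">": 3, "<=": 3, ">=": 3, "==": 3, "!=": 3}
LOOSEST = max(GROUP.values())


def parse(tokens):
    """Parse operand/operator tokens into nested (op, left, right) tuples."""
    pos = 0

    def operand():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] in GROUP:
            raise SyntaxError("expected an operand")
        pos += 1
        return tokens[pos - 1]

    def expr(max_group):
        # Parse operands joined by operators whose group is at most max_group.
        nonlocal pos
        left = operand()
        while pos < len(tokens) and tokens[pos] in GROUP and GROUP[tokens[pos]] <= max_group:
            op = tokens[pos]
            pos += 1
            # Left associativity (assumed): the right operand only takes tighter operators.
            left = (op, left, expr(GROUP[op] - 1))
        return left

    tree = expr(LOOSEST)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree
```

For example, 1 + 2 * 3 < 7 groups as ((1 + (2 * 3)) < 7), and 8 - 3 - 2 groups as ((8 - 3) - 2).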

Lists have the following syntax (within square brackets)

ListExpr := ε      # empty
          | Expr ( ',' Expr )* ( '|' Expr )?

Examples of lists are:

// The empty list
[]

// A cons cell
[ head | tail ]

// A list: 1, 2 and 3 are "consed" onto the empty list.
[ 1, 2, 3 ]

// Consing multiple items at once onto a list.
[ 1, 2, 3 | list ]

Array elements may be accessed by subscripting the array, e.g. a[3] retrieves the 3rd element (subscripts are 1-based). A dash before the subscript expression counts backwards from the end of the array: a[-2] is the second last element. This syntax currently clashes with unary minus and so is not yet implemented. Array slices will use the .. token and are also unimplemented.
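The subscript rule maps onto zero-based indexing as follows. This Python sketch assumes a[-1] is the last element (consistent with a[-2] being the second last) and that a[0] is an error; subscript is a hypothetical name, and slices are left out because their semantics are not specified yet.

```python
def subscript(array, i):
    """Return array[i] using the 1-based, negative-from-the-end subscripts described above."""
    n = len(array)
    if 1 <= i <= n:
        return array[i - 1]      # a[1] is the first element
    if -n <= i <= -1:
        return array[n + i]      # a[-1] is the last element, a[-2] the second last
    raise IndexError(f"subscript {i} out of range for an array of length {n}")
```

For a four-element array, subscripts 3 and -2 name the same element.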

TODO: Arrays may also be typed and subscripted to work with a particular enum (See the Ada programming language). This should include a different range (maybe dynamic) than the full enum’s range.

Any control-flow statement is also an expression.

x = if (...) then expr else expr

Or

x = match (expr) {
  Leaf(var k, var v) -> ...
  Node(var l, var k, var v, var r) -> ...
}

Streams/Generators

TODO: This whole section

Plasma will support coroutines that can be used as generators. These may be useful for list/array comprehensions, if we choose to add those, and also to support the loops above. Some things generators should support are:

  • Generate a sequence from the items of an enum type (see Ada).

  • Generate a sequence from integers / floats with some step (a special case of enums)

  • Generate a shortened sequence, such as 7..21 or Monday..Friday

  • Use guards to select which items are included.

  • Create generators from for loops, enabling the use of an accumulator. This will make them almost like list comprehensions except as statements.

  • Consider syntax sugar when the generator is a function:

var array = [\x -> case x of
                Sat..Sun  -> 2
                _         -> 8
             for x <- enum(Days)]

// could be:
var array = [Sat..Sun  -> 2
             others    -> 8]

This hides both the lambda expression and the pattern match. Note that lambda and generator syntax are not yet specified, so this may look different in practice, or may not be implemented at all.
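The array sketch above can be approximated with a Python generator, which plays the role of the proposed coroutines. The Days enum and the enum_range helper are hypothetical; enum_range yields an inclusive sub-range such as Sat..Sun, and the list comprehension applies the 2 / 8 mapping from the example.

```python
from enum import Enum


class Days(Enum):
    """Hypothetical enum standing in for Days in the sketch above."""
    Mon = 1
    Tue = 2
    Wed = 3
    Thu = 4
    Fri = 5
    Sat = 6
    Sun = 7


def enum_range(first, last):
    """Yield the members of first's enum from first to last inclusive (like Sat..Sun)."""
    for member in type(first):
        if first.value <= member.value <= last.value:
            yield member


weekend = set(enum_range(Days.Sat, Days.Sun))
array = [2 if day in weekend else 8 for day in Days]  # Sat..Sun -> 2, others -> 8
```

Iterating an Enum class in Python yields members in definition order, which is why the result lines up with Mon..Sun.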

Pragmas

Pragma := 'pragma' Ident '(' PragmaArgs? ')'

PragmaArgs := PragmaArg (',' PragmaArg)*

PragmaArg := String

Pragmas provide a way for the programmer to communicate something "out of band" to the compiler or other tools. This is usually not part of the program’s meaning, but concerns how the program should be interpreted, for example which library to include or how to compile something. Pragmas take the form above, where the identifier is the name of the pragma; the compiler or other tools use this name as a first step in interpreting the pragma.

Pragmas will generally have the form:

pragma Verb(Noun0, Noun1 ... NounN)

There may be any number of nouns including zero. A Plasma implementation will define what verbs are meaningful and which nouns (if any) are meaningful for that verb. A Plasma implementation should issue a warning but must not issue an error (except for a warnings-as-errors mode) for a pragma it doesn’t understand.

This Plasma implementation understands the following pragma:

pragma foreign_include(String)

This states that the foreign code elsewhere in this file requires the foreign (C language) header file named by the string literal.
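The warning policy for unknown pragmas can be sketched in Python. The check_pragma function, the PragmaError exception and the warnings_as_errors flag are hypothetical names; the sketch only models the rule that an unknown pragma produces a warning, and an error only in a warnings-as-errors mode.

```python
import warnings

# The only pragma this implementation understands, per the text above.
KNOWN_PRAGMAS = {"foreign_include"}


class PragmaError(Exception):
    """Raised only when warnings are being treated as errors."""


def check_pragma(verb, nouns, warnings_as_errors=False):
    """Return True if the pragma is understood; otherwise warn (or raise in warnings-as-errors mode)."""
    if verb in KNOWN_PRAGMAS:
        return True
    message = f"unknown pragma {verb}({', '.join(nouns)}) ignored"
    if warnings_as_errors:
        raise PragmaError(message)
    warnings.warn(message)
    return False
```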

Ideas

These are just ideas at this stage; they are probably bad ideas.

If a multi-return expression is used as a sub-expression in another context, then the enclosing expression is in turn duplicated, once per returned value.

var x, y = multi_value_expr + 3

is

var x0, y0 = multi_value_expr
var x = x0 + 3
var y = y0 + 3

Therefore calls involved in these expressions must not "use resources".
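This idea can be modelled in Python to show why resource-using calls are a problem. In the sketch, multi_value_expr, effectful and duplicate are hypothetical; duplicate evaluates the enclosing expression once per value, so any call inside that expression also runs once per value.

```python
calls = []  # records each time the hypothetical effectful call runs


def multi_value_expr():
    """A hypothetical expression returning two values."""
    return 1, 2


def effectful():
    """A hypothetical call with an observable side effect."""
    calls.append("effectful")
    return 10


def duplicate(enclosing, values):
    """Evaluate the enclosing expression once per value, as the desugaring above does."""
    return tuple(enclosing(v) for v in values)


x, y = duplicate(lambda v: v + 3, multi_value_expr())            # x = 4, y = 5
p, q = duplicate(lambda v: v + effectful(), multi_value_expr())  # effectful() runs twice
```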

Another idea is that a multiple-return expression used as a function argument supplies as many arguments as the number of values it returns. We probably won’t do this.

... = bar(foo(), z)

Is the same as

var x, y = foo()
... = bar(x, y, z)
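Python’s * argument unpacking behaves much like this proposal, so it gives a quick way to see the intended equivalence. foo and bar are hypothetical stand-ins for the functions above.

```python
def foo():
    """A hypothetical function returning two values."""
    return 1, 2


def bar(x, y, z):
    return (x, y, z)


z = 3
spread = bar(*foo(), z)  # like bar(foo(), z) under this proposal
x, y = foo()
explicit = bar(x, y, z)  # the explicit form
```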

Handling effects (IO, destructive update)

Plasma is a pure language, so it needs a way to handle effects like IO and destructive update; this mechanism is called resources. A function that uses a resource (such as print()) may only be called from functions that declare that they use that resource. This means that a callee cannot use a resource that its caller doesn’t expect (resource usage is transitive), and anyone looking at a function’s signature can tell that it might use a resource.

A resource usage declaration looks like:

func main() -> Int uses IO

Here main() declares that it uses (technically, may use) the IO resource. Resources can be either used or observed, and a function may use or observe any number of resources (decided statically). An observed resource may be read but is never updated; a used resource may be read or updated. This distinction allows two observations of a resource to commute (code may be rearranged during optimisation), but two uses of a resource may not commute.

Developers may declare new resources; the standard library will provide some resources, including the IO resource. Examples of IO’s children might be Filesystem and Time, and Filesystem might have children for open files (WIP), although these have not all been decided or implemented.

A call is valid if:

                      Callee is Pure   Callee may Observe   Callee may Use
Caller is Pure              Y                  N                  N
Caller may Observe          Y                  Y                  N
Caller may Use              Y                  Y                  Y

You’ll find that this is very intuitive. It’s shown in a table for completeness.
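For a single resource, the table can be read as an ordering PURE < OBSERVE < USE: a call is valid when the caller may do at least as much as the callee. The Python sketch below encodes that reading; the Access enum and call_is_valid are hypothetical names, and the sketch ignores the resource hierarchy.

```python
from enum import IntEnum


class Access(IntEnum):
    """What a function may do with a resource, ordered from least to most."""
    PURE = 0     # neither observes nor uses the resource
    OBSERVE = 1  # may read the resource
    USE = 2      # may read or update the resource


def call_is_valid(caller, callee):
    """A call is valid when the caller may do at least as much as the callee."""
    return caller >= callee
```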

Resource hierarchy

Resources form a hierarchy (not yet defined). For a call to be valid, either the resource or its parent must be available in the caller. For example, if mkdir() uses the Filesystem resource, which is a child of IO, then any caller that uses IO can call mkdir().
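A Python sketch of the hierarchy check follows. The PARENT mapping and the may_call function are hypothetical, and the sketch ignores the observe/use distinction. The text only mentions a resource’s parent; walking all ancestors (so that a caller using IO could call something that uses a grandchild of IO) is an assumption.

```python
# Hypothetical encoding of the hierarchy: each resource maps to its parent.
# Filesystem is the example from the text; Time and Environment are the
# builtin children of IO listed under Builtins.
PARENT = {"IO": None, "Filesystem": "IO", "Time": "IO", "Environment": "IO"}


def self_and_ancestors(resource):
    """Yield the resource, then its parent, grandparent, and so on."""
    while resource is not None:
        yield resource
        resource = PARENT[resource]


def may_call(caller_resources, callee_resource):
    """Valid if the caller declares the callee's resource or one of its ancestors."""
    return any(r in caller_resources for r in self_and_ancestors(callee_resource))
```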

Temporary resources (NIY)

Some resources can be created and destroyed. Rather than always being part of their parent (as Filesystem is always part of IO), they are subsumed by their parent. For example, an array uses some memory as its resource; that memory is allocated when the array is initialised and freed when it goes out of scope (the array is unique). If the memory resource is created and destroyed within the same function, that function’s callers do not need a uses declaration for it. Memory, and possibly some other resources, are special cases.

Resources in statements

Every call that uses or observes a resource must have the ! suffix. For example:

    print!("Hello world\n")

This warns anyone reading the code that something happens or changes, or might be observed to have happened or changed. This is the entire reason for having it in the language; it serves no other function, but the compiler makes sure that it is present on every call that either uses or observes something.

Multiple calls with ! may be used in the same statement, provided that their resources do not overlap, or that they all observe the resource rather than modifying it. (Note that this rule is being debated at the moment.)
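The statement rule can be modelled in Python as a pairwise check. The hierarchy, the lineage and overlaps helpers, and statement_ok are hypothetical; "overlap" is assumed to mean the same resource or an ancestor/descendant pair, and "all observing" is read pairwise.

```python
from itertools import combinations

# Hypothetical hierarchy, as in the Resource hierarchy section: resource -> parent.
PARENT = {"IO": None, "Filesystem": "IO", "Time": "IO", "Environment": "IO"}


def lineage(resource):
    """Return the resource followed by all of its ancestors."""
    chain = []
    while resource is not None:
        chain.append(resource)
        resource = PARENT[resource]
    return chain


def overlaps(a, b):
    """Two resources overlap if they are equal or one is an ancestor of the other."""
    return a in lineage(b) or b in lineage(a)


def statement_ok(calls):
    """calls holds one (resource, 'use' | 'observe') pair per ! call in a statement."""
    for (r1, k1), (r2, k2) in combinations(calls, 2):
        if overlaps(r1, r2) and not (k1 == k2 == "observe"):
            return False
    return True
```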

Commutativity of resources

Optimisation may cause code to be executed in a different order than written. The following reorderings of two related (ancestor/descendant) resources are legal.

            None   Observe   Use
None         Y       Y        Y
Observe      Y       Y        N
Use          Y       N        N

Non-related resources may be reordered freely.
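The reordering table is symmetric, so it reduces to a short predicate. The Python function below is a hypothetical encoding: the access kinds are strings, and related is assumed to be True for the same resource as well as for an ancestor/descendant pair.

```python
def may_reorder(first, second, related):
    """first and second are 'none', 'observe' or 'use'.

    related is True when the two resources are the same or ancestor/descendant.
    """
    if not related:
        return True  # non-related resources may be reordered freely
    if "none" in (first, second):
        return True  # the None row and column are all Y
    return first == second == "observe"  # two observations commute; any use does not
```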

Higher order code

This aspect of Plasma is under consideration and may change in the future. The concerns are:

  • Higher order functions need to handle resources, otherwise their usefulness is reduced.

  • Resource usage from such code needs to be safe (with respect to the order of operations).

  • We want to encourage polymorphism here, otherwise people will write higher-order abstractions that can’t be used with resources.

  • We’d like code that isn’t intended to be used with resources, but ought to be resource-capable anyway, to remain concise.

Current behaviour (WIP)

Higher-order values may have uses/observes declarations (added to their type); values without such declarations are pure. All higher-order calls have the usual ! sigil, and the statement rules apply.

Map over list looks like:

func map(f : 'a -> 'b uses r, l : List('a)) -> List('b) uses r {
    switch (l) {
      case []       -> {
        return []
      }
      case [var x0 | var xs0] -> {
        var x = f!(x0)
        var xs = map!(f, xs0)
        return [x | xs]
      }
    }
}

Note that the calls to f and map must be in separate statements.
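One way to see why the calls to f and map are sequenced is to model the resource as a state value that is threaded through each call explicitly. This is only a Python model: Plasma does not expose resources as values, map_uses and double_and_log are hypothetical, and the sketch loops over a Python list rather than recursing over a cons list.

```python
def map_uses(f, xs, r):
    """Resource-polymorphic map: f receives and returns the resource state r.

    Threading r through each call in order is what makes the calls to f
    sequential, just as the separate ! statements do in the Plasma version.
    """
    results = []
    for x in xs:
        y, r = f(x, r)  # each call sees the state left by the previous one
        results.append(y)
    return results, r


def double_and_log(x, log):
    """A hypothetical effectful f: doubles x and appends x to a log 'resource'."""
    return x * 2, log + [x]


values, log = map_uses(double_and_log, [1, 2, 3], [])
```

A pure f simply passes the state through unchanged, which is the sense in which one map can serve both pure and resource-using callers.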

This has the disadvantages that it is less concise, and that people who aren’t planning to use resources won’t write resource-capable code. If that code is in a library, it may be annoying to modify when it later needs to be used with a resource.

Note
This is almost implemented; polymorphic resources are not yet implemented.

Other proposals

There are several other ideas and their combinations that may help.

  • All higher-order code implicitly uses resources; a function like map therefore also uses that resource, since it contains such calls. When a higher-order value doesn’t mention resources, it is implied to use some polymorphic resource set. To say that no resources are involved and that ordering is not important, the pure keyword may be used in place of a uses or observes clause. Type inference may help make this easier.

  • Require all higher-order code to handle resources; users may feel that the compiler is being overly pedantic.

  • Higher-order calls are exempt from the one-resource-per-statement rule, making the code more concise (it still includes a !). This requires that either:

    • expressions have a well-ordered declarative semantics, or

    • resources are declared as having don’t-care ordering so they can be placed in the same statement.

Linking to and storing as data (NIY)

Linking a resource with a real piece of data, such as a file descriptor, is highly desirable. Likewise putting such data inside a structure to be used later, such as a pool of warmed-up database connections, will be necessary.

There are a couple of ideas. We could add information to the types to say that they are resources and what their parent resource type is, so that a variable of that type can stand in for the resource.

type Fd =
    resource from Filesystem

func write(Fd, ...) uses Fd

Builtins

These builtin operations are always available; they don’t need to be imported from a module.

type Maybe('v)

A maybe type, defined as:

type Maybe('v) = Some('v)
               | None

Integers

type Int

A signed two’s complement integer. Its width is implementation-defined but at least 32 bits.

Int + Int -> Int

Addition (also func Builtin.int_add(a : Int, b : Int) -> Int)

Int - Int -> Int

Subtraction (also func Builtin.int_sub(a : Int, b : Int) -> Int)

Int * Int -> Int

Multiplication (also func Builtin.int_mul(a : Int, b : Int) -> Int)

Int / Int -> Int

Division (also func Builtin.int_div(a : Int, b : Int) -> Int)

Int % Int -> Int

Modulo/Remainder (#378, which one?) (also func Builtin.int_mod(a : Int, b : Int) -> Int)
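Issue #378 leaves open whether % is a modulo or a remainder. The two agree for non-negative operands and differ when an operand is negative, as this Python sketch shows. Python’s own % is the floored modulo; remainder below rounds the quotient toward zero as C99 does. Which behaviour Plasma adopts, and how Int / rounds, is not specified here; trunc_div, remainder and modulo are hypothetical names.

```python
def trunc_div(a, b):
    """Integer division rounding toward zero (as in C99 and later)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def remainder(a, b):
    """Truncated remainder: the result takes the sign of the dividend a."""
    return a - b * trunc_div(a, b)


def modulo(a, b):
    """Floored modulo: the result takes the sign of the divisor b (Python's %)."""
    return a % b
```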

- Int -> Int

Unary minus (prefix operator, takes only one argument). (also func Builtin.int_minus(a : Int) -> Int)

func Builtin.int_lshift(a : Int, b : Int) -> Int

Left shift a by b bits.

func Builtin.int_rshift(a : Int, b : Int) -> Int

Right shift a by b bits.

func Builtin.int_and(a : Int, b : Int) -> Int

Bitwise and.

func Builtin.int_or(a : Int, b : Int) -> Int

Bitwise or.

func Builtin.int_xor(a : Int, b : Int) -> Int

Bitwise exclusive-or.

func Builtin.int_comp(Int) -> Int

One’s complement (flip all the bits).
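The bitwise builtins are easiest to reason about at a fixed width. The Python sketch below models some of them on 32-bit two’s complement values; the width, wrapping on left shift, and sign-extending right shift are all assumptions, since the reference only promises at least 32 bits and does not say how shifts treat the sign bit. The function names mirror the builtins but are Python stand-ins.

```python
WIDTH = 32  # assumed; the reference only says "at least 32 bits"
MASK = (1 << WIDTH) - 1


def to_signed(value):
    """Keep the low WIDTH bits of value and read them as two's complement."""
    value &= MASK
    return value - (1 << WIDTH) if value >> (WIDTH - 1) else value


def int_comp(a):
    """One's complement: flip every bit (for two's complement, ~a == -a - 1)."""
    return to_signed(~a)


def int_lshift(a, b):
    """Left shift, wrapping to WIDTH bits (wrapping is an assumption)."""
    return to_signed(a << b)


def int_rshift(a, b):
    """Right shift; sign-extending (arithmetic) behaviour is an assumption."""
    return to_signed(a) >> b
```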

func int_to_string(Int) -> String

Return a string representation of the number. No nice formatting is attempted.

Future work:

  • TODO: Use interfaces to provide many of these operations to a group of types. For example, many integer operations will apply to all numbers. The same applies to the relational operators below.

  • TODO: Once there is an Int module move the builtin Int functions to it.

  • TODO: The bitwise functions should be for sized integers only.

Bools

type Bool

Is defined as:

type Bool = False
          | True

Int > Int -> Bool

Greater than (also func Builtin.int_gt(a : Int, b : Int) -> Bool)

Int < Int -> Bool

Less than (also func Builtin.int_lt(a : Int, b : Int) -> Bool)

Int >= Int -> Bool

Greater than or equal to (also func Builtin.int_gteq(a : Int, b : Int) -> Bool)

Int <= Int -> Bool

Less than or equal to (also func Builtin.int_lteq(a : Int, b : Int) -> Bool)

Int == Int -> Bool

Equal (also func Builtin.int_eq(a : Int, b : Int) -> Bool)

Int != Int -> Bool

Not-equal (also func Builtin.int_neq(a : Int, b : Int) -> Bool)

Bool and Bool -> Bool

Logical and (also func Builtin.bool_and(a : Bool, b : Bool) -> Bool)

Bool or Bool -> Bool

Logical or (also func Builtin.bool_or(a : Bool, b : Bool) -> Bool)

not Bool -> Bool

Unary not (prefix operator, takes only one argument) (also func Builtin.bool_not(a : Bool) -> Bool)

func bool_to_string(Bool) -> String

Return one of the strings "True" or "False".

Strings

type String

A character string.

type CodePoint

A Unicode code point (see https://unicode.org/glossary/#code_point).

type CodepointCategory

The general category of a Unicode codepoint (see https://unicode.org/glossary/#general_category). Is defined as:

type CodepointCategory = Whitespace
                       | Other

type StringPos

A position within a string. A StringPos always points to an edge between characters, before the first character, or after the last. This makes substring operations clearer.

String ++ String -> String

String concatenation (also func Builtin.string_concat(a : String, b : String) -> String).

func codepoint_category(CodePoint) -> CodepointCategory

Return the general category of a code point.

func codepoint_to_string(CodePoint) -> String

Return a string containing only this codepoint.

func codepoint_to_number(CodePoint) -> Int

Return the codepoint number for this codepoint.

func Builtin.int_to_codepoint(Int) -> CodePoint

Make a codepoint from this integer.

func string_begin(String) -> StringPos

Return a StringPos of before the beginning of the string.

func string_end(String) -> StringPos

Return a StringPos of after the end of the string.

func string_substring(StringPos, StringPos) -> String

Return the string between the two StringPos parameters. The two parameters must have been created from the same string (runtime checked).

func string_equals(String, String) -> Bool

Return True if the strings are equal.

func strpos_forward(StringPos) -> StringPos

Return a StringPos moved one character forward. The StringPos must not be at the end of the string (Runtime check).

func strpos_backward(StringPos) -> StringPos

Return a StringPos moved one character backward. The StringPos must not be at the beginning of the string (Runtime check).

func strpos_next(StringPos) -> Maybe(CodePoint)

Return the next char in the string after StringPos. If StringPos is at the end of the string then None is returned.

func strpos_prev(StringPos) -> Maybe(CodePoint)

Return the previous char in the string before StringPos. If StringPos is at the beginning of the string then None is returned.
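To make the edge-based StringPos model concrete, here is a small Python model of the string functions above. It is not Plasma’s implementation: a StringPos is assumed to be a (string, index) pair where index counts code points, Python’s str stands in for String and one-character strings for CodePoint, and None stands in for Maybe’s None. Raising ValueError for the runtime checks is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StringPos:
    """An edge in a string: 0 is before the first code point, len(string) after the last."""
    string: str
    index: int


def string_begin(s: str) -> StringPos:
    return StringPos(s, 0)


def string_end(s: str) -> StringPos:
    return StringPos(s, len(s))


def strpos_forward(p: StringPos) -> StringPos:
    if p.index >= len(p.string):
        raise ValueError("strpos_forward: already at the end of the string")
    return StringPos(p.string, p.index + 1)


def strpos_backward(p: StringPos) -> StringPos:
    if p.index <= 0:
        raise ValueError("strpos_backward: already at the beginning of the string")
    return StringPos(p.string, p.index - 1)


def strpos_next(p: StringPos) -> Optional[str]:
    """The code point after p, or None (standing in for Maybe's None) at the end."""
    return p.string[p.index] if p.index < len(p.string) else None


def strpos_prev(p: StringPos) -> Optional[str]:
    """The code point before p, or None at the beginning."""
    return p.string[p.index - 1] if p.index > 0 else None


def string_substring(start: StringPos, end: StringPos) -> str:
    # Object identity approximates "created from the same string".
    if start.string is not end.string:
        raise ValueError("string_substring: positions come from different strings")
    return start.string[start.index:end.index]


s = "h\u00e9llo"
begin, end = string_begin(s), string_end(s)
after_two = strpos_forward(strpos_forward(begin))
```

Because positions sit between code points, a substring is simply the span between two edges, with no off-by-one adjustment.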

Lists

type List('t) = ['t | List('t)]
              | []

The list data type. It is defined recursively as either a single element (the head) added to the front of another list (the tail), or the empty list. The fields of the cons constructor (head and tail) have names internal to the compiler. TODO: maybe expose them in a list module if fields are permitted to be used as functions in the future.

[]

The empty list. (also func Builtin.list_nil() -> List('t)).

[x | xs]

x added to the front of xs. (also func Builtin.list_cons('t, List('t)) -> List('t)).

Misc

resource IO

The uber-resource; it covers all potential effects.

resource Time from IO

A resource to query the current time with gettimeofday.

resource Environment from IO

A resource to set the environment with setenv.

type IOResult('t)

A result type for many IO operations that may return end-of-file.

type IOResult('t) = Ok('t)
                  | EOF

func print(String) uses IO

Write the string to standard out.

func readline() uses IO -> IOResult(String)

Read a line from standard input; the newline character is not included in the result. Aborts the program (TODO) on error.

func Builtin.set_parameter(String, Int) uses IO -> Bool

Set a parameter for the runtime system. There are currently no settable parameters, so this always returns False.

func Builtin.get_parameter(String) observes IO -> Bool, Int

Get a parameter. If the parameter exists, returns True and its value; otherwise returns False and 0. The parameters are:

heap_usage

The amount of memory used in the heap.

heap_collections

The number of garbage collections that have occurred so far.

func setenv(String, String) uses Environment -> Bool

Calls setenv on POSIX.

func Builtin.gettimeofday() observes Time -> Bool, Int, Int

Calls gettimeofday and returns True, secs, usecs on success.

func Builtin.die(String)

Abort the program with the given message.

Many of these (e.g. die, setenv) exist for testing and will likely change or move to a different module in the future.

Operator precedence

Table 1. Operator precedence

Operator   Level   Notes
*          1       Associates most tightly
/          1       Associates most tightly
%          1       Associates most tightly
+          2
-          2
<          3
>          3
<=         3
>=         3
==         3
!=         3
and        4
or         5
++         6       Associates least tightly

Operators within the same level associate left-to-right. For example:

1 * 2 / 3 is (1 * 2) / 3
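The table can be implemented with precedence climbing, a standard way to parse binary operators without a left-recursive grammar. This Python sketch is not the Plasma compiler’s parser: it takes a pre-split token list, treats every non-operator token as an atomic operand, and ignores unary operators and parentheses. Lower levels bind more tightly, and parsing the right operand only at strictly tighter levels gives left associativity within a level.

```python
# Levels from Table 1: a lower level binds more tightly.
LEVELS = {
    "*": 1, "/": 1, "%": 1,
    "+": 2, "-": 2,
    "<": 3, ">": 3, "<=": 3, ">=": 3, "==": 3, "!=": 3,
    "and": 4,
    "or": 5,
    "++": 6,
}


def parse(tokens):
    """Parse operands and binary operators into nested (op, left, right) tuples."""
    pos = 0

    def parse_operand():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_binop(max_level):
        # Parse an expression whose operators all have level <= max_level.
        nonlocal pos
        left = parse_operand()
        while pos < len(tokens):
            op = tokens[pos]
            level = LEVELS[op]
            if level > max_level:
                break
            pos += 1
            # The right operand may only contain strictly tighter operators,
            # so a following operator of the same level folds to the left.
            right = parse_binop(level - 1)
            left = (op, left, right)
        return left

    return parse_binop(max(LEVELS.values()))
```

For example, parse("1 * 2 / 3".split()) groups as (1 * 2) / 3, matching the rule above.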