Plasma Language Design Principles

Copyright (C) Plasma Team
Distributed under CC BY-SA 4.0

Updated: June 2021
Source: on github

Plasma is designed and implemented with these principles in mind. Documenting them gives us something to refer to and makes decisions more conscious, leading to a more consistent language. The first section (The big ones) is especially important and contains:

  1. Easy reasoning

  2. Familiar syntax and terminology

  3. Cutting-edge concurrency and parallelism

These three are mostly relevant when making big decisions about the language, while the remaining principles are more relevant for smaller choices and implementation details, including development of the tools & ecosystem.

Many of these will be described with anti-examples ("don’t"). I’d prefer to use positive examples of how Plasma avoids these problems, and will try to; however, most can only be recognised with these "don’t" examples.

The big ones

These principles are the big ones. They define how we blend declarative and imperative programming, among other things, and aim towards our goal of better concurrent and parallel programming.

Easy reasoning

Declarative programming can make it easier to reason about a program, particularly large programs and at a large scale (the scale of modules, functions and how they interact).

For example, a Plasma function’s signature tells you everything you need to know about that function: not only the data types it’ll work with but also what data it can access and what resources (eg files, network sockets) it can manipulate. Plasma is a side-effect free language and borrows a lot from other declarative languages, including its type system.
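As a loose analogy only (this is Python, not Plasma, and Python does not enforce any of it), the idea of a signature that names the resources a function may touch can be mimicked by convention: a function that needs to write somewhere takes the output stream as an explicit parameter, and a function without such a parameter is assumed to only compute a value. The names `greeting` and `write_greeting` are hypothetical.

```python
import io
from typing import TextIO


def greeting(name: str) -> str:
    # No resource parameter: by convention this function only computes a value.
    return f"Hello, {name}!"


def write_greeting(out: TextIO, name: str) -> None:
    # The signature names the one resource this function writes to.
    out.write(greeting(name) + "\n")


# An in-memory stream stands in for a file or socket.
buffer = io.StringIO()
write_greeting(buffer, "Plasma")
```

A reader hunting for a bug in what gets written only needs to look at functions that accept `out`. In Python this is a discipline rather than a guarantee; the text above describes the signature itself carrying that information.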

This also means that semantics should generally be easy to follow: the language should avoid undefined behaviour (UB) and non-determinism. But we can’t rule out non-termination while keeping the language expressive without solving the halting problem, so we’re not going to try. (TODO: explain why exceptions are okay / when they’re okay.)

This benefits humans, who spend more time reading code than writing it, and more effort debugging. A human can read function signatures and know whether their bug may be or won’t be within that function.

It also benefits tools, specifically the compiler. By making effects clear the compiler can perform more aggressive optimisations such as reordering or parallelising code.
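A small sketch of why clear effects matter for parallelisation, written in Python by hand rather than done by a compiler: because the hypothetical function `square` has no side effects, evaluating its calls in parallel must produce the same result as evaluating them one after another. The helper names and the choice of a thread pool are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def square(n: int) -> int:
    # Pure: the result depends only on the argument and nothing else is touched.
    return n * n


def sequential(xs):
    # Evaluate the calls one after another.
    return [square(x) for x in xs]


def parallel(xs):
    # Evaluate the calls on a thread pool; Executor.map keeps input order.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(square, xs))
```

If `square` wrote to a shared file or mutated a global, the two versions could differ, and a tool could not safely swap one for the other without knowing that.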

Familiar syntax and terminology

There are two aspects to familiarity. One is generally using syntax that’ll be more familiar to a majority of programmers in 2021. We’re assuming people coming to Plasma have at least two years’ experience programming and they may be "functionally curious". Being familiar where we can makes learning Plasma easier, and people can spend more of their energy learning the parts of Plasma that are different (usually by necessity).

We make a number of choices about syntax that will be more familiar to most programmers. For example, functions and blocks use the curly-brace syntax of C-like languages, and the body of a function or block is a series of statements.

Likewise we use terminology and names that are going to be more familiar. What Haskell calls "Functor" we shall call "Mappable". We know this isn’t as accurate as "Functor", but we’re willing to lose some of that accuracy for more familiarity for more people. Documentation will usually explain these kinds of choices, eg: "If you’ve used Haskell you may be used to calling these Functors, which is more accurate". This also makes it clear to people with that background exactly what they’re looking at.
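To illustrate the naming choice only, here is a minimal Python sketch of an interface called `Mappable`. It is not Plasma’s definition; the `Box` and `Items` types and the `increment_all` function are hypothetical, and the docstring plays the role of the documentation note described above.

```python
from dataclasses import dataclass
from typing import Any, Callable, Protocol, runtime_checkable


@runtime_checkable
class Mappable(Protocol):
    """Anything whose contents can be transformed with `map`.

    If you've used Haskell you may be used to calling these Functors.
    """

    def map(self, f: Callable[[Any], Any]) -> "Mappable": ...


@dataclass(frozen=True)
class Box:
    value: Any

    def map(self, f):
        # Apply f to the single contained value.
        return Box(f(self.value))


@dataclass(frozen=True)
class Items:
    values: tuple

    def map(self, f):
        # Apply f to every contained value, keeping the order.
        return Items(tuple(f(v) for v in self.values))


def increment_all(m: Mappable) -> Mappable:
    # Works for any Mappable without knowing its concrete type.
    return m.map(lambda n: n + 1)
```

The name says what you can do with the value (map over it), which is the familiarity trade-off the paragraph describes.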

Sometimes something is familiar only to a smaller group of people. ADTs are an example: we use the Haskell syntax for ADTs because that’s the syntax that’s familiar to the largest group of people.

Likewise some concepts have no familiar meaning (eg Monad). We carefully weigh whether to include that concept at all. For example, monads are very useful so we will support but de-emphasise them, while GADTs are more specialised in their use cases, and those cases can also be solved in other ways, so we will not support GADTs.

Cutting-edge concurrency and parallelism

One of Plasma’s major goals is to be a language that does not restrict the expression of concurrency or parallelism, that enables automatic parallelisation, and that does all of this safely.

Many other language features are designed with this in mind. For example by making loops part of the language (rather than using recursion in a declarative language) programmers will naturally tell the compiler where loops are and this will aid automatic parallelisation. Likewise part of the reason the resource system is granular is to be able to expose more parallelism.

No paradigm is superior in all situations

Both declarative and imperative programming have a lot to offer. We choose language features from both of these groups. Neither one is purely superior.

Language syntax

Basic consistency

C structs, and C++ classes, must be followed by a semicolon. But functions don’t need to be.

Haskell uses square brackets for lists:

  • []

  • [1, 2, 3]

  • [a] (as a type expression)

But it also uses : for the cons operator, and when pattern matching with lists code looks like:

length [] = 0
length (x:xs) = (length xs) + 1

This is inconsistent. Plasma has chosen the Prolog syntax for "cons" ([x | xs]).

There may be "consistent" reasons why C/C++ and Haskell make these choices. Indeed : is an operator in Haskell while [] and [1, 2, 3] aren’t. Likewise struct declarations end in a semicolon in C because otherwise the next identifier would be an instance of that struct. Nevertheless this is inconsistent from the point of view of the programmer. We will try to avoid inconsistency, and may need to do this by changing other parts of the language (if Plasma were C we’d avoid conflating structure definitions with definitions of struct instances).

Things should look like what they are / mean what they look like.

The following Mercury code

(
    X = a,
    ...
;
    X = b,
    ...
)

Could be a switch (with either 0 or 1 answers) or a nondet disjunction (with any number of answers and hard-to-predict complexity). The exact meaning of this depends on the instantiation state of X, which depends on the surrounding code. You can’t tell by looking how this code will behave.

Also in Mercury a goal such as:

A = foo(B, C)

Could be a test unification (semidet, very fast), a construction (det, with a memory allocation), a deconstruction (det or semidet), or a function call (could do anything, including not terminate).

We will try to avoid these in Plasma. Plasma has no disjunction so the first is not a problem. The second is currently avoided because data constructors begin with capital letters (this will change, so we may need to revisit this).

We’ve been creating a syntax-to-concept map, and we’re trying to avoid overloading symbols (where possible). For example, + means both addition and concatenation in many languages, but in Plasma (like Haskell and Mercury) ++ means concatenation.
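Python is one of the languages where + is overloaded this way, so it makes a convenient demonstration of the problem. In the sketch below the hypothetical function `combine` contains a single expression whose meaning depends entirely on what is passed in; the separately named `concat` shows the alternative of one name per concept.

```python
def combine(a, b):
    # One symbol, several meanings: which one applies is invisible here.
    return a + b


def concat(a: list, b: list) -> list:
    # A dedicated operation can only ever mean concatenation.
    return [*a, *b]


total = combine(1, 2)        # numeric addition
joined = combine([1], [2])   # list concatenation
text = combine("1", "2")     # string concatenation
```

Reading `a + b` in isolation does not tell you which of these three operations will run.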

The same thing should behave the same way in different contexts

Languages that people think of as application or systems languages tend to make this error, while scripting languages get it right, although the difference is hard to notice because it’s so great.

A language like Python allows nested functions:

def foo(...):
    x = ...
    def bar(...):
        ... x ...

    return bar

But this is not legal in C and C++, or even a managed language like Java.

This is legal in Plasma (with Plasma’s syntax). We add the additional constraint that nested functions should behave like functions at the top level: they must behave the same and, for example, support mutual recursion.
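Mutual recursion between nested functions can be sketched in Python, which already allows it. This shows the behaviour being asked for, not Plasma syntax; the names are hypothetical.

```python
def make_parity_checker():
    # Two nested functions that call each other, just as two
    # top-level functions could.
    def is_even(n: int) -> bool:
        return True if n == 0 else is_odd(n - 1)

    def is_odd(n: int) -> bool:
        return False if n == 0 else is_even(n - 1)

    return is_even


is_even = make_parity_checker()
```

Here `is_even` refers to `is_odd` before `is_odd` has been defined in the source text, which works because both names are in scope by the time either function is called.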

The exception is that other statements are not allowed at the top level, since allowing them would create problems for module loading order. So nested functions will have to interact with the other statements within their enclosing function, and this may make them appear to behave differently. This is unfortunate but better than creating module initialisation order problems.

Make parsing simple, for machines and humans

To simplify parsing, both for machines and humans, all declarations/definitions and many statements can be recognised by their first token. All type definitions begin with the keyword type, all functions with func, etc. Statements can begin with if, match, return, var or similar, and those that don’t belong to a small set containing only:

  • Assignment

  • Array assignment

  • Call (with effect)

Which can be disambiguated by the first 2 tokens.
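The rule above can be sketched as a tiny classifier in Python. The keywords come from the text; the second-token shapes used for the three remaining statement kinds (`=` for assignment, `[` for array assignment, `(` or `!` for a call) are assumptions made purely for illustration and are not taken from the Plasma grammar.

```python
# Keywords named in the text; each identifies its construct on its own.
DECLARATIONS = {"type": "type definition", "func": "function definition"}
KEYWORD_STATEMENTS = {"if", "match", "return", "var"}

# ASSUMED second-token shapes for the statements that have no keyword.
SECOND_TOKEN = {
    "=": "assignment",
    "[": "array assignment",
    "(": "call",
    "!": "call",
}


def classify(tokens: list[str]) -> str:
    """Classify a declaration or statement from at most its first two tokens."""
    if not tokens:
        raise ValueError("empty input")
    first = tokens[0]
    if first in DECLARATIONS:
        return DECLARATIONS[first]
    if first in KEYWORD_STATEMENTS:
        return first + " statement"
    # No keyword: one more token is enough to decide.
    second = tokens[1] if len(tokens) > 1 else ""
    if second in SECOND_TOKEN:
        return SECOND_TOKEN[second]
    raise ValueError(f"cannot classify statement starting with {first!r}")
```

For instance, `classify(["return", "x"])` is decided by the first token alone, while `classify(["x", "=", "1"])` needs the second. The point is that no unbounded lookahead or backtracking is required.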

We assume that this also makes it easy for humans to recognise the type of each statement, at least provided they find the beginning of a statement which is (by convention, not syntax) at the beginning of a line or on the same line following a {.

This is also related to things being what they look like.

Choose the more restrictive alternative

There are many cases where we are unable to decide what is best for the language, particularly without experience using it in anger. In these cases, given two or more choices, we should choose the most restrictive. It will be more pleasant later if we change to a less restrictive option, rather than from a less restrictive option to a more restrictive one.

For example, how to handle resources and higher-order code was a fairly major choice; we’ve picked one of the more restrictive options, and might find we need to relax it later.

Other

Principle of least surprise

This is written about elsewhere online. Given two alternatives, choose the one that surprises people the least (when other factors are equal). You can see that some of the above principles are specific examples of this one.

TODO

  • What are the principles for how we’re writing the Plasma tools?