Plasma Development C and C++ Style Guide

Updated: January 2019
Source: on github

This document describes our C and C++ programming style. While it’s a good idea to conform to the project style, there may be exceptions where departing from the style produces more readable code.

In brief, we use C99 and C++11 (no RTTI or exceptions) on POSIX, lines are no more than 77 columns long, indentation is made with four spaces and curly brackets appear at the end of the opening line except for functions.

General Project Contributing Guide

For general information about contributing to Plasma please see our contributors' documentation.

File organization

Modules and interfaces

We follow a pattern in C that lets us emulate (poorly) the modules of languages such as Ada and Modula-3.

  • Every .c/.cpp file has a corresponding .h file with the same base name. For example, list.c and list.h. The exceptions are:

    • The alternative interpreters share the same header, pz_interp.h, but have different implementations.

    • Each interpreter’s implementation files begin with the same prefix. For example, pz_generic_*.c are the generic interpreter’s files.

    • pz_main.cpp only exports main(), which needs no declaration.

    • Finally, pz_gc_layout.h provides the class declarations for the GC’s layout while other files contain the implementation, organised by function. This organisation groups functions with related behaviours, which makes more sense than grouping by class.

  • Not all .h files have a corresponding .c/.cpp file.

  • We consider the .c/.cpp file to be the module’s implementation and the .h file to be the module’s interface. We’ll just use the terms ‘source file’ and ‘header’. C++ templates are an exception since their implementation must be in a header file; these headers have special names ending in template.h.

  • All items exported from a source file must be declared in the header. Declarations for variables (although rare) must use the extern keyword, otherwise storage for the variable will be allocated in every source file that includes the header containing the variable definition.

  • All items not exported from a module must be declared static.

  • We import a module by including its header. Never give extern or forward declarations for imported functions in source files. Always include the header of the module instead. When C++ classes form cycles, forward declare one of the class names to break the cycle immediately before its use.

  • Each header must include any other headers on which it depends. Hence it’s imperative every header be protected against multiple inclusion. Also, take care to avoid circular dependencies where possible.

  • Always include system headers using the angle brackets syntax, rather than double quotes. That is, #include <stdio.h>. Plasma-specific headers should be included using the double quotes syntax. That is, #include "pz_run.h". Do not put root-relative or ‘..’-relative directories in #includes.

  • Includes should be organised into 4 groups, separated by a blank line: pz_common.h, system includes, this module’s header file, other Plasma includes. Each group should be sorted alphabetically where possible.
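The rules above can be sketched with a header and its source file shown together in one listing. This is only an illustration: the module name pz_example and everything it declares are hypothetical, and in a real tree the two parts would live in separate files.

```cpp
// ---- pz_example.h (hypothetical header: the module's interface) ----
#ifndef PZ_EXAMPLE_H
#define PZ_EXAMPLE_H

// Exported variable: declared extern here, defined once in the source.
extern int pz_example_call_count;

// Exported function.
int pz_example_double(int x);

#endif // ! PZ_EXAMPLE_H

// ---- pz_example.cpp (hypothetical source: the implementation) ----
// In a real tree the includes would come first, in four groups:
//   #include "pz_common.h"
//
//   #include <stdio.h>
//
//   #include "pz_example.h"
//
//   #include "pz_other_module.h"

int pz_example_call_count = 0;

// Not exported, so it is static and has no declaration in the header.
static int twice(int x)
{
    return x * 2;
}

int pz_example_double(int x)
{
    pz_example_call_count++;
    return twice(x);
}
```

A client would include only pz_example.h and never declare pz_example_double() itself.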

File names

C/C++ language source and header files should begin with the prefix pz_. The C language does not have a namespace concept; prefixing C symbols with pz_ can make linking, and debugging linked programs, easier. In C++ use the pz namespace.

Organization within a file

Sometimes a file (header or source file) will cover multiple concepts. In these cases the order given below may be broken in order to keep things with the same concept together. For example, this may mean placing a struct followed by the functions that operate on it, followed by a global variable, and the functions that operate on it.

In some cases the environment may force a different order. For example C preprocessor macros may need to be placed in a specific order.

Generally items within a file should be organised as follows:

Source files

Items in source files should in general be in this order:

  1. Prologue comment describing the module.

  2. #includes

  3. Any local #defines.

  4. Definitions of any local (that is, file-static) global variables.

  5. Prototypes for any local (that is, file-static) functions.

  6. Definitions of functions.

Within each section, items should generally be listed in top-down order, not bottom-up. That is, if foo() calls bar(), then the definition of foo() should precede the definition of bar().
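As a rough sketch of that ordering, here is a small made-up source file. The module, pz_count_words() and the other names are hypothetical and exist only to show where each kind of item goes.

```cpp
/*
 * Hypothetical module: counts whitespace-separated words in a string.
 */

#include <ctype.h>
#include <stddef.h>

// Local #defines.
#define MAX_WORDS 1024

// File-static global variables.
static int s_num_calls = 0;

// Prototypes for file-static functions.
static bool is_word_char(char c);

// Definitions, top-down: the caller precedes the function it calls.
int pz_count_words(const char * str)
{
    int  count = 0;
    bool in_word = false;

    s_num_calls++;
    for (size_t i = 0; '\0' != str[i] && count < MAX_WORDS; i++) {
        if (is_word_char(str[i])) {
            if (!in_word) {
                count++;
            }
            in_word = true;
        } else {
            in_word = false;
        }
    }

    return count;
}

static bool is_word_char(char c)
{
    return 0 == isspace((unsigned char)c);
}
```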

Header files

Items in headers should in general be in this order: typedefs, structs, unions, enums, extern variable declarations, function prototypes, then #defines.

Every header should be protected against multiple inclusion using the following idiom:

#ifndef MODULE_H
#define MODULE_H

/* body of module.h */

#endif // ! MODULE_H

TODO: Update headers to use the new style comment.

File encoding

  • Files should be saved as ASCII or UTF-8 and must use Unix-style (LF) line endings.

  • Lines must not be more than 77 columns long.

  • Indentation is to be made with spaces, usually four spaces.

  • One line of vertical whitespace should usually be used to separate top-level items and sections within an item. Two lines may be used at the type level to create more separation when desired.

TODO editor hint for vim.

Long lines

If a statement is too long, continue it on the next line indented two levels deeper (but less or more is okay depending on the situation).

Break the line after an operator:

int var = really really long expression +
              more of this expression;

Usually break at an outer element if possible; this could be the assignment operator itself.

int var = (expr1 + expr2) *
            (expr3 + expr4);

Sometimes line-breaking can be done nicely by naming a sub-expression. Give it a meaningful name:

int sub_expr = some rather complex but separate expression;
int var = foo(a + b, sub_expr);

You may choose to align sub-expressions during breaking. This is recommended when an expression is broken over several lines. Even though name is short we give it its own line because the other expressions are long.

int var = printf("%s: %d, %s\n",
                 name,
                 some detailed and rather long expression,
                 a comment);

When things that may need wrapping occur at different depths within an expression then different levels of indentation can help convey that depth:

int var = printf("%s: %d, %s\n",
                 name,
                 foo(some detailed and long expression,
                     another detailed and long expression),
                 a comment);

These two sub-expressions are aligned, but they don’t have to be (see Tables below).

Sometimes breaking early can allow you to align things towards the left and give them more room. For example we prefer:

static PZ_Proc_Symbol builtin_setenv = {
    PZ_BUILTIN_C_FUNC,
    { .c_func = builtin_setenv_func },
    false
};

While clang-format prefers:

static PZ_Proc_Symbol builtin_setenv = { PZ_BUILTIN_C_FUNC,
                                         {.c_func = builtin_setenv_func},
                                         false };

Naming conventions

Functions, function-like macros, and variables

Use all lowercase with underscores to separate words. For instance, soul_machine.

Enumeration constants, #define constants, and non-function-like macros

Use all uppercase with underscores to separate words. For instance, MAX_HEADROOM.

TODO: Maybe make function-like macros belong here.

Static data

Static data should begin with s_ for both file-local and class members.

Typedefs

Use first letter uppercase for each word, other letters lowercase and underscores to separate words. For instance, Directory_Entry.

Note: this is rarely used and might become the same as classes and structs.

Structs and unions

If something is both a struct and a typedef, the name for the struct should be formed by appending ‘_S’ to the typedef name. This overrides the style for typedefs above:

typedef struct DirectoryEntry_S {
    ...
} DirectoryEntry;

For unions, append ‘_U’ to the typedef name.

Classes

Classes use CamelCaseWithInitialCap.

Member data

Fields of classes (but not structs) should begin with m_, static data members should begin with s_.

Constexpr functions and variables, and const variables.

These behave differently (better) than C macros. They don’t need to look like C macros. Use _ to separate words with initial capital letters. For example: ‘My_Const_Expr’.
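The naming rules in this section can be seen side by side in the following sketch. Every identifier in it is invented for illustration, apart from MAX_HEADROOM and DirectoryEntry, which reuse the examples above.

```cpp
#define MAX_HEADROOM 16                // #define constant

constexpr int Default_Headroom = 4;    // constexpr variable

typedef struct DirectoryEntry_S {      // struct that is also a typedef
    int inode;
} DirectoryEntry;

static int s_entries_made = 0;         // file-local static data

class EntryCounter                     // class: CamelCaseWithInitialCap
{
  public:
    EntryCounter() : m_count(0) {}

    void record()
    {
        m_count++;
        s_total++;
    }

    int count() const { return m_count; }
    static int total() { return s_total; }

  private:
    int        m_count;                // field of a class
    static int s_total;                // static data member
};

int EntryCounter::s_total = 0;

DirectoryEntry make_entry(int inode)   // function: lowercase
{
    DirectoryEntry entry;

    entry.inode = inode;
    s_entries_made++;
    return entry;
}
```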

Portability and Standards

Our minimum requirements from the C and C++ environment are C99 (may move to C11 in the future) and C++11 on a POSIX.1-2008 environment. This may change as dependencies are added in this early stage of development; however, those changes should be carefully reviewed and, if possible, they should be optional.

Differences between operating systems should be handled by having different configurations available via different Makefiles and header files, rather than by using a tool like autoconf. We will revisit this when development reaches that stage. Autoconf should be avoided; it brings only pain.

While it’s best to keep things portable, if you need a non-standard API, or an API that’s different on each operating system, you should make it available through a macro or protect it with #ifdefs.

Data types

C99 provides many basic data types (char, short, int and so on), each defined to be at least a certain size. These should be used when the exact size doesn’t matter. For example, use bool for booleans and int or unsigned when you’re counting a normal amount of something; you should not need to use macros such as INT_MAX. When size matters, the inttypes.h types are strongly recommended, including the fast types (for example uint_fast32_t) and their macros.

float should be used in preference to double which is seldom necessary and uses more memory. Don’t rely on exact IEEE-754 semantics.

Since C99 does not specify the representation of signed values, we will assume 2’s complement arithmetic (we’re not exactly C99 pure).

Endianness and alignment must not be assumed. If laying out a structure manually, align each member based on its size.
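One way to avoid depending on the host’s byte order or alignment is to decode values a byte at a time. The sketch below assumes the data is stored little-endian, which is an assumption made for the example and not a statement about any Plasma file format; read_le32() is a hypothetical name.

```cpp
#include <stdint.h>

// Decode a 32-bit value stored in little-endian order (assumed here).
// Reading one byte at a time works on hosts of either endianness and
// makes no assumption about the alignment of bytes.
uint32_t read_le32(const uint8_t * bytes)
{
    return (uint32_t)bytes[0] |
        ((uint32_t)bytes[1] << 8) |
        ((uint32_t)bytes[2] << 16) |
        ((uint32_t)bytes[3] << 24);
}
```

The fixed-width types come from stdint.h, which inttypes.h also includes.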

Operating system specifics

Operating system APIs differ from platform to platform. Although most support standard POSIX calls such as read, write and unlink, you cannot rely on the presence of, for instance, System V shared memory. Adhere to POSIX-supported operating system calls whenever possible since they are widely supported, even by Windows.

The CFLAGS variable in the Makefile will request that modern C compilers fail to compile Plasma if it uses non-POSIX APIs.

CFLAGS=-std=c99 -D_POSIX_C_SOURCE=200809L -Wall -Werror

When POSIX doesn’t provide the required functionality, ensure that the operating system specific calls are localised.

Compiler and C library specifics

We require a C99 compiler. However, many compilers provide non-standard extensions. Ensure that any use of compiler extensions is localised and protected by #ifdefs. Don’t rely on features whose behaviour is undefined according to the C99 standard. For that matter, don’t rely on C arcana even if they are defined. For instance, setjmp/longjmp and ANSI signals often have subtle differences in behaviour between platforms.
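A minimal sketch of localising a compiler extension follows. __builtin_expect is a real GCC and Clang builtin, but the PZ_LIKELY macro and first_or_default() are hypothetical names chosen for this example.

```cpp
// PZ_LIKELY is a hypothetical macro used only in this sketch.
#if defined(__GNUC__) || defined(__clang__)
// GCC and Clang provide __builtin_expect as an extension.
#define PZ_LIKELY(cond) (__builtin_expect(!!(cond), 1))
#else // ! (__GNUC__ || __clang__)
// Portable fallback: same value, no hint to the optimiser.
#define PZ_LIKELY(cond) (!!(cond))
#endif // ! (__GNUC__ || __clang__)

int first_or_default(const int * items, int num_items, int fallback)
{
    if (PZ_LIKELY(num_items > 0)) {
        return items[0];
    }

    return fallback;
}
```

The rest of the code uses only PZ_LIKELY, so the extension appears in exactly one place.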

If you write threaded code, make sure any non-reentrant code is appropriately protected via mutual exclusion. The biggest cause of non-reentrant (non-thread-safe) code is function or module-static data. Note that some C library functions may be non-reentrant. This may or may not be documented in the man pages.
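As an illustration, module-static data can be protected with a POSIX mutex. This is only a sketch: pz_fresh_id() and the variables it uses are hypothetical and not part of Plasma.

```cpp
#include <pthread.h>

// Module-static data shared by all threads; access must be serialised.
static pthread_mutex_t s_id_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned        s_next_id = 0;

// Return a fresh identifier. Safe to call from several threads at once
// because the static counter is only touched while the lock is held.
unsigned pz_fresh_id(void)
{
    unsigned id;

    pthread_mutex_lock(&s_id_lock);
    id = s_next_id++;
    pthread_mutex_unlock(&s_id_lock);

    return id;
}
```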

C++ portability/standards

In addition to sticking to C++11 (which is the minimum required for "modern C++"), we also forbid the use of exceptions and RTTI; they’re unnecessary and add too much magic. You should also be frugal with templates and vtables. You may follow guidelines for "good C++" from other sources; I’ve been reading the Essential C++ series and found it helpful.

Library standards including C/C++ standard library

If you need a feature from a newer version of one of these standards, but we don’t need to upgrade our minimum dependencies and the new feature is something you can easily add as a utility function, then add it to pz_cxx_future.h/cpp (or create a new future file for other libraries) and indicate in a comment which version of the standard it comes from.

Then when we do update our dependencies we can look in these files to easily find what workarounds we can remove.

This also applies to things that haven’t been added to a standard but might be someday.
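For instance, std::make_unique only arrived in C++14, so a C++11 code base might carry its own copy. The sketch below shows what such an entry could look like; it is not taken from pz_cxx_future.h, and placing it in the pz namespace is an assumption.

```cpp
#include <memory>
#include <utility>

namespace pz {

// From C++14: std::make_unique (single-object form only).
// Remove this once the minimum standard is C++14 or newer.
template <typename T, typename... Args>
std::unique_ptr<T> make_unique(Args &&... args)
{
    return std::unique_ptr<T>(new T(std::forward<Args>(args)...));
}

} // namespace pz
```

A caller would write pz::make_unique<Foo>(args) and later switch to the std:: version with a simple search and replace.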

Environment specifics

This is one of the most important sections in the coding standard. Here we mention what other tools Plasma may depend on.

Tools required for Plasma

In order to build Plasma you need:

  • A POSIX (1-2008) system/environment.

  • A shell compatible with Bourne shell (sh).

  • GNU make.

  • A C99/C++11 compiler.

  • Mercury 14.01.1 or newer.

Documenting the tools

If further tools or libraries are required, add them to the above list. Similarly, if you eliminate the dependence on a tool, remove it from the list.

Syntax

Basic layout (line length, indentation etc) is covered above in File encoding.

General rules

Clang-format has been configured and does the right thing much of the time, but it often doesn’t. You could check "what would clang-format do", but it is not to be relied on.

Curly brackets

The opening curly bracket should be placed at the end of the opening line, and the closing curly bracket on a new line of its own, not indented:

if (condition) {
    ...
}

Except for functions and classes, which should have the opening curly on a new line.

int
foo(int arg)
{
    ...
}

If the opening line is split between multiple lines, such as a long condition in an if-then-else, then place the opening curly on a new line to clearly separate the condition from the body:

if (this_is_a_somewhat_long_conditional_test(
        in_the_condition_of_an +
        if_then))
{
    ...
}

Space between tokens

There should be a space between the statement keywords like if, while, for and return and the next token. The return value should not be parenthesised. There should also be a space around an operator.

There should be no space between the function-like keywords like sizeof and their argument list. There should also be no space between a cast and its argument.

Pointer declarations

Place the pointer or reference qualifier between the type and the variable name.

char * str1, * str2;

This avoids the confusion that might occur when the pointer qualifier is attached to the type, and makes the symbol easier to notice.

char* str1, not_really_a_str;

TODO: find out if the same trap exists for C++ references.
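The trap can be checked by asking the compiler for the declared types. The sketch below deliberately uses the discouraged spelling so the checks have something to catch; it also suggests that the same thing happens with C++ references. The variable names are invented for the example.

```cpp
#include <type_traits>

static char s_buffer[] = "abc";
static int  s_value = 1;

// Written in the discouraged style on purpose: the * and & bind to the
// first declarator only, not to the type.
char* str1 = s_buffer, not_really_a_str = 'x';
int& ref1 = s_value, not_really_a_ref = s_value;

static_assert(std::is_same<decltype(str1), char *>::value,
              "str1 is a pointer");
static_assert(std::is_same<decltype(not_really_a_str), char>::value,
              "the second name is a plain char");
static_assert(std::is_same<decltype(ref1), int &>::value,
              "ref1 is a reference");
static_assert(std::is_same<decltype(not_really_a_ref), int>::value,
              "the second name is a plain int");
```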

Statements

Use one statement per line.

Large control-flow statements

Use a // end comment if the if statement, switch or loop is quite large, particularly if there are multiple nested structures. It may be helpful to describe the condition of the branch in this comment.

if (blah) {
    // Use curlies, even when there's only one statement in the block.
    ...
    // Imagine dozens of lines here.
    ...
} // end if

Tiny control-flow structures

An exception to the above rule about always using curlies is that an if statement may omit the curlies if its body is a single return or goto instruction placed on the same line.

file = fopen("file.txt", "r");
if (NULL == file) goto error;

or

file = fopen("file.txt", "r");
if (NULL == file) {
    goto error;
}

but not:

file = fopen("file.txt", "r");
if (NULL == file)
    goto error;

and not:

if (a_condition)
    do_action();

Additionally, if one branch uses curlies then all must use curlies. Do not mix styles such as:

if (a_condition) goto error;
else {
    do_something();
}

And if the condition covers multiple lines, then the body must always appear within curlies (with the opening curly on its own line as noted above).

if (0 == read_proc(file, imported, module, code_bytes,
                   proc->code_offset, &block_offsets[i]))
{
    goto end;
}

Conditions

TODO: Consider removing this rule.

To make your intentions clear, do not rely on the zero / non-zero boolean behaviour of C. This means explicitly comparing a value:

if (NULL == file) goto error;

If using the equality operator ==, use a non-lvalue on the left-hand-side if possible. This way the comparison can not be mistaken for an assignment.

if (0 == result) {
    ...
}

Switch statements

Case labels should be indented one level, which will indent the body by two levels.

Switch statements should usually have a default case, even if it just calls abort(). If the switched-on value is an enum, the default may be omitted since the compiler will check that all the possible values are covered.
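Both cases can be sketched as follows. The enum, the functions and the width codes are all hypothetical; the abort() after the first switch is an extra safeguard added in this sketch for values outside the enum, not a rule from this guide.

```cpp
#include <stdlib.h>

enum Colour { RED, GREEN };

// Switching on an enum: no default, so the compiler can warn when a
// newly added enumeration constant is not handled.
const char * colour_name(enum Colour colour)
{
    switch (colour) {
        case RED:
            return "red";
        case GREEN:
            return "green";
    }

    // Only reached if colour holds a value outside the enum.
    abort();
}

// Switching on a plain int: a default case is required.
int width_in_bytes(int width_code)
{
    switch (width_code) {
        case 0:
            return 1;
        case 1:
            return 2;
        default:
            abort();
    }
}
```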

Fall through switch cases

If a switch case falls through, add a comment to say that this is deliberately intended.

switch (var) {
    case A:
        ...
        break;
    case B:
        ...
        // fall-through
    case C:
        ...
        break;
}

Curlies in cases

If a case requires local variable declarations, place the curlies like this:

    ...
case A: {
    int foo;
    ...
    break;
}
case B:
    ...

Loops

Loops that end in a non-obvious way, such as infinite while loops that use break to end the loop, should be documented. You’ll need to use judgement about when this is needed.

// Note that the loop will exit when ...
while (true) {
    ...
    if (some condition) {
        break;
    }
    ...
}

or

while (everything_is_okay) {
    ...
    if (some condition) {
        // Exit the loop on the next iteration.
        everything_is_okay = false;
    }
    ...
}

Functions

In argument lists, put space after commas. Include parameter names in the declaration as this can aid in documentation.

Unlike other code blocks, the open-curly for a function should be placed on a new line.

int rhododendron(int a, float b, double c)
{
    ...
}

If the parameter list is long or complex, you may wish to place each parameter on a new line, aligning them. Aligning names as in variable definition lists is also suggested but not required.

int rhododendron(int                   a_long_parameter,
                 struct AComplexType * b,
                 double                c)
{
    ...
}

Variables

Variable declarations shouldn’t be flush left, however.

int x = 0,
    y = 3,
    z;
int a[] = {
    1, 2, 3, 4, 5
};

When defining multiple variables, structure or union fields or, in some cases, function parameters, lining up their names is recommended.

There should be one line of vertical space between the definition list and the next statement.

char *        some_string;
int           x;
MyStructure * my_struct;

if (...) {

Enums or defines?

Prefer enums to lists of #defines. Note that enum constants are of type int, hence if you want an enumeration of chars or shorts, then you must use lists of #defines.

Preprocessing

Nesting

Nested #ifdefs, #ifndefs and #ifs should be indented by two spaces for each level of nesting. For example:

#ifdef GUAVA
  #ifndef PAPAYA
  #else // PAPAYA
  #endif // PAPAYA
#else // ! GUAVA
#endif // ! GUAVA

Multi-line macros

When continuing a macro on a new line, line the \ up on the right in the same column.

#define PZ_WRITE_INSTR_1(code, w1, tok)       \
    if (opcode == (code) && width1 == (w1)) { \
        token = (tok);                        \
        goto write_opcode;                    \
    }

Other implementation choices

C++ Class or Struct

If a thing will have methods that act on instances, it is a class and should begin with the "class" keyword, and keep its data members private. Otherwise it is a struct and shall begin with the struct keyword.

Bare Pointers

Bare pointers aren’t "modern C++". However, in Plasma’s runtime system they show that the lifetime of the object is handled elsewhere. Either the object is known to live a very long time, living in static data or on the C++ heap and destroyed when the program ends, or it is a GC-allocated object and we additionally guarantee that while it’s live (being passed around) it’s impossible for a GC to occur (there’s also a NoGCScope present).

TODO: Describe how we root GC pointers within runtime code.

C++ information hiding

C++ exposes implementation details of classes in their declarations as private members. This means that changes to these internal details can cause unnecessary recompilation. On the other hand it allows the compiler to inline functions defined in the class definition that do access private members.

When the latter need is not great it can be good to avoid creating the former problem by hiding these details. There are a few different techniques.

In the pImpl pattern the class contains only a pointer to another class that contains the actual implementation. This pointer should be a std::shared_ptr and the outer class is expected to be passed by value rather than by reference. While this still allows callers to use object.method() style calls (which then forward), it breaks the normal expectation that "most objects should be passed by reference". Of course you can pass them by reference, but doing so creates an extra pointer indirection. Passing by value isn’t great either, since it causes extra work in the std::shared_ptr to maintain its reference count.

There’s another pattern where an abstract base class contains a virtual public interface and a private derived class containing the actual implementation. We avoid this because we want to avoid vtables when we can.

Therefore the pattern we use in Plasma’s runtime (when we choose to hide implementation details at all) is to forward declare the class, and define it in an implementation file or implementation-only header file. The public interface is defined as non-member forwarding functions. This pattern can be seen in pz_gc.h and pz_gc.impl.h.
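The shape of this pattern is roughly as follows. This is a sketch only: the Counter class, its functions and the file names are hypothetical and are not copied from pz_gc.h, and the two parts are shown in one listing although they would be separate files.

```cpp
// ---- pz_counter.h (hypothetical public header) ----
class Counter; // forward declaration only; no private members exposed

Counter * counter_new();
void      counter_increment(Counter * counter);
int       counter_value(const Counter * counter);
void      counter_delete(Counter * counter);

// ---- pz_counter.impl.h / pz_counter.cpp (hypothetical) ----
class Counter
{
  public:
    Counter() : m_value(0) {}

    void increment() { m_value++; }
    int  value() const { return m_value; }

  private:
    int m_value;
};

// The public interface: non-member functions that forward to the class.
Counter * counter_new()
{
    return new Counter();
}

void counter_increment(Counter * counter)
{
    counter->increment();
}

int counter_value(const Counter * counter)
{
    return counter->value();
}

void counter_delete(Counter * counter)
{
    delete counter;
}
```

Changing the private members of Counter then only forces recompilation of the files that include the implementation header.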

Comments

What should be commented

Functions

Use your judgement for whether a function should be commented. Sometimes the function name and parameter names will provide a lot of information. However for more complex functions a comment will be necessary. Comments are strongly recommended when:

  • They have side-effects

  • They require an input to be sorted, non-null or similar.

  • They have different semantics when an input has a different value (they should be separate functions if they do a different function).

  • They allocate memory that the caller is now responsible for.

  • They return statically allocated memory (try to avoid this).

  • They free memory.

  • They return certain values (non-zero, -1 etc) for errors.

  • They are not thread-safe or reentrant.
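A function comment covering several of these points might look like the sketch below. pz_copy_prefix() is a hypothetical function written for this example.

```cpp
#include <stdlib.h>
#include <string.h>

/*
 * Copy the first len bytes of str into a new NUL-terminated string.
 *
 * str must be non-NULL and at least len bytes long.  The caller owns
 * the result and must release it with free().  Returns NULL if memory
 * cannot be allocated.  Thread safe; no other side-effects.
 */
char * pz_copy_prefix(const char * str, size_t len)
{
    char * copy = (char *)malloc(len + 1);

    if (NULL == copy) return NULL;

    memcpy(copy, str, len);
    copy[len] = '\0';

    return copy;
}
```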

Macros

Each non-trivial macro should be documented just as for functions (see above). It is also a good idea to document the types of macro arguments and return values, e.g. by including a function declaration in a comment.

Parameters to macros should be in parentheses.

#define STREQ(s1,s2) (strcmp((s1),(s2)) == 0)

This ensures that when a complex expression is passed as a parameter, operator precedence does not affect the meaning of the macro.
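To see the difference, compare a macro without parentheses with one that has them. Both macros and both functions below are made up for this illustration.

```cpp
// Without parentheses the argument is pasted in as-is.
#define DOUBLE_UNSAFE(x) x * 2

// With parentheses the argument is evaluated as a unit.
#define DOUBLE_SAFE(x) ((x) * 2)

int unsafe_result(void)
{
    // Expands to 1 + 2 * 2, so the multiplication binds first.
    return DOUBLE_UNSAFE(1 + 2);
}

int safe_result(void)
{
    // Expands to ((1 + 2) * 2).
    return DOUBLE_SAFE(1 + 2);
}
```

Here unsafe_result() returns 5 while safe_result() returns the intended 6.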

Headers

Such function comments should be present in header files for each function exported from a source file. Ideally, a client of the module should not have to look at the implementation, only the interface. In C terminology, the header should suffice for working out how an exported function works.

Source files

Every source file should have a prologue comment which includes:

  • Copyright notice.

  • License info

  • Short description of the purpose of the module.

  • Any design information or other details required to understand and maintain the module (may be links to other documents).

TODO: Describe the exact format in use and ensure that all the C code conforms to this.

Global variables

Any global variable should be excruciatingly documented. This is especially true when globals are exported from a module. In general, there are very few circumstances that justify use of a global.

Comment style

Block comments.

Use comments of this form:

/*
 * This is a block comment,
 * it uses multiple lines.
 * It should have a blank line before it and it comments the declaration,
 * definition, block or group of statements immediately following it.
 */

For annotations to a single line of code:

i += 3; // Add 3.

Note that the // comment is standard in C99, which we are using. If the comment fits on one line, even if it describes multiple lines, a single line comment is okay:

// Add 3.
i += 3;

However, if the comment is important, or the thing it documents is significant, then use a block comment.

Guidelines for comments

Revisits

Any code that needs to be revisited because it is a temporary hack (or some other expediency) must have a comment of the form:

/*
 * XXX: <reason for revisit>
 *  - <Author name>
 */

The <reason for revisit> should explain the problem in a way that can be understood by developers other than the author of the comment. Also include the author of this comment so that a reader will know who to ask if they need further information.

"TODO" and "Note" are also common revisit labels. Only "XXX" requires the author’s name.

Comments on preprocessor statements

The #ifdef constructs should be commented like so if they extend for more than a few lines of code:

#ifdef SOME_VAR
    ...
#else // ! SOME_VAR
    ...
#endif // ! SOME_VAR

Similarly for #ifndef. Use the GNU convention of comments that indicate whether the variable is true in the #if and #else parts of an #ifdef or #ifndef. For instance:

#ifdef SOME_VAR
#endif // SOME_VAR

#ifdef SOME_VAR
    ...
#else // ! SOME_VAR
    ...
#endif // ! SOME_VAR

#ifndef SOME_VAR
    ...
#else // SOME_VAR
    ...
#endif // SOME_VAR

Using formatting tools

Typing make format will run clang-format-10 on the C/C++ code. It mis-formats quite a few things, so we don’t yet use it automatically, though we may do so on a file-by-file basis at some point.

Tables

When code or data is tabular then using a tabular layout makes the most sense. This may be something formatters cannot handle, although some will allow you to mark regions they should leave alone.

We don’t have a good example of this in the code base, however the data in pz_builtin.c could probably be set out in a table. If it were it might look like:

static PZ_Proc_Symbol builtins[] = {
  { PZ_BUILTIN_C_FUNC, {.c_func = builtin_setenv_func}, false },
  { PZ_BUILTIN_C_FUNC, {.c_func = builtin_free_func},   false }
};

Defensive programming

Asserts and debug builds

TODO

Statement macros must be single statements

Macros should either be expressions (they have a value) or statements (they do not); this must always be clear. If necessary, make a single statement using a block. The do {} while (0) pattern is not necessary since the body of an if statement may not be a bare macro call; it must have its own curly brackets.

#define PZ_WRITE_INSTR_1(code, w1, tok)       \
    if (opcode == (code) && width1 == (w1)) { \
        token = (tok);                        \
        goto write_opcode;                    \
    }
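A sketch of how such a statement macro might be used follows. The macro is the one shown above; the surrounding function, the opcode and width values and the token numbers are all invented for the example.

```cpp
#define PZ_WRITE_INSTR_1(code, w1, tok)       \
    if (opcode == (code) && width1 == (w1)) { \
        token = (tok);                        \
        goto write_opcode;                    \
    }

// Hypothetical caller: returns the token for an opcode/width pair, or
// -1 when no pair matches.
int select_token(int opcode, int width1)
{
    int token = -1;

    if (opcode > 0) {
        // The macro sits inside the curlies of the if, so it never
        // needs to behave as the sole body of a control statement.
        PZ_WRITE_INSTR_1(1, 8, 100);
        PZ_WRITE_INSTR_1(1, 16, 101);
    }

write_opcode:
    return token;
}
```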

Macros should not evaluate parameters more than once

C expressions may have side-effects. This is okay most of the time but can lead to confusion with macros, because a macro can evaluate its parameters more than once. Avoid doing this in your macros and, if you must, add a comment explaining that this can happen.
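The following sketch counts how often an argument with a side-effect is evaluated. All of the names are hypothetical; the function version is shown as one possible way to avoid the problem.

```cpp
// Evaluates x twice: avoid, or document it as done here.
#define SQUARE_MACRO(x) ((x) * (x))

static int s_calls = 0;

// Has a side-effect: counts how often it is called.
static int next_value(void)
{
    s_calls++;
    return 3;
}

// A function evaluates its argument exactly once.
static inline int square(int x)
{
    return x * x;
}

int calls_made_by_macro(void)
{
    s_calls = 0;
    (void)SQUARE_MACRO(next_value());
    return s_calls;
}

int calls_made_by_function(void)
{
    s_calls = 0;
    (void)square(next_value());
    return s_calls;
}
```

The macro calls next_value() twice, while the function calls it once.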

C++ type conversion

It’s very easy for C++ compilers to want to perform type conversions for you. This is frequently done via conversion operators and constructors that take a single argument. The latter are easy to provide by mistake, therefore 1-arg constructors should be declared as explicit, which will prevent the compiler from using them automatically.

explicit MyType(const OtherType &other);

When implicit conversion is desired, add a comment to tell anyone reading your code that you didn’t forget, that you want it to be implicit.

// Implicit constructor
Optional(T &other);
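The effect of explicit can be checked with std::is_convertible, as in this sketch. The Metres and Label classes are hypothetical.

```cpp
#include <type_traits>

class Metres
{
  public:
    explicit Metres(int value) : m_value(value) {}

    int value() const { return m_value; }

  private:
    int m_value;
};

class Label
{
  public:
    // Implicit constructor
    Label(const char * text) : m_text(text) {}

    const char * text() const { return m_text; }

  private:
    const char * m_text;
};

// An int is not silently turned into a Metres...
static_assert(!std::is_convertible<int, Metres>::value,
              "explicit blocks implicit conversion");
// ...but a string literal may be passed where a Label is expected.
static_assert(std::is_convertible<const char *, Label>::value,
              "implicit conversion is allowed on purpose");
```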

C++ copy constructors

C++ will create implicit copy constructors. These don’t always do the right thing so it is best to either create them explicitly or tell C++ you don’t want them. The same is true for copy assignment operators.

MyClass(const MyClass &) = delete;
void operator=(const MyClass &) = delete;
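In context, a class that must not be copied might look like the sketch below, with the compiler confirming that both operations are gone. FileHandle is a hypothetical class.

```cpp
#include <type_traits>

class FileHandle
{
  public:
    explicit FileHandle(int fd) : m_fd(fd) {}

    // Copying would leave two owners of the same descriptor.
    FileHandle(const FileHandle &) = delete;
    void operator=(const FileHandle &) = delete;

    int fd() const { return m_fd; }

  private:
    int m_fd;
};

static_assert(!std::is_copy_constructible<FileHandle>::value,
              "FileHandle cannot be copy constructed");
static_assert(!std::is_copy_assignable<FileHandle>::value,
              "FileHandle cannot be copy assigned");
```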

Tips

  • Limit module exports to the absolute essentials. Make as much static (that is, local) as possible since this keeps interfaces to modules simpler.

  • Use typedefs to make code self-documenting. They are especially useful on structs, unions, and enums. Use them on the struct or union’s forward declaration or header declaration when the definition is provided elsewhere.

Tracing macros

TODO