Plasma Development C Style Guide

Paul Bone
<pbone@plasmalang.org>
version 6, 24 Apr 2018
Table of Contents

This document describes our C programming style. While it’s a good idea to conform to the project style, there may be exceptions where departing from the style produces more readable code.

In brief, we use C99 on POSIX, lines are no more than 77 columns long, indentation is made with four spaces and curly brackets appear at the end of the opening line except for functions.

General Project Contributing Guide

For general information about contributing to Plasma please see CONTRIBUTING.md in the project’s root directory.

File organization

Modules and interfaces

We impose a discipline on C to allow us to emulate (poorly) the modules of languages such as Ada and Modula-3.

  • Every .c file has a corresponding .h file with the same base name. For example, list.c and list.h. The exceptions are:

    • pz_run_*.c which all share pz_run.h as alternative implementations and

    • pz_main.c which only exports main() which needs no declaration.

  • Not all .h files have a corresponding .c file.

  • We consider the .c file to be the module’s implementation and the .h file to be the module’s interface. We’ll just use the terms ‘source file’ and ‘header’.

  • All items exported from a source file must be declared in the header. Declarations for variables (although rare) must use the extern keyword, otherwise storage for the variable will be allocated in every source file that includes the header containing the variable definition.

  • All items not-exported from a module must be declared to be static.

  • We import a module by including its header. Never give extern or forward declarations for imported functions in source files. Always include the header of the module instead.

  • Each header must include any other headers on which it depends. Hence it’s imperative every header be protected against multiple inclusion. Also, take care to avoid circular dependencies.

  • Always include system headers using the angle brackets syntax, rather than double quotes. That is #include <stdio.h>. Plasma-specific headers should be included using the double quotes syntax. That is #include "pz_run.h" Do not put root-relative or ‘..’-relative directories in #includes.

  • Includes should be organised into 3 groups, separated by a blank line: system includes, pz_common.h, other Plasma includes. Each group should be sorted alphabetically where possible.

File names

C language source and header files should begin with the prefix pz_. The C language does not have a namespace concept, keeping to our own namespace can make linking, and debugging linked programs, easier.

Organization within a file

Sometimes a file (header or source file) will cover multiple concepts. In these cases the order above may be broken in order to keep things with the same concept together. For example, this may mean placing a struct followed by the functions that operate on it, followed by a global variable, and the functions that operate on it.

In some cases the environment may force a different order. For example C preprocessor macros may need to be placed in a specific order.

Generally items within a file should be organised as follows:

Source files

Items in source files should in general be in this order:

  1. Prologue comment describing the module.

  2. #includes

  3. Any local #defines.

  4. Definitions of any local (that is, file-static) global variables.

  5. Prototypes for any local (that is, file-static) functions.

  6. Definitions of functions.

Within each section, items should generally be listed in top-down order, not bottom-up. That is, if foo() calls bar(), then the definition of foo() should precede the definition of bar().

Header files

Items in headers should in general be in this order: typedefs, structs, unions, enums, extern variable declarations, function prototypes then #defines

Every header should be protected against multiple inclusion using the following idiom:

#ifndef MODULE_H
#define MODULE_H

/* body of module.h */

#endif // ! MODULE_H

Update headers to use the new style comment

File encoding

  • Files should be saved as ascii, or UTF-8 and must use unix style (LF) line endings.

  • Lines must not be more than 77 columns long.

  • Indentation is to be made with spaces, usually four spaces.

  • One line of vertical whitespace should usually be used to seperate top-level items and sections within an item. Two lines may be used at the type level to create more separation when desired.

TODO editor hint for vim.

Long lines

If a statement is too long, continue it on the next line indented two levels deeper (but less or more is okay depending on the situation).

Break the line after an operator:

int var = really really long expression +
        more of this expression;

And usually at an outer element if possible, this could be the assignment operator itself.

int var = (expr1 + expr2) *
        (expr3 + expr4);

Sometimes line-breaking can be done nicely by naming a sub-expression, give it a meaningful name:

int sub_expr = some rather complex but separate expression;
int var = foo(a + b, sub_expr);

You may choose to align sub-expressions during breaking. This is recommended when an expression is broken over several lines. Even though name is short we give it its own line because the other expressions are long.

int var = fprintf("%s: %d, %s\n",
                  name,
                  some detailed and rather long expression,
                  a comment);

When things that may need wrapping occur at different depths within an expression then different levels of indentation can help convey that depth:

int var = fprintf("%s: %d, %s\n",
                  name,
                  foo(some detailed and long expression,
                      another detailed and long expression),
                  a comment);

These two sub-expressions are aligned, but they don’t have to be (see Tables below).

Sometimes breaking early can allow you to align things towards the left and give them more room. For example we prefer:

static PZ_Proc_Symbol builtin_setenv = {
    PZ_BUILTIN_C_FUNC,
    { .c_func = builtin_setenv_func },
    false
};

While clang-format prefers:

static PZ_Proc_Symbol builtin_setenv = { PZ_BUILTIN_C_FUNC,
                                         {.c_func = builtin_setenv_func},
                                         false };

Naming conventions

Functions, function-like macros, and variables

Use all lowercase with underscores to separate words. For instance, soul_machine.

Enumeration constants, #define constants, and non-function-like macros

Use all uppercase with underscores to separate words. For instance, MAX_HEADROOM.

Typedefs

Use first letter uppercase for each word, other letters lowercase and underscores to separate words. For instance, Directory_Entry.

Structs and unions

If something is both a struct and a typedef, the name for the struct should be formed by appending ‘_Struct’ to the typedef name:

typedef struct Directory_Entry_Struct {
    ...
} DirectoryEntry;

For unions, append ‘_Union’ to the typedef name.

Portability and Standards

Our minimum requirements from the C environment are C99 (may move to C11 in the future) on a POSIX.1-2008 environment, this may change as dependencies are added in this early stage of development, however those changes should be carefully reviewed, and if possible they should be optional.

Differences between operating systems and the use of a tool like autoconf should be handled by having different configurations available via different Makefiles and header files. We will revisit this when development reaches that stage. Autoconf should be avoided, it brings only pain.

While it’s best to keep things portable, if you need a non-standard API, or an API that’s different on each operating system. You should make it available by a macro or protecting it by #ifdefs.

Data types

C99 provides many basic data types, char, short, int etc. All being defined to be at least a certain size. These should be used when the size doesn’t exactly matter. For example use bool for booleans and int or unsigned when you’re counting a normal amount of something - you should not need to use the macros such as INT_MAX. When size matters the inttypes.h types are strongly recommended, including the fast types, eg: uint_fast32_t and their macros.

float should be used in preference to double which is seldom necessary and uses more memory. Don’t rely on exact IEEE-754 semantics.

Where C99 does not specify the representation of signed values, we will assume 2’s complement arithmetic (we’re not exactly C99 pure).

Endianness and alignment may not be assumed. If laying out a structure manually align each member based on its size.

Operating system specifics

Operating system APIs differ from platform to platform. Although most support standard POSIX calls such as read, write and unlink, you cannot rely on the presence of, for instance, System V shared memory. Adhere to POSIX-supported operating system calls whenever possible since they are widely supported, even by Windows.

The CFLAGS variable in the Makefile will request that modern C compilers fail to compile Plasma if it uses non-POSIX APIs.

CFLAGS=-std=c99 -D_POSIX_C_SOURCE=200809L -Wall -Werror

When POSIX doesn’t provide the required functionality, ensure that the operating system specific calls are localised.

Compiler and C library specifics

We require a C99 compiler. However many compilers often provide non-standard extensions. Ensure that any use of compiler extensions is localised and protected by #ifdefs. Don’t rely on features whose behaviour is undefined according to the C99 standard. For that matter, don’t rely on C arcana even if they are defined. For instance, setjmp/longjmp and ANSI signals often have subtle differences in behaviour between platforms.

If you write threaded code, make sure any non-reentrant code is appropriately protected via mutual exclusion. The biggest cause of non-reentrant (non-thread-safe) code is function or module-static data. Note that some C library functions may be non-reentrant. This may or may not be documented in the man pages.

Environment specifics

This is one of the most important sections in the coding standard. Here we mention what other tools Plasma may depend on.

Tools required for Plasma

In order to build Plasma you need: * A POSIX (1-2008) system/environment. * A shell compatible with Bourne shell (sh) * GNU make * A C99 compiler * Mercury 14.01.1 or newer.

Documenting the tools

If further tools or libraries are required, you should add them to the above list. And similarly, if you eliminate dependence on a tool, remove it from the above list.

Syntax

Basic layout (line length, indentation etc) is covered above in File encoding.

General rules

Curly brackets

Curly brackets should be placed at the end of the opening line, and on a new line not-indented at the end:

if (condition) {
    ...
}

Except for functions, which should have the opening curly on a new line.

int
foo(arg)
{
    ...
}

If the opening line is split between multiple lines, such as a long condition in an if-then-else, then place the opening curly on a new line to clearly separate the condition from the body:

if (this_is_a_somewhat_long_conditional_test(
        in_the_condition_of_an +
        if_then))
{
    ...
}

Space between tokens

There should be a space between the statement keywords like if, while, for and return and the next token. The return value should not be parenthesised. There should also be a space around an operator.

There should be no space between the function-like keywords like sizeof and their argument list. There also be no space between a cast and its argument.

Pointer declarations

Attach the pointer qualifier to the variable name.

char *str1, *str2;

This avoids confusion that might occur when the pointer qualifier is attached to the type.

char* str1, not_really_a_str;

Statements

Use one statement per line.

Large control-flow statements

Use an +// end + comment if the if statement, switch or loop is quite large, particularly if there are multiple nested structures. It may be helpful to describe the condition of the branch in this comment.

if (blah) {
    // Use curlies, even when there's only one statement in the block.
    ...
    // Imagine dozens of lines here.
    ...
} // end if

Tiny control-flow structures

An exception to the above rule about always using curlies, is that an if statement may omit the curlies if its body is a single return or goto instruction and is placed on the same line.

file = fopen("file.txt", "r");
if (NULL != file) goto error;

or

file = fopen("file.txt", "r");
if (NULL != file) {
    goto error;
}

but not:

file = fopen("file.txt", "r");
if (NULL != file)
    goto error;

and not:

if (a_condition)
    do_action();

Additionally, if one branch uses curlies then all must use curlies. Do not mix styles such as:

if (a_condition) goto error;
else {
    do_something();
}

And if the condition covers multiple lines, then the body must always appear within curlies (with the opening curly on its own line as noted above).

if (0 == read_proc(file, imported, module, code_bytes,
                   proc->code_offset, &block_offsets[i]))
{
    goto end;
}

Conditions

To make clear your intentions, do not rely on the zero / no-zero boolean behaviour of C. This means explicitly comparing a value:

if (NULL != file) goto error

If using the equality operator ==, use a non-lvalue on the left-hand-side if possible. This way the comparison can not be mistaken for an assignment.

if (0 == result) {
    ...
}

Switch statements

Case labels should be indented one level, which will indent the body by two levels.

Switch statements should usually have a default case, even if it just calls abort(). If the switched-on value is an enum, the default may be omitted since the compiler will check that all the possible values are covered.

Fall through switch cases

If a switch case falls through, add a comment to say that this is deliberately intended.

switch (var) {
    case A:
        ...
        break;
    case B:
        ...
        // fall-through
    case C:
        ...
        break;
}

Curlies in cases

If a case requires local variable declarations, place the curlies like this:

    ...
case A: {
    int foo;
    ...
    break;
}
case B:
    ...

Loops

Loops that end in a non-obvious way, such as infinite while loops that use break to end the loop. Should be documented. You’ll need to use judgement about when this is needed.

// Note that the loop will exit when ...
while (true) {
    ...
    if (some condition)
        break;
    ...
}

or

while (everything_is_okay) {
    ...
    if (some condition) {
        // Exit the loop on the next iteration.
        everything_is_okay = false;
    }
    ...
}

Functions

Function names are flush against the left margin. This makes it easier to grep for function definitions (as opposed to their invocations). In argument lists, put space after commas. Include parameter names in the declaration as this can aid in documentation.

Unlike other code blocks, the open-curly for a function should be placed on a new line.

int
rhododendron(int a, float b, double c)
{
    ...
}

If the parameter list is very long, then you may wish, particularly for long or complex parameter lists, place each parameter on a new line aligning them. Aligning names as in variable definition lists is also suggested.

int
rhododendron(int a_long_parameter,
             struct AComplexType* b,
             double c)
{
    ...
}

Variables

Variable declarations shouldn’t be flush left, however.

int x = 0,
    y = 3,
    z;
int a[] = {
    1,2,3,4,5
};

When defining multiple variables or structure fields or in some cases function parameters, then lining up their names is recommended. This also applies to structure and union fields.

There should be one line of vertical space between the definition list and the next statement.

char        *some_string;
int          x;
MyStructure *my_struct;

if (...) {

Enums or defines?

Prefer enums to lists of #defines. Note that enums constants are of type int, hence if you want an enumeration of chars or shorts, then you must use lists of #defines.

Preprocessing

Nesting

Nested #ifdefs, #ifndefs and #ifs should be indented by two spaces for each level of nesting. For example:

#ifdef GUAVA
  #ifndef PAPAYA
  #else // PAPAYA
  #endif // PAPAYA
#else // not GUAVA
#endif // not GUAVA

Multi-line macros

When continuing a macro on an new line, line the \ up o the right in the same column.

#define PZ_WRITE_INSTR_1(code, w1, tok)       \
    if (opcode == (code) && width1 == (w1)) { \
        token = (tok);                        \
        goto write_opcode;                    \
    }

Comments

What should be commented

Functions

Use your judgement for whether a function should be commented. Sometimes the function name and parameter names will provide a lot of information. However for more complex functions a comment will be necessary. Comments are strongly recommended when:

  • They have side-effects

  • They require an input to be sorted, non-null or similar.

  • They have different semantics when an input has a different value (they should be separate functions if they do a different function).

  • They allocate memory that the caller is now responsible for.

  • They return statically allocated memory (try to avoid this).

  • They free memory.

  • They return certain values (non-zero, -1 etc) for errors.

  • They ain’t thread safe or reenterant.

Macros

Each non-trivial macro should be documented just as for functions (see above). It is also a good idea to document the types of macro arguments and return values, e.g. by including a function declaration in a comment.

Parameters to macros should be in parentheses.

#define STREQ(s1,s2) (strcmp((s1),(s2)) == 0)

This ensures than when a complex expression is passed as a parameter that different operator precedence does not affect the meaning of the macro.

Headers

Such function comments should be present in header files for each function exported from a source file. Ideally, a client of the module should not have to look at the implementation, only the interface. In C terminology, the header should suffice for working out how an exported function works.

Source files

Every source file should have a prologue comment which includes:

  • Copyright notice.

  • License info

  • Short description of the purpose of the module.

  • Any design information or other details required to understand and maintain the module (may be links to other documents).

Describe the exact format in use and ensure that all the C code conforms to this.

Global variables

Any global variable should be excruciatingly documented. This is especially true when globals are exported from a module. In general, there are very few circumstances that justify use of a global.

Comment style

Block comments.

Use comments of this form:

/*
 * This is a block comment,
 * it uses multiple lines.
 * It should have a blank line before it and it comments the declaration,
 * definition, block or group of statements immediately following it.
 */

For annotations to a single line of code:

i += 3; // Add 3.

Note that the // comment is standard in C99, which we are using. If the comment fits on one line, even if it describes multiple lines, a single line comment is okay:

// Add 3.
i += 3;

However if the comment is important, or the thing it documents is significant. Then use a block comment.

Guidelines for comments

Revisits

Any code that needs to be revisited because it is a temporary hack (or some other expediency) must have a comment of the form:

/*
 * XXX: <reason for revisit>
 *  - <Author name>
 */

The <reason for revisit> should explain the problem in a way that can be understood by developers other than the author of the comment. Also include the author of this comment so that a reader will know who to ask if they need further information.

"TODO" and "Note" are also common revisit labels. Only "XXX" requires the author’s name.

Comments on preprocessor statements

The #ifdef constructs should be commented like so if they extend for more than a few lines of code:

#ifdef SOME_VAR
    ...
#else // ! SOME_VAR
    ...
#endif // ! SOME_VAR

Similarly for #ifndef. Use the GNU convention of comments that indicate whether the variable is true in the #if and #else parts of an #ifdef or #ifndef. For instance:

#ifdef SOME_VAR
#endif // SOME_VAR

#ifdef SOME_VAR
    ...
#else // ! SOME_VAR
    ...
#endif // ! SOME_VAR

#ifndef SOME_VAR
    ...
#else // SOME_VAR
    ...
#endif // SOME_VAR

Using formatting tools

TODO

Tables

When code or data is tabular then using a tabular layout makes the most sense. This may be something formatters cannot handle, some will allow you to describe excisions.

We don’t have a good example of this in the code base, however the data in pz_builtin.c could probably be set out in a table. If it were it might look like:

static PZ_Proc_Symbol builtins[] = {
  { PZ_BUILTIN_C_FUNC, {.c_func = builtin_setenv_func}, false },
  { PZ_BUILTIN_C_FUNC, {.c_func = builtin_free_func},   false }
};

Defensive programming

Asserts and debug builds

TODO

Statement macros must be single statements

Macros should either be expressions (they have a value) or statements (they do not), this must always be clear. If necessary make a single statement using a block. The do {} while (0) pattern is not necessary since bodies of if statments may not be macros without their own curly brackets.

#define PZ_WRITE_INSTR_1(code, w1, tok)       \
    if (opcode == (code) && width1 == (w1)) { \
        token = (tok);                        \
        goto write_opcode;                    \
    }

Macros should not evaluate parameters more than once

C expressions may have side-effects, this is okay most of the time but can lead to confusion with macros. A macro can evaluate it’s parameters more than once. Avoid doing this in your macros, and if you must add a comment explaining that this can happen.

Tips

  • Limit module exports to the absolute essentials. Make as much static (that is, local) as possible since this keeps interfaces to modules simpler.

  • Use typedefs to make code self-documenting. They are especially useful on structs, unions, and enums. Use them on the struct or union’s forward declaration or header declaration when the definition is provided elsewhere.

Tracing macros

TODO