Mercury Grades

Paul Bone
<paul@plasmalang.org>
version 0.1, March 2018
Draft. Copyright © 2018 Plasma Team. License: CC BY-SA 4.0

Plasma is written in Mercury (at least until we reach a self-hosting stage), which means that if you want to compile Plasma (for example, to contribute to it) you’ll need to build Mercury. There are a couple of shortcuts, but the long way means navigating Mercury’s grade system. Mercury supports many different "grades"; each one is a collection of settings for how to build and link a Mercury program or library. Each grade is made up of grade components separated by periods (.).

Shortcuts:

  • If you just want to run Plasma, without compiling it, then try this static build. (TODO: better static builds).

  • If you want to build Plasma on x86 or x86_64 on a .deb-based Linux system, then use the Debian packages, edit the Makefile to uncomment any optional settings (such as debugging), and then type "make".

  • If you want to build Plasma on a non-.deb system on x86 or x86_64, then you’ll have to build Mercury. I suggest installing the asm_fast.gc and asm_fast.gc.decldebug.stseg grades. Remember to tell ./configure which grades you need, otherwise it’ll try to build all of them, which could take a long time (TODO: provide detailed instructions).

  • If you have some other type of system, or are building something other than Plasma but found this document, then read on.

The Mercury project documents its grade components here (retrieved 2018-03-04), and I will clarify some points made there. When I retrieved it, that manual mentioned a few grade components that aren’t worth attempting to use:

hl

The hl grade component is just like the hlc grade but uses a different format for data on the heap. It doesn’t provide a significant advantage over hlc, so it isn’t considered useful.

il

A deleted .NET backend.

agc

A bit-rotted garbage collector.

threadscope

A bit-rotted profiling system; the latest version of its viewer can no longer open profiles generated by Mercury.

mm and probably others

Alternative evaluation strategies for logic programming. You probably don’t need these, and if you do, someone else will tell you.

rbmm

Region-based memory management, an advanced optimisation for memory allocation. AIUI it only works for single-module programs, so it isn’t practically useful.

There are many other ‘secret’ grade components not covered here or in the User’s guide. They are mostly experimental and include grades like rbmm. If you think they should be documented here then please let us know.

Base grade

Everything starts with a base grade, which selects the compilation backend you wish to use. Some backends have more than one base grade, and there are two C backends. Exactly one base grade must be part of every valid grade string.

Low-level C

none, reg, jump, asm_jump, fast or asm_fast

High-level C

hlc

C#

csharp

Java

java

Erlang

erlang
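To make the "exactly one base grade" rule concrete, here’s a minimal Python sketch that splits a grade string on periods and checks for a single base grade. It isn’t part of Mercury or Plasma: BASE_GRADES and parse_grade are hypothetical names, and the sketch doesn’t check whether the remaining components are valid or compatible with each other.

```python
from __future__ import annotations

# Base grades from the list above, mapped to the backend each one selects.
BASE_GRADES = {
    "none": "low-level C",
    "reg": "low-level C",
    "jump": "low-level C",
    "asm_jump": "low-level C",
    "fast": "low-level C",
    "asm_fast": "low-level C",
    "hlc": "high-level C",
    "csharp": "C#",
    "java": "Java",
    "erlang": "Erlang",
}


def parse_grade(grade: str) -> tuple[str, list[str]]:
    """Split a grade such as "asm_fast.gc" into (base grade, other components).

    Raises ValueError unless exactly one base grade is present.
    """
    components = grade.split(".")
    bases = [c for c in components if c in BASE_GRADES]
    if len(bases) != 1:
        raise ValueError(f"{grade!r} must contain exactly one base grade, found {bases}")
    return bases[0], [c for c in components if c not in BASE_GRADES]


if __name__ == "__main__":
    base, rest = parse_grade("asm_fast.gc.decldebug.stseg")
    print(base, BASE_GRADES[base], rest)
    # prints: asm_fast low-level C ['gc', 'decldebug', 'stseg']
```

For example, parse_grade("hlc.asm_fast.gc") raises ValueError because it names two base grades, and parse_grade("gc.par") raises because it names none.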

If you need to call C#, Java or Erlang foreign code then the choice is fairly obvious. If you need to work with C foreign code, as the Plasma compiler does, then things are more complicated. For a long time the Low-level C backend generated faster code than the High-level one, at least when comparing the asm_fast and hlc grades. These days, due to changes in the C compilers, it depends on the program being run.

Choosing a low-level C grade

If you might use a low-level C grade, read this section.

The low-level C grades use a combination of three optimisations (hacks) provided by GCC. With all three disabled the base grade is none; with all three enabled it’s asm_fast.

Table 1. Low-level C Optimisations

Grade     GCC global registers  GCC Non-local GOTOs  ASM Labels  Useful
none      N                     N                    N           Y
reg       Y                     N                    N           Y
jump      N                     Y                    N           N
fast      Y                     Y                    N           N
asm_jump  N                     Y                    Y           N
asm_fast  Y                     Y                    Y           Y

Of course you want as much optimisation as possible, so choose asm_fast; however, not all compilers (including GCC) fully support these GCC extensions, so these grades may not work. Note that ASM labels cannot be used without GCC non-local gotos, so there are no grades that use ASM labels without non-local gotos. Note also the "Useful" column: those are the grades worth testing. The others are only of interest to researchers, since if they work, it’s almost certain that asm_fast works too.

So choose, in order of preference: asm_fast, reg, then none. On x86 and x86_64 Linux with GCC or Clang, asm_fast works (although a future version of GCC or Clang could break this). On OS X I think only none works, but I don’t remember for sure.
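Table 1 and the preference order above can also be written down as data. This Python sketch is only an illustration: LOW_LEVEL_C_GRADES, base_grade_for and pick_base_grade are hypothetical names, and the works callback is an assumption standing in for however you test a grade with your C compiler (for instance, by building a small program in it).

```python
from __future__ import annotations

from typing import Callable

# Table 1 as data: base grade -> (GCC global registers,
# GCC non-local gotos, ASM labels).
LOW_LEVEL_C_GRADES = {
    "none": (False, False, False),
    "reg": (True, False, False),
    "jump": (False, True, False),
    "fast": (True, True, False),
    "asm_jump": (False, True, True),
    "asm_fast": (True, True, True),
}

# The grades marked "Useful" in Table 1, most preferred first.
PREFERENCE = ["asm_fast", "reg", "none"]


def base_grade_for(regs: bool, gotos: bool, labels: bool) -> str:
    """Return the low-level C base grade that enables exactly these hacks."""
    if labels and not gotos:
        # No grade exists for this combination.
        raise ValueError("ASM labels require GCC non-local gotos")
    for grade, features in LOW_LEVEL_C_GRADES.items():
        if features == (regs, gotos, labels):
            return grade
    raise AssertionError("unreachable: Table 1 covers every other combination")


def pick_base_grade(works: Callable[[str], bool]) -> str:
    """Return the first grade in PREFERENCE for which works(grade) is true."""
    for grade in PREFERENCE:
        if works(grade):
            return grade
    raise RuntimeError("none of the useful low-level C grades work here")


if __name__ == "__main__":
    print(base_grade_for(True, False, False))  # reg
    # Pretend asm_fast fails with this compiler but reg is fine.
    print(pick_base_grade(lambda grade: grade != "asm_fast"))  # reg
```

Only six of the eight on/off combinations map to a grade; the two with ASM labels but no non-local gotos raise an error, matching the note above.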

High level C

As mentioned above, hlc and asm_fast are (IIRC) comparable performance-wise. Which one you choose will depend on whether your C compiler can handle asm_fast and on what other features you need (see below). For example, if you want to use the declarative debugger then you must use a low-level C grade; if the only low-level C grade that works for you is none, then that’s the best you can do.

More grade components

The complete grade is built by adding grade components to select different features, separated by periods.

Garbage collection

gc or absent.

gc selects the Boehm GC, the only supported GC. Omitting gc means that no GC will be built; note, however, that the Java, C# and Erlang backends provide their own GC anyway, so gc does not make sense for them.

(agc bit-rotted long ago, and hgc was an experiment that was never completed.)

You should always include gc when using a C backend. Omitting it is intended only for testing.

Thread safety

par or absent

Like the gc option, this only makes sense on C grades. Grades that include par are thread safe and support the functions in the thread module of the standard library. The Java, C# and Erlang grades support this anyway.

Low level C

The threading model is N:M, and IO can block a whole "engine" (worker thread). The parallel conjunction operator and the very experimental automatic parallelism work are supported; this is the only combination of base grade and par that supports these features.

High level C

This uses the OS’s native threads and IO works properly.

Plasma doesn’t need thread safety in any of its Mercury programs.

Stack segmentation

stseg or absent

Meaningful only on low-level C grades, where Mercury manages its own stack. Use a segmented stack so that:

  • The program is more tolerant of deep recursion where TCO/LCO were not used or not available.

  • The memory cost of a thread in par grades is much lower.

This is recommended when par is used and can also help with debugging and deep profiling.

Apart from having a name similar to "trseg" (a segmented trail) and sharing some basic low-level concepts, this is not functionally related to trailing.

Single precision float

spf or absent

Use C’s float type for floating-point numbers rather than double. This is much faster on 32-bit platforms, where Mercury’s floats normally require boxing because a double doesn’t fit in a machine word, but your program may produce different results.

Only meaningful in C grades (I think).
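To see why results may differ, this Python sketch uses the standard struct module to round a double to the nearest single-precision value, which is roughly what storing it in a C float does (assuming IEEE 754 formats). It doesn’t involve Mercury at all; to_single is a hypothetical helper.

```python
import struct


def to_single(x: float) -> float:
    """Round a Python float (a C double) to the nearest single-precision value."""
    # "f" packs into 4 bytes as a C float, then unpacks back into a double
    # that holds the rounded value.
    return struct.unpack("f", struct.pack("f", x))[0]


if __name__ == "__main__":
    print(0.1, to_single(0.1))                # 0.1 0.10000000149011612
    # 2**24 + 1 needs 25 significant bits; a float only has 24.
    print(16777217.0, to_single(16777217.0))  # 16777217.0 16777216.0
```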

Debugging

debug, decldebug, ssdebug or absent

Which type of debugging to support, if any. Note that decldebug is a superset of debug, so you might as well use it instead of plain debug. ssdebug is a totally separate debugger suitable for the "MLDS" backends (high-level C, C#, Java and Erlang).
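The advice above boils down to a small rule, sketched here in Python under the assumption that you just want the most capable debugger for your base grade. debug_component is a hypothetical helper, and it ignores the other compatibility constraints in the grade compatibility table.

```python
# Low-level C (LLDS) and "MLDS" base grades, as described above.
LLDS_BASES = {"none", "reg", "jump", "asm_jump", "fast", "asm_fast"}
MLDS_BASES = {"hlc", "csharp", "java", "erlang"}


def debug_component(base: str) -> str:
    """Suggest a debugging grade component for a base grade."""
    if base in LLDS_BASES:
        # decldebug is a superset of debug, so there's no reason to pick debug.
        return "decldebug"
    if base in MLDS_BASES:
        # ssdebug is the separate debugger aimed at the MLDS backends.
        return "ssdebug"
    raise ValueError(f"unknown base grade {base!r}")


if __name__ == "__main__":
    print(f"asm_fast.gc.{debug_component('asm_fast')}")  # asm_fast.gc.decldebug
    print(debug_component("java"))  # ssdebug
```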

Profiling

prof, memprof, profdeep or absent

Which type of profiling to support, if any. prof and memprof have a similar workflow. profdeep is a very advanced profiler and worth considering.

These only make sense with low-level C grades.

We’re not concerned about the performance of Plasma’s compiler until well after bootstrapping, so you won’t need profiling for Plasma.

Trailing

tr, trseg or absent.

Enable trailing support. Trailing is a technique for undoing destructive updates on backtracking. If you don’t know what it is, then you probably don’t need it. If you do need it, note that tr is generally discouraged in favour of trseg.

I believe this option is supported with all the C backends.
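For the curious, here’s a minimal Python sketch of the general idea of a trail: every destructive update records the old value, and backtracking to a choice point undoes the updates made since then, newest first. This only illustrates the technique; it isn’t how Mercury’s tr or trseg grades implement trailing, and Trail and its methods are hypothetical names.

```python
class Trail:
    """Minimal trail: log old values so destructive updates can be undone."""

    def __init__(self):
        self._entries = []  # (container, key, old_value), oldest first

    def choice_point(self) -> int:
        """Mark the current trail position, as a choice point would."""
        return len(self._entries)

    def assign(self, container, key, value):
        """Destructively update container[key] (which must already exist),
        recording the old value on the trail first."""
        self._entries.append((container, key, container[key]))
        container[key] = value

    def backtrack(self, mark: int):
        """Undo every update made since `mark`, newest first."""
        while len(self._entries) > mark:
            container, key, old = self._entries.pop()
            container[key] = old


if __name__ == "__main__":
    trail = Trail()
    state = {"x": 1}
    mark = trail.choice_point()
    trail.assign(state, "x", 2)
    trail.assign(state, "x", 3)
    trail.backtrack(mark)
    print(state)  # {'x': 1}
```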

Grade compatibility

Table 2. Grade component compatibility matrix

                 asm_fast[1]  hlc   java  csharp  erlang  gc  par   stseg  tr/trseg  debug/decldebug  ssdebug  prof/memprof  profdeep
asm_fast         -            N     N     N       N       R   Y[2]  Y      Y         Y                y        Y             Y
hlc              N            -     N     N       N       R   Y[3]  N      Y         N                Y        y             N
java             N            N     -     N       N       n   n     N      ?D        N                Y        N             N
csharp           N            N     N     -       N       n   n     N      ?D        N                Y        N             N
erlang           N            N     N     N       -       n   n     N      ?D        N                Y        N             N
gc               Y            Y     n     n       n       -   Y     Y      Y         Y                y        Y             Y
par              Y[2]         Y[3]  n     n       n       Y   -     R      ?         ?D               ?        ?D            N
stseg            Y            N     N     N       N       Y   Y     -      Y         Y                y        y             Y
tr/trseg         Y            Y     ?D    ?D      ?D      Y   ?     Y      -         Y                ?        y             ?
debug/decldebug  Y            N     N     N       N       Y   ?D    R      Y         -                D        ?D            ?D
ssdebug          y            Y     Y     Y       y       y   y     y      y         D                -        ?             ?
prof/memprof     Y            ?     N     N       N       Y   ?D    y      y         ?D               ?        -             N
profdeep         Y            N     N     N       N       Y   N     R      ?D        ?D               ?        N             -

Y   Compatible
y   Probably compatible
N   Not compatible
n   Not compatible, but support is implied by the base grade
?   Don’t know
?D  Don’t know, but I doubt it
R   Recommended: add the column grade component if you’re using the row grade component

[1] asm_fast could mean any of the LLDS (low-level C) base grades; see Table 1.

[2] asm_fast.par supports parallel conjunction and the experimental auto-parallelism. It uses green threads; however, IO will block an entire worker thread. You may be able to avoid that with spawn.native.

[3] hlc.par does not support parallel conjunction or auto-parallelism. It uses pthreads, so it works correctly with IO.
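Table 2 can be turned into a quick checker for a candidate grade string. This Python sketch is an illustration built on several assumptions, not a Mercury tool: the names are hypothetical, footnote markers are dropped, the asm_fast row and column stand for every low-level C base grade (footnote 1), and components missing from the table (such as spf) are ignored. incompatible_pairs reports pairs marked N in either direction, and recommendations lists R components that aren’t already in the grade.

```python
from __future__ import annotations

from itertools import combinations

# Rows and columns of Table 2, in the same order. Footnote markers are
# dropped, and "-" marks the diagonal.
COLUMNS = ["asm_fast", "hlc", "java", "csharp", "erlang", "gc", "par",
           "stseg", "tr/trseg", "debug/decldebug", "ssdebug",
           "prof/memprof", "profdeep"]
TABLE = {
    "asm_fast":        "-  N  N  N  N  R  Y  Y  Y  Y  y  Y  Y",
    "hlc":             "N  -  N  N  N  R  Y  N  Y  N  Y  y  N",
    "java":            "N  N  -  N  N  n  n  N  ?D N  Y  N  N",
    "csharp":          "N  N  N  -  N  n  n  N  ?D N  Y  N  N",
    "erlang":          "N  N  N  N  -  n  n  N  ?D N  Y  N  N",
    "gc":              "Y  Y  n  n  n  -  Y  Y  Y  Y  y  Y  Y",
    "par":             "Y  Y  n  n  n  Y  -  R  ?  ?D ?  ?D N",
    "stseg":           "Y  N  N  N  N  Y  Y  -  Y  Y  y  y  Y",
    "tr/trseg":        "Y  Y  ?D ?D ?D Y  ?  Y  -  Y  ?  y  ?",
    "debug/decldebug": "Y  N  N  N  N  Y  ?D R  Y  -  D  ?D ?D",
    "ssdebug":         "y  Y  Y  Y  y  y  y  y  y  D  -  ?  ?",
    "prof/memprof":    "Y  ?  N  N  N  Y  ?D y  y  ?D ?  -  N",
    "profdeep":        "Y  N  N  N  N  Y  N  R  ?D ?D ?  N  -",
}

# Grade components that share a row/column in Table 2 (footnote 1: the
# asm_fast row and column stand for every low-level C base grade).
ALIASES = {
    "none": "asm_fast", "reg": "asm_fast", "jump": "asm_fast",
    "asm_jump": "asm_fast", "fast": "asm_fast",
    "tr": "tr/trseg", "trseg": "tr/trseg",
    "debug": "debug/decldebug", "decldebug": "debug/decldebug",
    "prof": "prof/memprof", "memprof": "prof/memprof",
}


def entry(row: str, col: str) -> str:
    """Table 2 code for using the `row` component together with `col`."""
    row, col = ALIASES.get(row, row), ALIASES.get(col, col)
    return TABLE[row].split()[COLUMNS.index(col)]


def known_components(grade: str) -> list[str]:
    """Canonical components of `grade`; ones not in Table 2 are skipped."""
    comps = [ALIASES.get(c, c) for c in grade.split(".")]
    return [c for c in comps if c in TABLE]


def incompatible_pairs(grade: str) -> list[tuple[str, str]]:
    """Pairs of components marked N (not compatible) in either direction."""
    return [(a, b) for a, b in combinations(known_components(grade), 2)
            if "N" in (entry(a, b), entry(b, a))]


def recommendations(grade: str) -> list[str]:
    """Components marked R in a used component's row but missing from `grade`."""
    comps = known_components(grade)
    recs: list[str] = []
    for row in comps:
        for col, code in zip(COLUMNS, TABLE[row].split()):
            if code == "R" and col not in comps and col not in recs:
                recs.append(col)
    return recs


if __name__ == "__main__":
    print(recommendations("asm_fast.par"))        # ['gc', 'stseg']
    print(incompatible_pairs("hlc.gc.profdeep"))  # [('hlc', 'profdeep')]
```

For example, recommendations("asm_fast.par") suggests gc and stseg, which matches the thread-safety grade listed under My favorite grades.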

My favorite grades

I use Linux on x86_64.

Default

asm_fast.gc, or maybe hlc.gc

Thread safety

asm_fast.par.gc.stseg

Debugging

asm_fast.gc.decldebug.stseg

Profiling

asm_fast.gc.profdeep.stseg