Skip to content
This repository has been archived by the owner on Feb 2, 2023. It is now read-only.

Multicore OCaml

Leo White edited this page Sep 10, 2013 · 16 revisions

Git branch: multicore

This project aims to add support for executing multiple threads of OCaml code in parallel within the same process. There are two sides to this project:

  1. Creating a multicore OCaml runtime.
  2. Creating programming models and libraries to support parallel programming in OCaml.

Multicore runtime

The basic idea of this runtime is to allow multiple OCaml runtimes (called contexts) to exist in the same process, and then to add a shared heap for objects that can be accessed by multiple contexts.

See Multicore runtime for a more detailed discussion.

Status

We are able to create multiple OCaml contexts within a single process. In byte-code, the read and write barriers have been updated and successfully promote objects to the shared heap. Objects on the shared heap are not currently collected.

Things to be done

Add read barrier to native code output

We need to add read barrier into the output from ocamlopt for mutable reads. This should be triggered with some kind of "-multicore" command-line option.

Add proxy objects

We need to add [proxy objects](Multicore runtime#proxy-objects).

Add a garbage collector to the shared heap

We need to add a garbage collector to the shared heap.

Parallel programming models

We would like to implement a number of parallel programming models using the multicore runtime. The following, in order of complexity, would be useful.

Threads library

OCaml's builtin Threads library needs to be extended with functions for creating and manipulating contexts. It's synchronisation primitives also need to be modified to protect from threads in other contexts.

ParMap library

The ParMap library provides parallel map functionality using multiple processes. This interface could probably reimplemented more efficiently using the multicore runtime.

Ocaml-Java's concurrent library

Xavier Clerc has created a parallel OCaml library for use in the OCaml-Java project. It provides most basic multi-threaded constructs, and would be useful to implement using the multicore runtime. This would allow us to compare OCaml's multicore runtime with the JVM.

Monadic task library

Monadic concurrency libraries like Lwt and Async provide a good model for concurrent programming in OCaml. Creating a similar library that can execute multiple tasks in parallel would be an effective model for parallel programming.

Multiple tasks would be scheduled using work-stealing (with context-local dequeues accessed via proxies).

This library could also be extended with features such as concurrent revisions.

Actors with no shared state

Ensuring no shared state

Programming using actors that share no (mutable) state is an effective parallel programming model that can safely be scaled up to distributed systems. However ensuring that actors do not share mutable state is difficult in OCaml. OCaml does not track side-effects and abstraction allows you to easily hide mutable state inside a module.

One possible solution is to enforce a lack of shared state is not to allow different nodes to share any modules. Thus two nodes would effectively be separate programs that happened to share an OCaml runtime. Note that they could still share the code for modules, just not the actual module record.

To create these nodes we could use a mechanism similar to pack. It would construct an isolated node module and only expose message passing interfaces for communicating with it. This translation is similar to that of the big funcotr patch.

Note that nodes created in this way would not require the read barrier, so we can effectively wrap a single-threaded OCaml program into a node and safely include it within a multi-threaded OCaml program.

Message passing

Message passing using copying channels is also difficult to support in OCaml. Copying is not in general a type-safe operation. This means that we require modules to expose an indication that their contents can be safely copied. Thus we have an interface:

val new_channel: 'a copyable -> 'a copy_channel

Here copyable is essentially a type-class. Creating values of 'a copyable would only be supported through a type-conv style interface:

type t = 
    Foo 
  | Bar of s 
with copyable

The copyable value also allows us to only actually copy mutable types. Only the mutable parts of a type need to be copied and this could be expressed in the copyable value. With some clever use of cross-module inlining this should provide very efficient channels for passing immutable values.

Note that it would not be possible to make types containing closures (including objects) copyable, because a closure may capture some mutable state. However, channels would be copyable, so a function could be replaced with a channel that accepted input and a return channel, called the function on its own node, and then sent the result down the return channel.