ReadComputeWrite (RCW)

James Edmondson edited this page Oct 3, 2024 · 5 revisions

C++ Guide Series
Architecture | Knowledge Base | Networking | Containers | RCW | Threads | Optimizations | KaRL | Encryption | Checkpointing | Knowledge Performance | Logging


Summary

ReadComputeWrite (RCW) is a paradigm for writing multithreaded code that is fast, efficient, and free of race conditions when working with the thread-safe knowledge base. This page explains why RCW is necessary, what it provides developers, and how to use RCW tools such as the Staged containers in MADARA effectively.

RCW Concept

The RCW concept separates accessing and updating a thread-safe context into distinct read, compute, and write phases. Because the compute phase works only on local copies, writes by other threads cannot change the inputs in the middle of a computation. This is vital in multithreaded and multiprocess applications, and is especially helpful whenever many readers are looking at a variable that may be updated by one or more other threads.

RCW Flow

  1. Read values from the thread-safe context into local variables (e.g., int64, double, or string)
  2. Perform all computation steps on the local variables, so the data stays consistent for the duration of the computation
  3. Write updated values from the local variables to the thread-safe context
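
The three steps above can be sketched in plain C++. This is a minimal illustration under stated assumptions, not MADARA code: `SharedContext`, `rcw_sum`, and the keys `"a"`, `"b"`, and `"sum"` are hypothetical names, and a mutex-protected map stands in for the thread-safe knowledge base.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-in for a thread-safe context (NOT the MADARA API).
class SharedContext
{
public:
  // Returns the stored value, or 0 if the key has never been set.
  int64_t get(const std::string & key) const
  {
    std::lock_guard<std::mutex> guard(mutex_);
    auto found = values_.find(key);
    return found == values_.end() ? 0 : found->second;
  }

  void set(const std::string & key, int64_t value)
  {
    std::lock_guard<std::mutex> guard(mutex_);
    values_[key] = value;
  }

private:
  mutable std::mutex mutex_;
  std::map<std::string, int64_t> values_;
};

// One RCW cycle: two locked reads, a lock-free computation, one locked write.
int64_t rcw_sum(SharedContext & context)
{
  // 1. read phase: copy the shared values into local variables
  const int64_t a = context.get("a");
  const int64_t b = context.get("b");

  // 2. compute phase: only locals are used, so no other thread can change
  //    the inputs while we compute
  const int64_t sum = a + b;

  // 3. write phase: publish only the data product
  context.set("sum", sum);
  return sum;
}

// Usage sketch: seed the context, run one cycle, check the data product.
void example_usage()
{
  SharedContext context;
  context.set("a", 2);
  context.set("b", 3);
  assert(rcw_sum(context) == 5);
  assert(context.get("sum") == 5);
}
```

Note that the inputs `"a"` and `"b"` are never written back; only the result of the computation is.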

RCW Benefits

  1. Computation that is as fast as possible, because it runs on local variables
  2. Data integrity throughout computation phase
  3. Controlled execution and fewer mutex calls

RCW Notes

  1. Always execute a read phase for every variable used in the computation. You only need to write the data products of the computation.

  2. Never write data to variables you do not own. Doing so is likely to overwrite an update from the thread or process that owns the data. Consider the following:

Agent 0 has two threads. The first thread updates its GPS position. The second reads its own position and the position of Agent 1 and determines whether the two agents are on a potential collision course. Agent 1 does the same thing.

Now, consider the situation where you always read positions into local variables, and you also always call write on all containers when you are cleaning up a compute phase. Here's how the execution would look if you do that.

Incorrect RCW

agent_0.gps.read()
agent_1.gps.read()

agent_0.can_collide = compute_collision(agent_0, agent_1)
agent_1.can_collide = *agent_0.can_collide // note another process owns this variable

agent_0.gps.write() // note another thread owns this variable
agent_1.gps.write() // note another process owns this variable

agent_0.can_collide.write() // literally the only proper RCW write
agent_1.can_collide.write() // note another process owns this variable

The result is that you would overwrite not only your own GPS thread's position updates, but also agent 1's position updates and its collision result.

Correct RCW

agent_0.gps.read()
agent_1.gps.read()

agent_0.can_collide = compute_collision(agent_0, agent_1)

agent_0.can_collide.write() // literally the only proper RCW write

This is how you ensure that the GPS values are local to your current computation and cannot be changed by any external entity. If your thread is the one responsible for tracking collision likelihood for agent_0, then the only data product that makes sense to write is the can_collide boolean above.
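
The difference between the two patterns can be made concrete with a small, deterministic simulation. The sketch below is an assumption-laden illustration, not MADARA code: positions are reduced to one dimension, the key names and the 1.0 distance threshold in `compute_collision` are hypothetical, and the GPS thread's update is simulated inline (during the compute phase) rather than run on a real thread.

```cpp
#include <cassert>
#include <cmath>
#include <map>
#include <string>

// Hypothetical stand-in for the shared context. A real knowledge base would
// lock internally; this sketch is single-threaded on purpose.
using Context = std::map<std::string, double>;

// Hypothetical collision test: positions closer than 1.0 may collide.
bool compute_collision(double own, double other)
{
  return std::fabs(own - other) < 1.0;
}

// Runs one cycle of agent 0's collision thread. gps_update simulates the GPS
// thread publishing a new position while the collision thread is computing.
// If write_everything is true, the stale GPS copy is written back as well,
// which mimics the incorrect pattern.
void collision_cycle(Context & context, double gps_update, bool write_everything)
{
  // read phase
  const double own = context["agent.0.gps"];
  const double other = context["agent.1.gps"];

  // compute phase (the GPS thread updates its own variable meanwhile)
  const bool can_collide = compute_collision(own, other);
  context["agent.0.gps"] = gps_update;

  // write phase
  if (write_everything)
  {
    context["agent.0.gps"] = own;  // stale copy clobbers the GPS update
  }
  context["agent.0.can_collide"] = can_collide ? 1.0 : 0.0;
}

// Usage sketch: the correct pattern keeps the GPS update, the incorrect
// pattern loses it.
void example_usage()
{
  Context correct{{"agent.0.gps", 0.0}, {"agent.1.gps", 0.5}};
  collision_cycle(correct, 7.0, false);
  assert(correct["agent.0.gps"] == 7.0);

  Context incorrect{{"agent.0.gps", 0.0}, {"agent.1.gps", 0.5}};
  collision_cycle(incorrect, 7.0, true);
  assert(incorrect["agent.0.gps"] == 0.0);
}
```

In both runs the collision flag is computed from the same consistent snapshot; only the incorrect variant discards the position that the GPS thread published in the meantime.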

RCW Classes

We currently support a few base RCW classes that simplify RCW mechanics.

  • IntegerStaged
  • DoubleStaged
  • StringStaged

Each of these provides read() and write() methods and an internal local variable that is extremely fast to use.
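
To illustrate the idea behind a staged container, here is a minimal sketch of a wrapper that keeps a local copy and synchronizes only on explicit read() and write() calls. It is not the MADARA implementation: `SharedSlot`, `Staged`, and the dereference operator used for local access are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical shared slot standing in for one knowledge base variable.
template <typename T>
struct SharedSlot
{
  std::mutex mutex;
  T value{};
};

// Illustrative staged wrapper (NOT the MADARA implementation): the local copy
// is only synchronized with the shared slot on read() and write().
template <typename T>
class Staged
{
public:
  explicit Staged(SharedSlot<T> & slot) : slot_(slot) {}

  // read phase: one lock, copy the shared value into the local variable
  void read()
  {
    std::lock_guard<std::mutex> guard(slot_.mutex);
    local_ = slot_.value;
  }

  // write phase: one lock, publish the local variable
  void write()
  {
    std::lock_guard<std::mutex> guard(slot_.mutex);
    slot_.value = local_;
  }

  // compute phase access: plain local variable, no locking
  T & operator*() { return local_; }

private:
  SharedSlot<T> & slot_;
  T local_{};
};

// Usage sketch: 1000 increments cost two lock acquisitions instead of 2000.
int64_t example_usage()
{
  SharedSlot<int64_t> slot;
  Staged<int64_t> counter(slot);

  counter.read();
  for (int i = 0; i < 1000; ++i)
  {
    ++(*counter);
  }
  assert(slot.value == 0);  // nothing is published until write()

  counter.write();
  return slot.value;
}
```

The design choice is the trade-off the RCW benefits list describes: the compute loop never touches the mutex, at the cost of working on a snapshot until the next read().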

All Containers Can Be RCW

Any container or variable in MADARA can be used in RCW phases. For example, with NativeDoubleVector:

// read phase
madara::knowledge::containers::NativeDoubleVector gps("agent.0.position", knowledge);
std::vector<double> agent_0_position = gps.to_doubles();

// compute phase (assumes the vector already holds at least two elements)
agent_0_position[0] = 55.5555;
agent_0_position[1] = 11.1111;

// write phase
gps.set(agent_0_position);

The double vector is slightly more complicated than other classes like Integer, Double, and String because it can have multiple elements. There are faster ways to do the above if you only update one of many elements in a vector (e.g., an image feature embedding). Only update the elements you need to update, whenever possible.
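
The advice to update only the elements you need can be sketched as follows. This is a hypothetical stand-in, not the MADARA container: `SharedVector`, its indexed `get` and `set` methods, and `shift_element` are names invented for this example, so check the actual container API before relying on an equivalent call.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical thread-safe vector standing in for a shared vector variable.
class SharedVector
{
public:
  explicit SharedVector(std::vector<double> initial)
    : values_(std::move(initial)) {}

  // Reads a single element; throws std::out_of_range on a bad index.
  double get(std::size_t index) const
  {
    std::lock_guard<std::mutex> guard(mutex_);
    return values_.at(index);
  }

  // Writes a single element; throws std::out_of_range on a bad index.
  void set(std::size_t index, double value)
  {
    std::lock_guard<std::mutex> guard(mutex_);
    values_.at(index) = value;
  }

  // Returns a copy of the whole vector.
  std::vector<double> to_doubles() const
  {
    std::lock_guard<std::mutex> guard(mutex_);
    return values_;
  }

private:
  mutable std::mutex mutex_;
  std::vector<double> values_;
};

// RCW on a single element: the other elements are neither copied nor written,
// so updates made to them by their owners are left alone.
void shift_element(SharedVector & shared, std::size_t index, double delta)
{
  const double current = shared.get(index);  // read phase
  const double updated = current + delta;    // compute phase
  shared.set(index, updated);                // write phase
}

// Usage sketch: only element 1 changes.
void example_usage()
{
  SharedVector embedding(std::vector<double>{1.0, 2.0, 3.0});
  shift_element(embedding, 1, 0.5);
  const std::vector<double> result = embedding.to_doubles();
  assert(result[0] == 1.0);
  assert(result[1] == 2.5);
  assert(result[2] == 3.0);
}
```

Compared with copying and rewriting the whole vector, this touches one element per phase, which matters most for long vectors such as feature embeddings.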

