# A Calculus for Relaxed Memory

#### Karl Crary and Michael J. Sullivan

Carnegie Mellon University POPL '15, Mumbai

January 17, 2015



| INTRODUCTION | RMC  | Pushes | Misc | Conclusion |
|--------------|------|--------|------|------------|
| 0            | 0000 | 0000   | 000  |            |
| 000          | 00   |        | 0    |            |
|              | 000  |        | 0    | 1/25       |

#### Relaxed Memory Calculus

- A new approach to language memory models for concurrency
  - That is, specifying what writes are available to reads
  - In the presence of optimizing compilers and SMP machines
- Based around specifying visibility and execution orderings
- Suitable for use with  $C/C{++}$
- With a mechanized metatheory

| INTRODUCTION | RMC  | Pushes | Misc | Conclusion |
|--------------|------|--------|------|------------|
| •<br>000     | 0000 | 0000   | 000  |            |
|              | 0000 |        | 0    | 0.40*      |

## Concurrency?

- Concurrent programming is hard, even under the best of circumstances
- Sequential consistency: threads interleave instructions, modifying a single shared memory

| INTRODUCTION | RMC  | Pushes | Misc | Conclusion |
|--------------|------|--------|------|------------|
| 0            | 0000 | 0000   | 000  |            |
| ● <b>0</b> 0 | 00   |        | 0    |            |
|              | 000  |        | 0    |            |
|              |      |        |      |            |

# Concurrency?

- Concurrent programming is hard, even under the best of circumstances
- Sequential consistency: threads interleave instructions, modifying a single shared memory
- Languages designed so that if locks are used to rule out data races, events are sequentially consistent

| INTRODUCTION | RMC               | Pushes | Misc | Conclusion |
|--------------|-------------------|--------|------|------------|
| ○<br>●○○     | 0000<br>00<br>000 | 0000   | 000  |            |
|              |                   |        |      |            |

# Concurrency?

- Concurrent programming is hard, even under the best of circumstances
- Sequential consistency: threads interleave instructions, modifying a single shared memory
- Languages designed so that if locks are used to rule out data races, events are sequentially consistent
- But sometimes that isn't good enough (perf-critical code, implementation of system libraries, ...)

| INTRODUCTION | RMC               | Pushes | Misc | Conclusion |
|--------------|-------------------|--------|------|------------|
| ○<br>●OO     | 0000<br>00<br>000 | 0000   | 000  |            |
|              |                   |        |      | 0 / 05     |

# Concurrency? Message passing

```
int data, flag;
void send(int msg) {
    data = msg;
    flag = 1;
}
int recv() {
    while (!flag)
    continue;
    return data;
}
```

• Two threads: one wants to send a single message to the other

| INTRODUCTION | RMC               | Pushes | Misc          | Conclusion |
|--------------|-------------------|--------|---------------|------------|
| ○<br>○●○     | 0000<br>00<br>000 | 0000   | 000<br>0<br>0 |            |

# Concurrency? Message passing

```
int data, flag;
void send(int msg) {
    data = msg;
    flag = 1;
}
int recv() {
    while (!flag)
    continue;
    return data;
}
```

- Two threads: one wants to send a single message to the other
- Correctness: recv() only returns the value passed to send()
- If the read from flag returns 1, the read from data must return the sent value

| INTRODUCTION | RMC  | Pushes | Misc | Conclusion |
|--------------|------|--------|------|------------|
| 0            | 0000 | 0000   | 000  |            |
| 000          | 00   |        | 0    |            |
|              | 000  |        | 0    |            |
|              |      |        |      |            |

# Concurrency? Message passing

```
int data, flag;
void send(int msg) {
    data = msg;
    flag = 1;
}
int recv() {
    while (!flag)
    continue;
    return data;
}
```

- Two threads: one wants to send a single message to the other
- Correctness: recv() only returns the value passed to send()
- If the read from flag returns 1, the read from data must return the sent value
- Nope!

| INTRODUCTION | RMC               | Pushes | Misc | Conclusion |
|--------------|-------------------|--------|------|------------|
| 0<br>0●0     | 0000<br>00<br>000 | 0000   | 000  |            |
|              |                   |        |      | 4 / 25     |

# Concurrency? Message passing: What goes wrong

```
int data, flag;
void send(int msg) {
    data = msg;
    flag = 1;
}
int recv() {
    while (!flag)
    continue;
    return data;
}
```

- Compiler could reorder writes in send, hoist the load out of the loop, ...
- CPU has out of order and speculative execution, multilevel caches, ...

| INTRODUCTION | RMC  | Pushes | Misc | Conclusion |
|--------------|------|--------|------|------------|
| 0            | 0000 | 0000   | 000  |            |
| 00●          | 000  |        | 0    |            |

```
int data, flag;
void send(int msg) {
    data = msg;
    flag = 1;
}
int recv() {
    while (!flag)
    continue;
    return data;
}
```

• What do we need for this code to work?

| INTRODUCTION | RMC                       | Pushes | Misc | Conclusion |
|--------------|---------------------------|--------|------|------------|
| 0000         | • <b>000</b><br>00<br>000 | 0000   | 000  |            |
|              |                           |        |      | 6 / 25     |

```
int data, flag;
void send(int msg) {
    data = msg;
    flag = 1;
}
vo
int recv() {
    while (!flag)
    continue;
    return data;
}
```

- What do we need for this code to work?
- If the write to flag is visible to other threads, the write to data must be also (vo = visibility order)

| INTRODUCTION | RMC                       | Pushes | Misc | Conclusion |
|--------------|---------------------------|--------|------|------------|
| 0<br>000     | • <b>000</b><br>00<br>000 | 0000   | 000  |            |
|              |                           |        |      | 6 / 25     |

```
int data, flag;
void send(int msg) {
    data = msg;
    flag = 1;
}
void send(int msg) {
    data = msg;
    void send(int msg) {
        continue;
        return data;
    }
```

- What do we need for this code to work?
- If the write to flag is visible to other threads, the write to data must be also (vo = visibility order)
- The read from flag must execute before the read from data (xo = execution order)

| INTRODUCTION | RMC        | Pushes | Misc | Conclusion |
|--------------|------------|--------|------|------------|
| 0000         | ●000<br>○○ | 0000   | 000  |            |
| 000          | 000        |        | õ    |            |
|              |            |        |      | 6 / 25     |

```
int data, flag;
void send(int msg) {
    data = msg;
    flag = 1;
}
void send(int msg) {
    data = msg;
    void send(int msg) {
        continue;
        return data;
    }
```

- What do we need for this code to work?
- If the write to flag is visible to other threads, the write to data must be also (vo = visibility order)
- The read from flag must execute before the read from data (xo = execution order)
- The combination ensures that the write to data is *visible* to the read

| INTRODUCTION | RMC  | Pushes | Misc | Conclusion |
|--------------|------|--------|------|------------|
| 0            | 0000 | 0000   | 000  |            |
| 000          | 00   |        | 0    |            |
|              | 000  |        | 0    |            |
|              |      |        |      | 6 / 25     |



- The combination ensures that the write to data is *visible* to the read
- The read must read from it (or a later write)
- (rf = reads from)

| INTRODUCTION | RMC  | Pushes | Misc | Conclusion |
|--------------|------|--------|------|------------|
| 0            | 0000 | 0000   | 000  |            |
| 000          | 00   |        | 0    |            |
|              | 000  |        | 0    | = / 05     |
|              |      |        |      | 7 / 25     |



- The combination ensures that the write to data is *visible* to the read
- The read must read from it (or a later write)
- (rf = reads from)

| INTRODUCTION | RMC  | Pushes | Misc | Conclusion |
|--------------|------|--------|------|------------|
| 0            | 0000 | 0000   | 000  |            |
| 000          | 00   |        | 0    |            |
|              | 000  |        | 0    | - /        |
|              |      |        |      | 7 / 25     |

# RMC Key concepts

- Have the programmer explicitly specify these constraints
- Allow specification of visibility and execution ordering

| INTRODUCTION | RMC  | Pushes | Misc | Conclusion |
|--------------|------|--------|------|------------|
| 0            | 0000 | 0000   | 000  |            |
| 000          | 00   |        | 0    |            |
|              | 000  |        | 0    |            |

# RMC Concrete syntax

```
int data, flag;
void send(int msg) {
    VEDGE(wdata, wflag);
    L(wdata, data = msg);
    L(wflag, flag = 1);
}

int recv() {
    XEDGE(rflag, rdata);
    while (!L(rflag, flag))
    continue;
    return L(rdata, data);
}
```

- L(label, expr) labels an expression
- VEDGE and XEDGE establish visibility and execution edges

| INTRODUCTION | RMC               | Pushes | Misc          | Conclusion |
|--------------|-------------------|--------|---------------|------------|
| 0<br>000     | 0000<br>00<br>000 | 0000   | 000<br>0<br>0 |            |
|              |                   |        |               |            |

#### C++11Overview

- The C++11 memory model marks accesses to atomic memory locations with various "memory orders"
- Relations like "synchronizes with" and "happens before" are inferred from these
- "Happens before" isn't transitive

| INTRODUCTION | RMC               | Pushes | Misc | Conclusion |
|--------------|-------------------|--------|------|------------|
| 0000         | 0000<br>00<br>000 | 0000   | 000  |            |
|              |                   |        |      | 10 / 25    |

#### C++11Comparison

- Nicer to specify the key relations directly
- And it gives the compiler more flexibility

| INTRODUCTION | RMC                      | Pushes | Misc | Conclusion |
|--------------|--------------------------|--------|------|------------|
| 0000         | 0000<br><b>0●</b><br>000 | 0000   | 000  |            |
|              | 000                      |        | 0    | 11 / 25    |

```
typedef struct {
   unsigned char buf[BUF_SIZE];
   unsigned front, back;
} ring_buf_t;
```

#define ring\_inc(v) (((v) + 1) % BUF\_SIZE)

- Example adapted from the Linux Kernel
- · Lock-free fixed size FIFO buffer
- One producer, one consumer
- Producer modifies back, consumer modifies front.
- Empty when back == front, full when ring\_inc(back) == front.

| INTRODUCTION | RMC  | Pushes | Misc | Conclusion |
|--------------|------|--------|------|------------|
| 0            | 0000 | 0000   | 000  |            |
| 000          | 00   |        | 0    |            |
|              | 000  |        | 0    |            |
|              |      |        |      | 12 / 25    |

```
void buf_enqueue(ring_buf_t *buf, unsigned char c) {
  unsigned back = buf->back;
  if (ring_inc(back) != buf->front) { // not full
    buf->buf[back] = c;
    buf->back = ring_inc(back);
  }
}
```

| INTRODUCTION | RMC  | Pushes | Misc | Conclusion |
|--------------|------|--------|------|------------|
| 0            | 0000 | 0000   | 000  |            |
| 000          | 00   |        | 0    |            |
|              | 000  |        | 0    |            |
|              |      |        |      | 13 / 25    |

```
void buf_enqueue(ring_buf_t *buf, unsigned char c) {
 unsigned back = buf->back;
 if (ring_inc(back) != buf->front) { // not full
   buf->buf[back] = c;
   buf->back = ring_inc(back);
 }
}
int buf_dequeue(ring_buf_t *buf) {
 int c = -1;
 unsigned front = buf->front;
 if (front != buf->back) { // not empty
   c = buf->buf[front];
   buf->front = ring_inc(front);
 }
 return c;
}
```

| INTRODUCTION | RMC  | Pushes | Misc | Conclusion |
|--------------|------|--------|------|------------|
| 0            | 0000 | 0000   | 000  |            |
| 000          | 00   |        | 0    |            |
|              | 000  |        | 0    |            |
|              |      |        |      |            |

```
void buf_enqueue(ring_buf_t *buf, unsigned char c) {
 unsigned back = buf->back;
 if (ring_inc(back) != buf->front) { // not full
    buf->buf[back] = c;
   buf->back = ring_inc(back);
 }
}
int buf_dequeue(ring_buf_t *buf) {
 int c = -1;
 unsigned front = buf->front;
 if (front != buf->back { // not empty
    c = buf->buf[front] *
    buf->front = ring_inc(front);
 }
 return c;
3
```

Message passing: values enqueued will be visible to dequeuer

| INTRODUCTION | RMC  | Pushes | Misc | Conclusion |
|--------------|------|--------|------|------------|
| 0            | 0000 | 0000   | 000  |            |
| 000          | 00   |        | 0    |            |
|              |      |        |      | 13 / 25    |

```
void buf_enqueue(ring_buf_t *buf, unsigned char c) {
 unsigned back = buf->back;
  if (ring_inc(back) != buf->front) { // not full
    buf->buf[back] = c;
   buf->back = ring_inc(back);
 }
}
int buf_dequeue(ring_buf_t *buf) {
 int c = -1:
 unsigned front = buf->front;
 if (front != buf->back { // not empty
   c = buf->buf[front]
   buf->front = ring_inc(front);
 }
 return c;
3
```

- Message passing: values enqueued will be visible to dequeuer
- Ensure the value is read before its space is marked as free

| INTRODUCTION | RMC  | Pushes | Misc | Conclusion |
|--------------|------|--------|------|------------|
| 0            | 0000 | 0000   | 000  |            |
| 000          | 00   |        | 0    |            |
|              | 000  |        | 0    |            |

```
void buf_enqueue(ring_buf_t *buf, unsigned char c) {
 unsigned back = buf->back;
  if (ring_inc(back) != buf->front,) { // not full
    buf->buf[back] = c;
   buf->back = ring_inc(back);
 }
}
int buf_dequeue(ring_buf_t *buf) {
 int c = -1:
 unsigned front = buf->front;
 if (front != buf->back { // not empty
    c = buf->buf[front]
   buf->front = ring_inc(front);
 }
 return c;
3
```

- Message passing: values enqueued will be visible to dequeuer
- Ensure the value is read before its space is marked as free
- Don't write a value until we know its space is free

| INTRODUCTION | RMC  | Pushes | Misc | Conclusion |
|--------------|------|--------|------|------------|
| 0            | 0000 | 0000   | 000  |            |
| 000          | 00   |        | 0    |            |
|              | 000  |        | 0    |            |

```
void buf_enqueue(ring_buf_t *buf, unsigned char c) {
 XEDGE(echeck, insert);
 VEDGE(insert, eupdate);
 unsigned back = buf->back;
 if (ring_inc(back) != L(echeck, buf->front)) {
   L(insert, buf->buf[back] = c); 🧲
   L(eupdate, buf->back = ring_inc(back));
 }
}
int buf_dequeue(ring_buf_t *buf) {
 XEDGE(dcheck, read);
 XEDGE(read, dupdate);
 int c = -1;
 unsigned front = buf->front;
 if (front != L(dcheck, buf->back) {
   c = L(read, buf->buf[front])
   L(dupdate, buf->front = ring_inc(front))
  3
 return c;
}
```

| INTRODUCTION | RMC  | Pushes | Misc | Conclusion |
|--------------|------|--------|------|------------|
| 0            | 0000 | 0000   | 000  |            |
| 000          | 00   |        | 0    |            |
|              | 000  |        | 0    |            |

#### Pushes Rationale

• Consider the following (broken!) code, which could be a snippet from a mutual exclusion algorithm

| INTRODUCTION | RMC               | Pushes | Misc | Conclusion |
|--------------|-------------------|--------|------|------------|
| 0000         | 0000<br>00<br>000 | 0000   | 000  |            |
|              |                   |        |      | 15 / 25    |

## Pushes Rationale

• Consider the following (broken!) code, which could be a snippet from a mutual exclusion algorithm

- · Could let both threads into critical section
- · Can't fix this with visibility or execution edges

| INTRODUCTION | RMC  | Pushes | Misc | Conclusion |
|--------------|------|--------|------|------------|
| 0            | 0000 | ●000   | 000  |            |
| 000          | 00   |        | 0    |            |
|              | 000  |        | 0    |            |
|              |      |        |      | 15 / 25    |

#### Pushes

- Pushes are globally visible actions
- Totally ordered
- Doesn't do much on its own; combined with execution and visibility edges to constrain behavior

| INTRODUCTION | RMC  | Pushes | Misc | Conclusion |
|--------------|------|--------|------|------------|
| 0000         | 0000 | 0000   | 000  |            |
|              |      |        |      | 16 / 25    |



• Push is visibility after the write, execution before the read

| INTRODUCTION | RMC  | Pushes | Misc | Conclusion |
|--------------|------|--------|------|------------|
| 0<br>000     | 0000 | 0000   | 000  |            |
|              | 000  |        | 0    | 17/05      |



- Push is visibility after the write, execution before the read
- One of the pushes needs to be visible to the other

| INTRODUCTION | RMC  | Pushes | Misc | Conclusion |
|--------------|------|--------|------|------------|
| 0            | 0000 | 0000   | 000  |            |
| 000          | 00   |        | 0    |            |
|              | 000  |        | 0    | 17/25      |



- Push is visibility after the write, execution before the read
- One of the pushes needs to be visible to the other
- Which makes the write visible to the other thread's read

| INTRODUCTION | RMC  | Pushes | Misc | Conclusion |
|--------------|------|--------|------|------------|
| 0            | 0000 | 0000   | 000  |            |
| 000          | 00   |        | 0    |            |
|              | 000  |        | 0    |            |
|              |      |        |      |            |



- Push is visibility after the write, execution before the read
- One of the pushes needs to be visible to the other
- Which makes the write visible to the other thread's read

| INTRODUCTION | RMC  | Pushes | Misc | Conclusion |
|--------------|------|--------|------|------------|
| 0            | 0000 | 0000   | 000  |            |
| 000          | 00   |        | 0    |            |
|              | 000  |        | 0    |            |
|              |      |        |      |            |



- Push is visibility after the write, execution before the read
- One of the pushes needs to be visible to the other
- Which makes the write visible to the other thread's read

| INTRODUCTION | RMC  | Pushes | Misc | Conclusion |
|--------------|------|--------|------|------------|
| 0            | 0000 | 0000   | 000  |            |
| 000          | 00   |        | 0    |            |
|              | 000  |        | 0    |            |
|              |      |        |      |            |

### Pushes Push syntax

```
VEDGE(write1, push1);
XEDGE(push1, read1);
L(write1, lock1 = 1);
L(push1, PUSH);
if (!L(read1, lock2)) {
   // Critical section
}
```

```
VEDGE(write2, push2);
XEDGE(push2, read2);
L(write2, lock2 = 1);
L(push2, PUSH);
if (!L(read2, lock1)) {
   // Critical section
}
```

| INTRODUCTION | RMC  | Pushes | Misc | Conclusion |
|--------------|------|--------|------|------------|
| 0<br>000     | 0000 | 0000   | 000  |            |
|              | 000  |        | 0    | 18 / 25    |

#### Theory Overview

- Formalized typed core-calculus see paper for details
- Very weak, to future-proof against new hardware
- Dynamic semantics explicitly accounts for out-of-order and speculative execution

| INTRODUCTION | RMC  | Pushes | Misc     | Conclusion |
|--------------|------|--------|----------|------------|
| 0<br>000     | 0000 | 0000   | •00<br>0 |            |
|              | 000  |        | 0        | 19 / 25    |

# Theory Coherence order

- Coherence order order on writes to each location
- Key technical device
- Ensures single threaded computation works as expected

| INTRODUCTION | RMC               | Pushes | Misc | Conclusion |
|--------------|-------------------|--------|------|------------|
| 0<br>000     | 0000<br>00<br>000 | 0000   | 000  |            |
|              | 000               |        | 0    | 20 / 25    |

# Theory Theorems

- Progress and Preservation
- Interleaving actions with pushes gives sequential consistency
- Race free executions are sequentially consistent

| INTRODUCTION | RMC  | Pushes | Misc | Conclusion |
|--------------|------|--------|------|------------|
| 0<br>000     | 0000 | 0000   | 000  |            |
|              | 000  |        | 0    | 21 / 25    |

# Theory Theorems

- Progress and Preservation
- Interleaving actions with pushes gives sequential consistency
- Race free executions are sequentially consistent
- All formalized in Coq

| INTRODUCTION | RMC  | Pushes | Misc | Conclusion |
|--------------|------|--------|------|------------|
| 0<br>000     | 0000 | 0000   | 000  |            |
|              | 000  |        | 0    | 21/25      |

#### $\ Implementation$

- Compiler needs to preserve execution order
- On x86, visibility and execution order come for free
- On ARM, visibility order can be enforced with a fence (dmb); execution order allows more options

| INTRODUCTION | RMC  | Pushes | Misc | Conclusion |
|--------------|------|--------|------|------------|
| 0            | 0000 | 0000   | 000  |            |
| 000          | 00   |        | •    |            |
|              |      |        |      | 00/05      |

#### Related Work

- Java memory model (Manson et al. 2005)
- C++ memory model (Boehm and Adve 2008, Batty et al. 2010)
- Sarkar, et al. 2011; POWER operational model
  - Direct inspiration for our system
- Alglave et al. 2014; generic framework, "leapfrogging writes"
- Jagadeesan et al. 2010; operational model for Java
  - Our mechanism for speculation adapated from this
- Boehm and Demsky 2014; "out-of-thin-air" results worse than we realized

| INTRODUCTION | RMC  | Pushes | Misc | Conclusion |
|--------------|------|--------|------|------------|
| 0            | 0000 | 0000   | 000  |            |
| 000          | 00   |        | •    |            |
|              |      |        |      |            |

#### Conclusion

- RMC is a new memory model built around explicitly specifying visibility and execution orderings
- Details about the formalism and model are in the paper
- Implementation is being developed on top of  $\mathsf{Clang}/\mathsf{LLVM}$

| INTRODUCTION | RMC               | Pushes | Misc | Conclusion |
|--------------|-------------------|--------|------|------------|
| 0000         | 0000<br>00<br>000 | 0000   | 000  |            |
|              |                   |        |      | 24 / 25    |

# Thank you!

| INTRODUCTION | RMC  | Pushes | Misc | Conclusion |
|--------------|------|--------|------|------------|
| 0            | 0000 | 0000   | 000  |            |
| 000          | 00   |        | 0    |            |
|              | 0000 |        | 0    | 25 / 25    |

#### C + + 11

```
int load_acquire(int *ptr) {
    XEDGE(load, post);
    return L(load, *ptr);
}
void store_release(int *ptr, int val) {
    VEDGE(pre, store);
    L(store, *ptr = val);
}
```

| INTRODUCTION | RMC  | Pushes | Misc | Conclusion |
|--------------|------|--------|------|------------|
| 0            | 0000 | 0000   | 000  |            |
| 000          | 00   |        | 0    |            |
|              | 000  |        | 0    |            |
|              |      |        |      | 25 / 25    |

# More comparision to C++11

- We give the compiler more flexibility in how to implement things
- C++11 ring buffers would do two releases, two acquires
- We can get a lot of the benefit of consume without the large complexities involved

| INTRODUCTION | RMC               | Pushes | Misc | Conclusion |
|--------------|-------------------|--------|------|------------|
| 0000         | 0000<br>00<br>000 | 0000   | 000  |            |
|              |                   |        |      | 25 / 25    |