← writing

what happens when the last receiver drops

summary: two months of using hyperbridge in theplatform. one bug class i had been ignoring is now closed.

Hyperbridge Hyperbridge: a public concurrency detour work: Nov 2021-Jan 2022 post-mortem

Core idea: Hyperbridge's close semantics forced a precise publication rule: make visibility the final store so panics cannot expose half-written channel state.

The visibility-store has to be last. That is the rule I did not have written down for the first version of hyperbridge, and that cost me three weeks.

A sender writes a slot in two parts: it puts the value into the slot, then it advances tail.index to publish that the slot is readable. If those happen in that order, a panic between them leaves the slot non-visible: the consumer never sees the index advance, the value is invisible, the channel is consistent. If the order is reversed - or if any other state-changing operation happens between the visibility store and the end of the function - a panic mid-call can leak a half-published value into the consumer's view.

The first version did the visibility store too early, because I wrote the function in the order the operations made sense to me, not in the order the channel needed them to be safe. The fix is one line of reordering and exactly the discipline that the store-release on tail.index is the last thing the function does.

The textbook bugs - last-receiver-drops leaks, last-sender-drops deadlocks - I had also been ignoring; they got their fix today too, by counting senders and receivers and transitioning to Err(Closed) when either count goes to zero. They were textbook. The visibility-ordering one was not.

Three weeks of staring at this bug class. One day of fixing it, once I had the rule.