In GHC-compiled programs, values go through three(-ish) phases of evaluation:
1. Thunk. This is where they start.
2. Black hole. When forced, a thunk is converted to a black hole and computation begins. Other threads that request the value of a black hole will instead add themselves to a notification list for when the black hole is updated. (Also, if the thunk itself tries to access the black hole, it will short-circuit to an exception instead of waiting forever.)
3. Evaluated. When the computation finishes, its last task is to update the black hole to a plain value (well, WHNF value, anyway).
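
To make the phases concrete, here is a toy, single-threaded model of them. It is only a sketch: `Cell`, `force`, and the constructor names are made up for illustration and are not GHC's runtime representation, and the "other threads wait on a notification list" part is deliberately left out. Re-entering a cell that is already being evaluated is modeled as an error, mirroring the short-circuit described above.

```haskell
import Control.Exception (evaluate)
import Data.IORef

-- Hypothetical model of a heap cell; the names are illustrative, not GHC's.
data Cell a
  = Thunk (IO a)   -- phase 1: a suspended computation
  | BlackHole      -- phase 2: evaluation in progress
  | Evaluated a    -- phase 3: the finished (WHNF) value

-- Force a cell, walking it through the three phases.
force :: IORef (Cell a) -> IO a
force ref = do
  cell <- readIORef ref
  case cell of
    Evaluated v   -> pure v                          -- already done: just return it
    BlackHole     -> ioError (userError "<<loop>>")  -- re-entered our own evaluation
    Thunk compute -> do
      writeIORef ref BlackHole                       -- transition 1 -> 2
      v <- compute >>= evaluate                      -- run the computation to WHNF
      writeIORef ref (Evaluated v)                   -- transition 2 -> 3
      pure v

main :: IO ()
main = do
  runs <- newIORef (0 :: Int)
  ref  <- newIORef (Thunk (modifyIORef' runs (+ 1) >> pure (6 * 7 :: Int)))
  a <- force ref
  b <- force ref
  n <- readIORef runs
  print (a, b, n)  -- prints (42,42,1): the computation ran only once
```

Note that the read of the cell and the write of `BlackHole` are two separate steps here too, which is exactly the gap a second thread could slip through.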
The pointer that is updated during these phase transitions is shared between threads and is not protected against race conditions. This means that, very rarely, two (or more) threads can both see a pointer in phase 1 and both execute the 1 -> 2 transition; in that case, both will evaluate the thunk, and the 2 -> 3 transition will also happen twice. Notably, though, the 1 -> 2 transition is typically much faster than the computation it is replacing (essentially just a memory access or two), in part exactly so that the race is difficult to trigger.
Because the language is pure, the racing threads will come to the same answer. So there is no semantic difficulty here. But in some rare cases, a little bit of work may be duplicated. It is very, very rare that the overhead of a lock on every 1 -> 2 transition would be better than this slight duplication. (If you find it is in your case, consider manually protecting the evaluation of whichever expensive thing is being shared!)
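
If you do need the manual protection mentioned above, one possible approach is to put the result behind an `MVar`, so that the lock is paid for only on this one value rather than on every thunk. The sketch below is an assumption about how you might do it, not a standard library function: `once` is a hypothetical helper, and it gives you back an `IO` action rather than a pure value. The expected sum also assumes a 64-bit `Int`.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Exception (evaluate)
import Control.Monad (forM_, replicateM)
import Data.IORef

-- Hypothetical helper: wrap an action so that it runs at most once,
-- even when many threads request the result at the same time.
once :: IO a -> IO (IO a)
once action = do
  cache <- newMVar Nothing
  pure $ modifyMVar cache $ \cached ->
    case cached of
      Just v  -> pure (cached, v)        -- someone already computed it
      Nothing -> do
        v <- action >>= evaluate         -- evaluate to WHNF while holding the lock
        pure (Just v, v)

main :: IO ()
main = do
  runs <- newIORef (0 :: Int)
  get  <- once $ do
    atomicModifyIORef' runs (\n -> (n + 1, ()))   -- count how often we really run
    pure (sum [1 .. 1000000 :: Int])
  dones <- replicateM 8 newEmptyMVar
  forM_ dones $ \done -> forkIO (get >>= putMVar done)
  results <- mapM takeMVar dones
  n <- readIORef runs
  print (all (== 500000500000) results, n)  -- prints (True,1)
```

The trade-off is the one described above: every caller now takes a lock, so this is only worth it when the shared computation is expensive enough that even a rare duplicate hurts.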
Corollary: great care must be taken with the unsafe `IO a -> a` family of functions; some guarantee synchronization of the evaluation of the resulting `a` (`unsafePerformIO`, for example) and some don't (`unsafeDupablePerformIO`). If your `IO a` action is not as pure as you promised it is, and a race causes it to be executed twice, all manner of strange heisenbugs can occur.
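
As an illustration, here is the common "top-level mutable variable" idiom, sketched with `unsafePerformIO`, which is documented to guard against duplicate execution. The names `counter` and `nextId` are made up for the example. Allocating an `IORef` is precisely the kind of action that is not really pure: if it were run twice (as `unsafeDupablePerformIO` permits when several threads force the thunk at once), two threads could briefly be working with different cells.

```haskell
import Data.IORef
import System.IO.Unsafe (unsafePerformIO)

-- A top-level mutable cell, created the first time it is forced.
-- NOINLINE keeps GHC from inlining the definition, which could
-- otherwise create a separate cell at each use site.
{-# NOINLINE counter #-}
counter :: IORef Int
counter = unsafePerformIO (newIORef 0)

-- Hand out consecutive ids from the shared cell.
nextId :: IO Int
nextId = atomicModifyIORef' counter (\n -> (n + 1, n))

main :: IO ()
main = do
  a <- nextId
  b <- nextId
  print (a, b)  -- prints (0,1)
```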