Introduction
I've recently stumbled upon Nathaniel Smith's article about structured concurrency. It's very nice, very readable and explains the entire problem from the basics, so if you have no idea what structured concurrency is go and read it.
I've been dealing with the problem for many years and have addressed it in several blog posts. However, I've always tried to keep it down to earth, very low level, partly because I was playing with it in C (maybe with a bit of assembly mixed in) and partly because I feel uneasy about discussing the grand high-level concepts before the low-level details are sorted out.
However, Nathaniel's blog post and the subsequent discussion made me realize that maybe it's already time to look a bit higher up the stack, to go beyond C's "if you don't do it by hand it won't be done" approach and think about what structured concurrency would look like if it were natively supported by a high-level language.
What follows are some raw thoughts in no particular order.
A coroutine is like a control construct
Forget parallelism for a second. We are talking concurrency here:
Very much like control constructs ("if", "for", "while", "function call" and such), a coroutine is just a nice trick to hide the ugly old goto. When you launch a coroutine you jump to its code. When the coroutine gets to a point where it would block, it jumps to the code of a different coroutine. And so on.
If implemented properly, coroutines can even get close to the classic control structures in terms of performance.
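To make the jumping concrete, here is a minimal libdill-flavoured sketch: two coroutines explicitly hand control to each other via yield(). The worker() coroutine and the 100 ms grace period are made up for illustration; go(), yield(), msleep(), now() and hclose() are real libdill calls.

#include <libdill.h>
#include <stdio.h>

coroutine void worker(int id) {
    int i;
    for(i = 0; i != 3; ++i) {
        printf("worker %d, step %d\n", id, i);
        yield(); /* jump to whichever coroutine is ready to run */
    }
}

int main(void) {
    int h1 = go(worker(1));
    int h2 = go(worker(2));
    msleep(now() + 100); /* give the workers a chance to interleave */
    hclose(h2);
    hclose(h1);
    return 0;
}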
But a coroutine is also like a variable
A coroutine has state (its entire stack and whatnot). An "if" statement does not. That sounds like they are fundamentally different. But look at the "for" statement. It does have a control variable. In C89 it's even declared outside of the construct:
int i;
for(i = 0; i != 10; ++i) {
    ...
}
The control variable is stored on the stack. Maybe we can do the same with coroutines? And actually, yes, libdill does allow for that kind of thing:
void foo() {
    int x;
    int y;
    ...
}

int a;
char stk[10000]; /* the coroutine's stack lives in the caller's stack frame */
int b;
go_mem(foo(), stk, sizeof(stk));
The stack would then look like this:
The remaining problem is to make sure that foo() is no longer running when the scope the stack was declared in is exited. If it were still running, it would be using memory on the stack below the stack pointer. If the stack grew again, the new stack frames and the coroutine would overwrite each other.
It looks like we'll have to solve some scoping issues before we have a functional high-level concurrency model.
A coroutine can't be orphaned
There are many good arguments for not allowing coroutines to be orphaned (not allowing a child coroutine to run after the parent coroutine exits). You can find them in Nathaniel's article.
The example of the stack-based coroutine is just a more drastic illustration of the same point: not only are orphaned coroutines bad design, they will also cause memory corruption and undefined behavior.
And once you accept the fact that there will be no orphan coroutines, you can imagine introducing concurrency into a programming language as a move from the dumb old linear call stack to a call tree.
See the animation below, but be warned: once you see it you cannot unsee it. I envisaged it some years ago and since then I've had a hard time thinking of concurrency in any other way.
Each box in the picture is a stack frame:
A coroutine can be orphaned
Despite what was said above, there is a use case for orphaned coroutines. Specifically, sometimes you want a coroutine to be part of an object allocated on the heap. Imagine, for example, a TCP socket object which contains a coroutine to send keepalives. The socket is created by one coroutine but it can easily be passed to a different, unrelated coroutine. If we do that, we don't want the keepalives to stop being sent just because the original coroutine exited.
And here we are back to "coroutine is like a variable" idea.
While variables are mostly nicely nested in the call stack, that's not always the case. Every now and then you need to allocate a variable on the heap. And C actually provides two mechanisms for accessing variables: if the variable lives on the stack you address it as "a"; when it lives on the heap you address it via a pointer, "p->a".
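In plain C terms (the struct is made up just for illustration):

#include <stdlib.h>

struct point { int a; };

void f(void) {
    int a = 1;                                     /* on the stack: addressed as "a" */
    struct point *p = malloc(sizeof(struct point));
    p->a = 2;                                      /* on the heap: addressed as "p->a" */
    free(p);
}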
Can we do the same thing with coroutines? Sure, why not? We can extend the language, for example, like this:
On the stack. The coroutine will be canceled when the scope is exited:
go(foo());
On the heap. The coroutine will be canceled when p is freed:
p->go(foo());
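For comparison, here is roughly what the heap-allocated case looks like in today's libdill, with the coroutine handle stored inside the object and the coroutine canceled when the object is freed. The keepalive_socket type and its functions are made up for illustration; go(), hclose(), msleep() and now() are real libdill calls.

#include <libdill.h>
#include <stdlib.h>

struct keepalive_socket {
    int tcp;        /* underlying TCP connection */
    int keepalives; /* handle of the keepalive coroutine */
};

coroutine void send_keepalives(int tcp) {
    while(1) {
        msleep(now() + 1000);
        /* ... send a keepalive message on tcp ... */
    }
}

struct keepalive_socket *keepalive_socket_new(int tcp) {
    struct keepalive_socket *p = malloc(sizeof(struct keepalive_socket));
    p->tcp = tcp;
    p->keepalives = go(send_keepalives(tcp)); /* lives as long as the object */
    return p;
}

void keepalive_socket_free(struct keepalive_socket *p) {
    hclose(p->keepalives); /* cancel the coroutine when the object goes away */
    /* ... close p->tcp ... */
    free(p);
}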
Coroutines naturally aggregate into pools
This principle comes from the observation that there is often an arbitrary number of coroutines doing the same thing. For instance, a network server will probably have one coroutine per connection. All those coroutines are exactly the same. And you, as a programmer, want to deal with all of them in a unified way rather than having to care about each one separately. At the end of the day, you want a single handle that points to that bundle of coroutines rather than many handles, each pointing to a different coroutine.
int b = bundle();
while(1) {
    int s = tcp_accept(listener);
    bundle_go(b, handle_tcp_connection(s));
    if(shutdown_requested()) break;
}
close(b); /* All the TCP handlers are canceled here. */
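For the record, the same loop written against the actual libdill API looks roughly like this. Error handling is omitted, shutdown_requested() is the same placeholder as in the pseudocode above, and the port number is arbitrary; the libdill calls themselves (bundle(), bundle_go(), tcp_listen(), tcp_accept(), tcp_close(), hclose(), ipaddr_local()) are real.

#include <libdill.h>

coroutine void handle_tcp_connection(int s) {
    /* ... serve the connection ... */
    tcp_close(s, -1);
}

int main(void) {
    struct ipaddr addr;
    ipaddr_local(&addr, NULL, 5555, 0);
    int listener = tcp_listen(&addr, 10);
    int b = bundle();
    while(1) {
        int s = tcp_accept(listener, NULL, -1);
        bundle_go(b, handle_tcp_connection(s));
        if(shutdown_requested()) break;
    }
    hclose(b); /* all the TCP handlers are canceled here */
    hclose(listener);
    return 0;
}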
libdill calls this concept a "bundle". Nathaniel Smith's Trio calls it a "nursery". Both names suck. The former because it's too generic (a bundle of what?) and likely to name-clash with unrelated concepts; the latter because, for such a fundamental concept, it's too long (three syllables!) and because the semantic relation between the name and the thing it represents is quite vague. Maybe we can do better? (Edited: I've proposed "rope" but that name is already taken.)
There are different types of cancellation
Off the top of my head:
- Cancel all threads in the bundle.
- Block until all threads in the bundle exit.
- Block until at least one thread in the bundle exits. Cancel all the other threads.
The last one is not completely intuitive but we should take it seriously given that it seems to solve a bunch of pesky problems, like timeouts and Ctrl+C handling.
Consider this code:
int b = bundle(ANY);
bundle_go(b, foo());
bundle_go(b, sleep(10));
bundle_go(b, wait_for_ctrlc());
close(b);
If foo() finishes, it cancels both the timer and the Ctrl+C waiter. If the timeout expires, foo() is canceled, as is the Ctrl+C waiter. If Ctrl+C is pressed, foo() is canceled, as is the timer. Exactly the semantics we want.
All that being said, the different kinds of cancellation seem to be profoundly different, in the way that may require different syntax.
For example, "cancel all threads" will be probably used to handle errors. You encounter an error and during the cleanup you just cancel the threads. That in turn means that you can't say whether you'll use "cancel" or "all" mechanism in advance, when you are creating the bundle. You don't know beforehand whether there will be an error or not.
As for "wait for all" and "wait for any", these you do want to specify in advance, because they are part of the business logic of the application. Having a single bundle that can be terminated as "any" in some cases and as "all" in other cases sounds like sloppy programming.
But even "all" and "any" don't seem to be two fully equal options. Some more thinking needed…
There are even more types of cancellation
Consider the accept loop example above. We have a bundle of worker threads with "all" semantics. All threads have to finish before the bundle can be closed.
Now imagine that we want to shut down the bundle but we want to give the worker threads 10 seconds to finish gracefully.
We would have to somehow wrap the all-style bundle of worker threads into an any-style bundle together with a new timeout coroutine.
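Something along these lines, again in the pseudocode style of the examples above (wait_for() is made up for illustration; it's meant to be a coroutine that finishes once all the workers have exited):

int workers = bundle(ALL);
/* ... bundle_go(workers, handle_tcp_connection(s)) for each accepted connection ... */

/* Shutdown: give the workers 10 seconds to finish gracefully. */
int shutdown = bundle(ANY);
bundle_go(shutdown, wait_for(workers));
bundle_go(shutdown, sleep(10));
close(shutdown); /* whichever side loses the race is canceled */
close(workers);  /* cancel any workers that didn't make it in time */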
One can even imagine three or four levels of such nesting.
Is a thread part of the bundle it creates?
As for "no" alternative, we've seen the examples above. User creates a bundle and adds threads into it. Parent continues executing as normal:
int b = bundle(ALL);
while(1) {
    int s = tcp_accept(listener);
    bundle_go(b, handle_tcp_connection(s));
    if(shutdown_requested()) break;
}
close(b);
It's not clear how the "yes" alternative could be nicely designed. The first thread gets a privileged position automatically, simply because it's part of the creator function. Let's say we introduce an all{} construct that treats all the coroutines created within its context (including the main one) as belonging to an implicit bundle. Note how foo() and bar() feel different from the code within the all{} construct:
void foo() {
    ...
}

void bar() {
    ...
}

all {
    go(foo());
    go(bar());
    i = i + 1;
}
Also, consider how a similar any{} construct would work. Given that when any thread in the bundle finishes all the other threads are canceled, the code within the any{} construct can be canceled at an arbitrary point once sleep() exits:
any {
    go(sleep(10));
    bar();
    baz(); /* Canceled here after 10s; execution jumps to the end of the scope. */
    quux();
}
One way to solve this kind of problem would be to simply block the main coroutine while the bundle is executing:
void bar() {
    ...
}

void foo() {
    go(bar());
    ...
}

r = all(foo());
/* We get here only after both foo and bar are done. */
The solution is kind of neat. However, one thing I don't like about it is that it's the main coroutine that decides whether foo() should be treated as an "all" or an "any" bundle, whereas, in fact, the distinction is part of foo's business logic.
Conclusion
Implementing structured concurrency in higher-level languages is not entirely trivial and poses some hard design questions.
Any opinion on the topic is welcome.
April 28th, 2018
Intervals are a similar concept to Trio. There are several research papers by Nicholas Matsakis.
They are perhaps more constrained than Trio, but allow for more sanity checking. They also have a story about shared resources. I've only used them for toys (with threads and coroutines), but the idea seems very promising.
Maybe "spool" would be a good name instead of bundle. While the analogy may not work as well as rope it's maybe closer than bundle.
Thanks! I didn't even know the word. I also like the rhyme of "thread pool" and "thread spool".
There is an active thread in the trio repo about renaming the concept away from "nursery". That could be a place to follow (and/or engage in) so that you can choose the same name and minimize confusion :)
https://github.com/python-trio/trio/issues/504
Maybe a "cohort" of coroutines?
Looking in thesaurus I've also found "wad": A small mass of cotton, wool, or other fibrous or soft material, used for stuffing, padding, packing, etc.
I would caution against "wad": it's slang for ejaculate/semen in a lot of places
I like the any/all constructs. My proposed names: join and race.
Martin, I can't recall if I've pointed out my dissertation work to you before, but in case not, you might find the idea of "facets" interesting in this connection. (See chapter 2 but also chapter 5 and the examples in chapter 8.)
Each actor has (one or more) facets, which nest to form a tree similar to your "call tree", except they're ~reactive rather than ~sequential. Facet trees have nice properties in the face of the different kinds of cancellation you mention, and compose quite nicely; some discussion of cancellation per se is in chapter 9.
Yes, I still have it on my reading list. With the work and family it's getting hard to read anything that's more than one page long :(
How is this different from Dart's async calls?
I am not familiar with Dart, but from quick research it looks like it has only the standard async/await construct. Am I missing anything?
Nice animation. Thanks.
Your concept of a bundle appears to be an interesting abstraction above a Dart isolate.
All Dart code runs in isolates. An isolate is essentially a thread with a separated-out heap and communication exclusively via message passing (like Erlang actors but seemingly without network transparency). From the main Dart isolate it is possible to create one or more new isolates to implement parallel code. Within one isolate it is possible to create tasks (single-threaded concurrency) or use the async/await construct. First-class functions allow isolate pools to be created. If you do a Google search for "load_balancer.dart" you will find an interesting example of an experimental load balancer that can run a function on multiple isolates in parallel based on some quantifiable load estimation.
There is lots more there. A parent isolate can be notified of child isolate errors. If you are targeting the DartVM and use spawn to create child isolates, you can pass objects back and forth. If targeting Javascript then only primitives (or maps or lists of primitives) can be passed.
My interest is soft real-time applications.
not sure why you'd want to fix cancellation options etc. as syntax in the language
e.g. here is wait_all and wait_any in Trio. It's very useful to be able to encapsulate various behaviors like this as functions.