Since I wrote the article on structured concurrency and implemented libdill, progress has continued.
Most importantly, Nathaniel J. Smith has published his "Notes on structured concurrency, or: Go statement considered harmful" article. (If you prefer to watch a video, go here.) It's better, more detailed and more eloquent than anything I have ever written on the topic. If you want to read one thing about structured concurrency go read Nathaniel's article.
After C-based libdill we've also got Venice, which brought structured concurrency to Swift, and Trio, which brought it to Python.
Then, a few weeks ago, a new version of Kotlin's coroutine library was released with support for structured concurrency. Here's the accompanying blog post by Roman Elizarov.
"Concurrency made easy: coming soon to a programming language near you" recently published by John Belmonte takes another thoughtful shot at explaining the paradigm. It also summarizes the state of its adoption.
Finally, I would like to point out Aleksey Kladov's blog post "Exceptions vs Structured Concurrency", which goes beyond the simple "let's tie thread lifetimes to lexical scopes!" agenda and discusses some hard questions about the actual semantics implied by the structured concurrency paradigm. He does so in the context of Rust's crossbeam library. We need more work like this.
All of that is great work and I hope we'll see more of it in the future!
For now, let me make a few comments about things that may not be new, but that are somewhat advanced and haven't been covered well enough yet.
Thread bundles: It's not just syntactic sugar
Libdill, Trio and Kotlin all introduce a concept of thread bundles. In libdill it's called "bundle", in Trio it's "nursery", in Kotlin it's "scope" (it would be nice to have some unified terminology here). At first sight it looks just like syntactic sugar that allows you to cancel all the threads within the bundle in one go.
However, it's more than that. It allows you to cancel a bunch of threads in parallel.
To understand why it matters, consider a network server with 1000 open connections. Each connection is handled by its own dedicated thread. Also, let's assume we want to give each connection one second to do a clean shutdown, to perform a termination handshake with the peer, and so on.
If we canceled the connections in a simple for loop, the shutdown of the server would take, in the worst case, 1000 seconds, i.e. more than 16 minutes. In fact, if you are facing a DoS attack that opens connections and then leaves them hanging, it would take exactly 16 minutes 40 seconds. If, on the other hand, there was a construct to cancel the connections in parallel, the server would shut down in just one second. That's a difference that would make ops people cry.
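To make the difference concrete, here is a minimal sketch in Python's asyncio (not libdill; the grace period is scaled down to 0.1 s and the connection count to 100) showing that cancelling a whole bundle of handlers in parallel costs roughly one grace period in total, not one per connection:

```python
import asyncio

async def connection_handler() -> None:
    # Simulated connection: serves "forever"; on cancellation it performs
    # a clean shutdown taking up to 0.1 s (standing in for the one-second
    # grace period from the text).
    try:
        await asyncio.sleep(3600)
    except asyncio.CancelledError:
        await asyncio.sleep(0.1)  # termination handshake with the peer
        raise

async def parallel_shutdown(n: int) -> float:
    loop = asyncio.get_running_loop()
    tasks = [asyncio.create_task(connection_handler()) for _ in range(n)]
    await asyncio.sleep(0)  # let all the handlers start
    start = loop.time()
    for t in tasks:
        t.cancel()  # request cancellation of the whole "bundle" at once...
    await asyncio.gather(*tasks, return_exceptions=True)  # ...then wait
    return loop.time() - start

elapsed = asyncio.run(parallel_shutdown(100))
print(f"shut down 100 connections in {elapsed:.2f}s")
```

A sequential loop that cancelled and awaited each handler in turn would instead pay the 0.1 s handshake 100 times over.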
Ordered cancellation
There's a temptation to equate thread bundles with scopes: All the threads launched within the scope would be canceled in parallel when the scope is exited.
There's a problem with that. Imagine that, within the scope, we create a logging thread first, then wait for incoming connections and launch one thread per connection. Given that all the threads were launched within the same scope, they will be canceled in parallel. It may happen that the logging thread exits first. In that case, all the logs produced by the connections while they are shutting down would be lost.
What we really want is shutting down the connections first, then shutting down the logging thread.
You can do that by nesting the scopes like this:
    async with trio.open_nursery() as n1:
        n1.start_soon(logger)
        async with trio.open_nursery() as n2:
            while True:
                ...
                n2.start_soon(connection)
Doable, but consider what happens when instead of two threads you have five, and you want to cancel them in a particular order. Five nested blocks would look somewhat ugly.
One way to deal with that would be to adopt semantics a bit like that of C++ destructors: The destructors are called in the reverse order of how the constructors were called.
    {
        nursery n1;
        n1.start_soon(logger);
        nursery n2;
        while(1) {
            ...
            n2.start_soon(connection);
        }
    } // n2 is shut down here (all connections in parallel); when done, n1 is canceled
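The destructor-like ordering can be sketched by hand in Python's asyncio, using plain lists of tasks as stand-in "bundles" and tearing them down in reverse order of creation:

```python
import asyncio

shutdown_order = []

async def worker(name: str) -> None:
    # Runs until cancelled; records when its shutdown happens.
    try:
        await asyncio.sleep(3600)
    except asyncio.CancelledError:
        shutdown_order.append(name)
        raise

async def main() -> None:
    # Start the logger first, then the connections, as in the pseudocode
    # above; the "bundles" here are just plain lists of tasks.
    logger = asyncio.create_task(worker("logger"))
    connections = [asyncio.create_task(worker(f"conn-{i}")) for i in range(3)]
    await asyncio.sleep(0)  # let everything start

    # Destructor-style teardown: cancel the bundles in reverse order of
    # creation, waiting for each bundle before moving on to the next.
    for bundle in (connections, [logger]):
        for t in bundle:
            t.cancel()
        await asyncio.gather(*bundle, return_exceptions=True)

asyncio.run(main())
print(shutdown_order)
```

The connections within one bundle are still cancelled in parallel; only the bundles themselves are ordered.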
We should also think about whether it would be worth it — at least in statically typed languages — to impose a discipline on the programmer and allow them to run only one type of thread within one bundle/nursery. It would mean that different types of threads could not be shut down in parallel. But that's probably what we want anyway. (Counterexamples are welcome!)
    {
        nursery<logger> n1;
        n1.start_soon();
        nursery<connection> n2;
        while(1) {
            ...
            n2.start_soon();
        }
    }
Deadlines, cancellations, grace periods
Imagine, again, the case of a network server. We may want to limit the lifetime of each connection to one hour. If it lasts longer than that, it either means that we've lost connectivity or that the peer is trying to perform a DoS attack. So let's just shut it down.
So far so good.
However, imagine we want to shut down the server. Canceling everything immediately would be easy. Just exit the main scope.
But we still want to give connections a one-minute grace period to shut down cleanly. What we can do, in the main thread, is sleep for one minute, then exit. But it feels ugly. Why wait a full minute even if all the connections terminate within a few seconds?
We have a conflict of deadlines here: each connection thread is subject to both the one-hour deadline and the one-minute deadline. How are we supposed to deal with that in a clean way?
Libdill provides the bundle_wait() function, which takes a deadline parameter. It waits for all the threads in the bundle (nursery) to finish. When they do, it exits. If the deadline expires while some threads are still running, it cancels them and exits.
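A rough asyncio equivalent of these semantics (a hypothetical helper, not libdill's actual API) might look like this:

```python
import asyncio

async def bundle_wait(tasks, deadline: float):
    # Wait for all tasks until the deadline expires; whatever is still
    # running at that point gets cancelled, mirroring libdill's
    # bundle_wait() behavior.
    done, pending = await asyncio.wait(tasks, timeout=deadline)
    for t in pending:
        t.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    return len(done), len(pending)

async def main():
    fast = asyncio.create_task(asyncio.sleep(0.01))  # finishes in time
    slow = asyncio.create_task(asyncio.sleep(3600))  # misses the deadline
    return await bundle_wait({fast, slow}, deadline=0.2)

finished, cancelled = asyncio.run(main())
print(finished, cancelled)
```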
This approach works for C where cancellation has to be done by hand anyway, but it's kind of clumsy in more sophisticated languages where it is supposed to happen automatically.
To be frank, I am not sure whether this scenario can be implemented in Trio. This section of the docs discusses the cancellation semantics but I don't see any obvious match there. (Correct me if I'm wrong!) I am not sure about Kotlin's implementation either.
Multi-core support
Finally, it should be said that all the implementations that I am aware of use coroutines and ignore OS threads. What that means is they can't use structured concurrency on more than one CPU core. At the same time there seems to be no inherent reason why OS threads or processes couldn't be made part of the structured concurrency world.
There may be technical reasons though.
I once implemented launching of an OS process that was semantically equivalent to launching a coroutine in libdill, with cancellation and everything. You just had to use go_process() instead of go(). Surprisingly, it worked.
But then I deleted it. And I haven't even tried with OS threads.
The reason was that the implementation of threads in the POSIX standard library (pthreads) is complex and fragile. It contains a lot of corner cases and poorly specified behavior. Once signals are added to the mix, all bets are off. Even if you make it work on one operating system, there's no guarantee that you won't experience weird behavior on a different operating system.
And I really don't want to deal with that complexity. Libdill is a hobby project and I have only limited time to spend on it.
But, eventually, if structured concurrency is to take off, it will have to deal with OS threads and processes. It would need people with enough free time and enthusiasm to deal with the complexity and maybe also some political will to change standard libraries in such a way that they play well with structured concurrency.
October 19th, 2018
Regarding grace periods: yeah, this is something we're still trying to figure out how to handle nicely in Trio. (Discussion thread.)
Your challenge with combining the one hour deadline and the one minute deadline is actually easy in Trio: cancel scopes can be nested, and you can change cancel scope deadlines on the fly. So each connection handler will have its own cancel scope with a one hour timeout, and for the grace period, you wrap your whole program in a cancel scope, and when you want to shut down, set that shared scope's deadline to 1 minute. Trio will take care of composing those deadlines together.
But in real life, that's not enough, because during the grace period you usually want to expedite things, e.g. by letting in-progress requests finish, but not accepting any new incoming requests. So you need some way to communicate to everyone to stop accepting new requests, including to tasks that are e.g. blocked in accept().
Our tentative idea is to add a "soft cancelled" state to our cancel scopes. Currently, when a cancel scope enters the cancelled state, that's delivered to any blocking operations by default, but they can explicitly opt-out (using "shielding"). A soft cancel state would *not* be delivered by default, except for operations that explicitly opt-in. So then for a conventional graceful shutdown you'd do a soft cancel + set a hard deadline, and make sure that accept loops and similar all opt-in to soft cancellation.
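The soft-cancel idea can be approximated today by racing opt-in operations against a shutdown event. A hand-rolled asyncio sketch (not Trio's proposed API; `soft_cancellable` and `fake_accept` are made-up names):

```python
import asyncio

async def soft_cancellable(op, soft_cancel: asyncio.Event):
    # An operation that opts in to soft cancellation: it races the real
    # operation against the shutdown event and gives up if the event wins.
    op_task = asyncio.ensure_future(op)
    stop_task = asyncio.ensure_future(soft_cancel.wait())
    done, _ = await asyncio.wait(
        {op_task, stop_task}, return_when=asyncio.FIRST_COMPLETED
    )
    if op_task in done:
        stop_task.cancel()
        return op_task.result()
    op_task.cancel()
    await asyncio.gather(op_task, return_exceptions=True)
    raise asyncio.CancelledError("soft cancel")

async def fake_accept():
    await asyncio.sleep(0.01)  # pretend a client connects every 10 ms
    return "conn"

async def main():
    soft_cancel = asyncio.Event()
    accepted = []

    async def accept_loop():
        while True:  # opts in: stops as soon as soft cancel is requested
            accepted.append(await soft_cancellable(fake_accept(), soft_cancel))

    loop_task = asyncio.create_task(accept_loop())
    await asyncio.sleep(0.05)  # let a few connections through
    soft_cancel.set()          # request graceful shutdown
    await asyncio.gather(loop_task, return_exceptions=True)
    return accepted

accepted = asyncio.run(main())
print(len(accepted))
```

In-progress request handlers, which do not opt in, would keep running until the hard deadline.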
Regarding multi-core support: Yeah, there's definitely no reason why you couldn't combine structured concurrency with a multi-core scheduler. The one big problem is that you really want solid cancellation support. (Of course you want this anyway, but if you're trying to be structured then you're sort of forced to deal with it.) And on popular OSes the standard blocking APIs have terrible support for cancellation. So, even if you're using a multi-core scheduler, you can't use traditional blocking libraries; you still have to build a whole new set of I/O primitives with cancellation support, and then rewrite all your libraries to use them. This is a lot easier to justify if you're writing an async library, since for unrelated reasons, async libraries *also* need to build a new set of I/O primitives and rewrite everything to use them, so you can sneak the cancellation support in at the same time.
I don't think there's any way to retrofit cancellation support into classic blocking APIs, and even if you could you wouldn't want to – you can't expect existing libraries to handle it correctly. OTOH there's no reason you couldn't implement a new set of APIs that act like the classic blocking synchronous ones, but that support cancellation. (I discuss this a little in my cancellation post.)
Golang even demonstrates that it's possible to provide a conventional blocking synchronous I/O API where all the blocking operations are integrated with a user-space scheduler, using existing OS APIs under the hood. But unfortunately, they didn't bake in cancellation when they were doing that. It's too bad – a real missed opportunity :-(.
Isn't that just ordered cancellation (see above) in disguise? If you had one nursery with the thread that accepts connections and another nursery with connections themselves, then you can cancel the former first, the latter second.
In theory, the support is there. You can use pthread_kill() to send a signal to a thread and, if the thread is stuck inside a blocking function, the function should return with an EINTR error. Except that I have no confidence in this working properly. But, on the other hand, if standard lib folks actually tried to make it work, then I can see a way forward. (Btw, you can do the same with processes, just use kill() instead of pthread_kill().)
It's not just ordered cancellation. Consider an HTTP/1.1 connection. You want to teach it to throw away incomplete requests but to return in-progress responses. So your HTTP/1.1 handler would enable soft-cancel while reading from the socket, disable it while generating and sending the reply, and repeat.
Granted, in this case you could simply call shutdown(SHUT_RD) on the socket, but that won't work for HTTP/2.
Why would you want to do that? Say you are shutting down a HTTP server. You give it 1 second grace period. Does it make any difference whether, within that interval, it just processes fully received requests or whether it finishes reading half-read requests and processes them? It's still 1 second. Who cares?
Think about a client attempting to perform a non-idempotent HTTP request. From the client's perspective, there are three possible outcomes: clean success, clean failure, or an ambiguous state where it doesn't know whether it failed or not. Clean failure is fine – you just retry. But if you're in the ambiguous state, you're screwed, because you don't know whether you can safely retry or not.
The point of stopping accepting incoming requests is to minimize the number of requests that end up in the ambiguous third state, and convert them into clean failures instead. In fact, if you know that all requests finish in less than N seconds, and you set an N second timeout, then that gives you a guarantee that *no* requests will end up in the ambiguous state unless there's some other independent failure (network partition, power loss, etc.). If you keep accepting new requests during the grace period, then there's no guarantee at all.
It's also nice operationally because it lets you safely use a large grace period – like a minute or whatever – just in case there's some really long-running request, while knowing that in 99% of cases it will terminate in a second or two. If you keep accepting new requests during the grace period, then a busy server will always use the whole grace period, and "graceful" shutdown becomes essentially equivalent to a hard shutdown.
I get that, but how is that not achieved by my proposal above: "If you had one nursery with the thread that accepts connections and another nursery with connections themselves, then you can cancel the former first, the latter second." ?
Hmm, I had a longer reply but it seems to have been eaten somewhere.
If you've already started reading a request, then sure, it makes sense to go ahead and finish reading it. But with HTTP/1.1 keepalive and HTTP/2, it's very common to have a connection that's sitting idle waiting for the next request to arrive. You need to inform your connection handler tasks about the graceful shutdown, so that they go ahead and close these connections instead of continuing to wait.
This is all totally doable with Trio currently. Our helper functions for servers even take a handler_nursery argument exactly because we want to make it possible to split up the accepter tasks from the connection handler tasks so they can be cancelled separately. But plumbing everything together requires some complex global coordination, so maybe it would be better to provide some first-class mechanism for doing that instead of expecting every user to wire everything together themselves. Not sure. This is why I use words like "tentative idea" here :-).
It's……. more complicated than that :-). Not all popular OSes have pthread_kill (in particular, Windows has no equivalent to EINTR). On the ones that do, it works as documented AFAICT, but does that help? You can use pthread_kill to obliterate a thread (by defining a signal handler that calls pthread_exit), but you never ever want to do that, because it becomes impossible to maintain any kind of program invariants. (E.g., what if the obliterated thread was in the middle of updating some data structure, or holding a lock?) And if you try to define some more sensible cancellation semantics – like POSIX actually did, with pthread_cancel – and then implement those using pthread_kill, then you get nasty race conditions. And then even if you fix *those*, you still have the problem that you still have to rewrite all your libraries to handle cleanup after cancellation correctly, and if you're going to do that then it's just as easy to switch them to use the newer OS APIs that have more sensible support for cancellation.
pthread_kill is potentially useful to construct a non-blocking read/write primitive on Unix OSes for cases where you can't put the fd into non-blocking mode (in particular stdin/stdout/stderr, details). But even then you have to claim the use of a signal, which is problematic for a library that might want to coexist with arbitrary other libraries in the same process, since signals are a process-global resource.
pthread_kill is super powerful but it's incredibly difficult to find a case where it's actually useful.
Kotlin coroutines have multi-core support. It is obviously complex under the hood (just like the OS kernels, as you correctly note), but the users are not exposed to that complexity. It all just works for them.
Cancellation is definitely easier for coroutines than for threads, since it is fully cooperative. It only works with async APIs, of course, so we are gradually expanding the set of non-blocking APIs available to the users by providing them with the corresponding non-blocking libraries.
When a coroutine is cancelled in Kotlin, it immediately and synchronously notifies all its children about cancellation and then waits for all of them to complete, which is a good default when you want to terminate something as quickly as possible.
To perform a graceful shutdown we use the following pattern. Let us take a web server, for example. We keep its acceptor coroutine in a separate scope from all the connection coroutines. To shut it down gracefully, we cancel the acceptor coroutine in a normal (non-graceful) way and then wait on the scope with the connections for a given time, giving it a chance to finish serving user requests. If time's up, the connections' scope gets cancelled too. It gets more complex with HTTP pipelining and HTTP/2, so there is some non-trivial code in our Ktor framework to handle that. However, it does not seem to be needed for general users of coroutines, so we are currently reluctant to add direct support in our API for that kind of multi-stage shutdown.
I would do it the same way, but see the discussion with Nathaniel and Matthias above. It feels like I am missing something there.
Btw, it would be interesting to implement the same thing (simple web server, happy eyeballs or such) in the three languages and check how they differ, both in syntax and semantics. The goal would be to find out what the universally acceptable constructs are likely to be, similar to "if", "for" and "while" in structured programming.
Yes, we've hesitated to add anything like this to Trio too. But what I'm thinking about are complex programs composed out of multiple pieces: e.g. you might have a single program that hosts several servers speaking different protocols. (It's even reasonably common to have two HTTP servers within the same process: one used by clients, and another on an internal interface for exporting metrics.) Hopefully each of your protocol server libraries implement graceful shutdown, but maybe they all expose the functionality using different interfaces, and force you to wire it all up manually. The motivation for something like "soft cancel" is that it would provide a single standard interface for requesting a graceful shutdown that everyone would use, and solve the wiring problem.
And then as a bonus, it seems like it might simplify implementation inside these libraries too? For example if we have a standard way to request graceful shutdown, then we can implement it directly inside our standard server helpers, and higher-level code would get that part for free.
I think Kotlin's cancellation model is pretty different from Trio's, though, so not sure how the idea would even translate. (BTW, out of curiosity, have you considered implementing cancel scopes?)
Kotlin's approach to cancellation _is_ centered around cancel scopes. In fact, for Kotlin it all started two years ago as a prototype to implement reliable and user-friendly cancellation mechanism that just works. We've studied how cancellation is done in C# and Go and liked none of it. We wanted something that works without users having to write any boilerplate.
In our initial prototype the whole concept was called a Lifetime, then it was renamed to a Job. It was a structured approach to cancellation since early prototypes, in the sense that Job has children hierarchies that are automatically cancelled on parent cancellation or failure.
The only piece we were missing was syntactic sugar to make launching children the default action. That was the last major change we did before the 1.0.0 release. We've used the term "structured concurrency" to label this syntactic change of default, even though nothing had really changed inside. So, now in Kotlin you launch a coroutine in some "CoroutineScope", which is just a convention designed to ensure that your coroutines have a parent by default. In fact, CoroutineScope is just a syntactic construct that keeps a reference to a context with a Job inside.
The whole parent-child machinery in Kotlin that works behind the scenes links Kotlin Jobs into hierarchies. Kotlin Jobs are the actual "cancel scopes". It is because of Jobs that we can say `withTimeout(…) { … }` and all the code inside the curly braces will get orderly cancelled on timeout, etc.
Huh, that's really interesting. I think of cancel scopes as being in opposition to job-scoped cancellation. In Trio, you can enter/leave cancel scopes within a single job, and soon it will even be possible to create a cancel scope in one place and then enter it at a different place.
But of course, the job-scoped cancellation that I'm thinking of, and pushing back against, is the kind you see in pthreads or asyncio/twisted or Java's Thread.interrupt. I guess if you have task hierarchies *and* make start-a-new-task-and-wait-for-it as cheap and ceremony-free as possible, then job-scoped cancellation ends up feeling similar to Trio cancel scopes?
In Trio we don't even have objects to represent tasks/jobs (at least in the ordinary user-level API), just like we don't have objects to represent function call frames. It's literally just nursery objects and cancel scope objects.
I have thought of cancel scopes, but in the end I am working with C, where the idiomatic way of doing things is just doing them by hand, no magic involved, so cancel scopes feel non-idiomatic.
That being said, I have a vague feeling (don't take me too seriously though) that we can do better than cancel scopes. I mean, Golang's contexts were better than what we had before (no cancellation at all, mostly) and Trio's way of binding cancellation contexts to scopes is much better still. But I would love to not have to deal with explicit cancel scopes at all. I am not sure it's possible, but here are some hints:
In libdill the thread bundles (nurseries) work a bit differently than in Trio. When a bundle is closed by its owner, the runtime doesn't wait for all the threads in the bundle to finish; rather, it cancels them immediately.
So (if there were destructors in C), once the bundle goes out of scope or the stack is being unwound, all the threads are automatically canceled, no extra syntax needed.
One ugly thing is that when you want to wait for the threads in the bundle to finish, you have to do so explicitly (the bundle_wait function). In that case you can specify a deadline for the graceful shutdown.
Another metaphor (not sure how useful): cancellation in structured concurrency is like the return statement in structured programming.
Too much to write to comment here. We should get together during some conference to discuss it.
+1
Not sure how that could be possible… in Trio, there are exactly two places where you deal with explicit cancel scopes:
Both of these kind of have to be explicit! The library can't magically guess what you want to cancel or when.
It's true that you could combine cancel scopes and nursery blocks into a single primitive that does both roles, but I think this would be a bit awkward and non-orthogonal.
I wonder if there are any conferences we all go to… maybe we need to organize a structured concurrency workshop at one!
Dunno. I have a kid so I am not super flexible. I was considering attending FOSDEM in February though.
Oh, and a minor point regarding your "go_process" suggestion: We're using a slightly different API pattern in Trio. Instead of a "start in new [task/thread/process]" primitive, we have a "start in new task" primitive, that we compose with a "switch from task to thread" primitive, and hopefully a "switch from task to process" primitive in the future. This is nice because like you say, threads and processes have their own complexities, and this way we don't have to bake a particular set of choices into the nursery primitive. People can even build their own run-in-process APIs as third-party libraries.
Of course, this is in the context of a Python library, where the GIL makes it a no-brainer to do everything async within a single thread by default. In another language it would make sense to schedule tasks across multiple cores by default.
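For illustration, Python's stdlib asyncio later grew a similar "switch from task to thread" primitive, asyncio.to_thread (Python 3.9+), which runs a blocking function in a worker thread while the calling task simply awaits the result:

```python
import asyncio
import threading

def blocking_work() -> str:
    # Ordinary synchronous code; running it via to_thread keeps it from
    # blocking the event loop.
    return threading.current_thread().name

async def main():
    loop_thread = threading.current_thread().name
    worker_thread = await asyncio.to_thread(blocking_work)
    return loop_thread, worker_thread

loop_thread, worker_thread = asyncio.run(main())
print(loop_thread, worker_thread)
```

Because the switch is just an awaitable, it composes with whatever task/nursery structure already exists, rather than being baked into the spawning primitive.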
The real hard problem, I guess, is social rather than technical: teaching multi-threaded programmers to call yield() when doing heavy number-crunching so that their threads can be cleanly canceled.
And the *really hard* problem is making that not have a performance penalty, which people doing heavy number-crunching are often unwilling to accept…
(The Go folks have some intriguing work on (IIRC) using the same kind of mechanism for "zero-cost preemption" as for "zero-cost exception handling"; meanwhile other smart people allege that this is better at hiding costs than eliminating them; and I'm not qualified to judge.)
We've seen a similar problem when switching to structured programming. People complained that they couldn't do hyperoptimization using gotos, they complained about the cost of maintaining the call stack, and so on. But then the whole thing kind of petered out.
I think we are facing a similar (non-)problem here. For example, the cost of yield() in libdill is approximately 20 nanoseconds. Call it once a second and you'll get a performance hit of 0.000002%.
"Zero-cost preemption" seems like a misleading term, because it's presumably ignoring the cost of re-warming caches after a context switch, which is a significant and unavoidable part of the cost. But if all you want to do is check for cancellation, that's *very* cheap. Like, a single extremely-predictable (= ~free) branch every few milliseconds or something.
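Such a cooperative cancellation point can be sketched in asyncio: a number-crunching loop yields every 10,000 iterations, which is cheap but lets cancellation land promptly:

```python
import asyncio

async def crunch(n: int) -> int:
    total = 0
    for i in range(n):
        total += i * i
        if i % 10_000 == 0:
            await asyncio.sleep(0)  # the "yield()": a cancellation point
    return total

async def main() -> str:
    task = asyncio.create_task(crunch(10**9))  # would take minutes uncancelled
    await asyncio.sleep(0.05)
    task.cancel()  # lands at the next yield point, within ~10,000 iterations
    try:
        await task
    except asyncio.CancelledError:
        return "cancelled"
    return "finished"

result = asyncio.run(main())
print(result)
```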
Regarding nested nurseries: Python 3.7 added an async version of ExitStack, which is a general solution for combining context managers without nesting, and provides the necessary call order.
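For instance, contextlib.AsyncExitStack lets you enter any number of async scopes without nesting and unwinds them in reverse order of entry, which is exactly the destructor-style ordering discussed above (the `scope` context manager here is a stand-in for a real nursery):

```python
import asyncio
from contextlib import AsyncExitStack, asynccontextmanager

exit_order = []

@asynccontextmanager
async def scope(name: str):
    # Stand-in for a nursery: records when it is torn down.
    try:
        yield name
    finally:
        exit_order.append(name)

async def main() -> None:
    # Five "nurseries" entered without five levels of indentation;
    # AsyncExitStack exits them in reverse order when the block ends.
    async with AsyncExitStack() as stack:
        for name in ["logger", "metrics", "db", "cache", "connections"]:
            await stack.enter_async_context(scope(name))

asyncio.run(main())
print(exit_order)
```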