concurrency
PDF slideshow presentation by John Ousterhout of Sun Microsystems: Why Threads Are a Bad Idea (for most purposes). It’s thirteen years old, but its information is as current as ever.
People gave us a lot of hassle for not providing a simple, built-in preemptive threading class in REALbasic, but I remain convinced that we made the right decision. RB’s event-loop-driven architecture meant that very few operations actually needed to use threads – they are necessary only when some event needs to kick off a long-running process, while leaving the UI available for further interaction. The simple, non-preemptive thread class we offered was carefully constrained in ways which eliminated many of the tough problems usually encountered in threaded code. Threads would only yield to each other on loop boundaries: during the body of the loop, or in code which did not contain loops, you could ignore concurrency issues altogether.
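To make the loop-boundary rule concrete, here is a rough sketch in Python rather than REALbasic. It is not how RB's Thread class was implemented; it only imitates the scheduling rule with generators, and the names `counter_task` and `run_cooperatively` are invented for the illustration. Each task gives up control only at the `yield` that ends a loop iteration, so the read-modify-write inside the loop body needs no lock.

```python
from collections import deque


def counter_task(name, shared, rounds):
    """A cooperative task: it yields only at the end of each loop iteration."""
    for _ in range(rounds):
        # Read-modify-write on shared state. No other task can run in the
        # middle of this body, because control is only given up at `yield`.
        value = shared["count"]
        shared["log"].append(name)
        shared["count"] = value + 1
        yield  # loop boundary: the only point where another task may run


def run_cooperatively(tasks):
    """Round-robin scheduler: resume each task until its next yield."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # task finished; drop it from the queue
        ready.append(task)


def demo():
    shared = {"count": 0, "log": []}
    run_cooperatively([
        counter_task("a", shared, 3),
        counter_task("b", shared, 3),
    ])
    return shared
```

Calling `demo()` interleaves the two tasks one iteration at a time, and no increment is ever lost, because nothing can interrupt a task between reading `count` and writing it back.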
It is hard to reason about your program’s state when you cannot predict when or in what order state changes will occur; since threads share memory, and preemptive threads can interrupt each other at any time, you have to be ready for anything to change at any time. This is complicated even with a single controller and a single worker; add in more threads and it rapidly becomes what a computer programmer might call a “hard problem”.
If we model threading as a network of independent, non-memory-sharing systems, instead of a group of mutually interrupting, state-munging threads, we can handle concurrency much more easily. The best model for multiprocessing is almost always a group of processes, not a group of threads. Each process has its own memory and manages its own state; the processes communicate with each other, and usually with some controller, over serialized, asynchronous communication channels. They may share read-only access to a common data set, but all modifications must be sent back asynchronously.
This allows each process to manage its interactions on its own terms. It updates its state when it is ready to update its state, and does not have to worry that some other process will step in and make its own changes.
There is more overhead this way, so in theory it’s not as fast as an equivalent multithreaded solution. If a host app has to break a problem down, parcel out bits of the task, and collate the results, that’s work which a threaded solution would not have to do – not to mention all the overhead involved in serializing objects to send over the pipe. But the programmer who starts with threads is in for a lot of slow, painful debugging, where the programmer who starts with processes will have time to optimize if necessary and get on with the next thing.
I had a plan for multiprocessor support in REALbasic. If I had stayed on with REAL Software, my solution would not have been to add a “PreemptiveThread” class in parallel to the existing “Thread” class, but to implement something completely different. A new project-item type, which we might call “Subprocess”, would have allowed you to effectively embed a little console app into your larger project. Creating and invoking a new instance of this class would spawn off a subprocess, which the OS kernel would then schedule on any available processor.
The limitation, of course, is that this would genuinely be a separate process. It could use the same classes as the host app, but none of the same object instances, and none of the same static/shared variables. Instead, you’d send data back and forth, as though on a socket: each subprocess would be automatically wired up with a pipe-style communications channel back to the host app. Some basic serialization primitives would let you send the contents of arrays, dictionaries, and structures back and forth, triggering an event in the receiver. This, I think, would have directed people straight toward the best solution for multiprocessing, instead of giving them primitive, low-level tools they’d have to use to build their way up from scratch.
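The Subprocess item was never built, so the following is only a sketch of the shape of the idea, written in Python. Everything in it is an assumption made for illustration: the class name `Subprocess`, the `on_message` callback standing in for an event, the choice of JSON as the serialization format, and the one-message-per-line framing. The worker is a tiny console program embedded as a string; the host launches it as a real OS process and talks to it only through its stdin and stdout pipes.

```python
import json
import subprocess
import sys

# Hypothetical "embedded console app": a worker that reads one JSON message
# per line on stdin and writes one JSON reply per line on stdout.
WORKER_SOURCE = r"""
import json, sys
for line in sys.stdin:
    request = json.loads(line)
    reply = {"id": request["id"], "total": sum(request["values"])}
    sys.stdout.write(json.dumps(reply) + "\n")
    sys.stdout.flush()
"""


class Subprocess:
    """Host-side handle: owns the pipes, shares no objects with the worker."""

    def __init__(self, on_message):
        self.on_message = on_message  # callback standing in for an event
        self.proc = subprocess.Popen(
            [sys.executable, "-c", WORKER_SOURCE],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def send(self, message):
        # Serialize the message; the worker only ever sees a copy.
        self.proc.stdin.write(json.dumps(message) + "\n")
        self.proc.stdin.flush()

    def poll_one(self):
        # Block until one reply line arrives, then fire the "event".
        line = self.proc.stdout.readline()
        self.on_message(json.loads(line))

    def close(self):
        self.proc.stdin.close()  # end of input makes the worker loop exit
        self.proc.wait()
        self.proc.stdout.close()


def demo():
    received = []
    worker = Subprocess(on_message=received.append)
    worker.send({"id": 1, "values": [1, 2, 3]})
    worker.send({"id": 2, "values": [10, 20]})
    worker.poll_one()
    worker.poll_one()
    worker.close()
    return received
```

The host and the worker could import the same classes, but the only things that cross the pipe are serialized lists and dictionaries. A real implementation would deliver replies from the host's event loop instead of a blocking `poll_one` call.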
While that would have been the right solution for REALbasic, I’m taking a very different approach with Radian, where all objects are immutable: threads can freely share object references, but once you have a reference to an object, nothing that any other thread ever does can change it. This allows threads to share data, eliminates the need for serialization and asynchronous pipes, and eliminates the overhead involved in starting up a process, while retaining most of the benefits of the process-based architecture. When it is time to merge the results back together, Radian allows sharing via transactions: an architectural style that should be familiar to anyone who has ever worked with a multiuser database. The transaction, from the point of view of the rest of the program, will happen atomically. Data will always stay in a consistent state, it is easy to see where external state may change, and it is thus possible to write code that composes nicely around those points.
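This is not Radian code, and it says nothing about how Radian implements transactions. It is a loose analogy in Python, with hypothetical names (`Totals`, `Cell`, `transact`), meant only to show the two ingredients described above: threads share references to values that can never change, and results are merged through a single atomic commit point.

```python
import threading
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Totals:
    """Immutable snapshot: 'changing' it means building a new object."""
    count: int = 0
    total: int = 0


class Cell:
    """Hypothetical shared reference that can only be swapped atomically."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def read(self):
        return self._value  # safe to hand out: the snapshot never changes

    def transact(self, update):
        # Build the new snapshot without holding the lock, then commit only
        # if nobody else committed in the meantime; otherwise retry.
        while True:
            seen = self._value
            proposed = update(seen)
            with self._lock:
                if self._value is seen:
                    self._value = proposed
                    return proposed


def worker(cell, chunk):
    partial = sum(chunk)  # chunk is an immutable tuple; no locking needed
    cell.transact(
        lambda t: replace(t, count=t.count + len(chunk), total=t.total + partial)
    )


def demo():
    data = tuple(range(1, 101))
    cell = Cell(Totals())
    threads = [
        threading.Thread(target=worker, args=(cell, data[i:i + 25]))
        for i in range(0, 100, 25)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cell.read()
```

The only place where shared state can change is inside `transact`, so that is the only place a reader has to think about concurrency. Everything a thread holds a reference to stays exactly as it was when the thread first saw it.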
A reiteration of Alan Cox’s notion that “A computer is a state machine. Threads are for people who can’t program state machines.”
Back in the day I remember doing some benchmarking and discovering that on most Un*x type operating systems, for any reasonable amount of data, interprocess pipes were roughly the same speed as memcpy()s, and since then I’ve seen situations where threads are slower than processes (The dusty recesses are telling me one of these places was on the Mac, where the Mach kernel is supposed to make threads super high performance). So, yeah, it’s processes for parallelization for me, shared memory when that’s appropriate, but pipes whenever I can get away with ’em.
Comment by Dan Lyke — December 16, 2008 @ 8:36 pm
Hmmm; I’m using something like your subprocess approach to redesign some existing apps. It’s perhaps a pity, though, that part of the redesign is to drop REALbasic for Python.
Comment by charles — December 18, 2008 @ 7:08 am
Hello, Americans,
You should take up a bit of French. It’s a beautiful language. We’re a bit disappointed by the mess over there because of that idiot Bush. Fortunately, Obama brought the house down.
LONG LIVE FRANCE, LONG LIVE THE REPUBLIC
GOD BLESS AMERICA
Translation:
In my country (FRANCE), we really want a built-in preemptive threading class in REALbasic for Windows 7 and Snow Leopard
That’s All folks
Thanks
Comment by santoche — December 18, 2008 @ 2:50 pm
Having done a crap load of coding in Java using threads, and dealing with all the synchronized accesses to make it all “safe”, all I can say is that separate processes are way easier to make work the way people expect them to.
Comment by Norman — December 18, 2008 @ 11:03 pm
Can you give any more details on Radian? Just curious.
Comment by Bob K. — December 22, 2008 @ 7:19 pm
I’ve written an article on this, with practical sample code:
http://www.tempel.org/RBMultiProcessing
Comment by Thomas Tempelmann — January 4, 2009 @ 12:53 pm