27 July 2009 - 21:37“C++ is a superpower”
A few months ago, Bjarne Stroustrup came to Georgia Tech to give a talk sponsored by the Georgia Tech chapter of the ACM. Bjarne’s talk was about two hours, and it was mostly a broad overview of the history of C++: its significance, its evolution, Bjarne’s broad language design philosophy and the future directions he envisions for C++. Before the presentation started, I talked to Bjarne one-on-one for a bit about the most interesting aspect of the upcoming C++ 0x standard to me, concurrency-related features and the language memory model. I also asked him about his thoughts on garbage collection and shared-state concurrency versus message-passing. On the latter point, like me, he believes that the future of concurrent programming (at least in the near to foreseeable future) involves the use of a large arsenal of techniques for different situations rather than one dominant paradigm. It’s a pragmatic view and ostensibly supported by the fact that this is not the first “boom cycle” for parallel programming. Arguably shared-state concurrency doesn’t scale very easily to large systems, but shared-state concurrency within finer-grained components is manageable today; using actor-style message-passing between coarser grained components with shared-state concurrency in limited places (perhaps for high-performance primitives or highly coupled functionality) seems like an analogue to the way we use inline assembly or intrinsics for limited sections of performance-critical library routines (like memcpy or BLAS primitives) and everywhere else use more manageable high-level languages.
Garbage Collection
One topic where I didn’t fully agree with Bjarne was the issue of garbage collection. Personally, I think the addition of hooks to support garbage collection to C++0x is an essential decision for enabling effective concurrency programming. Storage management in single-threaded code is drudge work, but when concurrency is involved, it suddenly becomes a much thornier problem. People find themselves building their own slapdash garbage collection-like facilities to manage the lifetime of data objects that may be used in many different threads. These user-built facilities are tricky and often induce extra contention. Ultimately many concurrent algorithms, particularly lock-free ones, rely on the existence of a garbage collector (to prevent certain instances of ABA problems, for example*). Sure there are classes of techniques for lock-free reference counting (e.g. Detlefs et al.), but it seems like the modern mainstream languages with GC like Java and C# are gaining powerful high-level concurrent programming utilities a lot quicker than C and C++, and I suspect the higher difficulty of storage management is one influencing factor (obviously, the lack of a memory model and portability concerns are other major issues). Anyway, I mentioned to Bjarne that I thought the addition of garbage collection support was good because a lot of concurrent algorithms rely on it, and he said something to the effect of it being a crutch or somewhat lazy to rely on garbage collection just to avoid complicating concurrent algorithms with intricate reference counting. I don’t want to misrepresent his opinion, but that’s the general gist I got from his comment.
As far as I’m concerned, garbage collection is one of those largely inevitable evolutions of higher-level programming that is begrudged by some but eventually taken as a given. Some complained about overhead induced by operating systems when computers tended to run single applications; the same thing happened with the first compiled languages (versus hand-written assembly). Later it was VMs for managed languages or GC. Eventually, however, I think these will be taken for granted in mainstream programming. Java has already gone a long way to making garbage collection more of a necessary and mainstream feature. Now, certainly there are some gripes about the Java throwing out the baby with the bath water in not providing something as useful as explicitly scoped destruction to guarantee cleanup of certain related non-memory resources (like open files or sockets), and providing too weak semantics for finalize, but this isn’t a fundamental flaw of GC. Anyway, that’s tangential, but my view is that GC is nearly essential in new concurrent era, and I’m glad that C++0x is taking a first step in the right direction on that front.
Memory Model and Atomics
The rigorous memory model is, of course, also essential for great concurrent programming, and well known experts like Hans Boehm (and many others) have put a lot of work into that. I mentioned to Bjarne that I was also pleased that they are offering atomics and even support for relaxed consistency reads and writes useful in certain low-level concurrent algorithms. I told him that I thought that the inclusion of selective relaxed consistency, while definitely addressing only a limited audience of very skilled programmers, is really vindicated in light of Java’s experience — after initially resisting exposing such features, the developers of Java have come to the same conclusion that you need to expose these operations to allow the efficient implementation of certain kinds of concurrent data structures and algorithms. See Doug Lea’s work on the Java 7 Fences API and his announcement and rationale to the concurrency-interest list. He states:
The basic JSR133 JMM scheme provides three kinds of underlying ordering constaints, that are tied to variable declarations — read-volatile, write-volatile, and write-final (aka write-release, aka publish, aka lazy-set), or for stand-alone AtomicX objects.
This turns out not to mesh very well with the development of core concurrent algorithms (like the ones I tend to implement), where you often need these special flavors of reads and writes on an occasional basis, which is currently either impossible or insensible to achieve.
…
In the recent C++0x standard, this kind of usage was addressed by allowing per-usage modes on new read and write methods. (And further, supporting more than the three modes considered here, but I don’t think we need to introduce more for Java.)
…
So, after years of resisting the idea, my current conclusion is that we need to stop wishing for a miraculous solution to lack of call-by-ref, and instead allow developers to roll their own out of the raw ingredients — fences.
In his actual talk, Bjarne only briefly mentioned C++0x concurrency features and the memory model (because there was so much ground to cover), but after noting that “C++ 0x will have atomics,” he quipped that the new feature makes C++ a “superpower.” Besides the obvious pun, some people (myself included) would liken C++ to a nuclear weapon for a different reason — namely that it tends to provide many experts-only features with extremely intricate rules and many hidden pitfalls. This was illustrated perfectly to me by Bartosz Milewski’s attempt to implement a well-known synchronization algorithm (Peterson’s algorithm for shared-memory, two party mutual exclusion) using relaxed consistency atomics. Just by trying to implement Peterson’s algorithm, he ended up finding that the draft standard semantics were subtly broken, and it took Hans Boehm’s intervention and the modification of the standard just to get it to work. In his post titled The Inscrutable C++ Memory Model, he recounts just how complicated it is:
I had no idea what I was getting myself into when attempting to reason about C++ weak atomics. The theory behind them is so complex that it’s borderline unusable. It took three people (Anthony, Hans, and me) and a modification to the Standard to complete the proof of a relatively simple algorithm.
All the people involved are in the target population of expert users for relaxed consistency atomics, so the byzantine complexity illustrated here (i.e. “so complex that it’s borderline unusable”) doesn’t bode well for 99.9% of the C++ using population. Sometimes I’m afraid that some bold and foolhardy programmers won’t be able to resist the temptation of “optimizing” by using features like this — the programming equivalent of playing Russian Roulette with all chambers loaded. Of course, I still support their inclusion, since the Java experience shows that they are needed for a small subset of people like Doug Lea to write kickass foundational concurrent library functionality.
Anyway, to bring this post full circle, earlier I mentioned that shared-state concurrency can be tricky to apply to large-scale systems. One of the reasons is thematically similar to the “experts only” problem we just discussed — current abstractions for shared-state concurrency are very complex and tricky to get right. The STM people point out that locks don’t compose — in other words, to build on existing pieces and maintain invariants, you often have to “invite” other parties to participate in your own internal locking protocols, basically foiling true abstraction or encapsulation of concerns. Just how tricky are threads and locks? Well, Edward A. Lee’s “The Trouble With Threads” recounts the following anecdote:
A part of the Ptolemy Project experiment was to see whether effective software engineering practices could be developed for an academic research setting. We developed a process that included a code maturity rating system (with four levels, red, yellow, green, and blue), design reviews, code reviews, nightly builds, regression tests, and automated code coverage metrics [43]. The portion of the kernel that ensured a consistent view of the program structure was written in early 2000, design reviewed to yellow, and code reviewed to green. The reviewers included concurrency experts, not just inexperienced graduate students (Christopher Hylands (now Brooks), Bart Kienhuis, John Reekie, and myself were all reviewers). We wrote regression tests that achieved 100 percent code coverage. The nightly build and regression tests ran on a two processor SMP machine, which exhibited different thread behavior than the development machines, which all had a single processor. The Ptolemy II system itself began to be widely used, and every use of the system exercised this code. No problems were observed until the code deadlocked on April 26, 2004, four years later.
In closing, I just want to say I enjoyed Bjarne’s talk a lot, and I find him to be extremely pragmatic in technical matters. He’s very frank about C++’s purpose, the fact that it’s evolving and that it has certain warts (and also many things that are better than other languages). In other words, he’s not the Steve Jobs of C++, pretending it’s eternally perfect and consistent. There wasn’t much time for many questions, but the most amusing question to me was what can only be described as an “Oprah” question — “How do you feel about having your software running on so many systems?” Hah.
* See “ABA Prevention Using Single-Word Instructions” by Maged Michael and “DCAS is not a Silver Bullet for Nonblocking Algorithm Design” by Doherty et al. for some examples of ABA problems solved by GC and instances where GC alone is insufficient. However, regardless of whether GC solves the ABA problem entirely, GC tends to make concurrent algorithms significantly simpler in many other ways.
3 Comments | Tags: Research Content
05 Aug 2009 - 19:18
Earlier I was thinking of a small aside to add to this post, so I figured I’d make a comment on my own blog. I said that some people decry Java for “not providing something as useful as explicitly scoped destruction to guarantee cleanup of certain related non-memory resources (like open files or sockets).” In C++ people use the ubiquitous RAII idiom to handle this issue with stack-allocated objects and destructors for clean-up. In Java, you can sort of limp along with verbose nested try/finally, but it’s certainly not as convenient and leads to sprawling spaghetti. C# has provided a “using” statement for scoped resource usage and Python provides “with” and they’re a pleasure to use. Java needs to get with the program — the closures proposals address this problem, but unfortunately it doesn’t seem we’re on track for getting closures any time soon (despite a lot of great work by many talented people).
09 Oct 2009 - 13:01
Re: and he said something to the effect of it being a crutch or somewhat lazy to rely on garbage collection just to avoid complicating concurrent algorithms with intricate reference counting
The problem with safe memory reclamation for lock-free algorithms is NOT that they are intricate (they are not intended to be implemented by every application developer anyway). The problem is that it’s impossible to implement them in user space without pervasive application cooperation in efficient and scalable manner and without pitfalls.
09 Oct 2009 - 13:50
Thanks for your comment! I agree; I should have said “concurrent applications” rather than “algorithms” because, as you note, the serious problem is that the concerns you really want to localize require “pervasive application cooperation,” which spreads complexity everywhere. Spreading complexity is also a key pitfall that STM advocates highlight when they talk about locks and “composability.”