8 November 2008 - 14:57Burton Smith: Reinventing Computing
Yes, I know I haven’t posted in a while, but I’ve been busy this semester: I finished my PhD thesis proposal a few weeks ago and I’m trying to prepare for a defense within a year or so.
Reinventing Computing
Anyway, recently I attended a talk by Microsoft Technical Fellow and computing architecture and HPC guru, Burton Smith. The talk was very interesting though not specifically because of the subject matter (which is familiar), but because of the broad perspective. The premise is the same thing I’ve been hearing in every other talk for the past few years (which I’ve ranted about before) — namely, parallelism is going mainstream and we have to deal with it. Most people then follow with their specific sales pitch, but Burton followed with a broad overview of many different views of the issue and a wide range of different techniques and technologies.
One thing I really liked about his overview is that he shares my philosophy of “pragmatism over dogmatism.” He discussed potential alternatives and stated that there is a need to utilize a variety of different solutions, even those that are often seen as mutually opposing philosophies: for example, he said he believed that we need both message passing AND shared state concurrency, mutable state with transactions AND immutable functional objects, declarative programming AND imperative programming, etc. This is in contract to presentations where you hear that approach X is the “right” way forward. While it’s conceptually appealing to say that one approach is uniformly better, real world constraints often limit practical applicability — we probably need a tool chest, not just one really fancy hammer.
One thing he mentioned was the concept of viewing resource allocation in multi-core systems as a “2D bin packing” problem. He showed a view of traditional CPU scheduling as a one-dimensional problem of time-multiplexing single runnable kernel threads over each single processor (with a small number of processors total). He then showed the alternate view as a two dimensional problem of assigning chunks of processors over time to applications (i.e. time is the X-axis and processors form the Y-axis). This is reminiscent of gang scheduling or co-scheduling, except the internal scheduling of work within an application with a chunk of processors would be handled at the user level and kernel level time-slicing may not occur in a standard manner anymore. This reminded me of several pieces of current (and classical) related work.
One idea is the following: “why time slice at all on massively multi-core systems?” If you have 256 processors, just dynamically assign chunks of them like spatial multiplexing of memory.
Corey
Corey, a research OS for many-core systems, will be presented at OSDI this year (OSDI 08 program) and follows this principle — “Corey allocates physical cores to applications rather than presenting a time-shared virtual processor abstraction.” Another related concept is that Corey also allows the allocation of dedicated kernel cores for running the kernel, so kernel calls are handled “via fast shared-memory IPC rather than slow traps.” I remember Rik Farrow suggested this same idea in a 2006 Google Tech Talk titled “Security is Broken”. Corey is organized like an earlier OS called an Exokernel (and in fact M. Frans Kaashoek and his PDOS group are involved in both). Like the Exokernel, Corey delegates scheduling policy of an allocated set of cores to the “library operating systems.” This basically amounts to the same thing as user-level scheduling in a normal OS, since the library OS generally runs within the address space of an application (the authors note that the library OS doesn’t need to further isolate itself from the application because the exokernel doesn’t trust the library OS).
User-level Scheduling
One thing that the idea of user-level scheduling reminds me of is the infamous two-level scheduling of Scheduler Activations or Solaris M:N threading. The goals were similar: in theory, at the user (application) level, you can make better scheduling decisions via custom scheduling policies or just better information, and scheduling is also cheaper. In practice, the extra complexity didn’t pay off and the two levels of scheduling (at kernel and user) level often interacted in negative ways. Subtle interference between decisions made at different levels could cause significant and often unexpected performance issues, and to really take advantage of it, you needed to make sure that both levels of scheduling were not working at cross purposes — to do that really requires propagating the application level scheduling decision information to the kernel level scheduler too, which is messy and complicated.
So although, at first glance, the phrase “user-level scheduling” appearing on the slides brought the aforementioned black eye to my mind, I think the situation in the case of Corey and the kind of system Burton is proposing will be different because it’s really not the same kind of combination. In these scenarios, we have a lot more cores and the kernel level “scheduling” is at much longer time scales — instead of time slicing, it’s easier to think of it like allocating physical page frames of memory to processes’ address spaces. Of course, the allocated number of cores is less “transparent” than virtual memory, but model of holding on to a resource is more similar than time slicing over a small number of processors.
Concurrency Runtime
Another project that this brought to mind is Microsoft’s Concurrency Runtime (not to be confused with Microsoft’s similarly named Concurrency and Coordination Runtime). Not only is it related to the idea of user-mode domain-specific scheduling for multi-core applications, but it is also designed to allow different concurrency solutions to work together (thus supporting Burton’s view of utilizing many solutions with different strengths, potentially in the same program). The idea behind the Concurrency Runtime is providing a common resource management framework to allow various concurrency solutions to interoperate and “play nice” together. One problem with current solutions likes OpenMP or Intel TBB, etc. is that they all think they “own the machine.” If you want to use multiple solutions together, they interact poorly because they are all oblivious of each other. The concurrency runtime provides a user-mode common resource management framework underneath the various concurrency solutions which can arbitrate between these different requests. It also provides a bunch of richer primitives for building these solutions (i.e. higher level concepts of tasks, groups, events, thread pools, etc.), but I can’t find too much documentation on it so far (most of the references are in the form of interviews and presentations).
Snake Oil
At the beginning of this post, when I was talking about how Burton Smith’s talk and approach are different than what you usually hear, it reminded me of a recent post by Sun engineer Bryan Cantrill (an outspoken, frequently provocative* fellow — as an aside, his chapter in Beautiful Code was one of the most enjoyable to me, coming from a systems background). Anyway, as I mentioned, most talks follow the common “concurrency is here, we must deal with it” introduction with a sales pitch for a specific tool. One hot area currently is transactional memory, which can be implemented in hardware or software, and I’ve seen and read a lot of papers on this in the past few years. Anyway, Bryan recently posted a scathing post about transactional memory titled “Concurrency’s Shysters”. I think this kind of critique comes as an inevitable backlash against over-hyped/newly in vogue solutions and the dogmatic selling of some technique as a the fix for what ails you. Of course, for the sake of marketing, it’s very hard to present a nuanced view and still be convincing, so maybe it’s out of necessity, but it’s off-putting for people who feel like the presentation of a technology is unbalanced and unrealistic.
* Bryan Cantrill is infamous for the “Have you ever kissed a girl?” Usenet quip in historical Solaris v. Linux performance wars, and more recently for dumping on “Dreaming in Code” in a Google Tech Talk on DTrace. That’s random trivia, but I quite enjoy watching strongly opinionated and outspoken technical people duke it out.
BTW, if you like following all of the various concurrency solutions (which are popping up fast and furious), you might want to check out Alex Miller’s Concurrency feed.
No Comments | Tags: Research Content