4 February 2009 - 1:43Disk is the new disk
Recently, statements of the form “X is the new Y” where X and Y are members of {disk, tape, flash, RAM} have become a popular meme in computing. And when I say recently, I mean mid-2008 when I first started this post. I just haven’t been very motivated to write blog posts lately as I’ve been working on my dissertation; consequently, I don’t have the overwhelming desire to do more writing in my spare time. Anyway, here are some examples of those statements:
- “Disk Is The New Tape—Flash Is The New Disk”
- “Disk is the new tape”
- “RAM is the new disk…”
- “Solving Rubik’s Cube: Disk is the New RAM” (Described in a Google Tech Talk by Gene Cooperman entitled, “Disk-Based Parallel Computation, Rubik’s Cube, and Checkpointing”)
Despite the fact that this post has languished as a half-completed draft, the various trends that incited this meme remain: the local storage landscape/memory hierarchy of computers is changing. There are really several major trends. The most recent (and hyped) trend concerns changes in the external memory hierarchy due to the advent of cheap solid state memory, most notably flash (as well as some not-quite-viable but hot research areas like phase-change memory).
When I say “external,” I mean storage after main memory in the hierarchy, which traditionally consisted of disk followed by tape. External implies that the storage is not directly addressable by the processor in the same way as main memory. It also often implies that the storage isn’t random-access or word addressed — hard disks typically have to read and write in whole sectors, and random access is too slow to be used regularly. There are whole classes of “external memory algorithms” (“external sorts” are popular, for example) and “external memory data structures” designed specifically to perform well with the constraints of sequentially-accessed external storage. For instance, consider sorting a dataset so large that it won’t fit in main memory all at once. In memory, Quicksort is often used. But when the dataset is resident on disk, it would perform very poorly if applied directly since it is heavily dependent on random access. It is much better to use an algorithm that can do most of the work in sequential chunks, operating like a multi-pass Mergesort. You’d probably try to sort the data in smaller chunks that do fit in main memory (using any fast regular sorting algorithm), and then merge the sorted pieces sequentially.
Anyway, what are the trends that lead to these “X is the new Y” declarations? 1) the advent of flash; 2) growth in main memory sizes; and 3) the ever growing disparity between latency and bandwidth. So how do all of these fit together? Well, let’s break it down in a few ways — I’ll tackle the above bulleted items in order (mostly).
A straightforward way to interpret “Flash is the new disk, disk is the new tape” is to assume that flash will be interposed in the storage architecture before hard disks, thus “shifting down” disk to tape’s old position and tape to… well, I guess it becomes the new “stone tablet.” But “disk is the new tape” fits in more than one way: disk bandwidth often grows with disk size (improved density, etc.), but seek time remains very hard to improve due to physics. The way disks are now, random seeks are possible but so expensive as to make them worthless for many use-cases. In this way, disk is moving towards the properties of tape, which isn’t seen as a “random access” media, even though seeks are possible. To achieve good performance, it makes sense to treat disks as if they are sequential access-only media. Doug Cutting, creator of Hadoop, gave a presentation at OSCON in 2007 where he hammers home this point. MapReduce and Hadoop need to process large datasets from disk, so they must operate sequentially using linear sort and merge techniques. He does some back-of-the-envelope calculations and illustrates the differences between a random-access and streaming strategy using some fixed disk parameters (see slide 5): to update 1% of a terabyte database using an STR-limited sort and merge strategy takes ~1 day versus ~100-1000 days with a seek-limited B-tree index strategy.
This fact about disks is part of a more general trend succinctly articulated by David Patterson in an article titled “Latency lags bandwidth” appearing in the the October 2004 Communications of the ACM. Bandwidth increases way faster than latency in most areas (networks, disks and RAM), and Patterson gives six reasons and trend data to illustrate the point quite clearly. He also briefly analyzes the implications on software design.
How is “RAM [becoming] the new disk?” Well, this alludes to the fact that main memory sizes are getting so large, we can often keep relatively large working datasets entirely in memory. Coupled with high-speed networking, we can use a collection of memory-rich machines to store very large datasets. For a long time now, it has been faster to access the memory of another machine via the network than access a local hard disk. “Cooperative Caching: Using Remote Client Memory to Improve File System Performance” was published in 1994 (OSDI ’94):
Where fetching data from remote memory over an older network might be only three times faster than getting the data from remote disk, remote memory may now be accessed ten to twenty times as quickly as disk, increasing the payoff for cooperative caching.
The fact that RAM sizes have grown large enough to fit relatively large datasets in memory has fed the widespread use and development of “in-memory” front-end caches such as memcached and Microsoft’s Velocity, as well as in-memory databases such as TimesTen. The cheapness of massive amounts of RAM has led to various specialty RAM-based disk devices where RAM is literally given a storage interface, such as Gigabyte’s i-RAM or the new ACard’s ANS-9010 Serial ATA RAM disk. The latter is backed-up to flash storage by pressing a button on the front of the unit.
So going back to disks, what about “Disk is the new RAM?” Without context, it would seem to counter the implication that “Disk is the new tape,” but the sentiment is really the same. Really the thesis of the article is the same as Doug Cutting’s point about Hadoop: you can compute relatively fast if you can work at the sequential transfer rate speed of the disk in a batch processing manner. From the article:
Caveat 1. Disk latency is much more limiting than disk bandwidth. Therefore, despite the fact that RAM stands for random access memory, we would almost never use the “new RAM” (disk) in random-access mode. The old-fashioned RAM already serves as our random-access cache
In this case, they leverage many aggregated disks in parallel to get a large amount of disk bandwidth and architect the computation to work efficiently on disks. They accomplish this by using techniques from external memory algorithms and by eliminating random-access data structures in favor of sequential ones.
Finally, to bring us full-circle, what about flash being anointed the “new disk?” Well, it’s probably too early to tell what it will ultimately mean in the external storage landscape. If it does replace disk as the “first line” after main memory, we will have to rethink various system assumptions made — filesystems, demand paging code, etc. generally make certain assumptions about the cost of operations which need to be re-evaluated and tweaked to the properties of whatever media we are using as first-line backing store.
1 Comment | Tags: Research Content