Tuesday, October 19, 2010

Computing at scale, or, how Google has warped my brain

A number of people at Google have stickers on their laptops that read "my other computer is a data center." Having been at Google for almost four months, I realize now that my whole concept of computing has radically changed since I started working here. I now take it for granted that I'll be able to run jobs on thousands of machines, with reliable job control and sophisticated distributed storage readily available.

Most of the code I'm writing is in Python, but it makes heavy use of Google technologies such as MapReduce, BigTable, GFS, Sawzall, and a bunch of other things that I'm not at liberty to discuss in public. Within about a week of starting at Google, I had code running on thousands of machines all over the planet, with surprisingly little overhead.

As an academic, I have spent a lot of time thinking about and designing "large-scale systems," though before coming to Google I rarely had a chance to actually work on them. At Berkeley, I worked on the 200-odd-node NOW and Millennium clusters, which were great projects but pale in comparison to the scale of the systems I use at Google every day.

A few lessons and takeaways from my experience so far...

The cloud is real. The idea that you need a physical machine close by to get any work done is completely out the window at this point. My only machine at Google is a Mac laptop (with a big honking monitor and wireless keyboard and trackpad when I am at my desk). I do all of my development work on a virtual Linux machine running in a data center somewhere -- I am not sure exactly where, not that it matters. I ssh into the virtual machine to do pretty much everything: edit code, fire off builds, run tests, etc. The systems I build run in various data centers, and I rarely notice or care where they are physically located. Wide-area network latencies are low enough that this works fine for interactive use, even when I'm at home on my cable modem.

In contrast, back at Harvard, there are discussions going on about building up new resources for scientific computing, and talk of converting precious office and lab space on campus (where space is extremely scarce) into machine rooms. I find this idea fairly misdirected, given that we should be able to either leverage a third-party cloud infrastructure for most of this, or at least host the machines somewhere off-campus (where it would be cheaper to get space anyway). There is rarely a need for the users of the machines to be anywhere physically close to them anymore. Unless you really don't believe in remote management tools, the idea that we're going to displace students or faculty lab space to host machines that don't need to be on campus makes no sense to me.

The tools are surprisingly good. It is amazing how easy it is to run large parallel jobs on massive datasets when you have a simple interface like MapReduce at your disposal. Forget about complex shared-memory or message-passing architectures: that stuff doesn't scale, and is so incredibly brittle anyway (think about what happens to an MPI program if one core goes offline). The other Google technologies, like GFS and BigTable, make large-scale storage essentially a non-issue for the developer. Yes, there are tradeoffs: you don't get the same guarantees as a traditional database, but on the other hand you can get something up and running in a matter of hours, rather than weeks.
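
To give a flavor of that simplicity, here's a toy sketch of the MapReduce programming model -- the classic word-count example -- in plain, single-machine Python. To be clear, this is not Google's actual API (all the names here are mine); the real framework runs the map and reduce phases across thousands of machines and handles scheduling and failures for you:

    # A toy, single-machine rendition of the MapReduce programming model.
    # Not Google's API -- just the two functions the programmer writes,
    # plus a stand-in for the shuffle machinery the framework provides.
    import collections

    def map_fn(filename, contents):
        # Map phase: emit (key, value) pairs for each input record.
        for word in contents.split():
            yield (word.lower(), 1)

    def reduce_fn(word, counts):
        # Reduce phase: combine all the values emitted under one key.
        yield (word, sum(counts))

    def run_mapreduce(inputs, mapper, reducer):
        # The framework's half of the bargain: group intermediate pairs
        # by key, then reduce each group. In production, this is the part
        # that gets spread over thousands of machines.
        groups = collections.defaultdict(list)
        for filename, contents in inputs:
            for key, value in mapper(filename, contents):
                groups[key].append(value)
        for key in sorted(groups):
            for result in reducer(key, groups[key]):
                yield result

    docs = [("a.txt", "the cloud is real"), ("b.txt", "the tools are good")]
    for word, count in run_mapreduce(docs, map_fn, reduce_fn):
        print(word, count)

The point is that the programmer's entire job is the two little functions at the top; everything hard lives in the framework.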

Log first, ask questions later. It should come as no surprise that debugging a large parallel job running on thousands of remote processors is not easy. So, printf() is your friend. Log everything your program does, and if something seems to go wrong, scour the logs to figure it out. Disk is cheap, so it's better to log too much and sort it out later than to be missing the one line that would explain the failure. There's little hope of doing real interactive debugging in this kind of environment, and most developers don't get shell access to the machines they are running on anyway. For the same reason I am now a huge believer in unit tests -- before launching that job all over the planet, it's really nice to see all of the test lights go green.
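
To make that concrete, here's the style of logging I have in mind, sketched with Python's standard logging module (nothing Google-internal here; transform() and the shard/record names are hypothetical stand-ins for real work):

    # A hypothetical sketch of "log first, ask questions later" using only
    # Python's standard library. Every step gets a log line with enough
    # context (shard, record) to reconstruct events after the fact.
    import logging

    logging.basicConfig(
        filename="worker.log",
        level=logging.DEBUG,
        format="%(asctime)s %(levelname)s %(message)s",
    )

    def transform(record):
        # Stand-in for whatever real work the job does to each record.
        return record.upper()

    def process_record(shard_id, record):
        logging.debug("shard %d: processing %r", shard_id, record)
        try:
            result = transform(record)
        except Exception:
            # Record the full traceback and keep going -- you'll grep the
            # logs for this later, since you can't attach a debugger.
            logging.exception("shard %d: failed on %r", shard_id, record)
            return None
        logging.debug("shard %d: produced %r", shard_id, result)
        return result

    for shard_id, record in enumerate(["foo", "bar", "baz"]):
        process_record(shard_id, record)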

Sunday, October 10, 2010

In Defense of Mark Zuckerberg

I finally got to see The Social Network, the new movie about the founding of Facebook. The movie is set during my first year teaching at Harvard, and in fact there is a scene where I'm shown teaching the Operating Systems course (in a commanding performance by Brian Palermo -- my next choice was Brad Pitt, but I'm thrilled that Brian was available for the role). The scene even shows my actual lecture notes on virtual memory. Of course, the content of the scene is completely fictional -- Mark Zuckerberg never stormed out of my class (and I wouldn't have humiliated him for it if he had) -- although the bored, glazed-over look of the students in the scene was pretty much accurate.

It's a great movie, and very entertaining, but there are two big misconceptions that I'd like to clear up. The first is that the movie inaccurately portrays Harvard as a place full of snobby, rich kids who wear ties and carry around an inflated sense of entitlement. Of course, my view (from the perspective of a Computer Science faculty member) might be somewhat skewed, but I've never seen this in my seven years of teaching here. Harvard students come from pretty diverse backgrounds and are creative, funny, and outgoing. I've had students from all corners of the world and walks of life in my classes, and I learn more from them than they'll ever learn from me -- the best part of my job is getting to know them. I've only seen one student here wearing a tweed jacket with elbow patches, and I'm pretty sure he was being ironic.

The second big problem with the movie is its portrayal of Mark Zuckerberg. He comes across in the film as an enormous asshole, tortured by the breakup with his girlfriend and his inability to get into the Harvard Final Clubs. This is an unfair characterization and not at all the Mark Zuckerberg that I know. The movie did a good job of capturing how Mark speaks (and especially how he dresses), but he's nowhere near the back-stabbing, ladder-climbing jerk he's made out to be in the film. He's actually an incredibly nice guy, super smart, and needless to say very technically capable. If anything, I think Mark was swept up by forces that were bigger and more powerful than anyone could have expected when the Facebook was first launched. No doubt he made some mistakes along the way, but it's too bad that the movie vilifies him so. (Honestly, when I first heard there was a movie coming out about Facebook with Mark Zuckerberg as the main character, I couldn't believe it -- the quiet, goofy, somewhat awkward Mark that I know hardly sounded like a winning formula for a big-budget Hollywood film.)

The take-away from the movie is clear: nerds win. Ideas are cheap and don't mean squat if you don't know how to execute on them. To have an impact you need both the vision and the technical chops, as well as the tenacity to make something real. Mark was able to do all of those things, and I think he deserves every bit of success that comes his way. As I've blogged about before, I once tried to talk Mark out of starting Facebook -- and good thing he never listened to me. The world would be a very different (and a lot less fun, in my opinion) place if he had.
