i’ve got a weird history working with computers. For the most part, i’ve worked on the less sexy bits, building the stuff that other stuff talks to.
Naturally, i was interested when i read about Docker.io. It’s an interesting concept, building small, standard “containers” that are minimal, portable systems (where portable == any core that happens to meet most of the same requirements as the container). These systems have a predictable API and allow for easy deployment and use with minimal fuss.
But, i wasn’t sure if i was as taken by what Docker was, or what Docker was advocating.
In essence, what you’re building with Docker is a pre-configured, self-contained app for a specific build. You can’t take a Docker instance and run it on Windows. Likewise, even though it’s Linux-based, you’re advised not to run it against anything too foreign. All of which is absolutely fine, by the way, but it highlights something that folks tend not to think about.
When building things for the backend, you really need to think LEGO. Lots of semi-specialized bits that fit together well.
Docker encourages you to build components. It’s possible to load a ton of things into a single Docker container, but it’s kind of a pain in the butt. It’s fairly easy, though, to make a suite of containers that talk to each other.
Amazon kinda gets this right, too. In essence, they drive you hard to not build monolithic systems. Generally, you don’t want to pack as many things onto one box as you can. (e.g. you don’t want to build a blog on a box.) If you’ve got all your digital eggs in one basket, you’re going to have a really bad time. Instead, you want to divvy things up and put data on one set of machines that only deal with data, and display on another set of machines that only deal with querying and displaying the data. Likewise, you can have other machines that only do image processing, or indexing, or whatever else. You give each a fast, standard API to receive, process, and deliver content. This allows you to grow things that are slow, shrink things that are fast or unused, and deal with things that go boom.
Granted, Amazon tends to nickel & dime you about things that can drive costs up surprisingly fast, but that’s their business and they’re pretty up front about that.
So, what elements does a successful system tend to have? A few things, really.
- Shared storage
Data’s gotta live somewhere, and it needs to be accessible by multiple machines. This doesn’t always mean using a database, though. It may make just as much sense to use NFS or some other distribution system to ensure that data is reasonably consistent between machines. The storage could be complex (MySQL) or simple key/value (memcache or redis), or even just a shared data directory, although that requires locking and there be dragons.
- Logging
This part is exceptionally difficult both to explain and to provide. “Logging” has two bits to it: “Short Term”, which doesn’t really pay attention to what’s being logged but tracks how much things have changed (e.g. CPU & memory spikes, memory accesses, error rates, etc.), and “Long Term”, which provides data insights and deep info (How many visitors from China have the latest browser? etc.). You get woken up at 3AM because of short term stuff. My current preference for this is Heka, mostly because it doesn’t do much. You layer other things on top to get the pretty pictures.
- Messaging
Things have got to talk to each other. This could be simple (REST), complex (0mq Pub/Sub), or even proprietary. There may be multiple ways to communicate (users go on one channel, events go on a different one), or everything uses the same interface (flickr was built based on that idea, and has one of the best third party APIs to show for it). Sadly, there are few clear solutions for this, mostly because of the fractured nature of messaging.
- Display (optional)
This can literally be anything. It might get pushed all the way down to the client and you’ve got a little browser based app that pulls data. It might be something that only lives on a dedicated “web head” running PHP or some other templating system that pulls and displays the data. It might be a mix as devices become more powerful.
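To make the “Short Term” half of logging from the list above a bit more concrete, here is a minimal sketch in Python. It only counts how often something happens inside a sliding time window and never looks at what was logged. The `RateMonitor` class, its parameters, and the threshold are made up for illustration; this is not how Heka or any other real tool works.

```python
from collections import deque
from typing import Deque


class RateMonitor:
    """Hypothetical short-term monitor: counts events, ignores their content."""

    def __init__(self, window_seconds: float, threshold: int) -> None:
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._timestamps: Deque[float] = deque()

    def record(self, timestamp: float) -> bool:
        """Record one event; return True when the window exceeds the threshold."""
        self._timestamps.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Forget events that have fallen out of the sliding window.
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) > self.threshold
```

You would call `record()` once per error (or per request, or per whatever you are watching) and page somebody when it returns `True`. The long-term questions still need the full log data stored somewhere else.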
The rest is kind of up to your app. What you REALLY want to avoid doing is piling more than one of those into the same thing. (E.g. You don’t want to have your Data Storage also be your Display, or your Messaging also be your Logging system.) i’d also guard against your app being any one of these as well. These are all fairly well “solved” spaces. At most, you’d need to create a small abstraction layer that will allow you to swap whatever backend you’ve picked now for the better one you’ll pick later.
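A rough sketch of what that small abstraction layer might look like, in Python. Every name here (`KeyValueStore`, `MemoryStore`, `DirectoryStore`, `record_visit`) is hypothetical and invented for this example. The directory-backed store does no locking, so treat it as a toy and not something to point at a real shared directory.

```python
import json
from pathlib import Path
from typing import Dict, Optional, Protocol


class KeyValueStore(Protocol):
    """The only storage interface the app is allowed to see (hypothetical)."""

    def get(self, key: str) -> Optional[str]: ...
    def put(self, key: str, value: str) -> None: ...


class MemoryStore:
    """Backend for today: a plain dict in one process."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def put(self, key: str, value: str) -> None:
        self._data[key] = value


class DirectoryStore:
    """Backend for later: one JSON file per key in a directory (no locking)."""

    def __init__(self, root: Path) -> None:
        self._root = root
        self._root.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        # Hex-encode the key so any string is a safe file name.
        return self._root / (key.encode("utf-8").hex() + ".json")

    def get(self, key: str) -> Optional[str]:
        path = self._path(key)
        if not path.exists():
            return None
        return json.loads(path.read_text(encoding="utf-8"))

    def put(self, key: str, value: str) -> None:
        self._path(key).write_text(json.dumps(value), encoding="utf-8")


def record_visit(store: KeyValueStore, page: str) -> int:
    """App code: depends only on the interface, never on a concrete backend."""
    count = int(store.get(page) or "0") + 1
    store.put(page, str(count))
    return count
```

Because `record_visit` only knows about `get` and `put`, swapping `MemoryStore` for `DirectoryStore` (or for a wrapper around redis or MySQL) is a one-line change where the store gets created.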
Building big things is a bad idea. Building networks of little things tends to be a much better one. If stuff like AWS and Docker gets folks thinking that way, we’ll live in a much, much better world.