You’re Overthinking It

You comb Hacker News daily, marveling at the neatly packaged startup tales, uber-effective best practices, super clever engineering solutions, and lots and lots of links to websites filled with Helvetica, minimalism, and pastel colors. You’ve attended Lean Startup workshops, read Four Steps to the Epiphany, and subscribe to the Silicon Valley Product Group blog.

Honestly, it’s all very intimidating.

My product advice, from one overthinker to another overthinker – throw it all away. I mean, read the articles, enjoy the stories, and try to form your own opinions, but I wouldn’t take it too seriously.

A year into my first startup, my first major product epiphany was to never, never, ever try to build a product you couldn’t be a user for. That may be obvious, but I still read people discussing strategies for building products that they don’t use. There is no better user study, no more accurate persona than asking yourself what is good. There are probably product people out there that can do it, but, no offense, it’s probably not you, and it’s certainly not me.

There are many pros to building a product that you would use. Actually using a product (and I mean really using it) gives you access to the power of intuition, an infinitely more valuable product tool than reason. Your intuition explains to you in a moment what it takes your reason an hour to break down. Your reason will lead you down a dozen wrong roads.

Another way of saying that is: if you think it’s cool, it’s probably cool. If you think it sucks, it probably sucks.

However, building a product for yourself doesn’t give you a free pass from user research, personas, and all the other things that product gurus tell you to do. Spend a year having the same product discussions with the same group of people, and the discussions will lose all meaning. Talking to five people outside of the company will bring you back to earth real fast.

One last thing…whatever you build, make sure it looks good and is the highest quality possible.

But wait, you say, look at eBay, Amazon, and Craigslist – they look like crap. Implement an MVP with product/market fit, and it doesn’t matter what it looks like. That’s true sometimes, but it also depends on where you are on Maslow’s hierarchy of needs. The lower on the pyramid your product is, the crappier it can look. If your product is core to helping people make money, pirate movies, or sell their useless couch, you don’t need a designer. But if you’re high on the pyramid, an ugly, clunky UI makes it impossible for people to see your vision.

If you read the Steve Jobs biography, it talks about one of the three original Apple philosophies: an odd word called impute. It’s basically a philosophy about impressions. On product, they said, “If we present them in a slipshod manner, they will be perceived as slipshod”.

Speaking of the biography, I’ll wrap up with my impression of that book. It’s a story about a product genius, but it’s a story with as many missteps as triumphs. I take the moral of the story to be: forget the experts, the know-it-alls, and the doubters. Trust yourself and your vision, and go build something.

Turn ruby code into simple daemons with Dante

Context

At Miso, the engineering team has recently been working on a major shift in our architecture: moving to a much purer service-oriented approach. Our new strategy as a whole will be the subject of many blog posts in the future. This transition has required us to set up a lot of new services, ranging from small API services written in Sinatra and Renee, to front-end heavy services in Padrino, to internal services communicating with protobuf messages.

Each of these services has a lot of behavior in common: each needs to log output, be monitored, be daemonized and easy to start and stop, have automated deployment, and so on. These commonalities, and the sheer number of services, led us to begin building a ‘unified’ services architecture that will manage all of this for us. While that project is still a work in progress, today I would like to introduce one tiny library that we built along the way.

Introduction

One of the common behaviors each ruby service needs is a simple executable that can reliably start / stop a process. Moreover, each executable needs to be able to start the process in non-daemonized and daemonized mode, allow a port to be specified, allow a pid file to be specified, and be able to stop daemonized processes that it started. Ideally, this same interface could be used with God or Monit to monitor the process.

We started by looking at existing daemonizing gems such as daemon-kit, and while these are excellent libraries, they didn’t quite fit our purposes. In the end, we decided to whip up a gem that would provide this in the simplest way that could possibly work. The result is an extremely tiny but well-tested daemonizing gem called Dante.

Usage

First, let’s start with how to integrate Dante into a gem or ruby code base.

Since dante will become a dependency of your gem’s executable, add it to your gemspec:

# mygem.gemspec

Gem::Specification.new do |s|
  # ...
  s.add_dependency "dante"
end

or alternatively to your Gemfile:

# Gemfile

gem "dante"

Now you can start using dante for your executables. Dante’s interface is extremely simple. You wrap your code in a Dante block and then the gem handles the rest and passes you the relevant options.

Let’s say we want to daemonize arbitrary ruby code, the simplest way would be:

# Starts the code as a daemonized process
Dante::Runner.new('foo').execute(
  :daemonize => true, :pid_path => "/tmp/path.pid") { do_something! }

You can then also stop based on the pid:

# Stops the daemonized process
Dante::Runner.new('foo').execute(
  :kill => true, :pid_path => "/tmp/path.pid")

Now, let’s say we wanted to boot up a daemonized thin api service using an executable. First we would create the executable in bin/blog and then wrap the thin bootup in a Dante#run block:

#!/usr/bin/env ruby

require File.expand_path("../blog.rb", __FILE__)

Dante.run('blog') do |opts|
  # opts available: host, pid_path, port, daemonize, user, group
  Thin::Server.start('0.0.0.0', opts[:port]) do
    use Rack::CommonLogger
    use Rack::ShowExceptions
    run Blog
  end
end

Notice how simple the Dante part is. Dante just accepts a daemon name and yields the options that were passed on the command line to your program. In this example, the thin process uses those options to boot up thin on the correct port. So what does wrapping your executable code in Dante give you?

Functionality

With your executable daemonized with Dante, you get a familiar and standardized interface for controlling your process. The simplest way to start the thin server now is just to run the executable with no options:

./bin/blog

That will boot up a non-daemonized process in the command line. Now, let’s say we want to get help for this process:

./bin/blog --help

will print a nice auto-generated help message showing the daemon’s commands as a reference. Now, let’s start the process as a daemon, setting the port and pidfile:

./bin/blog -d -p 8080 -P /var/run/blog.pid

This will start the blog service on port 8080 storing the pid file at /var/run/blog.pid, and with that your process will be running. Now let’s stop the process:

./bin/blog -k -P /var/run/blog.pid

The -k flag will find the process from the pid and then kill the daemonized process for you.

Customization

In many cases, you will need custom flags/options or a custom description for your executable. You can do this by using Dante::Runner explicitly:

#!/usr/bin/env ruby

require File.expand_path("../../myapp.rb", __FILE__)

# Set executable name to 'myapp' and default port to 8080
runner = Dante::Runner.new('myapp', :port => 8080)

# Sets the description in 'help'
runner.description = "This is awesome myapp!"

# Setup custom 'test' option flag
runner.with_options do |opts|
  opts.on("-t", "--test TEST", String, "Test this thing") do |test|
    options[:test] = test
  end
  # ... more opts ...
end

# Parse command-line options and execute the process
runner.execute do |opts|
  # opts: host, pid_path, port, daemonize, user, group
  Thin::Server.start('0.0.0.0', opts[:port]) do
    puts opts[:test] # Referencing my custom option
    use Rack::CommonLogger
    use Rack::ShowExceptions
    run MyApp
  end
end

Now you would be able to do:

./bin/myapp -t custom

and the opts would contain the :test option for use in your script. In addition, help will now contain your customized description in the banner.

That’s all you need to know to use Dante, although the process also supports starting on alternate hosts and a handful of other options. This is all Dante does: the simplest daemonizer that could possibly work. But it has been extremely handy for controlling our services.

Process Monitoring

One easy way to monitor a daemonized process is to use God to watch the process. Thankfully, God works perfectly with Dante and they cooperate quite nicely. God is supposed to be able to daemonize processes automatically, but in my experience this can be unreliable, and having a consistent interface for the executable is convenient for testing.

Setting up a god file for a dante process is simple:

# /etc/god/blog.rb

God.watch do |w|
  pid_file = "/var/run/blog.pid"
  w.name            = "blog"
  w.interval        = 30.seconds
  w.start           = "ruby /path/to/blog/bin/blog -d -P #{pid_file}"
  w.stop            = "ruby /path/to/blog/bin/blog -k -P #{pid_file}"
  w.start_grace     = 15.seconds
  w.restart_grace   = 15.seconds
  w.pid_file        = pid_file
  w.behavior(:clean_pid_file)
  w.start_if do |start|
    start.condition(:process_running) do |c|
      c.interval = 5.seconds
      c.running = false
    end
  end
end
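
If you prefer Monit over God, the same start/stop flags map over directly. Below is a rough sketch, not an official recipe; the paths and filenames are assumptions:

# /etc/monit/conf.d/blog.monitrc
check process blog with pidfile /var/run/blog.pid
  start program = "ruby /path/to/blog/bin/blog -d -P /var/run/blog.pid"
  stop program  = "ruby /path/to/blog/bin/blog -k -P /var/run/blog.pid"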

Conclusion

Our team will have a lot more to talk about soon as we begin open-sourcing various parts of our new services architecture. However, I hope Dante can be useful as a standalone utility. I know that it has been for us. Be sure to check out Dante for more information and to try this out for yourself.

For an example of our usage of Dante in a ruby gem, check out Gitdocs, our open-source Dropbox clone built with Ruby and Git. In particular, check out the cli.rb class for advanced usage.

Collaborate and track tasks with ease using GitDocs

At Miso, we have tried a lot of task tracking tools of all shapes and sizes. However, since we are a small team, we all end up using those in conjunction with just a plain text file where we ‘really’ track our tasks. Nothing beats a simple text file for detailed personal and project task tracking. I have been using text-file based task management for as long as I can remember, and most of the team shares this sentiment. A separate issue that commonly comes up is needing a place to dump code snippets, setup guides, team references, etc. We have used a wiki for this, and are currently using the Github wiki for that very purpose. A third common need is a place to dump software, images, mocks, etc., and we currently use Dropbox for this along with a shared network drive.

Recently, a discussion arose around the possibility of just using a simple git repository of our own to serve all these disparate purposes. Namely, we needed a place to do personal and project task tracking, store useful shared code snippets, and dump random files, notes, etc. that the development team needs access to. Fundamentally, what prevents a simple git repository shared between all developers from serving this purpose? Well, there are a few obstacles to a purely git-based approach.

The first is that with shared files and docs, you don’t want to have to worry about constantly pushing and pulling. Changes are happening all the time, text is being edited, files are being added, and having to manually commit every line change to a text file doesn’t really make sense for this use case. We really wanted something like Dropbox: when a file is changed, the updates are pushed and pulled automatically. Second, we needed a way to view files in the browser. Imagine writing a great markdown guide and never being able to see the guide rendered in proper markdown. We needed this shared repository to be browsable in the browser.

So after discussing this, we decided to build our ‘ideal’ solution, and the result is gitdocs, a way to collaborate on documents and files effortlessly using just git and the file system. Think of gitdocs as a combination of Dropbox and Google Docs, but free and open-source.

Installation

Install the gem:

gem install gitdocs

If you have Growl installed, you’ll probably want to run brew install growlnotify to enable Growl support.

Usage

Gitdocs is a rather simple library with only a few basic commands. You need to be able to add paths for gitdocs to monitor, start/stop gitdocs and be able to check the status. First, you need to add the doc folders to watch:

gitdocs add my/path/to/watch

You can remove and clear paths as well:

gitdocs rm my/path/to/watch
gitdocs clear

Then, you need to startup gitdocs:

gitdocs start

You can also stop and restart gitdocs as needed. Run

gitdocs status

for a helpful listing of the current state. Once gitdocs is started, simply start editing or adding files to your designated git repository. Changes will be automatically pushed and pulled to your local repo.

You can also have gitdocs fetch a remote repository with:

gitdocs create my/path/for/doc git@github.com:user/some_docs.git

This will clone the repo and add the path to your watched docs. Be sure to restart gitdocs to have path changes update:

gitdocs restart

To view the docs in your browser with file formatting:

gitdocs serve

and then visit http://localhost:8888 for access to all your docs in the browser.

Conclusion

Our development team has started using Gitdocs extensively for all types of files and documents. Primarily, task tracking, file sharing, note taking, code snippets, etc. All of this in git repos, automatically managed by gitdocs and viewable in your favorite text editor or browser. Be sure to check out the gitdocs repository for more information.

What are subnets?

At Miso, we recently launched an integration with AT&T that allows the Miso iPhone app to connect with your TV. Have you ever thought that browsing hundreds of cable channels on your TV was a painful process? Our vision with Miso is that you can turn on your favorite show, see what your friends are watching, or browse additional information about a show, all from your phone.

To make this happen, Miso connects with the AT&T receiver over wifi, and we’re learning from our early adopters that home networks are more complicated than they used to be. Wifi routers are definitely the norm, and it’s not uncommon to have multiple wifi routers cover the living room, the master bedroom, and the guest bedroom upstairs. If you’re plugging in a wifi router that you bought from Best Buy, you’re probably creating a new subnet, which is something you almost never need to care about.

As it turns out, Miso’s integration with AT&T and DIRECTV both assume that a home only has one subnet, which is not always true. This has caused some support issues and prompted some people within Miso to ask, what is a subnet? Good question! As a programmer, I have some basic knowledge of network engineering, but answering this question prompted me to dig a bit deeper.

What is a subnet?

Subnets and IP addresses go hand in hand. Together, they are the building blocks for how computers find each other on a network. For instance, every time you visit a website, you first have to find the computer on the internet that hosts that website.

Addressing on the internet works a lot like mailing addresses for your apartment. The zip code for my apartment is 94105; if you were to Google that, you could see that I live within a certain region of San Francisco. Subnets are like zipcodes – they divide up the internet world into small regions.

The subnet is actually part of the IP address. You’ve probably seen an IP address for a computer. My IP address at this coffee shop is 192.168.5.199. The first three numbers (192.168.5) form the subnet (or network address) and the last number (199) is my laptop’s address (host address). The guy sitting at the next table probably has an address like 192.168.5.198. Because we’re on the same subnet, we could do things like share playlists with iTunes or exchange files on a shared folder. The three numbers that form the subnet are somewhat random, in the same way that zipcodes are somewhat random.

For nerds only: the tricky part is that which portion of the IP address identifies the subnet actually varies, and it determines how large the subnet is. The mechanisms for dividing up address space have evolved over the years, from classful addressing to classless addressing. We’re at the point where we’re out of IPv4 addresses, thus the creation of IPv6, which has many more available addresses.
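
If you want to poke at this yourself, Ruby’s standard ipaddr library makes the “same subnet” check easy to see; the sketch below reuses the made-up coffee shop addresses:

require 'ipaddr'

# A /24 network: the first three numbers identify the subnet
subnet = IPAddr.new("192.168.5.0/24")

subnet.include?(IPAddr.new("192.168.5.199")) # => true  (my laptop)
subnet.include?(IPAddr.new("192.168.5.198")) # => true  (the next table over)
subnet.include?(IPAddr.new("10.0.1.12"))     # => false (a different subnet)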

So, great, subnets are like zipcodes. They group computers together in some logical way. Laptops in the same cafe are probably on the same subnet and can do stuff like share iTunes playlists, and laptops that are in different cafes probably can’t. Yet, there must be some way to communicate between subnets. The websites that I visit every day are hosted on computers that are all on different subnets.

How does information travel between subnets?

Sending information across subnets is a lot like sending mail across the country. After writing the address on the envelope, I deliver the letter to my local post office. My local post office looks at the zipcode and begins the process of routing the letter through hubs to bring the letter in the general vicinity of the destination address. As the letter approaches the destination address, it will be given to another local post office, which ultimately puts it on a mail truck that goes to the actual address.

In the same way, let’s say that I want to visit google.com. From my laptop, the address for google.com is 74.125.224.114. My laptop has no idea where that address is, so it does the equivalent of delivering my request to the local post office, which is the wifi router of this coffee shop (you may have also heard this referred to as a gateway). The coffee shop’s wifi router also doesn’t know where that address is, so it forwards the request to the internet provider (ISP) of the coffee shop. The internet provider has a better sense for zip codes, so it starts the process of sending the packet in the right direction.

For nerds only: This process is called IP routing and there are several algorithms for doing it and they vary with IPv4 and IPv6. The algorithms differ in how chatty they are and what size network they work well with, among other things. Many algorithms are based on Dijkstra’s algorithm.

So how come I can’t share my iTunes playlist with any computer?

I can access google.com from anywhere, but I can’t access that laptop from the other coffee shop. The reason for that is that there are many more computers (and IP addresses) than there are mailing addresses, so they can’t all be public. If you work in a large business, then your department probably has an internal mail stop that’s handled by your company’s mailroom.

In the same way, internet addresses are divided up into public and private addresses. Public addresses can be reached from anywhere, but private addresses can only be accessed in local networks. Public addresses are handed out by the internet equivalent of the post office called the Internet Assigned Numbers Authority (IANA). If I want a public mailing address, I have to contact the post office. If I want a public IP address, then I have to contact my internet provider. My internet provider is a member of a Regional Internet Registry (RIR), which is a member of the IANA. It’s fairly common that one home will have one public IP address, although you can pay your ISP for more.

So, why don’t AT&T and DIRECTV work across multiple subnets?

Hopefully, you now have a high-level sense for internet addressing, subnets, and routing, although we still haven’t explained why our AT&T and DIRECTV integrations don’t work across multiple subnets. We need to understand one more concept – multicast.

Most network communication is two computers talking to each other directly. In some cases, you want to communicate with multiple computers at the same time. Broadcast and multicast are the two ways of doing that. In order for Miso to find the cable receiver, it makes an announcement over multicast asking if there are any cable receivers in the network. That’s the same way your computer discovers printers on your local network.

As you can imagine, multicast announcements can be pretty noisy. Imagine if my laptop made a multicast announcement and every computer in the world had to pay attention; things would get pretty chaotic. Therefore, multicast messages only go out to the current subnet. If my cable receiver is on a different subnet, it won’t receive my multicast announcement.

There are a number of ways of overcoming this problem, including adjusting multicast TTL levels, router port forwarding, or configuring routers as bridges. TTL (time-to-live) tells the multicast message how many hops to take before it should give up. By default, this value is 1, which means the multicast messages are delivered only to computers on the local subnet. Router port forwarding and configuring routers as bridges are other mechanisms for allowing multicast traffic to hop across to an adjacent subnet; however, for our AT&T integration, they are less ideal because they require users to understand how to administer their home networks.
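
For the curious, here is a tiny Ruby sketch of a multicast announcement with a TTL of 1. The group address, port, and message are made up and are not Miso’s actual discovery protocol:

require 'socket'

MULTICAST_GROUP = "239.255.255.250" # an arbitrary multicast group address
PORT = 1900

socket = UDPSocket.new
# A TTL of 1 keeps the announcement from hopping past the local subnet
socket.setsockopt(Socket::IPPROTO_IP, Socket::IP_MULTICAST_TTL, 1)
socket.send("Any cable receivers out there?", 0, MULTICAST_GROUP, PORT)
socket.close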

Hopefully, you found this high level guide useful. Feedback welcome!

Forget Chef or Puppet – Automate with Sprinkle

Miso has a relatively standard server architecture for a medium-level traffic Rails application. We use Linode to host our application and we provision a number of VPS instances that make up our infrastructure. We have a load balancer equipped with Nginx and Varnish, we have an application server that runs our Rails application to serve dynamic requests, we have a master-slave database setup, and a cache server that has memcached and redis.

Early on, we set up these servers by hand: installing packages, compiling libraries, tweaking configurations, and installing gems. We then documented these steps on a company wiki, which allowed us to mechanically follow the steps and set up a new instance of any of our servers.

Automating our Setup

We decided recently that we wanted a more robust way of provisioning and configuring our different types of servers. Mechanically following steps from a wiki is inefficient, error-prone and likely to become inaccurate. What we needed was a system that would allow us to provision a new application or database server with just a single command. Here were additional requirements for our desired setup:

  • Simple to setup and configure
  • Lightweight system using familiar syntax
  • Preferably utilizing ssh and commands in the vein of Capistrano
  • Modular components that can be assembled to setup each server
  • Locally executable such that I can provision a server from my local machine
  • Doesn’t require the target machine to have anything installed prior

The most popular open-source provisioning tools seem to be Puppet and Chef. Both are great tools with a lot of features and flexibility, and they are well-suited when you have a large infrastructure with tens or hundreds of servers for a project. However, Chef and Puppet recipes are not trivial to create or manage, and by default they are intended to be managed from a remote server. We felt there was unnecessary complexity and overhead for our simple provisioning purposes. Thanks for the corrections in the comments regarding the remote server requirement for Puppet/Chef.

For smaller infrastructures like ours, we felt a better tool would be easier to manage and set up: ideally, something akin to deploying our code with Capistrano, a tool that can be managed and run locally and maintained easily. After some exploration, we stumbled upon a ruby tool called Sprinkle, probably best known through an example automated setup script called Passenger Stack.

There are several aspects of Sprinkle that made it our tool of choice. For one, it is locally managed and the setup is done leveraging the rock-solid Capistrano deployment system. Also, even though Sprinkle is written in Ruby, it does not require Ruby or anything else to be installed on the target servers, since the automated setup is executed on your development machine and communicates with the remote server using only SSH. The best part is that there are only a few concepts required to understand and use this system.

Understanding Sprinkle

The rest of this article is intended to be a long and comprehensive overview of Sprinkle. While I have read many blog posts about Sprinkle written across several different years, I hadn’t seen a post that covered each aspect of Sprinkle in full detail. The goal is that by the end of this post, you should be able to understand and write Sprinkle scripts as well as execute them. Let’s start off by exploring the four major concepts that make up Sprinkle and that will be used to build out your server recipes: Packages, Policies, Installers, and Verifiers.

Packages

A package defines one or more things to provision onto the server. There is a lot of flexibility in the way a package is defined, but fundamentally a package represents a “component” that can be installed. Packages are sets of installers, options, and verifications, grouped under a meaningful name. The basic structure of a package looks like this:

# packages/some_name.rb
package :some_name do
  description 'Some text'
  version '1.2.3'
  requires :another_package

  # ...installers...

  verify { ...verifiers... }
end

Note that defining a package does not install the package by default. A package is only installed when explicitly mentioned in a policy. You can also specify recommendations and/or optional package dependencies in a package as well:

# packages/foo_name.rb
package :foo_name do
  requires :another_package
  recommends :some_package
  optional :other_package

  # ...installers...
  verify { ...verifiers... }
end

You can also create virtual aliased packages that are various alternatives for the same component type. For instance, if I wanted to give people a choice over the database to use when provisioning:

# packages/database.rb
package :sqlite3, :provides => :database do
  # ...installers and verifiers...
end

package :postgresql, :provides => :database do
  # ...installers and verifiers...
end

package :mysql, :provides => :database do
  # ...installers and verifiers...
end

You can now declare that you want to install a :database, and the script will ask you which provider you want to install, as shown in the sketch below. For more information on packages, the best place to look is the source file itself.
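
A policy (covered next) could then require the virtual :database package rather than a specific engine; this is only a sketch, and the role address is made up:

# install.rb
role :db, "208.28.38.44"

policy :database_server, :roles => :db do
  # Sprinkle prompts on the command line, asking which of the
  # packages that provide :database should be installed
  requires :database
end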

Policies

A policy defines a set of particular “packages” that are required for a certain server (app, database, etc). All policies defined will be run and all packages required by the policy will be installed. So whereas defining a “package” merely defines it, defining a “policy” actually causes those packages to be installed. A policy is very simple to define:

# Define target machine used for the "app" role
role :app, "208.28.38.44"
# Installs the packages specified with a 'requires' when the script executes
policy :myapp, :roles => :app do
  requires :some_package
  requires :nginx
  requires :postgresql
  requires :rails
end

A role merely defines which server the commands are run on. This way, a single Sprinkle script can provision an entire group of servers by specifying different roles, similar to using Capistrano directly. You may specify as many policies as you’d like. If the packages you’re requiring are properly defined with verification blocks, then no software will be installed twice, so you may require a webserver in multiple policies within the same role without having to wait for that package to install repeatedly.

For more information on policies, the best place to look is the source file itself.

Installers

Installers are mechanisms for getting software onto the target machine. There are different scripts that allow you to download/compile software from a given source, use packaging systems (such as apt-get), copy files, install a gem, or even just inject content into existing files. Common examples of installers are detailed below.

To run an arbitrary command on the server:

package :foo do
  # Can be any line executed in the shell
  runner "run_some_command --now"
end

To install using the aptitude package manager in Ubuntu:

package :foo do
  apt 'foo-package'
  # Supports only installing dependencies
  apt('bar-package') { dependencies_only true }
end

That would install the ‘foo-package’ when you run the ‘foo’ package. To install a ruby gem:

package :foo do
  gem 'foo-gem'
  # Supports specifying version, source, repository, build_docs and build_flags
  gem 'bar-gem' do
    version "1.2.3"
    source 'http://gems.github.com'
    build_docs false
  end
end

To upload arbitrary text into a file on the target:

package :foo do
  push_text 'some random text', '/etc/foo/bar.conf'
  # Supports sudo access for a file
  push_text 'some random text', '/etc/foo/bar.conf', :sudo => true
end

You can also replace text in a target file:

# packages/foo.rb
package :foo do
  replace_text 'original foo', 'replacement bar', '/etc/foo/bar.conf'
  # Supports sudo access for a file
  replace_text 'original foo', 'replacement bar', '/etc/foo/bar.conf', :sudo => true
end

To transfer a file to a remote target file:

package :foo do
  # Transfers are recursive by default so whole directories can be moved
  transfer 'file/some_folder', '/etc/some_folder'
  # Supports sudo access for a file
  transfer 'file/foo.file', '/etc/foo.file', :sudo => true
  # Also a file can have "render" passed which runs the template through erb 
  # You can access variables to output the file dynamically, or pass explicit locals
  foo_port = 8080
  transfer 'file/foo.file', '/etc/foo.file', :render => true, 
            :locals => { :bar_port => 80 }
end

To run a rake task as part of a package on target:

package :foo, :rakefile => "/path/to/Rakefile" do
  rake 'foo-task'
end

To install a library from a given source path:

package :foo do
  source 'http://foo.com/latest-1.2.3.tar.gz'
  # Supports prefix, builds, archives, enable, with, and more
  source 'http://magicbeansland.com/latest-1.1.1.tar.gz' do
    prefix    '/usr/local'
    archives  '/tmp'
    builds    '/tmp/builds'
    with      'pgsql'
  end
end

Installers also support installation hooks at various points during the install process which vary depending on the installer used. An example of how to use hooks is as follows:

package :foo do
  apt 'foo-package' do
    pre :install, 'echo "Beginning install..."'
    post :install, 'echo "Completing install!"'
  end
end

Multiple hooks can be specified for any given step, and each installer has multiple steps for which hooks can be configured. For more information on installers, the best place to look is the source folder itself.

Verifiers

Verifiers are convenient helpers which you can use to check if something was installed correctly. You can use this helper within a verify block for a package. Sprinkle runs the verify block to find out whether or not something is already installed on the target machine. This way things never get done twice and if a package is already installed, then the task will be skipped.

Adding a verify block for every package is extremely important; be diligent about having an appropriate verifier for every installer used in a package. This makes the automated scripts much more robust and reusable on any number of servers. It also ensures that an installer works as expected and tests the server after installation.

There are many different types of verifications, and each one pairs naturally with particular installers. For instance, if I wanted to see if an aptitude package was installed correctly:

package :foo do
  apt 'foo-package'
  apt 'bar-package'
  verify do
    has_apt 'foo-package'
    has_apt 'bar-package'
  end
end

This will only install the package on a target if not already installed and verifies the installation after the package runs. If we wanted to check that a gem exists:

package :foo do
  gem 'foo-gem', :version => "1.2.3"
  gem 'bar-gem'
  verify do
    has_gem 'foo-gem', '1.2.3'
    # or verify that ruby can require a gem
    ruby_can_load 'bar-gem'
  end
end

If you want to check if a directory, file or executable exists:

package :foo do
  mkdir '/var/some/dir'
  touch '/var/some/file'
  runner 'touch /usr/bin/abinary' do
    post :install, "chmod +x /usr/bin/abinary"
  end

  verify do
    has_directory '/var/some/dir'
    has_file      '/var/some/file'
    has_executable 'abinary'
  end
end

You can also check if a process is running:

package :foo do
  apt 'memcached'

  verify do
    has_process 'memcached'
  end
end

For more information on verifiers, the best place to look is the source folder itself.

Putting Everything Together

Once you understand the aforementioned concepts, building automated recipes for provisioning becomes quite straightforward. Simply define packages (with installers and verifiers) and then group them into ‘policies’ that run on target machines. Generally, you can have a deploy.rb file and an install.rb file that are defined as follows:

# deploy.rb
# SSH in as 'root'. Probably not the best idea.
set :user, 'root'
set :password, 'secret'

# Just run the commands since we are 'root'.
set :run_method, :run

# Be sure to fill in your server host name or IP.
role :app, '83.434.34.234'

default_run_options[:pty] = true

The install file tends to define the various policies for this sprinkle script:

# install.rb
require 'packages/essential'
require 'packages/git'
require 'packages/nginx'
require 'packages/rails'
require 'packages/mongodb'

policy :myapp, :roles => :app do
  requires :essential
  requires :git
  requires :nginx
  requires :rails
  requires :mongodb
end

deployment do
  delivery :capistrano

  source do
    prefix   '/usr/local'
    archives '/usr/local/sources'
    builds   '/usr/local/build'
  end
end

Then you should store your packages in a subfolder aptly named ‘packages’:

# packages/git.rb
package :git, :provides => :scm do
  description 'Git version control client'
  apt 'git-core'

  verify do
    has_executable 'git'
  end
end

You can store your assets (configuration files, etc.) in an “assets” folder and access them from your packages to upload. Once all the packages and policies have been defined appropriately, you can execute the sprinkle script on the command line with:

sprinkle -c -s install.rb

And sprinkle is off to the races, setting up all the policies on the target machines.

Further Reading

There are several other good posts about Sprinkle around the web, as well as plenty of good examples of sprinkle recipes worth studying.

Let me know what you think of all this in the comments and if you have any related comments or questions. What do you use to automate and manage your servers?

Vendor – Bringing Bundler to iOS

Using and sharing iOS libraries is tragically difficult given the maturity of the framework, the number of active developers, and the size of the open source community.  Compared with Rails, which has RubyGems and Bundler, it’s no surprise that Rails/Ruby has a resource like The Ruby Toolbox, while iOS development has…nothing?

Are iOS developers fundamentally less collaborative than Rails developers?  When developing on Rails, if I ever find myself developing anything remotely reusable, I can almost always be certain that there is a gem for it (probably with an obnoxiously clever name).

I don’t think the spirit of collaboration is lacking in the iOS developer community; rather, there are a few fundamental challenges with iOS library development:

  • Libraries are hard to integrate into a project.  Granted, it’s not *that* hard to follow a brief set of instructions, but why can’t this process be more streamlined?
  • No versioning standard.  Once a library is integrated into a project, there is no standard way of capturing which version of the library was used.
  • No dependency specification standard (this is a big problem).  Why does facebook-ios-sdk embed its own JSON library when there are better ones available?  So many libraries come embedded with common libraries that we all use – and worse than that, who knows what version they’re using!  Not only can this lead to duplicate symbols, but library developers essentially have to start from scratch instead of leveraging other existing libraries.

Of course, coming up with a naming, versioning, and dependency standard for iOS libraries and convincing everyone to adopt it is a daunting task.  One possible approach is to follow the example of Homebrew, a popular package manager for OS X.  Homebrew turned installing and updating packages on OS X into a simple process.  Instead of convincing everyone to comply with some standard, Homebrew maintains a set of formulas that describe commonly used packages.  These formulas allow Homebrew to automate the installation process as well as enforce dependencies.  This works well for Homebrew, although it puts the burden of maintaining package specifications in one place, rather than distributed as it is with Ruby gems.

There seems to be a need for some type of solution here.  When we write Rails libraries, we write the readme first to help us understand what problem we’re trying to solve (Readme Driven Development).  Below is the readme for an iOS packaging system called Vendor.

Vendor – an iOS library management system

Vendor makes the process of using and managing libraries in iOS easy.  Vendor leverages the XCode Workspaces feature introduced with XCode 4 and is modeled after Bundler. Vendor streamlines the installation and update process for dependent libraries.  It also tracks versions and manages dependencies between libraries.

Step 1) Specify dependencies

Specify your dependencies in a Vendors file in your project’s root.

source "https://github.com/bazaarlabs/vendor"
lib "facebook-ios-sdk"  # Formula specified at source above
lib "three20"
lib "asi-http-request", :git => "https://github.com/pokeb/asi-http-request.git"
lib "JSONKit", :git => "https://github.com/johnezang/JSONKit.git"

Step 2) Install dependencies

vendor install
git add Vendors.lock

Installing a vendor library gets the latest version of the code, and adds the XCode project to the workspace.  As part of the installation process, the library is set up as a dependency of the main project, header search paths are modified, and required frameworks are added.  The installed version of the library is captured in the Vendors.lock file.

After a fresh check out of a project from source control, the XCode workspace may contain links to projects that don’t exist in the file system because vendor projects are not checked into source control. Run `vendor install` to restore the vendor projects.

Other commands

# Updating all dependencies will update all libraries to their latest versions.
vendor update
# Specifying the dependency will cause only the single library to be updated.
vendor update facebook-ios-sdk

Adding a library formula

If a library has no framework dependencies, has no required additional compiler/linker flags, and has an XCode project, it doesn’t require a Vendor formula. An example is JSONKit, which may be specified as below. However, if another Vendor library requires JSONKit, JSONKit must have a Vendor formula.

lib "JSONKit", :git => "https://github.com/johnezang/JSONKit.git"

However, if the library requires frameworks or has dependencies on other Vendor libraries, it must have a Vendor formula.  As with Brew, a Vendor formula is some declarative Ruby code that is open source and centrally managed.

An example Vendor formula might look like:

require 'formula'

class Three20 < Formula
  url "https://github.com/facebook/three20"
  libraries "libThree20.a"
  frameworks "CoreAnimation"
  header_path "three20/Build/Products/three20"
  linker_flags "ObjC", "all_load"
  vendors "JSONKit"
end

Conclusion

Using iOS libraries is way harder than it should be, which has negatively impacted the growth of the open source community for iOS.  Even if I were only developing libraries for myself, I would still want some kind of packaging system to help me manage the code.  Vendor essentially streamlines the flow described by Jonas Williams here.  Unfortunately, programmatically managing XCode projects isn’t supported natively, but people have implemented various solutions, such as Three20, Victor Costan, and XCS.

Open Questions

  • Is there an existing solution for this?
  • Would this be a useful gem for your iOS development?
  • Why hasn’t anyone built something like this already? Impossible to build?

Adventures in Scaling, Part 2: PostgreSQL

Several months ago, I wrote a post about REE Garbage Collection Tuning with the intent of kicking off a series dedicated to different approaches and methods applied at Miso in order to scale our service. This time around I wanted to focus on how to setup PostgreSQL on a dedicated server instance. In addition, I will cover how to tweak the configuration settings as a first-pass towards optimizing database performance and explain which parameters are the most important.

Why PostgreSQL?

Before I begin covering these topics, I want to briefly touch on why our application (and this tutorial) is centered on PostgreSQL and not one of the many other RDBMS or NoSQL alternatives available for persistence in current web development. From the multitude of “general purpose” database persistence options available, the most common choices for a startup in my experience tend to be MySQL, Drizzle, PostgreSQL and MongoDB.

Each of these options has pros and cons, and there is no one-size-fits-all solution, as is usually the case in technology. In the past, I have traditionally used MySQL to do the bulk of database persistence for Rails apps. This choice was largely because of familiarity, as well as the clear MySQL favoritism in the early years of the Rails community. Though the full explanation of why is outside the scope of this post, suffice it to say I won’t be choosing MySQL again when starting a new project. My claim, unsubstantiated in this post, is that there is nothing significant MySQL provides over PostgreSQL, and yet there are many pitfalls and downsides.

If you are interested in the subject, I recommend you read a few posts and draw your own conclusions. To be fair, Drizzle looks like an interesting alternative to MySQL and/or Postgres. Having never used that database, I would be curious to hear how it compares to PostgreSQL. We are big fans of MongoDB at Miso and we store several types of data for our services within collections. However, for historical and practical reasons, we did not want to dedicate the time to convert our primary dataset as the benefit at our current level was not significant enough to warrant the time involved. In a future post, I would love to delve deeper into our Polyglot Persistence strategy and why we opted to use particular technologies over alternatives.

Setting up PostgreSQL

With that explanation out of the way, let’s turn our attention to setting up PostgreSQL on a dedicated database server. In this tutorial, we will be installing Postgres 9.X on an Ubuntu machine. You may need to tweak this for your specific needs and platform depending on your specific setup.

One of the first questions you run across when setting up a dedicated PostgreSQL server is “How much RAM should the instance have?”. I would take a look at the total size of your database (or the expected size of your database in the near future). If at all possible, the ideal RAM for your instance would allow for the entire database to be placed in memory. At small scales, this should be possible and will mean major performance increases for obvious reasons. The size I recommend for a starting server is typically between 2-8GB of RAM. Furthermore, I would recommend against using a dedicated database with less than 1GB of RAM if you can help it. If you have a large dataset and need replication or sharding from the start, then I would recommend putting down this guide and buying the PostgreSQL High Performance book right now. For the purposes of the rest of this guide, I am going to assume an 8GB instance was selected.

Now that we picked the size of our instance, let’s actually start the install. Unfortunately for us, Ubuntu 10.04 apt repositories only have 8.4. The postgresql-9.0 package won’t be added until Ubuntu Natty. For this reason, unless you are using that version, you must use alternate repositories for this installation. There is an excellent utility called “python-software-properties” to make installing repositories easier. If you haven’t yet, install that first:

sudo apt-get install python-software-properties

Next, let’s add the repository containing PostgreSQL 9.0:

sudo add-apt-repository ppa:pitti/postgresql
sudo apt-get update

Now, we need to install the database, the contrib tools, and several supporting libraries:

sudo apt-get install postgresql-9.0 postgresql-contrib-9.0
sudo apt-get install postgresql-server-dev-9.0 libpq-dev libpq5

Installing all of these packages now will save you the pain later of trying to track down why certain things won’t work. Another good step is to symlink the archival cleanup tool to /usr/bin for use when you need to enable replication with WAL:

sudo ln -s /usr/lib/postgresql/9.0/bin/pg_archivecleanup /usr/bin/

At this time, you should also recreate the cluster to ensure proper UTF-8 encoding for all databases:

su postgres
pg_dropcluster --stop 9.0 main
pg_createcluster --start -e UTF-8 9.0 main
sudo /etc/init.d/postgresql restart

You can now check the status of the database using:

sudo /etc/init.d/postgresql status

Once this has all been done, you can find the configuration directory in /etc/postgresql/9.0/main and the data directory in /var/lib/postgresql/9.0/main.

Another useful tool is pg_top, which allows you to view the status of your PostgreSQL processes along with the currently executing queries:

sudo wget http://pgfoundry.org/frs/download.php/1781/pg_top-3.6.2.tar.gz
tar -zxpvf pg_top-3.6.2.tar.gz
cd pg_top-3.6.2
./configure
make

PostgreSQL and all related utilities should now be properly installed. From here, you can begin creating databases and users with the psql command:

psql -d postgres
> CREATE USER deploy WITH PASSWORD 'jw8s0F4';
> CREATE ROLE admin SUPERUSER;
> CREATE DATABASE name;

Of course, there are a lot of commands you can run with this console. You are encouraged to read other sources for more information.
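
A few psql meta-commands also come in handy while exploring:

\l          -- list all databases
\du         -- list roles and their attributes
\c dbname   -- connect to another database
\dt         -- list tables in the current database
\q          -- quit the console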

PostgreSQL Tuning

Now that we have successfully installed PostgreSQL, the next thing to do is to tune the configuration parameters for solid performance. Of course, the best way to do this is to measure and profile your application and set these values based on your own needs; all of these settings will ultimately vary with your particular application. Nonetheless, here is a guide intended to get you started with settings that work “well enough”. From here, these parameters can be tweaked to your heart’s content to find the sweet spot for your individual use case.

Fortunately for us, PostgreSQL guru Gregory Smith has created a tool to make our lives a great deal simpler. This tool is called pgtune, and I encourage you to run it on your server as soon as possible after setup. The settings it recommends should be your baseline configuration values unless you know otherwise. First, let’s download the pgtune tool:

cd ~
wget http://pgfoundry.org/frs/download.php/2449/pgtune-0.9.3.tar.gz
tar -zxvf pgtune-0.9.3.tar.gz

Once the utility has been extracted, simply execute the binary with the proper options and recommended configuration values will be output:

cd pgtune-0.9.3
./pgtune -i /etc/postgresql/9.0/main/postgresql.conf -o ~/postgresql.conf.pgtune --type Web

This will generate the recommended values tailored to your server in the file ~/postgresql.conf.pgtune. Simply view this file and note the settings at the bottom:

cat ~/postgresql.conf.pgtune
# Look at the bottom for the relevant parameter values

Take the settings and append them to your actual configuration file located at /etc/postgresql/9.0/main/postgresql.conf. To be on the safe side, you should also update the kernel shmmax property, which is the maximum size of a shared memory segment in bytes. This is particularly necessary for large values of shared_buffers and other memory-related parameters:

  sysctl -w kernel.shmmax=26843545600
  sudo nano /etc/sysctl.conf
    # Append the following line:
    kernel.shmmax=26843545600
  sudo sysctl -p /etc/sysctl.conf

Once this has been updated and you have saved the modified settings, restart your cluster for the settings to take effect:

pg_ctlcluster 9.0 main restart

Now that we have applied these tuned parameters, let’s dig deeper into the most important parameters you can tweak to improve your database performance. The list below is not a comprehensive guide, but in most cases these parameters will serve as the first values to experiment with after using pgtune; a sample configuration applying the 8GB examples follows the list.

  • work_mem – Specifies the amount of memory to be used by internal sort operations and hash tables before switching to temporary disk files. The total memory used could be many times the value of work_mem; it is necessary to keep this fact in mind when choosing the value. Sort operations are used for ORDER BY, DISTINCT, and merge joins. The recommended range for this value is the total available memory / (2 * max_connections). On an 8GB system, this could be set to 40MB.
  • effective_cache_size – Sets the planner’s assumption about the effective size of the disk cache that is available to a single query. This is factored into estimates of the cost of using an index; a higher value makes it more likely index scans will be used, a lower value makes it more likely sequential scans will be used. The recommended range for this value is between 60%-80% of your total available RAM. On an 8GB system, this could be set to 5120MB.
  • shared_buffers – Sets the amount of memory the database server uses for shared memory buffers. The default is typically 32 megabytes and must be at least 128 kilobytes. The recommended range for this value is between 20%-30% of the total available RAM. On an 8GB system, this could be set to 2048MB.
  • wal_buffers – The amount of memory used in shared memory for WAL data. The default is 64 kilobytes (64kB). The setting need only be large enough to hold the amount of WAL data generated by one typical transaction. The recommended range for this value is between 2-16MB. On an 8GB system, this could be set to 12MB.
  • maintenance_work_mem – Specifies the maximum amount of memory to be used in maintenance operations, such as VACUUM, CREATE INDEX, and ALTER TABLE ADD FOREIGN KEY. Larger settings may improve performance for vacuuming and for restoring database dumps. The recommended range for this value is 50MB per GB of the total available RAM. On an 8GB system, this could be set to 400MB.
  • max_connections – Determines the maximum number of concurrent connections to the database server. The default is typically 100 connections and this parameter can only be set at server start. The recommended range for this value is between 100-200.
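
As a concrete reference point, the 8GB examples from the list above translate into postgresql.conf entries like the following; treat them as a starting point to measure against rather than values to copy blindly:

# /etc/postgresql/9.0/main/postgresql.conf (example values for an 8GB server)
max_connections      = 100
shared_buffers       = 2048MB
effective_cache_size = 5120MB
work_mem             = 40MB
maintenance_work_mem = 400MB
wal_buffers          = 12MB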

Beyond this, for further reading I would recommend checking out other articles and, of course, Gregory Smith’s book PostgreSQL High Performance.

Wrapping Up

Hopefully you have found the guide above helpful. The intent was to provide an easy way for people to get started using PostgreSQL in their Rails applications today. If you are starting a new Rails application, I strongly encourage you to make your own informed decision on which persistence engine to use. Research the issue for yourself but I would urge you to at least consider giving PostgreSQL an honest try. Coming from using MySQL prior, I can assure you we did not miss a single thing about MySQL when we migrated.

This post is the first of many that will deal with PostgreSQL and provide details on how to get this excellent RDBMS to work for your application. In a future post, I would like to cover how to setup “hot standby” replication with archival logs (WAL). That guide will take you step by step through creating a master database and several read-only slaves as well as then utilizing those slaves from within your Rails application. We also ported from MySQL to PostgreSQL, so another future post will detail how to best migrate your database as well as pitfalls to avoid.

Building a Platform API on Rails

Introduction

In my last post, I discussed the failings of to_json in Rails and how we approached writing our public APIs using RABL and JSON Templates in the View. Since that post, we have continued expanding our Developer API and adding new applications to our Application Gallery.

The approach of using views to intelligently craft our JSON APIs has continued to prove useful again and again as we strive to create clean and simple APIs for developers to consume. We have used the as_json approach on other projects for years in the past and I can say that I have not missed that strategy even once since we moved to RABL. To be clear, if your APIs and data models are dead simple and only for internal or private use and a one-to-one mapping suits your needs, then RABL may be overkill. RABL really excels once you find yourself needing to craft flexible and more complex APIs rich with data and associated information.

The last post was all about explaining the reasons for abandoning to_json and the driving forces behind our creation of RABL. This post is all about how to actually use Rails 3 and RABL in a pseudo real-world way to create a public API for your application. In other words, how to empower an “application” to become a developer’s platform.

Authentication

The first thing to determine when creating a platform is the authentication mechanism that developers will use when interacting with your API. If your API is read-only and/or easily cached on a global basis, then perhaps no authentication is necessary. In most cases though, your API will require authentication of some kind. These days OAuth has become a standard and using anything else is probably a bad idea.

The good news is that we are well past the point where you must implement an OAuth Provider strategy from scratch in your application. Nevertheless, I would strongly recommend you become familiar with the OAuth Authentication Protocol before trying to create a platform. Starting with this in mind, your next task is to pick an OAuth gem for Ruby. Searching Github, you can see there are many libraries to choose from.

The best one to start with is the Rails OAuth Plugin, which provides a drop-in solution both for consuming and providing APIs in your Rails 3 applications. The documentation is fairly decent and, coupled with the introduction blog post, I think reiterating the steps to set this up would be unnecessary. Follow the steps and generate an OAuth Provider system into your Rails 3 application. This is what enables the “OAuth dance”, where developers generate a request token and sign with that token for a user in order to retrieve an access token. Once the developer gets an access token, they can use it to interact with your APIs. The easiest way for developers to consume your API is to use a “consumer” OAuth library such as Ruby OAuth or an equivalent in the client language.
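
As a rough illustration of what that dance looks like from the developer’s side, here is a sketch using the ruby oauth gem; the keys, site URL, and verifier are placeholders:

require 'oauth'

# Consumer key/secret are issued when the developer registers their application
consumer = OAuth::Consumer.new("CONSUMER_KEY", "CONSUMER_SECRET",
                               :site => "http://example.com")

# Step 1: get a request token and send the user to the authorization page
request_token = consumer.get_request_token
puts "Authorize this application at: #{request_token.authorize_url}"

# Step 2: after approval, exchange the request token for an access token
access_token = request_token.get_access_token(:oauth_verifier => "VERIFIER")

# Step 3: make signed requests against the platform APIs
response = access_token.get("/users/1.json")
puts response.body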

Building your API

Now that we have authentication taken care of in our application, the next step is to create APIs for developers to interact with on our platform. For this, we can re-use our existing controllers or create new ones. You likely already have HTML representations of your application for users to interact with. These views are declared in an “index.html.erb” file or similar and describe the display for a user in the browser. When building APIs, the simplest way to think about them is to treat the JSON API as “just another view representation” that lives alongside your HTML templates.

To demonstrate how easy building an API is, let’s consider a sample application. This application is a simple blogging engine that has users and posts. Users can create posts and then their friends can read their streams. For now, we just want this to be a blogging platform and we will leave the clients up to developers to build using our APIs. First, let’s use RABL by declaring the gem in our Gemfile:

# Gemfile
gem 'rabl'

Next, we need to install the new gem into our gemset:

$ bundle install

Next, let’s generate the User and Post tables that are going to be used in this example application:

$ rails g model User first_name:string last_name:string age:integer
$ rails g model Post title:string body:text user_id:integer
$ rake db:migrate db:test:prepare

Nothing too crazy, just some models for users and posts with the minimum attributes needed. Let’s also set up the model associations:

# app/models/user.rb
class User < ActiveRecord::Base
  has_many :posts
end

# app/models/post.rb
class Post < ActiveRecord::Base
  belongs_to :user
end

Great! So we now have a user model and a post model. A user can create many posts, and they can be retrieved easily through the Rails has_many association. Now, let’s expose the information for a particular user in our API. First, let’s generate the controller for Users:

$ rails g controller Users show

and of course set up the standard routes for the controller in routes.rb:

# config/routes.rb
SampleApiDemo::Application.routes.draw do
  resources :users
end

Next, let’s protect our controller so that the data can only be accessed when a user has authenticated, and set up our simple “show” method for accessing a user’s data:

# app/controllers/users_controller.rb
class UsersController < ApplicationController
  before_filter :oauth_required

  respond_to :json, :xml

  def show
    @user = User.find_by_id(params[:id])
  end
end

As you can see, the controller is very lean here. We set up a before_filter from the oauth-plugin to require OAuth authentication, and then we define a “show” action which retrieves the user record for the given id. We also declare that the action can respond to both JSON and XML formats. At this point, you might wonder a few things.

First, “Why include XML up there if we are building a JSON API?” The short answer is that RABL gives this to us for free; the same RABL declarations power both our XML and JSON APIs by default, with no extra effort! The next question might be, “How does the action know what to render?” This is where the power of the view template really shines. There is no need to declare any response handling logic in the action because Rails will automatically detect and render the associated view once the template is defined. But what does the template that powers our XML and JSON APIs look like? Let’s see below:

# app/views/users/show.rabl
object @user

# Declare the properties to include
attributes :first_name, :last_name

# Alias 'age' to 'years_old'
attributes :age => :years_old

# Include a custom node with the full_name for the user
node :full_name do |user|
  [user.first_name, user.last_name].join(" ")
end

# Include a custom node indicating whether the user can drink
node :can_drink do |user|
  user.age >= 21
end

The RABL template above is annotated to explain each type of declaration; RABL is an all-Ruby DSL for defining your APIs. Once the template is set up, testing our new API endpoint is fairly simple. Probably the easiest way is to temporarily comment out the “oauth_required” line or disable it in development (a sketch of that follows the sample output below). Then you can simply visit the endpoint in a browser or using curl:

// rails c
// User.create(...)
// rails s -p 3001
// http://localhost:3001/users/1.json
{
  user: {
    first_name: "Bob",
    last_name: "Hope",
    years_old: 92,
    full_name: "Bob Hope",
    can_drink: true
  }
}
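As an aside, instead of commenting the filter out by hand, a minimal sketch of disabling OAuth only in development might look like the following; the :unless conditional is an assumption about how you want to wire this up, so adapt it to your app:

# app/controllers/users_controller.rb (sketch)
class UsersController < ApplicationController
  # Require OAuth everywhere except in development, so the endpoint
  # can be exercised directly in a browser or with curl
  before_filter :oauth_required, :unless => proc { Rails.env.development? }

  respond_to :json, :xml

  def show
    @user = User.find_by_id(params[:id])
  end
end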

With that, we have a fully functional “show” action for a user. RABL also lets us easily reuse templates. Let’s say we want to create an “index” action that lists all our users:

# app/views/users/index.rabl
object @users

# Reuse the show template definition
extends "users/show"

# Let's add an "id" attribute for the index action
attributes :id

That would allow us to have an index action by re-using the show template and applying it to the entire collection:

// rails s -p 3001
// http://localhost:3001/users.json
[
  {
    user: {
      id: 1,
      first_name: "Bob",
      last_name: "Hope",
      years_old: 92,
      full_name: "Bob Hope",
      can_drink: true
    }
  },
  {
    user: {
      id: 2,
      first_name: "Alex",
      last_name: "Trebec",
      years_old: 102,
      full_name: "Alex Trebec",
      can_drink: true
    }
  }
]

Now that we have an “index” and “show” for users, let’s move onto the Post endpoints. First, let’s generate a PostsController:

$ rails g controller Posts index

and append the posts resource routes:

# config/routes.rb
SampleApiDemo::Application.routes.draw do
  resources :users, :only => [:show, :index]
  resources :posts, :only => [:index]
end

Now, we can fill in the index action by retrieving all posts into a posts collection:

# app/controllers/posts_controller.rb
class PostsController < ApplicationController
  before_filter :login_or_oauth_required

  respond_to :html, :json, :xml

  def index
    @posts = Post.all
  end
end

and again we define the RABL template that serves as the definition for the XML and JSON output:

# app/views/posts/index.rabl

# Declare the data source
collection @posts

# Declare attributes to display
attributes :title, :body

# Add custom node to declare if the post is recent
node :is_recent do |post|
  post.created_at > 1.week.ago
end

# Include user as a child node, reusing the User 'show' template
child :user do
  extends "users/show"
end

Here we introduce a few new concepts. In RABL, there is support for “custom” nodes as well as “child” association nodes, which can themselves be defined inline or reuse existing templates as shown above. Let’s check out the result:

[
  {
    post: {
      title: "Being really old",
      body: "Let me tell you a story",
      is_recent: false,
      user: {
        id: 52,
        first_name: "Bob",
        last_name: "Hope",
        years_old: 92,
        full_name: "Bob Hope",
        can_drink: true
      }
    }
  },
  { ...more posts... }
]

Here you can see we have a full posts index with nested user association information as well as a custom node. This is where the power of JSON templates lies: the incredible flexibility and simplicity this approach affords you. Renaming an object or attribute is as simple as:

# app/views/posts/index.rabl

# Every post is now contained in an "article" root
collection @posts => :articles

# The attribute "first_name" is now "firstName" instead
attribute :first_name => :firstName

Another benefit is the ability to use template inheritance, which affords seamless reusability. In addition, the partial system allows you to embed a RABL template within another to reduce duplication.
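As a quick illustration (this template is hypothetical and not part of the sample app above), a comment feed could reuse the user template inside a custom node via a partial:

# app/views/comments/index.rabl (hypothetical)
collection @comments

attributes :body, :created_at

# Embed the existing user template as a nested "author" node
node :author do |comment|
  partial("users/show", :object => comment.user)
end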

Wrapping Up

For a full rundown of the functionality and DSL of RABL, I encourage you to check out the README and the Wiki. I would also like to point out excellent resources by other RABL users: teohm’s post and Nick Rowe’s overview are both worth reading. Please post any other resources in the comments!

My intention with this post was to provide a solid introductory guide for easily creating an API and a data platform in Rails. If there are topics related to API design or building out a platform that you would like to read about in more depth, please let us know and we will consider them for a future post.

Low hanging fruit for Ruby performance optimization in Rails

Our goal: we currently spend about 150-180 ms in Ruby for an average web request, and we think it’s reasonable to bring that down to around 100 ms.

Two of the most expensive operations in Ruby are object allocation and garbage collection. Unfortunately, in Ruby (unlike Java), garbage collection is synchronous and can have a major impact on request performance. If you’ve ever noticed a partial template render occasionally taking a couple of seconds for no apparent reason, you’re probably watching a request trigger garbage collection.

The good news: it’s easy to quickly (< 10 min) see how much your app is impacted by garbage collection. You’re likely to improve your performance by 20-30% just by tuning your garbage collection parameters.

If you have a production Rails app, and you’re even remotely interested in performance, I’m going to assume you’re using the excellent New Relic service.  If you’re using REE or a version of Ruby with the Railsbench GC patches, it’s easy to turn on garbage collection stats that will be visualized by New Relic.

You’ll get pretty GC charts in New Relic by adding:

GC.enable_stats

somewhere in your initialization. After enabling garbage collection stats in our application, we can see that approximately 20% of our Ruby time is spent in garbage collection, which implies that a not-so-insignificant portion of time is also spent in object allocation.
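A minimal sketch of that initialization, guarded so the app still boots on an unpatched Ruby (the initializer filename is arbitrary):

# config/initializers/gc_stats.rb
# Enable GC statistics only when running on REE or a Railsbench-patched Ruby
GC.enable_stats if GC.respond_to?(:enable_stats)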

What’s next

We last tuned our Ruby garbage collection parameters about 5 months ago and, after an initial performance boost, we’ve seen the application time spent in Ruby creep back up.  To try to bring the response time back down, our next steps are to:

  • Consider taking another pass at garbage collection parameter tuning.  Since we’ve already taken one pass at this, I’m not sure the second round will have as big an impact, but we’ll see.
  • Identify the slowest controller actions via New Relic and profile them using ruby-prof or perftools.

Performance tuning with ruby-prof will vary a lot depending on the code, but if we find techniques that apply more broadly, we’ll be sure to blog about them here.
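For reference, a minimal ruby-prof sketch looks roughly like this; the profiled block below is a stand-in, and in a real app you would wrap the slow action, service call, or partial render instead:

require 'ruby-prof'

RubyProf.start
# ... exercise the suspect code path here; this loop is just a placeholder ...
1_000.times { "a" * 1_000 }
result = RubyProf.stop

# Print a flat, per-method breakdown of where the time went
printer = RubyProf::FlatPrinter.new(result)
printer.print(STDOUT, :min_percent => 1)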


If you’re using to_json, you’re doing it wrong

At Miso, we have been very busy over the last few months building out a large number of public APIs for our Developer Platform. In a short time, we have already seen early versions of applications built on our platform for Chrome, Windows Phone 7, BlackBerry, PlayBook, and XBMC, among others. It has been very exciting to see the community embrace our platform and leverage our data to power additional services or bring our service to a new group of users. In this post, we will discuss how we started out building our APIs using Rails and ‘to_json’, why we became frustrated with that approach, and how we ended up building our own library for API generation.

Our public APIs are designed to be unsurprising and intuitive for a developer. We chose OAuth 1.0a (soon to support OAuth 2) because it is already familiar to developers and there is rich library support for this authentication strategy across languages. The endpoints are for the most part RESTful, with GET used to retrieve data and POST / DELETE used to modify the data associated with a user. We also tried to design our API responses as simply as possible: giving every attribute a readable name, keeping node hierarchies relatively flat, and not including unnecessary information from our database schema. While these may seem like obvious design goals for a public API, you might be surprised at how difficult they are to achieve using Rails and the baked-in ‘to_json’ serialization the framework provides.

Rails ‘to_json’ API generation

Let’s start by discussing the canonical approach Rails provides for generating APIs. The idea is to have the JSON and XML responses be deeply tied to the model schemas, with one-to-one mappings in most cases between the database columns and the API output. This is great if you are working on internal APIs and wish to dump the data directly into a response, but less great for well-designed public APIs. Let’s look at how rendering with this to_json approach works in practice:

# app/controllers/posts_controller.rb
#...
  respond_to :json, :xml
  def index
    @posts = Post.all
    respond_with(@posts)
  end
#...

Now, when this controller action is invoked, ‘to_json’ is automatically called on the @posts collection, which converts the ActiveRecord models seamlessly to JSON output. The serialization can also be customized by overriding the as_json method in the model and passing options:

# app/models/post.rb
class Post
  def as_json(options={})
    super(options.merge(:methods => [...], :only => [...], :include => [...]))
  end
end

This would then render the associated JSON response based on the options specified. In the simplest cases this is all you need, and the Rails way works without a problem. As long as your database schema is deeply coupled to the API output, all is well. As mentioned, there are a few options for customizing the to_json output (a short example follows the list below):

  • only – Include only the specified column names in the output
  • except – Include all column names except the ones specified in this list
  • methods – Include the return values of these (argument-less) model methods as nodes in the output
  • include – Add child nodes (potentially nested) based on associations within the object
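A concrete, hypothetical combination of these options might look like the following (the attribute, method, and association names are invented for illustration):

# Render a post with a few columns, a computed method, and a nested association
@post.to_json(
  :only    => [:title, :body],
  :methods => [:comments_count],
  :include => { :user => { :only => [:first_name, :last_name] } }
)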

Alright, to recap: there is a model which contains a specific database schema and some methods, and there are limited options for transforming that into JSON by passing a hash of options as described above. These options can be set as defaults in the model’s ‘as_json’ method or passed in through the controller action.

The unravelling of ‘to_json’ begins

Well, wait just a minute. What if different API responses need different output options? What if you want to override the ‘as_json’ defaults? Not too bad; you can just do:

# app/controllers/posts_controller.rb
#...
  respond_to :json, :xml
  def index
    @posts = Post.all
    respond_with(@posts) do |format|
      format.json { render :json => @posts.to_json(:include => [...], :methods => [...]) }
    end
  end
#...

Using those settings you can change the JSON output on a per-action basis. That, at a high level, is the Rails JSON generation approach. It can work in very simple applications, but it shouldn’t be hard to imagine how it becomes restrictive as well as verbose. Suppose I want to render a JSON response with only a few columns, while also adding several methods and including nested options for multiple associations. That might look something like this:

# app/controllers/posts_controller.rb
#...
  respond_to :json, :xml
  def index
    @posts = Post.all
    respond_with(@posts) do |format|
      format.json { render :json => @posts.to_json(
         :only => [:title, :body, :created_at, :tags, :category],
         :include => {
            :likes    => { :only => [:created_at], :include => [:author] },
            :comments => { :only => [:created_at, :body], :include => [:author] },
            :user     => { :only => [:first_name, :last_name], :methods => [:full_name] }
         },
         :methods => [:likes_count, :comments_count])
      }
    end
  end
#...

This action code is already starting to smell. There is a lot of bulk in the controller, and it is quite redundant. It doesn’t really seem to belong in the controller at all; it is more of a view or template concern, since it describes the details of a particular JSON representation. You could move it all into a method on the model, but that actually makes things harder to follow. Already, this way of generating JSON doesn’t feel quite right and begins to break down.

You may think this example is contrived or simply poor API design, but consider that there’s actually not that much going on here. This type of response is commonplace in almost any public API you’ll see on the web. In fact, it is much simpler than many in the wild. Compare the above to the Instagram API.

More Frustrations with API Generation

The issues above were just the beginning of the problems we ran into with the ‘to_json’ method, because that approach is concerned with ‘serializing’ a database object, while we are interested in crafting a relevant representation for our public API platform. Serializing the object so directly just didn’t fit what we were trying to do.

The easiest way to demonstrate the limitations that frustrated us is to show relevant examples. Let’s start with a simple idea. In our system we have ‘posts’, and we have the idea of ‘liking a post’. In our API, we want to return whether the authenticated user ‘liked’ a particular post in the feed. Something like:

[ { post : { title : "...", liked_by_user : true } }, ...]

Notice the node ‘liked_by_user’, which indicates whether or not the user has liked the given post. Assume we have this method in the model:

class Post
  # Returns true if the given user has liked the post.
  # @post.liked_by_user?(@user) => true
  def liked_by_user?(user)
    self.likes.exists?(:user_id => user.id)
  end
end

We simply want to get this boolean value into the API response under the node name ‘liked_by_user’. How would we do this with ‘to_json’? How do we pass an argument to a method? After doing some research, it was apparent that this is not particularly easy or intuitive. It would be nice to have a simple way to pass one or more arguments to a model method without jumping through hoops.
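For contrast, in a view-level template like the RABL examples earlier in this document, this boils down to a one-line custom node; this sketch assumes a current_user helper is exposed to the views:

# app/views/posts/index.rabl (sketch)
collection @posts
attributes :title

# Pass the authenticated user straight through to the model method
node :liked_by_user do |post|
  post.liked_by_user?(current_user)
end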

Let’s move on to another example. Suppose we want the ‘user’ association to appear as an ‘author’ node in the output. Let’s say we have:

class Post
  belongs_to :user
end

and we want to have the output be:

[ { post : { title : "...", author : { first_name : "...", last_name : "..." }  }, ...]

What if I just need to make a minor change to an attribute’s value before inserting it into the JSON? What if I need a custom node in the JSON that isn’t needed on the model directly? What if I want to include a method’s value only when a condition on the record is met? What if I want to reduce duplication and render a JSON hash as a child of the parent response? What if I want to glue a couple of attributes from the user onto the post? Change the model and fill it with display logic every time? Fill our controllers with complicated JSON display options? Work around the problems by fighting with ‘to_json’ and/or monkeypatching it?

Perhaps a better approach

As we came up against these issues and many more while designing and implementing our public APIs, we butted heads with ‘to_json’ again and again. Often we wanted attributes defined in the schema to be renamed or modified for the representation; we wanted to omit attributes, or include them only when a condition was met; we wanted to handle polymorphic associations in a clean and easy way; and we wanted to keep a flat hierarchy by ‘gluing’ attributes from the child onto the parent.

Furthermore, the model and/or controller was filling up with tons of JSON-specific details that had nothing to do with the model or business logic. These JSON responses and verbose declarations didn’t seem to belong in the model or the controller at all and were cluttering up our code. True to MVC, these details of the response seemed much more appropriate in a view of some kind. This idea of storing the JSON in a view sparked an experiment: why not generate the JSON in a template and move all of the display details out of the model and the controller? What if the API could be crafted easily in the view, where a JSON representation belongs?

We agreed that implementing APIs in a view made the most sense, both conceptually and practically. The next question became which templating language to use to generate these APIs. Forming the XML or JSON manually in ERB seemed verbose and error-prone. Using Builder seemed silly since we wanted APIs that work primarily in JSON. Indeed, none of the default templating languages we would normally grab for seemed to fit. We didn’t want to painstakingly handcraft nodes by hand; we just wanted a simple way to declare how our APIs should look that afforded us the flexibility we needed.

We investigated a wealth of libraries that seemed to fit the bill, from tequila to json_builder to argonaut, and many more attempts to solve this problem. Clearly we weren’t the only ones who had experienced the pain of ‘to_json’. Perusing the READMEs of any of these libraries quickly revealed people fed up with the same limitations we had run into. The problem was that every option we could find fell short for one reason or another: either the syntax was awkward, the library wasn’t maintained, there were too many bugs, or the templates became verbose and difficult to manage. After reviewing the available options, we decided to design our own library for creating APIs, one that would solve all of the problems we had encountered thus far.

The Ruby API Builder Language

We embarked on a thought experiment before building the library. What were our frustrations with existing libraries and tools? Where did we want the JSON options to live? How did we want to specify them? What options did we want to have? What language or syntax should we use to define the output? How could we keep the options DRY and intuitive?

Early on we decided we wanted the JSON output to be defined in the views. Logic that genuinely belonged in the models would stay there, but that was rarely the case; most of the options were simply about crafting the JSON response and clearly belonged in a template. So that meant a file living in the views folder within Rails. We also decided we didn’t want to learn a new language, and that Ruby was as good an API builder as any. Why not leverage a simple Ruby DSL to build our APIs? Why not support inheritance and partials for our APIs? Why not allow the same template to describe both the JSON and XML responses for our API?

From these design questions and several days of work, the RABL gem was born. We started using this approach and fell in love with it immediately. All of a sudden, generating APIs was easy and intuitive. Even the most complex or custom API output was simple and maintainable through the use of inheritance, partials, and custom nodes. All of this was kept neatly tucked away in a view template where it belonged, without requiring any extra code in the models or, worse, the controller actions.

Stay Tuned

Since we built RABL, we have gotten excellent feedback from the community. We have deployed all of our public APIs in production using RABL and we couldn’t be happier. Please check out the README and let us know what you think! We would love to hear about your experiences with building APIs on Rails or Sinatra. This post is the setup for a thorough step-by-step tutorial we plan to publish soon on generating clean JSON and XML APIs in Rails 3 using RABL.