Bringing real-world coding education to colleges

CodePath goes back to school

To bridge the gap between industry best practices and what’s taught in academia, we plan to have CodePath classes running in hundreds of universities within the next three years. CodePath has spent years training senior engineers and designers inside companies like Facebook, Dropbox, and Google on the latest technologies in mobile and web development. We believe that bringing the same classes to universities and introducing modern technologies like iOS, Android, and Node.js in the context of building practical apps will complement the traditional computer science curriculum.

As students near graduation, this can also expose them to, and prepare them for, the thousands of opportunities at tech companies in Silicon Valley and other tech hubs. This program is not just for existing computer science students, however. We believe that introducing technology through building web and mobile apps will attract a diverse population of students who had not previously considered computer science.

The gap between school and industry

By the time I entered college as a Computer Engineering major, a nerd father, nerd summer camps, and years of nerd training had prepared me well for my freshman year. With several programming languages under my belt, I felt at ease in the early logic and algorithm classes. And yet, in 1999, I had a vague sense that I was on the sidelines as my more adventurous friends tackled PHP and threw around words like “shopping cart” and “session cookies”. By the time I decided to explore further, the crowd seemed to have already moved on to Cold Fusion and ASP. I still remember the general sense of frustration and confusion I felt after graduation as I tried to understand an actual technical landscape so different from my textbooks.

This is not a condemnation of computer science programs. Critics of traditional degree programs will sometimes cite the fact that, in practice, most programmers don’t write compilers or operating systems or rebalance binary trees or route through graphs. While it’s true that web and mobile apps require a completely different set of tasks, I don’t regret learning the fundamentals. You can be a successful car mechanic without ever learning about the principles of combustion or aerodynamics, but you’ll probably never build your own race car.

However, there is a distinct gap between what’s taught in school and what’s practiced in industry. Our courses are designed to run in conjunction with university programs and bridge that gap. As students prepare for the transition into industry, the practical skills we give them will open thousands of doors. On the flip side, companies spend millions on university recruiting, focusing on only the top 50 schools and running interview drills that they know to be imperfect assessments. We hope that our program will pave the way for students from thousands of colleges and universities to discover and prepare for the opportunities they’re looking for.

“Started from the bottom, now I am an intermediate iOS developer. Bootcamp fulfilled all the promises Tim made.”

– Stanford Student in Spring 2015 iOS Class

CodePath pilots iOS Class at Stanford

Earlier this year, we ran our first university class in iOS development for a group of 20 Stanford students, sponsored by Andreessen Horowitz. Our instructor co-taught with a student instructor and a team of three TAs, and students participated either for independent study credit or for no credit. We used the same policies that we have for our professional classes: unexcused absences or incomplete weekly projects result in withdrawal from the class. Despite overburdened class schedules and other commitments, 15 students successfully submitted all projects and presented a final group project: an app they designed and built. Students in the program connected with senior iOS engineers from Wealthfront, Amazon, LinkedIn, and Nest. Andreessen Horowitz also agreed to give all students in the program access to their portfolio companies. To top it off, CodePath provided a $5,000 cash prize for the top student projects.

CodePath Founder, Tim Lee, speaking to a student-led iOS class at Stanford University

This program is not just for senior computer science students. Over the past few years, we’ve run an extremely successful program, originally created for Facebook, that teaches designers how to build native mobile apps. Even though the designers have limited or no programming experience, they’re driven to create the apps they can see in their imagination. The class isn’t just about building apps like the one shown below; designers gain confidence in their ability to pursue their own ideas, work more seamlessly with engineers, and earn greater respect from their more technical peers.

WorkDay

A productivity app built by designers in a CodePath class

Teaching designers successfully has helped us confirm that programming can be learned relatively quickly, even by those with no prior experience. Unfortunately, in many universities there is a belief that only those who have been programming since childhood, or who have exceptional abilities in math or logic, can learn new languages within a reasonable timeframe. Furthermore, programming is rarely thought of as a field for those seeking to express creativity in their work.

Most non-technical people are puzzled when programmers refer to themselves as the creative type or go on about the joys of seeing their ideas come to life. It doesn’t help that most introductory programming courses emphasize that the field is only for the ‘logical’ and the ‘gifted’. It’s no wonder that most creative, intelligent young people steer clear of dreary computer science classes. We believe that a big part of the achievement gap in programming is in fact a motivation gap. By motivating students with a clear vision of the things they can create, they’ll be inspired to learn more and to get through the difficult parts of the learning process. We’re in the early days of our university program, but we’re excited by the impact that we’ll have in this community and beyond.

Thinking bigger: a free engineering school

“What do you do?” or “What does your startup do?” are tired questions in San Francisco, a city where you are measured by your works. The superficial startup ideas, the digital smugness, and the unnatural culture of tech celebrity are sometimes enough to make you want to throw your smartphone into the ocean and dive in after it. Go away from San Francisco for a while, though, and you’ll start to remember what drew you there in the first place. You remember that, beneath the fads and the clumsy attempts of first-time entrepreneurs, lives a city of dreamers and doers defying cynics and conventional wisdom.

Our dream is to create a modern school for engineers to attend throughout their career. Even more audaciously, we want to offer this school for free. Various experiences have led us down this path: our experience as founders exposed us to the pains of hiring, and our experience as engineers gave us the aspiration to be lifelong learners. We believe we can connect the dots and create a better solution for both.

A founder’s perspective

A founder really only has two critical roles: setting company direction and hiring. A common mantra in the startup world is that “execution is everything”, and, ultimately, it is the team that must execute the vision. Founders often share the trait of being control freaks, so letting go of that control is one of the first challenges they will face. That doesn’t mean letting the company run amok: setting direction and expectations is the founder’s other critical role. I don’t have any sympathy for founders who gripe about their team. All I see is a founder who has failed at their only job.

Unfortunately, hiring is brutally hard. Referrals are still the best way to hire, and when that well runs dry, the options are not good. Sometimes, it felt like the only thing we could do was watch the months slip by as we tried job boards, contingency recruiters, in-house recruiters, engineering speed dating, engineering auctions, and any number of ridiculous things.

I find it incredible that large companies use talent acquisition as a major strategy in hiring. I understand how it happens though, when you have executives staring at quarterly hiring targets. Acquisition is probably the only way to expedite the hiring process, so you’re left to solve the challenge of hammering together engineers from different company cultures into a cohesive team. To me, though, that’s like panning for gold when you can build a gold factory instead.

In fact, there is a vast supply of engineers: engineers who are talented, humble, and eager to learn and contribute. Why, then, do companies drool over Ivy League new graduates whose only experience is school projects when this other pool of talent is available? At the end of the day, it’s about risk. A big part of hiring is mitigating risk, and most experienced engineers don’t have “clean” resumes. Do too many years at the same company mean a lack of initiative? Do too few years mean they’re flighty? Also, their technical experience almost certainly doesn’t completely overlap with the tech stack your company is using, so how quickly can they ramp up?

Ultimately, I can understand the justification for passing on many of these people. Companies avoid hiring risk so much that one veto among six interviews is enough to disqualify a candidate. This is troubling, however, because we have all been in that candidate’s shoes. At some point in our careers, a hiring manager or investor took a risk on us, perhaps despite our previous experience, and maybe that was just the opportunity we needed. In the happiest stories, we went on to be very successful for the company, justifying the risk. Unfortunately, the stories often don’t have happy endings, which causes companies to write off a large number of candidates.

We believe we can change that and broaden the candidate pool beyond those who fit a narrow description. Our school is the perfect environment for individuals to demonstrate initiative and the ability to master new skills quickly. Imagine companies bringing interesting projects to the class, or challenges like Google Summer of Code. Opportunities to collaborate are opportunities to build trust, and trust is what makes referral-based hiring so much better than traditional recruiting. I would have gladly exchanged my many hours spent interviewing for time spent mentoring, probably with more productive results. Properly executed, the school could recycle an endless supply of qualified engineers back to grateful companies.

An engineer’s perspective

The only constant is change, and nowhere is that truer than in engineering. An engineer’s career is composed of “learning” years and “plateau” years, and it’s deadly to stay too long in the plateau. As a result, professional engineers are accustomed to, and take pride in, frequent self-teaching. It’s a good thing they enjoy it, because it’s currently the only option available. It’s not practical or efficient to go back to college or even get a graduate degree, and community college and other continuing education programs can’t keep pace with modern technology. Instead, you’re left sifting through a thousand eager voices on the internet, most of whom got something working but don’t really know what they’re talking about. You sit there, like some kind of technology archeologist, painstakingly piecing together which APIs were thoughtfully designed and which are historical remnants. Imagine learning to be a blacksmith by walking into an empty smithy with nothing but a handful of StackOverflow posts: possible, maybe, but very slow.

On the flip side, think back to some of your more fruitful “learning” years. They were probably characterized by two things: a substantial challenge and a significant mentor. In engineering, there are trivial challenges and there are interesting challenges, but both are time consuming for a newcomer. For example, debugging a well-known library issue can take as much time as designing a clever caching strategy. Mentors streamline the trivial challenges by passing along tribal knowledge. Reference texts are also great resources, but because they are comprehensive, the important nuggets get lost in a sea of trivia. Lack of mentorship breeds great inefficiency, which becomes obvious when you see technologists today praising “recent” advances that are simply rediscovered concepts pioneered decades ago.

We believe that our school brings the same advantages as structured mentorship and allows engineers to challenge themselves throughout their careers. Interestingly, it also brings another important advantage. Collaboration with like-minded peers is so rewarding that it can keep a team together long after the product or engineering challenge loses its luster. If you’ve ever tried to chase that feeling by meeting random engineers at a beer-fueled engineering meetup, though, you’ve probably been disappointed. A learning environment is infinitely more effective at building meaningful collaborative relationships. Those relationships may be their own end, may lead to professional collaboration, or may lead to your next cofounder.

CodePath

When we founded CodePath, we wanted to build it on several principles. First and foremost is a standard of excellence: we want to attract the best, and that requires the highest quality classes. Second, classes are project-based, because engineering is best learned by doing. Third, projects are built in groups, because engineering in industry is not a solo activity. To us, product, design, and collaboration are as much a part of engineering as libraries and frameworks, especially in a startup.

Making the classes free is a risky decision. There’s no such thing as a free lunch, and that will make engineers wary of low quality or some kind of catch, so that’s an impression we’ll have to overcome. When we set out to build this school, we were determined to build a school that we would attend ourselves. As a student, I don’t have thousands to spend on classes every year. As long as we can prove that we can train engineers to a certain standard, there’s money on the table from startups and big companies alike.

We’ve already been using our curriculum in corporate training for the past several months, and we’re excited to bring the classes to other engineers. Are you interested? Apply for our iOS evening bootcamp or our Android evening bootcamp. These initial bootcamps are targeted towards experienced developers interested in iOS and Android. If you would like to mentor a group, email us at help@thecodepath.com.

Follow us @thecodepath or join our mailing list for updates.

You’re Overthinking It

You comb Hacker News daily, marveling at the neatly packaged startup tales, uber-effective best practices, super clever engineering solutions, and lots and lots of links to websites filled with Helvetica, minimalism, and pastel colors. You’ve attended Lean Startup workshops, read Four Steps to the Epiphany, and subscribe to the Silicon Valley Product Group blog.

Honestly, it’s all very intimidating.

My product advice, from one overthinker to another – throw it all away. I mean, read the articles, enjoy the stories, and try to form your own opinions, but I wouldn’t take any of it too seriously.

A year into my first startup, my first major product epiphany was to never, never, ever try to build a product you couldn’t be a user for. That may be obvious, but I still read people discussing strategies for building products that they don’t use. There is no better user study, no more accurate persona than asking yourself what is good. There are probably product people out there that can do it, but, no offense, it’s probably not you, and it’s certainly not me.

There are many pros to building a product that you would use. Actually using a product (and I mean really using it) allows you to access the powers of intuition, an infinitely more valuable product tool than reason. Your intuition explains in a moment what it takes your reason an hour to break down. Your reason will lead you down a dozen wrong roads.

Another way of saying that is: if you think it’s cool, it’s probably cool. If you think it sucks, it probably sucks.

However, building a product for yourself doesn’t give you a free pass from user research, personas, and all the other things that product gurus tell you to do. Spend a year having the same product discussions with the same group of people, and the discussions will lose all meaning. Talking to five people outside of the company will bring you back to earth real fast.

One last thing…whatever you build, make sure it looks good and is the highest quality possible.

But wait, you say, look at eBay, Amazon, and Craigslist – they look like crap. Implement an MVP with product/market fit, and it doesn’t matter what it looks like. That’s true sometimes, but it also depends on where your product sits on Maslow’s hierarchy of needs. The lower on the pyramid your product is, the crappier it can look. If your product is core to helping people make money, pirate movies, or sell your useless couch, you don’t need a designer. But if you’re high on the pyramid, an ugly, clunky UI makes it impossible for people to see your vision.

If you read the Steve Jobs biography, it talks about one of the three original Apple philosophies: an odd word called impute. It’s basically a philosophy about impressions. Of products, they said, “If we present them in a slipshod manner, they will be perceived as slipshod”.

Speaking of the biography, I’ll wrap up with my impression of that book. It’s a story about a product genius, but it’s a story with as many missteps as triumphs. I take the moral of the story to be: forget the experts, the know-it-alls, and the doubters. Trust yourself and your vision, and go build something.

What are subnets?

At Miso, we recently launched an integration with AT&T that allows the Miso iPhone app to connect with your TV. Have you ever thought that browsing hundreds of cable channels on your TV was a painful process? Our vision with Miso is that you can turn on your favorite show, see what your friends are watching, or browse additional information about a show, all from your phone.

To make this happen, Miso connects with the AT&T receiver over wifi, and we’re learning from our early adopters that home networks are more complicated than they used to be. Wifi routers are definitely the norm, and it’s not uncommon to have multiple wifi routers covering the living room, the master bedroom, and the guest bedroom upstairs. If you’re plugging in a wifi router that you bought from Best Buy, you’re probably creating a new subnet, which is something you almost never need to care about.

As it turns out, Miso’s integrations with AT&T and DIRECTV both assume that a home has only one subnet, which is not always true. This has caused some support issues and prompted some people within Miso to ask: what is a subnet? Good question! As a programmer, I have some basic knowledge of network engineering, but answering this question prompted me to dig a bit deeper.

What is a subnet?

Subnets and IP addresses go hand in hand. Together, they are the building blocks for how computers find each other on a network. For instance, every time you visit a website, you first have to find the computer on the internet that hosts that website.

Addressing on the internet works a lot like the mailing address for your apartment. The zip code for my apartment is 94105; if you were to Google that, you would see that I live within a certain region of San Francisco. Subnets are like zip codes – they divide up the internet into small regions.

The subnet is actually part of the IP address. You’ve probably seen an IP address for a computer; my IP address at this coffee shop is 192.168.5.199. The first three numbers (192.168.5) are the subnet (or network address) and the last number (199) is my laptop’s address (the host address). The guy sitting at the next table probably has an address like 192.168.5.198. Because we’re on the same subnet, we could do things like share playlists in iTunes or exchange files over a shared folder. The numbers that form the subnet are somewhat arbitrary, in the same way that zip codes are somewhat arbitrary.

For nerds only: the tricky part is that which portion of the IP address makes up the subnet actually varies, and that split determines how large the subnet is. The mechanisms for dividing up the address space have evolved over the years, from classful addressing to classless addressing. We’re now at the point where we’ve run out of IPv4 addresses, hence the creation of IPv6, which has many more available addresses.
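
To make the mask concrete, here’s a minimal Ruby sketch using the standard library’s IPAddr (the addresses are the coffee shop examples above):

require 'ipaddr'

# A "/24" mask means the first 24 bits (the first three numbers) are the subnet.
subnet = IPAddr.new("192.168.5.0/24")
subnet.include?("192.168.5.198")  # => true: my neighbor's laptop
subnet.include?("192.168.6.12")   # => false: a different subnet

# A wider "/16" mask carves out a much larger subnet from the same addresses.
IPAddr.new("192.168.0.0/16").include?("192.168.6.12")  # => true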

So, great, subnets are like zip codes. They group computers together in some logical way. Laptops in the same cafe are probably on the same subnet and can do things like share iTunes playlists; laptops in different cafes probably can’t. Yet there must be some way to communicate between subnets: the websites I visit every day are hosted on computers that are all on different subnets.

How does information travel between subnets?

Sending information across subnets is a lot like sending mail across the country. After writing the address on the envelope, I deliver the letter to my local post office. My local post office looks at the zip code and begins routing the letter through hubs to bring it into the general vicinity of the destination. As the letter approaches the destination address, it is handed to another local post office, which ultimately puts it on a mail truck that goes to the actual address.

In the same way, let’s say that I want to visit google.com. From my laptop, the address for google.com is 74.125.224.114. My laptop has no idea where that address is, so it does the equivalent of delivering my request to the local post office, which is the wifi router of this coffee shop (you may have also heard this referred to as a gateway). The coffee shop’s wifi router also doesn’t know where that address is, so it forwards the request to the internet provider (ISP) of the coffee shop. The internet provider has a better sense for zip codes, so it starts the process of sending the packet in the right direction.
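
Here’s a hedged Ruby sketch of that first routing decision a host makes (the subnet and gateway addresses are illustrative):

require 'ipaddr'

MY_SUBNET = IPAddr.new("192.168.5.0/24")
GATEWAY   = "192.168.5.1"  # the "local post office"

def next_hop(destination)
  # Hosts on my own subnet get packets directly; everything else is
  # handed to the default gateway, which knows how to forward it.
  MY_SUBNET.include?(destination) ? destination : GATEWAY
end

next_hop("192.168.5.198")   # => "192.168.5.198" (delivered directly)
next_hop("74.125.224.114")  # => "192.168.5.1"   (forwarded to the gateway)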

For nerds only: this process is called IP routing, and there are several algorithms for it, which vary between IPv4 and IPv6. The algorithms differ in how chatty they are and what size of network they work well with, among other things. Many are based on Dijkstra’s algorithm.

So how come I can’t share my iTunes playlist with any computer?

I can access google.com from anywhere, but I can’t access that laptop from the other coffee shop. The reason is that there are many more computers than there are public IP addresses, so not every computer can have a public address. It’s similar to how, if you work in a large business, your department probably has an internal mail stop handled by your company’s mailroom rather than its own street address.

In the same way, internet addresses are divided into public and private addresses. Public addresses can be reached from anywhere, while private addresses can only be reached within local networks. Public addresses are handed out by the internet equivalent of the post office, the Internet Assigned Numbers Authority (IANA). If I want a public mailing address, I contact the post office; if I want a public IP address, I contact my internet provider. My internet provider is a member of a Regional Internet Registry (RIR), which in turn receives address space from the IANA. It’s fairly common for a home to have one public IP address, although you can pay your ISP for more.
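
The private ranges themselves are fixed by convention (RFC 1918). As a small Ruby sketch of checking whether an address is private:

require 'ipaddr'

# The three RFC 1918 private ranges: routable only on local networks.
PRIVATE_RANGES = [IPAddr.new("10.0.0.0/8"),
                  IPAddr.new("172.16.0.0/12"),
                  IPAddr.new("192.168.0.0/16")]

def private_address?(address)
  PRIVATE_RANGES.any? { |range| range.include?(address) }
end

private_address?("192.168.5.199")   # => true  (my laptop at the coffee shop)
private_address?("74.125.224.114")  # => false (google.com's public address)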

So, why don’t the AT&T and DIRECTV integrations work across multiple subnets?

Hopefully, you now have a high-level sense of internet addressing, subnets, and routing, but we still haven’t explained why our AT&T and DIRECTV integrations don’t work across multiple subnets. We need to understand one more concept: multicast.

Most network communication is two computers talking to each other directly. In some cases, you want to communicate with multiple computers at the same time. Broadcast and multicast are the two ways of doing that. In order for Miso to find the cable receiver, it makes an announcement over multicast asking if there are any cable receivers in the network. That’s the same way your computer discovers printers on your local network.

As you can imagine, multicast announcements can be pretty noisy. Imagine if my laptop made a multicast announcement and every computer in the world had to pay attention; things would get pretty chaotic. Therefore, multicast messages only go out to the current subnet. If my cable receiver is on a different subnet, it won’t receive my multicast announcement.

There are a number of ways to overcome this problem, including adjusting the multicast TTL, router port forwarding, and configuring routers as bridges. TTL (time-to-live) tells the multicast message how many hops to take before giving up. By default, this value is 1, which means the message is delivered only to computers on the local subnet. Port forwarding and bridging are other mechanisms for letting multicast traffic hop to an adjacent subnet; however, for our AT&T integration, they are less ideal because they require users to understand how to administer their home networks.
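
As a rough Ruby sketch of a subnet-scoped announcement (the group address and port are illustrative, not the ones Miso actually uses):

require 'socket'

MULTICAST_GROUP = "239.255.255.250"  # illustrative multicast group (SSDP-style)
PORT = 1900                          # illustrative port

sender = UDPSocket.new
# A TTL of 1 (the default) keeps the announcement on the local subnet;
# routers drop it instead of forwarding it. Pack the TTL as a single
# byte, since some platforms reject a 4-byte integer for this option.
sender.setsockopt(Socket::IPPROTO_IP, Socket::IP_MULTICAST_TTL, [1].pack('C'))
sender.send("any cable receivers out there?", 0, MULTICAST_GROUP, PORT)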

Hopefully, you found this high level guide useful. Feedback welcome!

Vendor – Bringing Bundler to iOS

Using and sharing iOS libraries is tragically difficult given the maturity of the framework, the number of active developers, and the size of the open source community. Compare that with Rails, which has RubyGems and Bundler: it’s no surprise that Rails/Ruby has a resource like The Ruby Toolbox, while iOS development has…nothing?

Are iOS developers fundamentally less collaborative than Rails developers?  When developing on Rails, if I ever find myself developing anything remotely reusable, I can almost always be certain that there is a gem for it (probably with an obnoxiously clever name).

I don’t think the spirit of collaboration is lacking in the iOS developer community; rather, there are a few fundamental challenges with iOS library development:

  • Libraries are hard to integrate into a project.  Granted, it’s not *that* hard to follow a brief set of instructions, but why can’t this process be more streamlined?
  • No versioning standard.  Once a library is integrated into a project, there is no standard way of capturing which version of the library was used.
  • No dependency specification standard (this is a big problem).  Why does facebook-ios-sdk embed its own JSON library when there are better ones available?  So many libraries come embedded with common libraries that we all use – and worse, who knows which versions they’re using!  Not only can this lead to duplicate symbols, but library developers essentially have to start from scratch instead of leveraging other existing libraries.

Of course, coming up with a naming, versioning, and dependency standard for iOS libraries and convincing everyone to adopt it is a daunting task.  One possible approach is to follow the example of Homebrew, a popular package manager for OS X.  Homebrew turned installing and updating packages on OS X into a simple process.  Instead of convincing everyone to comply with some standard, Homebrew maintains a set of formulas that describe commonly used packages.  These formulas allow Homebrew to automate the installation process as well as enforce dependencies.  This works well for Homebrew, although it puts the burden of maintaining package specifications in one place, rather than distributing it as Ruby gems do.

There seems to be a need for some type of solution here.  When we write Rails libraries, we write the readme first to help us understand what problem we’re trying to solve (Readme Driven Development).  Below is the readme for an iOS packaging system called Vendor.

Vendor – an iOS library management system

Vendor makes the process of using and managing libraries in iOS easy.  Vendor leverages the XCode Workspaces feature introduced with XCode 4 and is modeled after Bundler. Vendor streamlines the installation and update process for dependent libraries.  It also tracks versions and manages dependencies between libraries.

Step 1) Specify dependencies

Specify your dependencies in a Vendors file in your project’s root.

source "https://github.com/bazaarlabs/vendor"
lib "facebook-ios-sdk"  # Formula specified at source above
lib "three20"
lib "asi-http-request", :git => "https://github.com/pokeb/asi-http-request.git"
lib "JSONKit", :git => "https://github.com/johnezang/JSONKit.git"

Step 2) Install dependencies

vendor install
git add Vendors.lock

Installing a vendor library gets the latest version of the code, and adds the XCode project to the workspace.  As part of the installation process, the library is set up as a dependency of the main project, header search paths are modified, and required frameworks are added.  The installed version of the library is captured in the Vendors.lock file.

After a fresh check out of a project from source control, the XCode workspace may contain links to projects that don’t exist in the file system because vendor projects are not checked into source control. Run `vendor install` to restore the vendor projects.

Other commands

# Updating all dependencies will update all libraries to their latest versions.
vendor update
# Specifying the dependency will cause only the single library to be updated.
vendor update facebook-ios-sdk

Adding a library formula

If a library has no framework dependencies, has no required additional compiler/linker flags, and has an XCode project, it doesn’t require a Vendor formula. An example is JSONKit, which may be specified as below. However, if another Vendor library requires JSONKit, JSONKit must have a Vendor formula.

lib "JSONKit", :git => "https://github.com/johnezang/JSONKit.git"

However, if the library requires frameworks or has dependencies on other Vendor libraries, it must have a Vendor formula.  As with Homebrew, a Vendor formula is declarative Ruby code that is open source and centrally managed.

An example Vendor formula might look like:

require 'formula'

class Three20 < Formula
  url "https://github.com/facebook/three20"
  libraries "libThree20.a"
  frameworks "CoreAnimation"
  header_path "three20/Build/Products/three20"
  linker_flags "ObjC", "all_load"
  vendors "JSONKit"
end

Conclusion

Using iOS libraries is way harder than it should be, which has hurt the growth of the open source community for iOS.  Even if I were only developing libraries for myself, I would still want some kind of packaging system to help me manage the code.  Vendor essentially streamlines the flow described by Jonas Williams here.  Unfortunately, programmatically managing XCode projects isn’t supported natively, but people have implemented various solutions, such as Three20’s, Victor Costan’s, and XCS.

Open Questions

  • Is there an existing solution for this?
  • Would this be a useful gem for your iOS development?
  • Why hasn’t anyone built something like this already? Impossible to build?

Low-hanging fruit for Ruby performance optimization in Rails

Our goal: we currently spend about 150-180 ms in Ruby for an average web request, and we think it’s reasonable to improve that to around 100 ms.

Two of the most expensive operations in Ruby are object allocation and garbage collection. Unfortunately, in Ruby (unlike Java), garbage collection is synchronous and can have a major impact on the performance of requests. If you’ve ever noticed that rendering a partial template occasionally takes a couple of seconds, you’re probably watching a request trigger garbage collection.

The good news: it’s easy to quickly (< 10 min) see how much your app is impacted by garbage collection. You’re likely to improve your performance by 20-30% just by tuning your garbage collection parameters.

If you have a production Rails app, and you’re even remotely interested in performance, I’m going to assume you’re using the excellent New Relic service.  If you’re using REE or a version of Ruby with the Railsbench GC patches, it’s easy to turn on garbage collection stats that will be visualized by New Relic.

You’ll get pretty charts in New Relic by adding:

GC.enable_stats

somewhere in your initialization. After enabling garbage collection stats in our application, we can see that approximately 20% of our Ruby time is spent in garbage collection, which implies that a not-insignificant portion of time is also spent in object allocation.
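
For a quick sense of what the patched Ruby exposes, here’s a minimal sketch (this assumes REE or a Railsbench-patched Ruby; stock MRI doesn’t have these methods):

GC.enable_stats

100_000.times { "strings" + "create" + "garbage" }  # churn some objects

puts "GC runs:    #{GC.collections}"
puts "Time in GC: #{GC.time / 1000.0} ms"  # GC.time reports microseconds
GC.clear_stats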

What’s next

We last tuned our Ruby garbage collection parameters about 5 months ago and, after an initial performance boost, we’ve seen the application time spent in Ruby creep back up.  To try to bring the response time back down, our next steps are to:

  • Consider taking another pass at garbage collection parameter tuning.  Since we’ve already taken one pass at this, I’m not sure if we’ll be as impactful the second time around, but we’ll see.
  • Identify the slowest controller actions via New Relic and profile them using ruby-prof or perftools.
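
As a sketch of the profiling step (the profiled call below is hypothetical; point it at whatever New Relic identifies as slow):

require 'ruby-prof'

RubyProf.start
SlowReport.generate  # hypothetical hot spot surfaced by New Relic
result = RubyProf.stop

# A flat profile sorted by self time is the quickest way to spot hot methods.
RubyProf::FlatPrinter.new(result).print(STDOUT)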

Performance tuning using ruby-prof is likely going to vary a lot depending on the code, but if we find techniques that might apply more broadly, we’ll be sure to blog about it here.

Easy Monitoring of Varnish with Munin

If you’re looking for a reverse proxy server, Varnish is an excellent choice. It’s fast, and it’s used by Facebook and Twitter, as well as plenty of others. For most sites, it can be used effectively pretty much out of the box with minimal tuning.

Like many decently sized Rails apps, ours leverages a lot of open source code: dozens of gems and plugins, a variety of cloud services, Varnish and Nginx for caching and load balancing, and various persistence solutions. The point is, as our app usage has grown over the last year, we’ve had our share of stressful, on-the-fly debugging while our app was down. That’s not the best time to learn about all the fun nuances and interactions of your technology stack.

It’s a good idea to know what your services are doing and the key metrics to watch, so you’re better prepared when you hit those inevitable scaling pain points. New Relic has been tremendously useful for monitoring and debugging our database and Rails app. The rest of this post goes over some key metrics for Varnish and setting up Munin to monitor them.

Optimizing and Inspecting Varnish

Unless your application has an extremely high volume of traffic, you likely won’t have to optimize Varnish itself (e.g., cache sizes, thread pool settings, etc.). Most of the work will be in verifying that your resources have appropriate HTTP caching headers (Expires/max-age and ETag/Last-Modified). You’re most of the way there if you do the following:

  • Run Varnish on a 64-bit machine. It’ll run on a 32-bit machine, but it likes the virtual address space of a 64-bit machine. Also, Varnish’s test suites are only run on 64-bit distributions.
  • Normalize the hostname. e.g., www.website.com => website.com, to avoid caching the same resource multiple times. Details here.
  • Unset cookies for any resource that should be cacheable. Details here.
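
Since most of the work is getting those caching headers right, here’s a hedged Rails-side sketch (the controller and model are made up for illustration):

class ShowsController < ApplicationController
  def show
    @show = Show.find(params[:id])

    # Expires/max-age: lets Varnish (and browsers) serve this for 10 minutes.
    expires_in 10.minutes, :public => true

    # ETag/Last-Modified: Rails responds 304 Not Modified while still fresh.
    fresh_when :etag => @show, :last_modified => @show.updated_at, :public => true
  end
end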

Varnish includes a variety of command line tools to inspect what Varnish is doing. SSH into the server running Varnish, and let’s take a look.

Inspecting an individual resource

First, let’s look at how Varnish handles an individual resource. On a client machine, point a web browser at a resource cached by Varnish. On the server, type:

$ varnishlog -c -o ReqStart <IP address of client machine>

The output of this command will be communication between the client machine and Varnish. In another SSH terminal, type:

$ varnishlog -b -o TxHeader <IP address of client machine>

The output of this command will be communication between Varnish and a backend server (i.e., an origin server, the actual application). Try reloading the resource in the browser. If it is cached correctly, you shouldn’t see any communication between Varnish and any backend servers. If you do see something printed there, inspect the HTTP caching headers and verify they are correct.

Varnish statistics

Now that we’ve seen that Varnish is working for an individual resource, let’s see how it’s doing overall. In your SSH session, type:

$ varnishstat

The most important metrics to note here are the hitrate and the uptime. Varnish has a parent process whose only function is to monitor and restart a child process. If Varnish is restarting itself frequently, that’s something to be investigated by looking at its output in /var/log/syslog.

Other than that, check out Varnishstat For Dummies for a good overview.

It’s great that we can check on Varnish fairly easily, but the key is to automate this process; otherwise, it can be very difficult to detect warning patterns early. Also, it’s not realistic to have a huge, manual, pre-flight checklist to check on the health of all your services. Enter Munin…

Get Started with Munin in 15 minutes

Munin is a monitoring tool with a plug-in framework. Munin nodes periodically report back to a Munin server. The Munin server collects the data and generates an HTML page with graphs. The default install of Munin contains a plug-in for reporting Varnish statistics, which includes a variety of graphs.
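
Plugins are just executables that speak a tiny text protocol, so adding your own metrics is easy. Here’s a minimal sketch in Ruby (the metric is illustrative); it would be dropped into /etc/munin/plugins and marked executable:

#!/usr/bin/env ruby
# Munin calls the plugin once with "config" to learn about the graph,
# then with no arguments every 5 minutes to collect values.
if ARGV.first == "config"
  puts "graph_title Logged-in users"
  puts "graph_vlabel users"
  puts "users.label users"
else
  # Report each metric as "fieldname.value <number>".
  puts "users.value #{`who`.split("\n").size}"
end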

Installing Munin

If you’re installing Munin on an Ubuntu machine (or any distribution that uses apt), use the commands below. For other platforms, see the installation instructions here.

For every server you want to monitor, type:

$ sudo apt-get install munin-node

Designate a server to collect the data. The server can also be a Munin node. On the server, type:

$ sudo apt-get install munin

Configuring Munin

For each node, open the configuration file at /etc/munin/munin-node.conf. Add the IP address of the Munin server.

allow ^xxx\.xxx\.xxx\.xxx$

After you modify the configuration file, restart the Munin node by typing:

$ sudo service munin-node restart

For the server, open the configuration file at /etc/munin/munin.conf. Add each node that you want to monitor.

[Domain;serverA]
  address xxx.xxx.xxx.xxx
  use_node_name yes

Choose any value you like for Domain and serverA above; the names are purely for organization. When the Munin server was installed, it also installed a cron job that runs every 5 minutes and collects data from each node. After editing the configuration file, wait 5 minutes for the charts to be generated. If you’re impatient, type:

$ sudo -u munin /usr/bin/munin-cron

View Munin Graphs

If you have lighttpd or Apache, point it at /var/cache/munin/www. If the charts have been generated properly, there should be an index.html file in that directory.

Troubleshooting Munin

If the Munin charts aren’t being generated, make sure that the directories listed in /etc/munin/munin.conf exist and have appropriate permissions for the munin user.

Try manually executing munin-cron and see if there is any error output.

Look at /var/log/syslog for any Munin-related errors.

Conclusion

That’s it! Varnish is optimized and working correctly, and Munin is reporting the important stats so you can sleep easy at night. Enjoy!

Additional Resources

Web caching references

Caching Tutorial – Excellent overview of web caching by Mark Nottingham.
Things Caches Do – Overview of reverse proxy caches like Varnish and Rack-Cache.
HTTP 1.1 Caching Specification – Official HTTP 1.1 Caching Specification.

Varnish references

A Varnish Crash Course For Aspiring Sysadmins
Varnishstat for Dummies
Varnish Best Practices
Achieving a High Hitrate

Munin references

Munin Tutorial