

Pave is transforming how young careers grow. Read on for stories of success, inspiration and insight from our entire community.

11 September 2013

Pave’s technical foundation

Posted by justinatpave

Pave was built from the start with three goals from a technical perspective:

- Build something that enables us to work and iterate quickly.

- Build something that scales.

- Build something we’re proud of.

Below is the architecture we’ve developed over the last year. It works well, but we’re always interested in ways to improve it, so please suggest some in the comments.

1/ Amazon Web Services (AWS) Virtual Private Cloud (VPC) + OpenVPN

Before getting into other details, we should discuss a framework for securing and controlling access to the network as a whole. The only external points of access to our internal network are our VPN server (we use OpenVPN, which works for now) and the Elastic Load Balancer (ELB) that fronts our web servers.

Within that network, we further segregate machines into several tiers with different permissions. Broadly speaking, there are three main concepts:

- Application tier. This is where the production and canary web servers live. It is segregated so that, if a machine here were compromised, attackers would have access to almost no resources on other machines.
- Services tier. This hosts machines like our databases, development servers, servers for asynchronous processes, etc. It allows fairly free communication within the tier, but external connections are only allowed from the VPN.
- VPN / auth / Network File System (NFS) / etc. This isn’t really one tier, but several, each with a very specific network configuration that only allows traffic to/from necessary hosts and on predefined ports. For example, our dev machines can contact our NFS server specifically for NFS, but it’d be impossible to open a Secure Shell (SSH) session from the NFS server to the development machines.
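
To make the default-deny idea concrete, here is a minimal Python model of tier-to-tier allow rules. The tier names, ports, and rule set are hypothetical illustrations, not our actual configuration; in AWS these rules would live in VPC security groups and network ACLs rather than in application code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    src: str   # tier allowed to originate the connection
    dst: str   # tier receiving the connection
    port: int  # destination port

# Hypothetical ports: 22 is the standard SSH port, 2049 the standard NFS port.
SSH_PORT, HTTP_PORT, NFS_PORT = 22, 80, 2049

# Allow-list: any (src, dst, port) combination not listed here is denied.
RULES = {
    Rule("elb", "app", HTTP_PORT),       # public traffic enters only via the ELB
    Rule("vpn", "app", SSH_PORT),        # engineers reach machines only via the VPN
    Rule("vpn", "services", SSH_PORT),
    Rule("services", "nfs", NFS_PORT),   # dev machines may mount NFS...
}                                        # ...but there is no rule from nfs back to services

def is_allowed(src: str, dst: str, port: int) -> bool:
    """Return True if traffic from the src tier to the dst tier on port is permitted."""
    # Communication within the services tier is left fairly open.
    if src == dst == "services":
        return True
    return Rule(src, dst, port) in RULES
```

For example, `is_allowed("services", "nfs", 2049)` is `True`, while `is_allowed("nfs", "services", 22)` is `False`, mirroring the NFS/SSH example above.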

2/ Web application.

Pave is written in Python and based on the Flask microframework. The web servers themselves are EC2 instances running nginx + uWSGI to handle requests, although requests are first routed through an ELB. This architecture should allow us to scale horizontally for a long time before running into problems. Adding a new machine is as simple as provisioning an instance in EC2, running a command to set it up, and adding it to the ELB. This could be entirely automated if that were a priority at the moment.
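
The contract between uWSGI and the application is WSGI: uWSGI imports a module and calls a WSGI callable for each request that nginx forwards to it (typically via nginx’s `uwsgi_pass` directive). A Flask `app` object is one such callable. The sketch below is a dependency-free stand-in for it; the `/health` endpoint is a hypothetical example of something an ELB health check could poll.

```python
import json

def application(environ, start_response):
    """Minimal WSGI app; Flask's `app` object exposes this same interface to uWSGI."""
    path = environ.get("PATH_INFO", "/")
    if path == "/health":
        # Hypothetical health-check endpoint for the load balancer.
        status, content_type = "200 OK", "application/json"
        body = json.dumps({"status": "ok"}).encode("utf-8")
    elif path == "/":
        status, content_type = "200 OK", "text/plain; charset=utf-8"
        body = b"Hello from behind the ELB\n"
    else:
        status, content_type = "404 Not Found", "text/plain; charset=utf-8"
        body = b"Not found\n"
    start_response(status, [("Content-Type", content_type),
                            ("Content-Length", str(len(body)))])
    return [body]
```

Locally, the standard library can serve it with `wsgiref.simple_server.make_server("", 8000, application).serve_forever()`; in production, uWSGI would import the module instead. Because each web server is stateless, adding capacity is just a matter of registering another such instance with the ELB.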

We also rely on many Python libraries. Some of the major ones are MongoAlchemy, boto (AWS integration), pycrypto, Werkzeug, WTForms, and Fabric. We’re deeply indebted to the open source community and hope to contribute back soon.

3/ Data storage.

We use MongoDB running on EC2 instances for our primary data storage. We run three m1.small instances for the Mongo config servers and several m1.large instances for our mongod processes (and their replicas), and each application or dev server runs its own mongos router. This has a few advantages:

- Easy sharding. This isn’t terribly difficult with PostgreSQL or MySQL, but it can involve some application-layer logic once the data gets large enough.
- Easy replication. Spin up a server, add it to the Mongo config, and Mongo handles the rest.
- Forces us to think about queries without joins. At large scale, joins across relational tables aren’t really feasible, and this forces us to confront that reality from day one. That said, it is annoying to need 2-3 queries in Mongo for what could be done in a single SQL query. My bias here is probably related to coming from Facebook, where, despite using MySQL, joins were unavailable. This led to the development of “objects” and “associations”, essentially nodes and edges in the graph. Mongo requires that type of thinking from the start.
- Schemaless. Adding a new field to the documents in a collection (the SQL equivalent would be adding a column to a table) is trivial. Depending on your implementation, this can be an annoying migration process with other data stores.
- No need for memcached/ElastiCache. Best practice with Mongo is to put any additional RAM into the DB machines themselves rather than fronting them with another caching layer. It’s nice to eliminate an entire layer of architecture.
- Our data is also safe. AWS’ Elastic Block Store (EBS) volumes are themselves replicated, and we take a snapshot every 30 minutes or so. This is on top of the replication Mongo already does.
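
The “objects and associations” pattern replaces one SQL join with a couple of queries: first fetch the edges, then fetch all the target objects in a single batched lookup. The sketch below emulates two collections with in-memory lists so it runs anywhere; the collection names, fields, and data are hypothetical. With a real driver, the second step would be a `find` with an `{"_id": {"$in": ids}}` filter.

```python
# Hypothetical in-memory stand-ins for two Mongo collections.
users = [
    {"_id": 1, "name": "Ada"},
    {"_id": 2, "name": "Grace", "school": "Yale"},  # schemaless: an extra field is fine
]
backings = [  # "associations": edges between a backer and a prospect
    {"backer_id": 10, "prospect_id": 1},
    {"backer_id": 11, "prospect_id": 2},
    {"backer_id": 10, "prospect_id": 2},
]

def find(collection, field, values):
    """Mimic a Mongo query like {field: {"$in": values}} over a list of dicts."""
    wanted = set(values)
    return [doc for doc in collection if doc.get(field) in wanted]

def prospects_backed_by(backer_id):
    """Two queries instead of a join: edges first, then a batched node lookup."""
    edges = find(backings, "backer_id", [backer_id])   # query 1: associations
    ids = [edge["prospect_id"] for edge in edges]
    return find(users, "_id", ids)                     # query 2: objects

for user in prospects_backed_by(10):
    # Fields that only some documents have are read with a default.
    print(user["name"], user.get("school", "unknown"))
```

Batching the second lookup keeps the cost at two round trips, however many edges there are, rather than one query per edge.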

4/ Development.

All the engineers work on a system that closely mimics the production environment. Right now everyone follows one of two approaches:

- SSH into a development instance hosted in EC2. This is my workflow, owing to a few advantages: a lost or broken laptop neither hurts productivity nor loses important data, I don’t need to schlep my work laptop around to be productive, and there are no hardware constraints (e.g. I can create a new dev machine, of any instance size I need, within a few minutes).
- Run Ubuntu locally and develop there. There’s a little less latency when working (not usually a problem, but it can be annoying sometimes), plus you’re not reliant on an internet connection for work.

We run an internal version of the site that gets updated to the latest code every 5 minutes. This has two advantages. First, it’s a safe place inside our VPN (for added security) to run all admin tools. Second, it gives everyone in the company a place where they are always using the latest version of the code. Many bugs are caught here.
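
The post doesn’t say how the internal site picks up new code, so here is one plausible approach, sketched under assumptions: poll for the latest commit every 5 minutes and redeploy only when it changed. `get_remote_head` and `deploy` are hypothetical hooks (the first might wrap `git ls-remote`, the second a setup script); a cron job would work just as well.

```python
import time
from typing import Callable, Optional

def sync_once(get_remote_head: Callable[[], str],
              deploy: Callable[[str], None],
              current: Optional[str]) -> Optional[str]:
    """Deploy the remote head commit if it differs from the one currently running."""
    head = get_remote_head()
    if head != current:
        deploy(head)
        return head
    return current  # nothing new; skip the redeploy

def run_forever(get_remote_head: Callable[[], str],
                deploy: Callable[[str], None],
                interval_s: int = 300) -> None:
    """Poll every interval_s seconds (300 s matches the 5-minute cadence above)."""
    current: Optional[str] = None
    while True:
        current = sync_once(get_remote_head, deploy, current)
        time.sleep(interval_s)
```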

Finally, there’s a canary tier that fully mimics production. This lets us test things like load balancers, CDN configs, and anything else that’s hard to test without full production replication. We almost always deploy new code here before pushing it live.
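
In effect, the canary tier is a gate in the deploy pipeline. A minimal sketch of that gate, assuming two hypothetical hooks: `deploy_to(tier, version)` for whatever deploy script you use, and `is_healthy(tier)` for whatever checks you trust (error rates, monitoring alerts, a manual look).

```python
from typing import Callable

def staged_deploy(version: str,
                  deploy_to: Callable[[str, str], None],
                  is_healthy: Callable[[str], bool]) -> str:
    """Deploy to the canary tier first; promote to production only if it looks healthy."""
    deploy_to("canary", version)
    if not is_healthy("canary"):
        # Production is untouched, so the blast radius is limited to the canary.
        return "halted: canary unhealthy"
    deploy_to("production", version)
    return "promoted"
```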

5/ Hosting photos and other uploaded files

We use AWS S3 for this. I used to manage the Photos product engineering team at Facebook, so I had a pretty good idea of how complicated it is to set up good, efficient, large-scale hosting for photos (hint: we even wrote our own file system). S3 abstracts away all that complexity, which makes our lives much easier.
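
With S3, most of the remaining design work is choosing object keys. The sketch below uses a hypothetical content-addressed naming scheme (a SHA-256 of the bytes), so identical uploads map to the same key and keys from different users never collide. The upload function accepts a boto3-style S3 client (boto3 is the successor to the boto library mentioned above), whose `put_object` method takes `Bucket`, `Key`, `Body`, and `ContentType` keyword arguments.

```python
import hashlib

def photo_key(data: bytes, ext: str, prefix: str = "photos") -> str:
    """Build a content-addressed S3 object key (hypothetical naming scheme)."""
    digest = hashlib.sha256(data).hexdigest()
    return f"{prefix}/{digest}.{ext.lstrip('.').lower()}"

def upload_photo(s3_client, bucket: str, data: bytes, ext: str,
                 content_type: str = "image/jpeg") -> str:
    """Upload raw bytes with a boto3-style S3 client and return the object key."""
    key = photo_key(data, ext)
    s3_client.put_object(Bucket=bucket, Key=key, Body=data, ContentType=content_type)
    return key
```

In production you would pass `boto3.client("s3")`; storing only the returned key in the database keeps the application independent of where the bytes actually live.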

6/ Static content.

Static content is data that doesn’t change per request or viewer. For example, your Quora feed is entirely personalized for you; however, the JavaScript and CSS that render that feed are (very likely, unless they’re testing something) the same for everyone. We use a few tools to help us:

- LESS for styling. LESS is a great way to abstract CSS a bit (e.g. it gives you mixins, more obvious scoping, and variables). It makes development easier, and when we deploy, it just gets compiled into per-page CSS. Each page view needs only 1-2 requests for CSS, which improves performance.
- RequireJS for JS. RequireJS lets us think about our JS in two broad domains: page-specific JS, and modules that get imported in many places. It also lets us package JS so that, on any individual page request, we typically send just one JS file, which helps with performance. We haven’t adopted any of the new (and seemingly cool) JS frameworks, although we’re definitely considering it.
- AWS CloudFront as a CDN. We front all our CSS/JS with CloudFront. This reduces the network time spent requesting these resources, making our site faster for the end user.
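
A common companion technique for serving compiled assets through a CDN is fingerprinting: embed a hash of each file’s contents in its name, so edge caches can hold files for a long time and every deploy that changes a file naturally produces a new URL. The post doesn’t say whether we do this; the sketch below is an assumption-labeled illustration, and `build_manifest` is a hypothetical helper that templates could use to look up the current filename.

```python
import hashlib
from pathlib import PurePosixPath
from typing import Dict

def fingerprinted_name(path: str, content: bytes, length: int = 8) -> str:
    """Insert a content hash into an asset filename: css/home.css -> css/home.<hash>.css."""
    p = PurePosixPath(path)
    digest = hashlib.sha256(content).hexdigest()[:length]
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))

def build_manifest(assets: Dict[str, bytes]) -> Dict[str, str]:
    """Map logical asset paths to fingerprinted paths (hypothetical helper)."""
    return {path: fingerprinted_name(path, data) for path, data in assets.items()}
```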

7/ Tools.

We use many services and tools; here are a few:

- Jenkins for Continuous Integration (CI). We run our unit and integration tests against each commit. Any time something breaks, the entire engineering team gets an email.
- Phabricator for code review. Every line of code in Pave’s codebase has been reviewed by another engineer. I think this has led to better overall engineering quality, and it ensures everyone has an idea of what’s changing in the system.
- GitHub for repository hosting. Buy, don’t build.
- Asana for tasks, bugs, and sprint tracking. Asana strikes me as a second-best solution to almost every problem. It’s not purpose-built for development. It’s also not purpose-built for sales, interview feedback, or tracking what we want to stock in our kitchen. But it works reasonably well for all of these jobs and lets us unify on one solution across the company. That said, I’m well known for being its biggest internal advocate :)
- HipChat. Pretty cool for internal communication.
- Optimizely. A great way to give non-engineers the autonomy to set up experiments across the site without taking engineering resources. It isn’t great for non-trivial tests, though.
- Mixpanel. We use it to track all our funnels, conversion rates, and experiments. We watch data from Mixpanel continuously on “information radiators” via…
- Geckoboard. A simple utility that lets us tie together data from disparate sources and display it. The data gets displayed on a TV mounted on the wall where the entire company can see it.
- New Relic. I can’t say enough good things about New Relic for visibility into the application. It handles server monitoring, watches for errors, lets us know about performance concerns, and allows us to track all of that over time.
- Pingdom. Our first line of defense against major problems. If the site were to go down or a major page were to start fataling, half the company would have a text message within a minute or two.
- Loggly. All of our logs are sent to Loggly, where they’re indexed and searchable. It’s not the best tool in the world (New Relic is better for log aggregation), but it’s easily searchable and has a decent interface.
- Iterable. A service we use to help send triggered emails. It has enabled non-technical team members to edit email copy and formatting.
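
Log search works best when each line is structured. One way to get there, sketched here as an assumption rather than our actual setup, is a standard-library `logging.Formatter` that emits one JSON object per line. Shipping those lines is left to whatever transport your log service supports (syslog forwarding, an HTTP endpoint, etc.), and `make_logger` is a hypothetical helper.

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object so an indexer can parse fields."""
    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),  # applies %-style args to the message
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)

def make_logger(name: str, stream=None) -> logging.Logger:
    """Create a logger that writes JSON lines to stream (stderr by default)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger
    return logger
```

Calling `make_logger("pave.web").info("user %s signed up", 42)` writes a line whose `message` field is `user 42 signed up`, which can then be searched by field rather than by raw text.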

That’s the core of it. What setups do other startups use?