Narayan Newton

Go Comcast Go

Wed, 04/16/2008 - 11:39pm

Let's all give it up for Comcast, come on now:

Go Comcast Go!
Ra Ra Ree!
Kick Em In The Knee!
Ra Ra Rass
Kick Em In...The Other Knee!

A couple of weeks ago I got several disturbing reports from the Midwest, mainly of the "drupal.org has been down for 6 hours" variety. My response was to frantically go to drupal.org, watch it load perfectly and then enjoy a large steaming cup of wtf.

This lovely period was capped by helping an employee of a Drupal company in Boston try to track down this issue when it happened to them. He was very patient indeed and sent me the numerous tcpdumps, traceroutes, ping requests and netstat printouts that I wanted. It was quite the interesting issue. Any computer in their office could ping the individual drupal.org webnodes, but not the master virtual IP. SYN packets were getting to the load balancer managing the master VIP, but the SYN,ACK was never getting back to them.

The most frustrating part of this was that every once in a while a connection would go through and get to ESTABLISHED, then die, and we would go back to connections waiting for a SYN,ACK.
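A minimal sketch of the kind of check that separates "host is pingable" from "the TCP handshake completes", assuming plain Python sockets. The function name `probe_tcp` and the addresses in the usage note are illustrative, not anything from the actual debugging session, which relied on tcpdump, traceroute, ping and netstat.

```python
import socket


def probe_tcp(host, port, timeout=3.0):
    """Attempt a full TCP handshake and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except socket.timeout:
        # Our SYN went out, but no SYN,ACK arrived within the timeout.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered with a RST: reachable, but nothing is listening.
        return "refused"
    except OSError as exc:
        # Anything else (no route, DNS failure, and so on).
        return f"error: {exc}"


if __name__ == "__main__":
    # Hypothetical targets using documentation addresses (RFC 5737).
    for name, addr in [("webnode", "192.0.2.11"), ("master VIP", "192.0.2.10")]:
        print(name, probe_tcp(addr, 80))
```

A "timeout" against the VIP alongside "connected" against the individual webnodes would match the symptom described above: the SYN leaves, and the SYN,ACK never makes it back.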

We beat at this for a while and then someone called Comcast and we discovered a lovely feature called "Smart Packet Detection." This "protects" the intertubes from clogging by noting when many packets are going to a single IP address and then blocking that IP for a while...except apparently for SYN packets? That part makes no sense.

Anyway, they requested that feature be turned off and connectivity immediately returned.

Categories: Planet OSL

Easter Sunday With Heartbeat And MySQL

Tue, 03/25/2008 - 2:31pm

I admit that I'm feeling somewhat relieved today. I've had a large migration on my todo list for quite a while now. Last Sunday I finally sat down for the 16 hours it required and got it done (with the help of several local coffee shops). First, a little back story.

Drupal.org has had a somewhat standard DB setup: a simple master-master active/passive replication system, with DNS pointers to control the failover. I've been unhappy with this for several reasons, mainly the lack of automated and instantaneous failover. However, I've also had issues with having a single MySQL instance on each server. These are very powerful servers, with multiple cores/CPUs, and they could be very highly concurrent. MySQL/InnoDB has known issues with lock contention at high levels of concurrency, and these issues prevent full utilization.

There is yet another issue though: drupal.org and its subsites are a somewhat odd situation. Not only is this a "production infrastructure", but it is used as a test-case and in many ways a development environment. This means we have many copies of the drupal.org database, which are used for testing. MySQL doesn't know these aren't production and treats their pages the same as any other database in memory. This quickly became a problem: even with a very large InnoDB buffer pool, it fills after about a day, starving the production DB and the write buffer (among other things).

My solution to all of this was to split the various databases across three MySQL instances, one for drupal.org, one for all the other production sites and one for testing sites. I could then tune them individually and I would get less lock contention across the instances as concurrency scaled up.

So, I decided to put this plan into action Sunday and at the same time deploy Heartbeat (Linux-HA) to enable automated failover. This plan required more IP addresses than were available on the current VLAN, so roll switching VLANs into all of this. Looking back, I probably should have split these up into multiple days....but I do like a challenge (sadly).

I spent quite a bit of time planning out all of this beforehand. The first problem was Heartbeat. This software package comes in two versions (named version 1 and version 2 amazingly enough). The first is very simple and limited (it can only handle two nodes, can't have its config updated "online"..etc) and the second is very feature rich and very very complicated (and configured through daemon-managed XML files....hooray). So, I read the various papers checked into CVS for heartbeat and experimented with it on scratchvm.drupal.org and my workstation. Like many things, once you understand how the developers are thinking it becomes quite simple.

I sketched out a plan and on Sunday started things going:

Step 1: Fail DB1 to DB2
This was easy enough, I've done it many times.

Step 2: Bring DB1 up on the new VLAN and new IP
This was also simple. The only odd thing was that I changed the hostname to db1-static. Why? Because this "main" IP is one of the few on this box that won't eventually be moved around and managed by heartbeat.

Step 3: Bring up heartbeat
I had already written the cluster configuration for heartbeat, so all I had to do was check all of the configuration into our central management system, create authentication keys for the cluster...check them into our protected repository...etc..etc..etc. Eventually heartbeat came up and brought up six new IP addresses with it: db1-main-vip, db2-main-vip, db1-other-vip, db2-other-vip, db1-test-vip and db2-test-vip. The breakout is fairly obvious here, one IP for each MySQL instance on the master and the slave. Heartbeat manages all of these.

Step 4: Bring up the MySQL instances
Again...fairly simple. I wrote a new config for each instance and had the Gentoo initscript start each one in turn. The only odd part was splitting everything up into distinct subdirs (as the default MySQL configuration definitely assumes it is the only instance running).
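To make the "distinct subdirs" point concrete, here is a rough sketch that renders one `[mysqld]` section per instance so that no two instances share a port, data directory, socket or pid file. The option names are standard mysqld options, but the instance ports, the base directory and the helper `render_my_cnf` are assumptions for illustration; the real configs and the Gentoo initscript layout are not shown in this post.

```python
# Hypothetical ports and base directory; the post does not give the real ones.
INSTANCES = {"main": 3306, "other": 3307, "test": 3308}
BASE_DIR = "/var/lib/mysql-instances"


def render_my_cnf(name, port, bind_address):
    """Return a [mysqld] section whose files are private to one instance."""
    root = f"{BASE_DIR}/{name}"
    lines = [
        "[mysqld]",
        f"port = {port}",
        f"bind-address = {bind_address}",
        f"datadir = {root}/data",
        f"socket = {root}/mysqld.sock",
        f"pid-file = {root}/mysqld.pid",
        f"log-error = {root}/error.log",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    for name, port in INSTANCES.items():
        # db1-<name>-vip follows the VIP naming used in Step 3.
        print(render_my_cnf(name, port, f"db1-{name}-vip"))
```

Keeping every path under a per-instance root is what lets the three daemons coexist. Since each instance binds its own IP they could arguably all use port 3306; distinct ports are used here only to keep the sketch unambiguous.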

Step 5: Load the DBs and start replicating off the slave
This was a bit tricky. Keep in mind that the slave is running a single MySQL instance and the master is now running 3...each of which will be replicating a subset of the databases on the slave. It wasn't that bad, but I was careful to get it working correctly.
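One way to have each new instance replicate only its own subset is MySQL's `replicate-wild-do-table` option, which filters replicated statements by `database.table` pattern. The sketch below generates those lines per instance. The database names are made up, and whether the real setup used this option or another filter (such as `replicate-do-db`) is an assumption.

```python
# Hypothetical database names; the post does not list the real ones.
DATABASES_BY_INSTANCE = {
    "main": ["drupal"],
    "other": ["groups", "association"],
    "test": ["drupal_test", "scratch"],
}


def replication_filter_lines(instance):
    """Config lines telling one instance which databases to replicate."""
    return [
        f"replicate-wild-do-table = {db}.%"
        for db in DATABASES_BY_INSTANCE[instance]
    ]


def check_disjoint(mapping):
    """Raise if any database is assigned to more than one instance."""
    seen = {}
    for instance, dbs in mapping.items():
        for db in dbs:
            if db in seen:
                raise ValueError(f"{db} is in both {seen[db]} and {instance}")
            seen[db] = instance


if __name__ == "__main__":
    check_disjoint(DATABASES_BY_INSTANCE)
    for instance in DATABASES_BY_INSTANCE:
        print(instance, replication_filter_lines(instance))
```

The disjointness check matters because a database replicated into two instances would silently diverge once writes are pointed at only one of them.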

Step 6: Set up the backend network and iptables
I brought up 3 more private IPs on the backend network between the two database boxes. These are to allow replication between distinct IPs on the backend network, each IP mapped to a MySQL instance. This brings up an interesting point. Let's look at the main MySQL instance. This will be listening on db1-main-vip. But wait...what if db2-main-vip gets failed over to the master server? This will bring up a new IP on the master which that instance has to listen on. Also, we now have a backend IP which that instance has to listen on. Simple enough, right? Ya..no. MySQL can only listen to one IP address (or all IP addresses, bug #14979). So, I had to put some iptables rules in place to rewrite these other IP addresses to one on which the MySQL instance was listening.
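The rewrite described above can be sketched as DNAT rules in the `nat` table: traffic arriving for any of an instance's extra addresses gets its destination changed to the one address that instance is bound to. This is only a guess at the shape of the rules, since the actual ones are not shown in this post; the addresses are documentation placeholders and `dnat_rule` is a hypothetical helper that only builds the command strings.

```python
def dnat_rule(extra_ip, listen_ip, port=3306):
    """Build an iptables command that rewrites extra_ip to listen_ip."""
    return (
        f"iptables -t nat -A PREROUTING -d {extra_ip} -p tcp "
        f"--dport {port} -j DNAT --to-destination {listen_ip}:{port}"
    )


def rules_for_instance(listen_ip, extra_ips, port=3306):
    """One rule per additional address the instance must answer on."""
    return [dnat_rule(ip, listen_ip, port) for ip in extra_ips]


if __name__ == "__main__":
    # Placeholder addresses: the main instance listens on 192.0.2.10
    # (standing in for db1-main-vip) and must also answer on the
    # failed-over peer VIP and on its backend replication address.
    for rule in rules_for_instance("192.0.2.10", ["192.0.2.20", "198.51.100.10"]):
        print(rule)
```

PREROUTING only sees packets arriving from other hosts; connections made from the database box itself to one of the extra addresses would need a matching rule in the `nat` OUTPUT chain.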

Step 7: Start replication and fail back to the MySQL instances on db1
Not much to say here, but I was very happy when I saw it working.

Step 8: Rinse and repeat for db2

Step 9: Profit?

This was a rather long day, but the entire setup is working quite well now. I still need to get cluster monitoring going and there are quite a few scripts I use that need to be ported to the idea of running multiple instances, but I'm finally mildly proud of our DB setup. (and it looks awesome when you diagram it)

Categories: Planet OSL

DrupalCon: Performance Tuning

Sat, 03/01/2008 - 9:38pm

Today, I got sent a draft of the slides I and several others will be working off of at DrupalCon in Boston on Tuesday. This session will cover Performance Tuning for Drupal deployments. It will cover pretty much every aspect, from Linux tuning to opcode caches to MySQL tuning. I am particularly looking forward to David Strauss's section on database design and Scott Mattoon's on DTrace.

Speaking of which, one of the interesting parts of this session is that it's one of the panel sessions where multiple people are cooperating on bringing together a quality presentation. Seeing the slides and knowing the people involved, I'm quite hopeful. They may even make up for my being involved. The presenters are:

  • Khalid Baheyeldin
  • Jeremy Andrews
  • David Strauss
  • Narayan Newton
  • Scott Mattoon
  • Robert Douglass

This is quite a lineup (again with one rather disappointing exception). I'm looking forward to listening from the stage.

Categories: Planet OSL