SHOCKER: Vagrant base box using RPMs

I’ve been using some of the base boxes available from http://www.vagrantbox.es/ as a starting point for lots of Vagrant VMs recently, but came unstuck when the version of puppet on the base box was substantially different from our production environment (2.7 vs 2.6.8 in production). (I was working on alt_gem, an alternate package provider for maintaining gems outside the RVM in use by puppet.)

At first I thought it would be simple enough to downgrade puppet on one of my Vagrant VMs, but then I discovered that nearly all of the CentOS/Red Hat Vagrant boxes install ruby & puppet from tarballs, which frankly is balls. Shouldn’t we be using packages for everything?! (Kris Buytaert says so, so it must be true.)

So instead of ranting, I tweaked an existing veewee CentOS template to install puppet & chef from RPMs: for puppet it uses the official Puppet Labs yum repo, and for chef it uses the FrameOS packages. (I’m a puppet user, so I’ve only tested the puppet side; chef is at least passing the “veewee validate” tests.)

You can grab the box here: https://dl.dropbox.com/u/7196/vagrant/CentOS-56-x64-packages-puppet-2.6.10-chef-0.10.6.box

To use it in your Vagrant config, make sure this is in your Vagrantfile:

  # Every Vagrant virtual environment requires a box to build off of.
  config.vm.box = "CentOS-56-64-packages"

  # The url from where the 'config.vm.box' box will be fetched if it
  # doesn't already exist on the user's system.
  config.vm.box_url = "https://dl.dropbox.com/u/7196/vagrant/CentOS-56-x64-packages-puppet-2.6.10-chef-0.10.6.box"
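
If you'd rather fetch the box up front, you can add it manually and then bring the VM up (a sketch using the standard Vagrant commands of the day; box name & URL as above):

  vagrant box add CentOS-56-64-packages https://dl.dropbox.com/u/7196/vagrant/CentOS-56-x64-packages-puppet-2.6.10-chef-0.10.6.box
  vagrant up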

I’ve sent a pull request to Patrick to get the new template included in veewee, and a pull request to Gareth to get the box listed on www.vagrantbox.es.

Now, time to go back to what I was doing originally before I got sidetracked 🙂

Faking Production – database access

One of our services has been around for a while, a really long time. It used to get developed in production, and there is an awful lot of work involved in making the app self-contained, to where it can be brought up in a VM and run without access to production or some kind of fake supporting environment. There’s lots of stuff hard-coded in the app (database server names/IPs etc.), and there’s a lot of code designed to handle inaccessible database servers in some kind of graceful manner.

We’ve been taking bite-sized chunks out of all of this over the last few years, and we’re now on the home straight.

One of the handy tricks we used to make this application better self-contained was to avoid changing the database access layer (hint: there isn’t one) and instead use iptables to redirect requests for production database servers to either a local empty database schema on the VM, or to shared database servers with realistic amounts of data.

We manage our database pools (master-dbs.example.com, slave-dbs.example.com, other-dataset.example.com etc.) using DNS (PowerDNS with a MySQL back end). In production, if you make a DNS request for master-dbs.example.com, you will get 3+ IPs back, one of which will be in your datacentre, the others in other datacentres; the app has logic for selecting the local DB first and using an offsite DB if there is some kind of connection issue. We also mark databases as offline by prepending the relevant record in MySQL with OUTOF, so that a request for master-dbs.example.com will return only 2 IPs, and a DNS request for OUTOFmaster-dbs.example.com will return any DB servers marked out of service.
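
As a sketch, taking a single back-end server out of the pool looks something like this against the standard PowerDNS gmysql records table (the IP here is made up):

  -- mark one master DB server out of service
  UPDATE records SET name = CONCAT('OUTOF', name)
   WHERE name = 'master-dbs.example.com' AND content = '192.0.2.10';

  -- and bring it back in
  UPDATE records SET name = 'master-dbs.example.com'
   WHERE name = 'OUTOFmaster-dbs.example.com' AND content = '192.0.2.10';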

Why am I telling you all of this? Well, it’s just not very straightforward for us to update a single config file and have the entire app start using a different database server. (Fear not, our production databases aren’t actually accessible from the dev environments.)

But what we can do is easily identify the IP:PORT combinations that an application server will try to connect to. Once we know those, it’s pretty trivial to generate a set of iptables statements that will quietly divert that traffic elsewhere.

Here’s a little Ruby that generates some iptables statements to divert access to remote production databases to local ports, where you can either use ssh port-forwarding to forward on to a shared set of development databases, or run several local empty-schema MySQL instances:

require 'rubygems'
require 'socket'

# map FQDNs to local ports
fqdn_port = Hash.new
fqdn_port['master-dbs.example.com'] = 3311
fqdn_port['slave-dbs.example.com'] = 3312
fqdn_port['other-dataset.example.com'] = 3314

fqdn_port.each do |fqdn, port|
  puts '#'
  puts "# #{fqdn}"

  # addresses for this FQDN
  fqdn_addr = Array.new

  # get the addresses for the FQDN, both in-service and out-of-service
  addr = TCPSocket.gethostbyname(fqdn)
  addr[3, addr.length].each { |ip| fqdn_addr << ip }

  addr = TCPSocket.gethostbyname('OUTOF' + fqdn)
  addr[3, addr.length].each { |ip| fqdn_addr << ip }

  fqdn_addr.each do |ip|
    puts "iptables -t nat -A OUTPUT -p tcp -d #{ip} --dport 3306 -j DNAT --to 127.0.0.1:#{port}"
  end
end

And yes, this only generates the statements; pipe the output into bash if you want the commands actually run. Want to see what it’s going to do? Just run it. Simples.
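
For example, assuming you’ve saved the script as divert-dbs.rb (the filename is mine, pick anything):

  # preview the generated iptables commands
  ruby divert-dbs.rb

  # apply them (run as root; iptables needs it)
  ruby divert-dbs.rb | bash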

The New Toolbox

In days gone by, any computer guy worth his salt had a collection of boot floppies, 5.25″ & 3.5″, containing a mix of MS-DOS, DR-DOS, Tom’s Root Boot & Norton tools. Those days passed, and the next set of essentials was boot CD-Rs containing BartPE, RIPLinux, Knoppix etc. People quickly switched to carrying these tools on USB sticks: smaller, easier to change, and great when the dodgy PC you were trying to breathe life into supported USB booting.

I think there’s a better way, based on the last 3 days of hell spent setting up what should have been identical touchscreen machines (no CD drive, slow USB interfaces).

Your new toolkit is a cheap laptop, with a big hard disk, running the following:

  1. Your favourite Linux distro (I’ve used Ubuntu for this laptop)
  2. tftpd, dhcpd & dnsmasq set up for PXE booting other machines from this laptop (FOG uses dhcpd for all its automatic DHCP magic; use dnsmasq for simple local DNS, required for Unattended)
  3. FOG Cloning System
  4. Unattended Windows 2000/XP/2003 Network Install System
  5. CloneZilla PXE Image (for good measure)
  6. RIPLinux PXE Image

Why?  USB booting still seems troublesome, and installing Windows from flash seems very slow.  Nearly everything supports PXE these days; if it has a built-in ethernet port, it’s pretty much guaranteed to support PXE booting.  There is nothing like the feeling of being able to image a machine into FOG over a 1Gb crossover cable in a matter of minutes.  Got everything working? Image it and walk away, safe in the knowledge that if somebody comes along and breaks things, you can image it back in minutes, instead of having to do another clean install and rebuild all your updates & software on top.

There’s a little bit of pain in getting all of the separate packages to run from the one /tftpboot/pxelinux.cfg/default, but it’s just a matter of careful copy & paste from the canned configs.
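
For what it’s worth, the merged /tftpboot/pxelinux.cfg/default ends up shaped something like this; the labels are mine and the kernel/initrd paths are purely illustrative, so lift the real ones from each project’s canned config:

  DEFAULT menu.c32
  PROMPT 0
  TIMEOUT 100
  MENU TITLE Toolbox PXE menu

  LABEL local
    MENU LABEL Boot local disk
    LOCALBOOT 0

  LABEL fog
    MENU LABEL FOG cloning system
    # illustrative paths, copy the real ones from FOG's canned config
    KERNEL fog/kernel
    APPEND initrd=fog/initrd.gz root=/dev/ram0 rw

  LABEL riplinux
    MENU LABEL RIPLinux rescue
    # illustrative paths, copy the real ones from the RIPLinux PXE bundle
    KERNEL riplinux/kernel
    APPEND initrd=riplinux/rootfs.cgz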

WRR DNS with PowerDNS

I had an interesting challenge at work recently. We have 3 data centres running our applications, and currently the RR DNS system does what it’s supposed to: it spreads the traffic around each of the 3 DCs evenly. This works fine when all of your data centres have a similar capacity. But ours don’t, and that causes problems when your load/traffic gets to the point where one of the DCs can’t cope. Now, there are many expensive and complicated solutions to this; this, however, isn’t one of them. It’s quite simple, it has its weaknesses, but as you’ll see it’s also quite elegant.

Background

Our infrastructure already relies heavily on MySQL replication & PowerDNS; both are installed on all our public machines. Indeed, we have a large MySQL replication loop with many spokes off it, ensuring that all of the MySQL data is available everywhere. PowerDNS is used for both internal & external DNS services, all backed by the MySQL backend on the aforementioned replication loop. This is important to us, as it means this solution required no new software, just some configuration file tweaks & some database table alterations.

Overview

Each record is assigned a weight. This weight will influence the likelihood of that record being returned in a DNS request with multiple A records. A weight of 0 will mean that the record will always be in the set of A records returned. A weight of 100 will mean that the record will never be returned (well, almost never).

Method

  1. Add an extra column to the PowerDNS records table, called weight, this is an integer.
  2. Create a view on the records table that adds random values to each record every time it is retrieved.
  3. Alter the query used to retrieve data from the records table to use the view and filter on the weight and random data to decide if the record should be returned.

This is achieved by using the view to create a random number between 0 and 100 (via rand()*100).

create view recordsr AS select content,ttl,prio,type,domain_id,name, rand()*100 as rv, weight from records;

We use this SQL to add the column:

alter table records add column `weight` int(11) default 0 after change_date;

The random data is then compared against the record weight to decide if the record should be returned in the request. This is done using the following line in the pdns.conf file:

gmysql-any-query=select content,ttl,prio,type,domain_id,name from recordsr where name='%s' and weight < rv order by rv
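
If you want to sanity-check the weight semantics without touching a DNS server, here’s a quick Ruby simulation (mine, not part of the production setup) of that weight < rv filter, using the same weights as the 10,000-query test below:

  # a record is returned when weight < rand()*100, so a weight of w
  # should appear in roughly (100 - w)% of queries
  weights = { 'dc1' => 0, 'dc2' => 40, 'dc3' => 60 }
  samples = 10_000
  hits = Hash.new(0)

  samples.times do
    weights.each do |dc, weight|
      rv = rand * 100              # the view's rand()*100 column
      hits[dc] += 1 if weight < rv # the gmysql-any-query filter
    end
  end

  weights.each do |dc, weight|
    printf("%s: returned in %.1f%% of queries (expected %d%%)\n",
           dc, hits[dc] * 100.0 / samples, 100 - weight)
  end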

For small sample sets (like the 1,000-query run below), the results are quite poor & the method proves inaccurate, but for larger sets, 10,000 and above, the accuracy improves greatly.  I’ve written some scripts to perform some analysis against the database server & against the DNS server itself.  To test the DNS server, I set cache-ttl=1 and no-shuffle=on in pdns.conf, and with cache-ttl=1 in place I waited 1.1 seconds between DNS queries.
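
In pdns.conf terms, that test setup was just these two settings (alongside the gmysql-any-query above):

  cache-ttl=1
  no-shuffle=on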

Here are some results; sample-pdns.pl was used to gather this data:

Sample Size = 1,000

#### WRR DNS Results
dc1: 462, 46.2% (sample size), 23.38% (total RR)
dc2: 514, 51.4% (sample size), 26.01% (total RR)
dc3: 1000, 100% (sample size), 50.60% (total RR)
total_hits: 1976, 197.6% (sample size), 100% (total RR)

Desired priorities were (weight/100, expected appearance rate):
dc1 20/100, 80%
dc2 50/100, 50%
dc3 0/100, 100%

Sample Size = 10,000

#### WRR DNS Results
dc1: 10000, 100% (sample size), 50.57% (total RR)
dc2: 5821, 58.21% (sample size), 29.43% (total RR)
dc3: 3952, 39.52% (sample size), 19.98% (total RR)

pos-1-dc1: 5869, 58.69% (sample size), 29.68% (total RR)
pos-1-dc2: 2509, 25.09% (sample size), 12.68% (total RR)
pos-1-dc3: 1622, 16.22% (sample size), 8.20% (total RR)
pos-2-dc1: 3332, 33.32% (sample size), 16.85% (total RR)
pos-2-dc2: 2548, 25.48% (sample size), 12.88% (total RR)
pos-2-dc3: 1540, 15.4% (sample size), 7.78% (total RR)
pos-3-dc1: 799, 7.99% (sample size), 4.04% (total RR)
pos-3-dc3: 790, 7.9% (sample size), 3.99% (total RR)
pos-3-dc2: 764, 7.64% (sample size), 3.86% (total RR)

total_hits: 19773, 197.73% (sample size), 100% (total RR)

#### Desired priorities were (weight/100, expected appearance rate):
dc3 60/100, 40%
dc2 40/100, 60%
dc1 0/100, 100%

As you can see, with the larger sample size, the weighting becomes much more apparent.

dc1 appeared in the returned records 100% of the time, as expected, dc2 appeared 58.21% (desired percentage was 60%) and dc3 appeared 39.52% (desired percentage was 40%).

What is possibly more interesting & relevant is the number of times a particular DC appears in the top slot (pos-1) of the returned results, as this is the A record most likely to be used by the client.  dc1 appears in the top slot 58.69% of the time, with dc2 appearing 25.09% and dc3 16.22%.  These results diverge from the desired priorities quite a bit, but still rank in the same order as the desired distribution.

Advantages

  1. No new code/binaries to distribute
  2. Reuse existing infrastructure
  3. Easy to roll back from.

Disadvantages

  1. Fairly coarse-grained control of load balancing (no feedback loop)
  2. At least 1 site should have a weight of 0
  3. No guarantee on the number of records returned by a query (other than records with a weight of 0)
  4. Increased load on the database, generating 1 or more random numbers for each query against the view

jBPM Community Day

Friday 6th June 2008 was the first jBPM Community Day, held in the Guinness Storehouse in Dublin. This is practically on my doorstep, and as we’ve been looking at jBPM for some pilots recently, I couldn’t not go.

The speakers on the day were Tom Baeyens, Joram Barrez, Paul Browne and Koen Aers. It was great to hear that jBPM is being used in all sorts of environments, including some very large projects, and most of all to hear about the direction of the project from the project leaders. It was also good to hear about local take-up in & around Ireland (there were guests from all over Europe, including some Americans based in Budapest).

Tom & the rest of the team are taking their collective experience in BPM and building the Process Virtual Machine (PVM), a state engine that can be used to execute processes described in many different languages, starting with jPDL, with BPEL and Seam PageFlow already on the horizon. The PVM looks set to be the definitive state machine for process management, with plugin interfaces for persistence, task management etc.

It was a great day, many thanks to all of those who contributed to the smooth running & interesting content, and selection of a great venue!

[It’s only just struck me what a great venue it was: making a product that’s as consistently good as Guinness requires clearly documented processes, which soon becomes clear when you take the tour of the Storehouse and see the process involved in taking the raw ingredients and producing something as fine as a smooth pint of Guinness.]

Questions for the jBPM Community/Things I’m going to try and answer over the coming weeks

  • Where’s the absolute beginners guide? [or, as this is in a community, where can I start one and what needs to be in it? :-)]
  • What are the requirements/guidelines on replacing the jbpm-console or integrating functionality into your own app?
  • What are the interface points/techniques in PVM for other languages?
  • Drools/jBPM – what are the integration scenarios?
    • populate Drools with data/beans in a node of a process?
    • do both things operate independently?
  • Integration with authentication systems? (AD/LDAP instead of SQL based accounts)

ssh-vulnkey

There’s a flaw in ssh-vulnkey: it doesn’t always show you the name of the file containing an offending blacklisted key. Here are a couple of ways around this:

For a small machine, inspect the files by hand:

strace ssh-vulnkey -a 2>&1 | grep ^stat64 | grep -v NOENT | cut -d'"' -f2 | sort | uniq | xargs vi

Or, a little longer: use ssh-vulnkey to find all relevant keys & reprocess them, displaying each filename & then the result of ssh-vulnkey for that individual file:

strace ssh-vulnkey -a 2>&1 | grep ^stat64 | grep -v NOENT | cut -d'"' -f2 | sort | uniq | xargs -i bash -c "echo ; echo {} ; ssh-vulnkey {};"

This really is a dirty hack, using strace to extract the files ssh-vulnkey examines and then reprocessing them individually. There are a million ways this could be done better, but not on a single bash line 🙂

Exchange to ICS

I found this post by Ryan Hadley a few days ago and got it working with a little bit of time. I noticed that Thunderbird was displaying all-day events oddly, so I checked the VEVENT info being generated & tweaked it to work correctly with Thunderbird/Lightning. I also dropped in the URL of the event in OWA & fixed it for situations where there are public & private names for the OWA/Exchange instance, handy when you want to go and amend an entry etc.

Hope you find it useful.

Exchange2ICS.tar.gz

The definitive guide to Apache, Subversion & AD LDAP on Debian

After struggling for ages with different guides on LDAP, Apache & Subversion, I found the following guide, and everything just worked after following it. Kudos to Sander.

You can read the article in full here: http://www.jejik.com/articles/2007/06/apache_and_subversion_authentication_with_microsoft_active_directory/

Making Thunderbird more Mutt like

I used to be a big mutt fan, but with the growing amount of HTML mail I receive it became too much of a chore; combined with the fact that IMAP offline support is a bit kludgy (I’ve used both isync & offlineimap), I abandoned mutt some time ago and moved to Thunderbird.

Thunderbird has better offline IMAP support, but it’s very mouse-driven; there are, however, some handy extensions that can make it easier to use from the keyboard. I’m using GMailUI and keyconfig. GMailUI gives you j/k support and single-key archiving (these are UI features lifted from GMail, but we all know they came from vi/mutt originally).

I don’t use Thunderbird’s spam filters, I have a sendmail/cyrus/spamassassin mail setup, and I keep it trained using folders on the IMAP server, this also means that the rest of my family should get the benefit of my spam training.

The short version is that I want to press a single key to move a mail to a certain folder, without invoking Thunderbird’s Junk Mail stuff.

keyconfig to the rescue!

  • Install keyconfig
  • Open the keyconfig menu (Tools, Keyconfig)
  • Click “Add a new key”
  • Call it what you like (mine is MoveToJunk)
  • Enter the following code:
    MsgMoveMessage('imap://simon@mccartney.ie/INBOX/Junk');

    Where simon@mccartney.ie is the name of the IMAP account I’m using and Junk is a subfolder of the Inbox (as far as Cyrus is concerned, everything is a subfolder of the Inbox, unlike other IMAP systems).

  • Click OK
  • Assign a key to it (I use Shift-J, which meant I also had to un-map the existing Junk Mail keys using keyconfig: look for As Junk & As Not Junk and click Reset).
  • Ahhh, keyboard heaven!