I’m Ryan Bowlby

a devops practitioner, mtn biker, hiker, & coffee connoisseur

about me

I’m a devops engineer working to automate away the normal operations tedium. Hacking on something all day with coffee and a multi-day backpacking trip sound equally appealing.

Here I ramble about infrastructure as code, systems architecture, configuration management, scaling, and security.

Resolve Test Kitchen SSH Failures

The kitchen-ec2 driver for Test Kitchen works well. I’m using it to run Chef cookbook integration tests within our Jenkins build jobs. Unfortunately, EC2 instances have the nasty habit of starting SSH and accepting connections before the SSH user account is fully provisioned and configured.

Alas, when the EC2 driver’s wait loop sees that SSH is up, it immediately attempts to log in to the instance. This often fails because the user is not fully configured yet.

E, [2014-12-04T23:01:56.379896 #5034] ERROR -- default-centos-65: ------Nested Exception-------
E, [2014-12-04T23:01:56.379946 #5034] ERROR -- default-centos-65: Class: Net::SSH::AuthenticationFailed
E, [2014-12-04T23:01:56.379995 #5034] ERROR -- default-centos-65: Message: Authentication failed for user root@SOMEIP

The code that authenticates to the host lives in the test-kitchen codebase. It catches several exceptions and retries the SSH connection when one of them occurs. Unfortunately, the one thrown in this case, Net::SSH::AuthenticationFailed, is not among them.

Why would you ever retry SSH when the failure is auth related? An authentication failure is not a state you would normally expect to change on its own. However, that is exactly what happens while an EC2 instance is still provisioning.
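The retry-on-a-known-list-of-exceptions pattern can be sketched in isolation. This is a minimal illustration, not test-kitchen’s code: `with_retries`, its `attempts` and `delay` parameters, and the `FakeAuthenticationFailed` class are hypothetical names invented here so the example runs without the net-ssh gem.

```ruby
# Hypothetical stand-in for Net::SSH::AuthenticationFailed, so this sketch
# runs without the net-ssh gem installed.
class FakeAuthenticationFailed < StandardError; end

# Run the block, retrying when it raises one of the listed exception classes.
# `attempts` is the total number of tries and `delay` is the pause in seconds
# between tries. Both parameter names are assumptions made for this sketch.
def with_retries(exceptions, attempts: 20, delay: 20)
  remaining = attempts
  begin
    yield
  rescue *exceptions
    remaining -= 1
    raise if remaining <= 0 # out of attempts: re-raise the last error
    sleep(delay)
    retry
  end
end

# Simulate an instance whose user account only becomes ready on the third try.
calls = 0
result = with_retries([FakeAuthenticationFailed, Errno::ECONNREFUSED],
                      attempts: 5, delay: 0) do
  calls += 1
  raise FakeAuthenticationFailed, "user not provisioned yet" if calls < 3
  :connected
end
```

Any exception class that is not in the list propagates immediately, which is why an authentication failure that is missing from the list aborts the run on the first attempt.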

There is a pull request that I hope gets merged sooner rather than later. In the interim I’ve added a nasty monkey patch of the establish_connection method. Add this cringeworthy patch, the size of a jumbo jet, to your Kitchen CI Rakefile like so:

cringeworthy horror
# monkey patch the shit out of kitchen, because amazon ec2 instances suck.
# Adds retries when Auth fails by rescuing Net::SSH::AuthenticationFailed.
require 'kitchen'
require 'kitchen/ssh'
require 'net/ssh'

module Kitchen
  class SSH
    def establish_connection
      logger.info("[SSH] inside monkey patched establish_connection method")
      rescue_exceptions = [
        Errno::EACCES, Errno::EADDRINUSE, Errno::ECONNREFUSED,
        Errno::ECONNRESET, Errno::ENETUNREACH, Errno::EHOSTUNREACH,
        Net::SSH::Disconnect, Net::SSH::AuthenticationFailed ]
      retries = 20

      begin
        logger.debug("[SSH] opening connection to #{self}")
        Net::SSH.start(hostname, username, options)
      rescue *rescue_exceptions => e
        if (retries -= 1) > 0
          logger.info("[SSH] connection failed, retrying (#{e.inspect})")
          sleep(20)
          retry
        else
          logger.warn("[SSH] connection failed, terminating (#{e.inspect})")
          raise
        end
      end
    end
  end
end

Just don’t tell your Mom you got it from me.

written in chef, kitchen-ec2

Terraform Review

Let me preface this review with how ecstatic I am that a tool like Terraform is finally here! I spent the last few hours putting together a set of Terraform configuration files. The result is the creation, on AWS, of the entire development stack for a backend service at the company I work for. I’m going to expound on what I believe to be the benefits and drawbacks of Terraform.

VMware “Management Network” Failover

Just a quick post demonstrating how to fail over the VMware management network to a second virtual switch. You probably want the “Management Network” traffic for your ESXi hosts to be separate from your VM traffic. Unfortunately, you don’t always have enough network capacity for two uplinks on each virtual switch. Below is a script that moves the “Management Network” to a second vSwitch when the uplink of the primary vSwitch becomes unavailable.

written in InfoTech

Hyperic - Scripting Removal of Server Resources

By default the Hyperic agent will autodiscover sendmail and NTP server resources, so quite a few of the platforms in your Hyperic installation are likely monitoring them. In many cases you don’t care about monitoring these resources and would rather improve Hyperic performance by removing them. They also tend to fill up the auto-discovery screen and become a nuisance.

written in General, InfoTech

Nagios - Mitigating False Positives

A common issue when monitoring thousands of services is dealing with intermittent issues and “false positives” clogging up the status page. Often, when a check fails and then clears on its own, the operations staff deems the issue a “false positive”. What’s more likely is that an actual issue was briefly observed but was merely intermittent in nature (a true positive). In a perfect world, when a service fails, even for a moment, you would perform root cause analysis and resolve the issue. In the real world, when a service check fails, the operations staff waits to see if the alert clears without intervention. How long they wait is determined by how often things show up in monitoring and clear on their own (aka flapping). The more often things alert and clear without intervention, the longer the NOC is going to wait before investigating a possible issue.

written in General, InfoTech

VIM as a Python IDE

I recently began scripting in Python using VIM, my editor of choice. In what became a failing effort to keep my sanity, I forewent customization of the VIM settings on my personal machines. You see, I’m often tasked with editing files on servers whose VIM settings I can’t customize. I feared that if I became overly accustomed to any custom settings, I’d likely blurt obscenities when forced to use a vanilla VIM.

Without some tweaking of my vimrc I end up having to manually indent code in Python. Talk about a loss of productivity: having to use the space bar to indent Python code is the surest path to insanity. Mimicking the mindless repetition that’s better suited to steam-powered machinery is a less than efficient use of my time. I’ve since admitted defeat and tailored my VIM settings to Python. I may occasionally blurt an obscenity when using VIM on somebody else’s machine, but it’s a calculated loss. Below is a breakdown of my VIM settings. I hope others will find it useful.

written in InfoTech