I’m Ryan Bowlby

a devops practitioner, mtn biker, hiker, & coffee connoisseur

about me

I’m a devops engineer working to automate away the normal operations tedium. Hacking on something all day with coffee and a multi-day backpacking trip sound equally appealing.

Here I ramble about infrastructure as code, systems architecture, configuration management, scaling, and security.

Terraform Review

Let me preface this review with how ecstatic I am that a tool like Terraform is finally here! I spent the last few hours putting together a set of Terraform configuration files. The result is the creation, on AWS, of the entire development stack for a backend service at the company I work for. I’m going to expound on what I believe to be the benefits and drawbacks of Terraform.

written Read on →

VMware “Management Network” Failover

Just a quick post demonstrating how to fail over the VMware management network to a second virtual switch. You probably want the “Management Network” traffic for your ESXi hosts to be separate from your VM traffic. Unfortunately, you don’t always have enough network capacity to give each virtual switch two uplinks. Below is a script that moves the “Management Network” to a second vSwitch when the uplink of the primary vSwitch becomes unavailable.

written in InfoTech Read on →

Hyperic - Scripting Removal of Server Resources

By default the Hyperic agent will autodiscover sendmail and NTP server resources, so quite a few of the platforms in your Hyperic installation will likely be monitoring them. In many cases you don’t care about monitoring these resources and would rather improve Hyperic performance by removing them. They will also likely fill up the auto-discovery screen and become a nuisance.

written in General, InfoTech Read on →

Nagios - Mitigating False Positives

A common issue when monitoring thousands of services is dealing with intermittent issues and “false positives” clogging up the status page. Often, when checks fail and then clear on their own, the operations staff deems the issue a “false positive”. What’s more likely is that an actual issue was briefly observed but was merely intermittent in nature (a true positive). In a perfect world, when a service fails, even for a moment, you would perform root cause analysis and resolve the issue. In the real world, when a service check fails the operations staff waits to see if the alert clears without intervention. How long they wait is determined by how often things show up in monitoring and clear on their own (aka flapping). The more often things alert and clear without need for intervention, the longer the NOC is going to postpone investigating a possible issue.
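To make “how often things alert and clear on their own” concrete, here is a minimal sketch that measures it as the percentage of consecutive check results that differ. The function name and the sample history are hypothetical, and this is a deliberately simplified, unweighted calculation; Nagios’s built-in flap detection is more involved.

```python
def percent_state_change(history):
    """Return the percentage of consecutive check results that differ.

    `history` is an ordered list of check states, oldest first.
    This is a simplified, unweighted illustration, not Nagios's own algorithm.
    """
    if len(history) < 2:
        return 0.0
    # Count every position where the state differs from the previous check.
    changes = sum(1 for prev, cur in zip(history, history[1:]) if prev != cur)
    return 100.0 * changes / (len(history) - 1)


# Hypothetical history: one brief failure that cleared without intervention.
sample = ["OK", "CRITICAL", "OK", "OK", "OK"]
print(percent_state_change(sample))
```

A service with a high percentage here is exactly the kind that trains the NOC to wait before investigating.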

written in General, InfoTech Read on →

VIM as a Python IDE

I recently began scripting in Python using VIM, my editor of choice. In what became a failing effort to keep my sanity, I forwent customization of the VIM settings on my personal machines. You see, I’m often tasked with editing files on servers whose VIM settings I can’t customize. I feared that if I became overly accustomed to any custom settings I’d likely blurt obscenities when forced to use a vanilla VIM.

Without some tweaking of my vimrc I end up having to manually indent code in Python. Talk about a loss of productivity: having to use the space bar to indent Python code is the surest path to insanity. Mimicking the mindless repetition that’s better suited to steam-powered machinery is a less than efficient use of my time. I’ve since admitted defeat and tailored my VIM settings to Python. I may occasionally blurt an obscenity when using VIM on somebody else’s machine, but it’s a calculated loss. Below is a breakdown of my VIM settings. I hope others will find it useful.

written in InfoTech Read on →

Nagios Plugin: Check_dell

Just finished a Python script to check Dell hardware components via the omreport utility. It’s designed to be used client-side via NRPE (or check_by_ssh). Additional usage information can be found in the script’s docstrings as well as via the --help option. Some gotchas:

  • In some instances NRPE will not execute scripts that start with #!/usr/bin/env. In these instances you will need to specify the full path to python.

  • The plugin expects a symlink to omreport in /usr/sbin; you may need to add one if the OMSA install script didn’t. I hard-coded the path because relying on the shell environment’s PATH variable is a security concern, especially in cases where the plugin is setuid root or called via sudo.

  • When starting OMSA use srv-admin.sh start on Redhat-based systems or /etc/init.d/dataeng start on Debian-based systems. The order in which the services start is crucial. The necessary device drivers must be loaded prior to the loading of the IPMI module.
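The hard-coded path point above can be sketched as follows. This is not the plugin’s actual code; the function names, the fixed PATH value, and the example arguments are assumptions made for illustration. The idea is simply to refuse anything but an absolute path and to run it without a shell, so the caller’s PATH never decides which binary executes.

```python
import os
import subprocess

# Assumed location, based on the symlink the post says the plugin expects.
OMREPORT = "/usr/sbin/omreport"


def build_command(args, binary=OMREPORT):
    """Return an argv list that always starts with an absolute binary path."""
    if not os.path.isabs(binary):
        raise ValueError("binary must be an absolute path: %r" % binary)
    return [binary] + list(args)


def run_omreport(args, binary=OMREPORT):
    """Run the binary with no shell and a fixed, minimal environment."""
    cmd = build_command(args, binary)
    # No shell is involved and the environment is set explicitly, so the
    # invoking user's PATH is never consulted to locate the executable.
    result = subprocess.run(
        cmd,
        env={"PATH": "/usr/sbin:/usr/bin:/sbin:/bin"},  # hypothetical fixed PATH
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.returncode, result.stdout


# Illustrative arguments only; nothing is executed here.
print(build_command(["chassis", "fans"]))
```

The same reasoning applies to the interpreter line: a shebang with a full path to python avoids a PATH lookup through env.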

written in General, InfoTech Read on →

Bonjour Isn’t Evil, But..

Just finished watching a Google Tech Talk on Bonjour presented by Dr. Stuart Cheshire. It’s a very simple introduction to Apple’s implementation of zeroconf. Bonjour aka Zeroconf aka Avahi isn’t the evil I thought it was, and I don’t know why I assumed it was evil. I guess it’s a mixture of hating that Avahi is on by default in most RH-based distros and my misconception that Bonjour was AppleTalk rebranded. AppleTalk had a reputation for being chatty, so I just assumed Bonjour inherited that gene.

Truth is, Bonjour doesn’t introduce any non-standard whiz-bang protocols or “chatty” communications into the LAN. It’s simple multicast mixed with creative use of DNS PTR and SRV records. It uses some of the same tactics as ARP, with every device updating itself based on the requests and replies it overhears from other devices (nothing too surprising there).
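To show how ordinary the “creative use of DNS” is, here is a rough sketch that builds a plain DNS question asking for PTR records of a service type. The helper names and the example service type _ssh._tcp.local are chosen for illustration; the sketch only constructs the packet bytes and does not send anything.

```python
import struct

TYPE_PTR = 12   # DNS record type: pointer
TYPE_SRV = 33   # DNS record type: service location
CLASS_IN = 1    # DNS class: Internet


def encode_name(name):
    """Encode a dotted name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) < 64:
            raise ValueError("bad label: %r" % label)
        out += struct.pack("B", len(raw)) + raw
    return out + b"\x00"


def build_query(name, qtype=TYPE_PTR):
    """Build a one-question DNS query: 12-byte header plus the question."""
    # Header fields: id, flags, qdcount, ancount, nscount, arcount.
    header = struct.pack("!6H", 0, 0, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!2H", qtype, CLASS_IN)
    return header + question


# Example service type, chosen for illustration.
packet = build_query("_ssh._tcp.local")
print(len(packet), packet)
```

Sent over UDP to the multicast DNS group 224.0.0.251 on port 5353, a question like this is answered with PTR records naming service instances, and each instance’s SRV record carries the host and port.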

Don’t go getting the impression I posted this just to evangelize the obvious utility or practicality of zeroconf. It’s one of those technologies that’s implicitly trusting of the local network. With today’s ubiquitous use of wifi, often public wifi, that’s a major fault. All OSes have these local network technologies that operate on the assumption that “all devices are inherently good”. Then they ambivalently choose something like Kerberos with Active Directory, where Kerberos believes that all networks are inherently evil, to the point where it doesn’t even trust the network enough to send an encrypted hash of a user’s password. Apple then blends these contradictions into one tremendously wrongheaded practice of allowing the local DHCP server to specify the primary domain controller for use when authenticating on your local system. Read that last sentence again for effect. Let’s remember that next time “it just works”.

System Pref –> Accounts –> Login Options –> Join –> Search Policy –> change from “Automatic” to “Local Directory” –> commence acting like you knew.

written in InfoTech