How to update your server's BIOS across the network

Posted by Tom Moertel Thu, 03 Mar 2011 22:45:00 GMT

Today at work, I had to upgrade the BIOS on some new Dell servers that had just arrived. (Dell, conveniently, shipped them with firmware six months out of date.)

Dell’s Linux-based BIOS updater, not entirely shockingly, didn’t work. After I had installed its prerequisites and let it run (repeatedly), it churned for a while and then gave up, offering only this:

The update failed to complete.

How helpful.

So, I decided to fall back to the tried-and-true option: the simple, DOS-based BIOS updater that Dell provides. The trick is that the updater does not contain DOS: You have to figure out how to get your servers to boot from a DOS floppy and then run the BIOS updater.

Did I mention that the servers lack floppy and CD-ROM drives?

Having been down this road before, I knew to get a FreeDOS disk image and then add the BIOS updater to it:

  1. Download the DOS-based firmware updater.
  2. Download a 2.88-MB FreeDOS floppy image with enough free space for the updater. (I used the dosdisk288.img image from the biosdisk project; see below.)
  3. Mount the image, copy the updater to it, and then unmount it. (See these instructions for more.)
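
Step 3 can be sketched as a small shell function. This is only a sketch under assumptions: the function name, the DRY_RUN switch, and the default mount point are illustrative and not part of the original procedure, and BIOS.EXE stands in for whatever the real updater file is called. Loop-mounting an image normally requires root.

```bash
#!/bin/bash
# Sketch: copy a DOS-based updater onto a FAT floppy image via a loop mount.
# Assumptions (not from the original post): the function name, the DRY_RUN
# switch, and the default mount point. Loop mounts normally need root.

add_updater_to_image() {
    local image=$1 updater=$2
    local mnt=${3:-/mnt/dosdisk}    # hypothetical mount point
    local run=""
    # With DRY_RUN=1, print each command instead of executing it.
    [ "${DRY_RUN:-0}" = "1" ] && run="echo"

    $run mkdir -p "$mnt" || return
    $run mount -o loop "$image" "$mnt" || return
    $run cp "$updater" "$mnt"/
    local status=$?                 # remember whether the copy succeeded
    $run umount "$mnt"              # always unmount, even if the copy failed
    return $status
}
```

Running `DRY_RUN=1 add_updater_to_image dosdisk288.img BIOS.EXE` prints the four commands without touching the system, which is a cheap way to check the paths before running it for real as root.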

Now I had a bootable floppy-disk image dosdisk288.img that contained the BIOS updater. Next, I needed some way to boot the system using that image. Since the servers had no floppy or CD-ROM drives, the next trick was making the image network-bootable.

For this, I used PXE and cobbler, an install server. We already run cobbler on provisioning servers sprinkled throughout our server farms, so it was easy to put the image on the network:

mv /tmp/dosdisk288.img /srv/dell-PER415-firmware/PER415-010203.img

cobbler distro add \
  --name=Dell-PER415-fw-010203 \
  --kernel=/usr/lib/syslinux/memdisk \
  --initrd=/srv/dell-PER415-firmware/PER415-010203.img

cobbler profile add \
  --name=Dell-PER415-fw-010203-installer \
  --distro=Dell-PER415-fw-010203

cobbler sync

The “distro add” command was where the magic happened. It told cobbler to create a fake Linux distribution whose kernel is memdisk and whose initrd is my floppy image. Memdisk, part of SYSLINUX, is not a real kernel: it masquerades as one and boots the floppy image supplied as the initial ramdisk (initrd).
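
To make the kernel/initrd pairing concrete, here is a hand-written sketch of the kind of PXELINUX menu stanza that boots a floppy image through memdisk. It is not cobbler's actual generated output, which may differ; the function name is made up for illustration, and the image path is assumed to be relative to the TFTP root.

```bash
#!/bin/bash
# Sketch: print a PXELINUX stanza that boots a floppy image via memdisk.
# Hand-written for illustration; cobbler writes its own configuration
# during "cobbler sync", and its exact output may differ.

memdisk_stanza() {
    local label=$1 image=$2    # image path is assumed relative to the TFTP root
    cat <<EOF
LABEL $label
    KERNEL memdisk
    APPEND initrd=$image
EOF
}

memdisk_stanza "Dell-PER415-fw-010203-installer" "PER415-010203.img"
```

The point of the stanza is that the boot loader sees nothing unusual: a "kernel" plus an initrd. Only memdisk knows that the initrd is really a disk image to be booted.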

After the cobbler sync completed, I was able to boot the servers from the network by selecting the “Dell-PER415-fw-010203-installer” item from the PXE boot menu. A few moments later, I was at a DOS prompt. From there, I just ran the BIOS updater, and I was done!

For reference, here are the tools and documentation I used to accomplish my mission:

  • Ubuntu Dell BIOS – a handy page on the Ubuntu wiki discussing schemes for updating the BIOS on Dell systems
  • Dell biosdisk – an unofficial project at Dell for building bootable floppies and images for installing BIOS updates
  • MEMDISK module, part of SYSLINUX – it allows you to boot floppy and disk images from Linux bootloaders; it pretends to be a Linux kernel and boots the image given as its initrd kernel argument

Two handy alternatives to the top command: htop and atop

Posted by Tom Moertel Thu, 10 Jun 2010 15:20:00 GMT

I recently started using two handy variants of top, the standard Unix tool for monitoring what’s going on with processes on a system.

The first, htop, uses ncurses to provide a more interactive process-viewing experience. You can surf through running processes, scrolling horizontally and vertically to reveal information that would otherwise have been clipped, information such as full command lines. Further, you can drive a cursor to select processes for commands like kill and lsof to act upon. Yes, you can see what files a process has open; you can even trace processes using strace. There’s also a handy tree view for understanding process ancestry.

The second top alternative, atop, offers more accurate accounting of processes and resource usage. It tracks all processes, even those that have lived out their brief lives between atop’s screen updates. This comprehensive accounting is helpful for understanding problems caused by herds of individually short-lived processes. With the old top, you might catch only a few of the processes in the act, but with atop, you can see the herd for what it is.

Two handy tools – check them out.

A couple of tips for writing Puppet manifests

Posted by Tom Moertel Thu, 15 Nov 2007 07:30:00 GMT

I recently started using Puppet to automate my server-build processes. The basic idea behind Puppet is that you create “manifests” that declare a directed graph of “resources” that represents the desired state of your machines. Puppet-managed machines on your network then query a master server to obtain the latest copy of the graph, which they then reconcile with their current states to make whatever changes are necessary to bring themselves up to date.

For the most part, everything works well. I have encountered a couple of snags when writing manifests, however, so I’m going to explain them here as a reminder until I get the time to fix them in the Puppet code and send patches upstream.

First, don’t use hyphens in class names. While hyphens are legal in class names, they are not allowed in qualified variables, so variables defined within hyphen-named classes are inaccessible from the outside world.

Second, and this one is both tricky and important, Puppet handles prerequisites for definitions by silently passing those prerequisites on to all of the resources within the definitions. Definitions, in effect, don’t really have their own prerequisites; they just pass them on to their children. But – and here’s the problem – if those child resources declare their own prerequisites, those prerequisites will overwrite the passed-on prerequisites, effectively causing them to be ignored.

This problem bit me hard when trying to create a definition for installing Ruby Gems from a local cache of gems:

define local_gem($gem) {
    $path = "/var/local/local-gems/$gem" 
    file { $path:
        ensure  => present,
        source  => "puppet://puppet/files/gems/$gem",
        require => File["local-gems-dir"],
        owner   => root,
        group   => root,
        mode    => 0664,
    }
    package { $title:
        ensure   => installed,
        provider => "gem",
        require  => [ Package["rubygems"], File[$path] ],
        source   => $path,
    }
}

The intent was to be able to declare a local gem like so:

local_gem { "sqlite3-ruby":
    gem     => "sqlite3-ruby-1.2.1.gem",
    require => Package["sqlite-devel"]
}

Thus the “sqlite3-ruby” local gem has the single prerequisite of the “sqlite-devel” package – or at least that’s what I expected. What happened on deployment was that the prerequisite was ignored because when it was passed on to the inner file and package resources, those resources had their own require parameters, and those parameters overwrote the passed-on prerequisite.

The work-around is somewhat hacky. I augmented the definition with a do-nothing resource that has no require parameter of its own. This resource does nothing but capture the passed-on prerequisites. Then I made all of the other resources in the definition include the do-nothing resource as one of their prerequisites. Thus they are made to inherit the passed-on prerequisites.

My final definition looks like this:

define local_gem($gem) {

    # dummy exec to propagate requires from local_gem
    exec { $name: command => "/bin/true" }

    $path = "/var/local/local-gems/$gem" 
    file { $path:
        ensure  => present,
        source  => "puppet://puppet/files/gems/$gem",
        require => [ Exec[$name], File["local-gems-dir"] ],
        owner   => root,
        group   => root,
        mode    => 0664,
    }
    package { $title:
        ensure   => installed,
        provider => "gem",
        require  => [ Exec[$name], Package["rubygems"], File[$path] ],
        source   => $path,
    }
}

Notice how the file and package resources both require the dummy exec resource. That’s the trick that allows them to require the prerequisites passed on from the local_gem definition.

It’s not pretty, but it works. See this email on the puppet-users mailing list for more on the problem.

Typo-4.0.3 instability and a minor patch for sqlite3-ruby

Posted by Tom Moertel Thu, 24 Aug 2006 04:41:00 GMT

Since I upgraded my blog from Typo 4.0.0 to 4.0.3, it has been somewhat unstable. About once a day it starts responding with “500 Internal Server Error” and stays that way until I restart it.

The root of the problem seems to be the database connection, as evidenced by this exception showing up in the production log:

SQLite3::CantOpenException (could not open database)

Unfortunately, the exception doesn’t provide anything specific to go on.

A quick look at the sqlite3-ruby code suggested that I was not going to get the specifics, either. The Ruby-based wrapper never calls sqlite3_errmsg after a call to sqlite3_open fails on behalf of SQLite3::Database.new.

A quick patch, however, fixed the problem:

--- sqlite3-ruby-1.1.0.orig/lib/sqlite3/database.rb
+++ sqlite3-ruby-1.1.0/lib/sqlite3/database.rb
@@ -109,7 +109,7 @@
       @statement_factory = options[:statement_factory] || Statement

       result, @handle = @driver.open( file_name, utf16 )
-      Error.check( result, nil, "could not open database" )
+      Error.check( result, self, "could not open database" )

       @closed = false
       @results_as_hash = options.fetch(:results_as_hash,false)

(Submitted as Ticket 5504 on RubyForge.)

Before applying the patch, opening a database at a nonexistent path results in a generic error message:

$ ruby -r rubygems -e 'require_gem "sqlite3-ruby";
    SQLite3::Database.new("/no/such/path/db")'

... could not open database (SQLite3::CantOpenException) ...

After applying the patch, we get additional error information:

... could not open database: unable to open database file
    (SQLite3::CantOpenException) ...

With the patch in place, all I have to do is wait for Typo to start acting up again. Then I’ll have some interesting information in the log.

Until then, I’m relying on cron and a short monitoring script to restart Typo when it tips into foolishness:

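#!/bin/bash
# Restart Typo and send mail if the admin page does not answer "200 OK".

url=http://blog.moertel.com/admin
addrs=tom@moertel.com

# GET is lwp-request: -s prints the response status, -d discards the body.
response=$(GET -sd "$url" 2>&1)

if [ "$response" != "200 OK" ]; then
    { echo "Response was: $response"; echo; service typo restart; } |
    mail -s "Blog site not responding! (Restarting)" "$addrs"
fi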

We’ll see how it goes.

Update: That was fast. The error popped up again and this time the log told me something useful: “unable to open database file.” Now, why couldn’t Typo open the database file, especially since the file is perfectly fine and had been opened successfully (many times) by the very same Typo process earlier? Here’s a hint:

$ ls /proc/28788/fd | wc -l
1023

Seems like there’s a resource leak in Typo 4.0.3 (or Rails 1.1.6). Under some conditions, instead of reusing existing database connections, Typo keeps trying to open new ones. Eventually, it uses up its allotment of file descriptors and the operating system is forced to say, “That’s enough, pal,” (EMFILE).
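
A quick way to watch for this kind of leak is to compare a process's open descriptors with its limit. The following is a minimal, Linux-only sketch; the function names are illustrative, and it assumes a kernel that exposes /proc/<pid>/limits and enough permission to read the target process's /proc entries.

```bash
#!/bin/bash
# Sketch: compare a process's open file descriptors with its soft limit.
# Linux-only (reads /proc); the function names are illustrative.

fd_count() {
    # Each entry in /proc/<pid>/fd is one open descriptor.
    ls "/proc/$1/fd" 2>/dev/null | wc -l
}

fd_limit() {
    # "Max open files" row of /proc/<pid>/limits; field 4 is the soft limit.
    awk '/^Max open files/ { print $4 }' "/proc/$1/limits"
}

pid=${1:-$$}    # default to the current shell when no PID is given
echo "pid $pid: $(fd_count "$pid") of $(fd_limit "$pid") descriptors in use"
```

A count that keeps climbing toward the limit between runs, rather than hovering around a steady value, is the signature of a descriptor leak.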

I’ll look into it more in the morning.

Update 2: Problem solved.

How to make sure your servers come back up after an extended power outage

Posted by Tom Moertel Wed, 09 Aug 2006 04:35:00 GMT

If an extended power outage drains your UPS, and your servers are forced to shut down, will they automatically start up again when the power is eventually restored? It’s a good question, especially if your servers are in some distant, unattended server room. Unless you’ve tested your servers, don’t assume that the answer is Yes.

Many servers offer a BIOS configuration option that forces them to automatically power on when they receive line voltage. If your servers have this option, just set it and you’re done.

Unfortunately, some servers, including a Dell PowerEdge 1600SC that I’m using, lack this configuration option. When these servers turn themselves off as the final step of a UPS-controlled shutdown, they don’t start up again when the power is restored. Because they were shut down before the power was cut off, they think they are supposed to remain off when the power is restored. That is, they remember their on/off status across power outages.

Fortunately, there is a way to make sure these servers automatically power on: shut them down without powering them off; halt them instead. That way, when the UPS finally cuts off the supply voltage, the servers will still be in their “on” state, and they will remember this state across the outage. Later, when the power is restored, the servers will automatically restore their pre-outage state and power up.

With Fedora Core Linux and Network UPS Tools, it’s not difficult to make sure the servers are halted instead of powered off, but the implementation isn’t obvious. To spare you the digging, here are the important bits.

  1. When the power fails and the UPS-monitoring software decides that the batteries are almost depleted, it will initiate a server shutdown using the command defined in the /etc/ups/upsmon.conf file. The default command is this:
    SHUTDOWNCMD "/sbin/shutdown -h +0" 
    
  2. The shutdown command will tell the init process to enter runlevel 0, which is the prepare-to-halt-the-system runlevel.
  3. The init process will stop all of the running services in an orderly fashion, and then, as the last step, invoke the final script in the shutdown process: /etc/rc.d/rc0.d/S01halt.
  4. The final lines of the S01halt script will power off the server. Unless, that is, the file /halt is present, in which case the script will halt the server instead.
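
The decision in step 4 boils down to a single file test. The function below is a paraphrased sketch of that logic, not the actual S01halt code; the function name and the optional root-directory argument (added so the logic can be tried in a scratch directory instead of /) are assumptions.

```bash
#!/bin/bash
# Sketch: choose between halting and powering off based on a flag file.
# Paraphrases the behavior described in step 4; not the real S01halt code.

final_action() {
    local root=${1:-}           # hypothetical prefix, for trying this in a scratch dir
    if [ -f "$root/halt" ]; then
        echo "halt"             # stop the OS but leave the power on
    else
        echo "poweroff"         # stop the OS and cut the power
    fi
}

final_action    # with no argument this inspects /halt on the running system
```

The sketch only reports which action would be taken; it never halts or powers off anything itself.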

Thus the trick is to make sure that the /halt file exists when the shutdown runs. That turns out to be easy to pull off. Just redefine the shutdown command in /etc/ups/upsmon.conf:

SHUTDOWNCMD "/bin/touch /halt; /sbin/shutdown -h +0" 

And that’s all there is to it!
