Posted by Tom Moertel
Thu, 03 Mar 2011 22:45:00 GMT
Today at work, I had to upgrade the BIOS on some new Dell servers that
had just arrived. (Dell, conveniently, shipped them with firmware six
months out of date.)
Dell’s Linux-based BIOS updater, not entirely shockingly, didn’t
work. After I had installed its prerequisites and let it run
(repeatedly), it churned for a while and then gave up, offering only
this:
The update failed to complete.
How helpful.
So, I decided to fall back to the tried-and-true option: the simple,
DOS-based BIOS updater that Dell provides. The trick is that the
updater does not contain DOS: You have to figure out how to get your
servers to boot from a DOS floppy and then run the BIOS updater.
Did I mention that the servers lack floppy and CD-ROM drives?
Having been down this road before, I knew to get a FreeDOS disk image and then add the BIOS updater to it:
- Download the DOS-based firmware updater.
- Download a 2.88-MB FreeDOS floppy image with enough free space for the updater. (I used the dosdisk288.img image from the biosdisk project; see below.)
- Mount the image, copy the updater to it, and then unmount it. (See these instructions for more.)
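The three steps above can be scripted. Below is a minimal Python sketch. It assumes a Linux host with loop-device support and root privileges. The image name dosdisk288.img comes from the post, but the updater name UPDATE.EXE and the mount point /mnt/floppy are hypothetical placeholders. By default it only prints the commands (a dry run).

```python
import subprocess
from typing import List


def floppy_copy_commands(image: str, updater: str, mount_point: str) -> List[List[str]]:
    """Return the mount/copy/unmount commands that add `updater` to a FAT floppy image."""
    return [
        ["mkdir", "-p", mount_point],
        # Loop-mount the image so its FAT filesystem appears as a directory.
        ["mount", "-o", "loop", image, mount_point],
        ["cp", updater, mount_point + "/"],
        # Unmounting flushes the changes back into the image file.
        ["umount", mount_point],
    ]


def run(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False (needs root)."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical names: substitute the updater you actually downloaded.
    run(floppy_copy_commands("dosdisk288.img", "UPDATE.EXE", "/mnt/floppy"))
```

Keeping the command list separate from its execution makes it easy to eyeball what will run before handing it root.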
Now I had a bootable floppy-disk image dosdisk288.img that contained
the BIOS updater. Next, I needed some way to boot the system using
that image. Since the servers had no floppy or CD-ROM drives, the
next trick was making the image network-bootable.
For this, I used PXE and cobbler, an install server. We already run cobbler on provisioning servers sprinkled throughout our server farms, so it was easy to put the image on the network:
mv /tmp/dosdisk288.img /srv/dell-PER415-firmware/PER415-010203.img
cobbler distro add \
    --name=Dell-PER415-fw-010203 \
    --kernel=/usr/lib/syslinux/memdisk \
    --initrd=/srv/dell-PER415-firmware/PER415-010203.img
cobbler profile add \
    --name=Dell-PER415-fw-010203-installer \
    --distro=Dell-PER415-fw-010203
cobbler sync
The “distro add” command was where the magic happened. It told cobbler to create a fake Linux distribution whose kernel is memdisk and whose initrd is my floppy image. Memdisk is a special boot kernel designed to boot from a floppy image supplied as the initial ramdisk (initrd).
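Under the hood, booting through MEMDISK comes down to a PXELINUX menu entry whose kernel is memdisk and whose initrd is the floppy image. The Python sketch below renders such an entry. The /images/... paths are assumptions for illustration and are not necessarily what cobbler writes into its generated PXE menu.

```python
def memdisk_pxe_entry(label: str, memdisk_path: str, image_path: str) -> str:
    """Render a PXELINUX menu entry that boots a disk image through MEMDISK."""
    lines = [
        f"LABEL {label}",
        # MEMDISK poses as a Linux kernel...
        f"        kernel {memdisk_path}",
        # ...and boots whatever image it is handed as its initrd.
        f"        append initrd={image_path}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(memdisk_pxe_entry(
        "Dell-PER415-fw-010203-installer",
        "/images/Dell-PER415-fw-010203/memdisk",           # hypothetical TFTP path
        "/images/Dell-PER415-fw-010203/PER415-010203.img",  # hypothetical TFTP path
    ), end="")
```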
After the cobbler sync completed, I was able to boot the servers from the
network by selecting the “Dell-PER415-fw-010203-installer” item from the
PXE boot menu. A few moments later, I was at a DOS prompt. From there,
I just ran the BIOS updater, and I was done!
For reference, here are the tools and documentation I used to accomplish my mission:
- Ubuntu Dell BIOS – a handy page on the Ubuntu wiki discussing schemes for updating the BIOS on Dell systems
- Dell biosdisk – an unofficial project at Dell for building bootable floppies and images for installing BIOS updates
- MEMDISK module, part of SYSLINUX – it allows you to boot floppy and disk images from Linux bootloaders; it pretends to be a Linux kernel and boots the image given as its initrd kernel argument
Posted in sysadmin
Tags bios, cobbler, dell, firmware, memdisk, pxe, sysadmin

Posted by Tom Moertel
Thu, 10 Jun 2010 15:20:00 GMT
I recently started using two handy variants of top, the standard Unix tool for monitoring what’s going on with processes on a system.
The first, htop, uses ncurses to provide a more interactive process-viewing experience. You can surf through running processes, scrolling horizontally and vertically to reveal information that would otherwise have been clipped, information such as full command lines. Further, you can drive a cursor to select processes for commands like kill and lsof to act upon. Yes, you can see what files a process has open; you can even trace processes using strace. There’s also a handy tree view for understanding process ancestry.
The second top alternative, atop, offers more accurate accounting of processes and resource usage. It tracks all processes, even those that have lived out their brief lives between atop’s screen updates. This comprehensive accounting is helpful for understanding problems caused by herds of individually short-lived processes. With the old top, you might catch only a few of the processes in the act, but with atop, you can see the herd for what it is.
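To see why periodic snapshots undercount a herd of short-lived processes, consider a toy simulation. It is our own illustration, not atop's actual mechanism. A top-style sampler sees a process only if the process happens to be alive at one of its sampling instants, whereas full accounting records every process regardless of how briefly it lived.

```python
import random
from typing import List, Tuple

Lifetime = Tuple[float, float]  # (start, end) in seconds


def make_herd(n: int, horizon: float, max_life: float, seed: int = 1) -> List[Lifetime]:
    """Generate n short-lived processes with random start times in [0, horizon]."""
    rng = random.Random(seed)
    herd = []
    for _ in range(n):
        start = rng.uniform(0, horizon)
        herd.append((start, start + rng.uniform(0.01, max_life)))
    return herd


def seen_by_sampling(herd: List[Lifetime], interval: float, horizon: float) -> int:
    """Count processes alive at one or more sampling instants (snapshot style)."""
    instants = [k * interval for k in range(int(horizon / interval) + 1)]
    return sum(1 for s, e in herd if any(s <= t < e for t in instants))


def seen_by_accounting(herd: List[Lifetime]) -> int:
    """Full accounting records every process, however brief its life."""
    return len(herd)


if __name__ == "__main__":
    herd = make_herd(n=1000, horizon=60.0, max_life=0.2)
    print("sampled every 3 s:", seen_by_sampling(herd, 3.0, 60.0))
    print("full accounting:  ", seen_by_accounting(herd))
```

With lifetimes of a fraction of a second and a three-second refresh, the sampler catches only a small slice of the herd.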
Two handy tools – check them out.
Posted in sysadmin
Tags atop, htop, sysadmin, top

Posted by Tom Moertel
Thu, 15 Nov 2007 07:30:00 GMT
I recently started using Puppet
to automate my server-build processes. The basic idea behind Puppet
is that you create “manifests” that declare
a directed graph of “resources” that represents the desired state of
your machines. Puppet-managed machines on your network then query a
master server to obtain the latest copy of the graph, which they then
reconcile with their current states to make whatever changes are
necessary to bring themselves up to date.
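The reconcile step can be pictured with a toy Python model. It is our own simplification, not Puppet's code. Resources form a dependency graph, and the agent visits them in dependency order, changing only those whose current state differs from the declared one. The resource names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library since Python 3.9
from typing import Dict, List, Set


def reconcile(desired: Dict[str, str],
              current: Dict[str, str],
              requires: Dict[str, Set[str]]) -> List[str]:
    """Apply `desired` to `current`, prerequisites first; return what changed."""
    graph = TopologicalSorter()
    for name in desired:
        # Register each resource together with the resources it requires.
        graph.add(name, *requires.get(name, set()))
    changed = []
    # static_order() yields a resource only after all of its prerequisites.
    for name in graph.static_order():
        if name in desired and current.get(name) != desired[name]:
            current[name] = desired[name]  # stand-in for the real change
            changed.append(name)
    return changed


if __name__ == "__main__":
    desired = {
        "Package[rubygems]": "installed",
        "File[/var/local/local-gems]": "directory",
        "Package[sqlite3-ruby]": "installed",
    }
    current = {"Package[rubygems]": "installed"}
    requires = {"Package[sqlite3-ruby]": {"Package[rubygems]", "File[/var/local/local-gems]"}}
    print(reconcile(desired, current, requires))  # the file first, then the gem
    print(reconcile(desired, current, requires))  # already converged: []
```

Running it twice shows the idempotence that makes this style attractive: once a machine matches its manifest, a second pass changes nothing.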
For the most part, everything works well. I have encountered a couple
of snags when writing manifests, however, so I’m going to explain them
here as a reminder until I get the time to fix them in the Puppet code and send
patches upstream.
First, don’t use hyphens in class names. Although hyphens are legal
in class names, they are not allowed in qualified variable names, so
variables defined within hyphen-named classes are inaccessible
from the outside world.
Second, and this one is both tricky and important, Puppet handles
prerequisites for definitions by silently passing those prerequisites on
to all of the resources within the definitions. Definitions, in
effect, don’t really have their own prerequisites; they just pass them on to
their children. But – and here’s the problem – if those child
resources declare their own prerequisites, those prerequisites will
overwrite the passed-on prerequisites, effectively causing them to
be ignored.
This problem bit me hard when trying to create a definition for
installing Ruby Gems from a local cache of gems:
define local_gem($gem) {
  $path = "/var/local/local-gems/$gem"
  file { $path:
    ensure  => present,
    source  => "puppet://puppet/files/gems/$gem",
    require => File["local-gems-dir"],
    owner   => root,
    group   => root,
    mode    => "0664",
  }
  package { $title:
    ensure   => installed,
    provider => "gem",
    require  => [ Package["rubygems"], File[$path] ],
    source   => $path,
  }
}
The intent was to be able to declare a local gem like so:
local_gem { "sqlite3-ruby":
  gem     => "sqlite3-ruby-1.2.1.gem",
  require => Package["sqlite-devel"]
}
Thus the “sqlite3-ruby” local gem has the single prerequisite of the
“sqlite-devel” package – or at least that’s what I expected. What
happened on deployment was that the prerequisite was ignored because
when it was passed on to the inner file and package resources, those
resources had their own require parameters, and those parameters
overwrote the passed-on prerequisite.
The work-around is somewhat hacky. I augmented the definition with a do-nothing resource
that has no require parameter of its own. This
resource does nothing but capture the passed-on prerequisites. Then I made
all of the other resources in the definition include the do-nothing
resource as one of their prerequisites. Thus they are made to inherit the
passed-on prerequisites.
My final definition looks like this:
define local_gem($gem) {
  # dummy exec to propagate requires from local_gem
  exec { $name: command => "/bin/true" }
  $path = "/var/local/local-gems/$gem"
  file { $path:
    ensure  => present,
    source  => "puppet://puppet/files/gems/$gem",
    require => [ Exec[$name], File["local-gems-dir"] ],
    owner   => root,
    group   => root,
    mode    => "0664",
  }
  package { $title:
    ensure   => installed,
    provider => "gem",
    require  => [ Exec[$name], Package["rubygems"], File[$path] ],
    source   => $path,
  }
}
Notice how the file and package resources both require the dummy exec resource.
That’s the trick that allows them to require the prerequisites passed on from
the local_gem definition.
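The propagation rule described above can be captured as a toy Python model. It models the behavior as this post reports it and is not Puppet's implementation. A definition's require is copied only onto children that declare no require of their own. In the final definition, the dummy exec is therefore the only child that receives it, and the other children pick it up transitively by requiring the dummy.

```python
from typing import Dict, List, Set


def expand_define(children: Dict[str, List[str]],
                  define_requires: List[str]) -> Dict[str, List[str]]:
    """A child's own `require` overwrites whatever the definition passes on."""
    return {name: (reqs if reqs else list(define_requires))
            for name, reqs in children.items()}


def all_prereqs(resource: str, graph: Dict[str, List[str]]) -> Set[str]:
    """Transitive closure of `require` edges starting at `resource`."""
    seen: Set[str] = set()
    stack = list(graph.get(resource, []))
    while stack:
        r = stack.pop()
        if r not in seen:
            seen.add(r)
            stack.extend(graph.get(r, []))
    return seen


outer = ['Package["sqlite-devel"]']

# First definition: both children declare their own requires.
naive = expand_define({
    'File[$path]': ['File["local-gems-dir"]'],
    'Package[$title]': ['Package["rubygems"]', 'File[$path]'],
}, outer)

# Final definition: a require-less dummy exec captures the passed-on require.
fixed = expand_define({
    'Exec[$name]': [],
    'File[$path]': ['Exec[$name]', 'File["local-gems-dir"]'],
    'Package[$title]': ['Exec[$name]', 'Package["rubygems"]', 'File[$path]'],
}, outer)

print("naive:", 'Package["sqlite-devel"]' in all_prereqs('Package[$title]', naive))  # False
print("fixed:", 'Package["sqlite-devel"]' in all_prereqs('Package[$title]', fixed))  # True
```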
It’s not pretty, but it works. See this email on the puppet-users mailing list for more on the problem.
Posted in sysadmin
Tags gems, manifests, puppet, rails

Posted by Tom Moertel
Thu, 24 Aug 2006 04:41:00 GMT
Since I upgraded my blog from Typo 4.0.0 to
4.0.3, it has been somewhat unstable. About once a day it starts
responding with “500 Internal Server Error” and stays that way until I
restart it.
The root of the problem seems to be the database
connection, as evidenced by this exception showing up in the
production log:
SQLite3::CantOpenException (could not open database)
Unfortunately, the exception doesn’t provide anything specific
to go on.
A quick look at the sqlite3-ruby code suggested that I was not going to
get the specifics, either. The Ruby-based wrapper never calls
sqlite3_errmsg after a call to sqlite3_open fails on behalf of
SQLite3::Database.new.
A one-line patch, however, fixed that shortcoming:
--- sqlite3-ruby-1.1.0.orig/lib/sqlite3/database.rb
+++ sqlite3-ruby-1.1.0/lib/sqlite3/database.rb
@@ -109,7 +109,7 @@
@statement_factory = options[:statement_factory] || Statement
result, @handle = @driver.open( file_name, utf16 )
- Error.check( result, nil, "could not open database" )
+ Error.check( result, self, "could not open database" )
@closed = false
@results_as_hash = options.fetch(:results_as_hash,false)
(Submitted as Ticket 5504 on RubyForge.)
Before applying the patch, opening a database at a nonexistent path results in
a generic error message:
$ ruby -r rubygems -e 'require_gem "sqlite3-ruby";
SQLite3::Database.new("/no/such/path/db")'
... could not open database (SQLite3::CantOpenException) ...
After applying the patch, we get additional error information:
... could not open database: unable to open database file
(SQLite3::CantOpenException) ...
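For comparison, here is the same experiment in Python's standard sqlite3 module. This binding is swapped in for illustration; it is not the code patched above. Its exception text comes from SQLite's own error message, which is the same "unable to open database file" detail the patch surfaces.

```python
import sqlite3


def try_open(path: str) -> str:
    """Attempt to open an SQLite database; return "ok" or SQLite's error text."""
    try:
        conn = sqlite3.connect(path)
    except sqlite3.OperationalError as exc:
        # The message is SQLite's own, e.g. "unable to open database file".
        return str(exc)
    conn.close()
    return "ok"


if __name__ == "__main__":
    print(try_open("/no/such/path/db"))
```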
With the patch in place, all I have to do is wait for Typo to start
acting up again. Then I’ll have some interesting information in the
log.
Until then, I’m relying on cron
and a short monitoring script to restart Typo when it tips into
foolishness:
#!/bin/bash
url=http://blog.moertel.com/admin
addrs=tom@moertel.com
response=$(GET -sd $url 2>&1)
if [ "$response" != "200 OK" ]; then
    { echo "Response was: $response"; echo; service typo restart; } |
        mail -s "Blog site not responding! (Restarting)" $addrs
fi
We’ll see how it goes.
Update: That was fast. The error popped up
again and this time the log told me something useful: “unable to open
database file.” Now, why couldn’t Typo open the database file,
especially since the file is perfectly fine and had been opened
successfully (many times) by the very same Typo process earlier? Here’s
a hint:
$ ls /proc/28788/fd | wc -l
1023
Seems like there’s a resource leak in Typo 4.0.3 (or Rails 1.1.6).
Under some conditions, instead of reusing existing database
connections, Typo keeps trying to open new ones. Eventually, it uses
up its allotment of file descriptors and the operating system is forced
to say, “That’s enough, pal” (EMFILE).
I’ll look into it more in the morning.
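The failure mode is easy to reproduce in miniature. The Python sketch below is Unix-only, and its descriptor counter reads /proc/self/fd just like the ls command above, so that part is Linux-specific. It lowers its own soft descriptor limit to a hypothetical 128, then "leaks" descriptors by opening /dev/null without closing it until the kernel refuses with EMFILE. Finally it cleans up and restores the limit.

```python
import errno
import os
import resource


def leak_until_emfile(limit: int = 128) -> int:
    """Open /dev/null repeatedly without closing until EMFILE; return how many opened."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    resource.setrlimit(resource.RLIMIT_NOFILE, (limit, hard))
    leaked = []
    try:
        while True:
            try:
                leaked.append(os.open(os.devnull, os.O_RDONLY))
            except OSError as exc:
                if exc.errno == errno.EMFILE:  # "Too many open files"
                    return len(leaked)
                raise
    finally:
        # Undo the "leak" and put the original limit back.
        for fd in leaked:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))


def open_fd_count() -> int:
    """Linux only: count this process's descriptors, like `ls /proc/PID/fd | wc -l`."""
    return len(os.listdir("/proc/self/fd"))


if __name__ == "__main__":
    print("descriptors open now:", open_fd_count())
    print("opened before EMFILE:", leak_until_emfile())
```

A long-running process that opens connections and never closes them walks into the same wall, only more slowly.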
Update 2: Problem solved.
Posted in ruby, typo, rails, sysadmin
Tags rails, sqlite3, typo

Posted by Tom Moertel
Wed, 09 Aug 2006 04:35:00 GMT
If an extended power outage drains your UPS, and your servers are
forced to shut down, will they automatically start up again when the
power is eventually restored? It’s a good question, especially
if your servers are in some distant, unattended server room.
Unless you’ve tested your servers, don’t assume that the answer
is Yes.
Many servers offer a BIOS configuration option that forces them to
automatically power on when they receive line voltage. If your
servers have this option, just set it and you’re done.
Unfortunately, some servers, including a Dell PowerEdge 1600SC
that I’m using, lack this configuration option. When these servers
turn themselves off as the final step of a UPS-controlled
shutdown, they don’t start up again when the power is restored.
Because they were shut down before the power was cut off, they think
they are supposed to remain off when the power is restored. That is,
they remember their on/off status across power outages.
Fortunately, there is a way to make sure these servers automatically
power on: shut them down without powering them off; halt them
instead. That way, when the UPS finally cuts off the supply voltage,
the servers will still be in their “on” state, and they will remember
this state across the outage. Later, when the power is restored, the servers
will automatically restore their pre-outage state and power up.
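The reasoning in the last two paragraphs boils down to a tiny state machine. In this toy Python model, which illustrates the behavior described above and is not vendor firmware, a server without an "always power on" option restores whatever power state it was in when line voltage disappeared.

```python
from enum import Enum


class Power(Enum):
    ON = "on"    # running, or halted but still powered
    OFF = "off"  # soft-powered off by the OS


def state_after_shutdown(final_step: str) -> Power:
    """A 'poweroff' cuts power; a 'halt' stops the OS but leaves the machine on."""
    return Power.OFF if final_step == "poweroff" else Power.ON


def state_after_outage(state_when_power_lost: Power) -> Power:
    """A server that remembers its last state simply restores it."""
    return state_when_power_lost


# Powered off before the UPS gave out: the server stays dark afterward.
assert state_after_outage(state_after_shutdown("poweroff")) is Power.OFF
# Merely halted: still "on" when the UPS gave out, so it powers up again.
assert state_after_outage(state_after_shutdown("halt")) is Power.ON
```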
With Fedora Core Linux and Network UPS
Tools, it’s not difficult to make
sure the servers are halted instead of powered off, but the implementation
isn’t obvious. To spare you the digging, here are the
important bits.
- When the power fails and the UPS-monitoring software decides that
the batteries are almost depleted, it will initiate a server shutdown
using the command defined in the /etc/ups/upsmon.conf file. The
default command is this:
SHUTDOWNCMD "/sbin/shutdown -h +0"
- The shutdown command will tell the init process to enter runlevel 0,
which is the prepare-to-halt-the-system runlevel.
- The init process will stop all of the running services in an orderly
fashion and then, as the last step, invoke the final script in the
shutdown process: /etc/rc.d/rc0.d/S01halt.
- The final lines of the S01halt script will power off the server.
Unless, that is, the file /halt is present, in which case the script
will halt the server instead.
Thus the trick is to make sure that the /halt file exists. That turns
out to be easy to pull off; just redefine the shutdown command in
/etc/ups/upsmon.conf:
SHUTDOWNCMD "/bin/touch /halt; /sbin/shutdown -h +0"
And that’s all there is to it!
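To double-check the logic, the final decision in S01halt can be modeled in Python. This is a simplified sketch of the behavior described in the list above, not the script's actual shell code. The `root` parameter is a hypothetical hook so that the check can be tried against a scratch directory instead of /.

```python
import os
import tempfile


def final_action(root: str = "/") -> str:
    """Toy model of S01halt's last step: halt if <root>/halt exists, else power off."""
    return "halt" if os.path.exists(os.path.join(root, "halt")) else "poweroff"


def touch_halt_flag(root: str = "/") -> None:
    """Python equivalent of the `/bin/touch /halt` prepended to SHUTDOWNCMD."""
    with open(os.path.join(root, "halt"), "a"):
        pass


if __name__ == "__main__":
    # Use a scratch directory rather than the real root filesystem.
    with tempfile.TemporaryDirectory() as scratch:
        print("without flag:", final_action(scratch))  # poweroff
        touch_halt_flag(scratch)
        print("with flag:   ", final_action(scratch))  # halt
```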
Posted in linux, hardware, sysadmin
Tags fedora, halt, hardware, linux, nut, power, shutdown, ups
