Running Ubuntu cloud-images, with cloud-init, on all infrastructure, from cloud to bare metal

So we’re happy provisioning our AWS, GCP, DigitalOcean, Azure, and other virtual machines with cloud-init.

In the cloud, the providers offer “metadata” support (including “user-data”) and mostly clean, working, up-to-date Ubuntu images which start cloud-init, find the metadata (usually through the “magic” address 169.254.169.254), and away we go.
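For example, on EC2 (each provider has its own paths; this is the AWS flavor) you can poke at the metadata service from inside an instance:

curl http://169.254.169.254/latest/meta-data/instance-id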

Off the public clouds, though, it’s a different beast. The good news is, we can make it work, and unify all provisioning, from developer boxes to production physical machines, including private clouds.

Ubuntu provides wonderful little images, updated almost daily and released in line with the AWS AMIs, at https://cloud-images.ubuntu.com/ – for a fully updated 16.04 LTS image, go to https://cloud-images.ubuntu.com/xenial/current/ – there are many formats available: QCOW2, VHD, OVA, etc. Thanks, Ubuntu!

Using the qemu-img utility, you can convert, say, the QCOW2 image (xenial-server-cloudimg-amd64-disk1.img) into any other format, e.g. VMDK, or even raw.
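For example (the output filenames are just placeholders):

qemu-img convert -O vmdk xenial-server-cloudimg-amd64-disk1.img xenial.vmdk
qemu-img convert -O raw xenial-server-cloudimg-amd64-disk1.img xenial.raw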

We’ve also exploited cloud-init’s “nocloud” datasource to boot under regular VMware Workstation/Fusion/etc., using an ISO image containing the metadata. The first virtual hard disk is a converted Ubuntu cloud-image, and a tiny virtual CD-ROM ISO is attached, prepared using “cloud-localds” from the cloud-image-utils package. This has been documented since 2013 by Scott Moser, the father of cloud-init.
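A minimal sketch of how that seed ISO gets built (the password, hostname, and instance-id here are made-up examples; any #cloud-config you like goes in user-data):

cat > user-data <<EOF
#cloud-config
password: passw0rd
chpasswd: { expire: False }
ssh_pwauth: True
EOF
printf 'instance-id: iid-local01\nlocal-hostname: testvm\n' > meta-data
cloud-localds seed.iso user-data meta-data

Attach seed.iso as the VM’s CD-ROM, boot from the converted cloud-image disk, and cloud-init picks the seed up on first boot.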

In a KVM/libvirt environment, it’s even easier. KVM/libvirt lets you “direct kernel boot” a guest from a kernel and initrd, passing kernel command-line parameters directly, and with that you can point cloud-init at your data by adding the kernel parameter “ds=nocloud-net;s=http://your/data/source”. Works beautifully. (nocloud-net appends “meta-data” and “user-data” to the end of the URL automatically.)
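With plain QEMU/KVM the whole thing looks roughly like this (a sketch: the kernel and initrd are also published on cloud-images.ubuntu.com, under “unpacked”, and the IP/URL here stands in for whatever HTTP server holds your meta-data and user-data files):

qemu-system-x86_64 -m 1024 -enable-kvm \
  -kernel xenial-server-cloudimg-amd64-vmlinuz-generic \
  -initrd xenial-server-cloudimg-amd64-initrd-generic \
  -append 'root=/dev/vda1 ro ds=nocloud-net;s=http://192.168.1.10/seed/' \
  -drive file=xenial-server-cloudimg-amd64-disk1.img,format=qcow2,if=virtio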

For vSphere/free ESXi deployments, even though the ISO trick would work, what we do is prepare a VMDK image that’s already set up: before converting the image to VMDK, we mount it temporarily (using qemu-nbd), change a few configuration options, and add the kernel command-line parameters directly to /boot/grub/grub.cfg. Depending on the VMware environment, it’s also helpful to pre-set the network address configuration, etc. After that we unmount, convert to VMDK, and deploy with VMware’s ovftool directly to ESXi.
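The mount/convert dance goes roughly like this (a sketch; it assumes the image’s first partition is the root filesystem, which it is on the stock xenial cloud-image):

modprobe nbd max_part=8
qemu-nbd --connect=/dev/nbd0 xenial-server-cloudimg-amd64-disk1.img
mount /dev/nbd0p1 /mnt
# edit /mnt/boot/grub/grub.cfg, /mnt/etc/network/interfaces, etc.
umount /mnt
qemu-nbd --disconnect /dev/nbd0
qemu-img convert -O vmdk xenial-server-cloudimg-amd64-disk1.img xenial.vmdk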

The most recent challenge was how to provision physical, bare-metal servers using the same method, so that we can re-use all our cloud-init scripts and infrastructure.

Around the internet, most people do this with network boot (PXE). Unfortunately the environment we’re working with does not support PXE, for unrelated reasons, so we have to do something less standard.

The current process is almost the same as the ESXi case above: we prepare a cloud-image (from the Ubuntu QCOW2 source), mount it, and preconfigure network/GRUB/etc. The image is then converted to raw format and made available via an HTTP endpoint.
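Roughly (the source filename is a placeholder, and any web server will do; Python’s built-in one is handy for quick tests):

qemu-img convert -O raw xenial-prepared.img /srv/images/image_we_prepared.img
cd /srv/images && python -m SimpleHTTPServer 8000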

At the physical machine, things get a lot more manual: we boot an Ubuntu rescue image (the regular mini.iso from the Ubuntu distribution) on the physical hardware. Rescue mode sets up networking, disk access, and the keyboard. We drop to a shell, without mounting any filesystems, and issue something along the lines of
wget -O - http://somehost/image_we_prepared.img | dd bs=2M of=/dev/sda

This unceremoniously destroys the contents of the physical machine’s disk, replacing it with the cloud-image we prepared, partition table and all. After that, reboot, and voilà: there goes cloud-init, exactly as before.

There’s a gotcha in this process: the cloud-image does not contain most of the kernel drivers needed on a physical machine. In my case, on a Dell 1950, the PERC’s driver (megaraid_sas) was not included, resulting in a failed boot. To fix this, boot into rescue mode again, chroot into the freshly written image, install linux-image-generic, and run update-initramfs -u. We’re working on moving this fix into the automated image-preparation step.
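A sketch of what that automated step would look like, using the same qemu-nbd trick as before (the filename, partition, and mount point are assumptions; linux-image-generic is the stock Ubuntu metapackage that pulls in the full driver set):

qemu-nbd --connect=/dev/nbd0 xenial-prepared.img
mount /dev/nbd0p1 /mnt
for fs in dev proc sys; do mount --bind /$fs /mnt/$fs; done
cp /etc/resolv.conf /mnt/etc/resolv.conf   # so apt can resolve names inside the chroot
chroot /mnt apt-get update
chroot /mnt apt-get install -y linux-image-generic
chroot /mnt update-initramfs -u
for fs in dev proc sys; do umount /mnt/$fs; done
umount /mnt
qemu-nbd --disconnect /dev/nbd0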

This is all done using DRAC virtual media and the virtual console, but it would work equally well using a burned CD with mini.iso and remote hands at the datacenter.

For the future (depending on demand for bare-metal provisioning; we’ve only had to provision a few hosts so far), I’d like to create my own mini.iso that automates the dd-ing of the image and the fixes. But that sounds like a lot of work, and an equivalent PXE setup seems much saner; CoreOS seems to have nailed this already.

cloudinit all the things!

Fully managed VMWare ESXi 5.1 with Dell Servers

Update 12/August: There are new versions of both ESXi 5.1 (Update 1) and the Dell VIB. The instructions below should continue to work; just remember to use the updated filenames from the downloads. Download the VMware ESXi 5.1 Update 1 Recovery Image (released April 29, 2013) and the Dell OpenManage Server Administrator vSphere Installation Bundle (VIB) for ESXi 5.1 (released August 12, 2013).

Update 12/April: There’s a problem with OMSA and ESXi 5.1 on (at least) R710 servers. Check out http://communities.vmware.com/thread/439083 and call your Dell/VMware rep.

You’ve got a newish Dell server (11th generation or newer) and you want to run a fully managed (OMSA, monitoring, SNMP) free ESXi 5.1 system.

Old DELL CERC SATA monitoring on new Debian Squeeze (PowerEdge PE 830)

lspci says it’s a “Dell CERC SATA RAID 2 PCI SATA 6ch (DellCorsair)”, under “Adaptec AAC-RAID (rev 01)”. Dell OMSA 6.5.x installs perfectly and, after a reboot, detects everything but the CERC controller. So add the repository as per the instructions at http://hwraid.le-vert.net/wiki/DebianPackages, then “apt-get install aacraid-status” and run “aacraid-status”, which should report the status of the controller and its arrays.
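For reference, the steps boil down to something like this (double-check the exact repository line against the wiki page above; you may also need to import the repository’s signing key, per that page):

echo 'deb http://hwraid.le-vert.net/debian squeeze main' > /etc/apt/sources.list.d/hwraid.list
apt-get update
apt-get install aacraid-status
aacraid-status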

Dell DRAC5 Virtual Media CDROM breaks OS installation, hangs on Windows boot, and makes megaraid_sas insane

We bought a dozen shiny new Dell 2950 and 1950 machines, with DRAC5 cards which seem quite handy.

So off we went to install Debian and Windows 2003 using only the DRAC5’s virtual console (kind of an IP KVM) and virtual media. No physical access to the servers is required. It all works very well: you point the DRAC at an .ISO image, configure virtual media as attached in the Ctrl-E screen, hit F11, and choose to boot from “VIRTUAL CDROM”. This is where OS installation, and the problems, begin.

When installing Debian with virtual media attached, the PERC controller shifts device ordering around so that the virtual media device can fit. That makes the RAID-5 array on the PERC show up at /dev/sdc instead of the expected /dev/sda. This seems alright at first, but after the installation is done, GRUB goes insane. You can jackhammer GRUB into working, but then the megaraid_sas driver goes insane. Literally: lockups, weird and misleading messages. Insanity reigns.

When installing Windows without the help of the Dell Installation CD, the machine locks up right at the first “Windows Setup” blue screen, even before the “F6” driver prompt. It literally freezes (you can still restart it using the DRAC, sweet). If you try using the Dell Install CD, it goes okay until the first reboot, where the same thing happens. Complete lockup.

After spending a lot of time with clueless Dell Support people (they were about to send us a tech to swap the PERC 6i controller, which was “obviously” the culprit), we finally realized the problem was the DRAC5’s virtual media. When we disabled virtual media and used a real, physical, spinning DVD drive, everything was fine. It turns out Windows locked up because it couldn’t figure out the “VIRTUAL CDROM” device. Nothing was wrong with the PERC. Debian installed like a dream too, on /dev/sda.

If you really need to use virtual media, you have to go through a very boring process. First, attach virtual media in the Ctrl-E screen and mount the ISO in your browser. Start installing your operating system. When it’s time for the first reboot, go into Ctrl-E again and detach virtual media. Now the install should proceed as normal…

All in all, the DRAC5 is a very useful thing; its KVM capabilities have already paid for themselves, if only in gasoline saved on trips to the datacenter. Now a Firefox 3 plugin, and fixes to the virtual media problems, would be very nice. Hello, Dell?