
Remembering Fail-Safe

The description “fail safe” is commonly used to mean something foolproof, or a system with backup systems to prevent failure.  In other words, “safe from failure”.

That’s a shame, since we have plenty of words that already mean that. My dictionary defines fail-safe as "… a system … that insures safety if the system fails to operate properly". The original sense was "safe in case of failure". Things break. How do we head off catastrophe?

Real-world examples

The TCP network protocol "guarantees" delivery, but it’s fail-safe. If a packet can’t be delivered, as happens, the connection is dropped rather than accepting partial or corrupted data.
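The same principle applies in application code. Here's a minimal sketch, using a hypothetical `recv_exact` helper (not from any particular library), that refuses to hand back a partial message when the peer disconnects:

```python
import socket

def recv_exact(sock, length):
    """Fail-safe receive: return exactly `length` bytes or raise.

    If the peer disconnects mid-message, we refuse to return a partial
    payload; the safe failure mode is an error, not silently truncated data.
    """
    chunks = []
    remaining = length
    while remaining > 0:
        chunk = sock.recv(remaining)
        if not chunk:  # connection dropped mid-message
            raise ConnectionError("peer closed mid-message; partial data discarded")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```

The caller either gets the whole message or an exception; there's no state in which corrupted input sneaks downstream.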

In the movie Die Hard, the engineers of Nakatomi Plaza decided that safety meant that in the event of a power failure all the security systems of the building would be disabled. In the movie, that meant the bad guys could get into the vault. In the real world, that decision would prevent people from being locked into the building.

After thousands of deaths resulting from train accidents, train car brakes are now engaged by default. A pressure line powered by the locomotive pulls the brake pads away from the wheels. If any part of the braking system (the non-braking system?) fails, the brakes press against the wheels.

Airplanes use positive indicators for the status of important functions, such as the landing gear being down. Instead of an error light if the gear has failed, there’s a no-error light if the gear is locked. Should the sensor, wiring, or bulb fail, the indication is that the gear is not down. Better to have the gear down and think it’s not than to think it is when it isn’t.
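A software sketch of the positive-indicator idea (the sensor values here are made up for illustration): only an explicit, valid "locked" reading counts, so every failure mode reads as unsafe.

```python
def gear_is_down(sensor_value):
    # Positive indication: only an explicit "locked" reading counts.
    # A dead sensor (None), garbage data, or a missing reading all
    # report gear-not-down, so failures err on the pessimistic side.
    return sensor_value == "locked"
```

A broken wire that yields `None` or noise produces a pessimistic "not down", never a false "all clear".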

Value in software

This idea that we should expect failure isn’t novel; it’s called testing. But arguably the primary purpose of testing is to identify defects in the software to avoid failure in production. Is there value in assuming that we won’t succeed at preventing every possible anomalous condition, including guaranteeing that our code does what we expect? Consider the questions that fail-safe raises:

What can fail?

Your software has bugs in it.  Networks go down.  You may get broken input.  You may get correct input that breaks your system because you didn’t know the correct format.  You may get data in the wrong order.  Software you didn’t write but you’re counting on may fail.

What is “safe”?

What’s the best result when failure happens?  Roll back a transaction?  Immediately kill a system?  Display an error?  Throw an exception?

How do we get back from "safe" to operational again?

Having decided what failure means and how to enter a safe mode, we may not yet have asked ourselves how to get things going again. If we reject entry of a file that contains erroneous data, how do we notify someone to deal with that? How do we get it out of a queue to be processed again?
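One answer to both questions is a quarantine: reject the bad file, move it aside for a human, and make re-processing a matter of moving it back. A minimal sketch, with a made-up `inbox`/`errors` directory layout:

```python
import shutil
from pathlib import Path

# Hypothetical directory layout, purely for illustration.
INBOX = Path("inbox")
ERRORS = Path("errors")

def process_inbox(parse):
    """Process each inbound file, quarantining any that fail.

    "Safe" here means a bad file is moved aside, never silently dropped
    or half-processed. Recovery is explicit: someone fixes the file in
    errors/ and moves it back into inbox/ to be processed again.
    """
    ERRORS.mkdir(exist_ok=True)
    for path in sorted(INBOX.glob("*.csv")):
        try:
            parse(path.read_text())
        except ValueError:
            # Quarantine for a human to inspect; the queue keeps moving.
            shutil.move(str(path), str(ERRORS / path.name))
        else:
            path.unlink()  # done; remove from the queue
```

The errors/ directory doubles as the notification: anything sitting in it is a pending failure someone must deal with.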

The Advantages of Convention over Configuration

Several popular frameworks have as a core design principle that “convention” is preferred over “configuration”.

I’ve come to think that they’ve actually understated the case.  In fact, our development team has made significant strides in identifying not only causes of defects, but also the factors that slow development in exchange for reducing defects in the first place.

One of the top issues on the latter list is what we’ve been calling the "arbitrary decision".  If a given technical challenge is hard, that can actually simplify development: either the first solution we find is a good one, and we save time by not looking for additional solutions that might not even exist, or there’s a clear best solution, or even only one that will actually work.  It’s when many solutions would work equally well that we’re left making an essentially arbitrary choice.

This “arbitrary decision” is exactly what’s referred to as “convention”.  Every convention that’s already been decided both reduces defects and saves developer time.

Defects are reduced because:

  • Developers have fewer decisions to make, saving focus for important problems
  • Mismatches between modules or components are reduced
  • The code visually reflects the standard, making deviation apparent

Speed of development is increased simply because there are fewer decisions to make.

Here are some examples of conventions that have benefitted us:

  • Code formatting standard.  We really didn’t fight over this, as the development team is aware of the benefits of not fighting about it.  We agreed our standard is “good enough”, so we can move on.
  • Variable naming conventions, including capitalization of common names in our domain space.
  • Which of several libraries providing the same functionality (such as Base64 conversion) to use.
  • Coding language versions: do we rely on features of newer implementations, or keep the code backward-compatible?

Incremental Database Migrations

One of the causes of headaches for active development is database migrations.  Code that requires new tables or new columns in existing tables simply won’t work if the database migration hasn’t been applied.

This is different from code, which can (safely) use introspection to see whether a field in a class exists, or can simply be recompiled as a whole.  New, unused members can be added safely, and can even be safely discarded, when using a compiled language like C# or Java.
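A sketch of that introspection point (in Python here, though the post mentions C# and Java; the `Invoice` class and its members are made up for illustration):

```python
class Invoice:
    total = 0  # existing member

# New, unused members can be added without breaking existing code:
Invoice.discount = 0

# And code can introspect before touching a member that a newer
# version may or may not have added yet, instead of crashing:
if hasattr(Invoice, "tax_rate"):
    rate = Invoice.tax_rate
else:
    rate = 0.0  # safe default until the member exists
```

A database column, by contrast, either exists or it doesn't; there's no graceful default when the migration hasn't run.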

It is not even safe simply to add columns, as we discovered in production recently:

When a new column added to one table has the same name as an existing column in another table, and the two tables are joined, SQL queries can fail because a selected field or a WHERE clause becomes ambiguous.
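A minimal reproduction of the failure, using SQLite and made-up `orders`/`customers` tables rather than our production schema:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders    (id INTEGER, status TEXT, customer_id INTEGER);
    CREATE TABLE customers (id INTEGER, status TEXT);
""")

# "status" now exists in BOTH joined tables, so the unqualified
# reference fails:
try:
    con.execute("SELECT status FROM orders "
                "JOIN customers ON orders.customer_id = customers.id")
except sqlite3.OperationalError as err:
    print(err)  # ambiguous column name: status

# Qualifying the column, as an ORM does automatically, resolves it:
con.execute("SELECT orders.status FROM orders "
            "JOIN customers ON orders.customer_id = customers.id")
```

The queries that break are the ones written before the new column existed, which is what makes this migration hazard so easy to miss.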

So, another point for using an ORM.

Parallels Desktop v. VMWare Fusion for Linux

Parallels and VMWare have been releasing new updates and fighting head-to-head for the business of people roughly like me for years.  For a moment there, the competition was so fierce that the cheapest way for me to get a new Parallels license was to buy a copy of VMWare via their super-cheap "competitive upgrade" pricing, and then use my new VMWare license to get a competitive upgrade to the newest Parallels product.

The feature set and performance have always been, according to reviews and my own experience, pretty comparable.  And both companies have been aggressively pushing for performance improvements and marketing wins for those virtualizing Microsoft Windows.

My primary purpose for virtualization is running Linux, a use case that neither company particularly advertises.  That means the real feature-set and performance comparisons aren’t crystal clear.  Ergo, my comparison of the anomalies, wins, and losses between the two (nearly) most current products, from the experience of a Linux virtualizer:

| Feature | VMWare Fusion 8.5.3 | Parallels Desktop 11.2.2 |
| --- | --- | --- |
| Multiple screens | Fusion expands to all 3 of my screens | "All" screens is actually only 2 screens, determined by the active screen when activating full screen |
| Virtualization tools installation | Installs partial tools by default, making it confusing whether the tools have been installed, and requires mounting a disk and building the tools within the virtual machine | Parallels is the winner here: automatic installation of Parallels Tools |
| File sharing with Mac host | Mounted folders appear in /mnt/hgfs | |
| Mouse support | A scroll gesture on a Mac Magic Mouse is often interpreted as "keep scrolling until you reach the bottom or top of this page"; the VM is useless for several seconds at a time | |
| Virtual hard drive expansion | Requires that a drive be created initially with a fixed size. The virtual drive only takes up the used space of that drive, but if you hit the maximum size, creating a new drive is an utter pain | An actual expanding drive: you can increase the volume size any time the VM is shut down |

Some problems are cropping up with VMWare in the process of verifying this information under Sierra.

"Shared folders will not be available in the virtual machine until VMware Tools is installed and running." appears on the Sharing tab under Settings.  This would be a reasonable error, except:

  1. It appears even if the Tools are installed
  2. There’s no good indication of whether the Tools are installed: some VMWare functionality is automatically installed (somehow) into the virtual machine before the VMWare Tools installation process.

VMWare with CentOS 7 is behaving inconsistently with multiple screens.  With 3 Mac screens active, "Full Screen" combined with "Use Single Screen in Full Screen" turned off results in a single virtual screen mirrored across all 3 monitors.  Fine, that’s what the directions for Fusion say, along with "you’ll have to make changes inside the virtual machine".  Well, fine, but:

  1. Directions, anywhere, anyone?  Hello?
  2. I’m sure they mean, for CentOS, Settings, Displays

Great, I need to control the display settings there, making sure I’m not mirroring the only display Fusion offers.  So Fusion can’t provide multiple virtual displays through the VMWare Tools into Linux?  I could buy that, though it would be a big disappointment.  But NO, multiple displays actually are provided.  If I tell Sierra to mirror one of my displays to another one, then I get TWO distinct virtual displays, which are assigned to my physical displays.  Huh?  On the third display I see my two virtual displays together.  WTF?

The rules for which physical screens Parallels’ TWO displays get mapped to seem to depend on which Mac screen is active and from which screen you go full screen:

Screen 1 active: from 1 → 1 + 2; from 2 → 2 + 1; from 3 → 3 + 1

Screen 2 active: from 1 → 1 + 2; from 2 → 2 + 1; from 3 → 3 + 2

Screen 3 active: from 2 → 2 + 3; from 3 → 3 + 1

(Read "from N → A + B" as: going full screen from screen N puts the two virtual displays on physical screens A and B.)

Another Parallels issue is the persistent message on boot that “[Parallels Tools] You should log out of the graphical session to apply new Shared Profile settings. When you log in again, all host computer user folders will be unmapped from those in the virtual machine.”  You’d think that rebooting would be sufficient to “log out of the graphical session”.

Another is random mouse jumping, usually from one screen to another.

WordPress Site Performance

UPDATE: database latency is by far the dominant factor in my site’s performance.  Although this site uses a dedicated Amazon RDS MySQL instance, it’s hosted on Dreamhost shared hosting.  The latency to the AWS server means that even the 37 queries necessary for the basic front-page view turned less than 100ms of query time into a 6-second load.
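The arithmetic behind that slowdown, assuming roughly 160ms of network round trip per query (an illustrative figure, not a measurement):

```python
# Per-query network round trips dominate when the database is remote.
queries = 37
latency_per_query = 0.160  # seconds per round trip, assumed
local_query_time = 0.100   # seconds for all 37 queries, per the post

remote_total = queries * latency_per_query + local_query_time
print(f"{remote_total:.1f} s")  # about the observed 6-second load
```

The fix is locality, not faster queries: the queries themselves were already fast.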

Having a grasshopper-fast web site is harder than you might think.  Here’s my research into what, exactly, "fast" means, and how to achieve it.

How Fast is Fast?

Here are some facts:

  1. IBM produced a paper distinguishing between response times above and below 400ms: http://daverupert.com/2015/06/doherty-threshold/  Computer response under 400ms was "addicting".
  2. This site recorded a median human response time to stimulus of 266ms.
  3. Google says that page load speed is a factor in search ranking.

In short, faster is better, with no upper limit where benefits stop.  My subjective experience is that there’s no instance I can generate for which improvement isn’t a benefit.

How Fast Is Your Site?

We’ve tested our sites with https://www.webpagetest.org/

What Are the Results?

All tested scenarios are with default WordPress install.  The tested scenarios are:

  1. Dreamhost shared hosting with Dreamhost-provided MySQL
  2. AWS EC2 micro instance with local MySQL
  3. Dreamhost DreamCompute instance
| Scenario | First Byte | Document Complete | Fully Loaded |
| --- | --- | --- | --- |
| DreamHost shared hosting | 0.838s | 2.893s | 3.088s |
| EC2 t2.micro with local database | 0.304s | 4.538s | 4.664s |

There are a couple of big surprises here.  The first is that a shared hosting site isn’t so bad, all in all.  First byte time is a couple hundred milliseconds slower than on a dedicated machine.  Another is that the dedicated hosting time to complete file delivery is much slower than for shared hosting.

The biggest surprise is that the ratings for hosting from WebPageTest weight first byte response time so highly that the dedicated host is given a “B” grade, but shared hosting an “F”.  Can’t tell you from these numbers how I would subjectively rate the experience.

Considering AWS RDS MySQL

Amazon AWS RDS costs:

| Payment Option | Upfront | Hourly | Monthly* | Effective Hourly** | Effective Monthly | On-Demand |
| --- | --- | --- | --- | --- | --- | --- |
| No Upfront | $0 | $0.017 | $10.22 | $0.014 | $10.08 | $0.017 per hour, $12.24/month |
| Partial Upfront | $51 | $0.006 | $4.38 | $0.012 | $8.57 | |
| All Upfront | $102 | $0.000 | $0.00 | $0.012 | $8.50 | |

Some common things you might want to be able to do:

Change the size of the storage after configuring the instance

Yes: http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ModifyInstance.MySQL.html

A Shortage of Books on Tomcat

If I’m not wrong to assume that a book primarily about Apache’s Tomcat servlet container should have "Tomcat" in the title, this list of books and their publication years suggests that nobody’s writing new books or updating their old ones.  By the way, I’m using Tomcat 8, for which there seem to be no books at all.  Even Amazon.com has no English-language books with "Tomcat 8" in the title.

Beginning JSP, JSF and Tomcat: Java Web Development
2012

Apache Tomcat 7 Essentials
2012

Apache Tomcat 7
2011

Tomcat 6 Developer’s Guide
2009

Tomcat: The Definitive Guide, 2nd Edition
2007

Professional Apache Tomcat 6
2007

How Tomcat Works: A Guide to Developing Your Own Java Servlet Container
2004

Booting Linux on Mac Hardware

There are two ways to boot flavors of Linux on Mac hardware:

  1. Boot from an external device, such as a USB stick
  2. Dual-boot

Per answers like those provided at How-To Geek, the fundamental problem with booting non-Mac operating systems on Macs is Apple’s unique EFI code.

My situation is complicated by having a 32-bit EFI on the original Mac Pro, which requires a hack even to boot a modern Mac OS X version.

Unfortunately, I’ve had little luck even on my recent Macbook Pro trying to boot from a USB drive.  While Linux succeeds in the initial boot phase, it gets lost trying to start the X Window System.

Supposedly,  rEFInd is the relevant tool.

Technology Stacks Used by Other Companies

I really like Stackshare’s listing of technology stacks used by various companies from big to small, many of which we’ve heard of.

It’s not especially useful to take a large company’s use of a technology as an endorsement.  Amazon, for example, wrote its original web technology in PHP, and still has significant chunks left.  According to an insider, PHP is "banned" for new functionality on the grounds that it can’t be adequately secured.

It is useful to see where new technologies have been adopted, and get some context on how other companies are evolving their technology.  After all, there is something of a network effect for any technology or product: more users means more developers and support which means more users, etc.

Experimenting with Google Compute Engine

I must say I’ve been happy with Amazon Web Services.  I have accounts for both business and personal use, and I’ve been very pleased with the progress of their development of additional services, including SQS, SES, and RDS.  I’ve been aware of some of the holes in the stretched pizza dough, but like many consumers, I had no reason to evaluate other options until things actually got painful.

To be clear, there have been points where the pain has come close to inspiring me at least to see what else is out there.  Some examples come to mind:

  1. If you stop or reboot a running instance (which obviously stops your production instance), you’re required to confirm your intention.  If you create a new machine image from a running instance (which, not at all obviously, also stops your instance), there’s no warning.
  2. If you use the Amazon Web Services console to manage your various tools, you’re shown only the obscure initials of the services: EC2, SES, S3.  If you try to manage the administrative logs, you’re shown only the fully spelled-out service names.
  3. Meeting all the recommended security points on their checklist requires that you turn off the default login.  But if you already have a retail account connected to your AWS account (which is encouraged, and can’t be separated), then you must use the default login.

The pain arrived today.  According to AWS billing records, my otherwise innocent micro instance spent several days last month spewing obscene amounts of data, for an unknown reason, to an unknown destination, racking up a huge bill.  Chances are this was something I could have done something about, but there’s little evidence immediately available even to corroborate that the data actually transferred.  I haven’t yet submitted a ticket to Amazon to see if there’s anything they can do to, at a minimum, explain what happened.

In any case, this has inspired me to evaluate deploying my software on other platforms.  It’s certainly advantageous to at least be very clear on the extent to which you’re committed to a vendor.

I’ve begun separating the actual requirements for the services I use from the niceties that AWS has been providing.  To wit:

  • SSH keys to access from any terminal and SFTP service

Niceties from AWS I’ll probably miss:

  • EC2 (instance) roles
  • AWS command line tools to talk to S3

Niceties from Google I might learn to appreciate:

  • Save money on instances that stay up without having to pay for reserved instances
  • Customizable instance sizes
  • Automatic detailed monitoring stats

Here are some existing comparison articles that have been useful:

http://cloudacademy.com/blog/ec2-vs-google-compute-engine/