Scalability Rule 39: Feature Flags

In the book Scalability Rules: Principles for Scaling Websites (Abbott & Fisher), Chapter 9 (Design for Fault Tolerance and Graceful Failure) includes Rule 39, “Ensure You Can Wire On and Off Functions.” Over time, my attitude toward this capability has morphed from thinly disguised disdain to begrudging acceptance to admiration. If I could come around to this rule, readers who are less technically inclined should take heart.

“Wire it off and ship it”

I first encountered the concept of wire on/wire off in 2002 while working in QA. Features that were incomplete, or that could not meet the release criteria for full production, were at times shipped anyway after being wired off. At that time, the development organization was incented on how much code it wrote, with minimal attention paid to function or availability. “Wire it off and ship it” was not a popular instruction with the QA team. To free this rule from its negative connotations, we shall henceforth call it feature flags.

What is a feature flag?

At its simplest, a feature flag is a framework by which portions of functionality (features) can be turned on or off. The flag can be toggled before or after the code is released, for any of several reasons.
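At its smallest, that framework can be little more than a lookup from feature name to on/off. The sketch below is a minimal illustration, assuming an in-memory dictionary; a real system would keep flags somewhere operators can change them without a deploy. The class `FeatureFlags` and the flag name `new_banner` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureFlags:
    """A minimal in-memory flag store: feature name -> on/off."""
    flags: dict[str, bool] = field(default_factory=dict)

    def is_enabled(self, name: str, default: bool = False) -> bool:
        # Unknown flags fall back to a safe default (off unless told otherwise).
        return self.flags.get(name, default)

    def set(self, name: str, enabled: bool) -> None:
        # Toggling can happen before or after the code ships.
        self.flags[name] = enabled


def render_homepage(flags: FeatureFlags) -> str:
    # "new_banner" is a hypothetical feature used for illustration.
    if flags.is_enabled("new_banner"):
        return "homepage with new banner"
    return "homepage"


flags = FeatureFlags()
print(render_homepage(flags))  # homepage
flags.set("new_banner", True)
print(render_homepage(flags))  # homepage with new banner
```

The code path for the new banner ships either way; only the flag decides whether customers see it.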

When should a feature flag be used?

The primary use of feature flags is to remove a feature from customer use, following the architectural principles of fault isolation and designing for graceful failure. This is a reaction to a poor customer experience, such as a significant production bug. Let’s call this the reactive response to badness.

A secondary use for feature flags is to control the exposure of new functionality to customers – a planned gradual release to targeted customer groups. Let’s call this the guinea pig technique.

A tertiary use of feature flags is to control access to features as they are purchased. If your product is modular and customers can choose how much functionality they buy and when, feature flags can be a straightforward way to enable access to newly purchased modules. In a SaaS (Software as a Service) paradigm, the code is always there; the feature flag controls which customers can access which features. Let’s call this show me the money.
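In the SaaS case, the “flag” is effectively a per-customer entitlement check. Here is a minimal sketch, assuming entitlements live in a simple mapping from customer ID to purchased modules; the names `ENTITLEMENTS`, `can_access`, `purchase`, and the customer and module names are all hypothetical.

```python
# Hypothetical mapping of customer ID -> modules that customer has purchased.
ENTITLEMENTS: dict[str, set[str]] = {
    "acme": {"reporting", "billing"},
    "globex": {"reporting"},
}


def can_access(customer_id: str, module: str) -> bool:
    """Return True if the customer is entitled to the module.

    Every module's code is deployed for everyone; this check alone decides
    who can use what.
    """
    return module in ENTITLEMENTS.get(customer_id, set())


def purchase(customer_id: str, module: str) -> None:
    # Buying a module flips the flag for that customer; no deploy is needed.
    ENTITLEMENTS.setdefault(customer_id, set()).add(module)


print(can_access("globex", "billing"))  # False
purchase("globex", "billing")
print(can_access("globex", "billing"))  # True
```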

A quaternary use for feature flags is to release partially completed features whilst maintaining the integrity of the CI/CD pipeline. If a development team has completed its portion of new functionality but an adjacent team has not, use a feature flag to conceal the full functionality until all work is ready for use. The team that has done its part moves on to its next task, confident that the automated testing in its CI/CD pipeline has done its job and its code is ready for use.
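A sketch of what that can look like, assuming a hypothetical “loyalty_points” feature where one team’s half is merged and the other’s is not. The pipeline tests both flag states, so the merged code stays covered while it is still hidden from customers. The function, flag name, and 1-point-per-dollar rule are illustrative assumptions.

```python
def order_confirmation(order_total: float, flags: dict[str, bool]) -> list[str]:
    lines = [f"Order total: ${order_total:.2f}"]
    # Team A's half of the hypothetical "loyalty_points" feature is finished
    # and merged; Team B's half is not, so this output stays dark behind the
    # flag until both halves are ready.
    if flags.get("loyalty_points", False):
        points = int(order_total)  # assumption: 1 point per whole dollar
        lines.append(f"You earned {points} loyalty points")
    return lines


# CI exercises both flag states, so the merged code is tested even while dark.
def test_flag_off() -> None:
    assert order_confirmation(42.5, {}) == ["Order total: $42.50"]


def test_flag_on() -> None:
    assert order_confirmation(42.5, {"loyalty_points": True}) == [
        "Order total: $42.50",
        "You earned 42 loyalty points",
    ]


test_flag_off()
test_flag_on()
```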

Reactive Response to Badness

Everything breaks eventually. Having the option of using feature flags to respond to incidents is a far better place to be than forcing your customers to endure a functionality outage while you toil away to “fix forward”. Protecting the customer experience improves customer retention and reduces revenue loss associated with functionality incidents.

Feature flags can also be used to isolate problematic calls to external third-party services or APIs. You might not have written the code, but you are still accountable for the customer experience.
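A minimal kill-switch sketch for that situation, assuming a hypothetical shipping-estimate vendor. When operators turn the flag off, the code skips the external call entirely and returns a generic answer instead of surfacing the vendor’s failure. `FLAGS`, `shipping_estimate`, `flaky_vendor`, and the fallback text are all illustrative.

```python
from typing import Callable

# Hypothetical flag store; in production it would live outside the process
# so operators can flip it during an incident.
FLAGS = {"shipping_estimates_api": True}


def shipping_estimate(zip_code: str, call_vendor: Callable[[str], str]) -> str:
    """Return a vendor estimate, or a graceful default when the flag is off."""
    if not FLAGS.get("shipping_estimates_api", False):
        # Flag off: do not call the third party at all.
        return "Estimated delivery: 3-7 business days"
    return call_vendor(zip_code)


def flaky_vendor(zip_code: str) -> str:
    # Stand-in for an external API that is currently misbehaving.
    raise TimeoutError("vendor timed out")


# During an incident, an operator turns the flag off:
FLAGS["shipping_estimates_api"] = False
print(shipping_estimate("10001", flaky_vendor))  # Estimated delivery: 3-7 business days
```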

The Guinea Pig Technique

Using feature flags to control access to new features is a smart way to gain market feedback while minimizing risk to the customer experience. Two common approaches:

Some customers may have an early adopter mindset and volunteer to help test new features. This can become tricky if these willing coconspirators think they now own your product roadmap.

Companies that offer both free and paid services have tested new features on their free customers.
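Both approaches can be combined in one targeting rule. The sketch below is one possible policy, not a prescribed one: volunteers always see the feature, a fixed percentage of free-tier users see it, and paid users wait for general release. Hashing the user and feature names keeps each user in the same bucket on every request. The function names and user fields (`early_adopter`, `plan`, `id`) are hypothetical.

```python
import hashlib


def in_rollout(user_id: str, feature: str, percent: int) -> bool:
    """Deterministically place a user in a 0-99 bucket for this feature."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) % 100
    return bucket < percent


def sees_new_feature(user: dict, feature: str, percent: int) -> bool:
    # Volunteers (early adopters) always get the new feature.
    if user.get("early_adopter"):
        return True
    # Free-tier users are sampled: roughly `percent` percent of them.
    if user.get("plan") == "free":
        return in_rollout(user["id"], feature, percent)
    # Paid users wait until general release.
    return False
```

Raising `percent` widens the free-tier audience gradually; a user who was in at 10 percent stays in at 20 percent because the bucket never changes.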

Show Me the Money

Feature flags can also control changes in access as customers add modules to their plate or drop them from it. A successful marketing campaign can include “free previews” to give customers a taste of new functionality in hopes they like it enough to buy it. Ever see a rotund child turn down free cake? How about a DevOps engineer and shiny new K8s features?
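A free preview is a flag with an end date. Here is a minimal sketch, assuming each customer record holds a set of purchased modules plus a mapping of preview modules to expiry dates; `has_access` and the module names and dates are hypothetical.

```python
from datetime import date


def has_access(purchased: set[str], previews: dict[str, date],
               module: str, today: date) -> bool:
    """Purchased modules are always on; preview access ends after its expiry."""
    if module in purchased:
        return True
    expires = previews.get(module)
    return expires is not None and today <= expires


purchased = {"reporting"}
previews = {"forecasting": date(2024, 6, 30)}  # hypothetical preview window

print(has_access(purchased, previews, "forecasting", date(2024, 6, 15)))  # True
print(has_access(purchased, previews, "forecasting", date(2024, 7, 1)))   # False
```

If the customer buys during the preview, adding the module to `purchased` keeps access on after the date passes.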

When are feature flags the wrong choice?

Feature flags may be both a floor wax and a dessert topping, but they are not the cure for everything.

Critical path functionality (AuthN, AuthZ, add to cart, checkout) is better served by a highly available (HA), fault-isolated, heavily monitored architecture than by a switch that turns it off. Yes, guest checkout could serve as a fallback for some businesses, but certainly not all.

It is critical that your team has thought through what happens when a feature is flagged off. What is the customer experience? A 404 response is not what right looks like. A cached response built from earlier data, or a “maintenance in progress” message, would be better.
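One way to encode that decision, sketched under assumptions: the handler returns an HTTP status and body, serves the last good result marked as stale when the flag is off, and otherwise returns 503 Service Unavailable (the standard status for a temporary outage) with a maintenance message. `compute_recommendations`, `recommendations_response`, and the in-memory cache are hypothetical.

```python
# Hypothetical stand-in for the real, flag-protected feature.
def compute_recommendations(user_id: str) -> list[str]:
    return [f"item-for-{user_id}"]


def recommendations_response(flag_on: bool, cache: dict[str, list[str]],
                             user_id: str) -> tuple[int, dict]:
    """Return (HTTP status, body) for a recommendations widget."""
    if flag_on:
        items = compute_recommendations(user_id)
        cache[user_id] = items  # remember the last good answer
        return 200, {"items": items}
    if user_id in cache:
        # Flag off: serve the earlier result and mark it as possibly stale.
        return 200, {"items": cache[user_id], "stale": True}
    # Nothing cached: a clear, temporary maintenance message instead of a 404.
    return 503, {"message": "Recommendations are undergoing maintenance. "
                            "Please check back soon."}
```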

Operational impact of toggling feature flags

There is no such thing as a free lunch. Toggling feature flags can have customer impact. In many cases, toggling a flag from on to off entails restarting services and clearing caches, actions customers will notice. If a flag is being toggled for some reason other than isolating a failure, it may make sense to schedule the change for a low-usage period of the business cycle. Always be mindful of the customer experience and the impact your actions can have.
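Caching is one reason a toggle is not instantaneous. The sketch below assumes a hypothetical flag client that caches each flag’s value for a fixed TTL to spare the backing store a lookup on every request. The trade-off is that a toggle is invisible to a given instance until its cached entry expires or the cache is cleared. `CachedFlagClient`, its parameters, and the injectable clock are illustrative assumptions, not any particular product’s API.

```python
import time
from typing import Callable


class CachedFlagClient:
    """Reads flags from a backing store, caching each value for `ttl` seconds."""

    def __init__(self, fetch: Callable[[str], bool], ttl: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch        # hypothetical call to the flag store
        self._ttl = ttl
        self._clock = clock        # injectable so the behavior is testable
        self._cache: dict[str, tuple[bool, float]] = {}

    def is_enabled(self, name: str) -> bool:
        now = self._clock()
        hit = self._cache.get(name)
        if hit is not None and now - hit[1] < self._ttl:
            return hit[0]  # cached, and possibly stale after a toggle
        value = self._fetch(name)
        self._cache[name] = (value, now)
        return value

    def invalidate(self) -> None:
        # The "clear caches" step: the next read goes back to the store.
        self._cache.clear()
```

With a 30-second TTL, a flag turned off in the store can keep reading as on for just under 30 seconds on each instance unless `invalidate()` is called.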

Interested in learning more about feature flags or other scalability rules? Contact us; we’ve been in your shoes and can help lessen the pain.