Microservice Bulkhead Pattern - Dos and Don'ts
Bulkhead Pattern Overview
Bulkheads in ships separate components or sections of a ship such that if one portion of a ship is breached, flooding can be contained to that section. Once contained, the ship can continue operations without risk of sinking. In this fashion, ship bulkheads perform a similar function to physical building firewalls, where the firewall is meant to contain a fire to a specific section of the building.
The microservice bulkhead pattern is analogous to the bulkhead on a ship. By separating both functionality and data, failures in some component of a solution do not propagate to other components. This is most commonly employed to help scale what might be otherwise monolithic datastores. The bulkhead is then a pattern for implementing the AKF principle of “swimlanes” or fault isolation.
Problems the Bulkhead Pattern Fixes
The bulkhead pattern helps to fix a number of different quality of service related issues.
- Propagation of Failure: Because solutions are contained and do not share resources (storage, synchronous service-to-service calls, etc), their associated failures are contained and do not propagate. When a service suffers a programmatic (software) or infrastructure failure, no other service is disrupted.
- Noisy Neighbors: If implemented properly, network, storage and compute segmentation ensure that abnormally large resource utilization by a service does not affect other services outside of the bulkhead (fault isolation zone).
- Unusual Demand: The bulkhead protects other resources from services experiencing unpredicted or unusual demand. Other resources do not suffer from TCP port saturation, resulting database deterioration, etc.
Principles to Apply
- Share Nearly Nothing: As much as possible, services that are fault isolated or placed within a bulkhead should not share databases, firewalls, storage, load balancers, etc. Budgetary constraints may limit the application of unique infrastructure to these services. The following diagram helps explain what should never be shared, and what may be shared for cost purposes. The same principles apply, to the extent that they can be managed, within IaaS or PaaS implementations.
- Avoid synchronous calls to other services: Service to service calls extend the failure domain of a bulkhead. Failures and slowness transit blocking synchronous calls and therefore violate the protection offered by a bulkhead.
Put another way, the dimensions of a bulkhead or failure domain is the largest boundary across which no critical infrastructure is shared, and no synchronous inter-service calls exist.
Anti-Patterns to Avoid
The following anti-patterns each rely on either synchronous service to service communication or sharing of data solutions. As such, they represent solutions that should not be present within a bulkhead.
When to use the Bulkhead Pattern
- Apply the bulkhead pattern whenever you want to scale a service independent of other services.
- Apply the bulkhead pattern to fault isolate components of varying risk or availability requirements.
- Apply the bulkhead pattern to isolate geographies for the purposes of increased speed/reduced latency such that distant solutions do not share or communicate and thereby slow response times.
AKF Partners has helped hundreds of companies implement new microservice architectures and migrate existing monolithic products to microservice architectures. Give us a call – we can help!