GROWTH BLOG: What are Microservices?
AKF Partners Logo Technology ConsultingScalability - We wrote the book on it ℠

Growth Blog

Scalability and Technology Consulting Advice for SaaS and Technology Companies

The Circuit Breaker Pattern - Dos and Don'ts

July 8, 2019  |  Posted By: Marty Abbott

Circuit Breaker Pattern Overview

The microservice Circuit Breaker pattern is an automated switch capable of detecting extremely long response times or failures when calling remote services or resources.  The circuit breaker pattern proxies or encapsulates service A making a call to remote service or resource B.  When error rates or response times exceed a desired threshold, the breaker “pops” and returns an appropriate error or message regarding the interface status.  Doing so allows calls to complete more quickly, without tying up TCP ports or waiting for traditional timeouts.  Ideally the breaker is “healing” and senses the recovery of B thereby resetting itself.

Disambiguation

The circuit breaker analogy works well in that it protects a given circuit for calls in series.  Unfortunately, it misses the true analogy of tripping to protect the propagation of a failure to other components on other circuits.  We often use the term circuit breaker in our practice to refer to either the technique of fault isolation or the microservice pattern of handling service to service faults.  In this article, we use the circuit breaker consistent with the microservice meaning.
Microservice Circuit Breaker Overview

Problems the Circuit Breaker Fixes

Generally speaking, we consider service to service calls to be an anti-pattern to be avoided whenever possible due to the multiplicative effect of failure and the resulting lowering of availability.  There are, however, sometimes that you just can't get around making distant calls.  Examples are:

  1. Resource (e.g. database) Calls: Necessary to interact with ACID or NoSQL Solutions.
  2. Third Party Integrations: Necessary to interact with any third party.  While we prefer these to be asynchronous, sometimes they must be synchronous.

In these cases, it makes sense to add a component, such as the circuit breaker, to help make the service more resilient.  While the breaker won't necessarily increase the availability of the service in question, it may help reduce other secondary and tertiary problems such as the inability to access a service for troubleshooting or restoration upon failure.

Principles to Apply

  1. Avoid the need for circuit breakers whenever possible by treating calls in series as an anti-pattern.
  2. When calls must be made in series, attempt to use an asynchronous and non-blocking approach.
  3. Use the circuit breaker to help speed recovery and identification of failure, and free up communication sockets more quickly.

When to use the Circuit Breaker Pattern

  • Useful for calls to resources such as databases (ACID or BASE).
  • Useful for third party synchronous calls over any distance.
  • When internal synchronous calls can't otherwise be avoided architecturally, useful for service to service calls under your control.

Key Takeaways

The circuit breaker won't fix availability problems resulting from a failed service or resource.  It will make the effects of that failure more rapid which will hopefully:

  • Free up communication resources (like TCP sockets) and keep them from backing up.
  • Help keep shared upstream components (e.g. load balancers and firewalls) from similarly backing up and failing.
  • Help keep the failed component or service accessible for more rapid troubleshooting and alerting.
  • Always ensure to have alerts fired on breaker open situations to help aid in faster time to detect (TTD).

AKF Partners has helped hundreds of companies implement new microservice architectures and migrate existing monolithic products to microservice architectures.  Give us a call – we can help!

 

 

Subscribe to the AKF Newsletter

Contact Us

Microservice Bulkhead Pattern - Dos and Don'ts

June 27, 2019  |  Posted By: Marty Abbott

Bulkhead Pattern Overview

Bulkheads in ships separate components or sections of a ship such that if one portion of a ship is breached, flooding can be contained to that section.  Once contained, the ship can continue operations without risk of sinking.  In this fashion, ship bulkheads perform a similar function to physical building firewalls, where the firewall is meant to contain a fire to a specific section of the building.

The microservice bulkhead pattern is analogous to the bulkhead on a ship.  By separating both functionality and data, failures in some component of a solution do not propagate to other components.  This is most commonly employed to help scale what might be otherwise monolithic datastores.  The bulkhead is then a pattern for implementing the AKF principle of “swimlanes” or fault isolation.

Bulkhead pattern usage


Problems the Bulkhead Pattern Fixes

The bulkhead pattern helps to fix a number of different quality of service related issues.

  • Propagation of Failure:  Because solutions are contained and do not share resources (storage, synchronous service-to-service calls, etc), their associated failures are contained and do not propagate.  When a service suffers a programmatic (software) or infrastructure failure, no other service is disrupted.
  • Noisy Neighbors:  If implemented properly, network, storage and compute segmentation ensure that abnormally large resource utilization by a service does not affect other services outside of the bulkhead (fault isolation zone).
  • Unusual Demand:  The bulkhead protects other resources from services experiencing unpredicted or unusual demand.  Other resources do not suffer from TCP port saturation, resulting database deterioration, etc.

Principles to Apply

  1. Share Nearly Nothing:  As much as possible, services that are fault isolated or placed within a bulkhead should not share databases, firewalls, storage, load balancers, etc.  Budgetary constraints may limit the application of unique infrastructure to these services.  The following diagram helps explain what should never be shared, and what may be shared for cost purposes.  The same principles apply, to the extent that they can be managed, within IaaS or PaaS implementations.
  2. Bulkhead pattern usage
  3. Avoid synchronous calls to other services:  Service to service calls extend the failure domain of a bulkhead.  Failures and slowness transit blocking synchronous calls and therefore violate the protection offered by a bulkhead.

Put another way, the dimensions of a bulkhead or failure domain is the largest boundary across which no critical infrastructure is shared, and no synchronous inter-service calls exist. 

Anti-Patterns to Avoid

The following anti-patterns each rely on either synchronous service to service communication or sharing of data solutions.  As such, they represent solutions that should not be present within a bulkhead.

When to use the Bulkhead Pattern

  • Apply the bulkhead pattern whenever you want to scale a service independent of other services.
  • Apply the bulkhead pattern to fault isolate components of varying risk or availability requirements.
  • Apply the bulkhead pattern to isolate geographies for the purposes of increased speed/reduced latency such that distant solutions do not share or communicate and thereby slow response times.

AKF Partners has helped hundreds of companies implement new microservice architectures and migrate existing monolithic products to microservice architectures.  Give us a call – we can help!

Subscribe to the AKF Newsletter

Contact Us

Command and Query Responsibility Segregation (CQRS): Dos and Don’ts

June 24, 2019  |  Posted By: Marty Abbott

CQRS Overview

The microservice CQRS pattern is most commonly employed to help scale what might be otherwise monolithic datastores.  Per the X-axis of the AKF Scale Cube, a write instance of a datastore receives all creates, updates and deletes (collectively the “commands” within CQRS) while one or more read instances receive reads (the “queries” within CQRS).

Simple Implementations

Most relational (ACID compliant) databases have asynchronous and eventually consistent transfer mechanisms available to create “replicants” or “replica sets” of a “master” database.  These are often called master databases (the write database) and slave databases (the read databases).  The easiest implementation for CQRS then is to rely on native replication technology to create one or more slave databases with the same schema.  The service is then separated into a write service and a read service, each with its own endpoint connected to the appropriate database.

Command and Query Responsibility Segregation Pattern

Many NoSQL solutions also offer similar capabilities.  Attribute names vary by implementation, but the user may identify the number of copies or replica sets for any piece of information.  Further, the user can also often identify how these sets are to be used.  MongoDB, for instance, allows users to identify from which elements of a replica set a read should be performed as well as the level of “synchronicity” between a write and the remaining read elements.  Ideally, this level of synchronicity should be as loose as possible to maximize the value of BASE and avoid the limitations of Brewer’s Cap Theorem.

Advanced Implementations

Highly tuned solutions, or solutions that need to support significant transaction volumes, may benefit from differentiation between the write and read schema instance.  There may be several reasons for this, including:

  • A subset of data that needs to be written compared to that which needs to be read.  For instance, read data may include “static” elements that add no value to the (C)reat, (U)pdate or (D)elete process.
  • An elimination of certain relationships in the read or write schema where those relationships add no value.  Read databases, for instance, may need more relationships than the write in order to perform complex reporting.
  • A reduction or significant difference in indices between those needed for writes (ideally small given that each index needs to be updated for each write, thereby increasing write response time) and those needed for reads.

In these cases, native replication capabilities in either ACID (relational) or BASE (NoSQL) datastores may not fulfill the desired outcomes.  Engineers may need to rely on eventually consistent transfer technologies including queues or buses along with transformation compute capacity. 

Another somewhat “advanced” concept within relational solutions is to use “Master-Master” replication technology as “Master-Slave”.  At AKF, we prefer not to rely on multi-master solutions for distributing writes across multiple database instances.  Very often, “in doubt” transactions can cause a pinging effect against multiple master databases and sometimes result in either starvation or deadlock (the Dining Philosopher’s Problem). 

But if you implement a multi-master replication solution and use it as master-slave we avoid the concerns over transactional coherence and consistency with a single write database.  Should that database fail, however, we can easily “swing” future transactions to the other master previously operating as a slave with a very low probability of conflict.

Benefits of CQRS

  • Scalability of transactions at low development cost and comparatively low conceptual complexity (conceptual cost).
  • Availability/redundancy of solutions.
  • Distribution of X-axis (read copies) geographically for lower latency on read request.

Drawbacks to CQRS

  • Multiple copies of data for scale, leading to potentially higher costs of goods sold
  • Higher number of failures in a production environment (but typically with increasing levels of customer perceived availability).
  • Reads are very often “eventually consistent” as opposed to immediately consistent.  This approach will work for greater than 99.9% of use cases in our experience.

How to Use CQRS

  • Split writes (commands) from reads (queries) both in actions/methods/services and the datastores they use.
  • Implement an eventual consistency solution between the datastores.
  • If implementing more than one read instance, implement a proxy or load balancer with a single service endpoint to “spray” requests across all instances.

What to NEVER do with CQRS

Never employ multiple “write masters” or “command masters” with CQRS where each master shares ownership of the same data elements.

AKF Partners has helped hundreds of companies implement new microservice architectures and migrate existing monolithic products to microservice architectures.  Give us a call – we can help!

Subscribe to the AKF Newsletter

Contact Us

Strangler Pattern: Dos and Don’ts

June 12, 2019  |  Posted By: Marty Abbott

Strangler Pattern Overview

The microservice Strangler pattern is employed when teams migrate functionality from an old solution to one or more new implementations.  While the pattern can be used for any old to new service migration, the diagram below depicts the common use case of migrating from a monolithic solution to a microservice implementation:

Depiction of the strangler pattern and progression over time during migration

The solution above starts with a single monolithic application.  Step one is to implement a proxy (or context switch) that can separate requests by type (Y-Axis or Service Segmentation), or if attempting to evaluate the solution for efficacy against an existing implementation, the X-Axis by transaction volume. 
The pattern progresses with the monolith growing increasingly smaller as newly implemented services begin to take requests previously bound for the monolith. 
Finally, the proxy is removed once the service disaggregation is completed.

Benefits of Strangler

  • The Strangler pattern allows for graceful migration from a service to one or more replacement services.
  • If implemented properly, with the ability to “roll back”, the pattern allows relatively low risk in migrating to new service(s).
  • The pattern can be used for versioning of APIs.
  • Similar to versioning above, the pattern can be used for legacy interactions (the old service remains for solutions that aren’t or won’t be upgraded).

Drawbacks to Strangler

  • If implemented with a “new” or additional service – something in addition to a prior proxy or load balancer, the solution decreases availability through the multiplicative effect of failure.
  • Per above, additional (new) services will increase latency.

How to Use Strangler

  • Use the Strangler for versioning or migration to new services.
  • Employ within an existing proxy – either a software-based load balancer (such as Nginx) or a layer 7 capable load balancer/context switch (F5 or Netscaler).
  • If using an existing proxy is not possible, ensure the client is responsible for choosing the resource.
  • Keep rules updated as services are migrated.
  • Prune rules as no longer needed.

What to NEVER do with Strangler

Never employ a new service and increase the call depth with Strangler.  Such a service is often called a “façade”.  Your existing proxy or load balancer solutions likely give you this flexibility today.

Facade anti-pattern used with Strangler Microservice Pattern

Never allow the solution to become a bottleneck or a single point of failure.  Always ensure that the solution is scalable along the X-axis for both availability and scalability.

X-Axis Approach

The X-Axis approach takes a percentage of calls for any unique service endpoint and splits them between the old service and the new service.  This approach is useful to A/B test the solution for end-user efficacy, response time, availability, and cost of operations (cost per transaction).  It also allows for graceful “dialing up” or “dialing down” of the transaction volume (say from 1% to 51%).

Y-Axis Approach

This is the traditional usage for Strangler.  It takes a monolith servicing N unique capabilities and partitions them along either verb/service boundaries (e.g. checkout separated from everything else) or noun/resource boundaries (e.g. catalog related actions from customer data actions).

Best Approach

If time allows, implement both the X and Y approach.  X gives you rollback as well as A/B testing and significantly reduces your risk while Y allows service disaggregation.
AKF Partners has helped hundreds of companies implement and improve microservice based architectures.  Give us a call, we can help you with your transition.

Subscribe to the AKF Newsletter

Contact Us

Sidecar Pattern: The Dos and Don’ts

June 5, 2019  |  Posted By: Marty Abbott

Man on motorcycle with dog in sidecar Shutterstock Image 1084641965 purchased by AKF Partners June 5 2019

Sidecar Pattern Overview

The Sidecar Pattern is meant to allow the deployment of components of an application into separate, isolated, and encapsulated processes or containers.  This pattern is especially useful when there is a benefit to sharing common components between microservices, as in the case of logging utilities, monitoring utilities, configuration routines, etc.

The Sidecar Pattern name is an analogy indicating the one-seat cars sometimes bolted alongside a motorcycle. 

Benefits of Sidecar

Sidecar comes with many benefits:

  • Use of multiple languages (polyglot) or technologies for each component.  This is especially useful if a language is especially strong in a necessary area (e.g. Python for Machine Learning, or R for statistical work) or if an opensource solution can be leveraged to eliminate in-house specialization (e.g. the use of NGINX for certain network-related functions).
  • Separation of what would otherwise be a monolith, and if used properly, fault isolation of associated services.
  • Conceptually easy interactions between components similar to those provided by libraries, or service calls between microservices.
  • Lower latency than traditional service calls to “other” services as the Sidecar lives in the same processing environment (VM or physical server) – albeit typically in a separate container.
  • Similar to the use of libraries, allows for ownership by individual teams and organizational scalability of a larger team.
  • Similar to the use of dynamically-loadable libraries, allows for independent release by teams of various shared usage components.

Drawbacks to Sidecar

Regardless of implementation (poly- or mono- glot), Sidecar has some drawbacks compared to the use of libraries:

  • Higher inter-process communication latency – Because most implementations are service calls, the loopback interface on a system (127.0.0.1) will increase latency compared to the transition of call flow through memory.
  • Size – especially in polyglot implementations but even in monoglot implementations – Containerization leads to multiple copies of similar libraries and increased memory utilization for comparable operations relative to the use of libraries.
  • Environments – it is difficult to create any notion of fault isolation with Sidecar without containerization technologies.  VM technologies (Sidecar in a VM separate from the host or calling solution) is not an option as it is then a Fan Out or Mesh anti-pattern rather than a local call.

When to Use Sidecar

Sidecar is a compelling alternative to libraries for cases where the increase in latency associated with local service messaging does not impact end-user response times.  Examples of these are asynchronous logging, out of band monitoring, and asynchronous messaging capabilities.  Circuit breakers (time-based request/response timeouts) are also a good example of a Sidecar implementation.

When to Avoid Sidecar

Never use a Sidecar Pattern for synchronous activities that must complete prior to generating a user response.  Doing so will add some delay to end-user response times.

AKF also advises staying away from Sidecar for synchronous communications between services where doing so requires Sidecar to know all endpoints for each service.  A specific example we advise against is having every endpoint (instance) of Service A (e.g. add-to-cart) know of every endpoint (instance) of Service B (e.g. decrement-SKU).  A graphic of this example is given below:

Sidecar is useful for several components but do not use it for allowing every endpoint to communicate to every other endpoint

The above graphic indicates the coordination between just two services and the instances that comprise that service.  Imagine a case where all services may communicate to each other (as in the broader Mesh anti-pattern).  Attempting to isolate faults becomes nearly impossible.

If Service A sometimes fails while calling Service B, how do you know which component is failing?  Is it a failure in Service A, Service A’s Sidecar Proxy or Service B?  Easier is to have a fewer number of proxies (albeit at a higher cost of latency given non-local communication) handle the transactions allowing for easier fault identification.

AKF Partners has helped hundreds of companies move from monolithic solutions to services and microservice architectures.  Give us a call, we can help you with your transition.

Subscribe to the AKF Newsletter

Contact Us

Microservice Anti-Pattern: The Service Mesh

May 8, 2019  |  Posted By: Marty Abbott

This article is the sixth in a multi-part series on microservices (micro-services) anti-patterns.  The introduction of the first article, Service Calls In Series, covers the benefits of splitting services (as in the case of creating a microservice architecture), many of the mistakes or failure points teams create in services splits.  Articles two and three cover anti-patterns for service and data fan out respectively.  The fourth article covers an anti-pattern for disparate services sharing a common service deployment using the fuse metaphor.  The fifth article expands the fuse metaphor from service fuses to data fuses.

Howard Anton, the author of my college Calculus textbook, was fond of the following phrase:  “It should be intuitively obvious to the casual observer….”.  The clause immediately following that phrase was almost inevitably something that was not obvious to anyone – probably not even the author.  Nevertheless, the phrase stuck with me, and I think I finally found a place where it can live up to its promise. The Service Mesh, the topic of this microservice anti-pattern, is the amalgamation of all the anti-patterns to date.  It contains elements of calls in series, fuses and fan out.  As such, it follows the rules and availability problems of each of those patterns and should be avoided at all costs. 

This is where I need to be very clear, as I’m aware that the Service Mesh has a very large following.  This article refers to a mesh as a grouping of services with request/reply relationships.  Or, put another way, a “Mesh” is any solution that violates repeatedly the anti-patterns of “tree lights”, “fuses” or “fan out”.  If you use “mesh” to mean a grouping of services that never call each other, you are not violating this anti-pattern.

What constitutes a service mesh?

What is NOT a service mesh?

The reason mesh patterns are a bad idea are many-fold:

1)  Availability:  At the extreme, the mesh is subject to the equation: [N∗(N−1)]/2.  This equation represents the number of edges in a fully connected graph with N vertices or nodes.  Asymptotically, this reduces to N2.  To make availability calculations simple, the availability of a complete mesh can be calculated as the service with the lowest availability (A)^(N*N).  If the lowest availability of a service with appropriate X-axis cloning (multiple instances) is 99.9, and the service mesh has 10 different services, the availability of your service mesh will approximate 99.910.  That’s roughly a 99% availability – perhaps good enough for some solutions but horrible by most modern standards.

2) Troubleshooting:  When every node can communicate with every other node, or when the “connectiveness” of a solution isn’t completely understood, how does one go about finding the ailing service causing a disruption?  Because failures and slowness transit synchronous links, a failure or slowness in one or more services will manifest itself as failures and slowness in all services.  Troubleshooting becomes very difficult.  Good luck in isolating the bad actor.

3) Hygiene:  I recall sitting through computer science classes 30 years ago and hearing the term “spaghetti code”.  These days we’d probably just call it “crap”, but it refers to the meandering paths of poorly constructed code.  Generally, it leads to difficulty in understanding, higher rates of defects, etc.  Somewhere along the line, some idiot has brought this same approach to deployments.  Again, borrowing from our friend Anton, it should be intuitively obvious to the casual observer that if it’s a bad practice in code it’s also a bad practice in deployment architectures.

4) Cost to Fix: If points 1 through 3 above aren’t enough to keep you away from connected service meshes, point 4 will hopefully help tip the scales.  If you implement a connected mesh in an environment in which you require high availability, you will spend a significant amount of time and money refactoring it to relieve the symptoms it will cause.  This amount may approximate your initial development effort as you remove each dependent anti-pattern (series, fuse, fan-out) with an appropriate pattern.


Microservice Anti-Pattern:  The Service Mesh

Fixing a mesh is not an easy task.  One solution is to ensure that no service blocks waiting for a request to complete of any other service.  Unfortunately, this pattern is not always easy or appropriate to implement.

Microservice Anti-Pattern Service Mesh Fix - Async Interactions

Another solution is to deploy each service as service when it is responding to an end user request, and as a library for another service wherever needed.

Microservice Anti-Pattern Service Mesh Fix - Libraries

Finally, you can traverse each service node and determine where services can be collapsed or any of the other patterns identified within the tree light, fuse, or fanout anti-patterns.


AKF Partners helps companies create scalable, fault tolerant, highly available and cost effective architectures to meet their product needs.  Give us a call, we can help

Subscribe to the AKF Newsletter

Contact Us

Microservice Anti-Pattern: Data Fuse

May 8, 2019  |  Posted By: Marty Abbott

This article is the fifth in a multi-part series on microservices (micro-services) anti-patterns.  The introduction of the first article, Service Calls In Series, covers the benefits of splitting services (as in the case of creating a microservice architecture), many of the mistakes or failure points teams create in services splits.  Articles two and three cover anti-patterns for service and data fan out respectively.  The fourth article covers an anti-pattern for disparate services sharing a common service deployment using the fuse metaphor.

The Data Fuse, the topic of this microservice anti-pattern, exists when two or more unique services share a commonly deployed data store.  This data store can be any persistence solution from physical file services, to a common storage area network, to relational (ACID) or NoSQL (BASE) databases.  When the shared data solution “C” fails, service A and B fail as well.  Similarly, when data solution “C” becomes slow, slowness under high demand propagates to services A and B. 

As is the case with any group of services connected in series, Service A’s theoretical availability is the product of its individual availability combined with the availability of data service C.  Service B’s theoretical availability is calculated similarly.  Problems with service A can propagate to service B through the “fused” data element.  For instance, if service A experiences a runaway scenario that completely consumes the capacity of data store C, service B will suffer either severe slowness or will become unavailable. 

Microservices Anti-Pattern - The Data Fuse

The easiest pattern solution for the data fuse is simply to merge the separate services.  This makes the most sense if the services can be owned by the same team.  While availability doesn’t significantly increase (service A can still affect service B, and the data store C still affects both), we don’t have the confusion of two services interacting through a fuse.  But if the rate of change for each service indicates that it needs separate teams, we need to evaluate other options (see ”when to split services”  for a discussion on drivers of services splits.

Data Fuse Microservices Anti-Pattern Fix:  Merge Services

Another way to fix the anti-pattern is to use the X axis of the Scale Cube as it relates to databases. An easy example of this is the sharing of account data between a sign-up service and a sign-in (AUTHN and AUTHZ) service.  In this example, given that sign-up is a write-based service and sign-in is a read based service we can use the X axis of the Scale Cube and split the services on a read and write basis.  To the extent that B must also log activity, it can have separate tables or a separate schema that allows that logging.  Note that the services supporting this split need not be unique - they can in fact be the exact same service - but the traffic they serve is properly segmented such that the read deployment receives only read traffic and the write deployment receives only write traffic.

Data Fuse Microservices Anti-Pattern Fix:  X Axis Read-Write Splits

 

If reads and writes aren’t an easily created X axis split, or if we need the organizational scale engendered by a Y-axis split, we need to be a bit more creative.  An example pattern comes from the differences between add-to-cart and checkout in a commerce solution.  Some functionality is shared between the components, including the notion of showing calculated sales tax and estimated shipping.  Other functionality may be unique, such as heavy computation in add-to-cart for related and recommended items, and up-sale opportunities such as gift wrapping or expedited shipping in checkout.  We also want to keep carts (session data) around in order to reach out to customers who have abandoned carts, but we don’t want this ephemeral clutter clogging the data of checkout.  This argues for separation of data for temporal (response time) reasons.  It also allows us to limit PCI compliance boundaries, removing services (add to cart) from the PCI evaluation landscape.

Data Fuse Microservices Anti-Pattern Fix:  Y Axis Data Split


Transition from add-to-cart to checkout may be accomplished through the client browser, or done as an asynchronous back end transfer of state with the browser polling for completion so as to allow for good fault isolation.  We refactor the datastore to separate data to services along the Y axis of the scale cube

Data Fuse Microservices Anti-Pattern Fix:  Moving Data when necessary for Y Axis Data Split

AKF Partners helps companies create scalable, fault tolerant, highly available and cost-effective architectures to meet their product needs.  Give us a call, we can help.

 

Subscribe to the AKF Newsletter

Contact Us

Microservice Anti-Pattern: Service Fuse

April 27, 2019  |  Posted By: Marty Abbott

This article is the fourth in a multi-part series on microservices (micro-services) anti-patterns.  The introduction of the first article, Service Calls In Series, covers the benefits of splitting services (as in the case of creating a microservice architecture).  Many of the mistakes or failure points teams create in services splits.  Articles two and three cover anti-patterns for service and data fan out respectively. 

The Service Fuse, the topic of this microservice anti-pattern, exists when two or more unique services share a commonly deployed service pool.  When the shared service “C” fails, service A and B fail as well.  Similarly, when service “C” becomes slow, slowness under high demand propagates to services A and B. 

As is the case with any group of services connected in series, Service A’s theoretical availability is the product of its individual availability combined with the availability of service C.  Service B’s theoretical availability is calculated similarly.  Under unusual conditions, the availability of A could also impact B similar to the way in which service fan out works.  Such would be the case if A somehow holds threads for C, thereby starving it of threads to serve B.

Because overall availability is negatively impacted, we consider the Service Fuse to be a microservice anti-pattern.

Microservice Anti-Pattern Sharing a common service deployment


The easiest and most common method to fault isolate the failure and response time propagation of Service C is to deploy it separately (in separate pools) for both Service A and B.  In doing so, we ensure that C does not fail for one service as a result of unusual demand from the other.  We also isolate failures due to unique requests that might be made by either A or B.  In doing so, we do incur some additional operational costs and additional coordination and overhead in releases.  But assuming proper automation, the availability and response time improvements are often worth the minor effort.


Solution to Service Fuse Anti-Pattern - deploy same service separately

As with many of our other anti-patterns we can also employ dynamically loadable libraries rather than separate service deployments.  While this approach has some of the slight overhead (again assuming proper automation) of the above separate service deployments, it often also benefits from significant server-side response time decreases associated with network transit. 

Solution to Service fuse Anti-Pattern - deploy service separately as libraries

We often see teams over emphasizing the cost of additional deployments.  But the separate service deployment or dynamically loadable library deployment seldom results in significantly greater effort.  Splitting the capacity of a shared pool relative to the demand split between services A and B (e.g. 50/50, 90/10, etc) and adding a small number of additional services for capacity is the real implication of such a split.  Is 5 to 10% additional operational cost and seconds of additional deployment time worth the significant increase in availability?  Our experience is that most of the time it is.

Subscribe to the AKF Newsletter

Contact Us

Microservice Anti-Pattern: Data Fan Out

April 21, 2019  |  Posted By: Marty Abbott

This article is the third in a multi-part series on microservices (micro-services) anti-patterns.  The introduction of the first article, Service Calls In Series, covers the benefits of splitting services, many of the mistakes or failure points teams create in services splits and the first anti pattern.  The second article, Service Fan Out discusses the anti-pattern of a single service acting as a proxy or aggregator of mulitple services.

Data Fan Out, the topic of this microservice anti-pattern, exists when a service relies on two or more persistence engines with categorically unique data, or categorically similar data that is not meant to be processed in parallel.  “Categorically Unique” means that the data is in no way related.  Examples of categorical uniqueness would be a database that stores customer data and a separate database that stores catalog data.  Instances of the same data, such as two separate databases each storing half of product catalog, are not categorically unique.  Splitting of similar data is often known as sharding.  Such “sharded” instances only violate the Data Fan Out pattern if:

1) They are accessed in series (database 1 is accessed and subsequently database 2 is accessed) –or-

2) A failure or slowness in either database, even if accessed in parallel, will result in a very slow or unavailable service.

Persistence engine means anything that stores data as in the case of a relational database, a NoSQL database, a persistent off-system cache, etc. 

Anytime a service relies on more than one persistence engine to perform a task, it is subject to lower availability and a response time equivalent to the slower of the N data stores to which it is connected.  Like the Service Fan Out anti-pattern, the availability of the resulting service (“Service A”) is the product of the availability of the service and its constituent infrastructure multiplied by the availability of each N data store to which it is connected. 

Further, the response of the services may be tied to the slowest of the runtime of Service A added to the slowest of the connected solutions.  If any of the N databases become slow enough, Service A may not respond at all. 

Because overall availability is negatively impacted, we consider Data Fan Out to be a microservice anti-pattern.

Microservice Anti-Pattern - Data Fan Out

One clear exception to the Data Fan Out anti-pattern is the highly parallelized querying done of multiple shards for the purpose of getting near linear response times out of large data sets (similar to one component of the MapReduce algorithm).  In a highly parallelized case such as this, we propose that each of the connections have a time-out set to disregard results from slowly responding data sets.  For this to work, the result set must be impervious to missing data.  As an example of an impervious result set, having most shards return for any internet search query is “good enough”.  A search for “plumber near me” returns 19/20ths of the “complete data”, where one shard out of 20 is either unavailable or very slow.  But having some transactions not present in an account query of transactions for a checking account may be a problem and therefore is not an example of a resilient data set.

Our preferred approach to resolve the Data Fan Out anti-pattern is to dedicate services to each unique data set.  This is possible whenever the two data sets do not need to be merged and when the service is performing two separate and otherwise isolatable functions (e.g. “Customer_Lookup” and “Catalog_Lookup”). 

Microservice Anti-Pattern Data Fan Out Solution - Split Service

When data sets are split for scale reasons, as is the case with data sets that have both an incredibly high volume of requests and a large amount of data, one can attempt to merge the queried data sets in the client.  The browser or mobile client can request each dataset in parallel and merge if successful.  This works when computational complexity of the merge is relatively low.

Microservice Anti-Pattern Data Fan Out Solution Client Side Aggregation

When client-side merging is not possible, we turn to the X Axis  of the Scale Cube for resolution.  Merge the data sets within the data store/persistence engine and rely on a split of reads and writes.  All writes occur to a single merged data store, and read replicas are employed for all reads.  The write and read services should be split accordingly and our infrastructure needs to correctly route writes to the write service and reads to the read service.  This is a valuable approach when we have high read to right ratios – fortunately the case in many solutions.  Note that we prefer to use asynchronous replication and allow the “slave” solutions to be “eventually consistent” - but ideally still within a tolerable time frame of milliseconds or a handful of seconds.

Microservice Anti-Pattern Data Fan Out Solution - Scale Cube X Axis Read Write Split


What about the case where a solution may have a high write to read ratio (exceptionally high writes), and data needs to be aggregated?  This rather unique case may be best solved by the Z axis of the AKF Scale Cube, splitting transactions along customer boundaries but ensuring the unification of the database for each customer (or region, or whatever “shard key” makes sense).  As with all Z axis shards, this not only allows faster response times (smaller data segments) but engenders high scalability and availability while also allowing us to put data “closer to the customer” using the service. 

Microservice Anti-Pattern Data Fan Out Solution - Scale Cube Y Axis Customer Split

AKF Partners helps companies create highly available, highly scalable, easily maintained and easily developed microservice architectures.  Give us a call - we can help!

Subscribe to the AKF Newsletter

Contact Us

Microservice Anti-Pattern: Service Fan Out

April 8, 2019  |  Posted By: Marty Abbott

This article is the second in a multi-part series on microservices (micro-services) anti-patterns.  The introduction of the first article, Service Calls In Series, covers the benefits of splitting services, many of the mistakes or failure points teams create in services splits and the first anti pattern.

Fan Out, the topic of this microservice anti-pattern, exists when one service either serves as a proxy to two or more downstream services, or serves as an integration of two subsequent service calls. Any of the services (the proxy/integration service “A”, or constituent services “B” and “C”) can cause a failure of all services.  When service A fails, service B and C clearly can’t be called.  If either service B or C fails or becomes slow, they can affect service A by tying up communication ports.  Ultimately, under high call volume, service A may become unavailable due to problems with either B or C.

Further, the response of the services may be tied to the slowest responding service.  If A needs both B and C to respond to a request (as in the case of integration), then the speed at which A responds is tied to the slowest response times of B and C.  If service A merely proxies B or C, then extreme slowness in either may cause slowness in A and therefore slowness in all calls.

Because overall availability is negatively impacted, we consider Service Fan Out to be a microservice anti-pattern.

Microservice Anti-Pattern Service Fan Out


One approach to resolve the above anti-pattern is to employ true asynchronous messaging between services.  For this to be successful, the requesting service A must be capable of responding to a request without receiving any constituent service responses.  Unfortunately, this solution only works in some cases such as the case where service B is returning data that adds value to service A.  One such example is a recommendation engine that returns other items a user might like to purchase.  The absence of service B responding to A’s request for recommendations is unfortunate but doesn’t eliminate the value of A’s response completely.

Fix to Service Fan Out Anti-Pattern - Async Calls

As was the case with the Calls In Series Anti-Pattern, we may also be able to solve this anti-pattern with ”Libraries for Depth” pattern.

Fix to Service Fan Out Anti-Pattern - Libraries

Of course, each of the libraries also represents a constituent part that may fail for any call – but the number of moving parts for each constituent part decreases significantly relative to a separately deployed service call.  For instance, no network interface is required, no additional host and virtual VM is employed during the call, etc.  Additionally, call latency goes down without network interfaces.

The most common complaint about this pattern is that development teams cannot release independently.  But, as we all know, this problem has been fixed for quite some time with Unix, Linux and Windows dynamically loadable libraries (dlls, dls) and the like.

Finally, we can remove the proxy/integration service into the browser and make multiple browser requests.  Data returned from service B or service C can either be displayed in separate browser frames/divisions or can be evaluated and integrated using browser scripting (e.g. javascript).  We prefer this method whenever possible.  If A is simply serving as a proxy, the solution is relatively simple.  If A was serving as an integration/aggregation service then Service A’s logic must be moved into the browser/client.  Doing so creates complete fault isolation and allows the services to fail independently without an impact on each other.

Fix to Service Fan Out Anti-Pattern - Browser Fan Out

AKF Partners has helped to architect some of the most scalable, highly available, fault-tolerant and fastest response time solutions on the internet.  Give us a call - we can help.

Subscribe to the AKF Newsletter

Contact Us

 1 2 > 

Categories:

Most Popular: