June 24, 2019 | Posted By: Marty Abbott
The microservice CQRS pattern is most commonly employed to help scale what might be otherwise monolithic datastores. Per the X-axis of the AKF Scale Cube, a write instance of a datastore receives all creates, updates and deletes (collectively the “commands” within CQRS) while one or more read instances receive reads (the “queries” within CQRS).
Most relational (ACID compliant) databases have asynchronous and eventually consistent transfer mechanisms available to create “replicants” or “replica sets” of a “master” database. These are often called master databases (the write database) and slave databases (the read databases). The easiest implementation for CQRS then is to rely on native replication technology to create one or more slave databases with the same schema. The service is then separated into a write service and a read service, each with its own endpoint connected to the appropriate database.
Many NoSQL solutions also offer similar capabilities. Attribute names vary by implementation, but the user may identify the number of copies or replica sets for any piece of information. Further, the user can also often identify how these sets are to be used. MongoDB, for instance, allows users to identify from which elements of a replica set a read should be performed as well as the level of “synchronicity” between a write and the remaining read elements. Ideally, this level of synchronicity should be as loose as possible to maximize the value of BASE and avoid the limitations of Brewer’s Cap Theorem.
Highly tuned solutions, or solutions that need to support significant transaction volumes, may benefit from differentiation between the write and read schema instance. There may be several reasons for this, including:
- A subset of data that needs to be written compared to that which needs to be read. For instance, read data may include “static” elements that add no value to the (C)reat, (U)pdate or (D)elete process.
- An elimination of certain relationships in the read or write schema where those relationships add no value. Read databases, for instance, may need more relationships than the write in order to perform complex reporting.
- A reduction or significant difference in indices between those needed for writes (ideally small given that each index needs to be updated for each write, thereby increasing write response time) and those needed for reads.
In these cases, native replication capabilities in either ACID (relational) or BASE (NoSQL) datastores may not fulfill the desired outcomes. Engineers may need to rely on eventually consistent transfer technologies including queues or buses along with transformation compute capacity.
Another somewhat “advanced” concept within relational solutions is to use “Master-Master” replication technology as “Master-Slave”. At AKF, we prefer not to rely on multi-master solutions for distributing writes across multiple database instances. Very often, “in doubt” transactions can cause a pinging effect against multiple master databases and sometimes result in either starvation or deadlock (the Dining Philosopher’s Problem).
But if you implement a multi-master replication solution and use it as master-slave we avoid the concerns over transactional coherence and consistency with a single write database. Should that database fail, however, we can easily “swing” future transactions to the other master previously operating as a slave with a very low probability of conflict.
Benefits of CQRS
- Scalability of transactions at low development cost and comparatively low conceptual complexity (conceptual cost).
- Availability/redundancy of solutions.
- Distribution of X-axis (read copies) geographically for lower latency on read request.
Drawbacks to CQRS
- Multiple copies of data for scale, leading to potentially higher costs of goods sold
- Higher number of failures in a production environment (but typically with increasing levels of customer perceived availability).
- Reads are very often “eventually consistent” as opposed to immediately consistent. This approach will work for greater than 99.9% of use cases in our experience.
How to Use CQRS
- Split writes (commands) from reads (queries) both in actions/methods/services and the datastores they use.
- Implement an eventual consistency solution between the datastores.
- If implementing more than one read instance, implement a proxy or load balancer with a single service endpoint to “spray” requests across all instances.
What to NEVER do with CQRS
Never employ multiple “write masters” or “command masters” with CQRS where each master shares ownership of the same data elements.
AKF Partners has helped hundreds of companies implement new microservice architectures and migrate existing monolithic products to microservice architectures. Give us a call – we can help!
Subscribe to the AKF Newsletter
June 12, 2019 | Posted By: Marty Abbott
Strangler Pattern Overview
The microservice Strangler pattern is employed when teams migrate functionality from an old solution to one or more new implementations. While the pattern can be used for any old to new service migration, the diagram below depicts the common use case of migrating from a monolithic solution to a microservice implementation:
The solution above starts with a single monolithic application. Step one is to implement a proxy (or context switch) that can separate requests by type (Y-Axis or Service Segmentation), or if attempting to evaluate the solution for efficacy against an existing implementation, the X-Axis by transaction volume.
The pattern progresses with the monolith growing increasingly smaller as newly implemented services begin to take requests previously bound for the monolith.
Finally, the proxy is removed once the service disaggregation is completed.
Benefits of Strangler
- The Strangler pattern allows for graceful migration from a service to one or more replacement services.
- If implemented properly, with the ability to “roll back”, the pattern allows relatively low risk in migrating to new service(s).
- The pattern can be used for versioning of APIs.
- Similar to versioning above, the pattern can be used for legacy interactions (the old service remains for solutions that aren’t or won’t be upgraded).
Drawbacks to Strangler
- If implemented with a “new” or additional service – something in addition to a prior proxy or load balancer, the solution decreases availability through the multiplicative effect of failure.
- Per above, additional (new) services will increase latency.
How to Use Strangler
- Use the Strangler for versioning or migration to new services.
- Employ within an existing proxy – either a software-based load balancer (such as Nginx) or a layer 7 capable load balancer/context switch (F5 or Netscaler).
- If using an existing proxy is not possible, ensure the client is responsible for choosing the resource.
- Keep rules updated as services are migrated.
- Prune rules as no longer needed.
What to NEVER do with Strangler
Never employ a new service and increase the call depth with Strangler. Such a service is often called a “façade”. Your existing proxy or load balancer solutions likely give you this flexibility today.
Never allow the solution to become a bottleneck or a single point of failure. Always ensure that the solution is scalable along the X-axis for both availability and scalability.
The X-Axis approach takes a percentage of calls for any unique service endpoint and splits them between the old service and the new service. This approach is useful to A/B test the solution for end-user efficacy, response time, availability, and cost of operations (cost per transaction). It also allows for graceful “dialing up” or “dialing down” of the transaction volume (say from 1% to 51%).
This is the traditional usage for Strangler. It takes a monolith servicing N unique capabilities and partitions them along either verb/service boundaries (e.g. checkout separated from everything else) or noun/resource boundaries (e.g. catalog related actions from customer data actions).
If time allows, implement both the X and Y approach. X gives you rollback as well as A/B testing and significantly reduces your risk while Y allows service disaggregation.
AKF Partners has helped hundreds of companies implement and improve microservice based architectures. Give us a call, we can help you with your transition.
Subscribe to the AKF Newsletter
June 5, 2019 | Posted By: Marty Abbott
Sidecar Pattern Overview
The Sidecar Pattern is meant to allow the deployment of components of an application into separate, isolated, and encapsulated processes or containers. This pattern is especially useful when there is a benefit to sharing common components between microservices, as in the case of logging utilities, monitoring utilities, configuration routines, etc.
The Sidecar Pattern name is an analogy indicating the one-seat cars sometimes bolted alongside a motorcycle.
Benefits of Sidecar
Sidecar comes with many benefits:
- Use of multiple languages (polyglot) or technologies for each component. This is especially useful if a language is especially strong in a necessary area (e.g. Python for Machine Learning, or R for statistical work) or if an opensource solution can be leveraged to eliminate in-house specialization (e.g. the use of NGINX for certain network-related functions).
- Separation of what would otherwise be a monolith, and if used properly, fault isolation of associated services.
- Conceptually easy interactions between components similar to those provided by libraries, or service calls between microservices.
- Lower latency than traditional service calls to “other” services as the Sidecar lives in the same processing environment (VM or physical server) – albeit typically in a separate container.
- Similar to the use of libraries, allows for ownership by individual teams and organizational scalability of a larger team.
- Similar to the use of dynamically-loadable libraries, allows for independent release by teams of various shared usage components.
Drawbacks to Sidecar
Regardless of implementation (poly- or mono- glot), Sidecar has some drawbacks compared to the use of libraries:
- Higher inter-process communication latency – Because most implementations are service calls, the loopback interface on a system (127.0.0.1) will increase latency compared to the transition of call flow through memory.
- Size – especially in polyglot implementations but even in monoglot implementations – Containerization leads to multiple copies of similar libraries and increased memory utilization for comparable operations relative to the use of libraries.
- Environments – it is difficult to create any notion of fault isolation with Sidecar without containerization technologies. VM technologies (Sidecar in a VM separate from the host or calling solution) is not an option as it is then a Fan Out or Mesh anti-pattern rather than a local call.
When to Use Sidecar
Sidecar is a compelling alternative to libraries for cases where the increase in latency associated with local service messaging does not impact end-user response times. Examples of these are asynchronous logging, out of band monitoring, and asynchronous messaging capabilities. Circuit breakers (time-based request/response timeouts) are also a good example of a Sidecar implementation.
When to Avoid Sidecar
Never use a Sidecar Pattern for synchronous activities that must complete prior to generating a user response. Doing so will add some delay to end-user response times.
AKF also advises staying away from Sidecar for synchronous communications between services where doing so requires Sidecar to know all endpoints for each service. A specific example we advise against is having every endpoint (instance) of Service A (e.g. add-to-cart) know of every endpoint (instance) of Service B (e.g. decrement-SKU). A graphic of this example is given below:
The above graphic indicates the coordination between just two services and the instances that comprise that service. Imagine a case where all services may communicate to each other (as in the broader Mesh anti-pattern). Attempting to isolate faults becomes nearly impossible.
If Service A sometimes fails while calling Service B, how do you know which component is failing? Is it a failure in Service A, Service A’s Sidecar Proxy or Service B? Easier is to have a fewer number of proxies (albeit at a higher cost of latency given non-local communication) handle the transactions allowing for easier fault identification.
AKF Partners has helped hundreds of companies move from monolithic solutions to services and microservice architectures. Give us a call, we can help you with your transition.
Subscribe to the AKF Newsletter
May 29, 2019 | Posted By: Robin McGlothin
VMs vs Containers
Inefficiency and down time have traditionally kept CTO’s and IT decision makers up at night. Now, new challenges are emerging driven by infrastructure inflexibility and vendor lock-in, limiting Technology more than ever and making strategic decisions more complex than ever. Both VMs and containers can help get the most out of available hardware and software resources while easing the risk of vendor lock-in limitation.
Containers are the new kids on the block, but VMs have been, and continue to be, tremendously popular in data centers of all sizes. Having said that, the first lesson to learn, is containers are not virtual machines. When I was first introduced to containers, I thought of them as light weight or trimmed down virtual instances. This comparison made sense since most advertising material leaned on the concepts that containers use less memory and start much faster than virtual machines – basically marketing themselves as VMs. Everywhere I looked, Docker was comparing themselves to VMs. No wonder I was a bit confused when I started to dig into the benefits and differences between the two.
As containers evolved, they are bringing forth abstraction capabilities that are now being broadly applied to make enterprise IT more flexible. Thanks to the rise of Docker containers it’s now possible to more easily move workloads between different versions of Linux as well as orchestrate containers to create microservices. Much like containers, a microservice is not a new idea either. The concept harkens back to service-oriented architectures (SOA). What is different is that microservices based on containers are more granular and simpler to manage. More on this topic in a blog post for another day!
If you’re looking for the best solution for running your own services in the cloud, you need to understand these virtualization technologies, how they compare to each other, and what are the best uses for each. Here’s our quick read.
VM’s vs. Containers – What’s the real scoop?
One way to think of containers vs. VMs is that while VMs run several different operating systems on one server, container technology offers the opportunity to virtualize the operating system itself.
Figure 1 – Virtual Machine Figure 2 - Container
VMs help reduce expenses. Instead of running an application on a single server, a virtual machine enables utilizing one physical resource to do the job of many. Therefore, you do not have to buy, maintain and service several servers. Because there is one host machine, it allows you to efficiently manage all the virtual environments with a centralized tool – the hypervisor. The decision to use VMs is typically made by DevOps/Infrastructure Team. Containers help reduce expenses as well and they are remarkably lightweight and fast to launch. Because of their small size, you can quickly scale in and out of containers and add identical containers as needed.
Containers are excellent for Continuous Integration and Continuous Deployment (CI/CD) implementation. They foster collaborative development by distributing and merging images among developers. Therefore, developers tend to favor Containers over VMs. Most importantly, if the two teams work together (DevOps & Development) the decision on which technology to apply (VMs or Containers) can be made collaboratively with the best overall benefit to the product, client and company.
What are VMs?
The operating systems and their applications share hardware resources from a single host server, or from a pool of host servers. Each VM requires its own underlying OS, and the hardware is virtualized. A hypervisor, or a virtual machine monitor, is software, firmware, or hardware that creates and runs VMs. It sits between the hardware and the virtual machine and is necessary to virtualize the server.
IT departments, both large and small, have embraced virtual machines to lower costs and increase efficiencies. However, VMs can take up a lot of system resources because each VM needs a full copy of an operating system AND a virtual copy of all the hardware that the OS needs to run. This quickly adds up to a lot of RAM and CPU cycles. And while this is still more economical than bare metal for some applications this is still overkill and thus, containers enter the scene.
Benefits of VMs
• Reduced hardware costs from server virtualization
• Multiple OS environments can exist simultaneously on the same machine, isolated from each other.
• Easy maintenance, application provisioning, availability and convenient recovery.
• Perhaps the greatest benefit of server virtualization is the capability to move a virtual machine from one server to another quickly and safely. Backing up critical data is done quickly and effectively because you can effortlessly create a replication site.
Popular VM Providers
• VMware vSphere ESXi, VMware has been active in the virtual space since 1998 and is an industry leader setting standards for reliability, performance, and support.
• Oracle VM VirtualBox - Not sure what operating systems you are likely to use? Then VirtualBox is a good choice because it supports an amazingly wide selection of host and client combinations. VirtualBox is powerful, comes with terrific features and, best of all, it’s free.
• Xen - Xen is the open source hypervisor included in the Linux kernel and, as such, it is available in all Linux distributions. The Xen Project is one of the many open source projects managed by the Linux Foundation.
• Hyper-V - is Microsoft’s virtualization platform, or ‘hypervisor’, which enables administrators to make better use of their hardware by virtualizing multiple operating systems to run off the same physical server simultaneously.
• KVM - Kernel-based Virtual Machine (KVM) is an open source virtualization technology built into Linux. Specifically, KVM lets you turn Linux into a hypervisor that allows a host machine to run multiple, isolated virtual environments called guests or virtual machines (VMs).
What are Containers?
Containers are a way to wrap up an application into its own isolated ”box”. For the application in its container, it has no knowledge of any other applications or processes that exist outside of its box. Everything the application depends on to run successfully also lives inside this container. Wherever the box may move, the application will always be satisfied because it is bundled up with everything it needs to run.
Containers virtualize the OS instead of virtualizing the underlying computer like a virtual machine. They sit on top of a physical server and its host OS — typically Linux or Windows. Each container shares the host OS kernel and, usually, the binaries and libraries, too. Shared components are read-only. Sharing OS resources such as libraries significantly reduces the need to reproduce the operating system code and means that a server can run multiple workloads with a single operating system installation. Containers are thus exceptionally light — they are only megabytes in size and take just seconds to start. Compared to containers, VMs take minutes to run and are an order of magnitude larger than an equivalent container.
In contrast to VMs, all that a container requires is enough of an operating system, supporting programs and libraries, and system resources to run a specific program. This means you can put two to three times as many applications on a single server with containers than you can with VMs. In addition, with containers, you can create a portable, consistent operating environment for development, testing, and deployment. This is a huge benefit to keep the environments consistent.
Containers help isolate processes through differentiation in the operating system namespace and storage. Leveraging operating system native capabilities, the container isolates process space, may create temporary file systems and relocate process “root” file system, etc.
Benefits of Containers
One of the biggest advantages to a container is the fact you can set aside less resources per container than you might per virtual machine. Keep in mind, containers are essentially for a single application while virtual machines need resources to run an entire operating system. For example, if you need to run multiple instances of MySQL, NGINX, or other services, using containers makes a lot of sense. If, however you need a full web server (LAMP) stack running on its own server, there is a lot to be said for running a virtual machine. A virtual machine gives you greater flexibility to choose your operating system and upgrading it as you see fit. A container by contrast, means that the container running the configured application is isolated in terms of OS upgrades from the host.
Popular Container Providers
1. Docker - Nearly synonymous with containerization, Docker is the name of both the world’s leading containerization platform and the company that is the primary sponsor of the Docker open source project.
2. Kubernetes - Google’s most significant contribution to the containerization trend is the open source containerization orchestration platform it created.
3. Although much of early work on containers was done on the Linux platform, Microsoft has fully embraced both Docker and Kubernetes containerization in general. Azure offers two container orchestrators Azure Kubernetes Service (AKS) and Azure Service Fabric. Service Fabric represents the next-generation platform for building and managing these enterprise-class, tier-1, applications running in containers.
4. Of course, Microsoft and Google aren’t the only vendors offering a cloud-based container service. Amazon Web Services (AWS) has its own EC2 Container Service (ECS).
5. Like the other major public cloud vendors, IBM Bluemix also offers a Docker-based container service.
6. One of the early proponents of container technology, Red Hat claims to be “the second largest contributor to the Docker and Kubernetes codebases,” and it is also part of the Open Container Initiative and the Cloud Native Computing Foundation. Its flagship container product is its OpenShift platform as a service (PaaS), which is based on Docker and Kubernetes.
Uses for VMs vs Uses for Containers
Both containers and VMs have benefits and drawbacks, and the ultimate decision will depend on your specific needs, but there are some general rules of thumb.
• VMs are a better choice for running applications that require all of the operating system’s resources and functionality when you need to run multiple applications on servers or have a wide variety of operating systems to manage.
• Containers are a better choice when your biggest priority is maximizing the number of applications running on a minimal number of servers.
Because of their small size and application orientation, containers are well suited for agile delivery environments and microservice-based architectures. When you use containers and microservices, however, you can easily have hundreds or thousands of components in your environment. You may be able to manually manage a few dozen virtual machines or physical servers, but there is no way you can manage a production-scale container environment without automation. The task of automating and managing a large number of containers and how they interact is known as orchestration.
Scalability of containerized workloads is a completely different process from VM workloads. Modern containers include only the basic services their functions require, but one of them can be a web server, such as NGINX, which also acts as a load balancer. An orchestration system, such as Google Kubernetes is capable of determining, based upon traffic patterns, when the quantity of containers needs to scale out; can replicate container images automatically; and can then remove them from the system.
For most, the ideal setup is likely to include both. With the current state of virtualization technology, the flexibility of VMs and the minimal resource requirements of containers work together to provide environments with maximum functionality.
If your organization is running many instances of the same operating system, then you should look into whether containers are a good fit. They just might save you significant time and money over VMs.
Subscribe to the AKF Newsletter
May 14, 2019 | Posted By: AKF
If AKF Partners had to be known for one thing and one thing only it would be the Scale Cube. An ingenious little model designed for companies to identify how scalable they are and set goals along any of the three axes to make their product more scalable. Based upon the amount of times I have said scalable, or a derivative of the word scale, it should lead you to the conclusion that the AKF Scale Cube is about scale. And you would be right. However, the beauty of the cube is that is also applicable to Security.
The X-Axis is usually the first axis that companies look at for scalability purposes. The concept of horizontal duplication is usually the easiest reach from a technological standpoint, however it tends to be fairly costly. This replication across various tiers (web, application or database) also insulates companies when the inevitable breach does occur. Planning for only security without also bracing for a data breach is a naive approach. With replication across the tiers, and even delayed replication to protect against data corruption, not only are you able to accommodate more customers, you now potentially have a clean copy replicated elsewhere if one of your systems gets compromised, assuming you are able to identify the breach early enough.
One of the costliest issues with a breach is recovery to a secure copy. Your company may take a hit publicity wise, but if you are able to bring your system back up to a clean state, identify the compromise and fix it, then you are can be back on your way to fully operational. The reluctant acceptance that breaches occur is making its way into the minds of people. If you are just open and forthright with them, the publicity issue around a breach tends to be lessened. Showing them that your system is back up, running and now more secure will help drive business in the right direction.
Splitting across services (the Y-Axis) has many benefits beyond just scalability. It provides ownership, accountability and segregation. Although difficult to implement, especially if coming from a monolithic base, the benefits of these micro-services help with security as well. Code bases that communicate via asynchronous calls not only allow a service to fail without a major impact to other services, it creates another layer for a potential intruder to traverse.
Steps that can be implemented to provide a defense in depth of your environment help slow/mitigate attackers. If asynchronous calls are used between micro-services each lateral or vertical movement is another opportunity to be stopped or detected. If services are small enough, then once access is gained threats have less access to data than may be ideal for what they are trying to accomplish.
Segmenting customers based upon similar characteristics (be it geography, spending habits, or even just a random selection) helps to achieve Z-Axis scalability. These pods of customers provide protection from a full data breach as well. Ideally no customer data would ever be exposed, but if you have 4 pods, 25% of customer data is better than 100%. And just like the Y-Axis, these splits aid with isolating attackers into only a subset of your environment. Various governing boards also have different procedures that need to be followed depending upon the nationality of the customer data exposed. If segmented based upon that (eg. EU vs USA) then how you respond to breaches can be managed differently.
Now I Know My X, Y, Z’s
Sometimes security can take a back seat to product development and other functions within a company. It tends to be an afterthought until that fateful day when something truly bad happens and someone gains unauthorized access to your network. Implementing a scalable environment via the AKF Scale Cube achieves a better overall product as well as a more secure one.
If you need assistance in reaching a more scalable and secure environment AKF is capable of helping.
Subscribe to the AKF Newsletter
May 8, 2019 | Posted By: Marty Abbott
This article is the sixth in a multi-part series on microservices (micro-services) anti-patterns. The introduction of the first article, Service Calls In Series, covers the benefits of splitting services (as in the case of creating a microservice architecture), many of the mistakes or failure points teams create in services splits. Articles two and three cover anti-patterns for service and data fan out respectively. The fourth article covers an anti-pattern for disparate services sharing a common service deployment using the fuse metaphor. The fifth article expands the fuse metaphor from service fuses to data fuses.
Howard Anton, the author of my college Calculus textbook, was fond of the following phrase: “It should be intuitively obvious to the casual observer….”. The clause immediately following that phrase was almost inevitably something that was not obvious to anyone – probably not even the author. Nevertheless, the phrase stuck with me, and I think I finally found a place where it can live up to its promise. The Service Mesh, the topic of this microservice anti-pattern, is the amalgamation of all the anti-patterns to date. It contains elements of calls in series, fuses and fan out. As such, it follows the rules and availability problems of each of those patterns and should be avoided at all costs.
This is where I need to be very clear, as I’m aware that the Service Mesh has a very large following. This article refers to a mesh as a grouping of services with request/reply relationships. Or, put another way, a “Mesh” is any solution that violates repeatedly the anti-patterns of “tree lights”, “fuses” or “fan out”. If you use “mesh” to mean a grouping of services that never call each other, you are not violating this anti-pattern.
The reason mesh patterns are a bad idea are many-fold:
1) Availability: At the extreme, the mesh is subject to the equation: [N∗(N−1)]/2. This equation represents the number of edges in a fully connected graph with N vertices or nodes. Asymptotically, this reduces to N2. To make availability calculations simple, the availability of a complete mesh can be calculated as the service with the lowest availability (A)^(N*N). If the lowest availability of a service with appropriate X-axis cloning (multiple instances) is 99.9, and the service mesh has 10 different services, the availability of your service mesh will approximate 99.910. That’s roughly a 99% availability – perhaps good enough for some solutions but horrible by most modern standards.
2) Troubleshooting: When every node can communicate with every other node, or when the “connectiveness” of a solution isn’t completely understood, how does one go about finding the ailing service causing a disruption? Because failures and slowness transit synchronous links, a failure or slowness in one or more services will manifest itself as failures and slowness in all services. Troubleshooting becomes very difficult. Good luck in isolating the bad actor.
3) Hygiene: I recall sitting through computer science classes 30 years ago and hearing the term “spaghetti code”. These days we’d probably just call it “crap”, but it refers to the meandering paths of poorly constructed code. Generally, it leads to difficulty in understanding, higher rates of defects, etc. Somewhere along the line, some idiot has brought this same approach to deployments. Again, borrowing from our friend Anton, it should be intuitively obvious to the casual observer that if it’s a bad practice in code it’s also a bad practice in deployment architectures.
4) Cost to Fix: If points 1 through 3 above aren’t enough to keep you away from connected service meshes, point 4 will hopefully help tip the scales. If you implement a connected mesh in an environment in which you require high availability, you will spend a significant amount of time and money refactoring it to relieve the symptoms it will cause. This amount may approximate your initial development effort as you remove each dependent anti-pattern (series, fuse, fan-out) with an appropriate pattern.
Fixing a mesh is not an easy task. One solution is to ensure that no service blocks waiting for a request to complete of any other service. Unfortunately, this pattern is not always easy or appropriate to implement.
Another solution is to deploy each service as service when it is responding to an end user request, and as a library for another service wherever needed.
Finally, you can traverse each service node and determine where services can be collapsed or any of the other patterns identified within the tree light, fuse, or fanout anti-patterns.
AKF Partners helps companies create scalable, fault tolerant, highly available and cost effective architectures to meet their product needs. Give us a call, we can help
Subscribe to the AKF Newsletter
May 8, 2019 | Posted By: Marty Abbott
This article is the fifth in a multi-part series on microservices (micro-services) anti-patterns. The introduction of the first article, Service Calls In Series, covers the benefits of splitting services (as in the case of creating a microservice architecture), many of the mistakes or failure points teams create in services splits. Articles two and three cover anti-patterns for service and data fan out respectively. The fourth article covers an anti-pattern for disparate services sharing a common service deployment using the fuse metaphor.
The Data Fuse, the topic of this microservice anti-pattern, exists when two or more unique services share a commonly deployed data store. This data store can be any persistence solution from physical file services, to a common storage area network, to relational (ACID) or NoSQL (BASE) databases. When the shared data solution “C” fails, service A and B fail as well. Similarly, when data solution “C” becomes slow, slowness under high demand propagates to services A and B.
As is the case with any group of services connected in series, Service A’s theoretical availability is the product of its individual availability combined with the availability of data service C. Service B’s theoretical availability is calculated similarly. Problems with service A can propagate to service B through the “fused” data element. For instance, if service A experiences a runaway scenario that completely consumes the capacity of data store C, service B will suffer either severe slowness or will become unavailable.
The easiest pattern solution for the data fuse is simply to merge the separate services. This makes the most sense if the services can be owned by the same team. While availability doesn’t significantly increase (service A can still affect service B, and the data store C still affects both), we don’t have the confusion of two services interacting through a fuse. But if the rate of change for each service indicates that it needs separate teams, we need to evaluate other options (see ”when to split services” for a discussion on drivers of services splits.
Another way to fix the anti-pattern is to use the X axis of the Scale Cube as it relates to databases. An easy example of this is the sharing of account data between a sign-up service and a sign-in (AUTHN and AUTHZ) service. In this example, given that sign-up is a write-based service and sign-in is a read based service we can use the X axis of the Scale Cube and split the services on a read and write basis. To the extent that B must also log activity, it can have separate tables or a separate schema that allows that logging. Note that the services supporting this split need not be unique - they can in fact be the exact same service - but the traffic they serve is properly segmented such that the read deployment receives only read traffic and the write deployment receives only write traffic.
If reads and writes aren’t an easily created X axis split, or if we need the organizational scale engendered by a Y-axis split, we need to be a bit more creative. An example pattern comes from the differences between add-to-cart and checkout in a commerce solution. Some functionality is shared between the components, including the notion of showing calculated sales tax and estimated shipping. Other functionality may be unique, such as heavy computation in add-to-cart for related and recommended items, and up-sale opportunities such as gift wrapping or expedited shipping in checkout. We also want to keep carts (session data) around in order to reach out to customers who have abandoned carts, but we don’t want this ephemeral clutter clogging the data of checkout. This argues for separation of data for temporal (response time) reasons. It also allows us to limit PCI compliance boundaries, removing services (add to cart) from the PCI evaluation landscape.
Transition from add-to-cart to checkout may be accomplished through the client browser, or done as an asynchronous back end transfer of state with the browser polling for completion so as to allow for good fault isolation. We refactor the datastore to separate data to services along the Y axis of the scale cube.
AKF Partners helps companies create scalable, fault tolerant, highly available and cost-effective architectures to meet their product needs. Give us a call, we can help.
Subscribe to the AKF Newsletter
April 27, 2019 | Posted By: Marty Abbott
This article is the fourth in a multi-part series on microservices (micro-services) anti-patterns. The introduction of the first article, Service Calls In Series, covers the benefits of splitting services (as in the case of creating a microservice architecture). Many of the mistakes or failure points teams create in services splits. Articles two and three cover anti-patterns for service and data fan out respectively.
The Service Fuse, the topic of this microservice anti-pattern, exists when two or more unique services share a commonly deployed service pool. When the shared service “C” fails, service A and B fail as well. Similarly, when service “C” becomes slow, slowness under high demand propagates to services A and B.
As is the case with any group of services connected in series, Service A’s theoretical availability is the product of its individual availability combined with the availability of service C. Service B’s theoretical availability is calculated similarly. Under unusual conditions, the availability of A could also impact B similar to the way in which service fan out works. Such would be the case if A somehow holds threads for C, thereby starving it of threads to serve B.
Because overall availability is negatively impacted, we consider the Service Fuse to be a microservice anti-pattern.
The easiest and most common method to fault isolate the failure and response time propagation of Service C is to deploy it separately (in separate pools) for both Service A and B. In doing so, we ensure that C does not fail for one service as a result of unusual demand from the other. We also isolate failures due to unique requests that might be made by either A or B. In doing so, we do incur some additional operational costs and additional coordination and overhead in releases. But assuming proper automation, the availability and response time improvements are often worth the minor effort.
As with many of our other anti-patterns we can also employ dynamically loadable libraries rather than separate service deployments. While this approach has some of the slight overhead (again assuming proper automation) of the above separate service deployments, it often also benefits from significant server-side response time decreases associated with network transit.
We often see teams over emphasizing the cost of additional deployments. But the separate service deployment or dynamically loadable library deployment seldom results in significantly greater effort. Splitting the capacity of a shared pool relative to the demand split between services A and B (e.g. 50/50, 90/10, etc) and adding a small number of additional services for capacity is the real implication of such a split. Is 5 to 10% additional operational cost and seconds of additional deployment time worth the significant increase in availability? Our experience is that most of the time it is.
Subscribe to the AKF Newsletter
April 21, 2019 | Posted By: Marty Abbott
This article is the third in a multi-part series on microservices (micro-services) anti-patterns. The introduction of the first article, Service Calls In Series, covers the benefits of splitting services, many of the mistakes or failure points teams create in services splits and the first anti pattern. The second article, Service Fan Out discusses the anti-pattern of a single service acting as a proxy or aggregator of mulitple services.
Data Fan Out, the topic of this microservice anti-pattern, exists when a service relies on two or more persistence engines with categorically unique data, or categorically similar data that is not meant to be processed in parallel. “Categorically Unique” means that the data is in no way related. Examples of categorical uniqueness would be a database that stores customer data and a separate database that stores catalog data. Instances of the same data, such as two separate databases each storing half of product catalog, are not categorically unique. Splitting of similar data is often known as sharding. Such “sharded” instances only violate the Data Fan Out pattern if:
1) They are accessed in series (database 1 is accessed and subsequently database 2 is accessed) –or-
2) A failure or slowness in either database, even if accessed in parallel, will result in a very slow or unavailable service.
Persistence engine means anything that stores data as in the case of a relational database, a NoSQL database, a persistent off-system cache, etc.
Anytime a service relies on more than one persistence engine to perform a task, it is subject to lower availability and a response time equivalent to the slower of the N data stores to which it is connected. Like the Service Fan Out anti-pattern, the availability of the resulting service (“Service A”) is the product of the availability of the service and its constituent infrastructure multiplied by the availability of each N data store to which it is connected.
Further, the response of the services may be tied to the slowest of the runtime of Service A added to the slowest of the connected solutions. If any of the N databases become slow enough, Service A may not respond at all.
Because overall availability is negatively impacted, we consider Data Fan Out to be a microservice anti-pattern.
One clear exception to the Data Fan Out anti-pattern is the highly parallelized querying done of multiple shards for the purpose of getting near linear response times out of large data sets (similar to one component of the MapReduce algorithm). In a highly parallelized case such as this, we propose that each of the connections have a time-out set to disregard results from slowly responding data sets. For this to work, the result set must be impervious to missing data. As an example of an impervious result set, having most shards return for any internet search query is “good enough”. A search for “plumber near me” returns 19/20ths of the “complete data”, where one shard out of 20 is either unavailable or very slow. But having some transactions not present in an account query of transactions for a checking account may be a problem and therefore is not an example of a resilient data set.
Our preferred approach to resolve the Data Fan Out anti-pattern is to dedicate services to each unique data set. This is possible whenever the two data sets do not need to be merged and when the service is performing two separate and otherwise isolatable functions (e.g. “Customer_Lookup” and “Catalog_Lookup”).
When data sets are split for scale reasons, as is the case with data sets that have both an incredibly high volume of requests and a large amount of data, one can attempt to merge the queried data sets in the client. The browser or mobile client can request each dataset in parallel and merge if successful. This works when computational complexity of the merge is relatively low.
When client-side merging is not possible, we turn to the X Axis of the Scale Cube for resolution. Merge the data sets within the data store/persistence engine and rely on a split of reads and writes. All writes occur to a single merged data store, and read replicas are employed for all reads. The write and read services should be split accordingly and our infrastructure needs to correctly route writes to the write service and reads to the read service. This is a valuable approach when we have high read to right ratios – fortunately the case in many solutions. Note that we prefer to use asynchronous replication and allow the “slave” solutions to be “eventually consistent” - but ideally still within a tolerable time frame of milliseconds or a handful of seconds.
What about the case where a solution may have a high write to read ratio (exceptionally high writes), and data needs to be aggregated? This rather unique case may be best solved by the Z axis of the AKF Scale Cube, splitting transactions along customer boundaries but ensuring the unification of the database for each customer (or region, or whatever “shard key” makes sense). As with all Z axis shards, this not only allows faster response times (smaller data segments) but engenders high scalability and availability while also allowing us to put data “closer to the customer” using the service.
AKF Partners helps companies create highly available, highly scalable, easily maintained and easily developed microservice architectures. Give us a call - we can help!
Subscribe to the AKF Newsletter
April 8, 2019 | Posted By: Marty Abbott
This article is the second in a multi-part series on microservices (micro-services) anti-patterns. The introduction of the first article, Service Calls In Series, covers the benefits of splitting services, many of the mistakes or failure points teams create in services splits and the first anti pattern.
Fan Out, the topic of this microservice anti-pattern, exists when one service either serves as a proxy to two or more downstream services, or serves as an integration of two subsequent service calls. Any of the services (the proxy/integration service “A”, or constituent services “B” and “C”) can cause a failure of all services. When service A fails, service B and C clearly can’t be called. If either service B or C fails or becomes slow, they can affect service A by tying up communication ports. Ultimately, under high call volume, service A may become unavailable due to problems with either B or C.
Further, the response of the services may be tied to the slowest responding service. If A needs both B and C to respond to a request (as in the case of integration), then the speed at which A responds is tied to the slowest response times of B and C. If service A merely proxies B or C, then extreme slowness in either may cause slowness in A and therefore slowness in all calls.
Because overall availability is negatively impacted, we consider Service Fan Out to be a microservice anti-pattern.
One approach to resolve the above anti-pattern is to employ true asynchronous messaging between services. For this to be successful, the requesting service A must be capable of responding to a request without receiving any constituent service responses. Unfortunately, this solution only works in some cases such as the case where service B is returning data that adds value to service A. One such example is a recommendation engine that returns other items a user might like to purchase. The absence of service B responding to A’s request for recommendations is unfortunate but doesn’t eliminate the value of A’s response completely.
As was the case with the Calls In Series Anti-Pattern, we may also be able to solve this anti-pattern with ”Libraries for Depth” pattern.
Of course, each of the libraries also represents a constituent part that may fail for any call – but the number of moving parts for each constituent part decreases significantly relative to a separately deployed service call. For instance, no network interface is required, no additional host and virtual VM is employed during the call, etc. Additionally, call latency goes down without network interfaces.
The most common complaint about this pattern is that development teams cannot release independently. But, as we all know, this problem has been fixed for quite some time with Unix, Linux and Windows dynamically loadable libraries (dlls, dls) and the like.
AKF Partners has helped to architect some of the most scalable, highly available, fault-tolerant and fastest response time solutions on the internet. Give us a call - we can help.
Subscribe to the AKF Newsletter
< 1 2 3 4 >