
Growth Blog

Scalability and Technology Consulting Advice for SaaS and Technology Companies

Microservice Bulkhead Pattern - Dos and Don'ts

June 27, 2019  |  Posted By: Marty Abbott

Bulkhead Pattern Overview

Bulkheads in ships separate components or sections of a ship such that if one portion of a ship is breached, flooding can be contained to that section.  Once contained, the ship can continue operations without risk of sinking.  In this fashion, ship bulkheads perform a similar function to physical building firewalls, where the firewall is meant to contain a fire to a specific section of the building.

The microservice bulkhead pattern is analogous to the bulkhead on a ship.  By separating both functionality and data, failures in some component of a solution do not propagate to other components.  This is most commonly employed to help scale what might be otherwise monolithic datastores.  The bulkhead is then a pattern for implementing the AKF principle of “swimlanes” or fault isolation.

Bulkhead pattern usage


Problems the Bulkhead Pattern Fixes

The bulkhead pattern helps fix a number of quality-of-service issues.

  • Propagation of Failure:  Because solutions are contained and do not share resources (storage, synchronous service-to-service calls, etc), their associated failures are contained and do not propagate.  When a service suffers a programmatic (software) or infrastructure failure, no other service is disrupted.
  • Noisy Neighbors:  If implemented properly, network, storage and compute segmentation ensure that abnormally large resource utilization by a service does not affect other services outside of the bulkhead (fault isolation zone).
  • Unusual Demand:  The bulkhead protects other resources from services experiencing unpredicted or unusual demand.  Other resources do not suffer TCP port saturation, the resulting database degradation, and so on.

Principles to Apply

  1. Share Nearly Nothing:  As much as possible, services that are fault isolated or placed within a bulkhead should not share databases, firewalls, storage, load balancers, etc.  Budgetary constraints may limit the application of unique infrastructure to these services.  The following diagram helps explain what should never be shared, and what may be shared for cost purposes.  The same principles apply, to the extent that they can be managed, within IaaS or PaaS implementations.

Bulkhead pattern usage

  2. Avoid synchronous calls to other services:  Service-to-service calls extend the failure domain of a bulkhead.  Failures and slowness transit blocking synchronous calls and therefore violate the protection offered by a bulkhead.

Put another way, the dimension of a bulkhead or failure domain is the largest boundary across which no critical infrastructure is shared and no synchronous inter-service calls exist.
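To make the second principle concrete, below is a minimal Python sketch of replacing a blocking service-to-service call with an asynchronous hand-off.  The in-process queue is a stand-in for a real message broker (e.g., RabbitMQ or Kafka), and the service names are illustrative assumptions, not taken from any particular implementation:

    import queue
    import threading

    # Stand-in for a real message broker; in production the two services
    # would live in separate fault isolation zones and share only the broker.
    event_bus = queue.Queue()

    def place_order(order_id):
        """Order service: finishes its own work, then publishes an event.
        It never blocks on the fulfillment service, so slowness or failure
        there cannot transit into this bulkhead."""
        # ... write the order to this service's own (unshared) datastore ...
        event_bus.put({"type": "order_placed", "order_id": order_id})

    def fulfillment_worker():
        """Fulfillment service: consumes events at its own pace."""
        while True:
            event = event_bus.get()
            if event is None:  # shutdown sentinel for this demo
                break
            print("fulfilling", event["order_id"])

    consumer = threading.Thread(target=fulfillment_worker, daemon=True)
    consumer.start()
    place_order("A-123")
    event_bus.put(None)
    consumer.join()

Because the only coupling is an eventually consistent message, a crash of the fulfillment worker delays fulfillment but never fails or slows the order placement path.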

Anti-Patterns to Avoid

The following anti-patterns each rely on either synchronous service-to-service communication or the sharing of data solutions.  As such, they represent solutions that should not be present within a bulkhead.

When to use the Bulkhead Pattern

  • Apply the bulkhead pattern whenever you want to scale a service independent of other services.
  • Apply the bulkhead pattern to fault isolate components of varying risk or availability requirements.
  • Apply the bulkhead pattern to isolate geographies for increased speed and reduced latency, such that distant solutions do not share resources or communicate with each other and thereby slow response times.

AKF Partners has helped hundreds of companies implement new microservice architectures and migrate existing monolithic products to microservice architectures.  Give us a call – we can help!


Command and Query Responsibility Segregation (CQRS): Dos and Don’ts

June 24, 2019  |  Posted By: Marty Abbott

CQRS Overview

The microservice CQRS pattern is most commonly employed to help scale what might be otherwise monolithic datastores.  Per the X-axis of the AKF Scale Cube, a write instance of a datastore receives all creates, updates and deletes (collectively the “commands” within CQRS) while one or more read instances receive reads (the “queries” within CQRS).

Simple Implementations

Most relational (ACID compliant) databases have asynchronous and eventually consistent transfer mechanisms available to create “replicants” or “replica sets” of a “master” database.  These are often called master databases (the write database) and slave databases (the read databases).  The easiest implementation for CQRS, then, is to rely on native replication technology to create one or more slave databases with the same schema.  The service is then separated into a write service and a read service, each with its own endpoint connected to the appropriate database.
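As a concrete illustration, here is a minimal Python sketch of the split described above.  The DSNs and the injected psycopg2-style connect function are assumptions for illustration; any relational datastore with native replication would work the same way:

    # Hypothetical endpoints: a master (write) database and a read replica
    # kept eventually consistent by native replication.
    WRITE_DSN = "postgresql://master.db.internal/app"
    READ_DSN = "postgresql://replica.db.internal/app"

    class CommandService:
        """Receives all creates, updates and deletes (the 'commands')."""
        def __init__(self, connect):          # e.g., psycopg2.connect
            self.conn = connect(WRITE_DSN)

        def create_user(self, name):
            with self.conn.cursor() as cur:
                cur.execute("INSERT INTO users (name) VALUES (%s)", (name,))
            self.conn.commit()

    class QueryService:
        """Receives all reads (the 'queries') against a replica."""
        def __init__(self, connect):
            self.conn = connect(READ_DSN)

        def get_user(self, name):
            with self.conn.cursor() as cur:
                cur.execute("SELECT * FROM users WHERE name = %s", (name,))
                return cur.fetchone()

Because each service has its own endpoint, reads can be scaled by adding replicas without touching the write path.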

Command and Query Responsibility Segregation Pattern

Many NoSQL solutions also offer similar capabilities.  Attribute names vary by implementation, but the user may identify the number of copies or replica sets for any piece of information.  Further, the user can also often identify how these sets are to be used.  MongoDB, for instance, allows users to identify from which elements of a replica set a read should be performed, as well as the level of “synchronicity” between a write and the remaining read elements.  Ideally, this level of synchronicity should be as loose as possible to maximize the value of BASE and avoid the limitations of Brewer’s CAP theorem.

Advanced Implementations

Highly tuned solutions, or solutions that need to support significant transaction volumes, may benefit from differentiating the write and read schema instances.  There are several possible reasons for this, including:

  • A difference between the subset of data that needs to be written and that which needs to be read.  For instance, read data may include “static” elements that add no value to the (C)reate, (U)pdate or (D)elete process.
  • An elimination of certain relationships in the read or write schema where those relationships add no value.  Read databases, for instance, may need more relationships than the write database in order to perform complex reporting.
  • A reduction or significant difference in indices between those needed for writes (ideally small given that each index needs to be updated for each write, thereby increasing write response time) and those needed for reads.

In these cases, native replication capabilities in either ACID (relational) or BASE (NoSQL) datastores may not fulfill the desired outcomes.  Engineers may need to rely on eventually consistent transfer technologies including queues or buses along with transformation compute capacity. 

Another somewhat “advanced” concept within relational solutions is to use “Master-Master” replication technology as “Master-Slave”.  At AKF, we prefer not to rely on multi-master solutions for distributing writes across multiple database instances.  Very often, “in doubt” transactions can cause a pinging effect against multiple master databases and sometimes result in either starvation or deadlock (the Dining Philosophers Problem).

If, however, you implement a multi-master replication solution and use it as master-slave, you avoid the concerns over transactional coherence and consistency because there remains a single write database.  Should that database fail, you can easily “swing” future transactions to the other master, previously operating as a slave, with a very low probability of conflict.

Benefits of CQRS

  • Scalability of transactions at low development cost and comparatively low conceptual complexity (conceptual cost).
  • Availability/redundancy of solutions.
  • Distribution of X-axis (read) copies geographically for lower latency on read requests.

Drawbacks to CQRS

  • Multiple copies of data for scale, leading to potentially higher costs of goods sold
  • Higher number of failures in a production environment (but typically with increasing levels of customer perceived availability).
  • Reads are very often “eventually consistent” as opposed to immediately consistent.  This approach will work for greater than 99.9% of use cases in our experience.

How to Use CQRS

  • Split writes (commands) from reads (queries) both in actions/methods/services and the datastores they use.
  • Implement an eventual consistency solution between the datastores.
  • If implementing more than one read instance, implement a proxy or load balancer with a single service endpoint to “spray” requests across all instances (see the sketch below).
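The third point can be as simple as round-robin selection behind a single endpoint.  The Python sketch below is a toy stand-in for a real proxy or load balancer, with illustrative hostnames:

    import itertools

    class ReadSprayer:
        """Presents one endpoint while 'spraying' queries across replicas."""
        def __init__(self, replicas):
            self._cycle = itertools.cycle(replicas)

        def route(self):
            return next(self._cycle)

    sprayer = ReadSprayer(["replica-1.db.internal", "replica-2.db.internal"])
    for _ in range(4):
        print(sprayer.route())  # replica-1, replica-2, replica-1, replica-2

In production this role is usually filled by an off-the-shelf proxy or load balancer rather than application code.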

What to NEVER do with CQRS

Never employ multiple “write masters” or “command masters” with CQRS where each master shares ownership of the same data elements.

AKF Partners has helped hundreds of companies implement new microservice architectures and migrate existing monolithic products to microservice architectures.  Give us a call – we can help!


Transforming to SaaS is not just a technology change

June 19, 2019  |  Posted By: Larry Steinberg


Transforming a traditional on-premise product and company to a SaaS model is currently in vogue and has many broad-reaching benefits for both producers and consumers of services.  These benefits span financials, supportability, and simplified consumption.

In order to achieve the SaaS benefits, your company must address the broad changes necessitated by SaaS, not just the product delivery and technology changes.  You must consider the impacts on employees, delivery, operations and security, professional services, go-to-market, and finances.

Employee Transformation

The employee base is one key element of change – moving from a traditional ‘boxed’ software vendor to an ‘as a Service’ company changes not only the skill set but also the dynamics of the engagement with your customers.  Traditionally, staff have been accustomed to an arm’s-length approach for the majority of customer interactions: a factory that builds and bundles the software, and a small group (if any) who ‘touch’ the customer.  As you move to an ‘as a Service’ product, the onus of ensuring the solution is available 24x7 falls on everyone within your SaaS company.  Not only do you require the skill sets to ensure infrastructure and reliability – the support model can and should change as well.

Now that you are building SaaS solutions and deploying them into environments you control, operations are much ‘closer’ to the engineers who are deploying builds.  Version upgrades happen faster, defects surface more rapidly, and engineers can build in monitoring to detect issues before or as your customers encounter them.  Fixes can be provided much faster than in the past, and strict separation of support and engineering organizations is no longer warranted.  Escalations across organizations and separate reproduction steps can be collapsed.  There is a significant cultural shift for your staff that has to be managed properly.  It will not be natural for legacy staff to adopt a 24x7 mindset, and newly minted SaaS engineers likely don’t have the industry or technology experience needed.  Finding a balance of shifting culture, training, and new ‘blood’ in the organization is the best course of action.

Passion For Service Delivery

Having services available 24x7 requires a real passion for service delivery.  Teams no longer have to wait for escalations from customers; engineers now control the product and operating environment, which means they can see availability and performance in real time.  Staff should be proactive about identifying issues or abnormalities in the service.  To do this, the health and performance of what they built needs to be at the forefront of their everyday activities.  This mindset is very different from the traditional on-premise software delivery model.

Security Implications

Shifting the operations from your customer environments to your own environments also has a security aspect. Operating the SaaS environment for customers shifts the liability from them to you. The security responsibilities expand to include protecting your customer data, hardening the environments, having a practiced plan of incident remediation, and rapid response to identified vulnerabilities in the environment or application.

Finance & Accounting

Finance and accounting are also impacted by the shift to SaaS – both in the spend/capitalization strategy and in cost recovery models.  The operational and security components are a large cost consideration that has shifted from your customers to your organization and needs to be modeled into SaaS financials.  Pricing and licensing shifts are also very common.  Moving to a utility or consumption model is fairly mainstream but is generally new for the traditional product company and its customers.  Traditional billing models with annual invoices might not fit the new approach, and systems and processes will need to be enhanced to handle it.  If you move to a utility-based model, the product and accounting teams need to partner on a solution to ensure you get paid appropriately by your customers.

Customer Service

Think through the impacts on your customer support team.  Given the speed at which new releases and fixes become available, the support team will need a new model to ensure they remain up to date, as delivery timeframes will be much more rapid than in the past and they must stay ahead of your customer base.

Your go-to-market strategies will most likely also need to be altered, depending on the market and industry.  To your benefit, as a SaaS company you now have access to customer behavior data and can use it to approach opportunities within the customer base.  Regarding migration, you’ll need a plan that ensures you are the best option amongst your competitors.

At AKF we most often see companies that focus only on product and technology changes when moving to SaaS, but if the whole company doesn’t move in lockstep, progress will be limited.  You are only as strong as your weakest link.

We’ve helped companies of all sizes transition their technology – AND organization – from on-premises to the cloud through SaaS conversion. Give us a call – we can help!

 


Strangler Pattern: Dos and Don’ts

June 12, 2019  |  Posted By: Marty Abbott

Strangler Pattern Overview

The microservice Strangler pattern is employed when teams migrate functionality from an old solution to one or more new implementations.  While the pattern can be used for any old to new service migration, the diagram below depicts the common use case of migrating from a monolithic solution to a microservice implementation:

Depiction of the strangler pattern and progression over time during migration

The solution above starts with a single monolithic application.  Step one is to implement a proxy (or context switch) that can separate requests by type (the Y-axis, or service segmentation) or, if attempting to evaluate the new solution for efficacy against the existing implementation, by transaction volume (the X-axis).
The pattern progresses with the monolith growing increasingly smaller as newly implemented services begin to take requests previously bound for the monolith. 
Finally, the proxy is removed once the service disaggregation is completed.

Benefits of Strangler

  • The Strangler pattern allows for graceful migration from a service to one or more replacement services.
  • If implemented properly, with the ability to “roll back”, the pattern allows relatively low risk in migrating to new service(s).
  • The pattern can be used for versioning of APIs.
  • Similar to versioning above, the pattern can be used for legacy interactions (the old service remains for solutions that aren’t or won’t be upgraded).

Drawbacks to Strangler

  • If implemented with a “new” or additional service (something in addition to a prior proxy or load balancer), the solution decreases availability through the multiplicative effect of failure.
  • Per above, additional (new) services will increase latency.

How to Use Strangler

  • Use the Strangler for versioning or migration to new services.
  • Employ within an existing proxy – either a software-based load balancer (such as Nginx) or a layer 7 capable load balancer/context switch (F5 or Netscaler).
  • If using an existing proxy is not possible, ensure the client is responsible for choosing the resource.
  • Keep rules updated as services are migrated.
  • Prune rules as no longer needed.

What to NEVER do with Strangler

Never employ a new service and increase the call depth with Strangler.  Such a service is often called a “façade”.  Your existing proxy or load balancer solutions likely give you this flexibility today.

Facade anti-pattern used with Strangler Microservice Pattern

Never allow the solution to become a bottleneck or a single point of failure.  Always ensure that the solution is scalable along the X-axis for both availability and scalability.

X-Axis Approach

The X-Axis approach takes a percentage of calls for any unique service endpoint and splits them between the old service and the new service.  This approach is useful to A/B test the solution for end-user efficacy, response time, availability, and cost of operations (cost per transaction).  It also allows for graceful “dialing up” or “dialing down” of the transaction volume (say from 1% to 51%).
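The heart of an X-axis split is a weighted coin flip per request.  The Python sketch below is illustrative only – the endpoints are hypothetical, and a real implementation would live in your existing proxy or load balancer rather than in application code:

    import random

    MONOLITH = "http://monolith.internal"      # old solution
    NEW_SERVICE = "http://checkout.internal"   # replacement service

    def route(path, new_weight):
        """Send new_weight (0.0 to 1.0) of traffic to the new service.
        Dialing new_weight back to 0.0 is the rollback path."""
        target = NEW_SERVICE if random.random() < new_weight else MONOLITH
        return target + path

    # Start at 1% while A/B testing efficacy, response time, and cost,
    # then dial up as the new service proves itself.
    print(route("/checkout", new_weight=0.01))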

Y-Axis Approach

This is the traditional usage for Strangler.  It takes a monolith servicing N unique capabilities and partitions them along either verb/service boundaries (e.g. checkout separated from everything else) or noun/resource boundaries (e.g. catalog related actions from customer data actions).

Best Approach

If time allows, implement both the X and Y approaches.  X gives you rollback as well as A/B testing and significantly reduces your risk, while Y allows service disaggregation.

AKF Partners has helped hundreds of companies implement and improve microservice based architectures.  Give us a call, we can help you with your transition.


Achieving Team Autonomy not Anarchy

June 10, 2019  |  Posted By: AKF

Agile teams sometimes struggle with the meaning and value of autonomy within a team.  Is it the autonomy to decide how best to accomplish a goal?  Is it the autonomy to choose what tools and processes they will employ to accomplish that goal?  Is it both?  To answer this, we need to first determine where autonomy drives value creation and where it destroys value within an enterprise.


Getting Team Autonomy over Anarchy

Consider the following analogy:  We have the autonomy to determine what roads and paths we will take to a destination when driving a vehicle, but we are governed by speed limits, right of way (which side to drive on, stop signs, stop signals and other road signs), emission standards, and both vehicle and personal licensing.  Put another way, we are completely empowered to determine WHAT path we take, and WHY we take it.  We are much more limited in HOW we get to a place (on road, off road, speed, only licensed vehicles, etc) and WHO can do it (only sober, licensed drivers).  How does this apply to a fully autonomous technology team? 

The value creation of autonomy within a team deals with what path a team takes to accomplish a goal and the reasoning as to why that approach is valuable.  A failure to provide structure and rules for how something is accomplished – through architectural principles, coding standards, development standards, etc. – can escalate the costs of development and destroy value.  Failing to share best practices through standards also means that teams are bound to repeat mistakes and cause customer interruptions.  While there is a narrow line between autonomy and anarchy, the difference in their effect on an organization is significant.

Consider the matrix below, which distinguishes between decision making (what path gets taken to a result and why that path is taken) and governance (the rules around how and who should do something).  In an autocratic organization there is a high amount of governance controlled via top-down management.  Agile teams here are bounded in how they can accomplish a goal and crippled in deciding what path to take because of the heavy involvement of management.  Here is where you find organizations that are described derogatorily as bureaucratic.

On the opposite side of the spectrum, if an Agile shop finds itself unbounded by governance and capable of making all decisions, it runs into the possibility of anarchy.  This type of environment is what allows start-ups to thrive in their early days, but once you have moved beyond a handful of employees, it is necessary to start building governance.

AKF Autonomous Teams

Companies succeed with Agile when governance is in place and decisions are driven from the bottom up.  Ownership and appropriately known boundaries allow Agile teams to deftly maneuver and get after the concept of “Build Small, Fail Fast”.  Lastly, it is important that the lessons learned from all of the autonomous teams are continually surfaced and shared across the multitude of products and teams you have.  It is a waste of time to reinvent the wheel when another team has already solved the problem.


Empowering Teams
Teams should be built around the suggestions from AKF’s white paper on Organizing Product Teams for Innovation: small, cross-functional teams built around a service, who are empowered to be autonomous and work independently.  Autonomy should be defined within the rules of the organization, inform the organization’s architectural principles, and drive adherence through leadership.  This is by no means a notion that the organization should avoid cross-functional empowered teams!  As we state in our white paper, “We still have executives developing strategy, functional teams (product management, etc.) defining subservient strategies and road maps.  But the primary identities of these folks are embedded within the teams that implement these solutions.”

It is also easy to confuse the notions of empowerment and autonomy.  Empowerment is an act of delegation coupled with the assurance of resources and tools for the delegated party to complete the desired outcomes.  It is only through empowerment that autonomy can be achieved within an organizational hierarchy.  Further, it is only through empowerment that autonomous agile teams can be established.  But both empowerment and autonomy need rules governing their action or issuance.  Specifically, an agile team is empowered to be autonomous within the following constraints:  following development protocols and standards, adhering to architectural principles, adhering to established best practices regarding test coverage, etc., and ensuring that the team achieves the “non-functional requirements” codified within the Agile Definition of Done.



Have your autonomous technology teams been free to make decisions that do not align with the vision of the company?  Are you fearful of switching to Agile because of the rampant anarchy they will exhibit?  AKF Partners can help ensure that your organization is aligned to the outcomes it desires.


Sidecar Pattern: The Dos and Don’ts

June 5, 2019  |  Posted By: Marty Abbott

Man on motorcycle with dog in sidecar

Sidecar Pattern Overview

The Sidecar Pattern is meant to allow the deployment of components of an application into separate, isolated, and encapsulated processes or containers.  This pattern is especially useful when there is a benefit to sharing common components between microservices, as in the case of logging utilities, monitoring utilities, configuration routines, etc.

The Sidecar Pattern takes its name from the one-seat cars sometimes bolted alongside a motorcycle.

Benefits of Sidecar

Sidecar comes with many benefits:

  • Use of multiple languages (polyglot) or technologies for each component.  This is especially useful if a language is especially strong in a necessary area (e.g. Python for Machine Learning, or R for statistical work) or if an opensource solution can be leveraged to eliminate in-house specialization (e.g. the use of NGINX for certain network-related functions).
  • Separation of what would otherwise be a monolith, and if used properly, fault isolation of associated services.
  • Conceptually easy interactions between components similar to those provided by libraries, or service calls between microservices.
  • Lower latency than traditional service calls to “other” services as the Sidecar lives in the same processing environment (VM or physical server) – albeit typically in a separate container.
  • Similar to the use of libraries, allows for ownership by individual teams and organizational scalability of a larger team.
  • Similar to the use of dynamically-loadable libraries, allows for independent release by teams of various shared usage components.

Drawbacks to Sidecar

Regardless of implementation (poly- or monoglot), Sidecar has some drawbacks compared to the use of libraries:

  • Higher inter-process communication latency – because most implementations are service calls over the loopback interface (127.0.0.1), latency increases compared to passing call flow through memory.
  • Size – especially in polyglot implementations but even in monoglot implementations – Containerization leads to multiple copies of similar libraries and increased memory utilization for comparable operations relative to the use of libraries.
  • Environments – it is difficult to create any notion of fault isolation with Sidecar without containerization technologies.  VM technologies (Sidecar in a VM separate from the host or calling solution) are not an option, as the result is then a Fan Out or Mesh anti-pattern rather than a local call.

When to Use Sidecar

Sidecar is a compelling alternative to libraries for cases where the increase in latency associated with local service messaging does not impact end-user response times.  Examples of these are asynchronous logging, out of band monitoring, and asynchronous messaging capabilities.  Circuit breakers (time-based request/response timeouts) are also a good example of a Sidecar implementation.
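As an example of the asynchronous logging case, here is a minimal Python sketch.  The localhost port and payload shape are assumptions for illustration; the point is that the enqueue returns immediately and the user response never waits on the sidecar:

    import json
    import queue
    import threading
    import urllib.request

    SIDECAR_URL = "http://127.0.0.1:9000/logs"  # hypothetical sidecar endpoint
    _log_queue = queue.Queue()

    def log_async(event):
        """Enqueue and return immediately; the response path never blocks."""
        _log_queue.put(event)

    def _shipper():
        """Background thread that ships queued events to the sidecar."""
        while True:
            event = _log_queue.get()
            body = json.dumps(event).encode()
            req = urllib.request.Request(
                SIDECAR_URL, data=body,
                headers={"Content-Type": "application/json"})
            try:
                urllib.request.urlopen(req, timeout=1)
            except OSError:
                pass  # a dropped log line must never fail the main service

    threading.Thread(target=_shipper, daemon=True).start()
    log_async({"level": "info", "msg": "user checked out"})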

When to Avoid Sidecar

Never use a Sidecar Pattern for synchronous activities that must complete prior to generating a user response.  Doing so will add some delay to end-user response times.

AKF also advises staying away from Sidecar for synchronous communications between services where doing so requires Sidecar to know all endpoints for each service.  A specific example we advise against is having every endpoint (instance) of Service A (e.g. add-to-cart) know of every endpoint (instance) of Service B (e.g. decrement-SKU).  A graphic of this example is given below:

Sidecar is useful for several components but do not use it for allowing every endpoint to communicate to every other endpoint

The above graphic indicates the coordination between just two services and the instances that comprise that service.  Imagine a case where all services may communicate to each other (as in the broader Mesh anti-pattern).  Attempting to isolate faults becomes nearly impossible.

If Service A sometimes fails while calling Service B, how do you know which component is failing?  Is it a failure in Service A, Service A’s Sidecar proxy, or Service B?  It is easier to have a smaller number of proxies (albeit at a higher latency cost, given non-local communication) handle the transactions, allowing for easier fault identification.

AKF Partners has helped hundreds of companies move from monolithic solutions to services and microservice architectures.  Give us a call, we can help you with your transition.


What's the difference between VMs & Containers?

May 29, 2019  |  Posted By: AKF

VMs vs Containers

Inefficiency and downtime have traditionally kept CTOs and IT decision makers up at night.  Now new challenges are emerging, driven by infrastructure inflexibility and vendor lock-in, limiting technology choices and making strategic decisions more complex than ever.  Both VMs and containers can help you get the most out of available hardware and software resources while easing the risk of vendor lock-in.

Containers are the new kids on the block, but VMs have been, and continue to be, tremendously popular in data centers of all sizes.  Having said that, the first lesson to learn is that containers are not virtual machines.  When I was first introduced to containers, I thought of them as lightweight or trimmed-down virtual instances.  This comparison made sense, since most advertising material leaned on the concepts that containers use less memory and start much faster than virtual machines – basically marketing containers as VMs.  Everywhere I looked, Docker was comparing itself to VMs.  No wonder I was a bit confused when I started to dig into the benefits and differences between the two.

As containers have evolved, they have brought forth abstraction capabilities that are now being broadly applied to make enterprise IT more flexible.  Thanks to the rise of Docker containers, it’s now possible to more easily move workloads between different versions of Linux as well as orchestrate containers to create microservices.  Much like containers, a microservice is not a new idea either.  The concept harkens back to service-oriented architectures (SOA).  What is different is that microservices based on containers are more granular and simpler to manage.  More on this topic in a blog post for another day!

If you’re looking for the best solution for running your own services in the cloud, you need to understand these virtualization technologies, how they compare to each other, and the best uses for each.  Here’s our quick read.

VM’s vs. Containers – What’s the real scoop?

One way to think of containers vs. VMs is that while VMs run several different operating systems on one server, container technology offers the opportunity to virtualize the operating system itself.


                                                               
Figure 1 – Virtual Machine  |  Figure 2 – Container

VMs help reduce expenses.  Instead of running an application on a single dedicated server, a virtual machine enables one physical resource to do the job of many, so you do not have to buy, maintain and service several servers.  Because there is one host machine, you can efficiently manage all the virtual environments with a centralized tool – the hypervisor.  The decision to use VMs is typically made by the DevOps/infrastructure team.  Containers help reduce expenses as well, and they are remarkably lightweight and fast to launch.  Because of their small size, you can quickly scale containers in and out and add identical containers as needed.

Containers are excellent for Continuous Integration and Continuous Deployment (CI/CD) implementation. They foster collaborative development by distributing and merging images among developers.  Therefore, developers tend to favor Containers over VMs.  Most importantly, if the two teams work together (DevOps & Development) the decision on which technology to apply (VMs or Containers) can be made collaboratively with the best overall benefit to the product, client and company.

What are VMs?

The operating systems and their applications share hardware resources from a single host server, or from a pool of host servers. Each VM requires its own underlying OS, and the hardware is virtualized. A hypervisor, or a virtual machine monitor, is software, firmware, or hardware that creates and runs VMs. It sits between the hardware and the virtual machine and is necessary to virtualize the server.

IT departments, both large and small, have embraced virtual machines to lower costs and increase efficiencies.  However, VMs can take up a lot of system resources, because each VM needs a full copy of an operating system AND a virtual copy of all the hardware that the OS needs to run.  This quickly adds up to a lot of RAM and CPU cycles.  And while this is still more economical than bare metal for many applications, it is overkill for others – and thus containers enter the scene.

Benefits of VMs

• Reduced hardware costs from server virtualization.
• Multiple OS environments can exist simultaneously on the same machine, isolated from each other.
• Easy maintenance, application provisioning, availability and convenient recovery.
• Perhaps the greatest benefit of server virtualization is the capability to move a virtual machine from one server to another quickly and safely.  Backing up critical data is done quickly and effectively because you can effortlessly create a replication site.

Popular VM Providers

• VMware vSphere ESXi – VMware has been active in the virtualization space since 1998 and is an industry leader, setting standards for reliability, performance, and support.
• Oracle VM VirtualBox – Not sure what operating systems you are likely to use?  Then VirtualBox is a good choice, because it supports an amazingly wide selection of host and client combinations.  VirtualBox is powerful, comes with terrific features and, best of all, it’s free.
• Xen – Xen is the open source hypervisor included in the Linux kernel and, as such, it is available in all Linux distributions.  The Xen Project is one of the many open source projects managed by the Linux Foundation.
• Hyper-V – Microsoft’s virtualization platform, or ‘hypervisor’, which enables administrators to make better use of their hardware by virtualizing multiple operating systems to run off the same physical server simultaneously.
• KVM – Kernel-based Virtual Machine (KVM) is an open source virtualization technology built into Linux.  Specifically, KVM lets you turn Linux into a hypervisor that allows a host machine to run multiple, isolated virtual environments called guests or virtual machines (VMs).

What are Containers?

Containers are a way to wrap up an application into its own isolated ”box”. For the application in its container, it has no knowledge of any other applications or processes that exist outside of its box. Everything the application depends on to run successfully also lives inside this container. Wherever the box may move, the application will always be satisfied because it is bundled up with everything it needs to run.

Containers virtualize the OS instead of virtualizing the underlying computer like a virtual machine.  They sit on top of a physical server and its host OS — typically Linux or Windows.  Each container shares the host OS kernel and, usually, the binaries and libraries, too.  Shared components are read-only.  Sharing OS resources such as libraries significantly reduces the need to reproduce the operating system code and means that a server can run multiple workloads with a single operating system installation.  Containers are thus exceptionally light – they are only megabytes in size and take just seconds to start.  By comparison, VMs take minutes to start and are an order of magnitude larger than an equivalent container.

In contrast to VMs, all that a container requires is enough of an operating system, supporting programs and libraries, and system resources to run a specific program.  This means you can put two to three times as many applications on a single server with containers as you can with VMs.  In addition, with containers you can create a portable, consistent operating environment for development, testing, and deployment.  This is a huge benefit for keeping environments consistent.

Containers help isolate processes through differentiation in the operating system namespace and storage.  Leveraging native operating system capabilities, the container isolates process space, may create temporary file systems, and may relocate the process “root” file system, etc.

Benefits of Containers

One of the biggest advantages of a container is that you can set aside fewer resources per container than you might per virtual machine.  Keep in mind, containers are essentially for a single application, while virtual machines need resources to run an entire operating system.  For example, if you need to run multiple instances of MySQL, NGINX, or other services, using containers makes a lot of sense.  If, however, you need a full web server (LAMP) stack running on its own server, there is a lot to be said for running a virtual machine.  A virtual machine gives you greater flexibility to choose your operating system and upgrade it as you see fit.  A container, by contrast, isolates the application it runs from OS upgrades on the host.

Popular Container Providers

1. Docker – Nearly synonymous with containerization, Docker is the name of both the world’s leading containerization platform and the company that is the primary sponsor of the Docker open source project.
2. Kubernetes – Google’s most significant contribution to the containerization trend is the open source container orchestration platform it created.
3. Microsoft – Although much of the early work on containers was done on the Linux platform, Microsoft has fully embraced both Docker and Kubernetes, and containerization in general.  Azure offers two container orchestrators, Azure Kubernetes Service (AKS) and Azure Service Fabric.  Service Fabric represents the next-generation platform for building and managing these enterprise-class, tier-1 applications running in containers.
4. Amazon Web Services (AWS) – Of course, Microsoft and Google aren’t the only vendors offering a cloud-based container service.  AWS has its own EC2 Container Service (ECS).
5. IBM – Like the other major public cloud vendors, IBM Bluemix also offers a Docker-based container service.
6. Red Hat – One of the early proponents of container technology, Red Hat claims to be “the second largest contributor to the Docker and Kubernetes codebases,” and it is also part of the Open Container Initiative and the Cloud Native Computing Foundation.  Its flagship container product is its OpenShift platform as a service (PaaS), which is based on Docker and Kubernetes.

Uses for VMs vs Uses for Containers

Both containers and VMs have benefits and drawbacks, and the ultimate decision will depend on your specific needs, but there are some general rules of thumb.
• VMs are a better choice for running applications that require all of the operating system’s resources and functionality when you need to run multiple applications on servers or have a wide variety of operating systems to manage.
• Containers are a better choice when your biggest priority is maximizing the number of applications running on a minimal number of servers.

Container orchestrators

Because of their small size and application orientation, containers are well suited for agile delivery environments and microservice-based architectures. When you use containers and microservices, however, you can easily have hundreds or thousands of components in your environment. You may be able to manually manage a few dozen virtual machines or physical servers, but there is no way you can manage a production-scale container environment without automation. The task of automating and managing a large number of containers and how they interact is known as orchestration.

Scaling Workloads

Scaling containerized workloads is a completely different process from scaling VM workloads.  Modern containers include only the basic services their functions require, but one of those can be a web server, such as NGINX, which also acts as a load balancer.  An orchestration system, such as Kubernetes, is capable of determining, based upon traffic patterns, when the quantity of containers needs to scale out; it can replicate container images automatically and can then remove them from the system.

For most, the ideal setup is likely to include both.  With the current state of virtualization technology, the flexibility of VMs and the minimal resource requirements of containers work together to provide environments with maximum functionality.

If your organization is running many instances of the same operating system, then you should look into whether containers are a good fit. They just might save you significant time and money over VMs.


Living in a DR World (Disaster Recovery for the Rest of Us)

May 22, 2019  |  Posted By: Pete Ferguson

akf scale cube architectural principles

Disaster Recovery is the bellybutton of the IT world.  Everyone has it (at least in theory), but does anyone know if it can be used for anything useful?

The Disaster When There Is No Active Recovery

Have you been in this scenario?  Each year during budget time you SWAG $50-100K to “test disaster recovery.”  During the many cycles of budget negotiation, it is dismissed at some point and forgotten until the next budget season.  In the meantime, somewhere along the way you are audited or something goes “bang” and DR suddenly becomes a hot topic … for about a week, and then it is business as usual.

In many of our technical due diligence reviews we find companies are still stuck in the hypothetical world of table tops and disaster recovery exercises. During our longer term engagements and workshops, we repeatedly hear of DR gone wild - and wrong. The DR exercise is scheduled, then pushed off. Scheduled, pushed off.

Finally, the day comes, like the first day of school, and everyone is aligned to test the DR plan.  Unfortunately, too many things have changed since the DR instance was designed, and the exercise is a failure.  Since business is generally good, the can is kicked down the road to some future utopian alignment of the stars, when the exercise can be rescheduled and everyone will have fixed everything on the backlog discovered during the exercise.

Disaster Recovery that Wasn’t

Throughout my career I have been invited to participate in and support the physical security for many of these exercises.  One was in San Jose over a decade ago.  A $1.5M, 53-foot trailer with generator and satellite (but no bathroom or cooking facilities …) was procured and sat on its own concrete pad, fenced in and tied into our security system.  The day came, after months of tabletop planning, to exercise the DR plan - and nothing worked.  The laptops - which were already three years old when reallocated to the trailer several years earlier - couldn’t run the latest service pack and as a result could not VPN into the network.  The satellite uplink could barely support two or three laptops with email only.  All the money invested was a waste, and the trailer was eventually donated to a non-profit and the cement pad turned into additional parking.

Several years later, while I was responsible for operations in AsiaPac, IT approached us again to do a tabletop exercise and then a simulated exercise in India.  My team worked with local emergency services to simulate a fire and spice up the annual building evacuation exercise, and IT was to exercise its DR plan at the same time.  The evacuation exercise had a lot of theatrics - explosions, fire crews rushing in and spraying the building - and was considered a great success.  Meanwhile, a group of developers boarded a bus with their laptops and, after over an hour and a half in traffic, showed up at the DR site IT had been paying for, only to find that a real fire had occurred a few months previously and the vendor had failed to inform anyone.  Instead of a “plug and play” workspace, they found a dark concrete bunker still smelling of smoke, with a few folding tables and chairs and extremely slow and limited connectivity.  They got back on the bus and found a local coffee shop where they were a lot more successful logging on to the network.

These two examples are of physical failures, but I use them to illustrate the limitations imposed by theoretical DR. 

Tom Kevan, a founding partner, recalls that once eBay had set up three active instances for web traffic, gone were the days of overnight outages: the team could take down the third instance, perform the upgrades and tests, and bring it back online.  All the while, customers never knew “disaster recovery” was running, because it was part of daily operations for web traffic.

Living in 24/7 Disaster Recovery - Active/Active Architecture

Strong, vibrant companies live in a DR world. This means they design with the “rule of three” - always have three active, complete, and independent stacks in production. This allows for continuous maintenance and upgrades on one system while always having at least two active/active systems in production. During peak times, all three are up and taking traffic. The goal is for DR to become your everyday operating cadence, not an unusual event.

As you formalize your architectural principles, remove the words “disaster recovery” and plan to be forever resilient as table stakes for being in business.

The information on the following slides is what we cover in our three-day workshops and often share in our final report with clients when architectural principles are discovered to be lacking during a technical due diligence.

AKF Partners Architectural Principles: Use Mature Technology, Use Commodity Hardware, Buy When Non-Core, Scalability Needs, Automate Everything, Build in Monitoring, Multiple Live Sites, N+2 Design


AKF Partners Architectural Principles: Scale Out, Not Up, Isolate Faults, Crawl walk run, Design to be disabled, Decompose Monoliths, Use Stateless Systems, Always Design Asynchronous Communications


Whether you are a Fortune 100 or a brand-new start-up with 3 employees, we have a proven track record of helping companies your size build a strong foundation for scalability through rapid growth, cost savings during downturns, and resiliency regardless of how business is doing.  Give us a call and let us help!


The Scale Cube: Achieve Security Through Scalability

May 14, 2019  |  Posted By: AKF

AKF Scale Cube
If AKF Partners had to be known for one thing and one thing only, it would be the Scale Cube – an ingenious little model designed to let companies identify how scalable they are and set goals along any of the three axes to make their product more scalable.  Based upon the number of times I have said scalable, or a derivative of the word scale, you should conclude that the AKF Scale Cube is about scale.  And you would be right.  However, the beauty of the cube is that it is also applicable to security.

Xtra Secure

The X-Axis is usually the first axis that companies look at for scalability purposes.  The concept of horizontal duplication is usually the easiest reach from a technological standpoint; however, it tends to be fairly costly.  This replication across various tiers (web, application or database) also insulates companies when the inevitable breach does occur.  Planning only for security, without also bracing for a data breach, is a naive approach.  With replication across the tiers, and even delayed replication to protect against data corruption, not only are you able to accommodate more customers, you also potentially have a clean copy replicated elsewhere if one of your systems is compromised – assuming you are able to identify the breach early enough.

One of the costliest issues with a breach is recovery to a secure copy.  Your company may take a hit publicity-wise, but if you are able to bring your system back up to a clean state, identify the compromise and fix it, then you can be back on your way to fully operational.  The reluctant acceptance that breaches occur is making its way into people’s minds.  If you are open and forthright with them, the publicity issue around a breach tends to be lessened.  Showing them that your system is back up, running and now more secure will help drive business in the right direction.

SecuritY

Splitting across services (the Y-Axis) has many benefits beyond just scalability.  It provides ownership, accountability and segregation.  Although difficult to implement, especially if coming from a monolithic base, the benefits of these micro-services help with security as well.  Code bases that communicate via asynchronous calls not only allow a service to fail without a major impact to other services, it creates another layer for a potential intruder to traverse.

Steps that provide defense in depth for your environment help slow and mitigate attackers.  If asynchronous calls are used between microservices, each lateral or vertical movement is another opportunity for an attacker to be stopped or detected.  If services are small enough, then once access is gained, an intruder has access to less data than they would need for what they are trying to accomplish.

HackerZ

Segmenting customers based upon similar characteristics (be it geography, spending habits, or even just a random selection) helps to achieve Z-Axis scalability.  These pods of customers provide protection from a full data breach as well.  Ideally no customer data would ever be exposed, but if you have 4 pods, exposing 25% of customer data is better than exposing 100%.  And just like the Y-Axis, these splits aid in isolating attackers to only a subset of your environment.  Various governing bodies also have different procedures that must be followed depending upon the nationality of the customers whose data is exposed.  If you segment on that basis (e.g., EU vs. USA), you can manage your response to a breach accordingly.
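To make the idea concrete, here is a small Python sketch of deterministic pod assignment.  The two-region layout and pod names are hypothetical:

    import hashlib

    # Hypothetical pods, segmented first by governing region.
    PODS = {"EU": ["eu-pod-1", "eu-pod-2"], "US": ["us-pod-1", "us-pod-2"]}

    def pod_for_customer(customer_id, region):
        """Route a customer to one pod within their region.  A breach of a
        single pod then exposes only that pod's slice of customers, and
        regional rules (e.g., EU vs. USA) can be handled per segment."""
        pods = PODS[region]
        digest = hashlib.sha256(customer_id.encode()).hexdigest()
        return pods[int(digest, 16) % len(pods)]

    print(pod_for_customer("customer-42", "EU"))  # always maps to the same pod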

AKF Security

Now I Know My X, Y, Z’s

Sometimes security takes a back seat to product development and other functions within a company.  It tends to be an afterthought until that fateful day when something truly bad happens and someone gains unauthorized access to your network.  Implementing a scalable environment via the AKF Scale Cube achieves a better overall product as well as a more secure one.

If you need assistance in reaching a more scalable and secure environment, AKF can help.


Microservice Anti-Pattern: The Service Mesh

May 8, 2019  |  Posted By: Marty Abbott

This article is the sixth in a multi-part series on microservice anti-patterns.  The introduction of the first article, Service Calls In Series, covers the benefits of splitting services (as in the case of creating a microservice architecture) as well as many of the mistakes or failure points teams create in service splits.  Articles two and three cover anti-patterns for service and data fan out, respectively.  The fourth article covers an anti-pattern for disparate services sharing a common service deployment, using the fuse metaphor.  The fifth article expands the fuse metaphor from service fuses to data fuses.

Howard Anton, the author of my college Calculus textbook, was fond of the following phrase:  “It should be intuitively obvious to the casual observer….”.  The clause immediately following that phrase was almost inevitably something that was not obvious to anyone – probably not even the author.  Nevertheless, the phrase stuck with me, and I think I finally found a place where it can live up to its promise. The Service Mesh, the topic of this microservice anti-pattern, is the amalgamation of all the anti-patterns to date.  It contains elements of calls in series, fuses and fan out.  As such, it follows the rules and availability problems of each of those patterns and should be avoided at all costs. 

This is where I need to be very clear, as I’m aware that the Service Mesh has a very large following.  This article refers to a mesh as a grouping of services with request/reply relationships.  Put another way, a “Mesh” is any solution that repeatedly violates the “tree lights”, “fuses” or “fan out” anti-patterns.  If you use “mesh” to mean a grouping of services that never call each other, you are not violating this anti-pattern.

What constitutes a service mesh?

What is NOT a service mesh?

The reasons mesh patterns are a bad idea are many:

1)  Availability:  At the extreme, the mesh is subject to the equation [N∗(N−1)]/2, the number of edges in a fully connected graph with N vertices or nodes; asymptotically, this grows as N².  To make availability calculations simple, the availability of a complete mesh can be approximated as the lowest single-service availability (A) raised to the power of the number of services, A^N.  If the lowest availability of a service with appropriate X-axis cloning (multiple instances) is 99.9%, and the service mesh has 10 different services, the availability of your service mesh will approximate 0.999^10.  That’s roughly 99% availability – perhaps good enough for some solutions but horrible by most modern standards.  (A quick computation of this approximation appears after this list.)

2) Troubleshooting:  When every node can communicate with every other node, or when the “connectiveness” of a solution isn’t completely understood, how does one go about finding the ailing service causing a disruption?  Because failures and slowness transit synchronous links, a failure or slowness in one or more services will manifest itself as failures and slowness in all services.  Troubleshooting becomes very difficult.  Good luck in isolating the bad actor.

3) Hygiene:  I recall sitting through computer science classes 30 years ago and hearing the term “spaghetti code”.  These days we’d probably just call it “crap”, but it refers to the meandering paths of poorly constructed code.  Generally, it leads to difficulty in understanding, higher rates of defects, etc.  Somewhere along the line, some idiot has brought this same approach to deployments.  Again, borrowing from our friend Anton, it should be intuitively obvious to the casual observer that if it’s a bad practice in code it’s also a bad practice in deployment architectures.

4) Cost to Fix: If points 1 through 3 above aren’t enough to keep you away from connected service meshes, point 4 will hopefully help tip the scales.  If you implement a connected mesh in an environment in which you require high availability, you will spend a significant amount of time and money refactoring it to relieve the symptoms it will cause.  This amount may approximate your initial development effort as you replace each dependent anti-pattern (series, fuse, fan out) with an appropriate pattern.
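As a quick check on the availability math in point 1, here is a small Python computation of the approximation used above:

    def mesh_availability(lowest_availability, services):
        """Approximate full-mesh availability as the lowest single-service
        availability raised to the number of services (A^N)."""
        return lowest_availability ** services

    print(round(mesh_availability(0.999, 10), 4))  # 0.99 -> roughly 99%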


Microservice Anti-Pattern:  The Service Mesh

Fixing a mesh is not an easy task.  One solution is to ensure that no service blocks while waiting for a request to any other service to complete.  Unfortunately, this pattern is not always easy or appropriate to implement.

Microservice Anti-Pattern Service Mesh Fix - Async Interactions

Another solution is to deploy each component as a service when it is responding to an end-user request, and as a library within another service wherever needed.

Microservice Anti-Pattern Service Mesh Fix - Libraries

Finally, you can traverse each service node and determine where services can be collapsed, or apply any of the other fixes identified within the tree light, fuse, or fan out anti-pattern articles.


AKF Partners helps companies create scalable, fault tolerant, highly available and cost effective architectures to meet their product needs.  Give us a call – we can help!

