GROWTH BLOG: Achieving Team Autonomy not Anarchy

Growth Blog

Scalability and Technology Consulting Advice for SaaS and Technology Companies

Achieving Team Autonomy not Anarchy

June 10, 2019  |  Posted By: James Fritz

Agile teams sometimes struggle with the meaning and value of autonomy within a team.  Is it the autonomy to decide how best to accomplish a goal?  Is it the autonomy to choose what tools and processes they will employ to accomplish that goal?  Is it both?  To answer this, we need to first determine where autonomy drives value creation and where it destroys value within an enterprise.


Getting Team Autonomy over Anarchy

Consider the following analogy:  We have the autonomy to determine what roads and paths we will take to a destination when driving a vehicle, but we are governed by speed limits, right of way (which side to drive on, stop signs, stop signals and other road signs), emission standards, and both vehicle and personal licensing.  Put another way, we are completely empowered to determine WHAT path we take, and WHY we take it.  We are much more limited in HOW we get to a place (on road, off road, speed, only licensed vehicles, etc) and WHO can do it (only sober, licensed drivers).  How does this apply to a fully autonomous technology team? 

The value-creation of autonomy within a team deals with what path a team takes to accomplish a goal and the reasoning as to why that approach is valuable.  A failure to provide structure and rules for how something is accomplished, through architectural principles, coding standards, development standards, etc., can escalate the costs of development and destroy value.  Also, not sharing best practices through standards means that teams are bound to repeat mistakes and cause customer interruptions.  While there is a narrow line between autonomy and anarchy, the difference in their effect on an organization is significant.

Consider the matrix below which distinguishes between decision making (What path gets taken to a result and Why that path is taken) and governance (the rules around How and Who should do something).  In an Autocratic organization there is a high amount of Governance controlled via a Top-Down method of Management.  Agile teams here are bounded by how they can accomplish a goal and crippled by what path to take because of the heavy involvement of Management.  Here is where you find organizations that are described derogatorily as Bureaucratic.

On the opposite side of the spectrum, if an Agile shop finds itself unbounded by Governance and capable of making all decisions then you run into the possibility of anarchy.  This type of environment is what allows start-ups to thrive in their early days, but once you have moved beyond a handful of employees, it is necessary to start building governance. 

AKF Autonomous Teams

Where companies with Agile succeed is when Governance is in place and decisions are driven from the bottom.  Ownership and appropriately known boundaries allow Agile teams to deftly maneuver and get after the concept of “Build Small, Fail Fast”.  Lastly, it is important that the lessons learned from all of the Autonomous teams are continually brought up and shared across the multitude of products and teams you have.  It is a waste of time to reinvent the wheel when another team has already solved the problem.


Empowering Teams
Teams should be built around the suggestions from AKF’s white paper on Organizing Product Teams for Innovation: small, cross functional teams built around a service, who are empowered to be autonomous and work independently on their own.  Autonomy should be defined within the rules of the organization, inform the organization’s architectural principles, and drive adherence through leadership.  This is by no means a notion that the organization should avoid cross functional empowered teams!  As we state in our white paper, “We still have executives developing strategy, functional teams (product management, etc.) defining subservient strategies and road maps.  But the primary identities of these folks are embedded within the teams that implement these solutions.”

It is also easy to confuse the notions of empowerment and autonomy.  Empowerment is an act of delegation coupled with the assurance that the delegated party has the resources and tools to complete the desired outcomes.  It is only through empowerment that autonomy can be achieved within an organizational hierarchy.  Further, it is only through empowerment that autonomous agile teams can be established.  But both empowerment and autonomy need to have rules governing their action or issuance.  Specifically, an agile team is empowered to be autonomous within the following constraints:  following development protocols and standards, adhering to architectural principles, adhering to established best practices regarding test coverage, etc., and ensuring that it achieves the “non-functional requirements” codified within the Agile Definition of Done.



Have your autonomous technology teams been free to make decisions that do not align with the vision of the company?  Are you fearful of switching to Agile because of the rampant anarchy your teams might exhibit?  AKF Partners can help ensure that your organization is aligned to the outcomes it desires.

Subscribe to the AKF Newsletter

Contact Us

Sidecar Pattern: The Dos and Don’ts

June 5, 2019  |  Posted By: Marty Abbott

Man on motorcycle with dog in sidecar Shutterstock Image 1084641965 purchased by AKF Partners June 5 2019

Sidecar Pattern Overview

The Sidecar Pattern is meant to allow the deployment of components of an application into separate, isolated, and encapsulated processes or containers.  This pattern is especially useful when there is a benefit to sharing common components between microservices, as in the case of logging utilities, monitoring utilities, configuration routines, etc.

The Sidecar Pattern name is an analogy indicating the one-seat cars sometimes bolted alongside a motorcycle. 

Benefits of Sidecar

Sidecar comes with many benefits:

  • Use of multiple languages (polyglot) or technologies for each component.  This is especially useful if a language is especially strong in a necessary area (e.g. Python for Machine Learning, or R for statistical work) or if an opensource solution can be leveraged to eliminate in-house specialization (e.g. the use of NGINX for certain network-related functions).
  • Separation of what would otherwise be a monolith, and if used properly, fault isolation of associated services.
  • Conceptually easy interactions between components similar to those provided by libraries, or service calls between microservices.
  • Lower latency than traditional service calls to “other” services as the Sidecar lives in the same processing environment (VM or physical server) – albeit typically in a separate container.
  • Similar to the use of libraries, allows for ownership by individual teams and organizational scalability of a larger team.
  • Similar to the use of dynamically-loadable libraries, allows for independent release by teams of various shared usage components.

Drawbacks to Sidecar

Regardless of implementation (poly- or mono- glot), Sidecar has some drawbacks compared to the use of libraries:

  • Higher inter-process communication latency – Because most implementations are service calls, the loopback interface on a system (127.0.0.1) will increase latency compared to the transition of call flow through memory.
  • Size – especially in polyglot implementations but even in monoglot implementations – Containerization leads to multiple copies of similar libraries and increased memory utilization for comparable operations relative to the use of libraries.
  • Environments – it is difficult to create any notion of fault isolation with Sidecar without containerization technologies.  VM technologies (Sidecar in a VM separate from the host or calling solution) are not an option, as the result is then a Fan Out or Mesh anti-pattern rather than a local call.

When to Use Sidecar

Sidecar is a compelling alternative to libraries for cases where the increase in latency associated with local service messaging does not impact end-user response times.  Examples of these are asynchronous logging, out of band monitoring, and asynchronous messaging capabilities.  Circuit breakers (time-based request/response timeouts) are also a good example of a Sidecar implementation.
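To make the circuit breaker example more concrete, here is a minimal sketch in Python.  It uses the conventional failure-counting form of a breaker, in which a timed-out call counts as a failure; the class name, parameters, and thresholds are illustrative assumptions, not taken from any particular Sidecar product.

```python
import time


class CircuitBreaker:
    """Illustrative breaker: opens after repeated failures, retries after a cool-down."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to wait before a trial call
        self._clock = clock               # injectable so the logic can be tested
        self._failures = 0
        self._opened_at = None            # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                # Fail fast instead of waiting on a dependency known to be unhealthy.
                raise RuntimeError("circuit open: call skipped")
            # Cool-down elapsed: allow one trial call (the "half-open" state).
            self._opened_at = None
            self._failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            # Timeouts raised by func are counted like any other failure.
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = self._clock()
            raise
        self._failures = 0
        return result
```

Deployed as a Sidecar, logic of this kind wraps outbound calls so that each service does not need its own implementation in its own language.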

When to Avoid Sidecar

Never use a Sidecar Pattern for synchronous activities that must complete prior to generating a user response.  Doing so will add some delay to end-user response times.

AKF also advises staying away from Sidecar for synchronous communications between services where doing so requires Sidecar to know all endpoints for each service.  A specific example we advise against is having every endpoint (instance) of Service A (e.g. add-to-cart) know of every endpoint (instance) of Service B (e.g. decrement-SKU).  A graphic of this example is given below:

Sidecar is useful for several components but do not use it for allowing every endpoint to communicate to every other endpoint

The above graphic indicates the coordination between just two services and the instances that comprise each service.  Imagine a case where all services may communicate with each other (as in the broader Mesh anti-pattern).  Attempting to isolate faults becomes nearly impossible.

If Service A sometimes fails while calling Service B, how do you know which component is failing?  Is it a failure in Service A, Service A’s Sidecar Proxy or Service B?  It is easier to have a smaller number of proxies handle the transactions (albeit at a higher cost of latency given non-local communication), which allows for easier fault identification.
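A rough way to see why fault identification gets harder is to count the instance-to-instance paths that could be involved in a failure.  The sketch below is only arithmetic under simplifying assumptions (every instance may talk to every permitted peer); the function names are hypothetical.

```python
def direct_paths(instances_a, instances_b):
    """Paths when every instance of Service A knows every instance of Service B."""
    return instances_a * instances_b


def proxied_paths(instances_a, instances_b, proxies=1):
    """Paths when instances on both sides only know a shared proxy tier."""
    return (instances_a + instances_b) * proxies


def full_mesh_paths(instance_counts):
    """Paths when every instance may call every instance of every other service."""
    total = sum(instance_counts)
    same_service = sum(n * n for n in instance_counts)
    # Pairs of instances that belong to two different services.
    return (total * total - same_service) // 2


if __name__ == "__main__":
    print(direct_paths(10, 10))               # two services, 10 instances each
    print(proxied_paths(10, 10))              # same services behind one proxy
    print(full_mesh_paths([10, 10, 10, 10]))  # four services in a full mesh
```

With ten instances per service, the direct approach has 100 candidate paths between two services, while a single proxy tier has 20; the mesh case grows with every service added.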

AKF Partners has helped hundreds of companies move from monolithic solutions to services and microservice architectures.  Give us a call, we can help you with your transition.

What's the difference between VMs & Containers?

May 29, 2019  |  Posted By: Robin McGlothin

VMs vs Containers

Inefficiency and down time have traditionally kept CTOs and IT decision makers up at night.  Now, new challenges are emerging, driven by infrastructure inflexibility and vendor lock-in, limiting Technology more than ever and making strategic decisions more complex than ever.  Both VMs and containers can help get the most out of available hardware and software resources while easing the risk of vendor lock-in.

Containers are the new kids on the block, but VMs have been, and continue to be, tremendously popular in data centers of all sizes.  Having said that, the first lesson to learn is that containers are not virtual machines.  When I was first introduced to containers, I thought of them as lightweight or trimmed-down virtual instances.  This comparison made sense since most advertising material leaned on the concepts that containers use less memory and start much faster than virtual machines, basically marketing them as VMs.  Everywhere I looked, Docker was comparing itself to VMs.  No wonder I was a bit confused when I started to dig into the benefits and differences between the two.

As containers have evolved, they have brought forth abstraction capabilities that are now being broadly applied to make enterprise IT more flexible. Thanks to the rise of Docker containers, it’s now possible to more easily move workloads between different versions of Linux as well as orchestrate containers to create microservices.  Much like containers, microservices are not a new idea either. The concept harkens back to service-oriented architectures (SOA). What is different is that microservices based on containers are more granular and simpler to manage. More on this topic in a blog post for another day!

If you’re looking for the best solution for running your own services in the cloud, you need to understand these virtualization technologies, how they compare to each other, and what the best uses are for each. Here’s our quick read.

VM’s vs. Containers – What’s the real scoop?

One way to think of containers vs. VMs is that while VMs run several different operating systems on one server, container technology offers the opportunity to virtualize the operating system itself.


                                                               
Figure 1 – Virtual Machine; Figure 2 – Container

VMs help reduce expenses. Instead of running an application on a single server, a virtual machine enables utilizing one physical resource to do the job of many. Therefore, you do not have to buy, maintain and service several servers.  Because there is one host machine, you can efficiently manage all the virtual environments with a centralized tool, the hypervisor. The decision to use VMs is typically made by the DevOps/Infrastructure team.  Containers help reduce expenses as well, and they are remarkably lightweight and fast to launch.  Because of their small size, you can quickly scale containers in and out and add identical containers as needed.

Containers are excellent for Continuous Integration and Continuous Deployment (CI/CD) implementation. They foster collaborative development by distributing and merging images among developers.  Therefore, developers tend to favor Containers over VMs.  Most importantly, if the two teams work together (DevOps & Development) the decision on which technology to apply (VMs or Containers) can be made collaboratively with the best overall benefit to the product, client and company.

What are VMs?

With VMs, the guest operating systems and their applications share hardware resources from a single host server, or from a pool of host servers. Each VM requires its own underlying OS, and the hardware is virtualized. A hypervisor, or virtual machine monitor, is software, firmware, or hardware that creates and runs VMs. It sits between the hardware and the virtual machines and is necessary to virtualize the server.

IT departments, both large and small, have embraced virtual machines to lower costs and increase efficiencies.  However, VMs can take up a lot of system resources because each VM needs a full copy of an operating system AND a virtual copy of all the hardware that the OS needs to run.  This quickly adds up to a lot of RAM and CPU cycles. While this is still more economical than bare metal, for some applications it is overkill, and thus containers enter the scene.

Benefits of VMs
• Reduced hardware costs from server virtualization
• Multiple OS environments can exist simultaneously on the same machine, isolated from each other.
• Easy maintenance, application provisioning, availability and convenient recovery.
• Perhaps the greatest benefit of server virtualization is the capability to move a virtual machine from one server to another quickly and safely. Backing up critical data is done quickly and effectively because you can effortlessly create a replication site.

Popular VM Providers
• VMware vSphere ESXi - VMware has been active in the virtual space since 1998 and is an industry leader setting standards for reliability, performance, and support.
• Oracle VM VirtualBox - Not sure what operating systems you are likely to use? Then VirtualBox is a good choice because it supports an amazingly wide selection of host and guest combinations. VirtualBox is powerful, comes with terrific features and, best of all, it’s free.
• Xen - Xen is an open source type-1 hypervisor. Support for running Linux as a Xen host or guest is included in the mainline Linux kernel, and Xen is packaged for many Linux distributions. The Xen Project is one of the many open source projects hosted by the Linux Foundation.
• Hyper-V - Microsoft’s virtualization platform, or ‘hypervisor’, which enables administrators to make better use of their hardware by virtualizing multiple operating systems to run off the same physical server simultaneously.
• KVM - Kernel-based Virtual Machine (KVM) is an open source virtualization technology built into Linux. Specifically, KVM lets you turn Linux into a hypervisor that allows a host machine to run multiple, isolated virtual environments called guests or virtual machines (VMs).

What are Containers?

Containers are a way to wrap up an application into its own isolated ”box”. For the application in its container, it has no knowledge of any other applications or processes that exist outside of its box. Everything the application depends on to run successfully also lives inside this container. Wherever the box may move, the application will always be satisfied because it is bundled up with everything it needs to run.

Containers virtualize the OS instead of virtualizing the underlying computer like a virtual machine.  They sit on top of a physical server and its host OS, typically Linux or Windows. Each container shares the host OS kernel and, usually, the binaries and libraries, too. Shared components are read-only. Sharing OS resources such as libraries significantly reduces the need to reproduce the operating system code and means that a server can run multiple workloads with a single operating system installation. Containers are thus exceptionally light: they are only megabytes in size and take just seconds to start. By comparison, VMs take minutes to start and are an order of magnitude larger than an equivalent container.

In contrast to VMs, all that a container requires is enough of an operating system, supporting programs and libraries, and system resources to run a specific program. This means you can put two to three times as many applications on a single server with containers than you can with VMs. In addition, with containers, you can create a portable, consistent operating environment for development, testing, and deployment. This is a huge benefit to keep the environments consistent.
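The density argument can be illustrated with back-of-the-envelope arithmetic.  The sketch below considers memory only, and every number in it (host size, application footprint, per-instance overhead) is a made-up assumption chosen for illustration rather than a measurement.

```python
def instances_per_host(host_ram_mb, app_ram_mb, overhead_mb):
    """How many instances fit in RAM if each needs the app plus fixed overhead."""
    return host_ram_mb // (app_ram_mb + overhead_mb)


# Hypothetical figures: a 64 GB host running a 1 GB application.
HOST_RAM_MB = 64 * 1024
APP_RAM_MB = 1024
VM_OVERHEAD_MB = 1536        # assumed cost of a full guest OS per VM
CONTAINER_OVERHEAD_MB = 50   # assumed cost of a container's runtime footprint

vms = instances_per_host(HOST_RAM_MB, APP_RAM_MB, VM_OVERHEAD_MB)
containers = instances_per_host(HOST_RAM_MB, APP_RAM_MB, CONTAINER_OVERHEAD_MB)

if __name__ == "__main__":
    print(vms, containers, round(containers / vms, 2))
```

Under these assumptions the host fits 25 VMs or 61 containers, which lands in the two-to-three-times range; real ratios depend heavily on the workload and on CPU, storage, and network limits that this sketch ignores.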

Containers help isolate processes through differentiation in the operating system namespace and storage.  Leveraging operating system native capabilities, the container isolates process space, may create temporary file systems and relocate process “root” file system, etc.

Benefits of Containers

One of the biggest advantages to a container is the fact you can set aside fewer resources per container than you might per virtual machine. Keep in mind, containers are essentially for a single application while virtual machines need resources to run an entire operating system. For example, if you need to run multiple instances of MySQL, NGINX, or other services, using containers makes a lot of sense. If, however, you need a full web server (LAMP) stack running on its own server, there is a lot to be said for running a virtual machine. A virtual machine gives you greater flexibility to choose your operating system and upgrade it as you see fit. With a container, by contrast, the configured application is isolated from the host in terms of OS upgrades.

Popular Container Providers

1. Docker - Nearly synonymous with containerization, Docker is the name of both the world’s leading containerization platform and the company that is the primary sponsor of the Docker open source project.
2. Kubernetes - Google’s most significant contribution to the containerization trend is the open source container orchestration platform it created.
3. Although much of the early work on containers was done on the Linux platform, Microsoft has fully embraced Docker, Kubernetes, and containerization in general.  Azure offers two container orchestrators: Azure Kubernetes Service (AKS) and Azure Service Fabric.  Service Fabric represents the next-generation platform for building and managing enterprise-class, tier-1 applications running in containers.
4. Of course, Microsoft and Google aren’t the only vendors offering a cloud-based container service. Amazon Web Services (AWS) has its own Elastic Container Service (ECS), formerly known as EC2 Container Service.
5. Like the other major public cloud vendors, IBM Bluemix also offers a Docker-based container service.
6. One of the early proponents of container technology, Red Hat claims to be “the second largest contributor to the Docker and Kubernetes codebases,” and it is also part of the Open Container Initiative and the Cloud Native Computing Foundation. Its flagship container product is its OpenShift platform as a service (PaaS), which is based on Docker and Kubernetes.

Uses for VMs vs Uses for Containers

Both containers and VMs have benefits and drawbacks, and the ultimate decision will depend on your specific needs, but there are some general rules of thumb.
• VMs are a better choice for running applications that require all of the operating system’s resources and functionality, when you need to run multiple applications on servers, or when you have a wide variety of operating systems to manage.
• Containers are a better choice when your biggest priority is maximizing the number of applications running on a minimal number of servers.

Container orchestrators

Because of their small size and application orientation, containers are well suited for agile delivery environments and microservice-based architectures. When you use containers and microservices, however, you can easily have hundreds or thousands of components in your environment. You may be able to manually manage a few dozen virtual machines or physical servers, but there is no way you can manage a production-scale container environment without automation. The task of automating and managing a large number of containers and how they interact is known as orchestration.

Scaling Workloads

Scaling containerized workloads is a completely different process from scaling VM workloads. Modern containers include only the basic services their functions require, but one of them can be a web server, such as NGINX, which can also act as a load balancer. An orchestration system, such as Kubernetes, is capable of determining, based upon traffic patterns, when the quantity of containers needs to scale out; it can replicate containers from their images automatically and can then remove them from the system when they are no longer needed.
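As a sketch of how such a scale-out decision can be computed, the function below follows the ratio rule documented for the Kubernetes Horizontal Pod Autoscaler (desired replicas = ceiling of current replicas times current metric over target metric).  The function name, the bounds, and the example metric are illustrative assumptions, and the real autoscaler adds tolerances and stabilization windows that are left out here.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """Replica count that would bring the per-replica metric back to target."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    # Clamp to the configured bounds so a traffic spike cannot scale without limit.
    return max(min_replicas, min(max_replicas, raw))


if __name__ == "__main__":
    # Hypothetical case: 4 containers at 90% average CPU against a 60% target.
    print(desired_replicas(4, 90, 60))
```

The same function also scales in: when the observed metric drops below the target, the computed count falls and the orchestrator can remove the surplus containers.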

For most, the ideal setup is likely to include both VMs and containers. With the current state of virtualization technology, the flexibility of VMs and the minimal resource requirements of containers work together to provide environments with maximum functionality.

If your organization is running many instances of the same operating system, then you should look into whether containers are a good fit. They just might save you significant time and money over VMs.

Living in a DR World (Disaster Recovery for the Rest of Us)

May 22, 2019  |  Posted By: Pete Ferguson

akf scale cube architectural principles

Disaster Recovery is the bellybutton of the IT world. Everyone has it (at least in theory), but does anyone know if it can be used for anything useful?

The Disaster When There Is No Active Recovery

Have you been in this scenario: Each year during budget time you SWAG $50-100K to “test disaster recovery.” During the many cycles of budget negotiation, it is dismissed at some point and forgotten until the next budget season? In the meantime, somewhere along the way you are audited or something goes “bang” and DR suddenly becomes a hot topic … for about a week and then it is business as usual.

In many of our technical due diligence reviews we find companies are still stuck in the hypothetical world of table tops and disaster recovery exercises. During our longer term engagements and workshops, we repeatedly hear of DR gone wild - and wrong. The DR exercise is scheduled, then pushed off. Scheduled, pushed off.

Finally the day comes like the first day of school and everyone is aligned to test the DR plan. Unfortunately, too many things have changed since the DR instance was designed and the exercise is a failure. Since business is generally good, the can is kicked down the road to some future utopian alignment of the stars when it can be rescheduled and everyone will have fixed everything on the backlog discovered during the exercise.

Disaster Recovery that Wasn’t

Throughout my career I have been invited to participate in and support the physical security for many of these exercises. One was in San Jose over a decade ago. A large, $1.5M, 53-foot trailer with generator and satellite (but no bathroom or cooking facilities …) was procured and sat on its own concrete pad, fenced in and tied into our security system. The day came, after months of tabletop planning, to exercise the DR plan, and nothing worked. The laptops - which were already three years old when reallocated to the trailer several years earlier - couldn’t run the latest service pack and as a result could not VPN into the network. The satellite uplink could barely support two or three laptops with email only. All the money invested was wasted; the trailer was eventually donated to a non-profit and the cement pad turned into additional parking.

Several years later, while I was responsible for operations in AsiaPac, IT approached us again to do a tabletop exercise and then a simulated exercise in India. My team worked with local emergency services to simulate a fire and spice up the annual building evacuation exercise, and IT was to also exercise their DR plan. The evacuation exercise had a lot of theatrics and was considered a great success, with explosions and fire crews rushing in and spraying the building. At the same time, a group of developers boarded a bus with their laptops and, after over an hour and a half in traffic, showed up at the DR site IT had been paying for, only to find that a real fire had occurred a few months earlier and the vendor had failed to inform anyone. Instead of a “plug and play” workspace, they found a dark concrete bunker still smelling of smoke, with a few folding tables and chairs and extremely slow and limited connectivity. They got back on the bus and found a local coffee shop where they were a lot more successful logging on to the network.

These two examples are of physical failures, but I use them to illustrate the limitations imposed by theoretical DR. 

Tom Keeven, a founding partner, recalls how once they had set up three instances at eBay for web traffic, gone were the days of overnight outages, because they were able to take down the third instance, perform the upgrades and test, and bring it back online. All the while, customers never knew he was running “disaster recovery” because it was part of daily operations for web traffic.

Living in 24/7 Disaster Recovery - Active/Active Architecture

Strong, vibrant companies live in a DR world. This means they design with the “rule of three” - always have three active, complete, and independent stacks in production. This allows for continuous maintenance and upgrades on one system while always having at least two active/active systems in production. During peak times, all three are up and taking traffic. The goal is for DR to become your everyday operating cadence, not an unusual event.
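The sizing consequence of the “rule of three” can be sketched with simple arithmetic: each stack must be large enough that the stacks remaining in service can carry peak load on their own.  The function names and the traffic figure below are hypothetical, and load is assumed to spread evenly across the stacks in service.

```python
def required_capacity_per_stack(peak_load, total_stacks=3, stacks_out=1):
    """Capacity each stack needs so the remaining stacks can carry peak load."""
    in_service = total_stacks - stacks_out
    if in_service < 1:
        raise ValueError("at least one stack must remain in service")
    return peak_load / in_service


def utilization(load, capacity_per_stack, stacks_in_service):
    """Fraction of in-service capacity consumed by the given load."""
    return load / (capacity_per_stack * stacks_in_service)


if __name__ == "__main__":
    peak = 9000  # hypothetical peak requests per second
    capacity = required_capacity_per_stack(peak)  # one stack out for maintenance
    print(capacity)
    print(utilization(peak, capacity, 3))  # all three stacks taking traffic
    print(utilization(peak, capacity, 2))  # one stack down for an upgrade
```

Sized this way, three active stacks run at about two-thirds utilization at peak, and taking one out for maintenance pushes the other two to their limit.  Setting stacks_out=2 gives the more conservative sizing in which a single stack can carry peak load alone.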

As you formalize your architectural principles, remove the words “disaster recovery” and plan to be forever resilient as table stakes for being in business.

The information on the following slides is what we cover in our three day workshops and often share in our final report with our clients when architectural principles are discovered to be lacking during a technical due diligence.

AKF Partners Architectural Principles: Use Mature Technology, Use Commodity Hardware, Buy When Non-Core, Scalability Needs, Automate Everything, Build in Monitoring, Multiple Live Sites, N+2 Design


AKF Partners Architectural Principles: Scale Out, Not Up, Isolate Faults, Crawl walk run, Design to be disabled, Decompose Monoliths, Use Stateless Systems, Always Design Asynchronous Communications


Whether you are a Fortune 100 or brand new start up with 3 employees, we have a proven track record of helping companies your size build a strong foundation for scalability through rapid growth, cost savings during downturns, and resiliency regardless of how business is doing. Give us a call and let us help!

The Scale Cube: Achieve Security Through Scalability

May 14, 2019  |  Posted By: James Fritz

AKF Scale Cube
If AKF Partners had to be known for one thing and one thing only, it would be the Scale Cube: an ingenious little model designed for companies to identify how scalable they are and set goals along any of the three axes to make their product more scalable.  Based upon the number of times I have said scalable, or a derivative of the word scale, you should conclude that the AKF Scale Cube is about scale.  And you would be right.  However, the beauty of the cube is that it is also applicable to Security.

Xtra Secure

The X-Axis is usually the first axis that companies look at for scalability purposes.  The concept of horizontal duplication is usually the easiest reach from a technological standpoint, however it tends to be fairly costly.  This replication across various tiers (web, application or database) also insulates companies when the inevitable breach does occur.  Planning for only security without also bracing for a data breach is a naive approach.  With replication across the tiers, and even delayed replication to protect against data corruption, not only are you able to accommodate more customers, you now potentially have a clean copy replicated elsewhere if one of your systems gets compromised, assuming you are able to identify the breach early enough.

One of the costliest issues with a breach is recovery to a secure copy.  Your company may take a hit publicity wise, but if you are able to bring your system back up to a clean state, identify the compromise and fix it, then you can be back on your way to fully operational.  The reluctant acceptance that breaches occur is making its way into the minds of people.  If you are open and forthright with them, the publicity issue around a breach tends to be lessened.  Showing them that your system is back up, running and now more secure will help drive business in the right direction.

SecuritY

Splitting across services (the Y-Axis) has many benefits beyond just scalability.  It provides ownership, accountability and segregation.  Although difficult to implement, especially if coming from a monolithic base, these micro-services help with security as well.  Code bases that communicate via asynchronous calls not only allow a service to fail without a major impact on other services, they also create another layer for a potential intruder to traverse.

Steps that can be implemented to provide a defense in depth of your environment help slow/mitigate attackers.  If asynchronous calls are used between micro-services each lateral or vertical movement is another opportunity to be stopped or detected.  If services are small enough, then once access is gained threats have less access to data than may be ideal for what they are trying to accomplish.

HackerZ

Segmenting customers based upon similar characteristics (be it geography, spending habits, or even just a random selection) helps to achieve Z-Axis scalability.  These pods of customers provide protection from a full data breach as well.  Ideally no customer data would ever be exposed, but if you have 4 pods, exposing 25% of customer data is better than exposing 100%.  And just like the Y-Axis, these splits aid with isolating attackers into only a subset of your environment.  Various governing boards also have different procedures that need to be followed depending upon the nationality of the customer data exposed.  If customers are segmented on that basis (e.g. EU vs USA), then how you respond to breaches can be managed differently.
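A minimal sketch of the “random selection” variant of a Z-Axis split is shown below: a stable hash of the customer identifier picks the pod, so the same customer always lands in the same place.  The function names and identifiers are hypothetical, and the sketch assumes customers spread evenly across pods.

```python
import hashlib


def pod_for_customer(customer_id, pod_count=4):
    """Map a customer to a pod using a stable hash of the customer id."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    # Use the first 8 bytes as an integer and fold it into the pod range.
    return int.from_bytes(digest[:8], "big") % pod_count


def exposed_fraction(pod_count, compromised_pods=1):
    """Share of customers exposed if some pods are breached (even spread assumed)."""
    return compromised_pods / pod_count


if __name__ == "__main__":
    print(pod_for_customer("customer-1001"))
    print(exposed_fraction(4))
```

Simple modulo assignment moves most customers when the pod count changes, so a production split would more likely use a lookup table or an attribute such as region, which also supports the regulatory segmentation mentioned above.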

AKF Security

Now I Know My X, Y, Z’s

Sometimes security can take a back seat to product development and other functions within a company.  It tends to be an afterthought until that fateful day when something truly bad happens and someone gains unauthorized access to your network.  Implementing a scalable environment via the AKF Scale Cube achieves a better overall product as well as a more secure one. 

If you need assistance in reaching a more scalable and secure environment AKF is capable of helping.

Subscribe to the AKF Newsletter

Contact Us

Microservice Anti-Pattern: The Service Mesh

May 8, 2019  |  Posted By: Marty Abbott

This article is the sixth in a multi-part series on microservices (micro-services) anti-patterns.  The introduction of the first article, Service Calls In Series, covers the benefits of splitting services (as in the case of creating a microservice architecture), and many of the mistakes or failure points teams create in services splits.  Articles two and three cover anti-patterns for service and data fan out respectively.  The fourth article covers an anti-pattern for disparate services sharing a common service deployment using the fuse metaphor.  The fifth article expands the fuse metaphor from service fuses to data fuses.

Howard Anton, the author of my college Calculus textbook, was fond of the following phrase:  “It should be intuitively obvious to the casual observer….”.  The clause immediately following that phrase was almost inevitably something that was not obvious to anyone – probably not even the author.  Nevertheless, the phrase stuck with me, and I think I finally found a place where it can live up to its promise. The Service Mesh, the topic of this microservice anti-pattern, is the amalgamation of all the anti-patterns to date.  It contains elements of calls in series, fuses and fan out.  As such, it follows the rules and availability problems of each of those patterns and should be avoided at all costs. 

This is where I need to be very clear, as I’m aware that the Service Mesh has a very large following.  This article refers to a mesh as a grouping of services with request/reply relationships.  Or, put another way, a “Mesh” is any solution that violates repeatedly the anti-patterns of “tree lights”, “fuses” or “fan out”.  If you use “mesh” to mean a grouping of services that never call each other, you are not violating this anti-pattern.

What constitutes a service mesh?

What is NOT a service mesh?

The reasons mesh patterns are a bad idea are many-fold:

1)  Availability:  At the extreme, the mesh is subject to the equation: [N∗(N−1)]/2.  This equation represents the number of edges in a fully connected graph with N vertices or nodes.  Asymptotically, this reduces to N^2.  To make availability calculations simple, the availability of a complete mesh can be calculated as the service with the lowest availability (A)^(N*N).  If the lowest availability of a service with appropriate X-axis cloning (multiple instances) is 99.9%, and the service mesh has 10 different services, the availability of your service mesh will approximate (99.9%)^10.  That’s roughly a 99% availability – perhaps good enough for some solutions but horrible by most modern standards.

2) Troubleshooting:  When every node can communicate with every other node, or when the “connectiveness” of a solution isn’t completely understood, how does one go about finding the ailing service causing a disruption?  Because failures and slowness transit synchronous links, a failure or slowness in one or more services will manifest itself as failures and slowness in all services.  Troubleshooting becomes very difficult.  Good luck in isolating the bad actor.

3) Hygiene:  I recall sitting through computer science classes 30 years ago and hearing the term “spaghetti code”.  These days we’d probably just call it “crap”, but it refers to the meandering paths of poorly constructed code.  Generally, it leads to difficulty in understanding, higher rates of defects, etc.  Somewhere along the line, some idiot has brought this same approach to deployments.  Again, borrowing from our friend Anton, it should be intuitively obvious to the casual observer that if it’s a bad practice in code it’s also a bad practice in deployment architectures.

4) Cost to Fix: If points 1 through 3 above aren’t enough to keep you away from connected service meshes, point 4 will hopefully help tip the scales.  If you implement a connected mesh in an environment in which you require high availability, you will spend a significant amount of time and money refactoring it to relieve the symptoms it will cause.  This amount may approximate your initial development effort as you remove each dependent anti-pattern (series, fuse, fan-out) with an appropriate pattern.
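The availability arithmetic in point 1 can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and which exponent applies is a modelling choice, so it is left as a parameter. The roughly 99% figure quoted in point 1 corresponds to using the number of services (10) as the exponent; the N*N form is considerably stricter.

```python
def mesh_edges(n: int) -> int:
    """Edges in a fully connected graph with n nodes: n*(n-1)/2."""
    return n * (n - 1) // 2

def compound_availability(lowest: float, exponent: int) -> float:
    """Lowest single-service availability raised to the chosen exponent."""
    return lowest ** exponent

n = 10      # services in the mesh
a = 0.999   # 99.9% expressed as a fraction

print(mesh_edges(n))                              # 45 possible links
print(round(compound_availability(a, n), 4))      # exponent = number of services
print(round(compound_availability(a, n * n), 4))  # exponent = N*N
```

Either way, the availability falls as the mesh grows, which is the point of the argument.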


Microservice Anti-Pattern:  The Service Mesh

Fixing a mesh is not an easy task.  One solution is to ensure that no service blocks while waiting for a request to any other service to complete.  Unfortunately, this pattern is not always easy or appropriate to implement.

Microservice Anti-Pattern Service Mesh Fix - Async Interactions
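A minimal sketch of the non-blocking idea, assuming an in-process `asyncio.Queue` stands in for whatever message bus sits between two services. The service names, message shape and timings are hypothetical. Service A hands work to service B and replies immediately, rather than waiting for B to finish.

```python
import asyncio

async def service_b_worker(inbox: asyncio.Queue, processed: list) -> None:
    """Service B drains its inbox at its own pace."""
    while True:
        message = await inbox.get()
        await asyncio.sleep(0.01)  # simulate slow downstream work
        processed.append(message)
        inbox.task_done()

async def service_a_handle(request_id: int, inbox: asyncio.Queue) -> str:
    """Service A enqueues work for B and returns without waiting for B."""
    inbox.put_nowait({"request_id": request_id})
    return f"accepted {request_id}"

async def main() -> tuple:
    inbox = asyncio.Queue()
    processed = []
    worker = asyncio.create_task(service_b_worker(inbox, processed))
    replies = [await service_a_handle(i, inbox) for i in range(3)]
    done_at_reply = len(processed)  # B's progress at the moment A has replied
    await inbox.join()              # only this demo waits; A's callers never do
    worker.cancel()
    return replies, done_at_reply, processed

replies, done_at_reply, processed = asyncio.run(main())
print(replies, done_at_reply, len(processed))
```

If B is slow or down, A's replies are unaffected; the cost is that A can only report that work was accepted, not its result.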

Another solution is to deploy each service as a service when it is responding to an end user request, and as a library for another service wherever needed.

Microservice Anti-Pattern Service Mesh Fix - Libraries

Finally, you can traverse each service node and determine where services can be collapsed, or where any of the other patterns identified within the tree light, fuse, or fan out anti-patterns can be applied.


AKF Partners helps companies create scalable, fault tolerant, highly available and cost effective architectures to meet their product needs.  Give us a call, we can help.

Microservice Anti-Pattern: Data Fuse

May 8, 2019  |  Posted By: Marty Abbott

This article is the fifth in a multi-part series on microservices (micro-services) anti-patterns.  The introduction of the first article, Service Calls In Series, covers the benefits of splitting services (as in the case of creating a microservice architecture), and many of the mistakes or failure points teams create in services splits.  Articles two and three cover anti-patterns for service and data fan out respectively.  The fourth article covers an anti-pattern for disparate services sharing a common service deployment using the fuse metaphor.

The Data Fuse, the topic of this microservice anti-pattern, exists when two or more unique services share a commonly deployed data store.  This data store can be any persistence solution from physical file services, to a common storage area network, to relational (ACID) or NoSQL (BASE) databases.  When the shared data solution “C” fails, service A and B fail as well.  Similarly, when data solution “C” becomes slow, slowness under high demand propagates to services A and B. 

As is the case with any group of services connected in series, Service A’s theoretical availability is the product of its individual availability combined with the availability of data service C.  Service B’s theoretical availability is calculated similarly.  Problems with service A can propagate to service B through the “fused” data element.  For instance, if service A experiences a runaway scenario that completely consumes the capacity of data store C, service B will suffer either severe slowness or will become unavailable. 

Microservices Anti-Pattern - The Data Fuse
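As a worked example of the series calculation, the sketch below multiplies the availabilities of components that must all be up for a request to succeed. The function name and the availability figures are hypothetical and chosen only to show the arithmetic.

```python
import math

def series_availability(*availabilities: float) -> float:
    """Theoretical availability of components connected in series (their product)."""
    return math.prod(availabilities)

service_a = 0.9995     # assumed availability of service A on its own
data_store_c = 0.999   # assumed availability of the shared data store C

combined = series_availability(service_a, data_store_c)
print(f"Service A through data store C: {combined:.4%}")
```

The combined figure is always lower than the weakest component, which is why every shared dependency in series costs availability.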

The easiest pattern solution for the data fuse is simply to merge the separate services.  This makes the most sense if the services can be owned by the same team.  While availability doesn’t significantly increase (service A can still affect service B, and the data store C still affects both), we don’t have the confusion of two services interacting through a fuse.  But if the rate of change for each service indicates that it needs separate teams, we need to evaluate other options (see “when to split services” for a discussion on drivers of services splits).

Data Fuse Microservices Anti-Pattern Fix:  Merge Services

Another way to fix the anti-pattern is to use the X axis of the Scale Cube as it relates to databases. An easy example of this is the sharing of account data between a sign-up service and a sign-in (AUTHN and AUTHZ) service.  In this example, given that sign-up is a write-based service and sign-in is a read-based service, we can use the X axis of the Scale Cube and split the services on a read and write basis.  To the extent that B must also log activity, it can have separate tables or a separate schema that allows that logging.  Note that the services supporting this split need not be unique - they can in fact be the exact same service - but the traffic they serve is properly segmented such that the read deployment receives only read traffic and the write deployment receives only write traffic.

Data Fuse Microservices Anti-Pattern Fix:  X Axis Read-Write Splits
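A minimal sketch of the traffic segmentation described above, assuming requests can be classified by operation name. The operation names, the deployment labels and the idea that the read deployment sits on read replicas are assumptions for illustration; the same service code could run in both deployments.

```python
# Hypothetical operation names and deployment labels, chosen only for illustration.
READ_OPERATIONS = {"sign_in"}
WRITE_OPERATIONS = {"sign_up"}

def pool_for(operation: str) -> str:
    """Pick the deployment that should serve an operation (X-axis read/write split)."""
    if operation in READ_OPERATIONS:
        return "read-deployment"   # assumed to be backed by read replicas
    if operation in WRITE_OPERATIONS:
        return "write-deployment"  # assumed to be backed by the writable primary
    raise ValueError(f"unclassified operation: {operation}")

for op in ("sign_up", "sign_in"):
    print(op, "->", pool_for(op))
```

Rejecting unclassified operations keeps new traffic from silently landing on the wrong deployment.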

 

If reads and writes aren’t an easily created X axis split, or if we need the organizational scale engendered by a Y-axis split, we need to be a bit more creative.  An example pattern comes from the differences between add-to-cart and checkout in a commerce solution.  Some functionality is shared between the components, including the notion of showing calculated sales tax and estimated shipping.  Other functionality may be unique, such as heavy computation in add-to-cart for related and recommended items, and up-sale opportunities such as gift wrapping or expedited shipping in checkout.  We also want to keep carts (session data) around in order to reach out to customers who have abandoned carts, but we don’t want this ephemeral clutter clogging the data of checkout.  This argues for separation of data for temporal (response time) reasons.  It also allows us to limit PCI compliance boundaries, removing services (add to cart) from the PCI evaluation landscape.

Data Fuse Microservices Anti-Pattern Fix:  Y Axis Data Split


Transition from add-to-cart to checkout may be accomplished through the client browser, or done as an asynchronous back end transfer of state with the browser polling for completion so as to allow for good fault isolation.  We refactor the datastore to separate data to services along the Y axis of the scale cube.

Data Fuse Microservices Anti-Pattern Fix:  Moving Data when necessary for Y Axis Data Split

AKF Partners helps companies create scalable, fault tolerant, highly available and cost-effective architectures to meet their product needs.  Give us a call, we can help.

 

Why is launching an MVP so difficult?

April 29, 2019  |  Posted By: Robin McGlothin

Man wearing backpack trying to decide which direction to go

Have you ever had that feeling of not knowing where to start?  For writers, it’s called writer’s block.  Painters call it blank-canvas syndrome.  Entrepreneurs & technologists refer to this phenomenon as analysis paralysis, an affliction experienced by all at one point or another.  It’s like having a stroke of genius for the next big idea, but not knowing where to start.

So let’s start by clearly defining the MVP:

A minimum viable product (MVP) is the version of a new product which allows a team to collect the maximum amount of validated learning about customers with the least amount of effort.

Sounds simple … so what’s the issue?

MVP is one of the most misunderstood terms in our product jargon today.  We’ve heard from many a client that MVP is really just a crappy version of a product that is an embarrassment to show to customers.  Over and over, the dialog goes like this, “let’s just remove these features and call it the MVP version.”  Just last week, we heard, “just make it good enough to launch!”

But the purpose of the MVP is to LEARN about customer demand and usability before over-committing resources.  To make sure that we are only building what customers want.  An MVP is NOT a fully usable product that will delight customers.  It is simply a learning vehicle.

So, what’s the problem?

 
Marty Abbott talks about the need to stay competitive, and how firms need to build great products but they also need to lend these products to the uses and misuses of their customers and learn extensively from them in The Power of Customer Mis-Behavior.  He’s basically telling us to focus on discovery and learning what customers really need, not what they say they want.

The point of an MVP is to validate or invalidate a specific hypothesis.  This is why we recommend starting discovery as soon as possible and relying heavily on user testing of prototypes.  But for some reason, most people hear the word Product and assume that it means the first version of a product.  And so, they build that version, release it and guess what…no one likes it. Well…no Duh!

But where to start?

 
Our advice is this: Don’t wait for the perfect product. Create an MVP and start discovery immediately.

Discovery Happens Along Two Dimensions:

  1. Discovery of “What” something should do – this is product discovery in defining or expanding an MVP: discovering what the feature set (stories) needs to be in order to be successful.
  2. Discovery of the “How” something should work to accomplish the best outcome of the what – this is a hybrid of technical and product discovery meant to find the fastest or best path to a result.

Seems straightforward, but many clients have challenges keeping it simple.

One common way companies have overbuilt MVPs is by applying an old prioritization technique used for requirements. Each requirement is tagged one of the following:

  • Must-have
  • Should-have
  • Could-have
  • Would like to have/won’t-have

“Must-haves” are the essentials, “should-haves” are really important, “could-haves” might be sacrificed, and “would-likes” probably aren’t going to happen.

The problem with this is at least 60% of any requirements list gets classified as “musts.”  Several stakeholders demand their request is a “must” and fight like wild dogs to avoid “could” or “would” status.

A vicious cycle is created as stakeholders realize that nothing except “musts” will get done.  If it gets to the point where more than 60% of requirements get classified as “musts,” there may even be some “musts” that don’t get done. In some organizations, a stakeholder stampede is triggered every time someone says “MVP,” leading to a bloated first release, unless someone steps in to put stricter limitations on the requirements.

At the same time, other MVP creation pitfalls we commonly see and warn our clients about include:

  1. Making a poor product. The word “minimum” in MVP does not mean bad, buggy or barely usable. “Minimum” means that the scope should be stripped of anything extra, but whatever features remain should be done in an intuitive and user-friendly way. Products should be unique to what the customer is likely to buy, relevant to size.
  2. Building a product to sell. Change the sales approach for future customers, sell on risk shift – not features.  Move to discovery vs. sales-led contract-based product development.
  3. Difficulty in defining the minimum. Often, you want your first product to be as beautiful as it can be, and you are reluctant to throw away all the nice features you’ve thought of. As a result, you spend too much time and money, and, even more damaging, lose focus on the core features. The rule of thumb when defining the scope of your MVP is “can we launch without this or not?” This should be your main criterion; you can add all the bells and whistles later, when the idea is validated.

Our recommended approach to avoid these pitfalls and launch a successful MVP is based on market-driven analysis with a minimum set of features identified for the go-to-market strategy.

The need for speed

Speed is everything when testing an MVP.  Many clients resist launching until a product is “perfect,” but here’s the news flash – it will never be perfect, and holding out for that status could ruin your product going forward.

According to Openview, 50% of SaaS companies fail in their first year due to misunderstanding their market, while 95% close up shop within five years for the same reason.  But strong, early MVP assessments allow you to determine whether you’re onto something (or not) in a low-risk environment.

When it comes to launching an MVP, progress is better than perfection.  The only goal is to put together a scaled-down version of your product or service and see whether clients are willing to buy in.

If your company is struggling with getting their MVP to market, AKF Partners can help you implement a product strategy consistent with your innovation needs. 

Photo by Kun Fotografi from Pexels

Microservice Anti-Pattern: Service Fuse

April 27, 2019  |  Posted By: Marty Abbott

This article is the fourth in a multi-part series on microservices (micro-services) anti-patterns.  The introduction of the first article, Service Calls In Series, covers the benefits of splitting services (as in the case of creating a microservice architecture), and many of the mistakes or failure points teams create in services splits.  Articles two and three cover anti-patterns for service and data fan out respectively.

The Service Fuse, the topic of this microservice anti-pattern, exists when two or more unique services share a commonly deployed service pool.  When the shared service “C” fails, service A and B fail as well.  Similarly, when service “C” becomes slow, slowness under high demand propagates to services A and B. 

As is the case with any group of services connected in series, Service A’s theoretical availability is the product of its individual availability combined with the availability of service C.  Service B’s theoretical availability is calculated similarly.  Under unusual conditions, the availability of A could also impact B similar to the way in which service fan out works.  Such would be the case if A somehow holds threads for C, thereby starving it of threads to serve B.

Because overall availability is negatively impacted, we consider the Service Fuse to be a microservice anti-pattern.

Microservice Anti-Pattern Sharing a common service deployment


The easiest and most common method to fault isolate the failure and response time propagation of Service C is to deploy it separately (in separate pools) for both Service A and B.  In doing so, we ensure that C does not fail for one service as a result of unusual demand from the other.  We also isolate failures due to unique requests that might be made by either A or B.  In doing so, we do incur some additional operational costs and additional coordination and overhead in releases.  But assuming proper automation, the availability and response time improvements are often worth the minor effort.


Solution to Service Fuse Anti-Pattern - deploy same service separately

As with many of our other anti-patterns we can also employ dynamically loadable libraries rather than separate service deployments.  While this approach has some of the slight overhead (again assuming proper automation) of the above separate service deployments, it often also benefits from significant server-side response time decreases associated with network transit. 

Solution to Service fuse Anti-Pattern - deploy service separately as libraries

We often see teams overemphasizing the cost of additional deployments.  But the separate service deployment or dynamically loadable library deployment seldom results in significantly greater effort.  Splitting the capacity of a shared pool relative to the demand split between services A and B (e.g. 50/50, 90/10, etc.) and adding a small number of additional services for capacity is the real implication of such a split.  Is 5 to 10% additional operational cost and seconds of additional deployment time worth the significant increase in availability?  Our experience is that most of the time it is.
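The capacity split can be sketched as simple arithmetic. The `split_pool` helper, the pool size of 20 instances and the 10% headroom are hypothetical; integer percentages are used so that rounding up behaves predictably.

```python
def split_pool(total_instances: int, demand_pct_a: int, headroom_pct: int = 10) -> tuple:
    """Split a shared pool of service C by demand share, rounding each side up.

    demand_pct_a is service A's share of demand on C (1-99); service B gets the rest.
    headroom_pct is extra capacity added to each new pool (an assumed figure).
    """
    if not 0 < demand_pct_a < 100:
        raise ValueError("demand_pct_a must be between 1 and 99")

    def sized(share_pct: int) -> int:
        scaled = total_instances * share_pct * (100 + headroom_pct)
        return -(-scaled // 10_000)  # integer ceiling division

    return sized(demand_pct_a), sized(100 - demand_pct_a)

for share in (50, 90):
    pool_a, pool_b = split_pool(20, share)
    extra = pool_a + pool_b - 20
    print(f"{share}/{100 - share}: A={pool_a}, B={pool_b}, extra instances={extra}")
```

Uneven splits cost a little more than even ones because each small pool rounds up to a whole instance.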

Results and Outcomes – Why Companies Fail

April 21, 2019  |  Posted By: Pete Ferguson

picture of woman looking at iPad with graphs and reports

Results = Results

Apple, Google, and Amazon don’t exist based on a Utopian promise of what is to come – though certainly those promises keep their customers engaged and hopeful for the future.  These companies exist because of the value they have delivered to date and created expectations for us as consumers for a consistent result.

I’m amazed at how simple of a concept Results = Results is – yet constantly we see companies struggle with the concept and we see it as a recurring theme in our 2-3 day workshops with our clients and something we look for in our technical due diligence reviews.

As a corporate survivor of 18 years, looking back I can see where I was distracted by day-to-day meetings, firefighting, and getting hijacked by initiatives that seemed urgent to some senior leader somewhere – but were not really all that important.

Suddenly the quarter or half was over and it was time to do a self-evaluation and realize all the effort, all the stress, all the work, wasn’t getting the desired results I’d committed to earlier in the year and I’d have to quickly shuffle and focus on getting stuff done.

While keeping the lights on is important, it diminishes in importance when to do so is at the expense of innovating and adding value to our customers – not just struggling to maintain the status quo.

Outcomes and Key Results (OKRs)

Adapted from John Doerr’s “Objectives” and key results – at AKF we find it more to the point to focus on “outcomes.”  Objectives (definition: a thing aimed at or sought) are a path, whereas “outcomes” are a destination that is clearly defined so you know when you have arrived.

Outcomes are the only things that matter to our customers.  Hearing about a desired Utopian state is great and may excite customers to stick around for awhile and put up with current limitations or lack of functionality – but being able to clearly define that you have delivered an outcome and the value to your customers is money in the bank and puts us ahead of our competition.

Yet the majority of our clients have teams who are so focused on cost-cutting for many years that they leave a wide open berth for young startups and their competition to move in and start delivering better outcomes for the customer.

How to Focus on Results and Outcomes

It is easy to become distracted in the day-to-day meetings, incident escalations, post mortems, etc.  As an outside third party, however, it is blatantly obvious to us usually within the first hour of meeting with a new team whether or not they are properly focused.

Here are some of the common themes and questions to ask:

  • Is there effective monitoring to discover issues before our customers do?
  • Do we monitor business metrics and weigh the success (and failure) of initiatives based not on pushing out a new platform or product but whether or not there was significant ROI?
  • How much time is spent limping along to keep a legacy application up and running vs. innovating?
  • Do we continually push off hardware/software upgrades until we are held hostage by compliance and/or end-of-life serviceability by the vendor?

Hopefully the common theme here is obvious – what is the customer experience and how focused are we on them vs internal castle building or day-to-day distractions?

Recently in a team interview the IT “keep the lights on” team told us they were working to be strategic and innovative by hiring new interns.  While the younger generations are definitely less prone to accepting the status quo, the older generation are conceding that they don’t want to be part of the future.  And unfortunately they may not be sooner than planned if they don’t grasp their role in driving innovation and the importance of applying their institutional knowledge.

Not focusing on customer/shareholder related outcomes means that shareholders and customers are negatively impacted.  Here are a few problems with the associated outcomes I’ve seen in my short tenure with AKF and previously as a corporate crusader: 

Observation:

Monolithic applications to save costs: Why do organizations do it?  Short-term cost savings focus development on one application.  It allows teams to focus only on development of their one area.

Outcomes:

  • One failure means everyone fails.
  • Organizations are unable to scale vis-a-vis Conway’s Law (organizations which design systems are constrained to produce designs which are copies of the communication structures of these organizations).
  • Often the teams who develop the monolith don’t have to support it, so they don’t understand why it is a problem.
  • Teams become very focused on solving the problems caused by the monolith just long enough to get it back up and running but fail to see the long-term recurrent loss to the business and wasted hours that could have been spent on innovating new products and services.
  • Catastrophic failure - Intuit pre SaaS, early renditions of iTunes and annual outages when everyone tried to redeem gift cards Christmas morning, early days of eBay, stay tuned, many more yet to come.

Observation:

Ongoing cost cutting to “make the quarter.”

Outcomes:

  • Missed tech refresh results in machines and operating systems no longer supported and vulnerable to external attacks.
  • Teams become hyper focused on shutting down additional spending, but never take the time to calculate how much wasted effort is spent on keeping the lights on for aging systems with a declining market share or slowed new customer adoption rate.
  • Start saying no to the customer based on cost, opening the door for new upstarts and the competition to take away market share.

Observation:

Focusing efforts on Sales Department’s latest contract.

Outcomes:

  • Too much investment in legacy applications instead of innovating new products.
  • “A-team” developers become firefighters to keep customers happy.
  • Sales team creates moral hazards for development teams (i.e. “I smoke, but you get lung cancer” - teams create problems for other teams to fix instead of owning the end-to-end lifecycle of a product)

Observation:

Focus is on mergers and acquisitions instead of core strengths and products.

Outcomes:

  • Distracted organizations give way for upstarts and competition.
  • Become okay or maybe even good at a lot of things but not great at one or two things.
  • Company culture becomes very fragmented and silos create red tape that slows or stifles innovation.

Conclusions

Results = Results.  And nothing else equals results.


If OKRs are not measuring the results needed to compete and win, then teams are wasting a lot of effort, time, and money and the competition is getting a free pass to innovate and outperform your ability to delight and please your customers.

Need an outside view of your organization to help drive better results and outcomes?  Contact us!

Photo by rawpixel.com from Pexels
