May 14, 2019 | Posted By: AKF
If AKF Partners had to be known for one thing and one thing only, it would be the Scale Cube: an ingenious little model designed to help companies identify how scalable they are and set goals along any of its three axes to make their product more scalable. Based upon the number of times I have said scalable, or a derivative of the word scale, you might conclude that the AKF Scale Cube is about scale. And you would be right. However, the beauty of the cube is that it is also applicable to security.
The X-Axis is usually the first axis that companies look at for scalability purposes. Horizontal duplication is usually the easiest to reach from a technological standpoint; however, it tends to be fairly costly. This replication across various tiers (web, application or database) also insulates companies when the inevitable breach does occur. Planning only for security without also bracing for a data breach is a naive approach. With replication across the tiers, and even delayed replication to protect against data corruption, not only are you able to accommodate more customers, you now potentially have a clean copy replicated elsewhere if one of your systems is compromised, assuming you are able to identify the breach early enough.
One of the costliest issues with a breach is recovery to a secure copy. Your company may take a publicity hit, but if you are able to bring your system back up to a clean state, identify the compromise and fix it, then you can be back on your way to being fully operational. People are slowly and reluctantly accepting that breaches occur. If you are open and forthright with your customers, the publicity issue around a breach tends to be lessened. Showing them that your system is back up, running and now more secure will help drive business in the right direction.
Splitting across services (the Y-Axis) has many benefits beyond scalability. It provides ownership, accountability and segregation. Although difficult to implement, especially when coming from a monolithic base, micro-services help with security as well. Code bases that communicate via asynchronous calls not only allow a service to fail without a major impact on other services; they also create another layer for a potential intruder to traverse.
Steps that provide defense in depth for your environment help slow or mitigate attackers. If asynchronous calls are used between micro-services, each lateral or vertical movement is another opportunity for an attacker to be stopped or detected. If services are small enough, an attacker who gains access to one of them reaches less data than they may need to accomplish their goal.
Segmenting customers based upon similar characteristics (be it geography, spending habits, or even just a random selection) helps to achieve Z-Axis scalability. These pods of customers provide protection from a full data breach as well. Ideally no customer data would ever be exposed, but if you have 4 pods, exposing 25% of customer data is better than exposing 100%. And just like the Y-Axis, these splits help isolate attackers in a subset of your environment. Various governing boards also have different procedures that must be followed depending upon the nationality of the customer data exposed. If you segment on that basis (e.g., EU vs. USA), your breach response can be managed differently for each segment.
Now I Know My X, Y, Z’s
Sometimes security can take a back seat to product development and other functions within a company. It tends to be an afterthought until that fateful day when something truly bad happens and someone gains unauthorized access to your network. Implementing a scalable environment via the AKF Scale Cube achieves a better overall product as well as a more secure one.
If you need assistance in reaching a more scalable and secure environment, AKF is capable of helping.
May 8, 2019 | Posted By: Marty Abbott
This article is the sixth in a multi-part series on microservices (micro-services) anti-patterns. The introduction of the first article, Service Calls In Series, covers the benefits of splitting services (as in the case of creating a microservice architecture) and many of the mistakes or failure points teams create in services splits. Articles two and three cover anti-patterns for service and data fan out respectively. The fourth article covers an anti-pattern for disparate services sharing a common service deployment using the fuse metaphor. The fifth article expands the fuse metaphor from service fuses to data fuses.
Howard Anton, the author of my college Calculus textbook, was fond of the following phrase: “It should be intuitively obvious to the casual observer….”. The clause immediately following that phrase was almost inevitably something that was not obvious to anyone – probably not even the author. Nevertheless, the phrase stuck with me, and I think I finally found a place where it can live up to its promise. The Service Mesh, the topic of this microservice anti-pattern, is the amalgamation of all the anti-patterns to date. It contains elements of calls in series, fuses and fan out. As such, it follows the rules and availability problems of each of those patterns and should be avoided at all costs.
This is where I need to be very clear, as I’m aware that the Service Mesh has a very large following. This article refers to a mesh as a grouping of services with request/reply relationships. Or, put another way, a “Mesh” is any solution that repeatedly violates the anti-patterns of “tree lights”, “fuses” or “fan out”. If you use “mesh” to mean a grouping of services that never call each other, you are not violating this anti-pattern.
The reasons mesh patterns are a bad idea are manifold:
1) Availability: At the extreme, the mesh is subject to the equation $N(N-1)/2$. This equation represents the number of edges in a fully connected graph with $N$ vertices or nodes. Asymptotically, this grows as $N^2$. To make availability calculations simple, the availability of a complete mesh can be approximated as the availability of the service with the lowest availability, $A$, raised to that power: $A^{N^2}$. If the lowest availability of a service with appropriate X-axis cloning (multiple instances) is 99.9%, and the service mesh has 10 different services, the availability of your service mesh will approximate $0.999^{100}$, or roughly 90.5%. Even an optimistic estimate that counts each of the 10 services only once, $0.999^{10}$, yields roughly 99% availability: perhaps good enough for some solutions but horrible by most modern standards.
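To make the availability arithmetic above easy to check, here is a minimal Python sketch. It assumes service failures are independent and that every service in the mesh sits at the same availability `A`; the function names are illustrative only.

```python
def mesh_edges(n: int) -> int:
    """Edges in a fully connected graph of n services: n(n-1)/2."""
    return n * (n - 1) // 2

def availability(a: float, exponent: int) -> float:
    """Combined availability of `exponent` independent dependencies, each at availability a."""
    return a ** exponent

n, a = 10, 0.999
print(f"edges   = {mesh_edges(n)}")               # edges   = 45
print(f"A^N     = {availability(a, n):.4f}")      # A^N     = 0.9900 (each service counted once)
print(f"A^(N^2) = {availability(a, n * n):.4f}")  # A^(N^2) = 0.9048 (the simplified mesh bound)
```

Either way the exponent grows with the number of services, which is why adding services to a synchronous mesh steadily erodes availability.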
2) Troubleshooting: When every node can communicate with every other node, or when the “connectiveness” of a solution isn’t completely understood, how does one go about finding the ailing service causing a disruption? Because failures and slowness transit synchronous links, a failure or slowness in one or more services will manifest itself as failures and slowness in all services. Troubleshooting becomes very difficult. Good luck in isolating the bad actor.
3) Hygiene: I recall sitting through computer science classes 30 years ago and hearing the term “spaghetti code”. These days we’d probably just call it “crap”, but it refers to the meandering paths of poorly constructed code. Generally, it leads to difficulty in understanding, higher rates of defects, etc. Somewhere along the line, some idiot has brought this same approach to deployments. Again, borrowing from our friend Anton, it should be intuitively obvious to the casual observer that if it’s a bad practice in code it’s also a bad practice in deployment architectures.
4) Cost to Fix: If points 1 through 3 above aren’t enough to keep you away from connected service meshes, point 4 will hopefully help tip the scales. If you implement a connected mesh in an environment in which you require high availability, you will spend a significant amount of time and money refactoring it to relieve the symptoms it will cause. This amount may approximate your initial development effort as you replace each dependent anti-pattern (series, fuse, fan-out) with an appropriate pattern.
Fixing a mesh is not an easy task. One solution is to ensure that no service blocks while waiting for a request to any other service to complete. Unfortunately, this pattern is not always easy or appropriate to implement.
Another solution is to deploy each service as a service when it is responding to an end-user request, and as a library within another service wherever needed.
Finally, you can traverse each service node and determine where services can be collapsed, or where any of the other patterns identified in the tree light, fuse, or fan-out anti-pattern articles can be applied.
AKF Partners helps companies create scalable, fault-tolerant, highly available and cost-effective architectures to meet their product needs. Give us a call; we can help.
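The non-blocking fix described above can be sketched with an in-process `asyncio.Queue` standing in for a message broker. This is a simplified illustration, not a production design: `service_a` and `service_b` are hypothetical, and a real system would use a durable queue between separately deployed services.

```python
import asyncio

async def service_a(outbox: asyncio.Queue, order_id: int) -> str:
    """Hypothetical service A: records work for service B and replies immediately."""
    outbox.put_nowait({"order_id": order_id})  # never waits on B
    return f"order {order_id} accepted"

async def service_b(outbox: asyncio.Queue, processed: list) -> None:
    """Hypothetical service B: drains the queue on its own schedule."""
    while True:
        msg = await outbox.get()
        processed.append(msg["order_id"])
        outbox.task_done()

async def main() -> tuple[list, list]:
    outbox: asyncio.Queue = asyncio.Queue()
    processed: list = []
    # A keeps answering callers even though B has not started yet (e.g. B is down).
    replies = [await service_a(outbox, i) for i in range(3)]
    worker = asyncio.create_task(service_b(outbox, processed))  # B comes back
    await outbox.join()  # wait until B has worked through the backlog
    worker.cancel()
    return replies, processed

replies, processed = asyncio.run(main())
print(replies)    # ['order 0 accepted', 'order 1 accepted', 'order 2 accepted']
print(processed)  # [0, 1, 2]
```

The trade-off is that A can only acknowledge the request, not return B's result, which is exactly why the pattern is not always appropriate.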
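One way to start that traversal is to write the synchronous call graph down and scan it. The sketch below uses a hypothetical graph and flags rough proxies for each anti-pattern: services that fan out to two or more callees, services shared by two or more callers (fuse candidates), and the longest chain of calls in series. It skips edges that would revisit a service, so it also terminates on cyclic meshes.

```python
from collections import defaultdict

# Hypothetical synchronous call graph: each service -> services it calls and waits on.
CALLS: dict[str, list[str]] = {
    "web": ["cart", "search"],
    "cart": ["pricing", "inventory"],
    "search": ["pricing"],
    "pricing": [],
    "inventory": [],
}

def fan_outs(calls: dict[str, list[str]]) -> list[str]:
    """Services that synchronously call two or more other services."""
    return sorted(s for s, callees in calls.items() if len(callees) > 1)

def fuses(calls: dict[str, list[str]]) -> list[str]:
    """Services shared by two or more distinct callers (candidates for separate pools)."""
    callers = defaultdict(set)
    for caller, callees in calls.items():
        for callee in callees:
            callers[callee].add(caller)
    return sorted(s for s, who in callers.items() if len(who) > 1)

def longest_chain(calls: dict[str, list[str]], start: str, path: tuple = ()) -> tuple:
    """Longest chain of calls in series starting at `start`."""
    path = path + (start,)
    best = path
    for callee in calls.get(start, []):
        if callee in path:  # a cycle: stop rather than recurse forever
            continue
        candidate = longest_chain(calls, callee, path)
        if len(candidate) > len(best):
            best = candidate
    return best

print(fan_outs(CALLS))              # ['cart', 'web']
print(fuses(CALLS))                 # ['pricing']
print(longest_chain(CALLS, "web"))  # ('web', 'cart', 'pricing')
```

Each flagged node is a place to consider collapsing services, deploying separate pools, or moving to asynchronous calls.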
May 8, 2019 | Posted By: Marty Abbott
This article is the fifth in a multi-part series on microservices (micro-services) anti-patterns. The introduction of the first article, Service Calls In Series, covers the benefits of splitting services (as in the case of creating a microservice architecture) and many of the mistakes or failure points teams create in services splits. Articles two and three cover anti-patterns for service and data fan out respectively. The fourth article covers an anti-pattern for disparate services sharing a common service deployment using the fuse metaphor.
The Data Fuse, the topic of this microservice anti-pattern, exists when two or more unique services share a commonly deployed data store. This data store can be any persistence solution from physical file services, to a common storage area network, to relational (ACID) or NoSQL (BASE) databases. When the shared data solution “C” fails, service A and B fail as well. Similarly, when data solution “C” becomes slow, slowness under high demand propagates to services A and B.
As is the case with any group of services connected in series, Service A’s theoretical availability is the product of its individual availability and the availability of data store C. Service B’s theoretical availability is calculated similarly. Problems with service A can propagate to service B through the “fused” data element. For instance, if service A experiences a runaway scenario that completely consumes the capacity of data store C, service B will either suffer severe slowness or become unavailable.
The easiest pattern solution for the data fuse is simply to merge the separate services. This makes the most sense if the services can be owned by the same team. While availability doesn’t significantly increase (service A can still affect service B, and the data store C still affects both), we don’t have the confusion of two services interacting through a fuse. But if the rate of change for each service indicates that it needs separate teams, we need to evaluate other options (see “when to split services” for a discussion of the drivers of service splits).
Another way to fix the anti-pattern is to use the X axis of the Scale Cube as it relates to databases. An easy example of this is the sharing of account data between a sign-up service and a sign-in (AUTHN and AUTHZ) service. In this example, given that sign-up is a write-based service and sign-in is a read-based service, we can use the X axis of the Scale Cube and split the services on a read and write basis. To the extent that the sign-in service (B) must also log activity, it can have separate tables or a separate schema that allows that logging. Note that the services supporting this split need not be unique: they can in fact be the exact same service, but the traffic they serve is properly segmented such that the read deployment receives only read traffic and the write deployment receives only write traffic.
If reads and writes aren’t an easily created X axis split, or if we need the organizational scale engendered by a Y-axis split, we need to be a bit more creative. An example pattern comes from the differences between add-to-cart and checkout in a commerce solution. Some functionality is shared between the components, including the notion of showing calculated sales tax and estimated shipping. Other functionality may be unique, such as heavy computation in add-to-cart for related and recommended items, and upsell opportunities such as gift wrapping or expedited shipping in checkout. We also want to keep carts (session data) around in order to reach out to customers who have abandoned carts, but we don’t want this ephemeral clutter clogging the data of checkout. This argues for separation of data for temporal (response time) reasons. It also allows us to limit PCI compliance boundaries, removing services (add-to-cart) from the PCI evaluation landscape.
Transition from add-to-cart to checkout may be accomplished through the client browser, or done as an asynchronous back end transfer of state with the browser polling for completion so as to allow for good fault isolation. We refactor the data store to separate data by service along the Y axis of the Scale Cube.
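The read/write split above comes down to routing. A minimal sketch, assuming hypothetical operation names and in-memory `Deployment` stand-ins for the write deployment and its read replicas (both may run the exact same service code):

```python
from dataclasses import dataclass, field

@dataclass
class Deployment:
    """Stand-in for one deployment of the (possibly identical) account service."""
    name: str
    handled: list = field(default_factory=list)

    def handle(self, op: str, user: str) -> str:
        self.handled.append((op, user))
        return self.name

# Hypothetical operation names for the sign-up (write) and sign-in (read) services.
WRITE_OPS = {"sign_up", "change_password"}
READ_OPS = {"sign_in", "check_permission"}

class ReadWriteRouter:
    """Send write traffic to the write deployment and read traffic to read replicas."""

    def __init__(self, writer: Deployment, readers: list[Deployment]) -> None:
        self.writer = writer
        self.readers = readers
        self._next = 0

    def route(self, op: str, user: str) -> str:
        if op in WRITE_OPS:
            return self.writer.handle(op, user)
        if op in READ_OPS:
            reader = self.readers[self._next % len(self.readers)]  # simple round-robin
            self._next += 1
            return reader.handle(op, user)
        raise ValueError(f"unknown operation: {op}")

router = ReadWriteRouter(Deployment("primary"), [Deployment("replica-1"), Deployment("replica-2")])
print(router.route("sign_up", "alice"))  # primary
print(router.route("sign_in", "alice"))  # replica-1
print(router.route("sign_in", "bob"))    # replica-2
```

With asynchronous replication, a sign-in that immediately follows a sign-up may reach a replica that has not yet received the new account, so the sign-in path should tolerate that brief lag.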
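The asynchronous back-end transfer with browser polling might look roughly like the following sketch. `HandoffService`, its in-memory stores, and the `"pending"`/`"done"`/`"failed"` statuses are hypothetical stand-ins. In practice the cart and checkout stores would be separate databases and the transfer would run on a queue worker.

```python
import asyncio
import uuid

class HandoffService:
    """Hypothetical back end that copies cart state into checkout's own data store."""

    def __init__(self) -> None:
        self.cart_store: dict[str, dict] = {}      # add-to-cart data (ephemeral sessions)
        self.checkout_store: dict[str, dict] = {}  # checkout data, kept separate (PCI scope)
        self.jobs: dict[str, str] = {}             # job_id -> "pending" | "done" | "failed"
        self._tasks: set = set()                   # keep task references alive

    def start_transfer(self, cart_id: str) -> str:
        """Accept the request, schedule the copy, and return a job id to poll."""
        job_id = str(uuid.uuid4())
        self.jobs[job_id] = "pending"
        task = asyncio.get_running_loop().create_task(self._transfer(job_id, cart_id))
        self._tasks.add(task)
        task.add_done_callback(self._tasks.discard)
        return job_id

    async def _transfer(self, job_id: str, cart_id: str) -> None:
        await asyncio.sleep(0.01)  # stand-in for queue or network latency
        cart = self.cart_store.get(cart_id)
        if cart is None:
            self.jobs[job_id] = "failed"
            return
        self.checkout_store[cart_id] = dict(cart)  # copy: checkout owns its data
        self.jobs[job_id] = "done"

    def poll(self, job_id: str) -> str:
        return self.jobs[job_id]

async def browser_checkout(svc: HandoffService, cart_id: str) -> str:
    """Client side: start the transfer, then poll until it finishes."""
    job_id = svc.start_transfer(cart_id)
    while (status := svc.poll(job_id)) == "pending":
        await asyncio.sleep(0.005)
    return status

async def main() -> tuple[str, dict]:
    svc = HandoffService()
    svc.cart_store["cart-42"] = {"sku-1": 2}
    status = await browser_checkout(svc, "cart-42")
    return status, svc.checkout_store

status, checkout = asyncio.run(main())
print(status, checkout)  # done {'cart-42': {'sku-1': 2}}
```

Because checkout reads only from its own store, a slow or failed cart store delays the handoff instead of breaking checkout for carts that have already transferred.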
AKF Partners helps companies create scalable, fault tolerant, highly available and cost-effective architectures to meet their product needs. Give us a call, we can help.
April 29, 2019 | Posted By: AKF
Have you ever had that feeling of not knowing where to start? For writers, it’s called writer’s block. Painters call it blank-canvas syndrome. Entrepreneurs and technologists refer to this phenomenon as analysis paralysis, an affliction we all experience at one point or another. It’s like having a stroke of genius for the next big idea, but not knowing where to start.
So let’s start by clearly defining the MVP:
A minimum viable product (MVP) is the version of a new product which allows a team to collect the maximum amount of validated learning about customers with the least amount of effort.
Sounds simple … so what’s the issue?
MVP is one of the most misunderstood terms in our product jargon today. We’ve heard from many a client that MVP is really just a crappy version of a product that is an embarrassment to show to customers. Over and over, the dialog goes like this, “let’s just remove these features and call it the MVP version.” Just last week, we heard, “just make it good enough to launch!”
But the purpose of the MVP is to LEARN about customer demand and usability before over-committing resources. To make sure that we are only building what customers want. An MVP is NOT a fully usable product that will delight customers. It is simply a learning vehicle.
So, what’s the problem?
In The Power of Customer Mis-Behavior, Marty Abbott talks about the need to stay competitive: firms need to build great products, but they also need to open those products to the uses and misuses of their customers and learn extensively from them. He’s basically telling us to focus on discovery and learning what customers really need, not what they say they want.
The point of an MVP is to validate or invalidate a specific hypothesis. This is why we recommend starting discovery as soon as possible and relying heavily on user testing of prototypes. But for some reason, most people hear the word Product and assume that it means the first version of a product. And so, they build that version, release it and guess what…no one likes it. Well…no Duh!
But where to start?
Our advice is this: Don’t wait for the perfect product. Create an MVP and start discovery immediately.
Discovery Happens Along Two Dimensions:
- Discovery of the “What” something should do: product discovery that defines or expands an MVP by discovering what feature set (stories) it needs to be successful.
- Discovery of the “How” something should work to accomplish the best outcome of the what – this is a hybrid of technical and product discovery meant to find the fastest or best path to a result.
Seems straightforward, but many clients have challenges keeping it simple.
One common way companies overbuild MVPs is by applying an old prioritization technique used for requirements. Each requirement is tagged with one of the following:
- Must-have
- Should-have
- Could-have
- Would like to have/won’t-have
“Must-haves” are the essentials, “should-haves” are really important, “could-haves” might be sacrificed, and “would-likes” probably aren’t going to happen.
The problem with this is that at least 60% of any requirements list gets classified as “musts.” Several stakeholders demand that their request be a “must” and fight like wild dogs to avoid “could” or “would” status.
A vicious cycle is created as stakeholders realize that nothing except “musts” will get done. If it gets to the point where more than 60% of requirements get classified as “musts,” there may even be some “musts” that don’t get done. In some organizations, a stakeholder stampede is triggered every time someone says “MVP,” leading to a bloated first release, unless someone steps in to put stricter limitations on the requirements.
At the same time, other MVP creation pitfalls we commonly see and warn our clients about include:
- Making a poor product. The word “minimum” in MVP does not mean bad, buggy or barely usable. “Minimum” means that the scope should be stripped of anything extra, but whatever features remain should be done in an intuitive and user-friendly way. Products should be unique to what the customer is likely to buy and appropriately sized.
- Building a product to sell. Change the sales approach for future customers: sell on risk shift, not features. Move to discovery rather than sales-led, contract-based product development.
- Difficulty in defining the minimum. Often, you want your first product to be as beautiful as it can be, and you are reluctant to throw away all the nice features you’ve thought of. As a result, you spend too much time and money and, even more damaging, lose focus on the core features. The rule of thumb when defining the scope of your MVP is “can we launch without this or not?” This should be your main criterion; add all the bells and whistles later, once the idea is validated.
Our recommended approach to avoid these pitfalls and launch a successful MVP is based on market-driven analysis with a minimum set of features identified for the go-to-market strategy.
The need for speed
Speed is everything when testing an MVP. Many clients resist launching until a product is “perfect,” but here’s the news flash – it will never be perfect, and holding out for that status could ruin your product going forward.
According to Openview, 50% of SaaS companies fail in their first year due to misunderstanding their market, while 95% close up shop within five years for the same reason. But strong, early MVP assessments allow you to determine whether you’re onto something (or not) in a low-risk environment.
When it comes to launching an MVP, progress is better than perfection. The only goal is to put together a scaled-down version of your product or service and see whether clients are willing to buy in.
If your company is struggling to get its MVP to market, AKF Partners can help you implement a product strategy consistent with your innovation needs.
Photo by Kun Fotografi from Pexels
April 27, 2019 | Posted By: Marty Abbott
This article is the fourth in a multi-part series on microservices (micro-services) anti-patterns. The introduction of the first article, Service Calls In Series, covers the benefits of splitting services (as in the case of creating a microservice architecture) and many of the mistakes or failure points teams create in services splits. Articles two and three cover anti-patterns for service and data fan out respectively.
The Service Fuse, the topic of this microservice anti-pattern, exists when two or more unique services share a commonly deployed service pool. When the shared service “C” fails, service A and B fail as well. Similarly, when service “C” becomes slow, slowness under high demand propagates to services A and B.
As is the case with any group of services connected in series, Service A’s theoretical availability is the product of its individual availability combined with the availability of service C. Service B’s theoretical availability is calculated similarly. Under unusual conditions, the availability of A could also impact B similar to the way in which service fan out works. Such would be the case if A somehow holds threads for C, thereby starving it of threads to serve B.
Because overall availability is negatively impacted, we consider the Service Fuse to be a microservice anti-pattern.
The easiest and most common method to fault isolate the failure and response time propagation of Service C is to deploy it separately (in separate pools) for both Service A and B. In doing so, we ensure that C does not fail for one service as a result of unusual demand from the other. We also isolate failures due to unique requests that might be made by either A or B. This approach does incur some additional operational costs and additional coordination and overhead in releases. But assuming proper automation, the availability and response time improvements are often worth the minor effort.
As with many of our other anti-patterns, we can also employ dynamically loadable libraries rather than separate service deployments. While this approach carries some of the same slight overhead (again assuming proper automation) as the separate service deployments above, it often also benefits from significantly lower server-side response times because it eliminates network transit.
We often see teams overemphasizing the cost of additional deployments. But a separate service deployment or dynamically loadable library deployment seldom results in significantly greater effort. The real implication of such a split is dividing the capacity of the shared pool according to the demand split between services A and B (e.g. 50/50, 90/10, etc.) and adding a small number of additional instances for capacity. Is 5 to 10% additional operational cost and seconds of additional deployment time worth the significant increase in availability? Our experience is that most of the time it is.
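The capacity arithmetic behind that question can be sketched as follows. The policy of adding exactly one spare instance per dedicated pool is a hypothetical assumption, and the pool sizes are illustrative, not AKF data.

```python
def split_pool(shared: int, pct_a: int, spare_per_pool: int = 1) -> tuple[int, int, float]:
    """Split a shared pool of `shared` instances between services A and B.

    pct_a is A's share of demand, in percent. Each dedicated pool is sized by
    ceiling division and then given `spare_per_pool` instances of headroom
    (a hypothetical policy). Returns both pool sizes and the fractional
    increase in instance count over the shared pool.
    """
    pool_a = -(-shared * pct_a // 100) + spare_per_pool          # ceil(shared * pct_a / 100)
    pool_b = -(-shared * (100 - pct_a) // 100) + spare_per_pool  # ceil(shared * (100 - pct_a) / 100)
    return pool_a, pool_b, (pool_a + pool_b - shared) / shared

for shared, pct in [(20, 50), (20, 90), (100, 90)]:
    print(shared, pct, split_pool(shared, pct))
# 20 50 (11, 11, 0.1)
# 20 90 (19, 3, 0.1)
# 100 90 (91, 11, 0.02)
```

Integer ceiling division avoids floating-point rounding surprises when sizing pools. The relative overhead of the split shrinks as the shared pool grows.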
April 21, 2019 | Posted By: Pete Ferguson
Results = Results
Apple, Google, and Amazon don’t exist based on a Utopian promise of what is to come – though certainly those promises keep their customers engaged and hopeful for the future. These companies exist because of the value they have delivered to date and the expectation of consistent results they have created in us as consumers.
I’m amazed at how simple a concept Results = Results is, yet we constantly see companies struggle with it. It is a recurring theme in our two- to three-day workshops with clients and something we look for in our technical due diligence reviews.
As a corporate survivor of 18 years, looking back I can see where I was distracted by day-to-day meetings, firefighting, and getting hijacked by initiatives that seemed urgent to some senior leader somewhere – but were not really all that important.
Suddenly the quarter or half was over and it was time for a self-evaluation, where I’d realize that all the effort, all the stress, all the work, wasn’t getting the results I’d committed to earlier in the year, and I’d have to quickly shuffle and focus on getting stuff done.
While keeping the lights on is important, it diminishes in importance when doing so comes at the expense of innovating and adding value for our customers, rather than just struggling to maintain the status quo.
Outcomes and Key Results (OKRs)
Our version is adapted from John Doerr’s “Objectives and Key Results”; at AKF we find it more to the point to focus on “outcomes.” Objectives (definition: a thing aimed at or sought) are a path, whereas “outcomes” are a destination defined clearly enough that you know when you have arrived.
Outcomes are the only things that matter to our customers. Hearing about a desired Utopian state is great and may excite customers to stick around for a while and put up with current limitations or lack of functionality – but being able to clearly define that you have delivered an outcome and value to your customers is money in the bank and puts us ahead of our competition.
Yet the majority of our clients have teams who are so focused on cost-cutting for many years that they leave a wide open berth for young startups and their competition to move in and start delivering better outcomes for the customer.
How to Focus on Results and Outcomes
It is easy to become distracted by the day-to-day meetings, incident escalations, post mortems, etc. As an outside third party, however, it is usually blatantly obvious to us within the first hour of meeting with a new team whether or not they are properly focused.
Here are some of the common themes and questions to ask:
- Is there effective monitoring to discover issues before our customers do?
- Do we monitor business metrics and weigh the success (and failure) of initiatives based not on pushing out a new platform or product but whether or not there was significant ROI?
- How much time is spent limping along to keep a legacy application up and running vs. innovating?
- Do we continually push off hardware/software upgrades until we are held hostage by compliance and/or end-of-life serviceability by the vendor?
Hopefully the common theme here is obvious – what is the customer experience and how focused are we on them vs internal castle building or day-to-day distractions?
Recently in a team interview, the IT “keep the lights on” team told us they were working to be strategic and innovative by hiring new interns. While the younger generations are definitely less prone to accepting the status quo, the older generation is effectively conceding that it doesn’t want to be part of the future. And unfortunately, its members may find themselves left out of that future sooner than planned if they don’t grasp their role in driving innovation and the importance of applying their institutional knowledge.
Not focusing on customer/shareholder related outcomes means that shareholders and customers are negatively impacted. Here are a few problems with the associated outcomes I’ve seen in my short tenure with AKF and previously as a corporate crusader:
Monolithic applications to save costs: Why do organizations do it? Short-term cost savings from focusing development on one application, which allows teams to focus only on development of their one area.
- One failure means everyone fails.
- Organizations are unable to scale vis-a-vis Conway’s Law (organizations which design systems are constrained to produce designs which are copies of the communication structures of these organizations).
- Often the teams who develop the monolith don’t have to support it, so they don’t understand why it is a problem.
- Teams become very focused on solving the problems caused by the monolith just long enough to get it back up and running but fail to see the long-term recurrent loss to the business and wasted hours that could have been spent on innovating new products and services.
- Catastrophic failure: Intuit pre-SaaS, early renditions of iTunes with annual outages when everyone tried to redeem gift cards on Christmas morning, the early days of eBay, and (stay tuned) many more yet to come.
Ongoing cost cutting to “make the quarter.”
- Missed tech refresh results in machines and operating systems that are no longer supported and are vulnerable to external attacks.
- Teams become hyper focused on shutting down additional spending, but never take the time to calculate how much wasted effort is spent on keeping the lights on for aging systems with a declining market share or slowed new customer adoption rate.
- Teams start saying no to customers based on cost, opening the door for new upstarts and the competition to take away market share.
Focusing efforts on the Sales Department’s latest contract.
- Too much investment in legacy applications instead of innovating new products.
- “A-team” developers become firefighters to keep customers happy.
- Sales team creates moral hazards for development teams (i.e. “I smoke, but you get lung cancer” - teams create problems for other teams to fix instead of owning the end-to-end lifecycle of a product)
Focus is on mergers and acquisitions instead of core strengths and products.
- Distracted organizations give way to upstarts and competition.
- Become okay or maybe even good at a lot of things but not great at one or two things.
- Company culture becomes very fragmented and silos create red tape that slows or stifles innovation.
Results = Results. And nothing else equals results.
If OKRs are not measuring the results needed to compete and win, then teams are wasting a lot of effort, time, and money and the competition is getting a free pass to innovate and outperform your ability to delight and please your customers.
Need an outside view of your organization to help drive better results and outcomes? Contact us!
Photo by rawpixel.com from Pexels
April 21, 2019 | Posted By: Marty Abbott
This article is the third in a multi-part series on microservices (micro-services) anti-patterns. The introduction of the first article, Service Calls In Series, covers the benefits of splitting services, many of the mistakes or failure points teams create in services splits, and the first anti-pattern. The second article, Service Fan Out, discusses the anti-pattern of a single service acting as a proxy or aggregator of multiple services.
Data Fan Out, the topic of this microservice anti-pattern, exists when a service relies on two or more persistence engines with categorically unique data, or categorically similar data that is not meant to be processed in parallel. “Categorically Unique” means that the data is in no way related. Examples of categorical uniqueness would be a database that stores customer data and a separate database that stores catalog data. Instances of the same data, such as two separate databases each storing half of a product catalog, are not categorically unique. Splitting of similar data is often known as sharding. Such “sharded” instances only violate the Data Fan Out pattern if:
1) They are accessed in series (database 1 is accessed and subsequently database 2 is accessed) –or-
2) A failure or slowness in either database, even if accessed in parallel, will result in a very slow or unavailable service.
Persistence engine means anything that stores data as in the case of a relational database, a NoSQL database, a persistent off-system cache, etc.
Anytime a service relies on more than one persistence engine to perform a task, it is subject to lower availability and a response time equivalent to the slowest of the N data stores to which it is connected. Like the Service Fan Out anti-pattern, the availability of the resulting service (“Service A”) is the product of the availability of the service and its constituent infrastructure multiplied by the availability of each of the N data stores to which it is connected.
Further, the response time of Service A may be tied to its own runtime plus the response time of the slowest of the connected data stores. If any of the N databases becomes slow enough, Service A may not respond at all.
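A minimal sketch of both effects, assuming independent failures and illustrative numbers. When the data stores are accessed in parallel, the response time is the service's own runtime plus the slowest store. When they are accessed in series, every store's time adds up.

```python
from math import prod

def fan_out_availability(service: float, stores: list[float]) -> float:
    """Service A's own availability multiplied by that of each of its N data stores."""
    return service * prod(stores)

def fan_out_latency_ms(own_ms: float, store_ms: list[float], parallel: bool = True) -> float:
    """A's own runtime plus the slowest store (parallel access) or every store (series access)."""
    return own_ms + (max(store_ms) if parallel else sum(store_ms))

print(round(fan_out_availability(0.999, [0.999, 0.999]), 6))  # 0.997003
print(fan_out_latency_ms(20, [5, 80]))                         # 100
print(fan_out_latency_ms(20, [5, 80], parallel=False))         # 105
```

Every additional data store lowers availability, and a single slow store sets the floor on response time even when all stores are queried in parallel.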
Because overall availability is negatively impacted, we consider Data Fan Out to be a microservice anti-pattern.
One clear exception to the Data Fan Out anti-pattern is highly parallelized querying of multiple shards to get near linear response times out of large data sets (similar to one component of the MapReduce algorithm). In a highly parallelized case such as this, we propose that each connection have a time-out set to disregard results from slowly responding data sets. For this to work, the result set must be impervious to missing data. As an example of an impervious result set, having most shards return for any internet search query is “good enough”: a search for “plumber near me” that returns 19/20ths of the “complete data”, because one shard out of 20 is either unavailable or very slow, is still useful. But having some transactions missing from an account query of transactions for a checking account may be a problem, so that is not an example of an impervious data set.
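A scatter-gather query with a per-request time-out might be sketched as follows. `query_shard` is a hypothetical stand-in that simulates each shard's latency with a sleep. Shards that miss the time-out are cancelled and their results are disregarded, as are shards that raise an error.

```python
import asyncio

async def query_shard(shard_id: int, delay_s: float, query: str) -> list[str]:
    """Stand-in for one shard's search; delay_s simulates that shard's response time."""
    await asyncio.sleep(delay_s)
    return [f"{query}: hit from shard {shard_id}"]

async def scatter_gather(query: str, delays: list[float], timeout_s: float) -> tuple[list[str], int]:
    """Query all shards in parallel and keep whatever returns within timeout_s."""
    tasks = [asyncio.create_task(query_shard(i, d, query)) for i, d in enumerate(delays)]
    done, pending = await asyncio.wait(tasks, timeout=timeout_s)
    for task in pending:
        task.cancel()  # disregard slow shards instead of waiting on them
    results: list[str] = []
    answered = 0
    for task in done:
        if task.exception() is None:  # also tolerate shards that fail outright
            results.extend(task.result())
            answered += 1
    return sorted(results), answered

# 19 fast shards and one that takes a full second to respond.
hits, answered = asyncio.run(scatter_gather("plumber near me", [0.01] * 19 + [1.0], timeout_s=0.2))
print(f"{answered}/20 shards answered")  # 19/20 shards answered
```

This only suits result sets that are impervious to missing data. An account-transaction query would need every shard, or a failure, rather than a partial answer.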
Our preferred approach to resolve the Data Fan Out anti-pattern is to dedicate services to each unique data set. This is possible whenever the two data sets do not need to be merged and when the service is performing two separate and otherwise isolatable functions (e.g. “Customer_Lookup” and “Catalog_Lookup”).
When data sets are split for scale reasons, as is the case with data sets that have both an incredibly high volume of requests and a large amount of data, one can attempt to merge the queried data sets in the client. The browser or mobile client can request each data set in parallel and merge them if both requests succeed. This works when the computational complexity of the merge is relatively low.
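A client-side merge can be sketched as follows. It is written in Python for consistency with the other examples here (a browser would do the same with parallel fetches). `fetch_customer` and `fetch_orders` are hypothetical stand-ins for two separately split services, and the merge is deliberately cheap: both parts share the same key.

```python
import asyncio
from typing import Optional


async def fetch_customer(customer_id: str) -> dict:
    """Hypothetical call to the customer data service."""
    await asyncio.sleep(0.05)
    return {"id": customer_id, "name": "Ada"}


async def fetch_orders(customer_id: str) -> dict:
    """Hypothetical call to the separately split order data service."""
    await asyncio.sleep(0.05)
    return {"id": customer_id, "orders": [101, 102]}


async def load_account_view(customer_id: str) -> Optional[dict]:
    # Request both data sets in parallel.
    results = await asyncio.gather(
        fetch_customer(customer_id), fetch_orders(customer_id), return_exceptions=True
    )
    # Merge only if every request succeeded.
    if any(isinstance(r, BaseException) for r in results):
        return None
    merged: dict = {}
    for part in results:
        merged.update(part)  # low-complexity merge on a shared key
    return merged


print(asyncio.run(load_account_view("c42")))
# {'id': 'c42', 'name': 'Ada', 'orders': [101, 102]}
```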
When client-side merging is not possible, we turn to the X Axis of the Scale Cube for resolution. Merge the data sets within the data store/persistence engine and rely on a split of reads and writes. All writes occur to a single merged data store, and read replicas are employed for all reads. The write and read services should be split accordingly, and our infrastructure needs to correctly route writes to the write service and reads to the read service. This is a valuable approach when we have high read-to-write ratios, fortunately the case in many solutions. Note that we prefer to use asynchronous replication and allow the “slave” solutions to be “eventually consistent”, but ideally still within a tolerable time frame of milliseconds or a handful of seconds.
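The routing rule can be sketched in a few lines. This assumes requests are classified by HTTP method (GET/HEAD are reads, everything else is a write) and that read services are picked round-robin; the service names are hypothetical, and a real deployment would do this in a load balancer or API gateway.

```python
import itertools

READ_METHODS = {"GET", "HEAD"}


class ReadWriteRouter:
    """Send writes to the single write service; spread reads over read services."""

    def __init__(self, write_service: str, read_services: list[str]):
        self.write_service = write_service
        self._reads = itertools.cycle(read_services)  # simple round-robin

    def route(self, method: str) -> str:
        if method.upper() in READ_METHODS:
            # Read services sit on asynchronously replicated (eventually consistent) replicas.
            return next(self._reads)
        return self.write_service


router = ReadWriteRouter("write-svc", ["read-svc-1", "read-svc-2"])
print(router.route("POST"))  # write-svc
print(router.route("GET"))   # read-svc-1
print(router.route("GET"))   # read-svc-2
```

One caveat with eventual consistency: a user who just wrote something may not see it on a lagging replica, so some designs route that user’s next read to the write side.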
What about the case where a solution may have a high write-to-read ratio (exceptionally high writes), and data needs to be aggregated? This rather unique case may be best solved by the Z axis of the AKF Scale Cube, splitting transactions along customer boundaries but ensuring the unification of the database for each customer (or region, or whatever “shard key” makes sense). As with all Z axis shards, this not only allows faster response times (smaller data segments) but engenders high scalability and availability while also allowing us to put data “closer to the customer” using the service.
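A minimal sketch of choosing a Z axis shard from a customer key is below. The shard names are hypothetical, and hashing modulo the shard count is just one option: it remaps most customers when the count changes, which is why lookup tables or consistent hashing are common alternatives.

```python
import hashlib

SHARDS = ["shard-a", "shard-b", "shard-c", "shard-d"]  # hypothetical shard names


def shard_for(customer_id: str, shards: list[str] = SHARDS) -> str:
    """Map a customer to one shard so all of that customer's data stays together."""
    # Use a stable hash; Python's built-in hash() is randomized per process.
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


print(shard_for("customer-1001"))  # same answer on every call and every host
```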
AKF Partners helps companies create highly available, highly scalable, easily maintained and easily developed microservice architectures. Give us a call - we can help!
April 19, 2019 | Posted By: Eric Arrington
Time it takes to boil an egg: 720,000 milliseconds
Average time in line at the supermarket: 240,000 milliseconds
Time it takes to brush your teeth: 120,000 milliseconds
Time it takes to make a sandwich: 90,000 milliseconds
In our everyday lives we aren’t used to measuring things in milliseconds. In the software world our users’ expectations are different. Milliseconds matter. A lot.
The average person may wait 240,000 milliseconds to check out at the grocery store but is far less likely to wait that long to check out on an e-commerce site.
What Is Latency
Latency is how long it takes to get an answer back after making a request to the server.
It’s how long it takes for a request to go from the browser to the server and back to the browser.
Spoiler Alert: Faster is better.
Latency vs Bandwidth
I often see the words latency and bandwidth used together – or even interchangeably – but they have two very different meanings.
Using the metaphor of a restaurant, bandwidth is the amount of seating available. The more seating the restaurant has, the more people it can serve at one time. If a restaurant wants to be able to serve more people in a certain time period they add more seating. Similarly, bandwidth is the maximum amount of data that can be transferred in a specific measure of time.
If bandwidth is the maximum number of diners that can fit in a restaurant at one time, then latency is the amount of time it takes for food to arrive after ordering. On the Internet, latency is a measure of how long it takes for a user to get a response from an action like a click. It is the “performance lag” the user feels while using our product.
Luckily, over the past 20 years, bandwidth and memory capacity have increased dramatically. Unfortunately, latency has improved very little by comparison over the same period.
Latency is directly linked to the “experience” the end user has with our products or services. If we aren’t minimizing latency, we are leaving money on the table!
- Amazon did a study that found for every 100ms of latency it cost them 1% in sales.
- Google discovered that for every 500ms they took to show search results, traffic dropped 20%.
Even more shocking is a study by the TABB Group, which estimated the cost of a broker’s electronic trading platform being just 5 milliseconds behind the competition. By their estimate, that lag could cost $4 million in revenue per millisecond. The study also concluded that if an electronic broker is 100 milliseconds behind the competition, it might as well shut down and become a floor broker.
100ms can be the difference between strategic advantage and second or third place.
100ms Rule of Latency – Paul Buchheit (Gmail Creator)
How fast is 100ms? Paul Buchheit coined the 100ms Rule, which states that every interaction should be faster than 100ms. Why? 100ms is the threshold “where interactions feel instantaneous.”
What Causes Latency
Finding the cause of all of our latency isn’t always an easy task. There are a lot of possible causes. For the most part we can borrow the Pareto Principle (80/20 Rule) and knock out the usual suspects.
Propagation is how long it takes information to travel. In a perfect world our request travels at the speed of light. Also in a perfect world milkshakes would be good for us. For various reasons, our packet won’t travel at the speed of light.
Even if it did travel at the speed of light, the distance between our server and our user still matters.
Packets traveling from one side of the world to the other and back would add about 250ms of latency. Unfortunately our data doesn’t travel “as the crow flies.” The paths rarely travel in a straight line (especially if using a VPN). This adds a lot more distance for the request to travel.
Remember when I said it wasn’t a perfect world? This is what I was talking about. The material data cables are made of affects the speed of propagation. Different materials have different limits on speed.
For example, light can travel from New York to San Francisco in 14ms (in a vacuum). Inside a fiber cable, it takes about 21ms.
For the most part, data travels fast across long distances, because the cabling used over long distances is usually faster. The last mile is usually the slowest. One reason is that buildings, homes, and commercial areas tend to use existing wiring like coaxial cable or copper. Another reason is explained in the next point: your data changes hands as it gets back to you (e.g., at your router).
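The New York to San Francisco figures follow from distance divided by speed. This sketch assumes a great-circle distance of roughly 4,130 km and a fiber refractive index of about 1.5 (light in glass travels at roughly two-thirds of its vacuum speed); both are approximations, and real routes are longer than the great-circle path.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458
FIBER_REFRACTIVE_INDEX = 1.5  # assumption: typical silica fiber is roughly 1.44-1.5


def one_way_ms(distance_km: float, refractive_index: float = 1.0) -> float:
    """Propagation delay only: no routing detours, hops, or queuing."""
    return distance_km / (SPEED_OF_LIGHT_KM_S / refractive_index) * 1000


NY_SF_KM = 4_130  # approximate great-circle distance, New York to San Francisco
print(round(one_way_ms(NY_SF_KM)))  # 14 (vacuum)
print(round(one_way_ms(NY_SF_KM, FIBER_REFRACTIVE_INDEX)))  # 21 (fiber)
```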
Consider yourself lucky if you have fiber installed. Copper and coaxial cables are slower. 4G can add up to 100ms to the latency. We won’t even talk about satellite.
It would be great if our data went straight from our device to the server and back, but again, probably not going to happen. As our packet travels to the server and back to the source it travels through different network devices. The request passes through routers, bridges, and gateways. Each time our data is handed off to the next device, a “network hop” occurs.
These hops add more latency than distance. A request that travels 100 miles but makes 5 hops will have more latency than a request that travels 2500 miles with only 2 hops.
The more hops along the path, the more latency.
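Whether hops outweigh distance depends on how much each hop costs, and that figure isn’t given here, so this sketch treats it as a hypothetical parameter. It models one-way latency as propagation in fiber (refractive index assumed to be about 1.5) plus a fixed cost per hop, then computes the per-hop cost above which the 100-mile, 5-hop path from the example is the slower one.

```python
SPEED_IN_FIBER_KM_S = 299_792.458 / 1.5  # assumption: refractive index ~1.5
KM_PER_MILE = 1.609344


def path_latency_ms(miles: float, hops: int, per_hop_ms: float) -> float:
    """One-way latency = propagation in fiber + a fixed (hypothetical) cost per hop."""
    propagation_ms = miles * KM_PER_MILE / SPEED_IN_FIBER_KM_S * 1000
    return propagation_ms + hops * per_hop_ms


def break_even_per_hop_ms(short=(100, 5), long=(2500, 2)) -> float:
    """Per-hop cost above which the short, hop-heavy path is the slower one."""
    (short_mi, short_hops), (long_mi, long_hops) = short, long
    extra_propagation = path_latency_ms(long_mi, 0, 0) - path_latency_ms(short_mi, 0, 0)
    return extra_propagation / (short_hops - long_hops)


print(f"{break_even_per_hop_ms():.1f} ms")  # 6.4 ms
print(path_latency_ms(100, 5, 10) > path_latency_ms(2500, 2, 10))  # True
```

With a hypothetical 10 ms per hop, the short path comes out slower, matching the example; with very cheap hops, distance would dominate instead.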
How To Lower Latency?
Latency can best be described as the sum of the previously mentioned causes and a lot more. There is no magic button we can push to achieve ultra-low latency, but a few things can make a big dent. This is by no means an exhaustive list.
Asynchronous Development Approach
Multitasking as a developer is a bad idea. Making software multitask is a great idea. Whenever possible, make calls asynchronous (multiple calls executed at the same time). This can make a huge difference in latency (and perceived latency which can be just as important, we’ll talk more about that shortly).
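The difference between sequential and concurrent calls can be sketched with Python’s `asyncio`. The three “services” are hypothetical stand-ins that each take 0.1 s; run one after another they cost the sum, run concurrently they cost roughly the slowest one.

```python
import asyncio
import time

SERVICES = ("profile", "cart", "recs")  # hypothetical downstream calls


async def call_service(name: str, latency_s: float) -> str:
    """Stand-in for a network call."""
    await asyncio.sleep(latency_s)
    return name


async def sequential() -> list[str]:
    # Each call waits for the previous one to finish.
    return [await call_service(n, 0.1) for n in SERVICES]


async def in_parallel() -> list[str]:
    # All three calls are in flight at the same time.
    return list(await asyncio.gather(*(call_service(n, 0.1) for n in SERVICES)))


for fn in (sequential, in_parallel):
    start = time.perf_counter()
    asyncio.run(fn())
    print(f"{fn.__name__}: {time.perf_counter() - start:.2f}s")
# sequential takes roughly 0.30s; in_parallel takes roughly 0.10s
```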
Make Fewer External Requests
If we know that every trip to and from the server adds latency, then let’s go less often. There are many ways we can do this. Here are a few:
- Use image sprites
- Eliminate images that don’t contribute to overall product
- Use inline svg code instead of images for icons and logos
- Combine and minify all HTML, CSS, and JS files
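The “combine” half of the last tip can be as simple as concatenating files at build time so the page makes one request instead of several. This is a minimal sketch: the `bundle` helper is hypothetical, it does no minification (a dedicated minifier would handle that), and for JavaScript it assumes each file ends its final statement properly.

```python
from pathlib import Path


def bundle(paths: list[str], out_path: str) -> int:
    """Concatenate several CSS (or JS) files into one; returns characters written."""
    parts = []
    for p in paths:
        text = Path(p).read_text(encoding="utf-8")
        # /* ... */ comments are valid in both CSS and JS, so mark each source file.
        parts.append(f"/* {Path(p).name} */\n{text.rstrip()}\n")
    bundled = "\n".join(parts)
    Path(out_path).write_text(bundled, encoding="utf-8")
    return len(bundled)
```

For example, `bundle(["reset.css", "layout.css", "theme.css"], "site.bundle.css")` (hypothetical file names) replaces three stylesheet requests with one.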
There are times we need to reference files that require an external HTTP request. If we don’t control those resources, there is little we can do. We can, however, evaluate and reduce the number of external services we use.
Z Axis split geographically
One of the things AKF Partners is known for is the Scale Cube. We have helped hundreds of clients scale along all three axes.
If it makes sense for our architecture, splitting along the Z Axis by geography can make a huge difference in latency. Separating the data based on geographical location ideally places servers closer to the end user, shortening the round trip. If you would like an evaluation to see if this is something that could be achieved with your current architecture, don’t hesitate to contact us.
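One way to picture a geographic Z Axis split is routing each user to the closest regional partition. This sketch uses great-circle distance as a rough proxy for network latency; the region names and coordinates are hypothetical, and real systems usually rely on GeoDNS, anycast, or the customer’s home region rather than live distance math.

```python
from math import asin, cos, radians, sin, sqrt

# Hypothetical regions with approximate coordinates (lat, lon).
REGIONS = {
    "us-east": (38.9, -77.0),   # near Washington, D.C.
    "us-west": (45.5, -122.7),  # near Portland, OR
    "eu-west": (53.3, -6.3),    # near Dublin
}


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance between two (lat, lon) points on a 6,371 km sphere."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))


def nearest_region(user_location: tuple[float, float]) -> str:
    """Pick the Z Axis partition whose servers are closest to the user."""
    return min(REGIONS, key=lambda r: haversine_km(user_location, REGIONS[r]))


print(nearest_region((48.86, 2.35)))     # Paris -> eu-west
print(nearest_region((37.77, -122.42)))  # San Francisco -> us-west
```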
Use a CDN
A CDN is a content delivery network. Basically it is a system of distributed servers. These servers deliver pages or other content to end users based on their geographic location. Remember shorter distances mean lower latency.
A CDN can have a huge impact on latency. I stole a few figures from the KeyCDN website to show you how big a difference it makes. The test site was located in Dallas, TX.
[Table: No CDN (ms) vs. With CDN (ms)]
A content delivery network can have a major impact in reducing latency.
The details of how caching works can be complicated but the basic idea is simple. If I were to ask you what the result of 8 x 7 is, you would know right away the answer is 56. You didn’t have to think about it. You didn’t calculate it in your head. You’ve done this multiplication so many times in your life that you don’t need to. You just remember the answer. That is kind of how caching works.
If the server is set up to cache pages, the first time someone visits a page it loads normally: the request is received by the server, processed, and sent back to the client as an HTML file. The server then saves that HTML file, usually in RAM (which is fast). The next time the same request comes in, the server doesn’t need to process anything. It simply serves the HTML page from the cache.
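A minimal in-memory page cache with an expiry time might look like this. `PageCache` and `render_page` are hypothetical names; a real server would also cap memory use (for example with LRU eviction) and invalidate entries when content changes.

```python
import time
from typing import Callable


class PageCache:
    """In-memory (RAM) cache of rendered HTML, keyed by request path."""

    def __init__(self, render: Callable[[str], str], ttl_s: float = 60.0):
        self._render = render
        self._ttl_s = ttl_s
        self._entries: dict[str, tuple[float, str]] = {}

    def get(self, path: str) -> str:
        now = time.monotonic()
        hit = self._entries.get(path)
        if hit is not None and now - hit[0] < self._ttl_s:
            return hit[1]  # cache hit: nothing to process
        html = self._render(path)  # cache miss: do the expensive work once
        self._entries[path] = (now, html)
        return html


renders: list[str] = []


def render_page(path: str) -> str:
    renders.append(path)  # count how often real rendering happens
    return f"<html><body>{path}</body></html>"


cache = PageCache(render_page, ttl_s=300)
cache.get("/products")
cache.get("/products")
print(len(renders))  # 1: the second request was served from RAM
```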
We can also cache at the browser level. The first time a user visits a site, the browser will receive a bunch of assets and (if set up correctly) will cache those assets. The next time the site is visited, the browser will serve the cached assets and it will load a lot quicker.
We decide when these caches expire. Be aggressive with caching of static resources. Set the expiration date for a minimum of a month in advance. I recommend setting them for 12 months if it makes sense for that resource.
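Browser-cache lifetimes are usually set with the `Cache-Control` response header. This sketch applies a 12-month `max-age` to static assets and asks the browser to revalidate everything else; the extension list and the `cache_headers` helper are hypothetical. Long lifetimes work best with fingerprinted file names, so a new release changes the URL instead of waiting for the old copy to expire.

```python
STATIC_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".svg", ".woff2")  # hypothetical list
ONE_YEAR_S = 365 * 24 * 60 * 60  # 31,536,000 seconds


def cache_headers(path: str) -> dict[str, str]:
    """Long-lived browser caching for static assets; revalidate everything else."""
    if path.endswith(STATIC_EXTENSIONS):
        # Assumes fingerprinted names such as app.3f9a1c.js.
        return {"Cache-Control": f"public, max-age={ONE_YEAR_S}"}
    return {"Cache-Control": "no-cache"}  # browser must revalidate before reuse


print(cache_headers("/static/app.3f9a1c.js"))
# {'Cache-Control': 'public, max-age=31536000'}
```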
Render Content on the Server Side
Rendering templates on the server (rather than dynamically on the client) can also help lower latency. Remember every trip to the database results in higher total latency. Why not pre-render the pages on the server and load the static pages on the client?
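Pre-rendering can be sketched as a publish-time step that turns each article into a plain HTML file, so serving a page needs no database trip. The template, the `prerender` helper, and the article fields (`slug`, `title`, `body`) are hypothetical; the body is assumed to be trusted HTML, while the title is escaped.

```python
import html
from pathlib import Path
from string import Template

PAGE = Template(
    "<!doctype html><html><head><title>$title</title></head>"
    "<body><h1>$title</h1>$body</body></html>"
)


def prerender(articles: list[dict], out_dir: str) -> list[Path]:
    """Render each article once at publish time; clients then fetch plain files."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for article in articles:
        page = PAGE.substitute(title=html.escape(article["title"]), body=article["body"])
        target = out / f"{article['slug']}.html"
        target.write_text(page, encoding="utf-8")
        written.append(target)
    return written
```

The output directory can then be served by any static file server or CDN.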
This technique doesn’t work for all applications, but content publishing sites like The Washington Post or Medium can benefit greatly from pre-rendering their content on the server side and serving static rendered content to the client.
Use Pre-fetching Methods
I almost didn’t include this tip. Technically this doesn’t lower latency at all. What it does do is lower the “perceived user latency” felt by customers.
Perceived user latency is how long the wait seems to the user.
A normal request looks like the graphic below. First is the time it takes the server to process the request. After that is the time it takes the network to get the response back to the client. Next is the time it takes the client to process the response and load the page.
If we pre-fetch certain items or show placeholder images while the response from the server is loading (a la Facebook) then the perceived latency felt by the user is lower. The actual latency is exactly the same but the user “feels” like it loaded faster.
We can get many of the same benefits as lower latency by “gaming” latency this way.
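Pre-fetching can be sketched with `asyncio`: start fetching the likely next page while the user is still reading the current one. The 0.2 s fetch and 0.3 s reading time are hypothetical. The actual fetch takes just as long in both cases; only the wait after the click changes.

```python
import asyncio
import time


async def fetch_page(url: str) -> str:
    """Hypothetical server round trip that always takes 0.2 s."""
    await asyncio.sleep(0.2)
    return f"<html>{url}</html>"


async def browse(prefetch: bool) -> float:
    """Return how long the user waits after clicking 'next'."""
    pending = asyncio.create_task(fetch_page("/page/2")) if prefetch else None
    await asyncio.sleep(0.3)  # the user reads page 1 before clicking
    start = time.perf_counter()
    if pending is not None:
        await pending  # already finished in the background
    else:
        await fetch_page("/page/2")  # the full round trip starts only now
    return time.perf_counter() - start


for flag in (False, True):
    print(f"prefetch={flag}: waited {asyncio.run(browse(flag)):.2f}s")
# prefetch=False waits about 0.20s; prefetch=True waits close to 0.00s
```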
Latency is a metric we should all be tracking. Providing a great user experience with low latency makes a difference. It keeps our customers on our applications and sites longer. It fosters retention. Most importantly it will increase conversion rate.
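A first step toward tracking latency can be as small as timing each request and reporting percentiles, since averages hide the slow tail users actually feel. The `timed` decorator and `handle_request` below are hypothetical; the percentile uses the nearest-rank method, and in production these numbers would normally go to a monitoring system.

```python
import math
import time
from functools import wraps

latencies_ms: list[float] = []


def timed(fn):
    """Record the wall-clock latency of every call to fn, even if it raises."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            latencies_ms.append((time.perf_counter() - start) * 1000)
    return wrapper


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile, p in (0, 100]."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


@timed
def handle_request() -> None:
    time.sleep(0.01)  # stand-in for real work


for _ in range(50):
    handle_request()
print(f"p50={percentile(latencies_ms, 50):.1f}ms p99={percentile(latencies_ms, 99):.1f}ms")
```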
If you aren’t currently tracking your latency as a metric, take that first step and see where you stand. If you need help, let us know and we’d be happy to schedule a call.
April 8, 2019 | Posted By: Marty Abbott
This article is the second in a multi-part series on microservices (micro-services) anti-patterns. The introduction of the first article, Service Calls In Series, covers the benefits of splitting services, many of the mistakes or failure points teams create in service splits and the first anti-pattern.
Fan Out, the topic of this microservice anti-pattern, exists when one service either serves as a proxy to two or more downstream services, or serves as an integration of two subsequent service calls. Any of the services (the proxy/integration service “A”, or constituent services “B” and “C”) can cause a failure of all services. When service A fails, services B and C clearly can’t be called. If either service B or C fails or becomes slow, it can affect service A by tying up communication ports. Ultimately, under high call volume, service A may become unavailable due to problems with either B or C.
Further, the response time of the services may be tied to the slowest responding service. If A needs both B and C to respond to a request (as in the case of integration), then the speed at which A responds is tied to the slower of B and C. If service A merely proxies B or C, then extreme slowness in either may cause slowness in A and therefore slowness in all calls.
Because overall availability is negatively impacted, we consider Service Fan Out to be a microservice anti-pattern.
One approach to resolve the above anti-pattern is to employ true asynchronous messaging between services. For this to be successful, the requesting service A must be capable of responding to a request without receiving any constituent service responses. Unfortunately, this solution only works in some cases such as the case where service B is returning data that adds value to service A. One such example is a recommendation engine that returns other items a user might like to purchase. The absence of service B responding to A’s request for recommendations is unfortunate but doesn’t eliminate the value of A’s response completely.
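A minimal sketch of this idea: Service B publishes recommendations as messages, and Service A keeps a local copy that it reads without ever waiting on B. Here `queue.Queue` is an in-process stand-in for a real message broker, and all names are hypothetical.

```python
import queue

# Messages published by Service B (recommendations), consumed by Service A.
recommendation_updates = queue.Queue()
latest_recs: dict[str, list[str]] = {}  # Service A's local copy


def apply_pending_updates() -> None:
    """Non-blocking: absorb whatever B has published so far."""
    while True:
        try:
            item_id, recs = recommendation_updates.get_nowait()
        except queue.Empty:
            return
        latest_recs[item_id] = recs


def handle_product_request(item_id: str) -> dict:
    """Service A answers immediately; recommendations add value but aren't required."""
    apply_pending_updates()
    return {"item": item_id, "recommendations": latest_recs.get(item_id, [])}


print(handle_product_request("sku-1"))  # no recommendations yet, still a valid answer
recommendation_updates.put(("sku-1", ["sku-7", "sku-9"]))  # B publishes later
print(handle_product_request("sku-1"))
```

If B is down or slow, A keeps answering with whatever it last received (or none), so B’s failure degrades A’s response instead of taking it down.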
As was the case with the Calls In Series Anti-Pattern, we may also be able to solve this anti-pattern with the “Libraries for Depth” pattern.
Of course, each of the libraries also represents a constituent part that may fail for any call, but the number of moving parts for each constituent part decreases significantly relative to a separately deployed service call. For instance, no network interface is required, no additional host or virtual machine (VM) is employed during the call, etc. Additionally, call latency goes down without network interfaces.
The most common complaint about this pattern is that development teams cannot release independently. But, as we all know, this problem has been fixed for quite some time with Unix, Linux and Windows dynamically loadable libraries (dlls, dls) and the like.
AKF Partners has helped to architect some of the most scalable, highly available, fault-tolerant and fastest response time solutions on the internet. Give us a call - we can help.
March 25, 2019 | Posted By: Marty Abbott
This article is the first in a multi-part series on microservices (micro-services) anti-patterns.
There are several benefits to carving up very large applications into service-oriented architectures. These benefits can include many of the following:
- Higher availability through fault isolation
- Higher organizational scalability through lower coordination
- Lower cost of development through lower overhead (coordination)
- Faster time to market achieved again through lower overhead of coordination
- Higher scalability through the ability to independently scale services
- Lower cost of operations (cost of goods sold) through independent scalability
- Lower latency/response time through better cacheability
The above should be considered only a partial list. See our articles on the AKF Scale Cube, and when you should split services for more information.
In order to achieve any of the above benefits, you must be very careful to avoid common mistakes.
Most of the failures that we see in microservices stem from a lack of understanding of the multiplicative effect of failure or “MEF”. Put simply, MEF indicates that the availability of any solution in series is a product of the availability of all components in that series.
Service A has an availability calculated by the product of its constituent parts. Those parts include all of the software and infrastructure necessary to run service A: the server availability, the application availability, associated library and runtime environment availabilities, operating system availability, virtualization software availability, etc. Let’s say those availabilities somehow achieve a “service” availability of “Five 9s” or 99.999% as measured by duration of outages. To achieve 99.999% we are assuming that we have made the service “highly available” through multiple copies, each being “stateless” in its operation.
Service B has a similar availability calculated in a similar fashion. Again, let’s assume 99.999%.
If, for a request from any customer to Service A, Service B must also be called, the two availabilities are multiplied together. The new calculated availability is by definition lower than that of either service in isolation. We move our availability from 99.999% to 99.998%.
When calls in series between services become long, availability starts to decline swiftly. It is by definition always lower than the lowest availability of any service or constituent part of any service (e.g. hardware, OS, app, etc), and with long chains it is much lower.
This creates our first anti-pattern. Just as a single failed bulb in old, serially wired Christmas tree lights would cause the entire string to fail, any service failure causes the entire call stream to fail. Hence the multiple names for this first anti-pattern: Christmas Tree Light Anti-Pattern, Microservice Calls in Series Anti-Pattern, etc.
The multiplicative effect of failure is sometimes worse with slowly responding solutions than with outright failures. We can easily detect and respond to failures through “heartbeat” transactions, but slow responses are more difficult. While we can use circuit breaker constructs such as Hystrix switches, these assume that we know the threshold under which our call string will break. Unfortunately, under intense flash load situations (unforeseen high demand), small spikes in demand can cause failure scenarios.
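The arithmetic above can be extended to longer chains. This sketch assumes every service in the chain has the same 99.999% availability and that outages are independent; it also converts each result into expected minutes of downtime per year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60


def series_availability(per_service: float, services: int) -> float:
    """MEF: every service in the call chain must be up, so availabilities multiply."""
    return per_service ** services


for n in (1, 2, 5, 10, 25):
    a = series_availability(0.99999, n)
    print(f"{n:>2} services in series: {a:.5%} available, "
          f"~{(1 - a) * MINUTES_PER_YEAR:.0f} min/yr of downtime")
```

Two services at five 9s multiply to 0.9999800001, the 99.998% quoted above, which roughly doubles expected downtime compared with a single service.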
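A circuit breaker can be sketched in a few dozen lines. This is a simplified illustration, not Hystrix’s API: `CircuitBreaker` is hypothetical, it trips after a fixed number of consecutive failures, and it expects the wrapped call to enforce its own time-out so that a slow response surfaces as an exception. Picking `max_failures` and the time-out is exactly the threshold problem described above.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Trip after N consecutive failures; allow a trial call after a cooldown."""

    def __init__(self, max_failures: int = 3, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_failures = max_failures
        self.reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, fn: Callable[[], str], fallback: str) -> str:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after_s:
                return fallback  # open: fail fast instead of tying up a connection
            self._opened_at = None  # half-open: let one trial call through
        try:
            result = fn()  # fn should raise (e.g. TimeoutError) when too slow
        except Exception:
            self._failures += 1
            if self._failures >= self.max_failures:
                self._opened_at = self._clock()  # trip (or re-trip after a failed trial)
            return fallback
        self._failures = 0  # success closes the circuit
        return result
```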
One pattern to resolve the above issue is to employ true asynchronous messaging between services. To make this effective, the requesting service must not care whether it receives a response. This service must be capable of responding to a request without receiving any downstream response. Unfortunately, this solution only works in some cases such as the case where service B is returning data that adds value to service A. One such example is a recommendation engine that returns other items a user might like to purchase. The absence of service B responding to A’s request for recommendations is unfortunate, but doesn’t eliminate the value of A’s response completely.
While the above pattern can resolve some use-cases, it doesn’t resolve most of them. Most often downstream services are doing more than “modifying” value for the calling service: they are providing specific necessary functions. These functions may be mail services, print services, data access services, or even component parts of a value stream such as “add to cart” and “compute tax” during checkout.
In these cases, we believe in employing the Libraries for Depth pattern.
Of course, each of the libraries also represents a constituent part that may fail for any call, but the number of moving parts for each constituent part decreases significantly relative to another service call. For instance, no network interface is required, no additional host or virtual machine (VM) is employed during the call, etc. Additionally, call latency goes down without network interfaces.
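In code, Libraries for Depth means the checkout service calls the cart and tax logic as in-process functions rather than over the network. In this sketch the library boundaries are only comments so that everything stays in one runnable file; in practice each would be a separately owned package or dynamically loaded library. The function names echo the “add to cart” and “compute tax” examples but are hypothetical.

```python
from decimal import ROUND_HALF_UP, Decimal


# --- "tax" library: loaded into the checkout process, not deployed as a service ---
def compute_tax(subtotal: Decimal, rate: Decimal) -> Decimal:
    return (subtotal * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# --- "cart" library ---
def cart_subtotal(items: list[tuple[Decimal, int]]) -> Decimal:
    return sum((price * qty for price, qty in items), Decimal("0"))


# --- checkout service: plain function calls, no network hop between parts ---
def checkout_total(items: list[tuple[Decimal, int]], tax_rate: Decimal) -> Decimal:
    subtotal = cart_subtotal(items)
    return subtotal + compute_tax(subtotal, tax_rate)


print(checkout_total([(Decimal("19.99"), 2), (Decimal("5.00"), 1)], Decimal("0.0825")))
# 48.69
```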
The most common complaint about this pattern is that development teams cannot release independently. But, as we all know, this problem has been fixed for quite some time with Unix, Linux and Windows dynamically loadable libraries (dlls, dls) and the like.