May 8, 2019 | Posted By: Marty Abbott
This article is the fifth in a multi-part series on microservices (micro-services) anti-patterns. The introduction of the first article, Service Calls In Series, covers the benefits of splitting services (as in the case of creating a microservice architecture), many of the mistakes or failure points teams create in services splits. Articles two and three cover anti-patterns for service and data fan out respectively. The fourth article covers an anti-pattern for disparate services sharing a common service deployment using the fuse metaphor.
The Data Fuse, the topic of this microservice anti-pattern, exists when two or more unique services share a commonly deployed data store. This data store can be any persistence solution from physical file services, to a common storage area network, to relational (ACID) or NoSQL (BASE) databases. When the shared data solution “C” fails, service A and B fail as well. Similarly, when data solution “C” becomes slow, slowness under high demand propagates to services A and B.
As is the case with any group of services connected in series, Service A’s theoretical availability is the product of its individual availability combined with the availability of data service C. Service B’s theoretical availability is calculated similarly. Problems with service A can propagate to service B through the “fused” data element. For instance, if service A experiences a runaway scenario that completely consumes the capacity of data store C, service B will suffer either severe slowness or will become unavailable.
The easiest pattern solution for the data fuse is simply to merge the separate services. This makes the most sense if the services can be owned by the same team. While availability doesn’t significantly increase (service A can still affect service B, and the data store C still affects both), we don’t have the confusion of two services interacting through a fuse. But if the rate of change for each service indicates that it needs separate teams, we need to evaluate other options (see ”when to split services” for a discussion on drivers of services splits.
Another way to fix the anti-pattern is to use the X axis of the Scale Cube as it relates to databases. An easy example of this is the sharing of account data between a sign-up service and a sign-in (AUTHN and AUTHZ) service. In this example, given that sign-up is a write-based service and sign-in is a read based service we can use the X axis of the Scale Cube and split the services on a read and write basis. To the extent that B must also log activity, it can have separate tables or a separate schema that allows that logging. Note that the services supporting this split need not be unique - they can in fact be the exact same service - but the traffic they serve is properly segmented such that the read deployment receives only read traffic and the write deployment receives only write traffic.
If reads and writes aren’t an easily created X axis split, or if we need the organizational scale engendered by a Y-axis split, we need to be a bit more creative. An example pattern comes from the differences between add-to-cart and checkout in a commerce solution. Some functionality is shared between the components, including the notion of showing calculated sales tax and estimated shipping. Other functionality may be unique, such as heavy computation in add-to-cart for related and recommended items, and up-sale opportunities such as gift wrapping or expedited shipping in checkout. We also want to keep carts (session data) around in order to reach out to customers who have abandoned carts, but we don’t want this ephemeral clutter clogging the data of checkout. This argues for separation of data for temporal (response time) reasons. It also allows us to limit PCI compliance boundaries, removing services (add to cart) from the PCI evaluation landscape.
Transition from add-to-cart to checkout may be accomplished through the client browser, or done as an asynchronous back end transfer of state with the browser polling for completion so as to allow for good fault isolation. We refactor the datastore to separate data to services along the Y axis of the scale cube.
AKF Partners helps companies create scalable, fault tolerant, highly available and cost-effective architectures to meet their product needs. Give us a call, we can help.
April 29, 2019 | Posted By: AKF
Have you ever had that feeling of not knowing where to start? For writers, it’s called writer’s block. Painters call it blank-canvas syndrome. Entrepreneurs & technologist refer to this phenomenon as analysis paralysis, an affliction experienced by all at one point or another. It’s like having a stroke of genius for the next big idea, but not knowing where to start.
So let’s start by clearly defining the MVP:
A minimum viable product (MVP) is the version of a new product which allows a team to collect the maximum amount of validated learning about customers with the least amount of effort.
Sounds simple … so what’s the issue?
MVP is one of the most misunderstood terms in our product jargon today. We’ve heard from many a client that MVP is really just a crappy version of a product that is an embarrassment to show to customers. Over and over, the dialog goes like this, “let’s just remove these features and call it the MVP version.” Just last week, we heard, “just make it good enough to launch!”
But the purpose of the MVP is to LEARN about customer demand and usability before over-committing resources. To make sure that we are only building what customers want. An MVP is NOT a fully usable product that will delight customers. It is simply a learning vehicle.
So, what’s the problem?
Marty Abbott talks about the need to stay competitive, and how firms need to build great products but they also need to lend these products to the uses and misuses of their customers and learn extensively from them in The Power of Customer Mis-Behavior. He’s basically telling us to focus on discovery and learning what customers really need, not what they say they want.
The point of an MVP is to validate or invalidate a specific hypothesis. This is why we recommend starting discovery as soon as possible and relying heavily on user testing of prototypes. But for some reason, most people hear the word Product and assume that it means the first version of a product. And so, they build that version, release it and guess what…no one likes it. Well…no Duh!
But where to start?
Our advice is this: 一 Don’t wait for the perfect product. Create an MVP and start discovery immediately.
Discovery Happens Along Two Dimensions:
- Discovery of the “What” something should do – is product discovery in defining or expanding an MVP. Discovering “What” the feature set (stories) needs to be successful.
- Discovery of the “How” something should work to accomplish the best outcome of the what – this is a hybrid of technical and product discovery meant to find the fastest or best path to a result.
Seems straight forward, but many clients have challenges keeping it simple.
One common way companies have overbuilt MVPs is by applying an old prioritization technique used for requirements. Each requirement is tagged one of the following:
- Would like to have/won’t-have
“Must-haves” are the essentials, “should-haves” are really important, “could-haves” might be sacrificed, and “would-likes” probably aren’t going to happen.
The problem with this is at least 60% of any requirements list gets classified as “musts.” Several stakeholders demand their request is a “must” and fight like wild dogs to avoid “could” or “would” status.
A vicious cycle is created as stakeholders realize that nothing except “musts” will get done. If it gets to the point where more than 60% of requirements get classified as “musts,” there may even be some “musts” that don’t get done. In some organizations, a stakeholder stampede is triggered every time someone says “MVP,” leading to a bloated first release, unless someone steps in to put stricter limitations on the requirements.
At the same time, other MVP creation pitfalls we commonly see and warn our clients about include:
- Making a poor product. The word “minimum” in MVP does not mean bad, buggy or barely usable. “Minimum” means that the scope should be stripped of anything extra. but whatever features remain should be done in an intuitive and user-friendly way. Products should be unique to what the customer is likely to by relevant to size.
- Building a product to sell. Change the sales approach for future customers, sell on risk shift – not features. Move to discovery vs. sales-lead contract-based product development.
- Difficulty in defining the minimum. Often, you want your first product to be as beautiful as it can be, and you are reluctant to throw away all the nice features you’ve thought of. As a result, you spend too much time and money, and, even more damaging, lose focus on the core features. The rule of thumb when defining the scope of your MVP is “can we launch without this or not?” This should be your main criteria and then add all the bells and whistles later when the idea is validated.
Our recommended approach to avoid these pitfalls and launch a successful MVP is based on market-driven analysis with a minimum set of features identified for the go to market strategy.
The need for speed
Speed is everything when testing an MVP. Many clients resist launching until a product is “perfect,” but here’s the news flash – it will never be perfect, and holding out for that status could ruin your product going forward.
According to Openview, 50% of SaaS companies fail in their first year due to misunderstanding their market, while 95% close up shop within five years for the same reason. But strong, early MVP assessments allow you to determine whether you’re onto something (or not) in a low-risk environment.
When it comes to launching an MVP, progress is better than perfection. The only goal is to put together a scaled-down version of your product or service and see whether clients are willing to buy in.
If your company is struggling with getting their MVP to market, AKF Partners can help you implement a product strategy consistent with your innovation needs.
Photo by Kun Fotografi from Pexels
April 27, 2019 | Posted By: Marty Abbott
This article is the fourth in a multi-part series on microservices (micro-services) anti-patterns. The introduction of the first article, Service Calls In Series, covers the benefits of splitting services (as in the case of creating a microservice architecture). Many of the mistakes or failure points teams create in services splits. Articles two and three cover anti-patterns for service and data fan out respectively.
The Service Fuse, the topic of this microservice anti-pattern, exists when two or more unique services share a commonly deployed service pool. When the shared service “C” fails, service A and B fail as well. Similarly, when service “C” becomes slow, slowness under high demand propagates to services A and B.
As is the case with any group of services connected in series, Service A’s theoretical availability is the product of its individual availability combined with the availability of service C. Service B’s theoretical availability is calculated similarly. Under unusual conditions, the availability of A could also impact B similar to the way in which service fan out works. Such would be the case if A somehow holds threads for C, thereby starving it of threads to serve B.
Because overall availability is negatively impacted, we consider the Service Fuse to be a microservice anti-pattern.
The easiest and most common method to fault isolate the failure and response time propagation of Service C is to deploy it separately (in separate pools) for both Service A and B. In doing so, we ensure that C does not fail for one service as a result of unusual demand from the other. We also isolate failures due to unique requests that might be made by either A or B. In doing so, we do incur some additional operational costs and additional coordination and overhead in releases. But assuming proper automation, the availability and response time improvements are often worth the minor effort.
As with many of our other anti-patterns we can also employ dynamically loadable libraries rather than separate service deployments. While this approach has some of the slight overhead (again assuming proper automation) of the above separate service deployments, it often also benefits from significant server-side response time decreases associated with network transit.
We often see teams over emphasizing the cost of additional deployments. But the separate service deployment or dynamically loadable library deployment seldom results in significantly greater effort. Splitting the capacity of a shared pool relative to the demand split between services A and B (e.g. 50/50, 90/10, etc) and adding a small number of additional services for capacity is the real implication of such a split. Is 5 to 10% additional operational cost and seconds of additional deployment time worth the significant increase in availability? Our experience is that most of the time it is.
April 21, 2019 | Posted By: Pete Ferguson
Results = Results
Apple, Google, and Amazon don’t exist based on a Utopian promise of what is to come – though certainly those promises keep their customers engaged and hopeful for the future. These companies exist because of the value they have delivered to date and created expectations for us as consumers for a consistent result.
I’m amazed at how simple of a concept Results = Results is – yet constantly we see companies struggle with the concept and we see it as a recurring theme in our 2-3 day workshops with our clients and something we look for in our technical due diligence reviews.
As a corporate survivor of 18 years, looking back I can see where I was distracted by day-today meetings, firefighting, and getting hijacked by initiatives that seemed urgent to some senior leader somewhere – but were not really all that important.
Suddenly the quarter or half was over and it was time to do a self-evaluation and realize all the effort, all the stress, all the work, wasn’t getting the desired results I’d committed to earlier in the year and I’d have to quickly shuffle and focus on getting stuff done.
While keeping the lights on is important, it diminishes in importance when to do so is at the expense of innovating and adding value to our customers – not just struggling to maintain the status quo.
Outcomes and Key Results (OKRs)
Adapted from John Doerr’s “Objectives” and key results – at AKF we find it more to the point to focus on “outcomes.” Objectives (definition: a thing aimed at or sought) are a path where as “outcomes” are a destination that is clearly defined to know you have arrived.
Outcomes are the only things that matter to our customers. Hearing about a desired Utopian state is great and may excite customers to stick around for awhile and put up with current limitations or lack of functionality – but being able to clearly define that you have delivered an outcome and the value to your customers is money in the bank and puts us ahead of our competition.
Yet the majority of our clients have teams who are so focused on cost-cutting for many years that they leave a wide open berth for young startups and their competition to move in and start delivering better outcomes for the customer.
How to Focus on Results and Outcomes
It is easy to become distracted in the day-to-day meetings, incident escalations, post mortems, ect. As an outside third party, however, it is blatantly obvious to us usually within the first hour of meeting with a new team whether or not they are properly focused.
Here are some of the common themes and questions to ask:
- Is there effective monitoring to discover issues before our customers do?
- Do we monitor business metrics and weigh the success (and failure) of initiatives based not on pushing out a new platform or product but whether or not there was significant ROI?
- How much time is spent limping along to keep a legacy application up and running vs. innovating?
- Do we continually push off hardware/software upgrades until we are held hostage by compliance and/or end-of-life serviceability by the vendor?
Hopefully the common theme here is obvious – what is the customer experience and how focused are we on them vs internal castle building or day-to-day distractions?
Recently in a team interview the IT “keep the lights on” team told us they were working to be strategic and innovative by hiring new interns. While the younger generations are definitely less prone to accepting the status quo, the older generation are conceding that they don’t want to be part of the future. And unfortunately they may not be sooner than planned if they don’t grasp their role in driving innovation and the importance of applying their institutional knowledge.
Not focusing on customer/shareholder related outcomes means that shareholders and customers are negatively impacted. Here are a few problems with the associated outcomes I’ve seen in my short tenure with AKF and previously as a corporate crusader:
Monolithic applications to save costs: Why organizations do it? Short term cost savings focus development on one application. Allows teams to only focus on development of their one area.
- One failure means everyone fails.
- Organizations are unable to scale vis-a-vis Conway’s Law (organizations which design systems are constrained to produce designs which are copies of the communication structures of these organizations).
- Often the teams who develop the monolith don’t have to support it, so they don’t understand why it is a problem.
- Teams become very focused on solving the problems caused by the monolith just long enough to get it back up and running but fail to see the long-term recurrent loss to the business and wasted hours that could have been spent on innovating new products and services.
- Catastrophic failure - Intuit pre SaaS, early renditions of iTunes and annual outages when everyone tried to redeem gift cards Christmas morning, early days of eBay, stay tuned, many more yet to come.
Ongoing cost cutting to “make the quarter.”
- MIssed tech refresh results in machines and operating systems no longer supported and vulnerable to external attacks.
- Teams become hyper focused on shutting down additional spending, but never take the time to calculate how much wasted effort is spent on keeping the lights on for aging systems with a declining market share or slowed new customer adoption rate.
- Start saying no to the customer based on cost opening the door for new upstarts and the competition to take away market share.
Focusing efforts on Sales Department’s latest contract.
- Too much investment in legacy applications instead of innovating new products.
- “A-team” developers become firefighters to keep customers happy.
- Sales team creates moral hazards for development teams (i.e. “I smoke, but you get lung cancer” - teams create problems for other teams to fix instead of owning the end-to-end lifecycle of a product)
Focus is on mergers and acquisitions instead of core strengths and products.
- Distracted organizations give way for upstarts and competition.
- Become okay or maybe even good at a lot of things but not great at one or two things.
- Company culture becomes very fragmented and silos create red tape that slows or stifles innovation.
Results = Results. And nothing else equals results.
If OKRs are not measuring the results needed to compete and win, then teams are wasting a lot of effort, time, and money and the competition is getting a free pass to innovate and outperform your ability to delight and please your customers.
Need an outside view of your organization to help drive better results and outcomes? Contact us!
Photo by rawpixel.com from Pexels
April 21, 2019 | Posted By: Marty Abbott
This article is the third in a multi-part series on microservices (micro-services) anti-patterns. The introduction of the first article, Service Calls In Series, covers the benefits of splitting services, many of the mistakes or failure points teams create in services splits and the first anti pattern. The second article, Service Fan Out discusses the anti-pattern of a single service acting as a proxy or aggregator of mulitple services.
Data Fan Out, the topic of this microservice anti-pattern, exists when a service relies on two or more persistence engines with categorically unique data, or categorically similar data that is not meant to be processed in parallel. “Categorically Unique” means that the data is in no way related. Examples of categorical uniqueness would be a database that stores customer data and a separate database that stores catalog data. Instances of the same data, such as two separate databases each storing half of product catalog, are not categorically unique. Splitting of similar data is often known as sharding. Such “sharded” instances only violate the Data Fan Out pattern if:
1) They are accessed in series (database 1 is accessed and subsequently database 2 is accessed) –or-
2) A failure or slowness in either database, even if accessed in parallel, will result in a very slow or unavailable service.
Persistence engine means anything that stores data as in the case of a relational database, a NoSQL database, a persistent off-system cache, etc.
Anytime a service relies on more than one persistence engine to perform a task, it is subject to lower availability and a response time equivalent to the slower of the N data stores to which it is connected. Like the Service Fan Out anti-pattern, the availability of the resulting service (“Service A”) is the product of the availability of the service and its constituent infrastructure multiplied by the availability of each N data store to which it is connected.
Further, the response of the services may be tied to the slowest of the runtime of Service A added to the slowest of the connected solutions. If any of the N databases become slow enough, Service A may not respond at all.
Because overall availability is negatively impacted, we consider Data Fan Out to be a microservice anti-pattern.
One clear exception to the Data Fan Out anti-pattern is the highly parallelized querying done of multiple shards for the purpose of getting near linear response times out of large data sets (similar to one component of the MapReduce algorithm). In a highly parallelized case such as this, we propose that each of the connections have a time-out set to disregard results from slowly responding data sets. For this to work, the result set must be impervious to missing data. As an example of an impervious result set, having most shards return for any internet search query is “good enough”. A search for “plumber near me” returns 19/20ths of the “complete data”, where one shard out of 20 is either unavailable or very slow. But having some transactions not present in an account query of transactions for a checking account may be a problem and therefore is not an example of a resilient data set.
Our preferred approach to resolve the Data Fan Out anti-pattern is to dedicate services to each unique data set. This is possible whenever the two data sets do not need to be merged and when the service is performing two separate and otherwise isolatable functions (e.g. “Customer_Lookup” and “Catalog_Lookup”).
When data sets are split for scale reasons, as is the case with data sets that have both an incredibly high volume of requests and a large amount of data, one can attempt to merge the queried data sets in the client. The browser or mobile client can request each dataset in parallel and merge if successful. This works when computational complexity of the merge is relatively low.
When client-side merging is not possible, we turn to the X Axis of the Scale Cube for resolution. Merge the data sets within the data store/persistence engine and rely on a split of reads and writes. All writes occur to a single merged data store, and read replicas are employed for all reads. The write and read services should be split accordingly and our infrastructure needs to correctly route writes to the write service and reads to the read service. This is a valuable approach when we have high read to right ratios – fortunately the case in many solutions. Note that we prefer to use asynchronous replication and allow the “slave” solutions to be “eventually consistent” - but ideally still within a tolerable time frame of milliseconds or a handful of seconds.
What about the case where a solution may have a high write to read ratio (exceptionally high writes), and data needs to be aggregated? This rather unique case may be best solved by the Z axis of the AKF Scale Cube, splitting transactions along customer boundaries but ensuring the unification of the database for each customer (or region, or whatever “shard key” makes sense). As with all Z axis shards, this not only allows faster response times (smaller data segments) but engenders high scalability and availability while also allowing us to put data “closer to the customer” using the service.
AKF Partners helps companies create highly available, highly scalable, easily maintained and easily developed microservice architectures. Give us a call - we can help!
April 19, 2019 | Posted By: Eric Arrington
Time it takes to boil an egg: 720,000 milliseconds
Average time in line at the supermarket: 240,000 milliseconds
Time it takes to brush your teeth: 120,000 milliseconds
Time it takes to make a sandwich: 90,000 milliseconds
In our everyday lives we aren’t used to measuring things in milliseconds. In the software world our users’ expectations are different. Milliseconds matter. A lot.
The average person may wait 240,000 milliseconds to checkout at the grocery store but not as likely to wait that long to checkout on an e-commerce site.
What Is Latency
Latency is how fast we get an answer back to after making a request to the server.
It’s how long it takes for a request to go from the browser to the server and back to the browser.
Spoiler Alert: Faster is better.
Latency vs Bandwidth
I often see the words latency and bandwidth used together – or even interchangeably – but they have two very different meanings.
Using the metaphor of a restaurant, bandwidth is the amount of seating available. The more seating the restaurant has, the more people it can serve at one time. If a restaurant wants to be able to serve more people in a certain time period they add more seating. Similarly, bandwidth is the maximum amount of data that can be transferred in a specific measure of time.
If bandwidth is the maximum number of diners that can fit in a restaurant at one time, then latency is the amount of time it takes for food to arrive after ordering. On the Internet, latency is a measure of how long it takes for a user to get a response from an action like a click. It is the “performance lag” the user feels while using our product.
Luckily, over the past 20 years, the bandwidth and capacity of memory have increased dramatically. Unfortunately, latency hasn’t increased at all comparatively over the last 20 years.
Latency is directly linked to the “experience” the end user has with our products or services. If our latency isn’t maximized then we are leaving money on the table!
- Amazon did a study that found for every 100ms of latency it cost them 1% in sales.
- Google discovered that for every 500ms they took to show search results, traffic dropped 20%.
Even more shocking is a study done by the TABB Group. The study estimated the outcome of a broker’s electronic trading platform being just 5 milliseconds behind the competition. According to their estimate, this 5 millisecond delay could cost $4 million in revenue per millisecond. Their study also concluded that if an electronic broker is 100 milliseconds behind the competition they might as well shut down and become a floor broker.
100ms can be the difference between strategic advantage and second or third place.
100ms Rule of Latency – Paul Buchheit (Gmail Creator)
How fast is 100ms? Paul Buchheit coined the The 100ms Rule. The rule states that every interaction should be faster than 100ms. Why? 100ms is the threshold “where interactions feel instantaneous.”
What Causes Latency
Finding the cause of all of our latency isn’t always an easy task. There are a lot of possible causes. For the most part we can borrow the Pareto Principle (80/20 Rule) and knock out the usual suspects.
Propagation is how long it takes information to travel. In a perfect world our request travels at the speed of light. Also in a perfect world milkshakes would be good for us. For various reasons, our packet won’t travel at the speed of light.
Even if it did travel at the speed of light, distance from between our server and our web user still matters.
Packets traveling from one side of the world to the other and back would add about 250ms of latency. Unfortunately our data doesn’t travel “as the crow flies.” The paths rarely travel in a straight line (especially if using a VPN). This adds a lot more distance for the request to travel.
Remember when I said it wasn’t a perfect world? This is what I was talking about. The material data cables are made out of affect the speed of propagation. Different materials have different limitations on the speed.
For example, the speed of light can travel from New York to San Francisco in 14ms (in a vacuum). Inside of a fiber cable it takes about 21ms.
For the most part, data travels fast across long distances. The cabling mediums between larger distances is usually faster. The last mile is usually the slowest. One reason for this is the cable medium used in buildings, homes, and commercial areas tend to use existing wiring like coaxial cables or copper. Another reason is explained in the next point. Your data changes hands as it gets back to you (i.e. your router).
Consider yourself lucky if you have fiber installed. Copper and coaxial cables are slower. 4G can add up to 100ms to the latency. We won’t even talk about satellite.
It would be great if our data went straight from our device to the server and back, but again, probably not going to happen. As our packet travels to the server and back to the source it travels through different network devices. The request passes through routers, bridges, and gateways. Each time our data is handed off to the next device, a “network hop” occurs.
These hops add more latency than distance. A request that travels 100 miles but makes 5 hops will have more latency than a request that travels 2500 miles with only 2 hops.
The more hops are in the line, the more latency.
How To Lower Latency?
Latency can best be described as the sum of the previously mentioned causes and a lot more. There is no magical button we can push to achieve ultra low latency. There are a few things that can make a big dent. This is in no means an exhaustive list.
Asynchronous Development Approach
Multitasking as a developer is a bad idea. Making software multitask is a great idea. Whenever possible, make calls asynchronous (multiple calls executed at the same time). This can make a huge difference in latency (and perceived latency which can be just as important, we’ll talk more about that shortly).
Make Fewer External Requests
If we know that the trip to and from the database adds latency, then let’s go less often. There are many ways we can do this. Here are a few:
- Use image sprites
- Eliminate images that don’t contribute to overall product
- Use inline svg code instead of images for icons and logos
- Combine and minify all HTML, CSS, and JS files
There are times we need to reference files from with an external HTTP request. If we don’t control those resources then there is little we can do. We can, however, evaluate and reduce the number of external services we use.
Z Axis split geographically
One of the things AKF Partners is known for is the Scalability Cube. We have helped hundreds of clients scale along all three axises.
If our architecture makes sense to do so, splitting along the Z Axis by geography can make a huge difference in latency. Separating the data based on the geographical location hopefully places servers closer to the end user, thus shortening the round trip. If you would like an evaluation to see if this is something that could be achieved with your current architecture, don’t hesitate to contact us.
Use a CDN
A CDN is a content delivery network. Basically it is a system of distributed servers. These servers deliver pages or other content to end users based on their geographic location. Remember shorter distances mean lower latency.
A CDN can have a huge impact on latency. I stole a few figures from the KeyCDN website to show you how big of a difference. Test site was located in Dallas, TX.
||No CDN (ms)
||With CDN (ms)
A content delivery network can have a major impact in reducing latency.
The details of how caching works can be complicated but the basic idea is simple. If I were to ask you what the result of 8 x 7 is, you will know right away the answer is 56. You didn’t have to think about it. You didn’t calculate it in your head. You’ve done this multiplication so many times in your life that you don’t need to. You just remember the answer. That is kind of how caching works.
If we are set up to cache a page on the server, the first time someone visits our page, it loads normally because the request is received by the server, processed, and sent back to the client as an html file. If we are set up to cache the request, then the HTML file is saved and stored - usually in RAM (which is fast). The next time we make the same request, the server doesn’t need to process anything. It simply serves the HTML page from the cache.
We can also cache at the browser level. The first time a user visits a site, the browser will receive a bunch of assets and (if set up correctly) will cache those assets. The next time the site is visited, the browser will serve the cached assets and it will load a lot quicker.
We decide when these caches expire. Be aggressive with caching of static resources. Set the expiration date for a minimum of a month in advance. I recommend setting them for 12 months if it makes sense for that resource.
Render Content on the Server Side
Rendering templates on the server (rather than dynamically on the client) can also help lower latency. Remember every trip to the database results in higher total latency. Why not pre-render the pages on the server and load the static pages on the client?
This technique doesn’t work for all applications – but content publishing sites like The Washington Post or Medium can benefit greatly from pre-rendering their content on the server side and posting static rendered content on the client.
Use Pre-fetching Methods
I almost didn’t include this tip. Technically this doesn’t lower latency at all. What it does do is lower the “perceived user latency” felt by customers.
Perceived user latency is how long it seems like it takes to the user.
A normal request looks like the graphic below. First is the time it takes the server to process the request. After that is the time it takes the network to get the request back to the client. Next is the time it takes the client to process the request and load the page.
If we pre-fetch certain items or show placeholder images while the response from the server is loading (a la Facebook) then the perceived latency felt by the user is lower. The actual latency is exactly the same but the user “feels” like it loaded faster.
We can get the same benefits of a lower latency by “gaming” latency this way.
Latency is a metric we should all be tracking. Providing a great user experience with low latency makes a difference. It keeps our customers on our applications and sites longer. It fosters retention. Most importantly it will increase conversion rate.
If you aren’t currently tracking your latency as a metric, take that first step and see where you are at. If you need help, let us know and we’d be happy to schedule a call.
April 8, 2019 | Posted By: Marty Abbott
This article is the second in a multi-part series on microservices (micro-services) anti-patterns. The introduction of the first article, Service Calls In Series, covers the benefits of splitting services, many of the mistakes or failure points teams create in services splits and the first anti pattern.
Fan Out, the topic of this microservice anti-pattern, exists when one service either serves as a proxy to two or more downstream services, or serves as an integration of two subsequent service calls. Any of the services (the proxy/integration service “A”, or constituent services “B” and “C”) can cause a failure of all services. When service A fails, service B and C clearly can’t be called. If either service B or C fails or becomes slow, they can affect service A by tying up communication ports. Ultimately, under high call volume, service A may become unavailable due to problems with either B or C.
Further, the response of the services may be tied to the slowest responding service. If A needs both B and C to respond to a request (as in the case of integration), then the speed at which A responds is tied to the slowest response times of B and C. If service A merely proxies B or C, then extreme slowness in either may cause slowness in A and therefore slowness in all calls.
Because overall availability is negatively impacted, we consider Service Fan Out to be a microservice anti-pattern.
One approach to resolve the above anti-pattern is to employ true asynchronous messaging between services. For this to be successful, the requesting service A must be capable of responding to a request without receiving any constituent service responses. Unfortunately, this solution only works in some cases such as the case where service B is returning data that adds value to service A. One such example is a recommendation engine that returns other items a user might like to purchase. The absence of service B responding to A’s request for recommendations is unfortunate but doesn’t eliminate the value of A’s response completely.
As was the case with the Calls In Series Anti-Pattern, we may also be able to solve this anti-pattern with ”Libraries for Depth” pattern.
Of course, each of the libraries also represents a constituent part that may fail for any call – but the number of moving parts for each constituent part decreases significantly relative to a separately deployed service call. For instance, no network interface is required, no additional host and virtual VM is employed during the call, etc. Additionally, call latency goes down without network interfaces.
The most common complaint about this pattern is that development teams cannot release independently. But, as we all know, this problem has been fixed for quite some time with Unix, Linux and Windows dynamically loadable libraries (dlls, dls) and the like.
AKF Partners has helped to architect some of the most scalable, highly available, fault-tolerant and fastest response time solutions on the internet. Give us a call - we can help.
March 25, 2019 | Posted By: Marty Abbott
This article is the first in a multi-part series on microservices (micro-services) anti-patterns.
There are several benefits to carving up very large applications into service-oriented architectures. These benefits can include many of the following:
- Higher availability through fault isolation
- Higher organizational scalability through lower coordination
- Lower cost of development through lower overhead (coordination)
- Faster time to market achieved again through lower overhead of coordination
- Higher scalability through the ability to independently scale services
- Lower cost of operations (cost of goods sold) through independent scalability
- Lower latency/response time through better cacheability
The above should be considered only a partial list. See our articles on the AKF Scale Cube, and when you should split services for more information.
In order to achieve any of the above benefits, you must be very careful to avoid common mistakes.
Most of the failures that we see in microservices stem from a lack of understanding of the multiplicative effect of failure or “MEF”. Put simply, MEF indicates that the availability of any solution in series is a product of the availability of all components in that series.
Service A has an availability calculated by the product of its constituent parts. Those parts include all of the software and infrastructure necessary to run service A. The server availability, the application availability, associated library and runtime environment availabilities, operating system availability, virtualization software availability, etc. Let’s say those availabilities somehow achieve a “service” availability of “Five 9s” or 99.999 as measured by duration of outages. To achieve 99.999 we are assuming that we have made the service “highly available” through multiple copies, each being “stateless” in its operation.
Service B has a similar availability calculated in a similar fashion. Again, let’s assume 99.999.
If, for a request from any customer to Service A, Service B must also be called, the two availabilities are multiplied together. The new calculated availability is by definition lower than any service in isolation. We move our availability from 99.999 to 99.998.
When calls in series between services become long, availability starts to decline swiftly and by definition is always much smaller than the lowest availability of any service or the constituent part of any service (e.g. hardware, OS, app, etc).
This creates our first anti-pattern. Just as bulbs in the old serially wired Christmas Tree lights would cause an entire string to fail, so does any service failure cause the entire call stream to fail. Hence multiple names for this first anti-pattern: Christmas Tree Light Anti-Pattern, Microservice Calls in Series Anti-Pattern, etc.
The multiplicative effect of failure sometimes is worse with slowly responding solutions than with failures themselves. We can easily respond from failures through “heartbeat” transactions. But slow responses are more difficult. While we can use circuit breaker constructs such as hystrix switches – these assume that we know the threshold under which our call string will break. Unfortunately, under intense flash load situations (unforeseen high demand), small spikes in demand can cause failure scenarios.
One pattern to resolve the above issue is to employ true asynchronous messaging between services. To make this effective, the requesting service must not care whether it receives a response. This service must be capable of responding to a request without receiving any downstream response. Unfortunately, this solution only works in some cases such as the case where service B is returning data that adds value to service A. One such example is a recommendation engine that returns other items a user might like to purchase. The absence of service B responding to A’s request for recommendations is unfortunate, but doesn’t eliminate the value of A’s response completely.
While the above pattern can resolve some use-cases, it doesn’t resolve most of them. Most often downstream services are doing more than “modifying” value for the calling service: they are providing specific necessary functions. These functions may be mail services, print services, data access services, or even component parts of a value stream such as “add to cart” and “compute tax” during checkout.
In these cases, we believe in employing the Libraries for Depth pattern.
Of course, each of the libraries also represents a constituent part that may fail for any call – but the number of moving parts for each constituent part decreases significantly relative to another service call. For instance, no network interface is required, no additional host and virtual VM is employed during the call, etc. Additionally, call latency goes down without network interfaces.
The most common complaint about this pattern is that development teams cannot release independently. But, as we all know, this problem has been fixed for quite some time with Unix, Linux and Windows dynamically loadable libraries (dlls, dls) and the like.
March 19, 2019 | Posted By: Marty Abbott
Tim Berners-Lee and his colleagues at CERN, the IETF and the W3C consortium all understood the value of being stateless when they developed the Hyper Text Transfer Protocol. Stateless systems are more resilient to multiple failure types, as no transaction needs to have information regarding the previous transaction. It’s as if each transaction is the first (and last) of its type.
First let’s quickly review three different types of state. This overview is meant to be broad and shallow. Certain state types (such as the notion of View state in .Net development) are not covered.
The Penalty (or Cost) of State
State costs us in multiple ways. State unique to a user interaction, or session state, requires memory. The larger the state, the more memory requirement, the higher cost of the server and the greater the number of servers we need. As the cost of goods sold increase, margins decrease. Further, that state either needs to be replicated for high availability, and additional cost, or we face a cost of user dissatisfaction with discrete component and ultimately session failures.
When application state is maintained, the cost of failure is high as we either need to pay the price of replication for that state or we lose it, negatively impacting customer experience. As memory associated with application state increases, so does the memory requirement and associated costs of the server upon which it runs. At high scale, that means more servers, greater costs, and lower gross margins. In many cases, we simply have no choice but to allow application state. Interpreters and java virtual machines need memory. Most applications also require information regarding their overall transactions distinct from those of users. As such, our goal here is not to eliminate application state but rather minimize it where possible.
When connection state is maintained, cost increases as more servers are required to service the same number of requests. Failures become more common as the failure probability increases with the duration of any connection over distance.
Our ideal outcome is to eliminate session state, minimize application state and eliminate connection state.
But What if I Really, Really, Really Need State?
Our experience is that simply saying “No” once or twice will force an engineer to find an innovative way to eliminate state. Another interesting approach is to challenge an engineer with a statement like “Huh, I heard the engineers at XYZ company figured out how to do this…”. Engineers hate to feel like another engineer is better than them…
We also recognize however that the complete elimination of state isn’t possible. Here are three examples (not meant to be all inclusive) of when we believe the principle of stateless systems should be violated:
Shopping carts need state to work. Information regarding a past transaction - (add_to_cart) for instance needs to be held somewhere prior to check_out. Given that we need state, now it’s just a question of where to store it. Cookies are good places. Distributed object caches are another location. Passing it through the URL in HTTP GET methods is a third. A final solution is to store it in a database.
No sane person wants to wrap debits and credits across distributed servers in a single, two-phase commit transaction. Banks have had a solution for this for years – the eventual consistent account transaction. Using a tiny workflow or state machine, debit in one transaction and eventually (ideally quickly) subsequently credit in a second transaction. That brings us to the notion of workflow and state machines in general.
What good is a state machine if it can’t maintain state? Whether application state or session state, the notion of state is critical to the success of each solution. Workflow systems are a very specific implementation of a state machine and as such require state. The trick with these is simply to ensure that the memory used for state is “just enough”. Govern against ever increasing session or application state size.
This brings us to the newest cube model in the AKF model repository:
The Session State Cube
The AKF State Cube is useful both for thinking through how to achieve the best possible state posture, and for evaluating how well we are doing against an aspiration goal (top right corner) of “Stateless”.
The X axis describes size of state. It moves from very large (XL) state size to the ideal position of zero size, or “No State”. Very large state size suffers from higher cost, higher impact upon failure, and higher probability of failure.
The Y axis describes the degree of distribution of state. The worst position, lower left, is where state is a singleton. While we prefer not to have state, having only one copy of it leaves us open to large – and difficult to recover from – failures and dissatisfied customers. Imagine nearly completing your taxes only to have a crash wipe out all of your work! Ughh!
Progressing vertically along the Y axis, the singleton state object in the lower left is replicated into N copies of that state for high availability. While resolving the recovery and failure issues, performing replication is costly both in extra memory and network transit. This is an option we hope to avoid for cost reasons.
Following replication are several methods of distribution in increasing order of value. Segmenting the data by some value “N” has increasing value as N increases. When N is 2, a failure of state impacts 50% of our customers. When N is 100, only 1% of our customers suffer from a state failure. Ideally, state is also “rebuildable” if we have properly scattered state segments by a shard key – allowing customers to only have to re-complete a portion of their past work.
Finally, of course, we hope to have “no state” (think of this as division by infinite segmentation approaching zero on this axis).
The Z Axis describes where we position state “physically”.
The worst location is “on the same server as the application”. While necessary for application state, placing session data on a server co-resident with the application using it doubles the impact of a failure upon application fault. There are better places to locate state, and better solutions than your application to maintain it.
A costly, but better solution from an impact perspective is to place state within your favorite database. To keep costs low, this could be an opensource SQL or NoSQL database. But remember to replicate it for high availability.
A less costly solution is to place state in an object cache, off server from the application. Ideally this cache is distributed per the Y axis.
The least costly solution is to have the client (browser or mobile app) maintain state. Use a cookie, pass the state through a GET method, etc.
Finally, of course the best solution is that it is kept “nowhere” because we have no state.
The AKF State Cube serves two purposes:
- Prescriptive: It helps to guide your team to the aspirational goal of “stateless”. Where stateless isn’t possible, choose the X, Y and Z axis closest to the notion of no state to achieve a low cost, highly available solution for your minimized state needs.
- Descriptive: The model helps you evaluate numerically, how you are performing with respect to stateless initiatives on a per application/service basis. Use the guide on the right side of the model to evaluate component state on a scale of 1 to 10.
AKF Partners helps companies develop world class, low cost of operations, fast time to market, stateless solutions every day. Give us a call! We can help!
March 19, 2019 | Posted By: Pete Ferguson
The Crippling Cost of Distractions
Perhaps the biggest thief of progress is the misuse of time. Not talking about laziness here – I’m referring to what we often see with our clients – losing focus on what matters most.
Young startups are able to accomplish great things in a very short amount of time because there is rarely anything else to distract them from achieving success and everyone is focused on a shared outcome. Any money they have will likely be quickly exhausted. So food, sleep, vacations, and a life are all put on hold - or at least on the back burner - until products are built, tested, released, and adopted.
As a corporate survivor of 18 years at eBay, I’ve seen a few common themes within my own experience that are also exhibited at other companies where I’ve been involved in the Technical Due Diligence, three day technical architectural workshops, and longer term engagements where I’ve been embedded one of several technical resources. These distractions include:
- Supporting legacy applications at the cost of new innovation
Often M&A teams are highly focused on the short term – what can be accomplished this year or this quarter.
I’ve been involved in many due diligence activities as an internal consultant and as a third-party conducting an external technical due diligence. There is a lot of pressure on the M&A team to sell the deal, and in my experience I’ve seen the positives inflated and the negatives – well, negated, to get the deal done.
Often the full impact on the day-to-day operations team is either not considered or underplayed – if not optimistically overvalued as to how much work can get done in a week realistically over time.
Acquisition Distractions Development is Required to Resolve:
- Integration of Delivery (How to ship, security requirements, etc.)
- Process (Agile vs. Waterfall, what flavor of Agile, etc.)
- Product (Sharing of technologies and features)
An older example, but one that has renewed relevance with the recent focus of another activist investor is the story of eBay’s acquisition of Skype in Q4 of 2005. For shared resources within eBay, this was a huge distraction of already overly leveraged operations teams having to take on additional tasks. The distraction from eBay’s core mission was the true tragedy as it lost focus and opened a window for Amazon to grow 600% over the next 7 years (during one of the worst markets since the Great Depression) while eBay struggled to rebound its’ stock price. When eBay reemerged, Amazon had become a market leader in online commerce and has since pulled significantly ahead in many additional markets.
In our technical due diligences and three day architectural workshops, we can see how companies who were once lean and mean become quickly overwhelmed and distracted from innovation by trying to integrate a flood of acquisitions.
At winning companies, we see that M&A teams carefully evaluate:
- If acquisition will bring immediate ROI (PayPal immediately impacted eBay’s stock price positively and was able to sustain for several years, where Skype did the opposite)
- How much effort integrating the acquisition will require and build enough resources in the acquisition cost to accomplish a speedy induction
- How well the two companies’ cultures meld together
From an internal perspective in my time at eBay, the value Skype would bring wasn’t apparent, and externally most analysts were similarly scratching their heads. For Amazon, it was a godsend as it gave them an opportunity to bear down and focus and take a lot of market share from a sleeping and distracted giant.
Supporting Legacy Applications at the Cost of New Innovation
Legacy applications will keep every business from certain opportunities. We have seen multiple times where a company who was once a leader and innovator became too sales-focused and started squeezing out every last ounce of profitability from their legacy products. As a result, a large % of top developers were focused on building out new features into a declining market allowing a window of opportunity for new startups and competition to edge in.
Top of mind examples are Workday’s stealing of market share from SAP before they purchased Success Factors, or Apple’s edging into PC sales as they created a larger ecosystem of phones, tablets, laptops, TV boxes, etc.
Google is one of the best examples of being willing to dump legacy products - even successful ones - if they see support and maintenance as not keeping up with ROI or cutting into areas where they can innovate faster (Froogle, and Google+ and their capping of support of Chromebooks and Pixel phones at 5 years).
Reasons to sunset legacy products in favor of focusing efforts on developing the “next big thing:”
- Rising Maintenance and Talent Costs – Cost and scope creep happen suddenly and often are much greater than companies are willing to measure or to admit. One of the more informative questions we always ask is: “if you were to build a new product, what would it look like?” or “if you were to go out to market today, would you develop a new product the same way?” The answers are usually very revealing and conclusive - the legacy products have run their course.
- Longer Term Profitability – It takes a bold move to suspend the current cash cow in favor of focusing “all hands on deck” to develop the next big idea and future cash cow. Even when the potential future could be a 3, 4, or 5X revenue generator, companies get short-sighted and are only focused on this quarter’s revenue.
- Lost Opportunity Cost – Put aside your current quarterly projections for sales of legacy products, what is your current and potential competition building today that could greatly diminish your company in just a few years?
To compete with young, well-funded and high octane startups, legacy companies need to create a labs environment to attract top developers and free them from daily distractions of supporting legacy software to remain focused on emerging products and markets.
To succeed, teams must be focused on the highest possible ROI.
- Ensure acquisitions have been properly evaluated for the amount of time and energy it will take to integrate their products and that the cultures of your company and that being acquired are compatible.
- Supporting legacy applications is important for ongoing profitability, but likely can also be done by teams with less experience and expertise. What is often not factored properly is the opportunity cost of keeping legacy products in play long past their shelf life.
- Keep your A-teams focused on new product development, not on keeping the lights on. Create a separate Labs organization and treat them like a startup, and free them from legacy decisions on infrastructure, coding languages, etc. Your competition is starting with a fresh slate, you should too.
We’ve helped many startups and Fortune 500 companies make the transition from legacy software and hardware to SaaS and cloud infrastructure to increase scalability while providing high availability. Let us help your organization!
(Photo Credit: Matthew Henry downloaded from Burst.Shopify.com)
‹ First < 3 4 5 6 7 > Last ›