June 11, 2018 | Posted By: Marty Abbott
Of the many SaaS operating principles, perhaps one of the most misunderstood is the principle of tenancy.
Most people have a definition in their mind for the term “multi-tenant”. Unfortunately, because the term has so many valid interpretations its usage can sometimes be confusing. Does multi-tenant refer to the physical or logical implementation of our product? What does multi-tenant mean when it comes to an implementation in a database?
This article first covers the goals of increasing tenancy within solutions, then delves into the various meanings of tenancy.
Multi-Tenant (Multitenant) Solutions and Cost
One of the primary reasons why companies that present products as a service strive for higher levels of tenancy is the cost reduction it affords the company in presenting a service. With multiple customers sharing applications and infrastructure, system utilization goes up: We get more value production out of each server that we use, or alternatively, we get greater asset utilization.
Because most companies view the cost of serving customers as a “Cost of Goods Sold’, multitenant solutions have better gross margins than single-tenant solutions. The X-Axis of the figure below shows the effect of increasing tenancy on the cost of goods sold on a per customer basis:
Interestingly, multitenant solutions often “force” another SaaS principle to be true: No more than 1 to 3 versions of software for the entire customer base. This is especially true if the database is shared at a logical (row-level) basis (more on that later). Lowering the number of versions of the product decreases the operating expense necessary to maintain multiple versions and therefore also increases operating margins.
Single Tenant, Multi-Tenant and All-Tenant
An important point to keep in mind is that “tenancy” occurs along a spectrum moving from single-tenant to all-tenant. Multitenant is any solution where the number of tenants from a physical or logical perspective is greater than one, including all-tenant implementations. As tenancy increases, so does Cost of Goods Sold (COGS from the above figure) decrease and Gross Margins increase.
The problem with All-Tenant solutions, while attractive from a cost perspective, is that they create a single failure domain [insert https://akfpartners.com/growth-blog/fault-isolation], thereby decreasing overall availability. When something goes poorly with our product, everything is offline. For that reason, we differentiate between solutions that enable multi-tenancy for cost reasons and all-tenant solutions.
The Many Meanings and Implementations of Tenancy
Multitenant solutions can be implemented in many ways and at many tiers.
Physical and Logical
Physical multi-tenancy is having multiple customers share a number of servers. This helps increase the overall utilization of these servers and therefore reduce costs of goods sold. Customers need not share the application for a solution to be physically multitenant. One could, for instance, run a web server, application server or database per customer. Many customers with concerns over data separation and privacy are fine with physical multitenancy as long as their data is logically separated.
Logical multi-tenancy is having data share the same application. The same web server instances, application server instances and database is used for any customer. The situation becomes a bit murkier however when it comes to databases.
Different relational databases use different terms for similar implementations. A SQLServer database, for instance, looks very much like an Oracle Schema. Within databases, a solution can be logically multitenant by either implementing tenancy in a table (we call that row level multitenancy) or within a schema/database (we call that schema multitenancy).
In either case, a single instance of the relational database management system or software (RDBMS) is used, while customer transactions are separated by a customer id inside a table, or by database/schema id if separated as such.
While physical multitenancy provides cost benefits, logical multitenancy often provides significantly greater cost benefits. Because applications are shared, we need less system overhead to run an application for each customer and thusly can get even greater throughput and efficiencies out of our physical or virtualized servers.
Depth of Multi-Tenancy
The diagram below helps to illustrate that every layer in our service architecture has an impact to multi-tenancy. We can be physically or logically multi-tenant at the network layer, the web server layer, the application layer and the persistence or database layer.
The deeper into the stack our tenancy goes, the greater the beneficial impact (cost savings) to costs of goods sold and the higher our gross margins.
The AKF Multi-Tenant Cube
To further the understanding of tenancy, we introduce the AKF Multi-Tenant Cube:
The X-Axis describes the “mode’ of tenancy, moving from shared nothing, to physical, to logical. As we progress from sharing nothing to sharing everything, utilization goes up and cost of goods sold goes down.
The Y-Axis describes the depth of tenancy from shared nothing, through network, web, app and finally persistence or database tier. Again, as the depth of tenancy increase, so do Gross Margins.
The Z-Axis describes the degree of tenancy, or the number of tenants. Higher levels of tenancy decrease costs of goods sold, but architecturally we never want a failure domain that encompasses all tenants.
When running a XaaS (SaaS, etc) business, we are best off implementing logical multitenancy through every layer of our architecture. While we want tenancy to be high per instance, we also do not want all tenants to be in a single implementation.
AKF Partners helps companies of all sizes achieve their availability, time to market, scalability, cost and business goals. Contact us today to see how we can help your organization in scalability and availability.
Subscribe to the AKF Newsletter
May 20, 2018 | Posted By: Dave Berardi
Physical Bare Metal, Virtualization, Cloud Compute, Containers, and now Serverless in your SaaS? We are starting to hear more and more about Serverless computing. Sometimes you will hear it called function as a service. In this next iteration of Infrastructure-as-a-Service, users can execute a task or function without having to provision a server, virtual machine, or any other underlying resource. The word Serverless is a misnomer as provisioning the underlying resources are abstracted away from the user, but they still exist underneath the covers. It’s just that Amazon, Microsoft, and Google manage it for you with their code. AWS Lambda, Azure Functions, and Google Cloud Functions are becoming more common in the architecture of a SaaS product. As technology leaders responsible for architectural decisions for scale and availability, we must understand its pros and cons and take the right actions to apply it.
Several advantages of serverless computing include:
• Software engineers can deploy and run code without having to manage any underlying infrastructure effectively creating a No-Ops environment.
• Auto-scaling is easier and requires less orchestration as compared to a containerized environment running services.
• True On-Demand capacity – no orphaned containers or other resources that might be idling.
• They are cost effective IF we are running the right size workloads.
Disadvantages and potential landmines to watch out for:
• Landmine #1 - No control over the execution environment meaning you are unable to isolate your operational environment. Compute and networking resources are virtualized with no visibility into either of them. Availability is the hands of our cloud provider and uptime is not guaranteed.
• Landmine #2 - SLAs cannot guarantee uptime. Start-up time can take a second causing latency that might not be acceptable.
• Landmine #3 - It’s going to become much easier for engineers to create code, host it rapidly, and forget about it leading to unnecessary compute and additional attack vectors creating a security risk.
• Landmine #4 - You will create vendor lock-in with your cloud provider as you set up your event driven functions to trigger from other AWS or Azure Services or your own services running on compute instances.
AKF is often asked about our position on serverless computing. There are 4 key rules considering the advantages and the landmines that we outlined:
1) Gradually introduce it into your architecture and use it for the right use cases
2) Establish architectural principles that guide its use in your organization that will minimize availability impact for Serverless. You will tie your availability to the FaaS in your cloud provider.
3) Watch out for a false sense of security among your engineering teams. Understand how serverless works before you use it and so you can monitor it for performance and availability.
4) Manage how and what it’s used for - monitor it (eg. AWS Cloud Watch) to avoid neglect and misuse along with cost inefficiencies.
AWS, Azure, or Google Cloud Serverless platforms could provide an affective computing abstraction in your architecture if it’s used for the right use cases, good monitoring is in place, and architectural principles are established.
AKF Partners has helped many companies create highly available and scalable systems that are designed to be monitored. Contact us for a free consultation.
Subscribe to the AKF Newsletter
April 25, 2018 | Posted By: AKF
The Scale Cube is a model for defining microservices and scaling technology products. AKF Partners invented the Scale Cube in 2007, originally publishing it online in our blog in 2007 (original article here) and subsequently in our first book the Art of Scalability and our second book Scalability Rules.
The Scale Cube (sometimes known as the “AKF Scale Cube” or “AKF Cube”) is comprised of 3 axes:
• X-Axis: Horizontal Duplication and Cloning of services and data
• Y-Axis: Functional Decomposition and Segmentation - Microservices (or micro-services)
• Z-Axis: Service and Data Partitioning along Customer Boundaries - Shards/Pods
These axes and their meanings are depicted below in Figure 1.
The Scale Cube helps teams keep critical dimensions of system scale in mind when solutions are designed and when existing systems are being improved.
Figure 2, below, displays how the cube may be deployed in a modern architecture decomposing services (sometimes called microservices architecture), cloning services and data sources and segmenting similar objects like customers into “pods”.
Scaling with the X Axis of the Scale Cube
The most commonly used approach of scaling a solution is by running multiple identical copies of the application behind a load balancer also known as X-axis scaling. That’s a great way of improving the capacity and the availability of an application.
When using X-axis scaling each server runs an identical copy of the service (if disaggregated) or monolith. One benefit of the X axis is that it is typically intellectually easy to implement and it scales well from a transaction perspective. Impediments to implementing the X axis include heavy session related information which is often difficult to distribute or requires persistence to servers – both of which can cause availability and scalability problems. Comparative drawbacks to the X axis is that while intellectually easy to implement, data sets have to be replicated in their entirety which increases operational costs. Further, caching tends to degrade at many levels as the size of data increases with transaction volumes. Finally, the X axis doesn’t engender higher levels of organizational scale.
Figure 3 explains the pros and cons of X axis scalability, and walks through a traditional 3 tier architecture to explain how it is implemented.
Scaling with the Y Axis of the Scale Cube
Y-axis scaling (think services oriented architecture, microservices or functional decomposition of a monolith) focuses on separating services and data along noun or verb boundaries. These splits are “dissimilar” from each other. Examples in commerce solutions may be splitting search from browse, checkout from add-to-cart, login from account status, etc. In implementing splits, Y-axis scaling splits a monolithic application into a set of services. Each service implements a set of related functionalities such as order management, customer management, inventory, etc. Further, each service should have its own, non-shared data to ensure high availability and fault isolation. Y axis scaling shares the benefit of increasing transaction scalability with all the axes of the cube.
Further, because the Y axis allows segmentation of teams and ownership of code and data, organizational scalability is increased. Cache hit ratios should increase as data and the services are appropriately segmented and similarly sized memory spaces can be allocated to smaller data sets accessed by comparatively fewer transactions. Operational cost often is reduced as systems can be sized down to commodity servers or smaller IaaS instances can be employed.
Figure 4 explains the pros and cons of Y axis scalability and shows a fault-isolated example of services each of which has its own data store for the purposes of fault-isolation.
Scaling with the Z Axis of the Scale Cube
Whereas the Y axis addresses the splitting of dissimilar things (often along noun or verb boundaries), the Z-axis addresses segmentation of “similar” things. Examples may include splitting customers along an unbiased modulus of customer_id, or along a somewhat biased (but beneficial for response time) geographic boundary. Product catalogs may be split by SKU, and content may be split by content_id. Z-axis scaling, like all of the axes, improves the solution’s transactional scalability and if fault isolated it’s availability. Because the software deployed to servers is essentially the same in each Z axis shard (but the data is distinct) there is no increase in organizational scalability. Cache hit rates often go up with smaller data sets, and operational costs generally go down as commodity servers or smaller IaaS instances can be used.
Figure 5 explains the pros and cons of Z axis scalability and displays a fault-isolated pod structure with 2 unique customer pods in the US, and 2 within the EU. Note, that an additional benefit of Z axis scale is the ability to segment pods to be consistent with local privacy laws such as the EU’s GDPR.
Like Goldilocks and the three bears, the goal of decomposition is not to have services that are too small, or services that are too large but to ensure that the system is “just right” along the dimensions of scale, cost, availability, time to market and response times.
AKF Partners has helped hundreds of companies, big and small, employ the AKF Scale Cube to scale their technology product solutions. We developed the cube in 2007 to help clients scale their products and have been using it since to help some of the greatest online brands of all time thrive and succeed. For those interested in “time travel”, here are the original 2 posts on the cube from 2007: Application Cube, Database Cube
Subscribe to the AKF Newsletter
April 10, 2018 | Posted By: Marty Abbott
The decomposition of monoliths into services, or alternatively the development of new products in a services-oriented fashion (oftentimes called microservices), is one of the greatest architectural movements of the last decade. The benefits of a services (alternatively microservices or micro-services) approach are clear:
- Independent deployment, decreasing time to market and decreasing time to value realization– especially when continuous delivery is employed.
- Team velocity and ownership (informed by Conway’s Law).
- Increased fault isolation – but only when properly deployed (see below).
- Individual scalability – and the decreasing cost of operations that entails when properly architected.
- Freedom of implementation and technology choices – choosing the best solution for each service rather than subjecting services to the lowest common denominator implementation.
Unfortunately, without proper architectural oversight and planning, improperly architected services can also result in:
- Lower overall availability, especially when those services are deployed in one of a handful of microservice anti-patterns like the mesh, services in depth (aka the Christmas Tree Light String) and the Fuse.
- Higher (longer) response times to end customers.
- Complicated fault isolation and troubleshooting that increases average recovery time for failures.
- Service bloat: Too many services to comprehend (see our service sizing post)
The following are patterns companies should avoid (anti-patterns) when developing services or microservices architectures:
Mesh architectures, where individual services both “fan out” and “share” subsequent services result in the lowest possible availability.
Services that are strung together in long (deep) call trees suffer from low availability and slow page response times as calculated from the product of each service offering availability.
The Fuse is a much smaller anti-pattern than “The Mesh”. In “The Fuse”, 2 distinct services (A and B) rely on service C. Should service C become slow or unavailable, both service A and B suffer.
Architecture Principle: Services – Broad, But Never Deep
These services anti-patterns protect against a lack of fault isolation, where slowness and failures propagate along a synchronous path. One service fails, and the others relying upon that service also suffer.
They also serve to guard against longer latency in call streams. While network calls tend to be minimal relative to total customer response times, many solutions (e.g. payment solutions) need to respond as quickly as possible and service calls slow that down.
Finally, these patterns help protect against difficult to diagnose failures. The Xmas Tree pattern name is chosen because of the difficulty in finding the “failed bulb” in old tree lights wired in series. Similarly, imagine attempting to find the fault in “The Mesh”. The time necessary to find faults negatively effects service restoration time and therefore availability.
As such, we suggest a principal that services should never be deep but instead should be deployed in breadth along product offering boundaries defined by nouns (resources like “customer” or “sales”) or verbs (services like “search” or “add to cart”). We often call this approach “slices instead of layers”.
How then do we accomplish the separation of software for team ownership, and time to market where a single service would otherwise be too large or unwieldy?
Old School – Libraries!
When you need service-like segmentation in a deep call tree but can’t suffer the availability impact and latency associated with multiple calls, look to libraries. Libraries will both eliminate the network associated latency of a service call. In the case of both The Fuse and The Mesh libraries eliminate the shared availability constraints. Unfortunately, we still have the multiplicative effect of failure of the Xmas Tree, but overall it is a faster pattern.
“But My Teams Can’t Release Separately!”
Sure they can – they just have to change how they think about releasing. If you need immediate effect from what you release and don’t want to release the calling services with libraries compiled or linked, consider performing releases with shared objects or dynamically loadable libraries. While these require restarts of the calling service, simple automation will help you keep from having an outage for the purpose of deploying software.
AKF Partners helps companies architecture highly available, highly scalable microservice architecture products. We apply our aggregate experience, proprietary models, patterns, and anti-patterns to help ensure your products can meet your company’s scale and availability goals. Contact us today - we can help!
Subscribe to the AKF Newsletter
March 12, 2018 | Posted By: Dave Swenson
More and more companies are waking up from the 20th century, realizing that their on-premise, packaged, waterfall paradigms no longer play in today’s SaaS, agile world. SaaS (Software as a Service) has taken over, and for good reason. Companies (and investors) long for the higher valuation and increased margins that SaaS’ economies of scale provide. Many of these same companies realize that in order to fully benefit from a SaaS model, they need to release far more frequently, enhancing their products through frequent iterative cycles rather than massive upgrades occurring only 4 times a year. So, they not only perform a ‘lift and shift’ into the cloud, they also move to an Agile PDLC. Customers, tired of incurring on-premise IT costs and risks, are also pushing their software vendors towards SaaS.
SaaS Migration is About More Than Just Technology – It is An Organization Reboot
But, what many of the companies migrating to SaaS don’t realize is that migrating to SaaS is not just a technology exercise. Successful SaaS migrations require a ‘reboot’ of the entire company. Certainly, the technology organization will be most affected, but almost every department in a company will need to change. Sales teams need to pitch the product differently, selling a leased service vs. a purchased product, and must learn to address customers’ typical concerns around security. The role of professional services teams in SaaS drastically changes, and in most cases, shrinks. Customer support personnel should have far greater insight into reported problems. Product management in a SaaS world requires small, nimble enhancements vs. massive, ‘big-bang’ upgrades. Your marketing organization will potentially need to target a different type of customer for your initial SaaS releases - leveraging the Technology Adoption Lifecycle to identify early adopters of your product in order to inform a small initial release (Minimum Viable Product).
It is important to recognize the risks that will shift from your customers to you. In an on-premise (“on-prem”) product, your customer carries the burden of capacity planning, security, availability, disaster recovery. SaaS companies sell a service (we like to say an outcome), not just a bundle of software. That service represents a shift of the risks once held by a customer to the company provisioning the service. In most cases, understanding and properly addressing these risks are new undertakings for the company in question and not something for which they have the proper mindset or skills to be successful.
This company-wide reboot can certainly be a daunting challenge, but if approached carefully and honestly, addressing key questions up front, communicating, educating, and transparently addressing likely organizational and personnel changes along the way, it is an accomplishment that transforms, even reignites, a company.
This is the first in a series of articles that captures AKF’s observations and first-hand experiences in guiding companies through this process.
Don’t treat this as a simple rewrite of your existing product –
Answer these questions first…
Any company about to launch into a SaaS migration should first take a long, hard look at their current product, determining what out of the legacy product is not worth carrying forward. Is all of that existing functionality really being used, and still relevant? Prior to any move towards SaaS, the following questions and issues need to be addressed:
Customization or Configuration?
SaaS efficiencies come from many angles, but certainly one of those is having a single codebase for all customers. If your product today is highly customized, where code has been written and is in use for specific customers, you’ve got a tough question to address. Most product variances can likely be handled through configuration, a data-driven mechanism that enables/disables or otherwise shapes functionality for each customer. No customer-specific code from the legacy product should be carried forward unless it is expected to be used by multiple clients. Note that this shift has implications on how a sales force promotes the product (they can no longer promise to build whatever a potential customer wants, but must sell the current, existing functionality) as well as professional services (no customizations means less work for them).
Many customers, even those who accept the improved security posture a cloud-hosted product provides over their own on-premise infrastructure, absolutely freak when they hear that their data will coexist with other customers’ data in a single multi-tenant instance, no matter what access management mechanisms exist. Multi-tenancy is another key to achieving economies of scale that bring greater SaaS efficiencies. Don’t let go of it easily, but if you must, price extra for it.
Who Owns the Data?
Many products focus only on the transactional set of functionality, leaving the analytics side to their customers. In an on-premise scenario, where the data resides in the customers’ facilities, ownership of the data is clear. Customers are free to slice & dice the data as they please. When that data is hosted, particularly in a multi-tenant scenario where multiple customers’ data lives in the same database, direct customer access presents significant challenges. Beyond the obvious related security issues is the need to keep your customers abreast of the more frequent updates that occur with SaaS product iterations. The decision is whether you replicate customer data into read-only instances, provide bulk export into their own hosted databases, or build analytics into your product?
All of these have costs - ensure you’re passing those on to your customers who need this functionality.
May I Upgrade Now?
Today, do your customers require permission for you to upgrade their installation? You’ll need to change that behavior to realize another SaaS efficiency - supporting of as few versions as possible. Ideally, you’ll typically only support a single version (other than during deployment). If your customers need to ‘bless’ a release before migrating on to it, you’re doing it wrong. Your releases should be small, incremental enhancements, potentially even reaching continuous deployment. Therefore, the changes should be far easier to accept and learn than the prior big-bang, huge upgrades of the past. If absolutely necessary, create a sandbox for customers to access new releases, but be prepared to deal with the potentially unwanted, non-representative feedback from the select few who try it out in that sandbox.
Wait? Who Are We Targeting?
All of the questions above lead to this fundamental issue: Are tomorrow’s SaaS customers the same as today’s? The answer? Not necessarily. First, in order to migrate existing customers on to your bright, shiny new SaaS platform, you’ll need to have functional parity with the legacy product. Reaching that parity will take significant effort and lead to a big-bang approach. Instead, pick a subset or an MVP of existing functionality, and find new customers who will be satisfied with that. Then, after proving out the SaaS architecture and related processes, gradually migrate more and more functionality, and once functional parity is close, move existing customers on to your SaaS platform.
To find those new customers interested in placing their bets on your initial SaaS MVP, you’ll need to shift your current focus on the right side of the Technology Adoption Lifecycle (TALC) to the left - from your current ‘Late Majority’ or ‘Laggards’ to ‘Early Adopters’ or ‘Early Majority’. Ideally, those customers on the left side of the TALC will be slightly more forgiving of the ‘learnings’ you’ll face along the way, as well as prove to be far more valuable partners with you as you further enhance your MVP.
The key is to think out of the existing box your customers are in, to reset your TALC targeting and to consider a new breed of customer, one that doesn’t need all that you’ve built, is willing to be an early adopter, and will be a cooperative partner throughout the process.
Our next article on SaaS migration will touch on organizational approaches, particularly during the build-out of the SaaS product, and the paradigm shifts your product and engineering teams need to embrace in order to be successful.
AKF has led many companies on their journey to SaaS, often getting called in as that journey has been derailed. We’ve seen the many potholes and pitfalls and have learned how to avoid them. Let us help you move your product into the 21st century. See our SaaS Migration service
Subscribe to the AKF Newsletter
September 5, 2017 | Posted By: Roger Andelin
Last month, a bot developed by OpenAI (co-founded by Elon Musk) beat the world’s best, pro Dota 2 players. This is another milestone accomplishment in the field of artificial intelligence and machine learning and more fuel for the fire of concerns surrounding the AI debate. However, before we jump into that debate, here is some background you should understand about the technology fueling this debate.
The Evolution of Traditional Programming
A lot of what computer programming is can be simplified into three steps. First step, read in some data. Second step, do something with that data. Third step, output some result.
For example, imagine you want to fly somewhere for the weekend. You may first go to your travel app and input some dates, times, number of people traveling, airports, etc. Second, the app uses that information to search its database of available flights. Third, it returns a list of available flights for you to see.
This approach to software design has been the norm since the earliest days of programming. Artificial intelligence, in particular machine learning, has changed that approach. The first step is still the same: Read in some data. The third step is the same: Output some result.
However, with artificial intelligence technologies like machine learning, the second step, doing something with the data, is very different. In the example of finding a flight, a programmer easily can read the software code to understand the sequence of steps the computer has been programmed to do to produce the output data. If the programmer wants to change or improve the program’s behavior she can do that by writing new code or by altering the existing code. For example, if you wanted to compare the prices for available flights near the dates you have selected, a programmer can easily change several lines of code in the program to do just that. The programming code identifies every step the computer takes to arrive at its output. Said another way, the program only does what it’s specifically told to do in the code, nothing more or nothing less.
By contrast, the output of today’s most common machine learning programs is not determined by instructions written in computer code. There is no code for a programmer to read or modify when a change is desired. The output is determined by the program’s neural network.
Neural Networks in Action
What is a neural network? At the core of a neural network is a neuron. Similar to a traditional computer program, a neuron takes some input data, does a mathematical calculation on that data and then outputs some data. A typical neuron in a neural network will receive as input hundreds to thousands of numbers, typically between 0 and 1. A neuron will then multiply each number by a weight and sum the result of all the numbers. Many neurons will then convert the result into a number between 0 and 1. That result is then sent to the next neuron in sequence until the final output neuron is reach.
Here is an example of the math a typical neuron will do: If “x1, x2, x3…” represents input data and “w1, w2, w3…” represent the weights stored in the neuron, the calculation done by the neuron in a neural network looks like this: x1*w1 + x2*w2 + x3*w3 and so forth.
You can think of the calculation inside the neuron in a different way: The neuron is reading in a bunch of numbers and the weights in the neuron determine the importance, “or weight” of that input in producing the output. If the input is not important the weight for the input will be near zero and the input is not passed along to the next neuron. Therefore, the weights in a neuron effectively decided what input is valuable and what input should be ignored.
In a neural network, neurons like the one I described above are connected in parallel and in series to create a matrix of neurons. The input data to a neural network will go into hundreds or thousands of neurons in parallel, all with different weights. The output of those neurons is then sent to another layer of neurons and so forth, usually multiple layers deep. This is called a deep neural network. Another way to look at this is the neurons are grouped into a matrix of rows and columns, all interconnected. The final layer of the neural network is the output layer. Therefore, the final output of a neural network is the result of millions of calculations done by the neurons of that network.
When a programmer creates a neural network in software, the weights for each neuron are initially just random numbers. In other words, the weights arbitrarily decide to either diminish, increase or leave the input data alone, and output from the network is random. However, through a process called training, the weights move from randomly assigned values to values that can produce very useful outputs.
Training is both a time consuming and complicated mathematical process. However, it is much like training you and I would do to get better at something. For example, let’s say I wanted to learn how to shoot and arrow with a bow. I might pick up the bow and arrow, point it at the target, pull back the string and release. In my case, I know the arrow would miss the target. Therefore, I would try again and again making corrections to my aim based on on how far and which direction I was off from the target.
During the training process for a neural network, the weights in each of the neurons are changed slightly to improve the output, or “aim.” The most common approach for making those changes is called backpropagation. Backpropagation is a mathematical approach for applying corrections to every weight in every neuron of the network. During training, input is fed into the network and output is generated. The output is compared to the desired target and the difference between the output and the target is the error. Using the error, backpropagation makes changes to the weights in each neuron to reduce the error. If all goes well during training and backpropagation, the output error diminishes until it reaches expert or better than expert level.
AI vs Humans
In the case of the OpenAI Dota bot that recently beat the world’s best Dota 2 player, the outputs, which were a sequence of steps, strategies and decisions, went from random moves to moves that were so good the bot was able to easily defeat the best pro players in the world. The critical information that enabled the bot to win its matches was stored in the weights of the neurons and the neural network architecture itself.
A good question at this point is to ask if a programmer looking at the Dota 2 bot’s neural network could understand the steps taken by the bot to beat the human player. The answer is no. A programmer can see areas of the neural network that influence an output but it is not possible to explain why the bot took specific steps to formulate its moves and strategies. All the programmer would see is a huge matrix of weights that would be quite overwhelming to interpret.
Another good question to ask is whether or not a program written traditionally by a programmer with step by step instructions could beat the best Dota 2 player. The answer is no. Step by step programs where the programmer specifically instructs the computer to do something would easily be defeated by a professional player. However, a neural network can learn from training things that a programmer would never have the knowledge to program, store that learning in its neurons and use that learning to do things like defeat a human pro.
What makes the Dota 2 bot special is that it learned to beat the best pro players by playing against itself whereas most machine learning programs learn from training on data given to it by a programmer. In machine learning, good training data is like gold. It’s scarce and valuable. (note: This is one reason why Google and other big tech companies want to collect so much data.) Data is used to train neural networks to do useful things like recognizing people and places in your pictures or recognizing your voice from others in your family. OpenAI built a bot that learned almost entirely by playing against itself with the exception of some coaching provided by the OpenAI team. OpenAI has shown clearly that learning can occur without having tons of training data. It’s a little like being able to make gold.
Does the development of the OpenAI dota bot mean bots can now decide to train against themselves and become super bots? No. But it does say that humans can now program two bots to train against each other to become superbots. The key enabler being us. It’s anyone’s guess what type of bot can be imagined and developed in this way, useful or harmful. Obviously to most, a gaming superbot seems pretty innocuous, except of course to the gamer who may unexpectedly run into one during a match. However, it’s not hard to imagine super bots that are not so harmless. Or, perhaps you can just imagine a time when someone trains a bot to play football against itself until the bot becomes better at calling plays and strategy than every coach in the NFL. What happens then? The answer is disruption. Are you ready for it?
AKF Partners recommends that boards and executives direct their teams to identify sources of innovation and patterns of disruption that AI techniques may represent within their respective markets Walmart is already working on facial recognition technology in their stores to determine whether or not shoppers are satisfied at checkout. Will this give them a potential advantage over Amazon? How can machine learning and AI help you prevent fraud in your payment systems or the use of your commerce system to launder money?
AKF is prepared to help answer that question and others you may be facing. We will help you craft your AI strategy, sort through the hype, help you find the opportunities, and identify the potential threats of AI technology to your business.
Reach out to AKF
Subscribe to the AKF Newsletter
April 3, 2017 | Posted By: AKF
It seems that everyone is on the microservice architecture bus these days (splits by the Y axis on the Scale Cube). One question we commonly receive as companies create their first microservice solution, or as they transition from a monolithic architecture to a microservice architecture is “How large should any of my services be?”
To help answer these questions, we’ve put together a list of considerations based on developer throughput, availability, scalability, and cost. By considering these, you can decide if your application should be grouped into a large, monolithic codebases or split up into smaller individual services and swim lanes. You must also keep in mind that splitting too aggressively can be overly costly and have little return for the effort involved. Companies with little to no growth will be better served focusing their resources on developing a marketable product than by fine tuning their service sizes using the considerations below.
Frequency of Change – Services with a high rate of change in a monolithic codebase cause competition for code resources and can create a number of time to market impacting conflicts between teams including product merge conflicts. Such high change services should be split off into small granular services and ideally placed in their own fault isolative swim lane such that the frequent updates don’t impact other services. Services with low rates of change can be grouped together as there is little value created from disaggregation and a lower level of risk of being impacted by updates.
The diagram below illustrates the relationship we recommend between functionality, frequency of updates, and relative percentage of the codebase. Your high risk, business critical services should reside in the upper right portion being frequently updated by small, dedicated teams. The lower risk functions that rarely change can be grouped together into larger, monolithic services as shown in the bottom left.
Degree of Reuse – If libraries or services have a high level of reuse throughout the product, consider separating and maintaining them apart from code that is specialized for individual features or services. A service in this regard may be something that is linked at compile time, deployed as a shared dynamically loadable library or operates as an independent runtime service.
Team Size – Small, dedicated teams can handle microservices with limited functionality and high rates of change, or large functionality (monolithic solutions) with low rates of change. This will give them a better sense of ownership, increase specialization, and allow them to work autonomously. Team size also has an impact on whether a service should be split. The larger the team, the higher the coordination overhead inherent to the team and the greater the need to consider splitting the team to reduce codebase conflict. In this scenario, we are splitting the product up primarily based on reducing the size of the team in order to reduce product conflicts. Ideally splits would be made based on evaluating the availability increases they allow, the scalability they enable or how they decrease the time to market of development.
Specialized Skills – Some services may need special skills in development that are distinct from the remainder of the team. You may for instance have the need to have some portion of your product run very fast. They in turn may require a compiled language and a great depth of knowledge in algorithms and asymptotic analysis. These engineers may have a completely different skillset than the remainder of your code base which may in turn be interpreted and mostly focused on user interaction and experience. In other cases, you may have code that requires deep domain experience in a very specific area like payments. Each of these are examples of considerations that may indicate a need to split into a service and which may inform the size of that service.
Availability and Fault Tolerance Considerations:
Desired Reliability – If other functions can afford to be impacted when the service fails, then you may be fine grouping them together into a larger service. Indeed, sometimes certain functions should NOT work if another function fails (e.g. one should not be able to trade in an equity trading platform if the solution that understands how many equities are available to trade is not available). However, if you require each function to be available independent of the others, then split them into individual services.
Criticality to the Business – Determine how important the service is to business value creation while also taking into account the service’s visibility. One way to view this is to measure the cost of one hour of downtime against a day’s total revenue. If the business can’t afford for the service to fail, split it up until the impact is more acceptable.
Risk of Failure – Determine the different failure modes for the service (e.g. a billing service charging the wrong amount), what the likelihood and severity of each failure mode occurring is, and how likely you are to detect the failure should it happen. The higher the risk, the greater the segmentation should be.
Scalability of Data – A service may be already be a small percentage of the codebase, but as the data that the service needs to operate scales up, it may make sense to split again.
Scalability of Services – What is the volume of usage relative to the rest of the services? For example, one service may need to support short bursts during peak hours while another has steady, gradual growth. If you separate them, you can address their needs independently without having to over engineer a solution to satisfy both.
Dependency on Other Service’s Data – If the dependency on another service’s data can’t be removed or handled with an asynchronous call, the benefits of disaggregating the service probably won’t outweigh the effort required to make the split.
Effort to Split the Code – If the services are so tightly bound that it will take months to split them, you’ll have to decide whether the value created is worth the time spent. You’ll also need to take into account the effort required to develop the deployment scripts for the new service.
Shared Persistent Storage Tier – If you split off the new service, but it still relies on a shared database, you may not fully realize the benefits of disaggregation. Placing a readonly DB replica in the new service’s swim lane will increase performance and availability, but it can also raise the effort and cost required.
Network Configuration – Does the service need its own subdomain? Will you need to make changes load balancer routing or firewall rules? Depending on the team’s expertise, some network changes require more effort than others. Ensure you consider these changes in the total cost of the split.
The illustration below can be used to quickly determine whether a service or function should be segmented into smaller microservices, be grouped together with similar or dependent services, or remain in a multifunctional, infrequently changing monolith.
Subscribe to the AKF Newsletter
April 3, 2017 | Posted By: AKF
The most common point of congestion and therefore barrier to scale that we see in our practice is the database. Referring back to our earlier article “Splitting Applications or Services for Scale”, it is very common for engineers to create scalability along the X axis of our cube by persisting data in a single monolithic database and having multiple “cloned” applications servers retrieve and store data within that database. For young companies this is a very good approach as if done properly it will also eliminate the need for persistence or affinity to a given application server and as a result will increase customer perceived availability.
The problem, however, with this single monolithic data structure is threefold:
- Even with clustering technology (the existence of a second physical system or database that can take the load of the first in the event of failure), failures of the primary database will result in short service outages for 100% of the user community.
- This approach ultimately relies solely on technical improvements in cpu speed, memory access speed, memory access size, mass
storage access speeds and size, etc to insure the companies needs for scale.
- Relying upon (2) above in the extreme cases is not the most cost effective solutions as the newest and fastest technologies come at
a premium to older generations of technology and do not necessarily have the same processing power per dollar as older and/or
smaller (fewer cpus etc) systems.
As we have argued in the aforementioned post, a great engineering team will think about how to scale their platform well in advance of the need to rely solely upon partner technology advances. By making small modifications to our previously presented “Scale Cube”, the same concepts applied to the problem of splitting services for scale can be useful in addressing how to split a database for scale. As with the AKF Services Scale Cube, the AKF Database Scale Cube consists of an X, Y and Z axes – each addressing a different approach to scale transactions applied to a database. The lowest left point of the cube (coordinates X=0, Y=0 and Z=0) represents the worst case monolithic database – a case where all data is located in a single location and all accesses go to this single database.
The X Axis of the cube represents a means of spreading load across multiple instances of a replicated representation of the data. This is the first approach most companies use in scaling databases and is often both the easiest to implement and the least costly in both engineering time and hardware. Many third party and open source databases have native properties or functions that will allow the near real time replication of data to multiple “read databases”. The engineering cost of such an approach is low as typically database calls only need to be identified as a “read” or “write” and sent to the appropriate write database or bank of read databases. The “bank” of read databases should have reads evenly split across this if possible and many companies employ simple 3d party load balancers to perform this distribution.
Included in our Xaxis split are third party and open source caching solutions that allow reads to be split across “cache” hosts before actually reading from a database upon a cache miss. Caching is another simple way to reduce the load on the database but in our experience is not sufficient for hyper growth SaaS sites. If implemented properly, this Xaxis split also can increase availability as if replication is near real time, a read server can be promoted as the singular “write server” in the event of a “write server” failure. The combination of caching and read/write splits (our X axis) is sufficient for many companies but for companies with extreme hyper growth and massive data retention needs it is often not enough.
The Y Axis of our database cube represents a split by function, service or resource just as it did with the service cube. A service might represent a set of usecases and is most often easiest to envision through thinking of it as a verb or action like “login” and a resource oriented split is easiest to envision by thinking of splits as nouns like “account information”. These splits help handle not only the split of transactions across multiple systems as did the X axis, but can also be helpful in speeding up database calls by allowing more information specific to the request to be held in memory rather than needing to make a disk access. Just as with our approach in scaling services, our recommended approach to identify the order in which these splits should be accomplished is to determine which ones will give you the greatest “headroom” or capacity “runway” for the least amount of work. These splits often come at a higher cost to the engineering team as very often they will require that the application be split up as well. It is possible to take a monolithic application and perform physical splits by say URL/URI to different service or resource oriented pools. While this approach will help spread transaction processing across multiple systems similar to our X axis implementation it may not offer the added benefit of reducing the amount of system memory required by service / pool / resource / application. Another reason to consider this type of split in very large teams is to dedicate separate engineering teams to focus on specific services or resources in order to reduce your application learning curve, increase quality, decrease time to market (smaller code bases), etc. This type of split is often referred to as “swimlaning” an application and data set, especially when both the database and applications are split to represent a “failure domain” or fault isolative infrastructure.
The Z Axis represents ways to split transactions by performing a lookup, a modulus or other indiscriminate function (hash for instance). The most common way to view this is to consider splitting your resources by customer if your entity relationships allow that to happen. In the world of media, you might consider splitting it by article_id or media_id and in the world of commerce a split by product_id might be appropriate. In the case where you split customers from your products and perform splits within customers and products you would be implementing both a Y axis split (splitting by resource or call – customers and products) and a Z axis split (a
modulus of customers and products within their functional splits).
Z axis splits tend to be the most costly for an engineering team to perform as often many functions that might be performed within the database (joins for instance) now need to be performed within the application. That said, if done appropriately they represent the greatest potential for scale for most companies.
Subscribe to the AKF Newsletter
April 3, 2017 | Posted By: AKF
Splitting Applications or Services for Scale
Most internet enabled products start their life as a single application running on an appserver or appserver/webserver combination and potentially communicating with a database. Many if not all of the functions are likely to exist within a monolithic application code base making use of the same physical and virtual resources of the system upon which the functions operate: memory, cpu, disk, network interfaces, etc. Potentially the engineers have the forethought to make the system highly available by positioning a second application server in the mix to be used in the event that the first application server fails.
This monolithic design will likely work fine for many sites that receive low levels of traffic. However, if the product is very successful and receives wide and fast adoption user perceived response times are likely to significantly degrade to the point that the product is almost entirely unusable. At some point, the system will likely even fail under the load as the inbound request rate is significantly greater than the processing power of the system and the resulting departure rate of responses to requests.
A great engineering team will think about how to scale their platform well in advance of such a catastrophic failure. There are many ways to approach how to think about such scalability of a platform and we present several through a representation of a three dimensional cube addressing three approaches to scale that we call the AKF Scale Cube.
The AKF Scale Cube (aka Scale Cube and AKF Cube) consists of an X, Y and Z axes – each addressing a different approach to scale a service. The lowest left point of the cube (coordinates X=0, Y=0 and Z=0) represents the worst case monolithic service or product identified above: a product wherein all functions exist within a single code base on a single server making use of that server’s finite resources of memory, cpu speed, network ports, mass storage, etc.
The X Axis of the cube represents a means of spreading load across multiple instances of the same application and data set. This is the first approach most companies use to scale their services and it is effective in scaling from a request per second perspective. Oftentimes it is sufficient to handle the scale needs of a moderate sized business. The engineering cost of such an approach is low compared to many of the other options as no significant rearchitecting of the code base is required unless the engineering team needs to eliminate affinity to a specific server because the application maintains state. The approach is simple: clone the system and service and allow it to exist on N servers with each server handling 1/Nth the total requests. Ideally the method of distribution is a loadbalancer configured in a highly available manner with a passive peer that becomes active should the active peer fail as a result of hardware or software problems. We do not recommend leveraging roundrobin DNS as a method of load balancing. If the application does maintain state there are various ways of solving this including a centralized state service, redesigning for statelessness, or as a last resort using the load balancer to provide persistent connections. While the Xaxis approach is sufficient for many companies and distributes the processing of requests across several hosts it does not address other potential bottlenecks like memory constraints where memory is used to cache information or results.
The Y Axis of the cube represents a split by function, service or resource. A service might represent a set of usecases and is most often easiest to envision through thinking of it as a verb or action like “login” and a resource oriented split is easiest to envision by thinking of splits as nouns like “account information”. These splits help handle not only the split of transactions across multiple systems as did the X axis, but can also be helpful in reducing or distributing the amount of memory dedicated to any given application across several systems. A recommended approach to identify the order in which these splits should be accomplished is to determine which ones will give you the greatest “headroom” or capacity “runway” for the least amount of work. These splits often come at a higher cost to the engineering team as very often they will require that the application be split up as well. As a quick first step, a monolithic application can be placed on multiple servers and dedicate certain of those servers to specific “services” or URIs. While this approach will help spread transaction processing across multiple systems similar to our X axis implementation it may not offer the added benefit of reducing the amount of system memory required by service/pool/resource/application. Another reason to consider this type of split in very large teams is to dedicate separate engineering teams to focus on specific services or resources in order to reduce your application learning curve, increase quality, decrease time to market (smaller code bases), etc. This type of split is often referred to as
“swimlaning” an application.
The Z Axis represents ways to split transactions by performing a lookup, a modulus or other indiscriminate function (hash for instance). As with the Y axis split, this split aids not only fault isolation, but significantly reduces the amount of memory necessary
(caching, etc) for most transactions and also reduces the amount of stabile storage to which the device/service needs attach. In this case, you might try a modulus by content id (article), or listing id, or a hash from the received IP address, etc. The Z axis split is often the most costly of all splits and we only recommend it for clients that have hypergrowth or very high rates of transaction. It should only be used after a company has implemented a very granular split along the Y axis. That said, it also can offer the greatest degree of scalability as the number of “swimlanes within swimlanes” that it creates is virtually limitless. For instance, if a company implements a Z axis split as a modulus of some transaction id and the implementation is a configurable number “N”, then N can be 10, 100, 1000, etc and each order of magnitude increase in N creates nearly an order of magnitude of greater scale for the company.
Subscribe to the AKF Newsletter
‹ First < 2 3 4