May 2, 2018 | Posted By: AKF
Our post on the AKF Scale Cube made reference to a concept that we call “Fault Isolation” and sometimes “Swim lanes” or “Swim-laned Architectures”. We sometimes also call “swim lanes” fault isolation zones or fault isolated architecture.
Fault Isolation Defined
A “Swim lane” or fault isolation zone is a failure domain. A failure domain is a group of services within a boundary such that any failure within that boundary is contained within the boundary and the failure does not propagate or affect services outside of said boundary. Think of this as the “blast radius” of failure meant to answer the question of “What gets impacted should any service fail?” The benefit of fault isolation is twofold:
1) Fault Detection: Given a granular enough approach, the component of availability associated with the time to identify the failure is significantly reduced. This is because all effort to find the root cause or failed component is isolated to the section of the product or platform associated with the failure domain. Once something breaks, because the failure is limited in scope, it can be more rapidly identified and fixed. Recovery time objectives (RTO) are subsequently decreased which increases overall availability.
2) Fault Isolation: As stated previously, the failure does not propagate or cause a deterioration of other services within the platform. The “blast radius” of a failure is contained. As such, and depending upon approach, only a portion of users or a portion of functionality of the product is affected. This is akin to circuit breakers in your house - the breaker exists to limit the fault zone for any load that exceeds a limit imposed by the breaker. Failure propagation is contained by the breaker popping and other devices are not affected.
Architecting Fault Isolation
A fault isolated architecture is one in which each failure domain is completely isolated. We use the term “swim lanes” to depict the separations. In order to achieve this, ideally there are no synchronous calls between swim lanes or failure domains made pursuant to a user request. User initiated synchronous calls between failure domains are absolutely forbidden in this type of architecture as any user-initiated synchronous call between fault isolation zones, even with appropriate timeout and detection mechanisms, is very likely to cause a cascading series of failures across other domains. Strictly speaking, you do not have a failure domain if that domain is connected via a synchronous call to any other service in another domain, to any service outside of the domain, or if the domain receives synchronous calls from other domains or services. Again, “synchronous” is meant to identify a synchronous call (call, wait for a response) pursuant to any user request.
It is acceptable, but not advisable, to have asynchronous calls between domains and to have non-user initiated synchronous calls between domains (as in the case of a batch job collecting data for the purposes of reporting in another failure domain). If such a communication is necessary it is very important to include failure detection and timeouts even with the asynchronous calls to ensure that retries do not cause port overloads on any services. Here is an interesting blog post about runaway scripts and their impact on Apache, PHP, and MySQL.
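As a rough illustration of the timeout and retry guidance above, here is a minimal Python sketch of a cross-domain call that gives up after a bounded number of attempts. The function name `call_other_lane` and the parameters `timeout_s`, `max_attempts` and `backoff_s` are hypothetical and chosen only for this example; real values depend on your platform.

```python
import asyncio


async def call_other_lane(operation, timeout_s=0.5, max_attempts=3, backoff_s=0.1):
    """Run an async call into another failure domain with a timeout and a retry cap.

    `operation` is a zero-argument coroutine function (hypothetical).
    Returns its result, or None once all attempts fail, so the caller
    can degrade gracefully instead of retrying forever.
    """
    for attempt in range(max_attempts):
        try:
            # Bound how long any single attempt may wait on the other domain.
            return await asyncio.wait_for(operation(), timeout=timeout_s)
        except (asyncio.TimeoutError, ConnectionError):
            if attempt < max_attempts - 1:
                # Back off exponentially so retries do not pile onto a struggling service.
                await asyncio.sleep(backoff_s * (2 ** attempt))
    return None
```

The point of the cap is that a failing neighbor costs the caller a small, known amount of time and a small, known number of connections, rather than an unbounded stream of retries.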
As previously indicated, a swim lane should have all of its services located within the failure domain. For instance, if database [read/writes] are necessary, the database with all appropriate information for that swim lane should exist within the same failure domain as all of the application and webservers necessary to perform the function or functions of the swim lane. Furthermore, that database should not be used for other requests of service from other swim lanes. Our rule is one production database on one host.
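One way to keep this rule honest is to check an inventory of swim lanes for components that appear in more than one lane. The sketch below assumes a simple dictionary inventory; the lane names, host names and the `shared_components` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical inventory: swim lane name -> tier -> hosts used by that lane.
SWIM_LANES = {
    "login": {"web": ["login-web-1"], "app": ["login-app-1"], "db": ["login-db"]},
    "cart": {"web": ["cart-web-1"], "app": ["cart-app-1"], "db": ["cart-db"]},
}


def shared_components(lanes):
    """Return {host: sorted lane names} for every host used by more than one lane."""
    users = defaultdict(set)
    for lane, tiers in lanes.items():
        for hosts in tiers.values():
            for host in hosts:
                users[host].add(lane)
    return {host: sorted(names) for host, names in users.items() if len(names) > 1}
```

An empty result means no web server, app server or database is shared across lanes in the inventory; any entry in the result names a component that breaks the isolation.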
The figure below demonstrates the components of software and infrastructure that are typically fault isolated:
Rarely are shared higher level network components isolated (e.g. border systems and core routers).
Sometimes, if practical, firewalls and load balancers are isolated. This is especially the case under very high demand situations where a single pair of devices simply wouldn’t meet the demand.
The remainder of solutions are always isolated, with web-servers, top of rack switches (in non IaaS implementations), compute (app servers) and storage all being properly isolated.
Applying Fault Isolation with AKF’s Scale Cube
As we have indicated with our Scale Cube in the past, there are many ways in which to think about swim laned architectures. Swim lanes can be isolated along the axes of the Scale Cube as shown below with AKF’s circuit breaker analogy to fault isolation.
Fault isolation in the X-axis would mean replicating everything for high availability - and performing the replication asynchronously and in an eventually consistent (rather than a consistent) fashion. For example, when a data center fails the fault will be isolated to the one failed data center or multiple availability zones. This is common with traditional disaster recovery approaches, though we do not often advise it as there are better and more cost effective solutions for recovering from disaster.
Fault Isolation in the Y-axis can be thought of in terms of a separation of services e.g. “login” and “shopping cart” (two separate swim lanes) each having the web and app servers as well as all data stores located within the swim lane and answering only to systems within that swim lane. Each portion of a page is delivered from a separate service reducing the blast radius of a potential fault to its swim lane.
While purposely not legible (fuzzy) the fake example above shows different components of a fictional business account from a fictional bank. Components of the page are separated with one component showing a summary, another component displaying more detailed information and still other components showing dynamic or static links - each derived from properly isolated services.
Another approach would be to perform a separation of your customer base or a separation of your order numbers or product catalog. Assuming an indiscriminate function to perform this separation (like a modulus of id), such a split would be a Z axis swim lane along customer, order number or product id lines. More beneficially, if we are interested in fastest possible response times to customers, we may split along geographic boundaries. We may have data centers (or IaaS regions) serving the West and East Coasts of the US respectively, the “Fly-Over States” of the US, and regions serving the EU, Canada, Asia, etc. Besides contributing to faster perceived customer response times, these implementations can also help ensure we are compliant with data sovereignty laws unique to different countries or even states within the US.
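The modulus split mentioned above can be sketched in a few lines of Python. The function name `swim_lane_for` and the lane count are assumptions made for illustration only.

```python
def swim_lane_for(entity_id: int, lane_count: int) -> int:
    """Map a customer, order or product id to a swim lane index using a modulus."""
    if lane_count <= 0:
        raise ValueError("lane_count must be positive")
    # The modulus is indiscriminate: it ignores who the customer is or where they are.
    return entity_id % lane_count
```

With four lanes, for example, id 10 lands in lane 2 and id 7 in lane 3. Note that changing `lane_count` later remaps most ids, so a real split needs a migration plan.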
Combining the concepts of service and database separation into several fault isolative failure domains creates both a scalable and highly available platform. AKF has helped clients achieve high availability through fault isolation. Contact us to see how we can help you achieve the same fault tolerance.
AKF Partners helps companies create highly available, fault isolated solutions. Send us a note - we’d love to help you!
May 2, 2018 | Posted By: Marty Abbott
The rate of involuntary turnover for the senior executive running technology and engineering in a company is unfortunately high, with most senior technology executives lasting less than four years in a job. Our experience over the last 12 years, and with nearly 500 companies, indicates that nearly 50% of senior technology executives are at-risk with their peers around the CEO’s management team table.
The reasons why CEOs and their teams are concerned about their senior technology executive vary, but the trigger causes of concern cluster around 5 primary themes:
- Lack of Business Acumen – Doing the Wrong Things
- Failure to Lead and Inspire
- Failure to Manage and Execute
- Trapped in Today – Failure to Plan for Tomorrow
- Lack of Technical Knowledge - Doing Things the Wrong Way
Before digging into each of these, I believe it is informative to first investigate the sources (backgrounds) of most senior technology executives.
CTO and CIO Backgrounds
Within our dataset, there are 2 primary backgrounds from which most senior technology executives come:
1. Raised through the Technical Ranks
These executives have spent a majority of their working career in some element of product engineering or IT. They are very often promoted based on technical acumen, and a perception of being able to “get things done”. Often, they are technically gifted folks with technical or engineering undergraduate degrees, and sometimes they have a great deal of project management experience.
2. Co-Opted from the Business
These executives have spent a majority of their working career in a function other than technology. They may have been raised through marketing, sales or finance and have demonstrated an affinity for technology and some high-level understanding of its application. They rarely have a deep technical understanding and have done very little hands-on work within technology or engineering.
We see both backgrounds struggling with the chief technology role. Sometimes they fail for similar reasons, but often for very different reasons within our five primary causes.
Lack of Business Acumen – Doing the Wrong Things
This is by far the most common reason for failure for chief technologists “Raised through the technical ranks”. On the flip side, it is rarely in our experience the reason why executives “Co-Opted from the Business” find themselves in trouble.
The technologist’s peers often complain that they do not trust the executive in question and often cite a failure to present opportunities, fixes, and projects in “business terms”. Put frankly, a lack of understanding of business in general, and an inability (or lack of desire to) justify actions in meaningful business terms causes a lack of trust with the CIO/CTO’s peers.
Further, CTOs sometimes overly focus on what is “cool” rather than the things that create significant business and stakeholder value. Business peers complain that money and headcount are being spent without a justifiable return. An oft heard quote is “We don’t get why there are so many people working on things that don’t drive revenue.”
The fix for this is easy. Executives raised through the technology ranks should seek education and training in the fundamentals and language of business. A great way to do this is for the tech exec to get an MBA or attend an abbreviated MBA-like program such as those offered by Harvard Business School or Stanford. Many business courses focused on cost justification and the business financial statements are also available from online programs.
Failure to Lead and Inspire
This failure is most common for executives “raised through the technical ranks”. Comments from peers and CEOs of the CTO indicate that she struggles with creating a strategy that clearly supports the needs of the business. Further, individual contributors within the organization appear to be disconnected from the mission of the business and disengaged at work. Often, employees within such an organization will complain that the CIO/CTO requires that all major decisions go through her. Such employees and organizations often test low on work engagement and overall morale.
The cause of this failure is again often the result of promoting a person solely upon her technical capabilities. While there are many great technology leaders, leadership and technical acumen have very little in common. Some folks can be trained to appreciate the value and need for a compelling vision and to be inspirational, but alas some cannot.
The fix is to attempt training, but where the executive does not show the willingness or capability, a replacement may be necessary. Sometimes we find that mentoring from a former or current successful CIO/CTO helps. Coaching from a professional coach or leadership professional may also be helpful.
Failure to Manage and Execute
This failure is common for both those raised through the technical ranks, and those co-opted from the business. At the very heart of this failure is a perceived lack of execution on the part of the executive – specifically in getting things done. Often the cause is a lack of communication. Sometimes the cause is poor project management capabilities within the organization or a lack of management acumen within the organization.
For executives co-opted from the business, we find that they sometimes struggle in communicating effectively with the technical members of their team and as such don’t have a clear understanding of status. They may also struggle with the right questions to ask to either probe for current status or to help keep their teams on track.
Where the problem is the lack of technical understanding by non-technical executives running tech functions, the fix is to augment them with strong technical people who also speak the language of business (similar to the lack of business acumen failure). For the other failures, the fix is to build appropriate project management and oversight into product delivery. This may be adding skills to the team, or it may mean needing to infuse an element of valuing execution into the technology organization.
Trapped in Today – Failure to Plan for Tomorrow
This type of failure is common to executives of both backgrounds. In fact, the problem is common to leaders in all positions requiring a mix of both operational focus and forward-looking strategy development. The needs of running the day to day business often bias the executive to what is necessary to ensure that you make a day, month or quarter at the expense of what needs to happen for the future. As such, fast moving secular trends (past examples of which are IaaS, Agile, NoSQL, mobile adoption, etc.) get ignored today and become crises tomorrow.
There are many potential fixes for this, including ensuring that executives budget their time to create “space” for future planning. Organizationally, we may add someone in larger companies to focus on either operational needs or the needs of tomorrow. Additionally, we can bring in outside help and perspectives either in the form of new hires or consultants who have experience with emerging trends and technologies.
Lack of Technical Knowledge
This failure is owned almost entirely by folks with a non-technology background who find themselves running technical organizations. It manifests itself in multiple ways including not understanding what technologists are saying, not understanding the right questions to ask, and not fully understanding the ramification of various technical decisions. While a horrible position to be in, it is no more or less disastrous than lacking business acumen. In either scenario, we have an imperfect matching of an engine and the transmission necessary to gain benefit from that engine.
Unfortunately, this one is the hardest of all problems to fix. Whereas it is comparatively easy to learn to speak the language of business, the breadth and depth of understanding necessary to properly run a technology organization is not easily acquired. One need not be the best engineer to be successful, but one should certainly understand “how and why things work” to be able to properly evaluate risk and ask the right questions.
The fix is the mirror image of the fix for business acumen. No technology executive should be without deep technical understanding AND a clear understanding of business fundamentals.
Perhaps the most important point to learn from our experience in this area is to ensure that, regardless of your CTO/CIO’s background, you ensure he or she has both the technical and business skills necessary to do the job. Identify weaknesses through interviews and daily interactions and help the executive focus on shoring up these weaknesses through skill augmentation within their organization or through education, mentoring and coaching.
AKF Partners provides mentoring and coaching for both CTOs and CEOs to help ensure a successful partnership between these two key creators of company value. If your CEO or CTO has departed, we also provide interim leadership services.
April 27, 2018 | Posted By: Dave Swenson
Agile Software Development is a widely adopted methodology, and for good reason. When implemented properly, Agile can bring tremendous efficiencies, enabling your teams to move at their own pace, bringing your engineers closer to your customers, and delivering customer value quicker with less risk. Yet, many companies fall short of realizing the full potential of Agile, treating it merely as a project management paradigm by picking and choosing a few Agile structural elements such as standups or retrospectives without actually changing the manner in which product delivery occurs. Managers in an Agile culture often forget that they are indeed still managers that need to measure and drive improvements across teams.
All too often, Agile is treated solely as an SDLC (Software Development Lifecycle), focused only upon the manner in which software is developed versus a PDLC (Product Development Lifecycle) that leads to incremental product discovery and spans the entire company, not just the Engineering department.
Here are the five most common Agile failures that we see with our clients:
- Technology Executives Abdicate Responsibility for their Team’s Effectiveness
Management in an Agile organization is certainly different from, say, a Waterfall-driven one. More autonomy is provided to Agile teams. Leadership within each team typically comes without a ‘Manager’ title. Often, this shift from a top-down, autocratic, “Do it this way” approach to a grass-roots, bottom-up one sways way beyond desired autonomy towards anarchy, where teams have been given full freedom to pick their technologies, architecture, and even outcomes with no guardrails or constraints in place. See our Autonomy and Anarchy article for more on this.
Executives often become focused solely on the removal of barriers the team calls out, rather than leading teams towards desired outcomes. They forget that their primary role in the company isn’t to keep their teams happy and content, but instead to ensure their teams are effectively achieving desired business-related outcomes.
The Agile technology executive is still responsible for their teams’ effectiveness in reaching specified outcomes (e.g.: achieve 2% lift in metric Y). She can allow a team to determine how they feel best to reach the outcome, within shared standards (e.g.: unit tests must be created, code reviews are required). She can encourage teams to experiment with new technologies on a limited basis, then apply those learnings or best practices across all teams. She must be able to compare the productivity and efficiencies from one team to another, ensuring all teams are reaching their full potential.
- No Metrics Are Used
The age-old saying “If you can’t measure it, you can’t improve it” still applies in an Agile organization. Yet, frequently Agile teams drop this basic tenet, perhaps believing that teams are self-aware and critical enough to know where improvements are required. Unfortunately, even the most transparent and aware individuals are biased, fall back on subjective characteristics (“The team is really working hard”), and need the grounding that quantifiable metrics provide. We are continually surprised at how many companies aren’t even measuring velocity, not necessarily to compare one team with another, but to compare a team’s sprint output vs. their prior ones. Other metrics still applicable in an Agile world include quality, estimation accuracy, predictability, percent of time spent coding, the ratio of enhancements vs. maintenance vs. tech debt paydown.
These metrics, their definitions and the means of measuring them should be standardized across the organization, with regular focus on results vs. desired goals. They should be designed to reveal structural hazards that are impeding team performance as well as best practices that should be adopted by all teams.
- Your Velocity is a Lie
Is your definition of velocity an honest one? Does it truly measure outcomes, or only effort? Are you consistent with your definition of ‘done’? Take a good look at how your teams are defining and measuring velocity. Is velocity only counted for true ‘ready to release’ tasks? If QA hasn’t been completed within a sprint, are the associated velocity points still counted or deferred?
Velocity should not be a measurement of how hard your teams are working, but instead an indicator of whether outcomes (again, e.g.: achieve 2% lift in metric Y) are likely to be realized - take credit for completion only when in the hands of customers.
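To make the distinction concrete, here is a small Python sketch that counts velocity only for stories that have reached customers. The `Story` type, its fields and the `honest_velocity` helper are hypothetical; your tracker will have its own definition of ‘done’.

```python
from dataclasses import dataclass


@dataclass
class Story:
    points: int
    released: bool  # True only once the work is in the hands of customers


def honest_velocity(stories):
    """Sum points for released stories only; unfinished or untested work earns nothing."""
    return sum(story.points for story in stories if story.released)
```

A sprint with stories of 5, 3 and 8 points, where the 3-point story is still waiting on QA, would report 13 under this definition rather than 16.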
- Failure to Leverage Agile for Product Discovery
From the Agile manifesto: “Our highest priority is to satisfy the customer through early and continuous delivery of valuable software”. Many companies work hard to get an Agile structure and its artifacts in place, but ignore the biggest benefit Agile can bring: iterative and continuous product discovery. Don’t break down a six-month waterfall project plan into two week sprints with standups and velocity measurements and declare Agile victory.
Work to create and deliver MVPs to your customers that allow you to test expected value and customer satisfaction without huge investment.
- Treating Agile as an SDLC vs. a PDLC
As explained in our article PDLC or SDLC, SDLC (Software Development Lifecycle) lives within PDLC (Product Development Lifecycle). Again, Agile should not be treated as a project management methodology, nor as a means of developing software. It should focus on your product, and hopefully the related customer success your product provides them. This means that Agile should permeate well beyond your developers, and include product and business personnel.
Business owners or their delegates (product owners) must be involved at every step of the PDLC process. POs need to be embedded within each Agile team, ideally colocated alongside team members. In order to provide product focus, POs should first bring to the team the targeted customer problem to be solved, rather than dictating only a solution, then work together with the team to implement the most effective solution to that problem.
AKF Partners helps companies transition to Agile as well as fine-tune their existing Agile processes. We can readily assess your PDLC, organization structure, metrics and personnel to provide a roadmap for you to reach the full value and benefits Agile can provide. Contact us to discuss how we can help.
April 25, 2018 | Posted By: Robin McGlothin
The Scale Cube is a model for building resilient and scalable architectures using patterns and practices that apply broadly to any industry and all solutions. AKF Partners invented the Scale Cube in 2007, publishing it online in our blog that year (original article here) and subsequently in our first book, The Art of Scalability, and our second book, Scalability Rules.
The Scale Cube (sometimes known as the “AKF Scale Cube” or “AKF Cube”) comprises three axes: X-axis, Y-axis, and Z-axis.
• Horizontal Duplication and Cloning (X-Axis)
• Functional Decomposition and Segmentation - Microservices (Y-Axis)
• Horizontal Data Partitioning - Shards (Z-Axis)
These axes and their meanings are depicted below in Figure 1.
The Scale Cube helps teams keep critical dimensions of system scale in mind when solutions are designed and when existing systems are being improved.
Most internet enabled products start their life as a single application running on an appserver or appserver/webserver combination and likely communicate with a database. This monolithic design will work fine for relatively small applications that receive low levels of client traffic. However, this monolithic architecture becomes a kiss of death for complex applications.
A large monolithic application can be difficult for developers to understand and maintain. It is also an obstacle to frequent deployments. To deploy changes to one application component you need to build and deploy the entire monolith, which can be complex, risky, time consuming, require the coordination of many developers and result in long test cycles.
Consequently, you are often stuck with the technology choices that you made at the start of the project. In other words, the monolithic architecture doesn’t scale to support large, long-lived applications.
Figure 2, below, displays how the cube may be deployed in a modern architecture decomposing services (sometimes called micro-services architecture), cloning services and data sources and segmenting similar objects like customers into “pods”.
Scaling Solutions with the X Axis of the Scale Cube
The most commonly used approach to scaling a solution is running multiple identical copies of the application behind a load balancer, also known as X-axis scaling. That’s a great way of improving the capacity and the availability of an application.
When using X-axis scaling each server runs an identical copy of the service (if disaggregated) or monolith. One benefit of the X axis is that it is typically intellectually easy to implement and it scales well from a transaction perspective. Impediments to implementing the X axis include heavy session related information which is often difficult to distribute or requires persistence to servers – both of which can cause availability and scalability problems. A comparative drawback of the X axis is that, while it is intellectually easy to implement, data sets have to be replicated in their entirety, which increases operational costs. Further, caching tends to degrade at many levels as the size of data increases with transaction volumes. Finally, the X axis doesn’t engender higher levels of organizational scale.
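A minimal Python sketch of the idea follows: identical clones sit behind a balancer that hands out requests in turn. The `RoundRobinBalancer` class and the clone names are hypothetical, and round-robin is only one of several distribution policies a real load balancer may use.

```python
import itertools


class RoundRobinBalancer:
    """Distribute requests across identical clones of one service, one after another."""

    def __init__(self, clones):
        clones = list(clones)
        if not clones:
            raise ValueError("at least one clone is required")
        self._cycle = itertools.cycle(clones)

    def next_clone(self):
        # Every clone runs the same code, so any of them can serve any request.
        return next(self._cycle)
```

With clones `app-1`, `app-2` and `app-3`, four consecutive requests go to `app-1`, `app-2`, `app-3` and then `app-1` again. This only works cleanly when requests carry no server-bound session state, which is the impediment described above.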
Figure 3 explains the pros and cons of X axis scalability, and walks through a traditional 3 tier architecture to explain how it is implemented.
Scaling Solutions with the Y Axis of the Scale Cube
Y-axis scaling (think services oriented architecture, micro services or functional decomposition of a monolith) focuses on separating services and data along noun or verb boundaries. These splits are “dissimilar” from each other. Examples in commerce solutions may be splitting search from browse, checkout from add-to-cart, login from account status, etc. In implementing splits, Y-axis scaling splits a monolithic application into a set of services. Each service implements a set of related functionalities such as order management, customer management, inventory, etc. Further, each service should have its own, non-shared data to ensure high availability and fault isolation. Y axis scaling shares the benefit of increasing transaction scalability with all the axes of the cube.
Further, because the Y axis allows segmentation of teams and ownership of code and data, organizational scalability is increased. Cache hit ratios should increase as data and the services are appropriately segmented and similarly sized memory spaces can be allocated to smaller data sets accessed by comparatively fewer transactions. Operational cost often is reduced as systems can be sized down to commodity servers or smaller IaaS instances can be employed.
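The sketch below illustrates a Y-axis split in Python: requests are routed by function, and each service owns a data store that no other service uses. The path prefixes, service names, data store names and the `route` helper are hypothetical.

```python
# Hypothetical routing table: one service and one non-shared data store per function.
SERVICES = {
    "/search": {"service": "search-svc", "datastore": "search-db"},
    "/checkout": {"service": "checkout-svc", "datastore": "checkout-db"},
    "/login": {"service": "login-svc", "datastore": "login-db"},
}


def route(path):
    """Return the service entry responsible for a request path."""
    for prefix, target in SERVICES.items():
        if path == prefix or path.startswith(prefix + "/"):
            return target
    raise LookupError(f"no service owns {path}")
```

Because each entry names its own data store, a failure in `checkout-db` in this sketch would affect only checkout requests, while search and login continue to be served.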
Figure 4 explains the pros and cons of Y axis scalability and shows a fault-isolated example of services each of which has its own data store for the purposes of fault-isolation.
Scaling Solutions with the Z Axis of the Scale Cube
Whereas the Y axis addresses the splitting of dissimilar things (often along noun or verb boundaries), the Z-axis addresses segmentation of “similar” things. Examples may include splitting customers along an unbiased modulus of customer_id, or along a somewhat biased (but beneficial for response time) geographic boundary. Product catalogs may be split by SKU, and content may be split by content_id. Z-axis scaling, like all of the axes, improves the solution’s transactional scalability and, if fault isolated, its availability. Because the software deployed to servers is essentially the same in each Z axis shard (but the data is distinct) there is no increase in organizational scalability. Cache hit rates often go up with smaller data sets, and operational costs generally go down as commodity servers or smaller IaaS instances can be used.
Figure 5 explains the pros and cons of Z axis scalability and displays a fault-isolated pod structure with 2 unique customer pods in the US, and 2 within the EU. Note that an additional benefit of Z axis scale is the ability to segment pods to be consistent with local privacy laws such as the EU’s GDPR.
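A pod layout like this one can be sketched in Python as a geographic split followed by a modulus within each region. The pod names, the `PODS` table and the `pod_for` helper are hypothetical and only mirror the two-US, two-EU arrangement described above.

```python
# Hypothetical pod layout: two customer pods in the US and two in the EU.
PODS = {
    "US": ["us-pod-1", "us-pod-2"],
    "EU": ["eu-pod-1", "eu-pod-2"],
}


def pod_for(customer_id: int, region: str) -> str:
    """Pick a pod by region first (keeps data in-region), then by modulus of the id."""
    pods = PODS[region]  # raises KeyError for a region with no pods
    return pods[customer_id % len(pods)]
```

Choosing the region first means a customer's data never leaves the pods of that region in this sketch, which is what makes the split useful for data residency rules.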
Like Goldilocks and the three bears, the goal of decomposition is not to have services that are too small, or services that are too large but to ensure that the system is “just right” along the dimensions of scale, cost, availability, time to market and response times.
AKF Partners has helped hundreds of companies, big and small, employ the AKF Scale Cube to scale their technology product solutions. We developed the cube in 2007 to help clients scale their products and have been using it since to help some of the greatest online brands of all time thrive and succeed. For those interested in “time travel”, here are the original 2 posts on the cube from 2007: Application Cube, Database Cube
April 24, 2018 | Posted By: Marty Abbott
We can tell a lot about a company within the first hour or so of any discussion. Consider the following statement fragments:
“We have a lot of smart people here…”
“We have some of the hardest working engineers…”
Contrast these with the following statement fragments:
“We measure ourselves against very specific business KPIs…”
“We win or lose daily based on how effectively we meet our customer expectations…”
There is a meaningful difference in the impact these two groups of statements have on a company’s culture and how that culture enables or stands in the way of success. The first group of statements is associated with independent variables (inputs) that will rarely in isolation result in desired outcomes (dependent variables). When used in daily discourse, they reinforce the notion that something other than outcomes is what a company prides itself upon. Our experience is that these statements create an environment of hubris that often runs perpendicular to, and at best in no way reinforces, the achievement of results. Put another way, when we hear statements like this, we expect to find many operational problems.
The second group of statements are focused on meaningful and measurable outcomes. The companies with which we’ve worked that frequently communicate with these statements are among the most successful we’ve seen. Even when these companies struggle, their effort and focus is solidly behind the things that matter – those things that create value for the broadest swath of stakeholders possible.
The point here is that how we focus communication inside our companies has an important impact on the outcomes we achieve.
Success is often a result of several independent variables aligning to achieve a desired outcome. These may include, as Jim Collins points out, being in the right place at the right time – sometimes called “luck”. Further, there is rarely a single guaranteed path to success; multiple paths may result in varying levels of the desired outcome. Great companies and great leaders realize this, and rather than focusing a culture on independent variables they focus teams on outcomes. By focusing on outcomes, leaders are free to attempt multiple approaches and to tweak a variety of independent variables to find the most expedient path to success. We created the AKF Equation (we sometimes refer to it as the AKF Law) to help focus our clients on outcomes first:
Two very important corollaries follow from this equation or “law”:
Examples of Why Results, and Not Paths Matter
Intelligence Does Not Equal Success
As an example of why the dependent variable of results, and not an independent variable like intelligence, is most important, consider Duckworth and Seligman’s research. Duckworth, Seligman and associates conducted a review of GPA performance in adolescents. They expected to find that intelligence was the best indication of GPA. Instead, they found that self-discipline was a better indication of the best GPAs.
Lewis Terman, a mid-20th century Stanford psychology professor, hypothesized that IQ was highly correlated with success in his famous termite study of 1500 students with an average IQ of 151. Follow-on analysis and study indicated that while these students were successful, they were half as successful as a group of other students with a lower IQ.
Chris Langan, the world’s self-proclaimed “most intelligent man” with an IQ of 195 can’t seem to keep a job according to Malcolm Gladwell. He’s been a cowboy, a stripper, a day laborer and has competed on various game shows.
While we’d all like to have folks of average or better intelligence on our team, the above clearly indicates that it’s more important to focus on outcomes than an independent variable like intelligence.
Drive Does Not Equal Success
While most successful companies in Silicon Valley started with employees who worked around the clock, the Valley is also littered with the corpses of companies that worked their employees to the bone. Just ask former employees of the failed social networking company Friendster. Hard work alone does not guarantee success. In fact, most “overnight” success appears to take about 10,000 hours of practice just to be good, and 10 years of serious work to be truly successful (according to both Ramit Sethi and Malcolm Gladwell).
Hard work, if applied in the right direction and to the right activities, should of course help achieve results and success. But again, it’s the outcome (results and success) that matters.
Wisdom Does Not Equal Success
Touting the age and experience of your management team? Think again. There’s plenty of evidence that when it comes to innovative new approaches, we start to “lose our touch” at the age of 40. The largest market cap technology companies of our age were founded by “youngsters” – Bezos being the oldest of them at the age of 30. Einstein posited all of his most significant theories well before the age of 40 – most of them in the “miracle year” at the age of 26 in 1905. The elder of the two Wright Brothers was 39.
While there are no doubt examples of successful innovation coming after the age of 40, and while some of the best managers and leaders we know are over the age of 40, wisdom alone is not a guarantee for success.
Only Success = Success
Most of the truly successful and fastest growing companies we know focus on a handful of dependent variables that clearly identify results, progress and ultimately success. Their daily manners and daily discourse are carefully formulated around evaluation of these success criteria. Even these companies’ interactions with outside firms focus on data-driven indications of success time and time again – not on independent variables such as intelligence, work ethic, wisdom (or managerial experience), etc.
These companies identify key performance indicators for everything that they do, believing that anything worth doing should have a measurable value creation performance indicator associated with it. They maniacally focus on the trends of these performance indicators, identifying significant deviations (positive or negative) in order to learn and repeat the positive causes and avoid those that result in negative trends. Very often these companies employ agile planning and focus processes similar to the OKR process.
The most successful companies rarely engage in discussions around spurious relationships such as intelligence to business success, management experience to business success, or effort to business success. They recognize that while some of these things are likely valuable in the right proportions, they are rarely (read never) the primary cause of success.
AKF Partners helps companies develop highly available, scalable, cost effective and fast time-to-market products. We know that to be successful in these endeavors, companies must have the right people, in the right roles, with the right behaviors - all supported by the right culture. Contact us to discuss how we can help with your product needs.
April 23, 2018 | Posted By: Geoffrey Weber
The Relative Risk Equation
Technologists are frequently asked: what are the chances that a given software release is going to work? Do we understand the risk that each component or new feature brings to the entire release?
In this case, measuring risk is assessing the probability that a component will perform poorly or even fail. The higher the probability of failure, the higher the risk. Probabilities are just numbers, so ideally, we should be able to calculate the risk (probability of failure) of the entire release by aggregating the individual risk of each component. In reality this calculation works quite well.
Putting a number to risk is a very useful tool and this article will provide a simple and easy way to calculate risk and produce a numeric result which can be used to compare risk across a spectrum of technology changes.
When we assess a system, one of the key characteristics we want to benchmark is the probability that a system will fail. In particular, if we want to understand whether or not a system can support an availability goal of 99.95% we have to do some analysis to see if the probability that a failure occurs is lower than 0.05%. How do we calculate this?
First let’s introduce some vocabulary.
DEFINITION: Pi is the probability that a given system will experience an incident, i.
For the purposes of this article we are measuring relative and not absolute values. A system where Pi=1 is very unlikely to experience failure. On the other hand, Pi values approaching 10 indicate a system with a 100% probability of failure.
DEFINITION: Ii is the impact (or blast radius) a system failure will have.
Ii=1 indicates no impact, whereas Ii=10 indicates a complete failure of an entire system.
DEFINITION: Pd is the probability that an incident will be detected.
Pd=1 means an incident will be completely undetected and Pd=10 indicates that a failure will be completely detected 100% of the time.
Measuring across a scale from 1 to 10 is often too granular; we can reduce scale to tee-shirt sizes and replace 1, 2, …, 10 with Small (3), Medium (5), Large (7). Any series of values will work so long as we are consistent in our approach.
Relative Risk is now only a question of math:
Ri = (Pi x Ii ) / Pd
With values of 1, 2, ..., 10, the minimum Relative Risk value is 0.1 (effectively 0 relative risk) and the maximum value is 100. With tee-shirt sizes, the minimum Relative Risk value is 9/7 (about 1.29) and the maximum value is 49/3 (about 16.33). Basic statistics can help us standardize these values onto a common 10-point scale:
std(Ri) = (Ri - Min(Ri)) / (Max(Ri) - Min(Ri)) x 10
where Max(Ri) = 16 1/3 and Min(Ri) = 1 2/7 (in the case of tee-shirt sizing)
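The formula and its standardization can be sketched in a few lines of Python. This is only an illustration: the function names, the `SIZES` mapping, and the use of exact fractions are assumptions made for the example, not part of the original method.

```python
from fractions import Fraction

# Tee-shirt sizes from the article: Small = 3, Medium = 5, Large = 7.
SIZES = {"S": 3, "M": 5, "L": 7}


def relative_risk(p_incident, impact, p_detect):
    """Ri = (Pi x Ii) / Pd, for integer scores on a shared scale."""
    return Fraction(p_incident * impact, p_detect)


def standardize(risk, scale_values):
    """Rescale a relative risk using the smallest and largest values the scale allows."""
    lo, hi = min(scale_values), max(scale_values)
    min_risk = Fraction(lo * lo, hi)  # least likely, least impact, best detection
    max_risk = Fraction(hi * hi, lo)  # most likely, most impact, worst detection
    return (risk - min_risk) / (max_risk - min_risk) * 10


if __name__ == "__main__":
    worst = relative_risk(SIZES["L"], SIZES["L"], SIZES["S"])
    print(float(worst), float(standardize(worst, SIZES.values())))
```

Because the lowest possible risk maps to 0 and the highest to 10, any consistent set of size values can be swapped into `SIZES` without changing the rest of the sketch.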
Example 1: Adding a new data file to a relational database
- Pi = 3 (low). It’s unlikely that adding a data file will cause a system failure, unless we’re already out of space.
- Ii = 5 (medium). A failure to add a datafile indicates a larger storage issue may exist which would be very impactful for this database instance. However, as there is a backup (for this example), the risk is lowered.
- Pd = 7 (high). It is virtually certain that any failure would be noticed immediately.
- Therefore Ri = (3 X 5) / 7 = 2.1. Standardizing this with the formula above produces a value of about 0.57. That is a very low number, so adding datafiles is a relatively low-risk procedure.
Example 2: Database backups have been stored on tapes that have been demagnetized during transportation to offsite storage.
- Pi = 5 (medium). While restoring a backup is a relatively safe event, on a production system it is likely happening during a time of maximum stress.
- Ii = 7 (high). When we attempt to restore the backup tape, it will fail.
- Pd = 3 (low). The demagnetization of the tapes was a silent and undetected failure.
- Therefore Ri = (5 X 7) / 3 or 11.7. We arrive at a value of roughly 7 on the standard scale, which is quite high, so we should consider randomly testing tapes from off-site storage.
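For readers who want to check the arithmetic, the two examples can be worked through by hand using the tee-shirt minimum and maximum given earlier. The notation below is only a typeset restatement of the article's formulas:

```latex
\begin{aligned}
R_{\min} &= \frac{3 \times 3}{7} = \frac{9}{7}, \qquad
R_{\max} = \frac{7 \times 7}{3} = \frac{49}{3}, \qquad
R_{\max} - R_{\min} = \frac{316}{21} \\[4pt]
\text{Example 1:}\quad R_i &= \frac{3 \times 5}{7} = \frac{15}{7} \approx 2.14, \qquad
\mathrm{std}(R_i) = \frac{15/7 - 9/7}{316/21} \times 10 = \frac{45}{79} \approx 0.57 \\[4pt]
\text{Example 2:}\quad R_i &= \frac{5 \times 7}{3} = \frac{35}{3} \approx 11.67, \qquad
\mathrm{std}(R_i) = \frac{35/3 - 9/7}{316/21} \times 10 = \frac{545}{79} \approx 6.90
\end{aligned}
```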
This formula has utility across a vast spectrum of technology:
- Calculate a relative risk value for each feature in a software release, then take the total value of all features in order to compare risk of a release against other releases and consider more detailed testing for higher relative risk values.
- During security risk analysis, calculate a relative risk value for each threat vector and sort the resulting values. The result is a prioritized list of steps required to improve security based on the probabilistic likelihood that a threat vector will cause real damage.
- During feature planning and prioritization exercises, this formula can be altered to calculate feature risk. For example, Pi can mean confidence in estimate, Ii can be converted to impact of feature (e.g. higher revenue = higher impact) and Pd is perceived risk of the feature. Putting all features through this calculation then sorting from high values to low values yields a list of features ranked by value and risk.
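The first two uses above (totaling risk per release and ranking items by risk) might look like the following minimal sketch. The feature names and scores are invented purely for illustration, and `release_risk` and `ranked` are hypothetical helper names.

```python
def relative_risk(p_incident, impact, p_detect):
    """Ri = (Pi x Ii) / Pd."""
    return p_incident * impact / p_detect


# Hypothetical features scored as (Pi, Ii, Pd) with tee-shirt sizes 3 / 5 / 7.
release_a = {"new checkout flow": (7, 7, 5), "copy change": (3, 3, 7)}
release_b = {"schema migration": (5, 7, 3), "logging tweak": (3, 3, 7)}


def release_risk(features):
    """Total relative risk of a release: the sum over its features."""
    return sum(relative_risk(*scores) for scores in features.values())


def ranked(features):
    """Feature names sorted from highest to lowest relative risk."""
    return sorted(features, key=lambda name: relative_risk(*features[name]), reverse=True)


if __name__ == "__main__":
    for label, release in (("A", release_a), ("B", release_b)):
        print(label, round(release_risk(release), 2), ranked(release))
```

The same `ranked` helper would apply to a list of threat vectors; only the meaning attached to each score changes.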
The purpose of this formula and similar methods is not to produce a mathematically absolute estimate of risk. The real value here is to remove guessing and emotion from the process of evaluating risk and to provide a framework to compare risk across a variety of changes.
Click here to see how AKF Partners can help you manage risk and other technology issues.
April 18, 2018 | Posted By: Pete Ferguson
During due diligence and in-depth engagements, we often hear feedback from client employees that policies either do not exist - or are not followed.
All too often we see policies that are poorly written, difficult for employees to understand or find, and lacking clear alignment with the desired outcomes. Policies are only one part of a successful program - without sound practices, policies alone will not ensure successful outcomes.
Do You Have a Policy …?
Early in my career I was volunteered to be responsible for PCI compliance shortly after eBay purchased PayPal. I’d heard folklore of auditors at other companies coming in and turning things over with the resulting aftermath leading to people being publicly humiliated or losing their job. I suddenly felt on the firing line and was asking “why me?”
I booked a quick flight to Phoenix to be in town before the auditor arrived and I prepared by walking through our data center and reviewing our written policies. When I met with the auditor, he looked to be in his early 20s and handed me a business card from a large accounting firm. I asked him about his background; he was fresh out of college and we were one of his first due diligence assignments. He pulled out his laptop and opened an Excel spreadsheet and began reading off the list:
- Do you have cameras? “Yes,” I replied and pointed to the ceiling in the lobby littered with little black domes.
- Do you record the cameras? “Yes,” and I took him into the control room and showed him that we had 90 days of recording.
- Do you have a security policy? “Yes,” and I showed him a Word Document starting with “1.1.1 Purpose of This Policy ....”
Several additional questions, and 10 minutes later, we were done. He and I had both flown some distance so I gave him a tour of the data center and filled him full of facts about square footage and miles of cable and pipes until his eyes glazed over and his feet were tired from walking and off he went.
I was relieved, but let down! I felt we had a really good program and wanted to see how we measured up under scrutiny. Subsequent years brought more sophisticated reviews - and reviewers - but the one question I was always waiting to be asked - but never was:
“Is your policy easily accessible, how do employees know about it, and how do you measure their comprehension and compliance?”
My first compliance exercise didn’t seem all that scary after all; it was only a due diligence “check the box” exercise and didn’t dive deeper into how effective our program was and where it needed to be reinforced.
While having a policy for compliance requirements is important, on its own, policy does not guarantee positive outcomes. Policy must be aligned with day-to-day operations and make sense to employees and customers.
The Traditional Boredom of Policy
Typically policy is written from the auditor’s point of view to ensure compliance to government and industry requirements for public health, anti-corruption, and customer data security standards.
Unfortunately, this leads to a very poor user experience wading through the 1.1.1 … 1.1.2 …, certainly a far cry from how a good novel or any online news story reads.
I’ve heard companies - both large and small - give great assurances that they have policies and they have shown me the 12pt Times New Roman documents that start with “1.1.1 Purpose of This Policy …” as evidence.
I had to argue the point at a former position that the first way to lose interest with any audience is to start with 1.1.1 … and with Times New Roman font in a Microsoft Word document that was not easy to find. It was a difficult argument and I was instructed to stick with the approved, and traditional, industry-accepted method.
Fast forward a decade later and our HR Legal team was reviewing policy and invited me to a meeting with the internal communications team. Before we started talking documents, the Director of Communications asked me if I’d seen the latest safety video for Virgin Atlantic Airlines. I thought it a strange question, but after she told me how surprised and inspired by it she was, I took a look.
VA thankfully took a required dull and mundane US Federal Aviation Administration ritual and instead saw it as a differentiator of their brand from the pack of other airlines. Whoever thought a safety demonstration could also be a 4-minute video on why an airline is different and fun?!? Up until that point, no one! Certainly not on any flight I had previously flown.
Thankfully, since then, Delta and others have followed their example and made something I and millions of airline crews and passengers had previously dreaded - safety policy and procedure - into a more fun, engaging, and entertaining experience.
While policy needs to comply with regulations and other requirements, for policies to move from the page to practice they need to be presented in a way that lets employees clearly understand what is expected - so in writing policy, put the desired outcome first! The regulatory document for auditors can be incorporated at the end of each policy, or consider a separate document that calls out only the required sections of your employee handbook or wherever your company policies are presented and stored.
Clarifying the Purpose of Your Policy
In her article “Why Policies Don’t Work,” HR Lawyer Heather Bussing boils down the core issue: “There are two main reasons to have employment policies: to educate and to manage risk. The trouble is that policies don’t do either.”
She further expounds on the problem in her experience:
“ … policies get handed out at a time when no one pays attention to them (first week of employment if not the first day), they are written by people who don’t know how the company really works (usually outside legal counsel), and they have very little to do with what happens. So much for education.”
As for managing risk, Bussing points out that policies are often at odds with each other, or so broad that they can’t be effectively enforced.
“Unless it is required to be on a poster, or unless you can apply it in every instance without variance, you don’t want policies. Your at-will policy covers it. And if you don’t follow your policies to the letter, you will look like a liar in a courtroom.”
Don’t let your online policy repository feel like a suppository - focus on what you want to accomplish!
Small and fast-growing companies typically have little need for formalized policies because people trust each other and can work things out. But as they grow it has been my experience that often the trust and holding people accountable - which sets fast growing companies apart as a cool place to work - get replaced with bureaucratic rituals cemented in place as more and more executives migrate from larger, bureaucratic behemoths. If the way policy is presented is the litmus test for the true company culture, a lot of companies are in trouble!
Policy must be closely aligned to the shared outcomes of the company and interwoven into company culture. Otherwise policies are a bureaucratic distraction and will only be adopted or sustained with a lot of uphill effort. In short, if people do not understand how a policy helps them do their job more easily, they are going to fight it.
Adapting Policy To Your Audience
In the early days of eBay, the culture was very much about collectables, and walking through the workspace many employees displayed their collections of trading cards, Legos, and comic books. When it came time to publish our security policies, we hired Foxnoggin - a professional marketing strategy company - which took the time to understand our culture and then organized a comprehensive campaign that included contests, print and online material, and other collateral.
They helped formulate an awareness campaign to educate employees and measure the effectiveness of policy through surveys and monitoring employee behavior.
To break away from the usual email method of communication, we got and held employee attention with a series of comic books which included superheroes and supervillains in a variety of scenarios highlighting our policies.
An unintended consequence from our collector employees was that they didn’t want to open their comic books and instead kept them sealed in plastic. To combat this, we provided extra copies (not sealed in plastic) in break rooms and other common areas and future editions were provided without the bags. The messages were reinforced with large movie-style posters displayed throughout the work area.
This approach was wildly popular among employees located at the customer support and developer sites and surveys showed that security was becoming a top of mind topic for employees. Unfortunately, this approach was not as popular with Europeans - who felt we were talking down to them - or with the executives coming from more stodgy and formal companies like Bain & Company or GE, and it was particularly unpopular with execs from the financial industry after the purchase of PayPal.
Intertwining policy into the culture of your organization makes compliance natural and part of daily operations.
Make Sure Your Message Matches Your Audience
President and CEO of Lead From Within Lolly Daskal writes on Inc.com:
“... sometimes the dumbest rules can drive away the best employees … too many workplaces create rule-driven cultures that may keep management feeling like things are under control, but they squelch creativity and reinforce the ordinary.”
Be creative and look at the company culture and how to interweave policies. Policies need to be part of the story you tell your employees to reinforce why they should want to work for you.
Nathan Christensen writes in his Fast Company article: How to Create An Employee Handbook People Will Actually Want to Read, “let’s face it, most handbooks aren’t exactly page-turners. They’re documents designed to play defense or, worse yet, a catalog of past workplace problems.”
Christensen recommends “presenting” policies in a readable and attractive manner. It must be an opportunity to excite people in meeting a greater group purpose and cause.
Your policies need to match your company culture and be in language employees use and understand, and the ask for compliance needs to be easy enough for a new employee to explain to anyone.
Writing Content Your Audience Will Actually Read and Understand
According to the Center for Plain Language - which has the goal to help organizations “write so clearly that their intended audience understands what they are saying the first time they read or hear it” - there are five steps to plain language:
- Identify and describe the target audience: “The audience definition works when you know who you are and are not designing for, what they want to do, and what they know and need to learn.”
- Structure the content to guide the reader through it: “The structure works when readers can quickly and confidently find the information they are looking for.”
- Write the content in plain language: “Use a conversational, rather than legal or bureaucratic tone … pick strong verbs in the active voice and use words the audience knows.”
- Use information design to help readers see and understand: Font choice, line spacing, and use of graphics help break up long sections of text and increase the readability score.
- Work with target user groups to test design and content: Ask readers to describe the content and have them show you where they would find relevant content.
As an illustration, here is a before and after comparison of the AARP Financial policy on giving and receiving gifts:
In reading the “before” example, my eyes immediately glazed over and my mind began to wander until the mention of “courtesies of a de minimus ... “ Did the guy who wrote that go home that night to his family and instruct his kids, “you will need to consume a courtesy of a de minimus amount of broccoli if you want videogame time after dinner”? I sure hope not!
On the “after” example, notice the change in line spacing, switching of font and use of bullet points. Overall the presentation is a lot more conversational and less formal. It also has a call to action in the title starting with two verbs “give and accept …”
I’d add as the 6th step to remember K.I.S.S. - Keep It Simple Stupid! You get a few seconds to grab your audience’s attention and only a few more minutes to keep it.
As a content editor, I was feeling proud of myself when I distilled 146 pages of confusing policies, procedures and “how to” down to 14 pages over the course of several weeks. But when I mentioned this to my wife, she said “you are going to make them read 14 pages?!?”
So I looked at it a few days later with fresh eyes and realized I could condense it down again to two pages by making it more of a table of contents with a brief description of each bullet point and then include links after each section if employees wanted to learn more, and I was able to retain a font size of 14 and plenty of white space.
In reading the two pages, people would understand what was expected of them and could easily learn more - but only if they were interested.
Write policy in language a new employee will quickly understand and be thoughtful in how much you present to employees on their first day, week, and month.
Document Readability is How You Show Your People Love - And Soon To Be the Law In the EU
Speaking more in terms of content marketing, VisibleThread author “Fergle” quotes Neil Patel, columnist for Forbes, Inc, as stating “content that people love and content that people can read is almost the same thing.” Yet, as Fergle points out, “a lot of content being created is not the stuff people love. Or read.”
Writing content that is easy to read so that people will love it may seem a bit altruistic. But for information regarding data privacy, it is also soon to be the law - at least in the EU and for any international policy which would reach an EU resident. On May 25th of 2018 the General Data Protection Regulation (GDPR) goes into effect. From the GDPR:
“The principle of transparency requires that any information and communication relating to the processing of those personal data be easily accessible and easy to understand, and that clear and plain language be used.”
There are a number of ways to measure readability ease and grade level of your content, and a good communications expert will be able to help you identify the proper tools.
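One widely used measure is the Flesch Reading Ease score, which combines average sentence length with average syllables per word. The article does not name a specific tool, so treat the sketch below as an assumption: the syllable counter is a rough vowel-run heuristic, and the two sample sentences are invented for the example.

```python
import re


def count_syllables(word):
    """Rough heuristic: each run of vowels counts as one syllable (minimum 1)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text):
    """Flesch Reading Ease: higher scores mean easier reading."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


# Hypothetical policy wording, written only to compare the two styles.
plain = "Give and accept gifts with care. Ask your manager if you are not sure."
formal = ("Employees shall neither solicit nor accept gratuities, excepting "
          "courtesies of a de minimis amount consistent with organizational policy.")

if __name__ == "__main__":
    print(round(flesch_reading_ease(plain), 1), round(flesch_reading_ease(formal), 1))
```

Dedicated readability tools count syllables more carefully, so scores from a heuristic like this are best used to compare drafts against each other rather than as absolute grades.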
Scores are a good benchmark, but don’t forget the most important resource for feedback - your potential audience!
Buy them lunch, have them come and review your plan and provide their feedback. Bring them back in later when you have content to review and provide an environment where they can be brutally honest - again a communications expert outside of your department will help provide a bit of a buffer and allow your audience to be open, honest, and direct.
But don’t just write policy to comply with due diligence or for policy’s sake - be sure it is part of the company culture, easy to search, and placed where and when your employees or customers will need it. When there are shared outcomes between compliance and how employees operate, policy is integrated and effective.
Timing is Important
Think of ways to break down your policy content not just by audience, but by timing and when the information will actually be relevant.
In retail, the term “point of sale” refers to the checkout process - when taxes, final cost and payment are all settled. The placement of “last minute items” at the POS is very carefully and competitively assigned only to items with a high ROI, measured against the number of inches each item takes up on the limited shelf space. This careful placement has also been adapted to the online marketplace: when you add an item to your shopping cart, a prompt arises to add additional items others have also purchased with your item.
This same methodology in thinking should be applied to where - and when - you introduce your policies to your audience.
We made the mistake for years of pushing our travel safety program and policies for everyone during new hire orientation when only about half of the population traveled and most of them wouldn’t be traveling for several weeks or months. It made a lot more sense to move the travel policies to the travel booking page.
If you only give out corporate credit cards to Directors and above, there is no sense pushing policies on spend limits to the global population. It makes a lot more sense to push the policy when someone is applying for the card and as a reminder each time their credit card expires and they are being issued a new one.
Your audience will appreciate only being told what they need to know when they need the information and will be more likely to not only retain the information, but to comply!
Know How You Will Measure Successful Outcomes
Perhaps the most important question to ask when designing policy is “how will we know we are successful?”
Having good policy written in a clear and concise manner and stored in an easy to find location is still a very passive approach. Good policy should evolve as your company evolves and should be flexible and realistic to business, customer, and employee needs. It must be modeled by company leadership and hold true to the daily actions of your company.
Tests at the end of annual compliance training are only a “check the box” measure of compliance. Think back to how much you actually learned - or, better yet, retained - the last time you were subjected to hours of compliance training!
If metrics cannot support that your policies are known and followed, then you need to re-evaluate the purpose of your policies and if they are contributing to the benefit of your employees and customers or just ticking compliance boxes.
While compliance is important, compliance alone does not make for better business practices or a competitive edge. Effective, measurable compliance protects your employees and provides value to your customers.
Subject-matter experts are often too close to the policies to be objective. A little tough love is needed and it is best to bring in experts in marketing and communications who will not be biased to the content, but biased to the reader who is the intended audience.
A good communications plan will cover the following:
- Be clear on the desired behavior the policy is to encourage and enforce - and ensure that behavior is aligned with the overall company purpose
- Identify the target audiences and each of their self-interests
- Outline which channels each audience is receptive to (email/print/video, etc.)
- Identify the inside jargon and language styles needed
- Decide when and where each audience will want to find relevant information
- Plan how often policies will be reviewed - and include as many stakeholders as possible in the review process
- Decide how implementation of policies and compliance to the policies will be measured
Only AFTER the communications plan is agreed upon - with plenty of input from representatives of the target audiences - should the content review begin. Otherwise the temptation from subject-matter experts will be to tell people everything they know.
Pulling it All Together
Poorly written policies that are difficult for employees to search or find do little to meet the mission of policy: to provide a consistent approach to how your company does business and satisfies regulatory compliance. Policies on their own do not make for good operations or guarantee overall success. Remember the true test of policies is not whether they exist, but if they are tightly aligned and incorporated into daily operations, how they contribute to the success of your employees and customers, and if their effectiveness can be measured in a tangible way.
Experiencing growing pains? AKF is here to help! We are an industry expert in technology scalability and due diligence. Put our 200+ years of combined experience to work for you today!
April 11, 2018 | Posted By: Geoffrey Weber
The Three Sentence Rule
Variations of the Three Sentence Rule have been around for a long time. The variations are many, but the base rule is a useful and often necessary tool to teach concision to the wordy.
Say what you need to say in THREE sentences, or less.
Anyone who has been through flight school learns how to be concise on the radio. Who are you? Where are you? What do you want? That happens to be three sentences. “Palo Alto Tower, Cessna 15957X; 5 miles southwest of SLAC with information Echo; request landing.” The need to be precise, accurate and speedy is a requirement at a tower as busy as Palo Alto tower. Controllers have no patience because there are 12 other aircraft waiting to communicate.
As technologists, we are generally rewarded for producing details, the more the better. Engineers have to be obsessed with substance; their work is about precision and there are no shortcuts when it comes to building complex tools. It wouldn’t make any sense to try and distill how Cassandra compares to a relational database, in three sentences, in a room filled with colleagues, at a meet-up.
But what if our CEO asks us about Cassandra? How can we possibly explain to someone who is just a wee bit tech-illiterate the differences between two very different data stores? Moreover, why on earth would we try and distill that down to three sentences?? Let’s start at the beginning… before there were databases there were Hollerith Cards…
Lack of brevity is a death sentence to any technologist who finds themselves interacting with non-technologists on a regular basis. We see this as a common anti-pattern for CTOs; some never learn the difference between a novel, a paragraph, or sentence and why each has utility.
The controller in the tower couldn’t care less about why we’re in an airplane today, that we’re stopping at the restaurant for the traditional $100 hamburger, or that we need to be home for dinner tonight:
- Who are you?
- Where are you?
- What do you want?
The same discipline answers the CEO’s question about Cassandra:
- Cassandra is a new database technology.
- It’s very different from what we use today.
- It will lower costs in the next 12 months.
That is the CEO-version of Cassandra in three sentences. “What is it called, why should I remember it, what does it do for me?”
At AKF Partners, we believe that technology executives need to start practicing a version of the three sentence rule as soon as they transition into their first leadership role. Specialists in Operations roles have an advantage because of the daily chaos and need for ongoing communications: “Customer sign-in unavailable for 15 minutes; 100% of our customers are impacted for 15 minutes; we are restarting the service for 100% operations in 10 minutes—update in 20 minutes.”
- What happened?
- What is the impact?
- When is it going to be fixed?
There’s a practical reason for such precision: most CEOs are consuming information on tiny screens, sometimes over really bad internet (Detroit Airport) at 2 in the morning, while news has also just arrived about a sales crisis in Europe, a supply-chain issue in India, and a Wall Street Journal feature about the product that’s going live next week. If we mention Hollerith, or even think about it for a second, we’re Exploring Alternative Employment. If they have a moment to breathe, they can ask for more detail. Or maybe next week.
Sometimes we’re required to communicate when there’s no answer. Try this:
- Impact update,
- Standard procedures failed, assembled SWAT,
- Updates every 15 minutes until resolution.
An equally important rule for executives is the “No Surprise Rule” (stay tuned) and zero sentences are as fatal as 4 sentences at the wrong time. Keeping a CEO waiting for 2 hours until root cause is determined is stupid.
The final place to consider the Three Sentence Rule is the boardroom itself. Most board members are not going to read through the 200-page board deck, and our ten minutes to discuss the Cassandra project is unlikely to resonate with most of the attendees. Up and coming executives understand the need for absolute precision. Steve Jobs could do it with a single slide:
Three sentences, if you count the background gradient as a sentence.
For the Board:
- We’re introducing new technology next fiscal year.
- It’s called Cassandra.
- A year from now I will demonstrate how it increased EBITDA by $2M.
- Anything can be explained in 3 sentences
- Even concepts so fantastic they seem magical
- If you don’t believe me, Books in 3 Sentences
At AKF Partners, we can help with mentoring, coaching and leadership training.
April 10, 2018 | Posted By: Marty Abbott
The decomposition of monoliths into services, or alternatively the development of new products in a services-oriented fashion (oftentimes called microservices), is one of the greatest architectural movements of the last decade. The benefits of a services (alternatively microservices or micro-services) approach are clear:
- Independent deployment, decreasing time to market and decreasing time to value realization– especially when continuous delivery is employed.
- Team velocity and ownership (informed by Conway’s Law).
- Increased fault isolation – but only when properly deployed (see below).
- Individual scalability – and the decreasing cost of operations that entails when properly architected.
- Freedom of implementation and technology choices – choosing the best solution for each service rather than subjecting services to the lowest common denominator implementation.
Unfortunately, without proper architectural oversight and planning, improperly architected services can also result in:
- Lower overall availability, especially when those services are deployed in one of a handful of microservice anti-patterns like the mesh, services in depth (aka the Christmas Tree Light String) and the Fuse.
- Higher (longer) response times to end customers.
- Complicated fault isolation and troubleshooting that increases average recovery time for failures.
- Service bloat: Too many services to comprehend (see our service sizing post)
The following are patterns companies should avoid (anti-patterns) when developing services or microservices architectures:
Mesh architectures, where individual services both “fan out” and “share” subsequent services, result in the lowest possible availability.
Services that are strung together in long (deep) call trees suffer from slow page response times and from low availability, which is calculated as the product of the availability of each service in the chain.
The Fuse is a much smaller anti-pattern than “The Mesh”. In “The Fuse”, 2 distinct services (A and B) rely on service C. Should service C become slow or unavailable, both service A and B suffer.
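The availability arithmetic behind these anti-patterns can be sketched in a few lines. This is a minimal illustration, not a model of any real system: it assumes service failures are independent, and the 99.9% per-service figure and the function names are hypothetical.

```python
from math import prod


def chain_availability(availabilities):
    """Availability of services called synchronously in series.

    Assumes failures are independent, so the chain is up only when
    every service in it is up: the product of the individual values.
    """
    return prod(availabilities)


def downtime_minutes(availability, period_minutes=30 * 24 * 60):
    """Expected downtime over a period (default: a 30-day month)."""
    return (1 - availability) * period_minutes


# Hypothetical figure: every service is 99.9% available on its own.
PER_SERVICE = 0.999

for depth in (1, 3, 5, 10):
    total = chain_availability([PER_SERVICE] * depth)
    print(f"depth {depth:2d}: {total:.4%} available, "
          f"~{downtime_minutes(total):.0f} min down per 30 days")
```

The same product applies to The Fuse: under these assumptions, service A can be no more available than its own availability multiplied by that of service C, and the same holds for service B.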
Architecture Principle: Services – Broad, But Never Deep
Avoiding these service anti-patterns protects against a lack of fault isolation, where slowness and failures propagate along a synchronous path. One service fails, and the others relying upon that service also suffer.
Avoiding them also guards against longer latency in call streams. While network call latency tends to be small relative to total customer response times, many solutions (e.g. payment solutions) need to respond as quickly as possible and service calls slow that down.
Finally, avoiding these patterns helps protect against difficult-to-diagnose failures. The Xmas Tree pattern name is chosen because of the difficulty in finding the “failed bulb” in old tree lights wired in series. Similarly, imagine attempting to find the fault in “The Mesh”. The time necessary to find faults negatively affects service restoration time and therefore availability.
As such, we suggest a principle that services should never be deep but instead should be deployed in breadth along product offering boundaries defined by nouns (resources like “customer” or “sales”) or verbs (services like “search” or “add to cart”). We often call this approach “slices instead of layers”.
How then do we accomplish the separation of software for team ownership, and time to market where a single service would otherwise be too large or unwieldy?
Old School – Libraries!
When you need service-like segmentation in a deep call tree but can’t suffer the availability impact and latency associated with multiple calls, look to libraries. Libraries eliminate the network-associated latency of a service call. In the case of both The Fuse and The Mesh, libraries also eliminate the shared availability constraints. Unfortunately, we still have the multiplicative effect of failure of the Xmas Tree, but overall it is a faster pattern.
“But My Teams Can’t Release Separately!”
Sure they can – they just have to change how they think about releasing. If you need immediate effect from what you release and don’t want to release the calling services with libraries compiled or linked, consider performing releases with shared objects or dynamically loadable libraries. While these require restarts of the calling service, simple automation will help you keep from having an outage for the purpose of deploying software.
AKF Partners helps companies architect highly available, highly scalable microservice architecture products. We apply our aggregate experience, proprietary models, patterns, and anti-patterns to help ensure your products can meet your company’s scale and availability goals. Contact us today - we can help!
April 10, 2018 | Posted By: Greg Fennewald
It seems as though a week cannot go by without news reports of yet another data breach at a large, recognizable company. One wonders what has been compromised but not yet detected or announced.
Security issues are perceived far differently than other technology issues. Consider an example of “Dilly Dilly Fidget Spinners has hard coded IP addresses in their code base” – most people would infer little if anything from that fact, while a minority would shake their heads and feel nauseous. On the other hand, “Dilly Dilly Fidget Spinners suffered a data breach affecting thousands of customers” is likely to be perceived negatively by everyone who hears about it. The public sensitivity to all things security warrants a thorough investigation of security practices and incidents prior to any investment.
What should a potential investor look for in regard to information security during a due diligence effort? The answer to that question will vary widely based on the market segment of the potential investment, but there are some common considerations for information security.
Common Security Considerations
1. Fit the Risk
Security posture should fit the risk a company faces. A company providing financial services or healthcare has a far higher risk to manage than a company involved in consumer product pricing and availability. The security policies, regulatory compliance and certifications, and operational practices should fit the risk. Going beyond the appropriate degree of security adds cost and may not make business sense but is far superior to inadequate security.
A security program that fits the risk profile for the company can be a business enabler. Security programs consume time and cost money – establishing the right fit and balance can conserve resources. Alternatively, a poor fit can add drag to a company and damage the business. Consider industries that have a strong reputation for security and face significant regulatory requirements, industries such as financial services and insurance. An experienced security professional with a banking background moves to a telematics company and is determined to bring bank level security to his new role. The telematics company deals with route optimization and fleet maintenance management. It does not process credit card payments or store PII. Bank level security would be a horrible fit that adds cost without benefit and ultimately damages the culture.
2. Security Minded Culture
Security awareness and accountability should be part of the culture. Well written policies do not accomplish much if they are not internalized and emphasized by leaders. Technology leaders must treat security in the same manner as they treat availability, quality of service, and engineering productivity - by establishing transparent goals and objective metrics by which those goals are measured.
An excellent resource for security awareness training is the OWASP Top 10 Application Security Risks list. The top 10 list is revised periodically as security threat vectors morph. The top three risks from the 2017 list are injection, broken authentication, and sensitive data exposure. More information can be found on the OWASP website.
3. Validation via Recurring Testing
Recurring testing is a hallmark of successful security programs. Areas to test include employee security policy training, third-party network penetration tests, static code vulnerability testing, and drills to rehearse information security policies such as a security incident response plan. Testing validates that the policies and practices are effective and part of the company’s culture.
QA automation is needed for agile product development that seeks rapid iteration and market discovery. 75% code coverage or greater is recommended. Incorporating automated security testing into the overall testing program is a smart move.
4. Cover the Basics
Basic security hygiene items that should be considered table stakes today include role-based access with audit trails, closing server ports by default and opening them by exception, segregating networks, logging production access, and encrypting data at rest. None of these actions are particularly difficult or expensive. Implementing them demonstrates security awareness and commitment. Controlling who can access data in a taciturn server farm, logging that access, and encrypting the data is a pretty good start to effective security.
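As a rough illustration of two of these basics, role-based access and an audit trail, consider the sketch below. It is a minimal example under stated assumptions: the roles, permissions, and function name are hypothetical, and a real system would write the audit trail to tamper-resistant, centralized storage rather than an in-memory list.

```python
from datetime import datetime, timezone

# Hypothetical roles and permissions; anything not listed is denied.
ROLE_PERMISSIONS = {
    "dba": {"read", "write"},
    "analyst": {"read"},
}

# In-memory stand-in for an audit trail (an assumption for this sketch).
audit_log = []


def request_access(user, role, action, resource):
    """Allow the action only if the role grants it, and record every attempt."""
    allowed = action in ROLE_PERMISSIONS.get(role, set())  # deny by default
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "role": role,
        "action": action,
        "resource": resource,
        "allowed": allowed,
    })
    return allowed
```

Denied attempts are logged as well as granted ones, since a trail of refusals is often the first sign that someone is probing for access.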
How AKF Can Help
AKF Partners has performed hundreds of due diligence efforts over the last 10 years and is composed of technology professionals who have walked the walk at widely recognized companies such as eBay, PayPal, and General Electric. Our security expertise comes from living the reality of technology, not an auditing course.