September 26, 2019 | Posted By: Marty Abbott
The Problem – Too Much Planning, Too Little Execution
How many of you spend a significant portion of your year planning for the next year, two years, or five years of activities? How often are these plans useful beyond the first three to four months of execution?
We have many large clients who will begin one, two, or (gasp!) five-year plans in July or August of the current year. They spend a significant amount of effort creating these plans over a five-to-six-month span. The plans are often very specific as to what they will do: what projects they will deliver, what products they will create, how many people they will hire, what training their teams will undertake, etc. The plan is typically well followed in months 1 and 2 and starts to degrade significantly in month 3. By month 6, just before the next annual planning cycle begins, the original plan is at best 50% accurate; the original projects have been replaced, new market intelligence has informed different product solutions, new skills and different teammates are needed, etc.
Hurricanes always have an associated cone of uncertainty. The current position of the hurricane and current direction and velocity are well known. But several factors may cause the hurricane to act differently an hour or a day from now than it is behaving at exactly this moment. The same is true with businesses. We know what we need to do today to maintain our position or gain market share, but those activities may change in priority and number in the next handful of months.
So why do we spend so much time on solutions and approaches when at best 25% of the plan we produce is accurate? We don’t have to waste time as we do today; there is a better way.
Financial vs Operational Plans
First, let’s acknowledge that there is a difference between a financial plan (how much we will spend as a company and what we expect to make as a return on that spend) and an operational plan. The board of directors for your company has a fiduciary responsibility to exercise, in non-expert legal terms:
- A duty of loyalty – the director must put the interests of the institution and its shareholders before his or her own.
- A duty of care – the director must behave prudently, diligently and with skill.
- A duty of obedience – the director must ensure consistency with the purpose of the company – and in a for-profit company, this means ensuring profitability.
Any board of directors, to ensure they are consistent with the law, will require a financial plan. At the very least, they need to govern the spend of the company relative to its revenues to ensure profitability, and ideally over time, an increase in profitability. But that does not mean they need to go into great detail regarding the exact path and actions to achieve the financial plan. We all likely agree that we also have a duty of loyalty, care, and obedience to ourselves and our families – but how many of us go beyond creating an annual budget (financial plan) for any given year?
One of the best-known and most successful directors and investors of all time, Warren Buffett, has what amounts to (in another author’s terms) a list of “10 Commandments” for boards and directors. A quick scan of these makes it clear that in Buffett’s perspective, a board’s focus should be on the performance of the CEO and the company itself – not the detailed operational plan to achieve a financial plan. The board arguably does need to ensure that a strategy exists and is viable – but a strategy need not be a list of tasks for every subordinate organization for an entire year. In fact, given the arguments above, such a task list (or deep operational plan) won’t be followed past a handful of months anyway.
The Fix: Reduce Planning, Increase Execution
If the problem is too much time wasted creating plans that are good for only a short period of time, the fix should be obvious. For this, we offer the AKF 5-95 Rule: spend 5 percent of your time planning and 95% of your time executing. This stands in stark contrast to the “Soviet-esque” way in which many companies operate with executives spending as much as 25 percent of a year involved in financial and execution plans.
- Decrease the horizon (focused endpoint) of planning and decrease the specificity of plans. Take a portion of the five percent of your total time and create a good financial plan of what you would like to achieve. The remainder should be used to iteratively identify the short-term paths to achieve that plan using windows no greater than 3 months. Anything beyond 3 months has a high degree of waste.
- Adopt development methodologies that maximize execution value. Adopt Agile development methodologies meant to embrace low levels of specificity and rely on discovery to identify the “right solution” to maximize market adoption.
The best way to maximize the AKF 5-95 Rule is to implement OKRs – (O)bjectives and (K)ey (R)esults – as a business, the bowling alley methodology of product focus, and Agile product development practices.
Subscribe to the AKF Newsletter
September 16, 2019 | Posted By: Marty Abbott
Two of the most common statements we hear from our clients are:
Business: “Our product and engineering teams lack the agility to quickly pivot to the needs of the business”.
Product and Engineering: “Our business lacks the focus and discipline to complete any initiative. We are subject to the ‘Bright Shiny Object (BSO)’ or ‘Squirrel!’ phenomenon”.
These two teams seem to be at an impasse in perspective requiring a change by one team or the other for the company to be successful.
Companies need both focus and agility to be successful. While these two concepts may appear to be in conflict, a team needs only three things to break the apparent deadlock:
- Shared Context.
- Shared agreement as to the meaning of some key terms.
- Three process approaches across product, the business, and engineering.
First, let’s discuss a common context within which successful businesses in competitive environments operate. Second, we’ll define a common set of terms that should be agreed upon by both the business and engineering. Finally, we’ll dig into the approaches necessary to be successful.
Successful businesses operating within interesting industries attract competition. Competitors seek innovative approaches to disrupt each other and gain market share within the industry. Time to market (TTM) in such an environment is critical, as the company that finds an approach (feature, product, etc.) to shift or gain market share has a compelling advantage for some period. As such, any business in a growth industry must be able to move and pivot quickly (be agile) within its product development initiatives. Put another way, businesses that can afford to stick to a dedicated plan likely are not in a competitive or growing segment, probably don’t have competition, and aren’t likely attractive to investors or employees.
The focus that matters within business is a focus on outcomes. Why focus on outcomes instead of the path to achieve them? Focusing on a path implies a static path, and when is the last time you saw a static path be successful? (Hint: most of us have never seen a static path be successful). Obviously, sometimes outcomes need to change, and we need a process by which we change desired outcomes. But outcomes should change much less frequently than path.
Agility enables changing directions (paths) to achieve focused outcomes. ‘Nuff said.
Commonly known as (O)bjectives and (K)ey (R)esults, or in AKF parlance Outcomes and Key Results, OKRs are the primary mechanism of focus while allowing for some level of agility in changing outcomes for business needs. Consider the O (objectives or outcomes) as the thing upon which a company is focused, and the Key Results as the activities to achieve those outcomes. KRs should change more frequently than the Os as companies attempt to define better activities to achieve the desired outcomes. An objective/outcome could be “Improve Add-To-Cart/Search ratio by 10%”.
Each objective/outcome should have 3 to 5 supporting activities. For the add-to-cart example above, the activities may include implementing personalization to drive a 3% improvement, adding re-targeting for a net 4% improvement, and improving descriptive meta-tags in search for a 3% improvement.
OKRs help enforce transparency across the organization and help create the causal roadmap to success. Subordinate organizations understand how their initiatives nest into the high-level company objectives by following the OKR “tree” from leaf to root. By adhering to a strict and small number of high-level objectives, the company creates focus. When tradeoffs must happen, activities not aligned with high-level objectives get deprioritized or deferred.
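The leaf-to-root idea can be sketched in code. Below is a minimal Python sketch (the tree shape and all objective names are hypothetical, reusing the add-to-cart example) that walks an OKR tree and returns the chain connecting a team-level activity back to the company objective:

```python
from dataclasses import dataclass, field

@dataclass
class Objective:
    """An outcome with supporting key results; a key result may itself
    be the parent of subordinate objectives, forming the OKR 'tree'."""
    name: str
    key_results: list = field(default_factory=list)

def path_to_root(node, target, trail=None):
    """Return the leaf-to-root chain for `target`, showing how a
    team-level activity nests into the company-level objective."""
    trail = (trail or []) + [node.name]
    if node.name == target:
        return list(reversed(trail))
    for kr in node.key_results:
        found = path_to_root(kr, target, trail)
        if found:
            return found
    return None

company = Objective("Improve Add-To-Cart/Search ratio by 10%", [
    Objective("Personalization drives 3% improvement", [
        Objective("Ship recommendations service v1"),
    ]),
    Objective("Re-targeting drives 4% improvement"),
    Objective("Better search meta-tags drive 3% improvement"),
])

print(path_to_root(company, "Ship recommendations service v1"))
```

Following the chain from leaf to root is how a teammate answers “why am I working on this?” – each hop up the tree is a causal step toward the company objective.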
Geoffrey Moore outlines an approach for product organizations to stay focused in their product development efforts. When combined with the notion of a Minimum Viable Product the approach is to stay focused on a single product, initially small, focused on the needs of the pioneers within the technology adoption lifecycle (TALC) for a single target market or industry.
The single product for a single industry (P1T1) or need is the headpin of the bowling alley. The company maintains focus on this until such time as they gain significant adoption within the TALC – ideally a beachhead in the early majority of the TALC.
Only after significant adoption through the TALC (above) does the company then introduce the existing product to target market 2 (P1T2) and begin work on product 2 (or significant extension of product 1) in target market 1 (P2T1).
While OKRs and the Bowling Alley help create focus, Agile product methodologies help product and engineering teams maintain flexibility and agility in development. Epics and stories map to key results within the OKR framework. Short duration development cycles help limit the loss in effort associated with changing key results and help to provide feedback as to whether the current path is likely to meet the objectives and key results within OKRs. Backlogs visible to any Agile team are deep enough to allow for grooming and sizing, but shallow enough such that churn and the resulting morale impact do not jeopardize the velocity of development teams.
Putting it all together:
There is no discrepancy between agility and focus if you:
- Agree to shared definitions of both agility and focus per above
- Jointly agree that both agility and focus are necessary
- Implement OKRs to aid with both agility and focus
- Employ an Agile methodology for product and product development
- Use the TALC in your product management efforts and to help enforce focus on winning markets
September 6, 2019 | Posted By: Pete Ferguson
In many of our technical due diligence engagements, it is common to find that companies are expending considerable development effort (and ongoing maintenance) building tools for something that is not part of their core strength and thus does not provide a competitive advantage. What criteria does your organization use in deciding when to build vs. buy?
If you perform a simple web search for “build vs. buy” you will find hundreds of articles, process flows, and decision trees on when to build and when to buy. Many of these are cost-centric decisions including discounted cash flows for maintenance of internal development and others are focused on strategy. Some of the articles blend the two.
We have many examples of customers developing load balancing software, building their own databases, etc. In nearly every case, a significant percentage of the engineering team (and engineering cost) goes into a solution that:
- Does not offer long term competitive differentiation
- Costs more than purchasing an existing product
- Steals focus away from the engineering team
- Is not aligned with the skills or business level outcomes of the team
If You Can’t Beat Them - Join Them
(or buy, rent, or license from them)
Here is a simple set of questions that we often ask our customers to help them with the build v. buy decision:
1. DOES THIS “THING” (PRODUCT / ARCHITECTURAL COMPONENT / FUNCTION) CREATE STRATEGIC DIFFERENTIATION IN OUR BUSINESS?
Shiny object distraction is a very real thing we observe regularly. Companies start - innocently enough - building a custom tool in a pinch to get them by, but never go back and reassess the decision. Over time the solution snowballs and consumes more and more resources that should be focused on innovating strategic differentiation.
- We have yet to hear a tech exec say “we just have too many developers, we aren’t sure what to do with them.”
- More often than not “resource constraints” is mentioned within the first few hours of our engagements.
- If building instead of buying is going to distract from focusing efforts on the next “big thing” – then 99% of the time you should just stop here and attempt to find a packaged product, open-source solution, or outsourcing vendor to build what you need.
If, after reviewing these points, the answer is “Yes, it will provide a strategic differentiation,” then proceed to question 2.
2. ARE WE THE BEST COMPANY TO BUILD THIS “THING”?
This question helps inform whether you can effectively build it and achieve the value you need. This is a “core v. context” question; it asks both whether your business model supports building the item in question and also if you have the appropriate skills to build it better than anyone else.
For instance, if you are a social networking site, you probably don’t have any business building relational databases for your own use. Go to question number (3) if you can answer “Yes” to this question and stop here and find an outside solution if the answer is “No”.
And please, don’t fool yourself – if you answer “Yes” because you believe you have the smartest people in the world (and you may), do you really need to dilute their efforts by focusing on more than just the things that will guarantee your success?
3. ARE THERE FEW OR NO COMPETING PRODUCTS TO THIS “THING” THAT YOU WANT TO CREATE?
We know the question is awkwardly worded – but the intent is to be able to exit these four questions by answering “yes” everywhere in order to get to a “build” decision.
- If there are many providers of the “thing” to be created, it is a potential indication that the space might become a commodity.
- Commodity products differ little in feature sets over time and ultimately compete on price which in turn also lowers over time.
- A “build” decision today will look bad tomorrow as features converge and pricing declines.
If you answer “Yes” (i.e. “Yes, there are few or no competing products”), proceed to question (4).
4. CAN WE BUILD THIS “THING” COST EFFECTIVELY?
- Is it cheaper to build than buy when considering the total lifecycle (implementation through end-of-life) of the “thing” in question? Many companies use cost as a justification, but all too often they miss the key point of how much it costs to maintain a proprietary “thing”, “widget”, “function”, etc.
- If your business REALLY grows and is extremely successful, do you want to be continuing to support internally-developed monitoring and logging solutions, mobile architecture, payments, etc. through the life of your product?
Don’t fool yourself into answering this affirmatively just because you want to work on something “neat.” Your job is to create shareholder value – not work on “neat things” – unless your “neat thing” creates shareholder value.
There are many more complex questions that can be asked and may justify the building rather than purchasing of your “thing,” but we feel these four questions are sufficient for most cases.
A “build” decision is indicated when the answers to all 4 questions are “Yes.”
We suggest seriously considering buying or outsourcing (with appropriate contractual protection when intellectual property is a concern) anytime you answer “No” to any question above.
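The four questions can be summarized as a simple gate: a single “No” ends the evaluation at “buy or outsource.” A minimal sketch of that flow (the function and parameter names are ours, not a standard API):

```python
def build_vs_buy(strategic_differentiation, best_positioned_to_build,
                 few_or_no_competing_products, cost_effective_over_lifecycle):
    """Return 'build' only when all four questions are answered 'Yes';
    any single 'No' indicates buying, renting, licensing, or outsourcing."""
    answers = [strategic_differentiation, best_positioned_to_build,
               few_or_no_competing_products, cost_effective_over_lifecycle]
    return "build" if all(answers) else "buy or outsource"

# A homegrown load balancer rarely differentiates a social networking
# product, so the gate closes at question 1:
print(build_vs_buy(False, True, False, False))
print(build_vs_buy(True, True, True, True))
```

The ordering of the questions matters in practice – strategic differentiation is the cheapest question to answer and eliminates most candidates – but as a pure gate, any “No” produces the same decision.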
While startups and small companies roll their own tools early on to get product out the door, as they grow, the planning horizon (and related costs) needs to extend from the next sprint to a longer-term annual and multi-year strategy. That, plus growth, tips the scale toward buy instead of build. The more internal products produced and supported, the more tech debt accrues, distracting medium-to-large organizations from competing against the next startup.
While building custom tools and products seems to make sense in the immediate term, the long-term strategy and desired outcomes of your organization need to be fully weighted in the decision process. Distraction from focus is the number one harm we have seen with our clients as they fall behind the competition and burn sprint cycles maintaining products that don’t move the needle with their customers. The crippling cost of distraction is what causes successful companies to lose their competitive advantage and slip into oblivion.
Like the ugly couch your auntie gave you for your first apartment, it can often be difficult to assess what makes sense without an outside opinion. Contact us, we can help!
September 2, 2019 | Posted By: Greg Fennewald
As a company matures from a startup to a growing business, there are a number of measurables that become table stakes – basic tools for managing a business. These measurables include financial reporting statements, departmental budgets, KPIs, and OKRs. Another key measurable is the availability of your product or service and this measurable should be owned by the technology team.
When we ask clients about availability goals or SLAs, some do not have one documented and say something along the lines of “we want our service to always be available”. While a nice sentiment, unblemished availability is virtually impossible to achieve and prohibitively expensive to pursue. Availability goals must be relevant to the shared business outcomes of the company.
If you are not measuring availability, start. If nothing else, the data will inform what your architecture and process can do today, providing a starting point if the business chooses to pursue availability improvements.
Some clients who do have an availability measurable use a percentage of clock time – 99.95% for example. This is certainly better than no measurable at all, but still leaves a lot to be desired.
Reasons why clock time is not the best measure for availability:
- Units of time are not equal in terms of business impact – a disruption during the busiest part of the day is worse than an issue during a slow period. This is intuitively understood: many companies schedule maintenance windows late at night or early in the morning, periods when the impact of disruption is smaller.
- The business communicates in business terms (revenue, cost, margin, return on investment) and these terms are measured in dollars, not clock time.
- Using the uptime figure from a server or other infrastructure component as an availability measure is inaccurate because it does not capture software bugs or other issues rendering your service inoperative despite the server uptime status.
Now that we’ve established that availability should be measured and that clock time is not the best unit of measure, what is a better choice? Transactional metrics aligned to the desired business outcome are the better choice.
- Rates – log transactional rates such as logins, add to cart, registration, downloads, orders, etc. Apply Statistical Process Control or other analysis methods to establish thresholds indicating an unusual deviation in the transaction rate.
- Ratios – the proportion of undesired or failed outcomes such as failed logins, abandoned shopping carts, and HTTP 400s can be useful for measuring the quality of service. Analysis of such ratios will establish unusual deviation levels.
- Patterns – transaction patterns can identify expected activity, such as order rates increasing when an item is first available for sale or download rates increasing in response to a viral social media video. The absence of an expected pattern change can signal an availability issue with your product or service.
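As a rough illustration of the Statistical Process Control idea applied to rates, one can baseline a transaction rate and flag observations beyond a few standard deviations. A minimal sketch (the login-rate numbers are hypothetical, and a production implementation would also account for seasonality, trend, and day-of-week effects):

```python
import statistics

def control_limits(samples, sigmas=3):
    """SPC-style limits: flag rates outside mean +/- N standard
    deviations of the historical baseline for this time period."""
    mean = statistics.fmean(samples)
    sd = statistics.stdev(samples)
    return mean - sigmas * sd, mean + sigmas * sd

# Hypothetical logins-per-minute for the same minute on prior days:
baseline = [980, 1010, 995, 1005, 990, 1000, 1020]
low, high = control_limits(baseline)

current_rate = 610  # this minute's observed login rate
if not (low <= current_rate <= high):
    print(f"Login rate {current_rate}/min outside "
          f"[{low:.0f}, {high:.0f}] - investigate")
```

The same limits apply equally well to the ratio metrics above (e.g. failed logins / total logins); the key is that the threshold comes from the data itself rather than an arbitrary clock-time target.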
Alignment with Desired Outcomes
What are the goals of your business? What is your value proposition? Choose metrics that comprehensively measure the availability of your product or service: the ability of a customer to buy a product from your website (login, search, add to cart, and check out); the proportion of file downloads completed successfully in less than 4 seconds; the success rate of posting a message to a social media platform and the ability of others to view it. Measuring availability with metrics aligned with the desired outcomes keeps the big picture at the forefront and helps business colleagues understand how the technology team contributes to value creation.
Not measuring availability is bad. Measuring it in clock time is better, but still leaves something to be desired. Measuring availability with transactional metrics tied to the desired business outcome is best. Don’t settle for better when you can be best.
Interested in learning more? Struggling with analyzing data? Unsure of how to apply architectural principles to achieve higher availability? Contact us, we’ve been in your shoes.
(Image Credit: Sarah Pflug from Burst)
August 21, 2019 | Posted By: Bill Armelin
At AKF Partners, we believe in learning aggressively, not just from your successes, but also your failures. One common failure we see is service-disrupting incidents. These are the events that either make your systems unavailable or significantly degrade performance for your customers. They result in lost revenue, poor customer satisfaction, and hours of lost sleep. While there are many things we can do to reduce the probability of an incident occurring or the impact if it does happen, we know that all systems fail.
We like to say, “An incident is a terrible thing to waste.” The damage is already done. Now, we need to learn as much as we can about the causes of the incident to prevent the same failures from happening again. A common process for determining the causes of failure and preventing them from reoccurring is the postmortem. In the Army, it is called an After-Action Review. In many companies it is called a Root Cause Analysis. It doesn’t matter what you call it, as long as you do it.
We actually avoid using the term Root Cause Analysis. Many of our clients that use it focus too much on finding that one “root cause” of the issue. There will never be a single cause of an incident. There will always be a chain of problems with a trigger or proximate event. This is the one event that causes the system to finally topple over. We need a process that digs into the entire chain of events, inclusive of the trigger. This is where the postmortem comes in. It is a cross-functional brainstorming meeting that not only identifies the contributing causes of a problem, but also helps identify issues with process and training.
Postmortem Process – TIA
The purpose of a good postmortem is to find all of the contributing events and problems that caused an incident. We use a simple three-step process called TIA. TIA stands for Timeline, Issues, and Actions.
First, we create a timeline of events leading up to the issue, as well as a timeline of all the actions taken to restore service. There are multiple ways to collect the timeline of events. Some companies have a scribe who records events during the incident process. Increasingly, we are seeing companies use chat tools like Slack to record events related to restoration; the timestamp on each Slack message is a good place to extract the timeline. Don’t start your timeline at the beginning of the incident. It starts with the activities prior to the incident that caused the triggering event (e.g. a code deployment). During the postmortem meeting, augment the timeline with additional details.
The second part of TIA is Issues. This is where we walk through the timeline and identify issues. We want to focus on people, process, and technology. We want to capture all of the things that either allowed the incident to happen (e.g. lack of monitoring), directly triggered it (e.g. a code push), or increased the time to restore the system to a stable state (e.g. couldn’t get the right people on the call). List each issue separately. At this point, there is no discussion about fixing issues; we focus only on the timeline and identifying issues. There is also no reference to ownership. We also don’t want to assign blame. We want a process that provides constructive feedback to solve problems.
Avoid the tendency to find a single triggering event and stop. Make sure you continue to dig into the issues to determine why things happened the way they did. We like to use the “5-whys” methodology to explore root causes. This entails repeatedly asking questions about why something happened. The answer to one question becomes the basis for the next. We continue to ask why until we have identified the true causes of the problems.
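The 5-whys chain can be captured as a simple iteration in which each answer becomes the subject of the next question. A minimal sketch, using an entirely hypothetical incident and answers of the kind a postmortem meeting would surface:

```python
def five_whys(problem, answer_fn, max_depth=5):
    """Chain 'why?' questions: each answer becomes the basis of the
    next question, surfacing the full chain of contributing causes."""
    chain = [problem]
    for _ in range(max_depth):
        answer = answer_fn(chain[-1])
        if answer is None:  # no deeper cause identified
            break
        chain.append(answer)
    return chain

# Hypothetical cause chain captured during the postmortem meeting:
causes = {
    "Checkout was down for 40 minutes":
        "A code push exhausted database connections",
    "A code push exhausted database connections":
        "The new feature opened a connection per request",
    "The new feature opened a connection per request":
        "No connection-pooling standard exists for new services",
    "No connection-pooling standard exists for new services":
        "Architecture principles are not documented or reviewed",
}
for step in five_whys("Checkout was down for 40 minutes", causes.get):
    print(step)
```

Note how the chain passes through the proximate trigger (the code push) and keeps going: the last two answers are process issues, not technology issues, which is exactly the kind of finding a single-root-cause mindset misses.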
Here is a summary of anti-patterns we see when companies conduct postmortems:
| Anti-Pattern | Remedy |
| --- | --- |
| Not conducting a postmortem after a serious (e.g. Sev 1) incident | Conduct a postmortem within a week after a serious incident; avoid blame and keep it constructive |
| Not having the right people involved | Assemble a cross-functional team of the people involved in or needed to resolve the problems |
| Using a postmortem block (e.g. multiple postmortems during a 1-hour session every two weeks) | Dedicate time for each postmortem based on the severity of the incident |
| Lack of ownership of identified tasks | Make one person accountable for completing each task within an appropriate timeframe |
| Not digging far enough into issues (finding a single root cause) | Use the 5-Whys methodology to identify all of the causes of an issue |
Incidents will always happen. What you do after service restoration will determine if the problem occurs again. A structured, timely postmortem process will help identify the issues causing outages and help prevent their reoccurrence in the future. It also fosters a culture of learning from your mistakes without blame.
Are you struggling with the same issues impacting your site? Do you know you should be conducting postmortems but don’t know how to get started? AKF can help you establish critical incident management and postmortem processes. Call us – we can help!
July 11, 2019 | Posted By: Marty Abbott
Attempting to transform a company to compete effectively in the Digital Economy is difficult to say the least. In the experience of AKF Partners, it is easier to be “born digital” than to transform a successful, long tenured business, to compete effectively in the Digital age.
There is no single guaranteed fail-safe path to transformation. There are, however, 10 principles by which you should abide and 3 guaranteed paths to failure.
Avoid these 3 common mistakes at all costs or suffer a failed transformation.
Having the Wrong Team and the Wrong Structure
If you have a successful business, you very likely have a very bright and engaged team. But unless a good portion of your existing team has run a successful “born digital” business, or better yet transformed a business in the digital age, they don’t have the experience necessary to complete your transformation in the timeframe necessary for you to compete. If you needed lifesaving surgery, you wouldn’t bet your life on a doctor learning “on the job”. At the very least, you’d ensure that doctor was alongside a veteran and more than likely you would find a doctor with a successful track record of the surgery in question. You should take the same approach with your transformation.
This does not mean that you need to completely replace your team. Companies have been successful with organization strategies that include augmenting the current team with veterans. But you need new, experienced help, as employees on your team.
Further, to meet the need for speed in the new digital world, you need to think differently about how you organize. The best, fastest-performing digital teams organize themselves around the outcomes they hope to achieve, not the functions that they perform.
It also helps to hire a firm that has helped guide companies through a transformation. AKF Partners can help.
Planning Instead of Doing
The digital world is ever evolving. Plans that you make today will be incorrect within 6 months. In the digital world, no plan survives first contact with the enemy. In the old days of packaged software and brick and mortar retail, we had to put great effort into planning to reduce the risk associated with being incorrect after rather long lead times to project completion. In the new world, we can iterate nearly at the speed of thought. Whereas being incorrect in the old world may have meant project failure, in the new world we strive to be incorrect early such that we can iterate and make the final solution correct with respect to the needs of the market. Speed kills the enemy.
Eschew waterfall models, prescriptive financial models and static planning in favor of Agile methodologies, near term adaptive financial plans and OKRs. Spend 5 percent of your time planning and 95% of your time doing. While in the doing phase, learn to adapt quickly to failures and quickly adjust your approach to market feedback and available data.
The successful transformation starts with a compelling vision that is outcome based, followed by a clear near-term path of multiple small steps. The remainder of the path is unclear as we want the results of our first few steps to inform what we should do in the next iteration of steps to our final outcome. Transformation isn’t one large investment, but a series of small investments, each having a measurable return to the business.
Knowing Instead of Discovering
Few companies thrive by repeatedly being smarter than the market. In fact, the opposite is true – the Digital landscape is strewn with the corpses of companies whose hubris prevented them from developing the real-time feedback mechanisms necessary to sense and respond to changing market dynamics. Yesterday’s approaches to success have at best diminishing returns today and at worst put you at a competitive disadvantage.
Begin your journey as a campaign of exploration. You are finding the best path to success, and you will do it by ensuring that every solution you deploy is instrumented with sensors that help you identify the efficacy of the solution in real time. Real-time data allows us to inductively identify patterns that form specific hypotheses. We then deductively test these hypotheses through comparatively low-cost solutions, the results of which help inform further induction. This circle of induction and deduction propels us through our journey to success.
July 10, 2019 | Posted By: Bill Armelin
We are surprised at how often we go into a client and find that management does not have any metrics for their teams. The managers respond that they don’t want to negatively affect the team’s autonomy or that they trust the team to do the right things. While trusting your teams is a good thing, how do you know what they are doing is right for the company? How can you compare one team to another? How do you know where to focus on improvements?
Recently, we wrote an article about team autonomy, discussing how an empowered team is autonomous within a set of constraints. The article creates an analogy to driving a car, with the driver required to reach a specific destination, but empowered to determine WHAT path to take and WHY she takes it. She has gauges, such as a speedometer to give feedback on whether she is going too fast or too slow. Imagine driving a car without a speedometer. You will never know if you are sticking to the standard (the speed limit) or when you will get to where you need to go (velocity).
As a manager, it is your responsibility to set the appropriate metrics to help your teams navigate through the path to building your product. How can you hold your teams to certain goals or standards if you can’t tell them where they are in relation to the goal or standard today? How do you know if the actions you are taking are creating or improving shareholder value?
What metrics do you set for your teams? It is an important question. Years ago, while working at a Big 6 consulting firm, I had the pleasure of working with a very astute senior manager. We were redesigning manufacturing floors into what became Lean Manufacturing. He would walk into a client and ask them what the key metrics were. He would then proceed to tell them what their key issues were. He was always right. With metrics, you get what you measure. If you align the correct metrics with key company goals, then all is great. If you misalign them, you end up with poor performance and questionable behaviors.
So, what are the right metrics for a technology team? In 2017, we published an article on what we believe are the engineering metrics by which you should measure your teams. Some of the common metrics we focused on were velocity, efficiency, and cost. At first glance, these might seem “big brother-ish.” In reality, however, these metrics provide your engineering teams with critical feedback on how they are doing. Velocity helps a team identify structural defects within the team (and should not be used to compare teams against each other or push them to get more done). Efficiency helps teams identify where they are losing precious development time to less valuable activities, such as meetings, interviews, and HR training. It helps them and their managers quantify the impact of non-development activities and reduce them.
Cost helps the team identify how much they are spending on technology. We have seen this metric used particularly effectively in companies deploying to the cloud. Many companies allow cloud spending to increase significantly and uncontrollably as they grow. Looking at costs exposes things like the need for autoscaling to reduce the number of instances required during off-peak times, or the need to purge unused instances that should be shut down.
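To make that concrete, here is a rough back-of-the-envelope sketch of what scaling in during off-peak hours can save. The instance counts and hourly rate below are entirely hypothetical, not drawn from any client or cloud provider price list:

```python
# Hypothetical illustration: monthly savings from scaling in during
# off-peak hours versus running peak capacity 24x7.
HOURLY_RATE = 0.34        # assumed cost per instance-hour (USD)
PEAK_INSTANCES = 40       # instances needed during the 12 peak hours
OFF_PEAK_INSTANCES = 12   # instances actually needed off-peak
DAYS = 30

def monthly_cost(peak, off_peak, days=DAYS):
    """Cost with autoscaling: peak count 12 h/day, off-peak count otherwise."""
    return (peak * 12 + off_peak * 12) * HOURLY_RATE * days

always_on = PEAK_INSTANCES * 24 * HOURLY_RATE * DAYS
autoscaled = monthly_cost(PEAK_INSTANCES, OFF_PEAK_INSTANCES)
print(f"24x7 at peak capacity:  ${always_on:,.2f}")
print(f"With off-peak scale-in: ${autoscaled:,.2f}")
print(f"Monthly savings:        ${always_on - autoscaled:,.2f}")
```

Even with made-up numbers, the point survives: unused off-peak capacity is a recurring cost your teams can see and act on only if someone is measuring it.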
The key to keeping metrics from being perceived as overbearing is to keep them transparent. The teams must understand the purpose of each metric and how it is calculated. Don't use them punitively. Use them to help the teams understand how they are doing in relation to the larger goals. How do you align higher-level company goals to the work your teams are performing? We like to use Objectives and Key Results, or OKRs. This concept was created by Andy Grove at Intel and brought to Google by John Doerr. The framework aims to align higher-level “objectives” to measurable “key results.” An objective at one level has several key results. These key results become the objectives for the next level down and define another set of key results at that level. This continues all the way down to the lowest levels of the company, resulting in alignment of key results and objectives across the entire company.
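As a sketch of that cascade, with entirely made-up goals, the tree of objectives and key results might look like this:

```python
# Illustrative sketch (the goals below are invented, not from the article):
# a key result at one level becomes an objective of the level below,
# cascading alignment from the company down to individual teams.
company_okr = {
    "objective": "Grow recurring revenue 20%",
    "key_results": [
        {
            # This key result becomes the product org's objective...
            "objective": "Reduce monthly churn from 5% to 3%",
            "key_results": [
                # ...and its key results become engineering-team objectives.
                {"objective": "Cut p95 page load time below 2s", "key_results": []},
                {"objective": "Raise availability to 99.95%", "key_results": []},
            ],
        },
    ],
}

def cascade_lines(okr, level=0):
    """Flatten the OKR tree into indented lines showing the alignment."""
    lines = ["  " * level + okr["objective"]]
    for kr in okr["key_results"]:
        lines.extend(cascade_lines(kr, level + 1))
    return lines

print("\n".join(cascade_lines(company_okr)))
```

Walking the tree this way makes the alignment visible: every team-level objective traces back to exactly one company objective.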
Choosing the Right Metric
Metrics-driven institutions demonstrably outperform those that rely on intuition or “gut feel.” That said, poorly chosen metrics, or simply too many metrics, may hinder performance. Good metrics are:
- A handful of carefully chosen metrics. Choose a small number of key metrics over a large volume. Ideally, each Agile team should be tasked with improving 2-3 metrics (no more than 5). (Of note, numerous psychological studies have shown that the quality of decision-making actually decreases when too much information is presented.)
- Easy to collect and/or calculate. A metric such as “Number of Customer Service Tickets per Week,” although crude, is better than “Engineer Hours Spent Fixing Service,” which requires costly time and effort to collect.
- Directly Controllable by the Team. Assigning a metric such as “Speed and Accuracy of Search” to a Search Service is preferred to “Overall Revenue” which is less directly controllable.
- Reflect the Quality of Service. The number of abandoned shopping carts reflects the quality of a Shopping Cart service, whereas number of shopping cart views is not necessarily reflective of service quality.
- Difficult to Game. The innate human tendency to game any system should be held in check by selecting the right metrics. Simple velocity measures are easily gamed while the number of Sev 1 incidents cannot be easily gamed.
- Near Real-Time Feedback. Metrics that can be collected and presented over short time intervals are most desirable. Information is most valuable when fresh — week-over-week availability is better than a yearly availability measure.
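One lightweight way to apply these criteria is to score each candidate metric against them and keep only the top handful. The candidate metrics and 0-2 scores below are invented examples, not a prescribed rubric:

```python
# Illustrative sketch: scoring hypothetical candidate metrics against the
# criteria above (scores 0-2 per criterion are invented for illustration).
candidates = {
    "Sev 1 incidents per month": {
        "easy_to_collect": 2, "controllable": 1, "reflects_quality": 2,
        "hard_to_game": 2, "near_real_time": 1,
    },
    "Raw story-point velocity": {
        "easy_to_collect": 2, "controllable": 2, "reflects_quality": 1,
        "hard_to_game": 0, "near_real_time": 2,  # easily gamed
    },
    "Overall company revenue": {
        "easy_to_collect": 1, "controllable": 0, "reflects_quality": 0,
        "hard_to_game": 2, "near_real_time": 0,  # not team-controllable
    },
}

def shortlist(cands, keep=2):
    """Keep only a handful of the highest-scoring metrics."""
    ranked = sorted(cands, key=lambda name: sum(cands[name].values()),
                    reverse=True)
    return ranked[:keep]

print(shortlist(candidates))
```

The exercise forces the "few, controllable, hard to game" conversation before the metric ever appears on a dashboard.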
Managers are responsible for the performance of their teams in relation to the company's objectives and how they create shareholder value. Measuring how your teams perform against, or contribute to, those goals is only speculation if you don't have the correct measurements and metrics in place. The bottom line is, “If you are not measuring, you are not managing.”
Are you having difficulty defining the right metrics for your teams? Are you interested in defining OKRs but don't know where or how to get started? AKF has helped many companies identify and implement key metrics, as well as implement OKRs. We have over 200 years of combined experience helping companies ensure their organizations, processes, and architecture are aligned to the outcomes they desire. Contact us, we can help.
July 1, 2019 | Posted By: Greg Fennewald
As technology professionals, managing risk is an important part of the value we provide to the business. Risk can take many forms, including threats to availability, scalability, information security, and time to market. Physical layer risks from the data center realm can severely impact availability, as the events of the February 2019 Wells Fargo outage demonstrate.
Transitioning Away from On-Prem Hosting
Over the last decade, knowledge of data center architecture, operating principles, capabilities, and associated risks has decreased in general due to the rise of managed hosting and especially cloud hosting. This is particularly true for small and medium sized companies, which may have chosen cloud hosting early on and thus never have dealt with colocation or owned data centers. This is not necessarily a bad trend – why devote resources to learn domains that are not core to your value proposition?
While knowledge of data center geekdom may have decreased, the risks associated with data centers have not substantially changed. Even the magic pixie dust of cloud hosting is a data center at its core, albeit one run with a degree of operational excellence exceeding the stereotypical company-owned data center and colo combination.
Given that technologists can mitigate data center risks by choosing cloud hosting with a major provider capable of mastering data center operations, why spend any time learning about data center risks?
- Cloud hosting sites do encounter failures. The ability to ask informed questions during the vendor selection process can help optimize the availability for your business.
- Business or regulatory changes may force a company to use colocation to meet data residency or other requirements.
- A company may grow to the size where owning data centers makes business sense for a portion of their hosting need.
- A hosting provider could exit the business or face bankruptcy, forcing tenants to take over or move on short notice. Been there, done that, got the T shirt.
Data Center Lifespan Risk
For the purposes of this article, we will consider data center lifespan risk. We define this risk as the probability of an infrastructure failure causing significant, and possibly complete, business disruption and the level of difficulty in restoring functionality.
A chart of data center lifespan risk resembles a bathtub – a high level of failures as the site is first built and undergoing 5 levels of commissioning towards the left side of the chart, followed by a long period of lower threat that can extend 15 years or more. As time continues to march on, the risk rises again, creating the right-hand side of the bathtub curve.
The risk of failure increases over time as infrastructure components approach the end of their useful service life. Over a sufficiently long time span, the risk of failure approaches unity.
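A toy model of that bathtub curve can be built from a mix of Weibull hazard rates. The shape and scale parameters below are illustrative, not fitted to real failure data:

```python
# Illustrative bathtub curve: infant mortality (Weibull shape < 1) plus
# wear-out (shape > 1) plus a constant random-failure floor.
import math

def hazard(t_years, shape, scale):
    """Weibull hazard rate h(t) = (k/s) * (t/s)^(k-1)."""
    return (shape / scale) * (t_years / scale) ** (shape - 1)

def bathtub(t):
    infant = hazard(t, shape=0.5, scale=2.0)    # early commissioning failures
    wearout = hazard(t, shape=5.0, scale=18.0)  # components reaching end of life
    return infant + wearout + 0.02              # constant background failure rate

for year in (0.25, 5, 10, 15, 20, 25):
    print(f"year {year:>5}: hazard ~ {bathtub(year):.3f}")
```

The printed rates trace the shape described above: elevated during commissioning, low through the mid-life years, then climbing steeply as components wear out.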
Service Life Examples
Below are some service life estimates, based on our experience, for critical data center components that are properly maintained:
- UPS batteries – 4 years VRLA, 12+ years wet cell. Battery string monitoring strongly recommended.
- Generators – 12,000+ hours before overhaul, typically run 100 hours or less annually.
- Main switchgear PLC – 15+ years. PLC model EOL is the risk.
- CRAH/CRAC fan motors – the magic smoke wants to escape.
- Galvanized cooling tower wet surfaces – varies with water chemistry; stainless steel is worth the cost.
- Electrical distribution board – EOL of breaker style and PLC is the risk.
- Chilled water piping – design for continuous duty, ~7 FPS flow velocity.
All the above examples are measured in years. If you are in the early years of a data center lifespan, there’s not a lot to worry about other than batteries. Most growing companies are more concerned about adequate capacity, availability, and cost when they create their hosting strategy. Not much thought is given to an exit strategy. Such an effort is probably not worth it for a startup company, but established companies need to be thinking beyond next quarter and next year.
If your product or service can survive the loss of a single hosting site without impact (i.e., multi-active sites with validated traffic shifts), you can afford to run a bit deeper into the service life timeline. If you can't – or, like Wells Fargo, you thought you could but learned the hard way that you couldn't – you need to plan ahead to mitigate these risks.
As mentioned before, the risks we want to mitigate are an impactful failure and a complex restoration after a failure. By complex, we mean trying to find parts and trained technicians for components that were EOL 5 years ago and end of OEM support 18 months ago. Not a fun place to be. Would you feel comfortable running your online business with switches and routers that are EOL and EOS? Hopefully not. Why would you do so for your hosting location?
Mitigating the Risks
The best way to mitigate the risk of an impactful infrastructure failure is to be able to survive the loss of a hosting site, regardless of type, with business disruption that is acceptable to the business and customers. What is acceptable varies; your hosting solution should be tailored to the needs of the business.
Some thoughts on aging hosting sites:
- All the characteristics that make cloud hosting taste great and be less filling (containerization, automation, infrastructure as code, orchestration, etc.) can also make the effort to stand up a new site and exit an old one much less onerous.
- If you are committed to an owned data center or colo, moving to a newer site is the best choice. Could you combine a move with a tech refresh cycle? Could the aging data center fulfill a different purpose such as hosting development and QA environments? Such environments should have less business impact from a failure, and you can squeeze out the last few years of life from that site.
- You can purchase extra spare parts for components nearing EOL or EOS and send technicians to training courses. This can mitigate risk but is really analogous to convincing yourself that you can scale your DB by tuning the SQL queries. Viable only to add 6 or 12 months to a move/exit timeline.
Just about any of the components mentioned above in the useful life estimates can be replaced, especially if the data center can be shut down for weeks or months to make the replacement and test the systems. Trying to replace components while still serving traffic is extremely risky. Very few data centers have the redundancy to replace electrical components while still providing conditioned power and cooling to the server rooms, and those that do usually cannot do so without reducing their availability. We've had to take a dual UPS (2N) site down to a single UPS source (N) for a week to correct a serious design flaw. Running single-corded is not appropriate unless your DR plan exists only to check an audit box.
The tremendous popularity of cloud hosting does not alleviate the need to understand physical layer risks, including data center lifespan risks. Understanding them enables technology leaders to mitigate the risks.
Interested in learning more? Need assistance with hosting strategy? Considering a transition to SaaS? AKF Partners can help.
June 19, 2019 | Posted By: Larry Steinberg
Transforming a traditional on-premises product and company to a SaaS model is currently in vogue and has broad-reaching benefits for both producers and consumers of services. These benefits span financials, supportability, and consumption simplification.
In order to achieve the SaaS benefits, your company must address the broad changes necessitated by SaaS and not just the product delivery and technology changes. You must consider the impact on employees, delivery, operations/security, professional services, go to market, and financial impacts.
The employee base is one key element of change – moving from a traditional ‘boxed’ software vendor to an ‘as a service’ company changes not only the skill set required but also the dynamics of engagement with your customers. Traditionally, staff have been accustomed to an arm's-length approach for the majority of customer interactions: a factory that builds and bundles the software, with a small group (if any) who ‘touch’ the customer. As you move to an ‘as a service’ product, the onus of ensuring the solution is available 24x7 is on everyone within your SaaS company. Not only do you require the skill sets to ensure infrastructure and reliability – the support model can and should change as well.
Now that you are building SaaS solutions and deploying them into environments you control, operations are much ‘closer’ to the engineers who deploy builds. Version upgrades happen faster, defects surface more rapidly, and engineers can build in monitoring to detect issues before or as your customers encounter them. Fixes can be provided much faster than in the past, and strict separation of support and engineering organizations is no longer warranted. Escalations across organizations and separate repro steps can be collapsed. There is a significant cultural shift for your staff that has to be managed properly. It will not be natural for legacy staff to adopt a 24x7 mindset, and newly minted SaaS engineers likely don't have the industry or technology experience needed. Finding a balance of shifting culture, training, and bringing new ‘blood’ into the organization is the best course of action.
Passion For Service Delivery
Having services available 24x7 requires a real passion for service delivery. Teams no longer have to wait for escalations from customers; engineers now control the product and operating environment, which means they can see availability and performance in real time. Staff should be proactive about identifying issues or abnormalities in the service. To do this, the health and performance of what they built needs to be at the forefront of their everyday activities. This mindset is very different from the traditional on-premises software delivery model.
Shifting the operations from your customer environments to your own environments also has a security aspect. Operating the SaaS environment for customers shifts the liability from them to you. The security responsibilities expand to include protecting your customer data, hardening the environments, having a practiced plan of incident remediation, and rapid response to identified vulnerabilities in the environment or application.
Finance & Accounting
Finance and accounting are also impacted by this shift to SaaS – both the spend/capitalization strategy and the cost recovery models. The operational and security components are a large cost consideration that has shifted from your customers to your organization and needs to be modeled into SaaS financials. Pricing and licensing shifts are also very common. Moving to a utility or consumption model is fairly mainstream but is generally new for the traditional product company and its customers. Traditional billing models with annual invoices might not fit the new approach, and systems and processes will need to be enhanced to handle it. If you move to a utility-based model, the product and accounting teams need to partner on a solution to ensure you get paid appropriately by your customers.
Think through the impacts on your customer support team. Given the speed at which new releases and fixes become available, the support team will need a new model to remain up to date; delivery timeframes will be much more rapid than in the past, and support must stay ahead of your customer base.
Your go to market strategies will most likely also need to be altered depending on the market and industry. To your benefit, as a SaaS company, you now have access to customer behavior and can utilize this data in order to approach opportunities within the customer base. Regarding migration, you’ll need a plan which ensures you are the best option amongst your competitors.
Most times at AKF we see companies who have only focused on product and technology changes when moving to SaaS but if the whole company doesn’t move in lockstep then progress will be limited. You are only as strong as your weakest link.
We’ve helped companies of all sizes transition their technology – AND organization – from on-premises to the cloud through SaaS conversion. Give us a call – we can help!
May 29, 2019 | Posted By: AKF
VMs vs Containers
Inefficiency and downtime have traditionally kept CTOs and IT decision makers up at night. Now new challenges are emerging, driven by infrastructure inflexibility and vendor lock-in, limiting technology organizations and making strategic decisions more complex than ever. Both VMs and containers can help you get the most out of available hardware and software resources while easing the risk of vendor lock-in.
Containers are the new kids on the block, but VMs have been, and continue to be, tremendously popular in data centers of all sizes. That said, the first lesson to learn is that containers are not virtual machines. When I was first introduced to containers, I thought of them as lightweight or trimmed-down virtual instances. This comparison made sense, since most advertising material leaned on the idea that containers use less memory and start much faster than virtual machines – basically marketing containers as VMs. Everywhere I looked, Docker was comparing itself to VMs. No wonder I was a bit confused when I started to dig into the benefits of and differences between the two.
As containers have evolved, they have brought forth abstraction capabilities that are now being broadly applied to make enterprise IT more flexible. Thanks to the rise of Docker containers, it's now possible to more easily move workloads between different versions of Linux as well as orchestrate containers to create microservices. Much like the container, the microservice is not a new idea either – the concept harkens back to service-oriented architecture (SOA). What is different is that microservices based on containers are more granular and simpler to manage. More on this topic in a blog post for another day!
If you’re looking for the best solution for running your own services in the cloud, you need to understand these virtualization technologies, how they compare to each other, and what are the best uses for each. Here’s our quick read.
VM’s vs. Containers – What’s the real scoop?
One way to think of containers vs. VMs is that while VMs run several different operating systems on one server, container technology offers the opportunity to virtualize the operating system itself.
Figure 1 – Virtual Machine. Figure 2 – Container.
VMs help reduce expenses. Instead of running an application on a single server, a virtual machine enables one physical resource to do the job of many. Therefore, you do not have to buy, maintain, and service several servers. Because there is one host machine, you can efficiently manage all the virtual environments with a centralized tool – the hypervisor. The decision to use VMs is typically made by the DevOps/infrastructure team. Containers help reduce expenses as well, and they are remarkably lightweight and fast to launch. Because of their small size, you can quickly scale containers in and out, adding identical containers as needed.
Containers are excellent for Continuous Integration and Continuous Deployment (CI/CD) implementation. They foster collaborative development by distributing and merging images among developers. Therefore, developers tend to favor Containers over VMs. Most importantly, if the two teams work together (DevOps & Development) the decision on which technology to apply (VMs or Containers) can be made collaboratively with the best overall benefit to the product, client and company.
What are VMs?
The operating systems and their applications share hardware resources from a single host server, or from a pool of host servers. Each VM requires its own underlying OS, and the hardware is virtualized. A hypervisor, or a virtual machine monitor, is software, firmware, or hardware that creates and runs VMs. It sits between the hardware and the virtual machine and is necessary to virtualize the server.
IT departments, both large and small, have embraced virtual machines to lower costs and increase efficiencies. However, VMs can take up a lot of system resources, because each VM needs a full copy of an operating system AND a virtual copy of all the hardware the OS needs to run. This quickly adds up to a lot of RAM and CPU cycles. And while this is still more economical than bare metal for many applications, it is overkill for others – and thus, containers enter the scene.
Benefits of VMs
• Reduced hardware costs from server virtualization
• Multiple OS environments can exist simultaneously on the same machine, isolated from each other.
• Easy maintenance, application provisioning, availability and convenient recovery.
• Perhaps the greatest benefit of server virtualization is the capability to move a virtual machine from one server to another quickly and safely. Backing up critical data is done quickly and effectively because you can effortlessly create a replication site.
Popular VM Providers
• VMware vSphere ESXi - VMware has been active in the virtualization space since 1998 and is an industry leader, setting standards for reliability, performance, and support.
• Oracle VM VirtualBox - Not sure what operating systems you are likely to use? Then VirtualBox is a good choice because it supports an amazingly wide selection of host and client combinations. VirtualBox is powerful, comes with terrific features and, best of all, it’s free.
• Xen - Xen is the open source hypervisor included in the Linux kernel and, as such, it is available in all Linux distributions. The Xen Project is one of the many open source projects managed by the Linux Foundation.
• Hyper-V - Microsoft's virtualization platform, or ‘hypervisor’, which enables administrators to make better use of their hardware by virtualizing multiple operating systems to run on the same physical server simultaneously.
• KVM - Kernel-based Virtual Machine (KVM) is an open source virtualization technology built into Linux. Specifically, KVM lets you turn Linux into a hypervisor that allows a host machine to run multiple, isolated virtual environments called guests or virtual machines (VMs).
What are Containers?
Containers are a way to wrap up an application into its own isolated “box.” The application in its container has no knowledge of any other applications or processes that exist outside of its box. Everything the application depends on to run successfully also lives inside this container. Wherever the box may move, the application will always be satisfied because it is bundled up with everything it needs to run.
Containers virtualize the OS instead of virtualizing the underlying computer like a virtual machine. They sit on top of a physical server and its host OS — typically Linux or Windows. Each container shares the host OS kernel and, usually, the binaries and libraries, too. Shared components are read-only. Sharing OS resources such as libraries significantly reduces the need to reproduce the operating system code and means that a server can run multiple workloads with a single operating system installation. Containers are thus exceptionally light — they are only megabytes in size and take just seconds to start. Compared to containers, VMs take minutes to start and are an order of magnitude larger than an equivalent container.
In contrast to VMs, all that a container requires is enough of an operating system, supporting programs and libraries, and system resources to run a specific program. This means you can put two to three times as many applications on a single server with containers than you can with VMs. In addition, with containers, you can create a portable, consistent operating environment for development, testing, and deployment. This is a huge benefit to keep the environments consistent.
Containers help isolate processes through differentiation in the operating system namespace and storage. Leveraging operating system native capabilities, the container isolates process space, may create temporary file systems and relocate process “root” file system, etc.
Benefits of Containers
One of the biggest advantages of a container is that you can set aside fewer resources per container than you might per virtual machine. Keep in mind, containers are essentially for a single application, while virtual machines need resources to run an entire operating system. For example, if you need to run multiple instances of MySQL, NGINX, or other services, using containers makes a lot of sense. If, however, you need a full web server (LAMP) stack running on its own server, there is a lot to be said for running a virtual machine. A virtual machine gives you greater flexibility to choose your operating system and upgrade it as you see fit. A container, by contrast, isolates the configured application from OS upgrades on the host.
Popular Container Providers
1. Docker - Nearly synonymous with containerization, Docker is the name of both the world’s leading containerization platform and the company that is the primary sponsor of the Docker open source project.
2. Kubernetes - Google’s most significant contribution to the containerization trend is the open source containerization orchestration platform it created.
3. Microsoft Azure - Although much of the early work on containers was done on the Linux platform, Microsoft has fully embraced Docker, Kubernetes, and containerization in general. Azure offers two container orchestrators, Azure Kubernetes Service (AKS) and Azure Service Fabric. Service Fabric represents the next-generation platform for building and managing these enterprise-class, tier-1 applications running in containers.
4. Amazon Web Services (AWS) - Of course, Microsoft and Google aren't the only vendors offering a cloud-based container service. AWS has its own EC2 Container Service (ECS).
5. IBM Bluemix - Like the other major public cloud vendors, IBM also offers a Docker-based container service.
6. Red Hat - One of the early proponents of container technology, Red Hat claims to be “the second largest contributor to the Docker and Kubernetes codebases,” and it is also part of the Open Container Initiative and the Cloud Native Computing Foundation. Its flagship container product is its OpenShift platform as a service (PaaS), which is based on Docker and Kubernetes.
Uses for VMs vs Uses for Containers
Both containers and VMs have benefits and drawbacks, and the ultimate decision will depend on your specific needs, but there are some general rules of thumb.
• VMs are a better choice for running applications that require all of the operating system's resources and functionality, when you need to run multiple applications on a single server, or when you have a wide variety of operating systems to manage.
• Containers are a better choice when your biggest priority is maximizing the number of applications running on a minimal number of servers.
Because of their small size and application orientation, containers are well suited for agile delivery environments and microservice-based architectures. When you use containers and microservices, however, you can easily have hundreds or thousands of components in your environment. You may be able to manually manage a few dozen virtual machines or physical servers, but there is no way you can manage a production-scale container environment without automation. The task of automating and managing a large number of containers and how they interact is known as orchestration.
Scaling containerized workloads is a completely different process from scaling VM workloads. Modern containers include only the basic services their functions require, but one of those can be a web server such as NGINX, which also acts as a load balancer. An orchestration system such as Kubernetes can determine, based upon traffic patterns, when the quantity of containers needs to scale out; can replicate container images automatically; and can then remove them from the system.
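As a simplified sketch, the scale-out decision resembles the rule Kubernetes' Horizontal Pod Autoscaler uses (desired replicas = ceil(current replicas × current metric / target metric)); the request rates and replica bounds below are illustrative:

```python
# Simplified sketch of an orchestrator's scaling decision, modeled on the
# HPA formula: desired = ceil(current * current_metric / target_metric).
import math

def desired_replicas(current, current_rps_per_container,
                     target_rps_per_container,
                     min_replicas=2, max_replicas=50):
    """Return the replica count, clamped to configured bounds."""
    desired = math.ceil(
        current * current_rps_per_container / target_rps_per_container)
    return max(min_replicas, min(max_replicas, desired))

print(desired_replicas(4, 900, 500))   # traffic spike: scale out
print(desired_replicas(10, 100, 500))  # quiet period: scale in
```

Because containers start in seconds, the orchestrator can afford to run this loop continuously and track traffic closely; the same rule applied to VMs, with their minutes-long start times, must scale far more conservatively.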
For most, the ideal setup is likely to include both. With the current state of virtualization technology, the flexibility of VMs and the minimal resource requirements of containers work together to provide environments with maximum functionality.
If your organization is running many instances of the same operating system, then you should look into whether containers are a good fit. They just might save you significant time and money over VMs.