July 8, 2018 | Posted By: Robin McGlothin
AKF often recommends that clients adopt business metric monitoring – the use of high-level user activity or transaction patterns that can often provide early warning of an incident. Business metric monitors will not tell you where the problem is or what it is; rather – and most importantly – they tell you that something appears abnormal and should be investigated, because something has affected your customer experience.
A significant part of recovery time (and therefore availability) is the time required to detect and localize service incidents. A 2013 study by Business Internet Group of San Francisco found that of the 40 top-performing websites (as identified by KeyNote Systems), 72% had suffered user-visible failures in common functionality, such as items not being added to a shopping cart or an error message being displayed.
Our conversations with clients confirm that detecting these failures is a significant problem. AKF Partners estimates that 75% of the time spent recovering from application-level failures is time spent detecting them! Application-level failures can sometimes take days to detect, though they are repaired quickly once found. Fast detection of these failures (Time to Detect – TTD) is, therefore, a key problem in improving service availability.
The total duration of a product impairment is the Time to Restore (TTR), and TTD is a significant part of it.
To improve TTR, implement a good notification system that first uses business metrics to tell you that an error affecting your users is happening. Then rely on application and system monitoring to tell you where and what has failed. Make sure you have good, easy-to-view logs for all errors, warnings, and other critical data your application creates. Many technologies already exist in this space; we just need to employ them effectively, with the focus on safeguarding the client experience.
Two relatively simple methods, grounded in Statistical Process Control (SPC, defined below), can improve TTD:
- Business KPI Monitors (Monitor Real User Behavior): Passively monitor critical user transactions such as logins, queries, reports, etc. Use statistical methods to determine abnormal behavior. This is the first line of defense.
- Synthetic Transactions (Simulate User Behavior): Synthetic transactions are scripted actions that attempt to mimic real customer behavior. Examples might be sign-ons, add to cart, etc. They provide a more meaningful view of your customers’ experiences than just looking at page load times, error rates, and similar metrics. Do this with Keynote or a similar product and expand it to an internal systems scope. Alerts from a passive monitor can then be confirmed or denied by synthetic transactions and escalated as appropriate. This is the second line of defense.
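A synthetic transaction is essentially a script of user steps run on a schedule, with each step timed and checked. The sketch below assumes each step is a plain Python callable that raises on failure; the `run_synthetic_transaction` helper and the step names (`sign_on`, `add_to_cart`, `checkout`) are hypothetical, and a real probe would drive HTTP requests or a browser, or be configured in Keynote or a similar product, rather than call local functions.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class StepResult:
    name: str
    ok: bool
    seconds: float
    error: str = ""

def run_synthetic_transaction(steps: List[Tuple[str, Callable[[], None]]]) -> List[StepResult]:
    """Run scripted steps in order and stop at the first failure, as a real user would."""
    results: List[StepResult] = []
    for name, action in steps:
        start = time.perf_counter()
        try:
            action()
            results.append(StepResult(name, True, time.perf_counter() - start))
        except Exception as exc:  # any exception counts as a user-visible failure
            results.append(StepResult(name, False, time.perf_counter() - start, str(exc)))
            break
    return results

def sign_on() -> None:
    pass  # hypothetical: would submit test credentials to the login page

def add_to_cart() -> None:
    raise RuntimeError("item not added to cart")  # simulated brown-out

results = run_synthetic_transaction(
    [("sign_on", sign_on), ("add_to_cart", add_to_cart), ("checkout", lambda: None)]
)
for r in results:
    print(r.name, "OK" if r.ok else f"FAILED: {r.error}", f"{r.seconds:.4f}s")
```

Stopping at the first failed step mirrors what a customer would experience, and the per-step timings show where a slowdown begins.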
Monitor the bad, both potential and actual (alert before it happens), then tune and continuously improve (iterate!).
If you can’t identify all problem areas, identify as many as possible. The best monitoring starts before there’s a problem and extends beyond the crisis.
Because once the crisis hits, that’s when things get ugly! That’s when things start falling apart and people point fingers.
At times, failures do not disable the whole site, but instead cause brown-outs, where part of a site’s functionality is disabled or only some users are unable to access the site. Many of these failures are application-level failures that change the user-visible functionality of a service but do not cause obvious lower-level failures detectable by service operators. Effective monitoring will detect these faults as well.
The more proactive you can be about identifying the issues, the easier it will be to resolve and prevent them.
In fault detection, the aim is to determine whether an abnormal event has occurred, or whether an application being monitored is out of control. Early detection of a fault condition is important to avoid quality issues or system breakdown, and it can be achieved through a properly designed statistical process control with upper and lower control limits. If the values of the monitoring statistics exceed the corresponding control limits, a fault is detected. Once a fault condition has been positively detected, the next step is to determine the root cause of the out-of-control status.
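As a minimal sketch of the control-limit idea, assuming a metric that is roughly stable around its mean: the limits below are the baseline mean plus or minus three standard deviations (the conventional Shewhart choice, used here as an assumption), and the login counts are made-up data.

```python
from statistics import mean, stdev
from typing import List, Tuple

def control_limits(baseline: List[float], k: float = 3.0) -> Tuple[float, float]:
    """Lower and upper control limits: baseline mean minus/plus k standard deviations."""
    mu = mean(baseline)
    sigma = stdev(baseline)  # sample standard deviation; needs at least two points
    return mu - k * sigma, mu + k * sigma

def out_of_control(values: List[float], limits: Tuple[float, float]) -> List[int]:
    """Indexes of observations that fall outside the control limits."""
    lcl, ucl = limits
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]

# Hypothetical logins per minute during a known-good period.
baseline = [100, 104, 98, 101, 97, 103, 99, 102, 100, 96]
limits = control_limits(baseline)

# A new window in which logins suddenly drop.
print(out_of_control([101, 99, 62, 100], limits))  # [2]
```

Choose the baseline window carefully: if it already contains an incident, the limits widen and real faults can slip through.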
One downside of SPC is that significant changes in amplitude (natural increases in your business metrics) can cause problems, because limits computed from an earlier baseline no longer fit. An alternative to SPC is first and second derivative testing. These tests tell you whether the actual and expected curves have the same shape.
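The post does not spell out how the derivative tests are computed, so the following is only one possible reading, stated as an assumption: approximate the first and second derivatives with discrete differences and check whether their signs agree between the actual and expected curves. Scaling a curve by a positive factor leaves those signs unchanged, which is why this comparison tolerates amplitude growth that would trip fixed SPC limits. The function names and sample data are hypothetical.

```python
from typing import List

def diffs(series: List[float]) -> List[float]:
    """Discrete derivative: differences between consecutive points."""
    return [b - a for a, b in zip(series, series[1:])]

def _sign(x: float, eps: float = 1e-9) -> int:
    """-1, 0, or 1, treating tiny values as flat."""
    if abs(x) <= eps:
        return 0
    return 1 if x > 0 else -1

def shape_agreement(actual: List[float], expected: List[float]) -> float:
    """Fraction of points where the signs of both the first and second
    derivatives of `actual` match those of `expected` (needs 3+ points)."""
    d1_a, d1_e = diffs(actual), diffs(expected)
    d2_a, d2_e = diffs(d1_a), diffs(d1_e)
    # zip() trims the first-derivative lists to the length of the second-derivative lists.
    matches = [
        _sign(a1) == _sign(e1) and _sign(a2) == _sign(e2)
        for a1, e1, a2, e2 in zip(d1_a, d1_e, d2_a, d2_e)
    ]
    return sum(matches) / len(matches)

expected = [10, 20, 40, 60, 70, 65, 50]  # hypothetical hourly baseline
doubled = [2 * x for x in expected]      # business grew, same daily shape
broken = [10, 20, 40, 20, 10, 5, 5]      # shape breaks mid-curve
print(shape_agreement(doubled, expected))  # 1.0
print(shape_agreement(broken, expected))   # 0.2
```

In practice you would alert when agreement drops below a threshold tuned from history, rather than on any single mismatch.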
Here’s a real-world example of where business metrics help us determine changes in normal usage at eBay.
We had near real-time graphs of user metrics such as bids, listings, logins, and new user registrations. The data was graphed week over week. Usage patterns throughout a day followed a readily identifiable pattern with peaks and valleys. These graphs were displayed in the Network Operations Center, which was staffed 24x7. Deviations from the previous week’s pattern had proven useful, identifying issues such as ISP instability in the EU impacting customers trying to access eBay.
Everything seemed normal on a Wednesday evening – right up to the point that bids and listings both took a nosedive. The NOC quickly initiated the SEV1 process and technical resources checked their areas. The site had no identifiable faults, services were confirmed to be working fine, yet the user activity was still markedly lower. Roughly 20 minutes into the SEV1 process, the root cause was identified. The finale episode of American Idol was being broadcast. Our site was fine – but our customers had other things on their mind. The business metric monitors worked – they gave warning of an aberrant usage pattern.
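The week-over-week comparison used in the eBay NOC can be approximated in a few lines. This sketch assumes hourly buckets aligned to the same weekday and hour one week earlier and a hypothetical 25% tolerance; the bid counts are invented for illustration. As the American Idol episode shows, a flag means “investigate”, not “the site is broken”.

```python
from typing import List

def week_over_week_deviations(
    this_week: List[float], last_week: List[float], tolerance: float = 0.25
) -> List[int]:
    """Slots whose value differs from the same slot one week earlier by more
    than `tolerance` (as a fraction of last week's value)."""
    flagged = []
    for slot, (now, before) in enumerate(zip(this_week, last_week)):
        if before == 0:
            continue  # no baseline to compare against
        if abs(now - before) / before > tolerance:
            flagged.append(slot)
    return flagged

# Hypothetical bids per hour for four Wednesday-evening hours, one week apart.
last_wed = [5200, 6100, 6800, 6500]
this_wed = [5300, 6000, 4100, 3900]
print(week_over_week_deviations(this_wed, last_wed))  # [2, 3]
```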
How would your company react to this critical change in normal usage patterns? Use business metric monitors to detect workload shifts.
July 8, 2018 | Posted By: Dave Berardi
The Leap of Faith
When we embark on building a SaaS product that will delight customers, we are taking a leap of faith. We often don’t even know whether the targeted outcomes are possible. Investing in and building software is often risky for several reasons:
- We don’t know what the market wants.
- The market is changing around us.
- Competitors are always improving their time to market (TTM), releasing competitive products and services.
We have to assume that some of our project assumptions will be wrong, and that the underlying development technology we use to build products is constantly changing and evolving. One thing is clear on the SaaS journey – the future is always murky!
The uncertainty that plagues SaaS development is seen throughout the industry and is evidenced by successes and failures at big and small companies alike, from Facebook to Apple to Salesforce to Google. Google is one of many innovative B2C companies that have used the cone of uncertainty to help inform how to go to market and whether to sunset a service. The company realizes that, in addition to innovating, it needs to reduce uncertainty quickly.
For example, Google Notebook, a browser-based note-taking and information-sharing service, was killed and resurrected as part of Google Docs, and has a mobile derivative called Keep. Google Buzz, one of Google’s early attempts at a social network, was killed in 2011, less than two years after its launch. These are just a few B2C examples from Google, and all of them are investments that faced the cone of uncertainty. Predicting successful outcomes over the long term and locking in specifics about a product is wasteful and risky.
The cone of uncertainty describes the uncertainty and risk that exist when an investment is made in a software project. The cone depicts how the amount of risk shrinks, and the precision of our estimates grows, as a project moves through the funnel. The further out we try to forecast features, capabilities, and adoption, the more risk and uncertainty we must assume. This is true both for what we define as the product to be delivered and for when we will deliver it to market. Over time, firms must adjust the planned path to capture and embrace changing market needs.
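To make the cone concrete, the sketch below turns a single point estimate into a range for each project phase. The multipliers are the commonly cited cone-of-uncertainty figures (roughly 0.25x to 4x at initial concept, narrowing to about 0.9x to 1.1x by detailed design), used here as illustrative assumptions rather than as AKF’s numbers; the 20-week estimate is hypothetical.

```python
from typing import List, Tuple

# (phase, low multiplier, high multiplier): commonly cited cone-of-uncertainty
# figures, used here as illustrative assumptions.
CONE = [
    ("initial concept", 0.25, 4.0),
    ("approved product definition", 0.5, 2.0),
    ("requirements complete", 0.67, 1.5),
    ("UI design complete", 0.8, 1.25),
    ("detailed design complete", 0.9, 1.1),
]

def cone_ranges(point_estimate: float) -> List[Tuple[str, float, float]]:
    """Turn one point estimate into a (low, high) range for each phase."""
    return [(phase, point_estimate * lo, point_estimate * hi) for phase, lo, hi in CONE]

for phase, low, high in cone_ranges(20):  # hypothetical 20-week estimate
    print(f"{phase:<28} {low:5.1f} to {high:5.1f} weeks")
```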
In today’s market we must quickly test our hypotheses and drive innovation to be competitive. An Agile product development life cycle (PDLC) and an appropriately aligned organization help us do just that. To address the challenge the cone represents, we must understand what an Agile PDLC can and cannot do for the firm.
Address the Uncertainty of the Cone
When we use an Agile approach, we fix time and cost for the development and delivery of a product, but we allow scope to be adjusted and changed to meet the fixed dates. The team can extend time later in the project, but the committed delivery date does not change. We also do not add people, since Brooks’s Law teaches us that adding human resources to a late software project only makes it later. Instead, we accelerate our ability to learn with frequent deployments to market, which reduces uncertainty. Throughout this process, we discover both what the feature set needs to be for a successful outcome and how it should work.
Agile allows for frequent iterations that keep us close to the market through data. After a deployment, if our system is designed to be monitored, we can capture rich information that helps inform future prioritization, new feature ideas, and modifications that may be needed to the existing feature set. Agile forces us to estimate frequently and, as such, produces valuable data for our business. The resulting velocity of our sprints can be used to revise future delivery range forecasts for both what will be delivered and when. Our sprints also produce data that helps identify what may be slowing us down and ultimately affecting our time to market. Morale gets a boost as the teams see and feel results in the short term.
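Revising a delivery range from sprint velocity is simple arithmetic. As a minimal sketch, assuming the remaining backlog is estimated in story points and that the slowest and fastest recent sprints bound future performance (a simplification; percentiles or Monte Carlo simulation are common refinements), the range in sprints is:

```python
import math
from typing import List, Tuple

def delivery_range(remaining_points: float, velocities: List[float]) -> Tuple[int, int]:
    """Return (best case, worst case) number of sprints to finish the remaining backlog."""
    if not velocities or min(velocities) <= 0:
        raise ValueError("need at least one sprint, all with positive velocity")
    fastest, slowest = max(velocities), min(velocities)
    # A partial sprint still consumes a whole sprint, so round up.
    return math.ceil(remaining_points / fastest), math.ceil(remaining_points / slowest)

# Hypothetical: 120 points remain; velocities from the last five sprints.
print(delivery_range(120, [18, 22, 25, 20, 24]))  # (5, 7)
```

Re-running this after every sprint narrows the range as the cone predicts, and it answers the “when” question with a range rather than a single date.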
What Agile Is Not, and How We Must Adjust
While using an Agile method can help address the cone of uncertainty, it’s not the answer to all challenges. Agile does not provide a specific date when a feature or scope will be delivered; instead, we work toward ranges. Nor does it improve TTM just because our teams started practicing it. Company philosophies, principles, and rules are not defined through an Agile PDLC; those are up to the company to define. Once they are defined, the teams can innovate within those boundaries. Part of this boundary definition needs to start at the top. Executives need to paint a vivid picture of the desired outcome that stirs up emotion and is measurable. The vision sits at the opening of the cone. Measurable Key Results that executives define to achieve outcomes allow teams to innovate, making tradeoffs as they progress toward the vision. Agile alone does not empower teams or help them innovate. Objectives and Key Results (OKRs) cascaded into our organization, coupled with an Agile PDLC, can be a great combination that empowers teams and gives us a better chance to innovate and achieve the desired time to market. Implementing an OKR framework shifts the focus away from cranking out code to hit a date and toward innovation and the tradeoffs needed to achieve the desired outcome.
Agile does not align well with annual budget cycles. While shareholders often require an annual perspective, an Agile approach conflicts with annual budgeting. Because Agile responds to changing market demands, frequent budget iterations are needed, as teams may request additional funding to go after an opportunity. It’s key that finance leaders embrace the importance of adjusting the budgeting approach to align with an Agile PDLC. Otherwise, the resulting conflict could be destructive and create a barrier to the firm’s desired outcome.
Applying Agile properly benefits a firm by helping to address the cone and reduce uncertainty, empowering teams to deliver on an outcome, and ultimately making the firm more competitive in the global marketplace. Agile is on the verge of becoming table stakes for companies that want to be world class. And as we noted above regarding the need for a different approach to something like budgeting, it’s not just for software – it’s the entire business.
Let Us Help
AKF has helped many companies of all sizes when transitioning to an organization, redefining PDLC to align with desired speed to market outcomes, and SaaS migrations. All three are closely tied and if done right, can help firms compete more effectively. Contact us for a free consultation. We would love to help!
July 8, 2018 | Posted By: Dave Swenson
AKF often finds itself required to act as a marriage counselor trying to improve the relationship between technology and business ‘spouses’. In fact, we rarely find the relationship between these partners without at least some opportunity for a 3rd party, external, unbiased perspective to produce some suggestions. Given the backgrounds of the prototypical CEO or CTO, it is no surprise there are misunderstandings, miscommunication, and misalignment – there is a substantial experiential chasm between the two…
Recognizing how big this chasm is, and where it is narrow versus wide, is vital to bridging the gap. One of the key aspects we try to ascertain immediately is whether a true partnership is in place, versus a customer / order-taker mindset. How much trust is currently present? Are the two using a single language, or is it bits and bytes vs. $$$?
Whether you are a CTO or a business executive, we suggest you go through the following set of questions. Even better, ask your tech or business partner to do their side and discuss and compare! Additionally, this self-analysis shouldn’t occur only at the highest levels, but all throughout the organization, particularly if you’re organizationally aligned.
Questions for Technology Leaders:
- When did you last come up with a proposal to increase revenue?
The best, and perhaps most extreme, example of this is AWS, where the technology team took an internal solution built to improve Amazon developer productivity, recognized that all developers must face the same infrastructure challenges, and proposed it to Bezos as a new business line. Are you constantly seeking out ways in product, marketing, sales, and technology to generate additional revenue, or are you focused solely on cost containment?
- Do you understand the balance sheet, statement of cash flows and income statement of your company?
These artifacts describe how the overall business community, your investors, are measuring you. Learning the meaning of these documents aids in spanning the bits & bytes vs. $$$ language barrier. This is where getting an MBA provides the most value.
- Can you represent the importance of addressing technical debt to your business peers?
You are responsible for the technical debt in your codebase, not your business. If you can’t explain the true ongoing cost of the incurred debt, if you can’t justify the periodic pay down of that debt, you frankly are failing as a technology leader, at least if you have a business partner willing to listen.
- Can you state the highest priority issues facing your business peers today?
We love the following quote from Camille Fournier (former CTO of Rent the Runway and author of The Manager’s Path):
“If the CTO does not have a seat at the executive table and does not understand the business challenges the company is facing, there is no way the CTO can guide the technology to solve those problems”
- Do you feel your team, your engineers, understand how their daily activities affect the business and your customers?
I once left a company producing a relational database to join a startup that had built its application on top of that RDBMS. I quickly found issues that I knew could be addressed easily and cheaply, but I had never heard of these pain points until I personally experienced them! I vowed never to be so removed and distant from my customers again. Zappos requires all new employees to take a month-long customer service stint, spending 40 hours on the phones. During the holiday peak, all employees are expected to jump on the phones to ensure the same level of response as the rest of the year. Don’t just “eat your own dog food”, but understand how your customers eat it.
- Do your engineers understand what each functional product component costs to build, maintain, and support - relative to the value it brings to the business? Do they push back against product and business when there’s a minimal or even negative ROI?
A great vehicle to explain revenue flow is a DuPont diagram: map out the user experience flow and assign value across that flow. That value makes it clear that, say, a 0.5% improvement in relevant search results can turn into a 0.025% uptick in items in cart, which turns into an $X increase in revenue.
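The revenue-flow idea can be sketched as a chain of rates multiplied together, so a lift at one stage carries through to revenue. The funnel stages, rates, and dollar figures below are hypothetical, and the 0.025% uptick is treated as a relative lift in the add-to-cart rate, which is one reading of the example above.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Funnel:
    sessions: float         # visits per month
    cart_rate: float        # fraction of sessions that add an item to the cart
    order_rate: float       # fraction of carts that become orders
    avg_order_value: float  # dollars per order

    def revenue(self) -> float:
        """Monthly revenue: each stage's rate multiplies through to dollars."""
        return self.sessions * self.cart_rate * self.order_rate * self.avg_order_value

# Hypothetical baseline funnel: 2.4M dollars per month.
baseline = Funnel(sessions=1_000_000, cart_rate=0.08, order_rate=0.5, avg_order_value=60.0)

# Better search relevance lifts the cart rate by 0.025% (relative, i.e. x 1.00025).
improved = replace(baseline, cart_rate=baseline.cart_rate * 1.00025)
delta = improved.revenue() - baseline.revenue()
print(f"${delta:,.2f} more revenue per month")
```

Replacing the hypothetical rates with your own measured ones yields the $X in the example, and shows engineers which stage of the flow their work actually moves.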
- Do you provide early feedback on the likelihood of making key dates? Is that feedback consistently incorrect?
If you’ve ever had your house remodeled, you’ll agree that little is more frustrating than a contractor who consistently under-delivers and is late on agreed-upon delivery dates. You’ve got plans hinging on the construction completion date, and when that date slips, it destroys your plans. Your business peers feel the same way when your date slips or scope gets cut. Are you actively seeking out the causes of such delays? How can you be transparent with your partners when you don’t understand the causes?
- Does your technology team measure themselves against metrics that are meaningful to the business?
Ensure your teams are measuring the outcome of their work, not simply the completion and delivery of that work. And, that outcome measurement should be made in business terms. Velocity should always be measured, but an increase in velocity is frankly less important than moving targeted business needles in the desired direction!
- What are you least transparent about, and why?
Typically, the issues we are most reluctant to share are those we ourselves are uncomfortable with. The answer to this question can show you the areas where you are paying the least attention.
Questions for Business Executives:
- What is your reaction when you hear that a date has slipped?
“Shit happens” is too simple of an explanation, but there are many reasons why a key date slips. There might have been a change in prioritization, driven from the highest levels. There could have been critical site issues that pulled the team away from new functionality. The scope could have been grossly underestimated, or have grown for innumerable reasons along the way. The key thing is that your technology partner should be able to explain the causes - so don’t be afraid to ask for an explanation. Just don’t start the conversation by pointing a finger.
- Do you feel technology as a whole understands the business? Are engineers close enough to your customers to really understand the value you bring them?
I am always dismayed when I find engineers who don’t understand what value your product provides the customer, and how. An engineer shouldn’t be motivated only by technology problems, but should appreciate the value their product provides. I had the great pleasure of witnessing a company adopt Agile, resulting in tighter bonds between customers, business, and technology. One engineer had never understood the value of their product, not just to their immediate customers but to their customers’ customers. As this was a medically oriented product, that end value was basically a better life. The engineer had worked at this company for a few years, yet had never witnessed the true value of the product he had been building – a tragedy in my book. Make the effort to ensure everyone in your company understands the value your products provide and the revenue stream flowing into the company – it will absolutely be worth the investment!
- Do you as a business leader spend as much time attempting to understand the technology team as they hopefully spend learning to read financial statements? Any time?
I absolutely love when a business leader is present at an AKF workshop/engagement. I certainly appreciate the dedication of time, but more importantly, the desire to better understand. Have you asked your technology team for a walkthrough of how the systems work? What their challenges are?
- Do the business leaders understand how to ask questions to know whether dates are both aggressive and achievable?
Your car has a redline. Do you typically exceed that redline RPM? Doubtful. Do you understand when your technology team has over-extended themselves? When they have relied upon heroics to meet a delivery?
- Does the business spend time in the beginning of a product life cycle figuring out how to measure success?
The entire company, business/support/sales/marketing/product/technology teams should be driving to achieve important business goals, and measure themselves by the progress, the outcomes, towards those goals. Delivering new functionality is critical, but more important are the improvements in business metrics that functionality brings. Are you measuring how you are affecting business metrics?
Questions for Both:
- What are your shared goals?
We are firm believers in OKRs (Objectives & Key Results), shared across the entire company. Alignment around these goals helps frame discussions.
- Who gives more than takes? How are compromises reached?
There should be no real scorecard on this (classic passive/aggressive move, don’t go there), but can you provide examples of where you met in the middle? As in every relationship, it is critical to both give and take.
- Do you meet mostly by exception? When was the last time you did lunch?
I hated my dentist for years, until I met him on a soccer field and saw the whole individual, not just the guy that causes me pain. Commit to meeting your technology/business partner on a regular basis, including periodic out-of-the-office meetings.
- What is your “Marriage Math”?
Psychologist John Gottman, Ph.D., when trying to determine a methodology to predict which marriages will last and which will end in a divorce, found that when the ratio of positive to negative interactions fell below 5:1 (5 positive for every negative interaction), divorce was likely. Do you have a healthy line of communication with your partner, or does the communication quickly degrade into contempt and name calling?
Hopefully now you agree that a look at your relationship with your technology/business partner is of value. Every relationship requires investment and commitment on both sides. Consider bringing AKF to help facilitate these discussions – we are excellent marriage counselors.
June 29, 2018 | Posted By: Marty Abbott
Following our first article on the conflict between licensed products and SaaS solutions, we now present a necessary (but not always sufficient) list of SaaS operating principles. These principles are important whether one is building a new XaaS (PaaS, SaaS, IaaS) solution, or migrating to an XaaS solution from an on-premise, licensed product.
These principles are developed from the perspective of the product and engineering organization, but with business value (e.g. margins) in mind. They do not address the financial or other business operations needs within a SaaS product company.
1. Build to Market Need – Not Customer Want
Reason: Smaller product (less bloat). Lower Cost of Goods (COGS). Lower R&D cost.
Customer “wants” now help inform and validate professional product management analysis as to what true market “need” is for a product. Products are based on the minimum viable product concept and iterated through a scientific method of developing a hypothesis, testing that hypothesis, correcting where necessary and expanding it when appropriate. Sales teams need to learn to sell “the cars that are on the lot”, not sell something that is not available. Smaller, less bloated, and more configurable products are cheaper to operate, and less costly to maintain.
2. Build “-ilities” First
Reason: Availability, Customer Retention, and high Revenue Retention.
Availability, scalability, reliability, and nearly always on “utility” are now a must have, as the risk of failure moves from the customer in the on-premise world to the service provider. No longer can product managers ignore what was once known as NFRs or “Non-Functional Requirements”. The solution must always meet, as a bare minimum, the availability and response times necessary to be successful while scaling in a multi-tenant way under hopefully (significant) demand.
3. Design to be Monitored
Reason: Availability, Customer Retention, and high Revenue Retention.
Although it is sometimes considered part of the “-ilities” needed to achieve high availability, we call this one out separately because engineers must completely change their behavior. Like the notion of test-driven development, this principle requires that engineers think about how to log and create data to validate that the solution works as expected before starting development. Gone are the days of closing a defect as “working as designed” – we now promote solutions to production that can easily validate usage and value creation, and we concern ourselves only with “Does it work as expected for customer benefit?”
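One way to design for monitoring from the start is to wrap each business action so that every call is counted as a success or a failure. This is a minimal sketch: the `business_event` decorator, the event names, and the in-memory `Counter` standing in for a real metrics pipeline are all hypothetical, and failures are logged with the standard `logging` module.

```python
import functools
import logging
from collections import Counter
from typing import Any, Callable, TypeVar

logger = logging.getLogger("business_events")
EVENTS: Counter = Counter()  # in-memory stand-in for a real metrics backend

F = TypeVar("F", bound=Callable[..., Any])

def business_event(name: str) -> Callable[[F], F]:
    """Count every call to a business action as succeeded or failed."""
    def decorate(func: F) -> F:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            try:
                result = func(*args, **kwargs)
            except Exception:
                EVENTS[f"{name}.failed"] += 1
                logger.exception("business event %s failed", name)
                raise  # monitoring must not swallow the error
            EVENTS[f"{name}.succeeded"] += 1
            return result
        return wrapper  # type: ignore[return-value]
    return decorate

@business_event("cart.add_item")
def add_item(cart: list, sku: str) -> None:
    if not sku:
        raise ValueError("empty SKU")
    cart.append(sku)

cart: list = []
add_item(cart, "SKU-123")
try:
    add_item(cart, "")
except ValueError:
    pass  # the failure was still counted and logged
print(dict(EVENTS))
```

Because the counts exist from the first deployment, the same numbers can feed the business metric monitors described earlier in this collection.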
4. Design for Easy and Efficient Operations – Automate Always
Reason: Lower COGS. Availability.
Everything we do to develop product and deliver a customer experience needs to be enabled through automation: environment creation, product releases, system analysis, database upgrades, etc. Whether this happens within a traditional operations team, a DevOps group, or product engineering, automation is part of our “whole product” and “ships” with our service solution upgrades.
5. Design for Low Cost of Operations
Reason: Lower COGS.
Automation helps us lower the cost of operations, but we must also be cognizant of infrastructure related costs. How do we limit network utilization overall, such that we can lower our costs associated with transit fees? How do we use less memory footprint and fewer compute cycles to perform the same activity, such that we can reduce server infrastructure related costs? What do we really need to keep in terms of data to reduce storage related costs? Few if any of these things are our concerns on-premise, but they all affect gross margin when we run a SaaS business.
6. Engage Developers in Incident Resolution and Post Mortems
Reason: Faster Time to Resolution. Availability. Better Learning Processes.
On-premise companies value developer time because new feature creation trumps everything else. SaaS companies know that restoring service is more important than anything else. Further, developers must understand and “feel” customer pain. There is no better motivation for ensuring that problems do not recur, and for creating a learning organization, than ensuring engineers understand the pain and cost of failure.
7. Configuration over Customization
Reason: Smaller Product. Lower COGS. Lower R&D Cost. Higher Quality.
One “core”, lots of configuration, no customization is the mantra of every great SaaS company. This principle enables others, and aids in creating a smaller code base with lower development costs and lower costs of operations. Lower cyclomatic complexity means higher quality, lower cost of future augmentation, and lower overall maintenance costs.
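The “one core, lots of configuration” mantra can be sketched as a single code path whose behavior varies only through a typed, per-tenant configuration drawn from a fixed menu of options. The option names below are hypothetical; requests for anything not on the menu are rejected rather than turned into tenant-specific code. (Value types are not validated in this sketch.)

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, List

@dataclass(frozen=True)
class TenantConfig:
    """The 'menu': every option a tenant may set, with defaults."""
    currency: str = "USD"
    max_users: int = 50
    sso_enabled: bool = False

MENU = {f.name for f in fields(TenantConfig)}

def load_config(overrides: Dict[str, Any]) -> TenantConfig:
    """Build a tenant's configuration, refusing anything that is not on the menu."""
    unknown = set(overrides) - MENU
    if unknown:
        raise ValueError(f"not configurable: {sorted(unknown)}")
    return TenantConfig(**overrides)

def login_methods(cfg: TenantConfig) -> List[str]:
    """Same code for every tenant; only configuration changes the result."""
    methods = ["password"]
    if cfg.sso_enabled:
        methods.append("sso")
    return methods

acme = load_config({"sso_enabled": True, "max_users": 500})
print(login_methods(acme))
print(login_methods(load_config({})))
```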
8. Create and Maintain a Homogeneous Environment
Reason: Lower COGS.
Just as the software that enables our product should not be unique per customer, similarly the infrastructure and configuration of our infrastructure should not be unique per customer. Everyone orders off the menu, and the chef does not create something special for you. The menu offers opportunities for configurations – sometimes at costs (e.g. bacon on your burger) but you cannot substitute if the menu does not indicate it (e.g. no chicken breast).
9. Publish One Single Release for All Customers
Reason: Decreased COGS. Decreased R&D Cost (low cost of maintenance).
The licensed software world is lousy with a large engineering burden associated with supporting multiple releases. The SaaS world attempts to increase operating margins by significantly decreasing, ideally to one, the number of versions supported for any product.
10. Evolve Your Services, Don’t Revolutionize Them
Reason: Easier Upgrades. Availability. Lower COGS.
No customer wants downtime associated with an upgrade. The notion is just ridiculous. How often does your utility company take your service offline (power for instance) because they need to perform an “upgrade”? Infrequently (nearly never) at best – and if/when they do it is a giant pain for all customers. Our upgrades need to be simple and small, capable of being reversed, and “boil the frog” as the rather morbid saying goes. No more large upgrades with significant changes to data models requiring hours of downtime and months of preparation on the part of a customer.
11. Provide Frequent Updates
Reason: Smaller product (less bloat). Lower COGS. Lower R&D cost. Faster Time to Market (TTM).
Pursuant to the evolutionary principle above, updates need to happen frequently. These two principles are really virtuous (or if not performed properly, vicious) when related to each other. Doing small upgrades, as solutions are ready to ship, means that customers (and our company) benefit from the value the upgrades engender sooner. Further, small upgrades mean incremental (evolutionary) changes. Faster value, smaller impact. It cures all ills and is both a dessert topping and floor wax.
12. Hire and Promote Experienced SaaS Talent
Reason: Ability to Achieve Goals and SaaS Principles.
Running SaaS businesses and developing SaaS products require different skills, knowledge, and behaviors than licensed, on-premise products do. While many of these can be learned or trained, attempting to be successful in the XaaS world without ensuring that you have the right knowledge, skills, and abilities on the team early on is equivalent to assuming that all athletes can play all sports. Assembling a football team out of basketball players is unlikely to land you in the Super Bowl.
13. Restrict Access
Reason: Lower Risk. Higher Availability.
Licensed product engineers rarely have access to customer production environments. In our new world, it’s easy to grant such access, and in many ways it can be beneficial. Unfortunately, unfettered access increases the risk of security breaches. As such, we need both to restrict access to production environments and to ensure that the engineering team has access to appropriate troubleshooting data outside of the production environment to enable rapid problem and incident resolution.
14. Implement Multi-Tenancy
Reason: Lower COGS.
Solutions should be multi-tenant to enable better resource utilization, but never all-tenant to ensure that we have proper fault isolation and appropriate availability.
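“Multi-tenant but never all-tenant” usually means splitting tenants across several independent stacks, so a failure in one affects only its share of customers. As a sketch, assuming a fixed number of identical, isolated pods (a hypothetical name for such stacks), a stable hash can assign each tenant to a pod:

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List

def pod_for_tenant(tenant_id: str, pod_count: int) -> int:
    """Deterministically map a tenant to one of `pod_count` isolated pods."""
    # sha256 rather than built-in hash(), which is randomized per process for str.
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % pod_count

def assign(tenants: Iterable[str], pod_count: int) -> Dict[int, List[str]]:
    """Group tenants by the pod they land on."""
    pods: Dict[int, List[str]] = defaultdict(list)
    for t in tenants:
        pods[pod_for_tenant(t, pod_count)].append(t)
    return dict(pods)

pods = assign([f"tenant-{i}" for i in range(100)], pod_count=4)
for pod, members in sorted(pods.items()):
    print(f"pod {pod}: {len(members)} tenants")
```

If one pod fails, only the tenants assigned to it are affected. Plain modulo hashing moves many tenants when `pod_count` changes; a lookup table or consistent hashing is the usual remedy.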
How is your SaaS product performing? Need a checkup? We can help.
Need help with your SaaS migration?
June 29, 2018 | Posted By: James Fritz
The simplest definition of risk is the probability an incident occurs times the impact of the incident. The higher this number is, the higher the risk. With the identification of risk then comes the method of how to handle risk. The four risk management strategies are mitigation, transference, avoidance and acceptance. Three of these strategies can be utilized by a company that finds itself in the later stages of the Technology Adoption Life Cycle, where they have a solid customer base that is looking for stability. However, only mitigation works for companies that are attempting to enter the market and capture the Innovators and Early Adopters.
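The definition above translates directly into a score you can sort by. In this sketch, probability is the chance an incident happens in a given period and impact is its cost in dollars; the risks and figures are hypothetical, chosen only to show that a frequent, moderate incident can outrank a rare, catastrophic one.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Risk:
    name: str
    probability: float  # chance the incident occurs in the period (0 to 1)
    impact: float       # cost if it occurs, e.g. dollars of lost revenue

    @property
    def score(self) -> float:
        """Risk = probability x impact, i.e. the expected cost."""
        return self.probability * self.impact

def prioritize(risks: List[Risk]) -> List[Risk]:
    """Highest expected cost first."""
    return sorted(risks, key=lambda r: r.score, reverse=True)

risks = [
    Risk("payment provider outage", 0.05, 200_000),  # score about 10,000
    Risk("bad deploy breaks search", 0.30, 50_000),  # score about 15,000
    Risk("data center fire", 0.001, 5_000_000),      # score about 5,000
]
for r in prioritize(risks):
    print(f"{r.name}: {r.score:,.0f}")
```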
What are the Risk Management Strategies?
The four strategies mentioned are:
- Mitigation: Knowing an incident is going to occur and attempting to minimize its impact or probability.
- Transference: Transferring the burden of the risk to someone else. Usually done through insurance.
- Avoidance: Not doing an activity, thereby eliminating the probability of the incident occurring.
- Acceptance: Realizing that you have done all you can and must accept both the impact and the probability of an incident.
As stated earlier, for products new to the market, the most viable option for managing risk is mitigation. If a company were to avoid the risk, it might have to abandon the very product it is trying to bring to market. By transferring the risk, a company may still lose its early customer base when an incident occurs, and the same is true of accepting the risk.
So how can you handle the risk as a company attempting to attract Innovators and Early Adopters? The answer is in using the AKF Risk Model.
AKF and Risk
In his article, Evaluating Technology Risk Using Mathematics, Geoffrey Weber lays out a quantitative method for identifying a company's level of risk using impact, probability, and the ability to detect an incident. This method returns concrete data that allows a company to prioritize which risks it intends to manage.
Once you have prioritized your risks and are determining how to mitigate them, you can use the five main components of the AKF Risk Model. Depending on whether impact or probability is the issue, focus on the areas below to mitigate the overall risk.
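As an illustration of prioritizing with impact, probability, and detectability, here is a minimal sketch. It assumes 1 to 5 scales and a simple multiplicative score, similar to the risk priority number used in FMEA; Weber's article may define its scales or weighting differently, and the `Risk` class and example risks are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    impact: int       # assumed scale: 1 (minor) to 5 (severe)
    probability: int  # assumed scale: 1 (rare) to 5 (very likely)
    detection: int    # assumed scale: 1 (caught immediately) to 5 (very hard to detect)

    @property
    def score(self) -> int:
        # Probability x impact, weighted by how hard the incident is to detect.
        return self.impact * self.probability * self.detection


def prioritize(risks: list[Risk]) -> list[Risk]:
    """Highest score first: the order in which to spend mitigation effort."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


if __name__ == "__main__":
    backlog = [
        Risk("schema change without a rollback path", impact=5, probability=2, detection=3),
        Risk("large release payload", impact=3, probability=4, detection=2),
        Risk("single all-tenant database", impact=5, probability=1, detection=1),
    ]
    for risk in prioritize(backlog):
        print(f"{risk.score:>3}  {risk.name}")
```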
How to Affect Probability
- Payload Size: Limiting the size of each change introduced to a product lowers the probability of an incident. It also ensures that if an incident does occur, it affects only a small aspect of the product.
- Testing: Testing aims to catch potential incidents before they occur. Unit testing and code review will not catch 100% of the issues that may arise, but they lessen the probability that something will go wrong.
How to Affect Impact
- Fault Isolation Architecture: By utilizing the AKF Scale Cube, fault isolation is built into the product. This helps limit the impact of an incident to a small percentage of the overall customer base.
- Monitoring: Proper monitoring gives developers a way to identify issues just before they occur, or as they occur. With real-time monitoring, an incident can be identified as it happens and the company can act quickly to mitigate its impact.
- Rollback: If an incident has occurred, being able to roll back the product lessens the impact. If a stable version of the product exists, it can be redeployed while the company identifies what went wrong with the newer deployment, allowing customers to continue their interactions.
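A rollback is only possible if the last known-good version is recorded. This minimal sketch assumes a hypothetical `ReleaseHistory` class that tracks deployed versions and which of them were confirmed stable; a real system would also redeploy the artifact and reverse any schema changes.

```python
class ReleaseHistory:
    """Hypothetical record of deployed versions, used to find the last known-good release."""

    def __init__(self) -> None:
        self._releases: list[tuple[str, bool]] = []  # (version, confirmed_stable)

    def deploy(self, version: str) -> None:
        self._releases.append((version, False))

    def mark_stable(self) -> None:
        """Call once monitoring shows the current release is healthy."""
        version, _ = self._releases[-1]
        self._releases[-1] = (version, True)

    @property
    def current(self) -> str:
        return self._releases[-1][0]

    def rollback(self) -> str:
        """Discard releases newer than the most recent stable one and return its version."""
        for i in range(len(self._releases) - 1, -1, -1):
            if self._releases[i][1]:
                del self._releases[i + 1:]
                return self._releases[i][0]
        raise RuntimeError("no stable release to roll back to")


if __name__ == "__main__":
    history = ReleaseHistory()
    history.deploy("1.4.0")
    history.mark_stable()
    history.deploy("1.5.0")  # an incident is detected after this deployment
    print("rolling back to", history.rollback())
```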
Where does your company fit in?
If you find yourself with a customer base of Laggards and Late Majority, you have many more options for managing risk. Even when issues arise under transference, acceptance, or avoidance, you can still maintain solid sales.
If you find yourself with a customer base of Innovators and Early Adopters, you have fewer options for managing risk. Mitigation is the most beneficial option available if you want to continue moving through the cycle and start to pick up the Early Majority.
If the AKF Risk Model is something you would like to learn more about, feel free to reach out to AKF.
June 28, 2018 | Posted By: Pete Ferguson
In the technical due diligence reviews we conduct for investment firms, we see five common mistakes made by young startups and well-seasoned companies alike:
Lack of Security Mindset During Development: Security gets a bad rap for being overkill, costly, and a hindrance to fast growth when security teams fall into the “No It All” philosophy of saying “No,” especially when their focus on security overshadows any consideration of the revenue they halt or hinder by saying “No.” This tends to be true when organizations do not share the same risk-related goals, or when the security organization feels responsible only for risk and not for maximizing revenue with appropriate risk. Good security requires a team effort and is difficult to add to a well-oiled machine as an afterthought. It is much easier to build in at the outset of writing code, and including security checks in automated testing and QA helps ensure security is baked in. Hold developers responsible for security, and hold security responsible for enabling revenue while protecting the company with a common-sense approach.
Failing to Separate Duties: As small companies grow larger, everyone usually continues to have access to everything, and many original employees wear multiple hats. Making sure no single individual controls the path from development to production also gains points in other areas, such as business continuity. Separation of duties does not have to exist only between two employees; it can also be created by automation, as is the case in many successful continuous deployment/delivery shops that deploy directly into production. Automated testing additionally helps with code compliance and quality assurance. Automate access control by role wherever possible, and have business owners regularly review and sign off on access control lists (at least monthly). My colleague James Fritz goes into greater detail in a separate article.
Not Segregating and Encrypting Sensitive Data At Rest: Encrypting all data at rest may not make sense, but segregating all personally identifiable information (PII), financial, medical, and any other sensitive or confidential information into a separate, encrypted database is a better plan of attack. Even if you are not subject to PCI, HIPAA, or other regulations, limiting exposure of your customer and company confidential information is a best practice. You can add further protection by tokenizing the information wherever possible. When there is a security breach (it is probably safe in today’s climate to say “when,” not “if”), it is very hard to explain to your customers why you didn’t encrypt their sensitive data at all times. Given recent headlines, this is now considered entry-level security table stakes and a safeguard your customers require, no longer a nice-to-have optional item.
Checklist Only Mentality: In our experience, until recently many auditors focused primarily on checklist compliance, but times are changing. The true test of compliance is shifting from a checklist and certification to having to explain your most recent data breach to your customers and stakeholders. If you constantly work toward safeguarding and serving your customers, you will likely fall within current and future security requirements easily, or be able to get there quickly. It is much easier to design security into your products now than to be forced to do it later because of a misstep, and doing so now will do a lot more for customer adoption and retention.
These are just a summary of five common findings; there are certainly many others. The common denominator we find is that successful companies think holistically about their customers, automatically build security into their products, and are therefore able to scale and expand into new market segments more readily. Building in security as part of a holistic approach also addresses business continuity, disaster recovery, resiliency, the ability to roll back code, and more.
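Tokenization, mentioned under “Not Segregating and Encrypting Sensitive Data At Rest,” can be sketched as follows. The `TokenVault` class is hypothetical and keeps values in memory purely for illustration; in practice the vault would be the separate, encrypted data store described above, with its own access controls, and the application database would hold only tokens.

```python
import secrets


class TokenVault:
    """Hypothetical stand-in for a segregated, encrypted store of sensitive values.

    The application database stores only tokens; real values live here. This in-memory
    version is for illustration only: a real vault encrypts at rest and has its own
    access controls.
    """

    def __init__(self) -> None:
        self._value_by_token: dict[str, str] = {}
        self._token_by_value: dict[str, str] = {}

    def tokenize(self, value: str) -> str:
        # Reuse the existing token for a repeated value so lookups and joins still work.
        if value in self._token_by_value:
            return self._token_by_value[value]
        token = "tok_" + secrets.token_urlsafe(16)  # random, so it reveals nothing about the value
        self._value_by_token[token] = value
        self._token_by_value[value] = token
        return token

    def detokenize(self, token: str) -> str:
        return self._value_by_token[token]


if __name__ == "__main__":
    vault = TokenVault()
    order = {"order_id": 42, "card_number": vault.tokenize("4111 1111 1111 1111")}
    print(order)  # only the token is stored in the main application database
```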
Under the Hood - Our Security Questions for Technical Due Diligence
In our assessments, we cover each of the areas below, using these questions as guidelines for conversation rather than as a point-by-point Q&A. They are not a yes/no checklist: we rank the target company against other similarly sized clients and industry averages. Each question receives a ranking from 1 to 4, with 4 being the highest score, and we then graph our findings against similar and competing companies within the market segment.
- Is there a set of approved and published information security policies used by the organization?
- Has an individual who has final responsibility for information security been designated?
- Are security responsibilities clearly defined across teams (i.e., distributed vs completely centralized)?
- Are the organization’s security objectives and goals shared and aligned across the organization?
- Has an ongoing security awareness and training program for all employees been implemented?
- Is a complete inventory of all data assets maintained with owners designated?
- Has a data categorization system been established, with data classified in terms of legal/regulatory requirements (PCI, HIPAA, SOX, etc.), value, sensitivity, etc.?
- Has an access control policy been established that allows users access only to the networks and network services required to perform their job duties?
- Are the access rights of all employees and external party users to information and information processing facilities removed upon termination of their employment, contract or agreement?
- Is multi-factor authentication used for access to systems where the confidentiality, integrity or availability of data stored has been deemed critical or essential?
- Is access to source code restricted to only those who require access to perform their job duties?
- Are the development and testing environments separate from the production/operational environment (e.g., they don’t share servers, are on separate network segments, etc.)?
- Are network vulnerability scans run frequently (at least quarterly) and vulnerabilities assessed and addressed based on risk to the business?
- Are application vulnerability scans (penetration tests) run frequently (at least annually or after significant code changes) and vulnerabilities assessed and addressed based on risk to the business?
- Are all data classified as sensitive, confidential or required by law/regulation (e.g., PCI, PHI, PII, etc.) encrypted in transit?
- Is testing of security functionality carried out during development?
- Are rules regarding information security included and documented in code development standards?
- Has an incident response plan been documented and tested at least annually?
- Are encryption controls being used in compliance with all relevant agreements, legislation and regulations? (i.e., data in use, in transit and at rest)
- Do you have a process for ranking and prioritizing security risks?
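Several of these questions, along with the “Failing to Separate Duties” finding, come down to granting access by role and having business owners sign off regularly. The sketch below assumes hypothetical role names, permission strings, and a 30-day review interval standing in for “at least monthly”; it is not tied to any particular identity or access management product.

```python
from datetime import date, timedelta

# Hypothetical roles: access is granted to roles, never directly to individuals.
ROLE_PERMISSIONS: dict[str, set[str]] = {
    "developer": {"source:read", "source:write", "staging:deploy"},
    "release_pipeline": {"prod:deploy"},
    "sre": {"prod:read_logs"},
}

REVIEW_INTERVAL = timedelta(days=30)  # stands in for "at least monthly"


def permissions_for(roles: set[str]) -> set[str]:
    """Union of permissions granted by a user's roles; unknown roles grant nothing."""
    granted: set[str] = set()
    for role in roles:
        granted |= ROLE_PERMISSIONS.get(role, set())
    return granted


def overdue_reviews(last_signoff: dict[str, date], today: date) -> list[str]:
    """Roles whose access list a business owner has not signed off on within the interval."""
    return sorted(
        role
        for role in ROLE_PERMISSIONS
        if today - last_signoff.get(role, date.min) > REVIEW_INTERVAL
    )
```

In this sketch the developer role has no production deploy permission; only the automated pipeline does, which is one way automation can provide separation of duties.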
June 26, 2018 | Posted By: James Fritz
For many companies, the thought of being audited is never a pleasant one, especially when most audits focus on documentation that is slow to react to the ever-changing technology sector. If your company is attempting to use Agile methods but falls under strict regulations such as the Payment Card Industry (PCI) standards, the Health Insurance Portability and Accountability Act (HIPAA), or Sarbanes-Oxley (SOX), there is usually a high level of trepidation about fully adopting Agile in your development. One of the biggest pitfalls companies run into is thinking that Separation of Duties (SoD) cannot be achieved in an Agile environment.
What is Separation of Duties?
In short, SoD ensures that no one person holds all the keys to enact a change. If a developer can create a product and push it into production with no checks, that is a violation of SoD. SoD is designed to accomplish two basic goals: eliminating fraud and detecting errors. With checks and balances along the way, your company is more secure against fraud and against the errors that naturally occur whenever humans are involved.
This separation is key for each of the above-mentioned standards: PCI, HIPAA, and SOX. An excerpt from the PCI Data Security Standard states:
“In environments where one individual performs multiple roles (for example application development and implementing updates to production systems), duties should be assigned such that no one individual has end-to-end control of a process without an independent checkpoint.”
All three of the regulations share similar verbiage. No one person can have control from the beginning to the very end. But that is the inherent beauty of a fully adopted Agile system. No one person has complete control. The system relies heavily upon automation.
If you look at a normal Agile Development process (see graphic below) there are still clear timelines and interjections of human input, on top of the vast automated input provided. This creates a system that brings in a committee of personnel to achieve certain goals every sprint. When automation is laid on top of the human based work then another layer of checks and balances is established. And this automation is not just a random compiler that was pulled off the internet. It has agreed upon criteria and testing mechanisms that have been refined over time by a team that documents what is being tested against. So if an auditor were to ask, “How do you check against X?” then you would have a clear answer from the documentation that “X” is checked and logged at this stage in the process.
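The independent checkpoint from the PCI excerpt can be enforced by the pipeline itself. This minimal sketch assumes a hypothetical `Change` record carrying the author, approvers, and automated test result; a change may deploy only when tests pass and someone other than the author has approved it, and every decision is logged so an auditor can see what was checked.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

AUDIT_LOG: list[str] = []  # hypothetical; in practice an append-only, centrally retained log


@dataclass
class Change:
    change_id: str
    author: str
    approvers: set[str] = field(default_factory=set)
    tests_passed: bool = False


def can_deploy(change: Change) -> bool:
    """Independent checkpoint: automated tests pass and someone other than the author approved."""
    independent = change.approvers - {change.author}
    if not change.tests_passed:
        allowed, reason = False, "automated tests have not passed"
    elif not independent:
        allowed, reason = False, "no approval from anyone other than the author"
    else:
        allowed, reason = True, "approved by " + ", ".join(sorted(independent))
    # Record every decision so an auditor can see what was checked, when, and why.
    timestamp = datetime.now(timezone.utc).isoformat()
    AUDIT_LOG.append(f"{timestamp} {change.change_id} allowed={allowed} ({reason})")
    return allowed
```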
If your developers use proper version control and your automated systems are documented, you have a higher chance of catching any changes, whether purposeful or accidental, than you would in another development life cycle model. When work is left purely to humans, distraction and lost focus create more errors than a highly automated system would. Agile practices can help reduce human error.
Additionally, if you are looking to catch fraud, you can implement additional monitoring specifically on your privileged developers. Between this monitoring and the automated testing, it should be easy to detect when one of your developers goes outside the scope they are supposed to be working in, thus protecting the sensitive data covered by PCI, HIPAA, or SOX regulations. This automation and monitoring not only helps stop internal fraud; it can also help detect outside actors attempting to gain access to your sensitive data.
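Monitoring for work outside a developer's scope can start as simply as comparing changed files against assigned areas of the repository. The `ASSIGNED_SCOPES` mapping and `out_of_scope` function below are hypothetical; flagged paths would feed a review queue rather than block the change outright.

```python
# Hypothetical assignment of developers to the areas of the repository they work in.
# Prefixes end with "/" so that "services/catalog/" does not also match "services/catalog-admin/".
ASSIGNED_SCOPES: dict[str, list[str]] = {
    "alice": ["services/catalog/", "libs/common/"],
    "bob": ["services/payments/"],
}


def out_of_scope(developer: str, changed_paths: list[str]) -> list[str]:
    """Changed files outside the developer's assigned scope, to be flagged for review."""
    scopes = ASSIGNED_SCOPES.get(developer, [])
    return [path for path in changed_paths if not any(path.startswith(s) for s in scopes)]


if __name__ == "__main__":
    commit = ["services/catalog/search.py", "services/payments/card_vault.py"]
    for path in out_of_scope("alice", commit):
        print("flag for review:", path)
```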
Do I actually need Separation of Duties?
Yes. And if utilizing an Agile framework you already have that separation. Everything about creating an Agile environment lends itself towards highly auditable and loggable activity. This same activity is what is required for SoD. Checks and balances are achieved when you create, document and continually refine the process of how your company delivers viable product sprint after sprint. Auditors are quick to say that Agile doesn’t support the necessary separation, but usually that is because there isn’t a single person that they can put their finger on that does a certain activity. Instead they have a refined and documented product, both heavily monitored and logged, that is the independent checkpoint they require to ensure that no individual has complete end-to-end control.
If you have any further questions on how your Agile environment follows Separation of Duties, feel free to contact AKF.
June 22, 2018 | Posted By: Marty Abbott
Attempting to migrate on-premise, licensed products to a recurring revenue SaaS (XaaS) solution is perhaps one of the most difficult things a company can do. As Dave Swenson writes in SaaS Migration Challenges, changes are often necessary to the fabric of the company, including the company culture and mindset.
In many cases, the principles that make a company successful in the on-premise, packaged software model are insurmountable barriers to success in XaaS. As an example of the difficulty of this transition, consider one group of opposing principles endemic to SaaS transition: building to customer want (licensed model) versus building to market need (SaaS model). For years, on-premise providers were successful by filling their product backlog with customer requests (or “wants”). When requests were specific to a single customer and didn’t appear to be extensible to a broader base, a revenue stream associated with professional services was created. Professional services, or a system integrator, would make modifications to source code creating a unique “version” for that customer. This approach led to bloated products, oftentimes with a single customer using less than 30% of a product’s capabilities. Moreover, release “skew” developed, increasing the cost of maintenance of multiple releases in “the wild”.
Contrast this with a product approach that attempts to “discover” true market need, where the needs of many customers intersect. Product backlogs are built primarily on thorough market analysis coupled with experimentation. Customer asks (or “wants”) help inform prioritization but are not the primary driver of the product backlog. Sales organizations focus on selling the existing product rather than driving it, and the role of the professional product manager emerges. This notion of “want” vs. “need” is critical to the success of a company’s transformation.
The difficulty of the journey from “want” in the licensed product world to “need” in the XaaS world should be clear to everyone. If the transition alone is difficult, imagine the stresses a company must endure to live in both worlds for an extended period of time. Sales organizations built on selling a customer “whatever they desire” often revolt at being relegated to selling only what exists. Product management organizations that previously acted like “business analysts” taking requirements now need to perform real market analysis, understand discovery and experimentation, and get “closer to the market”. How does one build a company that can succeed in both worlds at once? It is difficult indeed, given the chasm between the two approaches.
The figures below indicate a handful of the most common opposing forces as examples of how companies need to “reinvent themselves” and think differently to be successful as a XaaS provider.
Purchase versus Lease
At the root of many of the differences between Licensed/On-Premise products (or licensed products that are hosted in ASP-like models) and true XaaS products is the notion of what the customer expects with the sales transaction. In the licensed model, customers want outcomes but understand they are purchasing software. The customer understands that much of the risk of running the software including the “ilities” like availability and scalability are borne by the customer.
In the SaaS world, the customer mindset is that of leasing outcomes. Lower switching costs than on-premise software allow the customer to move more easily should they not achieve their desired outcomes. As such, service providers must move closer to the customer and better understand, in real time, customer expectations and how the product performs against them.
To highlight this difference, consider renting a home versus owning one. In most markets, comparing the total cost of rent against the amortized home value over an equivalent period plus the residual (resale) value of the home shows that renting is more expensive. But a majority of the risk of renting rests with the lessor (the landlord), including home maintenance, risk of loss to catastrophe, etc. The same is true of SaaS compared to owning a product: SaaS requires the service provider to take on significantly greater risk than selling a product to a customer does.
Want versus Need
Henry Ford perhaps said it best when he said, “If I’d asked customers what they want, they would have told me a faster horse.” Steve Jobs agreed with Ford, indicating that people don’t know what they want until you show it to them. Here Steve is referring to true “need” vs. a stated customer “want”. Licensed products often start with customer wants. Sales teams garner desires and either force them into product backlogs or sell modifications through professional services. Because the customer purchases the software, the company isn’t as incented as a SaaS provider to ensure that the customer gets value from what they purchased. The product development lifecycle, regardless of how it is represented to the customer, is almost always in fact waterfall.
XaaS companies attempt to “find” (aka discover) market need. Product managers analyze markets, establish goals, create hypotheses to achieve these goals, and co-create solutions to test these hypotheses with their engineering teams. What a customer may indicate as a “want” may help inform prioritization of the backlog especially if there is an intersection in customer desired outcomes.
Operations Afterthought versus Designed for Efficient Operations
In the on-premise world, the customer is responsible for the cost of operations and for the effective and efficient operation of any solution. Considerations such as monitoring for early fault and incident detection are at best afterthoughts in on-premise software development, giving rise to an industry specializing in aftermarket monitoring. The reasoning behind this approach is clear: time spent producing efficient, easily maintained solutions takes away from new feature development and the new revenue associated with those features.
In the SaaS world, we are responsible for our own cost of operations and therefore building products that operate efficiently with low cost of goods sold is important to achieving high gross margins and profit margins. Furthermore, as we are responsible for all the “ilities” (described below), we must design our solutions to be monitored in real time to detect errors, faults, problems and incidents.
Revolutionary versus Evolutionary
Put simply, on-premise models rely on “fork-lift” major product upgrades that often take significant effort, downtime and customer expense to implement. The latter is often seen as a bonus to the provider of software, as upgrades can become an additional revenue stream for professional services both in assisting with the upgrade and re-implementing customizations lost during the upgrade.
XaaS products simply can’t afford the downtime or cost associated with revolutionary implementations. As such, data models and functionality evolve quickly but iteratively, with the capability to completely roll back or “undo” both schema modifications and functionality additions.
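Reversible schema changes can be sketched with paired “up” and “down” migrations. This example uses Python's built-in `sqlite3` module and SQLite's `PRAGMA user_version` to record the current schema version; the migration list and table names are hypothetical, and a production system would typically use a dedicated migration tool and also reverse any data changes.

```python
import sqlite3

# Hypothetical migrations: each schema change ships with the statement that undoes it.
MIGRATIONS: list[tuple[str, str]] = [
    ("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER NOT NULL)",
     "DROP TABLE orders"),
    ("CREATE INDEX idx_orders_customer ON orders (customer_id)",
     "DROP INDEX idx_orders_customer"),
]


def schema_version(conn: sqlite3.Connection) -> int:
    # SQLite's user_version pragma is a free integer slot; here it counts applied migrations.
    return conn.execute("PRAGMA user_version").fetchone()[0]


def migrate_up(conn: sqlite3.Connection, target: int) -> None:
    current = schema_version(conn)
    if not current <= target <= len(MIGRATIONS):
        raise ValueError(f"cannot migrate up from {current} to {target}")
    for up_sql, _ in MIGRATIONS[current:target]:
        conn.execute(up_sql)
    conn.execute(f"PRAGMA user_version = {int(target)}")
    conn.commit()


def migrate_down(conn: sqlite3.Connection, target: int) -> None:
    current = schema_version(conn)
    if not 0 <= target <= current:
        raise ValueError(f"cannot migrate down from {current} to {target}")
    # Undo newer migrations in reverse order, back to the target version.
    for _, down_sql in reversed(MIGRATIONS[target:current]):
        conn.execute(down_sql)
    conn.execute(f"PRAGMA user_version = {int(target)}")
    conn.commit()
```

Rolling back with `migrate_down(conn, 1)` removes only the index, leaving the table and its data in place.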
Many Releases versus Single Release
Because on-premise companies can rarely force patch or release adoption across their installed base (they typically have no access to these environments), the number of releases they support may be in the 100s or even 1000s. This release skew increases operating expenses as a large portion of engineering may be spent supporting problems across a very large number of releases.
SaaS companies recognize that release skew increases the cost of development, and as such force most customers onto a single release (or no more than 3). Companies that fail to follow this principle face on-premise-like operating margins as engineering budgets grow to support releases.
Customization Equals Revenue versus Customization Equals Cost of Operations
On-premise companies see customization as a valuable revenue stream, often at the expense of increased engineering budgets to maintain these customizations and the distractions they cause.
SaaS companies abhor customization and favor configuration, leading to higher quality products and better overall customer experiences.
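Configuration in place of customization means every tenant runs the same code, and differences are expressed as validated settings. The `TenantConfig` fields and supported currencies below are hypothetical, chosen only to show the pattern.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TenantConfig:
    """Hypothetical per-tenant settings: behavior varies by configuration, not by forked code."""
    currency: str = "USD"
    approval_threshold: float = 1000.0  # orders above this amount need manager approval
    enable_sso: bool = False


SUPPORTED_CURRENCIES = {"USD", "EUR", "GBP"}


def configure(**overrides) -> TenantConfig:
    """Build a tenant's config from defaults; unknown settings raise TypeError."""
    config = replace(TenantConfig(), **overrides)
    if config.currency not in SUPPORTED_CURRENCIES:
        raise ValueError(f"unsupported currency: {config.currency}")
    return config


def needs_approval(config: TenantConfig, order_total: float) -> bool:
    # One code path for every tenant; only the configured threshold differs.
    return order_total > config.approval_threshold
```

Rejecting unknown settings is deliberate: a request that cannot be expressed as configuration becomes a product backlog discussion rather than a one-off code change.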
Features First versus “-ilities” First
While all of the XaaS principles are important, one of the most difficult for our on-premise clients to grasp is the notion that the “-ilities” (sometimes called NFRs, or Non-Functional Requirements, in waterfall development) like availability, scalability, reliability, etc., are now the most important feature within their products. It pains me to even call them a feature, as they are the “table stakes” (the ante, the price we pay) for the game in which we participate.
Licensed/on-premise companies can punt on the “-ilities” because the customer is accountable for running the solution and bears a good portion of the risk. Availability impacts like hardware failures are ultimately the customer's responsibility, as is the decision to pay for high availability in the form of infrastructure.
These “-ilities” steal time from engineering teams – and appropriately so – to ensure the quality of the service provided. Feature production is still important – it’s just second in line behind ensuring that the solution is available for value creation.
Developers Protected versus Developers Front-Lines
Because feature production is highly correlated with revenue in the licensed software model, companies often protect developer time with multiple layers of support. When customers have outages, companies attempt to resolve them with lower cost support personnel before stealing development time from critical projects. Licensed product companies can get away with this because when there is a failure at a customer site a comparatively small portion of revenue and customer base is at risk.
In the XaaS model, multiple customers are likely impacted with any outage due to increased tenancy. And because we believe that availability is one of the most important features, we must respond with the resources necessary to resolve an incident as quickly as possible.
This is yet another sticking point with companies making the SaaS transition: sometimes significant portions of engineering organizations do not want to be on call. Those people simply have no role in a utility company-like model where “dial tone” is essential.
Part 2 describes a set of SaaS principles derived from the conflict inherent to a migration from licensed products to the XaaS model.
The move from on-premise, licensed software revenue models to XaaS models is difficult. Many of the principles that make an on-premise company successful are diametrically opposed to success in SaaS. The longer a company must operate in both models, the more likely the culture, mindset and fabric of the company will be stretched. Many successful companies decide to operate the two businesses separately, to ensure that one culture is not influenced negatively by the other.
AKF Partners has aided a number of companies in their journey from on-premise to SaaS. Read more about our SaaS migration services.
June 18, 2018 | Posted By: Pete Ferguson
In my short tenure at AKF, I have found the topic of Stored Procedures (SPROCs) to be provocatively polarizing. As we conduct a technical due diligence with a fairly new upstart for an investment firm and ask if they use stored procedures on their database, we often get a puzzled look as though we just accused them of dating their sister and their answer is a resounding “NO!”
However, when conducting assessments of companies that have been around a while and are struggling to quickly scale, move to a SaaS model, and/or migrate from hosted servers to the cloud, we find “server huggers” who love to keep their stored procedures on their database.
At two different clients earlier this year, we found companies who have thousands of stored procedures in their database. What was once seen as a time-saving efficiency is now one of several major obstacles to SaaS and cloud migration.
In our book, Scalability Rules: Principles for Scaling Web Sites, Marty (Martin L. Abbott) outlines many reasons why stored procedures should not be kept in the database. Here are the top 8:
- Cost: Databases tend to be one of the most expensive systems or services within the system architecture, and the cost of each transaction increases with each additional SPROC. Increasing the cost of scale (for example, by making a synchronous call to the ERP system for each transaction) while also reducing the availability of the product platform by adding yet another system in series doesn't make good business sense.
- Creates a Monolith: SPROCs on a database create a monolithic system which cannot be easily scaled.
- Limits Scalability: The database is a governor of scale, and SPROCs steal capacity by running work other than relational transactions on the database.
- Limits Automated Testing: SPROCs limit the automation of code testing (in many cases it is not as easy to test stored procedures as it is the other code that developers write), slowing time to market and increasing cost while decreasing quality.
- Creates Lock-in: Changing to an open-source or NoSQL solution requires a plan to migrate the SPROCs or to replace their logic in the application. Lock-in also makes it more difficult to switch to new and compelling technologies, negotiate better pricing, etc.
- Adds Unneeded Complexity to Shard Databases: Using SPROCs and business logic on the database makes sharding and replacement of the underlying database much more challenging.
- Limits Speed To The Weakest Link: Systems should scale independently relative to individual needs. When business logic is tied to the database, each of them needs to scale at the same rate as the system making requests of them - which means growth is tied to the slowest system.
- More Team Composition Flexibility: By separating product and business intelligence in your platform, you can also separate the teams that build and support those systems. If a product team is required to understand how their changes impact all related business intelligence systems, it will slow down their pace of innovation as it significantly broadens the scope when implementing and testing product changes and enhancements.
Per the AKF Scale Cube, we desire to separate dissimilar services - having stored procedures on the database means it cannot be split easily.
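As a sketch of moving business logic out of the database, the hypothetical discount rule below lives in application code, where it can be unit-tested without a database, and the database is used only for a simple relational read. The table name, columns, and the 5% discount rule are assumptions for illustration, not taken from any client system.

```python
import sqlite3
from decimal import Decimal

DISCOUNT_THRESHOLD = Decimal("100.00")  # hypothetical business rule
DISCOUNT_RATE = Decimal("0.05")


def order_total(line_items: list[tuple[Decimal, int]]) -> Decimal:
    """Pure business logic: testable without a database and scalable with the app tier."""
    subtotal = sum((price * qty for price, qty in line_items), Decimal("0"))
    if subtotal >= DISCOUNT_THRESHOLD:
        subtotal -= subtotal * DISCOUNT_RATE
    return subtotal.quantize(Decimal("0.01"))


def load_line_items(conn: sqlite3.Connection, order_id: int) -> list[tuple[Decimal, int]]:
    # The database performs only a simple relational read; no business logic runs there.
    rows = conn.execute(
        "SELECT unit_price, quantity FROM order_lines WHERE order_id = ?", (order_id,)
    ).fetchall()
    return [(Decimal(str(price)), int(qty)) for price, qty in rows]
```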
Need help migrating from hosted hardware to the cloud or migrating your installed software to a SaaS solution? We have helped hundreds of companies, from small startups to well-established Fortune 50 companies, better architect, scale, and deliver their products. We offer a host of services, from technical due diligence and onsite workshops to mentoring and interim staffing for your company.
June 11, 2018 | Posted By: Marty Abbott
Of the many SaaS operating principles, perhaps one of the most misunderstood is the principle of tenancy.
Most people have a definition in their mind for the term “multi-tenant”. Unfortunately, because the term has so many valid interpretations its usage can sometimes be confusing. Does multi-tenant refer to the physical or logical implementation of our product? What does multi-tenant mean when it comes to an implementation in a database?
This article first covers the goals of increasing tenancy within solutions, then delves into the various meanings of tenancy.
Multi-Tenant (Multitenant) Solutions and Cost
One of the primary reasons companies that present products as a service strive for higher levels of tenancy is the cost reduction it affords. With multiple customers sharing applications and infrastructure, system utilization goes up: we get more value production out of each server that we use, or in other words, greater asset utilization. Because most companies view the cost of serving customers as a “Cost of Goods Sold”, multitenant solutions have better gross margins than single-tenant solutions. The X Axis of the figure below shows the effect of increasing tenancy on the cost of goods sold on a per customer basis.
Interestingly, multitenant solutions often “force” another SaaS principle to be true: no more than 1 to 3 versions of software for the entire customer base. This is especially true if the database is shared on a logical (row-level) basis (more on that later). Lowering the number of versions of the product decreases the operating expense necessary to maintain multiple versions and therefore also increases operating margins.
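The relationship between tenancy and COGS is simple arithmetic when customers share a server. This sketch uses hypothetical figures (a $2,000 per month server and a $500 per month subscription) and assumes each server has capacity for the tenants placed on it; real costs also include storage, licenses, and support.

```python
def cogs_per_customer(monthly_server_cost: float, customers_per_server: int) -> float:
    """Hosting cost attributed to each customer when customers share a server."""
    return monthly_server_cost / customers_per_server


def gross_margin(price: float, cogs: float) -> float:
    """Share of each subscription dollar left after the cost of serving the customer."""
    return (price - cogs) / price


if __name__ == "__main__":
    price = 500.0        # hypothetical monthly subscription price per customer
    server_cost = 2000.0  # hypothetical monthly cost of one server
    for tenants in (1, 10, 50):
        cogs = cogs_per_customer(server_cost, tenants)
        print(f"{tenants:>3} tenants/server: COGS ${cogs:,.0f}, "
              f"gross margin {gross_margin(price, cogs):.0%}")
```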
Single Tenant, Multi-Tenant and All-Tenant
An important point to keep in mind is that tenancy occurs along a spectrum, moving from single-tenant to all-tenant. Multitenant describes any solution where the number of tenants, from a physical or logical perspective, is greater than one, including all-tenant implementations. As tenancy increases, Cost of Goods Sold (COGS in the figure above) decreases and gross margins increase.
All-tenant solutions, while attractive from a cost perspective, create a single failure domain (see https://akfpartners.com/growth-blog/fault-isolation), thereby decreasing overall availability: when something goes wrong with our product, everything is offline. For that reason, we differentiate between solutions that enable multi-tenancy for cost reasons and all-tenant solutions.
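The cost argument can be made concrete with a toy model. The sketch below assumes, purely for illustration, that each deployed instance carries a fixed overhead (servers, licenses, idle capacity) plus a variable cost per tenant; the function name `cost_per_customer` and every figure in it are hypothetical, not AKF data. Spreading the fixed overhead across more tenants is what drives per-customer COGS down.

```python
def cost_per_customer(fixed_cost_per_instance: float,
                      variable_cost_per_tenant: float,
                      tenants_per_instance: int) -> float:
    """Hypothetical per-customer cost: the fixed instance overhead is shared
    by all tenants on the instance, the variable cost is paid per tenant."""
    if tenants_per_instance < 1:
        raise ValueError("an instance must serve at least one tenant")
    return fixed_cost_per_instance / tenants_per_instance + variable_cost_per_tenant


if __name__ == "__main__":
    # Invented monthly figures, chosen only to show the shape of the curve.
    FIXED, VARIABLE = 900.0, 10.0
    for tenants in (1, 10, 100):
        cost = cost_per_customer(FIXED, VARIABLE, tenants)
        print(f"{tenants:>3} tenant(s) per instance: ${cost:,.2f} per customer")
```

With these made-up numbers the per-customer cost falls from $910 to $100 to $19. The returns diminish as the shared portion approaches zero and the per-tenant variable cost dominates.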
The Many Meanings and Implementations of Tenancy
Multitenant solutions can be implemented in many ways and at many tiers.
Physical and Logical
Physical multi-tenancy means having multiple customers share a number of servers. This increases the overall utilization of those servers and therefore reduces cost of goods sold. Customers need not share the application for a solution to be physically multitenant; one could, for instance, run a webserver, application server or database per customer. Many customers with concerns over data separation and privacy are fine with physical multitenancy as long as their data is logically separated.
Logical multi-tenancy means having customers share the same application: the same webserver instances, application server instances and database are used for every customer. The situation becomes a bit murkier, however, when it comes to databases.
Different relational databases use different terms for similar implementations. A SQL Server database, for instance, looks very much like an Oracle schema. Within databases, a solution can be logically multitenant either by implementing tenancy within a table (we call this row-level multitenancy) or within a schema/database (we call this schema multitenancy). In either case, a single instance of the relational database management system (RDBMS) is used, while customer transactions are separated by a customer id inside a table, or by database/schema id if separated that way.
While physical multitenancy provides cost benefits, logical multitenancy often provides significantly greater ones. Because applications are shared, we need less system overhead than running an application per customer, and thus get even greater throughput and efficiency out of our physical or virtualized servers.
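To make the two database approaches concrete, here is a minimal sketch using Python's built-in `sqlite3` module. SQLite has no schemas in the Oracle or SQL Server sense, so the schema-style variant uses `ATTACH DATABASE` to give each tenant its own named database within one connection as a rough stand-in. The table, column and tenant names (`orders`, `tenant_id`, `acme`, `globex`) are hypothetical.

```python
import sqlite3


def row_level_demo() -> list[tuple[str, float]]:
    """Row-level multitenancy: one shared table, every row carries a tenant id."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (tenant_id TEXT NOT NULL, item TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders (tenant_id, item, amount) VALUES (?, ?, ?)",
        [("acme", "widget", 10.0), ("globex", "gadget", 25.0), ("acme", "gizmo", 5.0)],
    )
    # Every query must filter on tenant_id; omitting it would leak other tenants' rows.
    rows = conn.execute(
        "SELECT item, amount FROM orders WHERE tenant_id = ? ORDER BY item", ("acme",)
    ).fetchall()
    conn.close()
    return rows


def schema_level_demo() -> list[tuple[str, float]]:
    """Schema-style multitenancy: one RDBMS instance, one namespace per tenant."""
    conn = sqlite3.connect(":memory:")
    for tenant in ("acme", "globex"):
        # Identifiers cannot be bound as parameters, so real code must validate
        # tenant names before interpolating them into SQL like this.
        conn.execute(f"ATTACH DATABASE ':memory:' AS {tenant}")
        conn.execute(f"CREATE TABLE {tenant}.orders (item TEXT, amount REAL)")
    conn.execute("INSERT INTO acme.orders VALUES ('widget', 10.0), ('gizmo', 5.0)")
    conn.execute("INSERT INTO globex.orders VALUES ('gadget', 25.0)")
    # Isolation comes from the namespace, not from a WHERE clause.
    rows = conn.execute("SELECT item, amount FROM acme.orders ORDER BY item").fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(row_level_demo())
    print(schema_level_demo())
```

The row-level version depends on every query carrying the tenant filter; the schema-style version isolates tenants by namespace but multiplies the number of objects to create, migrate and monitor.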
Depth of Multi-Tenancy
The diagram below illustrates that every layer in our service architecture has an impact on multi-tenancy. We can be physically or logically multi-tenant at the network layer, the web server layer, the application layer and the persistence or database layer.
The deeper into the stack our tenancy goes, the greater the beneficial impact (cost savings) on cost of goods sold and the higher our gross margins.
The AKF Multi-Tenant Cube
To further the understanding of tenancy, we introduce the AKF Multi-Tenant Cube.
The X axis describes the “mode” of tenancy, moving from shared nothing, to physical, to logical. As we progress from sharing nothing to sharing everything, utilization goes up and cost of goods sold goes down.
The Y axis describes the depth of tenancy, from shared nothing through the network, web, app and finally persistence or database tier. Again, as the depth of tenancy increases, so do gross margins.
The Z axis describes the degree of tenancy, or the number of tenants. Higher levels of tenancy decrease cost of goods sold, but architecturally we never want a failure domain that encompasses all tenants.
When running an XaaS (SaaS, etc.) business, we are best off implementing logical multitenancy through every layer of our architecture. While we want tenancy to be high per instance, we also do not want all tenants in a single implementation.
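One common way to keep tenancy high per instance without creating an all-tenant failure domain is to spread tenants across several independent deployments (often called pods or swimlanes). The sketch below assumes a hypothetical fixed pool of four pods and assigns each tenant with a stable hash; `POD_COUNT`, `pod_for_tenant` and `assign` are illustrative names, not part of any AKF tooling.

```python
import hashlib
from collections import defaultdict

# Hypothetical pod count; a real system would size this from capacity and risk tolerance.
POD_COUNT = 4


def pod_for_tenant(tenant_id: str, pod_count: int = POD_COUNT) -> int:
    """Map a tenant to a pod with a stable hash so the assignment survives restarts."""
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % pod_count


def assign(tenants: list[str], pod_count: int = POD_COUNT) -> dict[int, list[str]]:
    """Group tenants by pod: each pod is multitenant, but no pod holds every tenant."""
    pods: dict[int, list[str]] = defaultdict(list)
    for tenant in tenants:
        pods[pod_for_tenant(tenant, pod_count)].append(tenant)
    return dict(pods)


if __name__ == "__main__":
    layout = assign([f"tenant-{i}" for i in range(100)])
    for pod, members in sorted(layout.items()):
        print(f"pod {pod}: {len(members)} tenants")
```

SHA-256 is used instead of Python's built-in `hash()` because string hashing is randomized per process, which would move tenants between pods on restart. Plain modulo hashing also remaps most tenants when the pod count changes; a lookup table or consistent hashing avoids that at the cost of extra machinery.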
AKF Partners helps companies of all sizes achieve their availability, time to market, scalability, cost and business goals.