A Financial Analogy for Technical Debt
October 12, 2018 | Posted By: Bill Armelin
Understanding Technical Debt
During the course of our client engagements, there are a few common topics or themes that are always discussed, and the clients themselves usually introduce them. One such area is technical debt. Every team has it, every team believes they have too much of it, and every team struggles to explain why it’s important to address it.
Let’s start by defining what technical debt means. It is the difference between doing something the “desired” or “best” way and doing something quickly (i.e. reduce time to market). The difference results in the company taking on “debt” within the solution. Technical debt requires acting with forethought. In other words, you only assume technical debt knowingly and with commission. Acts of omission (forgetting to plan or do something) do not count as debt. Our partners in business may think we are hiding the truth if we do not clearly delineate the difference between debt (known assumptions) and mistakes, failures or other issues related to maintenance.
The following list provides examples of things that are not tech debt:
- Software defects (unless we decide to NOT fix them for an extended period of time – but defects are still human failures – not debt.)
- Failures in design that are not previously tagged as debt.
- Failures to identify scalability bottle necks.
- Poor choices in technology components that fail to scale.
- Failure to properly identify infrastructure failures, or high failure rates of vendors in infrastructure.
A Financial Analogy for Tech Debt
When you hear the words “technical debt”, it invokes a negative connotation. However, the judicious use of tech debt is a valuable addition to your product development process. Tech debt is analogous to financial debt. Companies can raise capital to grow their business by either issuing equity or issuing debt. Issuing equity means giving up a percentage of ownership in the company and dilutes current shareholder value. Issuing debt requires the payment of interest but does not give up ownership or dilute shareholder value. Issuing debt is good, until you can’t service it. Once you have too much debt and cannot pay the interest, you are in trouble.
Tech debt operates in the same manner. Companies use tech debt to defer performing work on a product. As we develop our minimum viable product, we build a prototype, gather feedback from the market, and iterate. The parts of the product that didn’t meet the definition of minimum or the decisions/shortcuts made during development represent the tech debt that was taken on to get to the MVP. This is the debt that we must service in later iterations. In fact, our definition of done must include the servicing of the resulting tech debt. Taking on tech debt early can pay big dividends by getting your product to market faster. However, like financial debt, you must service the interest. If you don’t, you will begin to see scalability and availability issues. At that point, refactoring the debt becomes more difficult and time critical. It begins to affect your customers’ experience.
Many development teams have a hard time convincing leadership that technical debt is a worthy use of their time. Why spend time refactoring something that already “works” when you could use that time to build new features customers and markets are demanding now? The danger with this philosophy is that by the time technical debt manifests itself into a noticeable customer problem, it’s often too late to address it without a major undertaking. It’s akin to not having a disaster recovery plan when a major availability outage strikes. To get the business on-board, you must make the case using language business leaders understand – again this is often financial in nature. Be clear about the cost of such efforts and quantify the business value they will bring by calculating their ROI. Demonstrate the cost avoidance that is achieved by addressing critical debt sooner rather than later - calculate how much cost would be in the future if the debt is not addressed now. The best practice is to get leadership to agree and commit to a certain percentage of development time that can be allocated to addressing technical debt on an on-going basis. If they do, it’s important not to abuse this responsibility. Do not let engineers alone determine what technical debt should be paid down and at what rate – it must have true business value that is greater than or equal to spending that time on other activities.
Just as with debt that a company assumes, in and of itself, technical debt is not bad. It can be looked at as a leveraging tool to optimize the technology resources in the short term - delaying a hardware tech refresh or the release date for HTML 5. Delaying attention to address technical issues allows greater resources to be focused on higher priority endeavors. The absence of technical debt probably means missed business opportunities– use technical debt as a tool to best meet the needs of the business. However, excessive technical debt will cause availability and scalability issues, and can choke business innovation (too much engineering time dealing with debt rather than focusing on the product).
Develop a technology balance sheet and profit and loss (income) statement to discuss tech debt with the business in a manner they understand – finance. Let’s first look at the balance sheet, where Assets = Liabilities + Equity. Our assets are the engineering time spent creating the product. Liabilities are the principle of the tech debt (i.e. the difference between “desired” and “actual.” Equity is the remainder, or the engineering resources spent creating the product while not contributing to tech debt.
Here is an example of a technology balance sheet:
To further the financial analogy, we need to have a technology P&L statement. Here, the interest on tech debt is the difficulty or increased level of effort in modifying something in subsequent releases. This manifests as a reduction in developer productivity per value created. The more debt you take on or less principle you pay down, the higher your interest payment becomes, and the cost to the organization.
Dedicating resources on an ongoing basis to service technical debt can be a challenging discussion with the business. Resources are always limited and employing them in the manner which best benefits the business is a critical business priority decision. Similar to the notion of debt within business, you should never take on technical debt without a plan to pay the interest (increased future cost of development) and principal (fixing the difference between appropriate and as-is). Relating technical debt to financial debt can help those outside of your technology organization grasp the concept and understand the need to keep technical debt under control.
One way to make the concept of debt real is to estimate, for any debt item, the amount of “interest” one will need to pay in the future to modify the solution in question.
- For the benefit of time to market, you decide to “hard code” a number of “display strings” that you’d rather set aside in a resource file to modify and translate later.
- You save 2 weeks of development time, creating a 2-week liability on your balance sheet. You have a 2-week principal to fix.
- You estimate that for all future string modifications (or translations) it will take an additional day of development. Your interest is 1 day, payable for each modification.
Just as retiring all financial liabilities at once does not make good business sense, trying to wipe out technical debt in one fell swoop is a bad idea. Continuous service to the technical debt is required to prevent technical liabilities from wiping out technical equity. An informed decision to increase debt service to reduce the principal will result in more productive product development time (smaller debt requires less on-going service). A short-term decision to reduce tech debt service in favor of a critical product launch may be viable if not used often. Keep track of both your principal (balance sheet) and your interest payments (income statement). Use these to help your business partners with debt related decisions.
Do NOT mix the cost of defects, or other infrastructure and software mistakes with tech debt. Doing so creates two very big problems:
- It becomes harder for the technology team to learn from past mistakes. Mistakes are mistakes and we should use them as learning opportunities. Debt is taken thoughtfully. Track them separately and treat them differently.
- Using the debt term for non-debt related items, will lower the level of trust between you and the business. Businesses don’t for instance “mistakenly” take on debt. Mixing these terms can cause relationship problems.
Additionally, be clear about how you define technical debt, so time spent paying it down is not comingled with other activities. Bugs in your code are not technical debt. Refactoring your code base to make it more scalable, however, would be. A good test is to ask if the path you chose was a conscious or unconscious decision. Meaning, if you decided to go in one direction knowing that you would later need to refactor. You are making a specific decision to do or not to do something knowing that you will need to address it later. Bugs are found in sloppy code, and that is not tech debt, it is just bad code.
So how do you decide what tech debt should be addressed and how do you prioritize? If you have been tracking work with Agile storyboards and product backlogs, you should have an idea where to begin. Also, if you track your problems and incidents like we recommend, then this will show elements of tech debt that have begun to manifest themselves as scalability and availability concerns. Set a budget and begin paying down the debt. If you are working on less 12%, you are not spending enough effort. If you are spending over 25%, you are probably fixing issues that have already manifested themselves, and you are trying to catch up. Setting an appropriate budget and maintaining it over the course of your development efforts will pay down the interest and help prevent issues from arising.
Taking on technical debt to fund your product development efforts is an effective method to get your product to market quicker. But, just like financial debt, you need to take on an appropriate amount of tech debt that you can service by making the necessary interest and principle payments to reduce the outstanding balance. Failing to set an appropriate budget will result in a technical “bankruptcy” that will be much harder to dig yourself out of later.
Tech Debt Takeaways
Here is a list of our tech debt takeaways:
Subscribe to the AKF Newsletter
The Domino or Multiplicative Effect of Failure
September 18, 2018 | Posted By: Pete Ferguson
As part of our Technical Due Diligence and Architectural reviews, we always want to see a company’s system architecture, understand their process, and review their org chart. Without ever stepping foot at a client we can begin to see the forensic evidence of potential problems.
Like that ugly couch you bought when you were in college and still have in your front room, often inefficiencies in architecture, process, and organization are nostalgic memories that have long since outlived their purpose – and while you have become used to the ugly couch, outsiders look in and recognize it as the eyesore it is immediately and often customers feel the inefficiencies through slow page loads and shopping cart issues. “That’s how it has always been” is never a good motto when designing systems, processes, and organizations for flexibility, availability, and scalability.
It is always interesting to hear companies talk with the pride of a parent about their unruly kid when they use words like “our architecture/organization is very complex” or “our systems/organization has a lot of interdependent components” – as if either of these things are something special or desirable!
All systems fail. Complex systems fail miserably, and – like Dominos – take down neighboring systems as well resulting in latency, down time, and/or flat out failure.
ARCHITECTURE & SOFTWARE
Some common observations in hardware/software we repeatedly see:
Problem: Overloaded F5 or other similar firewalls are trying to encrypt all data because Personal Identifiable Information (PII) is stored in plain text, usually left over from a business decision made long ago that no one can quite recall and an auditor once said “encrypt everything” to protect it. Because no one person is responsible for a 30,000 foot view of the architecture, each team happily works in their silo and the decision to encrypt is held up like a trophy without seeing that the F5 is often running hot, causing latency, and is now a bottleneck (resulting in costly requests for more F5s) doing something it has no business doing in the first place.
Solution: Segregate all PII, tokenize it and only encrypt the data that needs to be encrypted, speeding up throughput and better isolating and protecting PII.
Integration (or Rather Lack Thereof) Of Mergers & Acquisitions
Problem: A recent (and often not so recent) flurry of acquisitions is resulting in cross data center calls in and out of firewalls. Purchased companies are still in their own data center or public cloud and the entire workflow of a customer request is crisscrossing the country multiple times not only causing latency, but if one thing goes wrong (remember, everything fails …) timeouts result in customer frustration and lost transactions.
Solution: Integrate services within one isolated stack or swim lane – either hosted or public cloud – to avoid cross data center calls. Replicate services so that each datacenter or cloud instance has everything it needs.
Problem: As the company grew and gained more market share, the search for bigger and better has resulted in a monolithic database that is slow, requires specialized hardware, specialized support, ongoing expensive software licenses, and maintenance fees. As a result, during peak times the database slows everyone and everything down. The temptation is to buy bigger and better hardware and pay higher monthly fees for more bandwidth.
Solution: Break down databases by customer, region, or other Z-Axis splits on the AKF Scale Cube. This has multiple wins – you can use commodity servers instead of large complex file storage, failure for one database will not affect the others, you can place customer data closest to the customer by region, and adding additional servers does not required a long lead time or request for substantial capital expenditure.
PROCESSES & ORGANIZATION
What sets AKF apart is that we don’t just look at systems, we always want to understand the people and organization supporting the system architecture as well and here there are additional multiplicative effects of failure. We have considerable expertise working for and with Fortune 100 companies, startups, and agencies in many different competencies. The common mistakes we see on the organization side of the equation:
Lack of Cross Functional Teams
Problem: Agile Scrum teams do not have all the resources needed within the team to be self sufficient and autonomous. As a result, teams are waiting on other internal resources for approvals or answers to questions in order to complete a Sprint - or keep these items on the backlog because effort estimation is too high. This results in decreased time to market, losing what could have been a competitive advantage, and lower revenue.
Solution: Create cross-functional teams so that each Sprint can be completed with necessary access to security, architecture, QA, and other resources. This doesn’t mean each team needs a dedicated resource from each discipline – one resource can support multiple teams and the information needed can be greatly augmented by creating guildes where the subject matter expert (SME) can “deputize” multiple people on what is required to meet policy and published standards as well as a dedicated channel of communication to the SME greatly simplifying and speeding up the approval process.
Lack of Automation
Problem: It isn’t done enough! As a result, people are waiting on other people for needed approvals. Often the excuse is that there isn’t enough time or resources. In most cases when we do the math, the cost of not automating far outweighs the short-term investment with a continuous long-term payout that automation would bring. We often see that the individual with the deployment knowledge is insecure and doesn’t want automation as they feel their job is threatened. This is a very short-sighted approach that requires coaching for them to see how much more valuable they can be to the organization by getting out of the way of stifling progress!
Solution: Automate everything possible from testing, quality assurance, security compliance, code compliance (which means you need a good architectural review board and standards), etc! Automation is the gift that keeps on giving and is part of the “secret sauce” of top companies who are our clients.
Not Empowering Teams to Get Stuff Done!
Problem: Often teams work in a silo, only focused on their own tasks and are quick to blame others for their lack of success. They have been delegated tasks, but do not have the ability to get stuff done.
Solution: Similar to cross functional teams, each team must also be given the authority to make decisions (hence why you want the right people from a variety of dependencies on the team) and get stuff done. Am empowered team will iterate much faster and likely with a lot more innovation.
While each organization will have many variables both enabling and hindering success, the items listed here are common denominators we see time and time again often needing an outside perspective to identify. Back to the ugly couch analogy, it is often easy to walk into someone else’s house and immediately spot their ugly couch!
Pay attention to those you have hired away from the competition in their early days and seek their opinions and input as your organization’s old bad habits likely look ridiculous to them. Of course only do this with an intent to listen and to learn – getting defensive or stubbornly trying to explain why things are the way they are will not only bring a dead end to you learning, but will also abruptly stop any budding trust with your new hire.
And of course, we are always more than happy to pop the hood and take a look at your organization just as we have been doing for the top banks, Fortune 100, healthcare, and many other organizations. Put our experience to work for you!
Subscribe to the AKF Newsletter
Effective Incident Communications
September 17, 2018 | Posted By: Bill Armelin
Everything fails! This is a mantra that we are always espousing at AKF. At some point, these failures will manifest themselves as an outage. In a SaaS world, restoring service as quickly as possible is critical. It requires having the right people available and being able to communicate with them effectively. A lack of good communications can cause an incident to drag on.
For startups and smaller companies, problems with communications during incidents is less of an issue. Systems tend to be smaller or monolithic. Teams supporting these systems also tend to be small. When something happens, everyone jumps on a call to figure out the problem. As companies grow, the number of people needed to resolve an incident grows. Coordinating communications between a large group of people becomes difficult. Adding to the chaos are executives joining the conference bridges demanding updates about service restoration.
In order to minimize the time to restore a system during an incident, companies need the right people on the call. For large, complex systems, identify the right resources to solve a problem can be difficult. We recommend swarming an issue with everyone that could be needed to resolve an incident, and then release those that are no longer needed. But, with such a large number of people, it can be difficult to coordinate communications, especially on a single conference call bridge.
Managing the communications of a large group of people working an incident is critical to minimizing the restoration time. We recommend a communication method that many of us at AKF learned in the military. It involves using multiple voice and chat channels to coordinate work and the flow of information. Before we get into the details of managing communications, we need to first look at the leadership required to effectively work the incident.
Technical Incident Manager and Incident Communications Manager
Managing a large incident is usually too much for a single individual. She cannot manage coordinating the work occurring to resolve the incident, as well as reporting status to and answering questions from executives eager to know what is going on. We recommend that companies manage incidents with two people. The first person is the individual that is responsible for directing all activities geared towards restoration of service. We call this person the Technical Incident Manager. This individual’s main job is to reduce the mean time to restoration. She needs an overall architectural knowledge of the product and systems to direct the work. She is responsible for leading the call and deescalating after diagnosis informs who needs to be involved. She identifies and diagnoses the service issues and engages the appropriate subject matter experts to assist in restoration.
The second individual is the Incident Communications Manager. He is responsible for supporting the Technical Incident Manager be listening to the technical resolution chatter and summarizing it for a non-technical audience. His focus is on communications speed, quality, and accuracy. He is the primary communications channel for both internal and external messaging. He owns the incident communications process.
Incident Communications Process
This process involves using multiple communication channels to control information and work performed. The first channel established is the Control Channel. This is in the form of a conference bridge and a chat channel. The Technical Incident Manager controls both of these channels. The second channel created is the Status Channel. This also has a voice bridge and a chat channel. The Incident Communication Manager is responsible for managing this channel.
The Control Channel is used for all communication related to the restoration of service. People only use the voice channel for immediate communication and to announce work that is occurring or address immediate questions that need to be answered. Detailed work conducted is placed in the chat channel. This reduces the chatter on the voice channel to command and control messages. It also serves as a record of actions taken that can be referenced in the post mortem/RCA process. If specific teams need to discuss the work they are performing, separate voice and chat breakout channels are created for them. They move off the main channel into their breakout channels to perform the work. The leader of these teams periodically communicates status back up to the control channel.
As the work is progressing, the Incident Communications Manager monitors the Control Channel to provide the basis for his messaging. He formulates updates that he delivers over the Status bridge and chat channel. He keeps executives and customers informed of progress and status, keeping the control channel free of requests for frequent updates and dedicated to restoring service.
This method of communications has worked well in the military for years and has been adopted by many large companies to manage their incident communications. While it is overkill for small companies, it becomes an effective process as companies grow and systems become more complex.
Subscribe to the AKF Newsletter
Are you compromised?
September 14, 2018 | Posted By: Larry Steinberg
It’s important to acknowledge that a core competency for hackers is hiding their tracks and maintaining dormancy for long periods of time after they’ve infiltrated an environment. They also could be utilizing exploits which you have not protected against - so given all of this potential how do you know that you are not currently compromised by the bad guys? Hackers are great hidden operators and have many ‘customers’ to prey on. They will focus on a customer or two at a time and then shut down activities to move on to another unsuspecting victim. It’s in their best interest to keep their profile low and you might not know that they are operating (or waiting) in your environment and have access to your key resources.
Most international hackers are well organized, well educated, and have development skills that most engineering managers would admire if not for the malevolent subject matter. Rarely are these hacks performed by bots, most occur by humans setting up a chain of software elements across unsuspecting entities enabling inbound and outbound access.
What can you do? Well to start, don’t get complacent with your security, even if you have never been compromised or have been and eradicated what you know, you’ll never know for sure if you are currently compromised. As a practice, it’s best to always assume that you are and be looking for this evidence as well as identifying ways to keep them out. Hacking is dynamic and threats are constantly evolving.
There are standard practices of good security habits to follow - the NIST Cybersecurity Framework and OWASP Top 10. Further, for your highest value environments here are some questions that you should consider: would you know if these systems had configuration changes? Would you be aware of unexpected processes running? If you have interesting information in your operating or IT environment and the bad guys get in, it’s of no value unless they get that information back out of the environment; where is your traffic going? Can you model expected outbound traffic and monitor this? The answer should be yes. Then you can look for abnormalities and even correlate this traffic with other activities in your environment.
Just as you and your business are constantly evolving to service your customers and to attract new ones, the bad guys are evolving their practices too. Some of their approaches are rudimentary because we allow it but when we buckle down they have to get more innovative. Ensure that you are constantly identifying all the entry points and close them. Then remain diligent to new approaches they might take.
Don’t forget the most common attack vector - humans. Continue evolving your training and keep the awareness high within your staff - technical and non-technical alike.
Your default mental model should be that you don’t know what you don’t know. Utilize best practices for security and continue to evolve. Utilize external or build internal expertise in the security space and ensure that those skills are dynamic and expanding. Utilize recurring testing practices to identify vulnerabilities in your environment and to prepare against emerging attack patterns.
Open Source Software as a malware on ramp
5 Focuses for a Better Security Culture
3 Practices Your Security Program Needs
Security Considerations for Technical Due Diligence
Subscribe to the AKF Newsletter
September 10, 2018 | Posted By: Robin McGlothin
The Scalability Cube – Your Guide to Evaluating Scalability
Perhaps the most common question we get at AKF Partners when performing technical due diligence on a company is, “Will this thing scale?” After all, investors want to see a return on their investment in a company, and a common way to achieve that is to grow the number of users on an application or platform. How do they ensure that the technology can support that growth? By evaluating scalability.
Let’s start by defining scalability from the technical perspective. The Wikipedia definition of “scalability” is the capability of a system, network, or process to handle a growing amount of work, or its potential to be enlarged to accommodate that growth. That definition is accurate when applied to common investment objectives. The question is, what are the key attributes of software that allow it to scale, along with the anti-patterns that prevent scaling? Or, in other words, what do we look for at AKF Partners when determining scalability?
While an exhaustive list is beyond the scope of this blog post, we can quickly use the Scalability Cube and apply the analytical methodology that helps us quickly determine where the application will experience issues.
AKF Partners introduced the scalability cube, a scale design model for building resilience application architectures using patterns and practices that apply broadly to any application. This is a best practices model that describes all scale dimensions from “The Art of Scalability” book (AKF Partners – Abbot, Keeven & Fisher Partners).
The “Scale Cube” is composed of an X-axis, Y-axis, and Z-axis:
1. Technical Architectural Layering (X-Axis ) – No single points of failure. Duplicate everything.
2. Functional Decomposition Segmentation – Componentization to Modules & Microservices (Y-Axis). Split Report, Message, Locate, Forms, Calendar into fault isolated swim lanes.
3. Horizontal Data Partitioning - Shards (Z-Axis). Beginning with pilot users, start with “podding” users for scalability and availability.
The Scale Cube helps teams keep critical dimensions of system scale in mind when solutions are designed. Scalability is all about the capability of a design to support ever growing client traffic without compromising performance. It is important to understand there are no “silver bullets” in designing scalable solutions.
An architecture is scalable if each layer in the multi-layered architecture is scalable. For example, a well-designed application should be able to scale seamlessly as demand increases and decreases and be resilient enough to withstand the loss of one or more computer resources.
Let’s start by looking at the typical monolithic application. A large system that must be deployed holistically is difficult to scale. In the case where your application was designed to be stateless, scale is possible by adding more machines, virtual or physical. However, adding instances requires powerful machines that are not cost-effective to scale. Additionally, you have the added risk of extensive regression testing because you cannot update small components on their own. Instead, we recommend a microservices-based architecture using containers (e.g. Docker) that allows for independent deployment of small pieces and the scale of individual services instead of one big application.
Monolithic applications have other negative effects, such as development complexity. What is “development complexity”? As more developers are added to the team, be aware of the effects suffering from Brooks’ Law. Brooks’ law states that adding more software developers to a late project makes the project even later. For example, one large solution loaded in the development environment can slow down a developer and gets worse as more developers add components. This causes slower and slower load times on development machines, and developers stomping on each other with changes (or creating complex merges) as they modify the same files.
Another example of development complexity issue is large outdated pieces of the architecture or database where one person is an expert. That person becomes a bottleneck to changes in a specific part of the system. As well, they are now a SPOF (single point of failure) if they are the only resource that understands the monolithic beast. The monolithic complexity and the rate of code change make it hard for any developer to know all the idiosyncrasies of the system, hence more defects are introduced. A decoupled system with small components helps prevents this problem.
When validating database design for appropriate scale, there are some key anti-patterns to check. For example:
• Do synchronous database accesses block other connections to the database when retrieving or writing data? This design can end up blocking queries and holding up the application.
• Are queries written efficiently? Large data footprints, with significant locking, can quickly slow database performance to a crawl.
• Is there a heavy report function in the application that relies on a single transactional database? Report generation can severely hamper the performance of critical user scenarios. Separating out read-only data from read-write data can positively improve scale.
• Can the data be partitioned across different load databases and/or database servers (sharding)? For example, Customers in different geographies may be partitioned to various servers more compatible with their locations. In turn, separating out the data allows for enhanced scale since requests can be split out.
• Is the right database technology being used for the problem? Storing BLOBs in a relational database has negative effects – instead, use the right technology for the job, such as a NoSQL document store. Forcing less structured data into a relational database can also lead to waste and performance issues, and here, a NoSQL solution may be more suitable.
We also look for mixed presentation and business logic. A software anti-pattern that can be prevalent in legacy code is not separating out the UI code from the underlying logic. This practice makes it impossible to scale individual layers of the application and takes away the capability to easily do A/B testing to validate different UI changes. Layer separation allows putting just enough hardware against each layer for more minimal resource usage and overall cost efficiency. The separation of the business logic from SPROCs (stored procedures) also improves the maintainability and scalability of the system.
Another key area we dig for is stateful application servers. Designing an application that stores state on an individual server is problematic for scalability. For example, if some business logic runs on one server and stores user session information (or other data) in a cache on only one server, all user requests must use that same server instead of a generic machine in a cluster. This prevents adding new machine instances that can field any request that a load balancer passes its way. Caching is a great practice for performance, but it cannot interfere with horizontal scale.
Finally, long-running jobs and/or synchronous dependencies are key areas for scalability issues. Actions on the system that trigger processing times of minutes or more can affect scalability (e.g. execution of a report that requires large amounts of data to generate). Continuing to add machines to the set doesn’t help the problem as the system can never keep up in the presence of many requests. Blocking operations exasperate the problem. Look for solutions that queue up long-running requests, execute them in the background, send events when they are complete (asynchronous communication) and do not tie up key application and database servers. Communication with dependent systems for long-running requests using synchronous methods also affects performance, scale, and reliability. Common solutions for intersystem communication and asynchronous messaging include RabbitMQ and Kafka.
Again, the list above is not exhaustive but outlines some key areas that AKF Partners look for when evaluating an architecture for scalability. If you’re looking for a checklist to help you perform your own diligence, feel free to use ours. If you’re wondering more about our diligence practice, you may be interested in our thoughts on best practices, or our beliefs around diligence and how to get it right. We’ve performed technical diligence for seed rounds, A-series and beyond, carve-outs, strategic investments and taking public companies private. From $5 million invested to over $1 billion. No matter the size of company or size of the investment, we can help.
The Importance of a Post Mortem
September 6, 2018 | Posted By: James Fritz
“An incident is a terrible thing to waste” is a common mantra that AKF repeats during its Engagements. And rightfully so as many companies have an incident response plan in place but stop there. Why are incidents so important? What is the true value in doing a proper Post Mortem and actually learning from an incident?
Incidents identify issues in your product. But if that is all you take out of an incident then you are missing out on so much more information that an incident can provide. An incident is the first step to identifying a problem that exists in your product, infrastructure processes, and perhaps, people. “But aren’t incidents and problems the same thing?” Not necessarily. An incident is a one time event. It can occur multiple times if you never address the problem, but it is not isolated.
Conducting a Post Mortem
Gather as many data points as possible shortly after an incident concludes and schedule a Post Mortem review meeting.
Start with the incident timeline. Sufficiently logging events over time provides ready access to the needed data for forensic analysis. From this information you can then start to identify what went wrong, when it went wrong and how quickly you were able to respond to it. The below definitions are all factors that need to be identified:
- Time To Detect: How quickly did you identify that an incident had occurred
- Time To Escalate: How quickly did you get everyone necessary to fix the incident involved
- Time To Isolate: How quickly did you stop the incident from affecting other portions
- Time To Restore: How quickly did the system get brought back up
- Time To Repair: How quickly did you fix the incident
This all leads to the Incident Timeline Analysis.
If you can gather information from several incidents and look at them in your Post Mortem review, then you can figure out where your biggest issues are when it comes to incidents and getting the system back up and running. It is not uncommon for us to see that it often takes longer to detect the incident than to restore from it. This could be mitigated with more monitoring at more appropriate positions then you currently have.
Or maybe the time to escalate is an issue. Why does it take so long to get the proper engineers involved? Maybe a real-time alert system is required or a phone tree. And it is important to track and measure total time of an incident as beginning with when it occured (not when it was reported) all the way through to when customers were back up at 100% (not just when your systems were restored).
Problem vs. Incident
How do you know if your incident is also a problem? It’s actually fairly easy to determine. If you have an incident, you have a problem. The scale of the problem may vary by incident but every incident is caused because of something larger than itself.
During our Technical Due Diligences we always want to know how companies categorize incidents vs. problems. If the company properly categorizes problems related to incidents, they will be able to answer “Can you rank your problems to show which cause the most customer impact?” Many times, they can’t - but that ranking is critical to show which problems to attack first.
An incident, at its core, is caused by a problem. If your product crashes anytime someone attempts to access it via an unapproved protocol, the incident is the attempted access. The problem may be an improper review of your architecture. Or it may be lack of QA. Identifying the problem is much more difficult than identifying the incident. Imagine you find a termite on your deck. This small pest could be considered an incident. If you deal with the incident and get rid of the termite everything is good, right? If you don’t look any further than the incident you can’t identify the problem. And in this case the problem could be exposed, untreated wood allowing termites to slowly eat away at the inside of your house.
If you are keeping proper documentation each time you conduct a Post Mortem review, then you will have a history that will start to paint of a picture of ongoing and recurring problems that exist. Remedying the problem will stop the incident from occurring in the same exact way in the future. But small variations of the incident can still occur. If you fix the problem then you are stopping future iterations of that incident from happening again.
Subscribe to the AKF Newsletter
Migrating from a legacy product to a SaaS service? Don't make these mistakes!!
September 4, 2018 | Posted By: Dave Swenson
AKF has been kept quite busy over the last decade helping companies move from an on-premise product to a SaaS service - often one of the most difficult transitions a company can face. We have found that the following are the top mistakes made during that migration.
With apologies to David Letterman…
The Top 5 SaaS Migration Mistakes
5. Treat Your SaaS Migration Only as a Marketing Exercise
Wall Street values SaaS companies significantly higher than traditional software companies, typically double the revenue multiples. A key reason for this is the improved margins the economies of scale true SaaS companies gain. If you are primarily addressing your customer’s desires to move their IT infrastructure out of their shop, an ASP model hosted in the cloud is fine. However, if your investors or Wall Street are viewing you as a SaaS company, ASP gross margins will not be accepted. A SaaS company is expected to produce in excess of 80% gross margin, whereas an ASP model typically caps out at around 60 or 70%.
How to Avoid?
Make a decision up front on what you want your gross and operating margins to be. This decision will guide how you sell your product (e.g.: highest margins require no code-level customizations), how you architect your systems (multi-tenancy provides greater economies of scale), and even how you release (you, not your customers, control release timing and frequencies). A note of warning: without SaaS margins, you will likely face pricing pressure from an existing or entrant competitor who achieved SaaS margins.
4. Tack the word ‘cloud’ on to your existing on-prem product and host it
Often a direct result of the above mistake, this is an ASP (Application Service Provider) model, not a SaaS one. While this exercise might be useful in exploring some hosting aspects, it won’t truly inform you about what your product and organization needs to become in order to successfully migrate to SaaS. It will result in nowhere near the gross and operating margins true SaaS provides, and your Board and Wall Street expect to see. As discussed in The Many Meanings of Cloud, the danger of tacking the word “Cloud” onto your product offering is that your company will start believing you are a “Contender”, and will stop pushing for true SaaS.
How to Avoid?
Again, if an ASP model is ‘good enough’, fine - just don’t label or market yourselves as SaaS. If you start the SaaS journey with an ASP model, make sure all within your company recognize your ASP implementation is a dead-end, a short-term solution, and that the real endpoint is a true SaaS offering.
3. Target Your Existing Customer Base
Many companies are so focused upon their existing customer base that they forget about entire markets that might be better suited for SaaS. The mistaken perception is
“We need to take customers along with us”,
when in fact, the SaaS reality is
“We need to use the Technology Adoption Lifecycle, compete with ourselves and address a different or smaller customer base first”.
How to Avoid?
Ignore your current customers and find early adopters to target, even if your top customers are pushing you to move to SaaS. A move to true SaaS from your on-premise product almost always requires significant architectural changes. Putting your entire product and code base through these changes will take time.
Instead, apply the Crossing the Chasm Bowling Alley strategy to grow your SaaS offering into your current customer base rather than fork-lifting your current solution in an ASP fashion into the cloud…
Carve out key slices of functionality from your product that can provide value in a standalone fashion, and find early adopters to help shake out your new offering. Even if you are in a ‘laggard’ industry (banking, healthcare, education, insurance), you will find early adopters within your targeted customer base. Seek them out; they will likely be far better partners in your SaaS migration than your existing ones.
2. Ignore Risk Shifts
It may come as a surprise to some within your company to find out how much risk your customers bear in order to host and use your product on-premise. These risks include security, availability, capacity, scalability, and disaster recovery. They also include costs such as software licensing that have been passed through to your customers, but are now yours, part of your operating margins.
How to Avoid?
Many of these “-ilities” are likely to be new disciplines that now must be instilled in your company, some by hiring key individuals (e.g.: a CISO), some through additional focus and rigor during your PDLC. Part of the process of rearchitecting for SaaS includes ensuring you have adequate scalability, are designed with availability in mind, and a production topology that enables disaster recovery. Where vertical scalability might be acceptable in your on-prem world (“just buy a bigger machine”), you now need to ensure you have horizontal scalability, ideally in an elastic form. The cost of proprietary software (e.g.: database licenses) is now yours to carry, and a shift to open source software can significantly improve your margins. These “-ilities” are also known as non-functional requirements (NFRs), and need to be considered with at least as much weight as your functional requirements during your backlog planning and prioritization.
And now, the biggest mistake we see made in SaaS migrations…
1. Underestimate Inertia
Inertia is a powerful force. Over the years spent building up your on-premise capabilities, you’ve almost certainly developed tremendous inertia - defined for our purposes as “a tendency to do nothing or remain unchanged”. In order to achieve true SaaS, in order to satisfy your investors and reach SaaS-like multiples, nearly every part of your company needs to act differently.
How to Avoid?
First, ensure your entire company is ready to embrace change. For many companies, the move to SaaS is the only answer to an existential threat that a known (or unknown competitor) presents, one who is listening to your customers say “Get this IT stuff out of my shop!”. Examples of SaaS disruptors include:
- Salesforce destroying Siebel
- ServiceNow vs. Remedy
- Workday taking on Oracle/Peoplesoft
Is there a disruptor waiting to take over your business?
Many companies choose to disrupt themselves, and after switching to SaaS, drive their stock price through the roof. Look at Adobe’s stock price once they fully embraced (or made their customers embrace) SaaS over packaged software.
Regardless of how you position the importance of SaaS to your employees, there will still be some that are stuck by inertia in the on-prem ways. Either relegate them to stay with the on-prem effort, or ‘weed’ them out.
Once you’ve got some built up momentum and desire within your company to make the move to SaaS, make a concerted examination department-by-department to determine how they will need to change. All involved need to recognize the risk shifts as mentioned, and associated required mindsets. While you likely have excellent, seasoned on-prem employees, do you have enough SaaS experience across each team? The SaaS migration should in no way be treated solely as an engineering exercise.
It always comes as a shock how many departments need to break out of their existing inertia, and act differently. Some examples:
- Can no longer promise code customizations.
- Need to address cloud security concerns.
- Must ensure existing customers know they can no longer dictate when releases occur.
- Learn to speak ARR (annual recurring revenue).
- Should look at alternative revenue schemes (seat vs. utility).
- SaaS presents changes in revenue recognition. Be prepared.
- Should focus in iterative releases that enable product discovery.
- Must learn to balance the NFRs/”-ilities” along with new features.
- Need to consider alternative customers and markets for your new SaaS offering.
- Will likely need to become continually engaged in the PDLC process, in order to stay abreast of releases occurring at far greater frequency than old.
- Must develop (along with engineering) incident management processes that deal with multiple customers simultaneously having issues.
- Better spin up this department in a hurry!
- As no code-level customizations should be happening, you might end up reducing this team, focusing them more on integrations with your customers’ IT or 3rd party products.
Hmm, so much change for this department. Where to start? Start by bringing AKF on board to examine your SaaS migration effort. Here is where we can help you the most.
Subscribe to the AKF Newsletter
Open Source Software as a malware “on ramp”
August 21, 2018 | Posted By: Larry Steinberg
Open Source Software (OSS) is an efficient means to building out solutions rapidly with high quality. You utilize crowdsourced design, development and validation to conveniently speed your engineering. OSS also fosters a sense of sharing and building together - across company boundaries or in one’s free time.
So just pull a library down off the web, build your project, and your company is ready to go. Or is that the best approach? What could be in this library you’re including in your solution which might not be wanted? This code will be running in critical environments - like your SaaS servers, internal systems, or on customer systems. Convenience comes at a price and there are some well known situations of hacks embedded in popular open source libraries.
What is the best approach to getting the benefits of OSS and maintaining the integrity of your solution?
Good practices are a necessity to ensure a high level of security. Just like when you utilize OSS and then test functionality, scale, and load - you should be validating against vulnerabilities. Pre-production vulnerability and penetration testing is a good start. Also, utilize good internal process and reviews. Keep the process simple to maintain speed but establish internal accountability and vigilance on code that’s entering your environment. You are practicing good coding techniques already with reviews and/or peer coding - build an equivalency with OSS.
Always utilize known good repositories and validate the project sponsors. Perform diligence on the committers just like you would for your own employees. You likely perform some type of background check on your employees before making an offer - whether going to a third party or simply looking them up on linkedin and asking around. OSS committers have the same risk to your company - why not do the same for them? Understandably, you probably wouldn’t do this for a third party purchased solution, but your contract or expectation is that the company is already doing this and abiding by minimum security standards. That may not be true for your OSS solutions, and as such your responsibility for validation is at least slightly higher. There are plenty of projects coming from reputable sources that you can rely on.
Ensure that your path to production is only coming from artifacts which have been built on internal sources which were either developed or reviewed by your team. Also, be intentional about OSS library upgrades, this should planned and part of the process.
OSS is highly leveraged in today’s software solutions and provides many benefits. Be diligent in your approach to ensure you only see the upside of open source.
5 Focuses for a Better Security Culture
3 Practices Your Security Program Needs
Security Considerations for Technical Due Diligence
Subscribe to the AKF Newsletter
The Top 20 Technology Blunders
July 20, 2018 | Posted By: Pete Ferguson
One of the most common questions we get is “What are the most common failures you see tech and product teams make?” To answer that question we queried our database consisting of 11 years of anonymous client recommendations. Here are the top 20 most repeated failures and recommendations:
1) Failing to Design for Rollback
If you are developing a SaaS platform and you can only make one change to your current process make it so that you can always roll back any of your code changes. Yes, we know that it takes additional engineering work and additional testing to make nearly any change backwards compatible but in our experience that work has the greatest ROI of any work you can do. It only takes one really bad release in which your site performance is significantly degraded for several hours or even days while you attempt to “fix forward” for you to agree this is of the utmost importance. The one thing that is most likely to give you an opportunity to find other work (i.e. “get fired”) is to roll a product that destroys your business. In other words, if you are new to your job DO THIS BEFORE ANYTHING ELSE; if you have been in your job for awhile and have not done this DO THIS TOMORROW. (Related Content: Monitoring for Improved Fault Detection)
2) Confusing Product Release with Product Success
Do you have “release” parties? Stop it! You are sending your team the wrong message! A release has nothing to do with creating shareholder value and very often it is not even the end of your work with a specific product offering or set of features. Align your celebrations with achieving specific business objectives like a release increasing signups by 10%, or increasing checkouts by 15% or increasing the average sale price of a all checkouts by 12% or increasing click-through-rates by 22%. See #10 below on incenting a culture of excellence. Don’t celebrate the cessation of work – celebrate achieving the success that makes shareholder’s wealthy! (Related Content: Agile and the Cone of Uncertainty)
3) Insular Product Development / Engineering
How often does one of your engineering teams complain about not “being in the loop” or “being surprised” by a change? Does your operations team get surprised about some new feature and its associated load on a database? Does engineering get surprised by some new firewall or routing infrastructure resulting in dropped connections? Do not let your teams design in a vacuum and “throw things over the wall” to another group. Organize around your outcomes and “what you produce” in cross functional teams rather than around activities and “how you work.” (Related Content: The No Surprises Rule)
4) Over Engineering the Solution
One of our favorite company mottos is “simple solutions to complex problems”. The simpler the solution, the lower the cost and the faster the time to market. If you get blank stares from peers or within your organization when you explain a design do not assume that you have a team of idiots – assume that you have made the solution overly complex and ask for assistance in resolving the complexity.
Image Source: Hackernoon.com
5) Allowing History to Repeat itself
Organizations do not spend enough time looking at past failures. In the engineering world, a failure to look back into the past and find the most commonly repeated mistakes is a failure to maximize the value of the team. In the operations world, a failure to correlate past site incidents and find thematically related root causes is a guarantee to continue to fight the same fires over and over. The best and easiest way to improve our future performance is to track our past failures, group them into groups of causation and treat the root cause rather than the symptoms. Keep incident logs and review them monthly and quarterly for repeating issues and improve your performance. Perform post mortems of projects and site incidents and review them quarterly for themes.
6) Vendor Lock
Every vendor has a quick fix for your scale issues. If you are a hyper growth SaaS site, however, you do not want to be locked into a vendor for your future business viability; rather you want to make sure that the scalability of your site is a core competency and that it is built into your architecture. This is not to say that after you design your system to scale horizontally that you will not rely upon some technology to help you; rather, once you define how you can horizontally scale you want to be able to use any of a number of different commodity systems to meet your needs. As an example, most popular databases (and NoSQL solutions) provide for multiple types of native replication to keep hosts in synch.
7) Relying on QA to Find Your Mistakes
You cannot test quality into a system and it is mathematically impossible to test all possibilities within complex systems to guarantee the correctness of a platform or feature. QA is a risk mitigation function and it should be treated as such. Defects are an engineering problem and that is where the problem should be treated. If you are finding a large number of bugs in QA, do not reward QA – figure out how to fix the problem in engineering! Consider implementing test driven design as part of your PDLC. If you find problems in production, do not punish QA; figure out how you created them in engineering. All of this is not to say that QA should not be held responsible for helping to mitigate risk – they should – but your quality problems are an engineering issue and should be treated within engineering.
8) Revolutionary or “Big Bang” Fixes
In our experience, complete re-writes or re-architecture efforts end up somewhere on the spectrum of not returning the desired ROI to complete and disastrous failures. The best projects we have seen with the greatest returns have been evolutionary rather than revolutionary in design. That is not to say that your end vision should not be to end up in a place significantly different from where you are now, but rather that the path to get there should not include “and then we turn off version 1.0 and completely cutover to version 2.0”. Go ahead and paint that vivid description of the ideal future, but approach it as a series of small (but potentially rapid) steps to get to that future. And if you do not have architects who can help paint that roadmap from here to there, go find some new architects.
9) The Multiplicative Effect of Failure – Eliminate Synchronous Calls
Every time you have one service call another service in a synchronous fashion you are lowering your theoretical availability. If each of your services are designed to be 99.999% available, where a service is a database, application server, application, webserver, etc. then the product of all of the service calls is your theoretical availability. Five calls is (.99999)^5 or 99.995 availability. Eliminate synchronous calls wherever possible and create fault-isolative architectures to help you identify problems quickly.
10) Failing to Create and Incentivize a Culture of Excellence
Bring in the right people and hold them to high standards. You will never know what your team can do unless you find out how far they can go. Set aggressive yet achievable goals and motivate them with your vision. Understand that people make mistakes and that we will all ultimately fail somewhere, but expect that no failure will happen twice. If you do not expect excellence and lead by example, you will get less than excellence and you will fail in your mission of maximizing shareholder wealth. (Related Content: Three Reasons Your Software Engineers May Not Be Successful)
11) Under-Engineer for Scale
The time to think about scale is when you are first developing your platform. If you did not do it then, the time to think about scaling for the future is right now! That is not to say that you have to implement everything on the day you launch, but that you should have thought about how it is that you are going to scale your application services and your database services. You should have made conscious decisions about tradeoffs between speed to market and scalability and you should have ensured that the code will not preclude any of the concepts we have discussed in our scalability postings. Hold quarterly scalability meetings where you discuss what you need to do to scale to 10x your current volume and create projects out of the action items. Approach your scale needs in evolutionary rather than revolutionary fashion as in #8 above.
12) “Not Built Here” Culture
We see this all the time. You may even have agreed with point (6) above because you have a “we are the smartest people in the world and we must build it ourselves” culture. The point of relying upon third parties to scale was not meant as an excuse to build everything yourselves. The real point to be made is that you have to focus on your core competencies and not dilute your engineering efforts with things that other companies or open source providers can do better than you. Unless you are building databases as a business, you are probably not the best database builder. And if you are not the best database builder, you have no business building your own databases for your SaaS platform. Focus on what you should be the best at: building functionality that maximizes your shareholder wealth and scaling your platform. Let other companies focus on the other things you need like routers, operating systems, application servers, databases, firewalls, load balancers and the like.
13) A New PDLC will Fix My Problems
Too often CTO’s see repeated problems in their product development life cycles such as missing dates or dissatisfied customers and blame the PDLC itself.
The real problem, regardless of the lifecycle you use, is likely one of commitment and measurement. For instance, in most Agile lifecycles there needs to be consistent involvement from the business or product owner. A lack of involvement leads to misunderstandings and delayed products. Another very common problem is an incomplete understanding or training on the existing PDLC. Everyone in the organization should have a working knowledge of the entire process and how their roles fit within it. Most often, the biggest problem within a PDLC is the lack of progress measurement to help understand likely dates and the lack of an appropriate “product discovery” phase to meet customer needs. (Related Content: The Top Five Most Common PDLC Failures)
14) Inability to Hire Great People Quickly
Often when growing an engineering team quickly the engineering managers will push back on hiring plans and state that they cannot possibly find, interview, and hire engineers that meet their high standards. We agree that hiring great people takes time and hiring decisions are some of the most important decisions managers can make. A poor hiring decision takes a lot of energy and time to fix. However, there are lots of ways to streamline the hiring process in order to recruit, interview, and make offers very quickly. A useful idea that we have seen work well in the past are interview days, where potential candidates are all invited on the same day. This should be no more than 2 - 3 weeks out from the initial phone screen, so having an interview day per months is a great way to get most of your interviewing in a single day. Because you optimize the interview process people are much more efficient and it is much less disruptive to the daily work that needs to get done the rest of the month. Post interview discussions and hiring decisions should all be made that same day so that candidates get offers or letters of regret quickly; this will increase the likelihood of offers being accepted or make a professional impression on those not getting offers. The key is to start with the right answer that “there is a way to hire great people quickly” and the myriad of ways to make it happen will be generated by a motivated leadership team.
15) Diminishing or Ignoring SPOFs (Single Point of Failure)
A SPOF is a SPOF and even if the impact to the customer is low it still takes time away from other work to fix right away in the event of a failure. And there will be a failure…because that is what hardware and software does, it works for a long time and then eventually it fails! As you should know by now, it will fail at the most inconvenient time. It will fail when you have just repurposed the host that you were saving for it or it will fail while you are releasing code. Plan for the worst case and have it run on two hosts (we actually recommend to always deploy in pools of three or more hosts) so that when it does fail you can fix it when it is most convenient for you.
16) No Business Continuity Plan
No one expects a disaster but they happen and if you cannot keep up normal operations of the business you will lose revenue and customers that you might never get back. Disasters can be huge, like Hurricane Katrina, where it take weeks or months to relocate and start the business back up in a new location. Disasters can also be small like a winter snow storm that keeps everyone at home for two days or a HAZMAT spill near your office that keeps employees from coming to work. A solid business continuity plan is something that is thought through ahead of time, before you need it, and explains to everyone how they will operate in the event of an emergency. Perhaps your satellite office will pick up customer questions or your tech team will open up an IRC channel to centralize communication for everyone capable of working remotely. Do you have enough remote connections through your VPN server to allow for remote work? Spend the time now to think through what and how you will operate in the event of a major or minor disruption of your business operations and document the steps necessary for recovery.
17) No Disaster Recovery Plan
Even worse, in our opinion, than not having a BC plan is not having a disaster recovery plan. If your company is a SaaS-based company, the site and services provided is the company’s sole source of revenue! Moreover, with a SaaS company, you hold all the data for your customers that allow them to operate. When you are down they are more than likely seriously impaired in attempting to conduct their own business. When your collocation facility has a power outage that takes you completely down, think 365 Main datacenter in San Francisco, how many customers of yours will leave and never return? Our preference is to provide your own disaster recovery through multiple collocation facilities but if that is not yet technically feasible nor in the budget, at a minimum you need your code, executables, configurations, loads, and data offsite and an agreement in place for both collocation services as well as hosts. Lots of vendors offer such packages and they should be thought of as necessary business insurance.
If you are cloud hosted, this still applies to you! We often find in technical due diligence reviews that small companies who are rapidly growing haven’t yet initiated a second active tech stack in a different availability zone or with a second cloud provider. Just because AWS, Azure and others have a fairly reliable track record doesn’t mean they always will. You can outsource services, but you still own the liability!
Image Source: Kaibizzen.com.au
18) No Product Management Team or Person
In a similar vein to #13 above, there needs to be someone or a team of people in the organization who have responsibility for the product lines. They need to have authority to make decisions about what features get added, which get delayed, and which get deprecated (yes, we know, nothing ever gets deprecated but we can always hope!). Ideally these people have ownership of business goals (see #10) so they feel the pressure to make great business decisions.
19) Failing to Implement Continuously
Just because you call it scheduled maintenance does not mean that it does not count against your uptime. While some of your customers might be willing to endure the frustration of having the site down when they want to access it in order to get some new features, most care much more about the site being available when they want it. They are on the site because the existing features serve some purpose for them; they are not there in the hopes that you will rollout a certain feature that they have been waiting on. They might want new features, but they rely on existing features. There are ways to roll code, even with database changes, without bringing the site down (back to #17 - multiple active sites also allows for continuous implementation and the ability to roll back). It is important to put these techniques and processes in place so that you plan for 100% availability instead of planning for much less because of planned down time.
20) Firewalls, Firewalls, Everywhere!
We often see technology teams that have put all public facing services behind firewalls while many go so far as to put firewalls between every tier of the application. Security is important because there are always people trying to do malicious things to your site, whether through directed attacks or random scripts port scanning your site. However, security needs to be balanced with the increased cost as well as the degradation in performance. It has been our experience that too often tech teams throw up firewalls instead of doing the real analysis to determine how they can mitigate risk in other ways such as through the use of ACLs and LAN segmentation. You as the CTO ultimately have to make the decision about what are the best risks and benefits for your site.
Whatever you do, don’t make the mistakes above! AKF Partners helps companies avoid costly product and technology mistakes - and we’ve seen most of them. Give us a call or shoot us an email. We’d love to help you achieve the success you desire.
Subscribe to the AKF Newsletter
People Due Diligence
July 12, 2018 | Posted By: Robin McGlothin
Most companies do a thorough job of financial due diligence when they acquire other companies. But all too often, dealmakers simply miss or underestimate the significance of people issues. The consequences can be severe, from talent loss after a deal’s announcement, to friction or paralysis caused by differences in decision-making styles.
When acquirers do their people homework, they can uncover skills & capability gaps, points of friction, and differences in decision making. They can also make the critical people decisions - who stays, who goes, who runs the various lines of business, what to do with the rank and file at the time the deal is announced or shortly thereafter. Making such decisions within the first 90 days is critical to the success of a deal.
Take for example, Charles Schwab’s 2000 acquisition of US Trust. Schwab & the nation’s oldest trust company set out to sign up the newly minted millionaires created by a soaring bull market. But the cultures could not have been farther apart – a discount do-it-yourself stock brokerage style and a full-service provider devoted to pampering multimillionaires can make for a difficult integration. Six years after the merger, Chuck Schwab came out of retirement to fix the issues related to culture clash. The acquisition reflects a textbook common business problem. The dealmakers simply ignored or underestimated the significance of people and cultural issues.
Another example can be found in the 2002 acquisition of PayPal by eBay. The fact that many on the PayPal side referred to it as a merger, sets the stage for conflicting cultures. eBay was often embarrassed by the fact that PayPal invoice emails for a won auction arrived before the eBay end of auction email - PayPal made eBay look bad in this instance and the technology teams were not eager to combine. As well, PayPal titles were discovered to be one level higher than eBay titles considering the scope of responsibilities. Combining the technology teams did not go well and was ultimately scrapped in favor of dual teams - not the most efficient organizational model.
People due diligence lays the groundwork for a smooth integration. Done early enough, it also helps acquirers decide whether to embrace or kill a deal and determine the price they are willing to pay. There’s a certain amount of people due diligence that companies can and must do to reduce the inevitable fallout from the acquisition process and smooth the integration.
Ultimately, the success or failure of any deal has to do with people. Empowering people and putting them in a position where they will be successful is part of our diligence evaluation at AKF Partners. In our experience with clients, an acquiring company must start with some fundamental question:
1. What is the purpose of the deal?
2. Whose culture will the new organization adopt?
3. Will the two cultures mesh?
4. What organizational structure should be adopted?
5. How will rank-and-file employees react to the deal?
Once those questions are answered, people due diligence can focus on determining how well the target’s current structure and culture will mesh with those of the proposed new company, who should be retained and by what means, and how to manage the reaction of the employee base.
In public, deal-making executives routinely speak of acquisitions as “mergers of equals.” That’s diplomatic, politically correct speak and usually not true. In most deals, there is not only a financial acquirer, there is also a cultural acquirer, who will set the tone for the new organization after the deal is done. Often, they are one and the same, but they don’t have to be.
During our Technology Due Diligence process at AKF Partners, we evaluate the product, technology and support organizations with a focus on culture and think through how the two companies and teams are going to come together. Who the cultural acquirer is dependes on the fundamental goal of the acquisition. If the objective is to strengthen the existing product lines by gaining customers and achieving economies of scale, then the financial acquirer normally assumes the role of the cultural acquirer.
People due diligence, therefore, will be to verify that the target’s culture is compatible enough with the acquirers to allow for the building of necessary bridges between the two organizations. Key steps that are often missed in the process:
• Decide how the two companies will operate after the acquisition — combined either as a fully integrated operating company or as autonomous operating companies.
• Determine the new organizational structure and identify areas that will need to be integrated.
• Decide on the new executive leadership team and other key management positions.
• Develop the process for making employment-related decisions.
With regard to the last bullet point, some turnover is to be expected in any company merger. Sometimes shedding employees is even planned. It is important to execute The Weed, Seed & Feed methodology ongoing not just at acquisition time. Unplanned, significant levels of turnover negatively impact a merger’s success.
AKF Partners brings decades of hands-on executive operational experience, years of primary research, and over a decade of successful consulting experience to the realm of product organization structure, due diligence and technology evaluation. We can help your company successfully navigate the people due diligence process.
1 2 3 >