March 22, 2018 | Posted By: Marty Abbott
It is sad and unfortunate, but the inevitable has finally happened; we’ve suffered our first death from an autonomous vehicle.
The Uber Autonomous Vehicle fatality in Tempe is unusual in that several factors contributed to it:
- The pedestrian was crossing the street at night outside of a crosswalk. Jaywalking, as it is commonly called, is against city ordinances in Tempe, AZ.
- The pedestrian evidently didn’t see the car’s lights, or hear the car approaching.
- The safety assistant in the vehicle, who was meant to help avoid such crashes by taking control of the car, was not paying attention at the time of the crash and apparently had a prior felony conviction (raising the question of how she was hired in the first place).
- The vehicle’s collision avoidance system somehow failed either to detect the individual or to take the appropriate action upon detection.
These factors raise several immediate questions regarding who (or what) is to blame for the incident:
- Who’s at fault? The jaywalking pedestrian? The safety assistant for negligence? Uber for potential vehicle failures?
- If either the assistant or Uber bears the blame, does this rise to the level of a crime? Vehicular manslaughter, for instance?
Technology Advancements and the Benefits They Bring Almost Always Have a Price
First, to be clear, technical advances very often come at peril to human life.
Advances in flight and space travel have both resulted in several deaths over the last century – the manner of death being impossible before the advancement.
While per capita deaths associated with automobiles are lower today than they were for horse-related transportation in 1900, the fact remains that the introduction of the automobile increased fatalities for a number of years, through at least 1930.
Power transmission to homes may be linked to leukemia in children. While the jury is out regarding whether smart phones cause brain cancer, the “selfie” phenomenon they’ve enabled has been implicated in some deaths.
Even seemingly harmless entertainment devices such as televisions have caused fatalities.
We Have a Lot to Gain by Moving Forward
While sad, and in this particular case completely avoidable (the pedestrian could have crossed at a crosswalk or avoided the vehicle, and the safety assistant could have been paying attention), this should not halt the advancement of research in this area. Yes, we should pause briefly to understand and correct the cause.
But we also need to realize that the benefits to society are immense and call for rapid progress and adoption:
- A likely overall reduction in vehicular accidents and fatalities as the technology progresses and gains adoption. Driver attention problems (texting, cell phones) go away.
- Lower insurance rates as overall vehicle related claims decline.
- An elimination of alcohol related driving crimes and accidents.
- A reduction in the overall cost of living for many Americans who need flexible transportation in metro areas, but struggle to afford a vehicle.
- A reduction in the cost of living for American families who may only need one car if it could return home for other duties, but must buy two because each car remains with its owner wherever he or she goes.
- A reduction in vehicle related pollution and its associated climate effects as vehicle ownership declines and affordable autonomous ride sharing increases.
- Fewer traffic delays as autonomous vehicles select better routes, lowering commute times.
- Less road congestion as fewer vehicles compete for the limited infrastructure.
- Lower infrastructure costs longer term, as less road maintenance is required and one day the need for street lights goes away. Taxes similarly drop.
- Lower local taxes as the need for traffic related law enforcement declines over time.
Unfortunately Implementation and Adoption Will Likely Slow Down
While the benefits of autonomous vehicles are clear, Autonomous Vehicle (AV) deaths will provide fodder for special interest groups to slow down AV legalization:
- Several unions, including those related to livery services, will strive to keep their member base employed and either sue to stop implementation or fund political action committees to influence legislation biased against AVs.
- Because the secondary car market is likely to see a significant decline in demand (who needs a used car if one can get a ride service easily at lower overall cost?), car dealerships will fund PACs to similarly sway politicians.
- Automobile manufacturers, who today see a near-term opportunity with Autonomous Vehicles, may determine that overall new-vehicle sales could decline and join existing PACs.
- Other unions and businesses reliant upon vehicle ownership for employment of some (or all) of their member base or for their very existence (car washes, gas stations, “Big Oil”, police departments, maintenance facilities, etc) may also join PACs.
While there are many societal benefits in adopting the AV, there are certain interest groups which have a lot to lose with their wide-scale adoption. These interest groups will almost certainly mobilize and look to stall progress and in so doing keep society from reaping the benefits.
Death associated with technical advancement is nothing new and while we should strive to limit it, we must expect it. While we stand to gain a great deal from autonomous vehicles, we must be wary of entrenched interests that may attempt to use events like this one to block their adoption.
Subscribe to the AKF Newsletter
March 19, 2018 | Posted By: Daniel Hanley
Agile teams often seem confused about the meaning and value of autonomy within a team. Is it the autonomy to decide what path to take to accomplish a goal? Is it the autonomy to choose what tools and processes they will employ to accomplish that goal? Is it both? To answer this, we first need to unlock the riddle of where autonomy drives value creation, and where and how it destroys value within an enterprise.
Balancing Team Autonomy and Anarchy
As an aid to this discussion, consider the following analogy: when driving a vehicle, we have the autonomy to determine what roads and paths we will take to a destination, but we are governed by speed limits, right of way (which side to drive on, stop signs, stop signals and other road signs), emission standards, and both vehicle and personal licensing. Put another way, we are completely empowered to determine WHAT path we take, and WHY we take it. We are much more limited in HOW we get to a place (on road, off road, speed, only licensed vehicles, etc.) and WHO can do it (only sober, licensed drivers). How does this apply to a fully autonomous technology team? The value-creating autonomy within a team deals with WHAT paths a team takes to accomplish a goal and the reasoning as to WHY that approach is valuable. A failure to provide structure and rules for HOW something is accomplished, through architectural principles, coding standards, development standards, etc., can escalate the cost of development and cause repeated failures and maintenance problems. Not sharing best practices through standards means that teams are bound to repeat mistakes and cause customer interruptions. While the line between autonomy and anarchy is narrow, the difference in their effect on an organization is significant.
Consider the matrix below, which differentiates the notion of decision making (What path gets taken to a result and Why that path is taken) from governance (the rules around How and Who should do something). Autocracy (complete top-down decisions and governance) occupies the upper left, and the spectrum moves toward Anarchy in the bottom right (bottom-up decisions with little to no alignment to organizational goals and vision, and a complete lack of governance and best practice adherence). Autonomy (as in autonomous agile teams) exists in the upper right quadrant of AKF’s Team Autonomy 2x2. It carries a high level of innovation because teams share best practices through governance but are free to experiment with paths to achieve desired outcomes. Freedom in deciding what path to take to a destination or desired outcome is valuable; however, it should be balanced with the governance to validate that past lessons learned (be they security related, operationally related, or simply development related) are properly applied.
Similarly, we can plot the decision-making authority for “WHO and HOW” something gets done against the “WHAT (paths) and WHY (a path is taken)”. In so doing, autonomous agile teams exist in the realm where leadership enforces standards, but the team is the deciding authority for what path to take. Anarchy (bad) exists when the team makes all decisions, so learnings codified within shared standards are not applied. Warring tribes emerge when leadership defines a path but leaves the decision of who and how to the teams (similar to the “bad leadership” example in the prior matrix). Autocracy exists when leadership is responsible for all decisions. We identify this as “medium” innovation because we assume the leadership is at least experientially diverse. But innovation is low relative to autonomous teams because we aren’t tapping the innovation capabilities of the teams themselves - the intellectual capital in which we’ve so heavily invested.
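The second matrix can be read as a lookup from who owns each kind of decision to the resulting quadrant. The sketch below is a hypothetical Python encoding. The `Authority` enum and `classify` function are illustrative names, not part of any AKF tool. Innovation levels are noted only where the text states them: high for autonomy and medium for autocracy.

```python
from enum import Enum

class Authority(Enum):
    LEADERSHIP = "leadership"
    TEAM = "team"

# Hypothetical encoding of the 2x2 described in the text.
# Key: (who decides WHAT path / WHY, who decides WHO / HOW) -> quadrant.
QUADRANTS = {
    (Authority.TEAM, Authority.LEADERSHIP): "Autonomy",         # high innovation
    (Authority.TEAM, Authority.TEAM): "Anarchy",                # shared standards not applied
    (Authority.LEADERSHIP, Authority.TEAM): "Warring tribes",   # path imposed, no common HOW
    (Authority.LEADERSHIP, Authority.LEADERSHIP): "Autocracy",  # medium innovation
}

def classify(path_owner: Authority, governance_owner: Authority) -> str:
    """Return the quadrant for a given split of decision authority."""
    return QUADRANTS[(path_owner, governance_owner)]

print(classify(Authority.TEAM, Authority.LEADERSHIP))  # Autonomy
```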
How to Find the Balance of Freedom and Governance
Teams should be built around the suggestions from AKF’s white paper on Organizing Product Teams for Innovation: small, cross-functional teams built around a service, empowered to be autonomous and to work independently. That autonomy should be defined within the rules of the organization and informed by the organization’s architectural principles, with adherence driven through leadership. This is by no means a suggestion that the organization should avoid cross-functional empowered teams! As we state in that white paper, “We still have executives developing strategy, functional teams (product management, etc.) defining subservient strategies and roadmaps. But the primary identities of these folks are embedded within the teams that implement these solutions.”
It is easy to confuse empowerment with autonomy. Empowerment is the act of delegating to another party while assuring that party the resources and tools needed to complete the desired outcomes. It is only through empowerment that autonomy can be achieved within an organizational hierarchy. Further, it is only through empowerment that autonomous agile teams can be established. But both empowerment and autonomy need rules governing their action or issuance. Specifically, an agile team is empowered to be autonomous within the following constraints: following development protocols and standards, adhering to architectural principles, adhering to established best practices regarding test coverage, etc., and ensuring that the team achieves the “non-functional requirements” codified within the Agile Definition of Done (more on this later).
Have your autonomous technology teams been free to make decisions that do not align with the vision of the company? Are you fearful of switching to Agile because of the rampant anarchy your teams might exhibit? AKF Partners has over 200 combined years of experience helping companies ensure that their organizations and architectures are aligned to the outcomes they desire. We’d love to help you achieve the success you desire.
March 12, 2018 | Posted By: Dave Swenson
More and more companies are waking up from the 20th century, realizing that their on-premise, packaged, waterfall paradigms no longer play in today’s SaaS, agile world. SaaS (Software as a Service) has taken over, and for good reason. Companies (and investors) long for the higher valuation and increased margins that SaaS’ economies of scale provide. Many of these same companies realize that in order to fully benefit from a SaaS model, they need to release far more frequently, enhancing their products through frequent iterative cycles rather than massive upgrades occurring only 4 times a year. So, they not only perform a ‘lift and shift’ into the cloud, they also move to an Agile PDLC. Customers, tired of incurring on-premise IT costs and risks, are also pushing their software vendors towards SaaS.
But, what many of the companies making the move to SaaS don’t realize is that moving to SaaS is not just a technology exercise; instead, it should be approached as a ‘reboot’ of the entire company. Certainly, the technology organization will be most affected, but almost every department in a company will need to change. Sales folks need to pitch the product differently, selling a leased service vs. a purchased product, and must learn to address customers’ typical concerns around security. The role of professional services teams drastically changes, and frankly, shrinks. Customer support personnel should have far greater insight into reported problems. Product management in a SaaS world requires small, nimble enhancements vs. massive, ‘big-bang’ upgrades. Your marketing organization will potentially need to target a different type of customer for your initial SaaS releases.
It is important to recognize the risks that will shift from your customers to you. With an on-prem product, your customer carries the burden of capacity planning, security, availability, and disaster recovery. SaaS companies sell a service, not just a product, and that service offering must absorb these risks. These are potentially new concerns for your organization, which will need to learn to address them cost-efficiently in order to deliver the margins expected of a SaaS company.
This company-wide reboot can certainly be a daunting challenge, but if approached carefully and honestly, addressing key questions up front, communicating, educating, and transparently addressing likely organizational and personnel changes along the way, it is an accomplishment that transforms, even reignites, a company.
This is the first in a series of articles that captures AKF’s observations and first-hand experiences in guiding companies through this process.
Don’t treat this as a simple rewrite of your existing product - answer these questions first…
Any company about to launch into a SaaS migration should first take a long, hard look at its current product, determining which parts of the legacy product are not worth carrying forward. Is all of that existing functionality really being used, and is it still relevant? Prior to any move towards SaaS, the following questions and issues need to be addressed:
Customization or Configuration?
SaaS efficiencies come from many angles, but certainly one of those is having a single codebase for all customers. If your product today is highly customized, with code written and in use for specific customers, you’ve got a tough question to address. Most product variances can likely be handled through configuration, a data-driven mechanism that enables, disables, or otherwise shapes functionality for each customer. No customer-specific code from the legacy product should be carried forward unless it is expected to be used by multiple clients. Note that this shift has implications for how a sales force promotes the product (they can no longer promise to build whatever a potential customer wants, but must sell the current, existing functionality) as well as for professional services (no customizations means less work for them).
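The Python sketch below makes the configuration idea concrete. It assumes a hypothetical `TenantConfig` record loaded for each customer. Every customer runs the same `invoice_footer` code. What differs is data (enabled features and settings), not a customer-specific branch such as `if tenant_id == "acme"`. The feature names and settings are made up for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical per-tenant configuration; in practice this would be loaded
# from a database or configuration service, not defined in code.
@dataclass
class TenantConfig:
    tenant_id: str
    features: set[str] = field(default_factory=set)
    settings: dict[str, object] = field(default_factory=dict)

    def is_enabled(self, feature: str) -> bool:
        return feature in self.features

# One shared code path for every customer: behavior is shaped by data,
# not by customer-specific branches in the source.
def invoice_footer(cfg: TenantConfig) -> str:
    parts = [str(cfg.settings.get("company_name", "Your Company"))]
    if cfg.is_enabled("show_tax_id"):
        parts.append(f"Tax ID: {cfg.settings.get('tax_id', 'n/a')}")
    if cfg.is_enabled("multi_currency"):
        parts.append(f"Currency: {cfg.settings.get('currency', 'USD')}")
    return " | ".join(parts)

acme = TenantConfig("acme", {"show_tax_id"}, {"company_name": "Acme", "tax_id": "12-345"})
globex = TenantConfig("globex", {"multi_currency"}, {"company_name": "Globex", "currency": "EUR"})
print(invoice_footer(acme))    # Acme | Tax ID: 12-345
print(invoice_footer(globex))  # Globex | Currency: EUR
```

Under this approach, a new customer request becomes a flag that any customer can turn on. That is what keeps a single codebase viable.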
Many customers, even those who accept the improved security posture a cloud-hosted product provides over their own on-premise infrastructure, absolutely freak when they hear that their data will coexist with other customers’ data in a single multi-tenant instance, no matter what access management mechanisms exist. Multi-tenancy is another key to achieving economies of scale that bring greater SaaS efficiencies. Don’t let go of it easily, but if you must, price extra for it.
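A common way to implement multi-tenancy is a shared schema: every row carries a tenant identifier, and all data access goes through a tenant-scoped layer. The sketch below uses Python's built-in `sqlite3` purely for illustration. The `orders` table and `TenantScopedRepo` class are hypothetical. A production system would typically add database-level controls rather than rely on application code alone.

```python
import sqlite3

def make_db() -> sqlite3.Connection:
    """Shared-schema multi-tenancy: every row carries a tenant_id."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL, total REAL NOT NULL)"
    )
    conn.execute("CREATE INDEX idx_orders_tenant ON orders (tenant_id)")
    return conn

class TenantScopedRepo:
    """Data access bound to one tenant, so callers cannot omit the tenant filter."""

    def __init__(self, conn: sqlite3.Connection, tenant_id: str):
        self._conn = conn
        self._tenant_id = tenant_id

    def add_order(self, total: float) -> None:
        self._conn.execute(
            "INSERT INTO orders (tenant_id, total) VALUES (?, ?)", (self._tenant_id, total)
        )

    def order_totals(self) -> list[float]:
        rows = self._conn.execute(
            "SELECT total FROM orders WHERE tenant_id = ? ORDER BY id", (self._tenant_id,)
        )
        return [row[0] for row in rows]

conn = make_db()
acme = TenantScopedRepo(conn, "acme")
globex = TenantScopedRepo(conn, "globex")
acme.add_order(100.0)
globex.add_order(250.0)
acme.add_order(40.0)
print(acme.order_totals())    # [100.0, 40.0]
print(globex.order_totals())  # [250.0]
```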
Who owns the data?
Many products focus only on the transactional set of functionality, leaving the analytics side to their customers. In an on-premise scenario, where the data resides in the customers’ facilities, ownership of the data is clear. Customers are free to slice & dice the data as they please. When that data is hosted, particularly in a multi-tenant scenario where multiple customers’ data lives in the same database, direct customer access presents significant challenges. Beyond the obvious related security issues is the need to keep your customers abreast of the more frequent updates that occur with SaaS product iterations. The decision is whether to replicate customer data into read-only instances, provide bulk export into their own hosted databases, or build analytics into your product.
All of these have costs - ensure you’re passing those on to your customers who need this functionality.
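Of the three options, bulk export is the simplest to sketch. The example assumes a shared `orders` table keyed by a hypothetical `tenant_id` column. An export job must apply the tenant filter so that one customer never receives another customer's rows. This is a minimal Python illustration, not a complete export pipeline.

```python
import csv
import io
import sqlite3

def export_tenant_orders(conn: sqlite3.Connection, tenant_id: str) -> str:
    """Return one tenant's orders as CSV text, header row first."""
    cur = conn.execute(
        "SELECT id, total FROM orders WHERE tenant_id = ? ORDER BY id", (tenant_id,)
    )
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([col[0] for col in cur.description])  # column names
    writer.writerows(cur)
    return buf.getvalue()

# Hypothetical shared table holding rows for two tenants.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, tenant_id TEXT NOT NULL, total REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO orders (tenant_id, total) VALUES (?, ?)",
    [("acme", 100.0), ("globex", 250.0), ("acme", 40.0)],
)
print(export_tenant_orders(conn, "acme"))
```

Each such export job is compute, storage, and support effort that you carry. That is the cost the paragraph above suggests passing on.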
May I Upgrade Now?
Today, do your customers require permission for you to upgrade their installation? You’ll need to change that behavior to realize another SaaS efficiency: supporting as few versions as possible. Ideally, you’ll support only a single version (other than during deployment). If your customers need to ‘bless’ a release before migrating on to it, you’re doing it wrong. Your releases should be small, incremental enhancements, potentially even reaching continuous deployment, so the changes should be far easier to accept and learn than the big-bang upgrades of the past. If absolutely necessary, create a sandbox for customers to access new releases, but be prepared to deal with the potentially unwanted, non-representative feedback from the select few who try it out in that sandbox.
Wait, Who Are We Targeting?
All of the questions above lead to this fundamental issue: Are tomorrow’s SaaS customers the same as today’s? The answer? Not necessarily. First, in order to migrate existing customers on to your bright, shiny new SaaS platform, you’ll need to have functional parity with the legacy product. Reaching that parity will take significant effort and lead to a big-bang approach. Instead, pick a subset or an MVP of existing functionality, and find new customers who will be satisfied with that. Then, after proving out the SaaS architecture and related processes, gradually migrate more and more functionality, and once functional parity is close, move existing customers on to your SaaS platform.
To find those new customers interested in placing their bets on your initial SaaS MVP, you’ll need to shift your current focus on the right side of the Technology Adoption Lifecycle (TALC) to the left - from your current ‘Late Majority’ or ‘Laggards’ to ‘Early Adopters’ or ‘Early Majority’. Ideally, those customers on the left side of the TALC will be slightly more forgiving of the ‘learnings’ you’ll face along the way, as well as prove to be far more valuable partners with you as you further enhance your MVP.
The key is to think out of the existing box your customers are in, to reset your TALC targeting and to consider a new breed of customer, one that doesn’t need all that you’ve built, is willing to be an early adopter, and will be a cooperative partner throughout the process.
Our next article on SaaS migration will touch on organizational approaches, particularly during the build-out of the SaaS product, and the paradigm shifts your product and engineering teams need to embrace in order to be successful.
AKF has led many companies on their journey to SaaS, often getting called in as that journey has been derailed. We’ve seen the many potholes and pitfalls and have learned how to avoid them. Let us help you move your product into the 21st century.
February 20, 2018 | Posted By: Greg Fennewald
You should not buy a home without an inspection by a licensed home inspector and you should not buy a used car without having a mechanic check it out for you. Diligence - it just makes good sense. Similarly, it is prudent to include a technical diligence effort as part of the evaluation for a potential technology company investment.
Diligence Informs Risk Management
Private equity and venture capital firms typically evaluate many areas preceding a potential investment. The business case, legal structure, competitive analysis, product strategy, financial audits and contractual landscape are all examples of diligence deemed necessary prior to an investment. A company with a great product but three years left on an extremely expensive office lease will probably have a lower value. Breaking the lease or living with it until the term expires means higher costs and thus lower EBITDA. A hot startup with an inexperienced CFO that has run on cash-based accounting from day 1 and is rapidly approaching $6 million in annual revenue needs to move to accrual-based accounting. That takes time, effort, and possibly a talent search, and it affects the value of the investment.
But what about the technical underpinnings of the product itself? A company with a solitary production database and a marketing analyst with access to directly query that database is likely headed for performance and availability incidents. Single points of failure create a high probability of non-availability. Solutions that don’t allow for seamless and elastic scalability may run into either capacity or cost of operations problems.
Preventing these incidents and altering the conditions that enabled them to exist takes time and effort. All of these assessment areas boil down to risk management. Further, understanding the cost of fixing these problems helps an investor understand the true cost of the investment. Your investment includes not just the “PIC” (paid-in capital) that you put into the company; it also includes all the costs to ensure continuing operations of the product that enables that company. A comprehensive diligence effort will prepare the investor to make an informed business decision: know the risks and adjust the value proposition accordingly.
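The arithmetic behind that point is simple, and a short Python sketch makes it explicit. All figures below are hypothetical. The remediation items echo the examples above: a single production database, an analyst querying it directly, and limited scalability. Reducing the offer dollar-for-dollar by remediation cost is a simplifying assumption, not a valuation method.

```python
def true_cost_of_investment(paid_in_capital: float, remediation: dict[str, float]) -> float:
    """Paid-in capital plus the estimated cost of fixing issues found in diligence."""
    return paid_in_capital + sum(remediation.values())

def adjusted_offer(total_budget: float, remediation: dict[str, float]) -> float:
    """Offer that keeps total outlay within budget once remediation is funded (simplifying assumption)."""
    return total_budget - sum(remediation.values())

# Hypothetical remediation estimates from a technical diligence review.
remediation = {
    "eliminate single production database (SPOF)": 400_000,
    "move analyst queries to a read replica": 150_000,
    "re-architect for horizontal scale": 1_200_000,
}

print(f"{true_cost_of_investment(10_000_000, remediation):,.0f}")  # 11,750,000
print(f"{adjusted_offer(10_000_000, remediation):,.0f}")           # 8,250,000
```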
Technology Risk Areas
Technology risks can be grouped into four broad areas - Architecture, Process, Organization, and Security. Each area has several subordinate themes.
Architecture - subordinate themes are availability, scalability, and cost control.
• Commodity hardware - Corollas, not Carreras
• Horizontal scalability - scale out, not up
• Design for monitoring - see issues before your customers do
• N+1 design - everything fails eventually
• Design for rollback - minimize the impairment
• Asynchronous design - stateless systems
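The N+1 principle translates directly into a sizing rule. Provision enough nodes to carry peak load, then add at least one more so that a single failure does not cause an outage. This minimal Python sketch assumes that load and per-node capacity use the same unit (for example, requests per second) and that all nodes are identical commodity servers.

```python
import math

def nodes_needed(peak_load: float, capacity_per_node: float, spares: int = 1) -> int:
    """Nodes required to carry peak load with `spares` extra nodes (spares=1 is N+1)."""
    if capacity_per_node <= 0:
        raise ValueError("capacity_per_node must be positive")
    n = math.ceil(peak_load / capacity_per_node)  # N: nodes needed for peak alone
    return n + spares

print(nodes_needed(4500, 1000))            # 6: five nodes for peak, plus one spare
print(nodes_needed(4500, 1000, spares=2))  # 7: tolerates two simultaneous failures
```

Sizing is done in whole nodes of commodity hardware. Growth is therefore handled by scaling out (raising the node count) rather than by buying a bigger machine.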
Process - subordinate themes are engineering, operations, and problem management
• Product management - a product owner should be able to add, delay, or deprecate features from an upcoming release
• Metrics - development teams should use effort estimation and velocity measurement metrics to monitor progress and performance
• Development practices - developers should conduct code reviews and be held accountable for unit testing
• Incident management - incidents should be logged with sufficient details for further follow up
• Post mortem - a structured process should be in place to review significant problems, assign action items, and track resolution
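As one example of using velocity metrics to monitor progress, the sketch below forecasts the sprints remaining for a backlog. It divides the remaining work by the average velocity of recent sprints. It assumes effort is estimated in story points and uses a hypothetical three-sprint averaging window. Both are common choices rather than requirements.

```python
import math
from statistics import mean

def forecast_sprints(completed_points: list[int], remaining_points: int, window: int = 3) -> int:
    """Estimate sprints left from the mean velocity of the last `window` sprints."""
    recent = completed_points[-window:]
    if not recent:
        raise ValueError("need at least one completed sprint")
    velocity = mean(recent)
    if velocity <= 0:
        raise ValueError("velocity must be positive to forecast")
    return math.ceil(remaining_points / velocity)

history = [21, 18, 24, 23, 25]  # story points completed per sprint
print(forecast_sprints(history, 100))  # mean of [24, 23, 25] is 24, so ceil(100 / 24) = 5
```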
Organization - subordinate themes are PDLC (Product Development Lifecycle) structure, product alignment and team composition
• Product or Service Alignment - cross functional teams should be aligned by product or service and understand how their efforts complement business goals
• Agile or Waterfall - if “discovering” the market or choosing the best possible product for a market then Agile is appropriate - if developing to well defined contracts then waterfall may be necessary.
• Team composition - the engineer to QA tester ratio should ideally exceed 3.5:1. Significant deviations may be a sign of trouble or a harbinger of problems to come
• Goals - measurable goals aligned with business priorities should be visible to all with clear accountability
Security - subordinate themes are framework, prevention, detection and response
• Framework - use NIST, ISO, PCI or other regulatory standards to establish the framework for a security program. The standards do overlap, so think it through and avoid duplication of effort.
• Policies in place - a sound security program will have multiple security related policies such as employee acceptable use, access controls, data classification, and an incident response plan.
• Security risk matrix - security risks should be graded by their impact, probability of occurrence, and controlling measures
• Business metrics - analysis of business metrics (revenue per minute, change of address, checkout value anomalies, file saves per minute, etc) can develop thresholds for alerting to a potential security incident. Over time, the analysis can inform prevention techniques.
• Response plan - a plan must be in place and must have regular rehearsals.
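Two of the items above lend themselves to a short Python illustration. First, the sketch grades risks on hypothetical 1 to 5 impact and probability scales, discounted by an assumed 0 to 1 control-strength factor. Second, it flags a business metric, such as change-of-address requests per minute, when the metric strays more than `k` standard deviations from its recent history. The scales, the scoring formula, and the default `k = 3` are illustrative choices, not a standard.

```python
from dataclasses import dataclass
from statistics import mean, stdev

@dataclass
class SecurityRisk:
    name: str
    impact: int              # 1 (minor) .. 5 (severe), hypothetical scale
    probability: int         # 1 (rare) .. 5 (frequent), hypothetical scale
    control_strength: float  # 0.0 (no controls) .. 1.0 (fully mitigated)

    def residual_score(self) -> float:
        """Impact x probability, reduced by how well existing controls mitigate it."""
        return self.impact * self.probability * (1.0 - self.control_strength)

def is_anomalous(history: list[float], current: float, k: float = 3.0) -> bool:
    """True if `current` is more than k sample standard deviations from the mean of `history`."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return current != mu
    return abs(current - mu) > k * sigma

risks = [
    SecurityRisk("Credential stuffing", 4, 4, 0.5),
    SecurityRisk("Insider data export", 5, 2, 0.25),
    SecurityRisk("Unpatched web server", 3, 3, 0.0),
]
ranked = sorted(risks, key=lambda r: r.residual_score(), reverse=True)
print([r.name for r in ranked])
# ['Unpatched web server', 'Credential stuffing', 'Insider data export']

coa_per_minute = [12, 15, 11, 14, 13, 12, 15, 14]  # change-of-address requests per minute
print(is_anomalous(coa_per_minute, 60))  # True
print(is_anomalous(coa_per_minute, 14))  # False
```

A real deployment would also account for daily and weekly seasonality. It would often alert on one side only, for example on a sudden drop in revenue per minute.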
Technology Cost Impact on Investment Value
Technology costs can have a significant impact on the overall investment value. Strengths and weaknesses uncovered during a technical diligence effort help the investor make the best overall business decision.
Technology costs are normally captured in 2 areas of the income statement, cost of revenue (production environment and personnel) and operating expenses (software development). Technology costs can also affect depreciation (server capital purchases) and amortization (pre-paid licensing and support). These cost areas should be reviewed for unusual patterns or abnormally high or low spend rates. It is also important to understand the term of equipment purchase, software licensing, and support contracts - spend may be committed for several years.
Cost Cautions - tales from the past
• Support for production equipment purchased from a third party because the equipment is old and no longer supported by the OEM. Use equipment as long as possible, but don’t risk a production outage.
• Constant software vendor license audits - the vendor will find revenue, and a technology team that leaves its company vulnerable to this on a recurring basis is likely to have other significant issues.
• Lack of an RFP or benchmarking process to periodically assess the cost effectiveness of hardware, software, hosting, and support vendors. Making a change in one of these areas is not simple, but the technology team should know the price point at which a change becomes better for the company.
A technical diligence effort should also identify the level of technical debt and quantify the amount of engineering resources dedicated to servicing the technical debt.
Technical debt is a conscious choice to take a shortcut in the technology arena - the delta between the desired or intended way and the quicker way. The shortcut is usually taken for time to market reasons and is a sound business decision within reason. Technical debt is analogous in many ways to financial debt - a complete lack of it probably means missed business opportunities while an excess means disaster around the corner.
Just like financial debt, technical debt must be serviced, and it is serviced by the efforts of the engineering team - the same team developing the software. AKF recommends 12% to 25% of engineering effort be spent servicing technical debt. Whether that resource allocation keeps the debt static, reduces it, or allows it to grow depends upon the amount of technical debt. It is easy to see how a company delinquent in servicing their technical debt will have to increase the resource allocation to deal with it, reducing resources for product innovation and market responsiveness.
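A toy model can illustrate why the same allocation can shrink or grow debt depending on how much debt already exists. The model rests on three hypothetical assumptions:
- Outstanding debt accrues "interest" each sprint in proportion to its size.
- New shortcuts add a fixed amount each sprint.
- A fixed fraction of engineering capacity pays the debt down.

Only the 12% and 25% allocations come from the text. The other parameters are placeholders chosen to make the dynamics visible.

```python
def simulate_debt(debt: float, capacity: float, service_fraction: float,
                  new_debt: float, interest: float, sprints: int) -> list[float]:
    """Outstanding technical debt (in story points) after each sprint, under a toy model."""
    history = [debt]
    for _ in range(sprints):
        # Debt grows with its own size (interest) and with new shortcuts,
        # and shrinks by the share of capacity spent servicing it.
        debt = max(0.0, debt * (1 + interest) + new_debt - service_fraction * capacity)
        history.append(debt)
    return history

def trend(history: list[float]) -> str:
    """Compare final debt with starting debt."""
    if history[-1] > history[0]:
        return "grows"
    if history[-1] < history[0]:
        return "shrinks"
    return "flat"

# Hypothetical team: 100 points of capacity per sprint, 10 points of new
# debt per sprint, 5% "interest" per sprint on outstanding debt.
for fraction in (0.12, 0.25):
    for start in (100.0, 400.0):
        path = simulate_debt(start, 100, fraction, 10, 0.05, sprints=8)
        print(f"service {fraction:.0%}, starting debt {start:.0f}: {trend(path)}")
# service 12%, starting debt 100: grows
# service 12%, starting debt 400: grows
# service 25%, starting debt 100: shrinks
# service 25%, starting debt 400: grows
```

In this toy model, a 25% allocation pays down a modest backlog. It cannot keep up once debt is large enough that its "interest" exceeds the effort spent servicing it. This mirrors the warning above about companies that are delinquent in servicing their debt.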
Put It All Together
The investor has made use of several specialists in an overall diligence effort and is digesting the information to zero in on the choice to invest and at what price. The business side looks good - revenue growth, product strategy, and marketing are solid. The legal side has some risks relating to returning a leased office space to its original condition, but the lease has 5 years to run. Now for technology:
• Tech refresh is overdue, so additional investment is needed or a move to the cloud accelerated - either choice puts pressure on thin margins.
• An expensive RDBMS is in use, but the technology team avoids stored procedures and keeps their SQL as vanilla as possible - moving to open source is doable.
• Technical debt service is constantly derailed by feature requests from sales and marketing. Additional resources, hired or contracted, will be needed and will raise the technology run rate. More margin pressure.
• Conclusion - the investment needed to address tech refresh and technical debt changes the investment value. The investor lowers the offer price.
Interested in learning more about technical due diligence? Here are some due diligence do’s and don’ts.
How AKF can help
AKF has conducted hundreds of technical due diligence studies over the last 10 years. One would want an attorney for a legal diligence effort and one would want a technologist for a technical due diligence. AKF does technology right. Read more about our technical due diligence offerings here.
February 13, 2018 | Posted By: Marty Abbott
Necessary But Insufficient Security Reviews
From a security perspective, tech product companies far too often focus solely on various ISO and/or NIST audits to help inform their view of how they manage risk within their company and their products. The problem with the standards that exist today is that none of them tread deeply enough into the waters of detection and prevention of malicious activities within products. Instead, they focus more on the processes of response, identification, notification, employee access, etc.
While these activities (and audits) are necessary, they are insufficient to ensure that we properly manage risk (and prevent malicious activities) in our products. As we’ve written previously, erecting barriers and hiding behind big walls may make you feel better and help you sleep at night – but it’s not going to keep the bad guys from scaling your walls and taking your stuff.
The Online World is Getting Scarier
Consider the following secular trends for online products:
• A continuing mix-shift of commerce from retail to online. Within the US, excluding certain goods, online sales stood at a meager 9% of total commerce in 2017, up from 1% in 2002. If one excludes extremely high dollar items (vehicles, etc.), the percentage of sales is significantly higher. Growing at a slightly higher than linear rate since 2002, this number should easily double within the next 7 years. From the perspective of a malicious hacker, this is a growth in opportunity.
• Developing and established nations outside of N. America and Western Europe continue to invest heavily in STEM-based education.
• Overall employment in many of these countries is comparatively low outside of what Western nations provide through off-shore contracting opportunities. Combined with recent nationalistic trends and a desire to “keep jobs at home” or not “offshore jobs”, there is a strong possibility that demand for offshore agencies will decrease over time.
• Some nations within the set of nations spending heavily on STEM education have created cyber-institutes promoting cyber and security related warfare capabilities.
• A smaller set of the nations described above have heavily promoted state-sponsored cyber warfare initiatives, setting these teams (e.g. the DPRK’s Unit 180) against corporate infrastructure within the United States.
• The barrier to entry for malicious actors to be effective in attacking corporate assets has declined. Hacker communities commonly share exploits and malware, and certain nation-states (e.g. Russia and N. Korea) have contributed to hacking toolsets, thereby decreasing the barrier to entry for a malicious actor and consequently increasing the supply of said malicious actors.
• Extradition from other countries for crimes committed, especially those with which the US is not allied, is difficult to impossible. View this as a low perceived cost of committing a crime: if you cannot be prosecuted, there is little to no perceived cost of committing the crime.
• Cryptocurrencies (e.g. Bitcoin) provide a nearly untraceable means of selling stolen data or holding systems for ransom.
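The doubling claim in the first bullet can be sanity-checked with simple arithmetic. The Python sketch below compares two extrapolations from the two data points quoted (1% in 2002, 9% in 2017): a straight-line fit and a constant compound annual growth rate. Both models are our illustrative assumptions, not a forecast.

```python
import math

# Data points quoted in the text: online share of US commerce, in percent.
share_2002, share_2017 = 1.0, 9.0
years = 2017 - 2002  # 15 years of history

# Linear model (assumption): a constant percentage-point gain per year.
linear_slope = (share_2017 - share_2002) / years      # ~0.53 points/year
linear_2024 = share_2017 + linear_slope * 7           # ~12.7%

# Compound model (assumption): a constant annual growth rate (CAGR).
cagr = (share_2017 / share_2002) ** (1 / years) - 1   # ~15.8% per year
compound_2024 = share_2017 * (1 + cagr) ** 7          # ~25.1%

# Annual growth rate needed to double the share within 7 years.
rate_to_double_in_7 = 2 ** (1 / 7) - 1                # ~10.4% per year

# Years needed to double at the historical CAGR.
doubling_years = math.log(2) / math.log(1 + cagr)     # ~4.7 years

print(f"linear 2024:        {linear_2024:.1f}%")
print(f"compound 2024:      {compound_2024:.1f}%")
print(f"CAGR to double:     {rate_to_double_in_7:.1%}")
print(f"doubling at CAGR:   {doubling_years:.1f} years")
```

Under these assumptions, a purely linear extension reaches only about 12.7% by 2024, while the compound rate implied by the 2002 to 2017 figures would double the share in under five years. Doubling within seven years requires sustained growth of roughly 10.4% per year.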
The resulting forces of these meta or secular trends are clear:
1) The value of being a malicious actor has increased as the supply of targets (in terms of online sales and value) continues to grow. View this economically as an increasing opportunity for crime.
2) The barrier to entry to become a malicious actor is decreasing.
3) The cost in terms of prosecution, for crimes committed from outside the US, is low to zero.
These points combine to make one clear outcome: Cybercrime and cyberterrorism (fraud, malicious use, etc) will rise as a percentage of revenue transacted online.
To help combat this rising malicious activity, we need new models and approaches to help us think about how to Identify and Prevent bad actors from doing horrible things.
Enter the AKF Security Insights Cube.
If It Isn’t Real Time It Is Worthless
The AKF Partners Security Insights Cube is predicated on the notion that all the data it addresses is accessible in near-real-time. This alone is a considerable barrier for many companies. Identifying fraudulent activity after credit cards are processed, for instance, is simply too late. We want to know that bad people are entering our neighborhood and at our door – not that they stole something from our house yesterday.
The lower left corner of the cube is the starting point for any solution – the point at which you are flying blind and have no real time data. Again – getting data from 15 minutes ago or 24 hours ago is as useless in driving a product as it is in driving a car or flying a plane; you simply have no idea what is going on.
The X axis of the cube evaluates the breadth of data available to an organization in real time. The far left is “zero real time data”. Progressing to the right along the axis are increasingly valuable risk-related data points, starting with real time key performance indicators like logins, add-to-carts, check-outs, auth activity (and failures), searches, etc. Moving further right, we may keep all session data such that we can interrogate it and perform behavioral analysis and pattern matching. The far right of the axis is the point at which we keep absolutely everything, increasing the optionality of how we may interrogate the data for risk management and malicious activity prevention purposes.
The Y axis of the cube evaluates the activities performed upon the X axis data by an organization. Clearly the X axis sets an upper bound on what’s possible on the Y axis. For instance, it would be hard to understand “Who, What or How” something happened if we didn’t first store session data to be analyzed. From a GDPR perspective, PII in session information can be anonymized if necessary. As with most analytics-oriented systems, maturity progresses from doing nothing, to “reporting” capabilities that illuminate “what is happening” (typically employing performance indicators), to answering “Who, Why and How”, to finally predicting what will happen and preventing malicious activities in real time.
The Z axis of the cube deals simply with the depth, or duration, that data is kept. We rarely suggest that data be kept forever, but there is great value in ensuring that past patterns can be analyzed to create behavior models for scoring risk and blocking activities. A handful of years is typically appropriate for most commerce solutions, and somewhat longer retention for fintech solutions.
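As an illustration of a real time key performance indicator such as authentication failures, here is a minimal Python sketch of a sliding-window counter. The class name, the 60-second window, and the alert threshold are hypothetical choices for this example; a production system would more likely compute such counts in a streaming pipeline than in a single process.

```python
import time
from collections import deque
from typing import Deque, Optional


class SlidingWindowCounter:
    """Counts events (e.g., failed logins) seen in the last `window_seconds`."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        # Event timestamps, oldest first (assumes events arrive in time order).
        self._events: Deque[float] = deque()

    def record(self, timestamp: Optional[float] = None) -> None:
        self._events.append(time.time() if timestamp is None else timestamp)

    def count(self, now: Optional[float] = None) -> int:
        now = time.time() if now is None else now
        # Evict events that have aged out of the window.
        while self._events and self._events[0] <= now - self.window_seconds:
            self._events.popleft()
        return len(self._events)


AUTH_FAILURE_THRESHOLD = 5  # hypothetical alerting threshold per minute

auth_failures = SlidingWindowCounter(window_seconds=60)
for t in (0, 5, 10, 15, 20, 25):  # simulated failure timestamps, in seconds
    auth_failures.record(t)

if auth_failures.count(now=30) > AUTH_FAILURE_THRESHOLD:
    print("alert: auth failure spike")  # 6 failures in the last 60 seconds
```

The point of the structure is that the answer is current as of the moment it is asked, rather than as of the last batch report.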
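Behavior models can take many forms. One of the simplest, shown here as a hedged Python sketch, scores a new observation (for example, a checkout amount) by how many standard deviations it sits from a user’s historical mean; the retained history is what the Z axis provides. The function names and the 3.0 threshold are hypothetical, and real fraud models are considerably richer.

```python
import statistics
from typing import Sequence


def risk_score(history: Sequence[float], observed: float) -> float:
    """How many standard deviations `observed` sits from the historical mean."""
    if len(history) < 2:
        return 0.0  # too little history to form a baseline
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return 0.0 if observed == mean else float("inf")
    return abs(observed - mean) / stdev


def should_block(history: Sequence[float], observed: float,
                 threshold: float = 3.0) -> bool:
    """Hypothetical policy: block activity far outside the user's normal pattern."""
    return risk_score(history, observed) > threshold


past_checkouts = [40.0, 50.0, 60.0, 45.0, 55.0]  # a user's prior order totals
print(should_block(past_checkouts, 58.0))   # False: close to normal behavior
print(should_block(past_checkouts, 500.0))  # True: far outside the pattern
```

Note the design choice for users with little history: this sketch scores them as zero risk, which a real system would likely replace with a population-level baseline.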
AKF Partners performs security reviews of technology products. Our approach evaluates security along several dimensions and includes components of NIST and ISO standards, but is tailored to the needs of online product companies.
Subscribe to the AKF Newsletter
February 7, 2018 | Posted By: Pete Ferguson
If you have a premium product, at a premium price, it’s unlikely you would sell it out of a rundown, poorly lit store that smells vaguely like stale meat. Yet somehow many of us forget to apply that same reasoning when it comes to selling our products online. The availability, look, and feel of your online presence is your storefront.
I’ve long been a fan of Saddleback Leather. However, their motto, “They’ll fight over it when you’re dead,” fell short in January. You see, it’s hard for your family to fight over the thing that you can’t even purchase… Saddleback Leather had a completely foreseeable and absolutely preventable outage. From Dave Munson, the CEO:
“I’ve always dreamt of one day having a really fast and easy website for you to enjoy. So, we decided to leave our slow and clunky old website and start building one on a new and different platform. The contract expired Dec. 30th, 2017, but the new site wasn’t fully ready yet. We flipped the switch anyways and all Gehenna broke loose. The super fast, fun and easy website… wasn’t fast, fun or easy and we wasted a ton of time and irritated the heck out of our favorite people. People couldn’t check out, set up accounts or even add stuff to their carts. So, we paid a ton of money to get our old slow and clunky back again until we get this new site just right.”
To make up for it, last week I received an apology letter sent by “El Presidente” Munson with an 11% off coupon: 11% because Munson recently celebrated 11 years of marriage to his wife, Suzzette. As a side note, it’s a perfect example of how to apologize to your customers when you screw up. This guy made a mistake, is paying for it by paying for his old site while continuing to develop the new one, and is giving customers discounts with a coupon aptly titled “IAMSORRY.”
Ironically, as a fan and customer, I don’t recall the old site being slow or terrible. On the contrary, when I visited early in January, their “new and improved” site felt clunky and disjointed. The wrong images were coming up for products and many items reported being “not available.”
In the world of environmental health and safety, “all accidents are preventable” is the holy grail of compliance. We believe that with the right forethought and planning, the same is true with virtually all products and storefronts online.
At AKF we are fond of saying “an accident is a terrible thing to waste.” While the exact details of what went wrong have not been disclosed, the contributing decisions appear to be:
- They took a concept that presumably worked well in beta testing live, without testing it under full load.
- Munson made the decision to push out something that wasn’t yet great to save money by exiting a contract by the end of the year.
The result is lost sales while the site was down, lost customers who may have been trying the website for the first time and won’t be back, an 11% haircut on sales for the next week, and a fan base, many of whom have been very vocal on Facebook, expressing their disappointment that a company they have counted on for unquestioned quality didn’t put quality first this time.
The days of customers quickly forgiving their favorite retailers for not being equally great online are waning. Make sure you have a solid strategy and the right expertise in your corner before making changes that greatly affect your customers’ ability to purchase or interact with your product.
Experiencing growing pains? AKF is here to help! We are an industry expert in technology scalability and due diligence. Put our 200+ years of combined experience to work for you today!
January 29, 2018 | Posted By: Robin McGlothin
The Scale Cube - Architecting for Scale
In every industry, companies with similar strategies grow differently. Some grow with metronomic regularity to become leaders in their segment; Walmart, Dell, and Amazon exemplify this trend. Others grow in fits and starts, often languishing or, at best, getting acquired. While several attributes define successful companies, one is often overlooked: their ability to scale.
The Scale Cube is a model for building resilient architectures using patterns and practices that apply broadly to any solution. We developed the cube 11 years ago for our practice and included it in the first edition of “The Art of Scalability” (AKF Partners).
The Scale Cube (sometimes known as the “AKF Scale Cube” or “AKF Cube”) consists of an X-axis, Y-axis, and Z-axis:
• Horizontal Duplication and Cloning (X-Axis)
• Functional Decomposition and Segmentation - Microservices (Y-Axis)
• Horizontal Data Partitioning - Shards (Z-Axis)
The Scale Cube helps teams keep critical dimensions of system scale in mind when solutions are designed. Scalability is all about the capability of a design to support ever-growing client traffic without compromising performance. It is important to understand there are no “silver bullets” to designing scalable solutions. An architecture is scalable if each component is scalable. A well-designed solution should be able to scale seamlessly as demand increases and decreases, and be resilient enough to withstand the loss of one or more compute resources.
Most internet-enabled products start their life as a single application running on an appserver or appserver/webserver combination, likely communicating with a database. This monolithic design will work fine for relatively small applications that receive low levels of client traffic. However, this monolithic architecture becomes the kiss of death for complex applications.
A large monolithic application can be difficult for developers to understand and maintain. It is also an obstacle to frequent deployments. To deploy changes to one application component you need to build and deploy the entire monolith, which can be complex, risky, and time-consuming, requiring the coordination of many developers and resulting in long test cycles.
Consequently, you are often stuck with the technology choices that you made at the start of the project. In other words, the monolithic architecture doesn’t scale to support large, long-lived applications.
Scaling Solutions with the Scale Cube
The most common approach to scaling a solution is running multiple identical copies of the application behind a load balancer, also known as X-axis scaling. That’s a great way of improving the capacity and the availability of an application.
When using X-axis scaling, each server runs an identical copy of the service (if disaggregated) or monolith. One benefit of the X axis is that it is typically intellectually easy to implement, and it scales well from a transaction perspective. Impediments to implementing the X axis include heavy session-related information, which is often difficult to distribute or requires persistence (affinity) to specific servers, both of which can cause availability and scalability problems. A comparative drawback of the X axis is that, while it is intellectually easy to implement, data sets have to be replicated in their entirety, which increases operational costs. Further, caching tends to degrade at many levels as the size of data increases with transaction volumes. Finally, the X axis doesn’t engender higher levels of organizational scale.
Y-axis scaling (think services-oriented architecture, microservices, or functional decomposition of a monolith) focuses on separating services and data along noun or verb boundaries. These splits are “dissimilar” from each other. Examples in commerce solutions may be splitting search from browse, checkout from add-to-cart, login from account status, etc. In implementing splits, Y-axis scaling splits a monolithic application into a set of services. Each service implements a set of related functionalities such as order management, customer management, inventory, etc. Further, each service should have its own, non-shared data to ensure high availability and fault isolation. Y-axis scaling shares the benefit of increasing transaction scalability with all the axes of the cube.
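A minimal Python sketch of X-axis request distribution, assuming stateless clones identified by hypothetical names: a round-robin balancer hands each request to the next identical copy. Because any clone can serve any request, no session data has to stay pinned to a particular server, which is exactly what heavy session state makes difficult.

```python
import itertools
from typing import Iterator, List


class RoundRobinBalancer:
    """Hands each request to the next identical clone in turn (X-axis scaling)."""

    def __init__(self, clones: List[str]) -> None:
        if not clones:
            raise ValueError("need at least one clone")
        self._next: Iterator[str] = itertools.cycle(clones)

    def route(self) -> str:
        # Any clone can serve any request because the clones are identical
        # and, in this sketch, hold no session state.
        return next(self._next)


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
print([balancer.route() for _ in range(6)])
```

Adding capacity in this model is just appending another clone to the list, which is why the X axis is usually the first axis teams reach for.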
Further, because the Y axis allows segmentation of teams and ownership of code and data, organizational scalability is increased. Cache hit ratios should increase as data and the services are appropriately segmented and similarly sized memory spaces can be allocated to smaller data sets accessed by comparatively fewer transactions. Operational cost often is reduced as systems can be sized down to commodity servers or smaller IaaS instances can be employed.
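A Y-axis split can be sketched in Python as routing by function: each URL path prefix maps to the service (and service pool) that owns that verb or noun. The service names and pool members below are hypothetical, chosen to mirror the commerce examples above.

```python
from typing import Dict, List

# Hypothetical Y-axis split: each functional service owns its own pool
# (and, ideally, its own non-shared data store).
SERVICE_POOLS: Dict[str, List[str]] = {
    "search": ["search-1", "search-2"],
    "browse": ["browse-1", "browse-2"],
    "cart": ["cart-1"],
    "checkout": ["checkout-1", "checkout-2"],
    "login": ["login-1"],
}


def owning_service(path: str) -> str:
    """Return the service that owns a URL path such as '/checkout/confirm'.

    Assumes any query string has already been stripped from the path.
    """
    first_segment = path.strip("/").split("/", 1)[0]
    if first_segment not in SERVICE_POOLS:
        raise KeyError(f"no service owns path {path!r}")
    return first_segment


def pool_for(path: str) -> List[str]:
    """Servers allowed to handle this request: only the owning service's pool."""
    return SERVICE_POOLS[owning_service(path)]


print(owning_service("/checkout/confirm"))
print(pool_for("/search/results"))
```

Because a failure in the checkout pool cannot consume search servers, each split also acts as a fault-isolation boundary, and each pool can be sized to its own traffic.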
Whereas the Y axis addresses the splitting of dissimilar things (often along noun or verb boundaries), the Z-axis addresses segmentation of “similar” things. Examples may include splitting customers along an unbiased modulus of customer_id, or along a somewhat biased (but beneficial for response time) geographic boundary. Product catalogs may be split by SKU, and content may be split by content_id. Z-axis scaling, like all of the axes, improves the solution’s transactional scalability and, if fault isolated, its availability. Because the software deployed to servers is essentially the same in each Z-axis shard (but the data is distinct), there is no increase in organizational scalability. Cache hit rates often go up with smaller data sets, and operational costs generally go down as commodity servers or smaller IaaS instances can be used.
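Both Z-axis splits described above can be sketched in a few lines of Python. The shard count and the region map are hypothetical.

```python
from collections import Counter
from typing import Dict

NUM_SHARDS = 4  # hypothetical shard count


def shard_by_customer(customer_id: int, num_shards: int = NUM_SHARDS) -> int:
    """Unbiased split: customers spread evenly by modulus of customer_id."""
    return customer_id % num_shards


# Biased but latency-friendly split: customers pinned to a nearby shard.
REGION_TO_SHARD: Dict[str, int] = {"us-west": 0, "us-east": 1, "eu": 2, "apac": 3}


def shard_by_region(region: str) -> int:
    return REGION_TO_SHARD[region]


counts = Counter(shard_by_customer(cid) for cid in range(1, 1001))
print(sorted(counts.items()))  # each of the 4 shards receives 250 customers
```

One design caveat: with a plain modulus, changing the shard count moves most customers to a different shard, so consistent hashing or a lookup directory are common alternatives when the number of shards is expected to grow.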
Like Goldilocks and the three bears, the goal of decomposition is not to have services that are too small, or services that are too large but to ensure that the system is “just right” along the dimensions of scale, cost, availability, time to market and response times.
AKF Partners has helped hundreds of companies, big and small, employ the AKF Scale Cube to scale their technology product solutions. Let us help you succeed and thrive!
January 23, 2018 | Posted By: Marty Abbott
Technical due diligence of products is about more than the solution architecture and the technologies employed. Performing diligence correctly requires that companies evaluate the solution against the investment thesis, and evaluate the performance and relationship of the engineering and product management teams. Here we present the best practices for technology due diligence in the format of things to do, and things not to do:
1. Understand the Investment/Acquisition Thesis
One cannot perform any type of diligence without understanding the investment/acquisition thesis and equally as important, the desired outcomes. Diligence is meant to not only uncover “what is” or “what exists”, but also identify the obstacles to achieve “what may or can be”. The thesis becomes the standard by which the diligence is performed.
2. Evaluate the Team against the Desired Outcomes
The technology product landscape is littered with the carcasses of great ideas run into the ground with the wrong leadership or the wrong team. Disagree? We ask you to consider the Facebook and Friendster battle. We often joke that the robot apocalypse hasn’t happened yet, and technology isn’t building itself. Great teams are the reason solutions succeed, and substandard teams are behind those solutions that fail technically. Make sure your diligence is identifying whether you are getting the right team along with the product/company you acquire.
3. Understand the Tech/Product Relationship
Product Management teams are the engines of products, and engineering teams are the transmission. Evaluating these teams in isolation is a mistake – as regardless of the PDLC (product development lifecycle) these teams must have an effective working relationship to build great products. Make sure your diligence encompasses an evaluation of how these teams work together and the lifecycle they use to maximize product value and minimize time to market.
4. Evaluate the Security Posture
Cyber-crime and fraud are going to increase at a rate higher than the adoption of online solutions, driven by a number of secular forces that we will enumerate in a future post. As such, it is in your best interest as an investor to understand the degree to which the company is focused on increasing the perceived cost of malicious activity and decreasing the perceived value of said malicious activity. Ensure that your diligence includes evaluating the security focus, spending, approach and mindset of the target company. This need not be a separate diligence for small investments; just ensure that you are comfortable with the spend, attention and approach.
1. Don’t Waste Too Much Time (or money) on Code Reviews
The one thing I know from years of running engineering teams is that anytime an engineer reviews code for the first time she is going to say, “This code is crap and needs to be rewritten.” Code reviews are great to find potential defects and to ensure that code conforms to the standards set forth by the company. But you are unlikely to have the time or resources to review everything. The company is also unlikely to give you unfettered access to all of their code (Google “Sybase Microsoft SQLServer” for reasons why). That leaves you at the whims of the company to cherry-pick what you review, which in turn means you aren’t getting a good representative sample.
Further, your standards likely differ from those of the target company. As such, a review of the software is simply going to indicate that you have different standards.
Lastly, we’ve seen great architecture and terrible code succeed, whereas terrible architecture and great code rarely succeed. You may find small code reviews enlightening, but we urge you to spend a majority of your time on the architecture, people and process of the acquisition or investment.
2. Don’t Start a Fight
Far too often technology diligence sessions start in discussion and end in a fight. The people performing the diligence start asking questions in a way that may seem judgmental to the target company. Then the investing/acquiring team shifts from questions to absolute statements that can only be taken as judgmental. There’s simply no room for this. Diligence is clinical – not personal. It’s not a place to prove who is smarter than whom. This dynamic is one of the many reasons it is often a good idea to have a third party perform your diligence: The target company is less likely to feel threatened by the acquiring product team, and the third party is oftentimes more experienced with establishing a non-threatening environment.
3. Don’t Be Religious
In a services-oriented world, it really doesn’t matter what code or what data persistence platform comprises a service you may be calling. Assuming that you are acquiring a solution and its engineers, you need not worry about supporting the solution with your existing skillsets. Debates around technology implementations too often come from a place of what one knows (“I know Java, Java rocks, and everything else is substandard”) rather than what one can prove. There are certainly exceptions, like aging and unsupported technology, but stay focused on the architecture of a solution, not the technology that implements that architecture.
4. Don’t Do Diligence Remotely
As we’ve indicated before, diligence is as much about teams as it is the technology itself. Performing diligence remotely without face to face interaction makes it difficult to identify certain cues that might otherwise be indicators that you should dig more deeply into a certain space or set of questions. Examples include a CTO giving an authoritative answer to a certain question while members of her team roll their eyes or slightly shake or bow their heads.
You may also want to read about the necessary components of technical due diligence in our article on optimizing technical diligence.
AKF Partners performs diligence on behalf of a number of venture capital and private equity firms as well as on behalf of strategic acquirers. Whether for a third party view, or because your team has too much on their plate, we can help. Read more about our technical due diligence services here.
January 13, 2018 | Posted By: Dave Swenson
Sorry, False Alarm…
On January 13, 2018, what felt like an episode of Netflix’s “Black Mirror” unfolded in real life. Just after 8 in the morning, residents and visitors of Hawaii were woken up by a startling push notification warning of an inbound ballistic missile threat.
Thankfully, the notification was a false alarm, finally retracted with a second notification nearly 40 interminable minutes later.
The amazing, poignant and sobering stories that emerged from those 40 minutes included people:
- determining which children to spend their last minutes with,
- abandoning their cars on streets,
- sheltering in a lava tube,
- believing and acting as we all would if we believed the end was here.
Unfortunately, this wasn’t a Black Mirror episode; it was real, and it paralyzed an entire state’s population.
A Muted President
As President Trump took office, he introduced a new means for a President to reach his constituents: Twitter, averaging 6 to 7 tweets per day during his first year. On November 2, 2017, many bots that were created to closely monitor the tweets of @realDonaldTrump started reporting that the account no longer existed. Clicking through to his account took users to an error page.
For a deafening 11 minutes, the nation was unable to listen to its leader, at least via Twitter.
The Hawaiian false alarm was sent by the state’s Emergency Management Agency. Their explanation of the incident was that during a shift change, an employee clicked “the wrong button” while running a missile crisis test, then subsequently clicked through a confirmation prompt (“Are you sure you want to tell 1.5 million people this?”).
Twitter employees had reportedly tried for years to get management attention on ensuring accounts weren’t deleted without proper vetting. The company typically used contractors in the Philippines and Singapore to handle such account administration; Trump’s account was deleted by a German contract worker on his last day at Twitter. Acting on yet another complaint about Trump, and believing such an important account couldn’t actually be suspended, the worker clicked the suspend button as his last action for Twitter and then walked out of the building, causing the Twitterverse to read far more into the account’s disappearance than it should have.
In both of these situations, the immediate focus was on the personnel involved in the incident. “Who pushed the button?” is almost always one of the initial questions. Assumptions that a new employee or rogue worker was behind the incident are common, and both the motive and intelligence of all involved come under inspection.
We at AKF Partners constantly preach, “An incident is a terrible thing to waste.” Events such as these warp the known reality into “How the shit can that happen??”, causing enough alarm to warrant special attention and focus, if not panic. Yet all too often we see teams search frantically for any cause, blame the most obvious, immediate factor, declare victory, and move on.
“Who pushed the button?” is only one of many questions.
Toyota’s Taiichi Ohno, the father of Lean Manufacturing, recognized his team’s habit of accepting the most apparent cause, ignoring (wasting) other elements revealed by an incident and potentially allowing it to be repeated. Ohno (the person, not the exclamation typically uttered during an incident) emphasized the importance of asking “5 Whys” in order to move beyond the most obvious explanation (and accompanying blame) and peel the onion, diving deeper into contributory causes.
Questions beyond the reflexive “What happened?” and “Who did it?” relevant to the false alarm and erroneous account deletion incidents include:
- Why did the system act differently than the individual expected (is there more training required, is the user interface a confusing one)?
- Why did it take so long to correct (is there no playbook for detecting / reversing such a message or key account activity)?
- Why does the system allow such an impactful event to be performed unilaterally, by a single person (what safeguards should exist requiring more than one set of hands)?
- Why does this particular person have such authorization to perform this action (should a non-employee have the ability to delete such a verified, popular and influential account)?
- Why was the possibility of this incident not anticipated and prevented (why were Twitter employee requests for better safeguards ignored for years, why wasn’t the ease of making such a mistake recognized and what other similar mistake opportunities are there)?
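The question about requiring more than one set of hands lends itself to a concrete safeguard. The Python sketch below is a hypothetical two-person rule for high-impact actions (a mass alert, suspending a verified account): the requester cannot approve their own action, and execution is refused until a distinct approver signs off. The class and field names are our own illustration, not how either organization’s systems actually work.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class HighImpactAction:
    """Hypothetical action that needs sign-off from someone other than the requester."""

    description: str
    requested_by: str
    approvals_needed: int = 1  # distinct approvers required besides the requester
    approvers: Set[str] = field(default_factory=set)

    def approve(self, user: str) -> None:
        if user == self.requested_by:
            raise PermissionError("requester cannot approve their own action")
        self.approvers.add(user)

    def can_execute(self) -> bool:
        return len(self.approvers) >= self.approvals_needed

    def execute(self) -> str:
        if not self.can_execute():
            raise PermissionError(
                f"{self.description!r} needs {self.approvals_needed} approval(s), "
                f"has {len(self.approvers)}"
            )
        return f"executed: {self.description}"


action = HighImpactAction("suspend verified account", requested_by="contractor_a")
action.approve("supervisor_b")
print(action.execute())
```

A confirmation prompt only asks the same person twice; a rule like this forces a second, independent person into the loop.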
Both of these incidents have had an impact far beyond those directly affected (Hawaiian inhabitants or Trump Twitter followers), and have shed light on the need to recognize that the world has changed and that policies and practices of old might not be enough for today. The ballistic missile false alarm revealed that more controls need to be placed on all mass communication, but also that Hawaii (or anywhere/anyone else) is extremely unprepared for the unthinkable. The use of Twitter as a channel for the President now raises questions over its validity as a Presidential record, asks who should control such a channel, and raises concerns about the security around the President’s account.
Ask 5 Whys, look beyond the immediate impact to find collateral learnings, and take notice of all that an incident can reveal.
AKF Partners has been brought in by over 400 companies to avoid such incidents and, when they do occur, to learn from them. Let us help you.
January 3, 2018 | Posted By: AKF
One of the most common questions we get is “What are the most common failures you see tech and product teams make?”. To answer that question we queried our database consisting of 11 years of anonymous client recommendations. Here are the top 20 most repeated failures and recommendations:
1) Failing to design for rollback
If you are developing a SaaS platform and you can make only one change to your current process, make it so that you can always roll back any of your code changes. Yes, we know that it takes additional engineering work and additional testing to make nearly any change backwards compatible, but in our experience that work has the greatest ROI of any work you can do. It only takes one really bad release, in which your site performance is significantly degraded for several hours or even days while you attempt to “fix forward”, for you to agree this is of the utmost importance. The one thing that is most likely to give you an opportunity to find other work (i.e. “get fired”) is to roll a product that destroys your business. In other words, if you are new to your job DO THIS BEFORE ANYTHING ELSE; if you have been in your job for a while and have not done this DO THIS TOMORROW.
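One common way to keep a change backwards compatible, and therefore roll-back safe, is to have the new version keep writing the fields the old version reads while accepting data in either format. The Python sketch below assumes a hypothetical customer record whose single `name` field is being split into `first_name` and `last_name`; rolling back from v2 to v1 leaves v1 able to read every record v2 wrote.

```python
from typing import Dict, Tuple


def read_customer_v1(record: Dict[str, str]) -> str:
    """Old code: only knows the single `name` field."""
    return record["name"]


def write_customer_v2(first: str, last: str) -> Dict[str, str]:
    """New code writes the new fields and also the legacy field v1 still reads."""
    return {"first_name": first, "last_name": last, "name": f"{first} {last}"}


def read_customer_v2(record: Dict[str, str]) -> Tuple[str, str]:
    """New code accepts records written by either version."""
    if "first_name" in record:
        return record["first_name"], record["last_name"]
    first, _, last = record["name"].partition(" ")
    return first, last


new_record = write_customer_v2("Ada", "Lovelace")
old_record = {"name": "Grace Hopper"}
print(read_customer_v1(new_record))  # v1 still works after a rollback
print(read_customer_v2(old_record))  # v2 still reads pre-upgrade data
```

In this pattern, only after v2 has proven itself in production would a later release stop writing the legacy field.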
2) Confusing product release with product success
Do you have “release” parties? Stop it! You are sending your team the wrong message! A release has nothing to do with creating shareholder value and very often it is not even the end of your work with a specific product offering or set of features. Align your celebrations with achieving specific business objectives like a release increasing signups by 10%, increasing checkouts by 15%, increasing the average sale price of all checkouts by 12%, or increasing click-through rates by 22%. See #10 below on incenting a culture of excellence. Don’t celebrate the cessation of work; celebrate achieving the success that makes shareholders wealthy.
3) Insular product development/engineering
How often does one of your engineering teams complain about not “being in the loop” or “being surprised” by a change? Does your operations team get surprised about some new feature and its associated load on a database? Does engineering get surprised by some new firewall or routing infrastructure resulting in dropped connections? Do not let your teams design in a vacuum and “throw things over the wall” to another group. Organize around your outcomes and “what you produce” in cross functional teams rather than around activities and “how you work”.
4) Over engineering the solution
One of our favorite company mottos is “simple solutions to complex problems”. The simpler the solution, the lower the cost and the faster the time to market. If you get blank stares from peers or within your organization when you explain a design do not assume that you have a team of idiots – assume that you have made the solution overly complex and ask for assistance in resolving the complexity.
5) Allowing history to repeat itself
Organizations do not spend enough time looking at past failures. In the engineering world, a failure to look back into the past and find the most commonly repeated mistakes is a failure to maximize the value of the team. In the operations world, a failure to correlate past site incidents and find thematically related root causes is a guarantee to continue to fight the same fires over and over. The best and easiest way to improve our future performance is to track our past failures, group them by cause, and treat the root cause rather than the symptoms. Keep incident logs and review them monthly and quarterly for repeating issues and improve your performance. Perform post mortems of projects and site incidents and review them quarterly for themes.
6) Scaling through third parties
Every vendor has a quick fix for your scale issues. If you are a hyper growth SaaS site, however, you do not want to be locked into a vendor for your future business viability; rather you want to make sure that the scalability of your site is a core competency and that it is built into your architecture. This is not to say that after you design your system to scale horizontally you will not rely upon some technology to help you; rather, once you define how you can horizontally scale, you want to be able to use any of a number of different commodity systems to meet your needs. As an example, most popular databases (and NoSQL solutions) provide multiple types of native replication to keep hosts in sync.
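Grouping incidents by cause needs very little tooling to start. A minimal Python sketch, assuming each incident log entry carries a hypothetical `root_cause` label assigned during the post mortem:

```python
from collections import Counter
from typing import Dict, List, Tuple


def recurring_causes(incidents: List[Dict[str, str]],
                     min_count: int = 2) -> List[Tuple[str, int]]:
    """Root causes seen at least `min_count` times, most frequent first."""
    counts = Counter(incident["root_cause"] for incident in incidents)
    return [(cause, n) for cause, n in counts.most_common() if n >= min_count]


quarter = [  # hypothetical incident log for one quarter
    {"id": "INC-1", "root_cause": "config change"},
    {"id": "INC-2", "root_cause": "capacity"},
    {"id": "INC-3", "root_cause": "config change"},
    {"id": "INC-4", "root_cause": "bad deploy"},
    {"id": "INC-5", "root_cause": "config change"},
    {"id": "INC-6", "root_cause": "capacity"},
]
print(recurring_causes(quarter))
```

The value comes less from the code than from the discipline of labeling causes consistently, so that the same fire shows up under the same name each time.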
7) Relying on QA to find your mistakes
You cannot test quality into a system, and it is mathematically impossible to test all possibilities within complex systems to guarantee the correctness of a platform or feature. QA is a risk mitigation function and it should be treated as such. Defects are an engineering problem and that is where the problem should be treated. If you are finding a large number of bugs in QA, do not reward QA; figure out how to fix the problem in engineering. Consider implementing test-driven design as part of your PDLC. If you find problems in production, do not punish QA; figure out how you created them in engineering. All of this is not to say that QA should not be held responsible for helping to mitigate risk (they should), but your quality problems are an engineering issue and should be treated within engineering.
8) Revolutionary or “big bang” fixes
In our experience, complete re-writes or re-architecture efforts end up somewhere on the spectrum between not returning the desired ROI and complete, disastrous failure. The best projects we have seen with the greatest returns have been evolutionary rather than revolutionary in design. That is not to say that your end vision should not be to end up in a place significantly different from where you are now, but rather that the path to get there should not include “and then we turn off version 1.0 and completely cutover to version 2.0”. Go ahead and paint that vivid description of the ideal future, but approach it as a series of small (but potentially rapid) steps to get to that future. And if you do not have architects who can help paint that roadmap from here to there, go find some new architects.
9) The Multiplicative Effect of Failure
Every time one service calls another synchronously, you lower your theoretical availability. If each of your services (a database, application server, application, web server, etc.) is designed to be 99.999% available, then your theoretical availability is the product of the availabilities of all the services in the call chain: five calls yield (0.99999)^5 ≈ 0.99995, or 99.995% availability. Eliminate synchronous calls wherever possible and create fault-isolative architectures to help you identify problems quickly.
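The arithmetic above can be sketched as follows, assuming failures are independent so that the availabilities of services in a synchronous chain simply multiply. The function name is illustrative.

```python
from math import prod

def serial_availability(availabilities: list[float]) -> float:
    """Theoretical availability of a chain of synchronous calls.

    Every service in the chain must be up for a request to succeed, so,
    assuming independent failures, the individual availabilities multiply.
    """
    return prod(availabilities)

five_nines = 0.99999
for calls in (1, 5, 10, 20):
    a = serial_availability([five_nines] * calls)
    downtime_min = (1 - a) * 365 * 24 * 60  # expected minutes down per year
    print(f"{calls:>2} calls: {a:.5%} (~{downtime_min:.0f} min/yr)")
```

Each additional synchronous hop adds roughly another five minutes of expected downtime per year at five nines, which is why the text favors asynchronous, fault-isolated designs.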
10) Failing to create and incent a culture of excellence
Bring in the right people and hold them to high standards. You will never know what your team can do unless you find out how far they can go. Set aggressive yet achievable goals and motivate them with your vision. Understand that people make mistakes and that we will all ultimately fail somewhere, but expect that no failure will happen twice. If you do not expect excellence and lead by example, you will get less than excellence and you will fail in your mission of maximizing shareholder wealth.
11) Under-engineering for scale
The time to think about scale is when you are first developing your platform. If you did not do it then, the time to think about scaling for the future is right now. That is not to say that you have to implement everything on the day you launch, but you should have thought about how you are going to scale your application services and your database services. You should have made conscious decisions about tradeoffs between speed to market and scalability, and you should have ensured that the code will not preclude any of the concepts we have discussed in our scalability postings. Hold quarterly scalability meetings where you discuss what you need to do to scale to 10x your current volume, and create projects out of the action items. Approach your scale needs in an evolutionary rather than revolutionary fashion, as in #8 above.
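A quarterly 10x exercise usually starts with back-of-the-envelope capacity math. The sketch below assumes load scales linearly across identical hosts and uses hypothetical figures (2,000 requests/s at peak today, 400 requests/s per server, 50% target utilization); the function name is illustrative.

```python
import math

def hosts_needed(peak_rps: float, growth_factor: float,
                 rps_per_host: float, target_utilization: float = 0.5) -> int:
    """Hosts required to serve projected peak load at a target utilization.

    A target_utilization below 1 leaves headroom for spikes and for losing
    a host. Assumes capacity grows linearly with host count.
    """
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    projected = peak_rps * growth_factor
    return math.ceil(projected / (rps_per_host * target_utilization))

print(hosts_needed(2_000, 1, 400))   # today: 10
print(hosts_needed(2_000, 10, 400))  # at 10x: 100
```

Linear scaling is the optimistic case; tiers that do not scale by simply adding hosts (often the database) are exactly what this exercise should surface.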
12) “Not Built Here” Culture
We see this all the time. You may even have agreed with point (6) above because you have a “we are the smartest people in the world and we must build it ourselves” culture. The point on relying upon third parties to scale was not meant as an excuse to build everything yourselves. The real point to be made is that you have to focus on your core competencies and not dilute your engineering efforts with things that other companies or open source providers can do better than you. Unless you are building databases as a business, you are probably not the best database builder. And if you are not the best database builder, you have no business building your own databases for your SaaS platform. Focus on what you should be the best at: building functionality that maximizes your shareholder wealth and scaling your platform. Let other companies focus on the other things you need like routers, operating systems, application servers, databases, firewalls, load balancers and the like.
13) A new PDLC will fix my problems
Too often CTOs see repeated problems in their product development life cycles, such as missed dates or dissatisfied customers, and blame the PDLC itself.
The real problem, regardless of the lifecycle you use, is likely one of commitment and measurement. For instance, most Agile lifecycles require consistent involvement from the business or product owner; a lack of involvement leads to misunderstandings and delayed products. Another very common problem is an incomplete understanding of, or insufficient training on, the existing PDLC. Everyone in the organization should have a working knowledge of the entire process and how their roles fit within it. Most often, the biggest problems within a PDLC are the lack of progress measurement to help predict likely dates and the lack of an appropriate “product discovery” phase to meet customer needs.
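One lightweight form of progress measurement is forecasting from measured velocity. This sketch assumes the team estimates in story points and uses hypothetical numbers; it returns best, likely, and worst-case sprint counts based on the fastest, mean, and slowest recent sprints.

```python
import math
from statistics import mean

def forecast_sprints(backlog_points: float,
                     recent_velocities: list[float]) -> tuple[int, int, int]:
    """Return (best, likely, worst) sprint counts to finish a backlog.

    best uses the fastest recent sprint, likely the mean velocity, and
    worst the slowest sprint; all assume the backlog estimate is accurate.
    """
    if not recent_velocities or min(recent_velocities) <= 0:
        raise ValueError("need positive velocities from at least one sprint")
    best = math.ceil(backlog_points / max(recent_velocities))
    likely = math.ceil(backlog_points / mean(recent_velocities))
    worst = math.ceil(backlog_points / min(recent_velocities))
    return best, likely, worst

# Hypothetical: 120 points left; the last five sprints delivered these points.
print(forecast_sprints(120, [18, 22, 20, 25, 15]))  # prints (5, 6, 8)
```

Reporting a range rather than a single date makes the uncertainty visible to the business instead of hiding it until a date slips.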
14) We cannot hire great people quickly
Often, when growing an engineering team quickly, engineering managers will push back on hiring plans and state that they cannot possibly find, interview, and hire engineers who meet their high standards. We agree that hiring great people takes time and that hiring decisions are among the most important decisions managers make; a poor hiring decision takes a lot of energy and time to fix. However, there are many ways to streamline the hiring process in order to recruit, interview, and make offers very quickly. One idea we have seen work well is the interview day, where all potential candidates are invited on the same day. It should be no more than 2 to 3 weeks out from the initial phone screen, so holding one interview day per month is a great way to concentrate most of your interviewing into a single day. Because the interview process is optimized, people are much more efficient, and it is much less disruptive to the daily work that needs to get done the rest of the month. Post-interview discussions and hiring decisions should all be made that same day so that candidates get offers or letters of regret quickly; this increases the likelihood of offers being accepted and makes a professional impression on those not receiving offers. The key is to start from the premise that “there is a way to hire great people quickly”; a motivated leadership team will then generate a myriad of ways to make it happen.
15) It is a SPOF (Single Point of Failure) but we can recover it onto another host quickly
A SPOF is a SPOF: even if the customer impact is low, a failure still pulls people away from other work to fix it right away. And there will be a failure, because that is what hardware and software do: they work for a long time and then eventually fail. As you should know by now, it will fail at the most inconvenient time, just after you have repurposed the host you were saving for it or while you are releasing code. Plan for the worst case and run it on at least two hosts (we actually recommend always deploying in pools of three or more hosts) so that when one fails you can fix it when it is most convenient for you.
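Redundancy helps because, assuming host failures are independent, a pool is down only when every host in it is down. A minimal sketch of that calculation, with an illustrative function name:

```python
def pool_availability(host_availability: float, hosts: int) -> float:
    """Availability of a pool that stays up while at least one host is up.

    Assumes host failures are independent; shared power, a shared rack, or
    a bad deploy pushed to every host at once breaks that assumption.
    """
    if hosts < 1:
        raise ValueError("a pool needs at least one host")
    return 1 - (1 - host_availability) ** hosts

for n in (1, 2, 3):
    print(n, f"{pool_availability(0.99, n):.6%}")
```

With hosts that are each 99% available, two hosts give about 99.99% and three about 99.9999%; a pool of three also stays redundant while you repair the host that failed.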
16) No Business Continuity plan
No one expects a disaster, but disasters happen, and if you cannot keep the business operating normally you will lose revenue and customers that you may never get back. Disasters can be huge, like Hurricane Katrina, where it takes weeks or months to relocate and restart the business in a new location. They can also be small, like a winter snowstorm that keeps everyone at home for two days or a HAZMAT spill near your office that keeps employees from coming to work. A solid business continuity plan is thought through ahead of time, before you need it, and explains to everyone how they will operate in an emergency. Perhaps your satellite office will pick up customer questions, or your tech team will open an IRC channel to centralize communication for everyone capable of working remotely. Do you have enough remote connections through your VPN server to allow for remote work? Spend the time now to think through what and how you will operate in the event of a major or minor disruption of your business operations, and document the steps necessary for recovery.
17) No Disaster Recovery Plan
Even worse, in our opinion, than not having a BC plan is not having a disaster recovery plan. If your company is a SaaS company, the site and the services it provides are the company’s sole source of revenue. Moreover, as a SaaS company you hold the data your customers need to operate, so when you are down they are very likely seriously impaired in conducting their own business. When your colocation facility has a power outage that takes you completely down (think of the 365 Main data center outage in San Francisco), how many of your customers will leave and never return? Our preference is to provide your own disaster recovery through multiple colocation facilities, but if that is not yet technically feasible or within budget, at a minimum you need your code, executables, configurations, loads, and data stored offsite, along with an agreement in place for both colocation services and hosts. Many vendors offer such packages, and they should be thought of as necessary business insurance.
18) No Product Management team or person
In a similar vein to #13 above, there needs to be someone or a team of people in the organization who have responsibility for the product lines. They need to have authority to make decisions about what features get added, which get delayed, and which get deprecated (yes, we know, nothing ever gets deprecated but we can always hope!). Ideally these people have ownership of business goals (see #10) so they feel the pressure to make great business decisions.
19) It is okay to bring the site down to roll code
Just because you call it scheduled maintenance does not mean it does not count against your uptime. While some of your customers might be willing to endure the frustration of having the site down when they want to access it in order to get new features, most care much more about the site being available when they want it. They are on the site because the existing features serve some purpose for them; they are not there in the hope that you will roll out a feature they have been waiting on. They might want new features, but they rely on existing features. There are ways to roll code, even with database changes, without bringing the site down. It is important to put these techniques and processes in place so that you plan for 100% availability instead of planning for much less because of planned downtime.
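One common technique for changing the database without downtime is the expand/contract (parallel change) pattern: add new schema alongside the old, migrate data, switch code over, and only later remove the old schema. The sketch below uses SQLite and a hypothetical `users` table gaining an `email_normalized` column; the table, column, and function names are illustrative, and production databases differ in how online `ALTER TABLE` behaves.

```python
import sqlite3

def expand(conn: sqlite3.Connection) -> None:
    # Release N: add the new column as nullable. Code from release N-1 never
    # references it, so old and new app servers can run side by side.
    conn.execute("ALTER TABLE users ADD COLUMN email_normalized TEXT")
    conn.commit()

def backfill(conn: sqlite3.Connection, batch_size: int = 500) -> None:
    # Fill the new column in small batches so no single statement holds
    # long locks on a busy table.
    while True:
        rows = conn.execute(
            "SELECT id, email FROM users "
            "WHERE email_normalized IS NULL AND email IS NOT NULL LIMIT ?",
            (batch_size,),
        ).fetchall()
        if not rows:
            break
        conn.executemany(
            "UPDATE users SET email_normalized = ? WHERE id = ?",
            [(email.strip().lower(), user_id) for user_id, email in rows],
        )
        conn.commit()

def create_user(conn: sqlite3.Connection, email: str) -> None:
    # Release N+1 writes both columns, so rows created during the rollout
    # never need another backfill.
    conn.execute(
        "INSERT INTO users (email, email_normalized) VALUES (?, ?)",
        (email, email.strip().lower()),
    )
    conn.commit()

# The "contract" step (dropping or ignoring the old column) ships in a later
# release, only after every server runs code that reads the new column.
```

Each step is backward compatible with the code already running, which is what allows the rollout to proceed host by host with the site up.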
20) Firewalls, Firewalls, Everywhere!
We often see technology teams that have put all public-facing services behind firewalls, and many go so far as to put firewalls between every tier of the application. Security is important because there are always people trying to do malicious things to your site, whether through directed attacks or random scripts port scanning it. However, security needs to be balanced against the increased cost and the degradation in performance. In our experience, tech teams too often throw up firewalls instead of doing the real analysis to determine how they can mitigate risk in other ways, such as through ACLs and LAN segmentation. As the CTO, you ultimately have to decide which balance of risks and benefits is best for your site.
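To illustrate the ACL alternative, here is a sketch of first-match rule evaluation with an implicit final deny, the model many router ACLs follow. The rule format, subnet, and port (5432, PostgreSQL's default) are hypothetical; real ACLs are configured in your network equipment, not in application code.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional

@dataclass(frozen=True)
class Rule:
    action: str           # "permit" or "deny"
    source: str           # source CIDR block, e.g. "10.0.2.0/24"
    port: Optional[int]   # destination port; None matches any port

def evaluate(rules: list[Rule], src_ip: str, port: int) -> str:
    """Return the action of the first matching rule, or "deny" if none match."""
    addr = ip_address(src_ip)
    for rule in rules:
        if addr in ip_network(rule.source) and rule.port in (None, port):
            return rule.action
    return "deny"  # implicit deny when nothing matches

# Hypothetical database-tier segment: only the app-tier subnet may reach the
# database port; everything else falls through to the implicit deny.
DB_TIER_ACL = [
    Rule("permit", "10.0.2.0/24", 5432),
]

print(evaluate(DB_TIER_ACL, "10.0.2.15", 5432))    # permit
print(evaluate(DB_TIER_ACL, "192.168.1.5", 5432))  # deny
```

A short rule list like this on the switch or router between tiers often gives most of the segmentation benefit without the cost and latency of another firewall hop.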
Whatever you do, don’t make the mistakes above! AKF Partners helps companies avoid costly product and technology mistakes - and we’ve seen most of them. Give us a call or shoot us an email. We’d love to help you achieve the success you desire.