March 15, 2019 | Posted By: Marty Abbott
I’m no Nostradamus when it comes to predicting the future of technology, but some trends are just too blatantly obvious to ignore. Unfortunately, they are only easy to spot if you have a job where you are allowed (I might argue required) to observe broader industry trends. AKF Partners must do that on behalf of our clients as our clients are just too busy fighting the day-to-day battles of their individual businesses.
One such very concerning probability is the eventual decline – and one day potentially the elimination of – the colocation (hosting) business. Make no mistake about it – if you lease space from a colocation provider, the probability is high that your business will need to move locations, move providers, or experience a service disruption soon.
Let’s walk through the factors and trends that indicate, at least to me, that the industry is in trouble, and that your business faces considerable risks:
Sources of Demand for Colocation (Macro)
Broadly speaking, the colocation industry was built on the backs of young companies needing to lease space for compute, storage, and the like. As time progressed, more established companies started to augment privately-owned data centers with colocation facilities to avoid the burden of large assets (buildings, capital improvements and in some cases even servers) on their balance sheets.
The first source of demand, small companies, has largely dried up for colocation facilities. Small companies seek to be “asset light” and most frequently start their businesses running on Infrastructure as a Service (IaaS) providers (AWS, GCP, Azure etc.). The ease and flexibility of these providers enable faster time to market and easier operational configuration of systems. Platform as a Service (PaaS) offerings in many cases eliminate the need for specialized infrastructure and DevOps skill sets, allowing small companies to focus limited funds on software engineers that will help create differentiating experiences and capabilities. Five years ago, successful startups may have started migrating into colocation facilities to lower costs of goods sold (COGS) for their products, and in so doing increase gross margin (GM). While this is still an opportunity for many successful companies, few seem to take advantage of it. Whether due to vendor lock-in through PaaS services, or a preference for speed and flexibility over expenses, the companies tend to stay with their IaaS provider.
Larger, more established companies continue to use colocation facilities to augment privately-owned data centers. That said, in most cases technology refresh results in faster and more efficient compute. When the compute capacity gained through refresh grows faster than transactions and revenue within these companies, they start to collapse the infrastructure assets back into wholly-owned facilities (assuming power, space, and cooling of the facilities are not constraints). Bringing assets back in-house to owned facilities lowers costs of goods sold as the company makes more efficient use of existing assets.
Simultaneously these larger firms also seek the flexibility and elasticity of IaaS services. Where they have new demand for new solutions, or as companies embark upon a digital transformation strategy, they often do so leveraging IaaS.
The net effect of these forces, across the spectrum of small to large firms, is reduced overall demand. Reduced demand means a contraction in the colocation industry overall.
Minimum Efficient Scale and the Colocation Industry (Micro)
Data centers are essentially factories. To achieve optimum profitability, fixed costs such as the facility itself, and the associated taxes, must be spread across the largest possible units of production. In the case of data centers, this means achieving maximum utilization of the constraining factors (space, power, and cooling capacity) across the largest possible revenue base. Maximizing utilization against the aforementioned constraints drops the LRAC (long run average cost) as fixed costs are spread across a larger number of paying customers. This is the notion of Minimum Efficient Scale in economics.
As demand decreases, on a per data center (colocation facility) basis, the fixed cost per customer increases. This is because less space is used, and the cost of the facility is allocated across fewer customers. At some point, on a per data center basis the facility becomes unprofitable. As profits dwindle across the enterprise, and as debt service on the facilities becomes more difficult, the colocation provider is forced to shut down data centers and consolidate customers. Assets are sold or leases terminated with the appropriate termination penalties.
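To make the arithmetic concrete, here is a minimal sketch of how fixed cost per customer rises as a facility empties. Every figure below (monthly fixed cost, capacity, price) is a hypothetical assumption chosen for round numbers, not data from any provider, and variable costs are ignored for simplicity.

```python
# Hypothetical figures for illustration only; none come from a real facility.
MONTHLY_FIXED_COST = 500_000   # lease, taxes, debt service (assumed)
CAPACITY_CUSTOMERS = 200       # customers at full utilization (assumed)
PRICE_PER_CUSTOMER = 4_000     # monthly revenue per customer (assumed)


def fixed_cost_per_customer(customers: int) -> float:
    """Spread the facility's fixed cost evenly across paying customers."""
    if customers <= 0:
        raise ValueError("need at least one paying customer")
    return MONTHLY_FIXED_COST / customers


def break_even_customers() -> int:
    """Smallest customer count at which revenue covers the fixed cost."""
    # Integer ceiling division: ceil(fixed / price).
    return -(-MONTHLY_FIXED_COST // PRICE_PER_CUSTOMER)


if __name__ == "__main__":
    for customers in (CAPACITY_CUSTOMERS, 150, 125, 100):
        cost = fixed_cost_per_customer(customers)
        margin = PRICE_PER_CUSTOMER - cost
        print(f"{customers:>3} customers: fixed cost each {cost:9.2f}, margin each {margin:9.2f}")
```

Under these assumed numbers the facility breaks even at 125 customers; below that, each departure pushes the remaining customers' share of fixed cost above the price they pay.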
Customers who wish to remain with a provider are forced to relocate. This in turn causes customers to reconsider colocation facilities, and somewhere between a handful and a majority on a per location basis will decide to move to IaaS instead. Thus begins a vicious cycle of data center shutdowns engendering ever-decreasing demand for colocation facilities.
Excluding other macroeconomic or secular events like another real estate collapse, smaller providers start to exit the colocation service industry. Larger providers benefit from the exit of smaller players and the remaining data centers benefit from increased demand on a dwindling supply, allowing those providers to regain MES and profitability.
Does the Trend Stop at a Smaller Industry?
We are likely to continue to see the colocation industry exist for quite some time – but it will become progressively smaller. The consolidation of providers and dwindling supply of facilities will stop at some point, but just for a period. Those that remain in colocation facilities will lack either the means or the will to move. In some cases, a lack of skills within the remaining companies will keep them “locked into” a colocation. In other cases, competing priorities will keep an exit on the distant horizon. These “lock in” factors will give rise to an opportunity for the colocation industry to increase pricing for a time.
But make no mistake about it, customers will continue to leave – just at a decreased rate relative to today’s departures. Some companies will simply go out of business or contract in size and depart the data centers. Others will finally decide that the increasing cost of service is too high.
While it’s doubtful that the industry will go away in its entirety, it will be small and comparatively expensive. The difference between costs of colocation and costs to run in an IaaS solution will start to dwindle.
Risks to Your Firm
The risk to your firm comes in three forms, listed in increasing order of risk as measured by a function of probability of occurrence and impact upon occurrence:
- Pricing of service per facility. If you are lucky enough that your facility does not close, there is a high probability that your cost for service will increase. This in turn increases your cost of goods sold and decreases your gross margin.
- Risk of facility dissolution. There exists an increasingly high probability that the facilities in which you are located will be shut down. While you are likely to be given some advance notice, you will be required to move to another facility with the same provider, or another provider. There is both a real cost in the move, and an opportunity cost associated with service interruption and effort.
- Risk of firm as a going concern. Some providers of colocation services will simply exit the business. In some cases, you may be given very little notice as in the case of a company filing bankruptcy. Service interruption risk is high.
Strategies You Must Employ Today
In our view, you have no choice but to ensure that you are ready and able to easily move out of colocation facilities. Whether this be to existing data centers you own, IaaS providers, or a mix matters not. At the very least, we suggest your development and operations processes enable the following principles:
- Environment Agnosticism: Ensure that you can run in owned, leased, managed service, or IaaS locations. Ensuring consistency in deployment platforms, using container technologies and employing orchestration systems all aid in this endeavor.
- Hybrid Hosting: Operate out of at least two of the following three options as a course of normal business operations: owned data centers, leased/colocation facilities, IaaS.
- Dynamic Allocation of Demand: Prove on at least a weekly basis that you can operate any functionality within your product out of any location you operate – especially those that happen to be located within colocation facilities.
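One way to exercise the Dynamic Allocation of Demand principle is a routine drill that drains one location and confirms the others absorb its share. The sketch below is a simplified illustration, not a production traffic manager: the location names and weights are hypothetical, and a real system would apply the weights in a load balancer or DNS layer rather than in application code.

```python
import random


def pick_location(weights, rng):
    """Choose a hosting location for one request according to traffic weights."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


def drain(weights, location):
    """Return new weights with `location` at zero and its share spread
    proportionally across the remaining locations."""
    remaining = {n: w for n, w in weights.items() if n != location}
    total = sum(remaining.values())
    if total <= 0:
        raise ValueError("no other location has capacity to absorb the traffic")
    new_weights = {n: w / total for n, w in remaining.items()}
    new_weights[location] = 0.0
    return new_weights


if __name__ == "__main__":
    # Hypothetical split across the three hosting options.
    weights = {"owned": 0.5, "colo": 0.3, "iaas": 0.2}
    drill = drain(weights, "colo")  # weekly drill: run with the colo site drained
    rng = random.Random(0)
    sample = [pick_location(drill, rng) for _ in range(10)]
    print(drill, sample)
```

If the drill runs on a schedule, an unplanned facility closure becomes a weight change that has already been rehearsed rather than an emergency migration.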
AKF Partners helps companies think through technology, process, organization, location, and hosting strategies. Let us help you architect a hybrid hosting solution that limits your risk to any single provider.
March 6, 2019 | Posted By: Marty Abbott
Our typical assessment goes something like this: We spend 1.75 days with an energized product team comprised of engineers and product managers. We feel the passion and engagement of the team, and we see the signs of stress the team endures in trying to meet product delivery schedules. Then we meet the security person. The person is not very stressed, does not have delivery goals and seems to steal the energy from the room.
This is an angry post. I won’t apologize for that. I’m fed up with the ridiculous way that most CISOs approach security, and you should be too. The typical approach, in more than 80% of the companies with which we work, results in slow time to market, increased response time for transactions, higher than necessary cost, lower than appropriate availability, and no demonstrable difference in the level of security related incidents. Put another way, most CISOs reduce rather than increase shareholder value.
Here is a handy tool to identify value-destroying CISOs. We’ve compiled 5 common statements uttered by CISOs out of touch with the needs of the corporation, customers and shareholders. Each of these assumes that the CISO both believes the statement and acts consistently with the statement (which is highly probable). Each statement is followed by why it is bad, what it is costing you, and what (besides replacing the person) you should do.
“No, we can’t do that”
Wrong answer. The purpose of security is to help move the company towards the right business outcome, as quickly as possible, with the right level of risk for the endeavor. This means that in some cases, where the probability and impact of compromise are low, we simply do not apply much “security” to the solution. In other cases, where probability and impact are high, we put measures in place to reduce probability and impact.
We never say “No” to an ethical outcome. Rather than saying “No” to a path, we attempt to ensure the path includes the right level of risk adjustment to make it successful.
The right answer: “That may work if we make a few modifications to help reduce the following probability of an incident, and reduce the impact of an incident should it occur. Here’s how my team can help you”.
“My job is to keep us out of the paper”
Incomplete, and as a result, incorrect answer. The role of security is to ethically maximize profits, by ensuring that risk is commensurate with the endeavor. A great security team helps decrease the probability of incidents and decrease the impact of an incident should one arise. This in turn helps ensure that profits achieve an appropriate level. “Keeping us out of the paper” makes no reference to the fiduciary responsibility of providing returns to shareholders. It’s further not a path to that responsibility, as there is no tie to enabling or maintaining profitability. Hell, if you want to achieve this goal, all you have to do is go out of business!
The right answer: “My job is to ensure an appropriate risk approach to stakeholder return – specifically through helping us to achieve an appropriate risk posture for our initiatives that meets our time to market, revenue and profitability objectives”.
“We have to work the process” or “Put in a request – we’ll review it”
Wrong answer. Security isn’t a “team” in and of itself because it can’t “score” and “win”. Security is part of a larger team responsible for delighting end customers such that we can ensure appropriate profitability through superior and appropriately secure offerings. To that end, security needs to adopt an agile mindset – specifically “individual interactions over processes and tools” and be embedded within the value creation teams that are the lifeblood of a company. Further, product and operational teams need to “own” and be accountable for the risk associated with the solutions they create and maintain. Software and servers need to be secure consistent with the needs of the business and end users.
The right answer: “Let’s get together immediately and make this work. Furthermore, how could I have ensured we had folks involved earlier to help you get this out faster?”
“My job is governance” or “We need the right governance model”
Wrong answer. The implication of the above statement is that the value the security team provides is in judging the work of others and ensuring compliance. The best security teams understand that compliance is best achieved through embedding themselves within product teams – not sitting in judgement of them during the process. The fastest and highest value creating teams are those that understand and have the right tools to accomplish the necessary outcomes embedded within their teams (read the related white paper here).
The right answer: “We embed people in teams to get the right answer quickly and get the product to market faster. Good governance happens in real time and in many cases can be automated within the product development lifecycle (CI/CD pipelines, for instance).”
“We have to slow things down a bit”
Wrong answer. If you have a compelling growth business with a big vision, you are going to attract competitors. If you have competitors, getting the right solution to market quickly is critical to your success. No one “wins” by playing only defense or by being just “careful”. You win by making the best risk-measured decisions quickly and releasing a good enough product before your competitors.
The right answer: “We have to figure out how to make the right decisions early without slowing down our delivery.”
Another way to determine if you have the “right” security team and correct security leader is to evaluate the number of security related engineers embedded within teams relative to the number of people evaluating approaches or “governing”. If the number of “governing” employees exceeds the number of embedded employees, you have a problem. Ideally, we want a very small number of “brakes” (governance) and more security product “gas pedals” (embedded). The latter results in better decisions and better product security in real time. The former results in delay, overhead, and an ivory tower.
We perform dozens of security assessments and technical due diligence reviews every year. Contact us and let us help!
February 22, 2019 | Posted By: Greg Fennewald
On multiple occasions over the years, we have heard our clients state a use case they want to avoid in product design sessions or as a reason for architectural choices made for existing products. These use cases can be given more credence than they deserve based on objective data – they become boogeyman legends, edge cases that can result in poor architectural choices.
One of our clients was debating the benefit of multiple live sites with customers pinned to the nearest site to minimize latency. The availability benefits of multiple live sites are irrefutable, but the customer experience benefit of less latency was questioned. This client had millions of clients spread across the country. The notion of pinning a client to a “home” site nearest them raised the question of “what happens when the client travels across the country?” The answer is to direct them to that same home site. That client will experience more latency for the duration of the visit. The proportion of clients that spend 50% of their time on each coast is vanishingly small – keep it simple. Have a workaround for clients that permanently move to a location served by a different site – client data resides in more than one location for DR purposes anyway, right?
This client also had hundreds of service specialists that would at times access client accounts and take actions on their behalf, and these service specialists were located near the west coast. Objections were made based on the latency a west coast service specialist would encounter when acting on the behalf of an east coast client whose data was hosted near the east coast. Millions of clients. Hundreds of service specialists. The math is not hard. The needs of the many outweigh the needs of the few.
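The home-site pinning discussed for this client can be sketched roughly as follows. This is an illustration under stated assumptions, not the client's implementation: the two site names and their coordinates are hypothetical, and "nearest" is approximated with great-circle distance rather than measured network latency.

```python
import math

# Hypothetical live sites as (latitude, longitude); not real facility locations.
SITES = {"east": (39.0, -77.5), "west": (45.6, -121.2)}


def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearest_site(location):
    """Name of the site closest to the given (lat, lon)."""
    return min(SITES, key=lambda name: haversine_km(SITES[name], location))


class HomeSiteRouter:
    """Pins each client to one home site chosen at registration."""

    def __init__(self):
        self.home = {}

    def register(self, client_id, location):
        self.home[client_id] = nearest_site(location)

    def route(self, client_id, current_location):
        # current_location is deliberately ignored: a traveling client
        # still goes to the home site and accepts the extra latency.
        return self.home[client_id]

    def rehome(self, client_id, new_location):
        """Workaround for a client who permanently moves."""
        self.home[client_id] = nearest_site(new_location)
```

Routing stays a single lookup for the many, and the rare permanent move is handled by an explicit `rehome` step instead of shaping the whole architecture around it.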
A different client had a concern about data consistency upon new user registration for their service. To ensure a new customer could immediately transact, the team decided to deploy a single authentication server to preclude the possibility of a transaction following registration hitting an authentication server that had not yet received the registration data. Intentionally deploying a SPOF should have raised immediate objections but did not. The team deployed a passive backup server that required manual intervention to work.
The new user process flow was later revealed to be less than 3% of the overall transactions. 97% of the transactions suffered an impactful outage along with the 3% new users when the SPOF authentication server failed. Designing a workaround for the new users while employing a write master with multiple, load balanced read only slaves would provide far better availability. The needs of the many outweigh the needs of the few.
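A minimal in-memory sketch of that alternative follows. It uses "primary" and "replicas" for the write master and read-only slaves, and it assumes a simple workaround for the new-user edge case: on a replica miss, fall back to the primary. The class and its methods are hypothetical stand-ins for a real replicated data store.

```python
import itertools


class AuthStore:
    """One write primary plus round-robin read replicas (illustrative only)."""

    def __init__(self, replica_count):
        if replica_count < 1:
            raise ValueError("need at least one read replica")
        self.primary = {}
        self.primary_available = True
        self.replicas = [{} for _ in range(replica_count)]
        self._next = itertools.cycle(range(replica_count))

    def register(self, user, credential):
        """Writes go only to the primary; replication happens later."""
        if not self.primary_available:
            raise RuntimeError("primary is down: registrations unavailable")
        self.primary[user] = credential

    def replicate(self):
        """Stand-in for asynchronous replication to every replica."""
        for replica in self.replicas:
            replica.update(self.primary)

    def lookup(self, user):
        """Serve reads from replicas; fall back to the primary on a miss."""
        replica = self.replicas[next(self._next)]
        if user in replica:
            return replica[user]
        if self.primary_available:
            # New-user workaround: the registration may not have replicated yet.
            return self.primary.get(user)
        return None
```

In this sketch a primary outage blocks only registrations and not-yet-replicated users, while already replicated users keep authenticating from the replicas.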
It is important to remain open minded during early design sessions. It is also important to follow architectural principles in the face of such use cases. How can one balance potentially conflicting concepts?
• Ask questions best answered with objective data.
• Strive for simplicity; shave with Occam’s Razor.
• Validate whether the edge case is a deal breaker for the product owner.
• Propose a workaround that addresses the edge case while optimizing the architecture for the majority use case and sound principles.
Catering to the needs of the business while adhering to architectural standards is a delicate balancing act and compromises will be made. Everyone looks at the technologist when a product encounters a failure. Know when to hold the line on sound architectural principles that safeguard product availability and user experience. The product owner must understand and acknowledge the architectural risks resulting from product design decisions. The technologist must communicate these risks to the product owner along with objective data and options. A failure to communicate effectively can lead to the tail wagging the dog – do not let that happen.
With 12 years of product architecture and strategy experience, AKF Partners is uniquely positioned to be your technology partner. Learn more here.
December 6, 2018 | Posted By: AKF
The United States Special Operations Command (SOCOM) has 5 truths that they live by. All of them have guided SOCOM to be the premier fighting force in the world and they all center around people:
-Humans are more important than hardware
Never prioritize your equipment over people. Hardware is always easily replaced. The same is true in technology. Buying a new server is easy. Replacing good engineers is not.
-Quality is better than quantity
Anytime you have the choice between one fully capable person and two okay people, always choose quality over quantity. It gives you the ability to still get the job done and provides more headroom for future hires.
-Special Operations Forces cannot be mass produced
Just like SOCOM operators, engineers cannot be made in a factory. Yes, there are boot camps that help to fill the void in engineering roles, but even those graduates still require time to get up to speed on what it is your organization does. No two companies are the same.
-Competent Special Operations Forces cannot be created after emergencies occur
If you hire engineers based upon a disaster, they will be ready for tomorrow’s disaster, not today’s. Always be prepared.
-Most special operations require non-SOF assistance
Engineers can’t work alone. If QA, Security, Marketing, etc. are not present and supporting, then the work that is produced will not be what is required.
It takes an entire team beyond just engineers to get the job done.
People are what make organizations great, nothing else. You may be known for a product, but it is your people that developed it and keep it running.
Seed, Feed and Weed
AKF has a principle of Seed, Feed and Weed. Broken down into each component:
Every day new technology and talent is being produced. Additionally, skilled individuals are always looking for their next challenge, particularly if they currently work in an unfavorable company. Some companies use entities to find and evaluate new talent to see if they would be a good fit. Others rely heavily upon their own brand to attract talent. And still others network throughout the community to make sure they are constantly aware of locally available skill.
Given that distance from the workplace matters less and less for employees every day, using all three can be integral to staying viable. If using an outside entity to evaluate new talent isn’t feasible, make sure that you have a solid method in place for determining not only people’s skill, but their fit into your culture. If you don’t have a strong name to fall back on, make your software the differentiator. You may not be the biggest mass producer of widgets in the world, but maybe you are the only one who can make them blue. Gravitate towards that for attracting talent. And lastly, if you can’t network, take some tips from marketing. Figure out how to get out there and sell yourself and your company. If you can’t talk excitedly about what you do with someone you meet, then they won’t be excited to work for you.
There are different facets to Feed. An important aspect of Feed is feedback. Continually communicating with your employees about how they are doing (and not just about what went wrong) gives them a sense of direction for where you think they should be going. It also lets them know that you pay attention and value their work. Beyond feedback, there is also training. Managers need to be aware of the required training that someone needs to do their job. If they are managing your databases and have zero database knowledge, that is an issue. But employees need more than just required training. They need something that stimulates them. Leaders ensure that employees grow beyond what their role defines them as. If they want to take Underwater Basket Weaving, well, encourage it. It may not always benefit the company, but if it benefits the employee it usually has a way of having positive returns for everyone.
Last, and certainly not least, is Weed. No one likes to be the bad guy. Firing people is tough, but getting rid of people with bad behaviors or repeatedly poor results is part of our duty as managers and executives. Have you tried to make them better? Maybe they were just in the wrong position or didn’t have the right training. Or worse, they just weren’t a good fit for the company. If someone has bad behavior it can be very difficult to remedy. If someone is lacking experience, then it just takes training and mentorship. Nine times out of ten, a lack of experience can be fixed. Nine times out of ten, it is better for everyone to part ways over behavioral issues.
The key to a good workplace is identifying which is most important to you at any given time as sometimes it can shift. Which of the three to apply more focus on depends on the condition of your team.
So why are people so important? Unless you have already developed Artificial Intelligence, they are the ones doing the work. The Special Operations Community is aware that money can be used to purchase new equipment but the right people take time and nurturing. The same applies to your business. If availability and scalability are your concerns, money can get you all the rack space or cloud storage you need. But only time and effort can get you the right people to manage it.
December 4, 2018 | Posted By: Marty Abbott
During the last 12 years, many prospective clients have asked us some variation of the following questions: “What makes you different?”, “Why should we consider hiring you?”, or “How are you differentiated as a firm?”.
The answer has many components. Sometimes our answers are clear indications that we are NOT the right firm for you. Here are the reasons you should, or should not, hire AKF Partners:
Operators and Executives – Not Consultants
Most technology consulting firms are largely comprised of employees who have only been consultants or have only run consulting companies. We’ve been in your shoes as engineers, managers and executives. We make decisions and provide advice based on practical experience with living with the decisions we’ve made in the past.
Engineers – Not Technicians
Educational institutions haven’t graduated enough engineers to keep up with demand within the United States for at least forty years. To make up for the delta between supply and demand, technical training services have sprung up throughout the US to teach people technical skills in a handful of weeks or months. These technicians understand how to put building blocks together, but they are not especially skilled in how to architect highly available, low latency, low cost to develop and operate solutions.
The largest technology consulting companies are built around programs that hire employees with non-technical college degrees. These companies then teach these employees internally using “boot camps” – creating their own technicians.
Our company is comprised almost entirely of “engineers”; employees with highly technical backgrounds who understand both how and why the “building blocks” work as well as how to put those blocks together.
Product – Not “IT”
Most technology consulting firms are comprised of consultants who have a deep understanding of employee-facing “Information Technology” solutions. These companies are great at helping you implement packaged software solutions or SaaS solutions such as Enterprise Resource Management systems, Customer Relationship Management Systems and the like. Put bluntly, these companies help you with solutions that you see as a cost center in your business. While we’ve helped some partners who refuse to use anyone else with these systems, it’s not our focus and not where we consider ourselves to be differentiated.
Very few firms have experience building complex product (revenue generating) services and platforms online. Products (not IT) represent nearly all of AKF’s work and most of AKF’s collective experience as engineers, managers and executives within companies. If you want back-office IT consulting help focused on employee productivity there are likely better firms with which you can work. If you are building a product, you do not want to hire the firms that specialize in back office IT work.
Business First – Not Technology First
Products only exist to further the needs of customers and, through that relationship, further the needs of the business. We take a business-first approach in all our engagements, seeking to answer the questions of: Can we help find a way to build it faster, better, or cheaper? Can we find ways to make it respond to customers faster, be more highly available or be more scalable? We are technology agnostic and believe that of the several “right” solutions for a company, a small handful will emerge displaying comparatively low cost, fast time to market, appropriate availability, scalability, appropriate quality, and low cost of operations.
Cure the Disease – Don’t Just Treat the Symptoms
Most consulting firms will gladly help you with your technology needs but stop short of solving the underlying causes creating your needs: the skill, focus, processes, or organizational construction of your product team. The reason for this is obvious: most consulting companies are betting that if the causes aren’t fixed, you will need them back again in the future.
At AKF Partners, we approach things differently. We believe that we have failed if we haven’t helped you solve the reasons why you called us in the first place. To that end, we try to find the source of any problem you may have. Whether that be missing skillsets, the need for additional leadership, organization related work impediments, or processes that stand in the way of your success – we will bring these causes to your attention in a clear and concise manner. Moreover, we will help you understand how to fix them. If necessary, we will stay until they are fixed.
We recognize that in taking the above approach, you may not need us back. Our hope is that you will instead refer us to other clients in the future.
Are We “Right” for You?
That’s a question for you, not for us, to answer. We don’t employ sales people who help “close deals” or “shape demand”. We won’t pressure you into making a decision or hound you with multiple calls. We want to work with clients who “want” us to partner with them – partners with whom we can join forces to create an even better product solution.
November 20, 2018 | Posted By: AKF
“Quality in a service or product is not what you put into it. It’s what the customer gets out of it.” Peter Drucker
The Importance of QA
High levels of quality are essential to achieving company business objectives. Quality can be a competitive advantage and in many cases will be table stakes for success. High quality is not just an added value, it is an essential basic requirement. With high market competition, quality has become the market differentiator for almost all products and services.
There are many methods followed by organizations to achieve and maintain the required level of quality. So, let’s review how world-class product organizations make the most out of their QA roles. But first, let’s define QA.
According to Wikipedia, quality assurance is “a way of preventing mistakes or defects in products and avoiding problems when delivering solutions or services to customers.” But there’s much more to quality assurance.
There are numerous benefits of having a QA team in place:
- Helps increase productivity while decreasing costs (QA headcount typically costs less)
- Effective for saving costs by detecting and fixing issues and flaws before they reach the client
- Shifts focus from detecting issues to issue prevention
Teams and organizations looking to get serious about (or to further improve) their software testing efforts can learn something from looking at how the industry leaders organize their testing and quality assurance activities. It stands to reason that companies such as Google, Microsoft, and Amazon would not be as successful as they are without paying proper attention to the quality of the products they’re releasing into the world. Taking a look at these software giants reveals that there is no one single recipe for success. Here is how five of the world’s best-known product companies organize their QA and what we can learn from them.
Google: Searching for best practices
How does the company responsible for the world’s most widely used search engine organize its testing efforts? It depends on the product. The team responsible for the Google search engine, for example, maintains a large and rigorous testing framework. Since search is Google’s core business, the team wants to make sure that it keeps delivering the highest possible quality, and that it doesn’t screw it up.
To that end, Google employs a four-stage testing process for changes to the search engine, consisting of:
- Testing by dedicated, internal testers (Google employees)
- Further testing on a crowdtesting platform
- “Dogfooding,” which involves having Google employees use the product in their daily work
- Beta testing, which involves releasing the product to a small group of Google product end users
Even though this seems like a solid testing process, there is room for improvement, if only because communication between the different stages and the people responsible for them is suboptimal (leading to things being tested either twice over or not at all).
But the teams responsible for Google products that are further away from the company’s core business employ a much less strict QA process. In some cases, the only testing is done by the developer responsible for a specific product, with no dedicated testers providing a safety net.
In any case, Google takes testing very seriously. In fact, testers’ and developers’ salaries are equal, something you don’t see very often in the industry.
Facebook: Developer-driven testing
Like Google, Facebook uses dogfooding to make sure its software is usable. Furthermore, it is somewhat notorious for shaming developers who mess things up (breaking a build or causing the site to go down by accident, for example) by posting a picture of the culprit wearing a clown nose on an internal Facebook group. No one wants to be seen on the wall-of-shame!
Facebook recognizes that there are significant flaws in its testing process, but rather than going to great lengths to improve, it simply accepts the flaws, since, as they say, “social media is nonessential.” Also, focusing less on testing means that more resources are available to focus on other, more valuable things.
Rather than testing its software through and through, Facebook tends to use “canary” releases and an incremental rollout strategy to test fixes, updates, and new features in production. For example, a new feature might first be made available only to a small percentage of the total number of users.
Canary Incremental Rollout
By tracking the usage of the feature and the feedback received, the company decides either to increase the rollout or to disable the feature, either improving it or discarding it altogether.
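The incremental rollout described above can be sketched in a few lines. This is only an illustration of the general technique, not Facebook’s actual implementation: the function names, the hashing scheme, the doubling step, and the 1% error threshold are all assumptions made for the example.

```python
import hashlib


def in_rollout(user_id: str, feature: str, percent: float) -> bool:
    """Return True if this user falls inside the rollout percentage.

    Hashing the feature name together with the user ID gives each user a
    stable bucket, so the same user keeps seeing (or not seeing) the
    feature as the percentage grows. (Hypothetical scheme.)
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 10_000  # stable bucket in [0, 9999]
    return bucket < percent * 100


def next_rollout_percent(current_percent: float, error_rate: float,
                         max_error_rate: float = 0.01) -> float:
    """Decide the next rollout step from observed feedback.

    Assumed policy: disable the feature (0%) if the error rate is too
    high, otherwise double the exposure up to 100%.
    """
    if error_rate > max_error_rate:
        return 0.0
    return min(100.0, current_percent * 2)
```

Because the bucket is derived from the user ID rather than chosen at random on each request, raising the percentage only ever adds users; nobody who already has the feature loses it until the rollout is disabled.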
Amazon: Deployment comes first
Like Facebook, Amazon does not have a large QA infrastructure in place. It has even been suggested (at least in the past) that Amazon does not value the QA profession. Its ratio of about one test engineer to every seven developers also suggests that testing is not considered an essential activity at Amazon.
The company itself, though, takes a different view of this. To Amazon, the ratio of testers to developers is an output variable, not an input variable. In other words, as soon as it notices that revenue is decreasing or customers are moving away due to anomalies on the website, Amazon increases its testing efforts.
The feeling at Amazon is that its development and deployment processes are so mature (the company famously deploys software every 11.6 seconds!) that there is no need for elaborate and extensive testing efforts. It is all about making software easy to deploy, and, equally if not more important, easy to roll back in case of a failure.
Spotify: Squads, tribes and chapters
Spotify does employ dedicated testers. They are part of cross-functional teams, each with a specific mission. At Spotify, employees are organized according to what’s become known as the Spotify model, constructed of:
- Squads. A squad is basically the Spotify take on a Scrum team, with less focus on practices and more on principles. A Spotify dictum says, “Rules are a good start, but break them when needed.” Some squads might have one or more testers, and others might have no testers at all, depending on the mission.
- Tribes. Tribes are groups of squads that belong together based on their business domain. Any tester that’s part of a squad automatically belongs to the overarching tribe of that squad.
- Chapters. Across different squads and tribes, Spotify also uses chapters to group people that have the same skillset, in order to promote learning and sharing experiences. For example, all testers from different squads are grouped together in a testing chapter.
- Guilds. Finally, there is the concept of a guild. A guild is a community of members with shared interests: a group of people across the organization who want to share knowledge, tools, code and practices.
Spotify Team Structure
Testing at Spotify is taken very seriously. Just like programming, testing is considered a creative process, and something that cannot be (fully) automated. Contrary to most other companies mentioned, Spotify heavily relies on dedicated testers that explore and evaluate the product, instead of trying to automate as much as possible. One final fact: In order to minimize the efforts and costs associated with spinning up and maintaining test environments, Spotify does a lot of testing in its production environment.
Microsoft: Engineers and testers are one
Microsoft’s ratio of testers to developers is currently around 2:3, and like Google, Microsoft pays testers and developers equally—except they aren’t called testers; they’re software development engineers in test (or SDETs).
The high ratio of testers to developers at Microsoft is explained by the fact that a very large chunk of the company’s revenue comes from shippable products that are installed on client computers & desktops, rather than websites and online services. Since it’s much harder (or at least much more annoying) to update these products in case of bugs or new features, Microsoft invests a lot of time, effort, and money in making sure that the quality of its products is of a high standard before shipping.
What you can learn from world-class product organizations
If the culture, views, and processes around testing and QA can vary so greatly at five of the biggest tech companies, then it may be true that there is no one right way of organizing testing efforts. All five have crafted their testing processes, choosing what fits best for them, and all five are highly successful. They must be doing something right, right?
Still, there are a few takeaways that can be derived from the stories above to apply to your testing strategy:
- There’s a “testing responsibility spectrum,” ranging from “We have dedicated testers that are primarily responsible for executing tests” to “Everybody is responsible for performing testing activities.” You should choose the one that best fits the skillset of your team.
- There is also a “testing importance spectrum,” ranging from “Nothing goes to production untested” to “We put everything in production, and then we test there, if at all.” Where your product and organization belong on this spectrum depends on the risks that will come with failure and how easy it is for you to roll back and fix problems when they emerge.
- Test automation has a significant presence in all five companies. The extent to which it is implemented differs, but all five employ tools to optimize their testing efforts. You probably should too.
Bottom line, QA is relevant and critical to the success of your product strategy. If you’ve tried to implement a new QA process but failed, we can help.
November 20, 2018 | Posted By: Roger Andelin
Diagnosing the cause of poor performance from your engineering team is difficult and can be costly for the organization if done incorrectly. Most everyone will agree that a high performing team is more desirable than a low performing team. However, there is rarely agreement as to why teams are not performing well and how to help them improve performance. For example, your CFO may believe the team does not have good project management and that more project management will improve the team’s performance. Alternatively, the CEO may believe engineers are not working hard enough because they arrive to the office late. The CMO may believe the team is simply bad and everyone needs to be replaced.
Oftentimes, your CTO may not even know the root causes of poor performance or even recognize there is a performance problem until peers begin to complain. However, there are steps an organization can take to uncover the root cause of poor performance quickly, present those findings to stakeholders for greater understanding, and take steps that will properly remove the impediments to higher performance. Those steps may include some of the solutions suggested by others, but without a complete understanding of the problem, performance will not improve and incorrect remedies will often make the situation worse. In other words, adding more project management does not always solve a problem with on-time delivery, but it will add more cost and overhead. Requiring engineers to start each day at 8 AM sharp may give the appearance that work is getting done, but it may not directly improve velocity. Firing good engineers who face legitimate challenges to their performance may do irreversible harm to the organization. For instance, it may appear arbitrary to others and create more fear in the department, resulting in unwanted attrition. Taking improper action will make things worse rather than improve the situation.
How can you know what action to take to fix an engineering performance problem? The first step in that process is to correctly define and agree upon what good performance looks like. Good performance is comprised of two factors: velocity and value.
Velocity is defined as the speed at which the team works and value is defined as achievement of business goals. Velocity is measured in story points which represent the amount of work completed. Value is measured in business terms such as revenue, customer satisfaction or conversion. High performing engineering teams work quickly and their work has a measurable impact on business goals. High performing teams put as much focus on delivering a timely release as they do on delivering the right release to achieve a business goal.
Once you have agreement on the definition of good engineering performance, rate each of your engineering teams against the two criteria: velocity and value. You may use a chart like the one below:
Once each team has been rated, write down a narrative that justifies the rating. Here are a few examples:
Bottom Left: Velocity and Value are Low
“My requests always seem to take a long time. Even the most simple of requests takes forever. And, when the team finally gets around to completing the request, often times there are problems in production once the release is completed. These problems have negatively impacted customers’ confidence in us so not only are engineers not delivering value – they are eroding it!”
Upper/Middle Left: Velocity is Good and Value is Low
“The team does get stuff done. Of course I’d like them to go faster, but generally speaking they are able to get things done in a reasonable amount of time. However, I can’t say if they are delivering value – when we release something we are not tracking any business metrics so I have no way of knowing!”
Upper Right: Velocity is High and Value is High
“The team is really good. They are tracking their velocity in story points and have goals to improve velocity. They are already up 10% over last year. Also, they instrument all their releases to measure business value. They are actively working with product management to understand what value needs to be delivered and hypothesize with the stakeholders as to what features will be best to deliver the intended business goal. This team is a pleasure to work with.”
Unknown Velocity and Unknown Value
“I don’t know how to rate this team. I don’t know their velocity; it’s always changing and seems meaningless. I think the team does deliver business value, but they are not measuring it so I cannot say if it is low or high.”
With narratives in hand it’s time to begin digging for more data to support or invalidate the ratings.
Diagnosing Velocity Problems
Engineering velocity is a function of time spent developing. Therefore, the first question to answer is “what is the maximum amount of time my team is able to spend on engineering work under ideal conditions?”
This is a calculated value. For example, start with a 40 hour work week. Next, assuming your teams are following an Agile software development process, for each engineering role subtract out the time needed each week for meetings and other non-development work. For individual contributors working in an Agile process that number is about 5 hours per week (for stand up, review, planning and retro). For managers the number may be larger. For each role on the team sum up the hours. This is your ideal maximum.
Next, with the ideal maximum in hand, compare that to the actual achievement. If your teams are not logging hours against their engineering tasks, they will need to do this in order to complete this exercise. Evaluate the gap between the ideal maximum and the actual. For example, if the ideal number is 280 hours and the team is logging 200 hours, then the gap is 80 hours. You need to determine where that 80 hours is going and why. Here are some potential problems to consider:
- Teams are spending extra time in planning meetings to refine requirements and evaluate effort.
- Team members are being interrupted by customer incidents which they are required to support.
- The team must support the weekly release process in addition to their other engineering tasks.
- Miscellaneous meetings are being called by stakeholders, including project status meetings and updates.
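The gap calculation above can be worked through in a short script. This is a minimal sketch: the 40-hour week, the 5 hours of weekly Agile overhead, and the 280/200 figures come from the text, while the team of eight individual contributors is an assumption chosen so the numbers add up to the 280-hour example.

```python
WORK_WEEK_HOURS = 40  # starting point used in the text


def ideal_maximum(overhead_hours_per_person):
    """Sum the weekly hours each team member can ideally spend on engineering."""
    return sum(WORK_WEEK_HOURS - overhead for overhead in overhead_hours_per_person)


# Assumption: eight individual contributors, each with about 5 hours per week
# of stand-up, review, planning and retro. Managers would carry a larger number.
team_overhead = [5] * 8

ideal = ideal_maximum(team_overhead)   # 8 * (40 - 5) = 280 hours
logged = 200                           # hours actually logged against engineering tasks
gap = ideal - logged                   # 80 hours to account for
utilization = logged / ideal           # share of the ideal maximum actually achieved

print(f"ideal={ideal}h logged={logged}h gap={gap}h utilization={utilization:.0%}")
```

The script only quantifies the gap; finding out where those hours go still requires the investigation described in the list above.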
As you dig into this gap it will become clear what needs to be fixed. The results will probably surprise you. For example, one client was faced with a software quality problem. Determined to improve their software quality, the client added more quality engineers, built more unit tests, and built more automated system tests. While there is nothing inherently wrong with this, it did not address the root cause of their poor quality: Rushing. Engineers were spending about 3-4 hours per day on their engineering tasks. Context switching, interruptions and unnecessary meetings eroded quality engineering time each day. As a result, engineers rushing to complete their work tasks made novice mistakes. Improving engineering performance required a plan for reducing engineering interruptions, unnecessary meetings, and enabling engineers to spend more uninterrupted time on their development tasks.
At another client, the frequency of production support incidents was impacting team velocity. Engineers were being pulled away from their daily engineering tasks to work on problems in production. This had gone on so long that while nobody liked it, they accepted it as normal. It’s not normal! Digging into the issue, the root cause was uncovered: The process for managing production incidents was ineffective. Every incident was urgent and nearly every incident disrupted the engineering team. To improve this, a triage process was introduced whereby each incident was classified and either assigned an urgent status (which would create an interruption for the team) or something lower which was then placed on the product backlog (no interruption for the team). We also learned the old process (every incident was urgent) was in part a response to another velocity problem; stakeholders believed that unless something was considered urgent it would never get fixed by the engineering team. By having an incident triage process, a procedure for when something would get fixed based on its urgency, the engineering team and the stakeholders solved this problem.
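A triage rule of this kind might look like the sketch below. The classification criteria (revenue impact and a threshold of affected customers) are hypothetical; the text does not say how the client actually classified incidents, only that each one ended up either urgent or on the backlog.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    URGENT = "urgent"    # interrupts the engineering team now
    BACKLOG = "backlog"  # prioritized on the product backlog, no interruption


@dataclass
class Incident:
    title: str
    customers_affected: int
    revenue_impacting: bool


def triage(incident: Incident, customer_threshold: int = 100) -> Status:
    """Classify an incident. The criteria and threshold are assumptions."""
    if incident.revenue_impacting or incident.customers_affected >= customer_threshold:
        return Status.URGENT
    return Status.BACKLOG


incidents = [
    Incident("Checkout returns errors", customers_affected=40, revenue_impacting=True),
    Incident("Typo on settings page", customers_affected=3, revenue_impacting=False),
]
interruptions = [i.title for i in incidents if triage(i) is Status.URGENT]
backlog = [i.title for i in incidents if triage(i) is Status.BACKLOG]
```

Whatever criteria are used, the point is that they are written down and agreed with stakeholders, so a backlog classification comes with a credible expectation of when the fix will happen.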
At AKF, we are experts at helping engineering teams improve efficiency and performance, fix velocity problems, and increase value. In many cases, the prescription for the team is not obvious. Our consultants help company leaders uncover the root causes of their performance problems, establish vision and execute prescriptions that result in meaningful change. Let us help you with your performance problems so your teams can perform at their best!
November 15, 2018 | Posted By: AKF
“but in this world nothing can be said to be certain, except death and taxes.”
...and data breaches. Given the era that Benjamin Franklin lived in, the concept of a data breach was far from any possibility. But in the world we live in it is a certainty. Death, taxes and data breaches. Welcome to the 21st Century.
So how did we get here? Death and taxes are for someone else to explain, but the data breach I will help flesh out. The following article will begin to explore the world we live in, where we know data breaches are not something you hope never happens, but something you prepare for. Following this article, in the coming weeks I will explore what can be done when the inevitable does occur.
Data Breaches in the 21st Century
“My system is completely secure,” says the guy who is already breached and just doesn’t know it.
Why is a data breach such a certainty these days? It comes down to four areas: similarity, interconnections, users and motive.
In 2015 Microsoft had a great marketing plan to upgrade as many older OSes as possible to the current release: offer the upgrade for free. Issues with upgrading (or tricking users to upgrade against their will) aside, this built a quick base for Windows 10 and allowed it to overtake version 7 in December 2017 as the most adopted version of Windows. Couple this with the fact that Windows is one of the most widely used OSes and you now have a nicely populated target base.
This isn’t to say that Windows machines are more susceptible than other machines, but that given their popularity and the scheduled release of updates, malicious people are able to identify the weaknesses being patched and target machines that are slower to update. In an ideal world patches would be applied in a timely manner, but there are always extenuating circumstances that keep this from happening. So now if your POS (Point of Sale) system is several patches behind, there is an exploit that can target its weakened state.
Don’t feel like being breached and exploited via the internet? Never get on the internet. Simple answer, but not a feasible one given the world in which we live.
At AKF we have a tenet of Build vs. Buy. From a cost perspective it doesn’t make sense to build something that you know very little about if a 3rd party already offers it for a reasonable price. But cost isn’t the only factor to weigh when it comes to connecting to a 3rd party. Risk is another major factor. Is the interaction between your system and the 3rd party limited enough to help insulate you from their potential compromise? Integrations through APIs usually help solve this issue, but sometimes a more thorough coupling of the software is necessary.
So now being reliant on an additional entity (or even more) in that 3rd party helps create another vector with which to be compromised. And to top it off, you usually don’t have any insight into their security posture. They may be obligated to provide you with quarterly security scans, but that doesn’t mean they don’t turn off their highly vulnerable machines prior to each scan.
Congratulations on having an extremely secure system that doesn’t rely on 3rd parties being secure as well. You are now brought down by an employee who thinks they won a raffle.
If this all sounds like a horrible “Choose Your Own Adventure” then you are in the right mindset. It doesn’t matter what you do to protect your systems because you have Users. This isn’t to say that all Users can’t be trusted but there are degrees to how much trust they should have. And whether inadvertent or purposeful they are an extremely susceptible entry point for a breach to occur. Advanced threats are getting smarter and smarter at crafting emails that get past basic email filters and once opened, create a backdoor for them to access the system. Once persistent call backs are established, all traffic now looks like the User is generating it internally and most security allows User initiated traffic a higher degree of freedom.
Have you ever had a bone to pick with a company and didn’t care about the legal ramifications of what you did? Hopefully not. But that segment does exist. Whether you inadvertently wronged a former customer, at least according to them, or you have something that someone else covets, they are going to move heaven and earth to get it. The only thing worse than a malicious actor casting a wide net in the hopes of getting a compromise to stick is someone specifically targeting your business. It can become an obsession for them.
Maybe they want access to the banking records you protect, or maybe they would just like to see you embarrassed. Either way, this is a worst-case scenario for a business: someone who refuses to stop until they compromise your system. They will use everything available, leveraging similarity, interconnections and your Users to gain access.
You’ve Been Breached
Congratulations! You’ve been breached?!? Not really the accolade you were looking for, but one you need to accept. They say the first step is Acceptance, so if you’ve made it this far, you’re where you need to be. Don’t believe you are breached, or will be in the future? Feel free to read the article again and start to really ask yourself if you are as secure as you think you are. The above are just small snippets of the overall vulnerability you may have.
- Don’t use Windows? Well, Linux doesn’t guarantee not being compromised.
- Not connected to anyone? Possible if you are a brick-and-mortar store that only accepts cash.
- You employ the savviest Users? Everyone makes a mistake from time to time.
- Never upset someone or owned something they could want? You aren’t a business then.
So what comes next? Well, for that there are a lot of articles out there explaining how to help shore up your system. One such article comes from our very own Larry Steinberg, Are you compromised? The important thing is to pick the area where you feel that you are weakest and go from there. More often than not this revolves around user training. But maybe checking off some items from the Australian Cyber Security Centre’s Essential Eight will help.
Ultimately you should have the best ideas on how to help secure your system, but if you find that you may need some assistance looking at your product holistically, with security in mind, AKF can help.
November 1, 2018 | Posted By: Pete Ferguson
Eat Your Own Dog Food
Eating your own dog food is a common phrase that is cynical from the start – unless you like eating dog food! A more positive, but often overused cliche, is “Be the Customer.” Regardless of how you want to phrase it, the goal is to create solutions that win your customers (which by the way hopefully include your engineers!) over.
We recently had an opportunity to walk in the customer’s shoes of one of our clients and it was painfully obvious within our first minute that user input to the ultimate design of the product was not considered. The methodology for feedback involved Post-it notes and paper forms as opposed to a simple feedback button on the application appliance. The end users were frustrated, and recurring problems required creative manipulation by the person closest to the customer while software developers were insulated from valuable input.
The rise and fall of companies from top dog to B player (or worse) occurs at an ever quickening pace. Some companies get a second chance, but it takes a lot of effort to get even close to catching up. The best scenario of course is to never lose the number one spot, and the key to staying ahead lies in understanding your customers and innovating and providing appealing solutions for them to be successful.
Tenure, stock options, and other “Golden Handcuffs” are meant for retention, but can often backfire into lulling employees into a comfortable complacency.
So, how do you combat complacency or customer disservice? Many companies take creative routes to hold hackathons and other contests to improve user experience. The best companies allow their customers to vote on which products should be prioritized in the pipeline. But the most effective path to success is to ensure an open dialogue between engineers and those on the front lines using your products.
When I was at eBay there was an issue that had been dogging (no pun intended) customers for a long time and was frustrating customer service agents to no end. John Donahoe, then CEO, was visiting the customer service location in Utah and at a lunch Q&A the frustration came out. John had someone contact the California-based engineers responsible during the luncheon and arranged to have them all fly to the CS center the next morning if not that evening. It was communicated to John that the flights home for the engineers at the end of the week were sold out so he rearranged his travel and took them back on the corporate jet.
I was a driver to get them to the executive terminal at the airport. John raced out of the car and stood at the foot of the red carpet to salute and shake the engineers’ hands after a few grueling days of eating their dog food by sitting on customer calls and meeting with very frustrated agents.
The message was clear to all involved, “no more finger pointing” – engineers were tied at the hip with customer service reps on the front lines with the customers.
Avoid complacency by ensuring your developers and product managers walk in the customer’s shoes and hear from customers regularly. The better informed they are, the better the solutions they will develop.
Redefine the Definition of “Done”
An important aspect of incorporating customers into the development process is an OKRs (Objectives [at AKF we prefer “Outcomes”] and Key Results) focus. What is the desired customer behavior for new features, fixing existing features, etc.? It’s the big “so what” question that needs to be asked often.
If you are trying to increase customer engagement by 10% for a new product or service, then the project is not “done” because of a code release for a new product or features – it is “done” when customer engagement is increased by 10%! So that is when you have the party, not when code is released.
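One way to make that definition of “done” concrete is to encode it as a check on the outcome metric rather than on the release. This is a minimal sketch: the function name, the parameters, and the sample engagement numbers are hypothetical, and only the 10% target comes from the example above.

```python
def is_done(code_released: bool, baseline_engagement: float,
            current_engagement: float, target_lift: float = 0.10) -> bool:
    """Return True only when the customer outcome has been achieved.

    Releasing code is necessary but not sufficient; the project is done
    when engagement has risen by at least `target_lift` over the baseline.
    """
    if baseline_engagement <= 0:
        raise ValueError("baseline_engagement must be positive")
    lift = (current_engagement - baseline_engagement) / baseline_engagement
    return code_released and lift >= target_lift


# Hypothetical numbers: weekly active sessions before and after the release.
shipped_but_flat = is_done(True, baseline_engagement=1000, current_engagement=1030)
shipped_and_moved = is_done(True, baseline_engagement=1000, current_engagement=1150)
```

With these sample numbers the first check is false even though the code shipped, and the second is true, which is the point at which the party is warranted.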
Word usage may seem trivial, but it is our experience that clarity must be uniform and consistent with actions. Team members pay attention to the little things and make the correlations between the words a CEO speaks at an All Hands and what behaviors are observed day in and day out. Having goals around work performed instead of changed customer behavior will likely result in a lot of code being released with very little “so what” for your customers which provides space for an upstart or competitor to edge their way in. If AOL, WebCrawler, Yahoo, AskJeeves, or Excite had provided great consistent search for their customers, Google wouldn’t have been able to take over and dominate. And if Google doesn’t continue to provide great search, the next “Google” will find a space to wiggle in and dominate in the future.
Customers have a balance of wanting their current needs filled with a look to the future. Be clear on what “done” means for your customers on each project and hold off celebrating success – not just effort exerted – until your customers can see that you are done.
Stay Focused on the Bigger Picture
The now infamous quote from Henry Ford is “If I had asked people what they wanted, they would have said faster horses.”
Often we see teams getting microfocused on a fringe case and creating solutions for the minority of end users. While your product will demonstrate value for a few, you may miss the boat for the majority of your customers and they may move on.
I recall touring a facility in the US for a security software integrator. I walked by a set of cubicles with hundreds of security cameras set up and asked what they were doing. I was told the team of 30+ engineers were testing each camera to ensure it works with their software.
Meanwhile their software was not capable of two-factor authentication, which was fast becoming a major blind spot for their organization. If making sure every brand of camera worked was really a profit center, they should have outsourced it offshore for the same result at a fraction of the cost and put their top engineers on something with customer value and profitability. I doubt supporting hundreds of cameras was a major differentiator - certainly not enough to tie up top-paid engineers. As the consumer, if I knew they supported 30-60 cameras perfectly, I’d be fine with picking one from the list. Lack of two-factor authentication was causing major roadblocks for my team in getting infosec approval to continue using their software.
It is important to step back often and look at where teams are exerting the most effort and to ask the simple question “why are we doing this?” If your answer is “because we’ve always done it that way” you are likely not maximizing customer value. If your reasoning aligns with how to maximize customer value for longer term value, then you are on track.
Stay Two Steps Ahead
One of my first jobs was as an apprentice for a building contractor. Chris had a very small crew and specialized in high-end home remodeling and additions. My first day on the job he said “see that pile of trash?” Yes, I replied. “See those dumpsters out that window?” Yes, I said again. “Stop standing around and get to work!” Over the summer I learned that when we showed up on a new job site the saws and compressor and hoses and nail guns needed to be set up as soon as his truck slowed to a stop. My job was to anticipate what we would be doing next and make sure the right equipment was set up and ready to go before we needed it.
Steve Jobs’s famous modernization of Henry Ford’s faster horses quote: “It’s really hard to design products by focus groups. A lot of times, people don’t know what they want until you show it to them.”
Much to the complaint of many customers, Jobs dragged everyone into USB with the original iMac (while eliminating the floppy disk) and Apple has continued to drag customers into faster adoption of Bluetooth, SSDs, and now USB-C. For the most part, the gambles have paid off and customers adopt and enjoy single cable interfaces, faster transfer speeds, etc. (and Apple generates a lot of revenue on dongles for themselves and others …).
Agile is all about quickly adapting and changing. Similarly, as you look at where your customers are today, don’t lose sight of where you need to take them tomorrow. Provide the vision two steps ahead of where you are today.
Look, it is easy – and entertaining – to get lost in pet projects that provide a challenge and are good career development, but if they aren’t providing customer value, save these projects for when there is overflow time. In any large corporation it can be easy to lose sight of customer and end user needs and get lost in endless meetings, pet projects, and other seemingly urgent, but not important, activities.
The main thing is to keep the main thing the main thing … and your customers/end users are THE Main Thing! Agile development provides the ongoing opportunity to comb the backlog and prioritize projects that have the most customer (shareholder, and hopefully employee) value. So make sure the efforts of your team and your organization are laser focused on what will immediately provide the maximum customer value. This will provide the needed profits to hire more staff and provide room to add in non-functional requirements and R&D projects as part of the larger, ongoing development process.
Let us help you improve your main thing!
For a good laugh, visit demotivators.com.
October 12, 2018 | Posted By: Bill Armelin
Understanding Technical Debt
During the course of our client engagements, there are a few common topics or themes that are always discussed, and the clients themselves usually introduce them. One such area is technical debt. Every team has it, every team believes they have too much of it, and every team struggles to explain why it’s important to address it.
Let’s start by defining what technical debt means. It is the difference between doing something the “desired” or “best” way and doing something quickly (i.e., to reduce time to market). The difference results in the company taking on “debt” within the solution. Technical debt requires acting with forethought. In other words, you only assume technical debt knowingly, as an act of commission. Acts of omission (forgetting to plan or do something) do not count as debt. Our partners in business may think we are hiding the truth if we do not clearly delineate the difference between debt (known assumptions) and mistakes, failures or other issues related to maintenance.
The following list provides examples of things that are not tech debt:
- Software defects (unless we decide to NOT fix them for an extended period of time – but defects are still human failures – not debt.)
- Failures in design that are not previously tagged as debt.
- Failures to identify scalability bottlenecks.
- Poor choices in technology components that fail to scale.
- Failure to properly identify infrastructure failures, or high failure rates of vendors in infrastructure.
A Financial Analogy for Tech Debt
When you hear the words “technical debt”, they evoke a negative connotation. However, the judicious use of tech debt is a valuable addition to your product development process. Tech debt is analogous to financial debt. Companies can raise capital to grow their business by either issuing equity or issuing debt. Issuing equity means giving up a percentage of ownership in the company and dilutes current shareholder value. Issuing debt requires the payment of interest but does not give up ownership or dilute shareholder value. Issuing debt is good, until you can’t service it. Once you have too much debt and cannot pay the interest, you are in trouble.
Tech debt operates in the same manner. Companies use tech debt to defer performing work on a product. As we develop our minimum viable product, we build a prototype, gather feedback from the market, and iterate. The parts of the product that didn’t meet the definition of minimum or the decisions/shortcuts made during development represent the tech debt that was taken on to get to the MVP. This is the debt that we must service in later iterations. In fact, our definition of done must include the servicing of the resulting tech debt. Taking on tech debt early can pay big dividends by getting your product to market faster. However, like financial debt, you must service the interest. If you don’t, you will begin to see scalability and availability issues. At that point, refactoring the debt becomes more difficult and time critical. It begins to affect your customers’ experience.
Many development teams have a hard time convincing leadership that paying down technical debt is a worthy use of their time. Why spend time refactoring something that already “works” when you could use that time to build new features customers and markets are demanding now? The danger with this philosophy is that by the time technical debt manifests itself as a noticeable customer problem, it’s often too late to address it without a major undertaking. It’s akin to not having a disaster recovery plan when a major availability outage strikes. To get the business on board, you must make the case using language business leaders understand – again this is often financial in nature. Be clear about the cost of such efforts and quantify the business value they will bring by calculating their ROI. Demonstrate the cost avoidance that is achieved by addressing critical debt sooner rather than later: calculate how much it would cost in the future if the debt is not addressed now. The best practice is to get leadership to agree and commit to a certain percentage of development time that can be allocated to addressing technical debt on an ongoing basis. If they do, it’s important not to abuse this responsibility. Do not let engineers alone determine what technical debt should be paid down and at what rate – it must have true business value that is greater than or equal to spending that time on other activities.
Just as with debt that a company assumes, technical debt in and of itself is not bad. It can be looked at as a leveraging tool to optimize the technology resources in the short term, for example by delaying a hardware tech refresh or the release date for HTML 5. Delaying attention to technical issues allows greater resources to be focused on higher priority endeavors. The absence of technical debt probably means missed business opportunities – use technical debt as a tool to best meet the needs of the business. However, excessive technical debt will cause availability and scalability issues, and can choke business innovation (too much engineering time dealing with debt rather than focusing on the product).
Develop a technology balance sheet and profit and loss (income) statement to discuss tech debt with the business in a manner they understand – finance. Let’s first look at the balance sheet, where Assets = Liabilities + Equity. Our assets are the engineering time spent creating the product. Liabilities are the principal of the tech debt (i.e. the difference between “desired” and “actual”). Equity is the remainder, or the engineering resources spent creating the product while not contributing to tech debt.
Here is an example of a technology balance sheet:
To further the financial analogy, we need to have a technology P&L statement. Here, the interest on tech debt is the difficulty or increased level of effort in modifying something in subsequent releases. This manifests as a reduction in developer productivity per value created. The more debt you take on or the less principal you pay down, the higher your interest payment becomes, and the higher the cost to the organization.
Dedicating resources on an ongoing basis to service technical debt can be a challenging discussion with the business. Resources are always limited and employing them in the manner which best benefits the business is a critical business priority decision. Similar to the notion of debt within business, you should never take on technical debt without a plan to pay the interest (increased future cost of development) and principal (fixing the difference between appropriate and as-is). Relating technical debt to financial debt can help those outside of your technology organization grasp the concept and understand the need to keep technical debt under control.
One way to make the concept of debt real is to estimate, for any debt item, the amount of “interest” one will need to pay in the future to modify the solution in question.
- For the benefit of time to market, you decide to “hard code” a number of “display strings” that you’d rather set aside in a resource file to modify and translate later.
- You save 2 weeks of development time, creating a 2-week liability on your balance sheet. You have a 2-week principal to fix.
- You estimate that for all future string modifications (or translations) it will take an additional day of development. Your interest is 1 day, payable for each modification.
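To make the arithmetic in this example concrete, here is a minimal Python sketch. It assumes a 5-day working week (so the 2-week principal is 10 working days) and uses hypothetical function names; it is an illustration of the idea, not a prescribed accounting method.

```python
# Assumption for illustration: one development week is 5 working days.
WORKING_DAYS_PER_WEEK = 5


def cumulative_interest(interest_days_per_change, changes):
    """Total extra effort (in days) paid across a number of string modifications."""
    return interest_days_per_change * changes


def break_even_changes(principal_days, interest_days_per_change):
    """Number of modifications after which interest paid equals the principal."""
    return principal_days / interest_days_per_change


# Figures from the hard-coded strings example above.
principal_days = 2 * WORKING_DAYS_PER_WEEK  # 2 weeks saved up front
interest_days = 1                           # 1 extra day per modification

for changes in (3, 10, 15):
    paid = cumulative_interest(interest_days, changes)
    print(f"{changes} modifications: {paid} days of interest "
          f"against {principal_days} days of principal")

print("Break-even after", break_even_changes(principal_days, interest_days),
      "modifications")
```

Under these assumptions, the interest paid matches the original 2-week saving after ten modifications, and every modification beyond that costs more than fixing the principal would have.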
Just as retiring all financial liabilities at once does not make good business sense, trying to wipe out technical debt in one fell swoop is a bad idea. Continuous service to the technical debt is required to prevent technical liabilities from wiping out technical equity. An informed decision to increase debt service to reduce the principal will result in more productive product development time (smaller debt requires less on-going service). A short-term decision to reduce tech debt service in favor of a critical product launch may be viable if not used often. Keep track of both your principal (balance sheet) and your interest payments (income statement). Use these to help your business partners with debt related decisions.
Do NOT mix the cost of defects, or other infrastructure and software mistakes with tech debt. Doing so creates two very big problems:
- It becomes harder for the technology team to learn from past mistakes. Mistakes are mistakes and we should use them as learning opportunities. Debt is taken thoughtfully. Track them separately and treat them differently.
- Using the debt term for non-debt related items will lower the level of trust between you and the business. Businesses don’t for instance “mistakenly” take on debt. Mixing these terms can cause relationship problems.
Additionally, be clear about how you define technical debt, so time spent paying it down is not commingled with other activities. Bugs in your code are not technical debt. Refactoring your code base to make it more scalable, however, would be. A good test is to ask whether the path you chose was a conscious or unconscious decision, meaning that you decided to go in one direction knowing that you would later need to refactor. You are making a specific decision to do or not to do something knowing that you will need to address it later. Bugs are found in sloppy code, and that is not tech debt, it is just bad code.
Prioritizing Tech Debt
So how do you decide what tech debt should be addressed and how do you prioritize? If you have been tracking work with Agile storyboards and product backlogs, you should have an idea where to begin. Also, if you track your problems and incidents like we recommend, then this will show elements of tech debt that have begun to manifest themselves as scalability and availability concerns. Set a budget and begin paying down the debt. If you are spending less than 12%, you are not spending enough effort. If you are spending over 25%, you are probably fixing issues that have already manifested themselves, and you are trying to catch up. Setting an appropriate budget and maintaining it over the course of your development efforts will pay down the interest and help prevent issues from arising.
Taking on technical debt to fund your product development efforts is an effective method to get your product to market quicker. But, just like financial debt, you need to take on an appropriate amount of tech debt that you can service by making the necessary interest and principal payments to reduce the outstanding balance. Failing to set an appropriate budget will result in a technical “bankruptcy” that will be much harder to dig yourself out of later.
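As a rough sketch of how a team might check itself against the 12% and 25% guideposts above, the following Python function classifies the share of effort going to tech debt. The function name, the labels, and the choice to measure effort in story points (or hours) are assumptions made for illustration only.

```python
def assess_debt_budget(debt_effort, total_effort, low=0.12, high=0.25):
    """Classify the share of effort spent on tech debt.

    debt_effort and total_effort must use the same unit (hypothetically,
    story points or hours per sprint). low and high default to the 12% and
    25% guideposts from the text.
    """
    if total_effort <= 0:
        raise ValueError("total_effort must be positive")
    share = debt_effort / total_effort
    if share < low:
        return "under-investing"   # not spending enough effort on debt
    if share > high:
        return "catching up"       # likely fixing issues that already surfaced
    return "within range"


# Hypothetical sprints of 100 points each.
for debt_points in (5, 20, 30):
    print(debt_points, "of 100 points:", assess_debt_budget(debt_points, 100))
```

Tracking this share sprint over sprint, rather than as a one-off check, is closer to the idea of maintaining the budget over the course of development.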
Tech Debt Takeaways
Here is a list of our tech debt takeaways:
Want help reducing your tech debt? We can help.