AKF Partners

Abbott, Keeven & Fisher PartnersPartners In Hyper Growth

Category » Engineering

Common Cloud Misconceptions

Over the course of the last year, we have seen several of our clients either start exploring or make plans to move their SaaS products to the “Cloud” or an IaaS provider. We thought we would share some of the misconceptions we sometimes see and our advice.

- I can finally focus on product development and software engineering and not worry about this infrastructure stuff.
The notion that IaaS providers like Amazon have eliminated your worries about infrastructure is only partially true. You may not need to understand everything about designing an infrastructure with bare metal but you need to make sure you understand how your virtual configuration in the Cloud will affect your product. IaaS helps us to quickly deploy infrastructure for our products but it doesn’t eliminate the need for good high availability and fault tolerance design. There are several levers you can pull within an IaaS console and design decisions that will impact your products performance. To ensure good design and configuration, its ultra important that your SaaS product engineering team is made up of talent that has expertise in distributed application architecture design, infrastructure, security, and networking. Having this knowledge will help you design a high performing, fault isolated product for your business.

- Going to the cloud pretty much guarantees me high availability because of auto scaling.
Going to the cloud will provide you with the ability to scale quickly as load increases but it will not provide you with high availability. For example, if you have a monolithic code base that you deployed and you are pushing to production on a regular basis, there is a pretty good chance you will introduce a defect at some point that impacts the availability of your entire service and business. We advise our clients to split their applications appropriately, deploy the services to separate instances, and, assuming you are using Amazon, configure them to run across multiple zones within a region at a minimum (preferably across regions). This allows you to focus dedicated teams to the individual services and reduce the likelihood of introducing a defect that takes down your system.

- The Cloud will be cheaper than a collocation or managed hosting provider.
There are several factors that need to be considered before you can confirm that is cost effective. You should look closely at load on your servers. If your servers are not serving traffic around the clock, it may be better from a cost perspective for you to buy and maintain your own infrastructure in a collocation or in an existing data center you may have. The economics of this decision is changing rapidly as IaaS pricing is declining due to the competition in the industry. A simple spreadsheet exercise will help determine if the move to Cloud would be cost effective for your business.

- The Cloud isn’t secure so we better not use it.
The cloud isn’t necessarily what makes or breaks security around your SaaS product. Many believe that public cloud services like Amazon’s EC2 service isn’t secure. First off, you are far more likely to experience a security breach because of an employee’s actions (either intentionally or unintentionally) than caused by an infrastructure provider. Secondly, Amazon likely has invested much more in security at various layers, including their physical data centers, than most companies we see who have their own data centers. Amazon has designed the infrastructure to isolate customer instances and you can also choose to take advantage of Amazon Virtual Private Cloud that can be configured to create an isolated network. There are various options for encrypting all of your data as well. This only touches the surface for security design options you have and they continue to be enhanced everyday. You can see why it’s important to staff your team with an engineer who has experience in this space.

If you are looking to move to Cloud, don’t rush into the decision. Do your homework and make sure it’s right for your business. Make sure you have the talent that has experience with the technology that will get you there and run your operations. Once you make the leap, you will have to live with it for a while.


Enablement

One of the most important aspects of managing a successful technology organization is ensuring that you are practicing & instilling the concept of enablement at all levels. This concept applies to both the product/service you are producing and for people. A good example for your organization is enabling decision-making at the lowest levels possible. I have often seen this represented as “delegation” but I believe that enablement of decision-making is a more powerful concept than delegation which is driven from the top-down. I recently had the opportunity to lead a large infrastructure team and one of the first changes we made was breaking apart into reasonable sized PODs with the primary purpose of ensuring that decisions for the product & technology were driven from the bottom-up while guidance was flowing in from various stakeholders. Many teams practice a flavor of Agile but without enabling each POD to make the appropriate decisions you will run into organizational scalability problems.

The allure of IaaS & PaaS is firmly rooted in the concept of enablement. Self-service is an amazing if you are a DBA, developer or even the end user of your product. The cloud may not be suitable for your needs but don’t let that stop your organization from thinking strategically about bringing those processes & technologies “in-house” for scalability reasons. Implemented correctly, the infrastructure and platform you are building should enable the users and not hinder them, as is sometimes the case. Reducing the number of dependencies between technology teams for launching products is not only good for cycle time of product launches but also critical in scaling up.

Consider making enablement part of your technology team’s DNA and you will likely see that employee morale, productivity and other metrics like NPS will rise.


The Biggest Mistake with Agile

At least 75% of the dev shops that we see are using some form of Agile. Very few are following a pure form of any specific flavor i.e. Scrum, Extreme Programming, etc. but rather most are using some hybrid method. Some teams measure velocity while other don’t. Some teams have dedicated ScrumMasters while others have the engineering managers perform this role. While most team’s processes could be tweaked, none of these are the real problem.

Scrum team

The biggest mistake companies make implementing Agile and thus the cause of most of their problems is they don’t understand that Agile is a business process, not a software development methodology. Thus, the business owners or their delegates, product managers, must be involved at every step.

We’ve argued before that Agile teams must sit together because communication degrades at a rate of square the distance. Not having product managers with the Agile team involved in the entire process and (if you’ve moved from a Waterfall methodology) not having detailed specifications, is the worst possible scenario. Developers either need someone siting beside them to help with product decisions (Agile) or a detailed spec to work from (Waterfall).

Agile is a business process which requires the business to be involved in the product development process. It does not mean you get to stop writing specs and not be involved.


2 comments

Quality Products

The question was raised to us recently, how can you tell if you are building quality products? I think this is an interesting question because you can answer it from two different perspectives – internally or externally. Quality is a fairly nebulous bucket that can include simple bugs, broken functionality, poorly designed flows, and even performance problems. For this discussion we can borrow the ISO 8402-1986 definition of quality as “the totality of features and characteristics of a product or service that bears its ability to satisfy stated or implied needs.” In short, if our service or product is not meeting the needs of our customers then it is falling short with regards to quality.

The external perspective of quality comes from your customers. Signs of poor quality products include customers leaving, customers complaining (on forums or to customer service reps), or even slow adoption rates by new customers. All of these are sure signs that you’re doing something wrong. If you haven’t revamped major functionality or a competitor hasn’t launched some new feature set then functionality is probably not to blame and you should look at the overall quality of your product. The problem with using customers to verify the quality of your product is that once they’ve experienced poor quality, it’s too late. They’ve already had the bad experience and might be considering leaving or not recommending your product to friends.

The internal perspective of quality includes what your teams are doing to ensure they are delivering a high quality product. This does not just include QA and in fact, you cannot test quality into the product. Quality relies much more on good product management and solid engineering than testing. Here are a few processes that tend to improve quality and indicate that you are likely building quality products:

  • Cross Functional Design – Whether you are following JAD/ARB processes or DevOps, getting technical operations engineers, quality assurance engineers, and software developers together to design will produce a much higher quality and more scalable product.
  • Coding Standards – This includes branching strategies, documentation requirements, formatting (e.g. tabs vs spaces), variable naming conventions, patterns, frameworks, technology stacks, and a defined process for introducing new standards or questioning the use of existing ones.
  • Unit Tests – In my opinion, one of the most important measures of quality is the creation of unit tests. Achieving a 75% or greater code coverage in unit tests seems to indicate the delivery of a higher quality product. Creating these tests ahead of time in a Test Driven Development (TDD) methodology is even better.
  • Code Reviews – One of AKF’s mantras is “Everything Fails” and this includes people. Buddy checking code is a simple way to help catch these failures. It also serves to help spread knowledge across the team about different ways of coding and solving problems.

We all rely on customers to provide feedback on our product’s quality to some degree but we are better off building and designing quality into our products in the first place. Focus on the internal perspective of quality and then use the external perspective to validate your insights.


Engineering Metrics

A topic that often results in great debate is “how to measure engineers?” I’m a pretty data driven guy so I’m a fan of metrics as long as they are 1) measured correctly 2) used properly and 3) not taken in isolation. I’ll touch on these issues with metrics later in the post, let’s first discuss a few possible metrics that you might consider using. Three of my favorite are: velocity, efficiency, and cost.

  • Velocity – This is a measurement that comes from the Agile development methodology. Velocity is the aggregate of story points (or any other unit of estimate that you use e.g. ideal days) that engineers on a team complete in a sprint. As we will discuss later, there is no standard good or bad for this metric and it is not intended to be used to compare one engineer to another. This metric should be used to help the engineer get better at estimating, that’s it. No pushing for more story points or comparing one team to another, just use it as feedback to the engineers and team so they can get more predictable in their work.
  • Efficiency – The amount of time a software developer spends doing development related activities (e.g. coding, designing, discussing with the product manager, etc) divided by their total time available (assume 8 – 10 hours per day) provides the Engineering Efficiency. This is a metric designed to see how much time software developers are actually spending on developing software. This metric often surprises people. Achieving 60% or more is exceptional. We often see dev groups below 40% efficiency. This metric is useful for identifying where else engineers are spending their time. Are there too many company meetings not directly related to getting products out the door? Are you doing too many HR training sessions, etc? This metric is really for the management team to then identify what is eating up the non-development time and get rid of it.
  • Cost – Tech cost as a percentage of revenue is a good cost based metric to see how much you are spending on technology. This is very useful as it can be compared to other tech (SaaS or other web-based companies) and you can watch this metric change over time. Most startups begin with their total tech cost (engineers, hosting, etc) at well over 50% of revenue but this should quickly reduce as revenue grows and the business scales. Yes, scaling a business involves growing it cost effectively. Established companies with revenues in the tens of millions range usually have this percentage below 10%. Very large companies in the hundreds of millions in revenue often drive this down to 5-7%.

Now that we know about some of the most common metrics, how should they be used? The most common way managers and executives want to use metrics is to compare engineers to each other or compare a team over time. This works for the Efficiency and the Cost metrics, which by the way are primarily measurements of management effectiveness. Managers make most of the cost decisions including staffing, vendor contracts, etc. so they should be on the hook to improve these metrics. In terms of product out the door as measured by story points completed each sprint a.k.a. Velocity, as mentioned above, is to be used to improve estimates, not try to speed up developers. Using this metric incorrectly will just result in bloated estimates, not faster development.

An interesting comparison of developers comes from a 1967 article by Grant and Sackman in which they stated a ratio of 28:1 for the time required by the slowest versus the fastest programmer to complete a task. This has been a widely cited ratio but a paper from 2000 revised this number to 4:1 at the most and more likely 2:1. While a 2x difference in speed is still impressive it doesn’t optimize for the overall quality of the product. An engineer who is very fast and with high quality but doesn’t interact with the product managers isn’t necessarily the overall most effective. My point is that there are many other factors to be considered than just story points per release when comparing engineers.


4 comments

Agile Communication

In Agile software development methodology the communication between team members is critical. Two of the twelve principles deal with directly with this issue:

  • Business people and developers must work together daily throughout the project.
  • The most efficient and effective method of conveying information to and within a development team is face-to-face conversation.

Despite the importance of this, time and time again we see teams that are spread out across floors and even buildings trying to work in an Agile fashion. Every foot two people are separated decreases the likelihood that they communicate directly or overhear something that they can provide input on. In physics, there is such a thing as an inverse-square law where a quantity is inversely proportional to the square of the distance from the source. Newton’s law of universal gravitation is example of an inverse-square law. I don’t believe anyone has associated this law with verbal communication but I’m convinced this is the case.

communication : 1 / (distance x distance)

This isn’t to say that remote teams can’t work. In fact, I’m actually a proponent of remote workers but I think because they are not in the building or across the street special arrangements are made like an open Skype call with the remote office. People might worry about being seen as lazy if they Skype someone across the room but across the country, that’s fine.

The take away of this is to put people working with each other as close together as possible. If you need to move peoples’ desks, do it. The temporary disruption is worth the gained communication over the length of the project.


4 comments

Measuring Technical Debt

We all are familiar with the term technical debt and how bad it can be for a company but how do we get our arms around how much technical debt we have? In this post, we’ll first address the types of technical debt then answer the question of why pay it down at all and finally jump into a few approaches for how to keep track or estimate the amount of technical debt in your system.


erase debt

Types of Technical Debt
There are at least five different types of technical debt: design, coding, testing, documentation, and bugs. Let’s cover each quickly:

  • Design – this is debt that is associated with poor or quick designs. Typically this happens when you don’t include technical operations (system administrators, DBA’s, network, security, etc) and quality assurance into the design. Often when rushed these team members get left out of the design process.
  • Coding – this is what most people think as pure technical debt…poor or sloppy coding. This could be cutting & pasting of code, creating excessively long methods instead of breaking them apart, not adhering to naming conventions, etc.
  • Testing – when you skip creating the unit tests or automated integration / regression tests this builds up debt. The interest on this debt is often paid every release in the form of extra manual testing.
  • Documentation – if only one person understands part of your code base because it’s not documented thoroughly enough this is debt. You pay for it when that one person then becomes the bottleneck for lots of projects.
  • Bugs – not always considered debt but bugs in the production code are debt that if it accumulates too much becomes a big hinderance to the usability of your service.

Why Pay Down Technical Debt?
Most people hear the term “debt” and understand the crippling affect that it can have on a nation or even one’s personal finances when left unchecked. The same is true of technical debt. When left to grow out of control you end up with scalability problems, sometimes availability issues, and much slower development of new features. As you start to pay down this debt you will gain lower maintenance costs, increased productivity, greater scalability, and often even higher availability. But before you decide to pay your debt down completely to zero you need to consider what the ROI would be and to do that you need to understand how much debt you have. Technical deb is like bugs in a SaaS product, the cost of getting them to zero isn’t worth the effort in most cases.

Qualitative Approaches
Let’s start with some qualitative ways of estimating how much technical debt you have in your system. Usually teams, especially ones with senior developers who have been around a while, know when the code is getting “spaghetti-like”. Often it just requires you asking the developers how the clean the code is being maintained. Another, less direct method, is to listen for indicators such as “don’t worry about documentation, just check it in” or “the only person that knows that is Brad” or “if I touch that everything breaks”. These are all signs that you have a lot of technical debt built up in the system and you had better start paying it down soon.

Quantitative Approaches
Some teams, however, prefer a more quantitative approach. For these teams we recommend two estimation methods. The first one is proactive, trying to keep track of debt as you accumulate it and the second is reactive, trying to estimate the amount of debt once accumulated. The first approach is for whenever you ask your team to release a feature faster than they normally would, you are incurring technical debt and therefore you should track it in a bug or feature tracking system. Put the estimate alongside the debt-feature of how long it would have taken the engineer to complete the feature as they normally would have minus the time already invested. This will provide a reasonably accurate estimate of the amount of debt.

For those that already have a bunch of technical debt but haven’t kept up with it in a tracking system, try looking at the cyclomatic complexity of the code. Eclipse has a metrics plug-in that can calculate this for you. While there is no conversion from complexity to technical debt, McCabe, the originator of the term and calculation for cyclomatic complexity, suggests that 10 is a good limit to start with and therefore anything above that should be considered an indicator of debt.

Conclusions
No matter what method you use to estimate or keep track of your technical debt the important thing is to address it before it gets out of hand.


8 comments

How to Reduce Risk

We’ve written about risk before but wanted to revisit the topic. Generally people and organizations approach risk management or mitigation from one perspective, reducing the probability of failure. We typically do this with systems by testing extensively. While this is useful there is only so much that can be found by testing in a simulated environment or with simulated users. In addition to testing, the company should consider the duration of the failure and the percentage of customers impacted, as show in the figure below.

Let’s go through each one of the items and identify what specifically you can do to accomplish these.

  • Payload Size – The smaller the change, the less risk. This is the concept behind continous deployment where every code commit is released to production, assuming it passes the automated build and test processes. While continuous deployment isn’t right for every organization the concept of releasing smaller more frequent releases to reduce risk is applicable to everyone.
  • Testing – As we stated before, you cannot test quality into a system and it is mathematically impossible to test all possibilities within complex systems to guarantee the correctness of a platform or feature. Responsibility for the quality of a feature resides with the engineer and begins with unit tests. Test driven development is the process of writing a failing automated test and then writing or modifying code in order to pass that test. There are mixed opinions about the pros and cons of TDD but anything that makes an engineer write more unit tests is likely to improve quality.
  • Monitoring – The key to monitoring is to select a few key business metrics to monitor. Per our earlier article first you need to determine if there IS a problem and then use all the other monitoring that we are used to like Nagios, New Relic, Cacti, etc to determine WHERE and WHAT is the problem.
  • Rollback – As much as we’ve written on the importance of being able to rollback code I’m not sure more that I can add except having lived through two major outages without the ability to rollback I will never push code again that can’t be rolled back.
  • Architecture – By splitting your applications and database along Y and Z axes will allow parts of the service to continue functioning should one part fail. This swim lane or fault isolation approach will provide greater availability for your overall service.

3 comments

Engineering Efficiency

Recently, several of our clients have been interested in how they can make their engineers more efficient. The need for this usually arises when someone notices that they used to deliver much more when the team was smaller. This is a very common problem because as the teams grow larger more coordination is required and technical debt builds up.

Our first recommendation is to measure. As we’ve pointed out before in our post, Data Driven Decisions, without data you are just guessing at the solution. More importantly there is no way of knowing if you are improving anything unless you have the data. We recommend you collect data on where your engineers spend time and produce the following ratio:
actual engineering time spent on development (per time period) / available engineering time (per time period)

The numerator is how much time the engineers are spending building products. What gets taken out of this number are meetings not directly associated with development (design meetings are part of the development process), tasks such as building their environments, and firefighting such as production bug fixes. The denominator includes the average time available per that period. Typically, vacation time, holidays, etc are removed from this time. Once you’ve identified this ratio you have a good idea what tasks are taking away time from engineers actually building products. When our clients calculate this they often see ratios as low as 40%.

One of the largest culprits of reduced engineering efficiency are non-product development related meetings. A simple fix for this is to set aside 4 hr blocks of no-meeting time for engineers to work. We typically recommend 8am – noon as non-meeting, noon – 2pm for meetings, and then 2pm – TBD for non-meetings. This does two things, first it gives everyone time to get actual work done and secondly it forces people to prioritize meetings and limit who should attend since they all have to occur in a 2 hr window.

Start measuring your engineers efficiency and see what you can change to make it improve.


How to Choose a Development Methodology

One of our most viewed blog entries is PDLC or SDLC, while we don’t know definitively why we suspect that technology leaders are looking for ways to improve their organization’s performance e.g. better quality, faster development. etc. Additionally, we often get asked the question “what is the best software development methodology?” or “should we change PDLC’s?” The simple answer is that there is no “best” and changing methodologies is not likely to fix organizational or process problems. What I’d like to do in this posts is 1) give you a very brief overview of the different methodologies – consider this a primer, a refresher, or feel free to skip and 2) provide a framework for considering which methodology is right for your organization.

The waterfall model is an often used software development processes that occurs in a sequential set of steps. Progress is seen as flowing steadily downwards (like a waterfall) through the phases such as Idea, Analysis, Design, Development, Testing, Implementation and Maintenance. The waterfall development model originated in the hardware manufacturing arena where after-the-fact changes are prohibitively costly. Since no formal software development methodologies existed at the time, this hardware-oriented model was adapted for software development. There are many variations on the waterfall methodology that changes the phases but all have the similar quality of a sequential set of steps. Some waterfall variations include those from Six Sigma (DMAIC, DMADV).

Agile software development is a group of software development methodologies based on iterative and incremental development. The requirements and ultimate products or services that get delivered evolve through collaboration between self-organizing, cross-functional teams. This methodology promotes adaptive planning, evolutionary development and delivery. The Agile Manifesto was introduced in 2001 and introduced a conceptual framework that promotes foreseen interactions throughout the development cycle. The time boxed iterative approach encourages rapid and flexible response to change. There are many variations of Agile including XP, Scrum, FDD, DSDM, and RUP.

With so many different methodologies available how do you decide which is right for your team? Here are a few questions that will help guide you through the decision.

1) Is the business willing to be involved in the entire product development cycle? This involvement takes the form of dedicated resources (no split roles such as running a P&L by day and being a product manager by night), everyone goes through training, and joint ownership of the product / service (no blaming technology for slow delivery or quality problems).
YES – Consider any of the agile methodologies.
NO – Bypass all forms of agile. All of these require commitment and involvement by the business in order to be successful.

2) What is the size of your development teams? Is the entire development team less than 10 people or have you divided the code into small components / services that teams of less than 10 people own and support?
YES – Consider XP or Scrum flavors of the agile methodology.
NO – Consider FDD and DSDM which are capable of scaling up to 100 developers. If you team is even larger consider RUP. Note that with agile, when the development team gets larger so does the amount of documentation and communication and this tends to make the project less agile.

3) Are your teams located in the same location?
YES – Consider any flavor of agile.
NO – While remote teams can and do follow agile methodologies it is much more difficult. If the business owners are not collocated with the developers I would highly recommend sticking with a waterfall methodology.

4) Are you hiring a lot of developers?
YES – Consider the more popular forms of agile or waterfall to minimize the ramp up time of new developers coming on board. If you really want an agile methodology, consider XP which includes paired programming as a concept, and is a good way to bring new developers up to speed quickly.
NO – Any methodology is fine.

A last important idea is that it isn’t important to follow a pure flavor of any methodology. Purist or zealots of process or technology are dangerous because a single tool in your tool box doesn’t provide the flexibility needed in the real world. Feel free to mix or alter concepts of any methodology to make it fit better in your organization or the service being provided.

There are of course counter examples to every one of these questions, in fact I can probably give the examples from our client list. These questions/answers are not definitive but they should provide a starting point or framework for how you can determine your team’s development methodology.


3 comments