AKF Partners

Abbott, Keeven & Fisher PartnersPartners In Hyper Growth

Category » Engineering

Engineering Metrics

A topic that often results in great debate is “how to measure engineers?” I’m a pretty data driven guy so I’m a fan of metrics as long as they are 1) measured correctly 2) used properly and 3) not taken in isolation. I’ll touch on these issues with metrics later in the post, let’s first discuss a few possible metrics that you might consider using. Three of my favorite are: velocity, efficiency, and cost.

  • Velocity – This is a measurement that comes from the Agile development methodology. Velocity is the aggregate of story points (or any other unit of estimate that you use e.g. ideal days) that engineers on a team complete in a sprint. As we will discuss later, there is no standard good or bad for this metric and it is not intended to be used to compare one engineer to another. This metric should be used to help the engineer get better at estimating, that’s it. No pushing for more story points or comparing one team to another, just use it as feedback to the engineers and team so they can get more predictable in their work.
  • Efficiency – The amount of time a software developer spends doing development related activities (e.g. coding, designing, discussing with the product manager, etc) divided by their total time available (assume 8 – 10 hours per day) provides the Engineering Efficiency. This is a metric designed to see how much time software developers are actually spending on developing software. This metric often surprises people. Achieving 60% or more is exceptional. We often see dev groups below 40% efficiency. This metric is useful for identifying where else engineers are spending their time. Are there too many company meetings not directly related to getting products out the door? Are you doing too many HR training sessions, etc? This metric is really for the management team to then identify what is eating up the non-development time and get rid of it.
  • Cost – Tech cost as a percentage of revenue is a good cost based metric to see how much you are spending on technology. This is very useful as it can be compared to other tech (SaaS or other web-based companies) and you can watch this metric change over time. Most startups begin with their total tech cost (engineers, hosting, etc) at well over 50% of revenue but this should quickly reduce as revenue grows and the business scales. Yes, scaling a business involves growing it cost effectively. Established companies with revenues in the tens of millions range usually have this percentage below 10%. Very large companies in the hundreds of millions in revenue often drive this down to 5-7%.

Now that we know about some of the most common metrics, how should they be used? The most common way managers and executives want to use metrics is to compare engineers to each other or compare a team over time. This works for the Efficiency and the Cost metrics, which by the way are primarily measurements of management effectiveness. Managers make most of the cost decisions including staffing, vendor contracts, etc. so they should be on the hook to improve these metrics. In terms of product out the door as measured by story points completed each sprint a.k.a. Velocity, as mentioned above, is to be used to improve estimates, not try to speed up developers. Using this metric incorrectly will just result in bloated estimates, not faster development.

An interesting comparison of developers comes from a 1967 article by Grant and Sackman in which they stated a ratio of 28:1 for the time required by the slowest versus the fastest programmer to complete a task. This has been a widely cited ratio but a paper from 2000 revised this number to 4:1 at the most and more likely 2:1. While a 2x difference in speed is still impressive it doesn’t optimize for the overall quality of the product. An engineer who is very fast and with high quality but doesn’t interact with the product managers isn’t necessarily the overall most effective. My point is that there are many other factors to be considered than just story points per release when comparing engineers.


Agile Communication

In Agile software development methodology the communication between team members is critical. Two of the twelve principles deal with directly with this issue:

  • Business people and developers must work together daily throughout the project.
  • The most efficient and effective method of conveying information to and within a development team is face-to-face conversation.

Despite the importance of this, time and time again we see teams that are spread out across floors and even buildings trying to work in an Agile fashion. Every foot two people are separated decreases the likelihood that they communicate directly or overhear something that they can provide input on. In physics, there is such a thing as an inverse-square law where a quantity is inversely proportional to the square of the distance from the source. Newton’s law of universal gravitation is example of an inverse-square law. I don’t believe anyone has associated this law with verbal communication but I’m convinced this is the case.

communication : 1 / (distance x distance)

This isn’t to say that remote teams can’t work. In fact, I’m actually a proponent of remote workers but I think because they are not in the building or across the street special arrangements are made like an open Skype call with the remote office. People might worry about being seen as lazy if they Skype someone across the room but across the country, that’s fine.

The take away of this is to put people working with each other as close together as possible. If you need to move peoples’ desks, do it. The temporary disruption is worth the gained communication over the length of the project.


Measuring Technical Debt

We all are familiar with the term technical debt and how bad it can be for a company but how do we get our arms around how much technical debt we have? In this post, we’ll first address the types of technical debt then answer the question of why pay it down at all and finally jump into a few approaches for how to keep track or estimate the amount of technical debt in your system.

erase debt

Types of Technical Debt
There are at least five different types of technical debt: design, coding, testing, documentation, and bugs. Let’s cover each quickly:

  • Design – this is debt that is associated with poor or quick designs. Typically this happens when you don’t include technical operations (system administrators, DBA’s, network, security, etc) and quality assurance into the design. Often when rushed these team members get left out of the design process.
  • Coding – this is what most people think as pure technical debt…poor or sloppy coding. This could be cutting & pasting of code, creating excessively long methods instead of breaking them apart, not adhering to naming conventions, etc.
  • Testing – when you skip creating the unit tests or automated integration / regression tests this builds up debt. The interest on this debt is often paid every release in the form of extra manual testing.
  • Documentation – if only one person understands part of your code base because it’s not documented thoroughly enough this is debt. You pay for it when that one person then becomes the bottleneck for lots of projects.
  • Bugs – not always considered debt but bugs in the production code are debt that if it accumulates too much becomes a big hinderance to the usability of your service.

Why Pay Down Technical Debt?
Most people hear the term “debt” and understand the crippling affect that it can have on a nation or even one’s personal finances when left unchecked. The same is true of technical debt. When left to grow out of control you end up with scalability problems, sometimes availability issues, and much slower development of new features. As you start to pay down this debt you will gain lower maintenance costs, increased productivity, greater scalability, and often even higher availability. But before you decide to pay your debt down completely to zero you need to consider what the ROI would be and to do that you need to understand how much debt you have. Technical deb is like bugs in a SaaS product, the cost of getting them to zero isn’t worth the effort in most cases.

Qualitative Approaches
Let’s start with some qualitative ways of estimating how much technical debt you have in your system. Usually teams, especially ones with senior developers who have been around a while, know when the code is getting “spaghetti-like”. Often it just requires you asking the developers how the clean the code is being maintained. Another, less direct method, is to listen for indicators such as “don’t worry about documentation, just check it in” or “the only person that knows that is Brad” or “if I touch that everything breaks”. These are all signs that you have a lot of technical debt built up in the system and you had better start paying it down soon.

Quantitative Approaches
Some teams, however, prefer a more quantitative approach. For these teams we recommend two estimation methods. The first one is proactive, trying to keep track of debt as you accumulate it and the second is reactive, trying to estimate the amount of debt once accumulated. The first approach is for whenever you ask your team to release a feature faster than they normally would, you are incurring technical debt and therefore you should track it in a bug or feature tracking system. Put the estimate alongside the debt-feature of how long it would have taken the engineer to complete the feature as they normally would have minus the time already invested. This will provide a reasonably accurate estimate of the amount of debt.

For those that already have a bunch of technical debt but haven’t kept up with it in a tracking system, try looking at the cyclomatic complexity of the code. Eclipse has a metrics plug-in that can calculate this for you. While there is no conversion from complexity to technical debt, McCabe, the originator of the term and calculation for cyclomatic complexity, suggests that 10 is a good limit to start with and therefore anything above that should be considered an indicator of debt.

No matter what method you use to estimate or keep track of your technical debt the important thing is to address it before it gets out of hand.


How to Reduce Risk

We’ve written about risk before but wanted to revisit the topic. Generally people and organizations approach risk management or mitigation from one perspective, reducing the probability of failure. We typically do this with systems by testing extensively. While this is useful there is only so much that can be found by testing in a simulated environment or with simulated users. In addition to testing, the company should consider the duration of the failure and the percentage of customers impacted, as show in the figure below.

Let’s go through each one of the items and identify what specifically you can do to accomplish these.

  • Payload Size – The smaller the change, the less risk. This is the concept behind continous deployment where every code commit is released to production, assuming it passes the automated build and test processes. While continuous deployment isn’t right for every organization the concept of releasing smaller more frequent releases to reduce risk is applicable to everyone.
  • Testing – As we stated before, you cannot test quality into a system and it is mathematically impossible to test all possibilities within complex systems to guarantee the correctness of a platform or feature. Responsibility for the quality of a feature resides with the engineer and begins with unit tests. Test driven development is the process of writing a failing automated test and then writing or modifying code in order to pass that test. There are mixed opinions about the pros and cons of TDD but anything that makes an engineer write more unit tests is likely to improve quality.
  • Monitoring – The key to monitoring is to select a few key business metrics to monitor. Per our earlier article first you need to determine if there IS a problem and then use all the other monitoring that we are used to like Nagios, New Relic, Cacti, etc to determine WHERE and WHAT is the problem.
  • Rollback – As much as we’ve written on the importance of being able to rollback code I’m not sure more that I can add except having lived through two major outages without the ability to rollback I will never push code again that can’t be rolled back.
  • Architecture – By splitting your applications and database along Y and Z axes will allow parts of the service to continue functioning should one part fail. This swim lane or fault isolation approach will provide greater availability for your overall service.


Engineering Efficiency

Recently, several of our clients have been interested in how they can make their engineers more efficient. The need for this usually arises when someone notices that they used to deliver much more when the team was smaller. This is a very common problem because as the teams grow larger more coordination is required and technical debt builds up.

Our first recommendation is to measure. As we’ve pointed out before in our post, Data Driven Decisions, without data you are just guessing at the solution. More importantly there is no way of knowing if you are improving anything unless you have the data. We recommend you collect data on where your engineers spend time and produce the following ratio:
actual engineering time spent on development (per time period) / available engineering time (per time period)

The numerator is how much time the engineers are spending building products. What gets taken out of this number are meetings not directly associated with development (design meetings are part of the development process), tasks such as building their environments, and firefighting such as production bug fixes. The denominator includes the average time available per that period. Typically, vacation time, holidays, etc are removed from this time. Once you’ve identified this ratio you have a good idea what tasks are taking away time from engineers actually building products. When our clients calculate this they often see ratios as low as 40%.

One of the largest culprits of reduced engineering efficiency are non-product development related meetings. A simple fix for this is to set aside 4 hr blocks of no-meeting time for engineers to work. We typically recommend 8am – noon as non-meeting, noon – 2pm for meetings, and then 2pm – TBD for non-meetings. This does two things, first it gives everyone time to get actual work done and secondly it forces people to prioritize meetings and limit who should attend since they all have to occur in a 2 hr window.

Start measuring your engineers efficiency and see what you can change to make it improve.

Comments Off on Engineering Efficiency

How to Choose a Development Methodology

One of our most viewed blog entries is PDLC or SDLC, while we don’t know definitively why we suspect that technology leaders are looking for ways to improve their organization’s performance e.g. better quality, faster development. etc. Additionally, we often get asked the question “what is the best software development methodology?” or “should we change PDLC’s?” The simple answer is that there is no “best” and changing methodologies is not likely to fix organizational or process problems. What I’d like to do in this posts is 1) give you a very brief overview of the different methodologies – consider this a primer, a refresher, or feel free to skip and 2) provide a framework for considering which methodology is right for your organization.

The waterfall model is an often used software development processes that occurs in a sequential set of steps. Progress is seen as flowing steadily downwards (like a waterfall) through the phases such as Idea, Analysis, Design, Development, Testing, Implementation and Maintenance. The waterfall development model originated in the hardware manufacturing arena where after-the-fact changes are prohibitively costly. Since no formal software development methodologies existed at the time, this hardware-oriented model was adapted for software development. There are many variations on the waterfall methodology that changes the phases but all have the similar quality of a sequential set of steps. Some waterfall variations include those from Six Sigma (DMAIC, DMADV).

Agile software development is a group of software development methodologies based on iterative and incremental development. The requirements and ultimate products or services that get delivered evolve through collaboration between self-organizing, cross-functional teams. This methodology promotes adaptive planning, evolutionary development and delivery. The Agile Manifesto was introduced in 2001 and introduced a conceptual framework that promotes foreseen interactions throughout the development cycle. The time boxed iterative approach encourages rapid and flexible response to change. There are many variations of Agile including XP, Scrum, FDD, DSDM, and RUP.

With so many different methodologies available how do you decide which is right for your team? Here are a few questions that will help guide you through the decision.

1) Is the business willing to be involved in the entire product development cycle? This involvement takes the form of dedicated resources (no split roles such as running a P&L by day and being a product manager by night), everyone goes through training, and joint ownership of the product / service (no blaming technology for slow delivery or quality problems).
YES – Consider any of the agile methodologies.
NO – Bypass all forms of agile. All of these require commitment and involvement by the business in order to be successful.

2) What is the size of your development teams? Is the entire development team less than 10 people or have you divided the code into small components / services that teams of less than 10 people own and support?
YES – Consider XP or Scrum flavors of the agile methodology.
NO – Consider FDD and DSDM which are capable of scaling up to 100 developers. If you team is even larger consider RUP. Note that with agile, when the development team gets larger so does the amount of documentation and communication and this tends to make the project less agile.

3) Are your teams located in the same location?
YES – Consider any flavor of agile.
NO – While remote teams can and do follow agile methodologies it is much more difficult. If the business owners are not collocated with the developers I would highly recommend sticking with a waterfall methodology.

4) Are you hiring a lot of developers?
YES – Consider the more popular forms of agile or waterfall to minimize the ramp up time of new developers coming on board. If you really want an agile methodology, consider XP which includes paired programming as a concept, and is a good way to bring new developers up to speed quickly.
NO – Any methodology is fine.

A last important idea is that it isn’t important to follow a pure flavor of any methodology. Purist or zealots of process or technology are dangerous because a single tool in your tool box doesn’t provide the flexibility needed in the real world. Feel free to mix or alter concepts of any methodology to make it fit better in your organization or the service being provided.

There are of course counter examples to every one of these questions, in fact I can probably give the examples from our client list. These questions/answers are not definitive but they should provide a starting point or framework for how you can determine your team’s development methodology.


The Agile Executive

In this third installment of our “Agile Organization” series we discuss the qualities and attributes necessary for someone to lead a group of cross functional Agile teams in the development of a web-centric product.  For the purposes of this discussion, the Agile Executive is the person responsible for leading a group of agile teams in the development of a product.

In a world with less focus on functional organizations such as the one we’ve described in our Agile Organization articles, it is imperative that the leadership have a good understanding of all domains from the overall business through product management and finally each of the technical domains.  Clearly such an individual can’t be an expert in every one of these areas, but they should be deep in at least one and broad through all of them.  Ideally this would have been the case in the functional world as well, but alas functional organizations exist to support deep rather than broad domain knowledge.  In the Agile world we need deep in at least area and broad in all areas.

Such a deep yet broad individual could come from any of the functional domains.  The former head of product management may be one such candidate assuming that he or she had good engineering and operations understanding.  The head of engineering and operations may be heads of other agile teams, assuming that they have a high business acumen and good product understanding.  In fact, it should not matter whence the individual comes, but rather whether he or she has the business acumen, product savvy and technical chops to lead teams.

In our view of the world, such an individual will likely have a strong education consisting of an undergraduate STEM (science, technology, engineering or math) degree.  This helps give them the fundamentals necessary to effectively interact with engineers and add value to the engineering process.  They will also have likely attended graduate school in a business focused program such as an MBA with a curriculum that covers finance, accounting, product and strategy.  This background helps them understand the language of business.  The person will hopefully have served for at least a short time in one of the engineering disciplines as an individual contributor to help bridge the communication chasm that can sometimes exist between those who “do” and those who “dream”.  As they progress in their career, they will have taken on roles within product or worked closely with product in not only identification of near term product elements, but the strategic evaluation of product needs longer term as well.

From the perspective of philosophy, the ideal candidates are those who understand that innovation is more closely correlated with collaboration through wide networks than it is to the intelligence of one individual or a small group of people.  This understanding helps drive beneficial cognitive conflict and increased contribution to the process of innovation rather than the closed minded approach of affective conflict associated with small groups of contributors.

In summary, it’s not about whence the person comes but rather “who the person is”.  Leading cross disciplinary teams requires cross disciplinary knowledge.  As we can’t possibly experience enough personally to be effective in all areas, we must broaden ourselves through education and exposure and deepen ourselves through specific past experiences.  Most importantly, for a leader to succeed in such an environment he or she must understand that “it’s not about them” – that success is most highly correlated with teams that contribute and not with just being “wickedly smart”.

Comments Off on The Agile Executive

Technical Debt

Given the debt crises in Italy, Greece, and the US, just to name a few, the idea of debt is often parts of our conversations. As little as five years ago who would have believed that the US could double it’s debt that took 200 years to accumulate. While we all probably think this is outrageous let’s consider the debt that we accumulate within our systems. We’ve all heard and probably used the term technical debt but do we really understand how bad it can affect us and how quickly?

We often tell clients that they need to spend 12 – 25% of their engineering time on maintenance issues to pay down this debt. Failing to do so can result in some horrendous work stoppages. As an example, we had a client that has been around for a little more than a decade and had a great little profitable business. They were bringing in finance to grow their operations outside of the NY-metro area and needed help scaling. Unfortunately they had ignored refactoring, scaling, and maintenance in general for most of the time they’ve been in business. The result was that when a potential client asked to add a second email address on accounts the estimate came back at 1,500 hours…yes, that’s right 190 days or about 3/4 of a year of a developer to simply add an email field. Given that engineers typically are optimistic about how long it will take to accomplish something, imagine how long this really took.

The take away here for both our countries and our organizations is to not let debt pile up. Face the pain and balance the budget because if you don’t it only gets worse.

1 comment

Lazy Summertime

Now that summer has officially arrived, it’s time to talk about how we can justify being lazy. One of my favorite chapters in Scalability Rules is “Use Caching Aggresively.” The reason I like it so much is that it reminds me to be lazy. Yes for all you slackers here is your excuse to do as little work as possible.

Another Guincho perspective

Our first justification for being lazy comes under the category of “how to avoid work.” The best way to scale is to avoid the traffic in the first place. One way to avoid traffic is for users to never come to your site but this isn’t very desirable. The prefered solution to avoiding traffic is to utilize the many layers of caching between your persistent storage (usually a relational database) and the users’ browsers. A few of these possible caches that you can leverage are: O/S DNS cache, Browser cache, CDNs, Reverse Proxy, and Object Cache.

If one reason wasn’t enough here is another excuse to be lazy. The best way to avoid errors is that do as little work as possible. The less you do the less you can screw up. In order to do as little as possible you need to automate or script simple tasks. If you find yourself doing something repetitively such as installing packages, resetting data, or making copies consider these tasks for automation. Consider these few commands:

dd bs=65536 if=/dev/sda1 of=/dev/sdd
fsck /dev/sdd
mkdir /root/ebs-vol
mount /dev/sdd /root/ebs-vol

It’s much easier and less prone to errors to kick off a shell script than to type all these commands over and over, day after day.

So, there you go, two reasons for you to remain lazy all summer and hopefully enjoy the warm weather.

1 comment

Multi-paradigm Data Storage Architectures

We often have clients ask about one or more of the NoSQL technologies as potential replacements for their RDBMS (typically MySQL) to simplify scaling. What I think makes much more sense with regard to these NoSQL and SQL storage systems is an AND instead of an OR discussion. Consider implementing a multi-paradigm data storage layer that provides the appropriate persistent storage system for the different types of data in your application. This approach is similar to our RFM approach to data storage. Consider questions such as how often do you need the data, how quickly do you need it, how relational is the data, etc. There are at least four benefits of this multi-paradigm approach: simpler scaling, improved application performance, easier application development, and reduced cost.

The AKF Scale Cube provides a straightforward way to scale any relational database through the three axes but we know that splitting data relationships once they’ve been established isn’t easy. It requires work and lots of coordination between teams. By limiting what gets stored relationally to only the minimum that is required means fewer splits along any axis. Many of the NoSQL technologies provide auto sharding and asynchronous replication. Re-indexing keys across another node is much simpler than migrating tables into another database.

While relational databases can have great performance, unless the table is pinned in memory or the query results are cached in memory, an in memory data store will always outperform SQL. In many applications we could make use of in memory solutions like Memcache or MongoDB to improve performance of retrieving high value data.

Application Development
As Debasish Ghosh states in his article Multiparadigm Data Storage for Enterprise Applications, storing data the way it is used in an application, simplifies programming and makes it easier to decentralize data processing. If the application treats the data as a document why break it apart to store it relationally when we have viable document storage engines. Storing the data in a more native format allows for quicker development.

For data that’s not needed often, cache it in other places (such as a CDN) or lazy load it from a low cost storage tier such as Amazon’s S3. This might work well for applications hosted in the cloud. The benefit of this a lower cost per byte stored, especially when considering all costs including administrators for the more complex data storage systems such as relational databases.

A final step in implementing a multi-paradigm data storage layer is an asynchronous message queue for data that needs to move up and down the stack. Implementing ActiveMQ or RabbitMQ to asynchronously move data from one layer to another as needed relieves the application of this burden. As an example consider an application that routes picking baskets for inventory in a warehouse. This is typically thought of as a graph with bins of inventory as nodes and the aisles as edges. For faster retrieval you could store this in a graph database such as Neo4J for ease of development and performance reasons. You could then asynchronously persist these maps in a MySQL database for reporting and older versions into an S3 bucket for historic archiving. This combination provides faster performance, easier development, simpler scaling, and reduced cost.

Comments Off on Multi-paradigm Data Storage Architectures