AKF Partners

Abbott, Keeven & Fisher Partners | Partners in Hyper Growth


Communicating Across Swim-lanes

We often emphasize the need to break apart complex applications into separate swim-laned services. Doing so not only improves architectural scalability but also permits the organization to scale and innovate faster as each separate team can operate almost as its own independent startup.

Ideally, there would be no need to pass information between these services and they could be easily stitched together on the client side with a little HTML for a seamless user experience. However, in the real world, you’re likely to find that some message passing between services is needed (if only to save your users the trouble of entering the same information twice).

Every cross-service call adds complexity to your application. Oftentimes, teams will implement internal APIs when cross-service communication is needed. This is a good start and solves part of the problem by formalizing communication and discouraging unnecessary cross-service calls.

However, APIs alone don’t address the more important issue with cross-service communication — the reduction in availability. Anytime one service synchronously communicates with another you’ve effectively connected them in series and reduced overall availability. If Service-A calls Service-B synchronously, and Service-B crashes or slows to a crawl, Service-A will do the same. To avoid this trap, communication needs to take place asynchronously between services. This, in fact, is how we define “swim-laning”.

So what are the best means to facilitate asynchronous cross-service communication, and what are the pros and cons of each method?

Client Side

Information that needs to be shared between services can be passed via JavaScript & JSON in the browser. This has the advantage of keeping the backend services completely separate and gives you the option to move business logic to the client side, thus reducing load on your transactional systems.

The downside, however, is the increased security risk. It doesn’t take an expert hacker to manipulate variables in JavaScript. (Just think: $price_var gets set to $0 somewhere between the shopping cart and checkout services.) Furthermore, when data is passed back to the server side, these cross-service calls will suffer from the same latency and reliability issues as any TCP call over the internet.

Message Bus / Enterprise Service Bus

Message buses and enterprise service buses provide a ready means to transmit messages between services asynchronously in a pub/sub model. Advantages include a rich set of features for tracking, manipulating, and delivering messages, as well as the ability to centralize logging and monitoring. However, the potential for congestion as well as occasional message loss makes them less desirable in many cases than asynchronous point-to-point calls (discussed below).

To limit congestion, it’s best to implement several independent channels and limit message bus traffic to events that will have multiple subscribers.
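
To make the pattern concrete, below is a minimal in-process sketch of pub/sub with independent channels. The channel names and payloads are invented for illustration, and a real deployment would sit on a proper broker (RabbitMQ, Kafka, SNS/SQS, and the like) rather than on in-memory lists.

```python
# Minimal in-process illustration of the pub/sub pattern described above.
# Channel names and payloads are hypothetical; a production system would use
# a message broker rather than in-memory data structures.
from collections import defaultdict
from typing import Callable, Dict, List

class MessageBus:
    def __init__(self) -> None:
        # One independent subscriber list per channel keeps traffic on one
        # channel from congesting the others.
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[channel].append(handler)

    def publish(self, channel: str, event: dict) -> None:
        # A real bus would deliver asynchronously and durably; handlers are
        # called inline here only to keep the sketch short.
        for handler in self._subscribers[channel]:
            handler(event)

bus = MessageBus()
bus.subscribe("order.placed", lambda e: print("billing service saw", e))
bus.subscribe("order.placed", lambda e: print("email service saw", e))
bus.publish("order.placed", {"order_id": 123})   # one event, two subscribers
bus.publish("user.signup", {"user_id": 456})     # separate channel, no contention with orders
```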

Asynchronous Point-to-Point

Point-to-point communication (i.e. one service directly calling another) is another effective means of message passing. Advantages include simplicity, speed, and reliability. However, be sure to implement this asynchronously, with a queuing mechanism, timeouts, retries (if needed), and exception handling in the event of service failure. This will prevent failures from propagating across service boundaries.
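
A minimal sketch of the shape this takes, assuming a hypothetical Service-B endpoint and made-up timeout/retry settings: the caller drops a message on a local queue and returns immediately, while a background sender applies timeouts, bounded retries, and an exception path so a Service-B failure never propagates back into Service-A’s request path.

```python
# Sketch of asynchronous point-to-point messaging with a local queue,
# timeouts, bounded retries, and contained failures. The endpoint URL,
# retry count, and backoff are illustrative assumptions.
import queue
import threading
import time
import urllib.request

outbox: "queue.Queue[dict]" = queue.Queue()

def call_service_b(message: dict, timeout_s: float = 2.0) -> None:
    req = urllib.request.Request(
        "http://service-b.internal/api/notify",   # hypothetical internal endpoint
        data=str(message).encode(),
        method="POST",
    )
    urllib.request.urlopen(req, timeout=timeout_s)  # raises on failure or timeout

def sender_loop(max_retries: int = 3) -> None:
    while True:
        message = outbox.get()                     # drained in the background
        for attempt in range(1, max_retries + 1):
            try:
                call_service_b(message)
                break
            except Exception:
                if attempt == max_retries:
                    # Contain the failure: log, alert, or dead-letter here
                    # instead of letting Service-B's outage cascade upstream.
                    print("giving up on", message)
                else:
                    time.sleep(2 ** attempt)       # simple backoff before retrying

threading.Thread(target=sender_loop, daemon=True).start()
outbox.put({"user_id": 42, "event": "profile_updated"})  # Service-A returns immediately
```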

If properly implemented, asynchronous point-to-point communication is excellent for invoking a function in another service or transmitting a small amount of information. This method can be implemented for the majority of cross-service communication needs. However, for larger data transfers, you’ll need to consider one of the methods below.

ETL

ETL jobs can be used to move a large amount of data from one datastore to another. Since these are generally implemented as separate batch processes, they won’t bring down the availability of your services. However, drawbacks include increased load on transactional databases (unless a read replica of the source DB is used) and the poor timeliness/consistency of data resulting from a periodic batch process.

ETL processes are likely best reserved for transferring data from your OLTP to OLAP systems, where immediate consistency isn’t required. If you need both a large amount of data transferred and up-to-the-second consistency, consider a DB read replica.
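
For illustration only, here is a toy ETL pass using sqlite stand-ins for the source replica and the warehouse; the schema, table names, and aggregation are assumptions, and a real job would run on a schedule (cron, a workflow engine, etc.) against your actual datastores.

```python
# Toy ETL pass: extract from a stand-in for the OLTP read replica, transform
# (aggregate) in the extract query, and load into a stand-in OLAP table.
import sqlite3

oltp = sqlite3.connect(":memory:")    # stands in for the source read replica
olap = sqlite3.connect(":memory:")    # stands in for the analytics warehouse

oltp.execute("CREATE TABLE orders (id INTEGER, amount REAL, day TEXT)")
oltp.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(1, 10.0, "2015-06-01"), (2, 25.0, "2015-06-01"), (3, 5.0, "2015-06-02")])

olap.execute("CREATE TABLE daily_revenue (day TEXT PRIMARY KEY, revenue REAL)")

def run_etl() -> None:
    # Aggregating in the source query keeps the transferred batch small.
    rows = oltp.execute("SELECT day, SUM(amount) FROM orders GROUP BY day").fetchall()
    olap.executemany("INSERT OR REPLACE INTO daily_revenue (day, revenue) VALUES (?, ?)", rows)
    olap.commit()

run_etl()
print(olap.execute("SELECT * FROM daily_revenue ORDER BY day").fetchall())
```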

DB Read-Replica

Most common databases (Oracle, MySQL, PostgreSQL) support native replication to read-only clones with replication lag measured in milliseconds. By placing a read replica of one service’s DB into another service’s swim lane, you can successfully transfer a large amount of data in near real time. The downsides are increased IOPS requirements and a lack of abstraction between services that — in contrast to the abstraction provided by an API — fosters greater service interdependency.

Conclusion

There are a variety of cross-service communication methods that preserve fault isolation between services. The trick is knowing the strengths and weaknesses of each and implementing the right method where appropriate.



Making Agile Predictable

One of the biggest complaints that we hear from businesses implementing agile processes is that they can’t predict when things will get delivered. Many people believe, incorrectly, that waterfall is a much more predictable development methodology. Let’s review a few statistics that demonstrate the “predictability” of the waterfall methodology.

A study of over $37 billion (USD) worth of US Defense Department projects concluded that 46% of the systems so egregiously failed to meet the real needs (although they met the specifications) that they were never successfully used, and another 20% required extensive rework (Larman, 1995).

In another study of 6,700 projects, it was found that four out of five key factors contributing to project failure were associated with or aggravated by the waterfall method, including the inability to deal with changing requirements and problems with late integration.

Waterfall is not a bad or failed methodology; it’s just a tool like any other that has its limitations. We as the users of that tool misused it and then blamed the tool. We falsely believed that we could fix the project scope (specifications), cost (team size), and schedule (delivery date). No methodology allows for that. As shown in the diagram below, waterfall is meant to fix scope and cost. When we also try to constrain the schedule, we’re destined to fail.

[Figure: project management triangle showing waterfall fixing scope and cost while the schedule varies]

Agile, when used properly, fixes the cost (team size) and the schedule (two-week sprints), allowing the scope to vary. This is where most organizations struggle, attempting to predict delivery of features when the scope of stories is allowed to vary. As a side note, if you think you can fix the scope and the schedule and vary the cost (team size), read Brooks’ 1975 book The Mythical Man-Month.

This is where the magical measurement of velocity comes into play. The velocity of a team is simply the number of units of work completed during the sprint. The primary purpose of this measurement is to provide the team with feedback on their estimates. As you can see in the graph below, it usually takes a few sprints to get into a controlled state where the velocity is predictable, and then it usually rises slightly over time as the team becomes more experienced.

[Figure: team velocity stabilizing after the first few sprints and rising slowly as the team gains experience]

Using velocity we can now predict when features will be delivered. We simply project out the best and worst velocities, and we can demonstrate with high certainty a best and worst delivery date for a set of stories that make up a feature.

[Figure: projecting the best- and worst-case velocities to bound the delivery date for a set of stories]
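
As a back-of-the-envelope sketch of that projection (every number below is invented, not drawn from a real team):

```python
# Project a best/worst delivery window from observed sprint velocities.
import math
from datetime import date, timedelta

sprint_velocities = [18, 22, 25, 24, 27, 26]   # points completed per two-week sprint
remaining_points = 120                          # stories left for the feature
sprint_length = timedelta(weeks=2)
next_sprint_start = date(2015, 7, 6)

best, worst = max(sprint_velocities), min(sprint_velocities)

def projected_finish(velocity: int) -> date:
    sprints_needed = math.ceil(remaining_points / velocity)
    return next_sprint_start + sprints_needed * sprint_length

print("earliest finish:", projected_finish(best))    # fewer sprints at the best velocity
print("latest finish:  ", projected_finish(worst))   # more sprints at the worst velocity
```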

Velocity helps us answer two types of questions. The first is the fixed scope question “when will we have X feature?” to which the answer is “between A and B dates”. The second question is the fixed time question “what will be delivered by the June releases?” to which the answer is “all of this, some of that, and none of those.” What we can’t answer is fixed time and fixed scope questions.

[Figure: fixed-scope questions yield a date range, while fixed-time questions yield a feature set]

It’s important to remember that agile is not a software development methodology, but rather a business process. This means that all parts of your organization must buy in to and participate in agile product development. Your sales team must get out of the mindset of committing to new product features with fixed time and scope when talking to existing or potential customers. When implemented correctly, agile provides faster time to market and higher levels of innovation than waterfall, which brings greater value to your customers. The tradeoff on the sales side is to stop making the long-term product commitments they made in the past (which, more often than not, were missed anyway)!

By utilizing velocity, keeping the team consistent, and phrasing the questions properly, agile can be a very predictable methodology. The key is understanding the constraints of the methodology and working within them instead of ignoring them and blaming the methodology.



Paying Down Your Technical Debt

During the course of our client engagements, there are a few common topics or themes that are almost always discussed, and the clients themselves usually introduce them. One such area is technical debt. Every team has it, almost every team believes they have too much of it, and no team has an easy time explaining to the business why it’s important to address. We’re all familiar with the concept of technical debt, and we’ve shared a client horror story in a previous blog post that highlights the dangers of ignoring it: Technical Debt


The words “technical debt” carry a negative connotation. However, the judicious use of tech debt is a valuable addition to your product development process. Tech debt is analogous to financial debt. Companies can raise capital to grow their business by either issuing equity or issuing debt. Issuing equity means giving up a percentage of ownership in the company and dilutes current shareholder value. Issuing debt requires the payment of interest, but does not give up ownership or dilute shareholder value. Issuing debt is good, until you can’t service it. Once you have too much debt and cannot pay the interest, you are in trouble.

Tech debt operates in the same manner. Companies use tech debt to defer performing work on a product. As we develop our minimum viable product, we build a prototype, gather feedback from the market, and iterate. The parts of the product that didn’t meet the definition of minimum or the decisions/shortcuts made during development represent the tech debt that was taken on to get to the MVP. This is the debt that we must service in later iterations. Taking on tech debt early can pay big dividends by getting your product to market faster. However, like financial debt, you must service the interest. If you don’t, you will begin to see scalability and availability issues. At that point, refactoring the debt becomes more difficult and time critical. It begins to affect your customers’ experience.


Many development teams have a hard time convincing leadership that technical debt is a worthy use of their time. Why spend time refactoring something that already “works” when you could use that time to build new features customers and markets are demanding now? The danger with this philosophy is that by the time technical debt manifests itself as a noticeable customer problem, it’s often too late to address it without a major undertaking. It’s akin to not having a disaster recovery plan when a major availability outage strikes. To get the business on board, you must make the case using language business leaders understand – again, this is often financial in nature. Be clear about the cost of such efforts and quantify the business value they will bring by calculating their ROI. Demonstrate the cost avoidance that is achieved by addressing critical debt sooner rather than later – calculate how much the cost will be in the future if the debt is not addressed now. The best practice is to get leadership to agree and commit to a certain percentage of development time that can be allocated to addressing technical debt on an ongoing basis. If they do, it’s important not to abuse this responsibility. Do not let engineers alone determine what technical debt should be paid down and at what rate – it must have true business value that is greater than or equal to spending that time on other activities.
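
One way to frame that conversation, sketched with entirely hypothetical numbers, is to put the cost of the refactoring next to the annual cost you expect to avoid:

```python
# Express a debt-paydown proposal in financial terms. Every figure is a
# placeholder to be replaced with your own estimates.
refactor_cost = 3 * 2 * 80 * 100            # 3 engineers x 2 sprints x 80 hrs x $100/hr loaded rate
incident_cost_avoided_per_year = 40_000     # expected outage/slowdown cost if the debt is left alone
velocity_drag_avoided_per_year = 25_000     # engineering time lost working around the debt

annual_benefit = incident_cost_avoided_per_year + velocity_drag_avoided_per_year
roi = (annual_benefit - refactor_cost) / refactor_cost

print(f"cost to refactor:    ${refactor_cost:,}")
print(f"annual cost avoided: ${annual_benefit:,}")
print(f"first-year ROI:      {roi:.0%}")
```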


Additionally, be clear about how you define technical debt so time spent paying it down is not comingled with other activities. Generally, bugs in your code are not technical debt. Refactoring your code base to make it more scalable, however, would be. A good test is to ask whether the path you chose was a conscious or unconscious decision; that is, did you decide to go in one direction knowing that you would later need to refactor? This, too, is analogous to financial debt; technical debt needs to be a conscious choice. You are making a specific decision to do or not do something knowing that you will need to address it later. Bugs found in sloppy code are not tech debt; they are just bad code.

So how do you decide what tech debt should be addressed, and how do you prioritize? If you have been tracking work with Agile storyboards and product backlogs, you should have an idea of where to begin. Also, if you track your problems and incidents like we recommend, this will show elements of tech debt that have begun to manifest themselves as scalability and availability concerns. We recommend budgeting 12-25% of your development effort for servicing tech debt. Set a budget and begin paying down the debt. If you are spending less than the lower end of that range, you are not investing enough effort. If you are spending over 25%, you are probably fixing issues that have already manifested themselves and are trying to catch up. Setting an appropriate budget and maintaining it over the course of your development efforts will pay down the interest and help prevent issues from arising.

Taking on technical debt to fund your product development efforts is an effective method to get your product to market quicker. But, just like financial debt, you need to take on an appropriate amount of tech debt that you can service by making the necessary interest and principal payments to reduce the outstanding balance. Failing to set an appropriate budget will result in a technical “bankruptcy” that will be much harder to dig yourself out of later.



When Should You Split Services?

The Y axis of the AKF Scale Cube indicates that growing companies should consider splitting their products along service- (verb) or resource- (noun) oriented boundaries. A common question we receive is “how granular should one make a services split?” A similar question is “how many swim lanes should our application be split into?” To help answer these questions, we’ve put together a list of considerations based on developer throughput, availability, scalability, and cost. By considering these, you can decide if your application should be grouped into a large, monolithic codebase or split up into smaller individual services and swim lanes. You must also keep in mind that splitting too aggressively can be overly costly and have little return for the effort involved. Companies with little to no growth will be better served focusing their resources on developing a marketable product than by fine-tuning their service sizes using the considerations below.

 

Developer Throughput:

Frequency of Change – Services with a high rate of change in a monolithic codebase cause competition for code resources and can create a number of time-to-market-impacting conflicts between teams, including merge conflicts. Such high-change services should be split off into small, granular services and ideally placed in their own fault-isolative swim lane so that frequent updates don’t impact other services. Services with low rates of change can be grouped together, as there is little value created from disaggregation and a lower risk of being impacted by updates.

The diagram below illustrates the relationship we recommend between functionality, frequency of updates, and relative percentage of the codebase. Your high risk, business critical services should reside in the upper right portion being frequently updated by small, dedicated teams. The lower risk functions that rarely change can be grouped together into larger, monolithic services as shown in the bottom left.

[Figure: frequency of updates versus breadth of functionality – high-change, business-critical services are split small and owned by dedicated teams, while low-change functions are grouped into larger services]

Degree of Reuse – If libraries or services have a high level of reuse throughout the product, consider separating and maintaining them apart from code that is specialized for individual features or services. A service in this regard may be something that is linked at compile time, deployed as a shared, dynamically loadable library, or operated as an independent runtime service.

Team Size – Small, dedicated teams can handle micro services with limited functionality and high rates of change, or large functionality (monolithic solutions) with low rates of change. This will give them a better sense of ownership, increase specialization, and allow them to work autonomously. Team size also has an impact on whether a service should be split. The larger the team, the higher the coordination overhead inherent to the team and the greater the need to consider splitting the team to reduce codebase conflict. In this scenario, we are splitting the product up primarily based on reducing the size of the team in order to reduce product conflicts. Ideally splits would be made based on evaluating the availability increases they allow, the scalability they enable or how they decrease the time to market of development.

Specialized Skills – Some services may need development skills that are distinct from those of the remainder of the team. You may, for instance, need some portion of your product to run very fast. That portion may require a compiled language and a great depth of knowledge in algorithms and asymptotic analysis. The engineers who build it may have a completely different skillset than those working on the remainder of your code base, which may in turn be interpreted and mostly focused on user interaction and experience. In other cases, you may have code that requires deep domain experience in a very specific area like payments. Each of these is an example of a consideration that may indicate a need to split out a service and that may inform the size of that service.

 

Availability and Fault Tolerance Considerations:

Desired Reliability – If other functions can afford to be impacted when the service fails, then you may be fine grouping them together into a larger service. Indeed, sometimes certain functions should NOT work if another function fails (e.g. one should not be able to trade in an equity trading platform if the solution that understands how many equities are available to trade is not available). However, if you require each function to be available independent of the others, then split them into individual services.

Criticality to the Business – Determine how important the service is to business value creation while also taking into account the service’s visibility. One way to view this is to measure the cost of one hour of downtime against a day’s total revenue. If the business can’t afford for the service to fail, split it up until the impact is more acceptable.

Risk of Failure – Determine the different failure modes for the service (e.g. a billing service charging the wrong amount), the likelihood and severity of each failure mode occurring, and how likely you are to detect the failure should it happen. The higher the risk, the greater the segmentation should be.
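
A rough way to put numbers on the last two considerations is sketched below. The downtime comparison follows the text above (one hour of downtime against a day’s revenue); the risk score borrows an FMEA-style likelihood × severity × detectability product, which is one reasonable formalization rather than the only one. All figures are invented.

```python
# Rough scoring for criticality and risk of failure; all numbers are illustrative.
daily_revenue = 240_000
revenue_lost_per_down_hour = daily_revenue / 24          # assumes roughly even traffic across the day
print(f"1 hour of downtime ~ {revenue_lost_per_down_hour / daily_revenue:.1%} of a day's revenue")

failure_modes = [
    # (description, likelihood 1-5, severity 1-5, chance of going undetected 1-5)
    ("billing service charges the wrong amount", 2, 5, 4),
    ("search results are slightly stale",         4, 2, 2),
]
for name, likelihood, severity, undetected in failure_modes:
    score = likelihood * severity * undetected
    print(f"{name}: risk score {score}")   # higher score -> stronger case for segmentation
```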

 

Scalability Considerations:

Scalability of Data – A service may already be a small percentage of the codebase, but as the data that the service needs to operate scales up, it may make sense to split again.

Scalability of Services – What is the volume of usage relative to the rest of the services? For example, one service may need to support short bursts during peak hours while another has steady, gradual growth. If you separate them, you can address their needs independently without having to over engineer a solution to satisfy both.

Dependency on Other Service’s Data – If the dependency on another service’s data can’t be removed or handled with an asynchronous call, the benefits of disaggregating the service probably won’t outweigh the effort required to make the split.

 

Cost Considerations:

Effort to Split the Code – If the services are so tightly bound that it will take months to split them, you’ll have to decide whether the value created is worth the time spent. You’ll also need to take into account the effort required to develop the deployment scripts for the new service.

Shared Persistent Storage Tier – If you split off the new service, but it still relies on a shared database, you may not fully realize the benefits of disaggregation. Placing a read-only DB replica in the new service’s swim lane will increase performance and availability, but it can also raise the effort and cost required.

Network Configuration – Does the service need its own subdomain? Will you need to make changes to load balancer routing or firewall rules? Depending on the team’s expertise, some network changes require more effort than others. Ensure you consider these changes in the total cost of the split.

 

 

The illustration below can be used to quickly determine whether a service or function should be segmented into smaller microservices, be grouped together with similar or dependent services, or remain in a multifunctional, infrequently changing monolith.

[Figure: decision matrix for segmenting a service into smaller microservices, grouping it with similar or dependent services, or leaving it in an infrequently changing monolith]



The Downside of Stored Procedures

I started my engineering career as a developer at Sybase, the company producing a relational database going head-to-head with Oracle. This was in the late 80’s, and Sybase could lay claim to a very fast RDBMS with innovations like one of the earliest client/server architectures, procedural language extensions to SQL in the form of Transact-SQL, and yes, stored procedures. Fast forward a career, and now as an Associate Partner with AKF, I find myself encountering a number of companies that are unable to scale their data tier, primarily because of stored procedures. I guess it is true – you reap what you sow.

Why does the use of stored procedures (sprocs) inhibit scalability? To start with, a database filled with sprocs is burdened by the business logic contained in those sprocs. Rather than relatively straightforward CRUD statements, sprocs typically become the main application ‘tier’, often containing hundreds of lines of SQL each. So, in addition to persisting and retrieving your precious data, maintaining referential integrity and transactional consistency, you are now asking your database to run the bulk of your application as well. Too many eggs in one basket.

And that basket will remain a single basket, as it is quite difficult to horizontally scale a database filled with sprocs. Of the three axes described in the AKF Scale Cube, the only one that readily enables horizontal scalability here is the Z axis, where you have divided or partitioned your data across multiple shards. The X axis, the classic horizontal approach, works only if you are able to replicate your database across multiple read-only replicas and segregate read-only activity out of your application, which is difficult to do if many of those reads come from sprocs that also write. Additionally, most applications built on top of a sproc-heavy system have no DAL, no data access layer that might control or route read access to a different location than writes. The Y axis, dividing your architecture up into services, is also difficult in a sproc-heavy environment, as these environments are often extremely monolithic, making it very difficult to separate sprocs by service.
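
To illustrate the missing seam, here is a minimal sketch of a data access layer that routes reads to a replica connection and writes to the primary, with the business logic living in application code rather than in a sproc. The sqlite connections and the carts table are stand-ins invented for the example.

```python
# Minimal DAL sketch: reads go to the replica (X-axis), writes to the primary,
# and business logic stays in application code instead of stored procedures.
import sqlite3   # stands in for your real driver (MySQL, PostgreSQL, Oracle, ...)

class DataAccessLayer:
    def __init__(self, primary, replica) -> None:
        self.primary = primary
        self.replica = replica

    def query(self, sql: str, params: tuple = ()) -> list:
        return self.replica.execute(sql, params).fetchall()   # read-only traffic -> replica

    def execute(self, sql: str, params: tuple = ()) -> None:
        self.primary.execute(sql, params)                      # writes -> primary
        self.primary.commit()

# For this standalone sketch both handles point at one sqlite database; in
# production they would be separate hosts kept in sync by native replication.
conn = sqlite3.connect(":memory:")
dal = DataAccessLayer(primary=conn, replica=conn)
dal.execute("CREATE TABLE carts (id INTEGER, total REAL)")
dal.execute("INSERT INTO carts VALUES (?, ?)", (1, 100.0))
# The 10% discount rule lives here, in the application tier, not in a sproc.
print([(cart_id, round(total * 0.9, 2)) for cart_id, total in dal.query("SELECT id, total FROM carts")])
```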

Your database hardware is typically the beefiest, most expensive box you will have. Running business logic on it, rather than on the smaller commodity boxes typically used for an application tier, simply makes the cost of computation inordinately high. Also, many companies will deploy object caching (e.g. memcached) to offload their database. This works fine when the application is separate from the database, but when the reads are primarily done in sprocs? I’d love to see someone attempt to insert memcached into their database.

Sprocs seem to be something of a gateway drug, where once you have tasted the flavor of application logic living in the database, the next step is to utilize abilities such as CLR (common language runtime). Sure, why not have the database invoke C# routines? Why not ask your database machine to also perform functions such as faxing? If Microsoft allows it, it must be good, right?

Many of the companies we visit that are suffering from stored procedure bloat started out building a simple application that was deployed in a turnkey fashion for a single customer with a small number of users. In that world, there are certainly benefits to using sprocs, but that was the 20th century; the approach is no longer viable in today’s world of cloud and SaaS models.

Just say no to stored procedures!



Selecting Metrics for Your Agile Teams

One of our favorite sayings is “you can’t improve that which you do not measure.” When working with clients, we often emphasize the need to select and track performance metrics. It’s quite surprising (disheartening really) to see how many companies are limping along with decision-making based entirely on intuition. Metrics-driven institutions demonstrably outperform those that rely on “gut feel” and are able to quickly refocus efforts on projects that offer the greatest ROI.

Just as your top-level business KPIs govern strategic decision making, your agile teams (and their respective services) need their own “tactical” metrics to focus efforts, guide decision making, and make performance measurable. The purpose of agile development is to deliver high quality value to your customers in an iterative fashion. Agile facilitates rapid deployment, but also allows you to garner feedback from your customers that will shape the product. Absent a set of KPIs, you will never truly understand the effectiveness of your process. Getting it right, however, isn’t an easy task. Poorly chosen metrics won’t reflect the quality of service, will be easily gamed or difficult to track, and may result in distorted incentives or undesirable outcomes.

In contrast, well-chosen metrics make it simple to track performance and shape team incentives. For example, a search service could be graded against the speed and accuracy of search results while the shopping cart service is measured on the percentage of abandoned shopping carts. These metrics are simple, easy to track, difficult to game, and directly reflect the quality of service.

Be sure to dedicate the time and the mental energy needed to pick the right metrics. Feedback from team members is essential, but the final selection isn’t something you can delegate. After all, if I’m allowed to pick my own performance metrics, I can assure you I’m always going to look awesome.

To keep you on the right track, below is a checklist of considerations to take into account before finalizing the selection of metrics for your agile teams:

  1. A handful of carefully chosen metrics should be preferred over a large volume of metrics. Ideally, each Agile team (and respective service) should be evaluated on and tasked with improving 2-3 metrics (no more than 5). We have witnessed at least one company that proposed a total of 20 different metrics to measure an agile team’s performance! Needless to say, being graded on that many metrics is disempowering at best, and likely to elicit either a panic attack or total apathy from your team. In contrast, having only a handful of metrics to be graded against is empowering and helps to focus efforts.
  2. Easy to collect and/or calculate. One startup suggested they would track “Engineering Hours Spent Bug-Fixing” as a way to determine code quality. The issue was quickly raised: who would be doing this tracking, and how much time/effort did they estimate it would take? It became obvious that tracking the exact amount of time spent would add a heavy productivity tax to an already burdened engineering team. While providing a very granular measure, the cost of collecting this information simply outweighed the benefits. Ultimately we helped them decide that the “Number of Customer Service Tickets per Week” was the right metric. Sometimes a cruder measure is the right choice, especially if it is easier to collect and act upon.
  3. Directly Controllable by the Team. Choose metrics that your agile team has more or less direct control over. A metric they contribute towards indirectly is less empowering than something they know they own. For example, when measuring a search service the “Speed and Accuracy of Search” is preferable to “Overall Revenue” which the team only indirectly controls.
  4. Reflect the Quality of Service. Be sure to pick metrics that reflect the quality of that service. For instance, the number of abandoned shopping carts reflects the quality of a shopping cart service, whereas number of shopping cart views is an input metric but doesn’t necessarily reflect service quality.
  5. Difficult to Game. The innate human tendency to game any system should be held in check by selecting metrics that can’t easily be gamed. Simple velocity measures are easily (read: notoriously) gamed, while the number of “Severity 1” incidents your service caused can’t be so easily massaged.
  6. Near Real Time Feedback. Metrics that can be collected and presented over short time intervals are the most actionable. Information is more valuable when fresh; providing availability data weekly (or even daily) will foster better results than a year-end update.

Most importantly, well-chosen metrics tracked regularly should pervade all aspects and all levels of your business. If you want your business to become a lean, performance driven machine, you need to step on the scale every day. It can often be uncomfortable, but it’s necessary to get the returns of which you are capable.



Common Cloud Misconceptions

Over the course of the last year, we have seen several of our clients either start exploring or make plans to move their SaaS products to the “Cloud” or an IaaS provider. We thought we would share some of the misconceptions we sometimes see and our advice.

– I can finally focus on product development and software engineering and not worry about this infrastructure stuff.
The notion that IaaS providers like Amazon have eliminated your worries about infrastructure is only partially true. You may not need to understand everything about designing an infrastructure on bare metal, but you need to make sure you understand how your virtual configuration in the cloud will affect your product. IaaS helps us quickly deploy infrastructure for our products, but it doesn’t eliminate the need for good high availability and fault tolerance design. There are several levers you can pull within an IaaS console, and design decisions that will impact your product’s performance. To ensure good design and configuration, it’s critically important that your SaaS product engineering team is made up of talent with expertise in distributed application architecture design, infrastructure, security, and networking. Having this knowledge will help you design a high performing, fault-isolated product for your business.

– Going to the cloud pretty much guarantees me high availability because of auto scaling.
Going to the cloud will provide you with the ability to scale quickly as load increases, but it will not provide you with high availability. For example, if you have deployed a monolithic code base and you are pushing to production on a regular basis, there is a pretty good chance you will at some point introduce a defect that impacts the availability of your entire service and business. We advise our clients to split their applications appropriately, deploy the services to separate instances, and, assuming you are using Amazon, configure them to run across multiple zones within a region at a minimum (preferably across regions). This allows you to focus dedicated teams on the individual services and reduces the likelihood of introducing a defect that takes down your system.

– The Cloud will be cheaper than a collocation or managed hosting provider.
There are several factors that need to be considered before you can confirm that the cloud is cost effective for you. You should look closely at the load on your servers. If your servers are serving traffic around the clock, it may be better from a cost perspective for you to buy and maintain your own infrastructure in a collocation facility or in an existing data center you may have. The economics of this decision are changing rapidly as IaaS pricing declines due to the competition in the industry. A simple spreadsheet exercise will help determine if the move to the cloud would be cost effective for your business.
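
That spreadsheet exercise can be as simple as the sketch below; every number is a placeholder to be replaced with your own server counts, utilization, and pricing.

```python
# Back-of-the-envelope cloud vs. colocation monthly cost comparison.
servers_needed_at_peak = 20
avg_utilization = 0.35                  # fraction of the time the fleet is actually needed
iaas_hourly_rate = 0.28                 # on-demand price per instance-hour (placeholder)
colo_monthly_per_server = 95.0          # amortized hardware + space + power + remote hands
hours_per_month = 730

# In the cloud you can (roughly) pay only for the capacity you use;
# in a colo you pay for peak capacity around the clock.
cloud_monthly = servers_needed_at_peak * avg_utilization * iaas_hourly_rate * hours_per_month
colo_monthly = servers_needed_at_peak * colo_monthly_per_server

print(f"estimated cloud spend: ${cloud_monthly:,.0f}/month")
print(f"estimated colo spend:  ${colo_monthly:,.0f}/month")
```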

– The Cloud isn’t secure so we better not use it.
The cloud isn’t necessarily what makes or breaks security for your SaaS product. Many believe that public cloud services like Amazon’s EC2 aren’t secure. First off, you are far more likely to experience a security breach because of an employee’s actions (intentional or unintentional) than one caused by an infrastructure provider. Secondly, Amazon has likely invested much more in security at various layers, including its physical data centers, than most companies we see who run their own data centers. Amazon has designed its infrastructure to isolate customer instances, and you can also choose to take advantage of Amazon Virtual Private Cloud, which can be configured to create an isolated network. There are various options for encrypting all of your data as well. This only scratches the surface of the security design options available, and they continue to be enhanced every day. You can see why it’s important to staff your team with an engineer who has experience in this space.

If you are looking to move to the cloud, don’t rush into the decision. Do your homework and make sure it’s right for your business. Make sure you have talent with experience in the technology that will get you there and run your operations. Once you make the leap, you will have to live with it for a while.



Enablement

One of the most important aspects of managing a successful technology organization is ensuring that you are practicing and instilling the concept of enablement at all levels. This concept applies both to the product/service you are producing and to your people. A good example for your organization is enabling decision-making at the lowest levels possible. I have often seen this represented as “delegation”, but I believe that enablement of decision-making is a more powerful concept than delegation, which is driven from the top down. I recently had the opportunity to lead a large infrastructure team, and one of the first changes we made was breaking apart into reasonably sized PODs with the primary purpose of ensuring that decisions for the product & technology were driven from the bottom up while guidance flowed in from various stakeholders. Many teams practice a flavor of Agile, but without enabling each POD to make the appropriate decisions you will run into organizational scalability problems.

The allure of IaaS & PaaS is firmly rooted in the concept of enablement. Self-service is amazing whether you are a DBA, a developer, or even the end user of your product. The cloud may not be suitable for your needs, but don’t let that stop your organization from thinking strategically about bringing those processes & technologies “in-house” for scalability reasons. Implemented correctly, the infrastructure and platform you are building should enable users rather than hinder them, as is sometimes the case. Reducing the number of dependencies between technology teams for launching products is not only good for the cycle time of product launches but also critical to scaling up.

Consider making enablement part of your technology team’s DNA and you will likely see that employee morale, productivity and other metrics like NPS will rise.


The Biggest Mistake with Agile

At least 75% of the dev shops that we see are using some form of Agile. Very few are following a pure form of any specific flavor (e.g. Scrum, Extreme Programming); rather, most are using some hybrid method. Some teams measure velocity while others don’t. Some teams have dedicated ScrumMasters while others have the engineering managers perform this role. While most teams’ processes could be tweaked, none of these are the real problem.


The biggest mistake companies make implementing Agile and thus the cause of most of their problems is they don’t understand that Agile is a business process, not a software development methodology. Thus, the business owners or their delegates, product managers, must be involved at every step.

We’ve argued before that Agile teams must sit together because communication degrades at a rate proportional to the square of the distance between people. Not having product managers involved with the Agile team through the entire process and (if you’ve moved from a Waterfall methodology) not having detailed specifications is the worst possible scenario. Developers either need someone sitting beside them to help with product decisions (Agile) or a detailed spec to work from (Waterfall).

Agile is a business process which requires the business to be involved in the product development process. It does not mean you get to stop writing specs and not be involved.



Quality Products

The question was raised to us recently: how can you tell if you are building quality products? I think this is an interesting question because you can answer it from two different perspectives – internally or externally. Quality is a fairly nebulous bucket that can include simple bugs, broken functionality, poorly designed flows, and even performance problems. For this discussion we can borrow the ISO 8402-1986 definition of quality as “the totality of features and characteristics of a product or service that bear on its ability to satisfy stated or implied needs.” In short, if our service or product is not meeting the needs of our customers, then it is falling short with regard to quality.

The external perspective of quality comes from your customers. Signs of poor quality products include customers leaving, customers complaining (on forums or to customer service reps), or even slow adoption rates by new customers. All of these are sure signs that you’re doing something wrong. If you haven’t revamped major functionality or a competitor hasn’t launched some new feature set then functionality is probably not to blame and you should look at the overall quality of your product. The problem with using customers to verify the quality of your product is that once they’ve experienced poor quality, it’s too late. They’ve already had the bad experience and might be considering leaving or not recommending your product to friends.

The internal perspective of quality includes what your teams are doing to ensure they are delivering a high quality product. This does not just include QA; in fact, you cannot test quality into the product. Quality relies much more on good product management and solid engineering than on testing. Here are a few processes that tend to improve quality and indicate that you are likely building quality products:

  • Cross Functional Design – Whether you are following JAD/ARB processes or DevOps, getting technical operations engineers, quality assurance engineers, and software developers together to design will produce a much higher quality and more scalable product.
  • Coding Standards – This includes branching strategies, documentation requirements, formatting (e.g. tabs vs spaces), variable naming conventions, patterns, frameworks, technology stacks, and a defined process for introducing new standards or questioning the use of existing ones.
  • Unit Tests – In my opinion, one of the most important measures of quality is the creation of unit tests. Achieving 75% or greater code coverage in unit tests seems to indicate the delivery of a higher quality product. Creating these tests ahead of time in a Test Driven Development (TDD) methodology is even better (a minimal example follows this list).
  • Code Reviews – One of AKF’s mantras is “Everything Fails” and this includes people. Buddy checking code is a simple way to help catch these failures. It also serves to help spread knowledge across the team about different ways of coding and solving problems.
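
Returning to the unit test point above, here is a minimal test-first example; the apply_discount() helper and its rules are invented purely for illustration.

```python
# A tiny TDD-style example: the tests describe the behavior we want from a
# hypothetical apply_discount() helper, and the implementation satisfies them.
import unittest

def apply_discount(total: float, percent: float) -> float:
    """Return the total after a percentage discount."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(total * (1 - percent / 100), 2)

class ApplyDiscountTest(unittest.TestCase):
    def test_ten_percent_off(self):
        self.assertEqual(apply_discount(100.0, 10), 90.0)

    def test_zero_discount_is_identity(self):
        self.assertEqual(apply_discount(59.99, 0), 59.99)

    def test_invalid_percent_rejected(self):
        with self.assertRaises(ValueError):
            apply_discount(100.0, 150)

if __name__ == "__main__":
    unittest.main()
```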

We all rely on customers to provide feedback on our product’s quality to some degree but we are better off building and designing quality into our products in the first place. Focus on the internal perspective of quality and then use the external perspective to validate your insights.

