AKF Partners

Abbott, Keeven & Fisher Partners: Partners In Hyper Growth

AKF Turns 10 – And It’s Still Not About the Tech

The caller ID was blocked, but Marty had been expecting the call. Three “highly connected” people (donors, political advisers, and “inner circle” people) had suggested AKF could help. It was October 2013, and Healthcare.gov had launched only to crash when users tried to sign up. President Obama appointed Jeffrey Zients to mop up the post-launch mess. Once the crisis was over, the Government Accountability Office (GAO) released its postmortem, citing inadequate capacity planning, software coding errors, and lack of functionality as root causes. AKF’s analysis was completely different, largely because we think differently than most technologists. While our findings identified the bottlenecks that kept the site from scaling, we also identified failures in leadership and a dysfunctional organizational structure. These latter, more important problems prevented the team from identifying and preventing recurring issues.

We haven’t always thought differently. Our early focus in 2007 was to help companies overcome architectural problems related to scale and availability. We’ve helped our clients solve some of the largest and most challenging problems ever encountered: Cyber Monday ecommerce purchasing, Christmas Day gift card redemption, and April 15th tax filings. But shortly after starting our firm, we realized there was something common to our early engagements that created, and sometimes turbocharged, the technology failures. This realization, that people and processes (NOT TECHNOLOGY) are the causes of most failures, led us to think differently. Too often we see technology leaders focusing too much on the technology and not enough on leading, growing, and scaling their teams.

We challenge the notion that technology leaders should be selected and promoted based on their technical acumen. We don’t accept that a technical leader should spend most of her time making the biggest technical decisions. We believe that technical executives, to be successful, must first be business executives with great technical and business acumen. We teach teams how to analyze and successfully choose the appropriate architecture, organization, and processes to achieve a business outcome. Product effort is meaningless without a measurable and meaningful business outcome, and we always put outcomes, not technical “religion,” first.

If we can teach a team the “AKF way,” the chance of project and business success increases dramatically. This may sound like marketing crap (did we mention we are also irreverent?), but our clients attest to it. This is what Terry Chabrowe, CEO of eMarketer, said about us:

AKF served as our CTO for about 8 months and helped us make huge improvements in virtually every area related to IT and engineering. Just as important, they helped us identify the people on our team who could move into leadership positions. The entire AKF team was terrific. We’d never have been able to grow our user base tenfold without them.

A recent post claimed that 93% of successful companies abandon their original strategy. This is certainly true for AKF. Over the past 10 years we’ve massively changed our strategy for how we “help” companies. We’ve also quadrupled our team size, worked with over 350 companies, written three books, and, most importantly, made some great friendships. Whether you’ve read our books, engaged with our company, or connected with us on social media, thanks for an amazing 10 years. We look forward to the next 10 years of learning, teaching, and changing strategies with you.


AKF Interim Leadership Case Study

Read about one of our success stories in which we filled an interim CTO role at a marketing subscription company in New York.

AKF Long-term Case Study


Big Data Anti-Patterns

Recent research by Gartner indicates that while Big Data investment continues to grow, the intent to invest in Big Data projects is finally showing signs of tapering. While this is a natural part of the hype cycle, poor ROI on Big Data projects continues to impact the industry. Gartner cites a “lack of business leadership” around Big Data initiatives as a primary cause. We’ve kept a watchful eye on these Big Data trends to better serve our clients.

In a recent presentation, Marty Abbott addressed the struggles of getting Big Data analytics right. In it, he identified seven anti-patterns that hamper the proper implementation of Big Data initiatives and suggested remedies for each.

Anti-Pattern #1 — Assuming you know the answers.

By setting out to confirm a particular hypothesis, you end up building an analytics system hard-wired to answer a narrow set of questions. This lack of extensibility blinds you to deeper insights.

Gain greater insights by focusing on correlations without presumption, and adopt an analytics process that follows a continuous cycle of induction, hypothesis, deduction, and data creation (see the hermeneutic cycle).

Anti-Pattern #2 — No QA in analytics.

Big Data analytics systems are just as prone to bugs and defects as production systems, if not more so. All too often, analytics systems are designed as though execution will always be perfect.

Improve QA by building a pre-processing, cleaning stage in your data flow while retaining raw immutable data elsewhere. Maintain prior results and implement the ability to run analytics over your entire dataset to reverify results.
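As a minimal sketch of this remedy, assuming a toy dataset of purchase events (the `RawEvent` type and the per-user total are hypothetical stand-ins for a real pipeline), the Python below keeps raw records immutable, cleans them into a separate structure, and re-runs the whole computation to check a stored prior result:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RawEvent:
    """Hypothetical raw record as it lands from a source system; frozen so it cannot be mutated."""
    user_id: str
    amount: str  # raw feeds often arrive as untyped strings


def clean(raw):
    """Cleaning stage: parse and filter into a new list, leaving the raw records untouched."""
    cleaned = []
    for event in raw:
        try:
            amount = float(event.amount)
        except ValueError:
            continue  # malformed value; a real pipeline would log or quarantine it
        if event.user_id and amount >= 0:
            cleaned.append((event.user_id, amount))
    return cleaned


def total_by_user(rows):
    """The analytics step itself: total amount per user."""
    totals = {}
    for user_id, amount in rows:
        totals[user_id] = totals.get(user_id, 0.0) + amount
    return totals


def reverify(raw, prior_result):
    """Re-run cleaning and analytics over the entire raw dataset and compare with a stored prior result."""
    return total_by_user(clean(raw)) == prior_result
```

Because the raw tuple is never modified, a fix to `clean` can be validated by re-running `reverify` against every previously stored result.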

Anti-Pattern #3 — Using production engineers to run your analytics.

Big Data involves different skill sets, different technologies, and, perhaps most importantly, a different mindset from production operations. Data scientists and analytics experts are explorers who theorize correlations, while production developers are operationalists, making pragmatic decisions to keep systems running.

Improve effectiveness by creating a separate team focused on implementing analytics architecture.

Anti-Pattern #4 — Using OLTP Datastore for Analytics

Using your transactional datastore for analytics will impact the performance of your operational systems. Furthermore, transactional and analytics data have different requirements for normalization and response time latency.

Separate analytics from transactional systems by using a separate Operational Datastore and ingesting data from your application in an asynchronous fashion.
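A minimal sketch of asynchronous ingestion, assuming in-memory stand-ins: a dict plays the transactional store, a list plays the separate analytics store, and Python's `queue.Queue` plays the message broker. The write path only enqueues, so analytics work never sits inside the transaction:

```python
import queue
import threading


def run_pipeline(orders):
    """Write each order to the OLTP store and hand it to analytics asynchronously.

    The dict and list are hypothetical stand-ins for a transactional database and a
    separate analytics store; the queue stands in for a message broker.
    """
    oltp = {}
    analytics = []
    events = queue.Queue()

    def ingest():
        # Background consumer: loads events into the analytics store, off the request path.
        while True:
            item = events.get()
            if item is None:  # sentinel meaning "no more events"
                break
            analytics.append(item)

    worker = threading.Thread(target=ingest)
    worker.start()
    for order_id, amount in orders:
        oltp[order_id] = amount         # the transactional write
        events.put((order_id, amount))  # publish and move on; never waits for analytics
    events.put(None)
    worker.join()                       # only so this demo can return the final state
    return oltp, analytics
```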

Anti-Pattern #5 — Ignoring Performance Requirements

Performance matters. The longer you take to produce results the more out of date they become. Furthermore, no one can leverage results if they can’t quickly and easily access them.

Improve analytics performance using cluster computing technologies such as Apache Spark and Hadoop. Store frequently queried results in a Datamart for quick access.
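The Datamart idea can be sketched as a one-time aggregation whose small result is what dashboards read. The daily-revenue query is a hypothetical example, and in practice the aggregation would run as a Spark or Hadoop job rather than a Python loop:

```python
from collections import defaultdict


def build_daily_revenue_mart(orders):
    """Pre-aggregate revenue per day once, so dashboards read a small table
    instead of rescanning every raw order on each query."""
    mart = defaultdict(float)
    for day, amount in orders:
        mart[day] += amount
    return dict(mart)
```

A dashboard then answers "revenue on a given day" with a single lookup such as `mart.get(day, 0.0)`.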

Anti-Pattern #6 — OLTP is the only input to your Big Data system

If transactional data is the only input to your analytics, you’re missing out on the much larger picture.

Customer interactions, social media scrapes, stock performance, Google analytics data, and transaction logs should be brought together to form a comprehensive picture.

Anti-Pattern #7 — Anyone can run their own Analytics

A well-built analytics system can become the victim of its own success. As it becomes more popular among stakeholders, overall performance may tank.

To preserve performance, move aggregated data out of analytics processing into OLAP/Datamarts and create a chargeback system allocating resources to users/departments.
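A chargeback model can be as simple as allocating the analytics platform's cost in proportion to usage. The sketch below assumes a hypothetical query log of (department, compute seconds) pairs and a fixed monthly cost; real systems might meter CPU, memory, or storage instead:

```python
from collections import Counter


def chargeback(query_log, monthly_cost):
    """Allocate the analytics cluster's monthly cost to departments
    in proportion to the compute seconds each one consumed."""
    usage = Counter()
    for department, compute_seconds in query_log:
        usage[department] += compute_seconds
    total = sum(usage.values())
    if total == 0:
        return {}  # nothing used, nothing to allocate
    return {dept: monthly_cost * secs / total for dept, secs in usage.items()}
```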


Why Airline On-Time Reliability Is Not Likely To Improve

As a frequent traveler who can exceed 250K air miles per year, I experience flight delays almost every week. Waiting for a delayed flight recently gave me time to think about how airlines experience the multiplicative effect of failure just like our systems do. If you’re not familiar with this phenomenon, think back to the holiday lights your parents put up. If any one bulb burned out, the entire string went dark. The reason was that all 50 lights in the string were wired in series, so either all of them worked or none of them did. Our systems are often designed such that many servers, databases, network devices, etc. must work for every single request. If any device or system in that request-response chain is unavailable, the entire process fails. The problem is that the availabilities multiply: if five devices each have 99.99% availability, the total availability of the overall system is that availability raised to the power of the number of devices, i.e. 99.99%^5 ≈ 99.95%. This has become even more of a problem with the advent of microservices (see our paper on how to fix this problem, http://reports.akfpartners.com/2016/04/20/the-multiplicative-effect-of-failure-of-microservices/). Now, let’s get back to airlines.
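The arithmetic is easy to check. A minimal Python sketch, assuming component failures are independent (the usual simplification behind this calculation):

```python
import math


def series_availability(availabilities):
    """Availability of components wired in series: every one must be up,
    so the individual availabilities (as fractions) multiply."""
    return math.prod(availabilities)
```

For five components at 99.99% this gives about 99.95%; doubling the chain to ten components drops it to about 99.90%.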

Airlines spend a lot on airplanes (the average Boeing 737-900 costs over $100M, http://www.boeing.com/company/about-bca/index.page%23/prices), and they want to make use of those assets by keeping them in the air rather than on the ground. To accomplish this, airlines schedule airplanes on back-to-back trips. For example, a recent United flight from SFO to SAN was serviced by Airbus A320 #4246, which had the following schedule for the day:

From   Departs    To     Arrives
SFO    8:05am     SAN    9:44am
SAN    10:35am    SFO    12:15pm
SFO    1:20pm     SEA    3:28pm
SEA    4:18pm     SFO    6:29pm
SFO    7:30pm     SAN    9:05pm

If I’m on the last flight of the day (SFO->SAN), each previous flight has to be “available” (on time) for my request (flight) to have any chance of being serviced properly (on time). As you can see, this aircraft schedule (SFO->SAN->SFO->SEA->SFO->SAN) is very similar to a request-response chain for a website (browser->router->load balancer->firewall->web server->app server->DB). The availability of the entire system is the product of the availabilities of each component. Is it any wonder that airlines have such dismal availability? See the chart below for August 2016: El Al comes in at a dismal 36.81%, my preferred airline, United, at 77.88%, and the very best, KLM, at only 91.07%.

(Chart: airline on-time performance, August 2016. Source: http://www.flightstats.com/company/monthly-performance-reports/airlines/)

So, how do you fix this? We scale our systems by splitting them either by common attributes such as customers (Z-axis) or by different attributes such as services (Y-axis). To maintain high availability, we ensure that we have fault isolation zones around the splits. Think of this as taking the string of 50 lights, breaking it into 5 sets of 10 lights, and ensuring that no set affects another set. This allows us to scale up by adding sets of 10 lights while maintaining or even improving our availability.
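To put rough numbers on the holiday-light analogy, the sketch below assumes each bulb (standing in for a server or service) works independently with probability `p`; the value 0.99 used in the check is illustrative only. A single series string is all-or-nothing, while independent sets of 10 each fail on their own:

```python
def single_string(p, n):
    """One series string of n bulbs: fully lit with probability p**n, otherwise completely dark."""
    return p ** n


def lit_per_set(p, n, sets):
    """Split n bulbs into `sets` independent strings of n // sets bulbs (n assumed divisible).

    A failed bulb darkens only its own string, so each set is lit with
    probability p ** (n // sets), which is also the expected fraction of bulbs lit.
    """
    return p ** (n // sets)


def all_dark(p, n, sets):
    """Probability that every independent set has at least one failed bulb."""
    return (1 - p ** (n // sets)) ** sets
```

With `p = 0.99`, one string of 50 is fully lit only about 60% of the time, while each set of 10 is lit about 90% of the time, and adding more sets of 10 leaves that per-set figure unchanged.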

Airlines could improve customer-perceived on-time reliability by essentially splitting their flights by customers (Z-axis). Instead of flying a large airplane such as the Boeing 737-900 with 167 passengers, they could fly two smaller planes such as the Embraer ERJ170-200, each with only 88 passengers. When one plane is delayed, only half the passengers are affected. Airlines are not likely to do this because larger airplanes generally have a lower cost per seat than smaller ones. The other way to increase on-time reliability would be to fault isolate the flight segments. Instead of stacking flights back-to-back with roughly an hour between arrival and the next scheduled departure, airlines could widen this window enough to make up for minor delays. Alternatively, at hub airports, they could introduce a swap aircraft into the schedule to break the dependency chain. Both of these options are unlikely to appeal to airlines because they would also increase the cost per seat.
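The effect of a wider ground window can be sketched with aircraft #4246's schedule. This is a deliberately simplified model: it assumes a hypothetical minimum turnaround (45 minutes in the check below), fixed flight times (a leg arrives exactly as late as it departed), and treats only the scheduled ground time beyond that minimum as able to absorb a delay:

```python
def to_minutes(hhmm):
    """Convert a 24-hour 'HH:MM' string to minutes after midnight."""
    hours, minutes = map(int, hhmm.split(":"))
    return hours * 60 + minutes


# Aircraft #4246's legs for the day, in 24-hour time: (from, departs, to, arrives).
LEGS = [
    ("SFO", "08:05", "SAN", "09:44"),
    ("SAN", "10:35", "SFO", "12:15"),
    ("SFO", "13:20", "SEA", "15:28"),
    ("SEA", "16:18", "SFO", "18:29"),
    ("SFO", "19:30", "SAN", "21:05"),
]


def propagate_delay(legs, first_delay, min_turn, extra_buffer=0):
    """Return each leg's departure delay in minutes.

    `min_turn` is the (assumed) minimum time on the ground; `extra_buffer` models
    scheduling additional ground time before every subsequent departure.
    """
    delays = [first_delay]
    for prev, nxt in zip(legs, legs[1:]):
        ground_time = to_minutes(nxt[1]) - to_minutes(prev[3]) + extra_buffer
        slack = ground_time - min_turn  # minutes available to absorb an incoming delay
        delays.append(max(0, delays[-1] - slack))
    return delays
```

Under these assumptions, a 60-minute delay on the first leg still leaves the last SFO->SAN flight departing 13 minutes late (`[60, 54, 34, 29, 13]`); adding 15 minutes to each ground window absorbs the delay by the fourth leg (`[60, 39, 4, 0, 0]`).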

The bottom line is that there are clearly ways in which airlines could increase their on-time performance, but they all increase costs. We advise our clients that they need the amount of performance and availability that improves customer acquisition or retention, but no more. Building more scalability or availability into the product than is needed takes resources away from features that better serve the customer and the business. Airlines are in a similar situation: they need the amount of on-time reliability that doesn’t drive customers to competitors, but no more than that. Otherwise, they would have to either increase ticket prices or reduce their profits. Given the dismal on-time reliability, though, I just might be willing to pay more for better on-time reliability.


Achieving Maximum Availability in the Cloud

We often hear clients confidently tell us that Amazon’s SLA of 99.95% for EC2 instances provides them with plenty of availability. They believe that the combination of auto-scaling compute instances with a persistent data store such as RDS provides all the scalability and availability they will ever need. Unfortunately, this isn’t always the case. In this article we will focus on AWS services, but the same applies to any IaaS or PaaS provider.

The SLA of 99.95% is not the guaranteed availability of your services. It’s the availability of EC2 for an entire AWS region. From the Amazon EC2 SLA:

“Monthly Uptime Percentage” is calculated by subtracting from 100% the percentage of minutes during the month in which Amazon EC2 or Amazon EBS, as applicable, was in the state of “Region Unavailable.”

Even this limited downtime that Amazon commits to, roughly 22 minutes per month (0.05% of a 30-day month is 21.6 minutes), is not always achieved. Over the last few years there have been several outages that lasted much longer. Additionally, your services’ availability can be impacted by third-party services used in your product and, even more likely, by simple human error on your engineering team.
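The downtime budget behind any availability percentage is simple to compute. A minimal sketch, assuming a 30-day month by default:

```python
def allowed_downtime_minutes(availability_pct, days=30):
    """Minutes of downtime per period permitted by an availability percentage."""
    return (1 - availability_pct / 100) * days * 24 * 60
```

A 99.95% SLA allows 21.6 minutes in a 30-day month and about 22.3 minutes in a 31-day month; for comparison, 99.99% allows about 4.3 minutes in a 30-day month.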

To reduce the likelihood of a customer-impacting event or performance degradation, we recommend the following deployment patterns.

  • Calls made across Regions should be made asynchronously. This reduces latency and the likelihood of a failure.
  • A given Service should be completely deployed within multiple Availability Zones.
  • Use abstraction layers with AWS Managed Services so that your product is not tied to such services. Today these managed services are at different stages of maturity, and an abstraction layer will allow for a migration to more robust solutions later.
  • Services, or microservices depending on your architecture, should have a dedicated data store. Allowing app server pools to communicate with multiple data stores reduces availability.
  • To protect against region failures, deploy all services needed for a product into a second region and run active/active, designating a home region for each user.
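One way to designate a home region per user is to hash the user ID. The sketch below is illustrative: the region names are example AWS region identifiers, the set of healthy regions is assumed to come from your own monitoring, and failover only works if each user's data is already replicated to the other region:

```python
import hashlib

REGIONS = ["us-east-1", "us-west-2"]  # example region names; substitute your own


def home_region(user_id, regions=REGIONS):
    """Deterministically assign each user a home region by hashing the user ID."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return regions[int.from_bytes(digest[:8], "big") % len(regions)]


def serving_region(user_id, healthy, regions=REGIONS):
    """Serve from the user's home region; if it is unhealthy, fail over to the next healthy one."""
    start = regions.index(home_region(user_id, regions))
    for offset in range(len(regions)):
        candidate = regions[(start + offset) % len(regions)]
        if candidate in healthy:
            return candidate
    raise RuntimeError("no healthy region available")
```

Because the assignment is a pure function of the user ID, every front end computes the same home region without a shared lookup on the request path.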

We have come across clients who invested in migrating from one cloud provider to another because of what seemed like infrastructure outages in a particular region. Before making such an investment of time and money, make sure that the way your product is architected is not the root cause of these outages. If you are concerned about your architecture or deploying your product in the cloud, contact us and we can provide you with some options.


Three Strategies for Migrating to the Cloud

Cloud-deployments are becoming increasingly popular across all industries. Despite initial concerns around vendor lock-in and the need for increased security in multi-tenant environments, many companies have found a strong business case for migration. Overall, the economics aren’t surprising. Cloud providers can leverage huge economies of scale, discounts on power-consumption, and demand diversification to drive down costs, which are then passed on to a fast-growing customer base.

As a technology leader, you’ve done your own research, and you’re confident that a cloud deployment can both save money and promote faster innovation within your development team. The next challenge is to develop a clear migration plan. What you’ll quickly learn, however, is that no two tech companies share the same technical and business realities, and no single cloud migration strategy can be relied upon to fit every circumstance.

We’ve laid out three common scenarios that you might consider for your product. Each describes a different business case and an accompanying cloud migration pattern.

Service-by-Service Migration

In the first scenario you already have a well-architected solution running in a private (company-owned or managed host) datacenter. Your main production application is separated into different services and unburdened by large amounts of technical debt. Here a service-by-service transition into the cloud is the clear choice for a smooth migration that doesn’t carry nearly the risk of a “big-bang” approach and doesn’t trigger a request for proposal (RFP) event for customers. If you make your customers live through a massive migration, they will often take the opportunity to look around for alternative services, hence the RFP.

You’ll want to start with the smallest or least heavily coupled service first. The goal is to build your team’s confidence and experience with cloud deployments and tools before tackling larger challenges.

In most cases your engineers will need to get their hands dirty with a little re-architecting. (Depending on how closely you’ve followed our recommendations regarding fault isolation zones, or “swim lanes,” many of these steps may already be complete.) Your team’s first goal should be to rewrite (or eliminate) any cross-service calls so they are asynchronous. Once this is complete, virtualizing the web and app tiers should, except in rare cases, be a straightforward process. Often the greater challenge is disentangling the persistent data tier. If you already have a separate data store for each service, it should be easy to complete the service migration by virtualizing the data store or importing the data into a PaaS solution (RDS, Azure SQL Database, etc.). If that’s not the case, you’ll want to either:

(1) Separate the service’s data (choosing an appropriate SQL or NoSQL technology). Sharding the persistent data tier has the added benefit of reducing load on the primary datastore and reducing size and IOPS requirements, further easing the transition to the cloud.

(2) Eliminate the service’s need for data storage entirely by passing data back to another service. (Less desirable, but suitable for smaller services.)

Once you’ve deployed your first cloud service, redirect a small portion of the customer base to it and monitor for correct functionality. As monitoring improves confidence in the service’s operation, you can continue a slow rollout over your entire customer base, minimizing both risk and operational downtime. Repeat this process for each service to complete the migration.
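A gradual rollout needs a stable way to pick the “small portion” of customers. A minimal sketch, assuming customers have string IDs: hashing each ID into one of 100 buckets means that raising `rollout_percent` only ever adds customers to the cloud service and never moves earlier ones back.

```python
import hashlib


def bucket(customer_id, buckets=100):
    """Map a customer ID to a stable bucket in [0, buckets)."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def route_to_cloud(customer_id, rollout_percent):
    """True if this customer should be served by the newly migrated cloud service."""
    return bucket(customer_id) < rollout_percent
```

Starting at a low percentage and raising it as monitoring builds confidence gives the slow rollout described above; dropping it back to 0 is an immediate rollback.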

Down Market MVP

Another common scenario is that your company runs an older mature product, but one that is loaded with technical debt. Perhaps it’s poorly architected, written using older frameworks or languages, or contains undocumented or “black box” code that even your seasoned engineers are afraid to touch. The sheer complexity screams re-write, but at the same time there’s a pressing business appetite to move into the cloud quickly.

In this case, a possible strategy is to enter the cloud with a down-market, minimum viable product (MVP). Have your team start from scratch and develop an MVP in the cloud that encompasses the core features of your main offering. By targeting the down-market, you’re not only entering quickly (with a minimal feature set) but you’re also blocking other upstart competitors from overtaking you.

During development, you’ll need to balance the business demands of time-to-market against the technical risk of vendor lock-in. Using PaaS offerings such as Amazon’s SQS or Azure’s DocumentDB can help get your new product up and running quickly, but at the expense of greater vendor lock-in. There’s no silver bullet with these decisions; just be sure to make these choices consciously and recognize that you are accumulating what could become tech debt should you decide to change cloud providers down the road.
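One way to keep that debt manageable is a thin abstraction layer: application code talks to a small interface you own, and each provider gets an adapter. A minimal sketch; `MessageQueue` and `InMemoryQueue` are hypothetical names, and a real SQS or Azure adapter would wrap that provider's SDK behind the same two methods:

```python
from abc import ABC, abstractmethod
from collections import deque
from typing import Optional


class MessageQueue(ABC):
    """Product-owned queue interface; application code depends on this, never on a vendor SDK."""

    @abstractmethod
    def send(self, body: str) -> None:
        """Enqueue one message."""

    @abstractmethod
    def receive(self) -> Optional[str]:
        """Dequeue one message, or return None if the queue is empty."""


class InMemoryQueue(MessageQueue):
    """Local implementation for tests; a hypothetical SqsQueue adapter would implement the same methods."""

    def __init__(self) -> None:
        self._items = deque()

    def send(self, body: str) -> None:
        self._items.append(body)

    def receive(self) -> Optional[str]:
        return self._items.popleft() if self._items else None


def publish_signup(q: MessageQueue, email: str) -> None:
    """Example application code written only against the interface."""
    q.send(f"signup:{email}")
```

Switching providers then means writing one new adapter rather than touching every call site, at the cost of giving up provider-specific features the interface does not expose.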

Once you’ve established this MVP with a small customer base, continue to articulate it with feature additions, possibly in a tiered pricing model (e.g., standard, professional, premium). As the new cloud product matures, development efforts should slowly pivot: developers move from the older legacy product to help further articulate features on the new one. At the same time, as the new cloud product gains more articulated features, customers will be increasingly incentivized to migrate to the new platform.

Private to Hybrid to Public

In this scenario your company has large capital infrastructure investments that you can’t immediately walk away from. Perhaps you have one or more privately owned data centers, a long-term contract with a managed host provider, or expensive new servers purchased during a recent tech refresh. Cloud migration is a long-term goal for eventual cost savings, but you need to realize the value of infrastructure that has already been purchased.

The pattern in this case is a crawl-walk-run strategy from the private, to hybrid, to public cloud. Initial focus will be on changing your internal processes to run your datacenter(s) as its own mini-cloud. By adopting technology and process changes such as virtualization, resource pooling, automated provisioning, (and possibly even chargeback accounting), you’re readying your organization for an eventual transition to the public cloud.

Once you’ve established your own private cloud, you’ll want to leverage a public cloud provider for your data backups and your DR/business continuity plan. This is where you’re likely to see your first reduction in CAPEX. Quarterly DR drills will help you test the viability and compatibility of your systems with your cloud provider.

As demand from your customer base increases, you’ll avoid making further infrastructure investments. Instead, you’ll want to adopt a hybrid cloud strategy, moving a handful of customers to the cloud or bursting into the cloud to accommodate usage spikes.

Finally, as servers are progressively retired or when your managed provider contracts run out, you’ll be ready to complete a smooth migration fully into the public cloud.


While each of these patterns is vastly different, they all share one thing in common: there are no “Big Bang” moments, no high-risk events that’ll have your stakeholders nail-biting on the edge of their seats. The best technologists are always looking to meet business objectives while reducing risk.

The business and technical realities for every company are different and it’s likely you’ll need to combine one or more of these migration patterns in your own strategy. While we’re confident that many of our customers will benefit from migrating to the cloud, it’s important to tailor your strategy to the needs of your business. We continue to develop new migration patterns that minimize risk and generate business value for our clients.


Product Discovery (the Right Way)

Product Discovery: Sales Driven vs. Market Driven

Product discovery is a core process for any successful software company. While your engineers need to focus on building the product right (making it highly scalable and available), your product owners need to ensure they’re building the right product.

All too often, we find product owners serving essentially as “ticket takers” that run requests from sales to engineering. While the product owners may add a little design and elaboration, the ideas originate solely from the “asks” of the sales department. This “sales-driven” approach is limiting. It tends to stifle innovation and frequently allocates engineering effort to one-off features requested by a single customer. Overall, it is indicative of an immature product management process.

In contrast, mature product organizations take a “market-driven” approach, receive input from a variety of sources, and focus engineering efforts on penetrating new market segments. These differences translate into measurable business outcomes. Market-driven companies (e.g. Salesforce, Apple, Amazon, WorkDay) are routinely valued at 10-15 times revenue, while sales-driven companies (e.g. Dell, Samsung, Oracle, SAP, Rackspace, WalMart.com) typically trade for only 1.5-6 times revenue.

Highlighting the differences between these two approaches will demonstrate how organizations focused on innovation and faster time to market are better served by a market-driven approach.

Sales Led Product Discovery

In a sales-driven product organization, the product ideation process begins with sales obtaining feedback from customers (or potential customers) on desired features and functionality. These requests are then translated into requirements and added to the product backlog. Prioritization — if it occurs at all — is based on the size of the contract and/or relative importance of the customer. The process itself is simple and straightforward, but unfortunately it often leads to poor outcomes.

Frequently, features not yet implemented are promised and assigned deadlines as part of the sales contract. These binding commitments tie the hands of product and engineering anywhere from several weeks to months, forcing them onto a treadmill that never quite allows enough breathing room to experiment with innovative features or new products. It becomes impossible for the product organization to pivot without months of lead time or executive intervention.

Additionally, the particular needs of each customer lead them to request features and functionality that only they will use. Product evolution therefore becomes haphazard, with the engineering team jumping from one custom feature to the next.

Furthermore, these feature commitments are often granted in an all-or-nothing approach with highly articulated use cases. As a consequence, releases to production are infrequent, perhaps quarterly or less, making it difficult to acquire feedback from the larger customer base. Fewer releases result in less feedback from customers overall and fewer opportunities to pivot away from a poorly chosen feature set.

Consequently, the product itself becomes bloated with a large number of features that only one or two customers use, and these one-off features add complexity both to the interface (making it difficult to use) and to the code (making it difficult to maintain). This over-development makes the product unsatisfying to use and a drain on engineering resources to maintain.

The bottom line is that sales-driven products simply lack vision. Your sales department (quite rightly) is too focused on selling, and your customers (quite naturally) are too focused on solving the business problems of today to steer the future vision of your product. The limitations of sales-driven approaches are best summarized by the quote often attributed to Henry Ford: “If I had asked people what they wanted, they would have said faster horses.”

Market Led Product Discovery

If a sales-driven approach is a dead end, what is the alternative?

In a market-driven approach, product ideas aren’t limited to sales and customer asks but come from a variety of sources (e.g., charter customer feedback, actual usage metrics, customer service, engineering, sales, executive team). This generates a surplus of ideas, most of which the product owners will have to say “no” to. In fact, the most important function of product owners is to say “no” to ideas that will divert engineering effort away from creating maximum value.

When a new product is launched, an MVP concept is prototyped and tested among a small set of charter customers. Prototypes (interactive mockups) are created with tools like Balsamiq, Axure, etc. and iterated over multiple times, allowing a huge number of ideas to be tested and refined with almost no coding. This process also identifies the minimum feature set required to solve a problem in a particular market segment, thereby minimizing the engineering effort required to implement it.

Once in production, these MVPs are enhanced by further feature development. However, features aren’t selected to satisfy the needs of individual customers but to meet the collective needs of the market segment (i.e., keep the aggregate customer mostly happy). Features themselves are initially released in minimum viable fashion and further articulated in later releases, allowing for a frequent release schedule and ready feedback from the customer base. Overall, this strategy avoids over-development while capturing as much of the market segment as possible. Specific customer one-off product needs can be pushed to a professional services group (internal or external) and built as add-ons through a mechanism like an API framework, but not incorporated into the core product.

Once the current market segment has been saturated, focus shifts to conquering another market segment. This is the expansion strategy advocated by Geoffrey Moore in Crossing the Chasm: “[target] a very specific niche market where you can dominate from the outset, drive your competitors out of that market niche, and then use it as a base for broader operations.”[1]

In contrast to the sales-driven approach above, this expansion is a deliberate process. Market research is conducted to identify the needs of these new customers. In some cases this means adding a collection of features to the existing product, but often it requires the launch of an entirely new product. Again, extensive prototyping and feedback from charter customers play a key role.

This quick advance (penetrating and saturating market segments in rapid succession) explains the stark differences in investor valuations between market-driven (10-15x Rev) and sales-driven (1.5-6x Rev) companies.


While more complex, the market-driven product discovery process helps your business focus on long-term growth vs. near-term individual customer wins. It draws upon a variety of sources for ideas, allows for early testing and feedback (for both MVP products and features), minimizes wasted engineering effort, and opens new market segments quicker than sales-driven approaches.

Many of our product management ideas are influenced by Marty Cagan at the SVPG.

[1] Moore, Geoffrey A. (2014-01-28). Crossing the Chasm, 3rd Edition (Collins Business Essentials) (p. 79). HarperCollins. Kindle Edition.


Communicating Across Swim-lanes

We often emphasize the need to break apart complex applications into separate swim-laned services. Doing so not only improves architectural scalability but also permits the organization to scale and innovate faster as each separate team can operate almost as its own independent startup.

Ideally, there would be no need to pass information between these services, and they could be easily stitched together on the client side with a little HTML for a seamless user experience. However, in the real world, you’re likely to find that some message passing between services is needed (if only to save your users from entering the same information twice).

Every cross-service call adds complexity to your application. Often, teams will implement internal APIs when cross-service communication is needed. This is a good start and solves part of the problem by formalizing communication and discouraging unnecessary cross-service calls.

However, APIs alone don’t address the more important issue with cross-service communication — the reduction in availability. Anytime one service synchronously communicates with another you’ve effectively connected them in series and reduced overall availability. If Service-A calls Service-B synchronously, and Service-B crashes or slows to a crawl, Service-A will do the same. To avoid this trap, communication needs to take place asynchronously between services. This, in fact, is how we define “swim-laning”.
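The difference can be sketched in a few lines. Assuming a hypothetical `service_b` that is currently failing, the synchronous handler inherits its failure, while the asynchronous handler only records the work in a queue and returns:

```python
import queue


class ServiceBDown(Exception):
    """Raised by the hypothetical Service-B while it is unavailable."""


def service_b(payload):
    """Hypothetical downstream service that is currently failing."""
    raise ServiceBDown("service B unavailable")


def handle_sync(payload):
    """Service-A calls Service-B in series: B's failure becomes A's failure."""
    return service_b(payload)


outbox = queue.Queue()  # work waiting to be delivered to Service-B


def handle_async(payload):
    """Service-A records the work for B and answers its own caller immediately."""
    outbox.put(payload)
    return "accepted"
```

A separate worker would later drain `outbox` and deliver to Service-B once it recovers; Service-A's own callers never see the outage.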

So what are the best ways to facilitate asynchronous cross-service communication, and what are the pros and cons of each?

Client Side

Information that needs to be shared between services can be passed via JavaScript & JSON in the browser. This has the advantage of keeping the backend services completely separate and gives you the option to move business logic to the client side, thus reducing load on your transactional systems.

The downside, however, is the increased security risk. It doesn't take an expert hacker to manipulate variables in JavaScript. (Just think: $price_var gets set to $0 somewhere between the shopping cart and checkout services.) Furthermore, once data is passed back to the server side, these cross-service calls suffer from the same latency and reliability issues as any TCP call over the internet.
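One common mitigation, shown here as an illustrative assumption rather than a prescribed design, is for the receiving service to treat anything that passed through the browser as untrusted and to re-derive sensitive values, such as prices, from data it owns. A minimal Python sketch; `CATALOG_PRICES`, `CartLine`, and `checkout_total` are hypothetical names and the prices are made up.

```python
from __future__ import annotations

from dataclasses import dataclass
from decimal import Decimal

# Hypothetical authoritative price list owned by the server side.
CATALOG_PRICES: dict[str, Decimal] = {
    "SKU-123": Decimal("19.99"),
    "SKU-456": Decimal("5.00"),
}


@dataclass
class CartLine:
    sku: str
    quantity: int
    client_price: Decimal  # value that travelled through the browser: untrusted


def checkout_total(lines: list[CartLine]) -> tuple[Decimal, list[str]]:
    """Price the cart from server-side data; report SKUs whose client price differs."""
    total = Decimal("0")
    mismatched: list[str] = []
    for line in lines:
        if line.sku not in CATALOG_PRICES:
            raise ValueError(f"unknown SKU: {line.sku!r}")
        if line.quantity <= 0:
            raise ValueError(f"invalid quantity for {line.sku}: {line.quantity}")
        server_price = CATALOG_PRICES[line.sku]
        if line.client_price != server_price:
            mismatched.append(line.sku)  # tampered or stale; worth logging or alerting
        total += server_price * line.quantity  # client_price is never used for billing
    return total, mismatched


if __name__ == "__main__":
    # A "$price_var set to $0" style manipulation arriving from the browser.
    cart = [CartLine("SKU-123", 2, Decimal("0")), CartLine("SKU-456", 1, Decimal("5.00"))]
    print(checkout_total(cart))
```

The client-side approach still works for passing context between services; the point is only that the server recomputes anything with financial or security consequences.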

Message Bus / Enterprise Service Bus

Message buses and enterprise service buses provide a ready means of transmitting messages between services asynchronously in a pub/sub model. Advantages include a rich set of features for tracking, manipulating, and delivering messages, as well as the ability to centralize logging and monitoring. However, the potential for congestion and occasional message loss makes them less desirable in many cases than asynchronous point-to-point calls (discussed below).

To limit congestion, it’s best to implement several independent channels and limit message bus traffic to events that will have multiple subscribers.
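To illustrate the shape of pub/sub with independent channels, here is a toy in-process stand-in written with Python's asyncio. It is not a real message bus: there is no persistence, delivery guarantee, or cross-process transport, and `InMemoryBus`, the channel name, and the handlers are hypothetical. Each subscriber gets its own queue, so publishing returns immediately and a slow subscriber does not hold up the publisher or other channels.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable, Dict, List

Handler = Callable[[dict], Awaitable[None]]


class InMemoryBus:
    """Toy pub/sub bus: independent channels, one queue per subscriber."""

    def __init__(self) -> None:
        self._queues: Dict[str, List[asyncio.Queue]] = defaultdict(list)
        self._workers: List[asyncio.Task] = []

    def subscribe(self, channel: str, handler: Handler) -> None:
        # Must be called from inside a running event loop.
        queue: asyncio.Queue = asyncio.Queue()
        self._queues[channel].append(queue)
        self._workers.append(asyncio.create_task(self._consume(queue, handler)))

    def publish(self, channel: str, event: dict) -> None:
        # Fire-and-forget: enqueue for each subscriber and return without waiting.
        for queue in self._queues.get(channel, []):
            queue.put_nowait(event)

    async def _consume(self, queue: asyncio.Queue, handler: Handler) -> None:
        while True:
            event = await queue.get()
            try:
                await handler(event)
            except Exception as exc:  # a failing subscriber must not kill the bus
                print(f"handler failed for {event!r}: {exc}")
            finally:
                queue.task_done()

    async def drain(self) -> None:
        """Wait until every queued event has been handled."""
        for queues in self._queues.values():
            for queue in queues:
                await queue.join()

    async def close(self) -> None:
        for worker in self._workers:
            worker.cancel()
        await asyncio.gather(*self._workers, return_exceptions=True)


async def demo() -> list:
    bus = InMemoryBus()
    received = []

    async def send_confirmation_email(event: dict) -> None:
        received.append(("email", event["order_id"]))

    async def update_analytics(event: dict) -> None:
        received.append(("analytics", event["order_id"]))

    # An event with multiple subscribers is the kind of traffic that suits the bus.
    bus.subscribe("orders.placed", send_confirmation_email)
    bus.subscribe("orders.placed", update_analytics)
    bus.publish("orders.placed", {"order_id": 42})
    await bus.drain()
    await bus.close()
    return received


if __name__ == "__main__":
    print(asyncio.run(demo()))
```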

Asynchronous Point-to-Point

Point-to-point communication (i.e., one service directly calling another) is another effective means of message passing. Advantages include simplicity, speed, and reliability. However, be sure to implement it asynchronously, using a queuing mechanism with timeouts, retries (where needed), and exception handling in the event of service failure. This will prevent failures from propagating across service boundaries.
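A minimal asyncio sketch of that pattern, assuming an in-process queue in front of the outbound call: the calling service enqueues a message and moves on, while a background worker delivers it with a per-attempt timeout, a bounded number of retries with exponential backoff, and a dead-letter list for messages that still fail. `send`, `outbound_worker`, `call_with_retries`, and the timing defaults are hypothetical; a production system would typically use a durable queue rather than memory.

```python
import asyncio
from functools import partial
from typing import Awaitable, Callable, List


class ServiceUnavailable(Exception):
    """Raised when the downstream service could not be reached after all retries."""


async def call_with_retries(call: Callable[[], Awaitable[None]], *, timeout_s: float = 0.1,
                            retries: int = 2, backoff_s: float = 0.05) -> None:
    last_error = None
    for attempt in range(retries + 1):
        try:
            # A hung service counts as a failure once the timeout expires.
            await asyncio.wait_for(call(), timeout=timeout_s)
            return
        except (asyncio.TimeoutError, ConnectionError) as exc:
            last_error = exc
            if attempt < retries:
                await asyncio.sleep(backoff_s * 2 ** attempt)  # exponential backoff
    raise ServiceUnavailable(f"gave up after {retries + 1} attempts") from last_error


async def outbound_worker(queue: asyncio.Queue, send: Callable[[str], Awaitable[None]],
                          dead_letters: List[str]) -> None:
    """Deliver queued messages; failures are handled here and never reach the caller."""
    while True:
        message = await queue.get()
        try:
            await call_with_retries(partial(send, message))
        except Exception:  # retries exhausted or a non-retryable error
            dead_letters.append(message)  # park it for replay or alerting
        finally:
            queue.task_done()


async def demo() -> tuple:
    delivered: List[str] = []
    dead_letters: List[str] = []
    transient_failures = {"profile-updated": 1}  # first attempt fails, the retry succeeds

    async def send(message: str) -> None:
        if message == "hung":
            await asyncio.sleep(10)  # never answers within the timeout
        if transient_failures.get(message, 0) > 0:
            transient_failures[message] -= 1
            raise ConnectionError("transient network error")
        delivered.append(message)

    queue: asyncio.Queue = asyncio.Queue()
    worker = asyncio.create_task(outbound_worker(queue, send, dead_letters))
    for message in ("profile-updated", "hung"):
        queue.put_nowait(message)  # the calling service does not wait for delivery
    await queue.join()
    worker.cancel()
    await asyncio.gather(worker, return_exceptions=True)
    return delivered, dead_letters


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

The design choice that matters is where the exception stops: the worker absorbs the downstream failure, so the calling service's own availability is unaffected by the callee.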

If properly implemented, asynchronous point-to-point communication is excellent for invoking a function in another service or transmitting a small amount of information. This method can be implemented for the majority of cross-service communication needs. However, for larger data transfers, you’ll need to consider one of the methods below.


ETL jobs can be used to move a large amount of data from one datastore to another. Since these are generally implemented as separate batch processes they won’t bring down the availability of your services. However, drawbacks include increased load on transactional databases (unless a read replica of the source DB is used) and the poor timeliness/consistency of data resulting from periodic batch process.

ETL processes are likely best reserved for transferring data from your OLTP to OLAP systems, where immediate consistency isn't required. If you need both a large amount of data transferred and up-to-the-second consistency, consider a DB read replica.
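A minimal incremental ETL sketch using Python's standard `sqlite3` module, with two in-memory databases standing in for a service's transactional store (or its read replica) and a reporting store. The table names, columns, and timestamp-based watermark are hypothetical choices for illustration. Each run copies only rows changed since the previous run, which is also why the reporting copy is only as fresh as the batch schedule.

```python
import sqlite3
from typing import List, Tuple

SCHEMA = "(id INTEGER PRIMARY KEY, customer_id INTEGER, total_cents INTEGER, updated_at TEXT)"


def run_etl_batch(source: sqlite3.Connection, target: sqlite3.Connection, watermark: str) -> str:
    """Copy rows changed since `watermark` from source to target; return the new watermark.

    Assumes updated_at is an ISO-8601 string, so text order matches time order.
    Rows committed later with a timestamp <= watermark would be missed; real jobs
    often re-read a small overlap window to cover that case.
    """
    rows = source.execute(
        "SELECT id, customer_id, total_cents, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    with target:  # one transaction per batch: commit on success, roll back on error
        target.executemany("INSERT OR REPLACE INTO orders_fact VALUES (?, ?, ?, ?)", rows)
    return rows[-1][3] if rows else watermark


def demo() -> Tuple[str, List[tuple]]:
    source = sqlite3.connect(":memory:")  # stands in for the OLTP store (or its replica)
    target = sqlite3.connect(":memory:")  # stands in for the OLAP/reporting store
    source.execute(f"CREATE TABLE orders {SCHEMA}")
    target.execute(f"CREATE TABLE orders_fact {SCHEMA}")
    with source:
        source.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
            (1, 10, 2500, "2016-03-01T10:00:00"),
            (2, 11, 4000, "2016-03-01T11:00:00"),
        ])
    watermark = run_etl_batch(source, target, "")  # first run copies everything

    # Between batches: order 2 is modified and order 3 is created.
    with source:
        source.execute("UPDATE orders SET total_cents = 3500, "
                       "updated_at = '2016-03-01T12:00:00' WHERE id = 2")
        source.execute("INSERT INTO orders VALUES (3, 12, 1200, '2016-03-01T12:30:00')")
    watermark = run_etl_batch(source, target, watermark)  # next scheduled run

    return watermark, target.execute(
        "SELECT id, total_cents FROM orders_fact ORDER BY id").fetchall()


if __name__ == "__main__":
    print(demo())
```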

DB Read-Replica

Most common databases (Oracle, MySQL, PostgreSQL) support native replication to DB clones, with replication lag that is typically measured in milliseconds. By placing a read replica of one service's DB into another service's swim-lane, you can transfer a large amount of data in near real time. The downsides are increased IOPS requirements and a lack of abstraction between services that, unlike the abstraction provided by an API, fosters greater service interdependency.


In conclusion, there are a variety of cross-service communication methods that permit fault isolation between services. The trick is knowing the strengths and weaknesses of each and implementing the right method where appropriate.

Making Agile Predictable

One of the biggest complaints that we hear from businesses implementing agile processes is that they can’t predict when things will get delivered. Many people believe, incorrectly, that waterfall is a much more predictable development methodology. Let’s review a few statistics that demonstrate the “predictability” of the waterfall methodology.

A study of over $37 billion (USD) worth of US Defense Department projects concluded that 46% of the systems so egregiously failed to meet the real needs (although they met the specifications) that they were never successfully used, and another 20% required extensive rework (Larman, 1995).

Another study, of 6,700 projects, found that four out of five key factors contributing to project failure were associated with or aggravated by the waterfall method, including the inability to deal with changing requirements and problems with late integration.

Waterfall is not a bad or failed methodology; it's a tool like any other, with its own limitations. We, as the users of that tool, misused it and then blamed the tool. We falsely believed that we could fix the project scope (specifications), cost (team size), and schedule (delivery date). No methodology allows for that. As shown in the diagram below, waterfall is meant to fix scope and cost. When we also try to constrain the schedule, we're destined to fail.


Agile, when used properly, fixes the cost (team size) and the schedule (2-week sprints), allowing the scope to vary. This is where most organizations struggle: they attempt to predict the delivery of features while the scope of stories is allowed to vary. As a side note, if you think you can fix the scope and the schedule and vary the cost (team size), read Fred Brooks's 1975 book The Mythical Man-Month.

This is where the magical measurement of velocity comes into play. The velocity of a team is simply the number of units of work completed during the sprint. The primary purpose of this measurement is to give the team feedback on its estimates. As you can see in the graph below, it usually takes a few sprints to reach a controlled state where velocity is predictable, after which it usually rises slightly over time as the team gains experience.
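In code, velocity is just a sum over completed work. A minimal Python sketch, assuming the unit of work is story points and that only fully completed stories count (a common convention, used here for illustration); the three-sprint warm-up before velocity is treated as settled is likewise an assumption standing in for “a few sprints”.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Story:
    name: str
    points: int
    done: bool  # met the team's definition of done by the end of the sprint


def sprint_velocity(stories: Sequence[Story]) -> int:
    """Units of work (story points) completed in one sprint; partial work counts as zero."""
    return sum(story.points for story in stories if story.done)


def settled_velocity_range(history: Sequence[int], warmup: int = 3) -> Tuple[int, int]:
    """Worst and best velocity, ignoring the first `warmup` sprints while estimates settle."""
    settled = list(history[warmup:])
    if not settled:
        raise ValueError("not enough sprints yet to estimate a stable velocity")
    return min(settled), max(settled)


if __name__ == "__main__":
    last_sprint = [Story("login", 3, True), Story("search", 5, False), Story("export", 2, True)]
    print(sprint_velocity(last_sprint))  # "search" is unfinished, so only 3 + 2 count
    print(settled_velocity_range([8, 14, 20, 22, 25, 23, 26]))
```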


Using velocity, we can now predict when features will be delivered. We simply project out the best and worst velocities and can then demonstrate, with high certainty, a best and worst delivery date for the set of stories that make up a feature.
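A minimal Python sketch of that projection, assuming 2-week sprints, a feature sized in the same units as velocity, and the worst and best settled velocities as inputs; `delivery_window` and its parameters are hypothetical names. Rounding up reflects that a feature is only done at the end of the sprint that finishes it.

```python
import math
from datetime import date, timedelta
from typing import Tuple

SPRINT_LENGTH = timedelta(weeks=2)


def delivery_window(remaining_points: int, worst_velocity: int, best_velocity: int,
                    next_sprint_start: date) -> Tuple[date, date]:
    """Earliest and latest dates by which the remaining stories should be done."""
    if not 0 < worst_velocity <= best_velocity:
        raise ValueError("need 0 < worst_velocity <= best_velocity")
    best_sprints = math.ceil(remaining_points / best_velocity)
    worst_sprints = math.ceil(remaining_points / worst_velocity)
    # Each date is the boundary at which the last sprint needed comes to an end.
    return (next_sprint_start + best_sprints * SPRINT_LENGTH,
            next_sprint_start + worst_sprints * SPRINT_LENGTH)


if __name__ == "__main__":
    best, worst = delivery_window(100, worst_velocity=20, best_velocity=25,
                                  next_sprint_start=date(2016, 3, 7))
    print(f"feature lands between {best} and {worst}")
```

With 100 points remaining and a settled velocity of 20 to 25 points per sprint, that is four to five sprints: between 2016-05-02 and 2016-05-16 for a team whose next sprint starts on 2016-03-07.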


Velocity helps us answer two types of questions. The first is the fixed-scope question, “When will we have feature X?”, to which the answer is “between dates A and B.” The second is the fixed-time question, “What will be delivered by the June release?”, to which the answer is “all of this, some of that, and none of those.” What we can't answer are questions that fix both time and scope.
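The fixed-time question can be sketched the same way: multiply the sprints remaining before the date by the worst and best velocities, then walk the prioritized backlog. Stories that fit within the worst-case capacity are “all of this,” those that fit only within the best case are “some of that,” and the rest are “none of those.” A minimal Python sketch; the function and story names are hypothetical, and it assumes the team works the backlog strictly in priority order.

```python
from typing import List, Sequence, Tuple


def forecast_by_date(backlog: Sequence[Tuple[str, int]], sprints_left: int,
                     worst_velocity: int,
                     best_velocity: int) -> Tuple[List[str], List[str], List[str]]:
    """Split a prioritized backlog of (story, points) into will / might / won't make the date."""
    worst_capacity = sprints_left * worst_velocity
    best_capacity = sprints_left * best_velocity
    will, might, wont = [], [], []
    cumulative = 0
    for story, points in backlog:  # highest priority first
        cumulative += points
        if cumulative <= worst_capacity:
            will.append(story)   # "all of this"
        elif cumulative <= best_capacity:
            might.append(story)  # "some of that"
        else:
            wont.append(story)   # "none of those"
    return will, might, wont


if __name__ == "__main__":
    backlog = [("checkout redesign", 20), ("saved carts", 30),
               ("gift cards", 25), ("wish lists", 40)]
    print(forecast_by_date(backlog, sprints_left=3, worst_velocity=20, best_velocity=30))
```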


It's important to remember that agile is not a software development methodology but rather a business process. This means that all parts of your organization must buy in to and participate in agile product development. Your sales team must get out of the mindset of committing to new product features with fixed time and scope when talking to existing or potential customers. When implemented correctly, agile provides faster time to market and higher levels of innovation than waterfall, which brings greater value to your customers. The tradeoff on the sales side is a change in behavior: no more long-term product commitments like those made in the past (which, more often than not, were missed anyway)!

By utilizing velocity, keeping the team consistent, and phrasing the questions properly, agile can be a very predictable methodology. The key is understanding the constraints of the methodology and working within them instead of ignoring them and blaming the methodology.

AKF is launching an Information Security Service for Clients

In today's digital world, cyber security is one of the biggest concerns keeping CEOs and business executives awake at night. Further, regulation and compliance requirements are continually changing, as both governments and customers modify the rules in an attempt to keep pace with rapid changes in technology. Keeping up with these changes in regulation, and particularly with how they apply to your business, is a monumental challenge. As a very recent example, if you operate in both the U.S. and Europe, you're probably familiar with Safe Harbor (https://en.wikipedia.org/wiki/International_Safe_Harbor_Privacy_Principles). Safe Harbor permitted U.S. companies to store EU customer data on U.S. soil, so long as the infrastructure and usage met the Safe Harbor privacy principles. However, in October 2015, the Court of Justice of the European Union struck down the US-EU Safe Harbor agreement, leaving companies once again wondering what approach to take. Then on Feb 29, 2016, the European Commission introduced the EU-US Privacy Shield (http://europa.eu/rapid/press-release_IP-16-433_en.htm), which is similar to Safe Harbor but, in conjunction with the Judicial Redress Act signed by President Obama, extends the ability of European citizens to challenge U.S. companies that store sensitive personal information. With changes coming fast and furious, what is your company to do?


In addition to those challenges, customers today increasingly demand that companies not only protect sensitive customer data and IP but also store and process that data in a highly available and scalable manner. Balancing the need to maintain data on behalf of customers with regulation and compliance is one of the biggest challenges for any modern company, whether or not you store or process sensitive data for your customers. Sometimes just operating in a regulated industry subjects your company to the long reach of compliance standards.

Then there is the question of ownership: who is in charge of security in your company? Is it spread across teams? Is there a single CSO/CISO and security organization? How well do the security organizations work with your technology and business teams?

Our clients ask us frequently if we can assess their security programs and help develop plans for them, similar to the work we do with system architecture and product development. At AKF, we look at security the same way we look at availability and scaling. It is about managing risk. You can build a system that has nearly 100% availability, and you can build a system that scales nearly infinitely… but you don’t always need to, as it’s not always the most cost-effective way to run your business.


You want to build a system that achieves very high availability, up to the point where the cost of “near perfect” availability exceeds the value it brings to the business. Similarly, you may never be able to completely eliminate every risk to your information security. Some industries, like health care and credit card processing, need to be “near perfect,” but others may not. Some may just require ample controls, tightly monitored systems, secure coding practices, and top-notch security incident response plans.

Finally, there is plenty of overlap among regulations, so controls applied once can address several of your security compliance concerns. For example, PCI, SOX, and HIPAA all have requirements about auditability and accessibility, although they apply to different types of data. If you're subject to more than one of those, why not put controls in place that help satisfy ANY regulation?

We have begun a new program to help our clients with their security needs. Our approach is to help you understand your regulation and compliance landscape, look at the security measures you’ve put in place, and work with you to design a program and projects to close the gaps, focusing on the highest value projects first. We’ll also help you understand where security fits into your organization, and guide you on how to break down barriers that prevent organizations from working cohesively to manage security across the enterprise.

If you are interested in our security program services, please contact us.
