
Achieving Maximum Availability in the Cloud

We often hear clients confidently tell us that Amazon's SLA of 99.95% for EC2 instances provides them with plenty of availability. They believe that the combination of auto-scaling compute instances with a persistent data store such as RDS provides all the scalability and availability they will ever need. Unfortunately, this isn't always the case. In this article we will focus on AWS services, but the same principles apply to any IaaS or PaaS provider.

The 99.95% SLA is not the guaranteed availability of your service. It is the availability of EC2 for an entire AWS region. From the AWS EC2 SLA agreement:

“Monthly Uptime Percentage” is calculated by subtracting from 100% the percentage of minutes during the month in which Amazon EC2 or Amazon EBS, as applicable, was in the state of “Region Unavailable.”

Even this limited downtime of roughly 21.6 minutes per month (0.05% of a 30-day month) is not always achieved. Over the course of the last few years there have been several outages that lasted much longer. Additionally, your service's availability can be impacted by other third-party services used in your product and, even more likely, by simple human error on the part of your own engineering team.
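
As a quick sanity check, it is easy to compute how much downtime a given SLA actually permits. The snippet below is a minimal sketch in plain Python (assuming a 30-day month) that converts an availability percentage into a monthly downtime budget:

```python
def monthly_downtime_minutes(availability_pct: float, days_in_month: int = 30) -> float:
    """Return the downtime budget, in minutes, implied by an availability SLA."""
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (1 - availability_pct / 100)

print(monthly_downtime_minutes(99.95))  # ~21.6 minutes per month
print(monthly_downtime_minutes(99.99))  # ~4.3 minutes per month
```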

To reduce the likelihood of a customer-impacting event or performance degradation, we recommend the following deployment patterns:

  • Calls made across regions should be made asynchronously. This limits your exposure to cross-region latency and reduces the likelihood that a failure in one region cascades into another (see the sketch after this list).
  • A given service should be deployed completely within multiple Availability Zones.
  • Use abstraction layers in front of AWS managed services so that your product is not tied to those services. Today these managed services are at different stages of maturity, and an abstraction layer will allow for a migration to more robust solutions later.
  • Services, or microservices depending on your architecture, should each have a dedicated data store. Allowing app server pools to communicate with multiple data stores reduces availability.
  • To protect against region failures, deploy all services needed for the product into a second region and run active / active, designating a home region for each user.
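
To make the first bullet concrete, here is a minimal sketch of the calling pattern we have in mind. It uses only the Python standard library; the remote call and the fallback response are hypothetical stand-ins for whatever your service actually does. The point is that the caller bounds its exposure to the other region with a timeout and degrades gracefully rather than failing along with it:

```python
import time
from concurrent.futures import ThreadPoolExecutor

# One shared pool so a slow cross-region call never blocks the request thread.
_remote_pool = ThreadPoolExecutor(max_workers=4)

def call_remote_region(user_id: str) -> dict:
    """Hypothetical cross-region call, e.g. fetching a recommendation list."""
    time.sleep(0.05)  # stand-in for network latency to the other region
    return {"recommendations": ["item-1", "item-2"]}

def fetch_recommendations(user_id: str, timeout_s: float = 0.25) -> dict:
    """Call the other region asynchronously; degrade gracefully if it is slow or down."""
    future = _remote_pool.submit(call_remote_region, user_id)
    try:
        return future.result(timeout=timeout_s)
    except Exception:  # includes the future's TimeoutError
        return {"recommendations": []}  # degraded response; the failure does not cascade

if __name__ == "__main__":
    print(fetch_recommendations("user-42"))
```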

We have come across clients who have invested in migrating from one cloud provider to another because of what seemed like infrastructure outages in a particular region. Before making such an investment of time and money, make sure you are not the root cause of those outages because of the way your product is architected. If you are concerned about your architecture or about deploying your product in the cloud, contact us and we can provide you with some options.


Three Strategies for Migrating to the Cloud

Cloud deployments are becoming increasingly popular across all industries. Despite initial concerns around vendor lock-in and the need for increased security in multi-tenant environments, many companies have found a strong business case for migration. Overall, the economics aren't surprising. Cloud providers can leverage huge economies of scale, discounts on power consumption, and demand diversification to drive down costs, which are then passed on to a fast-growing customer base.

As a technology leader, you've done your own research, and you're confident that a cloud deployment can both save money and promote faster innovation amongst your development team. The next challenge is to develop a clear migration plan; however, what you'll quickly learn is that no two tech companies share the same technical and business realities, and no single cloud migration strategy can be relied upon to fit every circumstance.

We've laid out three common scenarios that you might consider for your product. Each describes a different business case and an accompanying cloud migration pattern.

Service-by-Service Migration

In the first scenario you already have a well-architected solution running in a private (company-owned or managed-host) datacenter. Your main production application is separated into different services and unburdened by large amounts of technical debt. Here a service-by-service transition into the cloud is the clear choice: a smooth transition that doesn't carry nearly the risk of a "big-bang" approach and doesn't trigger a request for proposal (RFP) event for customers. If you make your customers live through a massive migration, they will often take the opportunity to look around for alternative services – thus the RFP.

You'll want to start with the smallest or least heavily coupled service first. The goal is to build your team's confidence and experience with cloud deployments and tools before tackling larger challenges.

In most cases your engineers will likely need to get their hands dirty with a little re-architecting. (Depending on how closely you’ve followed our recommendations regarding fault isolation zones or “swim lanes”, many of these steps may have already been completed.) Your team’s first goal should be to rewrite (or eliminate) any cross-service calls to be asynchronous. Once this is complete, except in rare cases, virtualizing the web & app tiers should be a straightforward process. Often, the greater challenge is disentangling the persistent data-tier. If you already have a separate data store for each service, it should be easy to complete service migration by virtualizing or importing the data to a PaaS solution (RDS, Azure’s SQL, etc.). If that’s not the case, you’ll want to either:

(1) Separate the service's data (choosing an appropriate SQL or NoSQL technology). Sharding the persistent data tier has the added benefit of reducing load on the primary datastore, lowering size and IOPS requirements and thereby further easing the transition to the cloud.

or

(2) Eliminate the service's need for data storage entirely by passing data back to another service. (Less desirable, but suitable for smaller services.)

Once you've deployed your first cloud service, redirect a small portion of the customer base to it and monitor for correct functionality. As monitoring improves confidence in the service's operation, you can continue a slow rollout across your entire customer base, minimizing both risk and operational downtime. Repeat this process for each service to complete the migration.
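
One simple way to implement this kind of incremental rollout is deterministic bucketing: hash each customer ID into a bucket and route only the buckets below a configurable percentage to the new cloud service. The sketch below is a hypothetical illustration in Python; the routing function and service names are ours, not part of any particular framework:

```python
import hashlib

def in_rollout(customer_id: str, rollout_pct: float) -> bool:
    """Deterministically place a customer into the first rollout_pct percent of buckets."""
    digest = hashlib.sha256(customer_id.encode()).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket 0-99 per customer
    return bucket < rollout_pct

def route_request(customer_id: str, rollout_pct: float = 5.0) -> str:
    """Send a small, stable slice of customers to the newly migrated service."""
    return "cloud-service" if in_rollout(customer_id, rollout_pct) else "legacy-service"

if __name__ == "__main__":
    print(route_request("customer-1001"))
```

Because the bucketing is deterministic, a given customer sees a consistent experience as the percentage ramps from 1% toward 100%.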

Down Market MVP

Another common scenario is that your company runs an older mature product, but one that is loaded with technical debt. Perhaps it’s poorly architected, written using older frameworks or languages, or contains undocumented or “black box” code that even your seasoned engineers are afraid to touch. The sheer complexity screams re-write, but at the same time there’s a pressing business appetite to move into the cloud quickly.

In this case, a possible strategy is to enter the cloud with a down-market, minimum viable product (MVP). Have your team start from scratch and develop an MVP in the cloud that encompasses the core features of your main offering. By targeting the down-market, you’re not only entering quickly (with a minimal feature set) but you’re also blocking other upstart competitors from overtaking you.

During development, you'll need to balance the business demands of time-to-market vs. the technical risk of vendor lock-in. Using PaaS offerings such as Amazon's SQS or Azure's DocumentDB can help get your new product up and running quickly, but at the expense of greater vendor lock-in. There's no silver bullet with these decisions; just be sure to make these choices consciously and recognize that you're accumulating what could become tech debt should you decide to change cloud providers down the road.
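
One way to keep that lock-in manageable is to hide the managed service behind a small interface you own. The sketch below is hypothetical Python: the MessageQueue interface and the in-memory implementation are ours, and a provider-backed implementation (SQS, Azure Service Bus, etc.) would be a drop-in replacement behind the same interface:

```python
from abc import ABC, abstractmethod
from collections import deque
from typing import Optional

class MessageQueue(ABC):
    """The interface your product codes against; providers live behind it."""

    @abstractmethod
    def send(self, body: str) -> None: ...

    @abstractmethod
    def receive(self) -> Optional[str]: ...

class InMemoryQueue(MessageQueue):
    """Local implementation used for tests and development."""

    def __init__(self) -> None:
        self._messages = deque()

    def send(self, body: str) -> None:
        self._messages.append(body)

    def receive(self) -> Optional[str]:
        return self._messages.popleft() if self._messages else None

# A provider-backed class such as SqsQueue(MessageQueue) would plug in here
# without touching any of the calling code.
```

Swapping providers later then becomes a contained project rather than a rewrite.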

Once you've established this MVP with a small customer base, continue to articulate it with feature additions, possibly in a tiered pricing model (e.g. standard, professional, premium). As the new cloud product matures, development efforts should slowly pivot: developers are moved from the older legacy product to help further articulate features on the new one. At the same time, as the new cloud product matures and gains more articulated features, customers will be increasingly incentivized to migrate to the new platform.

Private to Hybrid to Public

In this scenario your company has large capital infrastructure investments that you can't immediately walk away from. Perhaps you have one or more privately owned data centers, a long-term contract with a managed-host provider, or expensive new servers purchased during a recent tech refresh. Cloud migration is a long-term goal for eventual cost savings, but you need to realize the value of infrastructure that has already been purchased.

The pattern in this case is a crawl-walk-run strategy from the private, to hybrid, to public cloud. Initial focus will be on changing your internal processes to run your datacenter(s) as its own mini-cloud. By adopting technology and process changes such as virtualization, resource pooling, automated provisioning, (and possibly even chargeback accounting), you’re readying your organization for an eventual transition to the public cloud.

Once you've established your own private cloud, you'll want to leverage a public cloud provider for your data backups and DR/business continuity plan. This is where you're likely to see your first reduction in CAPEX. Quarterly DR drills will help you test the viability and compatibility of your systems with your cloud provider.

As demand from your customer base increases, you'll avoid making further infrastructure investments. Instead, you'll want to adopt a hybrid cloud strategy, moving a handful of customers to the cloud or bursting into the cloud to accommodate usage spikes.

Finally, as servers are progressively retired or when your managed provider contracts run out, you’ll be ready to complete a smooth migration fully into the public cloud.

Conclusion

While each of these patterns is vastly different, they all share one thing in common: there are no "Big Bang" moments, no high-risk events that'll have your stakeholders nail-biting on the edge of their seats. The best technologists are always looking to meet business objectives while reducing risk.

The business and technical realities for every company are different and it’s likely you’ll need to combine one or more of these migration patterns in your own strategy. While we’re confident that many of our customers will benefit from migrating to the cloud, it’s important to tailor your strategy to the needs of your business. We continue to develop new migration patterns that minimize risk and generate business value for our clients.


Product Discovery (the Right Way)

Product Discovery: Sales Driven vs. Market Driven

Product discovery is a core process to any successful software company. While your engineers need to focus on building the product right (making it highly scalable and available), your product owners need to ensure they’re building the right product.

All too often, we find product owners serving essentially as "ticket takers" who run requests from sales to engineering. While the product owners may add a little design and elaboration, the ideas originate solely from the "asks" of the sales department. This "sales-driven" approach is limiting. It tends to stifle innovation and frequently allocates engineering effort to one-off features requested by a single customer. Overall, it is indicative of an immature product management process.

In contrast, mature product organizations take a “market-driven” approach, receive input from a variety of sources, and focus engineering efforts on penetrating new market segments. These differences translate into measurable business outcomes. Market-driven companies (e.g. Salesforce, Apple, Amazon, WorkDay) are routinely valued at 10-15 times revenue, while sales-driven companies (e.g. Dell, Samsung, Oracle, SAP, Rackspace, WalMart.com) typically trade for only 1.5-6 times revenue.

Highlighting the differences between these two approaches will demonstrate how organizations focused on innovation and faster time to market are better served by a market-driven approach.

Sales Led Product Discovery

In a sales-driven product organization, the product ideation process begins with sales obtaining feedback from customers (or potential customers) on desired features and functionality. These requests are then translated into requirements and added to the product backlog. Prioritization — if it occurs at all — is based on the size of the contract and/or relative importance of the customer. The process itself is simple and straightforward, but unfortunately it often leads to poor outcomes.

Frequently, features not yet implemented are promised and assigned deadlines as part of the sales contract. These binding commitments tie the hands of product and engineering anywhere from several weeks to months, forcing them on a treadmill that never quite allows enough breathing room to experiment with innovative features or new products. It becomes impossible for the product organization to pivot without months of lead time or executive intervention.

Additionally, the particular needs of each customer lead them to request features and functionality that only they will use. Product evolution therefore becomes haphazard, with the engineering team jumping from one custom feature to the next.

Furthermore, these feature commitments are often granted in an all-or-nothing approach with highly articulated use cases. A consequence is that releases to production are infrequent, perhaps quarterly or less, making it difficult to acquire feedback from the larger customer base. Fewer releases mean less feedback from customers overall and fewer opportunities to pivot away from a poorly chosen feature set.

Consequently, the product itself becomes bloated with a large number of features that only one or maybe two customers use, and these one-off features add complexity both to the interface (making it difficult to use) and to the code (making it difficult to maintain). This over-development makes the product unsatisfying to use and a drain on engineering resources to maintain.

The bottom line is that sales-driven products simply lack vision. Your sales department (quite rightly) is too focused on selling, and your customers (quite naturally) are too focused on solving the business problems of today, to steer the future vision of your product. The limitations of sales-driven approaches are best summarized by the quote often attributed to Henry Ford: "If I had asked my customers what they wanted, they would have said a faster horse."

Market Led Product Discovery

If a sales-driven approach is a dead end, what is the alternative?

In a market-driven approach, product ideas aren't limited to sales and customer asks but come from a variety of sources (e.g. charter customer feedback, actual usage metrics, customer service, engineering, sales, the executive team). This generates a surplus of ideas, the majority of which the product owners will have to say "no" to. In fact, the most important function of product owners is to say "no" to ideas that will divert engineering effort away from creating maximum value.

When a new product is launched, an MVP concept is prototyped and tested among a small set of charter customers. Prototypes (interactive mockups) are created with tools like Balsamiq, Axure, etc. and iterated over multiple times, allowing a huge number of ideas to be tested and refined with almost no coding. This process also identifies the minimum feature set required to solve a problem in a particular market segment, thereby minimizing the engineering effort required to implement it.

Once in production, these MVPs are enhanced by further feature development. However, features aren’t selected to satisfy the needs of individual customers but to meet the collective needs of the market segment (i.e. Keep the aggregate customer mostly happy). Features themselves are initially released in minimum viable fashion and further articulated in later releases, allowing for a frequent release schedule and ready feedback from the customer base. Overall, this strategy avoids over-development while capturing as much of the market segment as possible. Specific customer one-off product needs can be pushed to a professional services group (internally or externally) and built as add-ons through a mechanism like an API framework, but not incorporated into the core product.

Once the current market segment has been saturated, focus shifts to conquering another market segment. This is the expansion strategy advocated by Geoffrey Moore in Crossing the Chasm: "[target] a very specific niche market where you can dominate from the outset, drive your competitors out of that market niche, and then use it as a base for broader operations."[1]

In contrast to the sales-driven approach above, this expansion is a deliberate process. Market research is conducted to identify the needs of these new customers. In some cases this means adding a collection of features to the existing product, but often it requires the launch of an entirely new product. Again, the use of extensive prototyping and feedback from charter customers play a key role.

This quick advance – penetrating and saturating market segments in rapid succession – explains the stark differences in investor valuations between market-driven (10-15x revenue) and sales-driven (1.5-6x revenue) companies.

Conclusion

While more complex, the market-driven product discovery process helps your business focus on long-term growth vs. near-term individual customer wins. It draws upon a variety of sources for ideas, allows for early testing and feedback (on both MVP products and features), minimizes wasted engineering effort, and opens new market segments more quickly than sales-driven approaches.

Many of our product management ideas are influenced by Marty Cagan at the SVPG. Click here to visit his website and learn more.

[1] Moore, Geoffrey A. (2014-01-28). Crossing the Chasm, 3rd Edition (Collins Business Essentials) (p. 79). HarperCollins. Kindle Edition.


Communicating Across Swim-lanes

We often emphasize the need to break apart complex applications into separate swim-laned services. Doing so not only improves architectural scalability but also permits the organization to scale and innovate faster as each separate team can operate almost as its own independent startup.

Ideally, there would be no need to pass information between these services, and they could simply be stitched together on the client side with a little HTML for a seamless user experience. In the real world, however, you're likely to find that some message passing between services is needed (if only to save your users the trouble of entering information twice).

Every cross-service call adds complexity to your application. Often, teams will implement internal APIs when cross-service communication is needed. This is a good start and solves part of the problem by formalizing communication and discouraging unnecessary cross-service calls.

However, APIs alone don’t address the more important issue with cross-service communication — the reduction in availability. Anytime one service synchronously communicates with another you’ve effectively connected them in series and reduced overall availability. If Service-A calls Service-B synchronously, and Service-B crashes or slows to a crawl, Service-A will do the same. To avoid this trap, communication needs to take place asynchronously between services. This, in fact, is how we define “swim-laning”.

So what are the best means to facilitate asynchronous cross-service communication, and what are the pros and cons of each method?

Client Side

Information that needs to be shared between services can be passed via JavaScript & JSON in the browser. This has the advantage of keeping the backend services completely separate and gives you the option to move business logic to the client side, thus reducing load on your transactional systems.

The downside, however, is increased security risk. It doesn't take an expert hacker to manipulate variables in JavaScript. (Just think: $price_var gets set to $0 somewhere between the shopping cart and checkout services.) Furthermore, when data is passed back to the server side, these cross-service calls will suffer from the same latency and reliability issues as any TCP call over the internet.
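
If you do pass data through the client, at a minimum sign it so the receiving service can detect tampering. Below is a minimal sketch using only Python's standard library; the payload shape and the shared secret are hypothetical, and in practice you would load the key from a secrets store and likely add an expiry timestamp:

```python
import hashlib
import hmac
import json

SECRET_KEY = b"replace-with-a-managed-secret"  # hypothetical; load from a secrets store

def sign_payload(payload: dict) -> dict:
    """Attach an HMAC so the receiving service can detect client-side tampering."""
    body = json.dumps(payload, sort_keys=True).encode()
    signature = hmac.new(SECRET_KEY, body, hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": signature}

def verify_payload(message: dict) -> bool:
    """Recompute the HMAC and compare in constant time."""
    body = json.dumps(message["payload"], sort_keys=True).encode()
    expected = hmac.new(SECRET_KEY, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, message["signature"])

if __name__ == "__main__":
    signed = sign_payload({"cart_total": 49.99})
    signed["payload"]["cart_total"] = 0  # client-side tampering...
    print(verify_payload(signed))        # ...is caught: prints False
```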

Message Bus / Enterprise Service Bus

Message buses and enterprise service buses provide a ready means to transmit messages between services asynchronously in a pub/sub model. Advantages include a rich set of features for tracking, manipulating, and delivering messages, as well as centralized logging and monitoring. However, the potential for congestion, as well as occasional message loss, makes them less desirable in many cases than asynchronous point-to-point calls (discussed below).

To limit congestion, it’s best to implement several independent channels and limit message bus traffic to events that will have multiple subscribers.
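
For illustration, here is a toy in-process publish/subscribe channel in Python. A real message bus (RabbitMQ, Kafka, SNS/SQS, and so on) adds durability, delivery guarantees, and monitoring, but the interaction pattern is the same: publishers and subscribers are decoupled by a topic.

```python
from collections import defaultdict
from typing import Callable

class Bus:
    """Toy topic-based pub/sub; real buses add durability and delivery guarantees."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        for handler in self._subscribers[topic]:
            handler(event)  # a real bus would deliver asynchronously

if __name__ == "__main__":
    bus = Bus()
    bus.subscribe("order.created", lambda e: print("billing saw", e))
    bus.subscribe("order.created", lambda e: print("email saw", e))
    bus.publish("order.created", {"order_id": 123})
```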

Asynchronous Point-to-Point

Point-to-point communication (i.e. one service directly calling another) is another effective means of message passing. Advantages include simplicity, speed, and reliability. However, be sure to implement this asynchronously, with a queuing mechanism, timeouts, retries (if needed), and exceptions handled in the event of service failure. This will prevent failures from propagating across service boundaries.
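
A minimal sketch of that call-side discipline in Python, standard library only: the caller enqueues the work and moves on, while a background worker makes the actual call with bounded retries and backoff. The downstream call here is a hypothetical stand-in, and the retry budget should be tuned so retries cannot themselves overwhelm the downstream service.

```python
import queue
import threading
import time
from typing import Callable

work_q: queue.Queue = queue.Queue()

def enqueue_call(fn: Callable[[], None]) -> None:
    """The caller returns immediately; the call is made asynchronously by the worker."""
    work_q.put(fn)

def worker(max_retries: int = 2) -> None:
    while True:
        fn = work_q.get()
        for attempt in range(max_retries + 1):
            try:
                fn()
                break
            except Exception:
                if attempt < max_retries:
                    time.sleep(0.1 * (2 ** attempt))  # exponential backoff between attempts
                # after the final attempt, log and dead-letter rather than crash the caller
        work_q.task_done()

threading.Thread(target=worker, daemon=True).start()

if __name__ == "__main__":
    # Hypothetical downstream call; its latency or failure never blocks the caller.
    enqueue_call(lambda: print("notify billing service"))
    work_q.join()
```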

If properly implemented, asynchronous point-to-point communication is excellent for invoking a function in another service or transmitting a small amount of information. This method can be implemented for the majority of cross-service communication needs. However, for larger data transfers, you’ll need to consider one of the methods below.

ETL

ETL jobs can be used to move a large amount of data from one datastore to another. Since these are generally implemented as separate batch processes, they won't bring down the availability of your services. However, drawbacks include increased load on transactional databases (unless a read replica of the source DB is used) and the poor timeliness/consistency of data that results from a periodic batch process.

ETL processes are likely best reserved for transferring data from your OLTP to your OLAP systems, where immediate consistency isn't required. If you need both a large amount of data transferred and up-to-the-second consistency, consider a DB read replica.

DB Read-Replica

Most common databases (Oracle, MySQL, PostgreSQL) support native replication of DB clones with replication lag measured in milliseconds. By placing a read replica of one service’s DB into another service’s swim-lane, you can successfully transfer a large amount of data in near real-time. The downsides are increased IOPS requirements and a lack of abstraction between services that — in contrast to the abstraction provided by an API — fosters greater service interdependency.

Conclusion

In conclusion, you can see there are a variety of cross-service communication methods that permit fault-isolation between services. The trick is knowing the strengths and weaknesses of each and implementing the right method where appropriate.



Making Agile Predictable

One of the biggest complaints that we hear from businesses implementing agile processes is that they can’t predict when things will get delivered. Many people believe, incorrectly, that waterfall is a much more predictable development methodology. Let’s review a few statistics that demonstrate the “predictability” of the waterfall methodology.

A study of over $37 billion (USD) worth of US Defense Department projects concluded that 46% of the systems so egregiously did not meet the real needs (although they met the specifications) that they were never successfully used, and another 20% required extensive rework (Larman, 1995).

In another study of 6,700 projects it was found that four out of five key factors contributing to project failure were associated with or aggravated by the waterfall method, including inability to deal with changing requirements and problems with late integration.

Waterfall is not a bad or failed methodology; it's just a tool like any other that has its limitations. We as the users of that tool misused it and then blamed the tool. We falsely believed that we could fix the project scope (specifications), cost (team size), and schedule (delivery date). No methodology allows for that. As shown in the diagram below, waterfall is meant to fix scope and cost. When we also try to constrain the schedule, we're destined to fail.

[Figure: waterfall fixes scope and cost, leaving the schedule to vary]

Agile, when used properly, fixes the cost (team size) and the schedule (two-week sprints), allowing the scope to vary. This is where most organizations struggle, attempting to predict delivery of features when the scope of stories is allowed to vary. As a side note, if you think you can fix the scope and the schedule and vary the cost (team size), read Brooks's 1975 book The Mythical Man-Month.

This is where the magical measurement of velocity comes into play. The velocity of a team is simply the number of units of work completed during the sprint. The primary purpose of this measurement is to provide the team with feedback on their estimates. As you can see in the graph below, it usually takes a few sprints to get into a controlled state where the velocity is predictable, and then it usually rises slightly over time as the team becomes more experienced.

[Figure: team velocity stabilizing over successive sprints]

Using velocity we can now predict when features will be delivered. We simply project out the best and worst velocities, and we can state with high certainty a best and a worst delivery date for the set of stories that make up a feature.

[Figure: projecting best and worst velocities to a range of delivery dates]
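
As a concrete (and entirely hypothetical) example of the projection, the sketch below takes the remaining story points for a feature plus the team's best and worst observed velocities, and returns the window of dates in which the feature should land:

```python
import math
from datetime import date, timedelta

def delivery_window(remaining_points: int,
                    best_velocity: int,
                    worst_velocity: int,
                    sprint_length_days: int = 14,
                    start: date = date(2016, 6, 1)):
    """Project best- and worst-case delivery dates from observed sprint velocity."""
    best_sprints = math.ceil(remaining_points / best_velocity)
    worst_sprints = math.ceil(remaining_points / worst_velocity)
    return (start + timedelta(days=best_sprints * sprint_length_days),
            start + timedelta(days=worst_sprints * sprint_length_days))

if __name__ == "__main__":
    earliest, latest = delivery_window(remaining_points=120, best_velocity=32, worst_velocity=24)
    print(f"Feature expected between {earliest} and {latest}")
```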

Velocity helps us answer two types of questions. The first is the fixed scope question “when will we have X feature?” to which the answer is “between A and B dates”. The second question is the fixed time question “what will be delivered by the June releases?” to which the answer is “all of this, some of that, and none of those.” What we can’t answer is fixed time and fixed scope questions.

[Figure: fixed-scope vs. fixed-time questions and the answers velocity supports]

It's important to remember that agile is not a software development methodology, but rather a business process. This means that all parts of your organization must buy in to and participate in agile product development. Your sales team must get out of the mindset of committing to new product features with fixed time and scope when talking to existing or potential customers. When implemented correctly, agile provides faster time to market and higher levels of innovation than waterfall, which brings greater value to your customers. The tradeoff on the sales side is to stop making the long-term product commitments of the past (which were more often than not missed anyway).

By utilizing velocity, keeping the team consistent, and phrasing the questions properly, agile can be a very predictable methodology. The key is understanding the constraints of the methodology and working within them instead of ignoring them and blaming the methodology.



AKF is launching an Information Security Service for Clients

In today's digital world, cyber security is one of the biggest areas that keep CEOs and business executives awake at night. Further, regulation and compliance are continually changing, as both governments and customers modify the rules in an attempt to keep pace with rapid changes in technology. Keeping up with these changes in regulation, and particularly with how they apply to your business, is a monumental challenge. As a very recent example, if you operate in both the U.S. and Europe, you're probably familiar with Safe Harbor (https://en.wikipedia.org/wiki/International_Safe_Harbor_Privacy_Principles). Until late 2015, Safe Harbor permitted U.S. companies to store EU customer data on U.S. soil, so long as the infrastructure and usage met the privacy standards of the Safe Harbor framework. However, in October 2015, the European Court of Justice struck down the US-EU Safe Harbor agreement, leaving companies again wondering what approach they should take. Then on Feb 29, 2016, the European Commission introduced the EU-US Privacy Shield (http://europa.eu/rapid/press-release_IP-16-433_en.htm), which is similar to Safe Harbor but, in conjunction with the Judicial Redress Act signed by President Obama, extends the ability of European citizens to challenge U.S. companies that store sensitive personal information. With the changes coming quickly and furiously, what is your company to do?


In addition to those challenges, customers today put more and more demand on companies not only to protect sensitive customer data and IP, but also to store and process that sensitive data in a highly available and scalable manner. Balancing the need to maintain data on behalf of customers with regulation and compliance is one of the biggest challenges for any modern company, whether you store or process sensitive data for your customers or not. Sometimes just operating in a regulated industry subjects your company to the long reach of compliance standards.

Finally, who is in charge of security in your company? Is it spread across teams? Is there a single CSO/CISO and security organization? How well do the security organizations work with your technology and business teams?

Our clients ask us frequently if we can assess their security programs and help develop plans for them, similar to the work we do with system architecture and product development. At AKF, we look at security the same way we look at availability and scaling. It is about managing risk. You can build a system that has nearly 100% availability, and you can build a system that scales nearly infinitely… but you don’t always need to, as it’s not always the most cost-effective way to run your business.


You want to build a system that achieves very high availability, up to the point where the cost of "near perfect" availability exceeds the value it brings to the business. Similarly, you may never be able to completely eliminate every risk to your information security. Some industries, like health care and credit card processing, need to be "near perfect." But others may not need to be quite as perfect. Some may just require ample controls, tightly monitored systems, secure coding practices, and top-notch security incident response plans.

Finally, there is plenty of overlap in regulations, so controls applied once can address several of your security compliance concerns. For example, PCI, SOX, and HIPAA all have requirements about auditability and accessibility, although they apply to different types of data. If you're subject to more than one of those, why not put controls in place that help satisfy ANY of the applicable regulations?


We have begun a new program to help our clients with their security needs. Our approach is to help you understand your regulation and compliance landscape, look at the security measures you’ve put in place, and work with you to design a program and projects to close the gaps, focusing on the highest value projects first. We’ll also help you understand where security fits into your organization, and guide you on how to break down barriers that prevent organizations from working cohesively to manage security across the enterprise.

If you are interested in our security program services, please contact us.



Paying Down Your Technical Debt

During the course of our client engagements, there are a few common topics or themes that are almost always discussed, and the clients themselves usually introduce them. One such area is technical debt. Every team has it, almost every team believes they have too much of it, and no team has an easy time explaining to the business why it’s important to address. We’re all familiar with the concept of technical debt, and we’ve shared a client horror story in a previous blog post that highlights the dangers of ignoring it: Technical Debt


When you hear the words "technical debt", they evoke a negative connotation. However, the judicious use of tech debt is a valuable addition to your product development process. Tech debt is analogous to financial debt. Companies can raise capital to grow their business by either issuing equity or issuing debt. Issuing equity means giving up a percentage of ownership in the company and dilutes current shareholder value. Issuing debt requires the payment of interest, but does not give up ownership or dilute shareholder value. Issuing debt is good, until you can't service it. Once you have too much debt and cannot pay the interest, you are in trouble.

Tech debt operates in the same manner. Companies use tech debt to defer performing work on a product. As we develop our minimum viable product, we build a prototype, gather feedback from the market, and iterate. The parts of the product that didn’t meet the definition of minimum or the decisions/shortcuts made during development represent the tech debt that was taken on to get to the MVP. This is the debt that we must service in later iterations. Taking on tech debt early can pay big dividends by getting your product to market faster. However, like financial debt, you must service the interest. If you don’t, you will begin to see scalability and availability issues. At that point, refactoring the debt becomes more difficult and time critical. It begins to affect your customers’ experience.


Many development teams have a hard time convincing leadership that technical debt is a worthy use of their time. Why spend time refactoring something that already "works" when you could use that time to build new features customers and markets are demanding now? The danger with this philosophy is that by the time technical debt manifests itself as a noticeable customer problem, it's often too late to address it without a major undertaking. It's akin to not having a disaster recovery plan when a major availability outage strikes. To get the business on board, you must make the case using language business leaders understand – again, this is often financial in nature. Be clear about the cost of such efforts and quantify the business value they will bring by calculating their ROI. Demonstrate the cost avoidance achieved by addressing critical debt sooner rather than later – calculate how much greater the cost will be in the future if the debt is not addressed now. The best practice is to get leadership to agree and commit to a certain percentage of development time that can be allocated to addressing technical debt on an ongoing basis. If they do, it's important not to abuse this responsibility. Do not let engineers alone determine what technical debt should be paid down and at what rate – it must have true business value that is greater than or equal to spending that time on other activities.
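
A simple way to frame the conversation is to put rough numbers on it. The figures below are entirely hypothetical; the point is the shape of the argument (a one-time refactoring cost now versus a compounding drag later), not the specific values:

```python
def refactor_roi(refactor_cost: float,
                 monthly_interest: float,
                 months_deferred: int,
                 growth_rate: float = 0.05) -> float:
    """Compare the cost of refactoring now against the compounding cost of deferring it.

    monthly_interest: dollars of engineering time currently lost to the debt each month.
    growth_rate: assumed monthly growth in that drag as the codebase and team grow.
    """
    deferred_cost = sum(monthly_interest * (1 + growth_rate) ** m
                        for m in range(months_deferred))
    return deferred_cost - refactor_cost  # positive means refactoring now saves money

if __name__ == "__main__":
    # Hypothetical: a $40k refactor vs. $6k/month of drag growing 5% per month over a year.
    print(round(refactor_roi(40_000, 6_000, 12)))  # ~55,500 in avoided cost
```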


Additionally, be clear about how you define technical debt so time spent paying it down is not commingled with other activities. Generally, bugs in your code are not technical debt. Refactoring your code base to make it more scalable, however, would be. A good test is to ask whether the path you chose was a conscious decision – that is, whether you decided to go in one direction knowing that you would later need to refactor. This too is analogous to financial debt: technical debt needs to be a conscious choice. You are making a specific decision to do or not do something knowing that you will need to address it later. Bugs are found in sloppy code, and that is not tech debt; it is just bad code.

So how do you decide what tech debt should be addressed, and how do you prioritize? If you have been tracking work with Agile storyboards and product backlogs, you should have an idea of where to begin. Also, if you track your problems and incidents as we recommend, they will show the elements of tech debt that have begun to manifest themselves as scalability and availability concerns. We recommend budgeting 12-25% of your development effort for servicing tech debt. Set a budget and begin paying down the debt. If you are spending less than the lower end of that range, you are not spending enough effort. If you are spending over 25%, you are probably fixing issues that have already manifested themselves, and you are trying to catch up. Setting an appropriate budget and maintaining it over the course of your development efforts will pay down the interest and help prevent issues from arising.

Taking on technical debt to fund your product development efforts is an effective way to get your product to market quicker. But, just like financial debt, you need to take on an amount of tech debt that you can service by making the necessary interest and principal payments to reduce the outstanding balance. Failing to set an appropriate budget will result in a technical "bankruptcy" that will be much harder to dig yourself out of later.



The Biggest Problem with Big Data

How to Reduce Crime in Big Cities

I’m a huge Malcolm Gladwell fan.  Gladwell’s ability to convey complex concepts and virtually incomprehensible academic research in easily understood prose is second to none within his field of journalism.  A perfect example of his skill is on display in the Tipping Point, where Gladwell wrestles the topic of Complexity Theory (aka Chaos Theory) into submission, making it accessible to all of us.  In the Tipping Point, Gladwell also introduces us to The Broken Windows Theory.

The Broken Windows Theory gets its name from a 1982 The Atlantic Monthly article.  This article asked the reader to imagine a building with a few broken windows.  The authors claim that the existence of these windows invite vandals to break still more windows.  A continuous cycle of expanding vandalism ensues, with squatters moving in, nearby buildings getting vandalized, etc.  Subsequent authors expanded upon the theory, claiming that the presence of vandalism invites other crimes and that crime rates soar in communities where unhandled vandalism is present.  A corollary to the Broken Windows Theory is that cities can reduce crime rates by focusing law enforcement on petty crimes.  Several high profile examples seem to illustrate the power and correctness of this theory, such as New York Mayor Giuliani’s “Zero Tolerance Program”.  The program focused on vandalism, public drinking, public urination, and subway fare evasion.  Crime rates dropped over a 10 year period, corresponding with the initiation of the program.  Several other cities and other experiments showed similar effects.  Proof that the hypothesis underpinning the theory is correct.

Not So Fast…

Enter the self-described “Rogue Economist” Stephen Levitt and his co-author Stephen Dubner – both of Freakonomics fame.  While the two authors don’t deny that the Broken Windows theory may explain some drop in crime, they do cast significant doubt on the approach as the primary explanation for crime rates dropping.  Crime rates dropped nationally during the same 10 year period in which New York pursued its Zero Tolerance Program.  This national drop in crime occurred in cities that both practiced Broken Windows and those that did not.  Further, crime rate dropped irrespective of either an increase or decrease in police spending.  The explanation therefore, argue the authors, cannot primarily be Broken Windows.  The most likely explanation and most highly correlated variable is a reduction in a pool of potential criminals.   Roe v. Wade legalized abortion, and as a result there was a significant decrease in the number of unwanted children, a disproportionately high percentage of whom would grow up to be criminals.

Gladwell isn’t therefore incorrect in proffering Broken Windows as an explanation for reduction in crime.  But the explanation is not the best one available and as a result it holds residence somewhere between misleading (worst case) and incomplete (best case).

What Happened?

To be fair, it’s hard to hold Gladwell accountable for this oversight.  Gladwell is not a scientist and therefore not trained in how to scientifically evaluate the research he reported.  Furthermore, his is an oft repeated mistake even among highly trained researchers.  And what exactly is that mistake?  The mistake made here is illustrated by the difference in approach between the Broken Windows researchers and the Freakonomics authors.  The Broken Windows researchers started with something like the following question “Does the presence of vandalism invite additional vandalism and escalating crime?”  Levitt and Dubner first asked the question “What variables appear to explain the rate of crime?”

Broken Windows started with a question focused on deductive analysis.  Deduction starts with a hypothesis – “Evidence of vandalism and/or other petty crimes invites similar and more egregious crimes”.   The process continues to attempt to confirm or disprove the hypothesis.  Deduction starts with a broad and abstract view of the data – a generalization or hypothesis as to relationships – and attempts to move to show specific relationships between data elements.  The Broken Windows folks started with a hypothesis, developed a series of experiments to test the hypothesis and then ultimately evaluated time series data in cities with various Broken Windows approaches to policing.  What they lacked was a broad question that may have developed a range of options indicating possible causes.

The Freakonomics authors started with an inductive question.  Induction is the process of moving from specific observations about data into generalizations.  These generalizations are often in the form of hypothesis or models as to how data interacts.  Induction helps to inform what questions should be asked of the data.  Induction is the asking of “What change in what independent variables seem to correspond with a resulting change in some dependent variable?”  Whereas deduction works from independent variable to dependent variable, induction attempts to work backwards from dependent variable to identify independent variable relationships.

So What?

The jump to deduction, without forming the right questions and hypotheses through induction, is the biggest mistake we see in developing Big Data programs and implementing Big Data solutions. We all approach problems with unique experiences and unique biases. The combination of these often causes us to race to hypotheses and want to test them. The issue here is two-fold. The best case is that we develop an incomplete (and as a result partially or mostly incorrect) answer similar to that of the Broken Windows researchers. The worst case is that we suffer what statisticians call a Type 1 error – confirming an incorrect answer. The probability of Type 1 errors increases when we don't look for alternative or better explanations for the outcomes within our data sets. Induction helps to uncover those alternative or supporting explanations. Exploring the data to discover potential relationships helps us to ask the right questions and form better hypotheses and better models. Skipping induction makes it highly probable that we will get an incorrect, misleading, or substandard answer.
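
To make the distinction concrete, here is a small, hypothetical sketch in Python (pandas and SciPy assumed available, and the data entirely synthetic): the inductive step scans broadly for candidate relationships with the dependent variable, and the deductive step then tests the one specific hypothesis the scan surfaced.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame({
    "police_spend": rng.normal(100, 15, n),
    "petty_crime_enforcement": rng.normal(50, 10, n),
    "young_adult_population": rng.normal(200, 30, n),
})
# Synthetic outcome: driven mostly by the size of the at-risk population.
df["crime_rate"] = 0.4 * df["young_adult_population"] + rng.normal(0, 10, n)

# Induction: explore broadly for candidate relationships with the dependent variable.
print(df.corr()["crime_rate"].sort_values(ascending=False))

# Deduction: test the one specific hypothesis the exploration surfaced.
r, p_value = stats.pearsonr(df["young_adult_population"], df["crime_rate"])
print(f"r={r:.2f}, p={p_value:.3g}")
```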

But it is not enough to simply ensure that we practice both induction and deduction. We must also recognize that the solutions that support induction are different from those that support deduction. Further, we must understand that the two processes, while complementary, can actually interfere with each other when performed on the same system. Induction is necessarily a very broad and, as a result, slow and tedious process. Deduction, on the other hand, needs significantly less data and "prefers" to be faster in implementation. Inductive systems are best supported by solutions that impose very few relations or structure on the data we observe. Systems that support deduction, in order to allow for faster response times, impose increased structure relative to inductive systems. While the two phases of discovery (induction and deduction) support each other, their differences suggest that they should be performed on solutions purpose-built to their specific needs.

Similarly, not everyone is equally qualified to perform both induction and deduction.  Our experience is that the folks who tend to be good at determining how to prove relationships between variables are often not as good at identifying patterns and vice versa.

These two observations, that the systems that support induction and deduction should be separated and that the people performing these tasks may need to be different, have ramifications to how we develop our analytics systems and organize our Big Data teams.  We’ll discuss these ramifications and more in our next post, “10 Anti-Patterns within Big Data”.



Data Driven Product Development

When we bring up the idea of data driven product development, almost every product manager or engineering director will quickly say that of course they do this. Probing further, we usually hear about A/B testing, but of course they only do it when they aren't sure what the customer wants. To all of this we say "hogwash!" Testing one landing page against another is a good idea, but it's a far cry from true data driven product development. The bottom line is that if your agile teams don't have business metrics (e.g. increase the conversion rate by 5% or decrease the shopping cart abandonment rate by 2%) that they measure after every release to see whether they are coming closer to achieving them, then you are not data driven. Most likely you are building products based on the HiPPO: the Highest Paid Person's Opinion. In case that person isn't always around, there's this service.

Let's break this down. Traditional agile processes have product owners seeking input from multiple sources (customers, stakeholders, the marketplace). All these diverse inputs are considered and eventually combined into a product vision that helps prioritize the backlog. In the most advanced agile processes, the backlog is ranked based on effort and expected benefit. This is almost never done. And when it is done, the effort and benefit are rarely validated afterwards to see how accurate the estimates were. We even have an agile measurement for doing this for the effort: velocity. Most agile teams don't bother with the velocity measurement and have no hope of measuring the expected benefit.

Would we approach anything else like this? Would we find it acceptable if other disciplines worked this way? What if doctors practiced medicine with input from patients and pharmaceutical companies but without monitoring how their actions impacted the patients’ health? We would probably think that they got off to a good start, understanding the symptoms from the patient and the possible benefits of certain drugs from the pharmaceutical representatives. But, they failed to take into account the most important part of the process. How their treatment was actually performing against their ultimate goal, the improved health of the patient.

We argue that you're wasting time and effort doing anything that isn't measured by a success factor. Why code a single line if you don't have a hypothesis that this feature, enhancement, story, etc. will benefit a customer and in turn provide greater revenue for the business? Without this feedback mechanism, your effort would be better spent taking out the trash, watering the plants, and cleaning up the office. Here's what we know about most people's gut instincts with regard to products: they stink. Read Lund's very first Usability Maxim: "Know thy user, and YOU are not thy user." What does that mean? It means that you know more about your product, industry, service, etc. than any user ever will. Your perspective is skewed because you know too much. Real users don't know where everything is on the page; they have to search for it. Real users lie when you ask them what they want. Real users have bosses, spouses, and children distracting them while they use your product. As Henry Ford supposedly said, "If I had asked customers what they wanted, they would have said a faster horse." You can't trust yourself and you can't trust users. Trust the data. Form a hypothesis: "If I add a checkout button here, shoppers will convert more." Then add the button and watch the data.
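
Watching the data means more than eyeballing a dashboard: decide up front how you will judge whether the metric actually moved. Here is a minimal sketch of a two-proportion z-test in plain Python; the traffic and conversion numbers are hypothetical:

```python
import math

def conversion_lift_significant(conv_a: int, n_a: int,
                                conv_b: int, n_b: int,
                                alpha: float = 0.05) -> bool:
    """Two-proportion z-test: did the new checkout button really lift conversion?"""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))  # two-sided
    return p_value < alpha

if __name__ == "__main__":
    # Hypothetical: 4.0% baseline vs. 4.6% with the new button, 20,000 sessions each.
    print(conversion_lift_significant(800, 20_000, 920, 20_000))  # True
```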

This is not A/B or even multivariate testing. It is much more than that. It's changing the way we develop products and services. Remember, agile is a business process, not just a development methodology. Nothing gets shipped without an expected benefit that is measured. If you didn't achieve the business benefit that was expected, you either try again next sprint or you deprecate the code and start over. Why leave code in a product that isn't achieving business results? It only serves to bloat the code base and make maintenance harder. Data driven product development means we don't celebrate shipping code. That's supposed to happen every two weeks, or even daily if we're practicing continuous delivery. You haven't achieved anything by pushing code to production. When the customer interacts with that code and a business result is achieved, then we celebrate.

Data driven product development is how you explain the value of what the product development team is delivering. It’s also how you ensure your agile teams are empowered to make meaningful impact for the business. When the team is measured against real business KPIs that directly drive revenue, it raises the bar on product development. Don’t get caught in the trap of measuring success with feature releases or sprint completions. The real measurement of success is when the business metric shows the predicted improvement and the agile team becomes better informed of what the customers really want and what makes the business better.



Guest Post: Three Mistakes in Scaling Non-Relational Databases

The following guest post comes from another long time friend and business associate Chris LaLonde.  Chris was with us through our eBay/PayPal journey and joined us again at both Quigo and Bullhorn.  After Bullhorn, Chris and co-founders Erik Beebe and Kenny Gorman (also eBay and Quigo alums) started Object Rocket to solve what they saw as a serious deficiency in cloud infrastructure – the database.  Specifically they created the first high speed Mongo instance offered as a service.  Without further ado, here’s Chris’ post – thanks Chris!

Three Mistakes in Scaling Non-Relational Databases

In the last few years I’ve been lucky enough to work with a number of high profile customers who use and abuse non-relational databases (mostly MongoDB) and I’ve seen the same problems repeated frequently enough to identify them as patterns. I thought it might be helpful to highlight those issues at a high level and talk about some of the solutions I’ve seen be successful.

Scale Direction:

At first everyone tries to scale things up instead of out.  Sadly that almost always stops working at some point. So generally speaking you have two alternatives:

  • Split your database – yes, it's a pain, but everyone gets there eventually. You probably already know some of the data you can split out: users, billing, etc. Starting early will make your life much simpler and your application more scalable down the road. However, the number of people who don't consider that they'll ever need a 2nd or a 3rd database is frightening. Oh, and one other thing: put your analytics someplace else; the fewer things placing load on your production database from the beginning, the better. Copying data off of a secondary is cheap.
  • Scale out – can you offload heavy reads or writes? Most non-relational databases have horizontal scaling functionality built in (e.g. sharding in MongoDB). Don't let all the negative articles fool you, these solutions do work. However, you'll want an expert on your side to help in picking the right variable or key to scale against (e.g. shard keys in MongoDB). Seek advice early, as these decisions will have a huge impact on future scalability.

Pick the right tool:

Non-relational databases are very different by design, and most "suffer" from some of the "flaws" of the eventually consistent model. My grandpa always said "use the right tool for the right job"; in this case that means if you're writing an application or product that requires your data to be highly consistent, you probably shouldn't use a non-relational database. You can make most modern non-relational databases more consistent with configuration and management changes, but almost all lack any form of ACID compliance. Luckily, in the modern world databases are cheap; pick several and get someone to manage them for you, and always use the right tool for the right job. When you need consistency, use an ACID-compliant database; when you need raw speed, use an in-memory data store; and when you need flexibility, use a non-relational database.

Write and choose good code:

Unfortunately, not all database drivers are created equal. While I understand that you should write code in the language you're strongest in, sometimes it pays to be flexible. There's a reason why Facebook writes code in PHP, transpiles it to C++, and then compiles it into a binary for production. In the case of your database, the driver is _the_ interface to your database, so if the interface is unreliable, difficult to deal with, or not updated frequently, you're going to have a lot of bad days. If the interface is stable, well documented, and frequently updated, you'll have a lot of time to work on features instead of fixes. Make sure to take a look at the communities around each driver; look at the number of bugs reported and how quickly those bugs are being fixed. Another thing about connecting to your database: please remember nothing is perfect, so write code as if it's going to fail. At some point some component (the network, a NIC, a load balancer, a switch, or the database itself) is going to crash and cause your application to _not_ be able to talk to your database. I can't tell you how many times I've talked to or heard of a developer assuming that "the database is always up, it's not my responsibility" and that's exactly the wrong assumption. Until the application knows that the data is written, assume that it isn't. Always assume that there's going to be a problem until you get an "I've written the data" confirmation from the database; to assume otherwise is asking for data loss.
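
To make that last point concrete, here is a minimal sketch of defensive write handling using PyMongo (assuming a deployment where a majority write concern makes sense; the connection string and collection names are hypothetical). The write is treated as durable only after the driver returns an acknowledgment, and failures are retried a bounded number of times rather than assumed away:

```python
import time
from pymongo import MongoClient, WriteConcern
from pymongo.errors import PyMongoError

client = MongoClient("mongodb://localhost:27017")  # hypothetical connection string
orders = client["shop"].get_collection(
    "orders", write_concern=WriteConcern(w="majority")
)

def save_order(order: dict, retries: int = 3) -> bool:
    """Treat a write as successful only after the database acknowledges it."""
    for attempt in range(retries):
        try:
            result = orders.insert_one(order)
            if result.acknowledged:
                return True
        except PyMongoError:
            time.sleep(0.2 * (2 ** attempt))  # back off before retrying
    return False  # the caller must handle this; never assume the data landed

if __name__ == "__main__":
    if not save_order({"order_id": 123, "total": 49.99}):
        print("write failed; queue for retry or alert")
```

In practice you would also make the write idempotent (for example, a unique index on order_id) so that a retry after a lost acknowledgment cannot create duplicates.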

These are just a few quick pointers to help guide folks in the right direction. As always the right answer is to get advice from an actual expert about your specific situation.

