Crossing the People Chasm Within Your Organization

June 6, 2018  |  Posted By: Pete Ferguson


In Geoffrey Moore’s book “Crossing the Chasm,” he argues there is a chasm between the early adopters of a product (the technology enthusiasts and visionaries) and the early majority (the pragmatists).  He illustrates well the differences in their self-interests and their very different needs for security versus willingness to take on risk.

People’s talents, attitudes, and skills similarly must cross the rapid growth chasm within your organization if your company is to remain viable and competitive.

As AKF Partners assesses fast-growing companies in technical due diligence engagements, we often observe Moore’s chasm principle at play in an organization’s people: the ability of legacy employees to make the jump to the “next big thing” and keep up with explosive growth.  Conversely, we have also seen a “why change?” attitude greatly hinder the scalability of a fledgling company and blindside its leadership.

The Chasm From Startup to Established Company

Young, well-funded startups have a lot of flair that Millennials and corporate escapees love – free food, eccentric workplaces, schedule flexibility, and very little bureaucracy, policy, or procedure.  This works very well for small, talented teams during a scrappy period of rapid growth, where the common goals of the organization are well known and lived and breathed daily, and where personal and group conversations with the CEO and CTO occur regularly, sometimes daily.  Often there is minimal rigor around Agile rituals – during periods of startup and rapid growth there is likely very little time to formalize processes – and the outcomes (100-200% growth in customer acquisition and profits) can be mistaken for a “full steam ahead” mandate to change nothing.

Recently we worked with a company that was a decade old and fairly large compared to many startups we see in our technical due diligence engagements for investors.  The founders had seen the need to bring in experienced and open-minded senior leadership, and it was inspiring to see the vigor, enthusiasm, growth, and speed of a young fledgling company paired with defined metrics and compliance to set ground rules.

We observed no bureaucracy.  There was clear direction.

Unfortunately, this is more of an outlier than it should be.  I was impressed, wanted to know what set this company apart from other, more mature companies I have worked with or for, and found several differentiators.

My observations of companies who bridge the organizational chasm of growth:

  • Successful companies do not confine themselves to one segment of the market - they are thoughtful and disciplined when taking on new segments.  They follow Moore’s observations well: they saturate one small subset of a new market with marketing, sales, and customer service, and provide steep discounts to get a foothold.  Once established, they expand horizontally within the subset and rinse and repeat until they are the market leader.  This allows them to fail forward fast through constant innovation and iteration, and it requires the people in their organizations to have an Agile mindset and not rest on their laurels.
  • Successful companies have teams with a good diversity of opinions but are unified in how they execute on their plan.  Senior leadership is very successful in constantly communicating the vision of the company through desired outcomes, allowing teams the autonomy to get there however they can.  Because the focus is on the outcome rather than the process, there is very little bureaucratic red tape, trust is very high, and teams are not afraid to fail fast, learn, and iterate more successfully.
  • Successful companies keep things simple; team members are on board with the company philosophy and understand how their role fits into the larger scheme of things.  OKRs - “Objectives and Key Results,” pioneered at Intel and popularized by Google - measure success within an organization.  OKRs allow nested outcomes to be defined, aligning teams with the broader company goal, and successful companies share the common thread of defining success through outcomes from the top to the bottom of the org chart.

While some of the companies highlighted in Moore’s 1999 revision of the book eventually could not cross the chasm with newer products (BlackBerry, 3Com/PalmPilot), the principles he outlined are common to companies that are enduring today (Apple: desktop to MP3 player to mobile phone/tablet to watch to … [insert Apple’s next category DOMINATOR here]).

When looking at products, according to Moore, the marketer should focus on one group of customers at a time, using each group as a base for marketing to the next group, creating a bandwagon effect whose momentum spreads into the next market segment.  The focus on each segment is intense: an “all hands on deck” blitz that includes marketing, software engineering, product, customer service, sales, and others.

Similarly, when it comes to what is going on inside of organizations, it is important to ensure your people cross the chasm of change required for your products to remain viable and enduring.  Successful companies know that either their people have to make the transition to new skill sets and mindsets or they will need to be transitioned out of the company.  Either way, it is important to inject new people with the needed experience into the organization.  At AKF, we refer to this as Seed, Feed, and Weed.

What we see in successful companies is an early focus on standardization, but with freedom for exploration.  Allowing each team to use its own communication tools (Slack, Hive, Spark) and its own Agile methods does not scale well.  But seeking input from team members and having each team follow a standard software development cycle with a similar Agile methodology does scale well, allowing teams to interoperate without administrative and communicative friction.

Conclusions

Successful companies endure because individuals are allowed autonomy to reach shared outcomes.  Tools are provided to help individuals succeed, fail forward fast, learn and share their learning, and automate mundane tasks; the tools themselves are not a bureaucratic bottleneck.  To remain successful, companies must constantly focus on taking their team members with them through the chasms of growth into new and emerging markets by continually upgrading their skills and their contribution to the company’s desired outcomes.

Measuring success is not just about the stock price (many failing companies - e.g. Palm - had a good stock price while internal decay had been underway for several quarters); it must be a thorough measurement of all aspects of the company’s technical abilities - architecture, process, organization, and security.


We’d love your feedback and an opportunity to see how AKF Partners can help your organization maximize outcomes. Contact us now.


The Many Unfortunate Meanings of Cloud

June 5, 2018  |  Posted By: Marty Abbott

Enough is enough already – stop using the term “Cloud”.  Somewhere during the last 20 years, the term “Cloud” started to mean to product offerings what Sriracha and Tabasco mean to food:  everything’s better if we can just find a way to incorporate it.  Just as Sriracha makes barely edible items palatable and further enhances the flavor of delicacies, so evidently does “Cloud” confuse the unsophisticated buyer or investor and enhance the value for more sophisticated buyers and investors. That’s a nice analogy, but it’s also bullshit.

The term cloud just means too many things – some of which are shown below:


[Figure: the various meanings of the word “Cloud” and the confusion it causes]


The real world of cloud offerings can be roughly separated into two groups:

  1. Pretenders: This group of companies knows, at some level, that it hasn’t turned the corner and started truly offering “services”.  These companies support heavy customization, are addicted to maintenance revenue streams, and offer low levels of tenancy.  They simply can’t escape the sins of their past.  Instead, they slap the term “cloud” on their product in the hopes of being seen as relevant.  At worst, it’s an outright lie.  At best, it’s slightly misleading relative to the intended meaning of the term.  Unless, of course, anything that’s accessible through a browser is “Cloud”.  These companies should stop using the term because deep down, when they are alone with a glass of bourbon, they know they aren’t a “cloud company”.
  2. Contenders: This group of companies either blazed a path for the move to service offerings (think rentable instead of purchasable products) or quickly recognized the services revolution; they were “born cloud” or are truly embracing the cloud model.  They prefer configuration over customization and stick to the notion of a small number of releases (countable on one hand) in production across their entire customer base.  They embrace physical and logical multi-tenancy both to increase margins and decrease customer costs.  These are the companies that pay the tax for the term “cloud” – a tax that funds the welfare checks for the “pretenders”.

The graph below plots Cloud Pretenders, Contenders and Not Cloud products along the axes of gross margin and operating margin:

[Figure: various models of cloud and on-premise products plotted against cost of goods sold and operating expense]

Consider one type of “Pretender” – a company hosting a single-tenant, client-customized software release for each of its many customers.  This is an ASP (Application Service Provider) model.  But there is a reason the provider of the service won’t call itself an ASP: the margins of an ASP stink relative to those of a true “SaaS” company, and the term ASP is old and antiquated.  The fix?  Just pour a bit of “cloud sauce” on it and everything will be fine.
Contrast the above case with that of a “Contender”: physical and logical multi-tenancy at every layer of the architecture and a small number of production releases (one to three) across the entire customer base.  Both operating and gross margins increase as maintenance costs and hosting costs decrease when allocated across the entire customer base.
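
For intuition on why the margins differ, here is a back-of-envelope sketch; every number below is invented purely for illustration:

```python
# All numbers invented: per-customer operating cost of a single-tenant
# ASP model vs. a multi-tenant SaaS model.
customers = 200
asp_cost_per_dedicated_stack = 50_000      # one stack per customer
saas_cost_of_shared_stack = 2_000_000      # one multi-tenant stack for all

print(asp_cost_per_dedicated_stack)            # 50000 per customer (ASP)
print(saas_cost_of_shared_stack / customers)   # 10000.0 per customer (SaaS)
```
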
Confused?  So are we.  Here are a few key takeaways:

  1. “Cloud” should mean more than just being accessed through the internet via a browser.  Unfortunately, it no longer does as anyone who can figure out how to replace their clients with a browser and host their product will call themselves a “Cloud” provider.
  2. Contenders should stop using the term “Cloud” because it invites comparison with companies to which they are clearly superior:  Superior in terms of margins, market needs, architecture and strategic advantage.
  3. Pretenders should stop using the term “Cloud” for both ethical reasons and reasons related to survivability.  Ethically the term is somewhere between an outright lie and an ethically contentious quibble or half-truth.  Survivability comes into play when the company believes its own lie and stops seeing a reason to change to become more competitive.

AKF Partners helps companies create “Cloud” (XaaS) transition plans to transform their business.  We help with financial models, product approach, market fit, product and technology architecture, and business strategy, and we help companies organize properly to maximize their opportunity in XaaS.


The No Surprises Rule

May 23, 2018  |  Posted By: Geoffrey Weber

No Surprises

We blogged recently about how to write precisely and concisely, highlighting how important it was to learn the “Three Sentence Rule” early in our careers so that when we communicated with other executives, we communicated with extreme brevity and clarity.  We might think of this as the “what” of executive communication.  Today, we’d like to quickly describe a few ground rules with respect to the “when” and “how” of communicating as executives.

Ten or fifteen years ago, a fad swept through technology: executives everywhere were writing “How to Communicate with Me” articles for their teams and co-workers.  In the most positive light, these were serious attempts by quirky executives to help their teams learn to conform to their own bizarre communications requirements.  We would argue that a modern technology executive with a reasonably non-quirky personality need not pen such narcissistic claptrap.  Communication is so basic that we should not over-think the process.

In today’s world, we have a variety of communications channels available: face-to-face, email, text message, internal communications tools (e.g. Slack) and the good old telephone.  When an unexpected issue occurs on our watch, our primary duty is to inform our superior, by any means necessary as quickly as possible.  

Whether we work in a large corporate environment with thousands of employees or in a small team with 10 people, immediate communication is an absolute requirement.  If we fail to communicate immediately, our superiors may hear the unexpected news before we have a chance to tell them.  Think of a major system outage: while we work to determine a root cause, the VP of Marketing sends a quick text to our boss (let’s say the CEO in this case).  Now the CEO is in possession of bad news about something we are responsible for.  Our phone will ring immediately, and we’ll be on our back foot explaining why we hadn’t taken a moment to call.

A worse example might be a system outage that we, as CTO, were not aware of, and the very same VP of Marketing texts the CEO again.  Now when the phone rings, we are surprised, just as the CEO was surprised by the VP of Marketing.  Our team has failed at a very fundamental level.

There’s an informal rule that states: No Surprises.  The corollary: communicate as early as possible and as often as possible.  A site outage demands an immediate upward missive with frequent updates.  The leaders who work under us must also live by this rule.  We can never be left out in the cold when it comes to significant information.  Furthermore, we are solely accountable for communicating negative news up to our bosses.

The idea of communicating early and communicating often has a number of uses beyond crisis communications.  In the early days of eBay, Marty Abbott (managing partner of AKF Partners) set 4 objectives for the site operations teams: Availability (99.9%), Scalability, Cost, and Operational Excellence.  Every member of the operations teams knew the current availability, as it was communicated nearly continuously.  The other 3 objectives were communicated with equal frequency.  It would have been a significant surprise to find a colleague working on a project that was not associated with Availability, Scalability, Cost, or Operational Excellence.  A few years later, we borrowed Marty’s objectives at Shutterfly and simplified them: Up, Fast, Cheap, and Easy.  All 50 operations team members knew those goals, and we repeated them like a mantra.

The quickest path to failure as technology executives is non-communication, the opposite of communicating clearly and frequently.  Worse, executives who don’t stay ahead of the surprises technology throws at us every day will find themselves working in a different industry.

To summarize how to communicate:

When: early and often

How: any means available

What: 3 sentences.

We don’t need to write 5-page essays on how to communicate unless we are quite peculiar.

 


4 Landmines When Using Serverless Architecture

May 20, 2018  |  Posted By: Dave Berardi

Physical bare metal, virtualization, cloud compute, containers, and now Serverless in your SaaS? We are starting to hear more and more about Serverless computing, sometimes called Function as a Service (FaaS). In this next iteration of Infrastructure-as-a-Service, users can execute a task or function without having to provision a server, virtual machine, or any other underlying resource. The word Serverless is a misnomer: the underlying resources are abstracted away from the user, but they still exist under the covers. It’s just that Amazon, Microsoft, and Google manage them for you with their code. AWS Lambda, Azure Functions, and Google Cloud Functions are becoming more common in the architecture of a SaaS product. As technology leaders responsible for architectural decisions for scale and availability, we must understand the pros and cons of serverless and take the right actions when applying it.
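
To make the model concrete, here is a minimal sketch of a handler written against AWS Lambda’s Python conventions; the function name and event shape are illustrative assumptions, not a prescription:

```python
import json

# A minimal AWS Lambda-style handler (Python runtime). The platform
# invokes this function once per event; no server, VM, or container is
# provisioned or managed by the engineer.
def handler(event, context):
    name = event.get("name", "world")  # illustrative event field
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}"}),
    }
```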

Several advantages of serverless computing include:

• Software engineers can deploy and run code without having to manage any underlying infrastructure, effectively creating a No-Ops environment.
• Auto-scaling is easier and requires less orchestration than a containerized environment running services.
• True on-demand capacity – no orphaned containers or other resources sitting idle.
• Functions are cost effective IF we are running right-sized workloads.

Disadvantages and potential landmines to watch out for:

• Landmine #1 - No control over the execution environment, meaning you are unable to isolate your operational environment. Compute and networking resources are virtualized with no visibility into either of them. Availability is in the hands of your cloud provider, and uptime is not guaranteed.
• Landmine #2 - SLAs cannot guarantee uptime, and start-up (cold start) time can take a second or more, causing latency that might not be acceptable.
• Landmine #3 - It becomes much easier for engineers to create code, host it rapidly, and forget about it, leading to unnecessary compute and additional attack vectors that create security risk.
• Landmine #4 - You will create vendor lock-in with your cloud provider as you set up your event-driven functions to trigger from other AWS or Azure services or from your own services running on compute instances (a sketch of one mitigation follows this list).
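
One common way to soften Landmine #4 is to keep business logic in plain functions and confine provider-specific glue to thin adapters. A minimal sketch, with all names invented for illustration:

```python
import json

# Provider-agnostic business logic: no AWS/Azure/GCP types leak in here.
def process_order(order: dict) -> dict:
    # Real validation and persistence would go here.
    return {"order_id": order.get("id"), "status": "accepted"}

# Thin, provider-specific adapter. Swapping clouds means rewriting only
# this glue, not the business logic above. The event shape mimics an
# API Gateway-style payload and is an assumption.
def aws_lambda_handler(event, context):
    order = json.loads(event["body"])
    return {"statusCode": 200, "body": json.dumps(process_order(order))}
```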

AKF is often asked about our position on serverless computing.  Considering the advantages and the landmines outlined above, we recommend four key rules:

1) Gradually introduce it into your architecture and use it for the right use cases.
2) Establish architectural principles that guide its use in your organization and minimize availability impact; you will tie your availability to your cloud provider’s FaaS.
3) Watch out for a false sense of security among your engineering teams. Understand how serverless works before you use it so that you can monitor it for performance and availability.
4) Manage how and what it’s used for - monitor it (e.g. AWS CloudWatch) to avoid neglect, misuse, and cost inefficiencies.

AWS, Azure, or Google Cloud Serverless platforms can provide an effective computing abstraction in your architecture if they are used for the right use cases, good monitoring is in place, and architectural principles are established.

AKF Partners has helped many companies create highly available and scalable systems that are designed to be monitored. Contact us for a free consultation.


Do you know what is negatively affecting your engineers' productivity? Shouldn't you?

May 13, 2018  |  Posted By: Dave Swenson

[Image: The Impact of Meetings on Engineers]

Meetings, meetings, meetings. How many times have we said that? Visiting dozens and dozens of clients per year, we see a number of customers whose culture is extremely meeting-centric, as if the only way any decision can be made or information communicated is via a meeting.

Paul Graham, co-founder of Y Combinator and Hacker News, wrote back in 2009 about the impact of meetings upon engineers. Coding is typically best performed in solid, multi-hour chunks of time with no interruptions. It takes a while to get into the ‘zone’, and any context switch will disrupt that zone - in Graham’s words, “like throwing an exception”. He even suggests that the impact of a meeting goes far beyond the actual time spent in the meeting: simply knowing you are going to be disrupted prevents you from reaching that zone - much as when you know you have to get up early for a flight, you toss and turn all night, unable to get into that deep REM state.

Many companies recognize the disruptive impact of meetings and put rules in place establishing ‘no meeting’ afternoons, or perhaps a full no-meeting day. Pinterest’s recent blog post recounts their somewhat extreme move along these lines - a three-day no-meeting block in which engineers were not to be invited to meetings 3 days a week. The blog post is worth a read, covering some of the challenges and objections of eliminating engineer-attended meetings 3 days a week, but overall it touts the success of the approach, citing a 92% positive response rate to a survey question asking “Are you more productive…?”.

Really, Pinterest? Really??

I’m all for the reduction of meetings, though I do wonder if three days a week with no meetings is a bit overboard. What I’m disappointed by is that Pinterest has no (or at least did not cite any) quantifiable evidence that their engineers were actually more productive. Now, I’m not suggesting they should have a before and after count of, say, lines of code. But, assuming that Pinterest is at least something of an Agile shop, did they not see an increase in velocity, in story points being delivered?

In our visits to our clients, just as we see a wide variation in the dependency upon meetings to get anything done, we see some clients living and breathing by their team-by-team velocity numbers, while other clients totally disregard that key productivity metric. To you technology leaders out there, how better can you measure your teams’ efficiencies?

And, even more so, do you know why your teams’ current velocity is what it is? Are you actively seeking out the context switches, delays, and disruptions that are throwing exceptions in your engineers’ brains?

We’ve been pulled in many times to analyze a team’s efficiency (or lack thereof), only to find out that, yes, meetings are a negative influence, but beyond that:

  • Interviews (worthwhile, but hiring should be a highly optimized process)
  • Environmental issues (are you measuring your dev environments’ availability?)
  • Waiting for pull request approvals (do you have an SLA around this?)
  • Long build times due to weak hardware or poor dependency management (compare the cost of faster build machines or build optimization vs. the value of your engineers’ wait or down-time; a back-of-envelope sketch follows this list)
  • Waiting to receive clarification from a product owner on a feature (again, do you have an SLA around this? Is your team colocated, so a question can be asked and answered quickly?)
  • Other surprising items, ranging from having to feed a parking meter to miserable network latency for remote engineers.
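
To illustrate the build-time trade-off above, a back-of-envelope calculation; every number is invented and should be replaced with your own:

```python
# All numbers invented for illustration.
engineers = 20
builds_per_engineer_per_day = 10
minutes_saved_per_build = 5        # e.g., from faster build hardware
working_days_per_year = 250
loaded_cost_per_hour = 100         # USD, assumed fully-loaded rate

hours_saved = (engineers * builds_per_engineer_per_day
               * minutes_saved_per_build / 60 * working_days_per_year)
value = hours_saved * loaded_cost_per_hour
print(f"{hours_saved:,.0f} engineer-hours/year, worth ${value:,.0f}")
# 4,167 engineer-hours/year, worth $416,667
```

Against numbers like those, faster build machines often pay for themselves in months.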

Yet again, the mantra of “If you can’t measure it, you can’t improve it” applies. We view metrics such as actual hours spent coding vs. expected hours spent coding as not only a measurement of your teams’ productivity, but as a management effectiveness gauge. Are you as a manager effectively protecting your engineers?

Are you able to see the impact of ‘no-meeting’ days, or the factors today that negatively affect your developers’ coding efficiencies?

If not, AKF can more than help. We have run productivity surveys at many clients, and always enjoy the look on technology leaders’ faces when we present the results. Let us help you.


Three Reasons Your Software Engineers May Not Be Successful

May 10, 2018  |  Posted By: Pete Ferguson


At AKF Partners, we have the unique opportunity to see trends among startups and well-established companies in the dozens of technical due diligence engagements and more in-depth technology assessments we regularly perform, in addition to filling interim leadership roles within organizations.  Because we talk with a variety of people, from the CEO and investors to business leadership and technical talent, we get a unique top-to-bottom perspective of an organization.

Three common observations

  • People mostly identify with their job title, not the service they perform.
  • Software Engineers can be siloed in their own code vs. contributing to the greater outcome.
  • The CEO’s vision often differs from the frontline perception of things as they really are.

Job Titles Vs. Services

The programmer who identifies herself as “a search engineer” is likely not going to be as engaged as her counterpart who describes herself as someone who “helps improve our search platform for our customers.”

Shifting focus from a job title to a desired outcome is a best practice of top organizations.  We like to describe this as separating nouns and verbs: “I am a software engineer” focuses on the noun, software engineer, without an action, whereas “I simplify search” focuses on the verb of the desired outcome: simplify.  It may seem minor or trivial, but this shift can have a real impact on how team members understand their contribution to your overall organization.

Removing this barrier to the customer puts team members on the front line of accountability to customer needs – and, hopefully, also to the vision and purpose of the company at large.  In our experience with successful companies, instilling a customer-experience, outcome-based approach often requires reworking product teams.  Creating a diverse product team (containing members of the Architecture, Product, QA, and Service teams, for example) that owns the outcomes of what it produces promotes:

  • Motivation
  • Quality
  • Creating products customers love

If you have had experience in a Ford vehicle with the first version of Sync (bluetooth connectivity and onscreen menus), then you are well aware of the frustration of scrolling through three layers of menus to select “bluetooth audio” ([Menu] -> [OK] -> [OK] -> [Down Arrow] -> [OK] -> [Down Arrow] -> [OK]) each time you get into your car.  The novelty of wireless streaming was a key differentiator when Sync was first introduced – but it is now table stakes in the auto industry – and it quickly wears off when you have to navigate a confusing UI, likely designed by product engineers each focused on a specific task but devoid of a great user experience.  What was missing was someone with the vision and job description: “I design wireless streaming to be seamless and awesome - like a button that says ‘Bluetooth Audio!!!’”

Hire for – and encourage – people who believe and practice “my real job is to make things simple for our customers.”

Avoiding Siloed Approach

Creating great products requires engineers to look outside of their current project-specific tasks and focus on creating great customer experiences.  Moving from reactively responding to customer-reported problems to proactively identifying issues with service delivery in real time goes well beyond just writing software.  It moves to creating solutions.

Long gone are the “fire and forget” days of writing software, burning it to a CD, and pushing off tech debt until the next version.  To Millennials, this Waterfall approach is foreign, but unfortunately we still see the mentality ingrained in many company cultures.

Today it is all about services.  A release is one of many in a very long evolution of continual improvement and progression.  There isn’t a Facebook V1 to be followed by V2; there is a continual rollout of upgrades and bug fixes done in the background with minimal to no downtime.  Engineers can’t afford to lag in their approach to continual evolution, addressing tech debt, and contributing to internal libraries for the greater good.

Ensure your technical team understands and is closely connected to the evolving customer experience and has skin in the game.  Among your customers, there is likely very little patience for “wait until our next release.”  They expect immediate resolution or they will start shopping the competition.

Translating the Vision of the CEO to the Front Lines

During our more in-depth technology review engagements we interview many people from different layers of management and different functions within the organization.  This gives us a unique opportunity to see how the vision of the CEO migrates down through layers of management to the front-line programmers who are responsible for translating the vision into reality.

Usually - although not always - the larger the company, the larger the divide between what is being promised to investors/Wall Street and what is understood as the company vision by those actually doing the work.  Best practices at larger companies include regular all-hands meetings where the CEO and other leaders share their vision and are held accountable to deliverables, plus leadership checks that the vision is conveyed in product roadmaps and daily stand-up meetings.  When incentive plans focus directly on how well a team and individual understand and produce products that accomplish the company vision, communication gaps close considerably.

Creating and sustaining successful teams requires a diverse mix of individuals with a service mindset.  This is why we stress that Product Teams need to be all inclusive of multiple functions.  Architecture, Product, Service, QA, Customer Service, Sales and others need to be included in stand up meetings and take ownership in the outcome of the product. 

The Dev Team shouldn’t be the garbage disposal for what Sales has promised in the most recent contract or what other teams have ideated without giving much thought to how it will actually be implemented. 

When your team understands the vision of the company - and how customers are interacting with the services of your company - they are in a much better position to translate it into reality.

As a CTO or CIO, it is your responsibility to ensure what is promised to Wall Street, private investors, and customers is translated correctly into the services you ultimately create, improve, and publish.

Conclusions

As we look at new start-ups facing explosive 100-200% year-over-year growth, our question is always “how will the current laser-focused vision and culture scale?”  Standardization, good Agile practices, understanding technical debt, and creating a scalable onboarding and mentoring process are all part of the best answers to this question.

When your development teams are each appropriately sized and include good representation of functional groups, and each team member identifies with verbs vs. nouns (“I improve search” vs. “I’m a software engineer”) and understands how their efforts tie into company success, your opportunities for success, scalability, and adaptability are maximized.


Experiencing growing or scaling pains?  AKF is here to help!  We are industry experts in technology scalability and due diligence, and we help fill leadership gaps with interim CIO/CTO and other positions in addition to helping you search for technical leaders.  Put our 200+ years of combined experience to work for you today!


Enabling TTM With Contributor Model Teams

May 6, 2018  |  Posted By: Dave Berardi


We often speak about the benefits of aligning agile teams with the system’s architecture.  As Conway’s Law describes, product/solution architectures and organizations cannot be developed in isolation.  (See https://akfpartners.com/growth-blog/conways-law) Agile autonomous teams are able to act more efficiently, with faster time to market (TTM).  Ideally, each team should be able to behave like a startup with the skills and tools needed to iterate until they reach the desired outcome.

Many of our clients are under pressure to achieve effective TTM while reducing the risk of redundant services that produce the same results.  During due diligence we sometimes discover redundant services that individual teams developed within their own silos for a TTM benefit.  Rather than competing over priorities and waiting for a shared service team to deliver code, a team will build its own flavor of a common service to get to market faster.

Instead, we recommend that a shared service team own common services.  In this type of team alignment, the team owns a shared service or feature on which other autonomous teams depend.  For example, many teams within a product may require email delivery as a feature.  Asking each team to develop and operate its own email capability would be wasteful, resulting in engineers designing redundant functionality and leading to cost inefficiencies and unneeded complexity.  Rather than wasting time on duplicative services, we recommend that organizations create a team focused on email whose service is used by the other teams.

Teams make requests, in the form of stories for product enhancements, that are deposited in the shared services team’s backlog (email in this case).  To mitigate the risk of having each of these requesting teams wait for requests to be fulfilled by the shared services team, we suggest treating the shared service as an open source project, or as some call it, the contributor model.

Open sourcing our solution (at least internally) doesn’t mean opening up the email code base to all engineers and letting them have at it.  It does mean establishing mechanisms to help control quality and design for the business.  An open source project often has its own repo and typically only allows trusted engineers, called Committers, to commit; Committers follow Contribution Standards defined by the project-owning team.  In our email example, the team should designate trusted and experienced engineers from other Agile teams who can code and commit to the email repo.  Engineers on the email team can focus on making sure new functionality aligns with the architectural and design principles that have been established.  Code reviews are conducted before code is accepted.  Allowing outside contribution helps mitigate the potential bottleneck such a team could create.

Now that the development of email has been spread out across contributors on different teams, who really owns it?

Remember, ownership by many is ownership by none.  In our example, the email team ultimately owns the service and code base.  As other developers commit new code to the repo, the email team should conduct code, design, and architectural reviews and ultimately own deployments and operations.  They should also confirm that contributions align with the strategic direction of the email mission.  Whatever mechanisms are put in place, teams that adopt a contributor model should be a gas pedal, not a brake, for TTM.

If your organization needs help with building an Agile organization that can innovate and achieve competitive TTM, we would love to partner with you. Contact us for a free consultation.

Permalink

The Problem with Non-Functional Requirements

May 4, 2018  |  Posted By: Marty Abbott

[Image: Non-Functional Requirements mapped to the AKF Scale Cube]

Many of our clients use the term “Non-Functional Requirements” to group into a basket those portions of their solution that don’t easily fit into method or function-based execution of market needs.  Examples of non-functional requirements often include things like “Availability”, “Scalability”, “Response Time”, “Data Sovereignty” (as codified within requirements such as the GDPR), etc.  Very often, these “NFRs” are relegated to second class citizens within the development lifecycle and, if lucky, end up as a technical debt item to be worked on later.  More often than not, they are just forgotten until major disaster strikes.  This happens so often that we at AKF joke that “NFR” really stands for “No F-ing Resources (available to do the job)”.

While I believe that this relegation to second-class citizen is a violation of fiduciary responsibility, I completely understand how we’ve collectively gotten away with it for so long.  For most of the history of our products, we’ve produced solutions that customers were responsible for running.  We built the code, shipped it, and customers installed it and ran it on their systems.

Fortunately (for most of us) the world has changed to the SaaS model.  As subscribers, we no longer bear the risk of running our own systems.  Implementation is easier and faster, costs of running solutions lower. 

Unfortunately (for most of us) these changes also mean we are now wholly accountable for NFRs.  We now mostly produce services, and we are wholly accountable for their complete outcomes, including how our solutions run.

Most NFRs Are Table Stakes and Must-Haves

In the world of delivering services, most NFR capabilities are must-haves.  SaaS companies provide a utility, and the customer expectation is that the utility will be available whenever the customer needs it.  You simply do not have the option of punting attributes like availability or regulatory compliance to a later date.

The Absolute Value of Each NFR Your Product/Service Needs Varies

While we believe most NFRs are necessary, and non-negotiable for playing in the SaaS space, the amount that you need of each of them varies with the portion of the market you are addressing within the Technology Adoption Lifecycle.

As you progress from left to right into a market, the NFR expectations of the adopters of your solution increase.  Innovators care more about the differentiating capability that you offer than they do the availability of your solution.  That said, they still need to be able to use your product and will stop using it or churn if it doesn’t meet their availability, response time, data sovereignty, and privacy needs.  NFRs are still necessary – they just matter less to innovators than later adopters.

At the other end of the extreme, Late Majority and Laggard adopters care greatly about NFRs.  Whereas Innovators may be willing to grudgingly live with 99.8% availability, the Late Majority will settle for nothing less than 99.95% or better.
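
For intuition on how different those expectations are, the implied annual downtime budgets can be computed directly (a quick sketch using the availability targets named above):

```python
# Annual downtime implied by an availability target.
HOURS_PER_YEAR = 365 * 24  # 8,760

for availability in (0.998, 0.9995):
    downtime = HOURS_PER_YEAR * (1 - availability)
    print(f"{availability:.2%} available -> {downtime:.1f} hours down per year")

# 99.80% available -> 17.5 hours down per year
# 99.95% available -> 4.4 hours down per year
```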

It’s Time to Eliminate the Phrase Non-Functional Requirement

We believe that the very name “Non-Functional Requirement” implies that necessary capabilities like availability and data sovereignty can somehow take a back seat to other activities within the solutions we create.  At the very least, the term fails to denote the necessity (some of it legal) of these attributes.
We prefer names like “Table Stakes” or “Must-Have Requirements” to NFRs.
 
It’s Also Time to Eliminate the Primary Cause

While we sometimes find that teams simply haven’t changed their mindset to properly understand that Table Stakes aren’t optional investments, we more often find a more insidious cause:  Moral Hazards.  Moral hazards exist when one person makes a decision for which another must bear the cost:  Person A decides to smoke, but Person B bears the risk of cancer instead of Person A.

Commonly, we see product managers with ownership over product decisions but no accountability for Table Stakes like availability, response time, cost effectiveness, security, etc.  The problem with this is that, as we’ve described, the Table Stakes form the base of the Maslow’s hierarchy of needs for online products.  Engineering teams and product teams should jointly own all the attributes of the products they co-create.  Doing so will help fix the flawed notion that Table Stakes can be deferred.

AKF Partners helps clients build highly available, scalable, fast response time solutions that meet the needs of the portion of the Technology Adoption Lifecycle they are addressing.


The Difference between Science, Engineering and Programmers and What it Means to You

May 3, 2018  |  Posted By: Marty Abbott

Scientists, Engineers, and Technicians

What is, or perhaps should be, the difference between a Computer Scientist, Data Scientist, Software Engineer, and Programmer?  What should your expectations be of each?  How should they work together?

To answer these questions, we’ll look at the differences between scientists, engineers, and technicians in more mature disciplines, apply them to our domain, and offer suggestions as to expectations.

Science and Scientists

The primary purpose of science is “to know”.  Knowing, or the creation of knowledge, is enabled through discovery and the practice of the scientific method.  Scientists seek to know “why” something is and “how” that something works.  Once this understanding of “why and how” is generally accepted, it is ideally codified within theories that are continually tested for validity over time.

To be successful, scientists must practice both induction and deduction.  Induction seeks to find relationships for the purposes of forming hypotheses.  Deduction seeks to test those hypotheses for validity.

In the physical world (or in physical, non-biological, and non-behavioral sciences), scientists are most often physicists and chemists.  They create the knowledge, relationships, and theories upon which aerospace, chemical, civil, electrical, mechanical, and nuclear engineers rely.

For our domain, scientists are mathematicians, computer scientists – and more recently – data scientists.  Each seeks to find relationships and approaches useful for engineers and technicians to apply for business purposes.  The scientist “finds”, the engineer “applies”.  Perhaps the most interesting new field here is that of data science.  Whereas most science is focused on broad discovery, data science is useful within a business as it is meant to find relationships between various independent variables and desirable outcomes or dependent variables.  These may be for the purposes of general business insights, or something specific like understanding what items (or SKUs) we should display to different individuals to influence purchase decisions.

True scientists tend to have doctorates, the doctoral degree being the “certification” that one knows how to properly perform research.  There are of course many examples of scientists without doctoral degrees – but these days that is rare in any case other than data scientists (here the stakes are typically lower).

Engineering and Engineers

The primary purpose of engineering is to “create” or to “do”.  Engineers start with an understanding (created by scientists) of “why” things work and “how” they work (scientific theories – often incorrectly called “laws” by engineers) and apply them for the purposes of creating complex solutions or products.  Mechanical engineers rely on classical physics; electrical engineers rely on modern physics.  Understanding both the “why” and the “how” is important to being able to create highly reliable solutions, especially in new or unique situations.  For instance, it is important for electrical engineers to understand field generation for micro-circuitry and how those fields will affect the operation of the device in question.  Civil and mechanical engineers need to understand the notion of harmonic resonance in order to avoid disasters like the Tacoma Narrows bridge failure.

The domain of “software engineering” is much more confusing.  Unlike traditional engineering domains, software engineering is ill-defined and suffers from an overuse of the term in practice.  If such a domain truly existed, it should follow the models of other engineering disciplines.  As such, we should expect that software engineers have a deep understanding of the “whys and hows” derived from the appropriate sciences: computer science and mathematics.  For instance, computer scientists would identify and derive interesting algorithms and suggest applications, whereas engineers would determine how and when to apply the algorithms for maximum effect.  Computer scientists identify unique scenarios (e.g. the dining philosophers problem) that broadly define a class of problems (mutual exclusion in concurrency) and suggest approaches to fix them.  Engineers put those approaches into practice with the constraints of the system they are developing in mind.
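
As a small illustration of that division of labor, here is how an engineer might put the scientist’s mutual-exclusion result into practice; acquiring locks in one fixed global order is a classic remedy for dining-philosophers-style deadlock (the names below are invented):

```python
import threading

# Two shared resources ("forks") protected by locks.
fork_a = threading.Lock()
fork_b = threading.Lock()

def eat():
    # Acquire locks in a fixed global order (a, then b) so two
    # concurrent eaters can never deadlock holding one fork each.
    with fork_a:
        with fork_b:
            pass  # critical section: use both resources

threads = [threading.Thread(target=eat) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```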

Following this logic, we should also expect our engineers to understand how the systems upon which they ply their trade (computers) truly function – the relationship between processors, memory, storage, etc.  Further, they should understand how an operating system works, how interpreters and compilers work, and how networks work.  They should be able to apply all these things to come up with simple designs that limit the blast radius of failure, ensure low latency response, and are cost effective to produce and maintain.  They should understand the depth of components necessary to deliver a service to a customer – not just the solitary component upon which they work (the code itself).

Very often, to be a successful engineer, one must have at least a bachelor’s degree in an engineering domain.  The degree at least indicates that one has proven to some certifying body that they understand the math, the theories, and the application of those theories to the domain in question.  Interestingly, as a further example of the difference between engineering and science, some countries issue engineering degrees under the title “Bachelor of Applied Science”.

There are of course many famous examples of engineers without degrees – for instance, the Wright Brothers.  The true test isn’t the degree itself, but whether someone understands the depth and breadth of math and science necessary to apply science in highly complex situations.  Degrees are simply one way of ensuring an individual has at least once proven an understanding of the domain.

Practical Applications and Technicians

Not everything we produce requires an engineer’s deep understanding of both why and how something works.  Sometimes, the application of a high-level understanding of “how” is sufficient.  As such, in many domains technicians augment engineers.  These technicians are responsible for creating solutions out of reusable building blocks and a basic understanding of how things work.  Electricians for instance are technicians within the domain of electrical engineering.  Plumbers are a very specific application of civil engineering.  HVAC technicians apply fluid mechanics from mechanical engineering.

These trades take a very technical skill, implementation-specific tradecraft, and a set of heuristics to design, implement, and troubleshoot systems that are created time and time again.  Electricians, for instance, design the power infrastructure of homes and offices – potentially reviewed by an electrical engineer.  They then implement the design (wire the building) and are also responsible for troubleshooting any flaws.  The same is true for HVAC technicians and plumbers.

Programmers are the technicians for software engineers (engineering domain) and computer and data scientists (science domain).  Not everything we develop needs a “true engineer”.  Very often, as is the case with wiring a house, the solution is straightforward and can be accomplished with the toolset one gains from several weeks of training.  The key difference between an engineer and a programmer is again the depth and breadth of knowledge.

Technicians are trained either through apprenticeship or through trade schools; electricians can come from either path.  Broadly speaking, it makes sense to use technicians instead of engineers for at least two reasons:

  1. Some things don’t require an engineer, and the cost of an engineer makes no sense for these applications.
  2. The supply of engineers is very low relative to demand within the US across all domains.  During the great recession, engineers were one of the only disciplines at or near economic full employment – a clear indication of the supply/demand imbalance.

Implications and Takeaways

Several best practices follow the preceding definitions and roles:

  1. Don’t mix Science and Engineering teams or goals:  Science, and the approach to answering questions using the scientific method, are different animals and require different skills and different expectations from engineering.  The process of scientific discovery is difficult to apply time constraints; sometimes answers exist and are easy to find, sometimes they are hard to find, and sometimes they simply do not exist.  If you want to have effective analytics efforts, choose people trained to approach your Big Data needs appropriately, and put them in an organization dedicated to “science” activities (discovery).  They may need to be paired with programmers or engineers to accomplish their work – but they should not be confused with a typical product team.  “Ah-Ha!” moments are something you must allocate time against – but not something for which you can expect an answer in a defined time interval.
  2. Find the right ratio of engineers to programmers:  Most companies don’t need all their technical folks to be engineers.  Frankly, we don’t “mint” enough engineers anyway – roughly 80K to 100K per year since 1945 with roughly 18K of those being Computer Science graduates who go on to become “engineers” in practice.  Augment your teams with folks who have attended trade schools or specialized boot camps to learn how to program.
  3. Ensure you hire capable engineers:  You should both pay more for and expect more from the folks on your team who are doing engineering related tasks.  Do not allow them to approach solutions with just a “software” focus; expect them to understand how everything works together to ensure that you build the most reliable services possible.

 


Fault Isolation in Services Architectures

May 2, 2018  |  Posted By: AKF

Our post on the AKF Scale Cube made reference to a concept that we call “Fault Isolation” and sometimes “Swim lanes” or “Swim-laned Architectures”.  We also call swim lanes “fault isolation zones,” and the resulting design a “fault isolated architecture.”


Fault Isolation Defined
A “Swim lane” or fault isolation zone is a failure domain.  A failure domain is a group of services within a boundary such that any failure within that boundary is contained within the boundary and the failure does not propagate or affect services outside of said boundary.  Think of this as the “blast radius” of failure meant to answer the question of “What gets impacted should any service fail?” The benefit of fault isolation is twofold:

1) Fault Detection: Given a granular enough approach, the component of availability associated with the time to identify the failure is significantly reduced.  This is because all effort to find the root cause or failed component is isolated to the section of the product or platform associated with the failure domain.  Once something breaks, because the failure is limited in scope, it can be more rapidly identified and fixed.  Recovery time objectives (RTO) are subsequently decreased, which increases overall availability.

2) Fault Isolation: As stated previously, the failure does not propagate or cause a deterioration of other services within the platform.  The “blast radius” of a failure is contained.  As such, and depending upon approach, only a portion of users or a portion of functionality of the product is affected.  This is akin to circuit breakers in your house: the breaker exists to limit the fault zone for any load that exceeds the limit imposed by the breaker.  Failure propagation is contained by the breaker popping, and other devices are not affected.
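
The same breaker idea appears in software.  Below is a hedged sketch of a minimal circuit breaker; the thresholds and names are invented, and production implementations add half-open probing, metrics, and thread safety:

```python
import time

class CircuitBreaker:
    """After max_failures consecutive errors, 'pop' and fail fast for
    reset_after_s seconds instead of sending more load downstream."""

    def __init__(self, max_failures=3, reset_after_s=30.0):
        self.max_failures = max_failures
        self.reset_after_s = reset_after_s
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after_s:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # cool-down elapsed: allow a trial call
            self.failures = 0
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()  # pop the breaker
            raise
        self.failures = 0
        return result
```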

Architecting Fault Isolation
A fault isolated architecture is one in which each failure domain is completely isolated.  We use the term “swim lanes” to depict the separations. In order to achieve this, ideally there are no synchronous calls between swim lanes or failure domains made pursuant to a user request.  User initiated synchronous calls between failure domains are absolutely forbidden in this type of architecture as any user-initiated synchronous call between fault isolation zones, even with appropriate timeout and detection mechanisms, is very likely to cause a cascading series of failures across other domains.  Strictly speaking, you do not have a failure domain if that domain is connected via a synchronous call to any other service in another domain, to any service outside of the domain, or if the domain receives synchronous calls from other domains or services.  Again, “synchronous” is meant to identify a synchronous call (call, wait for a response) pursuant to any user request.

It is acceptable, but not advisable, to have asynchronous calls between domains and to have non-user-initiated synchronous calls between domains (as in the case of a batch job collecting data for the purposes of reporting in another failure domain).  If such communication is necessary, it is very important to include failure detection and timeouts, even with the asynchronous calls, to ensure that retries do not cause port overloads on any services.  Here is an interesting blog post about runaway scripts and their impact on Apache, PHP, and MySQL.
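
Below is a hedged sketch of such a guarded cross-domain call, using a timeout and a bounded retry budget; the URL, limits, and use of the requests library are all assumptions:

```python
import requests

def call_other_lane(url: str, payload: dict, retries: int = 2,
                    timeout_s: float = 0.5):
    """POST to another failure domain: never wait forever, never retry
    without bound, and degrade gracefully when the other lane is down."""
    for attempt in range(retries + 1):
        try:
            resp = requests.post(url, json=payload, timeout=timeout_s)
            resp.raise_for_status()
            return resp.json()
        except requests.RequestException:
            if attempt == retries:
                return None  # degrade instead of cascading the failure
```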

As previously indicated, a swim lane should have all of its services located within the failure domain.  For instance, if database reads/writes are necessary, the database with all appropriate information for that swim lane should exist within the same failure domain as all of the application and web servers necessary to perform the function or functions of the swim lane.  Furthermore, that database should not be used to serve requests from other swim lanes.  Our rule is one production database on one host.

The figure below demonstrates the components of software and infrastructure that are typically fault isolated:
[Figure: fault isolation in micro-services architectures]

Rarely are shared higher-level network components isolated (e.g. border systems and core routers).
Sometimes, if practical, firewalls and load balancers are isolated.  This is especially the case in very high demand situations where a single pair of devices simply wouldn’t meet the demand.

The remaining components are always isolated, with web servers, top-of-rack switches (in non-IaaS implementations), compute (app servers), and storage all being properly isolated.

Applying Fault Isolation with AKF’s Scale Cube
As we have indicated with our Scale Cube in the past, there are many ways in which to think about swim laned architectures.  Swim lanes can be isolated along the axes of the Scale Cube as shown below with AKF’s circuit breaker analogy to fault isolation. 

AKF Fault Isolation in the X-axis
Fault isolation in the X-axis would mean replicating everything for high availability, performing the replication asynchronously and in an eventually consistent (rather than consistent) fashion.  For example, when a data center fails, the fault is isolated to the one failed data center or availability zone.  This is common with traditional disaster recovery approaches, though we do not often advise it, as there are better and more cost-effective solutions for recovering from disaster.

AKF Fault Isolation in the Y-axis
Fault isolation in the Y-axis can be thought of in terms of a separation of services, e.g. “login” and “shopping cart” (two separate swim lanes), each having the web and app servers as well as all data stores located within the swim lane and answering only to systems within that swim lane.  Each portion of a page is delivered from a separate service, reducing the blast radius of a potential fault to its swim lane.

[Figure: a purposely illegible (fuzzy) example showing different components of a fictional business account page from a fictional bank.  Components of the page are separated, with one component showing a summary, another displaying more detailed information, and still others showing dynamic or static links - each derived from properly isolated services.]

AKF Fault Isolation in the Z-axis
Another approach would be to separate your customer base, your order numbers, or your product catalog.  Assuming an indiscriminate function performs this separation (like a modulus of id), such a split would be a Z-axis swim lane along customer, order number, or product id lines.  More beneficially, if we are interested in the fastest possible response times to customers, we may split along geographic boundaries.  We may have data centers (or IaaS regions) serving the West and East Coasts of the US respectively, the “fly-over states” of the US, and regions serving the EU, Canada, Asia, etc.  Besides contributing to faster perceived customer response times, these implementations can also help ensure we are compliant with data sovereignty laws unique to different countries or even states within the US.
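
A quick sketch of the modulus-based routing such a Z-axis split implies (lane count and names are invented):

```python
# Route each customer to exactly one fault isolation zone.
SWIM_LANES = ["lane-0", "lane-1", "lane-2", "lane-3"]

def lane_for_customer(customer_id: int) -> str:
    # A modulus of the id gives an even, indiscriminate split; a failure
    # in one lane affects only the customers routed to it.
    return SWIM_LANES[customer_id % len(SWIM_LANES)]

print(lane_for_customer(1234567))  # lane-3
```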


Combining the concepts of service and database separation into several fault isolative failure domains creates a platform that is both scalable and highly available.  AKF has helped many companies achieve high availability through fault isolation.  Contact us to see how we can help you achieve the same fault tolerance.

AKF Partners helps companies create highly available, fault isolated solutions.  Send us a note - we’d love to help you!

