AKF Partners

Abbott, Keeven & Fisher Partners Partners in Technology

Growth Blog

Scalability and Technology Consulting Advice for SaaS and Technology Companies

The No Surprise Rule

May 23, 2018  |  Posted By: Geoffrey Weber

No Surprises

We blogged recently about how to write precisely and concisely, highlighting how important it was to learn the “Three Sentence Rule” early in our careers so that when we communicated with other executives, we communicated with extreme brevity and clarity.  We might think of this as the “what” of executive communication.  Today, we’d like to quickly describe a few ground rules with respect to the “when” and “how” of communicating as executives.

10 or 15 years ago, a fad swept through technology: executives everywhere were writing “How to Communicate with Me” articles for their teams and co-workers.  In the most positive light, these were serious attempts by quirky executives to help their teams learn to conform to their own bizarre communications requirements.  We would argue that a modern technology executive with a reasonably non-quirky personality need not pen such narcissistic claptrap.  Communication is so basic that we should not over-think the process.

In today’s world, we have a variety of communications channels available: face-to-face, email, text message, internal communications tools (e.g. Slack) and the good old telephone.  When an unexpected issue occurs on our watch, our primary duty is to inform our superior, by any means necessary as quickly as possible.  

Whether we work in a large corporate environment with thousands of employees or in a small team with 10 people, immediate communications are an absolute requirement.  If we fail to communicate immediately, our superiors may hear of the unexpected news before we have a chance to tell them.  Think of a major system outage…  while we work to determine a root cause, the VP of Marketing sends a quick text to our boss (let’s say the CEO in this case).  Now the CEO is in possession of bad news about something we are responsible for.  Our phone will ring immediately, and we’ll be on the back foot explaining why we hadn’t taken a moment to call.

A worse example might be a system outage that we, as CTO, were not aware of, and the very same VP of Marketing texts the CEO again.  Now when the phone rings, we are surprised, just as the CEO was surprised by the VP of Marketing.  Our team has failed at a very fundamental level.

There’s an informal rule that states: No Surprises.  The corollary is, communicate as early as possible and as often as possible.  A site outage demands an immediate upward missive with frequent updates.  The leaders who work under us must also live by this rule.  We can never be left out in the cold when it comes to significant information.  Furthermore, we are solely accountable for the communication of negative news up to our bosses.

The idea of communicating early and communicating often has a number of uses beyond crisis communications.  In the early days of eBay, Marty Abbott (managing partner of AKF Partners) set 4 objectives for the site operations teams: Availability (99.9%), Scalability, Cost and Operational Excellence.  Every member of the operations teams knew the current availability as it was communicated nearly continuously.  The other 3 objectives were communicated with equal frequency.  It would be a significant surprise if a colleague was working on a project that was not associated with Availability, Scalability, Cost or Operational excellence.  A few years later, we borrowed Marty’s objectives at Shutterfly and simplified: Up, Fast, Cheap and Easy. All 50 operations team members knew those goals and we repeated them like a mantra.

The quickest path to failure as technology executives is non-communication, the opposite of communicating clearly and frequently.  Worse, those executives that don’t stay ahead of the surprises technology throws at us every day will find themselves working in a different industry.

To summarize how to communicate:

When: early and often

How: any means available

What: 3 sentences.

We don’t need to write 5 page essays on how to communicate unless we are quite peculiar.  

 

Subscribe to the AKF Newsletter

Contact Us

4 Landmines When Using Serverless Architecture

May 20, 2018  |  Posted By: Dave Berardi

Physical Bare Metal, Virtualization, Cloud Compute, Containers, and now Serverless in your SaaS? We are starting to hear more and more about Serverless computing. Sometimes you will hear it called function as a service. In this next iteration of Infrastructure-as-a-Service, users can execute a task or function without having to provision a server, virtual machine, or any other underlying resource. The word Serverless is a misnomer: provisioning of the underlying resources is abstracted away from the user, but those resources still exist under the covers. It’s just that Amazon, Microsoft, and Google manage them for you with their code. AWS Lambda, Azure Functions, and Google Cloud Functions are becoming more common in the architecture of a SaaS product. As technology leaders responsible for architectural decisions for scale and availability, we must understand the pros and cons of serverless and take the right actions to apply it.

Several advantages of serverless computing include:

• Software engineers can deploy and run code without having to manage any underlying infrastructure effectively creating a No-Ops environment.
• Auto-scaling is easier and requires less orchestration as compared to a containerized environment running services.
• True On-Demand capacity – no orphaned containers or other resources that might be idling.
• They are cost effective IF we are running the right size workloads.

Disadvantages and potential landmines to watch out for:

• Landmine #1 - No control over the execution environment, meaning you are unable to isolate your operational environment. Compute and networking resources are virtualized with no visibility into either of them. Availability is in the hands of our cloud provider and uptime is not guaranteed.
• Landmine #2 - SLAs cannot guarantee uptime. Start-up time can take a second causing latency that might not be acceptable.
• Landmine #3 - It’s going to become much easier for engineers to create code, host it rapidly, and forget about it leading to unnecessary compute and additional attack vectors creating a security risk.
• Landmine #4 - You will create vendor lock-in with your cloud provider as you set up your event driven functions to trigger from other AWS or Azure Services or your own services running on compute instances.
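
One way to soften the lock-in described in Landmine #4 is to keep business logic in plain functions and confine provider-specific event parsing to a thin adapter. The sketch below is illustrative only. It assumes an AWS Lambda-style `handler(event, context)` entry point and an event that carries a JSON string under `"body"`. The email-normalization task and the names `normalize_email` and `lambda_handler` are hypothetical and do not come from any particular product.

```python
import json


def normalize_email(address: str) -> str:
    """Provider-neutral business logic: no cloud SDK types appear here."""
    local, _, domain = address.strip().partition("@")
    if not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    return f"{local}@{domain.lower()}"


def lambda_handler(event, context):
    """Thin adapter: the only code that knows the provider's event shape.

    Assumption: the event carries a JSON string under "body", as an
    HTTP-triggered function typically would. Adjust for the real trigger.
    """
    payload = json.loads(event["body"])
    try:
        result = {"email": normalize_email(payload["email"])}
        status = 200
    except ValueError as exc:
        result = {"error": str(exc)}
        status = 400
    return {"statusCode": status, "body": json.dumps(result)}
```

Moving to another provider would then mean rewriting only the adapter, while `normalize_email` and its tests stay untouched.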

AKF is often asked about our position on serverless computing. There are 4 key rules considering the advantages and the landmines that we outlined:

1) Gradually introduce it into your architecture and use it for the right use cases
2) Establish architectural principles that guide its use in your organization that will minimize availability impact for Serverless. You will tie your availability to the FaaS in your cloud provider.
3) Watch out for a false sense of security among your engineering teams. Understand how serverless works before you use it so that you can monitor it for performance and availability.
4) Manage how and what it’s used for - monitor it (e.g., AWS CloudWatch) to avoid neglect and misuse along with cost inefficiencies.
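
Rule 4 can be made concrete with a periodic report of functions that look neglected. The sketch below is a minimal illustration under stated assumptions: the inventory is hard-coded, whereas in practice it would be populated from your provider's monitoring data, and the `FunctionRecord` fields, the 30-day idle threshold, and the function names are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class FunctionRecord:
    name: str
    owner: str              # team accountable for the function ("" if unknown)
    last_invoked: datetime
    monthly_cost_usd: float


def find_neglected(records, now, max_idle_days=30):
    """Return functions that are idle past the threshold or have no owner."""
    cutoff = now - timedelta(days=max_idle_days)
    return [r for r in records if r.last_invoked < cutoff or not r.owner]


# Hypothetical inventory; real data would come from monitoring exports.
now = datetime(2018, 5, 20)
inventory = [
    FunctionRecord("resize-image", "media", datetime(2018, 5, 19), 42.0),
    FunctionRecord("old-report", "", datetime(2018, 1, 3), 7.5),
    FunctionRecord("nightly-sync", "data", datetime(2018, 3, 1), 12.0),
]
flagged = find_neglected(inventory, now)
for record in flagged:
    print(f"{record.name}: owner={record.owner or 'UNKNOWN'}, "
          f"last invoked {record.last_invoked:%Y-%m-%d}")
```

Here `old-report` and `nightly-sync` are flagged: both are idle, and the first also has no owner to ask about it.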

AWS, Azure, or Google Cloud Serverless platforms could provide an effective computing abstraction in your architecture if they are used for the right use cases, good monitoring is in place, and architectural principles are established.

AKF Partners has helped many companies create highly available and scalable systems that are designed to be monitored. Contact us for a free consultation.


Do you know what is negatively affecting your engineers' productivity? Shouldn't you?

May 13, 2018  |  Posted By: Dave Swenson

The Impact of Meetings on Engineers

Meetings, meetings, meetings. How many times have we said that? Visiting dozens and dozens of clients per year, we see a number of customers whose culture seems to be extremely meeting-centric, as if the only way any decision can be made or information communicated is via a meeting.

Paul Graham, co-founder of Y Combinator and Hacker News, wrote back in 2009 of the impact of meetings upon engineers. Coding typically is best performed in multi-hour solid chunks of time, with no interruptions. It takes a while to get into the ‘zone’, and any context switch will disrupt that zone, in Graham’s words “like throwing an exception”. He even suggests that the impact of a meeting goes far beyond the actual time spent at the meeting, that simply knowing you are going to be disrupted prevents you from reaching that zone - something like when you know you have to get up early in the morning, say for a flight, and you toss and turn all night long, unable to get into that deep REM state.

Many companies recognize the disruptive impact of meetings and put rules in place establishing ‘no meeting’ afternoons, or perhaps a full day. Pinterest’s recent blog post recounts their somewhat extreme move along these lines - putting a three-day no-meeting block in place for engineers, who were not to be invited to meetings 3 days a week. The blog post is worth a read, covering some of the challenges and objections of eliminating engineer-attended meetings 3 days a week, but overall it touts the success of the approach, citing a 92% positive response rate to a survey question asking “Are you more productive…?”.

Really, Pinterest? Really??

I’m all for the reduction of meetings, though I do wonder if three days a week with no meetings is a bit overboard. What I’m disappointed by is that Pinterest has no (or at least did not cite any) quantifiable evidence that their engineers were actually more productive. Now, I’m not suggesting they should have a before and after count of, say, lines of code. But, assuming that Pinterest is at least something of an Agile shop, did they not see an increase in velocity, in story points being delivered?

In our visits to our clients, just as we see a wide variation in the dependency upon meetings to get anything done, we see some clients living and breathing by their team-by-team velocity numbers, while other clients totally disregard that key productivity metric. To you technology leaders out there, how better can you measure your teams’ efficiencies?

And, even more so, do you know why your teams’ current velocity is what it is? Are you actively seeking out the context switches, delays, and disruptions that are throwing exceptions in your engineers’ brains?

We’ve been pulled in many times to analyze a team’s efficiency (or lack thereof), only to find out that, yes, meetings are a negative influence, but beyond that:

  • Interviews (worthwhile, but hiring should be a highly optimized process)
  • Environmental issues (are you measuring your dev environments’ availability?)
  • Waiting for pull request approvals (do you have an SLA around this?)
  • Long build times that are due to weak hardware or poor dependency management (compare the cost of faster build machines or code optimization of your builds vs. the value of the wait or down-time of your engineers)
  • Waiting to receive clarification from a product owner on a feature (again, do you have an SLA around this? Is your team colocated, so a question can be asked/answered quickly?)
  • Other surprising items ranging from having to feed a parking meter to miserable network latency for those remote engineers.
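
The build-time comparison in the list above comes down to simple arithmetic. The sketch below shows one way to frame it. Every figure (team size, builds per day, hourly cost, hardware price, 230 working days) is a made-up placeholder, and the model assumes, pessimistically, that time spent waiting on a build is fully lost.

```python
def annual_wait_cost(engineers, builds_per_day, minutes_per_build,
                     hourly_cost_usd, working_days=230):
    """Yearly cost of engineers waiting on builds, in USD."""
    hours_waiting = (engineers * builds_per_day * minutes_per_build / 60
                     * working_days)
    return hours_waiting * hourly_cost_usd


# All figures below are placeholders; substitute your own measurements.
current = annual_wait_cost(engineers=20, builds_per_day=6,
                           minutes_per_build=15, hourly_cost_usd=75)
faster = annual_wait_cost(engineers=20, builds_per_day=6,
                          minutes_per_build=5, hourly_cost_usd=75)
hardware_cost_usd = 40_000
net_saving = current - faster - hardware_cost_usd
print(f"wait cost now: ${current:,.0f}; after upgrade: ${faster:,.0f}; "
      f"net first-year saving: ${net_saving:,.0f}")
```

With these placeholder inputs the wait cost drops from $517,500 to $172,500 a year, so a $40,000 hardware spend would pay for itself several times over. The point is the shape of the calculation, not the specific numbers.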

Yet again, the mantra of “If you can’t measure it, you can’t improve it” applies. We view metrics such as actual hours spent coding vs. expected hours spent coding as not only a measurement of your teams’ productivity, but as a management effectiveness gauge. Are you as a manager effectively protecting your engineers?
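
As a rough illustration of the actual-vs-expected coding hours metric, the sketch below computes the ratio per team and flags the teams that fall under a threshold. The team names, the weekly hours, and the 60% threshold are hypothetical; how the hours are collected (surveys, calendars, tooling) is left open.

```python
def focus_ratio(actual_coding_hours, expected_coding_hours):
    """Fraction of planned coding time that was actually spent coding."""
    if expected_coding_hours <= 0:
        raise ValueError("expected_coding_hours must be positive")
    return actual_coding_hours / expected_coding_hours


# Hypothetical weekly figures per team: (actual, expected) coding hours.
teams = {"search": (88, 120), "checkout": (57, 120), "email": (102, 120)}
ratios = {name: focus_ratio(a, e) for name, (a, e) in teams.items()}

# Teams below the (assumed) 60% threshold deserve a closer look.
needs_attention = sorted(n for n, r in ratios.items() if r < 0.6)

for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
    print(f"{name}: {ratio:.0%} of expected coding time")
```

A low ratio says nothing about the engineers themselves. It is a prompt for the manager to go looking for the meetings, waits, and interruptions that consumed the difference.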

Are you able to see the impact of ‘no-meeting’ days, or the factors today that negatively affect your developers’ coding efficiencies?

If not, AKF can more than help. We have run productivity surveys at many clients, and always enjoy the look on technology leaders’ faces when we present the results. Let us help you.


Three Reasons Your Software Engineers May Not Be Successful

May 10, 2018  |  Posted By: Pete Ferguson

At AKF Partners, we have the unique opportunity to see trends among startups and well-established companies in the dozens of technical due diligence and more in-depth technology assessments we regularly perform, in addition to filling interim leadership roles within organizations.  Because we often talk with a variety of folks from the CEO, investors, business leadership, and technical talent, we get a unique top-to-bottom perspective of an organization.

Three common observations

  • People mostly identify with their job title, not the service they perform.
  • Software Engineers can be siloed in their own code vs. contributing to the greater outcome.
  • CEO’s vision vs. frontline perception of things as they really are.

Job Titles Vs. Services

The programmer who identifies herself as “a search engineer” is likely not going to be as engaged as her counterpart who describes herself as someone who “helps improve our search platform for our customers.”

Shifting focus from a job title to a desired outcome is a best practice from top organizations.  We like to describe this as separating nouns and verbs – “I am a software engineer” focuses on the noun without an action (software engineer), whereas “I simplify search” puts the focus on the verb of the desired outcome (simplify).  It may seem minor or trivial, but this shift can have a real impact on how team members understand their contribution to your overall organization.

Removing this barrier to the customer puts team members on the front line of accountability to customer needs – and hopefully also the vision and purpose of the company at large.  In our experience with successful companies, instilling a customer-experience, outcome-based approach often requires a reworking of product teams.  Creating a diverse product team (containing members of the Architecture, Product, QA and Service teams for example) that owns the outcomes of what they produce promotes:

  • Motivation
  • Quality
  • Creating products customers love

If you have had experience in a Ford vehicle with the first version of Sync (bluetooth connectivity and onscreen menus) – then you are well aware of the frustration of scrolling through three layers of menus to select “bluetooth audio” ([Menu] -> [OK] -> [OK] -> [Down Arrow] -> [OK] -> [Down Arrow] -> [OK]) each time you get into your car.  The novelty of wireless streaming was a key differentiator when Sync was first introduced – but is now table stakes in the auto industry – and quickly wears off when having to navigate the confusing UI, likely designed by product engineers each focused on a specific task rather than on designing for a great user experience.  What was missing is someone with the vision and job description: “I design wireless streaming to be seamless and awesome - like a button that says ‘Bluetooth Audio!!!’”

Hire for – and encourage – people who believe and practice “my real job is to make things simple for our customers.”

Avoiding Siloed Approach

Creating great products requires engineers to look outside of their current project specific tasks and focus on creating great customer experiences.  Moving from reactively responding to customer reported problems to proactively identifying issues with service delivery in real time goes well beyond just writing software.  It moves to creating solutions.

Long gone are the “fire and forget” days of writing software, burning to a CD and pushing off tech debt until the next version.  To Millennials, this Waterfall approach is foreign, but unfortunately we still see this mentality engrained in many company cultures.

Today it is all about services.  A release is one of many in a very long evolution of continual improvement and progression.  There isn’t Facebook V1 to be followed by V2 … it is a continual rolling out of upgrades and bug fixes that are done in the background with minimum to no downtime.  Engineers can’t afford to be laggard in their approach to continual evolution, addressing tech debt, and contributing to internal libraries for the greater good.

Ensure your technical team understands and is very closely connected to the evolving customer experience and has skin in the game.  Among your customers, there likely is very little patience with “wait until our next release.”  They expect immediate resolution or they will start shopping the competition.

Translating the Vision of the CEO to the Front Lines

During our more in-depth technology review engagements we interview many people from different layers of management and different functions within the organization.  This gives us a unique opportunity to see how the vision of the CEO migrates down through layers of management to the front-line programmers who are responsible for translating the vision into reality.

Usually - although not always - the larger the company, the larger the divide between what is being promised to investors/Wall Street and what is understood as the company vision by those who are actually doing the work.  Best practices at larger companies include regular all-hands where the CEO and other leaders share their vision and are held accountable to deliverables, along with leadership checks that the vision is conveyed in product roadmaps and daily stand up meetings.  When incentive plans focus directly on how well a team and individual understand and produce products to accomplish the company vision, communication gaps close considerably.

Creating and sustaining successful teams requires a diverse mix of individuals with a service mindset.  This is why we stress that Product Teams need to be all inclusive of multiple functions.  Architecture, Product, Service, QA, Customer Service, Sales and others need to be included in stand up meetings and take ownership in the outcome of the product. 

The Dev Team shouldn’t be the garbage disposal for what Sales has promised in the most recent contract or what other teams have ideated without giving much thought to how it will actually be implemented. 

When your team understands the vision of the company - and how customers are interacting with the services of your company - they are in a much better position to implement it into reality.

As a CTO or CIO, it is your responsibility to ensure what is promised to Wall Street, private investors, and customers is translated correctly into the services you ultimately create, improve, and publish.

Conclusions

As we look at new start-ups facing explosive 100-200% year-over-year growth, our question is always “how will the current laser focus vision and culture scale?”  Standardization, good Agile practices, understanding technical debt, and creating a scalable on-boarding and mentoring process all lend themselves to the best answers to this question.

When your development teams are each appropriately sized and include good representation of functional groups, and each team member identifies with verbs vs. nouns (“I improve search” vs. “I’m a software engineer”) and understands how their efforts tie into company success, your opportunities for success, scalability, and adaptability are maximized.

—-

Experiencing growing or scaling pains?  AKF is here to help!  We are an industry expert in technology scalability, due diligence, and helping to fill leadership gaps with interim CIO/CTO and other positions in addition to helping you in your search for technical leaders.  Put our 200+ years of combined experience to work for you today!

Get this article and others like it by signing up for our newsletter.

 


Enabling TTM With Contributor Model Teams

May 6, 2018  |  Posted By: Dave Berardi

We often speak about the benefits of aligning agile teams with the system’s architecture.  As Conway’s Law describes, product/solution architectures and organizations cannot be developed in isolation (see https://akfpartners.com/growth-blog/conways-law).  Agile autonomous teams are able to act more efficiently, with faster time to market (TTM).  Ideally, each team should be able to behave like a startup with the skills and tools needed to iterate until they reach the desired outcome.

Many of our clients are under pressure to achieve both effective TTM and reduce the risk of redundant services that produce the same results. During due diligence, we will sometimes discover redundant services that individual teams develop within their own silo for a TTM benefit.  Rather than competing with priorities and waiting for a shared service team to deliver code, the team will build their own flavor of a common service to get to market faster.

Instead, we recommend a shared service team own common services. In this type of team alignment, the team has a shared service or feature on which other autonomous teams depend. For example, many teams within a product may require email delivery as a feature.  Asking each team to develop and operate its own email capability would be wasteful, resulting in engineers designing redundant functionality leading to cost inefficiencies and unneeded complexity.  Rather than wasting time on duplicative services, we recommend that organizations create a team that would focus on email and be used by other teams.

Teams make requests in the form of stories for product enhancements that are deposited in the shared services team’s backlog (the email team’s, in this case). To mitigate the risk of having each of these requesting teams waiting for requests to be fulfilled by the shared services team, we suggest thinking of the shared services as an open source project or, as some call it, the contributor model.

Open sourcing our solution (at least internally) doesn’t mean opening up the email code base to all engineers and letting them have at it.  It does mean mechanisms should be established to help control the quality and design for the business. An open source project often has its own repo and typically only allows trusted engineers, called Committers, to commit. Committers follow Contribution Standards defined by the team that owns the project. In our email example, the team should designate trusted and experienced engineers from other Agile teams who can code and commit to the email repo. Engineers on the email team can be focused on making sure new functionality aligns with architectural and design principles that have been established. Code reviews are conducted before code is accepted. Allowing for outside contribution will help to mitigate the potential bottleneck such a team could create.
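
The rules in the paragraph above can be expressed as a small merge gate. This is a sketch, not a description of any particular code-hosting tool: the `Change` type, the `can_merge` function, and the engineer names are all hypothetical, and most hosting platforms offer their own mechanisms for enforcing similar policies.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    author: str
    approvals: set = field(default_factory=set)  # reviewers who approved


def can_merge(change, committers, owning_team):
    """A change merges only if a trusted committer wrote it and a member of
    the owning team, other than the author, has approved it."""
    if change.author not in committers:
        return False
    owner_approvals = (change.approvals & owning_team) - {change.author}
    return bool(owner_approvals)


email_team = {"ana", "raj"}                # owns the email service
committers = email_team | {"li", "sam"}    # trusted engineers from other teams

print(can_merge(Change("li", {"ana"}), committers, email_team))   # True
print(can_merge(Change("li", {"sam"}), committers, email_team))   # False
print(can_merge(Change("zoe", {"ana"}), committers, email_team))  # False
```

The second case is the interesting one: approval from a fellow outside committer is not enough, because the owning team keeps the final say on design and architecture.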

Now that the development of email has been spread out across contributors on different teams, who really owns it?

Remember, ownership by many is ownership by none.  In our example, the email team ultimately owns the services and code base. As other developers commit new code to the repo, the email team should conduct code, design, and architectural reviews and ultimately deployments and operations.  They should also confirm that the contributions align with the strategic direction of the email mission.  Whatever mechanisms are put in place, teams that adopt a contributor model should be a gas pedal and not a brake for TTM.

If your organization needs help with building an Agile organization that can innovate and achieve competitive TTM, we would love to partner with you. Contact us for a free consultation.


The Problem with Non-Functional Requirements

May 4, 2018  |  Posted By: Marty Abbott

Many of our clients use the term “Non-Functional Requirements” to group into a basket those portions of their solution that don’t easily fit into method or function-based execution of market needs.  Examples of non-functional requirements often include things like “Availability”, “Scalability”, “Response Time”, “Data Sovereignty” (as codified within requirements such as the GDPR), etc.  Very often, these “NFRs” are relegated to second class citizens within the development lifecycle and, if lucky, end up as a technical debt item to be worked on later.  More often than not, they are just forgotten until major disaster strikes.  This happens so often that we at AKF joke that “NFR” really stands for “No F-ing Resources (available to do the job)”.

While I believe that this relegation to second class citizen is a violation of fiduciary responsibility, I completely understand how we’ve collectively gotten away with it for so long.  For most of the history of our products, we’ve produced solutions that customers are responsible for running.  We built the code, shipped it, and customers installed it and ran it on their systems.

Fortunately (for most of us) the world has changed to the SaaS model.  As subscribers, we no longer bear the risk of running our own systems.  Implementation is easier and faster, and the costs of running solutions are lower.

Unfortunately (for most of us) these changes also mean we are now wholly accountable for NFRs.  We now mostly produce services, and we are wholly accountable for their complete outcomes, including how our solution runs.

Most NFRs Are Table Stakes and Must-Haves

In the world of delivering services, most NFR capabilities are must-haves.  SaaS companies provide a utility, and the customer expectation is that utility will be available whenever the customer needs it.  You simply do not have an option to decide to punt attributes like availability, or regulatory compliance to a later date. 

The Absolute Value of NFR Your Product/Service Needs Varies

While we believe most NFRs are necessary, and non-negotiable for playing in the SaaS space, the amount that you need of each of them varies with the portion of the market you are addressing within the Technology Adoption Lifecycle.

As you progress from left to right into a market, the NFR expectations of the adopters of your solution increase.  Innovators care more about the differentiating capability that you offer than they do the availability of your solution.  That said, they still need to be able to use your product and will stop using it or churn if it doesn’t meet their availability, response time, data sovereignty, and privacy needs.  NFRs are still necessary – they just matter less to innovators than later adopters.

At the other end of the extreme, Late Majority and Laggard adopters care greatly about NFRs.  Whereas Innovators may be willing to grudgingly live with 99.8% availability, the Late Majority will settle for nothing less than 99.95% or better.
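
To see what the gap between those two targets means in practice, an availability percentage can be converted into a yearly downtime budget. The short sketch below does that conversion. The function name is an illustrative choice, and the calculation assumes a 365-day year of continuous service.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def downtime_minutes_per_year(availability_pct):
    """Maximum yearly downtime permitted by an availability target."""
    return MINUTES_PER_YEAR * (100 - availability_pct) / 100


for target in (99.8, 99.95):
    minutes = downtime_minutes_per_year(target)
    print(f"{target}% -> {minutes:.1f} min/year ({minutes / 60:.1f} hours)")
```

At 99.8% the budget is roughly 17.5 hours of downtime a year, and at 99.95% it shrinks to about 4.4 hours, a fourfold reduction in the room for error.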

It’s Time to Eliminate the Phrase Non-Functional Requirement

We believe that even the name “Non-Functional Requirement” implies that necessary capabilities like availability and data sovereignty can somehow take a back seat to other activities within the solutions we create.  At the very least, the term fails to denote the necessity (some of them legally so) of these attributes.

We prefer names like “Table Stakes” or “Must Have Requirements” to NFRs.
 
It’s Also Time to Eliminate the Primary Cause

While we sometimes find that teams simply haven’t changed their mindset to properly understand that Table Stakes aren’t optional investments, we more often find a more insidious cause:  Moral Hazards.  Moral hazards exist when one person makes a decision for which another must bear the cost:  Person A decides to smoke, but Person B bears the risk of cancer instead of Person A.

Commonly, we see product managers with ownership over product decisions, but no accountability for Table Stakes like availability, response time, cost effectiveness, security, etc.  The problem with this is that, as we’ve described, the Table Stakes are the foundation of a Maslow-style hierarchy of needs for online products.  Engineering teams and product teams should jointly own all the attributes of the products they co-create.  Doing so will help fix the flawed notion that Table Stakes can be deferred.

AKF Partners helps clients build highly available, scalable, fast response time solutions that meet the needs of the portion of the Technology Adoption Lifecycle they are addressing.


The Difference between Science, Engineering and Programmers and What it Means to You

May 3, 2018  |  Posted By: Marty Abbott

Scientists, Engineers, and Technicians

What is, or perhaps should be, the difference between a Computer Scientist, Data Scientist, Software Engineer, and Programmer?  What should your expectations be of each?  How should they work together?

To answer these questions, we’ll look at the differences between scientists, engineers, and technicians in more mature disciplines, apply them to our domain, and offer suggestions as to expectations.

Science and Scientists

The primary purpose of science is “to know”.  Knowing, or the creation of knowledge, is enabled through discovery and the practice of the scientific method.  Scientists seek to know “why” something is and “how” that something works.  Once this understanding of “why and how” is generally accepted, ideally it is codified within theories that are continually tested for validity over time.

To be successful, scientists must practice both induction and deduction.  Induction seeks to find relationships for the purposes of forming hypotheses.  Deduction seeks to test those hypotheses for validity.

In the physical world (or in physical, non-biological, and non-behavioral sciences), scientists are most often physicists and chemists.  They create the knowledge, relationships, and theories upon which aerospace, chemical, civil, electrical, mechanical, and nuclear engineers rely.

For our domain, scientists are mathematicians, computer scientists – and more recently – data scientists.  Each seeks to find relationships and approaches useful for engineers and technicians to apply for business purposes.  The scientist “finds”, the engineer “applies”.  Perhaps the most interesting new field here is that of data science.  Whereas most science is focused on broad discovery, data science is useful within a business as it is meant to find relationships between various independent variables and desirable outcomes or dependent variables.  These may be for the purposes of general business insights, or something specific like understanding what items (or SKUs) we should display to different individuals to influence purchase decisions.

True scientists tend to have doctorates, the doctoral degree being the “certification” that one knows how to properly perform research.  There are of course many examples of scientists without doctoral degrees – but these days that is rare in any case other than data scientists (here the stakes are typically lower).

Engineering and Engineers

The primary purpose of engineering is to “create” or to “do”.  Engineers start with an understanding (created by scientists) of “why” things work, and “how” they work (scientific theories – often incorrectly called “laws” by engineers) and apply them for the purposes of creating complex solutions or products.  Mechanical engineers rely on classical physics, electrical engineers rely on modern physics.  Understanding both the “why” and “how” is important to be able to create highly reliable solutions, especially in new or unique situations.  For instance, it is important for electrical engineers to understand field generation for micro-circuitry and how those fields will affect the operation of the device in question.  Civil and mechanical engineers need to understand the notion of harmonic resonance in order to avoid disasters like the Tacoma Narrows bridge failure.

The domain of “software engineering” is much more confusing.  Unlike traditional engineering domains, software engineering is ill-defined and suffers from an overuse of the term in practice.  If such a domain truly existed, it should follow the models of other engineering disciplines.  As such, we should expect that software engineers have a deep understanding of the “whys and hows” derived from the appropriate sciences: computer science and mathematics.  For instance, computer scientists would identify and derive interesting algorithms and suggest applications, whereas engineers would determine how and when to apply the algorithms for maximum effect.  Computer scientists identify unique scenarios (e.g. the dining philosophers problem) that broadly define a class of problems (mutual exclusion in concurrency) and suggest approaches to fix them.  Engineers put those approaches into practice with the constraints of the system they are developing in mind.
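As a small illustration of that division of labor, the sketch below applies one well-known approach to the dining philosophers problem: acquiring shared resources in a fixed global order.  It is a minimal Python sketch, not a production design; the `dine` function, the seat numbering, and the meal counts are all hypothetical names chosen for the example.

```python
import threading


def dine(num_philosophers: int = 5, meals: int = 100) -> list[int]:
    """Run the dining philosophers with ordered fork acquisition.

    Every philosopher picks up the lower-numbered fork first, so no cycle
    of waiting threads can form and the run cannot deadlock.
    """
    if num_philosophers < 2:
        raise ValueError("need at least two philosophers (and two forks)")

    forks = [threading.Lock() for _ in range(num_philosophers)]
    eaten = [0] * num_philosophers

    def philosopher(seat: int) -> None:
        left, right = seat, (seat + 1) % num_philosophers
        # The "engineering" step: impose a global order on the shared forks.
        first, second = min(left, right), max(left, right)
        for _ in range(meals):
            with forks[first]:
                with forks[second]:
                    eaten[seat] += 1  # Only this thread writes eaten[seat].

    threads = [
        threading.Thread(target=philosopher, args=(seat,))
        for seat in range(num_philosophers)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return eaten


if __name__ == "__main__":
    print(dine(num_philosophers=5, meals=100))
```

The theory (a cycle of waiters is required for deadlock) comes from computer science; choosing lock ordering over, say, a central arbiter is the kind of trade-off the engineer makes with the constraints of a specific system in mind.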

Following this logic, we should also expect our engineers to understand how the systems upon which they apply their trade (computers) truly function – the relationship between processors, memory, storage, etc.  Further, they should understand how an operating system works, how interpreters and compilers work, and how networks work.  They should be able to apply all these things to come up with simple designs that limit the blast radius of failure, ensure low latency response, and are cost effective to produce and maintain.  They should understand the depth of components necessary to deliver a service to a customer – not just the solitary component upon which they work (the code itself). 

Very often, to be a successful engineer, one must have at least a bachelor’s degree in an engineering domain.  The degree at least indicates that one has proven to some certifying body that they understand the math, the theories, and the application of those theories to the domain in question.  Interestingly, as a further example of the difference between engineering and science, some countries issue engineering degrees under the degree of “Bachelors of Applied Science”.

There are of course many famous examples of engineers without degrees – for instance, the Wright Brothers.  The true test isn’t the degree itself, but whether someone understands the depth and breadth of math and science necessary to apply science in highly complex situations.  Degrees are simply one way of ensuring an individual has at least once proven an understanding of the domain.

Practical Applications and Technicians

Not everything we produce requires an engineer’s deep understanding of both why and how something works.  Sometimes, the application of a high-level understanding of “how” is sufficient.  As such, in many domains technicians augment engineers.  These technicians are responsible for creating solutions out of reusable building blocks and a basic understanding of how things work.  Electricians for instance are technicians within the domain of electrical engineering.  Plumbers are a very specific application of civil engineering.  HVAC technicians apply fluid mechanics from mechanical engineering.

These trades take a very technical skill, implementation specific tradecraft, and a set of heuristics to design, implement, and troubleshoot systems that are created time and time again.  Electricians, for instance, design the power infrastructure of homes and offices – potentially reviewed by an electrical engineer.  They then implement the design (wire the building) and are also responsible for troubleshooting any flaws.  The same is true for HVAC technicians and plumbers.

Programmers are the technicians for software engineers (engineering domain) and computer and data scientists (science domain).  Not everything we develop needs a “true engineer”.  Very often, as is the case with wiring a house, the solution is straightforward and can be accomplished with the toolset one gains with several weeks of training.  The key difference between an engineer and a programmer is again the depth and breadth of knowledge.

Technicians are either trained through apprenticeship or through trade schools.  Electricians can come from either approach.  Broadly speaking, it makes sense to use technicians over using an engineer for at least 2 reasons:

  1. Some things don’t require an engineer, and the cost of an engineer makes no sense for these applications.
  2. The supply of engineers is very low relative to demand within the US across all domains.  During the Great Recession, engineering was one of the only disciplines at or near economic full employment – a clear indication of the supply/demand imbalance.

Implications and Takeaways

Several best practices follow the preceding definitions and roles:

  1. Don’t mix Science and Engineering teams or goals:  Science, and the approach to answering questions using the scientific method, are different animals and require different skills and different expectations from engineering.  It is difficult to apply time constraints to the process of scientific discovery; sometimes answers exist and are easy to find, sometimes they are hard to find, and sometimes they simply do not exist.  If you want to have effective analytics efforts, choose people trained to approach your Big Data needs appropriately, and put them in an organization dedicated to “science” activities (discovery).  They may need to be paired with programmers or engineers to accomplish their work – but they should not be confused with a typical product team.  “Ah-Ha!” moments are something you must allocate time against – but not something for which you can expect an answer in a defined time interval.
  2. Find the right ratio of engineers to programmers:  Most companies don’t need all their technical folks to be engineers.  Frankly, we don’t “mint” enough engineers anyway – roughly 80K to 100K per year since 1945 with roughly 18K of those being Computer Science graduates who go on to become “engineers” in practice.  Augment your teams with folks who have attended trade schools or specialized boot camps to learn how to program.
  3. Ensure you hire capable engineers:  You should both pay more for and expect more from the folks on your team who are doing engineering related tasks.  Do not allow them to approach solutions with just a “software” focus; expect them to understand how everything works together to ensure that you build the most reliable services possible.

 


Fault Isolation in Services Architectures

May 2, 2018  |  Posted By: AKF

Our post on the AKF Scale Cube made reference to a concept that we call “Fault Isolation” and sometimes “Swim lanes” or “Swim-laned Architectures”.  We sometimes also call “swim lanes” fault isolation zones or fault isolated architecture.


Fault Isolation Defined
A “Swim lane” or fault isolation zone is a failure domain.  A failure domain is a group of services within a boundary such that any failure within that boundary is contained within the boundary and the failure does not propagate or affect services outside of said boundary.  Think of this as the “blast radius” of failure meant to answer the question of “What gets impacted should any service fail?” The benefit of fault isolation is twofold:

1) Fault Detection: Given a granular enough approach, the component of availability associated with the time to identify the failure is significantly reduced.  This is because all effort to find the root cause or failed component is isolated to the section of the product or platform associated with the failure domain.  Once something breaks, because the failure is limited in scope, it can be more rapidly identified and fixed.  Time to recover subsequently decreases, making recovery time objectives (RTO) easier to meet and increasing overall availability.

2) Fault Isolation: As stated previously, the failure does not propagate or cause a deterioration of other services within the platform.  The “blast radius” of a failure is contained.  As such, and depending upon approach, only a portion of users or a portion of functionality of the product is affected.  This is akin to circuit breakers in your house - the breaker exists to limit the fault zone for any load that exceeds a limit imposed by the breaker.  Failure propagation is contained by the breaker popping and other devices are not affected. 
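To make the two benefits concrete, here is a rough Python sketch.  It assumes the standard steady-state availability formula (MTBF divided by MTBF plus MTTR) and an even spread of users across swim lanes; the function names and every figure below are illustrative assumptions, not AKF measurements.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def affected_user_fraction(failed_lanes: int, total_lanes: int) -> float:
    """Share of users hit when users are spread evenly across swim lanes."""
    return failed_lanes / total_lanes


# Benefit 1 (detection): the same failure rate, but a smaller search area
# shortens the time to find and fix the fault. Numbers are made up.
before = availability(mtbf_hours=500.0, mttr_hours=4.0)
after = availability(mtbf_hours=500.0, mttr_hours=1.0)
print(f"availability: {before:.4f} -> {after:.4f}")

# Benefit 2 (isolation): one failed lane out of ten touches a tenth of users.
print(f"users affected: {affected_user_fraction(1, 10):.0%}")
```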

Architecting Fault Isolation
A fault isolated architecture is one in which each failure domain is completely isolated.  We use the term “swim lanes” to depict the separations. In order to achieve this, ideally there are no synchronous calls between swim lanes or failure domains made pursuant to a user request.  User initiated synchronous calls between failure domains are absolutely forbidden in this type of architecture as any user-initiated synchronous call between fault isolation zones, even with appropriate timeout and detection mechanisms, is very likely to cause a cascading series of failures across other domains.  Strictly speaking, you do not have a failure domain if that domain is connected via a synchronous call to any other service in another domain, to any service outside of the domain, or if the domain receives synchronous calls from other domains or services.  Again, “synchronous” is meant to identify a synchronous call (call, wait for a response) pursuant to any user request.

It is acceptable, but not advisable, to have asynchronous calls between domains and to have non-user initiated synchronous calls between domains (as in the case of a batch job collecting data for the purposes of reporting in another failure domain).  If such a communication is necessary it is very important to include failure detection and timeouts even with the asynchronous calls to ensure that retries do not cause port overloads on any services. Here is an interesting blog post about runaway scripts and their impact on Apache, PHP, and MySQL.
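A minimal sketch of what “failure detection and timeouts” with bounded retries might look like, using Python’s asyncio.  The `call_with_timeout` helper, its parameters, and the default values are hypothetical; the point is only that every cross-domain call has a deadline and a hard cap on attempts.

```python
import asyncio


async def call_with_timeout(make_call, timeout_s=0.5, max_attempts=3, backoff_s=0.1):
    """Call another failure domain with a deadline and a capped retry count.

    make_call is a hypothetical zero-argument function returning an awaitable.
    Returns the result, or None if every attempt timed out or failed.
    """
    for attempt in range(max_attempts):
        try:
            return await asyncio.wait_for(make_call(), timeout=timeout_s)
        except (asyncio.TimeoutError, ConnectionError):
            if attempt + 1 < max_attempts:
                # Wait longer after each failure so retries do not pile up
                # on a service that is already struggling.
                await asyncio.sleep(backoff_s * (2 ** attempt))
    return None  # Give up; the caller degrades instead of retrying forever.


if __name__ == "__main__":
    # Stand-in for a slow reporting endpoint in another failure domain.
    slow = lambda: asyncio.sleep(1, result="late")
    print(asyncio.run(call_with_timeout(slow, timeout_s=0.05, max_attempts=2)))
```

Returning `None` rather than raising is a design choice for the sketch: the calling swim lane keeps serving its own users and simply skips the data it could not fetch.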

As previously indicated, a swim lane should have all of its services located within the failure domain.  For instance, if database reads and writes are necessary, the database with all appropriate information for that swim lane should exist within the same failure domain as all of the application and web servers necessary to perform the function or functions of the swim lane.  Furthermore, that database should not be used for other requests or services from other swim lanes.  Our rule is one production database on one host.

The figure below demonstrates the components of software and infrastructure that are typically fault isolated:
Fault Isolation in Micro-Services Architectures

Rarely are shared higher level network components isolated (e.g. border systems and core routers).
Sometimes, if practical, firewalls and load balancers are isolated.  This is especially the case in very high demand situations where a single pair of devices simply wouldn’t meet the demand.

The remainder of solutions are always isolated, with web-servers, top of rack switches (in non IaaS implementations), compute (app servers) and storage all being properly isolated.

Applying Fault Isolation with AKF’s Scale Cube
As we have indicated with our Scale Cube in the past, there are many ways in which to think about swim laned architectures.  Swim lanes can be isolated along the axes of the Scale Cube as shown below with AKF’s circuit breaker analogy to fault isolation. 

AKF Fault Isolation in the X-axis
Fault isolation in the X-axis would mean replicating everything for high availability - and performing the replication asynchronously and in an eventually consistent (rather than a consistent) fashion.  For example, when a data center fails, the fault will be isolated to the one failed data center or to the affected availability zones.  This is common with traditional disaster recovery approaches, though we do not often advise it as there are better and more cost effective solutions for recovering from disaster.

AKF Fault Isolation in the Y-axis
Fault Isolation in the Y-axis can be thought of in terms of a separation of services, e.g. “login” and “shopping cart” (two separate swim lanes), each having the web and app servers as well as all data stores located within the swim lane and answering only to systems within that swim lane.  Each portion of a page is delivered from a separate service, reducing the blast radius of a potential fault to its swim lane.

While purposely not legible (fuzzy), the fake example above shows different components of a fictional business account from a fictional bank.  Components of the page are separated with one component showing a summary, another component displaying more detailed information and still other components showing dynamic or static links - each derived from properly isolated services.

AKF Fault Isolation in the Z-axis
Another approach would be to perform a separation of your customer base or a separation of your order numbers or product catalog.  Assuming an indiscriminate function to perform this separation (like a modulus of id), such a split would be a Z axis swim lane along customer, order number or product id lines.  More beneficially, if we are interested in fastest possible response times to customers, we may split along geographic boundaries.  We may have data centers (or IaaS regions) serving the West and East Coasts of the US respectively, the “Fly-Over States” of the US, and regions serving the EU, Canada, Asia, etc.  Besides contributing to faster perceived customer response times, these implementations can also help ensure we are compliant with data sovereignty laws unique to different countries or even states within the US.
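The modulus-based split described above can be sketched in a few lines of Python.  This assumes integer customer ids; the lane names, the region map, and both helper functions are hypothetical and only illustrate the routing decision.

```python
# Hypothetical swim lanes; each would own its own web, app, and database tier.
SWIM_LANES = ["lane-0", "lane-1", "lane-2", "lane-3"]


def lane_for_customer(customer_id: int, lanes: list[str] = SWIM_LANES) -> str:
    """Pick a swim lane with an indiscriminate function (modulus of the id)."""
    if customer_id < 0:
        raise ValueError("customer_id must be non-negative")
    return lanes[customer_id % len(lanes)]


def lane_for_region(region: str, region_lanes: dict[str, str], default: str) -> str:
    """Geographic alternative: route by region, falling back to a default lane."""
    return region_lanes.get(region, default)


if __name__ == "__main__":
    print(lane_for_customer(1234567))
    print(lane_for_region("EU", {"EU": "eu-lane", "CA": "ca-lane"}, default="us-east-lane"))
```

One caveat with a plain modulus: changing the number of lanes changes where most ids land, so adding a lane implies moving data.  A geographic map avoids that particular problem but can leave lanes unevenly loaded.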


Combining the concepts of service and database separation into several fault isolated failure domains creates a platform that is both scalable and highly available.  AKF has helped companies achieve high availability through fault isolation.  Contact us to see how we can help you achieve the same fault tolerance.

AKF Partners helps companies create highly available, fault isolated solutions.  Send us a note - we’d love to help you!


Why CTOs Fail and What CEOs and CTOs Can Do About It

May 2, 2018  |  Posted By: Marty Abbott

The rate of involuntary turnover for the senior executive running technology and engineering in a company is unfortunately high, with most senior technology executives lasting less than four years in a job.  Our experience over the last 12 years, and with nearly 500 companies, indicates that nearly 50% of senior technology executives are at-risk with their peers around the CEO’s management team table.

The reasons why CEOs and their teams are concerned about their senior technology executive vary, but the trigger causes of concern cluster around 5 primary themes:

  1. Lack of Business Acumen – Doing the Wrong Things
  2. Failure to Lead and Inspire
  3. Failure to Manage and Execute
  4. Trapped in Today – Failure to Plan for Tomorrow
  5. Lack of Technical Knowledge – Doing Things the Wrong Way

Before digging into each of these, I believe it is informative to first investigate the sources (backgrounds) of most senior technology executives.

CTO and CIO Backgrounds

Within our dataset, there are 2 primary backgrounds from which most senior technology executives come:

    1.    Raised through the Technical Ranks

These executives have spent a majority of their working career in some element of product engineering or IT.  They are very often promoted based on technical acumen, and a perception of being able to “get things done”.  Often, they are technically gifted folks with technical or engineering undergraduate degrees, and sometimes they have a great deal of project management experience.

    2.    Co-Opted from the Business

These executives have spent a majority of their working career in a function other than technology.  They may have been raised through marketing, sales or finance and have demonstrated an affinity for technology and some high-level understanding of its application.  They rarely have a deep technical understanding and have done very little hands-on work within technology or engineering.

We see both backgrounds struggling with the chief technology role. Sometimes they fail for similar reasons, but often for very different reasons within our five primary causes.
 
Lack of Business Acumen – Doing the Wrong Things

This is by far the most common reason for failure for chief technologists “Raised through the technical ranks”.  On the flip side, it is rarely in our experience the reason why executives “Co-Opted from the Business” find themselves in trouble. 

The technologist’s peers often complain that they do not trust the executive in question and often cite a failure to present opportunities, fixes, and projects in “business terms”.  Put frankly, a lack of understanding of business in general, and an inability (or lack of desire) to justify actions in meaningful business terms, causes a lack of trust among the CIO/CTO’s peers.

Further, CTOs sometimes focus too heavily on what is “cool” rather than on the things that create significant business and stakeholder value.  Business peers complain that money and headcount are being spent without a justifiable return.  An oft heard quote is “We don’t get why there are so many people working on things that don’t drive revenue.”

The fix for this is easy.  Executives raised through the technology ranks should seek education and training in the fundamentals and language of business.  A great way to do this is for the tech exec to get an MBA or attend an abbreviated MBA-like program such as those offered by Harvard Business School or Stanford.  Many business courses focused on cost justification and the business financial statements are also available from online programs.

Failure to Lead and Inspire

This failure is most common for executives “raised through the technical ranks”.  Comments from the CTO’s peers and CEO indicate that he or she struggles with creating a strategy that clearly supports the needs of the business.  Further, individual contributors within the organization appear to be disconnected from the mission of the business and disengaged at work.  Often, employees within such an organization will complain that the CIO/CTO requires that all major decisions go through him or her.  Such employees and organizations often test low on work engagement and overall morale.

The cause of this failure is again often the result of promoting a person solely upon his or her technical capabilities.  While there are many great technology leaders, leadership and technical acumen have very little in common.  Some folks can be trained to appreciate the value and need for a compelling vision and to be inspirational, but alas some cannot.

The fix is to attempt training, but where the executive does not show the willingness or capability, a replacement may be necessary.  Sometimes we find that mentoring from a former or current successful CIO/CTO helps.  Coaching from a professional coach or leadership professional may also be helpful. 

Failure to Manage and Execute

This failure is common for both those raised through the technical ranks, and those co-opted from the business.  At the very heart of this failure is a perceived lack of execution on the part of the executive – specifically in getting things done.  Often the cause is a lack of communication.  Sometimes the cause is poor project management capabilities within the organization or a lack of management acumen within the organization. 

For executives co-opted from the business, we find that they sometimes struggle in communicating effectively with the technical members of their team and as such don’t have a clear understanding of status.  They may also struggle with the right questions to ask to either probe for current status or to help keep their teams on track.

Where the problem is the lack of technical understanding by non-technical executives running tech functions, the fix is to augment them with strong technical people who also speak the language of business (similar to the lack of business acumen failure).  For the other failures, the fix is to build appropriate project management and oversight into product delivery.  This may be adding skills to the team, or it may mean needing to infuse an element of valuing execution into the technology organization.

Trapped in Today – Failure to Plan for Tomorrow

This type of failure is common to executives of both backgrounds.  In fact, the problem is common to leaders in all positions requiring a mix of both operational focus and forward-looking strategy development.  The needs of running the day to day business often bias the executive toward what is necessary to make a day, month or quarter at the expense of what needs to happen for the future.  As such, fast moving secular trends (past examples of which are IaaS, Agile, NoSQL, mobile adoption, etc.) get ignored today and become crises tomorrow.

There are many potential fixes for this, including ensuring that executives budget their time to create “space” for future planning.  Organizationally, we may add someone in larger companies to focus on either operational needs or the needs of tomorrow.  Additionally, we can bring in outside help and perspectives either in the form of new hires or consultants who have experience with emerging trends and technologies.

Lack of Technical Knowledge – Doing Things the Wrong Way

This failure is owned almost entirely by folks with a non-technology background who find themselves running technical organizations.  It manifests itself in multiple ways including not understanding what technologists are saying, not understanding the right questions to ask, and not fully understanding the ramification of various technical decisions.  While a horrible position to be in, it is no more or less disastrous than lacking business acumen.  In either scenario, we have an imperfect matching of an engine and the transmission necessary to gain benefit from that engine.

Unfortunately, this one is the hardest of all problems to fix.  Whereas it is comparatively easy to learn to speak the language of business, the breadth and depth of understanding necessary to properly run a technology organization is not easily acquired.  One need not be the best engineer to be successful, but one should certainly understand “how and why things work” to be able to properly evaluate risk and ask the right questions.

The fix is the mirror image of the fix for business acumen.  No technology executive should be without deep technical understanding AND a clear understanding of business fundamentals.

Key Learnings

Perhaps the most important point to learn from our experience in this area is that, regardless of your CTO/CIO’s background, you should ensure he or she has both the technical and business skills necessary to do the job.  Identify weaknesses through interviews and daily interactions and help the executive focus on shoring up these weaknesses through skill augmentation within their organization or through education, mentoring and coaching.

AKF Partners provides mentoring and coaching for both CTOs and CEOs to help ensure a successful partnership between these two key creators of company value.  If your CEO or CTO has departed, we also provide interim leadership services.

 


Technical Due Diligence and Technical Debt

April 30, 2018  |  Posted By: Daniel Hanley

Over the past 11 years AKF has been on a number of technical due diligences where we have seen technology organizations put off a large portion of engineering effort, creating technical debt.  A technical due diligence, as introduced in Optimize Your Investment with Technical Due Diligence, should identify the amount of technical debt and quantify the amount of engineering resources dedicated to servicing the debt.


What is Technical Debt?
Technical debt is the difference between doing something the desired or best way and doing something quickly.  Technical debt is a conscious choice, made knowingly and with commission to take a shortcut in the technology arena - the delta between the desired or intended way and quicker way.  The shortcut is usually taken for time to market reasons and is a sound business decision within reason. 
AKF Technical Debt Definition

Is Tech Debt Bad?
Technical debt is analogous in many ways to financial debt - a complete lack of it probably means missed business opportunities while an excess means disaster around the corner.  Just like financial debt, technical debt is not necessarily bad.  Accruing some debt allows the technology organization to release an MVP (minimum viable product) to customers, just as some financial debt allows a company to start new investments earlier than capital growth would allow.  Too little debt can result in a product late to market in a competitive environment and too much debt can choke business innovation and cause availability and scalability issues.  Tech debt becomes bad when the engineering organization can no longer service that debt.  A technical due diligence should determine whether a team is managing its technical debt properly.

Tech Debt Maintenance
Similar to financial debt, technical debt must be serviced, and it is serviced by the efforts of the engineering team - the same team developing the software.  A failure to service technical debt will result in high interest payments, seen as slowing time to market for new product initiatives post investment.  Our experience indicates that most companies should expect to spend 12% to 25% of engineering effort on a mix of servicing technical debt and handling defect correction (two different topics).  Whether that allocation keeps the debt static, reduces it, or allows it to grow depends upon the amount of technical debt already accrued, which in turn influences the level of spend required.  It is easy to see how a company delinquent in servicing its technical debt will have to increase the resource allocation to deal with it, reducing resources for product innovation and market responsiveness.  Just as financial diligence verifies that the CFO has accountability for the balance sheet, AKF looks to verify that the CTO is on top of the technology balance sheet during technology due diligence.

AKF Technical Debt Balance Sheet

Tech Debt Takeaways:

  • Delaying attention to address technical issues allows greater resources to be focused on higher priority endeavors
  • The absence of technical debt probably means missed business opportunities – use technical debt as a tool to best meet the needs of the business
  • Excessive technical debt will cause availability and scalability issues, and can choke business innovation (too much engineering time dealing with debt rather than focusing on the product)
  • The interest on tech debt is the difficulty or increased level of effort in modifying something in subsequent releases
  • The principal of technical debt is the difference between “desired” and “actual”
  • Technology resources to continually service technical debt should be clearly planned in product road maps – 12 to 25% is suggested
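As a rough planning aid, the suggested range can be turned into engineer-days per sprint.  The sketch below is simple arithmetic in Python; the `debt_capacity` helper, the team size, and the sprint length are hypothetical, and the share covers the mix of debt servicing and defect correction described above.

```python
def debt_capacity(team_size: int, days_per_sprint: int, share: float) -> float:
    """Engineer-days per sprint reserved for technical debt and defect work."""
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must be between 0 and 1")
    return team_size * days_per_sprint * share


if __name__ == "__main__":
    # Illustrative team: 10 engineers on a 10-working-day sprint.
    low = debt_capacity(team_size=10, days_per_sprint=10, share=0.12)
    high = debt_capacity(team_size=10, days_per_sprint=10, share=0.25)
    print(f"Reserve roughly {low:.0f} to {high:.0f} of 100 engineer-days")
```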

 

AKF’s technical due diligence will discover a team’s ability to quantify the amount of debt accrued and the engineering effort to service the debt.  Leadership should be allocating approximately 20% of resources towards this debt and planning for the debt in the product road map. 

Click here to see how AKF Partners can help you assess your company’s technical debt, or assess the debt and issues therein for any investment you plan to make.

 
