
Growth Blog

Scalability and Technology Consulting Advice for SaaS and Technology Companies

Managing Risk with Technical Due Diligence

February 20, 2018  |  Posted By: Greg Fennewald

You should not buy a home without an inspection by a licensed home inspector and you should not buy a used car without having a mechanic check it out for you.  Diligence - it just makes good sense.  Similarly, it is prudent to include technical diligence as part of the evaluation for a potential technology company investment.


Diligence Informs Risk Management

Private equity and venture capital firms typically evaluate many areas preceding a potential investment.  The business case, legal structure, competitive analysis, product strategy, financial audits and contractual landscape are all examples of diligence deemed necessary prior to an investment.  A company with a great product but three years left on an extremely expensive office lease will probably have a lower value.  Breaking the lease or living with it until the term expires means higher costs and thus lower EBITDA.  A hot startup with an inexperienced CFO who has run on cash-based accounting from day 1 and is rapidly approaching $6 million in annual revenue needs to move to accrual-based accounting.  That takes time and effort and possibly a talent search - this affects the value of the investment. 

But what about the technical underpinnings of the product itself?  A company with a solitary production database and a marketing analyst with access to directly query that database is likely headed for performance and availability incidents.  Single points of failure create a high probability of non-availability.  Solutions that don’t allow for seamless and elastic scalability may run into either capacity or cost of operations problems. 

Preventing these incidents and altering the conditions that enabled them to exist takes time and effort.  All of these assessment areas boil down to risk management.  Further, understanding the cost of fixing these solutions helps a company understand their true cost of investment.  Your investment includes not just the “PIC” or capital that you put into the company - it also includes all the costs to ensure continuing operations of the product that enables that company.  A comprehensive diligence including technical diligence will prepare the investor to make an informed business decision - know the risks and adjust the value proposition accordingly.

Technology Risk Areas

Technology risks can be grouped into four broad areas - Architecture, Process, Organization, and Security.  Each area has several subordinate themes.

Architecture - subordinate themes are availability, scalability, cost control.


• Commodity hardware - Corollas, not Carreras
• Horizontal scalability - scale out, not up
• Design for monitoring - see issues before your customers do
• N+1 design - everything fails eventually
• Design for rollback - minimize the impairment (a minimal sketch follows this list)
• Asynchronous design - stateless systems
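
To make the “design for rollback” principle concrete, here is a minimal sketch, assuming a hypothetical in-process feature-flag store (real systems would typically use a runtime configuration service).  The flag name and checkout functions are illustrative, not AKF tooling:

FEATURE_FLAGS = {"new_checkout_flow": False}  # flip to True to ramp, back to False to roll back

def is_enabled(flag: str) -> bool:
    """Return the current state of a feature flag (defaults to off)."""
    return FEATURE_FLAGS.get(flag, False)

def legacy_checkout(cart: list[dict]) -> float:
    """Proven code path, kept intact so rollback is always possible."""
    return sum(item["price"] for item in cart)

def new_checkout(cart: list[dict]) -> float:
    """New code path under evaluation; can be disabled at runtime."""
    return round(sum(item["price"] for item in cart), 2)

def checkout(cart: list[dict]) -> float:
    return new_checkout(cart) if is_enabled("new_checkout_flow") else legacy_checkout(cart)

print(checkout([{"price": 19.99}, {"price": 5.00}]))  # uses the legacy path until the flag is flipped

The point is that turning the new behavior off is a configuration change, not a redeployment, which minimizes the impairment when a release goes badly.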

Process - subordinate themes are engineering, operations, and problem management

• Product management - a product owner should be able to add, delay, or deprecate features from an upcoming release
• Metrics - development teams should use effort estimation and velocity measurement metrics to monitor progress and performance
• Development practices - developers should conduct code reviews and be held accountable for unit testing
• Incident management - incidents should be logged with sufficient details for further follow up
• Post mortem - a structured process should be in place to review significant problems, assign action items, and track resolution
• PDLC - the Product Development Lifecycle should align with whether the company intends to be customer driven (not desirable in most cases) or market driven (resulting in the highest returns and fastest saturation of any market)


Organization - subordinate themes are PDLC (Product Development Lifecycle) structure, product alignment and team composition

• Product or Service Alignment - cross functional teams should be aligned by product or service and understand how their efforts complement business goals
• Agile or Waterfall - if “discovering” the market or choosing the best possible product for a market then Agile is appropriate - if developing to well defined contracts then waterfall may be necessary.
• Team composition - the engineer to QA tester ratio should ideally exceed 3.5:1.  Significant deviations may be a sign of trouble or a harbinger of problems to come
• Goals - measurable goals aligned with business priorities should be visible to all with clear accountability

Security - subordinate themes are framework, prevention, detection and response

• Framework - use NIST, ISO, PCI or other regulatory standards to establish the framework for a security program.  The standards overlap; think it through and avoid duplicating effort.
• Policies in place - a sound security program will have multiple security related policies such as employee acceptable use, access controls, data classification, and an incident response plan.
• Security risk matrix - security risks should be graded by their impact, probability of occurrence, and controlling measures
• Business metrics - analysis of business metrics (revenue per minute, change of address, checkout value anomalies, file saves per minute, etc.) can establish thresholds for alerting on a potential security incident.  Over time, the analysis can inform prevention techniques (see the sketch after this list).
• Response plan - a plan must be in place and must have regular rehearsals.
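
As referenced in the business metrics bullet above, here is a minimal sketch of threshold-based alerting on a real-time business KPI.  The metric, window size, and sigma threshold are illustrative assumptions, not prescribed values:

from statistics import mean, stdev

def should_alert(history: list[float], current: float, sigmas: float = 4.0) -> bool:
    """Flag the current reading if it falls far outside the recent baseline."""
    if len(history) < 30:              # need enough samples for a stable baseline
        return False
    mu, sd = mean(history), stdev(history)
    return sd > 0 and abs(current - mu) > sigmas * sd

# Example: the last hour of revenue-per-minute readings, then a sudden anomaly.
baseline = [1000 + (i % 50) for i in range(60)]
print(should_alert(baseline, 3500.0))   # True - investigate a possible incident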

Technology Cost Impact on Investment Value

Technology costs can have a significant impact on the overall investment value.  Strengths and weaknesses uncovered during a technical diligence effort help the investor make the best overall business decision.

Technology costs are normally captured in two areas of the income statement: cost of revenue (production environment and personnel) and operating expenses (software development).  Technology costs can also affect depreciation (server capital purchases) and amortization (pre-paid licensing and support).  These cost areas should be reviewed for unusual patterns or abnormally high or low spend rates.  It is also important to understand the term of equipment purchase, software licensing, and support contracts - spend may be committed for several years.

Cost Cautions - tales from the past

• Support for production equipment purchased from a third party because the equipment is old and no longer supported by the OEM.  Use equipment as long as possible, but don’t risk a production outage.
• Constant software vendor license audits - they will find revenue, but the technology team that leaves their company vulnerable on a recurring basis is likely to have other significant issues.
• Lack of an RFP or benchmarking process to periodically assess the cost effectiveness of hardware, software, hosting, and support vendors.  Making a change in one of these areas is not simple, but the technology team should know at what price point a change would be better for the company.

Technical Debt

A technical diligence effort should also identify the level of technical debt and quantify the amount of engineering resources dedicated to servicing the technical debt.

Technical debt is a conscious choice to take a shortcut in the technology arena - the delta between the desired or intended way and quicker way.  The shortcut is usually taken for time to market reasons and is a sound business decision within reason.  Technical debt is analogous in many ways to financial debt - a complete lack of it probably means missed business opportunities while an excess means disaster around the corner. 

Just like financial debt, technical debt must be serviced, and it is serviced by the efforts of the engineering team - the same team developing the software.  AKF recommends 12% to 25% of engineering effort be spent servicing technical debt.  Whether that resource allocation keeps the debt static, reduces it, or allows it to grow depends upon the amount of technical debt.  It is easy to see how a company delinquent in servicing their technical debt will have to increase the resource allocation to deal with it, reducing resources for product innovation and market responsiveness.
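
As a quick worked example of the allocation guidance above (team size and quarter length are illustrative assumptions):

team_size = 20            # engineers on the team
weeks_per_quarter = 13

for allocation in (0.12, 0.25):
    debt_weeks = team_size * weeks_per_quarter * allocation
    print(f"{allocation:.0%} allocation -> {debt_weeks:.0f} engineer-weeks of debt service per quarter")
# 12% allocation -> 31 engineer-weeks of debt service per quarter
# 25% allocation -> 65 engineer-weeks of debt service per quarter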

Put It All Together

The investor has made use of several specialists in an overall diligence effort and is digesting the information to zero in on the choice to invest and at what price.  The business side looks good - revenue growth, product strategy, and marketing are solid.  The legal side has some risks relating to returning a leased office space to its original condition, but the lease has 5 years to run.  Now for technology:

• Tech refresh is overdue, so additional investment is needed or a move to the cloud accelerated - either choice puts pressure on thin margins.
• An expensive RDBMS is in use, but the technology team avoids stored procedures and keeps their SQL as vanilla as possible - moving to open source is doable.
• Technical debt service is constantly derailed by feature requests from sales and marketing.  Additional resources, hired or contracted, will be needed and will raise the technology run rate.  More margin pressure.
• Conclusion - the investment needed to address tech refresh and technical debt changes the investment value.  The investor lowers the offer price.

Interested in learning more about technical due diligence? Here are some due diligence do’s and don’ts.

How AKF can help

AKF has conducted hundreds of technical due diligence studies over the last 10 years.  Just as one would want an attorney for a legal diligence effort, one would want a technologist for a technical due diligence.  AKF does technology right.  Read more about our technical due diligence offerings here.



The AKF Partners Security Insights Cube

February 13, 2018  |  Posted By: Marty Abbott

Necessary But Insufficient Security Reviews

From a security perspective, tech product companies far too often focus solely on various ISO and/or NIST audits to help inform their view of how they manage risk within their company and their products.  The problem with the standards that exist today is that none of them tread deeply enough into the waters of detection and prevention of malicious activities within products.  Instead, they focus more on the processes of response, identification, notification, employee access, etc.

While these activities (and audits) are necessary, they are insufficient to ensure that we properly manage risk (and prevent malicious activities) in our products.  As we’ve written previously, erecting barriers and hiding behind big walls may make you feel better and help you sleep at night – but it’s not going to keep the bad guys from scaling your walls and taking your stuff.

The Online World is Getting Scarier
Consider the following secular trends for online products:
• A continuing mix-shift of commerce from retail to online.  Within the US today, excluding certain goods, this number stands at a meager 9% of total commerce in 2017, up from 1% in 2002.  If one excludes extremely high dollar items (vehicles, etc.), the percentage of sales is significantly higher.  Growing at a slightly higher than linear rate since 2002, this number should easily double within the next 7 years.  From the perspective of a malicious hacker, this is a growth in opportunity.

• Developing and established nations outside of N. America and Western Europe continue to invest heavily in STEM-based education.

• Overall employment in many of these countries is comparatively low outside of what Western nations provide through off-shore contracting opportunities.  Combined with recent nationalistic trends and a desire to “keep jobs at home” or not “offshore jobs”, there is a strong possibility that demand for offshore agencies will decrease over time.

• Some of the nations spending heavily on STEM education have created cyber-institutes promoting cyber and security related warfare capabilities.

• A smaller set of the nations described above have heavily promoted state sponsored cyber warfare initiatives, setting these teams (e.g. the PRNK’s Unit 180) against corporate infrastructure within the United States. 

• The barrier to entry for malicious actors to be effective in attacking corporate assets has declined.  Hacker communities commonly share exploits and malware, and certain nation-states (e.g. Russia and N. Korea) have contributed to hacking toolsets, thereby decreasing the barrier to entry for a malicious actor and, as a result, increasing the supply of said malicious actors.

• Extradition from other countries for crimes committed, especially countries with which the US is not allied, is difficult to impossible.  If perpetrators cannot be prosecuted, the perceived cost of committing the crime is low to nonexistent.

• Crypto-currencies (e.g. Bitcoin) provide a nearly untraceable means of selling stolen data or holding systems for ransom.

The resulting forces of these meta or secular trends are clear: 

1) The value of being a malicious actor has increased as the volume of online commerce (in terms of sales/value) continues to increase.  View this economically as an increasing opportunity for crime.

2) The barrier to entry to become a malicious actor is decreasing.

3) The cost in terms of prosecution, if the crime is performed outside the US, is low to zero.

These points combine to make one clear outcome:  Cybercrime and cyberterrorism (fraud, malicious use, etc) will rise as a percentage of revenue transacted online.

To help combat this rising malicious activity, we need new models and approaches to help us think about how to Identify and Prevent bad actors from doing horrible things.


Enter the AKF Security Insights Cube.

[Figure: the AKF Security Insights Cube - evaluation of a security program and security monitoring of services]


If It Isn’t Real Time It Is Worthless

The AKF Partners Security Insights Cube is predicated on the notion that all the data it addresses is accessible in near-real-time.  This alone is a considerable barrier for many companies.  Identifying fraudulent activity after credit cards are processed, for instance, is simply too late.  We want to know that bad people are entering our neighborhood and at our door – not that they stole something from our house yesterday.

The lower left corner of the cube is the starting point for any solution – the point at which you are flying blind and have no real time data.  Again – getting data from 15 minutes ago or 24 hours ago is as useless in driving a product as it is in driving a car or flying a plane; you simply have no idea what is going on.


X Axis

The X axis of the cube evaluates the breadth of data available to an organization in real time.  The far left is “zero real time data”.  Progressing to the right along the axis are increasingly valuable risk-related data points: real time key performance indicators like logins, add-to-carts, check-outs, auth activity (and failures), searches, etc.  Moving further right, we may keep all session data such that we can interrogate and perform behavioral analysis and pattern matching.  The far right of the axis is the point at which we keep absolutely everything, increasing the optionality of how we may interrogate the data for risk management and malicious activity prevention purposes.


Y Axis

The Y axis of the cube evaluates the activities performed upon the X axis data by an organization.  Clearly the X axis sets an upper bound on what’s possible on the Y axis.  For instance, it would be hard to understand “Who, What or How” something happened if we didn’t first store session data to be analyzed.  From a GDPR perspective, PII within session information can be anonymized if necessary.  As with most analytics-oriented systems, maturity progresses from doing nothing, to “reporting” capabilities that illuminate “what is happening” (typically employing performance indicators), to answering “Who, Why and How”, to finally predicting what will happen and preventing malicious activities in real time.


Z Axis

The Z axis of the cube deals simply with the depth, or duration, for which data is kept.  We rarely suggest that data be kept forever, but there is great value in ensuring that past patterns can be analyzed to create behavior models for scoring risk and blocking activities.  A handful of years is typically appropriate for most commerce solutions, slightly longer for fintech solutions.
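
To make the three axes concrete, here is a minimal sketch of how a team might record a self-assessment against the cube.  The ordinal scales and labels are illustrative assumptions for discussion, not an AKF scoring standard:

from dataclasses import dataclass

@dataclass
class SecurityInsightsPosition:
    breadth: int   # X axis: 0 = no real-time data ... 3 = keep absolutely everything
    activity: int  # Y axis: 0 = doing nothing ... 3 = predicting and preventing in real time
    depth: int     # Z axis: 0 = no retention ... 3 = multi-year behavioral history

    def weakest_axis(self) -> str:
        """Return the axis with the lowest score - a candidate for the next investment."""
        scores = {"breadth (X)": self.breadth,
                  "activity (Y)": self.activity,
                  "depth (Z)": self.depth}
        return min(scores, key=scores.get)

current = SecurityInsightsPosition(breadth=2, activity=1, depth=1)
print("Invest next along:", current.weakest_axis())  # -> activity (Y)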


AKF Partners performs security reviews of technology products.  Our approach evaluates security among several dimensions and includes components of NIST and ISO standards, but is tailored to the needs of online product companies. 


Your Site is as Important as the Product You Sell - Recent Example from Saddleback Leather

February 7, 2018  |  Posted By: Pete Ferguson

If you have a premium product, at a premium price, it’s unlikely you would sell it out of a rundown, poorly lit store that smells vaguely like stale meat.  Yet somehow many of us forget to apply that same reasoning when it comes to selling our products online.  The availability - and the look and feel - of your presence online is your store front.

I’ve long been a fan of Saddleback Leather.  However, their motto: “They’ll fight over it when you’re dead” fell short in January.  You see, it’s hard for your family to fight over the thing that you can’t even purchase…  Saddleback Leather had a completely foreseeable, and absolutely preventable outage.  From Dave Munson, the CEO:

“I’ve always dreamt of one day having a really fast and easy website for you to enjoy. So, we decided to leave our slow and clunky old website and start building one on a new and different platform. The contract expired Dec. 30th, 2017, but the new site wasn’t fully ready yet. We flipped the switch anyways and all Gehenna broke loose. The super fast, fun and easy website… wasn’t fast, fun or easy and we wasted a ton of time and irritated the heck out of our favorite people. People couldn’t check out, set up accounts or even add stuff to their carts. So, we paid a ton of money to get our old slow and clunky back again until we get this new site just right. “

To make up for it, last week I received an apology letter sent by “El Presidente” Munson with an 11% off coupon - 11% because Munson recently celebrated 11 years of marriage to his wife, Suzzette.  As a side note, it’s a perfect example of how to apologize to your customers when you screw up.  This guy made a mistake, is paying for it by funding his old site while continuing to develop the new one, and is giving customers discounts with a coupon aptly titled “IAMSORRY.”

Ironically, as a fan and customer, I don’t recall the old site being slow or terrible.  On the contrary, when I visited early in January, their “new and improved” site felt clunky and disjointed.  The wrong images were coming up for products and many items reported being “not available.”

In the world of environmental health and safety, “all accidents are preventable” is the holy grail of compliance.  We believe that with the right forethought and planning, the same is true with virtually all products and storefronts online. 

At AKF we are fond of saying “an accident is a terrible thing to waste.”  While the exact details of what went wrong were not disclosed, the contributing decisions were:
- They took a concept that presumably worked great in beta testing live without testing under full load.
- Munson made the decision to push out something that wasn’t yet great to save money by exiting a contract by the end of the year.



The result is lost sales from when the site was down, lost customers who may have been trying the website for the first time and won’t be back, an 11% haircut on sales for the next week, and a fan base - many of whom have been very vocal on Facebook - expressing their disappointment that the company they have counted on for unquestioned quality didn’t put quality first this time.

The days of customers quickly forgiving their favorite retailers for not being equally as great online are waning.  Make sure you have a solid strategy and the right expertise in your corner for anything that affects your customers’ ability to purchase or interact with your product.

—-

Experiencing growing pains?  AKF is here to help!  We are an industry expert in technology scalability and due diligence.  Put our 200+ years of combined experience to work for you today!

Get this article and others like it by signing up for our newsletter.

 


Technical Due Diligence Best Practices

January 23, 2018  |  Posted By: Marty Abbott

Technical due diligence of products is about more than the solution architecture and the technologies employed.  Performing diligence correctly requires that companies evaluate the solution against the investment thesis, and evaluate the performance and relationship of the engineering and product management teams.  Here we present the best practices for technology due diligence in the format of things to do, and things not to do:


The Dos

1. Understand the Investment/Acquisition Thesis

One cannot perform any type of diligence without understanding the investment/acquisition thesis and equally as important, the desired outcomes.  Diligence is meant to not only uncover “what is” or “what exists”, but also identify the obstacles to achieve “what may or can be”.  The thesis becomes the standard by which the diligence is performed.

2. Evaluate the Team against the Desired Outcomes

The technology product landscape is littered with the carcasses of great ideas run into the ground with the wrong leadership or the wrong team.  Disagree?  We ask you to consider the Facebook and Friendster battle.  We often joke that the robot apocalypse hasn’t happened yet, and technology isn’t building itself.  Great teams are the reason solutions succeed, and substandard teams are usually behind the solutions that fail technically.  Make sure your diligence is identifying whether you are getting the right team along with the product/company you acquire.

3. Understand the Tech/Product Relationship

Product Management teams are the engines of products, and engineering teams are the transmission.  Evaluating these teams in isolation is a mistake – as regardless of the PDLC (product development lifecycle) these teams must have an effective working relationship to build great products.  Make sure your diligence encompasses an evaluation of how these teams work together and the lifecycle they use to maximize product value and minimize time to market.

4. Evaluate the Security Posture

Cyber-crime and fraud are going to increase at a rate higher than the adoption of online solutions, pursuant to a number of secular forces that we will enumerate in a future post.  As such, it is in your best interest as an investor to understand the degree to which the company is focused on increasing the perceived cost of malicious activity and decreasing the perceived value of said malicious activity.  Ensure that your diligence includes evaluating the security focus, spending, approach and mindset of the target company.  This need not be a separate diligence for small investments - just ensure that you are comfortable with the spend, attention and approach.  Ensure that your diligence properly evaluates the risk of the target solution.

5. Prepare Yourself and the Target

Any diligence will go better if you give the acquisition/investment target an opportunity to prepare documents.  Requesting materials in advance allows the investment target an opportunity to prepare for a deep discussion and ensures that you can familiarize yourself with the product architecture and product development processes ahead of time.  Check out our article on due diligence checklists which includes a list of items to request in advance.

6. Be Dynamic and Probe Constantly

While a thorough list of items to discuss is important, it is equally important to abide by the “2 ears and one mouth” rule:  Spend more time listening than talking.  Look for subtle clues as to the target’s comfort with particular answers.  Are there things with which they are uncomfortable?  Are they stressing certain words for a reason?  Don’t accept an answer at face value, dig into the answer to find the information that supports a claim.

7. Evaluate Debt

Part of the investment in your target could well be an ongoing principal payment against past technical debt.  Ensure that you properly evaluate what debt the company has acquired, and how they are paying the interest and principal payments against that debt.


The Don’ts

1. Don’t Waste Too Much Time (or money) on Code Reviews
The one thing I know from years of running engineering teams is that anytime an engineer reviews code for the first time she is going to say, “This code is crap and needs to be rewritten.”  Code reviews are great to find potential defects and to ensure that code conforms to the standards set forth by the company.  But you are unlikely to have the time or resources to review everything.  The company is also unlikely to give you unfettered access to all of their code (Google “Sybase Microsoft SQLServer” for reasons why).  That leaves you at the whims of the company to cherry-pick what you review, which in turn means you aren’t getting a good representative sample. 
Further, your standards likely differ from those of the target company.  As such, a review of the software is simply going to indicate that you have different standards. 
Lastly, we’ve seen great architecture and terrible code succeed, whereas terrible architecture and great code is rarely successful.  You may find small code reviews enlightening, but we urge you to spend the majority of your time on the architecture, people and process of the acquisition or investment.

2. Don’t Start a Fight
Far too often technology diligence sessions start in discussion and end in a fight.  The people performing the diligence start asking questions in a way that may seem judgmental to the target company.  Then the investing/acquiring team shifts from questions to absolute statements that can only be taken as judgmental.  There’s simply no room for this.  Diligence is clinical – not personal.  It’s not a place to prove who is smarter than whom.  This dynamic is one of the many reasons it is often a good idea to have a third party perform your diligence:  The target company is less likely to feel threatened by the acquiring product team, and the third party is oftentimes more experienced with establishing a non-threatening environment.

3. Don’t Be Religious
In a services-oriented world, it really doesn’t matter what code or what data persistence platform comprises a service you may be calling.  Assuming that you are acquiring a solution and its engineers, you need not worry about supporting the solution with your existing skillsets.  Debates around technology implementations too often come from what one knows (“I know Java, Java rocks, and everything else is substandard”) rather than from what one can prove.  There are certainly exceptions, like aging and unsupported technology - but stay focused on the architecture of a solution, not the technology that implements that architecture.

4. Don’t Do Diligence Remotely
As we’ve indicated before, diligence is as much about teams as it is the technology itself.  Performing diligence remotely, without face to face interaction, makes it difficult to identify certain cues that might otherwise be indicators that you should dig more deeply into a certain space or set of questions.  Examples include a CTO giving an authoritative answer to a certain question while members of her team roll their eyes or slightly shake or bow their heads.

You may also want to read about the necessary components of technical due diligence in our article on optimizing technical diligence.


AKF Partners performs diligence on behalf of a number of venture capital and private equity firms as well as on behalf of strategic acquirers.  Whether for a third party view, or because your team has too much on their plate, we can help.  Read more about our technical due diligence services here.


There Are Always Plenty of Incidents from Which To Learn

January 13, 2018  |  Posted By: Dave Swenson

Sorry, False Alarm…

On January 13, 2018, what felt like an episode of Netflix’s “Black Mirror” unfolded in real life. Just after 8 in the morning, residents and visitors of Hawaii were woken up to a startling push notification warning of an inbound ballistic missile threat and instructing them to seek immediate shelter.

Thankfully, the notification was a false alarm, finally retracted with a second notification nearly 40 interminable minutes later.

The amazing, poignant and sobering stories that came out of those 40 minutes included people:

  • determining which children to spend their last minutes with,
  • abandoning their cars on streets,
  • sheltering in a lava tube,
  • believing and acting as we all would if we believed the end was here.

Unfortunately, this wasn’t a Black Mirror episode - it paralyzed an entire state’s population. Thankfully, the alarm was a false one.


A Muted President

As President Trump took office, he introduced a new means for a President to reach his constituents - Twitter - averaging 6 to 7 tweets per day during his first year. On November 2, 2017, many bots that were created to closely monitor the tweets of @realDonaldTrump started reporting that the account no longer existed. Clicking through to his account took the user to an error page.

For a deafening 11 minutes, the nation was unable to listen to its leader, at least via Twitter.


What Happened??

The Hawaiian false alarm was sent by the state’s Emergency Management Agency. Their explanation of the incident was that during a shift change, an employee clicked “the wrong button” while running a missile crisis test, then subsequently clicked through a confirmation prompt (“Are you sure you want to tell 1.5 million people this?”).

Twitter employees had reportedly tried for years to get management attention on ensuring accounts weren’t deleted without proper vetting. The company typically used contractors in the Philippines and Singapore to handle such account administration; Trump’s account was deleted by a German contract worker on his last day at Twitter. Acting on yet another Trump complaint, and believing such an important account couldn’t actually be suspended, the worker clicked the suspend button as his final action for Twitter and then walked out of the building, causing the Twitterverse to read far more into the account’s disappearance than it should have.

In both of these situations, the immediate focus was on the personnel involved in the incident. “Who pushed the button?” is almost always one of the initial questions. Assumptions that a new employee or a rogue worker was behind the incident are common, and both the motive and intelligence of all involved come under inspection.

We at AKF Partners constantly preach “An incident is a terrible thing to waste”. Events such as these warp the known reality into “How the shit can that happen??”, causing enough alarm to warrant special attention and focus, if not panic. Yet, all too often we see teams searching frantically to find any cause, blame the most obvious, immediate factor, declare victory, and move on.

“Who pushed the button?” is only one of many questions.


Toyota’s Taiichi Ohno, the father of Lean Manufacturing, recognized his team’s habit of accepting the most apparent cause, ignoring (wasting) other elements revealed by an incident and potentially allowing it to be eventually repeated. Ohno (the person, not the exclamation typically uttered during an incident) emphasized the importance of asking “5 Whys” in order to move beyond the most obvious explanation (and accompanying blame), peeling the onion to dive deeper into contributory causes.

Questions beyond the reflexive “What happened?” and “Who did it?” relevant to the false alarm and erroneous account deletion incidents include:

  • Why did the system act differently than the individual expected (is more training required, is the user interface a confusing one)?
  • Why did it take so long to correct (is there no playbook for detecting / reversing such a message or key account activity)?
  • Why does the system allow such an impactful event to be performed unilaterally, by a single person (what safeguards should exist requiring more than one set of hands? - see the sketch after this list)?
  • Why does this particular person have such authorization to perform this action (should a non-employee have the ability to delete such a verified, popular and influential account)?
  • Why was the possibility of this incident not anticipated and prevented (why were Twitter employee requests for better safeguards ignored for years, why wasn’t the ease of making such a mistake recognized, and what other similar mistake opportunities are there)?
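
As referenced in the list above, one common safeguard for high-impact actions is a “two-person rule”: the action does not execute until a second, independent person approves it. A minimal sketch, with hypothetical function and role names:

from typing import Optional

def send_mass_alert(message: str, requested_by: str, approved_by: Optional[str]) -> str:
    """Refuse to send unless a second, independent approver has signed off."""
    if approved_by is None or approved_by == requested_by:
        return "REJECTED: a second, independent approver is required"
    return f"SENT by {requested_by}, approved by {approved_by}: {message}"

print(send_mass_alert("Monthly test of the alert system.", requested_by="operator_1", approved_by=None))
print(send_mass_alert("Monthly test of the alert system.", requested_by="operator_1", approved_by="supervisor_2"))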

Both of these incidents have had an impact far beyond those directly affected (Hawaiian inhabitants or Trump Twitter followers), and have shed light on the need to recognize that the world has changed and that policies and practices of old might not be enough for today. The ballistic missile false alarm revealed that more controls need to be placed on all mass communication, but also that Hawaii (or anywhere/anyone else) is extremely unprepared for the unthinkable. The use of Twitter as a channel for the President raises questions over its validity as a Presidential record, over who should control such a channel, and over what security surrounds the President’s account.

Ask 5 Whys, look beyond the immediate impact to find collateral learnings, and take notice of all that an incident can reveal.


AKF Partners have been brought in by over 400 companies to avoid such incidents, and when they do occur, to learn from them. Let us help you.


Conway’s Law – The Rest of the Story… and How To Fix It

December 14, 2017  |  Posted By: Marty Abbott

The Law that Almost Wasn’t

Conway’s law had a rather precarious beginning.  Harvard Business Review rejected Conway’s thesis, buried as it was in the 43rd paragraph of a 45-paragraph paper, on the grounds that he had not proven it.

But Mel had a PhD in Mathematics (from Case Western Reserve University – Go Spartans!), and like most PhDs he was accustomed to journal rejections.  Mel resubmitted the paper to Datamation, a well-respected IT journal of the time, and his paper “How Do Committees Invent?” was published in 1968.

It wasn’t until 1975, however, that the moniker “Conway’s Law” came to be.  Fred Brooks both coined the term and popularized Conway’s thesis in the first edition of The Mythical Man-Month.  It has since been one of the most widely cited and important, yet nevertheless incorrectly understood and applied, notions in the domain of product development.

Cliff’s Notes to “How Do Committees Invent?” (the article in which the law resides)

Conway’s thesis, in his words:

… organizations which design systems (in the broad sense used here) are constrained to produce designs which are copies of the communication structures of these organizations.

Conway calls this self-similarity between organizations and designs homomorphism.  The preamble to the thesis helps explain its breadth and depth:

… the very act of organizing a design team means that certain design decisions have already been made, explicitly or otherwise

Every time a delegation is made … the class of design alternatives which can be effectively pursued is also narrowed.

Because the design which occurs first is almost never the best possible, the prevailing system concept may need to change. Therefore, flexibility of organization is important to effective design.

Specifically, each individual must have at most one superior and at most approximately seven subordinates

Examples. A contract research organization had eight people who were to produce a COBOL and an ALGOL compiler. After some initial estimates of difficulty and time, five people were assigned to the COBOL job and three to the ALGOL job. The resulting COBOL compiler ran in five phases, the ALGOL compiler ran in three.

There are 4 very important points, and one very good example, in the quotes above:
    1)   Organizations and design/architecture are intrinsically linked.  The organization affects and constrains the architecture - the opposite is not true.
    2)   Depth of an organization negatively affects design flexibility.  The deeper the hierarchy of an organization, the less flexible (or alternatively more constrained) the resulting architecture.
    3)   We will make mistakes and must organize to quickly fix them.
    4)   Team size should always be small – which also has an implication for the size of the solution component a team can own (think of Amazon’s re-branding of this point as the “2 Pizza Team” – author’s side note: read Scalability Rules for how this came about).

Important corollaries to Conway’s law suggest that if either an organization or a design changes without a corresponding change to the other, the product will be at risk. 

Common Failures in Application of Conway’s Law and How to Fix Them

There are five very common failures in organization and architecture within our clients, the first four of which relate directly to Conway’s points above:
    1)   Organizations and architectures designed separately.  Given the homomorphism that Conway describes, you simply CANNOT do this.
    2)   Deep, hierarchical organizations.  Again – this will constrain design. 
    3)   Lack of flexibility.  Companies tend to plan for success.  Instead, assume failure, learning, and adaptation (think “discovery” and “Agile” instead of “requirements” and “Waterfall”).
    4)   Large teams.  Forget about these.  Small teams, each owning a service or services that the team can support in isolation.

There is a fifth violation that is harder to see in Conway’s paper.  Too often, our clients don’t build properly experienced teams around the solutions they deploy.  Success in low-overhead organizations requires that teams be cross functional.  Whatever a team needs to be successful should be within that team.  If you deploy on your own hardware, you should have hardware experience.  If you need DBA talent, the team should have direct access to that talent.  QA folks should be embedded within the team, etc.  Product managers or owners should also be embedded in the team.  This creates our fifth failure:

    5)   Functional teams.  Don’t build teams around “a skill” – build them around the breadth of skills necessary to accomplish the task handed to the team.

Conway’s Parting Shot and Food for Thought
Noodle on this:  Conway identified a problem early in the life of a new domain.  Yet what was true in Conway’s time as a contributor to the art is still true today, over 50 years after his first attempt to forewarn us:

Probably the greatest single common factor behind many poorly designed systems now in existence has been the availability of a design organization in need of work.

Like this article?  Share it with friends here, and subscribe to the newsletter here.

AKF Partners helps companies ensure that their organizations and architectures are aligned to the outcomes they desire.  We help companies develop better, more highly available and more highly scalable products with faster time to market and lower cost.  Give us a call or shoot us an email.  We’d love to help you achieve the success you desire.


 


Tuckman’s Stages and Agile Development

November 8, 2017  |  Posted By: Bill Armelin

In 1965, psychologist Bruce Tuckman published his theory of group dynamics. This theory describes the stages (or phases) through which a team progresses en route to optimal productivity.  While generally useful for any organization, and prescriptive as to what leaders should do, and when, to boost performance, it has profound impacts on Agile development practices and how we build organizations around these Agile practices.

Forming
The first stage is forming. This is where the team first comes together. Here, the individuals are trying to get to know each other. They tend to be polite and cordial, but they do not fully trust each other.

In this stage, team productivity and team conflict are low. The team spends time agreeing to what the team is supposed to do. This lack of agreement about the team’s purpose can cause members to miss goals because they are individually targeting different things. Team members rely on patterned behavior and look to the team leader for guidance and direction. The team members want to be accepted by the group.  Cautious behavior on the part of the team starts to depress overall team outcomes.  Good leadership, emphasizing goals and outcomes, is important to set the stage for future team behaviors and outcomes.

Storming
Once the team’s goals are clear, they move into the next stage, storming. Here, the team starts to develop a plan to achieve the goal and defines what to do and who does it. Friction starts to occur as members propose different ideas. Trust within the team remains low and affective conflict rises as people vie for control. Cliques can form. Productivity drops even lower than in the first stage.

Once the team agrees on the plan and the roles and responsibilities, it can move to the next stage. Without agreement, the team can get stuck. Symptoms include poor coordination, people doing the wrong things and missing deadlines, to name a few.  Good leadership here focuses on fast affective conflict resolution, and serves to help reinforce team goals and outcomes in order to quickly move to more productive phases.

Norming
Once team members agree to the plan and understand their roles, they enter the norming phase. Affective conflict goes down, cognitive (beneficial) conflict and trust increase.  The team focuses on how to get things done and productivity begins to increase. The team develops “norms” about how to work together and collaborate. A lack of these norms can cause issues such as low quality and missed deadlines.

Leadership within the team becomes clear and cliques dissolve. Members begin to identify with one another and the level of trust in their personal relationships contributes to the development of group cohesion. The team begins to experience a sense of group belonging and a feeling of relief from resolving interpersonal conflicts.  Team identity starts to take hold and innovation and creativity within the team increases. The members feel an openness and cohesion on both a personal and task level. They feel good about being part of the team.

Performing
The final stage, performing, is not achieved by all teams. This stage is marked by an interdependence in personal relations and problem solving within the realm of the team’s tasks. Team members share a common goal, understand the plan to achieve it, and know their roles and how to work together.  The team is firing on all cylinders. At this point, the team is highly productive and collaborates well. They are trusting of each other and “have each other’s back.” Healthy conflict is encouraged. There is unity: group identity is complete, group morale is high, and group loyalty is intense.

Not all teams get to this phase. They can get stuck in a previous phase or slide back into them from a higher phase.  Leadership that focuses on affective conflict resolution, team identity creation, a compelling vision and goals to achieve that vision is critical to reaching the Performing phase.  It is usually not easy for teams to quickly progress through these stages, and it often takes 6 months or more for a team to reach the Performing phase. 


Impact to Agile Development
We often see companies make the mistake of coalescing teams around initiatives.  Sometimes called “virtual teams” or “matrixed teams”, these teams suffer the underperforming phases of Tuckman’s curve repeatedly, especially when these initiatives are of durations shorter than 6 months.  But even with durations of a year, six months of that time is spent getting the team to an optimum level of performance.

Tuckman’s analysis indicates that teams should be together for no less than a year (giving a 6-month return on a 6-month investment) and ideally for about 3 years.  The upper limit is informed by the research on groupthink and its implications for creativity, performance and innovation within teams.  Teams should therefore become semi-permanent, and we should seek to move work to teams rather than form teams around work.  To be successful here, we need multi-disciplinary teams capable of handling all the work they may get assigned.  Further, the team needs to be familiar with and “own” the outcomes associated with the solution (or architectural components) with which they work.  More on that in future articles discussing Conway’s Law and Empathy Groups.

AKF Partners helps companies understand and apply the extant theory around organizational development in order to turbo-charge engineering performance.  Wondering if your engineering productivity decreases as you grow your engineering and product teams?  We can help you fix that and get your productivity back to the level it was as a startup!



 


Forget about North Korea's Nukes - the PRNK Is Engaged in Cyber Warfare against Your Company

October 24, 2017  |  Posted By: Marty Abbott

North Korea’s recent antics involving ballistic missiles and nuclear weapons are scary.  While we seem to be edging ever closer to nuclear war – closer perhaps than any time since the Cuban Missile Crisis – the probability of such an occurrence remains relatively low.  Even an apparently irrational head of state such as Kim Jong Un must understand that the use of a nuclear device will turn nearly the entire world against him.  The use of a device against any nation would end his reign in relatively short order and end the People’s Republic of North Korea as we know it today. This then raises the question of why Kim Jong Un would participate in such brinkmanship.  Many politicians and strategists seem to think it is a strategy to force other nations to recognize the PRNK and reduce the onerous sanctions currently levied against it by the United Nations.  Perhaps, but maybe in addition, or instead, Jong Un is trying to take our eyes off the war he has been waging for many years:  a cyber war against many nations.

Both cyber warfare on the part of a nation state and cyber terrorism waged by stateless entities aim to attack our economic infrastructure.  Both North Korea and terrorists understand that attacking our economy, our businesses and our personal wealth are the most effective methods of causing harm to our nations and their citizens.  North Korea is likely behind many recent attacks on financial institutions, has ties to the WannaCry ransomware outbreak, was behind the attack on Sony pictures and was involved in a heist of $81M from the Bangladesh Central Bank.  Each of these were likely perpetrated by the formidable PRNK cyber warfare group “Unit 180”.

When not engaging in direct attacks to steal money from or otherwise harm business operations, both terrorists and nation states seek to use the products of a company for nefarious purposes.  Recent examples include ISIS using eBay’s marketplace to funnel money to an operative in the US, and Russia purchasing advertising on Facebook in an attempt to influence the US election.  Cyber warfare and terrorism are not just threats – they are daily occurrences.  The foregoing examples illustrate how the game has changed.  The question for you is - has your company changed enough to successfully protect itself against this growing and evolving threat?

The answer for most companies with which we work is “No”.  Security organizations seem oblivious to the changing cyber threat.  They continue to focus almost exclusively on barrier protection systems and cyber response processes.  Few companies outside of the financial sector have developed analytics systems to help identify emerging threats and nefarious activity.  Fewer still practice aggressive “patrolling” to identify threats outside of the perimeter of their digital operations.  Here are a few questions to help you evaluate whether your company has the mindset necessary to be successful in the world of cyberwarfare and terrorism:

Who means you harm and how do they intend to perpetrate it?

Military veterans know that a successful defense requires more than just “Alamo’ing Up” behind a wall and hunkering down.  You must patrol and reconnoiter the surrounding area to understand whence the enemy will come, in what numbers and with what capabilities.  If your security team isn’t actively attempting to identify threats outside of your organization - and by this I mean beyond your walls - you are most certainly going to be surprised.

How do you find new and emerging behavior within your product and operations?

Given the threat of your product being used for nefarious purposes, how do you identify when new behaviors or trends emerge?  What analytics systems do you have to identify that existing personas or users are acting in new or odd ways?  How do you keep an eye on new patterns or trends of usage by both existing and new users?  In very high transaction environments, how do you identify the less than 1 basis point of activity that may be nefarious in nature buried within 99.99% of valid transactions?  These questions aren’t likely to be answered by a “traditional” security team – they require teams with deep analytic skills and systems dedicated to analytics and machine learning.  Similarly, traditional analytics teams may not have the right mindset to seek out nefarious transactions.
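
One minimal sketch of the kind of analytics implied by these questions: surface accounts performing actions they have never performed before, so an analyst (or a downstream model) can review them.  Account names, actions, and data shapes here are illustrative assumptions:

def new_behaviors(history: dict[str, set[str]],
                  todays_events: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return (account, action) pairs not previously seen for that account."""
    return [(account, action) for account, action in todays_events
            if action not in history.get(account, set())]

history = {"buyer_17": {"login", "search", "add_to_cart", "checkout"}}
today = [("buyer_17", "search"), ("buyer_17", "change_payout_account")]
print(new_behaviors(history, today))   # -> [('buyer_17', 'change_payout_account')]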

Do you have the right people?

This is the most important question of all.  You don’t need to fire your CISSP folks – you still have a need for them within your security team.  But you also want folks with a proven record of being able to think like, and use the tools of, cyber criminals, terrorists, and warfare-focused nation-states.  These folks are unlikely to be willing to wear suits and ties to work, preferring instead to wear shorts and Birkenstocks.  The traditional corporate mindset and tools will stand in the way of them being successful on your behalf.  They need to use TOR browsers and have access to sites to which you are unlikely to want the remainder of your employees going.  The biggest barrier to success here with most companies is fit with a company’s culture – but I can guarantee you that if you don’t have some of these folks on staff you are not going to be successful in this new era of cyber warfare.

How do you fare against the above questions?  Are you properly set up to defend your company and your shareholders against today’s cyber threat?  If you are uncertain, reach out to AKF Partners – we’ll evaluate your security infrastructure and approach and help ensure that you can properly defend yourself against the growing threat.


How to Reduce Crime in Big Cities

October 15, 2017  |  Posted By: Marty Abbott

I’m a huge Malcolm Gladwell fan.  Gladwell’s ability to convey complex concepts and virtually incomprehensible academic research in easily understood prose is second to none within his field of journalism.  A perfect example of his skill is on display in the Tipping Point, where Gladwell wrestles the topic of Complexity Theory (aka Chaos Theory) into submission, making it accessible to all of us.  In the Tipping Point, Gladwell also introduces us to The Broken Windows Theory.

The Broken Windows Theory gets its name from a 1982 The Atlantic Monthly article.  This article asked the reader to imagine a building with a few broken windows.  The authors claim that the existence of these windows invite vandals to break still more windows.  A continuous cycle of expanding vandalism ensues, with squatters moving in, nearby buildings getting vandalized, etc.  Subsequent authors expanded upon the theory, claiming that the presence of vandalism invites other crimes and that crime rates soar in communities where unhandled vandalism is present.  A corollary to the Broken Windows Theory is that cities can reduce crime rates by focusing law enforcement on petty crimes.  Several high profile examples seem to illustrate the power and correctness of this theory, such as New York Mayor Giuliani’s “Zero Tolerance Program”.  The program focused on vandalism, public drinking, public urination, and subway fare evasion.  Crime rates dropped over a 10 year period, corresponding with the initiation of the program.  Several other cities and other experiments showed similar effects.  Proof that the hypothesis underpinning the theory is correct.

Not So Fast…

Enter the self-described “Rogue Economist” Steven Levitt and his co-author Stephen Dubner - both of Freakonomics fame.  While the two authors don’t deny that the Broken Windows theory may explain some drop in crime, they do cast significant doubt on the approach as the primary explanation for crime rates dropping.  Crime rates dropped nationally during the same 10 year period in which New York pursued its Zero Tolerance Program.  This national drop in crime occurred in cities that both practiced Broken Windows and those that did not.  Further, crime rates dropped irrespective of either an increase or decrease in police spending.  The explanation, therefore, argue the authors, cannot primarily be Broken Windows.  The most likely explanation and most highly correlated variable is a reduction in the pool of potential criminals.  Roe v. Wade legalized abortion, and as a result there was a significant decrease in the number of unwanted children, a disproportionately high percentage of whom would have grown up to be criminals.

Gladwell isn’t therefore incorrect in proffering Broken Windows as an explanation for reduction in crime.  But the explanation is not the best one available and as a result it holds residence somewhere between misleading (worst case) and incomplete (best case).

What Happened?

To be fair, it’s hard to hold Gladwell accountable for this oversight.  Gladwell is not a scientist and therefore not trained in how to scientifically evaluate the research he reported.  Furthermore, his is an oft repeated mistake even among highly trained researchers.  And what exactly is that mistake?  The mistake made here is illustrated by the difference in approach between the Broken Windows researchers and the Freakonomics authors.  The Broken Windows researchers started with something like the following question: “Does the presence of vandalism invite additional vandalism and escalating crime?”  Levitt and Dubner first asked the question “What variables appear to explain the rate of crime?”

Broken Windows started with a question focused on deductive analysis.  Deduction starts with a hypothesis - “Evidence of vandalism and/or other petty crimes invites similar and more egregious crimes”.   The process continues to attempt to confirm or disprove the hypothesis.  Deduction starts with a broad and abstract view of the data – a generalization or hypothesis as to relationships – and attempts to move to show specific relationships between data elements.  The Broken Windows folks started with a hypothesis, developed a series of experiments to test the hypothesis and then ultimately evaluated time series data in cities with various Broken Windows approaches to policing.  What they lacked was a broad question that may have developed a range of options indicating possible causes.

The Freakonomics authors started with an inductive question.  Induction is the process of moving from specific observations about data into generalizations.  These generalizations are often in the form of hypothesis or models as to how data interacts.  Induction helps to inform what questions should be asked of the data.  Induction is the asking of “What change in what independent variables seem to correspond with a resulting change in some dependent variable?”  Whereas deduction works from independent variable to dependent variable, induction attempts to work backwards from dependent variable to identify independent variable relationships.

So What?

The jump to deduction, without forming the right questions and hypotheses through induction, is the biggest mistake we see in developing Big Data programs and implementing Big Data solutions.   We all approach problems with unique experiences and unique biases.  The combination of these often cause us to race to hypotheses and want to test them.  The issue here is two-fold. The best case is that we develop an incomplete (and as a result partially or mostly incorrect) answer similar to that of The Broken Windows researchers.  The worst case is that we suffer what statisticians call a Type 1 error – confirming an incorrect answer.  The probability of type 1 errors increases when we don’t look for alternative or better answers for outcomes within our data sets.  Induction helps to uncover those alternative or supporting explanations.  Exploring the data to discover potential relationships helps us to ask the right questions and form better hypotheses and better models.  Skipping induction makes it highly probable that we will get an incorrect, misleading or substandard answer.
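
To make the distinction concrete, here is a minimal sketch on toy data: an inductive scan across candidate variables, followed by a deductive test of the single hypothesis the scan suggests.  The variable names echo the crime example and are purely illustrative (assumes numpy and scipy are installed):

import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 500
candidates = {
    "police_spend": rng.normal(size=n),
    "vandalism_rate": rng.normal(size=n),
    "at_risk_population": rng.normal(size=n),
}
# Toy dependent variable driven mostly by one of the candidates.
crime_rate = 0.8 * candidates["at_risk_population"] + rng.normal(scale=0.5, size=n)

# Induction: which independent variables appear to move with the dependent variable?
for name, values in candidates.items():
    r, _ = stats.pearsonr(values, crime_rate)
    print(f"{name:>20}: r = {r:+.2f}")

# Deduction: formally test the hypothesis the exploration suggested.
r, p = stats.pearsonr(candidates["at_risk_population"], crime_rate)
print(f"hypothesis test: r = {r:+.2f}, p = {p:.3g}")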

But it is not enough simply to ensure that we practice both induction and deduction.  We must also recognize that the solutions that support induction are different from those that support deduction.  Further, we must understand that the two processes, while complementary, can actually interfere with each other when performed on the same system.  Induction is necessarily broad, and as a result slow and tedious.  Deduction, on the other hand, needs significantly less data and “prefers” to be faster in implementation.  Inductive work is best supported by solutions that impose very little structure on the data we observe.  Systems that support deduction, in order to allow for faster response times, impose more structure than inductive systems do.  While the two phases of discovery (induction and deduction) support each other, their differences suggest that they should be performed on solutions purpose-built to their specific needs.
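As a rough illustration of that separation (a sketch using assumed data and names, not a reference architecture), the snippet below keeps raw, loosely structured events in a schema-on-read form for exploration, then loads only the variables a hypothesis names into a narrow, typed table for fast, repeated deductive queries.

    import pandas as pd

    # Inductive side: schema-on-read.  Events keep whatever fields they arrived
    # with; structure is imposed only at query time - flexible, but broad and slow.
    raw_events = [
        {"city": "NYC", "year": 1994, "crime_rate": 812, "notes": "zero tolerance"},
        {"city": "SF",  "year": 1994, "crime_rate": 655},                  # missing fields are fine
        {"city": "NYC", "year": 1999, "crime_rate": 501, "police_spend": 1.2},
    ]
    exploration = pd.json_normalize(raw_events)   # columns are discovered, not declared
    print(exploration.columns.tolist())

    # Deductive side: schema-on-write.  Once a hypothesis exists, carry only the
    # variables it names into a tight, typed structure tuned for fast repeat queries.
    fact = exploration[["city", "year", "crime_rate"]].astype(
        {"city": "string", "year": "int32", "crime_rate": "float64"}
    )
    print(fact.groupby("year")["crime_rate"].mean())

In practice the two sides live in entirely separate systems (an exploratory store or data lake versus a structured warehouse or mart); the single-process example above only illustrates the difference in how much structure each side imposes, and when.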

Similarly, not everyone is equally qualified to perform both induction and deduction.  Our experience is that the folks who tend to be good at determining how to prove relationships between variables are often not as good at identifying patterns and vice versa.

These two observations - that the systems supporting induction and deduction should be separated, and that the people performing these tasks may need to be different - have ramifications for how we develop our analytics systems and organize our Big Data teams.  We’ll discuss these ramifications and more in our next post, “10 Anti-Patterns within Big Data”.


Hosting Lessons from Harvey and Irma

September 19, 2017  |  Posted By: Greg Fennewald

Everyone was saddened to see the horrific destruction the storms caused in Houston and Florida, including deaths and extensive property damage. It seems reasonable that the impact of these hurricanes was lessened by advance notice and preparation – stockpiling supplies, evacuating the highest-risk areas, and staging response resources to assist with recovery and rebuilding.

Data centers operate every day with a similar preparation mindset: diesel generators to provide power should the utility fail, batteries to keep servers running during a transition, potentially stored water or a well to replace municipal water service for cooling systems, and food and water for personnel unable to leave the location.

What happens when a “prepared” location such as a data center encounters a hurricane with strong winds, heavy rain, and extensive flooding? In some cases, the data center survives without impact, although there certainly will be outages and failures. Examples of data centers surviving Harvey in good shape can be seen here, while accounts of the service impacts caused by Hurricane Sandy can be seen here.

Data Center Points of Failure

Let’s examine what may enable a data center to survive without functional impact. Extensive risk investigation goes into site selection for data centers. Data centers are expensive to build, with costs measured in the tens or even hundreds of millions of dollars. The potential business impact of a failure can be costly as well, with liquidated-damages clauses common in hosting contracts. These factors lead to data centers being located outside of flood plains, away from hazardous material routes, and stoutly constructed to endure the storm winds likely in the region.

Losing utility power is regarded as a “when,” not an “if,” in the data center industry (be that an outage or a planned maintenance activity), and diesel generators are a common solution, often with 24 hours or more of fuel on hand and multiple replenishment contracts. Data centers can survive for days or weeks without utility power, and in some cases for months. How could flooding impact power? The service entrance for a data center, where utility power is routed, is often buried underground. Utility power is likely to be lost during flooding, either from flood damage or from the intentional shutdown of the local grid to prevent damage. A data center will operate on generator power if the facility itself is not flooded, although fuel replenishment is unlikely. If there are two feet of water in the main electrical room(s), the data center is going dark.

Many large data centers rely on evaporating water to cool the servers they host. Evaporative cooling is generally more energy efficient than other options, but it introduces an additional risk to operations – water supply. In many locations, municipal water pressure is lost during an extended power outage. Data centers can mitigate this risk with onsite water storage tanks or water wells; like diesel generators, these allow the data center to operate normally for hours or days without municipal water. So a data center sited outside the flood plain, able to operate without utility power or municipal water for hours or days, and structurally strong enough to handle the winds of a major storm – is there any other risk to mitigate? Yes: network connectivity and bandwidth.

Most data centers need to communicate with other data centers to fulfill their OLAP or OLTP purpose. Without connectivity, services are not available. The data should be fine, but it becomes increasingly stale; transactions and traffic are done. Like utility power, network connections are usually buried. With distance and geographic limitations involved, network pathways may get flooded, as may the facilities that aggregate and transmit the data. Telecom facilities generally have generators and other availability measures, but they can be forced into less advantageous locations and may have a shorter runtime standard than a data center.

Data centers that are serious about availability generally have carrier diversity and physical pathway diversity to mitigate carrier outages and “backhoe fades”. This may help in the event of widespread flooding as well. The reality is a data center without connectivity is generally useless. All the risk mitigation going into structural design, power and cooling redundancy, and fire protection is moot if connectivity fails.

Preparing for the Inevitable

The best way to mitigate these risks is to avoid relying on a single data center location. One is none and two is one. Owned, colo, managed hosting, or cloud – be able to survive the loss of a single location. The RTO and RPO of the business will guide the choice of active-active, hot-cold, or data backup with an elastic compute response plan. Hurricanes can cause regional impact, such as Irma disrupting most of Florida. In years past, many companies chose to have two data centers within 20 miles of each other to support synchronous database replication – a primary site in one borough of New York City and the DR site in a different borough. Replication options and database management techniques have advanced sufficiently to allow far greater dispersion today. Avoid a regionally impacting event by choosing data centers in diverse regions.
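As a purely illustrative sketch (the thresholds below are assumptions for the sake of example, not AKF guidance – the real targets must come from the business and its cost tolerance), the RTO/RPO-to-topology decision can be thought of along these lines:

    def choose_topology(rto_minutes: float, rpo_minutes: float) -> str:
        """Map illustrative recovery targets to a hosting topology."""
        if rto_minutes <= 5 and rpo_minutes <= 1:
            return "active-active across geographically diverse regions"
        if rto_minutes <= 60:
            return "hot-cold: a warm standby in a second, distant region"
        return "off-site backups plus an elastic compute restore plan"

    print(choose_topology(rto_minutes=5, rpo_minutes=0))     # near-zero downtime tolerance
    print(choose_topology(rto_minutes=240, rpo_minutes=60))  # can tolerate hours of recovery

The tighter the recovery targets, the more the answer tends toward multiple always-on locations in separate regions; the looser the targets, the more a backup-and-restore plan can suffice.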

Operating from 3 locations can be cheaper than 2, and can also improve customer satisfaction with reduced response times produced by serving customers from the nearest location. See Rule 12 in Scalability Rules. The ability to operate from multiple locations also enables a choice to adjust the redundancy of those locations. A combination of Tier II and III locations may be a more economical choice than a pair of Tier IV locations.

Developing a hosting plan can be complicated and frustrating, particularly since the core competency of your business is likely not data centers. AKF Partners can help – not only with hosting strategy, but also the product architecture and operational processes needed to weld infrastructure, architecture, and process into a seamless vehicle that delivers services to your clients with availability the market demands.

Hurricanes aren’t the only disasters that can take down your data center. Solar flares, runaway SUVs, civil disruption, tornadoes, and localized power outages have all caused data centers to fail. Natural disasters of all types trail equipment failures and human error as causes of service-impacting events (source: 365DataCenters). According to FEMA, 40% of businesses that close due to a disaster don’t reopen, and of those that do, only 29% are still in business two years after the disaster (source: FEMA). Don’t be a statistic. AKF Partners can help you with the product architecture and data center planning necessary to survive nearly any disaster.

Reach out to AKF

 
