Three Reasons Your Software Engineers May Not Be Successful
May 10, 2018 | Posted By: Pete Ferguson
At AKF Partners, we have the unique opportunity to see trends among startups and well-established companies in the dozens of technical due diligence and more in-depth technology assessments we regularly perform, in addition to filling interim leadership roles within organizations. Because we often talk with a variety of folks from the CEO, investors, business leadership, and technical talent, we get a unique top-to-bottom perspective of an organization.
Three common observations
- People mostly identify with their job title, not the service they perform.
- Software Engineers can be siloed in their own code vs. contributing to the greater outcome.
- CEO’s vision vs. frontline perception of things as they really are.
Job Titles Vs. Services
The programmer who identifies herself as “a search engineer” is likely not going to be as engaged as her counterpart who describes herself as someone who “helps improve our search platform for our customers.”
Shifting focus from a job title to a desired outcome is a best practice from top organizations. We like to describe this as separating nouns and verbs – “I am a software engineer” focuses on the noun (software engineer) without an action, whereas “I simplify search” puts the focus on the verb of the desired outcome: simplify. It may seem minor or trivial, but this shift can meaningfully change how team members understand their contribution to your overall organization.
Removing this barrier to the customer puts team members on the front line of accountability to customer needs – and hopefully also to the vision and purpose of the company at large. In our experience with successful companies, instilling a customer-experience, outcome-based approach often requires reworking product teams. Creating a diverse product team (containing members of the Architecture, Product, QA and Service teams, for example) that owns the outcomes of what it produces promotes:
- Creating products customers love
If you have had experience in a Ford vehicle with the first version of Sync (bluetooth connectivity and onscreen menus) – then you are well aware of the frustration of scrolling through three layers of menus to select “bluetooth audio” ([Menu] -> [OK] -> [OK] -> [Down Arrow] -> [OK] -> [Down Arrow] -> [OK]) each time you get into your car. The novelty of wireless streaming was a key differentiator when Sync was first introduced – but it is now table stakes in the auto industry – and it quickly wears off when you have to navigate a confusing UI, likely designed by product engineers each focused on a specific task rather than on a great user experience. What was missing was someone with the vision and job description: “I design wireless streaming to be seamless and awesome - like a button that says ‘Bluetooth Audio!!!’”
Hire for – and encourage – people who believe and practice “my real job is to make things simple for our customers.”
Avoiding Siloed Approach
Creating great products requires engineers to look outside of their current project specific tasks and focus on creating great customer experiences. Moving from reactively responding to customer reported problems to proactively identifying issues with service delivery in real time goes well beyond just writing software. It moves to creating solutions.
Long gone are the “fire and forget” days of writing software, burning to a CD and pushing off tech debt until the next version. To Millennials, this Waterfall approach is foreign, but unfortunately we still see this mentality engrained in many company cultures.
Today it is all about services. A release is one of many in a very long evolution of continual improvement and progression. There isn’t Facebook V1 to be followed by V2 … it is a continual rolling out of upgrades and bug fixes that are done in the background with minimum to no downtime. Engineers can’t afford to be laggard in their approach to continual evolution, addressing tech debt, and contributing to internal libraries for the greater good.
Ensure your technical team understands and is very closely connected to the evolving customer experience and has skin in the game. Among your customers, there is likely very little patience with “wait until our next release.” They expect immediate resolution or they will start shopping the competition.
Translating the Vision of the CEO to the Front Lines
During our more in-depth technology review engagements we interview many people from different layers of management and different functions within the organization. This gives us a unique opportunity to see how the vision of the CEO migrates down through layers of management to the front-line programmers who are responsible for translating the vision into reality.
Usually - although not always - the larger the company, the larger the divide between what is being promised to investors/Wall Street and what is understood as the company vision by those who are actually doing the work. Best practices at larger companies include regular all-hands meetings where the CEO and other leaders share their vision and are held accountable for deliverables, along with leadership checks that the vision is conveyed in product roadmaps and daily stand up meetings. When incentive plans focus directly on how well a team and its individuals understand the company vision and produce products to accomplish it, communication gaps close considerably.
Creating and sustaining successful teams requires a diverse mix of individuals with a service mindset. This is why we stress that Product Teams need to be all inclusive of multiple functions. Architecture, Product, Service, QA, Customer Service, Sales and others need to be included in stand up meetings and take ownership in the outcome of the product.
The Dev Team shouldn’t be the garbage disposal for what Sales has promised in the most recent contract or what other teams have ideated without giving much thought to how it will actually be implemented.
When your team understands the vision of the company - and how customers are interacting with the services of your company - they are in a much better position to implement it into reality.
As a CTO or CIO, it is your responsibility to ensure what is promised to Wall Street, private investors, and customers is translated correctly into the services you ultimately create, improve, and publish.
As we look at new start-ups facing explosive 100-200% year-over-year growth, our question is always “how will the current laser focus vision and culture scale?” Standardization, good Agile practices, understanding technical debt, and creating a scalable on-boarding and mentoring process all contribute to good answers to this question.
When your development teams are each appropriately sized and include good representation of functional groups, and each team member identifies with verbs vs. nouns (“I improve search” vs. “I’m a software engineer”) and understands how their efforts tie into company success, your opportunities for success, scalability, and adaptability are maximized.
Experiencing growing or scaling pains? AKF is here to help! We are an industry expert in technology scalability, due diligence, and helping to fill leadership gaps with interim CIO/CTO and other positions in addition to helping you in your search for technical leaders. Put our 200+ years of combined experience to work for you today!
Subscribe to the AKF Newsletter
Enabling TTM With Contributor Model Teams
May 6, 2018 | Posted By: Dave Berardi
We often speak about the benefits of aligning agile teams with the system’s architecture. As Conway’s Law describes, product/solution architectures and organizations cannot be developed in isolation. (See https://akfpartners.com/growth-blog/conways-law) Agile autonomous teams are able to act more efficiently, with faster time to market (TTM). Ideally, each team should be able to behave like a startup with the skills and tools needed to iterate until they reach the desired outcome.
Many of our clients are under pressure to achieve both effective TTM and reduce the risk of redundant services that produce the same results. During due diligence, we will sometimes discover redundant services that individual teams develop within their own silo for a TTM benefit. Rather than competing with priorities and waiting for a shared service team to deliver code, the team will build their own flavor of a common service to get to market faster.
Instead, we recommend a shared service team own common services. In this type of team alignment, the team has a shared service or feature on which other autonomous teams depend. For example, many teams within a product may require email delivery as a feature. Asking each team to develop and operate its own email capability would be wasteful, resulting in engineers designing redundant functionality leading to cost inefficiencies and unneeded complexity. Rather than wasting time on duplicative services, we recommend that organizations create a team that would focus on email and be used by other teams.
Teams make requests in the form of stories for product enhancements that are deposited in the shared services team’s backlog (the email team’s, in this case). To mitigate the risk of having each of these requesting teams waiting for requests to be fulfilled by the shared services team, we suggest thinking of the shared services as an open source project or, as some call it, the contributor model.
Open sourcing our solution (at least internally) doesn’t mean opening up the email code base to all engineers and letting them have at it. It does mean mechanisms should be established to help control the quality and design for the business. An open source project often has its own repo and typically only allows trusted engineers, called Committers, to commit. Committers follow Contribution Standards defined by the project-owning team. In our email example, the team should designate trusted and experienced engineers from other Agile teams who can code and commit to the email repo. Engineers on the email team can be focused on making sure new functionality aligns with the architectural and design principles that have been established. Code reviews are conducted before code is accepted. Allowing for outside contribution will help to mitigate the potential bottleneck such a team could create.
Now that the development of email has been spread out across contributors on different teams, who really owns it?
Remember, ownership by many is ownership by none. In our example, the email team ultimately owns the services and code base. As other developers commit new code to the repo, the email team should conduct code, design, and architectural reviews and ultimately own deployments and operations. They should also confirm that the contributions align with the strategic direction of the email mission. Whatever mechanisms are put in place, teams that adopt a contributor model should be a gas pedal and not a brake for TTM.
If your organization needs help with building an Agile organization that can innovate and achieve competitive TTM, we would love to partner with you. Contact us for a free consultation.
The Top Five Most Common Agile PDLC Failures
April 27, 2018 | Posted By: Dave Swenson
Agile Software Development is a widely adopted methodology, and for good reason. When implemented properly, Agile can bring tremendous efficiencies, enabling your teams to move at their own pace, bringing your engineers closer to your customers, and delivering customer value quicker with less risk. Yet many companies fall short of realizing the full potential of Agile, treating it merely as a project management paradigm by picking and choosing a few Agile structural elements such as standups or retrospectives without actually changing the manner in which product delivery occurs. Managers in an Agile culture often forget that they are indeed still managers who need to measure and drive improvements across teams.
All too often, Agile is treated solely as an SDLC (Software Development Lifecycle), focused only upon the manner in which software is developed versus a PDLC (Product Development Lifecycle) that leads to incremental product discovery and spans the entire company, not just the Engineering department.
Here are the five most common Agile failures that we see with our clients:
- Technology Executives Abdicate Responsibility for their Team’s Effectiveness
Management in an Agile organization is certainly different from, say, a Waterfall-driven one. More autonomy is provided to Agile teams. Leadership within each team typically comes without a ‘Manager’ title. Often, this shift from a top-down, autocratic, “Do it this way” approach to a grass-roots, bottom-up one swings well beyond the desired autonomy towards anarchy, where teams have been given full freedom to pick their technologies, architecture, and even outcomes with no guardrails or constraints in place. See our Autonomy and Anarchy article for more on this.
Executives often become focused solely on the removal of barriers the team calls out, rather than leading teams towards desired outcomes. They forget that their primary role in the company isn’t to keep their teams happy and content, but instead to ensure their teams are effectively achieving desired business-related outcomes.
The Agile technology executive is still responsible for their teams’ effectiveness in reaching specified outcomes (e.g.: achieve 2% lift in metric Y). She can allow a team to determine how they feel best to reach the outcome, within shared standards (e.g.: unit tests must be created, code reviews are required). She can encourage teams to experiment with new technologies on a limited basis, then apply those learnings or best practices across all teams. She must be able to compare the productivity and efficiencies from one team to another, ensuring all teams are reaching their full potential.
- No Metrics Are Used
The age-old saying “If you can’t measure it, you can’t improve it” still applies in an Agile organization. Yet Agile teams frequently drop this basic tenet, perhaps believing that teams are self-aware and critical enough to know where improvements are required. Unfortunately, even the most transparent and aware individuals are biased, fall back on subjective characteristics (“The team is really working hard”), and need the grounding that quantifiable metrics provide. We are continually surprised at how many companies aren’t even measuring velocity, not necessarily to compare one team with another, but to compare a team’s sprint output vs. their prior ones. Other metrics still applicable in an Agile world include quality, estimation accuracy, predictability, percent of time spent coding, and the ratio of enhancements vs. maintenance vs. tech debt paydown.
These metrics, their definitions and the means of measuring them should be standardized across the organization, with regular focus on results vs. desired goals. They should be designed to reveal structural hazards that are impeding team performance as well as best practices that should be adopted by all teams.
- Your Velocity is a Lie
Is your definition of velocity an honest one? Does it truly measure outcomes, or only effort? Are you consistent with your definition of ‘done’? Take a good look at how your teams are defining and measuring velocity. Is velocity only counted for true ‘ready to release’ tasks? If QA hasn’t been completed within a sprint, are the associated velocity points still counted or deferred?
Velocity should not be a measurement of how hard your teams are working, but instead an indicator of whether outcomes (again, e.g.: achieve 2% lift in metric Y) are likely to be realized - take credit for completion only when in the hands of customers.
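The difference between counting effort and counting outcomes can be made concrete with a small sketch. The `Story` fields and function names below are hypothetical and only illustrate one possible definition of 'done' (QA complete and released to customers); they are not a prescribed data model.

```python
from dataclasses import dataclass


@dataclass
class Story:
    points: int
    qa_passed: bool
    released: bool  # assumption: "released" means in the hands of customers


def effort_velocity(stories):
    """Counts every story worked on in the sprint, finished or not."""
    return sum(s.points for s in stories)


def outcome_velocity(stories):
    """Counts only stories that passed QA and reached customers."""
    return sum(s.points for s in stories if s.qa_passed and s.released)


# Hypothetical sprint: one story shipped, one awaiting release, one still in QA.
sprint = [
    Story(points=5, qa_passed=True, released=True),
    Story(points=3, qa_passed=True, released=False),
    Story(points=8, qa_passed=False, released=False),
]

print("effort velocity:", effort_velocity(sprint))
print("outcome velocity:", outcome_velocity(sprint))
```

In this made-up sprint the effort-based number is 16 points while the outcome-based number is 5, which is the kind of gap the questions above are meant to expose.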
- Failure to Leverage Agile for Product Discovery
From the Agile manifesto: “Our highest priority is to satisfy the customer through early and continuous delivery of valuable software”. Many companies work hard to get an Agile structure and its artifacts in place, but ignore the biggest benefit Agile can bring: iterative and continuous product discovery. Don’t break down a six-month waterfall project plan into two week sprints with standups and velocity measurements and declare Agile victory.
Work to create and deliver MVPs to your customers that allow you to test expected value and customer satisfaction without huge investment.
- Treating Agile as an SDLC vs. a PDLC
As explained in our article PDLC or SDLC, SDLC (Software Development Lifecycle) lives within PDLC (Product Development Lifecycle). Again, Agile should not be treated as a project management methodology, nor as a means of developing software. It should focus on your product, and hopefully the related customer success your product provides them. This means that Agile should permeate well beyond your developers, and include product and business personnel.
Business owners or their delegates (product owners) must be involved at every step of the PDLC process. POs need to be embedded within each Agile team, ideally colocated alongside team members. In order to provide product focus, POs should first bring to the team the targeted customer problem to be solved, rather than dictating only a solution, then work together with the team to implement the most effective solution to that problem.
AKF Partners helps companies transition to Agile as well as fine-tune their existing Agile processes. We can readily assess your PDLC, organization structure, metrics and personnel to provide a roadmap for you to reach the full value and benefits Agile can provide. Contact us to discuss how we can help.
Evaluating Technology Risk Using Mathematics
April 23, 2018 | Posted By: Geoffrey Weber
The Relative Risk Equation
Technologists are frequently asked: what are the chances that a given software release is going to work? Do we understand the risk that each component or new feature brings to the entire release?
In this case, measuring risk is assessing the probability that a component will perform poorly or even fail. The higher the probability of failure, the higher the risk. Probabilities are just numbers, so ideally, we should be able to calculate the risk (probability of failure) of the entire release by aggregating the individual risk of each component. In reality this calculation works quite well.
Putting a number to risk is a very useful tool and this article will provide a simple and easy way to calculate risk and produce a numeric result which can be used to compare risk across a spectrum of technology changes.
When we assess a system, one of the key characteristics we want to benchmark is the probability that a system will fail. In particular, if we want to understand whether or not a system can support an availability goal of 99.95% we have to do some analysis to see if the probability that a failure occurs is lower than 0.05%. How do we calculate this?
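One way to make the 99.95% goal concrete is to convert it into a downtime budget. The sketch below assumes availability is measured as uptime over a calendar period, which is a common convention but is not stated in the text; the function name is illustrative.

```python
def downtime_budget_minutes(availability_pct, period_days=365.0):
    """Minutes of unavailability allowed in the period for a given availability goal."""
    unavailable_fraction = 1.0 - availability_pct / 100.0
    return unavailable_fraction * period_days * 24 * 60


for days in (365, 30):
    budget = downtime_budget_minutes(99.95, days)
    print(f"99.95% over {days} days allows about {budget:.1f} minutes of downtime")
```

Under that assumption, 99.95% over a 365-day year works out to about 262.8 minutes, a little under four and a half hours.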
First let’s introduce some vocabulary.
DEFINITION: Pi is the probability that a given system will experience an incident, i.
For the purposes of this article we are measuring relative and not absolute values. A value of Pi=1 means the system is very unlikely to experience failure. On the other hand, Pi values approaching 10 indicate a system with a 100% probability of failure.
DEFINITION: Ii is the impact (or blast radius) a system failure will have.
Ii=1 indicates no impact, whereas Ii=10 indicates a complete failure of an entire system.
DEFINITION: Pd is the probability that an incident will be detected.
Pd=1 means an incident will be completely undetected and Pd=10 indicates that a failure will be completely detected 100% of the time.
Measuring across a scale from 1 to 10 is often too granular; we can reduce scale to tee-shirt sizes and replace 1, 2, …, 10 with Small (3), Medium (5), Large (7). Any series of values will work so long as we are consistent in our approach.
Relative Risk is now only a question of math:
Ri = (Pi x Ii ) / Pd
With values of 1, 2, ..., 10, the minimum Relative Risk value is 0.1 (effectively 0 relative risk) and the maximum value is 100. With tee-shirt sizes, the minimum Relative Risk value is 9/7 (1 2/7, or about 1.29) and the maximum value is 49/3 (16 1/3, or about 16.33). Basic statistics can help us rescale these values onto a common 0 to 10 scale:
std(Ri) = (Ri - Min(Ri)) / (Max(Ri) - Min(Ri)) x 10
where Max(Ri) = 16 1/3 and Min(Ri) = 1 2/7 (in the case of tee-shirt sizing)
Example 1: Adding a new data file to a relational database
- Pi = 3 (low). It’s unlikely that adding a data file will cause a system failure, unless we’re already out of space.
- Ii = 5 (medium). A failure to add a data file indicates a larger storage issue may exist, which would be very impactful for this database instance. However, as there is a backup (for this example), the risk is lowered.
- Pd = 7 (high). It is virtually certain that any failure would be noticed immediately.
- Therefore Ri = (3 x 5) / 7 = 2.1. Standardizing this to our 10-point scale produces a value of about 0.57. That is a very low number, so adding data files is a relatively low-risk procedure.
Example 2: Database backups have been stored on tapes that have been demagnetized during transportation to offsite storage.
- Pi = 5 (medium). While restoring a backup is a relatively safe event, on a production system it is likely happening during a time of maximum stress.
- Ii = 7 (high). When we attempt to restore the backup tape, it will fail.
- Pd = 3 (low). The demagnetization of the tapes was a silent and undetected failure.
- Therefore Ri = (5 x 7) / 3, or 11.7. We arrive at a value of about 7 on the standard scale, which is quite high, so we should consider randomly testing tapes from off-site storage.
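The two examples can be checked with a few lines of Python. This is a minimal sketch, not part of the original method: the function names (`relative_risk`, `standardize`) and the `SIZES` mapping are assumptions made for illustration, and the bounds are derived from the tee-shirt values 3, 5, and 7 used above.

```python
from fractions import Fraction

# Tee-shirt sizes from the text: Small, Medium, Large.
SIZES = {"S": 3, "M": 5, "L": 7}


def relative_risk(p_incident, impact, p_detect):
    """Ri = (Pi x Ii) / Pd, kept as an exact fraction."""
    return Fraction(p_incident * impact, p_detect)


# Bounds of the tee-shirt scale: the best case is low probability, low impact,
# and high detection; the worst case is the opposite.
R_MIN = relative_risk(SIZES["S"], SIZES["S"], SIZES["L"])  # 9/7
R_MAX = relative_risk(SIZES["L"], SIZES["L"], SIZES["S"])  # 49/3


def standardize(r):
    """Rescale a relative risk value so R_MIN maps to 0 and R_MAX maps to 10."""
    return (r - R_MIN) / (R_MAX - R_MIN) * 10


# Example 1: adding a data file (Pi low, Ii medium, Pd high).
example_1 = relative_risk(SIZES["S"], SIZES["M"], SIZES["L"])
# Example 2: demagnetized backup tapes (Pi medium, Ii high, Pd low).
example_2 = relative_risk(SIZES["M"], SIZES["L"], SIZES["S"])

for label, r in (("Example 1", example_1), ("Example 2", example_2)):
    print(f"{label}: Ri = {float(r):.2f}, standardized = {float(standardize(r)):.2f}")
```

With bounds of 1 2/7 and 16 1/3, this prints a standardized value of roughly 0.57 for Example 1 and 6.90 for Example 2.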
This formula has utility across a vast spectrum of technology:
- Calculate a relative risk value for each feature in a software release, then take the total value of all features in order to compare risk of a release against other releases and consider more detailed testing for higher relative risk values.
- During security risk analysis, calculate a relative risk value for each threat vector and sort the resulting values. The result is a prioritized list of steps required to improve security based on the probabilistic likelihood that a threat vector will cause real damage.
- During feature planning and prioritization exercises, this formula can be altered to calculate feature risk. For example, Pi can mean confidence in estimate, Ii can be converted to impact of feature (e.g. higher revenue = higher impact) and Pd is perceived risk of the feature. Putting all features through this calculation then sorting from high values to low values yields a list of features ranked by value and risk.
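As a rough illustration of the first two uses, the sketch below scores a handful of changes and sorts them by relative risk. The change names and their Pi, Ii, and Pd scores are invented for the example; only the formula comes from the text.

```python
def relative_risk(p_incident, impact, p_detect):
    """Ri = (Pi x Ii) / Pd."""
    return p_incident * impact / p_detect


# Hypothetical release contents: (name, Pi, Ii, Pd) on the 3/5/7 scale.
release = [
    ("new checkout flow", 5, 7, 3),
    ("help page copy change", 3, 3, 7),
    ("search index migration", 7, 7, 5),
]

scored = [(name, relative_risk(p, i, d)) for name, p, i, d in release]
ranked = sorted(scored, key=lambda item: item[1], reverse=True)
release_total = sum(risk for _, risk in scored)

# Highest relative risk first, so the riskiest changes get tested first.
for name, risk in ranked:
    print(f"{risk:6.2f}  {name}")
print(f"{release_total:6.2f}  total for the release")
```

The same sort works for threat vectors in a security review; the release total is only meaningful when compared against totals computed the same way for other releases.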
The purpose of this formula and similar methods is not to produce a mathematically absolute estimate of risk. The real value here is in removing guessing and emotion from the process of evaluating risk and in providing a framework to compare risk across a variety of changes.
Click here to see how AKF Partners can help you manage risk and other technology issues.
Avoiding the Policy Black Hole
April 18, 2018 | Posted By: Pete Ferguson
During due diligence and in-depth engagements, we often hear feedback from client employees that policies either do not exist - or are not followed.
All too often we see policies that are poorly written, difficult for employees to understand or find, and lack clear alignment with the desired outcomes. Policies are only one part of a successful program - without sound practices, policies alone will not ensure successful outcomes.
Do You Have a Policy …?
Early in my career I was volunteered to be responsible for PCI compliance shortly after eBay purchased PayPal. I’d heard folklore of auditors at other companies coming in and turning things over, with the resulting aftermath leading to people being publicly humiliated or losing their jobs. I suddenly felt on the firing line and asked, “why me?”
I booked a quick flight to Phoenix to be in town before the auditor arrived and I prepared by walking through our data center and reviewing our written policies. When I met with the auditor, he looked to be in his early 20s and handed me a business card from a large accounting firm. I asked him about his background; he was fresh out of college and we were one of his first due diligence assignments. He pulled out his laptop and opened an Excel spreadsheet and began reading off the list:
- Do you have cameras? “Yes,” I replied and pointed to the ceiling in the lobby littered with little black domes.
- Do you record the cameras? “Yes,” and I took him into the control room and showed him that we had 90 days of recording.
- Do you have a security policy? “Yes,” and I showed him a Word Document starting with “1.1.1 Purpose of This Policy ....”
Several additional questions, and 10 minutes later, we were done. He and I had both flown some distance so I gave him a tour of the data center and filled him full of facts about square footage and miles of cable and pipes until his eyes glazed over and his feet were tired from walking, and off he went.
I was relieved, but let down! I felt we had a really good program and wanted to see how we measured up under scrutiny. Subsequent years brought more sophisticated reviews - and reviewers - but the one question I was always waiting to be asked - but never was:
“Is your policy easily accessible, how do employees know about it, and how do you measure their comprehension and compliance?”
My first compliance exercise didn’t seem all that scary after all; it was only a due diligence “check the box” exercise and didn’t dive deeper into how effective our program was and where it needed to be reinforced.
While having a policy for compliance requirements is important, on its own, policy does not guarantee positive outcomes. Policy must be aligned with day-to-day operations and make sense to employees and customers.
The Traditional Boredom of Policy
Typically policy is written from the auditor’s point of view to ensure compliance to government and industry requirements for public health, anti-corruption, and customer data security standards.
Unfortunately, this leads to a very poor user experience wading through the 1.1.1 … 1.1.2 … sections, certainly a far cry from how a good novel or any online news story reads.
I’ve heard companies - both large and small - give great assurances that they have policies and they have shown me the 12pt Times New Roman documents that start with “1.1.1 Purpose of This Policy …” as evidence.
I had to argue the point at a former position that the first way to lose interest with any audience is to start with 1.1.1 … and with Times New Roman font in a Microsoft Word document that was not easy to find. It was a difficult argument and I was instructed to stick with the approved, and traditional, industry-accepted method.
Fast forward a decade and our HR Legal team was reviewing policy and invited me to a meeting with the internal communications team. Before we started talking documents, the Director of Communications asked me if I’d seen the latest safety video for Virgin Atlantic Airlines. I thought it a strange question, but after she told me how surprised and inspired by it she was, I took a look.
VA thankfully took a required dull and mundane US Federal Aviation Administration ritual and instead saw it as a differentiator of their brand from the pack of other airlines. Whoever thought a safety demonstration could also be a 4-minute video on why an airline is different and fun?!? Up until that point, no one! Certainly not on any flight I had previously flown.
Thankfully, since then, Delta and others have followed their example and made something I and millions of airline crews and passengers had previously dreaded - safety policy and procedure - into a more fun, engaging, and entertaining experience.
While policy needs to comply with regulations and other requirements, for policies to move from the page to practice they need to be presented in a way that lets employees clearly understand what is expected - so in writing policy, put the desired outcome first! The regulatory document for auditors can be incorporated at the end of each policy, or consider a separate document that calls out only the required sections of your employee handbook or wherever your company policies are presented and stored.
Clarifying the Purpose of Your Policy
In her article “Why Policies Don’t Work,” HR Lawyer Heather Bussing boils down the core issue: “There are two main reasons to have employment policies: to educate and to manage risk. The trouble is that policies don’t do either.”
She further expounds on the problem in her experience:
“ … policies get handed out at a time when no one pays attention to them (first week of employment if not the first day), they are written by people who don’t know how the company really works (usually outside legal counsel), and they have very little to do with what happens. So much for education.”
As for managing risk, Bussing points out that policies are often at odds with each other, or so broad that they can’t be effectively enforced.
“Unless it is required to be on a poster, or unless you can apply it in every instance without variance, you don’t want policies. Your at-will policy covers it. And if you don’t follow your policies to the letter, you will look like a liar in a courtroom.”
Don’t let your online policy repository feel like a suppository - focus on what you want to accomplish!
Small and fast-growing companies typically have little need for formalized policies because people trust each other and can work things out. But as they grow it has been my experience that often the trust and holding people accountable - which sets fast growing companies apart as a cool place to work - get replaced with bureaucratic rituals cemented in place as more and more executives migrate from larger, bureaucratic behemoths. If the way policy is presented is the litmus test for the true company culture, a lot of companies are in trouble!
Policy must be closely aligned to the shared outcomes of the company and interwoven into company culture. Otherwise it is a bureaucratic distraction and will only be adopted or sustained with a lot of uphill effort. In short, if people do not understand how a policy helps them do their job more easily, they are going to fight it.
Adapting Policy To Your Audience
In the early days of eBay, the culture was very much about collectables, and throughout the workspace many employees displayed their collections of trading cards, Legos, and comic books. When it came time to publish our security policies, we hired Foxnoggin - a professional marketing strategy company - which took the time to understand our culture and then organized a comprehensive campaign that included contests, print and online material, and other collateral.
They helped formulate an awareness campaign to educate employees and measure the effectiveness of policy through surveys and monitoring employee behavior.
To break away from the usual email method of communication, we got and held employee attention with a series of comic books which included superheroes and supervillains in a variety of scenarios highlighting our policies.
An unintended consequence from our collector employees was that they didn’t want to open their comic books and instead kept them sealed in plastic. To combat this, we provided extra copies (not sealed in plastic) in break rooms and other common areas and future editions were provided without the bags. The messages were reinforced with large movie-style posters displayed throughout the work area.
This approach was wildly popular among employees located at the customer support and developer sites, and surveys showed that security was becoming a top-of-mind topic for employees. Unfortunately, this approach was not as popular with Europeans - who felt we were talking down to them - or with the executives coming from more stodgy and formal companies like Bain & Company or GE, and it was particularly unpopular with execs from the financial industry after the purchase of PayPal.
Intertwining policy into the culture of your organization makes compliance natural and part of daily operations.
Make Sure Your Message Matches Your Audience
President and CEO of Lead From Within Lolly Daskal writes on Inc.com:
“... sometimes the dumbest rules can drive away the best employees … too many workplaces create rule-driven cultures that may keep management feeling like things are under control, but they squelch creativity and reinforce the ordinary.”
Be creative and look at the company culture and how to interweave policies. Policies need to be part of the story you tell your employees to reinforce why they should want to work for you.
Nathan Christensen writes in his Fast Company article: How to Create An Employee Handbook People Will Actually Want to Read, “let’s face it, most handbooks aren’t exactly page-turners. They’re documents designed to play defense or, worse yet, a catalog of past workplace problems.”
Christensen recommends “presenting” policies in a readable and attractive manner. It must be an opportunity to excite people in meeting a greater group purpose and cause.
Your policies need to match your company culture and be in language your employees use and understand, and the ask for compliance needs to be easy enough for a new employee to explain to anyone.
Writing Content Your Audience Will Actually Read and Understand
According to the Center for Plain Language - which has the goal to help organizations “write so clearly that their intended audience understands what they are saying the first time they read or hear it” - there are five steps to plain language:
- Identify and describe the target audience: “The audience definition works when you know who you are and are not designing for, what they want to do, and what they know and need to learn.”
- Structure the content to guide the reader through it: “The structure works when readers can quickly and confidently find the information they are looking for.”
- Write the content in plain language: “Use a conversational, rather than legal or bureaucratic tone … pick strong verbs in the active voice and use words the audience knows.”
- Use information design to help readers see and understand: Font choice, line spacing, and use of graphics help break up long sections of text and increase the readability score.
- Work with target user groups to test design and content: Ask readers to describe the content and have them show you where they would find relevant content.
As an illustration, here is a before and after comparison of the AARP Financial policy on giving and receiving gifts:
In reading the “before” example, my eyes immediately glazed over and my mind began to wander until the mention of “courtesies of a de minimus ...” Did the guy who wrote that go home that night to his family and instruct his kids, “you will need to consume a courtesy of a de minimus amount of broccoli if you want videogame time after dinner”? I sure hope not!
On the “after” example, notice the change in line spacing, switching of font and use of bullet points. Overall the presentation is a lot more conversational and less formal. It also has a call to action in the title starting with two verbs “give and accept …”
I’d add as the 6th step to remember K.I.S.S. - Keep It Simple Stupid! You get a few seconds to grab your audience’s attention and only a few more minutes to keep it.
As a content editor, I was feeling proud of myself when I distilled 146 pages of confusing policies, procedures and “how to” down to 14 pages over the course of several weeks. But when I mentioned this to my wife, she said “you are going to make them read 14 pages?!?”
So I looked at it a few days later with fresh eyes and realized I could condense it down again to two pages by making it more of a table of contents with a brief description of each bullet point and then include links after each section if employees wanted to learn more, and I was able to retain a font size of 14 and plenty of white space.
In reading the two pages, people would understand what was expected of them and could easily learn more - but only if they were interested.
Write policy in language a new employee will quickly understand and be thoughtful in how much you present to employees on their first day, week, and month.
Document Readability is How You Show Your People Love - And Soon To Be the Law In the EU
Speaking more in terms of content marketing, VisibleThread author “Fergle” quotes Neil Patel, columnist for Forbes, Inc, as stating “content that people love and content that people can read is almost the same thing.” Yet, as Fergle points out, “a lot of content being created is not the stuff people love. Or read.”
Writing content that is easy to read so that people will love it may seem a bit altruistic. But for information regarding data privacy, it is also soon to be the law - at least in the EU and for any international policy which would reach an EU resident. On May 25th of 2018 the General Data Protection Regulation (GDPR) goes into effect. From the GDPR:
“The principle of transparency requires that any information and communication relating to the processing of those personal data be easily accessible and easy to understand, and that clear and plain language be used.”
There are a number of ways to measure readability ease and grade level of your content, and a good communications expert will be able to help you identify the proper tools.
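As one illustration of such a measure, the sketch below computes the Flesch Reading Ease score and the Flesch-Kincaid grade level. The two formulas are the standard published ones; the syllable counter is only a rough heuristic, and the function names and sample sentences are made up for this example, so treat the numbers as a relative benchmark rather than an exact score.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups and discount a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict:
    """Return Flesch Reading Ease and Flesch-Kincaid grade level for text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return {
        "reading_ease": 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word,
        "grade_level": 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59,
    }


# Made-up sample sentences, used only to compare two writing styles.
plain = "Do not take gifts from vendors. Ask your manager if you are not sure."
formal = ("Employees shall not solicit or accept gratuities, excepting courtesies "
          "of a de minimis nature, from any entity conducting business with the organization.")

print(readability(plain))
print(readability(formal))
```

A higher reading ease and a lower grade level both point to text that is easier to read, so the plain version should score better on both.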
Scores are a good benchmark, but don’t forget the most important resource for feedback - your potential audience!
Buy them lunch, have them come and review your plan and provide their feedback. Bring them back in later when you have content to review and provide an environment where they can be brutally honest - again a communications expert outside of your department will help provide a bit of a buffer and allow your audience to be open, honest, and direct.
But don’t just write policy to comply with due diligence or for policy’s sake - be sure it is part of the company culture, easy to search, and placed where and when your employees or customers will need it. When there are shared outcomes between compliance and how employees operate, policy is integrated and effective.
Timing is Important
Think of ways to break down your policy content not just by audience, but by timing and when the information will actually be relevant.
In retail, the term “point of sale” refers to the checkout process - when taxes, final cost and payment are all settled. The placement of “last minute items” at the POS is very carefully and competitively assigned only to items with a high ROI, measured against the number of inches each item takes up on the limited shelf space. This careful placement has also been adapted to the online marketplace: when you add an item to your shopping cart, a prompt arises to add additional items others have also purchased with your item.
This same methodology in thinking should be applied to where - and when - you introduce your policies to your audience.
We made the mistake for years of pushing our travel safety program and policies for everyone during new hire orientation when only about half of the population traveled and most of them wouldn’t be traveling for several weeks or months. It made a lot more sense to move the travel policies to the travel booking page.
If you only give out corporate credit cards to Directors and above, there is no sense pushing policies on spend limits to the global population. It makes a lot more sense to push the policy when someone is applying for the card and as a reminder each time their credit card expires and they are being issued a new one.
Your audience will appreciate only being told what they need to know when they need the information and will be more likely to not only retain the information, but to comply!
Know How You Will Measure Successful Outcomes
Perhaps the most important question to ask when designing policy is “how will we know we are successful?”
Having good policy written in a clear and concise manner and stored in an easy to find location is still a very passive approach. Good policy should evolve as your company evolves and should be flexible and realistic to business, customer, and employee needs. It must be modeled by company leadership and hold true to the daily actions of your company.
Tests at the end of annual compliance training are only a “check the box” measure of compliance. Think back to how much you actually learned - or, better yet, retained - the last time you were subjected to hours of compliance training!
If metrics cannot support that your policies are known and followed, then you need to re-evaluate the purpose of your policies and if they are contributing to the benefit of your employees and customers or just ticking compliance boxes.
While compliance is important, compliance alone does not make for better business practices or a competitive edge. Effective, measurable compliance protects your employees and provides value to your customers.
Subject-matter experts are often too close to the policies to be objective. A little tough love is needed and it is best to bring in experts in marketing and communications who will not be biased to the content, but biased to the reader who is the intended audience.
A good communications plan will cover the following:
- Be clear on the desired behavior the policy is to encourage and enforce - and ensure that behavior is aligned with the overall company purpose
- Identify the target audiences and each of their self-interests
- Outline which channels each audience is receptive to (email/print/video, etc.)
- Identify the inside jargon and language styles needed
- Decide when and where each audience will want to find relevant information
- Plan how often policies will be reviewed - and include as many stakeholders as possible in the review process
- Decide how implementation of policies and compliance to the policies will be measured
Only AFTER the communications plan is agreed upon - with plenty of input from representatives of the target audiences - should the content review begin. Otherwise the temptation from subject-matter experts will be to tell people everything they know.
Pulling it All Together
Poorly written policies that are difficult for employees to search or find do little to meet the mission of policy: to provide a consistent approach to how your company does business and satisfies regulatory compliance. Policies on their own do not make for good operations or guarantee overall success. Remember the true test of policies is not whether they exist, but if they are tightly aligned and incorporated into daily operations, how they contribute to the success of your employees and customers, and if their effectiveness can be measured in a tangible way.
Experiencing growing pains? AKF is here to help! We are an industry expert in technology scalability and due diligence. Put our 200+ years of combined experience to work for you today!
SaaS Migration Challenges
March 12, 2018 | Posted By: Dave Swenson
More and more companies are waking up from the 20th century, realizing that their on-premise, packaged, waterfall paradigms no longer play in today’s SaaS, agile world. SaaS (Software as a Service) has taken over, and for good reason. Companies (and investors) long for the higher valuation and increased margins that SaaS’ economies of scale provide. Many of these same companies realize that in order to fully benefit from a SaaS model, they need to release far more frequently, enhancing their products through frequent iterative cycles rather than massive upgrades occurring only 4 times a year. So, they not only perform a ‘lift and shift’ into the cloud, they also move to an Agile PDLC. Customers, tired of incurring on-premise IT costs and risks, are also pushing their software vendors towards SaaS.
But, what many of the companies migrating to SaaS don’t realize is that migrating to SaaS is not just a technology exercise. Successful SaaS migrations require a ‘reboot’ of the entire company. Certainly, the technology organization will be most affected, but almost every department in a company will need to change. Sales teams need to pitch the product differently, selling a leased service vs. a purchased product, and must learn to address customers’ typical concerns around security. The role of professional services teams in SaaS drastically changes, and in most cases, shrinks. Customer support personnel should have far greater insight into reported problems. Product management in a SaaS world requires small, nimble enhancements vs. massive, ‘big-bang’ upgrades. Your marketing organization will potentially need to target a different type of customer for your initial SaaS releases - leveraging the Technology Adoption Lifecycle to identify early adopters of your product in order to inform a small initial release (Minimum Viable Product).
It is important to recognize the risks that will shift from your customers to you. In an on-premise (“on-prem”) product, your customer carries the burden of capacity planning, security, availability, disaster recovery. SaaS companies sell a service (we like to say an outcome), not just a bundle of software. That service represents a shift of the risks once held by a customer to the company provisioning the service. In most cases, understanding and properly addressing these risks are new undertakings for the company in question and not something for which they have the proper mindset or skills to be successful.
This company-wide reboot can certainly be a daunting challenge, but if approached carefully and honestly, addressing key questions up front, communicating, educating, and transparently addressing likely organizational and personnel changes along the way, it is an accomplishment that transforms, even reignites, a company.
This is the first in a series of articles that captures AKF’s observations and first-hand experiences in guiding companies through this process.
Don’t treat this as a simple rewrite of your existing product - answer these questions first…
Any company about to launch into a SaaS migration should first take a long, hard look at their current product, determining what out of the legacy product is not worth carrying forward. Is all of that existing functionality really being used, and still relevant? Prior to any move towards SaaS, the following questions and issues need to be addressed:
Customization or Configuration?
SaaS efficiencies come from many angles, but certainly one of those is having a single codebase for all customers. If your product today is highly customized, where code has been written and is in use for specific customers, you’ve got a tough question to address. Most product variances can likely be handled through configuration, a data-driven mechanism that enables/disables or otherwise shapes functionality for each customer. No customer-specific code from the legacy product should be carried forward unless it is expected to be used by multiple clients. Note that this shift has implications on how a sales force promotes the product (they can no longer promise to build whatever a potential customer wants, but must sell the current, existing functionality) as well as professional services (no customizations means less work for them).
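To make the configuration idea concrete, here is a minimal sketch of a data-driven, per-customer setting lookup layered over a single codebase. The setting names, the `TenantConfig` class, and the tenant IDs are all hypothetical; a real product would more likely load these values from a database or a configuration service.

```python
from dataclasses import dataclass, field

# Hypothetical product-wide defaults shared by every customer.
DEFAULTS = {"bulk_export": False, "sso": False, "max_users": 25}


@dataclass
class TenantConfig:
    """Per-customer overrides stored as data, not as customer-specific code."""
    tenant_id: str
    overrides: dict = field(default_factory=dict)

    def get(self, key: str):
        if key not in DEFAULTS:
            raise KeyError(f"unknown setting: {key}")
        # Fall back to the shared default when the customer has no override.
        return self.overrides.get(key, DEFAULTS[key])


def export_button_visible(cfg: TenantConfig) -> bool:
    """One code path for all customers; behavior is shaped by configuration."""
    return bool(cfg.get("bulk_export"))


acme = TenantConfig("acme", {"bulk_export": True})
globex = TenantConfig("globex")

print(export_button_visible(acme))
print(export_button_visible(globex))
```

The design choice illustrated here is that a new customer request becomes a new row of data rather than a new branch of code, which is what keeps the codebase single.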
Many customers, even those who accept the improved security posture a cloud-hosted product provides over their own on-premise infrastructure, absolutely freak when they hear that their data will coexist with other customers’ data in a single multi-tenant instance, no matter what access management mechanisms exist. Multi-tenancy is another key to achieving economies of scale that bring greater SaaS efficiencies. Don’t let go of it easily, but if you must, price extra for it.
Who owns the data?
Many products focus only on the transactional set of functionality, leaving the analytics side to their customers. In an on-premise scenario, where the data resides in the customers’ facilities, ownership of the data is clear. Customers are free to slice & dice the data as they please. When that data is hosted, particularly in a multi-tenant scenario where multiple customers’ data lives in the same database, direct customer access presents significant challenges. Beyond the obvious related security issues is the need to keep your customers abreast of the more frequent updates that occur with SaaS product iterations. The decision is whether to replicate customer data into read-only instances, provide bulk export into their own hosted databases, or build analytics into your product.
All of these have costs - ensure you’re passing those on to your customers who need this functionality.
May I Upgrade Now?
Today, do your customers require permission for you to upgrade their installation? You’ll need to change that behavior to realize another SaaS efficiency - supporting as few versions as possible. Ideally, you’ll support only a single version (other than during deployment). If your customers need to ‘bless’ a release before migrating on to it, you’re doing it wrong. Your releases should be small, incremental enhancements, potentially even reaching continuous deployment. Therefore, the changes should be far easier to accept and learn than the big-bang upgrades of the past. If absolutely necessary, create a sandbox for customers to access new releases, but be prepared to deal with the potentially unwanted, non-representative feedback from the select few who try it out in that sandbox.
Wait, Who Are We Targeting?
All of the questions above lead to this fundamental issue: Are tomorrow’s SaaS customers the same as today’s? The answer? Not necessarily. First, in order to migrate existing customers on to your bright, shiny new SaaS platform, you’ll need to have functional parity with the legacy product. Reaching that parity will take significant effort and lead to a big-bang approach. Instead, pick a subset or an MVP of existing functionality, and find new customers who will be satisfied with that. Then, after proving out the SaaS architecture and related processes, gradually migrate more and more functionality, and once functional parity is close, move existing customers on to your SaaS platform.
To find those new customers interested in placing their bets on your initial SaaS MVP, you’ll need to shift your current focus on the right side of the Technology Adoption Lifecycle (TALC) to the left - from your current ‘Late Majority’ or ‘Laggards’ to ‘Early Adopters’ or ‘Early Majority’. Ideally, those customers on the left side of the TALC will be slightly more forgiving of the ‘learnings’ you’ll face along the way, as well as prove to be far more valuable partners with you as you further enhance your MVP.
The key is to think out of the existing box your customers are in, to reset your TALC targeting and to consider a new breed of customer, one that doesn’t need all that you’ve built, is willing to be an early adopter, and will be a cooperative partner throughout the process.
Our next article on SaaS migration will touch on organizational approaches, particularly during the build-out of the SaaS product, and the paradigm shifts your product and engineering teams need to embrace in order to be successful.
AKF has led many companies on their journey to SaaS, often getting called in as that journey has been derailed. We’ve seen the many potholes and pitfalls and have learned how to avoid them. Let us help you move your product into the 21st century. See our SaaS Migration service
Managing Risk with Technical Due Diligence
February 20, 2018 | Posted By: Greg Fennewald
You should not buy a home without an inspection by a licensed home inspector and you should not buy a used car without having a mechanic check it out for you. Diligence - it just makes good sense. Similarly, it is prudent to include technical diligence as part of the evaluation for a potential technology company investment.
Diligence Informs Risk Management
Private equity and venture capital firms typically evaluate many areas preceding a potential investment. The business case, legal structure, competitive analysis, product strategy, financial audits and contractual landscape are all examples of diligence deemed necessary prior to an investment. A company with a great product but three years left on an extremely expensive office lease will probably have a lower value. Breaking the lease or living with it until the term expires means higher costs and thus lower EBITDA. A hot startup with an inexperienced CFO that has run on cash-based accounting from day 1 and is rapidly approaching $6 million in annual revenue needs to move to accrual-based accounting. That takes time and effort and possibly a talent search - this affects the value of the investment.
But what about the technical underpinnings of the product itself? A company with a solitary production database and a marketing analyst with access to directly query that database is likely headed for performance and availability incidents. Single points of failure create a high probability of non-availability. Solutions that don’t allow for seamless and elastic scalability may run into either capacity or cost of operations problems.
Preventing these incidents and altering the conditions that enabled them to exist takes time and effort. All of these assessment areas boil down to risk management. Further, understanding the cost of fixing these solutions helps a company understand their true cost of investment. Your investment includes not just the “PIC” or capital that you put into the company - it also includes all the costs to ensure continuing operations of the product that enables that company. A comprehensive diligence including technical diligence will prepare the investor to make an informed business decision - know the risks and adjust the value proposition accordingly.
Technology Risk Areas
Technology risks can be grouped into four broad areas - Architecture, Process, Organization, and Security. Each area has several subordinate themes.
Architecture - subordinate themes are availability, scalability, cost control.
• Commodity hardware - Corollas, not Carreras
• Horizontal scalability - scale out, not up
• Design for monitoring - see issues before your customers do
• N+1 design - everything fails eventually
• Design for rollback - minimize the impairment
• Asynchronous design - stateless systems
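A short worked example shows why single points of failure and N+1 design matter for availability. The sketch below uses the standard formulas for components in series and for redundant copies, and it assumes failures are independent, which real systems only approximate. The function names and the 99% figure are illustrative, not taken from any assessment.

```python
from math import prod


def serial_availability(*components: float) -> float:
    """Availability of a chain where every component must be up."""
    return prod(components)


def redundant_availability(component: float, copies: int) -> float:
    """Availability when any one of several identical copies is enough.

    Assumes the copies fail independently of each other.
    """
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1 - (1 - component) ** copies


# A single 99% database is a single point of failure for the whole chain.
print(serial_availability(0.99, 0.99))

# Adding one redundant copy (N+1 with N = 1) of a 99% component.
print(redundant_availability(0.99, 2))
```

Chaining components lowers availability, while adding a redundant copy raises it, which is the arithmetic behind "everything fails eventually."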
Process - subordinate themes are engineering, operations, and problem management
• Product management - a product owner should be able to add, delay, or deprecate features from an upcoming release
• Metrics - development teams should use effort estimation and velocity measurement metrics to monitor progress and performance
• Development practices - developers should conduct code reviews and be held accountable for unit testing
• Incident management - incidents should be logged with sufficient details for further follow up
• Post mortem - a structured process should be in place to review significant problems, assign action items, and track resolution
• PDLC - the Product Development Lifecycle should align with the company’s desires to be customer driven (not desirable in most cases) or market driven (resulting in the highest returns and fastest saturation of any market)
Organization - subordinate themes are PDLC (Product Development Lifecycle) structure, product alignment and team composition
• Product or Service Alignment - cross functional teams should be aligned by product or service and understand how their efforts complement business goals
• Agile or Waterfall - if “discovering” the market or choosing the best possible product for a market then Agile is appropriate - if developing to well defined contracts then waterfall may be necessary.
• Team composition - the engineer to QA tester ratio should ideally exceed 3.5:1. Significant deviations may be a sign of trouble or a harbinger of problems to come
• Goals - measurable goals aligned with business priorities should be visible to all with clear accountability
Security - subordinate themes are framework, prevention, detection and response
• Framework - use NIST, ISO, PCI or other regulatory standards to establish the framework for a security program. The standards do overlap, so think it through and avoid duplication of effort.
• Policies in place - a sound security program will have multiple security related policies such as employee acceptable use, access controls, data classification, and an incident response plan.
• Security risk matrix - security risks should be graded by their impact, probability of occurrence, and controlling measures
• Business metrics - analysis of business metrics (revenue per minute, change of address, checkout value anomalies, file saves per minute, etc) can develop thresholds for alerting to a potential security incident. Over time, the analysis can inform prevention techniques.
• Response plan - a plan must be in place and must have regular rehearsals.
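The business metrics item above can be sketched as a simple threshold check: compare the current value of a metric against a baseline built from recent history and alert when it strays too far. This is only one possible approach, assuming a mean plus or minus k standard deviations rule; the function name, the sample numbers, and the default of k = 3 are hypothetical.

```python
from statistics import mean, stdev


def is_anomalous(history: list[float], current: float, k: float = 3.0) -> bool:
    """Flag a value more than k standard deviations from the recent mean."""
    if len(history) < 2:
        raise ValueError("need at least two historical samples")
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat history: any change at all is unusual.
        return current != baseline
    return abs(current - baseline) > k * spread


# Hypothetical "change of address requests per minute" samples.
recent = [100, 102, 98, 101, 99, 100]

print(is_anomalous(recent, 101))
print(is_anomalous(recent, 140))
```

A production version would likely account for time-of-day and seasonal patterns, but even a crude threshold like this gives the alerting a starting point that can be tuned over time.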
Technology Cost Impact on Investment Value
Technology costs can have a significant impact on the overall investment value. Strengths and weaknesses uncovered during a technical diligence effort help the investor make the best overall business decision.
Technology costs are normally captured in 2 areas of the income statement, cost of revenue (production environment and personnel) and operating expenses (software development). Technology costs can also affect depreciation (server capital purchases) and amortization (pre-paid licensing and support). These cost areas should be reviewed for unusual patterns or abnormally high or low spend rates. It is also important to understand the term of equipment purchase, software licensing, and support contracts - spend may be committed for several years.
Cost Cautions - tales from the past
• Support for production equipment purchased from a 3rd party because the equipment is old and no longer supported by the OEM. Use equipment as long as possible, but don’t risk a production outage.
• Constant software vendor license audits - they will find revenue, but the technology team that leaves their company vulnerable on a recurring basis is likely to have other significant issues.
• Lack of an RFP or benchmarking process to periodically assess the cost effectiveness of hardware, software, hosting, and support vendors. Making a change in one of these areas is not simple, but the technology team should know how much they should pay before a change is better for the company.
A technical diligence effort should also identify the level of technical debt and quantify the amount of engineering resources dedicated to servicing the technical debt.
Technical debt is a conscious choice to take a shortcut in the technology arena - the delta between the desired or intended way and the quicker way. The shortcut is usually taken for time to market reasons and is a sound business decision within reason. Technical debt is analogous in many ways to financial debt - a complete lack of it probably means missed business opportunities while an excess means disaster around the corner.
Just like financial debt, technical debt must be serviced, and it is serviced by the efforts of the engineering team - the same team developing the software. AKF recommends 12% to 25% of engineering effort be spent servicing technical debt. Whether that resource allocation keeps the debt static, reduces it, or allows it to grow depends upon the amount of technical debt. It is easy to see how a company delinquent in servicing their technical debt will have to increase the resource allocation to deal with it, reducing resources for product innovation and market responsiveness.
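The arithmetic behind that recommendation can be sketched in a few lines. The model below is a deliberate simplification and entirely an assumption of this example: it measures debt in engineer-days and compares the effort set aside each sprint with the new debt taken on in that sprint. The function names and the team figures are hypothetical.

```python
from math import isclose


def debt_capacity(engineers: int, days_per_sprint: int, share: float) -> float:
    """Engineer-days per sprint available for servicing technical debt."""
    if not 0 <= share <= 1:
        raise ValueError("share must be between 0 and 1")
    return engineers * days_per_sprint * share


def debt_trend(new_debt_per_sprint: float, engineers: int,
               days_per_sprint: int, share: float) -> str:
    """Classify whether the debt backlog grows, holds steady, or shrinks."""
    serviced = debt_capacity(engineers, days_per_sprint, share)
    if isclose(serviced, new_debt_per_sprint):
        return "static"
    return "shrinking" if serviced > new_debt_per_sprint else "growing"


# Hypothetical team: 10 engineers, 10 working days per sprint,
# taking on about 20 engineer-days of new shortcuts each sprint.
print(debt_trend(20, 10, 10, 0.12))
print(debt_trend(20, 10, 10, 0.25))
```

With these made-up numbers, the low end of the recommended range lets the backlog grow while the high end pays it down, which is why the right share depends on how much debt a team carries and keeps adding.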
Put It All Together
The investor has made use of several specialists in an overall diligence effort and is digesting the information to zero in on the choice to invest and at what price. The business side looks good - revenue growth, product strategy, and marketing are solid. The legal side has some risks relating to returning a leased office space to its original condition, but the lease has 5 years to run. Now for technology:
• Tech refresh is overdue, so additional investment is needed or a move to the cloud accelerated - either choice puts pressure on thin margins.
• An expensive RDBMS is in use, but the technology team avoids stored procedures and keeps their SQL as vanilla as possible - moving to open source is doable.
• Technical debt service is constantly derailed by feature requests from sales and marketing. Additional resources, hired or contracted, will be needed and will raise the technology run rate. More margin pressure.
• Conclusion - the investment needed to address tech refresh and technical debt changes the investment value. The investor lowers the offer price.
Interested in learning more about technical due diligence? Here are some due diligence do’s and don’ts.
How AKF can help
AKF has conducted hundreds of technical due diligence studies over the last 10 years. One would want an attorney for a legal diligence effort and one would want a technologist for a technical due diligence. AKF does technology right. Read more about our technical due diligence offerings here.
Your Site is as Important as the Product You Sell - Recent Example from Saddleback Leather
February 7, 2018 | Posted By: Pete Ferguson
If you have a premium product, at a premium price, it’s unlikely you would sell it out of a rundown, poorly lighted store that smells vaguely like stale meat. Yet somehow many of us forget to apply that same reasoning when it comes to selling our products online. The availability - and look and feel of your presence online - is your store front.
I’ve long been a fan of Saddleback Leather. However, their motto: “They’ll fight over it when you’re dead” fell short in January. You see, it’s hard for your family to fight over the thing that you can’t even purchase… Saddleback Leather had a completely foreseeable, and absolutely preventable outage. From Dave Munson, the CEO:
“I’ve always dreamt of one day having a really fast and easy website for you to enjoy. So, we decided to leave our slow and clunky old website and start building one on a new and different platform. The contract expired Dec. 30th, 2017, but the new site wasn’t fully ready yet. We flipped the switch anyways and all Gehenna broke loose. The super fast, fun and easy website… wasn’t fast, fun or easy and we wasted a ton of time and irritated the heck out of our favorite people. People couldn’t check out, set up accounts or even add stuff to their carts. So, we paid a ton of money to get our old slow and clunky back again until we get this new site just right.”
To make up for it, last week I received an apology letter sent by “El Presidente” Munson with an 11% off coupon: 11% because Munson recently celebrated 11 years of marriage to his wife, Suzzette. As a side note, it’s a perfect example of how to apologize to your customers when you screw up. This guy made a mistake, is paying for it by paying for his old site while continuing to develop the new, and is giving customers discounts with a coupon aptly titled: “IAMSORRY.”
Ironically, as a fan and customer, I don’t recall the old site being slow or terrible. On the contrary, when I visited early in January, their “new and improved” site felt clunky and disjointed. The wrong images were coming up for products and many items reported being “not available.”
In the world of environmental health and safety, “all accidents are preventable” is the holy grail of compliance. We believe that with the right forethought and planning, the same is true with virtually all products and storefronts online.
At AKF we are fond of saying “an accident is a terrible thing to waste.” While the exact details of what went wrong are not disclosed, the motives were:
- They took live a concept that presumably worked great in beta testing, without first testing it under full load.
- Munson made the decision to push out something that wasn’t yet great to save money by exiting a contract by the end of the year.
The result is lost sales while the site was down, lost customers who may have been trying the website for the first time and won’t be back, an 11% haircut on sales for the next week, and a fan base, many of whom have been very vocal on Facebook, expressing its dismay that a company it has counted on for unquestioned quality didn’t put quality first this time.
The days of customers quickly forgiving their favorite retailers for not being equally great online are waning. Make sure you have a solid strategy and the right expertise in your corner whenever a change could greatly affect your customers’ ability to purchase or interact with your product.
Experiencing growing pains? AKF is here to help! We are an industry expert in technology scalability and due diligence. Put our 200+ years of combined experience to work for you today!
There Are Always Plenty of Incidents from Which To Learn
January 13, 2018 | Posted By: Dave Swenson
Sorry, False Alarm…
On January 13, 2018, what felt like an episode of Netflix’s “Black Mirror” unfolded in real life. Just after 8 in the morning, residents and visitors of Hawaii were woken up to the following startling push notification:
Thankfully, the notification was a false alarm, finally retracted with a second notification nearly 40 interminable minutes later.
The amazing, poignant and sobering stories that emerged from those 40 minutes included people:
- determining which children to spend their last minutes with,
- abandoning their cars on streets,
- sheltering in a lava tube,
- believing and acting as we all would if we believed the end was here.
Unfortunately, this wasn’t a Black Mirror episode, and it paralyzed an entire state’s population. Thankfully, the alarm was a false one.
A Muted President
As President Trump took office, he introduced a new means for a President to reach his constituents—Twitter, averaging 6 to 7 tweets per day during his first year. On November 2, 2017, many bots that were created to closely monitor the tweets of @realDonaldTrump started reporting that the account no longer existed. Clicking to his account took the user to the above error page.
For a deafening 11 minutes, the nation was unable to listen to its leader, at least via Twitter.
The Hawaiian false alarm was sent by the state’s Emergency Management Agency. Their explanation of the incident was that during a shift change, an employee clicked “the wrong button” while running a missile crisis test, then subsequently clicked through a confirmation prompt (“Are you sure you want to tell 1.5 million people this?”).
Twitter employees had reportedly tried for years to get management attention on ensuring accounts weren’t deleted without proper vetting. The company typically used contractors in the Philippines and Singapore to handle such account administration; Trump’s account was deleted by a German contract worker on his last day at Twitter. Acting on yet another complaint about Trump, and believing such an important account couldn’t actually be suspended, the worker clicked the suspend button as his last action for Twitter and then walked out of the building, causing the Twitterverse to read far more into the account’s disappearance than it should have.
In both of these situations, the immediate focus was on the personnel involved in the incident. “Who pushed the button?” is typically one of the initial questions. Assumptions that a new employee or a rogue worker was behind the incident are common, and both the motive and the intelligence of everyone involved come under inspection.
We at AKF Partners constantly preach “An incident is a terrible thing to waste”. Events such as these warp the known reality into “How the shit can that happen??”, causing enough alarm to warrant special attention and focus, if not panic. Yet, all too often we see teams searching frantically to find any cause, blame the most obvious, immediate factor, declare victory, and move on.
“Who pushed the button?” is only one of many questions.
Toyota’s Taiichi Ohno, the father of Lean Manufacturing, recognized his team’s habit of accepting the most apparent cause and ignoring (wasting) the other elements revealed by an incident, which left the door open for it to be repeated. Ohno (the person, not the exclamation typically uttered during an incident) emphasized the importance of asking “5 Whys” in order to move beyond the most obvious explanation (and accompanying blame) and peel the onion, diving deeper into contributory causes.
Questions beyond the reflexive “What happened?” and “Who did it?” relevant to the false alarm and erroneous account deletion incidents include:
- Why did the system act differently than the individual expected (is there more training required, is the user interface a confusing one)?
- Why did it take so long to correct (is there no playbook for detecting / reversing such a message or key account activity)?
- Why does the system allow such an impactful event to be performed unilaterally, by a single person (what safeguards should exist requiring more than one set of hands)?
- Why does this particular person have such authorization to perform this action (should a non-employee have the ability to delete such a verified, popular and influential account)?
- Why was the possibility of this incident not anticipated and prevented (why were Twitter employee requests for better safeguards ignored for years, why wasn’t the ease of making such a mistake recognized and what other similar mistake opportunities are there)?
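The “more than one set of hands” safeguard raised in the questions above can be sketched in code. This is only a minimal illustration of a two-person rule, not a description of how either the Hawaii alert system or Twitter’s admin tooling works; the class name `HighImpactAction`, its fields, and the operator names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class HighImpactAction:
    """A pending action (e.g. a mass alert) that needs sign-off before it runs.

    Hypothetical sketch: names and fields are illustrative assumptions.
    """
    description: str
    requested_by: str
    required_approvals: int = 1  # approvers needed in addition to the requester
    approvals: set = field(default_factory=set)

    def approve(self, approver: str) -> None:
        # The requester may not approve their own request.
        if approver == self.requested_by:
            raise PermissionError("requester cannot approve their own action")
        self.approvals.add(approver)

    def can_execute(self) -> bool:
        return len(self.approvals) >= self.required_approvals

    def execute(self) -> str:
        # Refuse to run until enough independent people have signed off.
        if not self.can_execute():
            raise PermissionError("not enough independent approvals")
        return f"EXECUTED: {self.description}"


action = HighImpactAction("send statewide alert", requested_by="operator_a")
action.approve("supervisor_b")   # a second person confirms
print(action.execute())
```

The design point is that a confirmation prompt shown to the same person is not a second set of hands; the check here only passes when someone other than the requester approves.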
Both of these incidents have had an impact far beyond those directly affected (Hawaiian inhabitants or Trump’s Twitter followers), and have shed light on the need to recognize that the world has changed and that the policies and practices of old might not be enough for today. The ballistic missile false alarm revealed that more controls need to be placed on all mass communication, but also that Hawaii (or anywhere/anyone else) is extremely unprepared for the unthinkable. The use of Twitter as a channel for the President now raises questions about its validity as a Presidential record, about who should control such a channel, and about what security surrounds the President’s account.
Ask 5 Whys, look beyond the immediate impact to find collateral learnings, and take notice of all that an incident can reveal.
AKF Partners have been brought in by over 400 companies to avoid such incidents, and when they do occur, to learn from them. Let us help you.
Hosting Lessons from Harvey and Irma
September 19, 2017 | Posted By: Greg Fennewald
Everyone was saddened to see the horrific destruction the storms caused in Houston and Florida, including deaths and extensive property damage. It seems reasonable that the impact of these hurricanes was lessened by advance notice and preparation – stockpiling supplies, evacuating the highest risk areas, and staging response resources to assist with recovery and rebuilding.
Data centers operate every day with a similar preparation mindset: diesel generators to provide power should the utility fail, batteries to keep servers running during a transition, potentially stored water or a well to replace municipal water service for cooling systems, and food and water for personnel unable to leave the location.
What happens when a “prepared” location such as a data center encounters a hurricane with strong winds, heavy rain, and extensive flooding? In some cases the data center survives without impact, although elsewhere there certainly will be outages and failures. Examples of data centers surviving Harvey in good shape can be seen here, while accounts of the service impacts caused by Hurricane Sandy can be seen here.
Data Center Points of Failure
Let’s examine what may enable a data center to survive without functional impact. Extensive risk investigation goes into site selection for data centers. Data centers are expensive to build with costs measured in the tens or even hundreds of millions of dollars. The potential business impact of a failure can be costly with liquidated damage clauses in hosting contracts. These factors lead to data centers being located outside of flood plains, away from hazardous material routes, and stoutly constructed to endure storm winds likely in the region.
Losing utility power is regarded as a “when” not an “if” in the data center industry (be that an outage or a planned maintenance activity), and diesel generators are a common solution, often with 24 hours or more of fuel on hand and multiple replenishment contracts. Data centers can survive for days or weeks without utility power, and in some cases for months. How could flooding impact power? The service entrance for a data center, where the utility power is routed, is often buried underground. Utility power is likely to be lost during flooding, either from flood damage or from intentional action to prevent damage by shutting down the local grid. A data center would operate on generator power if the data center itself is not flooded, although fuel replenishment is not likely. If there are two feet of water in the main electrical room(s), the data center is going dark.
Many large data centers rely on evaporating water to cool the servers they host. Evaporative cooling is generally more energy efficient than other options, but introduces an additional risk to operations – water supply. In many locations, municipal water pressure is lost during an extended power outage. Data centers can mitigate this risk by using water storage tanks or water wells onsite. As with diesel generators, this lets the data center operate normally for hours or days without municipal water. Suppose a data center is outside the flood plain, can operate without utility power or municipal water for hours or days, and is structurally strong enough to handle the winds of a major storm. Is there any other risk to mitigate? Yes: network connectivity and bandwidth.
Most data centers need to communicate with other data centers to fulfill their OLAP or OLTP purpose. Without connectivity, services are not available. Stored data should be safe, but it grows increasingly stale, and transactions and traffic stop. Like utility power, network connections are usually buried. With distance and geographic limitations involved, network pathways may get flooded, as may the facilities that aggregate and transmit the data. Telecom facilities generally have generators and other availability measures, but can be forced into less advantageous locations and may have a shorter runtime standard than a data center.
Data centers that are serious about availability generally have carrier diversity and physical pathway diversity to mitigate carrier outages and “backhoe fades”. This may help in the event of widespread flooding as well. The reality is a data center without connectivity is generally useless. All the risk mitigation going into structural design, power and cooling redundancy, and fire protection is moot if connectivity fails.
Preparing for the Inevitable
The best way to mitigate these risks is to not rely on a single data center location. One is none and two is one. Owned, colo, managed hosting, or cloud – be able to survive the loss of a single location. The RTO and RPO of the business will guide the choice of active-active, hot-cold, or data backup with an elastic compute response plan. Hurricanes can cause regional impact, such as Irma disrupting most of Florida. In years past, many companies decided to have two data centers within 20 miles of each other to support synchronous database replication: for example, a primary site in one borough of New York City and the DR site in a different borough. Replication options and database management techniques have advanced sufficiently to allow far greater dispersion today. Avoid a regionally impacting event by choosing data centers in diverse regions.
Operating from 3 locations can be cheaper than 2, and can also improve customer satisfaction with reduced response times produced by serving customers from the nearest location. See Rule 12 in Scalability Rules. The ability to operate from multiple locations also enables a choice to adjust the redundancy of those locations. A combination of Tier II and III locations may be a more economical choice than a pair of Tier IV locations.
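One way to see why three locations can cost less than two is to count the capacity each layout must provision to survive the loss of any single site. The sketch below is a simplified model under stated assumptions, not a calculation taken from Scalability Rules: load is spread evenly across active sites, the surviving sites absorb the failed site’s share, and only serving capacity is counted (data replication and network costs are ignored). The function name `total_capacity_needed` is made up for this example.

```python
def total_capacity_needed(sites: int, peak_load: float = 1.0) -> float:
    """Total capacity to provision so peak load survives losing one site.

    Assumes evenly spread load and that the remaining (sites - 1)
    locations must carry the full peak after a failure.
    """
    if sites < 2:
        raise ValueError("need at least two sites to survive the loss of one")
    per_site = peak_load / (sites - 1)  # each survivor's share after a failure
    return per_site * sites


for n in (2, 3, 4):
    # Expressed as a percentage of peak load.
    print(f"{n} sites: {total_capacity_needed(n):.0%} of peak capacity")
# 2 sites: 200% of peak capacity
# 3 sites: 150% of peak capacity
# 4 sites: 133% of peak capacity
```

Under these assumptions, two sites each need to be sized for the full load, while three active sites each need only half of it. Whether that translates into real savings depends on the costs this model leaves out.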
Developing a hosting plan can be complicated and frustrating, particularly since the core competency of your business is likely not data centers. AKF Partners can help – not only with hosting strategy, but also the product architecture and operational processes needed to weld infrastructure, architecture, and process into a seamless vehicle that delivers services to your clients with availability the market demands.
Hurricanes aren’t the only disasters that can take down your data center. Solar flares, runaway SUVs, civil disruption, tornadoes and localized power outages have all caused data centers to fail. Natural disasters of all types trail equipment failures and human error as causes of service impacting events (source: 365DataCenters). According to FEMA, 40% of businesses that close due to a disaster don’t reopen, and of those that do only 29% are in business two years after the disaster (source: FEMA). Don’t be a statistic. AKF Partners can help you with the product architecture and data center planning necessary to survive nearly any disaster.
Reach out to AKF