March 19, 2019 | Posted By: Marty Abbott
Tim Berners-Lee and his colleagues at CERN, the IETF, and the W3C all understood the value of being stateless when they developed the Hypertext Transfer Protocol. Stateless systems are more resilient to multiple failure types, as no transaction needs information regarding the previous transaction. It’s as if each transaction is the first (and last) of its type.
First let’s quickly review three different types of state. This overview is meant to be broad and shallow. Certain state types (such as the notion of View state in .Net development) are not covered.
The Penalty (or Cost) of State
State costs us in multiple ways. State unique to a user interaction, or session state, requires memory. The larger the state, the greater the memory requirement, the higher the cost of each server, and the greater the number of servers we need. As cost of goods sold increases, margins decrease. Further, that state either needs to be replicated for high availability, at additional cost, or we face the cost of user dissatisfaction with discrete component and ultimately session failures.
When application state is maintained, the cost of failure is high: we either pay the price of replicating that state or we lose it, negatively impacting customer experience. As application state grows, so do the memory requirements and associated costs of the server upon which it runs. At high scale, that means more servers, greater costs, and lower gross margins. In many cases, we simply have no choice but to allow application state. Interpreters and Java virtual machines need memory. Most applications also require information regarding their overall transactions distinct from those of users. As such, our goal here is not to eliminate application state but rather to minimize it where possible.
When connection state is maintained, cost increases as more servers are required to service the same number of requests. Failures become more common as the failure probability increases with the duration of any connection over distance.
Our ideal outcome is to eliminate session state, minimize application state and eliminate connection state.
But What if I Really, Really, Really Need State?
Our experience is that simply saying “No” once or twice will force an engineer to find an innovative way to eliminate state. Another interesting approach is to challenge an engineer with a statement like “Huh, I heard the engineers at XYZ company figured out how to do this…”. Engineers hate to feel like another engineer is better than them…
We also recognize however that the complete elimination of state isn’t possible. Here are three examples (not meant to be all inclusive) of when we believe the principle of stateless systems should be violated:
Shopping carts need state to work. Information regarding a past transaction – add_to_cart, for instance – needs to be held somewhere prior to check_out. Given that we need state, it’s just a question of where to store it. Cookies are one good place. Distributed object caches are another. Passing state through the URL in HTTP GET methods is a third. A final solution is to store it in a database.
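The cookie option can be sketched in a few lines. This is a minimal illustration with made-up helper names (`encode_cart_cookie` / `decode_cart_cookie`), not a production recipe – a real implementation should also sign the payload (e.g. with an HMAC) so the client cannot tamper with quantities or prices:

```python
import base64
import json

def encode_cart_cookie(cart_items):
    """Serialize cart state into a compact string suitable for a cookie."""
    payload = json.dumps(cart_items, separators=(",", ":"))
    return base64.urlsafe_b64encode(payload.encode()).decode()

def decode_cart_cookie(cookie_value):
    """Recover cart state from the cookie string on the next request."""
    payload = base64.urlsafe_b64decode(cookie_value.encode()).decode()
    return json.loads(payload)

# The server holds nothing between requests; the client carries the state.
cart = [{"sku": "A123", "qty": 2}, {"sku": "B456", "qty": 1}]
cookie = encode_cart_cookie(cart)
assert decode_cart_cookie(cookie) == cart
```

The appeal of this approach is exactly the statelessness argument above: any server can handle any request, because the state travels with the request.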
No sane person wants to wrap debits and credits across distributed servers in a single, two-phase commit transaction. Banks have had a solution for this for years – the eventually consistent account transaction. Using a tiny workflow or state machine, debit in one transaction and eventually (ideally quickly) credit in a second transaction. That brings us to the notion of workflows and state machines in general.
What good is a state machine if it can’t maintain state? Whether application state or session state, the notion of state is critical to the success of each solution. Workflow systems are a very specific implementation of a state machine and as such require state. The trick with these is simply to ensure that the memory used for state is “just enough”. Govern against ever increasing session or application state size.
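The debit-then-credit workflow above can be sketched as a tiny state machine whose state is deliberately “just enough”: one label plus the fields needed to resume, and nothing that grows over time. The class and field names here are illustrative assumptions, not a reference implementation:

```python
class DebitCreditWorkflow:
    """Minimal state machine for an eventually consistent debit/credit pair."""

    # Legal transitions only; anything else is rejected.
    TRANSITIONS = {
        "pending": {"debited"},
        "debited": {"credited"},
        "credited": set(),  # terminal state
    }

    def __init__(self, from_account, to_account, amount):
        self.from_account = from_account
        self.to_account = to_account
        self.amount = amount
        self.state = "pending"

    def _advance(self, new_state):
        if new_state not in self.TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state

    def apply_debit(self, ledger):
        self._advance("debited")          # guard first, then mutate
        ledger[self.from_account] -= self.amount

    def apply_credit(self, ledger):
        self._advance("credited")
        ledger[self.to_account] += self.amount

ledger = {"alice": 100, "bob": 0}
wf = DebitCreditWorkflow("alice", "bob", 25)
wf.apply_debit(ledger)    # first, independent transaction
wf.apply_credit(ledger)   # second transaction, eventually (ideally quickly)
assert ledger == {"alice": 75, "bob": 25} and wf.state == "credited"
```

The whole workflow state is four small fields; governing against growth means keeping it that way.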
This brings us to the newest cube model in the AKF model repository:
The Session State Cube
The AKF State Cube is useful both for thinking through how to achieve the best possible state posture, and for evaluating how well we are doing against an aspirational goal (top right corner) of “Stateless”.
The X axis describes size of state. It moves from very large (XL) state size to the ideal position of zero size, or “No State”. Very large state size suffers from higher cost, higher impact upon failure, and higher probability of failure.
The Y axis describes the degree of distribution of state. The worst position, lower left, is where state is a singleton. While we prefer not to have state, having only one copy of it leaves us open to large – and difficult to recover from – failures and dissatisfied customers. Imagine nearly completing your taxes only to have a crash wipe out all of your work! Ughh!
Progressing vertically along the Y axis, the singleton state object in the lower left is replicated into N copies of that state for high availability. While resolving the recovery and failure issues, performing replication is costly both in extra memory and network transit. This is an option we hope to avoid for cost reasons.
Following replication are several methods of distribution in increasing order of value. Segmenting the data by some value “N” has increasing value as N increases. When N is 2, a failure of state impacts 50% of our customers. When N is 100, only 1% of our customers suffer from a state failure. Ideally, state is also “rebuildable” if we have properly scattered state segments by a shard key – allowing customers to only have to re-complete a portion of their past work.
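The blast-radius arithmetic can be made concrete with a stable shard key. This sketch (the `shard_for` helper is a made-up name) hashes a customer id into one of N segments; the failure of a single segment then touches roughly 1/N of customers:

```python
import hashlib

def shard_for(customer_id, n_shards):
    """Map a customer to one of n_shards state segments via a stable hash."""
    digest = hashlib.sha256(str(customer_id).encode()).hexdigest()
    return int(digest, 16) % n_shards

# With N = 100 shards, losing one shard strands roughly 1% of customers.
n = 100
affected = sum(1 for c in range(10_000) if shard_for(c, n) == 7)
# 'affected' lands close to 10_000 / 100, i.e. about 1% of the population
```

Because the key is deterministic, the same customer always lands on the same segment, which is also what makes segmented state “rebuildable” per customer after a failure.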
Finally, of course, we hope to have “no state” (think of this as division by infinite segmentation approaching zero on this axis).
The Z Axis describes where we position state “physically”.
The worst location is “on the same server as the application”. While necessary for application state, placing session data on a server co-resident with the application using it means any application fault also destroys session state, doubling the impact of the failure. There are better places to locate state, and better solutions than your application to maintain it.
A costly, but better solution from an impact perspective is to place state within your favorite database. To keep costs low, this could be an open source SQL or NoSQL database. But remember to replicate it for high availability.
A less costly solution is to place state in an object cache, off server from the application. Ideally this cache is distributed per the Y axis.
The least costly solution is to have the client (browser or mobile app) maintain state. Use a cookie, pass the state through a GET method, etc.
Finally, of course the best solution is that it is kept “nowhere” because we have no state.
The AKF State Cube serves two purposes:
- Prescriptive: It helps to guide your team to the aspirational goal of “stateless”. Where stateless isn’t possible, choose the X, Y and Z axis positions closest to the notion of no state to achieve a low cost, highly available solution for your minimized state needs.
- Descriptive: The model helps you evaluate numerically how you are performing with respect to stateless initiatives on a per application/service basis. Use the guide on the right side of the model to evaluate component state on a scale of 1 to 10.
AKF Partners helps companies develop world class, low cost of operations, fast time to market, stateless solutions every day. Give us a call! We can help!
March 19, 2019 | Posted By: Pete Ferguson
The Crippling Cost of Distractions
Perhaps the biggest thief of progress is the misuse of time. I’m not talking about laziness here – I’m referring to what we often see with our clients: losing focus on what matters most.
Young startups are able to accomplish great things in a very short amount of time because there is rarely anything else to distract them from achieving success and everyone is focused on a shared outcome. Any money they have will likely be quickly exhausted. So food, sleep, vacations, and a life are all put on hold - or at least on the back burner - until products are built, tested, released, and adopted.
As a corporate survivor of 18 years at eBay, I’ve seen a few common themes within my own experience that are also exhibited at other companies where I’ve been involved in technical due diligence, three-day technical architectural workshops, and longer term engagements where I’ve been embedded as one of several technical resources. These distractions include:
- Post-acquisition integration of companies
- Supporting legacy applications at the cost of new innovation
Often M&A teams are highly focused on the short term – what can be accomplished this year or this quarter.
I’ve been involved in many due diligence activities as an internal consultant and as a third-party conducting an external technical due diligence. There is a lot of pressure on the M&A team to sell the deal, and in my experience I’ve seen the positives inflated and the negatives – well, negated, to get the deal done.
Often the full impact on the day-to-day operations team is either not considered or underplayed, and estimates of how much work that team can realistically complete week over week are optimistically inflated.
Acquisition Distractions Development is Required to Resolve:
- Integration of Delivery (How to ship, security requirements, etc.)
- Process (Agile vs. Waterfall, what flavor of Agile, etc.)
- Product (Sharing of technologies and features)
An older example, but one with renewed relevance given the recent focus of another activist investor, is the story of eBay’s acquisition of Skype in Q4 of 2005. For shared resources within eBay, this was a huge distraction, with already overly leveraged operations teams having to take on additional tasks. The distraction from eBay’s core mission was the true tragedy, as it lost focus and opened a window for Amazon to grow 600% over the next 7 years (during one of the worst markets since the Great Depression) while eBay struggled to rebound its stock price. When eBay reemerged, Amazon had become a market leader in online commerce and has since pulled significantly ahead in many additional markets.
In our technical due diligences and three-day architectural workshops, we see how companies that were once lean and mean quickly become overwhelmed and distracted from innovation by trying to integrate a flood of acquisitions.
At winning companies, we see that M&A teams carefully evaluate:
- Whether the acquisition will bring immediate ROI (PayPal immediately impacted eBay’s stock price positively and sustained that impact for several years, where Skype did the opposite)
- How much effort integrating the acquisition will require, and whether enough resources are built into the acquisition cost to accomplish a speedy induction
- How well the two companies’ cultures meld together
From an internal perspective in my time at eBay, the value Skype would bring wasn’t apparent, and externally most analysts were similarly scratching their heads. For Amazon, it was a godsend, as it gave them an opportunity to bear down, focus, and take a lot of market share from a sleeping, distracted giant.
Supporting Legacy Applications at the Cost of New Innovation
Legacy applications can keep a business from pursuing new opportunities. We have seen multiple times where a company that was once a leader and innovator became too sales-focused and started squeezing every last ounce of profitability from its legacy products. As a result, a large percentage of top developers were focused on building new features for a declining market, allowing a window of opportunity for new startups and competitors to edge in.
Top of mind examples are Workday’s stealing of market share from SAP before SAP purchased SuccessFactors, or Apple’s edging into PC sales as it created a larger ecosystem of phones, tablets, laptops, TV boxes, etc.
Google is one of the best examples of a willingness to dump legacy products - even successful ones - when support and maintenance are not keeping up with ROI or are cutting into areas where it can innovate faster (Froogle, Google+, and the capping of Chromebook and Pixel phone support at 5 years).
Reasons to sunset legacy products in favor of focusing efforts on developing the “next big thing:”
- Rising Maintenance and Talent Costs – Cost and scope creep happen suddenly and often are much greater than companies are willing to measure or to admit. One of the more informative questions we always ask is: “if you were to build a new product, what would it look like?” or “if you were to go out to market today, would you develop a new product the same way?” The answers are usually very revealing and conclusive - the legacy products have run their course.
- Longer Term Profitability – It takes a bold move to suspend the current cash cow in favor of focusing “all hands on deck” to develop the next big idea and future cash cow. Even when the potential future could be a 3, 4, or 5X revenue generator, companies get short-sighted and are only focused on this quarter’s revenue.
- Lost Opportunity Cost – Put aside your current quarterly projections for sales of legacy products: what is your current and potential competition building today that could greatly diminish your company in just a few years?
To compete with young, well-funded, high-octane startups, legacy companies need to create a labs environment to attract top developers and free them from the daily distractions of supporting legacy software, so they can remain focused on emerging products and markets.
To succeed, teams must be focused on the highest possible ROI.
- Ensure acquisitions have been properly evaluated for the amount of time and energy it will take to integrate their products and that the cultures of your company and that being acquired are compatible.
- Supporting legacy applications is important for ongoing profitability, but likely can also be done by teams with less experience and expertise. What is often not factored properly is the opportunity cost of keeping legacy products in play long past their shelf life.
- Keep your A-teams focused on new product development, not on keeping the lights on. Create a separate Labs organization, treat them like a startup, and free them from legacy decisions on infrastructure, coding languages, etc. Your competition is starting with a fresh slate; you should too.
We’ve helped many startups and Fortune 500 companies make the transition from legacy software and hardware to SaaS and cloud infrastructure to increase scalability while providing high availability. Let us help your organization!
(Photo Credit: Matthew Henry downloaded from Burst.Shopify.com)
March 15, 2019 | Posted By: Marty Abbott
I’m no Nostradamus when it comes to predicting the future of technology, but some trends are just too blatantly obvious to ignore. Unfortunately, they are only easy to spot if you have a job where you are allowed (I might argue required) to observe broader industry trends. AKF Partners must do that on behalf of our clients as our clients are just too busy fighting the day-to-day battles of their individual businesses.
One such very concerning probability is the eventual decline – and one day potentially the elimination of – the colocation (hosting) business. Make no mistake about it – if you lease space from a colocation provider, the probability is high that your business will need to move locations, move providers, or experience a service disruption soon.
Let’s walk through the factors and trends that indicate, at least to me, that the industry is in trouble, and that your business faces considerable risks:
Sources of Demand for Colocation (Macro)
Broadly speaking, the colocation industry was built on the backs of young companies needing to lease space for compute, storage, and the like. As time progressed, more established companies started to augment privately-owned data centers with colocation facilities to avoid the burden of large assets (buildings, capital improvements and in some cases even servers) on their balance sheets.
The first source of demand, small companies, has largely dried up for colocation facilities. Small companies seek to be “asset light” and most frequently start their businesses running on Infrastructure as a Service (IaaS) providers (AWS, GCP, Azure, etc.). The ease and flexibility of these providers enable faster time to market and easier operational configuration of systems. Platform as a Service (PaaS) offerings in many cases eliminate the need for specialized infrastructure and DevOps skill sets, allowing small companies to focus limited funds on software engineers that will help create differentiating experiences and capabilities. Five years ago, successful startups may have started migrating into colocation facilities to lower costs of goods sold (COGS) for their products, and in so doing increase gross margin (GM). While this is still an opportunity for many successful companies, few seem to take advantage of it. Whether due to vendor lock-in through PaaS services or a preference for speed and flexibility over expenses, these companies tend to stay with their IaaS provider.
Larger, more established companies continue to use colocation facilities to augment privately-owned data centers. That said, in most cases technology refresh results in faster and more efficient compute. When the rate of compute increases faster than the rate of growth in transactions and revenue within these companies, they start to collapse the infrastructure assets back into wholly-owned facilities (assuming power, space, and cooling of the facilities are not constraints). Bringing assets back in-house to owned facilities lowers costs of goods sold as the company makes more efficient use of existing assets.
Simultaneously these larger firms also seek the flexibility and elasticity of IaaS services. Where they have new demand for new solutions, or as companies embark upon a digital transformation strategy, they often do so leveraging IaaS.
The result of these forces across the spectrum of small to large firms reduces overall demand. Reduced demand means a contraction in the colocation industry overall.
Minimum Efficient Scale and the Colocation Industry (Micro)
Data centers are essentially factories. To achieve optimum profitability, fixed costs such as the facility itself, and the associated taxes, must be spread across the largest possible units of production. In the case of data centers, this means achieving maximum utilization of the constraining factors (space, power, and cooling capacity) across the largest possible revenue base. Maximizing utilization against the aforementioned constraints drops the LRAC (long run average cost) as fixed costs are spread across a larger number of paying customers. This is the notion of Minimum Efficient Scale in economics.
As demand decreases on a per data center (colocation facility) basis, fixed costs per customer increase. This is because less space is used and the cost of the facility is allocated across fewer customers. At some point, on a per data center basis, the facility becomes unprofitable. As profits dwindle across the enterprise, and as debt service on the facilities becomes more difficult, the colocation provider is forced to shut down data centers and consolidate customers. Assets are sold or leases terminated with the appropriate termination penalties.
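The arithmetic behind that squeeze is simple enough to sketch. The dollar figures below are invented purely for illustration:

```python
def cost_per_customer(fixed_cost, variable_cost_per_customer, customers):
    """Average cost per customer: fixed facility costs spread over the
    customer base, plus the per-customer variable cost."""
    return fixed_cost / customers + variable_cost_per_customer

# Hypothetical facility: $2M/year in fixed costs, $500/customer variable.
full = cost_per_customer(2_000_000, 500, 4_000)  # $1,000 at full utilization
half = cost_per_customer(2_000_000, 500, 2_000)  # $1,500 after half depart
```

Halving the customer base raises the average cost per remaining customer by 50% in this example; the fixed costs do not shrink with demand, which is exactly why under-utilized facilities become unprofitable and get consolidated.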
Customers who wish to remain with a provider are forced to relocate. This in turn causes customers to reconsider colocation facilities, and somewhere between a handful and a majority on a per location basis will decide to move to IaaS instead. Thus begins a vicious cycle of data center shutdowns engendering ever-decreasing demand for colocation facilities.
Excluding other macroeconomic or secular events like another real estate collapse, smaller providers start to exit the colocation service industry. Larger providers benefit from the exit of smaller players and the remaining data centers benefit from increased demand on a dwindling supply, allowing those providers to regain MES and profitability.
Does the Trend Stop at a Smaller Industry?
We are likely to continue to see the colocation industry exist for quite some time – but it will get increasingly smaller. The consolidation of providers and dwindling supply of facilities will stop at some point, but just for a period. Those that remain in colocation facilities will either not have the means or the will to move. In some cases, a lack of skills within the remaining companies will keep them “locked into” a colocation facility. In other cases, competing priorities will keep an exit on the distant horizon. These “lock in” factors will give rise to an opportunity for the colocation industry to increase pricing for a time.
But make no mistake about it, customers will continue to leave – just at a decreased rate relative to today’s departures. Some companies will simply go out of business or contract in size and depart the data centers. Others will finally decide that the increasing cost of service is too high.
While it’s doubtful that the industry will go away in its entirety, it will be small and comparatively expensive. The difference between costs of colocation and costs to run in an IaaS solution will start to dwindle.
Risks to Your Firm
The risk to your firm comes in three forms, listed in increasing order of risk as measured by a function of probability of occurrence and impact upon occurrence:
- Pricing of service per facility. If you are lucky enough that your facility does not close, there is a high probability that your cost for service will increase. This in turn increases your cost of goods sold and decreases your gross margin.
- Risk of facility dissolution. There exists an increasingly high probability that the facilities in which you are located will be shut down. While you are likely to be given some advance notice, you will be required to move to another facility with the same provider, or another provider. There is both a real cost in the move, and an opportunity cost associated with service interruption and effort.
- Risk of firm as a going concern. Some providers of colocation services will simply exit the business. In some cases, you may be given very little notice as in the case of a company filing bankruptcy. Service interruption risk is high.
Strategies You Must Employ Today
In our view, you have no choice but to ensure that you are ready and able to easily move out of colocation facilities. Whether this be to existing data centers you own, IaaS providers, or a mix matters not. At the very least, we suggest your development and operations processes enable the following principles:
- Environment Agnosticism: Ensure that you can run in owned, leased, managed service, or IaaS locations. Ensuring consistency in deployment platforms, using container technologies, and employing orchestration systems all aid in this endeavor.
- Hybrid Hosting: Operate out of at least two of the following three options as a course of normal business operations: owned data centers, leased/colocation facilities, IaaS.
- Dynamic Allocation of Demand: Prove on at least a weekly basis that you can operate any functionality within your product out of any location you operate – especially those that happen to be located within colocation facilities.
AKF Partners helps companies think through technology, process, organization, location, and hosting strategies. Let us help you architect a hybrid hosting solution that limits your risk to any single provider.
March 6, 2019 | Posted By: Marty Abbott
Our typical assessment goes something like this: We spend 1.75 days with an energized product team comprised of engineers and product managers. We feel the passion and engagement of the team, and we see the signs of stress the team endures in trying to meet product delivery schedules. Then we meet the security person. The person is not very stressed, does not have delivery goals and seems to steal the energy from the room.
This is an angry post. I won’t apologize for that. I’m fed up with the ridiculous way that most CISOs approach security, and you should be too. The typical approach, in more than 80% of the companies with which we work, results in slow time to market, increased response time for transactions, higher than necessary cost, lower than appropriate availability, and no demonstrable difference in the level of security related incidents. Put another way, most CISOs reduce rather than increase shareholder value.
Here is a handy tool to identify value-destroying CISOs. We’ve compiled 5 common statements uttered by CISOs out of touch with the needs of the corporation, customers, and shareholders. Each of these assumes that the CISO both believes the statement and acts consistently with it (a high-probability assumption). Each statement is followed by why it is bad, what it is costing you, and what (besides replacing the person) you should do.
“No, we can’t do that”
Wrong answer. The purpose of security is to help move the company towards the right business outcome, as quickly as possible, with the right level of risk for the endeavor. This means that in some cases, where the probability and impact of compromise are low, we simply do not apply much “security” to the solution. In other cases, where probability and impact are high, we put measures in place to reduce both.
We never say “No” to an ethical outcome. Rather than saying “No” to a path, we attempt to ensure the path includes the right level of risk adjustment to make it successful.
The right answer: “That may work if we make a few modifications to help reduce the following probability of an incident, and reduce the impact of an incident should it occur. Here’s how my team can help you”.
“My job is to keep us out of the paper”
Incomplete, and as a result, incorrect answer. The role of security is to ethically maximize profits, by ensuring that risk is commensurate with the endeavor. A great security team helps decrease the probability of incidents and decrease the impact of an incident should one arise. This in turn helps ensure that profits achieve an appropriate level. “Keeping us out of the paper” makes no reference to the fiduciary responsibility of providing returns to shareholders. It’s further not a path to that responsibility, as there is no tie to enabling or maintaining profitability. Hell, if you want to achieve this goal, all you have to do is go out of business!
The right answer: “My job is to ensure an appropriate risk approach to stakeholder return – specifically through helping us to achieve an appropriate risk posture for our initiatives that meets our time to market, revenue and profitability objectives”.
“We have to work the process” or “Put in a request – we’ll review it”
Wrong answer. Security isn’t a “team” in and of itself because it can’t “score” and “win”. Security is part of a larger team responsible for delighting end customers such that we can ensure appropriate profitability through superior and appropriately secure offerings. To that end, security needs to adopt an agile mindset – specifically “individual interactions over processes and tools” and be embedded within the value creation teams that are the lifeblood of a company. Further, product and operational teams need to “own” and be accountable for the risk associated with the solutions they create and maintain. Software and servers need to be secure consistent with the needs of the business and end users.
The right answer: “Let’s get together immediately and make this work. Furthermore, how could I have ensured we had folks involved earlier to help you get this out faster?”
“My job is governance” or “We need the right governance model”
Wrong answer. The implication of the above statement is that the value the security team provides is in judging the work of others and ensuring compliance. The best security teams understand that compliance is best achieved through embedding themselves within product teams – not sitting in judgement of them during the process. The fastest and highest value creating teams are those that understand and have the right tools to accomplish the necessary outcomes embedded within their teams (read the related white paper here).
The right answer: “We embed people in teams to get the right answer quickly and get the product to market faster. Good governance happens in real time and in many cases can be automated within the product development lifecycle (CI/CD pipelines, for instance).”
“We have to slow things down a bit”
Wrong answer. If you have a compelling growth business with a big vision, you are going to attract competitors. If you have competitors, getting the right solution to market quickly is critical to your success. No one “wins” by playing only defense or by being just “careful”. You win by making the best risk measured decisions quickly and releasing a good enough product before your competitors.
The right answer: “We have to figure out how to make the right decisions early without slowing down our delivery.”
Another way to determine if you have the “right” security team and correct security leader is to evaluate the number of security related engineers embedded within teams relative to the number of people evaluating approaches or “governing”. If the number of “governing” employees exceeds the number of embedded employees, you have a problem. Ideally, we want a very small number of “brakes” (governance) and more security product “gas pedals” (embedded). The latter results in better decisions and better product security in real time. The former results in delay, overhead, and an ivory tower.
We perform dozens of security assessments and technical due diligence reviews every year. Contact us and let us help!
February 22, 2019 | Posted By: Greg Fennewald
On multiple occasions over the years, we have heard our clients state a use case they want to avoid in product design sessions, or cite one as the reason for architectural choices made for existing products. These use cases can be given more credence than objective data says they deserve – they become boogeyman legends, edge cases that can result in poor architectural choices.
One of our clients was debating the benefit of multiple live sites with customers pinned to the nearest site to minimize latency. The availability benefits of multiple live sites are irrefutable, but the customer experience benefit of lower latency was questioned. This client had millions of customers spread across the country. The notion of pinning a customer to a “home” site nearest them raised the question: “what happens when the customer travels across the country?” The answer is to direct them to that same home site. That customer will experience more latency for the duration of the trip. The proportion of customers that spend 50% of their time on either coast is vanishingly small – keep it simple. Have a workaround for customers that permanently move to a location served by a different site – customer data resides in more than one location for DR purposes anyway, right?
This client also had hundreds of service specialists who would at times access customer accounts and take actions on their behalf, and these service specialists were located near the west coast. Objections were made based on the latency a west coast service specialist would encounter when acting on behalf of an east coast customer whose data was hosted near the east coast. Millions of customers. Hundreds of service specialists. The math is not hard. The needs of the many outweigh the needs of the few.
A different client had a concern about data consistency upon new user registration for their service. To ensure a new customer could immediately transact, the team decided to deploy a single authentication server to preclude the possibility of a transaction following registration hitting an authentication server that had not yet received the registration data. Intentionally deploying a SPOF should have raised immediate objections but did not. The team deployed a passive backup server that required manual intervention to work.
The new user registration flow was later revealed to account for less than 3% of overall transactions. When the SPOF authentication server failed, the other 97% of transactions suffered an impactful outage along with the 3% from new users. Designing a workaround for new users while employing a write master with multiple, load-balanced read-only slaves would have provided far better availability. The needs of the many outweigh the needs of the few.
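The “write master plus read replicas, with a workaround for new users” approach can be sketched as follows. Host names and the replication-lag window are hypothetical; the idea is read-your-writes routing – a just-registered user’s authentication reads go to the write master until replicas have had time to catch up, while everyone else load-balances across replicas:

```python
import random
import time

WRITE_MASTER = "auth-master.example.com"                      # hypothetical hosts
READ_REPLICAS = ["auth-r1.example.com", "auth-r2.example.com"]
REPLICATION_LAG_BUDGET = 60  # seconds; assume replicas catch up within this window

registered_at = {}  # user_id -> registration timestamp

def register(user_id):
    """Registration always writes to the master."""
    registered_at[user_id] = time.time()
    return WRITE_MASTER

def auth_endpoint(user_id):
    """Route brand-new users to the master (read-your-writes);
    everyone else load-balances across the read replicas."""
    if time.time() - registered_at.get(user_id, 0) < REPLICATION_LAG_BUDGET:
        return WRITE_MASTER
    return random.choice(READ_REPLICAS)
```

With this shape, a replica failure degrades capacity for the 97% rather than taking down authentication for everyone, and the master only carries the tiny new-user read load.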
It is important to remain open minded during early design sessions. It is also important to follow architectural principles in the face of such use cases. How can one balance potentially conflicting concepts?
• Ask questions best answered with objective data.
• Strive for simplicity – shave with Occam’s Razor.
• Validate whether the edge case is a deal breaker for the product owner.
• Propose a workaround that addresses the edge case while optimizing the architecture for the majority use case and sound principles.
Catering to the needs of the business while adhering to architectural standards is a delicate balancing act and compromises will be made. Everyone looks at the technologist when a product encounters a failure. Know when to hold the line on sound architectural principles that safeguard product availability and user experience. The product owner must understand and acknowledge the architectural risks resulting from product design decisions. The technologist must communicate these risks to the product owner along with objective data and options. A failure to communicate effectively can lead to the tail wagging the dog – do not let that happen.
With 12 years of product architecture and strategy experience, AKF Partners is uniquely positioned to be your technology partner. Learn more here.
December 6, 2018 | Posted By: AKF
The United States Special Operations Command (SOCOM) has 5 truths that they live by. All of them have guided SOCOM to be the premier fighting force in the world and they all center around people:
-Humans are more important than hardware
Never prioritize your equipment over people. Hardware is always easily replaced. The same is true in technology. Buying a new server is easy. Replacing good engineers is not.
-Quality is better than quantity
Anytime you have the choice between one fully capable person and two okay people, always choose quality over quantity. It gives you the ability to still get the job done and provides more overhead for future hires.
-Special Operations Forces cannot be mass produced
Just like SOCOM operators, engineers cannot be made in a factory. Yes, there are boot camps that help fill the void in engineering roles, but even those graduates still require time to get up to speed on what your organization does. No two companies are the same.
-Competent Special Operations Forces cannot be created after emergencies occur
If you hire engineers based upon a disaster, they will be ready for tomorrow’s disaster, not today’s. Always be prepared.
-Most special operations require non-SOF assistance
Engineers can’t work alone. If QA, Security, Marketing, etc. are not present and supporting, then the work that is produced will not be what is required.
It takes an entire team beyond just engineers to get the job done.
People are what make organizations great, nothing else. You may be known for a product, but it is your people that developed it and keep it running.
Seed, Feed and Weed
AKF has a principle of Seed, Feed and Weed. Broken down by component:
Seed is about constantly finding new talent. Every day new technology and talent is being produced. Additionally, skilled individuals are always looking for their next challenge, particularly if they currently work in an unfavorable environment. Some companies use outside entities to find and evaluate new talent to see if they would be a good fit. Others rely heavily upon their own brand to attract talent. And still others network throughout the community to make sure they are constantly aware of locally available skill.
Given that distance from the workplace matters less and less to employees every day, using all three approaches can be integral to staying viable. If using an outside entity to evaluate new talent isn’t feasible, make sure that you have a solid method in place for determining not only people’s skill, but their fit into your culture. If you don’t have a strong name to fall back on, make your software the differentiator. You may not be the biggest mass producer of widgets in the world, but maybe you are the only one who can make them blue. Gravitate toward that for attracting talent. And lastly, if you can’t network, then take some tips from marketing. Figure out how to get out there and sell yourself and your company. If you can’t talk excitedly about what you do with someone you meet, then they won’t be excited to work for you.
There are different facets to Feed. An important aspect of Feed is feedback. Continually communicating with your employees about how they are doing (and not just the wrong stuff) gives them a sense of direction for where you think they should be going. It also lets them know that you pay attention and value their work. Beyond feedback there is also training. Managers need to be aware of the required training that someone needs to do their job. If they are managing your databases and have zero database knowledge, that is an issue. But employees need more than just required training. They need something that stimulates them. Leaders ensure that employees grow beyond what their role defines. If they want to take Underwater Basket Weaving, encourage it. It may not always benefit the company, but if it benefits the employee it usually has a way of producing positive returns for everyone.
Last, and certainly not least, is Weed. No one likes to be the bad guy. Firing people is tough – but getting rid of people with bad behaviors or repeatedly poor results is part of our duty as managers and executives. Have you tried to make them better? Maybe they were just in the wrong position or didn’t have the right training. Or worse, they just weren’t a good fit for the company. Behavioral problems can be very difficult to remedy; a lack of experience just takes training and mentorship. Nine times out of ten, experience can be fixed. Nine times out of ten, it is better for everyone to part ways over behavioral issues.
The key to a good workplace is identifying which is most important to you at any given time as sometimes it can shift. Which of the three to apply more focus on depends on the condition of your team.
So why are people so important? Unless you have already developed Artificial Intelligence they are the ones doing the work. The Special Operations Community is aware that money can be used to purchase new equipment but the right people take time and nurturing. The same should be applied for your business. If availability and scalability are your concerns, money can get you all the rack space or cloud storage you need. But only time and effort can get you the right people to manage it.
December 4, 2018 | Posted By: Marty Abbott
During the last 12 years, many prospective clients have asked us some variation of the following questions: “What makes you different?”, “Why should we consider hiring you?”, or “How are you differentiated as a firm?”.
The answer has many components. Sometimes our answers are clear indications that we are NOT the right firm for you. Here are the reasons you should, or should not, hire AKF Partners:
Operators and Executives – Not Consultants
Most technology consulting firms are largely comprised of employees who have only been consultants or have only run consulting companies. We’ve been in your shoes as engineers, managers and executives. We make decisions and provide advice based on practical experience with living with the decisions we’ve made in the past.
Engineers – Not Technicians
Educational institutions haven’t graduated enough engineers to keep up with demand within the United States for at least forty years. To make up for the delta between supply and demand, technical training services have sprung up throughout the US to teach people technical skills in a handful of weeks or months. These technicians understand how to put building blocks together, but they are not especially skilled in how to architect highly available, low latency, low cost to develop and operate solutions.
The largest technology consulting companies are built around programs that hire employees with non-technical college degrees. These companies then teach these employees internally using “boot camps” – creating their own technicians.
Our company is comprised almost entirely of “engineers”; employees with highly technical backgrounds who understand both how and why the “building blocks” work as well as how to put those blocks together.
Product – Not “IT”
Most technology consulting firms are comprised of consultants who have a deep understanding of employee-facing “Information Technology” solutions. These companies are great at helping you implement packaged software solutions or SaaS solutions such as Enterprise Resource Management systems, Customer Relationship Management Systems and the like. Put bluntly, these companies help you with solutions that you see as a cost center in your business. While we’ve helped some partners who refuse to use anyone else with these systems, it’s not our focus and not where we consider ourselves to be differentiated.
Very few firms have experience building complex product (revenue generating) services and platforms online. Products (not IT) represent nearly all of AKF’s work and most of AKF’s collective experience as engineers, managers and executives within companies. If you want back-office IT consulting help focused on employee productivity there are likely better firms with which you can work. If you are building a product, you do not want to hire the firms that specialize in back office IT work.
Business First – Not Technology First
Products only exist to further the needs of customers and, through that relationship, further the needs of the business. We take a business-first approach in all our engagements, seeking to answer the questions: Can we find a way to build it faster, better, or cheaper? Can we find ways to make it respond to customers faster, be more highly available or be more scalable? We are technology agnostic and believe that of the several “right” solutions for a company, a small handful will emerge displaying comparatively low cost, fast time to market, appropriate availability, scalability, appropriate quality, and low cost of operations.
Cure the Disease – Don’t Just Treat the Symptoms
Most consulting firms will gladly help you with your technology needs but stop short of solving the underlying causes creating your needs: the skill, focus, processes, or organizational construction of your product team. The reason for this is obvious: most consulting companies are betting that if the causes aren’t fixed, you will need them back again in the future.
At AKF Partners, we approach things differently. We believe that we have failed if we haven’t helped you solve the reasons why you called us in the first place. To that end, we try to find the source of any problem you may have. Whether that be missing skillsets, the need for additional leadership, organization related work impediments, or processes that stand in the way of your success – we will bring these causes to your attention in a clear and concise manner. Moreover, we will help you understand how to fix them. If necessary, we will stay until they are fixed.
We recognize that in taking the above approach, you may not need us back. Our hope is that you will instead refer us to other clients in the future.
Are We “Right” for You?
That’s a question for you, not for us, to answer. We don’t employ sales people who help “close deals” or “shape demand”. We won’t pressure you into making a decision or hound you with multiple calls. We want to work with clients who “want” us to partner with them – partners with whom we can join forces to create an even better product solution.
November 20, 2018 | Posted By: AKF
“Quality in a service or product is not what you put into it. It’s what the customer gets out of it.” – Peter Drucker
The Importance of QA
High levels of quality are essential to achieving company business objectives. Quality can be a competitive advantage and in many cases will be table stakes for success. High quality is not just an added value, it is an essential basic requirement. With high market competition, quality has become the market differentiator for almost all products and services.
There are many methods followed by organizations to achieve and maintain the required level of quality. So, let’s review how world-class product organizations make the most out of their QA roles. But first, let’s define QA.
According to Wikipedia, quality assurance is “a way of preventing mistakes or defects in products and avoiding problems when delivering solutions or services to customers.” But there’s much more to quality assurance than that.
There are numerous benefits of having a QA team in place:
- Helps increase productivity while decreasing costs (QA headcount typically costs less)
- Effective for saving costs by detecting and fixing issues and flaws before they reach the client
- Shifts focus from detecting issues to issue prevention
Teams and organizations looking to get serious about (or to further improve) their software testing efforts can learn something from looking at how the industry leaders organize their testing and quality assurance activities. It stands to reason that companies such as Google, Microsoft, and Amazon would not be as successful as they are without paying proper attention to the quality of the products they’re releasing into the world. Taking a look at these software giants reveals that there is no one single recipe for success. Here is how five of the world’s best-known product companies organize their QA and what we can learn from them.
Google: Searching for best practices
How does the company responsible for the world’s most widely used search engine organize its testing efforts? It depends on the product. The team responsible for the Google search engine, for example, maintains a large and rigorous testing framework. Since search is Google’s core business, the team wants to make sure that it keeps delivering the highest possible quality, and that it doesn’t screw it up.
To that end, Google employs a four-stage testing process for changes to the search engine, consisting of:
- Testing by dedicated, internal testers (Google employees)
- Further testing on a crowdtesting platform
- “Dogfooding,” which involves having Google employees use the product in their daily work
- Beta testing, which involves releasing the product to a small group of Google product end users
Even though this seems like a solid testing process, there is room for improvement, if only because communication between the different stages and the people responsible for them is suboptimal (leading to things being tested either twice over or not at all).
But the teams responsible for Google products that are further away from the company’s core business employ a much less strict QA process. In some cases, the only testing is done by the developer responsible for a specific product, with no dedicated testers providing a safety net.
In any case, Google takes testing very seriously. In fact, testers’ and developers’ salaries are equal, something you don’t see very often in the industry.
Facebook: Developer-driven testing
Like Google, Facebook uses dogfooding to make sure its software is usable. Furthermore, it is somewhat notorious for shaming developers who mess things up (breaking a build or causing the site to go down by accident, for example) by posting a picture of the culprit wearing a clown nose on an internal Facebook group. No one wants to be seen on the wall-of-shame!
Facebook recognizes that there are significant flaws in its testing process, but rather than going to great lengths to improve, it simply accepts the flaws, since, as they say, “social media is nonessential.” Also, focusing less on testing means that more resources are available to focus on other, more valuable things.
Rather than testing its software through and through, Facebook tends to use “canary” releases and an incremental rollout strategy to test fixes, updates, and new features in production. For example, a new feature might first be made available only to a small percentage of the total number of users.
Canary Incremental Rollout
By tracking the usage of the feature and the feedback received, the company decides either to increase the rollout or to disable the feature, either improving it or discarding it altogether.
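A common way to implement this kind of percentage-based canary rollout is deterministic user bucketing: hash each user into one of 100 stable buckets and enable the feature for the buckets below the rollout percentage. This is an illustrative sketch, not Facebook’s actual mechanism:

```python
import hashlib

def in_rollout(user_id: str, feature: str, percent: int) -> bool:
    """Deterministically place a user in a bucket 0-99 by hashing
    user + feature; the feature is enabled for roughly `percent`
    of users, and a given user always gets the same answer."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100 < percent

# Ramping the rollout is just raising `percent`; rolling back is lowering it.
enabled = in_rollout("user-123", "new-composer", 5)
```

Because the bucketing is stable, a user who saw the feature at 5% still sees it at 10%, which keeps the experience consistent as the rollout widens.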
Amazon: Deployment comes first
Like Facebook, Amazon does not have a large QA infrastructure in place. It has even been suggested (at least in the past) that Amazon does not value the QA profession. Its ratio of about one test engineer to every seven developers also suggests that testing is not considered an essential activity at Amazon.
The company itself, though, takes a different view of this. To Amazon, the ratio of testers to developers is an output variable, not an input variable. In other words, as soon as it notices that revenue is decreasing or customers are moving away due to anomalies on the website, Amazon increases its testing efforts.
The feeling at Amazon is that its development and deployment processes are so mature (the company famously deploys software every 11.6 seconds!) that there is no need for elaborate and extensive testing efforts. It is all about making software easy to deploy, and, equally if not more important, easy to roll back in case of a failure.
Spotify: Squads, tribes and chapters
Spotify does employ dedicated testers. They are part of cross-functional teams, each with a specific mission. At Spotify, employees are organized according to what’s become known as the Spotify model, constructed of:
- Squads. A squad is basically the Spotify take on a Scrum team, with less focus on practices and more on principles. A Spotify dictum says, “Rules are a good start, but break them when needed.” Some squads might have one or more testers, and others might have no testers at all, depending on the mission.
- Tribes are groups of squads that belong together based on their business domain. Any tester that’s part of a squad automatically belongs to the overarching tribe of that squad.
- Chapters. Across different squads and tribes, Spotify also uses chapters to group people that have the same skillset, in order to promote learning and sharing experiences. For example, all testers from different squads are grouped together in a testing chapter.
- Guilds. Finally, there is the concept of a guild. A guild is a community of members with shared interests. These are a group of people across the organization who want to share knowledge, tools, code and practices.
Spotify Team Structure
Testing at Spotify is taken very seriously. Just like programming, testing is considered a creative process, and something that cannot be (fully) automated. Contrary to most other companies mentioned, Spotify heavily relies on dedicated testers that explore and evaluate the product, instead of trying to automate as much as possible. One final fact: In order to minimize the efforts and costs associated with spinning up and maintaining test environments, Spotify does a lot of testing in its production environment.
Microsoft: Engineers and testers are one
Microsoft’s ratio of testers to developers is currently around 2:3, and like Google, Microsoft pays testers and developers equally—except they aren’t called testers; they’re software development engineers in test (or SDETs).
The high ratio of testers to developers at Microsoft is explained by the fact that a very large chunk of the company’s revenue comes from shippable products that are installed on client computers & desktops, rather than websites and online services. Since it’s much harder (or at least much more annoying) to update these products in case of bugs or new features, Microsoft invests a lot of time, effort, and money in making sure that the quality of its products is of a high standard before shipping.
What can you learn from world-class product organizations? If the culture, views, and processes around testing and QA can vary so greatly at five of the biggest tech companies, then it may be true that there is no one right way of organizing testing efforts. All five have crafted their testing processes, choosing what fits best for them, and all five are highly successful. They must be doing something right, right?
Still, there are a few takeaways that can be derived from the stories above to apply to your testing strategy:
- There’s a “testing responsibility spectrum,” ranging from “We have dedicated testers that are primarily responsible for executing tests” to “Everybody is responsible for performing testing activities.” You should choose the one that best fits the skillset of your team.
- There is also a “testing importance spectrum,” ranging from “Nothing goes to production untested” to “We put everything in production, and then we test there, if at all.” Where your product and organization belong on this spectrum depends on the risks that will come with failure and how easy it is for you to roll back and fix problems when they emerge.
- Test automation has a significant presence in all five companies. The extent to which it is implemented differs, but all five employ tools to optimize their testing efforts. You probably should too.
Bottom line, QA is relevant and critical to the success of your product strategy. If you’ve tried to implement a new QA process but failed, we can help.
November 20, 2018 | Posted By: Roger Andelin
Diagnosing the cause of poor performance from your engineering team is difficult and can be costly for the organization if done incorrectly. Most everyone will agree that a high performing team is more desirable than a low performing team. However, there is rarely agreement as to why teams are not performing well and how to help them improve performance. For example, your CFO may believe the team does not have good project management and that more project management will improve the team’s performance. Alternatively, the CEO may believe engineers are not working hard enough because they arrive to the office late. The CMO may believe the team is simply bad and everyone needs to be replaced.
Oftentimes, your CTO may not even know the root causes of poor performance, or even recognize there is a performance problem until peers begin to complain. However, there are steps an organization can take to uncover the root cause of poor performance quickly, present those findings to stakeholders for greater understanding, and take steps that will properly remove the impediments to higher performance. Those steps may include some of the solutions suggested by others, but without a complete understanding of the problem, performance will not improve and incorrect remedies will often make the situation worse. In other words, adding more project management does not always solve a problem with on-time delivery, but it will add more cost and overhead. Requiring engineers to start each day at 8 AM sharp may give the appearance that work is getting done, but it may not directly improve velocity. Firing good engineers who face legitimate challenges to their performance may do irreversible harm to the organization. For instance, it may appear arbitrary to others and create more fear in the department, resulting in unwanted attrition.
How can you know what action to take to fix an engineering performance problem? The first step in that process is to correctly define and agree upon what good performance looks like. Good performance is comprised of two factors: velocity and value.
Velocity is defined as the speed at which the team works and value is defined as achievement of business goals. Velocity is measured in story points which represent the amount of work completed. Value is measured in business terms such as revenue, customer satisfaction or conversion. High performing engineering teams work quickly and their work has a measurable impact on business goals. High performing teams put as much focus on delivering a timely release as they do on delivering the right release to achieve a business goal.
Once you have agreement on the definition of good engineering performance, rate each of your engineering teams against the two criteria: velocity and value. You may use a chart like the one below:
Once each team has been rated, write down a narrative that justifies the rating. Here are a few examples:
Bottom Left: Velocity and Value are Low
“My requests always seem to take a long time. Even the most simple of requests takes forever. And, when the team finally gets around to completing the request, often times there are problems in production once the release is completed. These problems have negatively impacted customers’ confidence in us so not only are engineers not delivering value – they are eroding it!”
Upper/Middle Left: Velocity is Good and Value is Low
“The team does get stuff done. Of course I’d like them to go faster, but generally speaking they are able to get things done in a reasonable amount of time. However, I can’t say if they are delivering value – when we release something we are not tracking any business metrics so I have no way of knowing!”
Upper Right: Velocity is High and Value is High
“The team is really good. They are tracking their velocity in story points and have goals to improve velocity. They are already up 10% over last year. Also, they instrument all their releases to measure business value. They are actively working with product management to understand what value needs to be delivered and hypothesize with the stakeholders as to what features will be best to deliver the intended business goal. This team is a pleasure to work with.”
Unknown Velocity and Unknown Value
“I don’t know how to rate this team. I don’t know their velocity; it’s always changing and seems meaningless. I think the team does deliver business value, but they are not measuring it, so I cannot say if it is low or high.”
With narratives in hand it’s time to begin digging for more data to support or invalidate the ratings.
Diagnosing Velocity Problems
Engineering velocity is a function of time spent developing. Therefore, the first question to answer is “what is the maximum amount of time my team is able to spend on engineering work under ideal conditions?”
This is a calculated value. For example, start with a 40 hour work week. Next, assuming your teams are following an Agile software development process, for each engineering role subtract out the time needed each week for meetings and other non-development work. For individual contributors working in an Agile process that number is about 5 hours per week (for stand up, review, planning and retro). For managers the number may be larger. For each role on the team sum up the hours. This is your ideal maximum.
Next, with the ideal maximum in hand, compare that to the actual achievement. If your teams are not logging hours against their engineering tasks, they will need to do this in order to complete this exercise. Evaluate the gap between the ideal maximum and the actual. For example, if the ideal number is 280 hours and the team is logging 200 hours, then the gap is 80 hours. You need to determine where that 80 hours is going and why. Here are some potential problems to consider:
- Teams are spending extra time in planning meetings to refine requirements and evaluating effort.
- Team members are being interrupted by customer incidents which they are required to support.
- The team must support the weekly release process in addition to their other engineering tasks.
- Miscellaneous meetings are being called by stakeholders, including project status meetings and updates.
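The ideal-maximum and gap arithmetic described above can be sketched in a few lines. The team size and per-role overhead here are illustrative (matching the 280-hour example), and a real calculation would use each role’s actual meeting load:

```python
MEETING_OVERHEAD = 5  # hours/week of Agile ceremonies per IC
                      # (stand-up, review, planning, retro)

def ideal_max_hours(team_size: int, work_week: int = 40) -> int:
    """Ideal maximum weekly engineering hours for a team of ICs."""
    return team_size * (work_week - MEETING_OVERHEAD)

ideal = ideal_max_hours(8)   # 8 ICs * 35 hours = 280 hours
logged = 200                 # hours the team actually logged against tasks
gap = ideal - logged         # 80 hours to be accounted for and explained
```

The number itself is not the finding; the finding is whatever the team uncovers when it asks where those 80 hours went.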
As you dig into this gap it will become clear what needs to be fixed. The results will probably surprise you. For example, one client was faced with a software quality problem. Determined to improve their software quality, the client added more quality engineers, built more unit tests, and built more automated system tests. While there is nothing inherently wrong with this, it did not address the root cause of their poor quality: Rushing. Engineers were spending about 3-4 hours per day on their engineering tasks. Context switching, interruptions and unnecessary meetings eroded quality engineering time each day. As a result, engineers rushing to complete their work tasks made novice mistakes. Improving engineering performance required a plan for reducing engineering interruptions, unnecessary meetings, and enabling engineers to spend more uninterrupted time on their development tasks.
At another client, the frequency of production support incidents was impacting team velocity. Engineers were being pulled away from their daily engineering tasks to work on problems in production. This had gone on so long that while nobody liked it, they accepted it as normal. It’s not normal! Digging into the issue, the root cause was uncovered: the process for managing production incidents was ineffective. Every incident was urgent, and nearly every incident disrupted the engineering team. To improve this, a triage process was introduced whereby each incident was classified and either assigned an urgent status (which would create an interruption for the team) or something lower, which was then placed on the product backlog (no interruption for the team). We also learned the old process (every incident was urgent) was in part a response to another velocity problem: stakeholders believed that unless something was considered urgent, it would never get fixed by the engineering team. By having an incident triage process – a procedure for when something would get fixed based on its urgency – the engineering team and the stakeholders solved this problem.
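A triage gate of this kind can be sketched in a few lines. The classification criteria here are hypothetical stand-ins for whatever severity rules a team adopts; the essential property is that only true emergencies interrupt engineers, and everything else lands on the backlog:

```python
from collections import deque

interrupts = deque()  # urgent incidents: interrupt the team now
backlog = deque()     # everything else: waits for sprint planning

def triage(incident: dict) -> str:
    """Classify an incident as 'urgent' or 'backlog'. The rule used here
    (customers impacted with no workaround) is an illustrative example."""
    impacted = incident.get("customers_impacted", 0) > 0
    no_workaround = incident.get("workaround") is None
    if impacted and no_workaround:
        incident["status"] = "urgent"
        interrupts.append(incident)
    else:
        incident["status"] = "backlog"
        backlog.append(incident)
    return incident["status"]
```

The side benefit the client discovered applies here too: once stakeholders trust that backlog items have a defined path to being fixed, they stop escalating everything to urgent.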
At AKF, we are experts at helping engineering teams improve efficiency and performance, fix velocity problems, and increase the value they deliver. In many cases, the prescription for the team is not obvious. Our consultants help company leaders uncover the root causes of their performance problems, establish a vision, and execute prescriptions that result in meaningful change. Let us help you with your performance problems so your teams can perform at their best!
November 15, 2018 | Posted By: AKF
“but in this world nothing can be said to be certain, except death and taxes.”
...and data breaches. Given the era Benjamin Franklin lived in, the concept of a data breach was far from any possibility. But in the world we live in, it is a certainty. Death, taxes and data breaches. Welcome to the 21st Century.
So how did we get here? Death and taxes are for someone else to explain, but the data breach I will help flesh out. This article begins to explore the world we live in, where a data breach is not something you hope never happens, but something you prepare for. In the coming weeks, I will explore what can be done when the inevitable does occur.
Data Breaches in the 21st Century
“My system is completely secure,” says the guy who is already breached and just doesn’t know it.
Why is a data breach such a certainty in these days? It comes down to four areas: similarity, interconnections, users and motive.
In 2015 Microsoft had a great marketing plan to upgrade as many older OSes as possible to the current release: offer the upgrade for free. Issues with upgrading (or tricking users into upgrading against their will) aside, this built a quick base for Windows 10 and allowed it to overtake version 7 in December 2017 as the most adopted version of Windows. Couple this with the fact that Windows is one of the most widely used OSes and you have a nicely populated target base.
This isn’t to say that Windows machines are more susceptible than other machines, but given their popularity and the scheduled release of updates, malicious actors can identify the weaknesses being patched and target machines that are slower to update. In an ideal world, patches would be applied in a timely manner, but there are always extenuating circumstances that keep this from happening. So if your POS (Point of Sale) system is several patches behind, there is an exploit that can target its weakened state.
Don’t feel like being breached and exploited via the internet? Never get on the internet. A simple answer, but not a feasible one given the world in which we live.
At AKF we have a tenet of Build vs. Buy. From a cost perspective, it doesn’t make sense to build something you know very little about when a 3rd party already offers it for a reasonable price. But cost isn’t the only factor to weigh when connecting to a 3rd party; risk is another major one. Is the interaction between your system and the 3rd party narrow enough to insulate you from their potential compromise? Integrations through APIs usually help solve this issue, but sometimes a more thorough coupling of the software is necessary.
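The insulation point can be illustrated with a sketch: the 3rd-party call is confined behind a single narrow interface with a timeout, a fallback, and field whitelisting, so a failure or compromise on the vendor's side cannot propagate raw data or outages into the rest of the system. The vendor name, endpoint, and fields below are hypothetical:

```python
import json
from urllib import error, request

THIRD_PARTY_URL = "https://vendor.example.com/api/quote"  # hypothetical endpoint

def sanitize_vendor_response(raw: dict, sku: str) -> dict:
    """Whitelist only the fields we expect from the vendor; anything
    unexpected in the payload is dropped at the boundary."""
    return {"sku": sku, "price": raw.get("price"), "source": "vendor"}

def fetch_quote(sku: str, timeout: float = 2.0) -> dict:
    """The single, narrow point where vendor data enters the system.
    Network failures and malformed payloads are contained here instead
    of propagating inward."""
    try:
        with request.urlopen(f"{THIRD_PARTY_URL}?sku={sku}", timeout=timeout) as resp:
            return sanitize_vendor_response(json.load(resp), sku)
    except (error.URLError, ValueError):
        # Degrade gracefully rather than letting the vendor's outage become ours.
        return {"sku": sku, "price": None, "source": "fallback"}
```

The tighter this boundary, the less a vendor compromise can reach; a deeply coupled integration (shared database, embedded libraries) offers no such choke point.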
Relying on an additional entity (or several) in the form of 3rd parties creates another vector through which you can be compromised. And to top it off, you usually have no insight into their security posture. They may be obligated to provide you with quarterly security scans, but that doesn’t mean they don’t turn off their most vulnerable machines before each scan.
Congratulations on having an extremely secure system that doesn’t rely on 3rd parties being secure as well. You are now brought down by an employee who thinks they won a raffle.
If this all sounds like a horrible “Choose Your Own Adventure,” then you are in the right mindset. It doesn’t matter what you do to protect your systems, because you have Users. This isn’t to say that Users can’t be trusted, but there are degrees to how much trust they should have. And whether inadvertently or purposefully, they are an extremely susceptible entry point for a breach. Advanced threats are getting better and better at crafting emails that slip past basic email filters and, once opened, create a backdoor into the system. Once persistent callbacks are established, all of that traffic looks as though the User is generating it internally, and most security controls allow User-initiated traffic a higher degree of freedom.
Have you ever had a bone to pick with a company and not cared about the legal ramifications of acting on it? Hopefully not. But that segment does exist. Whether you inadvertently wronged a former customer (at least according to them) or you have something that someone else covets, they will move heaven and earth to get it. The only thing worse than a malicious actor casting a wide net in the hope that a compromise sticks is someone specifically targeting your business. It can become an obsession for them.
Maybe they want access to the banking records you protect, or maybe they would just like to see you embarrassed. Either way, this is the worst-case scenario for a business: someone who refuses to stop until they compromise your system. They will use everything available, leveraging similarity, interconnections, and your Users to gain access.
You’ve Been Breached
Congratulations! You’ve been breached?!? Not really the accolade you were looking for, but one you need to accept. They say the first step is acceptance, so if you’ve made it this far, you’re where you need to be. Don’t believe you are breached, or will be in the future? Feel free to read the article again and start to ask yourself whether you are as secure as you think you are. The above are just small snippets of the overall vulnerability you may have.
-Don’t use Windows? Linux doesn’t guarantee you won’t be compromised.
-Not connected to anyone? Possible, if you are a brick-and-mortar store that only accepts cash.
-You employ the savviest Users? Everyone makes a mistake from time to time.
-Never upset someone or owned something they could want? Then you aren’t a business.
So what comes next? For that, there are plenty of articles out there explaining how to shore up your system. One such article comes from our very own Larry Steinberg, Are you compromised? The important thing is to pick the area where you feel you are weakest and go from there. More often than not, this revolves around user training. But maybe checking off some items from the Australian Cyber Security Centre’s Essential Eight will help.
Ultimately, you should have the best ideas on how to secure your system, but if you find you need some assistance looking at your product holistically, with security in mind, AKF can help.