AKF Partners

Abbott, Keeven & Fisher Partners: Partners In Hyper Growth


Foster Creativity

With the economic downturn in full force, you are probably spending a great deal of time thinking about how to cut costs, reprioritize revenue-generating features, or deliver more in 2009 with fewer resources. You might think now is not the time to care about “creativity” and “energy”, but we think it is even more important now. Having a team that is fully engaged, with all of its creative forces focused on your business, is crucial to achieving any of those other objectives. The way to achieve this is by creating an environment where people know where they stand in terms of performance, get to own deliverables, can openly question decisions or standards, and show each other respect.

 

A few ideas that we have either read about or seen in practice in organizations are team or individual training events, four-day work weeks, time allocated to work on personal interests, self-selection of features/stories, and mentoring. Training can take many different forms, including formal classes at universities, external workshops (WARNING: self-promotional plug… such as our Technology Workshop), or internal classes that members of the team teach to each other. Everyone knows different things, and sharing this knowledge is good for both the team and the presenter, giving her practice explaining technical topics verbally and ensuring she knows the subject completely.

Mentoring is another low-cost method of fostering a more open and creative environment. Pairing junior and senior engineers provides both parties the opportunity to practice different skills. Additionally, it helps two groups that likely have little day-to-day contact begin a dialog. Mentoring can be extended in many different forms. Ask the CEO to take a different engineer as a mentee each quarter, meeting with him or her for lunch or breakfast every second or third week of the quarter. This is a great way to remind the top executive to appreciate the engineers, and it gives engineers exposure to the business challenges that the CEO faces daily: a real win-win proposition.

Some of the more radical approaches for developing a creative environment are already well documented by some very popular companies, including Google and 37Signals. If you haven’t read the 37Signals book, we recommend it as a great source of ideas for fostering a creative and unique environment for your team.



Team Size

As you consider hiring plans for next year, one aspect of your organization that you should give some thought to is the optimal team size. You may have heard of Amazon’s “two pizza” rule, which says that a team should be no larger than can be fed with two large pizzas. If our experience feeding engineers pizza is typical, this means about 8 to 10 people. As a general rule, we think this is fine, but if you want a more precise number we recommend incorporating some other factors. These factors include the experience of the managers, how long the team has been together, and the managers’ responsibilities.

Before we explore the factors that influence optimal team size, we should discuss why team size is important. Consider a team of two people: they know each other’s quirks, each always knows what the other is working on, and they never forget to communicate with each other. Sounds perfect, right? Well, consider that they also don’t have enough engineering capacity to tackle big projects in a timely manner, they don’t have the flexibility to transfer to another team because each one probably knows things that no one else does, and they probably have their own coding standards that are not shared with other two-person teams. Obviously, small teams and large teams each have their pros and cons. The key is to balance them to get the optimal result for your organization.

The first factor that you should consider is the experience level of the managers. Experienced managers should be capable of handling more direct reports. New managers should be given fewer direct reports so that they have time to develop their management skills. Keeping resource maps up to date for 15 engineers can be overwhelming for a new manager, but one who has done it for years would have no problem doing it. If you are filling vacancies in your management ranks, make the organization chart as detailed as possible, spelling out the manager positions that you are reserving for junior managers and those for which you expect to hire a more senior manager. If you have a team that will be assigned a very large project and therefore needs to be large in numbers, mark it down as requiring a more seasoned manager.

The next factor to consider is how long the team has been together. A team of twelve people who have been together for eighteen months is likely to have established processes that make it easier to manage. The team may already have mentoring relationships in place or clear divisions of code that make feature assignment easier. A brand-new team probably has none of these, has not yet bonded, and may have personality conflicts that need to be worked through.

The last factor that we consider key to determining the optimal team size is the set of management responsibilities expected of the manager. Do you expect managers to hold a weekly half-hour one-on-one meeting with every engineer? Do your managers need to create and maintain resource maps for the engineers? Do managers need to periodically assign themselves features to code? All of these questions and more will influence how many direct reports they should have. Obviously, the more managerial or development tasks you expect your managers to handle, the fewer team members they should have. If you have a Project Management Organization that helps managers handle assignments and status tracking, the teams can be much larger.

So the answer to how large a team should be is: it depends. It depends on the variety of factors that we’ve outlined here. As you start preparing your budget for next year, take a few minutes to consider these factors and ask your existing managers for feedback on how their team size has affected their ability to perform as managers.



Recommended Reading

We mentioned in a previous post, Business Acumen and the CIO/CTO, that we would provide a list of our recommended reading material. We’ve decided to break our list into three sections: technology, business, and, just for fun, fiction. This isn’t a complete list, so don’t expect to read these books and become a technology or business genius. This is just a starter list that we feel every CIO/CTO should have read. There are plenty more great business, technology, and fiction books that we all should keep reading. Let us know some of your favorites.

Technology
The Mythical Man-Month – Frederick Brooks
Design Patterns: Elements of Reusable Object-Oriented Software – Gamma et al.
Patterns of Enterprise Application Architecture – Martin Fowler
The C Programming Language – Kernighan & Ritchie
The C++ Programming Language – Bjarne Stroustrup
The Art of Computer Programming – Donald Knuth
Data Structures and Algorithms – Aho, Hopcroft, Ullman
Inspired: How to Create Product Customers Love – Marty Cagan
The Singularity Is Near – Ray Kurzweil
On Intelligence – Jeff Hawkins & Sandra Blakeslee 
A New Kind of Science – Stephen Wolfram

Business
Purple Cow, The Dip, etc. – Seth Godin
Good to Great, Built to Last – James Collins
Crossing the Chasm – Geoffrey Moore
The Art of War – Sun Tzu
The Prince – Machiavelli
Blink, The Tipping Point – Malcolm Gladwell
The Black Swan – Nassim Nicholas Taleb
The Innovator’s Dilemma, The Innovator’s Solution – Clayton Christensen
Competitive Strategy – Michael Porter

Fiction
Works by William Gibson, e.g. Neuromancer, Spook Country, etc.
Works by Neal Stephenson, e.g. Quicksilver, Cryptonomicon, etc.



Incenting Success in Technology Organizations

As we’ve discussed before in articles like Be A Leader!, the primary job of a CTO is to help the executive team maximize shareholder value. Notice our choice of verb in the last sentence: “maximize”. It is a much stronger word than the one an average-performing company would select, which is typically “create”. Maximizing shareholder value is the goal of a high-performing team, a team that wants to be able to say that “no other team in our position could provide the type of shareholder return that we do”.

The CTO, however, cannot maximize shareholder value, and potentially can’t even prove that he or she is creating shareholder value, without a set of aggressive goals along with the metrics and measurements that help define success or failure en route to achieving those goals.

We prefer to group our goals thematically, making it easier to determine how the goals impact the maximization of shareholder value.  Our themes include the reduction of cost, availability, the efficiency of engineering spend, the effectiveness of our product selection process, quality, and time to market.

Cost
No list of aggressive goals is complete without a set of goals to minimize the cost of operating a SaaS site. In our experience, the best cost metrics are those normalized by transaction (cost per transaction) or by transaction type (cost per checkout, cost per signup, etc.). The associated goal is either to reduce the cost by some relative amount over time or to reduce it to a specific absolute value, thereby increasing profit and shareholder returns.
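To make the normalization concrete, here is a minimal Python sketch, assuming you can pull a total operating cost and a transaction count for each period. The function names and the quarterly figures are hypothetical, for illustration only. Note that total spend rises between the two quarters while cost per transaction falls, which is exactly the signal an unnormalized cost number would hide.

```python
# Minimal sketch: normalize operating cost by transaction volume and track the
# relative reduction over time. All figures below are hypothetical.


def cost_per_transaction(total_cost: float, transactions: int) -> float:
    """Operating cost divided by the number of transactions in the period."""
    if transactions <= 0:
        raise ValueError("transactions must be positive")
    return total_cost / transactions


def relative_reduction(baseline: float, current: float) -> float:
    """Fractional reduction of a unit cost versus a baseline (0.10 == 10%)."""
    return (baseline - current) / baseline


# Hypothetical quarters: total spend grows, but volume grows faster.
q1 = cost_per_transaction(total_cost=120_000.0, transactions=4_000_000)
q2 = cost_per_transaction(total_cost=126_000.0, transactions=5_000_000)

print(f"Q1 cost/txn: ${q1:.4f}")
print(f"Q2 cost/txn: ${q2:.4f}")
print(f"Reduction:   {relative_reduction(q1, q2):.0%}")
```

The same two functions work per transaction type (cost per checkout, cost per signup) if you can attribute cost and counts to each type.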

Availability
No SaaS site can realistically operate in this day and age without considering the impact of availability on revenue. Our desire here is to identify the lost opportunity (in most cases, lost revenue) associated with outages rather than just the amount of downtime a site has. While absolute downtime is valuable and should be tracked if possible, revenue loss expressed as a percentage is more easily associated with shareholder value maximization (the less revenue lost, the better), and it also takes into account that most sites don’t produce as much revenue in the middle of the night as they do in the middle of the day.
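A minimal Python sketch of the revenue-based calculation, assuming an hourly revenue profile built from historical data. The profile below is invented and the hourly granularity is an assumption; finer buckets work the same way. Two outages of identical length produce very different revenue losses depending on when they happen.

```python
# Minimal sketch: express outage impact as a percentage of expected daily
# revenue rather than as raw downtime. The hourly profile is hypothetical;
# a real one would come from your own historical revenue data.


def revenue_loss_pct(hourly_revenue: list[float], outage_hours: set[int]) -> float:
    """Percent of the day's expected revenue lost during the given outage hours."""
    total = sum(hourly_revenue)
    lost = sum(hourly_revenue[h] for h in outage_hours)
    return 100.0 * lost / total


# Index = hour of day (0-23): quiet overnight, busy daytime, moderate evening.
profile = [100.0] * 6 + [400.0] * 12 + [200.0] * 6

night = revenue_loss_pct(profile, {3})    # one-hour outage at 03:00
midday = revenue_loss_pct(profile, {12})  # one-hour outage at 12:00

print(f"1h outage at 03:00 -> {night:.2f}% of daily revenue lost")
print(f"1h outage at 12:00 -> {midday:.2f}% of daily revenue lost")
```

Both outages count as one hour of downtime, but with this profile the midday outage costs four times as much revenue.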

Engineering Efficiency and Productivity
You can’t maximize shareholder value if you aren’t measuring and improving your engineering team. These measurements are admittedly difficult, but we try to break them into two component parts:

1) Efficiency – How many engineering days are you getting out of the theoretical maximum? This is a measurement of how many engineering days you lose to environment issues, training problems, tool issues, etc. Most organizations that don’t measure this are surprised to learn that their engineers spend well over 33% of their time on things other than designing systems and writing code.

2) Productivity – How much do you produce per engineering day? This one is tougher, and there are many metrics from which you can select: KLOC, stories, function points, etc. All of them have issues, but that’s no excuse not to select the best one for you and measure how well you are doing.
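Both measurements reduce to simple ratios once you track the inputs. The sketch below assumes a hypothetical team of ten engineers, twenty working days in the period, a log of lost days by cause, and stories completed as the productivity unit. The categories and numbers are illustrative only; substitute whichever unit (KLOC, stories, function points) you settle on.

```python
# Minimal sketch: engineering efficiency (share of theoretical engineering days
# actually available for design and coding) and productivity (output per
# productive engineering day). All inputs are hypothetical.

engineers = 10
working_days = 20  # working days in the measurement period

# Engineering days lost, by cause, over the period.
lost_days = {
    "environment issues": 30,
    "tool issues": 25,
    "training problems": 15,
}

theoretical_days = engineers * working_days
productive_days = theoretical_days - sum(lost_days.values())
efficiency = productive_days / theoretical_days

stories_completed = 52
productivity = stories_completed / productive_days  # stories per productive day

print(f"Efficiency: {efficiency:.0%} of {theoretical_days} theoretical days")
for cause, days in sorted(lost_days.items(), key=lambda kv: -kv[1]):
    print(f"  lost to {cause}: {days} days")
print(f"Productivity: {productivity:.2f} stories per productive engineering day")
```

Keeping the lost days broken out by cause matters as much as the headline ratio, since the causes are what you actually go fix.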

Product Efficacy
Simply put, this is a measure of how your product choices are performing. You undoubtedly have more ideas than you can implement in any given year. Are you choosing the right things? Are you hitting your key metrics, such as increasing revenue, decreasing dropouts, or increasing signups?

Time to Market
Assuming that you are building the right things, are you getting them out to the market in time to create barriers to entry and/or switching costs?  Are you faster or slower than your competitors?

Quality
How defect-dense is your product? Are you fixing the problems in engineering and product management that lead to bugs in production? Are you making the right time, cost, and quality tradeoffs? How many defects do you introduce per new release, line of code, or story?
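Defect density is simply defects divided by whatever unit of output you track. A minimal sketch, assuming each production defect can be traced back to the release that introduced it. The Release record and its figures are hypothetical.

```python
from dataclasses import dataclass

# Minimal sketch: defect density per thousand lines changed and per story,
# computed for each release. The Release record and all figures are hypothetical.


@dataclass
class Release:
    name: str
    defects: int         # production defects traced back to this release
    kloc_changed: float  # thousands of lines of code changed in the release
    stories: int         # stories delivered in the release


def defects_per_kloc(r: Release) -> float:
    return r.defects / r.kloc_changed


def defects_per_story(r: Release) -> float:
    return r.defects / r.stories


releases = [
    Release("R1", defects=18, kloc_changed=12.0, stories=30),
    Release("R2", defects=9, kloc_changed=10.0, stories=25),
]

for r in releases:
    print(f"{r.name}: {r.defects} defects, "
          f"{defects_per_kloc(r):.2f}/KLOC, {defects_per_story(r):.2f}/story")
```

The trend across releases, rather than any single value, is what tells you whether the fixes in engineering and product management are working.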

You may have other key metrics that you use and find valuable, and we’d love to hear about them. What you cannot do, at least without significantly damaging shareholder value, is ignore the need for improvement. You simply cannot improve your team’s performance without a core set of metrics against which you measure absolute and relative performance. And if you are not measuring your performance, you simply cannot increase, let alone maximize, shareholder value.



Joint Application Design & Architecture Review Board

We have mentioned a couple of key processes in other posts that we want to explain in a little more detail. These two processes, fundamental to producing scalable and highly available architectures, are the Joint Application Design (JAD) and the Architecture Review Board (ARB). They help create strong bonds of communication between organizations, thereby enabling shared ownership of products by all of the organizational disciplines within the extended technology team. These processes can fit into any product development life cycle (PDLC), be it waterfall, iterative (including Agile), or any variant of those. If you don’t have similar processes in place, we highly recommend you consider adding them.

The JAD is usually accomplished through a series of small meetings where the architecture and design of any feature of significant size is discussed. The participants of the JAD are the engineers assigned to a feature along with the operations/infrastructure engineers who have been assigned to assist with that feature. Ideally, the meetings are held early in the development process to ensure that the design of the feature receives input from both software and operations engineers and that it does not violate the architectural principles of scalability and availability. In an Agile development process these people can be normal members of the project team, augmented by DBAs or systems administrators. The JAD members will present to the ARB if the feature meets the criteria for board review.

The ARB is intended to catch potential scale and availability problems before features are launched to the site. The ARB team should consist of the most skilled software and hardware engineers and members of the leadership team. The membership of the ARB should ideally be static (i.e. change very little over time). The ARB should convene once every development cycle (monthly is usually sufficient) to review all features that either require more than a specified number of development days (e.g. 5) or introduce a significant new technology (caching, language, service, etc.). The ARB members should have a set of clearly defined architectural principles against which to test each new feature, asking questions such as “How does this allow us to scale horizontally?” and “How does this help us maintain higher availability?” The development engineers and operations engineers who are responsible for the design of the feature present to the board, and the board decides whether the feature was designed in such a manner that it will meet the scalability and availability requirements.
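The review criteria above are simple enough to encode as a checklist that flags features for the ARB agenda. A minimal sketch, assuming the five-day example threshold and a hypothetical inventory of technologies already in production. Any technology outside that inventory counts as new; the feature names and technologies are invented.

```python
from dataclasses import dataclass, field

# Minimal sketch of the ARB review criteria. The threshold mirrors the
# example above; the technology inventory and features are hypothetical.

ARB_DEV_DAY_THRESHOLD = 5
KNOWN_TECHNOLOGIES = {"java", "mysql", "memcached"}


@dataclass
class Feature:
    name: str
    dev_days: int
    technologies: set[str] = field(default_factory=set)


def needs_arb_review(feature: Feature,
                     threshold: int = ARB_DEV_DAY_THRESHOLD,
                     known: set[str] = KNOWN_TECHNOLOGIES) -> bool:
    """True if the feature exceeds the size threshold or adds new technology."""
    introduces_new_tech = bool(feature.technologies - known)
    return feature.dev_days > threshold or introduces_new_tech


features = [
    Feature("checkout redesign", dev_days=12, technologies={"java", "mysql"}),
    Feature("copy change", dev_days=1, technologies={"java"}),
    Feature("session store", dev_days=3, technologies={"java", "redis"}),
]

agenda = [f.name for f in features if needs_arb_review(f)]
print("ARB agenda:", agenda)
```

The small session-store feature still lands on the agenda because it brings in a technology the site does not yet run.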

Hopefully these descriptions will give you a general understanding of what is required and help you see why these processes are critically important to the development of scalable architectures. There are obviously a lot of details about each of the processes that we have not covered in this post, but this should get you started.



Business Acumen and the CIO/CTO

In an earlier article we discussed how technical the CEO needs to be in a technology company. No discussion on this topic would be complete without addressing how business-savvy the CIO or CTO needs to be in nearly any company.

In keeping with our “bottom line up front” tradition: the executive in charge of technology decisions needs to be a leader first, a business executive second, and a technology decision maker last. That is not to say that this executive should not also have some understanding of technology; rather, it is our position that his or her primary role is to help make the right business decisions as they relate to technology.

Unfortunately, most technologists do not learn about business, finance, or marketing in their undergraduate or graduate courses, and most non-technologists do not have an opportunity to learn about the inner workings of technology in their fields of study. As a result, the teams have very little in common when it comes to training, and they often find it hard to communicate and find common ground. This is very different from the relationships between other disciplines within a company, such as marketing and finance, where most employees have had some exposure to the fundamentals of the other organizations. We refer to this gap between the technology organization and other organizations as the “experiential chasm”, and it is the role of the chief technology executive to partner with the CEO to build a bridge across this chasm.

Just as we have argued that the CEO needs to make an attempt to better understand technology, technology process, and the “physics” of product development (like technical project management, Brooks’ law, etc.), so must the CTO/CIO better understand the fundamentals of the business in which he or she operates. Just as importantly, the CTO/CIO should also understand the fundamentals of each organization’s responsibilities.

For example, while the chief technology executive does not need to be an expert on capital markets, he or she should be able to debate the relative merits and issues associated with taking on debt vs. issuing equity. He or she should also fully understand each of the financial statements used in running a company (e.g. the income statement, the statement of cash flows (SOCF), and the balance sheet). From a marketing perspective, it is important that the person understand such basics as the 5Cs and the 6Ms, to name just a few. From a strategy perspective, it is useful to understand such basics as Porter’s Five Forces. These topics just scratch the surface and are in no way meant to be an all-encompassing list.

Not having a background in such topics means that you cannot effectively function as part of the senior executive team or executive committee.  And not contributing as part of the executive team means that you are not performing your responsibilities in helping to maximize shareholder wealth.  And, of course, if you cannot help maximize shareholder wealth you simply should not be in your job.

We are not arguing that you need to get an MBA to be effective or to provide value in the boardroom, though getting an MBA or attending an executive MBA program is certainly a great way to jumpstart the process. We are arguing that it is absolutely your job to get better every day at the things you do not know that are essential to an appropriate level of performance. Here are some ideas:

Develop a professional reading list
Seek out recommendations for great books on each of the functional areas within your company, then read and learn. We will post our recommended reading list soon.

Take community college business classes
You do not have to take masters level classes to learn basic business concepts.  Your local community college probably has some first and second year undergraduate classes that will fit your needs and your schedule.

Take online classes in each of the disciplines
This is the information age after all, and we can all leverage the internet to learn.  We recommend taking structured course work as it is one of the easiest ways to learn.

Discuss business concepts and seek help from peers
Be honest with yourself and with your peers.  You might think it shows a weakness, but it actually builds trust and strengthens relationships.  Your peers will walk away thinking “Here is a person who really wants to know how this works”.  Think about it – wouldn’t you have great respect for a peer who wanted to know more about technology?

Start an Executive MBA Program
This is probably the best and easiest way to get a good foundation in all of the areas, but it is also the most costly.  There is a chance that your company will pay for it and there are several great schools with very flexible programs including weekend and evening coursework or accelerated programs that limit your time away from work.



Top 20 Mistakes in Technology

We often get asked to encapsulate our experience into a top 10 list for CTOs and CEOs. As is the case in golf, in technology it is as much about ensuring that your bad hits (aka blunders, mistakes, and failures) are recoverable as it is about ensuring that you nail your great hits or successes. We are all going to have failures in our careers, but avoiding the really big pitfalls will help ensure that we keep our companies and our products on the right growth path.

So, without further ado, and in keeping with our high standards of “raising the bar”, here are the top 20 things (rather than 10, and in no particular order) that we believe are most important to avoid when developing platforms:

1) Failing to design for rollback

We said these were in no particular order, but right out of the gate we are going to provide an exception to the rule. If you are developing a SaaS platform and you can make only one change to your current process, make it the ability to always roll back any of your code changes. Yes, we know that it takes additional engineering work and additional testing to make nearly any change backwards compatible, but in our experience that work has the greatest ROI of any work you can do. It only takes one really bad code roll, in which your site performance is significantly degraded for several hours or even days while you attempt to “fix forward”, for you to agree that this is of the utmost importance. The one thing most likely to give you an opportunity to find other work (i.e. “get fired”) is to roll a product that destroys your business. In other words, if you are new to your job, DO THIS BEFORE ANYTHING ELSE; if you have been in your job for a while and have not done this, DO THIS TOMORROW.

2) Confusing product release with product success

Do you have “release” parties? Stop it! You are sending your team the wrong message! A release has nothing to do with creating shareholder value, and very often it is not even the end of your work on a specific product offering or set of features. Align your celebrations with achieving specific business objectives, like a release increasing signups by 10%, increasing checkouts by 15%, increasing the average sale price of all checkouts by 12%, or increasing click-through rates by 22%. See #10 below on incenting a culture of excellence. The point here is that you are paid to increase shareholder wealth, so have success parties when you achieve objectives specifically tied to that wealth creation. Don’t celebrate the cessation of work – celebrate achieving the success that makes shareholders wealthy.

3) Insular product development/engineering

How often does one of your engineering teams complain about not “being in the loop” or “being surprised” by a change? Does your operations team get surprised by some new feature and its associated load on a database? Does engineering get surprised by some new firewall or routing infrastructure resulting in dropped connections? Do not let your teams design in a vacuum and “throw things over the wall” to another group. Use best practices like teaming, or a process that we will discuss later called Joint Application Design (JAD). We are not arguing that designs should be done by committee, but rather that collaborative designs with a clear owner and decision maker are better than designs made without input or checks and balances.

4) Over engineering the solution

Your job is to maximize shareholder value as cost effectively as possible. To that end, one of your mottos should be “simple solutions to complex problems”. The simpler the solution, the lower the cost and the more likely it is that it will be easily and cost effectively maintained. If you get blank stares from peers or within your organization when you explain a design do not assume that you have a team of idiots – assume that you have made the solution overly complex and ask for assistance in resolving the complexity.

5) Allowing history to repeat itself

Organizations do not spend enough time looking at past failures. In the engineering world, a failure to look back into the past and find the most commonly repeated mistakes is a failure to maximize shareholder value and grounds for dismissal. In the operations world, a failure to correlate past site incidents and find thematically related root causes should be cause for termination. The best and easiest way to improve our future performance is to track our past failures, group them by cause, and treat the root cause rather than the symptoms. Keep incident logs, review them monthly and quarterly for repeating issues, and improve your performance. Perform postmortems of projects and site incidents and review them quarterly for themes.
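Grouping by cause is easy to automate once each incident log entry carries a root-cause category. A minimal sketch, assuming a hypothetical log format of (incident id, root cause) pairs. The categories and ids are invented for illustration.

```python
from collections import Counter

# Minimal sketch: find root causes that repeat across incidents so they can be
# treated as themes rather than one-off symptoms. Log entries are hypothetical.

incident_log = [
    ("INC-101", "configuration change"),
    ("INC-102", "database capacity"),
    ("INC-103", "configuration change"),
    ("INC-104", "network"),
    ("INC-105", "database capacity"),
    ("INC-106", "configuration change"),
]


def repeating_causes(log, min_count=2):
    """Return (cause, count) pairs seen at least min_count times, most frequent first."""
    counts = Counter(cause for _incident_id, cause in log)
    return [(cause, n) for cause, n in counts.most_common() if n >= min_count]


for cause, n in repeating_causes(incident_log):
    print(f"{cause}: {n} incidents")
```

The hard part in practice is assigning consistent root-cause categories during postmortems; the counting itself is trivial.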

6) Scaling through third parties

Every vendor has a quick fix for your scale issues. If you are a hyper-growth SaaS site, however, you do not want to be locked into a vendor for your future business viability; rather, you want to make sure that the scalability of your site is a core competency and that it is built into your architecture. See our articles on database scalability and platform scalability. This is not to say that, after you design your system to scale horizontally, you will not rely upon some technology to help you; rather, once you define how you can scale horizontally, you want to be able to use any of a number of different commodity systems to meet your needs. As an example, most popular databases provide log shipping to keep read or standby databases in sync with the primary. Per our discussion in technology-agnostic design, define how your platform scales through your own efforts, not through the systems that a third-party vendor or open source software company provides. If you say “we use ACME database clusters to scale our database”, we would argue you have the wrong solution. If, on the other hand, you say “we split our databases into read and write systems and further split them by customer id”, you are attacking the problem appropriately.
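As an illustration of the read/write and customer-id split described above, here is a minimal Python sketch of the routing decision only. The shard layout, host names, and modulo mapping are all hypothetical, and connection handling and replication (for example, log shipping) are left out. Because the split lives in your own code rather than in a vendor’s product, the database behind each name can be any commodity system.

```python
import random

# Minimal sketch: route queries by customer id to a shard, then to the shard's
# write primary or one of its read replicas. Host names and the two-shard
# layout are hypothetical.

SHARDS = [
    {"primary": "db0-primary", "replicas": ["db0-read-1", "db0-read-2"]},
    {"primary": "db1-primary", "replicas": ["db1-read-1", "db1-read-2"]},
]


def shard_for(customer_id: int) -> dict:
    """Pick the shard that owns this customer's data (simple modulo split)."""
    return SHARDS[customer_id % len(SHARDS)]


def route(customer_id: int, is_write: bool) -> str:
    """Writes go to the shard's primary; reads are spread across its replicas."""
    shard = shard_for(customer_id)
    if is_write:
        return shard["primary"]
    return random.choice(shard["replicas"])


print(route(42, is_write=True))   # customer 42 maps to shard 0's primary
print(route(7, is_write=False))   # customer 7 maps to one of shard 1's replicas
```

A plain modulo mapping forces data movement whenever the shard count changes; a lookup table from customer id to shard is a common alternative if you expect to add shards often.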

7) Relying on QA to find your mistakes

You cannot test quality into a system, and it is mathematically impossible to test all possibilities within complex systems to guarantee the correctness of a platform or feature. QA is a risk mitigation function and it should be treated as such. Defects are an engineering problem, and that is where the problem should be treated. If you are finding a large number of bugs in QA, do not reward QA; figure out how to fix the problem in engineering. Consider implementing test-driven design as part of your PDLC. If you find problems in production, do not punish QA; figure out how you created them in engineering. All of this is not to say that QA should not be held responsible for helping to mitigate risk – they should – but your quality problems are an engineering issue and should be treated within engineering.

8) Revolutionary or “big bang” fixes

In our experience, complete rewrites or re-architecture efforts end up somewhere on the spectrum from failing to return the expected ROI to complete and disastrous failure. Nine times out of ten they are simply not warranted and should be avoided. The projects we have seen with the greatest returns have been evolutionary rather than revolutionary in design. That is not to say that your end vision should not be a place significantly different from where you are now, but rather that the path to get there should not include “and then we turn off version 1.0 and completely cut over to version 2.0”. Go ahead and paint that vivid picture of the ideal future, but approach it as a series of small (but potentially rapid) steps. And if you do not have architects who can help paint that roadmap from here to there, go find some new architects.

9) The Multiplicative Effect of Failure

Every time you have one service call another service synchronously, you lower your theoretical availability. If each of your services (a database, application server, application, webserver, etc.) is designed to be 99.999% available, then the product of the availabilities of all of the services in the call chain is your theoretical availability: five calls yield (0.99999)^5, or roughly 99.995% availability. Eliminate synchronous calls wherever possible and create fault-isolative architectures to help you identify problems quickly.
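The arithmetic is worth seeing as the chain grows. A minimal sketch, assuming every service in the synchronous chain fails independently and is 99.999% available, the same simplification behind the theoretical figure above. It prints the chain availability and the implied minutes of downtime per year.

```python
import math

# Minimal sketch: theoretical availability of a chain of synchronous calls is
# the product of each service's availability (assumes independent failures).

MINUTES_PER_YEAR = 365 * 24 * 60


def chain_availability(availabilities: list[float]) -> float:
    """Product of per-service availabilities for a synchronous call chain."""
    return math.prod(availabilities)


five_nines = 0.99999
for calls in (1, 5, 10, 20):
    a = chain_availability([five_nines] * calls)
    downtime = (1 - a) * MINUTES_PER_YEAR
    print(f"{calls:>2} calls: {a:.5%} available, ~{downtime:.0f} min/year down")
```

Each call made asynchronous, or isolated so its failure does not take down the caller, drops out of this product.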

10) Failing to create and incent a culture of excellence

Bring in the right people and hold them to high standards. You will never know what your team can do unless you find out how far they can go. Set aggressive yet achievable goals and motivate them with your vision. Understand that people make mistakes and that we will all ultimately fail somewhere, but expect that no failure will happen twice. If you do not expect excellence and lead by example, you will get less than excellence and you will fail in your mission of maximizing shareholder wealth. Read our article on being a leader.

11) Under-engineering for scale

The time to think about scale is when you are first developing your platform. If you did not do it then, the time to think about scaling for the future is right now. That is not to say that you have to implement everything on the day you launch, but that you should have thought about how you are going to scale your application services and your database services. You should have made conscious decisions about tradeoffs between speed to market and scalability, and you should have ensured that the code will not preclude any of the concepts we have discussed in our scalability postings. Hold quarterly scalability meetings where you discuss what you need to do to scale to 10x your current volume, and create projects out of the action items. Approach your scale needs in an evolutionary rather than revolutionary fashion, as in #8 above.

12) “Not Built Here” Culture

We see this all the time. You may even have agreed with point (6) above because you have a “we are the smartest people in the world and we must build it ourselves” culture. The point on relying upon third parties to scale was not meant as an excuse to build everything yourselves. The real point to be made is that you have to focus on your core competencies and not dilute your engineering efforts with things that other companies or open source providers can do better than you. Unless you are building databases as a business, you are probably not the best database builder. And if you are not the best database builder, you have no business building your own databases for your SaaS platform. Focus on what you should be the best at: building functionality that maximizes your shareholder wealth and scaling your platform. Let other companies focus on the other things you need like routers, operating systems, application servers, databases, firewalls, load balancers and the like.

13) A new PDLC will fix my problems

Too often CTOs see repeated problems in their product development life cycles, such as missed dates or dissatisfied customers, and look for something to blame. The PDLC is often the biggest target of this blame. Many people believe that changing the process without addressing root causes will fix the problem. Going from Waterfall to Scrum, or from Scrum to RUP, is not the complete answer. All organizations are different in terms of level of skills, maturity level (as in the Capability Maturity Model), structure, and culture, so each organization needs to perform its own evaluation, but here are some problems that we see over and over again in organizations that blame their PDLC.

A lack of involvement and ownership from the business tops the list of problems. In the Scrum model there needs to be consistent involvement from the business or product owner; without it, it is impossible to follow the Scrum principles. Another very common problem is an incomplete understanding of, or training on, the existing PDLC. Everyone in the organization should have a working knowledge of the entire process and how their roles fit within it. Change the PDLC if there are valid reasons, such as increasing engineering productivity or achieving a better cultural fit, but do not change it before addressing the core issues. Most often, the biggest problems with a PDLC are the lack of project management to meet dates and the lack of an appropriate “product discovery” phase to meet customer needs and demands. Changing your PDLC won’t address either of these issues; properly managing your teams to meet dates and appropriately understanding customer needs will.

14) We cannot hire great people quickly

Often when growing an engineering team quickly, the engineering managers will push back on hiring plans and state that they cannot possibly find, interview, and hire engineers who meet their high standards. We agree that hiring great people takes time and that hiring decisions are some of the most important decisions managers can make. A poor hiring decision takes a lot of energy and time to fix. However, there are lots of ways to streamline the hiring process in order to recruit, interview, and make offers very quickly. One idea that we have seen work well in the past is the interview day, where all potential candidates are invited in on the same day. It should be no more than 2–3 weeks out from the initial phone screen, so holding one interview day per month is a great way to get most of your interviewing done in a single day. Because the process is concentrated, interviewers are much more efficient, and it is much less disruptive to the daily work that needs to get done the rest of the month. Post-interview discussions and hiring decisions should all be made that same day so that candidates get offers or letters of regret quickly; this will increase the likelihood of offers being accepted and leave a professional impression on those not getting offers. The key is to start from the premise that “there is a way to hire great people quickly”; a motivated leadership team will then generate a myriad of ways to make it happen.

15) It is a SPOF (Single Point of Failure) but we can recover it onto another host quickly

A SPOF is a SPOF, and even if the impact to the customer is low, a failure still pulls people away from other work to fix it right away. And there will be a failure, because that is what hardware and software do: they work for a long time and then eventually they fail! As you should know by now, it will fail at the most inconvenient time. It will fail when you have just repurposed the host that you were saving for it, or it will fail while you are releasing code. Plan for the worst case and run it on two hosts (we actually recommend always deploying in pools of three or more hosts) so that when one does fail you can fix it when it is most convenient for you.
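
The value of a second and third host can be sketched with simple availability arithmetic. This is a minimal sketch under a loudly optimistic assumption: host failures are independent, which shared power, shared networks or a bad code release often violate. The function names and the 99% per-host availability are hypothetical. It also shows one reason for pools of three: after losing one host of three, the survivors only need to run below two-thirds utilization to absorb the load, versus one-half with a pool of two.

```python
def pool_availability(host_availability: float, hosts: int) -> float:
    """Probability that at least one host in the pool is up, assuming
    independent host failures (an optimistic simplification)."""
    if not 0 <= host_availability <= 1 or hosts < 1:
        raise ValueError("invalid availability or host count")
    return 1 - (1 - host_availability) ** hosts


def max_safe_utilization(hosts: int, failures: int = 1) -> float:
    """Highest per-host utilization at which the surviving hosts can absorb
    the load of `failures` failed hosts without exceeding 100%."""
    if not 0 <= failures < hosts:
        raise ValueError("failures must be in [0, hosts)")
    return (hosts - failures) / hosts


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(f"{n} host(s): {pool_availability(0.99, n):.6f}")
    # 1 host(s): 0.990000
    # 2 host(s): 0.999900
    # 3 host(s): 0.999999
    for n in (2, 3):
        print(f"{n} hosts, 1 failure: run each host below {max_safe_utilization(n):.0%}")
    # 2 hosts, 1 failure: run each host below 50%
    # 3 hosts, 1 failure: run each host below 67%
```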

16) No Business Continuity plan

No one expects a disaster, but disasters happen, and if you cannot keep up normal operations of the business you will lose revenue and customers that you might never get back. Disasters can be huge, like Hurricane Katrina, where it takes weeks or months to relocate and start the business back up in a new location. Disasters can also be small, like a winter snowstorm that keeps everyone at home for two days or a HAZMAT spill near your office that keeps employees from coming to work. A solid business continuity plan is thought through ahead of time, before you need it, and explains to everyone how they will operate in the event of an emergency. Perhaps your satellite office will pick up customer questions, or your tech team will open up an IRC channel to centralize communication for everyone capable of working remotely. Do you have enough remote connections through your VPN server to allow for remote work? Spend the time now to think through what you will do and how you will operate in the event of a major or minor disruption of your business operations, and document the steps necessary for recovery.

17) No Disaster Recovery Plan

Even worse, in our opinion, than not having a BC plan is not having a disaster recovery plan. If your company is a SaaS based company, the site and the services it provides are the company’s sole source of revenue. Moreover, as a SaaS company, you hold all of the data your customers need to operate. When you are down, they are more than likely seriously impaired in attempting to conduct their own business. When your colocation facility has a power outage that takes you completely down (think of the 365 Main data center outage in San Francisco), how many of your customers will leave and never return? Our preference is to provide your own disaster recovery through multiple colocation facilities, but if that is not yet technically feasible or within the budget, at a minimum you need your code, executables, configurations, loads, and data offsite, and an agreement in place for both colocation services and hosts. Lots of vendors offer such packages, and they should be thought of as necessary business insurance.

18) No Product Management team or person

In a similar vein to #13 above, there needs to be someone or a team of people in the organization who have responsibility for the product lines. They need to have authority to make decisions about what features get added, which get delayed, and which get deprecated (yes, we know, nothing ever gets deprecated but we can always hope!). Ideally these people have ownership of business goals (see #10) so they feel the pressure to make great business decisions.

19) It is okay to bring the site down to roll code

Just because you call it scheduled maintenance does not mean that it does not count against your uptime. While some of your customers might be willing to endure the frustration of having the site down when they want to access it in order to get some new features, most care much more about the site being available when they want it. They are on the site because the existing features serve some purpose for them; they are not there in the hope that you will roll out a certain feature they have been waiting on. They might want new features, but they rely on existing features. There are ways to roll code, even with database changes, without bringing the site down. It is important to put these techniques and processes in place so that you plan for 100% availability instead of planning for much less because of planned downtime.
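
One common technique for rolling code without a site-wide outage is a rolling deployment across a pool of hosts: take one host out of the load balancer, upgrade it, verify it, and return it to service before touching the next. The sketch below is a minimal, in-memory illustration under assumptions: the `Host` class, the `healthy` check and the version strings are hypothetical stand-ins for your real load balancer and deployment tooling. Database changes additionally need to remain backward compatible with the code still running on not-yet-upgraded hosts, which this sketch does not model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Host:
    """Hypothetical stand-in for a server behind a load balancer."""
    name: str
    version: str
    in_rotation: bool = True


def rolling_deploy(pool: List[Host], new_version: str,
                   healthy: Callable[[Host], bool],
                   min_in_rotation: int = 1) -> List[str]:
    """Upgrade hosts one at a time so the pool never stops serving traffic.

    Halts at the first host that fails its health check, rolling that host
    back and leaving the remaining hosts on the old version.
    Returns the names of the hosts that were upgraded.
    """
    upgraded: List[str] = []
    for host in pool:
        serving_without_host = sum(h.in_rotation for h in pool if h is not host)
        if serving_without_host < min_in_rotation:
            raise RuntimeError(f"draining {host.name} would leave too few hosts serving")
        host.in_rotation = False                     # drain from the load balancer
        old_version, host.version = host.version, new_version
        if not healthy(host):
            host.version = old_version               # roll this host back
            host.in_rotation = True
            break
        host.in_rotation = True                      # return it to service
        upgraded.append(host.name)
    return upgraded


if __name__ == "__main__":
    pool = [Host("web1", "1.0"), Host("web2", "1.0"), Host("web3", "1.0")]
    # Simulate a bad build that fails its health check on web2.
    print(rolling_deploy(pool, "1.1", healthy=lambda h: h.name != "web2"))
    print([(h.name, h.version) for h in pool])
```

Note that a single-host pool cannot be drained without taking the service down, so the sketch refuses to proceed; that is the SPOF problem again. After a halted rollout the pool runs mixed versions, which is why old and new code must be able to coexist.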

20) Firewalls, Firewalls, Everywhere!

We often see technology teams that have put all public-facing services behind firewalls, and many go so far as to put firewalls between every tier of the application. Security is important because there are always people trying to do malicious things to your site, whether through directed attacks or random scripts port scanning your site. However, security needs to be balanced against the added cost and the performance degradation that firewalls bring. In our experience, tech teams too often throw up firewalls instead of doing the real analysis to determine how they can mitigate risk in other ways, such as through ACLs and LAN segmentation. You as the CTO ultimately have to decide which balance of risks and benefits is best for your site.

And for those who made it all the way through this long, long post, here is one of the designs that we are considering for our new logo. Let us know what you think.



The Bug is in the Code!

We are engineers by training and vocation, so we understand what it is like to be a software developer. Too often during the course of a site or product problem we hear developers saying, “It can’t be the code.” In our experience, the code is most often the problem. That is not to say that we have not seen our share of operating system, database, web server and application server bugs, but statistically you are going to be right far more often by suspecting the code first. Here is why.

As we mentioned, operating systems, databases and any other piece of third party or open source software, including firmware, have bugs. But these pieces of software change far less frequently than your SaaS application code, and the testing performed before each of their releases is, more often than not, at least an order of magnitude greater than what you perform. And that is okay, as you are working in two completely different worlds where the cost of a defect and the opportunity cost of a delay caused by testing are very different. A bug in your code that slows your response time from 2 seconds to 5 seconds is terrible, but you should be able to recover from it quickly, assuming that you have designed for rollback and have processes to quickly “fix forward” any release. A bug in a database that causes a loss of data integrity is disastrous because hundreds of thousands of organizations rely on that database to keep their data safe. So, given the likely differences in code quality, defect density and change frequency, you would be better off always suspecting your code first, but there is another reason as well.

A simple but golden rule is that whatever changed last caused the problem. This is one reason we harp so much on a rigorous change management process. Since you likely update the code ten to twenty times more often than you update a piece of infrastructure, it is reasonable to suspect that your frequently changing code is the culprit. Even with this overwhelming evidence, the argument that engineers will typically use is that the one place in the code responsible for the broken feature has been checked and is fine. The number of times we have seen a fourth, fifth or sixth pass through the code finally turn up the bug would astound you, further proving our point that “the defect is in the code.” Reading without a critical eye, without the conviction that the bug is there waiting to be found by you, will guarantee that you do not find the defect. Second, most code bases have high cyclomatic complexity. This is a fancy term for the number of linearly independent paths through the code, usually measured per class and method. If something has 50–100 logical paths, most of us cannot keep them straight in our heads and thus should be using unit tests to verify them, but that is for a different post.
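
To make cyclomatic complexity concrete: for a single function it equals the number of decision points plus one. The sketch below approximates that count for Python source using the standard-library `ast` module. It is a simplification (dedicated tools handle more constructs), and the function name and the sample `ship` function are hypothetical.

```python
import ast

# Node types that each add one decision point (one extra independent path).
_BRANCH_NODES = (ast.If, ast.IfExp, ast.For, ast.AsyncFor, ast.While,
                 ast.ExceptHandler)


def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity of `source`: 1 + number of decision points.

    Intended for a snippet containing one function; simplified for illustration.
    """
    decisions = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, _BRANCH_NODES):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` short-circuits, adding len(values) - 1 paths.
            decisions += len(node.values) - 1
        elif isinstance(node, ast.comprehension):
            # The comprehension's loop plus each of its `if` filters.
            decisions += 1 + len(node.ifs)
    return decisions + 1


SAMPLE = '''
def ship(order):
    if order.paid and order.in_stock:
        for item in order.items:
            if item.fragile:
                wrap(item)
        return "shipped"
    return "held"
'''

if __name__ == "__main__":
    print(cyclomatic_complexity(SAMPLE))  # 5
```

Even this eight-line function has five independent paths, each a case a unit test should exercise; at 50–100 paths per method, inspection alone will not cover them.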

The bottom line: have every engineering discipline look in earnest for the possible cause. The bug is in your code more often than not. As our childhood friend Dr. Seuss would say, it is 98 and 3/4% guaranteed.

*Image courtesy of krelic from flickr creative commons



What to do in the first 30 days!

So you’ve just accepted the offer from the new company to be their CTO and you start in two weeks. Lots of things are probably going through your mind: will you get along with the CEO, will the team respect you, will you meet all the goals, etc. Hopefully one thing you will spend some time thinking about is what you will do in the first thirty days of the new job. Here are some do’s and don’ts that we have either done or seen over the years. And yes, for those of you whom we’ve managed in the past, there is no need to comment that we’ve done some of the don’ts…that’s how we know not to do them. In all seriousness, feel free to comment and call us out on them; they might serve as reminders to others.

Don’t
Don’t issue edicts. A big no-no is to jump in demanding that things be done a certain way. For better or worse, the company has run for a while without you, and unless it is critical, do not change its path until you understand what decisions brought it to this point.

Don’t try to be their best friend. You are ultimately there to be the boss and will have to make some very hard decisions about people including salaries and employment. You can be social, informal, casual, and easy going but there is a line that you should not cross.

Don’t keep saying “this is how we did it at my old job.” You are no longer at your old job, and not everything you did there will work here, for a myriad of reasons, not the least of which is that the new company has a different culture.

Don’t try to impress them with your brilliance. There is no need to be overbearing with your knowledge of design patterns or the newest version of Hibernate. There will be plenty of time for people to discover your skills and intellect; let them come out naturally rather than forcing them.

Do
Do meet everyone 1-on-1. Definitely meet every one of your direct reports and, if possible, their direct reports in the first thirty days. Ask questions and learn about their careers, families, goals, and challenges. Answer questions about yourself, but avoid spending two-thirds of the meeting giving a speech about your background.

Do get involved in discussions, but listen more than you talk. Attend all the important meetings, such as Architectural Review Boards, Product Council Meetings, Change Management Meetings, etc., but listen at least three times as much as you talk.

Do get hands-on training on the application, source code control system, environments, etc. Learn what the developers, QA engineers, and others go through by getting some of them to stop by and help you set up an environment on your desktop. You might not be a programmer and you might never check a line of code into Git, but you should understand the basic process.

Do set goals. Work with your boss on setting goals for yourself and your teams. If goals are already set this is a great chance to review the progress and modify them if necessary. This will probably be your only grace period so take advantage of the opportunity to have frank discussions on why these goals and not others.

Obviously there are a lot of other tactical and strategic things you will need to accomplish during your first thirty days, but hopefully these give you some ideas of things you should work toward as well as things to avoid. If you have some do’s and don’ts for your first thirty days, let us hear about them.



Build v. Buy

In many of our engagements, we find ourselves helping our clients understand when it’s appropriate to build and when they should buy.

If you perform a simple web search for “build v. buy” you will find hundreds of articles, process flows and decision trees on when to build and when to buy.  Many of these are cost-centric, including discounted cash flow analyses of the cost of maintaining internally developed software, and others are focused on strategy.  Some of the articles blend the two.

Here is a simple set of questions that we often ask our customers to help them with the build v. buy decision:

  1. Does this “thing” (product / architectural component / function) create strategic differentiation in our business?
    Here we are talking about whether you are creating switching costs, lowering barriers to exit, increasing barriers to entry, etc., that would give you a competitive advantage relative to your competition.  See Porter’s Five Forces for more information about this topic.  If the answer to this question is “No – it does not create competitive differentiation” then 99% of the time you should just stop there and attempt to find a packaged product, open source solution, or outsourcing vendor to build what you need.  If the answer is “Yes”, proceed to question 2.
  2. Are we the best company to create this “thing”?
    This question helps inform whether you can effectively build it and achieve the value you need.  This is a “core v. context” question; it asks both whether your business model supports building the item in question and also if you have the appropriate skills to build it better than anyone else.  For instance, if you are a social networking site, you *probably* don’t have any business building relational databases for your own use.   Go to question number (3) if you can answer “Yes” to this question and stop here and find an outside solution if the answer is “No”.  And please, don’t fool yourselves – if you answer “Yes” because you believe you have the smartest people in the world (and you may), do you really need to dilute their efforts by focusing on more than just the things that will guarantee your success?
  3. Are there few or no competing products to this “thing” that you want to create?
    We know the question is awkwardly worded, but the intent is to be able to exit these four questions by answering “Yes” everywhere in order to get to a “build” decision.  If there are many providers of the “thing” to be created, it is a potential indication that the space might become a commodity.  Commodity products differ little in feature sets over time and ultimately compete on price, which in turn falls over time.  As a result, a “build” decision today will look bad tomorrow as features converge and pricing declines.  If you answer “Yes” (i.e. “Yes, there are few or no competing products”), proceed to question (4).
  4. Can we build this “thing” cost effectively?
    Is it cheaper to build than buy when considering the total lifecycle (implementation through end-of-life) of the “thing” in question?  Many companies use cost as a justification, but all too often they miss how much it costs to maintain a proprietary “thing”, “widget”, “function”, etc.  If your business REALLY grows and is extremely successful, do you really want to continue supporting internally developed load balancers, databases, etc. through the life of your product?  Don’t fool yourself into answering this affirmatively just because you want to work on something neat.  Your job is to create shareholder value, not work on “neat things”, unless your “neat thing” creates shareholder value.

There are many more complex questions that can be asked and that may justify building rather than buying your “thing”, but we feel these four questions are sufficient for most cases.

A “build” decision is indicated only when the answers to all four questions are “Yes”.

We suggest seriously considering buying or outsourcing (with appropriate contractual protection when intellectual property is a concern) anytime you answer “No” to any question above.
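
The four questions form a short-circuit decision sequence: the first “No” ends the analysis with a buy or outsource recommendation, and only four “Yes” answers lead to “build”. The sketch below simply encodes that rule; the class, field and function names are hypothetical, and answering the questions honestly remains the hard part.

```python
from dataclasses import dataclass


@dataclass
class BuildVsBuyAnswers:
    """Hypothetical container for answers to the four questions above."""
    creates_strategic_differentiation: bool  # Q1
    we_are_best_to_build_it: bool            # Q2
    few_or_no_competing_products: bool       # Q3
    cost_effective_over_lifecycle: bool      # Q4


def recommend(answers: BuildVsBuyAnswers) -> str:
    """Walk the questions in order; the first 'No' stops the walk."""
    questions = [
        ("Q1", answers.creates_strategic_differentiation),
        ("Q2", answers.we_are_best_to_build_it),
        ("Q3", answers.few_or_no_competing_products),
        ("Q4", answers.cost_effective_over_lifecycle),
    ]
    for label, answer in questions:
        if not answer:
            return f"buy or outsource (stopped at {label})"
    return "build"


if __name__ == "__main__":
    # Differentiating "thing", but the team is not best placed to build it.
    print(recommend(BuildVsBuyAnswers(True, False, True, True)))
    # buy or outsource (stopped at Q2)
```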

