AKF Partners

Abbott, Keeven & Fisher Partners

Partners In Hyper Growth


It’s Not About the Technology

Perhaps it’s because we’re technologists that we love shiny new technologies. However, for years now AKF has been telling anyone who will listen or read that “scaling is not about the technology.” Scale comes from system-level or deployment-focused architecture, which is the intersection of software and hardware. No doubt, we have some amazingly scalable technologies available to us today, like Hadoop and MongoDB, but when your entire datacenter goes down (as it has for Amazon, Calgary, GoDaddy, Sears, and the list goes on…), these scalable technologies don’t keep your service available. And if you think your customers care whether it was your fault or your vendor’s fault…you’re wrong. They pay you for your service and expect it to be available when they need it.

No doubt, technology decisions are important. Whether you use Backbone or Knockout, or choose Memcached or Redis, each of these decisions has pros and cons that can affect your team for a long time. But at the end of the day, these are not the decisions that determine whether your application and organization can scale with growth. Technology decisions affect the speed and cost factors of your organization and technology. Your team knows some technologies better, or finds them naturally faster to learn, so they cost you less. Other technologies are very popular (PHP), so there is a large supply of engineers who know them and salaries are lower. Still other technologies (assembly language) are complex, appeal to a select group of engineers, and are very costly to develop in, but might cost very little per transaction processed because of their efficiency.

Technology decisions are important, but for reasons other than scaling. Relying on a technology or a single vendor to scale is risky. To the vendor or open source project, you are one of many customers, and the longevity of their business or project doesn’t depend on keeping your service available. Your business, however, does depend on this. Take control of the future of your business by scaling your service and organization based on systems-level or deployment-focused architecture. Leave the technology decisions outside of the systems architecture.



Battle Captains and Outage Managers

The other day at a client, we were trying to describe what an outage manager does, and a term from my time in the military came back to me: battle captain. The best description I could come up with was that an outage manager performs the same duties during an outage that a battle captain performs for a unit in battle. For those non-military types, a battle captain resides in the tactical operations center (TOC) of a unit and takes care of tasks such as tracking the battle, enforcing orders, managing information, and making decisions based on the commander’s intent when the commander is unavailable. This is exactly what an outage manager does for an outage: keeps track of the outage (the timeline), follows up with people to make sure tasks are completed (e.g., investigating logs for errors), makes sure information is retained and passed along, and makes decisions when the VP of Ops or CTO is briefing the CEO or on the phone with a vendor.

From an article, “What Now, Battle Captain? The Who, What and How of the Job on Nobody’s Books, but Found in Every Unit’s TOC,” by CPT Marcus F. de Oliveira, Deputy Chief, Leaders’ Training Program, JRTC, here is the definition of the role:

The battle captain should be capable of assisting the command group in controlling the brigade or battalion. Remember, the commander commands the unit, and the XO is the chief of staff; BUT, those officers and the S3 must rest. They will also get pulled away from current operations to plan future operations, or receive orders from higher headquarters. The battle captain’s role then is to serve as a constant in the CP, someone who keeps his head in the current battle, and continuously assists commanders in the command and control of the fight.

A great battle captain can provide a tactical advantage to a unit in combat. If you have a great outage manager or have seen one at work, you know how much they can reduce the duration of an outage. Most outage managers have primary jobs, such as managing a shift in the NOC or managing an ops team, but when an outage occurs they step into the outage manager role. If you don’t currently have an outage manager, junior military officers (JMOs) just leaving the service often make great ones.


Why A Technology Leader Should Code

After I left the military, I started in corporate America as a software developer. I spent several years programming on various projects in a variety of languages. Perhaps more quickly than I wanted, I entered the management ranks. Starting as an engineering manager, I progressed into a number of executive roles, including VP of Engineering, CIO, and CTO. It has now been well over a decade for me as a manager and executive, but through these years I have continued to program. Among the technology executives I’ve met, this is fairly unusual. Most tech execs gladly give up programming upon entering management and never look back.

I’ve never considered myself a great programmer, and comparing what I do today with the work of a professional developer is like comparing a weekend gardener with an industrial farmer. Recently I’ve been considering whether continuing to program is clinging to my technical youth or is actually beneficial to me as a technology leader. We’ve written about How Technical a CTO Should Be, but here are a few more specific thoughts on programming.

Technical and Tactical Proficiency
As a junior officer I was taught that in order to lead, one had to be “technically and tactically” proficient. I owed it to the soldiers in my unit to understand the equipment our unit employed and the basic combat tactics we would be following. This concept has stuck with me, and I believe that technology leaders need to understand the tools their team is working with and the processes they are following. The exact level of understanding is a personal choice and highly debatable. For me, I like to have hands-on experience if at all possible. Periodically having to code a feature and deploy it will give an engineering manager a better understanding and appreciation of what her engineers go through on a daily basis.

Tangible Results
Leading people can be one of the most challenging and yet most rewarding jobs. Getting a team to buy into a single vision and motivating them to deliver that vision is a day-to-day challenge that can wear the best of us down. When that team finally delivers, or when the junior employee you’ve been coaching starts performing like the star you knew they could be, it all seems worth it. Unfortunately, those rewarding days can be months or years apart. In the interim days and weeks, it can be difficult to go without tangible results.

This is where programming fits. Coding provides immediate feedback and accomplishment of short-term goals. When your function works perfectly the first time you test it or when the solution to that very difficult problem becomes clear, you receive instant gratification and tangible results.

Some leaders use other hobbies like woodworking or gardening to provide this short-term gratification. Start working on a garden and within a couple of hours or days you can see the impact of your work. The ground is turned over, weeds are removed, seeds are planted. After a couple of weeks or months the project is completed with the results on your dinner table, proof of your achievement.

While these physical activities are enjoyable and rewarding, they don’t expand your knowledge of developing systems. Consider deliberate practice: pick up a programming project to receive tangible rewards and improve your technical and tactical proficiency.


Evolution of Roles in a Startup

We often see in the life cycle of startups that the organization starts with a couple of engineers who handle all aspects of technology, and as the team grows, specialization becomes necessary. At some point, QA engineers are hired, sys admins take over deploying and maintaining hardware, and DBAs are brought on board to tune databases. This is a very natural evolutionary process, but it does require some adjustment by individuals as they are forced to give up responsibility and become more specialized. One of the toughest hurdles to overcome is getting engineers to relinquish their access to the production environment. Taking control or responsibility away from someone is very hard on people’s egos.

Another often-seen necessity in hyper growth startups is upgrading leaders. A leader who was capable of leading and managing five engineers isn’t necessarily capable of running a 50-person tech organization. Often people in particular leadership roles don’t scale with the organization’s fast growth rate. In these cases, the individuals either need to relinquish their roles or be replaced in order to continue to scale the company. This doesn’t mean pushing them out; more likely it means finding a more suitable role for them. A great role for many CTOs who need to step aside is chief architect, which keeps them in a leadership and technical role.

The key to being successful in this evolution is to be open and address people’s fears and concerns. It is much better to speak openly during reviews about an individual’s capabilities than to have that person worry about their future. The same goes for engineers being asked to relinquish control of the production environment. Be open, talk to them, and listen to their concerns. An open dialogue about why the organization needs to change at this particular time in order to continue to grow and scale is usually very well received.


Using Vendor Features to Scale

If you’ve had the opportunity to participate in an engagement with us, you know that we tend to use the Socratic method of asking questions. And it’s not uncommon for us to hear a company explain how it plans to use a particular vendor’s feature to scale. For databases this is often clustering. We believe relying on a vendor to scale violates several of our architectural and scalability principles. Let us explain a couple of our more significant concerns with this practice.

First and foremost, you should want the fate of your company, your team, and your career in your own hands. Do not look for vendors to relieve you of this burden. As a CTO, if the vendor you vetted and selected fails, causing downtime for your business, you are just as responsible as if you had written every line of code. And if the failure is significant or frequent enough, you should expect to be relieved of your position. All code has bugs, even vendor-provided code, and personally I would rather have the source code to fix it than have to rely on a vendor to find the problem and provide me with a patch. This statement should not be taken to imply that you should do everything yourself, such as writing your own database. Use vendors for things they can do better than you and that are not part of your core competency. See our post on build vs. buy for the four questions to answer when making this decision.

With scalability, as with many other things in life, simple is better. The more complex you make your system to provide scalability, the more likely you are to suffer from availability issues. More complex systems are more difficult and more costly to maintain. Clustering technologies are much more complex than straightforward log shipping for creating read replicas.
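To make the contrast concrete, here is a minimal Python sketch of the application side of the simple approach: writes go to a primary database, and reads are spread round-robin across read replicas kept current by log shipping. The class name `ReplicaRouter` and the server names are hypothetical, connections are stood in for by plain strings, and checking for a SELECT prefix is a deliberately naive way to classify reads.

```python
import itertools


class ReplicaRouter:
    """Route writes to the primary and round-robin reads across replicas.

    Hypothetical sketch: connection targets are plain strings here; a real
    application would hold database connections or connection strings.
    """

    def __init__(self, primary: str, replicas: list[str]) -> None:
        self.primary = primary
        # With no replicas configured, reads fall back to the primary.
        self._reads = itertools.cycle(replicas or [primary])

    def route(self, sql: str) -> str:
        # Naive classification: only statements starting with SELECT are
        # treated as reads; everything else goes to the primary.
        if sql.lstrip().upper().startswith("SELECT"):
            return next(self._reads)
        return self.primary


if __name__ == "__main__":
    router = ReplicaRouter("db-primary", ["db-replica-1", "db-replica-2"])
    for statement in ["INSERT INTO orders VALUES (1)", "SELECT * FROM orders", "SELECT 1"]:
        print(f"{router.route(statement):<14} <- {statement}")
```

Because log-shipped replicas lag the primary slightly, a read issued immediately after a write may not see it yet; reads that must be current can simply be sent to the primary. Anything the prefix test does not recognize as a read falls through to the primary, which is the safe default.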

One of our beliefs, and it should be one of yours too, is that it is most cost-effective to be vendor neutral. Locking yourself into a single vendor, whether hardware or software, gives them the upper hand in negotiations. If you don’t think the sales rep knows this, ask yourself why they are willing to throw in some of these features at such a steep initial discount. The reason is that they can make up for it next year when you have to renegotiate.

If these concerns resonate with you, check out our database scalability cube post for ideas on how to design a scaling strategy that you can own, is relatively simple, and is vendor agnostic.


VP of Operations

One of the most common questions we get from individuals is “What is the path to becoming a CTO?” We posted about this before and focused on the skill sets required rather than the path to get there. We highlighted 1) good knowledge of business in general, 2) great technical experience, 3) great leadership, 4) great manager, 5) great communicator, and 6) willing to let go. This time we’re going to look at one of the jobs that is often a stepping stone to the CTO job.

The VP of Operations is the person who leads the Technology Operations or Production Operations team. This team has responsibility for running the hardware and software systems of the company. For SaaS or Web 2.0 companies, these are the revenue-generating systems. For corporate IT, these are the ERP, CRM, HRM, etc. This team is often composed of project managers, operations managers, and technical leads. As the head of the Operations team, the VP of Operations is responsible for monitoring, escalating, and managing issues, and for reporting on availability, capacity, and utilization. Incident and problem management, as well as root cause analysis (postmortems), are some of the most important jobs the team accomplishes. To perform this role well, the VP of Operations must have good process skills, a strong leadership presence, the ability to remain calm under fire, and good overall knowledge of the system.

The VP of Operations is often also responsible for the Infrastructure team. This team is usually composed of system administrators, database administrators, and network engineers. It procures, deploys, maintains, and retires systems. As the head of this team, the VP of Operations is responsible for budgeting and for balancing time between longer-term projects and daily operation of the systems. This team understands the system holistically and is often the most useful group when performing scalability summits. To perform this role well, the VP of Operations must have a good understanding of each of the technical areas this team is responsible for, including the databases, operating systems, and the network. This doesn’t mean a person must be able to do each of these jobs in order to succeed in this role, but they do need a good, solid understanding in order to converse, brainstorm, debate, and make decisions in each of these technical realms.

If you compare the list of skills we mentioned at the top of this post with those necessary to succeed as the VP of Operations, you’ll see they overlap a good deal. Great technical experience, great leadership, and great management skills will serve you well as the head of operations and will also go a long way toward developing most of the skills you will need as a CTO.

We’re approaching the end of the year, a time that many people and organizations use to reflect on what they have accomplished and what they want to accomplish next year. As part of your personal growth, use the list above to score yourself as honestly as possible. If you’re missing some of these skills, make sure you have goals in place that help you acquire a few more of them each year. Do this, and not only will you succeed in one of the important jobs that lead to the CTO job, but when you do arrive at the CTO position you will be one of the successful ones.


Continuous Deployment

You have probably heard of continuous integration, the practice of checking code into the source code repository early and often. Its goal is to ease the often very painful process of integrating multiple developers’ code after weeks of independent work. If you have never had the pleasure of experiencing this pain, let me give you an example from outside software that we experienced recently. In the process of writing The Art of Scalability, we have seven editors, including an acquisition editor, a development editor, and five technical editors, who all provide feedback on each chapter. Our job is to take all of this separate input and merge it back into a single document, which at times can be challenging when editors have different opinions about the direction of certain parts of a chapter. The upside is that the manuscript is much better for having gone through the process. Luckily, software engineering has developed continuous integration to reduce this kind of wasted engineering effort. To make the process most effective, automating builds and smoke tests is highly recommended. For more information on continuous integration, there are plenty of resources such as books and articles.
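The automated build and smoke-test gate recommended above can be sketched in a few lines of Python. This is a minimal, hypothetical stand-in for what a CI server runs on every check-in: each step is a command, steps run in order, and the first non-zero exit code fails the build. The step names and placeholder commands are assumptions; substitute your project’s real build and test commands.

```python
import subprocess
import sys


def run_pipeline(steps: list[tuple[str, list[str]]]) -> tuple[bool, list[str]]:
    """Run each (name, command) step in order, stopping at the first failure.

    Returns whether every step passed, plus the names of the steps that ran.
    """
    ran: list[str] = []
    for name, command in steps:
        ran.append(name)
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            # Surface the failing step's output so the developer can fix it fast.
            print(f"{name} FAILED (exit {result.returncode})\n{result.stdout}{result.stderr}")
            return False, ran
        print(f"{name} passed")
    return True, ran


if __name__ == "__main__":
    # Placeholder commands standing in for a real build and smoke test.
    steps = [
        ("build", [sys.executable, "-c", "print('compiling...')"]),
        ("smoke test", [sys.executable, "-c", "assert 1 + 1 == 2"]),
    ]
    passed, _ = run_pipeline(steps)
    if not passed:
        raise SystemExit(1)  # A non-zero exit marks the build as broken.
```

Stopping at the first failure keeps feedback fast, which is the point of integrating early and often: the developer who just checked in learns within minutes that the build is broken.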

The topic of this post is taking continuous integration to an extreme and performing continuous deployment, which is exactly what it sounds like: all code written for an application is immediately deployed into production. If you haven’t heard of this before, your first thought is probably that this is the ultimate in “cowboy coding,” but it is in use by some household technology names like Flickr and IMVU. If you don’t believe it, check out code.flickr.com and look at the bottom of the page. Last time I checked, it said:

Flickr was last deployed 20 hours ago, including 1 change by 1 person.

In the last week there were 34 deploys of 385 changes by 17 people.

Eric Ries, co-founder and former CTO of IMVU, is a huge proponent of continuous deployment as a method of improving software quality, because of the discipline, automation, and rigorous standards required to accomplish it. Judging from the post by Timothy Fitz, other folks at IMVU are fans of the methodology as well. Eric suggests a five-step approach for moving to a continuous deployment environment:

  1. Continuous Integration – Obviously, this prerequisite must be in place before moving beyond integration into full deployment.
  2. Source Code Commit Checks – This feature, available in almost all modern source code control systems, halts the check-in if any of the tests fail.
  3. Simple Deployment Script – Deployment must be automated and able to roll back, a point we wholeheartedly agree with and have written about before.
  4. Real-time Alerting – Bugs will slip through, so you must monitor for problems and have processes in place to react quickly.
  5. Root Cause Analysis – Eric recommends the Five Whys approach to finding root cause. Whatever the approach, finding and fixing the root cause of problems is critical to stop repeating them.
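Steps 2 through 4 can be sketched as a single deploy function. This is a minimal Python illustration under stated assumptions, not IMVU’s or Flickr’s actual tooling: `run_tests`, `install`, `healthy`, and `alert` are hypothetical hooks standing in for your test suite, deployment mechanism, monitoring, and paging. A candidate is rejected if its tests fail and rolled back automatically if the post-deploy health check fails.

```python
from typing import Callable


def deploy(
    current: str,
    candidate: str,
    run_tests: Callable[[str], bool],
    install: Callable[[str], None],
    healthy: Callable[[], bool],
    alert: Callable[[str], None],
) -> str:
    """Try to ship `candidate`; return the version left running in production."""
    # Step 2 stand-in: a failing test suite stops the change before it ships.
    if not run_tests(candidate):
        alert(f"rejected {candidate}: tests failed")
        return current

    # Step 3: one scripted command pushes the new version.
    install(candidate)

    # Step 4: check production health right after the push and alert on trouble.
    if not healthy():
        alert(f"{candidate} unhealthy; rolling back to {current}")
        install(current)  # Rollback reuses the same scripted path.
        return current
    return candidate


if __name__ == "__main__":
    # Simulated environment: installs and alerts are just recorded in lists.
    installed: list[str] = []
    alerts: list[str] = []
    running = deploy("v41", "v42", lambda v: True, installed.append, lambda: False, alerts.append)
    print(running, installed, alerts)
```

Step 5, root cause analysis, is a human process and is deliberately left out. A real health check would typically watch error rates over a window after the push rather than make a single yes/no call.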

Admittedly, the concept of developers pushing code straight to production scares me quite a bit, since I’ve seen the types of horrific bugs that can make their way into pre-production environments. However, I think Eric and the other continuous deployment proponents are onto something: perhaps the reason so many bugs are found by weeks of testing is a self-fulfilling prophecy. If engineers know their code moves straight into production upon check-in, they might be a lot more vigilant about it; I know I would be. How about you? What do you think about this development model?