AKF Partners

Abbott, Keeven & Fisher Partners: Partners In Hyper Growth


Newsletter – Trends

Below is our most recent newsletter. If you would like to subscribe and have it delivered to your inbox, you can do so here.

Trends
We meet and talk with hundreds of high-tech, hyper-growth companies, which gives us a unique opportunity to see trends taking place in the industry. Here are a few that we thought would be interesting to share with our friends.

NoSQL Skills – Many companies are experimenting with NoSQL solutions including Membase, Hadoop, Cassandra, and many others but are finding that employees with skills in these are hard to come by. Even in areas rich in talent such as the Silicon Valley, Boston, Austin, Seattle and NY the demand for this rather nascent skill set is higher than the current supply. Many companies are relying on on-the-job training for these types of open source solutions. Most importantly, if you’ve properly fault isolated your architecture you can tolerate the risk associated with small segments going down during beta releases while you iron out the kinks.

Oracle’s NoSQL – Oracle recently entered the NoSQL market with announcements around Hadoop integration and their own NoSQL key-value solution. The Hadoop integration isn’t much more than a conduit for moving data between an Oracle database and a Hadoop cluster. This technology has been around for quite a while in other relational database solutions such as GreenPlum and Asterdata. Microsoft also recently announced that SQL Server would support this Hadoop connection.

Oracle’s NoSQL is a distributed, replicated key-value store and is new and exciting even though it has attributes similar to other product offerings including their acquisition, Berkeley DB. Given trend #1 above, adopting an enterprise-supported NoSQL product may sound like an interesting alternative to open source solutions, but beware. The marketing materials for Oracle’s NoSQL solution include the BASE acronym (Basically Available, Soft State, Eventually Consistent) but unlike other NoSQL products such as Dynamo, SimpleDB, or Cassandra, the Oracle NoSQL Database does not support eventual consistency. Oracle’s approach appears to be to avoid eventual consistency altogether by not accepting writes when the primary node for that key is down.

ORM – Many companies start off using an object relational mapping solution such as Hibernate or Active Record, but we are seeing many of them have difficulty scaling with them. The solution for several companies has been to use ORMs for simple queries but resort to ODBC or their own Data Access Layer for handling more complex queries. Be wary of letting a tool generate your queries for you, as we’ve had a number of clients with incredibly complex and costly queries bring down their platforms for extended periods of time.

Enterprise Monitoring Frameworks – Until very recently, we had not seen a proprietary third party (non open-sourced) monitoring solution at a customer for at least 2 years and that includes our large Fortune 500 clients. Many of our clients have adopted innovative new monitoring solutions from Wilytech and Coradiant (now CA and BMC respectively) that look for and help diagnose patterns “on the wire” – whether it be from the browser to their servers or from app servers to databases. While these are interesting and potentially worth some of your attention, our very best clients design their systems to be monitored from the ground-up – ensuring that their software helps identify performance problems as they happen.

Distributed File Systems – Take your pick of implementations, but many of our clients are eschewing traditional NAS/NFS devices for distributed storage pools the likes of Gluster, MogileFS and Ceph. Nearly every case has resulted in significant savings relative to proprietary systems with few reported impacts to availability or response times. Of course, as with any other architectural change you need to ensure that you are properly managing your risk through pods or swim lanes.



Scalability Rules Android App

Whether you love Scalability Rules or you haven’t gotten around to purchasing your copy, check out the new Android application. The app has the what, when, why, and how for each of the 50 rules. Follow this link, scan this QR code, or search for “scalability” in the Android Market.



Alternative Solutions to Old Problems

Are you like @devops_borat and not a fan of DevOps? Or maybe you think deploying dozens of times each day to production is ludicrous. I’m actually a fan of both DevOps and continuous deployment, but if you’re not, don’t worry: these are just new solutions to old problems, and there are alternatives.


The Problems
As long as people have been divided into separate organizations there has been strife and competition between the teams. In the technology field this is nowhere more apparent than between development and operations. In more than half of the companies that we meet with, these teams have problems working together. If you’ve been around for a few years you’ve surely heard one team pointing to the other as the problem, whether that problem is an outage or slow product development.

A solution to this problem is DevOps. Wikipedia states that DevOps “relates to the emerging understanding of the interdependence of development and operations in meeting a business’ goal to producing timely software products and services.”

Another common tech problem is that large changes are risky. It is called “Big Bang” for a reason…things go bang! If you’ve been part of an ERP implementation that took months if not years to prepare for you know how risky these large changes are.

A solution to this problem is to make small changes more frequently. According to Eric Ries, co-founder and former CTO of IMVU, continuous deployment is a method of improving software quality due to the discipline, automation, and rigorous standards that are required in order to accomplish continuous deployment.

Alternative Solutions
Admittedly, DevOps and continuous deployment are somewhat extreme for some teams. For those teams, or for teams that just don’t believe that these are the solutions, don’t fret: there are alternatives.

JAD/ARB – For improving the coordination between development and operations, we recommend the JAD and ARB processes. These are very lightweight processes that force the teams to work together for better architected and better supported solutions.

Progressive Rollout – For reducing risk by making smaller changes, we recommend progressive rollout. This is a simple concept that involves first pushing code to a very small set of servers, monitoring for issues, and then progressively increasing the percentage of servers that receive the new code. The time between rollouts can be 30 minutes to 24 hours depending on how quickly you are likely to detect problems. We often suggest that the percentages of servers in the progressive rollout be 1%, 5%, 20%, 50%, 100%.
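
To make the idea concrete, here is a minimal sketch of a progressive rollout using the stage percentages above. The function names and the `deploy` and `healthy` callables are hypothetical placeholders for your own deployment and monitoring tooling, not part of any real tool.

```python
# Rollout stages from the post: cumulative percentage of servers on the new code.
STAGES_PERCENT = [1, 5, 20, 50, 100]


def rollout_plan(servers, stages=STAGES_PERCENT):
    """Return one batch per stage; each batch holds only the newly added servers."""
    plan, done = [], 0
    for pct in stages:
        # Integer ceiling division, so even the 1% stage touches at least one server.
        target = min(len(servers), max(1, (len(servers) * pct + 99) // 100))
        plan.append(servers[done:target])
        done = target
    return plan


def progressive_rollout(servers, deploy, healthy, stages=STAGES_PERCENT):
    """Deploy stage by stage and stop at the first stage that looks unhealthy.

    `deploy(host)` and `healthy(updated_hosts)` are hypothetical callables
    supplied by the caller. Returns the servers that received the new code.
    """
    updated = []
    for batch in rollout_plan(servers, stages):
        for host in batch:
            deploy(host)
        updated.extend(batch)
        # A real rollout would wait here (30 minutes to 24 hours) before checking.
        if not healthy(updated):
            break
    return updated
```

With 200 servers this plan pushes to 2, then 8, 30, 60, and finally 100 more hosts. Stopping at the first unhealthy stage limits the impact of a bad release to the servers already updated.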

The bottom line is something technologists know – there are almost always multiple ways to solve a problem. If you don’t like the current or new solution look for an alternative.



Cascading Failures

I was chatting with Nanda Kishore (@nkishore), the ShareThis CTO, about the recent problems Amazon had in one of their zones. Even though ShareThis is 100% in the cloud, because they have properly architected their system, these regional outages didn’t affect ShareThis services at all. Of course kudos to Nanda and his team for their design and implementation, but more interesting was our discussion about this being a cascading failure, in which one small problem cascades into a much bigger problem. A few days later Amazon provided a bit of a postmortem confirming that a simple error during a network change started the problem. The incorrect traffic shift left the primary and secondary EBS nodes isolated, each thinking the other had failed. When they were reconnected they rapidly searched for free space to re-mirror, which exhausted spare capacity and led to a “re-mirroring storm.”

As we were discussing the Amazon issue, I brought up another recent outage of a major service, Facebook. In Sep 2010 they had a several-hour outage for many users caused by an invalid configuration value in their caching tier. This caused every client that saw the value to attempt to fix it, which involved a query to the database. The DBs were quickly overwhelmed by hundreds of thousands of queries per second.

Both of these are prime examples of how, in complex systems, small problems can cascade into large incidents. Of course there has been a good deal of research on cascading failures, including models of the probability distributions of outages to predict their occurrence. What I don’t believe exists, and should, is a framework to prevent them. As Chapter 9 in Scalability Rules states, the most common scalability related failure is not designing to scale and the second most common is not designing to fail. Everything fails, plan for it! Of course utilizing swim lanes or fault isolation zones will certainly minimize the impact of any of these issues, but there is a need for handling this at the application layer as well.

As an example, say we have a large number of components (storage devices, caching services, etc) that have a failsafe plan such as refreshing the cache or re-mirroring the data. Before these actions are executed, the component should check in with an authority that determines if the request should be executed or if too many other components are doing similar tasks. Alternatively, a service could monitor for these requests over the network and throttle/rate limit them much like we do in an API. This way a small problem that causes a huge cascade of reactions can be paused and handled in a controlled and more graceful manner.
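
A minimal sketch of such an authority might look like the following. The class name, method names, and the concurrency cap are assumptions made for illustration, and a real system would need a distributed, fault-tolerant version of this check rather than a single in-process object.

```python
import threading


class RecoveryAuthority:
    """Hypothetical coordinator that caps how many components may run a
    failsafe action (cache refresh, re-mirroring, etc.) at the same time."""

    def __init__(self, max_concurrent):
        self._max = max_concurrent
        self._active = set()
        self._lock = threading.Lock()

    def request(self, component_id):
        """Return True if the component may start its recovery action now."""
        with self._lock:
            if component_id in self._active:
                return True  # already granted; repeated requests are harmless
            if len(self._active) >= self._max:
                return False  # too many peers are recovering; back off and retry
            self._active.add(component_id)
            return True

    def release(self, component_id):
        """Called by a component once its recovery action has finished."""
        with self._lock:
            self._active.discard(component_id)
```

A component that is refused would wait and ask again later, ideally with some randomized delay, so that a burst of simultaneous recovery attempts is spread out instead of hitting the database or storage pool all at once.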



Newsletter – Spring 2011

Below is part of our Spring 2011 Newsletter. If you haven’t subscribed yet, click here to do so.

In this newsletter:

Scalability Rules

Scalability Rules: 50 Principles For Scaling Websites is available for presale. We are just a few short weeks away from the release date and are very excited about this project. This book is meant to serve as a primer, a refresher, and a lightweight reference manual to help engineers, architects, and managers develop and maintain scalable Internet products. It is laid out in a series of rules, each of them bundled thematically by different topics. Most of the rules are technically focused, while a smaller number of them address some critical mindset or process concern – each of which is absolutely critical to building scalable products.

It is available for preorder from these sites:

You can also help us get the word out about this book by liking and sharing the book’s Facebook page or the book’s official website, where we’ll keep up to date information about reviews and speaking engagements.

With the success of The Art of Scalability, we’ve been asked by a few folks, why write another book on scale? Our answer is that there simply aren’t many good books on scalability on the market yet, and Scalability Rules is unique in its approach in this sparse market.  Also, this is the first book to address the topic of scalability in a rules-oriented fashion. One of our most-commented-on blog posts is on the need for scalability to become a discipline. We and the community of technologists that tackle scalability problems believe that scalability architects are needed in today’s technology organizations. This book will help scalability architects, scalability evangelists, and the like to share their knowledge with others in scaling their systems.  See More…

Our first book The Art of Scalability is still available at these retailers:

 

Most Popular Posts

We know everyone is busy and often our RSS readers get filled with too many interesting articles to keep up with.  Here are summaries of a few of our posts and some by other authors that we particularly enjoyed.

Why A Technology Leader Should Code
The military teaches that a leader should be “technically and tactically” proficient. Military leaders owe it to their subordinates to understand the equipment that the unit employs and the basic combat tactics that it follows. This concept is transferable to technology companies; the CTO owes it to their subordinates to understand the technology. They also owe it to the business to understand the economic aspects of the business and be able to straddle these two worlds. Additionally, periodically having to code a feature and deploy it will give the engineering manager a better understanding of and appreciation for what her engineers go through on a daily basis. Read more

What Is That Delay Costing?
Most technologists know that the slower the page, the more likely the user will flee the page or the transaction flow and not make the purchase. Research is teaching us that it may be less important to reduce actual delay than to create a system where users are less likely to attribute the delay to the site. An example that we sometimes see is giving the user the option of selecting a low or high graphic site in order to put the user in control. Users will likely perceive this as an active effort on the part of the SaaS provider to minimize download time and thus attribute delays to themselves, their computer, their ISP, etc., but not the site. Read more

DevOps
DevOps is an umbrella concept that refers to anything that smoothes out the interaction between development and operations and is a response to the growing awareness of the disconnect between development and operations. There is an emerging understanding of the interdependence of development and operations in meeting a business’ goals. The concept is not new (we’ve been living and suggesting ARB and JAD as cornerstones of this coordination for years), but DevOps has recently grown into a discipline of its own. Read more

Google Megastore
Google provided a paper detailing their design and development of “Megastore.” This is a storage system developed to meet the requirements of today’s interactive online services and according to the paper it blends the scalability of a NoSQL datastore with the convenience of a traditional RDBMS in a novel way, providing strong consistency and high availability. The system’s underlying datastore is Google’s Bigtable but it additionally provides for serializable ACID semantics and fine-grained partitions of data. Read more

Scalability at the Cost of Availability
One subtle concept that is sometimes misunderstood is that, if you are not careful, an increase in scalability can actually decrease your availability. The reason for this is the multiplicative effect of failure with items in series. If two pieces of hardware independently have 99.9% uptime, then when they are combined into a single system that relies on both to respond to requests, the availability of the system goes down to 99.9% x 99.9% ≈ 99.8%. Read more
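
The arithmetic is easy to check for longer chains as well. This is a small sketch that assumes, as the summary does, that component failures are independent; the function name is ours.

```python
import math


def series_availability(*availabilities):
    """Availability of a chain in which every component must be up.

    Assumes failures are independent, so the individual availabilities
    (expressed as fractions, e.g. 0.999) simply multiply.
    """
    return math.prod(availabilities)


# Two components at 99.9% each give roughly 99.8% for the whole system.
two_in_series = series_availability(0.999, 0.999)

# Every extra component in the request path lowers availability further.
five_in_series = series_availability(*[0.999] * 5)
```

Each component added in series drags the total down a little more, which is why adding tiers to scale can cost availability unless those tiers are fault isolated.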

8 Lessons We Can Learn From The MySpace Incident
Robert Scoble wrote a case study, MySpace’s death spiral: insiders say it’s due to bets on Los Angeles and Microsoft, in which he reports that MySpace insiders blame the Microsoft stack for their loss to Facebook. Some lessons can be gleaned from this, including “All computer companies are technology companies first,” “Enterprise Programming != Web Programming,” and “Intranet != Internet.” Read more

Aztec Empire Strategy: Use Dual Pipes For High Availability
The Aztecs built the great aqueduct 600 years ago but even then thought about uninterrupted supply.  This post states that the purpose of the twin pipes was to keep water flowing during maintenance.  When one pipe got dirty, the water was diverted to the other pipe while the dirty pipe was cleaned. Read more

 

Research Update and Request for Help
Marty and Mike will both be presenting their research at the 2011 Academy of Management Conference. Marty’s research deals with tenure-based conflict and Mike’s research is focused on social contagion (a.k.a. viral growth). You can read the abstracts and full text for both papers here.

We are continuing our research and could use your help. Please consider completing one or both surveys.

HELP!
If you are an executive team member at a startup, please take this survey and pass it along to your colleagues within your company.

If you participate in any of the following social networks (Facebook, Friendster, LinkedIn, Twitter, MySpace, Ning, Orkut, or Yahoo!360), please take this survey and pass it along to your friends or colleagues.

Thanks for your support!



DevOps

What do you call a set of processes or systems for coordination between development and operations teams? Give up? Try “DevOps”. The concept is not new (we’ve been living and suggesting ARB and JAD as cornerstones of this coordination for years), but it has recently grown into a discipline of its own. Wikipedia states that DevOps “relates to the emerging understanding of the interdependence of development and operations in meeting a business’ goal to producing timely software products and services.” Tracking down the history of the DevOps Wikipedia page shows that this topic is a recent entry.

There are a lot of other resources on the web that may not have been using this exact term but have certainly been dealing with the development and operations coordination challenge for years. Dev2Ops.org is one such group and earlier this year posted their definition of DevOps: “an umbrella concept that refers to anything that smoothes out the interaction between development and operations.” They continue in their post, highlighting that the concept of DevOps is a response to the growing awareness of a disconnect between development and operations. While I think that is correct, I think it’s only part of the reason for the recent interest in defining DevOps.

With ideas such as continuous deployment and Amazon’s two-pizza rule for highly autonomous dev/ops teams there is a blurring of roles between development and operations. Another driver of this movement is cloud computing. Developers can procure, deploy, and support virtual instances much easier than ever before with the advent of GUI or API based cloud control interfaces. What used to be clearly defined career paths and sets of responsibilities are now being blended to create a new, more efficient and highly sought after technologist. A developer who understands operations support or a system administrator who understands programming are utility players that are very valuable.

While DevOps is perhaps a new term for an old problem, it is promising to see that organizations are taking an interest in the challenges of coordination between development and operations. It is even more important that organizations pay attention to this topic given the blurring of roles.



“Internal Customer”: The “C” Word of SaaS Companies

If you are a technology organization within a Software as a Service (SaaS) company, there is no such thing as an “internal customer”. We often see this anachronistic IT phrase thrown around in web X.0 companies by executives and engineers who simply have not adopted the new SaaS mindset. Do you think you’ll hear the left offensive tackle of an NFL team refer to the quarterback as his “internal customer”? The quarterback consumes services (energy to block opponents) of the left tackle – so why wouldn’t he be a customer? The answer is simple – because the notion of a customer relationship is different from the notion of a relationship between teammates.

The first reason why your teammate isn’t your customer is because he or she is, well, your TEAMMATE.  Customers are someone for whom you produce a service or product and teammates are someone with whom you work to accomplish a goal.  The difference between working FOR someone and working WITH someone is HUGE.  This difference creates a contextually activated identity that forces you to think about customers in a different light than you would a teammate.  Very often, as we’ve written before, this can result in affective (role based or bad) conflict between teams.  Affective conflict is bad and it destroys shareholder value.  Working as a team is important and customers aren’t part of your team.

The next reason that your teammate isn’t your customer is that the customer is always right.  Your teammate isn’t always right.  You need to debate certain points as a team to come to better solutions.  This isn’t affective conflict, it is cognitive conflict and if handled properly it is good and helps to create shareholder value.

The most important reason there isn’t a customer relationship here is that your teammate isn’t paying! “Servicing” your teammate (uggh…that’s an ugly term) doesn’t create shareholder value. Working as a team to deliver a service or product to your “real” customer is what creates shareholder value. One design, one approach, one ruthless drive as a team to get across the goal line is what is necessary to thrive and succeed.

So stop using the ugly “internal” C word in your SaaS company.  It doesn’t have a place there.  Let the old world, internal IT folks continue to provide services to their internal customers.  Start acting like a team, designing and building services rather than software.



Moving from Packaged Software to SaaS

You can be successful both shipping software and delivering services through software. But you can't be successful at both without distinct architectures.

It’s probably no surprise to our readers that many old packaged software companies are attempting to take their software and hence their business models “online”. And why not? The model is attractive and benefits accrue to both the providers of service through software and those who outsource portions of what was once bothersome internally hosted software. The providers benefit from economies of scale in hosting that generate attractive profits for the provider and savings for the customer, lower maintenance costs resulting from custom customer deployments, predictable revenue streams fostered through closer customer contact, more frequent and smaller releases that reduce risk and faster implementation times that result in faster profit recognition. Customers benefit from outsourcing non-core IT functions, providers who specialize in delivering specific services, lower capital expenditures and faster deployment times. SaaS is both a dessert topping and a floor wax! It’s the cure for cancer and the answer to the riddle of life!

But what many of these companies don’t realize is that the way one architects a product and runs a company focused on service delivery is simply different than the approach of a company focused on delivering software.  Customers expect that you are going to give them higher availability and fewer headaches.  Software alone simply won’t meet this goal; it is imperative that one design SaaS systems holistically which in turn requires skills in both infrastructure and software architecture (or “systems” architecture).   The cost leverage necessary to both increase profit margins and decrease customer cost typically requires multi-tenancy which has its own share of headaches.  Fault isolation and rollback capabilities are a must to minimize customer impact and mitigate rapid deployment risks.

It is not enough to simply bundle up an application in a hosted fashion and label yourself a “SaaS” company.  If you don’t work aggressively to increase availability and decrease your cost of operations, someone with greater experience will come along and simply put you out of business.  After all, your business is now about SERVICE – not SOFTWARE.  This is a fundamental mind-shift that some companies simply can’t overcome or maybe simply don’t recognize.  This isn’t to say that a good engineer or product manager can’t be equally good at developing packaged and SaaS applications, but it does mean that the approach is completely different.

Stop trying to figure out how to leverage your existing assets with minimal work and start thinking about having two different products.  Or, determine which business you want and kill the other one off.  If you decide to keep both products alive, you can share services and code between these platforms, but you should not do so at the expense of optimizing your SaaS solution.  Attempting to satisfy both with a single architecture will likely result in you failing at both.



Scalability Warning Signs

Is your system trying to tell you that you're going to have scalability problems? We like to think that we couldn't have predicted problems at 10x our last year's traffic but there are often warning signs that we can heed if we know what to look for.

Unless you’re one of the incredibly lucky sites where the traffic spikes 100x overnight, scalability problems don’t sneak up on you. They give you warning signs that if you are able to recognize and react to appropriately, allow you to stay ahead of the issues. However, we’re often so head down getting the next release out the door that we don’t take the time to realize we’re experiencing warning signs until they become huge problems staring us in the face.  Here are a few of the warnings that we’ve heard teams talk about in the past couple of months that were clearly signs of problems on the horizon.

Not wanting to make changes – If you find yourself denying requests for changes to certain parts of your system, this might be a warning sign that you have scalability issues with that component. A completely scalable system has components that can fail without catastrophic impact to customers. If you’re avoiding changes to a component because of the risk of problems, this is a warning sign that you need to re-architect to eliminate or at least mitigate the risk.

Performance creep – If after each release you need to add hardware to a subsystem, or you accept a performance degradation in a service, you could have a scaling issue approaching quickly. Consistently increasing consumption of CPU or memory resources in a service with each release will lead you into an unsustainable situation. If today you’re comfortably sitting at 40% CPU utilization and you allow a modest 10% degradation in each release, you have only about ten releases before you exceed 100% (40% x 1.1^10 ≈ 104%), but the reality is you won’t get close to that without significant issues.
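
Here is a quick sketch of that calculation, assuming the 10% degradation compounds with each release. The function name and the 70% “trouble” threshold in the second call are illustrative assumptions, not figures from our clients.

```python
def releases_until_exceeded(start_pct, growth_per_release, limit_pct=100.0):
    """Count releases until utilization first exceeds limit_pct.

    Assumes utilization grows by a fixed fraction (e.g. 0.10 for 10%)
    with every release, compounding on the previous release's level.
    """
    if growth_per_release <= 0:
        raise ValueError("growth_per_release must be positive")
    utilization, releases = start_pct, 0
    while utilization <= limit_pct:
        utilization *= 1.0 + growth_per_release
        releases += 1
    return releases


# Starting at 40% CPU with 10% compounding growth per release.
to_saturation = releases_until_exceeded(40.0, 0.10)        # crosses 100%
to_trouble = releases_until_exceeded(40.0, 0.10, 70.0)     # crosses a hypothetical 70% alarm level
```

Running the numbers against a more realistic ceiling than 100% shows how much sooner the trouble starts, which is the point of treating performance creep as an early warning.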

Investigating larger hardware – If you’ve started asking your vendors or VAR about bigger hardware you’re heading down the path of scalability problems. The cost of additional computational resources does not grow linearly; it’s closer to cubic or even exponential. Purchasing more expensive hardware might seem like the economical way out when you compare the cost of the first hardware upgrade versus developer time, but run the calculation out several iterations. When you get to a Sun Fire™ E25K Server with Oracle Database 10g at a $6M price tag you might feel differently about the decision.

Asking vendors for advanced features – When you start exploring advanced options of your vendor’s software you’re likely heading down the path of increased complexity and this is a clear warning sign of scalability problems in your future. Besides potentially locking you into a vendor which lowers your negotiating power it puts the fate of your company in someone else’s hands, which wouldn’t make me sleep very well at night. See our post on using vendor features to scale for more information.

Watch out for these or similar warning signs that scalability problems are looming on the horizon. Dealing with the problems today while you have time to plan properly might not get you an award for being a firefighter but you’ll more likely deliver a quality product without costly interruption.



Agile Architects

If you think agile development methods and architecture are at odds, think again. They can not only coexist but can thrive together to build better products and platforms.

We recently posted what agile is not, where we outlined questions that we often hear about agile development. Another question that is often raised is how to combine the seemingly long-term process of architecture with the short-term nature of agile development. We believe that architecture is not at odds with agile development and that the two can not only coexist but complement each other. To ensure your architecture standards are being integrated with each sprint, resulting in a scalable and available architecture, we rely on the Joint Architecture Design (JAD) process.

We’ve covered JAD before but as a recap, this is the process by which features are designed in a series of meetings where developers, architects, and tech operations come together to create the design. This multi-function representation early in the development process ensures that individuals are aware of standards, there is buy-in from all concerned parties, and that the design is benefitted by the knowledge that exists in different technical fields.

In the IEEE Software article “Architects as Service Providers,” Roland Faber says that “the architect role is to champion system qualities and stability, while the developer role is to champion the system functions and change.” The agile architect interacts frequently and flexibly with the developers, building a trust relationship with them.

The JAD is ideal for flexible interaction that can happen in short bursts of effort that correspond to sprints. The agile architect must understand that, because of the nature of agile development, architecture must be dynamic and not static. Architects must rely on personal interaction with developers not documentation to understand the requirements.

Faber continues in his article, describing two phases of the architecture process: preparation and support. During preparation the architect prepares rules, frameworks, and structures. During support the architect helps resolve conflicts, engages in firefighting, and stimulates architecture communication. He makes the point that if developers don’t believe the architects will provide support, they won’t tell them when they are breaking the rules.

