AKF Partners

Abbott, Keeven & Fisher Partners – Partners In Hyper Growth

Tag » Scalability

Scalability Rules Videos and Reviews

Here are a few videos that we did to explain Scalability Rules…ignore the scary faces:

You can find more videos by following this link.

Here are a couple reviews of the book:
InfoQ
Code Ranch
LinkedIn Reviews



Scalability Rules Android App

Whether you love Scalability Rules or you haven’t gotten around to purchasing your copy, check out the new Android application. The app has the what, when, why, and how for each of the 50 rules. Follow this link, scan this QR code, or search for “scalability” in the Android Market.



Multi-paradigm Data Storage Architectures

We often have clients ask about one or more of the NoSQL technologies as potential replacements for their RDBMS (typically MySQL) to simplify scaling. What I think makes much more sense with regard to these NoSQL and SQL storage systems is an AND instead of an OR discussion. Consider implementing a multi-paradigm data storage layer that provides the appropriate persistent storage system for the different types of data in your application. This approach is similar to our RFM approach to data storage. Consider questions such as how often do you need the data, how quickly do you need it, how relational is the data, etc. There are at least four benefits of this multi-paradigm approach: simpler scaling, improved application performance, easier application development, and reduced cost.

Scaling
The AKF Scale Cube provides a straightforward way to scale any relational database through the three axes, but we know that splitting data relationships once they’ve been established isn’t easy. It requires work and lots of coordination between teams. Limiting what gets stored relationally to only the minimum that is required means fewer splits along any axis. Many of the NoSQL technologies provide auto-sharding and asynchronous replication. Re-indexing keys across another node is much simpler than migrating tables into another database.

Performance
While relational databases can have great performance, unless the table is pinned in memory or the query results are cached in memory, an in-memory data store will generally outperform SQL queries that have to go to disk. In many applications we could make use of in-memory solutions like Memcached or MongoDB to improve the performance of retrieving high-value data.

Application Development
As Debasish Ghosh states in his article Multiparadigm Data Storage for Enterprise Applications, storing data the way it is used in an application simplifies programming and makes it easier to decentralize data processing. If the application treats the data as a document, why break it apart to store it relationally when we have viable document storage engines? Storing the data in a more native format allows for quicker development.

Cost
For data that’s not needed often, cache it in other places (such as a CDN) or lazy load it from a low-cost storage tier such as Amazon’s S3. This might work well for applications hosted in the cloud. The benefit of this is a lower cost per byte stored, especially when considering all costs, including administrators for the more complex data storage systems such as relational databases.

A final step in implementing a multi-paradigm data storage layer is an asynchronous message queue for data that needs to move up and down the stack. Implementing ActiveMQ or RabbitMQ to asynchronously move data from one layer to another as needed relieves the application of this burden. As an example, consider an application that routes picking baskets for inventory in a warehouse. This is typically thought of as a graph with bins of inventory as nodes and the aisles as edges. For ease of development and faster retrieval you could store this in a graph database such as Neo4j. You could then asynchronously persist these maps in a MySQL database for reporting and move older versions into an S3 bucket for historic archiving. This combination provides faster performance, easier development, simpler scaling, and reduced cost.
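The warehouse example can be sketched in a few lines. This is only an illustration under loud assumptions: the dictionaries and list stand in for the graph database, MySQL, and S3, Python's in-process queue stands in for a broker such as ActiveMQ or RabbitMQ, and every name (save_route, persistence_worker, and so on) is hypothetical.

```python
import queue
import threading

fast_store = {}        # stand-in for the graph database the application reads from
reporting_store = {}   # stand-in for the MySQL reporting database
archive = []           # stand-in for an S3 bucket holding older versions
persist_queue = queue.Queue()  # stand-in for a message broker


def save_route(route_id, route):
    """Write to the fast store, then hand persistence off to the queue."""
    previous = fast_store.get(route_id)
    fast_store[route_id] = route
    persist_queue.put((route_id, route, previous))  # returns immediately


def persistence_worker():
    """Move data down the stack without involving the application."""
    while True:
        message = persist_queue.get()
        if message is None:  # sentinel: stop the worker
            persist_queue.task_done()
            break
        route_id, route, previous = message
        reporting_store[route_id] = route          # current version for reporting
        if previous is not None:
            archive.append((route_id, previous))   # older version to the archive
        persist_queue.task_done()


worker = threading.Thread(target=persistence_worker, daemon=True)
worker.start()

save_route("basket-1", ["bin-A", "bin-C"])
save_route("basket-1", ["bin-A", "bin-B", "bin-C"])

persist_queue.join()     # wait until the worker has drained the queue
persist_queue.put(None)
worker.join()
```

The application only ever touches the fast store and the queue. A real deployment would replace the in-process queue with a durable broker so that messages survive a crash.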



Scalability Rules – Released This Week

Our newest book, Scalability Rules, has just been released. Here are a few places you can purchase the book:

You can also help us get the word out about this book by liking and sharing the book’s Facebook page or the book’s official website, where we’ll keep up to date information about reviews and speaking engagements.

Scalability Rules brings together 50 rules that are grounded in experience garnered from over a hundred companies such as eBay, Intuit, PayPal, Etsy, Folica, and Salesforce. The rules are put together and organized to be easily read and referenced for rapid application to nearly any technical environment. They are technology agnostic and have been applied to LAMP, .NET, and even midrange system architectures.

We are very thankful for everyone’s help in making this project come together and here are just a few of those folks:

    Technical Reviewers – Robert Guild, Geoffrey Weber, and Jeremy Wright
    Pre-reviewers – Chad Dickerson, Chris Lalonde, Jonathan Heiliger, Jerome Labat, and Nanda Kishore.
    Senior Acquisitions Editor – Trina MacDonald
    Development Editor – Songlin Qiu
    Project Editor – Anne Goebel

We dedicated this book to our friend and partner Tom Keeven, who in our minds is the originator of many of these concepts and has helped countless companies in his nearly 30 years in the business.



Newsletter – Spring 2011

Below is part of our Spring 2011 Newsletter.  If you haven’t subscribed yet, click here to do so.

In this newsletter:

Scalability Rules

Scalability Rules: 50 Principles For Scaling Websites is available for presale. We are just a few short weeks away from the release date and are very excited about this project. This book is meant to serve as a primer, a refresher, and a lightweight reference manual to help engineers, architects, and managers develop and maintain scalable Internet products. It is laid out in a series of rules, each of them bundled thematically by different topics. Most of the rules are technically focused, while a smaller number of them address some critical mindset or process concern – each of which is absolutely critical to building scalable products.

It is available for preorder from these sites:

You can also help us get the word out about this book by liking and sharing the book’s Facebook page or the book’s official website, where we’ll keep up to date information about reviews and speaking engagements.

With the success of The Art of Scalability, we’ve been asked by a few folks, why write another book on scale? Our answer is that there simply aren’t many good books on scalability on the market yet, and Scalability Rules is unique in its approach in this sparse market.  Also, this is the first book to address the topic of scalability in a rules-oriented fashion. One of our most-commented-on blog posts is on the need for scalability to become a discipline. We and the community of technologists that tackle scalability problems believe that scalability architects are needed in today’s technology organizations. This book will help scalability architects, scalability evangelists, and the like to share their knowledge with others in scaling their systems.  See More…

Our first book The Art of Scalability is still available at these retailers:

 

Most Popular Posts

We know everyone is busy and often our RSS readers get filled with too many interesting articles to keep up with.  Here are summaries of a few of our posts and some by other authors that we particularly enjoyed.

Why A Technology Leader Should Code
The military teaches that a leader should be “technically and tactically” proficient. Military leaders owe it to their subordinates to understand the equipment that the unit employs and the basic combat tactics that it follows. This concept is transferable to technology companies; the CTO owes it to their subordinates to understand the technology. They also owe it to the business to understand the economic aspects of the business and be able to straddle these two worlds. Additionally, periodically having to code a feature and deploy it will give the engineering manager a better understanding of and appreciation for what her engineers go through on a daily basis. Read more

What Is That Delay Costing?
Most technologists know that the slower the page, the more likely the user will flee the page or the transaction flow and not make the purchase.  Research is teaching us that it may be less important to reduce the actual delay than to create a system where users will be less likely to attribute the delay to the site. An example that we sometimes see is to give users the option of selecting a low- or high-graphics site in order to provide them with control. Users will likely perceive this as an active effort on the part of the SaaS provider to minimize download time and thus attribute delays to themselves, their computer, their ISP, etc., but not to the site. Read more

DevOps
DevOps is an umbrella concept that refers to anything that smooths out the interaction between development and operations, and it is a response to the growing awareness of the disconnect between the two. There is an emerging understanding of the interdependence of development and operations in meeting a business’ goals. While not a new concept (we’ve been living and suggesting ARB and JAD as cornerstones of this coordination for years), DevOps has recently grown into a discipline of its own. Read more

Google Megastore
Google provided a paper detailing their design and development of “Megastore.” This is a storage system developed to meet the requirements of today’s interactive online services and according to the paper it blends the scalability of a NoSQL datastore with the convenience of a traditional RDBMS in a novel way, providing strong consistency and high availability. The system’s underlying datastore is Google’s Bigtable but it additionally provides for serializable ACID semantics and fine-grained partitions of data. Read more

Scalability at the Cost of Availability
One subtle concept that is sometimes misunderstood is that, if you are not careful, an increase in scalability can actually decrease your availability. The reason for this is the multiplicative effect of failure with items in series.  If two pieces of hardware independently have 99.9% uptime, then when they are combined into a single system that relies on both to respond to requests, the availability of the system goes down to 99.9% x 99.9% = 99.8%. Read more

8 Lessons We Can Learn From The MySpace Incident
Robert Scoble wrote a case study, MySpace’s death spiral: insiders say it’s due to bets on Los Angeles and Microsoft, in which he reports that MySpace insiders blame the Microsoft stack for their loss to Facebook.  Some lessons can be gleaned from this, including “All computer companies are technology companies first,” “Enterprise Programming != Web Programming,” and “Intranet != Internet.” Read more

Aztec Empire Strategy: Use Dual Pipes For High Availability
The Aztecs built the great aqueduct 600 years ago but even then thought about uninterrupted supply.  This post states that the purpose of the twin pipes was to keep water flowing during maintenance.  When one pipe got dirty, the water was diverted to the other pipe while the dirty pipe was cleaned. Read more

 

Research Update and Request for Help
Marty and Mike will both be presenting their research at the 2011 Academy of Management Conference. Marty’s research deals with tenure-based conflict and Mike’s research is focused on social contagion (a.k.a. viral growth). You can read the abstracts and full text for both papers here.

We are continuing our research and could use your help. Please consider completing one or both surveys.

HELP!
If you are an executive team member at a startup, please take this survey and pass it along to your colleagues within your company.

If you participate in any of the following social networks (Facebook, Friendster, LinkedIn, Twitter, MySpace, Ning, Orkut, or Yahoo!360), please take this survey and pass it along to your friends or colleagues.

Thanks for your support!



Scalability at the Cost of Availability

Do you associate scalability with availability? Sometimes these go hand-in-hand but sometimes these are at odds with each other. We’re obviously big proponents of architecting your systems so that you have the necessary scalability when you need it but we’re also realistic. We often help young companies make tradeoffs between capital expenditure and scalability. It’s not uncommon for us to spend a good deal of time explaining the concepts of Design-Implement-Deploy and Recency-Frequency-Monetization to help with this discussion.

One subtle concept that is sometimes misunderstood is that, if you are not careful, an increase in scalability can actually decrease your availability. To understand how this can happen we need to talk about the multiplicative effect of failure with items in series. Take, for example, a system with a single web server that has 99.9% availability (forget about network gear for now, although it has the same effect). The availability of the system is 99.9%. Now add a database, also with 99.9% availability, and assume that the DB is required for the web server to respond, i.e., pages are built by querying the DB. This causes the availability of the system to go down to 99.9% x 99.9% = 99.8%. The reason is that with 99.9% availability each component is going to be down for ~43 min per month. The chance that the database experiences its 43 min of downtime at the same time as the web server is down is very small. Much more likely is that you experience about 86 min of downtime each month, half caused by the DB and half by the web server.
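The arithmetic above is easy to check. The following is a minimal sketch, assuming a 30-day month and independent failures; the function names are ours and purely illustrative.

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200 minutes, assuming a 30-day month


def series_availability(*components):
    """Availability of a chain in which every component must be up."""
    result = 1.0
    for availability in components:
        result *= availability
    return result


def downtime_minutes(availability, period=MINUTES_PER_MONTH):
    """Expected minutes of downtime over the period."""
    return (1.0 - availability) * period


web = 0.999
db = 0.999
system = series_availability(web, db)

print(f"web server alone: {downtime_minutes(web):.1f} min/month")
print(f"web server + DB:  {system:.4%} available, "
      f"{downtime_minutes(system):.1f} min/month")
```

The product comes out slightly above 99.8% because the two outages occasionally overlap, which is why the monthly downtime is a shade under twice the single-server figure.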

Back to scaling causing problems with availability. Let’s take the same example, a single web server and a single DB server, both with 99.9% availability. If our database is starting to get busy and we decide to split it, most likely we’d start by adding a read slave (X-Axis split), where the write queries (insert, update, delete) go to the master and the reads (select) go to the slave. To accomplish this we need to introduce another piece of hardware and replicate the database. If the web pages in our system require both read and write queries to the DB, then we’ve just decreased the overall system availability by increasing its scalability. This is a very simplistic example and makes a lot of assumptions but hopefully it gets the point across that you can actually decrease your availability by increasing your scalability.

So why make this tradeoff? In most cases the availability of our hardware is much higher than three-nines so the addition of a small amount of downtime is worth the gain in scalability. Also, by using swim lanes we can mitigate this by splitting our downtime across parts of our users, effectively cutting downtime in half with our first swim lane split.
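To put rough numbers on this tradeoff, the sketch below compares the two-component chain (web server and DB) with the three-component chain (web server, master, and read slave) at two hardware availability levels. The 99.99% figure is an assumption chosen for illustration, not a measured value, and failures are assumed to be independent.

```python
import math

MINUTES_PER_MONTH = 30 * 24 * 60  # assumes a 30-day month


def monthly_downtime(availabilities):
    """Expected minutes of downtime per month for components in series."""
    system = math.prod(availabilities)  # every component must be up
    return (1.0 - system) * MINUTES_PER_MONTH


# Before the split: web server + DB. After: web server + master + read slave,
# with pages that need both reads and writes.
for per_component in (0.999, 0.9999):  # 0.9999 is an illustrative assumption
    before = monthly_downtime([per_component] * 2)
    after = monthly_downtime([per_component] * 3)
    print(f"{per_component:.2%} each: {before:.1f} -> {after:.1f} min/month "
          f"(+{after - before:.1f})")
```

At three nines the extra server costs roughly 43 more minutes of downtime a month; at four nines the cost drops to roughly 4 minutes, which is the kind of price most teams will pay for the added read capacity.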

All of this reminds us that scalability is much more of an art than a science, hence the name of our first book, The Art of Scalability. But don’t despair: there are definite rules that govern how to scale effectively, such as the X, Y, or Z axis splits, which is why we’re calling our new book Scalability Rules. You just need to use art in applying them. As an analogy, think about an artist painting. Mixing red with blue will always result in purple, a rule, but how the artist applies that color to the canvas is pure art.



Outbrain’s CTO on Scalability

I had the opportunity to speak with Ori Lahav, Outbrain’s co-founder and CTO, about his experience scaling Outbrain to handle 1.5 billion page views in just three years.  Outbrain is a content recommendation service that provides for blogs and articles what Netflix does for videos or Amazon does for products.

Ori oversees the R&D center for the company, located in the heart of Israel’s technology center. Prior to founding Outbrain, Ori led the R&D groups at Shopping.com (acquired by eBay) in search and classification. Before joining the internet revolution Ori led the video streaming Server Group at Vsoft.

Below are my notes from the interview.  Here is the link to the audio version but the quality is fairly poor.

What is your background and how did you come to start Outbrain?
Outbrain was founded by Yaron Galai and myself (Navy friendship).  Before that I spent 3.5 years at shopping.com, which was acquired by eBay, leading software groups.  I liked the challenges of a start-up and joining forces with Yaron was a great opportunity.

How has Outbrain grown over the past couple of years?
We had some false starts in several directions, but when we started with recommendations on content sites it started catching fire.  We started with self-serve blogs, then professional blogs, small publishers, national publishers, and now we are international.  In 3 years we grew from 0 to over 1.5B page views, 3 production servers to over 150, 0 system administrators to 4, and from 3 developers to 15 today.

How has your architecture changed over this period?
Surprisingly… not much. It was first 2 application servers and a MySQL database.  Then we started adding more application servers and replicating the MySQL database.  As the product grew, we added more and more components including memcached, Solr, TokyoTyrant, and Cassandra.  We recently added a layer that warms the caches and is notified by ActiveMQ.

On the backend, we have machines that are fetching content from the sites we are on.  We gather article text and images and save them to MogileFS where we have indexers that index them in Solr.
We started investing in reporting infrastructure and started with MySQL, but with the amount of data we have we soon understood that Hadoop is the way to go – we now have a Hadoop cluster of more than 10 nodes.

What is Outbrain’s view of scalability?
First – scalability is culture – if you think big – you will be big.  The regular rules of scalability apply here too:

  • No singles
  • Scale out instead of up
  • Replicate data to ease the load
  • Shard data to scale
  • Utilize commodity hardware

Specifically for Outbrain I would add:

  • open-source is crucial for scalability
  • be cost conscious
  • build many simple/small environments and not a single big one

How have you managed data centers to be so cost effective?
We simplify it!  We create many small datacenters and not a big one.  We have no upfront costs of network gear (stacking) and real-estate (Cages).  We use open-source for our network (load balancer and firewalls) and infrastructure (OS).

In order to ease the step function of cost, grow as you go. Our data center provider, Atlantic Metro, has helped with metered power instead of paying by the circuit.  This way we’re motivated to power down servers not needed.

What has been the most difficult scalability challenge?
Scaling the team… we have more of a patrol boat DNA than an aircraft carrier DNA.  Technology challenges are fun – never too hard – but the team and process are much more challenging.

We have also started using the Continuous Deployment process and this has been a great help. By empowering the team members to act and change the system as they need, you can grow to be an aircraft carrier and still keep the maneuverability of a patrol boat.

What do you think of the NoSQL movement?
We are big fans.  We use Hadoop, Hive, Cassandra, TokyoTyrant, MySQL, and others.  This helps us maintain our low serving cost and attract the type of talent we need on the team.
Outbrain is proud to be 100% open-source.  We use open-source from the office telephony system to all platforms and network infrastructure.

And… we are hiring top technology talent in Israel, so contact Outbrain if you are interested.



Newsletter – Firesheep

Below is part of our Fall 2010 Newsletter.  If you haven’t subscribed yet, click here to do so.

In this newsletter:

Scalability Rules

In between working with some terrific new clients this year we’ve been busy writing the second book, Scalability Rules.  With the help of some terrific technical reviewers we feel the book is taking shape very nicely, and the first five chapters are now available on Safari Rough Cuts for those interested in helping review.  Scalability Rules brings together 50 rules that we have gathered from our experiences working with over a hundred hyper-growth companies. This format of practical rules of scalability should make it ideal for use as a reference manual in formal meetings and informal discussions.

See More…

If you haven’t picked up your copy of The Art of Scalability or have a technologist on your holiday gift list, here are a couple of links for you:

Putting Out Firesheep (Protecting Your Users’ Cookies)

One of our recent blog posts that we found most interesting was submitted by a guest blogger.  Randy Wigginton is a seasoned technologist who, after a discussion with us about the security risks recently brought to light by the Firefox plugin called Firesheep, came up with a solution that we thought should be shared.  Randy’s solution is ideal for companies that want to protect their user session data (login, browsing history, etc.) but don’t want to be encumbered by the overhead of running their entire site behind SSL.

We also have a simple demo set up for those interested in testing this solution.  The way it works is that when a user logs in, TWO cookies are dropped.  In the demo, one is called “session” and the other is called “authenticate”.  These two cookies are identical except for a single attribute: “authenticate” is a secure cookie.  We authenticate users on non-secure pages by including a reference to a secure JavaScript file at the top of each page.  At the top of pages requiring authentication is this simple line of code:

<script type="text/javascript" src="https://verify.akfdemo.com/authenticate.php">
</script>
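The login side of this scheme might be sketched as follows using Python's standard http.cookies module. This is an illustration only: the function name and token generation are assumptions of ours, and the check that authenticate.php performs on the server is not described in this excerpt, so it is not shown.

```python
import secrets
from http.cookies import SimpleCookie


def login_cookies(token=None):
    """Build the two cookies dropped at login (hypothetical helper).

    Both carry the same value; only "authenticate" is marked Secure, so the
    browser sends it over HTTPS only and a sniffer on plain HTTP never sees it.
    """
    token = token or secrets.token_hex(16)  # assumed random session token
    cookies = SimpleCookie()
    cookies["session"] = token
    cookies["authenticate"] = token
    cookies["authenticate"]["secure"] = True
    return cookies


cookies = login_cookies()
for morsel in cookies.values():
    print("Set-Cookie:", morsel.OutputString())
```

Because the secure script is served from a different host name than the pages that include it, a real deployment would also need a cookie domain that covers both hosts; that detail is an assumption and is left out of the sketch.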

See More…

Lot18’s Series A

Lot18, a membership-by-invitation marketplace for wine from renowned producers at fantastic values, announced that it has completed a $3 million Series A round of funding led by FirstMark Capital, a New York City-based venture capital firm. Lot18 was founded by Kevin Fortuna and Philip James. Philip was the founder of Snooth.com, the world’s largest wine website, and Kevin was most recently a partner at AKF Partners.

Lot18 Screen Shot

See More…



Scalability as a Discipline

Just as we discussed in an earlier post about the evolution of roles in technology startups, we’ve seen the same thing in the technology discipline as a whole. Computer science as a discipline started in mathematics with Kurt Gödel’s incompleteness theorem.  From there Alan Turing and Alonzo Church formalized the notion of an algorithm and the concept of a Turing machine. The first computer that could run stored programs, based on the Turing machine model, was built in 1948 and called the Manchester Baby.

In the beginning there were only programmers; then came system operators, DBAs, architects, etc. We now have many different disciplines that one can specialize in for either part or all of a career. One of the missing disciplines, in my opinion, is the scalability architect, or scalability as a discipline.

While understanding the rules, patterns, and principles of scalability is completely achievable by anyone in the technology organization, this does not mean that they are widely known. Scalability architects would be more like evangelists and teachers than gatekeepers of secret knowledge. Unlike DBAs or network engineers, whose jobs really aren’t to educate any other technology person on how to create an index or open a port, the scalability architect would educate tech people. All other disciplines, from software developers to DBAs, could benefit from additional knowledge about scaling.

If you’re serious about scaling, is it time that you looked for or anointed a scalability architect?



Defining Pods, Shards and Swim Lanes

In the course of our engagements we often have to pause for a few minutes to acquaint everyone with a few terms that we use. It is often the case that they have heard or even used some terms common in the industry. Three of these that are often used and/or confused are pods, shards, and swim lanes. Let’s start by defining each one and then explain the differences.

Shards
According to Merriam-Webster a shard is a small piece or part. Wikipedia defines a database shard as “…a method of horizontal partitioning in a database or search engine.” The term horizontal partitioning refers to a database design principle whereby rows of a database table are separated possibly onto physically distinct database servers.

A shard to AKF is a Z-axis split on the AKF Scale Cube. This involves splitting the tables in the database between two or more database servers based on some appropriate key such as customer ID or sales items. An X-axis split involves replicas such as read-only slaves or standbys that are complete copies of the primary database. Y-axis splits are done by service, which usually aligns to a subset of tables. An example of this would be pulling session data off the primary database and onto its own database server.
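A Z-axis split by customer ID might look like the sketch below. The shard names and the hash-then-modulo placement are assumptions for illustration; a lookup table or range-based mapping would serve the same purpose.

```python
import hashlib

# Hypothetical database servers, each holding the same tables for a
# different subset of customers.
SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2"]


def shard_for(customer_id, shards=SHARDS):
    """Map a customer ID to the one server that holds that customer's rows."""
    digest = hashlib.sha256(str(customer_id).encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[bucket]


for customer_id in (1001, 1002, 1003):
    print(customer_id, "->", shard_for(customer_id))
```

A stable hash is used rather than Python's built-in hash() so that the mapping stays the same across processes and restarts. Note that plain modulo placement moves most keys when the number of shards changes, so a production system would plan for that.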

Pods
One of our clients, Salesforce.com, uses the term pods especially for its Force.com software-as-a-service platform. Pods are self-contained sets of functionality that can consist of an app server or database. If a pod goes down because the platform isn’t running it, only the customers on that pod will be affected. Salesforce executives claimed that it delivered 99.95 percent uptime last year.

Swim Lanes
AKF uses the term “swim lane” to describe a failure domain or fault isolation architecture. A failure domain is a group of services within a boundary such that any failure within that boundary is contained and failures do not propagate outside. The benefit of such a failure domain is two-fold:

  1. Fault Detection: Given a granular enough approach, the component of availability associated with the time to identify the failure is significantly reduced. This is because all effort to find the root cause or failed component is isolated to the section of the product or platform associated with the failure domain.
  2. Fault Isolation: As stated previously, the failure does not propagate or cause a deterioration of other services within the platform. As such, and depending upon the approach, only a portion of users or a portion of functionality of the product is affected.

Between swim lanes synchronous calls are absolutely forbidden because any synchronous call between failure domains, even with appropriate timeout and detection mechanisms, is very likely to cause a cascading series of failures. An example of how this happens is in your database when one long running query slows down all the other queries competing for locks or resources.
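The fault isolation idea can be illustrated with a toy model. Everything here is hypothetical (lane names, modulo placement, the SwimLane class); the point is only that a request is handled entirely inside one lane and never calls into another.

```python
from dataclasses import dataclass


@dataclass
class SwimLane:
    """A failure domain: its own app servers and database, modeled as a flag."""
    name: str
    healthy: bool = True


def lane_for(customer_id, lanes):
    """Pin each customer to exactly one lane (simple modulo placement)."""
    return lanes[customer_id % len(lanes)]


def handle_request(customer_id, lanes):
    """Serve the request inside the customer's lane, with no cross-lane calls."""
    lane = lane_for(customer_id, lanes)
    if not lane.healthy:
        return "unavailable"
    return f"served by {lane.name}"


lanes = [SwimLane("lane-0"), SwimLane("lane-1")]
lanes[0].healthy = False  # simulate a failure inside one lane

results = {cid: handle_request(cid, lanes) for cid in range(6)}
affected = [cid for cid, outcome in results.items() if outcome == "unavailable"]
print(affected)  # [0, 2, 4]
```

Only the customers pinned to the failed lane are affected, and the search for the failed component is confined to that lane.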

Similarity and Differences
All of these terms describe similar architectures (splitting by customers or a similar key), but they are done for different purposes. Shards are very specific to databases and don’t imply anything about whether the application tier is also split. The purpose of shards is to scale an RDBMS onto many different servers instead of larger hardware. Pods and swim lanes aim to achieve both scalability of the overall system (application and database) and fault isolation.

