AKF Partners

Abbott, Keeven & Fisher Partners: Partners In Hyper Growth

Why We Write

For those of you who dream about writing a book and getting rich from all the royalties, please think again, unless you have the last manuscript from The Girl with the Dragon Tattoo series. According to recent surveys, authors on average earn as little as $5K annually. True, some authors such as J.K. Rowling have sold over 400 million copies and make millions in royalties. But of the ~275K books published in the US each year, only 1,000 sell more than 50K copies and only 25,000 sell more than 5K. 93% of all books published sell fewer than 1K copies. The average royalty rate is around 8% of the actual sale price, which is lower than the list price because of discounts to the distributor, etc. So, if an author sells 5K copies of their book, each for $20, their earnings are (5,000 x $20 x 0.08) = $8,000. Not bad, but when you consider that it takes hundreds of hours of work to write, illustrate, edit, and proof a book, the ROI is pretty low.

So, why do authors write? Certainly there are personal reasons such as self-satisfaction, name recognition, etc., but I think authors, especially those who write several books, want to share their story/message. If you couldn’t tell from our blog, site, consultancy practice, books, etc., we’re passionate about scaling. As technologists, we’ve felt the pain of struggling with scalability issues. We’ve had to explain to our business colleagues that customer-facing features had to be delayed because we had to work on keeping the site up. We’ve felt the pain and over the years we’ve learned how to scale. It definitely wasn’t overnight and we made our share of mistakes, but ultimately we were taught or figured out methods that work when scaling systems. We want everyone to know about these methods. Whether we get the chance to meet with your team in one of our engagements, you read our blog, or you buy the books, we want people to know about these concepts.

Help us get the scalability message out by liking and sharing our books’ Facebook pages, The Art of Scalability and Scalability Rules, or their official websites, theartofscalability.com and scalabilityrules.com.

This is why we write. We are passionate about scaling and want to share our knowledge with people. We hope you enjoy reading our writing and most of all we hope you get at least a couple of good ideas on how to scale from it.



Availability as a Feature

It doesn’t matter if you run a commerce site, a services product (such as a SaaS offering) or simply use your homepage to distribute information: the table stakes for playing online is high availability. So many companies just take for granted that they will be highly available because they have multiple instances of systems and multiple copies of their data. This assumption of availability will likely, at the very least, cost you significant pain and in the extreme cost you either significant market share or your business. Customers expect the unachievable – 100% availability. At the very least you need to give them something close to that. What will happen to you if you have a data center failure? How about if a DBA accidentally drops a critical table in your production database? What will you do when that marketing campaign triggers a near overnight doubling of traffic? What happens when that new feature has a significant performance bug and gets adopted so quickly that it brings your entire site to its knees?

We often tell our clients that they should treat high availability as a feature. Unfortunately, it is a somewhat expensive feature that requires constant investment over time to achieve and maintain. It is a must-have feature that will only differentiate your firm if you have competitors who do not make similar investments; when competition exists, customers are more likely to leave a site for a competitor due to availability and performance issues than for nearly any other reason. If you don’t believe us on this topic, just go ask the folks at Friendster.

Treating availability as a feature means measuring availability from a customer perspective rather than a systems or device perspective. How many times did customer requests not complete? In this regard, availability becomes a measure of failed transactions against an expected number of transactions: the share of expected transactions that actually completed. We define an approach to accomplish this in our first book “The Art of Scalability”. Every executive in the company should “own” the availability metric and understand its implication to the business. You should track how much you invest in availability over time, and significant decreases in engineering or capital spend should be questioned, as they may be an early indicator that you are underinvesting and a harbinger of hard times to come.
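As a rough illustration of the customer-perspective metric, here is a minimal Python sketch. It is not the method from The Art of Scalability; the function name, the way the expected transaction count is obtained (for example, a forecast from the same hour in prior weeks) and the sample numbers are all assumptions made up for this example.

```python
def customer_availability(expected_transactions: int, completed_transactions: int) -> float:
    """Return availability as the fraction of expected transactions that completed.

    Both inputs are hypothetical: 'expected' would come from a forecast,
    'completed' from request logs for the same time window.
    """
    if expected_transactions <= 0:
        raise ValueError("expected_transactions must be positive")
    # Completing more than the forecast is not counted as extra availability.
    failed = max(expected_transactions - completed_transactions, 0)
    return 1.0 - failed / expected_transactions


# Hypothetical hour: 120,000 requests were forecast and 118,800 completed.
availability = customer_availability(120_000, 118_800)
print(f"{availability:.2%}")  # 99.00%
```

Measuring against an expected count, rather than only counting errors, means that an outage during which no requests arrive at all still shows up as lost availability.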

One of the most common failures we see is to assume that disaster recovery is something that only big companies need. Make no mistake about it, disasters do happen and given enough time they will happen to you. Data centers catch on fire, have water (sprinkler) discharges that ruin equipment, have complete power equipment failures that take hours to fix and are prone to damage from vehicles, earthquakes, employees and tornadoes. In our past lives as executives and current roles as advisors we’ve seen no fewer than 4 data center fires, 2 data centers incapacitated by earthquakes and tornadoes and one data center leveled by a truck running into it. You are never too young to invest in DR.

And DR need not break your bank. The dials of RTO (recovery time objective) and RPO (recovery point objective) allow you to determine how much you will invest. Perhaps you simply replicate your databases to a smaller set of databases at a remote datacenter and have a copy of each of your systems there with an additional copy ready “in the cloud”. While you won’t be able to run production from that data center, you may be able to leverage the cloud to add capacity for relatively low cost by cloning the cloud-based systems. Such a solution has a low recovery point objective (you lose very little data) and a moderate recovery time objective (several hours) for very low comparative cost. Of course, you would need to test the solution from time to time to show that it is viable, but it’s a cheap and effective insurance policy for the business.

So remember – availability is your most important feature.  Customers expect it always and will run away from you to competitors if you do not have it.  Create an availability metric and ensure that everyone understands it as a critical KPI.  Evaluate the company spend against availability quarterly or annually as an additional indicator of potential problems.   Assume that disasters happen and have a DR plan regardless of your company size.

 



Enterprise Cloud Computing – Book Review

A topic of particular interest to us is cloud computing, so I picked up a copy of Gautam Shroff’s Enterprise Cloud Computing: Technology, Architecture, Applications, published in 2010 by Cambridge University Press. Overall I enjoyed the book and thought it covered some great topics, but there were a few topics that I wanted the author to cover in more depth.


The publisher states that the book is “intended primarily for practicing software architects who need to assess the impact of such a transformation.” I would recommend this book for architects, engineers, and managers who are not currently well versed in cloud computing. For individuals who are already familiar with these subjects, it will not be in depth enough, nor will it have enough practical advice on when to consider the different applications.

A minor issue for me is that this book spends a good deal of time upfront covering the evolution of the internet into a cloud computing platform. A bigger issue is that topics are covered very well at an academic or theoretical level but the coverage doesn’t follow through enough on the practical side. For example, Shroff’s coverage of MapReduce in Chapter 11 is thorough in describing the internal functionality but falls short on when, how, or why to actually implement it in an enterprise architecture. In this 13-page chapter, he unfortunately gives only one page to the practical application of batch processing using MapReduce. He revisits this topic in other chapters such as Chapter 16 “Enterprise analytics and search” and does an excellent job explaining how it works, but the when, how, or why of implementing it is not given enough attention.

He picks up the practical advice in the final Chapter 18 “Roadmap for enterprise cloud computing”. Here he suggests several ways companies should consider using cloud and Dev 2.0 (Force.com and TCS InstantApps). I would like to have seen this practical side implemented throughout the book.

I really enjoyed Shroff’s coverage of the economics of cloud computing in Chapter 6. He addresses the issue by showing how he compares in-house (colocation center) vs cloud. Readers can adopt his approach using their own numbers to produce a similar comparison.

The book does a great job covering the fundamentals of enterprise computing, including a technical introduction to enterprise architecture. It will be of interest to programmers and software architects who are not yet familiar with these topics. The publisher suggests that this book could serve as a reference for a graduate-level course in software architecture or software engineering, and I agree.



Mergers and Acquisitions Revisited

We wrote a post last September about successful acquisitions. In that post we first struggled with how to actually define a “successful” acquisition. After that we postulated that there were two primary methods of achieving what would in general be considered a successful outcome from an acquisition.

The first method is to overwhelm the acquisition’s culture and turn it into the acquiring culture as fast as possible. I called this the GE-approach because, of all the acquisitions I saw while at General Electric during the 1990s, this appeared to be the dominant strategy. The second approach is to leave the acquisition completely alone and let it run autonomously. The only tie to the acquiring company is through financials. Reading an article recently, I was shocked but pleased to see that academic research had arrived at similar strategies for successful mergers and acquisitions.

Clayton Christensen, Harvard professor and author of books such as The Innovator’s Dilemma, wrote an article recently in HBR with several other authors about the new M&A playbook. In this article the authors state that studies indicate that mergers and acquisitions fail over 70% of the time. Much has been written and studied about this, but from the perspective of the attributes of the deal. Christensen et al. suggest that the problem isn’t the attributes of the deal but that executives fail to match candidate acquisitions to the strategic purpose of the deal.

The article states that there are two reasons to acquire a company: to boost your company’s current performance or to reinvent your business. To extend your business but not fundamentally change how you compete, an executive should buy a company with resources aligned with the current business and fold them in, letting the acquisition eventually die. This is what we described as “overwhelming the acquisition’s culture” or the GE-approach. To reinvent your business, executives should seek companies that have a different business model, put resources into them, and let them grow. This is what I see as our approach of leaving the acquisition alone and letting it continue to grow, perhaps providing financial resources from the parent company.



What Is That Delay Costing?

As a side practice in our scalability and availability engagements we often work with companies on the performance of their SaaS offerings by attempting to speed up their web page load times. Citing a Google white paper, “Speed Matters for Google Web Search” by Jake Brutlag, we point to the fact that even tenths or hundredths of a second matter. Brutlag states that through experiments they have shown that increasing web search latency from 100 to 400 ms reduces the daily number of searches per user by upwards of 0.6%. Given that we are attempting to become practitioner-scholars, in order to bridge the gap between academia and practice, we decided to dive into this subject area a little deeper. Our goal was to understand what other research had been done and if there was anything more practitioners could learn besides “speed up your pages!”

Research in computer system delay has been taking place for decades and has shown that excessive computer system delay results in negative responses such as anxiety (Guynes, 1988) and dissatisfaction with the system itself (Rushinek & Rushinek, 1986). However, the research does not support a relationship between increases in delay and the attitude toward the company (Rose & Straub, 2001). It was shown that increases in delay treatments from near 0 to 15 seconds did not correlate with a reduction in satisfaction measures such as ease of use or content appeal (Otto et al., 2000). It was also found that increases in delay treatments did not consistently predict likelihood of future patronage (Rajala & Hantula, 2000).

So from all this research we have the notion that delays cause frustration, even anxiety, yet they don’t appear to cause a decrease in satisfaction or even predict continued usage. Why is this?

Attribution theory, which deals with how individuals infer causality between events (Kelley & Michela, 1980), would explain this phenomenon as the customers assigning blame for the delay to something or someone other than the SaaS provider. This theory has also been used to show the presence of a self-serving attribution bias and an actor–observer attribution bias in entrepreneurs’ representations of events (Rogoff et al., 2004), but we’ll save that for another post. It turns out that perceived wait time is much more critical than the actual wait time (Baker & Cameron, 1996).

Rose et al. (2005) contend that “it may be less important to reduce objective delay than it is to create a system where users will be less likely to attribute the delay to the retailer.” An example would be to give the user the option of selecting a low or high graphic site in order to provide the users with the control. Users will likely perceive this as an active effort on the part of the SaaS provider to minimize download time and thus attribute delays to themselves, their computer, their ISP, etc., but not the site.


References:

Baker, J., & Cameron, M. (1996). “The effects of the service environment on affect and consumer perception of waiting time: An integrative review and research propositions.” Journal of the Academy of Marketing Science, 24, 338–349.

Guynes, J. L. (1988). “Impact of system response time on state anxiety.” Communications of the ACM, 31, 342–347.

Kelley, H. H., & Michela, J. L. (1980). “Attribution theory and research.” Annual Review of Psychology, 31, 457–501.

Otto, J. R., Najdawi, M. K., & Caron, K. M. (2000). “Web-user satisfaction: An exploratory study.” Journal of End User Computing, 12, 3–10.

Rajala, A. K., & Hantula, D. A. (2000). “Toward a behavioral ecology of consumption: Delay-reduction effects on foraging in a simulated internet mall.” Managerial and Decision Economics, 21, 145–158.

Rogoff, E., Lee, M., & Suh, D. (2004). “‘Who Done It?’ Attributions by Entrepreneurs and Experts of the Factors That Cause and Impede Small Business Success.” Journal of Small Business Management, 42, 364–376.

Rose, G. M., & Straub, D. (2001). “The effect of download time on consumer attitude toward the e-service retailer.” e-Service Journal, 1, 55–76.

Rose, G. M., Meuter, M. L., et al. (2005). “On line waiting: The role of download time and other important predictors on attitude toward e retailers.” Psychology and Marketing, 22, 127–151.

Rushinek, A., & Rushinek, S. (1986). “What makes users happy?” Communications of the ACM, 29, 594–598.



Scalability at the Cost of Availability

Do you associate scalability with availability? Sometimes these go hand-in-hand but sometimes these are at odds with each other. We’re obviously big proponents of architecting your systems so that you have the necessary scalability when you need it but we’re also realistic. We often help young companies make tradeoffs between capital expenditure and scalability. It’s not uncommon for us to spend a good deal of time explaining the concepts of Design-Implement-Deploy and Recency-Frequency-Monetization to help with this discussion.

One subtle concept that is sometimes misunderstood is that, if you are not careful, an increase in scalability can actually decrease your availability. In order to understand how this can happen we need to talk about the multiplicative effect of failure with items in series. Let’s take for example a system with a single web server with 99.9% availability (forget about network gear for now, but it has the same effect). The availability of the system is 99.9%. Now we add a database, also with 99.9% availability, to the system. Assume that the DB is required for the web server to respond, i.e. pages are built by querying the DB. This causes the availability of the system to go down to 99.9% x 99.9% = 99.8%. The reason is that with 99.9% availability each component is going to be down for ~43 min per month. The chance that the database experiences its 43 min of downtime at the same time as the web server is down is very small. Much more likely is that you experience 86 min of downtime each month, half caused by the DB and half by the web server.
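The arithmetic above can be checked with a short Python sketch. The function names and the 30-day month are assumptions made for this example, and the model assumes component failures are independent, which real systems only approximate.

```python
from math import prod

MINUTES_PER_MONTH = 30 * 24 * 60  # assumes a 30-day month


def series_availability(*components: float) -> float:
    """Availability of components in series: every one must be up."""
    return prod(components)


def downtime_minutes(availability: float, period_minutes: int = MINUTES_PER_MONTH) -> float:
    """Expected downtime over the period for a given availability."""
    return (1.0 - availability) * period_minutes


web_only = series_availability(0.999)
web_and_db = series_availability(0.999, 0.999)

# Roughly 43 minutes per month for the web server alone.
print(f"web only:  {web_only:.4%}, {downtime_minutes(web_only):.1f} min/month")
# Roughly 86 minutes per month once the database is required too.
print(f"web + DB:  {web_and_db:.4%}, {downtime_minutes(web_and_db):.1f} min/month")
```

The combined figure is slightly under 2 x 43 minutes because the small chance of both components being down at the same moment is only counted once.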

Back to scaling causing problems with availability. Let’s take the same example, a single web server and a single DB server, both with 99.9% availability. If our database is starting to get busy and we decide to split it, most likely we’d start by adding a read slave (X-Axis split), where the write queries (insert, update, delete) go to the master and the reads (select) go to the slave. To accomplish this we need to introduce another piece of hardware and replicate the database. If the web pages in our system require both read and write queries to the DB, then we’ve just decreased the overall system availability by increasing its scalability. This is a very simplistic example and makes a lot of assumptions but hopefully it gets the point across that you can actually decrease your availability by increasing your scalability.
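To put numbers on the read-slave example, here is a minimal Python sketch. It assumes, as the simplified example does, that every page needs the web server, the master and the read slave, that each has 99.9% availability, and that failures are independent. The function name is invented for this illustration.

```python
def page_availability(component_availabilities):
    """Availability of a page that needs every listed component to be up."""
    result = 1.0
    for availability in component_availabilities:
        result *= availability
    return result


# Before the split: web server + single database.
before_split = page_availability([0.999, 0.999])

# After the X-axis split: web server + master (writes) + read slave (reads).
after_split = page_availability([0.999, 0.999, 0.999])

print(f"before split: {before_split:.4%}")
print(f"after split:  {after_split:.4%}")
```

The third component in series pulls availability from about 99.8% down to about 99.7%. If pages could fall back to the master when the slave is down, the slave would no longer be in series for reads and the penalty would largely disappear, but that is a design choice beyond the simple example above.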

So why make this tradeoff? In most cases the availability of our hardware is much higher than three-nines so the addition of a small amount of downtime is worth the gain in scalability. Also, by using swim lanes we can mitigate this by splitting our downtime across parts of our users, effectively cutting downtime in half with our first swim lane split.

All of this reminds us that scalability is much more of an art than a science, hence the name of our first book The Art of Scalability. But don’t despair, there are definite rules that govern how to scale effectively, such as the X, Y, or Z Axis splits, which is why we’re calling our second book Scalability Rules. You just need to use art in applying them. As an analogy, think about an artist painting. Mixing red with blue will always result in purple, a rule, but how the artist applies that color to the canvas is pure art.



Corporate Mouth Diarrhea

It’s been a while since I’ve gone on a rant. Leaving my rants bottled up inside me just means that I’ll take it out on my partners in some fashion (most likely by hiding in their houses and beating them upon entry in the fashion of Cato and the Pink Panther). Chief among the things that are ticking me off right now are some ridiculous phrases and approaches to speech that I will term “Corporate Speak”. This mouth diarrhea is closely related to the “buzzword bingo” games of the mid-2000s where employees would wait for half-witted executives to flip clichés out at company meetings. Unlike buzzword bingo, this drivel simply doesn’t seem to have a purpose and we should try to excise it from our speech like the tumor that it is. Candidly speaking, my partners and I are guilty of some of these ridiculous statements and we are working actively with a speech oncologist to send them into remission. One warning: if you are speaking about yourself in the third person you may be beyond help. Third person speech is a clear indication that you should just end your career.

Without further ado, here are the most cancerous phrases:

Phrase: “To your point…”
Intent of Phrase: I’d like to give you credit for saying the following…
Real Meaning: I’m going to say something I want you to agree with and I’m a tool.

Phrase: “Put on your CEO hat for a minute…” (or some other position hat)
Intent of Phrase: I would like you to think across all functional and business unit areas rather than just your own.
Real Meaning: I don’t agree with you – just change your opinion to how I think. Oh, and I’m a tool too.

Phrase: “From a shareholder’s perspective…” or “Let’s put on our shareholder hat for a minute…”
Intent of Phrase: Let’s think about what shareholders care about
Real Meaning: I don’t agree with you and I’m going to pretend the shareholders don’t either. The shareholders probably think I’m a tool.

Phrase: “This is best in class…” or “This is world class…” or “This is a best demonstrated practice…”
Intent of Phrase: Good job for achieving excellence.
Real Meaning: What you are doing makes me look good or makes my job easier – congratulations. By the way, I’m a tool.

Phrase: “Ask yourself what Marty would do”. Any reference to oneself in the third person
Intent of Phrase: Anyone speaking in the third person (referencing him or herself) is making themselves feel important
Real Meaning: Who knows – this is just a ridiculous practice. Nothing yells I’m a tool bigger than this one. You might as well scream it from a clock tower.

Phrase: “Let’s level up” or “Let’s look at this from a 30,000 foot level”
Intent of Phrase: We need some overall context to make this meaningful and make appropriate decisions
Real Meaning: I’m not comfortable with details and don’t want to look like an idiot, or I don’t agree with where this is going – I’m going to change the direction. Have I told you I’m a tool?

Phrase: “I think you have a valid point, but…”
Intent of Phrase: Interesting and valid, but one of many potential explanations or positions.
Real Meaning: I don’t agree with you so let me tell you how we’re going to do it. Guess what? I’m a tool.

Phrase: “I would say…” or “I would tell you that…”
Intent of Phrase: I have no idea what the intent of this opening is – this is just a weird, stupid saying.
Real Meaning: I’m too much of an idiot and a tool to just tell you what I think without a lame qualification



11 Leadership Principles – Part II

This is a continuation post from last week where we covered the first six leadership principles from the perspective of a technology leader. In this post we’ll cover the remaining five principles.

7) Keep your soldiers informed – In the military the headquarters staff makes an overall plan for a mission, presents the plan to subordinate units, and lets them plan the details of how to execute. This planning hierarchy can span many levels, starting from the Division HQ, then to the Brigade HQ, then to the Battalion HQ, then to the Company, the Platoon, and finally to the Squad. All this planning takes time, but each subordinate plan requires information from the higher plan. You can imagine that with all this planning you could quickly run out of time to actually perform the mission, assuming it is time sensitive, which most things in life and business are. The rule of thumb that we used as a staff in the military was to take 1/3 of the remaining time to put your plan together and leave 2/3 for your subordinates. If the Division staff had 1 week prior to the mission, they would take 2.3 days to develop their plan, leaving 4.7 days. The Brigade would use about 1.6 days, leaving 3.1. The Battalion would use 1 day, leaving 2.1. The Company would use 0.7 days, leaving 1.4, the Platoon 0.5 day, leaving 0.9, and the Squad 0.3 day, leaving 0.6. This same concept should apply to your teams. Don’t sit on ideas, contracts, or problems for 3/4 of the time before they are due and make your team scramble. Give your teams the majority of the time to plan and execute. Reiterating from principle #3, make sound and timely decisions. Taking up all the time so that you get 95% of the information before you make a decision is being a wimp. Gather the most pertinent data, make the best decision and make sure your team gets as much time to react as possible.
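The one-third/two-thirds rule is easy to tabulate. The following Python sketch is only an illustration: the function name and the list of echelons are assumptions for this example, and it carries unrounded remainders from level to level, so the printed values can differ by a tenth of a day from hand-rounded figures.

```python
def planning_schedule(total_days, echelons, share=1 / 3):
    """Give each echelon `share` of the time remaining when the plan reaches it.

    Returns a list of (echelon, days_used, days_left_for_subordinates).
    """
    remaining = float(total_days)
    schedule = []
    for name in echelons:
        used = remaining * share
        remaining -= used
        schedule.append((name, used, remaining))
    return schedule


ECHELONS = ["Division", "Brigade", "Battalion", "Company", "Platoon", "Squad"]

for name, used, left in planning_schedule(7, ECHELONS):
    print(f"{name:<9} uses {used:.1f} days, leaving {left:.1f}")
```

Even after six levels of planning, the people who actually execute still have more than half a day of a seven-day window, which would not be the case if each level consumed most of its remaining time.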

8) Develop a sense of responsibility in your subordinates – This principle is a combination of #2 “Seek responsibility and take responsibility for your actions” and #4 “Set the Example”. If you seek responsibility for your actions and set the example for your team, they will follow and behave the same way. For a technology leader, this might manifest itself in a situation where you expect your subordinates to admit when they changed the production environment without approval, resulting in an incident. You should expect and demand this behavior from your team, but you first must ensure that you are taking responsibility for your own actions. Instill in your team a sense of ownership over the site. In order to do so you need to set the example of owning the business. Blaming the business leaders for poor decisions and expecting your team to take responsibility for the technology will never work. Create a culture where people want to own their areas and do so by owning yours.

9) Ensure that the task is understood, supervised and accomplished – There is a big difference between micromanaging and providing adequate guidance on a project. Per principle #2 “Seek responsibility and take responsibility for your actions”, you can’t delegate your responsibility, only your authority. Miscommunication is probably the single biggest cause of rework. Clear, concise, articulate criteria are the way to ensure tasks are started properly and accomplished. If you assign an unclear task without a deadline and without guidance on resources you can expect it to come in over budget, after the deadline, and not accurately completed. Communication of the task is your responsibility. In military aircraft we used a three-way positive transfer of control. This meant that if I were flying and I wanted my co-pilot to fly I would say “you have the controls”, they would respond with “I have the controls”, and I would reiterate “you have the controls.” This way there was very little chance of no one actually flying the aircraft, which in tandem seat aircraft can happen more easily than you’d think.

10) Build the team – As a leader you must be constantly building the team. This doesn’t always mean hiring more engineers. Building the team can take the shape of improving their skill sets or replacing underperforming individuals. We’ve written before about seed-feed-weed-succeed. This simple garden analogy means that you need to hire (seed) the team with great players, grow (feed) them through challenging projects, greater responsibility, and training, get rid of (weed) underperformers once they’ve had a chance to improve, and by doing these three things well you and your teams will succeed.

11) Employ your unit in accordance with its capabilities – This last principle requires that you first follow principle #5 “know your soldiers and look out for their welfare.” In order to deploy or ask your team to take on certain projects, you need to first know what their capabilities are as a team and as individuals. If your team is newly formed, you probably don’t want to take on critical projects for major customers. If you don’t have anyone who knows databases you don’t want to architect a DB heavy application. Setting your team up for failure is not what leaders do but you won’t know if you’re doing that unless you know your team and the team members.



11 Leadership Principles – Part I

In several previous posts, I’ve referenced a set of 11 leadership principles that I was taught in the military many years ago. These are apparently still in use by the Marine Corps and studied by the Air Force. As appropriate as they were for leading small units, they have also served me well in many other roles. No doubt you’ll quickly see the relevance these principles have to a military leader but to a technology leader you might balk. In this post I want to cover them using the lens of a technology leader. I think they are extremely relevant to all leaders and might help you either improve yourself or coach a rising star in your organization. I’ll start with the first six this week and finish up the remaining five in next week’s post.

1) Know yourself and seek self-improvement – In no other discipline that I can think of are practitioners and scholars faced with more rapid change than in information technology. Not only is the underlying technology rapidly changing and sub-disciplines, like scalability, evolving, but entire business models are being made obsolete to make way for new models. As technologists, we must all constantly improve and the first step down that path is to know your own limitations. Too often we run into technologists, usually men…but that’s another post, who think they know everything there is to know. Even if that were true this morning, by tonight they would be behind in their knowledge.

2) Be technically and tactically proficient – I’ve covered this several times before in other posts but it’s worth repeating. To be a great technology leader you need to understand the technology. Your team will respect your opinions more and you won’t have to run back to your architect to answer questions raised by your peers. If you don’t know how to code or set up a database or haven’t done it in a while, start this weekend on a project that will require you to learn hands-on about your chosen profession.

3) Seek responsibility and take responsibility for your actions – There is a passage in The Art of Scalability that reads “You can delegate anything you would like, but you can never delegate the accountability for results.” I can’t think of a more concise way to say this: you as the leader are responsible, period…end of the story.

4) Make sound and timely decisions – We are proponents of data driven decisions and have seen many times in our careers where someone’s world-class opinion about a product feature turned out to be completely wrong. However, analysis-paralysis occurs every day, even in small startups where you would think there wouldn’t be such an issue. Use the Pareto principle. Gather the 20% of information that makes 80% of the impact, make a decision and execute. The entire point of the Agile methodology, which almost all technologists are fans of, is to admit that we don’t know. The correct way to get there isn’t to think long and hard about stuff but rather to make things happen and see the results in order to make course corrections for the next iteration.

5) Set the example – You’re probably familiar with the proverb that a fish rots from the head. Organizations, whether government, monarchies, military, or tech startups, are the same. A leader who lies or has a poor work ethic will quickly have a team that imitates their shortfalls. Arrive earlier, leave later, work harder. Show more passion, more grace, and more thankfulness for your employees. Act like the leader you want to be led by and you’ll have an amazing team that will follow you anywhere.

6) Know your soldiers and look out for their welfare – Similar to the principle above, in order to lead well over the long term, you need to build a relationship with your team by getting to know them and letting them get to know you. I’m incredibly private and reserved but you have to open up some in order to let peers and subordinates know that you’re human. Once you know your team members, their capabilities, their fears, their hopes, their dreams…look out for them. This means helping them become the manager they want to become or helping them earn the promotion or pay raise they desire. Put them in positions that will stretch them but don’t leave them hanging without a safety net. A good leader gives team members opportunities to succeed or fail. A great leader puts team members in positions where they can succeed or fail but makes sure to catch them if they fall. I relate this to teaching a child how to ride a bike without training wheels. If the child feels you holding them they don’t build the confidence they need to do it themselves. If you don’t keep a close hand they might fall and hurt themselves, which will make them shy away from wanting to learn. The perfect combination is far enough away to make them feel like they could fall but close enough to catch them.



Google’s Megastore

Papers from the 2011 Conference on Innovative Data Systems Research (CIDR) have been posted and one that is particularly interesting is the Google paper detailing their design and development of Megastore. Megastore is a storage system developed to meet the requirements of today’s interactive online services. According to the paper “Megastore blends the scalability of a NoSQL datastore with the convenience of a traditional RDBMS in a novel way, and provides both strong consistency guarantees and high availability.” The system’s underlying datastore is Google’s Bigtable but it additionally provides for serializable ACID semantics and fine-grained partitions of data.

Here is the link to the paper for you to read all the details yourself but I thought I’d point out a couple things that I found interesting about the design.

Data Split
The Megastore design is what AKF would call a Z-axis split of the data. Google does this because partitioning allows for the synchronous replication of each write across a wide area network (between datacenters) with reasonable latency. The key is that the smaller the amount of data, the faster the replication. The paper states “…data for most Internet services can be suitably partitioned (e.g., by user) to make this approach viable.”
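
To make the idea of partitioning by user concrete, here is a minimal sketch in Python. It is not Megastore's actual scheme, which the paper describes in terms of entity groups; the function names and the choice of a SHA-256 hash are assumptions made purely for illustration.

```python
import hashlib


def partition_for(user_id: str, num_partitions: int) -> int:
    """Map a user ID to a partition index with a stable hash.

    Hypothetical helper: hashlib is used instead of the built-in hash()
    so the mapping stays the same across processes and restarts.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def group_by_partition(user_ids, num_partitions):
    """Group user IDs by the partition that would own their data."""
    groups = {}
    for uid in user_ids:
        groups.setdefault(partition_for(uid, num_partitions), []).append(uid)
    return groups
```

Because every user always lands on the same partition, each write touches only that user's slice of the data, which is what keeps the amount replicated per write small.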

Joins in Code
Most of our clients are likely to never require extreme scaling on a relational database, but if you’re one of the lucky ones, the way to do so is to minimize the use of relational features, such as joins. While joining in the DB is terribly efficient from a coding perspective, joining in the code removes load from the DB. You can scale by adding web servers much more easily and cheaply than you can by adding relational databases. Google’s paper states that normalized relational schemas that rely on joins at query time were not the right model for Megastore for three reasons: high-volume interactive workloads benefit more from predictable performance; reads dominate writes in most web applications, so it pays to move work from read time to write time; and key-value stores make querying hierarchical data very simple.
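
As a rough illustration of joining in the code rather than in the DB, the sketch below assumes the application has already fetched two plain result sets with simple single-table queries. The `users` and `orders` shapes and the field names are hypothetical.

```python
def join_in_code(users, orders):
    """Application-side equivalent of an inner join on users.id = orders.user_id.

    Both arguments are lists of dicts, as if returned by two separate
    single-table queries.
    """
    # Index users by primary key once so each order lookup is O(1).
    users_by_id = {u["id"]: u for u in users}
    joined = []
    for order in orders:
        user = users_by_id.get(order["user_id"])
        if user is not None:  # inner-join semantics: drop unmatched orders
            joined.append({
                "user_name": user["name"],
                "order_id": order["order_id"],
                "total": order["total"],
            })
    return joined
```

The database now serves two cheap lookups, and the matching work runs on the web tier, which is the tier that is easier to scale out.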

Paxos and Two-Phase Commit
Google’s Megastore utilizes two algorithms that I personally thought would not scale at very large transaction volumes. The first is the Paxos algorithm, which is a way to reach consensus among a group of replicas on a single value. It tolerates up to F faults with 2F + 1 replicas by essentially voting among the replicas, which is notoriously slow. The second algorithm is Two-Phase Commit, which allows for atomic updates across entity groups. The paper does admit that these transactions have “much higher latency and increase the risk of contention.” That, in my opinion, is very understated, but they do discourage applications from using the feature in favor of queues.
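
The 2F + 1 figure comes from simple majority arithmetic, sketched below. This is only the quorum math, not an implementation of Paxos, and the function names are made up for illustration.

```python
def replicas_needed(faults: int) -> int:
    """Replicas required to tolerate `faults` failed replicas: 2F + 1."""
    return 2 * faults + 1


def majority(replicas: int) -> int:
    """Smallest number of replicas that forms a majority."""
    return replicas // 2 + 1


def value_is_chosen(acks: int, replicas: int) -> bool:
    """A value counts as chosen once a majority of replicas has accepted it."""
    return acks >= majority(replicas)
```

With F = 2, for example, there are 5 replicas and a majority is 3, so the group can lose 2 replicas and the remaining 3 can still agree. Any two majorities share at least one replica, which is why two conflicting values cannot both be chosen.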

I highly recommend you put this paper and some of the others from CIDR on your Instapaper list for reading on your next flight or while bored during your next meeting.

