Scaling & Availability Anti-patterns
Most of you are familiar with patterns in software development. If you are not, a great reference is Patterns of Enterprise Application Architecture by Martin Fowler or Code Complete by Steve McConnell. A pattern is a reusable design that solves a particular problem; there is no sense in having every software engineer reinvent the wheel. There is also the concept of an anti-pattern, which, as the name implies, is a design or behavior that appears useful but produces suboptimal results, and is therefore something you do not want engineers to follow. We’ve decided to put a spin on the anti-pattern concept by compiling a list of anti-patterns for scaling and availability. These are practices or actions that will lead to pitfalls for your application and, ultimately, your business.
- SPOF (single point of failure) – Lots of people still insist on deploying single devices. The rationale is either that it costs too much to deploy in pairs or that the software running on the device is standalone. The reality is that hardware will always fail; it’s just a matter of time. And while the software may have been standalone when it was originally deployed, releases since then may have added features that depend on it. A single point of failure invites that failure to impact your customer experience or, worse, bring down the entire site.
- Synchronous calls – Synchronous calls are unavoidable, but engineers should be aware of the problems they can cause. Daisy-chaining applications together serially decreases availability because of the multiplicative effect of failure. If two independent devices each have 99.9% expected availability, connecting them through synchronous calls gives the overall system 99.9% x 99.9% ≈ 99.8% expected availability.
- No ability to roll back – Yes, building and testing the ability to roll back every release can be costly, but eventually you will have a release that causes significant problems for your customers. You’ve probably heard the mantra “Failing to plan is planning to fail.” Take it one step further: actually plan to fail, and then plan a way out of that failure back to a known good state.
- Not logging – We’ve written a lot about the importance of logging and of reviewing those logs. If you are not logging, and not periodically going through your logs, you don’t know how your application is behaving.
- Testing quality into the product – Testing is important, but quality is designed into a product. Testing validates that the functionality works as predicted and that you didn’t break anything else while adding the new feature. Expecting testing to find performance flaws, scalability issues, or terrible user experiences, and then having them resolved, is a total waste of time and resources, and is likely to fail at least 50% of the time.
- Monolithic databases – If you get big enough, you will need the ability to split your database along at least one axis, as described in the AKF Scale Cube. Counting on Moore’s law to save you is planning to fail.
- Monolithic application – The same goes for your application. To keep growing, you will eventually need to split one or more of your application tiers so that each can be scaled on its own.
- Scaling through 3rd parties – If you rely on a vendor for your ability to scale, such as with a database cluster, you are asking for problems. Use clustering and other vendor features for availability, and plan on scaling by dividing your users onto separate devices (sharding).
- No culture of excellence – Restoring the site is only half the solution. If your site goes down and you don’t determine the root cause of the failure (after you get the site back up, not while it is down), you are destined to repeat that failure. Establish a culture that does not allow the same mistake or failure to happen twice, and introduce processes such as root cause analysis and postmortems to support it.
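The arithmetic behind the SPOF and synchronous-call items can be checked with a short Python sketch. The function names are hypothetical, and the model is idealized: it assumes component failures are independent and, for the redundant case, that failover to a surviving device is instantaneous. Real systems rarely meet either assumption exactly, so treat the results as intuition rather than a forecast.

```python
from math import prod


def series_availability(availabilities):
    """Expected availability of components chained by synchronous calls.

    A request succeeds only if every component in the chain is up, so with
    independent failures the individual availabilities multiply.
    """
    return prod(availabilities)


def parallel_availability(availability, replicas):
    """Expected availability of `replicas` identical, redundant components.

    The service is down only when every replica is down at the same time
    (assumes independent failures and instantaneous failover).
    """
    return 1 - (1 - availability) ** replicas


if __name__ == "__main__":
    # Two services at 99.9% each, daisy-chained with synchronous calls.
    print(f"synchronous chain of 2: {series_availability([0.999, 0.999]):.4%}")
    # One device versus a redundant pair of the same 99.9% device.
    print(f"single device:          {parallel_availability(0.999, 1):.4%}")
    print(f"redundant pair:         {parallel_availability(0.999, 2):.4%}")
```

Chaining two 99.9% services drops the system to about 99.8%, while deploying the same 99.9% device as a redundant pair raises it to roughly 99.9999% under these assumptions. That asymmetry is why every extra synchronous hop and every single device deserves scrutiny.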
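For the rollback item, the key idea is that the way back to a known good state is recorded before a release goes live. The Python sketch below models only that bookkeeping. `Deployer`, its methods, and the health-check callback are hypothetical names. The genuinely costly part the item mentions, making each release’s code and data changes reversible and testing that they are, is not shown.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Deployer:
    """Hypothetical deployer that remembers every known-good release."""

    live: str  # version currently serving traffic
    history: List[str] = field(default_factory=list)  # earlier good versions, oldest first

    def release(self, version: str, healthy: Callable[[str], bool]) -> bool:
        """Put `version` live; roll back automatically if the health check fails."""
        self.history.append(self.live)  # record the way out before going in
        self.live = version
        if not healthy(version):
            self.rollback()
            return False
        return True

    def rollback(self) -> str:
        """Return to the most recent known-good release."""
        if not self.history:
            raise RuntimeError("no known-good release to roll back to")
        self.live = self.history.pop()
        return self.live


if __name__ == "__main__":
    d = Deployer(live="v1")
    d.release("v2", healthy=lambda v: True)   # v2 passes its checks
    d.release("v3", healthy=lambda v: False)  # v3 fails and is rolled back
    print(d.live)  # v2
```

Calling `rollback()` later, for example when a problem surfaces hours after a release, walks back one known-good version at a time.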
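For the logging item, Python’s standard `logging` module is enough to illustrate both halves of the habit: record every outcome, then review the records periodically. This is a self-contained sketch. The service name `checkout`, the `handle_request` and `review` helpers, and the in-memory buffer (standing in for a log file or central log pipeline) are all assumptions.

```python
import io
import logging
from collections import Counter

# In-memory buffer stands in for a log file or central log pipeline.
buffer = io.StringIO()
handler = logging.StreamHandler(buffer)
# Put the level first so the review step can parse it easily.
handler.setFormatter(logging.Formatter("%(levelname)s %(asctime)s %(name)s %(message)s"))

log = logging.getLogger("checkout")  # hypothetical service name
log.setLevel(logging.INFO)
log.addHandler(handler)
log.propagate = False  # keep records out of any root-logger configuration


def handle_request(order_id, succeeded):
    """Hypothetical request handler that logs every outcome, not just failures."""
    if succeeded:
        log.info("order=%s status=ok", order_id)
    else:
        log.error("order=%s status=failed", order_id)


def review(log_text):
    """Periodic review: count records per level so error trends stand out."""
    return Counter(line.split()[0] for line in log_text.splitlines() if line.strip())


if __name__ == "__main__":
    for order_id, succeeded in enumerate([True, True, False, True]):
        handle_request(order_id, succeeded)
    print(review(buffer.getvalue()))
```

In practice the review step would run on a schedule against the real log store and alert when the error share rises, but the principle is the same: logs nobody reads tell you nothing.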
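The monolithic database and third-party items both point toward splitting your data yourself rather than waiting for bigger hardware or a vendor cluster. The Z axis of the AKF Scale Cube splits by customer or a similar attribute. Below is a minimal Python sketch of routing each customer to a shard. The shard connection strings and `shard_for` are hypothetical, and hashing the customer ID is only one of several ways to assign shards.

```python
import hashlib

# Hypothetical connection strings for four database shards.
SHARDS = [
    "postgres://db-shard-0.internal/app",
    "postgres://db-shard-1.internal/app",
    "postgres://db-shard-2.internal/app",
    "postgres://db-shard-3.internal/app",
]


def shard_for(customer_id: str) -> str:
    """Z-axis split: pick a shard deterministically from the customer ID.

    A cryptographic digest is used instead of Python's built-in hash(),
    which is randomized per process for strings.
    """
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(SHARDS)
    return SHARDS[index]


if __name__ == "__main__":
    for customer in ["customer-17", "customer-42", "customer-99"]:
        print(customer, "->", shard_for(customer))
```

Because the mapping is deterministic, all of a customer’s rows live on one shard, and each shard carries only a fraction of the total load. Note that a plain modulo remaps most customers when the number of shards changes. If you expect to add shards, a lookup table or consistent hashing is a common way to avoid that.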
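The monolithic application item can be illustrated the same way. The Y axis of the AKF Scale Cube splits by function or service. In the hypothetical Python sketch below, each URL prefix is served by its own pool of servers, and requests are rotated round-robin across identical clones within a pool (the X axis). The prefixes, host names, and `route` function are illustrative only.

```python
from itertools import cycle

# Hypothetical Y-axis split: each function of the site has its own server pool.
POOLS = {
    "/search": ["search-1", "search-2", "search-3"],
    "/checkout": ["checkout-1", "checkout-2"],
    "/account": ["account-1"],
}

# Round-robin iterator per pool: clones within a pool are interchangeable (X axis).
_rotation = {prefix: cycle(hosts) for prefix, hosts in POOLS.items()}


def route(path: str) -> str:
    """Return the host that should serve `path`, based on its functional prefix."""
    for prefix, hosts in _rotation.items():
        if path == prefix or path.startswith(prefix + "/"):
            return next(hosts)
    raise LookupError(f"no pool serves {path!r}")


if __name__ == "__main__":
    for path in ["/search/results", "/search/results", "/checkout/pay", "/account/settings"]:
        print(path, "->", route(path))
```

Because each pool is independent, a surge in search traffic can be absorbed by adding search servers without touching checkout, and a fault in one function is less likely to take the others down with it.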
Anti-patterns can be incorporated into design and architecture reviews to complement a set of architectural principles. Together, they form the “must not’s” and “must do’s” against which all designs and architectures can be evaluated.