The following guest post comes from another longtime friend and business associate, Chris LaLonde. Chris was with us through our eBay/PayPal journey and joined us again at both Quigo and Bullhorn. After Bullhorn, Chris and co-founders Erik Beebe and Kenny Gorman (also eBay and Quigo alums) started Object Rocket to solve what they saw as a serious deficiency in cloud infrastructure: the database. Specifically, they created the first high-speed MongoDB instance offered as a service. Without further ado, here's Chris' post. Thanks, Chris!
Three Mistakes in Scaling Non-Relational Databases
In the last few years I've been lucky enough to work with a number of high-profile customers who use and abuse non-relational databases (mostly MongoDB), and I've seen the same problems repeat often enough to call them patterns. I thought it might help to highlight those issues at a high level and talk about some of the solutions I've seen succeed.
At first, everyone tries to scale up (bigger hardware) instead of out (more machines). Sadly, that almost always stops working at some point. So, generally speaking, you have two alternatives:
- Split your database – yes, it's a pain, but everyone gets there eventually. You probably already know some of the data you can split out: users, billing, etc. Starting early will make your life much simpler and your application more scalable down the road, yet the number of people who never consider that they'll need a second or third database is frightening. Oh, and one other thing: put your analytics someplace else. The fewer things placing load on your production database from the beginning, the better, and copying data off a secondary is cheap.
- Scale out – can you offload heavy reads or writes? Most non-relational databases have horizontal scaling functionality built in (e.g., sharding in MongoDB). Don't let all the negative articles fool you: these solutions do work. However, you'll want an expert on your side to help pick the right variable or key to scale against (e.g., shard keys in MongoDB). Seek advice early, as these decisions will have a huge impact on future scalability.
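To make the shard-key point concrete, here is a toy Python simulation. It is not MongoDB's actual balancer; the four-shard cluster, the chunk bounds, and the helper names `range_shard` and `hashed_shard` are all hypothetical. It compares ranged sharding, where each shard owns a contiguous key range, with hashed sharding when new documents arrive with a monotonically increasing key such as a timestamp or counter.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical cluster size


def range_shard(key: int, upper_bounds: list[int]) -> int:
    """Ranged sharding (toy model): each shard owns a contiguous key range."""
    for shard, upper in enumerate(upper_bounds):
        if key < upper:
            return shard
    return len(upper_bounds)  # the last shard owns every key above the final bound


def hashed_shard(key: int, num_shards: int = NUM_SHARDS) -> int:
    """Hashed sharding (toy model): a stable hash scatters adjacent keys across shards."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Hypothetical bounds chosen when the collection held keys 0..999, split evenly over 4 shards.
BOUNDS = [250, 500, 750]

# New inserts use an ever-increasing key (e.g. a timestamp or auto-incrementing counter).
new_keys = range(1000, 2000)

ranged = Counter(range_shard(k, BOUNDS) for k in new_keys)
hashed = Counter(hashed_shard(k) for k in new_keys)

print("ranged:", dict(ranged))  # every new insert lands on shard 3
print("hashed:", dict(hashed))  # inserts spread roughly evenly across all four shards
```

With ranged sharding every new key is larger than the last bound, so one shard takes all the inserts while the others sit idle. Hashing removes that hot spot but gives up efficient range queries on the key, so the right choice depends on your query patterns, which is exactly why it deserves expert review.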
Pick the right tool:
Non-relational databases are very different by design, and most “suffer” from some of the “flaws” of the eventually consistent model. My grandpa always said, “Use the right tool for the right job.” In this case, that means if you're writing an application or product that requires your data to be highly consistent, you probably shouldn't use a non-relational database. You can make most modern non-relational databases more consistent with configuration and management changes, but almost all lack any form of ACID compliance. Luckily, in the modern world databases are cheap: pick several, get someone to manage them for you, and always use the right tool for the right job. When you need consistency, use an ACID-compliant database; when you need raw speed, use an in-memory data store; and when you need flexibility, use a non-relational database.
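The post doesn't name a specific consistency setting, so as one illustrative assumption, consider Dynamo-style stores such as Cassandra or Riak (MongoDB instead uses a primary plus write and read concerns). There, with N replicas, writes acknowledged by W replicas and reads consulting R replicas, a read is guaranteed to reach at least one replica holding the latest acknowledged write when W + R > N. The hypothetical helper `quorums_overlap` below checks that rule by brute force for small clusters; it is a simplified model that ignores failures and sloppy quorums.

```python
from itertools import combinations


def quorums_overlap(n: int, w: int, r: int) -> bool:
    """Brute force: does every possible write quorum share a replica with every read quorum?"""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)  # an empty intersection means a read could miss the write
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


# Only the settings where W + R > N (here 2+2 and 3+1) guarantee overlap.
for w, r in [(1, 1), (2, 1), (2, 2), (3, 1)]:
    print(f"N=3 W={w} R={r}: overlap={quorums_overlap(3, w, r)}")
```

Raising W or R buys consistency at the cost of latency and availability, which is the trade-off behind “configuration and management changes” above; it still doesn't add multi-record ACID transactions.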
Write and choose good code:
Unfortunately, not all database drivers are created equal. While I understand that you should write code in the language you're strongest in, sometimes it pays to be flexible. There's a reason why Facebook writes code in PHP, transpiles it to C++, and then compiles it into a binary for production. In the case of your database, the driver is _the_ interface to your database, so if that interface is unreliable, difficult to deal with, or rarely updated, you're going to have a lot of bad days. If the interface is stable, well documented, and frequently updated, you'll have a lot of time to work on features instead of fixes. Take a look at the community around each driver: look at the number of bugs reported and how quickly those bugs are being fixed.
Another thing about connecting to your database: remember that nothing is perfect, so write code as if it's going to fail. At some point some component (the network, a NIC, a load balancer, a switch, or the database itself crashing) is going to cause your application to _not_ be able to talk to your database. I can't tell you how many times I've talked to or heard of a developer assuming that “the database is always up, it's not my responsibility,” and that's exactly the wrong assumption. Until the application knows that the data is written, assume that it isn't. Always assume there's going to be a problem until you get an “I've written the data” confirmation from the database; to assume otherwise is asking for data loss.
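Here is a minimal sketch of the “assume it isn't written until confirmed” rule, using only the Python standard library. `write_with_retry`, `WriteNotAcknowledged`, and `make_flaky_writer` are hypothetical names; `write_fn` stands in for whatever driver call you make, assumed to return an acknowledgment on success and to raise `ConnectionError` or `TimeoutError` on transient failures.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class WriteNotAcknowledged(Exception):
    """Hypothetical error: no attempt produced an acknowledgment from the database."""


def write_with_retry(
    write_fn: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.1,
    max_delay: float = 2.0,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Retry a write until it is acknowledged; never treat a failed attempt as written."""
    last_error = None
    for attempt in range(attempts):
        try:
            return write_fn()  # only a returned acknowledgment counts as "written"
        except retry_on as exc:
            last_error = exc
            if attempt == attempts - 1:
                break
            # Exponential backoff with full jitter, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            time.sleep(random.uniform(0, delay))
    raise WriteNotAcknowledged(f"write not acknowledged after {attempts} attempts") from last_error


def make_flaky_writer(failures: int) -> Callable[[], dict]:
    """Hypothetical stand-in for a driver call: fails `failures` times, then acknowledges."""
    calls = {"n": 0}

    def write() -> dict:
        calls["n"] += 1
        if calls["n"] <= failures:
            raise ConnectionError("network blip")
        return {"ok": 1, "attempt": calls["n"]}

    return write


ack = write_with_retry(make_flaky_writer(2), base_delay=0.01)
print(ack)  # {'ok': 1, 'attempt': 3}
```

Retrying is only safe if the write is idempotent: a timeout can hide a write that actually succeeded, so give each document a client-generated ID (or use an upsert) so a retry can't create a duplicate. Many current drivers, including MongoDB's, also offer a built-in retryable-writes option, so check your driver's documentation before rolling your own.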
These are just a few quick pointers to help guide folks in the right direction. As always, the right answer is to get advice from an actual expert about your specific situation.