AKF Partners

Abbott, Keeven & Fisher PartnersPartners In Hyper Growth

Tag » MySQL

Multi-paradigm Data Storage Architectures

We often have clients ask about one or more of the NoSQL technologies as potential replacements for their RDBMS (typically MySQL) to simplify scaling. What I think makes much more sense with regard to these NoSQL and SQL storage systems is an AND instead of an OR discussion. Consider implementing a multi-paradigm data storage layer that provides the appropriate persistent storage system for the different types of data in your application. This approach is similar to our RFM approach to data storage. Consider questions such as how often do you need the data, how quickly do you need it, how relational is the data, etc. There are at least four benefits of this multi-paradigm approach: simpler scaling, improved application performance, easier application development, and reduced cost.

Scaling
The AKF Scale Cube provides a straightforward way to scale any relational database through the three axes but we know that splitting data relationships once they’ve been established isn’t easy. It requires work and lots of coordination between teams. By limiting what gets stored relationally to only the minimum that is required means fewer splits along any axis. Many of the NoSQL technologies provide auto sharding and asynchronous replication. Re-indexing keys across another node is much simpler than migrating tables into another database.

Performance
While relational databases can have great performance, unless the table is pinned in memory or the query results are cached in memory, an in memory data store will always outperform SQL. In many applications we could make use of in memory solutions like Memcache or MongoDB to improve performance of retrieving high value data.

Application Development
As Debasish Ghosh states in his article Multiparadigm Data Storage for Enterprise Applications, storing data the way it is used in an application, simplifies programming and makes it easier to decentralize data processing. If the application treats the data as a document why break it apart to store it relationally when we have viable document storage engines. Storing the data in a more native format allows for quicker development.

Cost
For data that’s not needed often, cache it in other places (such as a CDN) or lazy load it from a low cost storage tier such as Amazon’s S3. This might work well for applications hosted in the cloud. The benefit of this a lower cost per byte stored, especially when considering all costs including administrators for the more complex data storage systems such as relational databases.

A final step in implementing a multi-paradigm data storage layer is an asynchronous message queue for data that needs to move up and down the stack. Implementing ActiveMQ or RabbitMQ to asynchronously move data from one layer to another as needed relieves the application of this burden. As an example consider an application that routes picking baskets for inventory in a warehouse. This is typically thought of as a graph with bins of inventory as nodes and the aisles as edges. For faster retrieval you could store this in a graph database such as Neo4J for ease of development and performance reasons. You could then asynchronously persist these maps in a MySQL database for reporting and older versions into an S3 bucket for historic archiving. This combination provides faster performance, easier development, simpler scaling, and reduced cost.


Comments Off

Delayed Replication

Do you think your database replica will save your data in a disaster? Think again because there are a lot of scenarios that will cause you to corrupt all your data.

Recently on the MySQL Performance Blog they had a post that did a great job explaining a problem that we often try to warn our clients about. The crux of the problem is that if you are relying only on a replica for disaster recovery then you are going to lose data when something bad happens.

For minimizing the impact of eventual consistency in our BASE applications, we want our replicas to be very near real time. This unfortunately can be unintended consequences in a disaster. Whether you’re relying on MySQL’s statement-based replication or Oracle’s redo apply replicating at the block-level, both are vulnerable to data corruption.

Any scenario resulting in data corruption on the primary will immediately be replicated to the standby. If a DBA drops a table by the time he stops cursing the drop table has been replicated to the standby. Storage subsystem or HA failover both can corrupt data files which can get propagated to the standby.

The solution to this problem is to create a standby or replica that has a delay on applying the log files. We recommend between 6 – 12 hours delay which gives you plenty of time to catch a logical corruption and stop the replication. You don’t need a large production sized server for this since you’ll never use this database in production but simply recover the database from it. Do this simple thing and it might save your data.


1 comment