Recently on the MySQL Performance Blog they had a post that did a great job explaining a problem that we often try to warn our clients about. The crux of the problem is that if you are relying only on a replica for disaster recovery then you are going to lose data when something bad happens.
For minimizing the impact of eventual consistency in our BASE applications, we want our replicas to be very near real time. This unfortunately can be unintended consequences in a disaster. Whether you’re relying on MySQL’s statement-based replication or Oracle’s redo apply replicating at the block-level, both are vulnerable to data corruption.
Any scenario resulting in data corruption on the primary will immediately be replicated to the standby. If a DBA drops a table by the time he stops cursing the drop table has been replicated to the standby. Storage subsystem or HA failover both can corrupt data files which can get propagated to the standby.
The solution to this problem is to create a standby or replica that has a delay on applying the log files. We recommend between 6 – 12 hours delay which gives you plenty of time to catch a logical corruption and stop the replication. You don’t need a large production sized server for this since you’ll never use this database in production but simply recover the database from it. Do this simple thing and it might save your data.