Why Test SQL Server Backups
For a lot of us data professional types, this seems like a silly topic, however I constantly hear of companies that aren’t testing their backups by restoring them, or don’t have proper backups at all.
- The number one reason to test SQL Server backups is to ensure they are good. You need to know that you can restore them by having valid backups. Many times in my career as a DBA I’ve found databases missing from backup routines, or that backup jobs just aren’t running at all.
- By regularly restoring backups, you help find gaps like what I described above. Often times a database may be excluded from a backup routine for some valid reason at the time. Maybe a big upgrade was happening and they didn’t want a backup process to interfere with their upgrade. However once the upgrade was done, someone forgot to add the database back to the routine. Maybe you are doing a point-in-time restore as part of your test and you discover the database is in SIMPLE recovery instead of FULL. I could list countless scenarios, however you get the point. Regularly restoring your backups can help find gaps in your recovery strategy.
- Your DBAs learn the process of restoring. I’m always making the statement, “The time to learn a task is not in the heat of the battle”. By practicing restores, your staff learns the process and you refine your documentation. Whomever could be responsible for a restore in the middle of the night, should be very well aware of the process to restore databases with minimal data loss. Regularly restoring databases helps build those skills.
- Another top reason I recommend doing regular restores is to time how long they take. Being a DBA, we are under Service Level Agreements (SLAs). The main two are Recovery Point Objective (RPO) which is a fancy way of saying, “How much data can you afford to lose”, and Recovery Time Objective (RTO) which is a fancy way of saying “How long can you be down”. Knowing how long restores take directly relates to RTO. If you have a small RTO and backups take a very long time to restore, you have a problem with meeting that RTO. Example, you have a 4 hour RTO with up to 30 minute RPO. If it takes longer than 4 hours to restore the entire SQL Server (OS, storage, everything, then restore databases to meet the 30 minute RPO) then you have a problem and need to start looking at other options.
- With all the above, that brings me to my final reason. Peace of Mind. Knowing that my backups are good, meet our recovery objectives, that everyone in the line of support is fully up to speed on how to perform restores, and that our documentation is up to date, I can rest easy knowing that we can recover our clients data within our agreed upon Service Level Agreements.
Leave a Reply